Cloudera Engineering Blog · Use Case Posts
In December 2012, we described how Cloudera Support Interface (CSI), an internal application built on CDH that drastically improves Cloudera’s ability to support our customers, is a unique and instructive use case for Apache Hadoop. In this post, we’ll follow up by describing two new differentiating CSI capabilities that have made Cloudera Support yet more responsive for customers:
Why would any company be interested in searching through its vast trove of email? A better question is: Why wouldn’t everybody be interested?
Email has become the most widespread method of communication we have, so there is much value to be extracted by making all emails searchable and readily available for further analysis. Common use cases for email analysis include fraud detection, customer sentiment and churn analysis, and lawsuit prevention, and that’s just the tip of the iceberg. Every company can extract tremendous value based on its own business needs.
This week’s Cloudera Sessions roadshow will make it to Denver, Colo., on Thursday, where the customer Fireside Chat will feature Wes Caldwell, Chief Architect of Global Enterprise Solutions at Intelligent Software Solutions (ISS). ISS helps many government organizations, including several within the U.S. Department of Defense, deploy next-generation data management and analytic solutions using a combination of systems integration expertise and custom-built software.
During the Fireside Chat, Cloudera’s COO Kirk Dunn will engage Wes in a conversation to discuss the business use cases for Hadoop that ISS sees most often in the field, primarily within two buckets: batch analytics and real-time applications. Wes will also share his thoughts on some of the more recent innovations within the Apache Hadoop ecosystem, such as Cloudera Impala and Solr integrations.
In the first leg of its tour of the United States earlier this year (see photos here), The Cloudera Sessions proved to be an invaluable single-day event for business and technical leaders exploring practical applications of Apache Hadoop. So valuable, in fact, that we’ve extended the tour with dates and cities this September and October.
We’re kicking off the second leg of our Cloudera Sessions roadshow this week, starting in San Francisco on Wednesday and Philadelphia on Friday. The spring series of the Cloudera Sessions was a big hit, which is why we’re back with a new and improved agenda for the fall, to offer even more options that will help attendees — ranging from developers to line-of-business managers and executives — navigate the Big Data journey. The expanded fall series agenda includes an application development lab (based on CDK) that coincides with the general session throughout the morning, and two tracks for clinics after lunch.
One portion of the general session that was a big hit throughout the spring series and that will return this fall is the Fireside Chat, during which the Cloudera executive host sits with one or two customers to talk about their “real life” experiences and lessons learned with Apache Hadoop. The Fireside Chat gives local customers an opportunity to showcase the work they’re doing, and allows attendees to hear from real users what worked, what didn’t, how they got started with Hadoop, and best practices learned along the way.
One of the first questions Cloudera customers raise when getting started with Apache Hadoop is how to select appropriate hardware for their new Hadoop clusters.
Although Hadoop is designed to run on industry-standard hardware, recommending an ideal cluster configuration is not as easy as delivering a list of hardware specifications. Selecting hardware that provides the best balance of performance and economy for a given workload requires testing and validation. (For example, users with IO-intensive workloads will invest in more spindles per core.)
This week, I’d like to shine a spotlight on the innovative work the National Institutes of Health (NIH) is doing with Big Data in the area of genomic research. Understanding DNA structure and functions is a very data-intensive, complex, and expensive undertaking. Apache Hadoop is making it more affordable and feasible to process, store, and analyze this data, and the NIH is embracing the technology for this reason. In fact, it has initiated a Big Data center of excellence, which it calls Big Data to Knowledge (BD2K), to accelerate innovations in bioinformatics using Big Data, which will ultimately help us better understand and control various diseases and disorders.
Bob Gourley — a friend of Cloudera’s who wears many hats including publisher of CTOvision.com, CTO of Crucial Point LLC, and GigaOm analyst — recently interviewed Dr. Mark Guyer, the deputy director of the NIH’s National Human Genome Research Institute (NHGRI), about the BD2K effort.
For those of you attending this week’s StampedeCon event in St. Louis, I’d encourage you to check out the “Thinking in MapReduce” session presented by Cerner’s Ryan Brush. The session will cover the value that MapReduce and Apache Hadoop offer to the healthcare space, and provide tips on how to effectively use Hadoop ecosystem tools to solve healthcare problems.
Big Data challenges within the healthcare space stem from the standard practice of storing data in many siloed systems. Hadoop is allowing pharmaceutical companies and healthcare providers to revolutionize their approach to business by making it easier and more cost-efficient to bring together all of these fragmented systems for a single, more accurate view of health. The end result: smarter clinical care decisions, better understanding of health risks for individuals and populations, and proactive measures to improve health and reduce healthcare costs.
Users of CDH, Cloudera’s Big Data platform, are solving big problems and building amazing solutions with Apache Hadoop. We at Cloudera are very proud of our customers’ accomplishments, and it’s time to showcase them. This year we’re thrilled to present the first annual Data Impact Awards, an awards program designed to recognize Hadoop innovators for their achievements in five categories:
The Data Warehousing Institute (TDWI) runs an annual Best Practices Awards program to recognize organizations for their achievements in business intelligence and data warehousing. A few months ago, I was introduced to Motorola Mobility’s VP of cloud platforms and services, Balaji Thiagarajan. After I learned about the company’s interesting Apache Hadoop use case and the success it has delivered, Balaji and I worked together to nominate Motorola Mobility for the TDWI Best Practices Award for Emerging Technologies and Methods. And to my delight, it won!
Chances are, you’ve heard of Motorola Mobility. It released the first commercial portable cell phone back in 1984 and later dominated the mobile phone market with the super-thin RAZR. Today its smartphones run on Android, the operating system that powers a large portion of the massive smartphone market.