Cloudera Engineering Blog · Use Case Posts
We’re kicking off the second leg of our Cloudera Sessions roadshow this week, starting in San Francisco on Wednesday and continuing in Philadelphia on Friday. The spring series of the Cloudera Sessions was a big hit, so we’re back with a new and improved agenda for the fall that offers even more options to help attendees (from developers to line-of-business managers and executives) navigate the Big Data journey. The expanded fall agenda includes an application development lab (based on CDK) that runs alongside the general session throughout the morning, plus two tracks of clinics after lunch.
The Fireside Chat, a popular part of the general session throughout the spring series, returns this fall. In it, the Cloudera executive host sits down with one or two customers to talk about their “real life” experiences and lessons learned with Apache Hadoop. The Fireside Chat gives local customers an opportunity to showcase their work, and lets attendees hear directly from real users about what worked, what didn’t, how they got started with Hadoop, and the best practices they learned along the way.
One of the first questions Cloudera customers raise when getting started with Apache Hadoop is how to select appropriate hardware for their new Hadoop clusters.
Although Hadoop is designed to run on industry-standard hardware, recommending an ideal cluster configuration is not as simple as handing over a list of hardware specifications. Selecting hardware that strikes the best balance of performance and economy for a given workload requires testing and validation. (For example, users with I/O-intensive workloads will want to invest in more spindles per core.)
This week, I’d like to shine a spotlight on the innovative genomic research the National Institutes of Health (NIH) is doing with Big Data. Understanding DNA structure and function is a very data-intensive, complex, and expensive undertaking. Apache Hadoop makes it more affordable and feasible to process, store, and analyze this data, and the NIH is embracing the technology for that reason. In fact, it has launched a Big Data center of excellence, called Big Data to Knowledge (BD2K), to accelerate innovation in bioinformatics, which will ultimately help us better understand and control a range of diseases and disorders.
Bob Gourley, a friend of Cloudera’s who wears many hats (including publisher of CTOvision.com, CTO of Crucial Point LLC, and GigaOm analyst), recently interviewed Dr. Mark Guyer, deputy director of the NIH’s National Human Genome Research Institute (NHGRI), about the BD2K effort.
For those of you attending this week’s StampedeCon event in St. Louis, I’d encourage you to check out the “Thinking in MapReduce” session presented by Cerner’s Ryan Brush. The session will cover the value that MapReduce and Apache Hadoop offer to the healthcare space, and provide tips on how to effectively use Hadoop ecosystem tools to solve healthcare problems.
Big Data challenges within the healthcare space stem from the standard practice of storing data in many siloed systems. Hadoop allows pharmaceutical companies and healthcare providers to revolutionize their approach to business by making it easier and more cost-efficient to bring these fragmented systems together into a single, more accurate view of health. The end result: smarter clinical care decisions, a better understanding of health risks for individuals and populations, and proactive measures to improve health and reduce healthcare costs.
Users of CDH, Cloudera’s Big Data platform, are solving big problems and building amazing solutions with Apache Hadoop. We at Cloudera are very proud of our customers’ accomplishments, and it’s time to showcase them. This year we’re thrilled to present the first annual Data Impact Awards, an awards program designed to recognize Hadoop innovators for their achievements in five categories:
The Data Warehousing Institute (TDWI) runs an annual Best Practices Awards program to recognize organizations for their achievements in business intelligence and data warehousing. A few months ago, I was introduced to Motorola Mobility’s VP of cloud platforms and services, Balaji Thiagarajan. After I learned about the company’s interesting Apache Hadoop use case and the success it had delivered, Balaji and I worked together to nominate Motorola Mobility for the TDWI Best Practices Award for Emerging Technologies and Methods. And to my delight, it won!
Chances are, you’ve heard of Motorola Mobility. Motorola released the first commercial portable cell phone back in 1984 and later dominated the mobile phone market with the super-thin RAZR. Today Motorola Mobility is part of Google, whose Android operating system runs a large portion of the massive smartphone market.
In this Customer Spotlight, I’d like to emphasize some undeniably positive use cases for Big Data by looking at some of the ways the healthcare and life sciences industries are innovating to benefit humankind. Here are just a few examples:
Mount Sinai School of Medicine has partnered with Cloudera’s own Jeff Hammerbacher to apply Big Data to better predict and understand disease processes and treatments. The Mount Sinai School of Medicine is a top medical school in the US, noted for innovation in biomedical research, clinical care delivery, and community services. With Cloudera’s Big Data technology and Jeff’s data science expertise, Mount Sinai is better equipped to develop solutions designed for high-performance, scalable data analysis and multi-scale measurements. For example, medical research and discovery in genotype, gene expression, and organ health will benefit from these Big Data applications.
This is the week of Apache HBase, with HBaseCon 2013 taking place Thursday, followed by WibiData’s KijiCon on Friday. In the many conversations I’ve had with Cloudera customers over the past 18 months, I’ve noticed a trend: those who run HBase stand out. They tend to be very sophisticated Hadoop users who are accomplishing impressive things with Big Data. They deploy HBase because they require random, real-time read/write access to the data in Hadoop. Hadoop is a core component of their data management infrastructures, and these users rely on the latest and greatest components of the Hadoop stack to satisfy their mission-critical data needs.
Today I’d like to shine a spotlight on one innovative company that is putting top engineering talent (and HBase) to work, helping to save the planet — literally.
Earlier this week, we hosted The Cloudera Forum to reveal Cloudera’s “Unaccept the Status Quo” vision and to announce the public beta launch of Cloudera Search. The event featured a panel discussion, moderated by our own CEO Mike Olson, with representatives from four companies that are embracing the latest Big Data innovations. For obvious reasons, those are the companies I’d like to highlight in this week’s spotlight. The panelists were… (drumroll, please):