Learn how HiveServer, Apache Sentry, and Impala help make Hadoop play nicely with BI tools when Kerberos is involved.
In 2010, I wrote a simple pair of blog entries outlining the general considerations behind using Apache Hadoop with BI tools. The Cloudera partner ecosystem has positively exploded since then, and the technology has matured as well. Today, if JDBC is involved, all the pieces needed to expose Hadoop data through familiar BI tools are available:
Understanding some key differences between MR1 and MR2/YARN will make your migration much easier.
Here at Cloudera, we recently finished a push to get Cloudera Enterprise 5 (containing CDH 5.0.0 + Cloudera Manager 5.0.0) out the door along with more than 100 partner certifications.
CDH 5.0.0 is the first release of our software distribution where YARN and MapReduce 2 (MR2) are the default MapReduce execution framework,
It’s easy to get started with Hadoop administration because Linux system administration is a pretty well-known beast, and because systems administrators are used to administering all kinds of existing complex applications. However, we’re seeing many common missteps that make us believe there’s a need for some guidance in Hadoop administration. Most of these mistakes come from a lack of understanding about how Hadoop works. Here are just a few of the common issues we find:
Lack of configuration management
It makes sense to start with a small cluster and then to scale out over time as you find initial success and your needs grow.
I was positively blown away by the enthusiasm, creativity, and productivity exhibited by the participants in the CDH3b2 Hackathon. We had over twenty participants from established companies like Oracle and Akamai, stealth-mode startups, and one-man consulting shops. At one point we had nine simultaneous hacking projects going, with groups of one to five people. At the end of the day, participants voted on the most interesting project, which won a prize: an iPod Nano for each participant on that project.
Just today we heard another question about integrating Apache Hadoop with Business Intelligence tools. This is one of the most common questions we receive from enterprises adopting or evaluating Hadoop. In the early stages of their projects, customers are generally not sure how to connect their BI tools to Hadoop or when it makes sense to do so. As I wrote in BI Considerations and Hadoop Part 1, Cloudera encourages you to use your existing infrastructure wherever possible,