Author Archives: Jeff Bean

About Jeff Bean

Jeff is a Solutions Architect at Cloudera. He has worked as a presales engineer for IBM/Cognos, BEA Systems, and Actuate, specializing in data integration. He has also done software engineering for SCO and for an open source startup called Lutris Technologies.

How-to: Configure JDBC Connections in Secure Apache Hadoop Environments

Categories: Hive, How-to, Impala, Platform Security & Cybersecurity

Learn how HiveServer, Apache Sentry, and Impala help make Hadoop play nicely with BI tools when Kerberos is involved.

In 2010, I wrote a simple pair of blog entries outlining the general considerations behind using Apache Hadoop with BI tools. The Cloudera partner ecosystem has positively exploded since then, and the technology has matured as well. Today, if JDBC is involved, all the pieces needed to expose Hadoop data through familiar BI tools are available:

Apache Hadoop YARN: Avoiding 6 Time-Consuming "Gotchas"

Categories: Hadoop, YARN

Understanding some key differences between MR1 and MR2/YARN will make your migration much easier.

Here at Cloudera, we recently finished a push to get Cloudera Enterprise 5 (containing CDH 5.0.0 + Cloudera Manager 5.0.0) out the door along with more than 100 partner certifications.

CDH 5.0.0 is the first release of our software distribution where YARN and MapReduce 2 (MR2) is the default MapReduce execution framework,

Avoiding Common Hadoop Administration Issues

Categories: General

It’s easy to get started with Hadoop administration because Linux system administration is a pretty well-known beast, and because systems administrators are used to administering all kinds of existing complex applications. However, there are many common missteps we’re seeing that make us believe there’s a need for some guidance in Hadoop administration. Most of these mistakes come from a lack of understanding about how Hadoop works. Here are just a few of the common issues we find:

Lack of configuration management

It makes sense to start with a small cluster and then to scale out over time as you find initial success and your needs grow.

Notes From the Hackathon at Cloudera

Categories: General

I was positively blown away by the enthusiasm, creativity, and productivity exhibited by the participants in the CDH3b2 Hackathon. We had over twenty participants from established companies like Oracle and Akamai, stealth-mode startups, and one-man consulting shops. At one point we had nine simultaneous hacking projects going, with groups of one to five people. At the end of the day, participants voted on the most interesting project, which won a prize – an iPod Nano for each participant on that project.

Considerations for Apache Hadoop and BI (part 2 of 2)

Categories: General, Hadoop

Just today we heard another question about integrating Apache Hadoop with Business Intelligence tools. This is one of the most common questions we receive from enterprises adopting or evaluating Hadoop. In the early stages of their projects, customers are generally not sure how to connect their BI tools to Hadoop, and when it makes sense to do so. As I wrote in BI Considerations and Hadoop Part 1, Cloudera encourages you to use your existing infrastructure wherever possible,
