Author Archives: Alex Kozlov

About Alex Kozlov

Alex Kozlov has over 15 years of experience in distributed processing, data analysis, and data warehousing at SGI, Hewlett-Packard, Turn, and Adchemy. Alex received the equivalent of a Ph.D. in Physics and Mathematics from Moscow State University in 1988 and earned another doctorate in Applied Physics from Stanford University in 1998.

How-to: Include Third-Party Libraries in Your MapReduce Job

Categories: Hadoop, HBase, How-to, MapReduce

“My library is in the classpath but I still get a Class Not Found exception in a MapReduce job” – If you have this problem, this post is for you.

Java requires third-party and user-defined classes to be listed in the command line’s “classpath” option when the JVM is launched. The hadoop wrapper shell script does exactly this for you by building the classpath from the core libraries located in the /usr/lib/hadoop-0.20/ and /usr/lib/hadoop-0.20/lib/ directories.
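To see what a given JVM actually has on its classpath, a small diagnostic like the one below can help. It is only a sketch: the class name `ClasspathCheck` and its helper methods are made up for illustration and are not part of Hadoop. It prints the entries of the standard `java.class.path` system property and reports whether each class named on the command line can be loaded.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic class; not part of Hadoop or any library.
public class ClasspathCheck {

    /** Returns the non-empty entries of this JVM's classpath, in order. */
    static List<String> classpathEntries() {
        String cp = System.getProperty("java.class.path", "");
        List<String> entries = new ArrayList<>();
        for (String entry : cp.split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    /** Returns true if the named class can be located by this class's loader. */
    static boolean isLoadable(String className) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String entry : classpathEntries()) {
            System.out.println("classpath entry: " + entry);
        }
        // Each argument is treated as a fully qualified class name to look up.
        for (String name : args) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "NOT FOUND"));
        }
    }
}
```

Keep in mind that this inspects only the JVM it runs in. MapReduce tasks execute in separate JVMs on the cluster nodes, so a class that is found on the machine where you launch the job may still be missing where the task runs.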


Hadoop/HBase Capacity Planning

Categories: Hadoop, HBase, HDFS, MapReduce, ZooKeeper

Apache Hadoop and Apache HBase are gaining popularity due to their flexibility and the tremendous work that has been done to simplify their installation and use. This post provides guidance on sizing your first Hadoop/HBase cluster. First, there are significant differences between Hadoop and HBase usage. Hadoop MapReduce is primarily an analytic tool for running analytic and data extraction queries over all of your data, or at least a significant portion of them (data is the plural of datum).
