Author Archives: Bilung Lee

What’s New in CDH4.1 Mahout

Categories: CDH, Data Science, Mahout

Cloudera recently announced the general availability of CDH4.1, an update to our open-source, enterprise-ready distribution of Apache Hadoop and related projects. Among its components, Apache Mahout is a relatively recent addition to CDH (first added in CDH3u2 in 2011), but it is already attracting increasing interest in the field.

Mahout started as a sub-project of Apache Lucene to provide machine-learning libraries in the area of clustering and classification. It later evolved into a top-level Apache project with much broader coverage of machine-learning techniques (clustering,

Read More

Cloudera Connector for Teradata 1.0.0

Categories: Sqoop

Apache Sqoop (incubating) provides an efficient way to transfer big data between Hadoop-related systems (such as HDFS, Hive, and HBase) and structured data stores (such as relational databases, data warehouses, and NoSQL systems). Sqoop's extensible architecture allows support for a new data store to be added as a so-called connector. By default, Sqoop comes with connectors for a variety of databases such as MySQL, PostgreSQL, Oracle, SQL Server, and DB2.

Read More

What’s New in Apache Sqoop 1.4.0-incubating

Categories: Sqoop

This blog post was originally published on the Apache Blog.

Apache Sqoop recently celebrated its first incubator release, version 1.4.0-incubating. This release adds several new features and improvements, and this post covers some of the more interesting changes. Sqoop is currently undergoing incubation at The Apache Software Foundation. More information on this project can be found at

Customized Type Mapping (SQOOP-342)

During import, Sqoop applies a default mapping from most SQL types to appropriate Java or Hive counterparts.
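As a rough illustration of overriding that default mapping, the sketch below assembles a `sqoop import` command line that uses the `--map-column-java` and `--map-column-hive` options added by SQOOP-342. The helper name `build_import_command`, the JDBC URL, and the table and column names are hypothetical and chosen only for the example; the script builds and prints the command rather than running it.

```python
import shlex


def build_import_command(connect_url, table, java_overrides=None, hive_overrides=None):
    """Assemble a `sqoop import` argument list with optional type-mapping overrides.

    java_overrides / hive_overrides map column names to the Java or Hive type
    that should replace Sqoop's default choice for that column.
    """
    args = ["sqoop", "import", "--connect", connect_url, "--table", table]
    if java_overrides:
        # Rendered as a comma-separated list, e.g. id=String,total=Double
        mapping = ",".join(f"{col}={typ}" for col, typ in java_overrides.items())
        args += ["--map-column-java", mapping]
    if hive_overrides:
        # Only meaningful when the import also targets Hive (caller's responsibility).
        mapping = ",".join(f"{col}={typ}" for col, typ in hive_overrides.items())
        args += ["--map-column-hive", mapping]
    return args


# Hypothetical connection string, table, and columns, for illustration only.
cmd = build_import_command(
    "jdbc:mysql://db.example.com/shop",
    "orders",
    java_overrides={"id": "String", "total": "Double"},
)

# Print a shell-safe rendering of the command without executing it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the overrides in a dictionary makes it easy to list only the columns whose default type is unsuitable; every other column continues to use Sqoop's built-in mapping.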

Read More