Category Archives: Sqoop

Understanding Connectors and Drivers in the World of Sqoop

Categories: Data Ingestion Sqoop

Note: This post was originally published at in a slightly different form.

Apache Sqoop is a tool for doing highly efficient data transfers between relational databases and the Apache Hadoop ecosystem. One significant benefit of Sqoop is that it’s easy to use and can work with a variety of systems inside as well as outside of that ecosystem. Thus, with one tool, you can import or export data from all databases supporting the JDBC interface with the same command-line arguments exposed by Sqoop.

Read More

This Month in the Ecosystem

Categories: Community General Hadoop HBase Hive Kite SDK Sqoop

The ecosystem is evolving at a rapid pace – so rapidly, that important developments are often passing through the public attention zone too quickly. Thus, we think it might be helpful to bring you a digest (by no means complete!) of our favorite highlights on a regular basis. (This effort, by the way, has different goals than the fine Hadoop Weekly newsletter, which has a more expansive view – and which you should subscribe to immediately,

Read More

The Book on Apache Sqoop is Here!

Categories: Books Community Sqoop

Continuing the fine tradition of Clouderans contributing books to the Apache Hadoop ecosystem, Apache Sqoop Committers/PMC Members Kathleen Ting and Jarek Jarcec Cecho have officially joined the book author community: their Apache Sqoop Cookbook is now available from O’Reilly Media (with a pelican the assigned cover beast). 

The book arrives at an ideal time. Hadoop has quickly become the standard for processing and analyzing Big Data,

Read More

Meet the Engineer: Kathleen Ting

Categories: Flume Hadoop Meet the Engineer Sqoop Support ZooKeeper

In this installment of “Meet the Engineer”, get to know Customer Operations Engineering Manager/Apache Sqoop committer Kathleen Ting (@kate_ting).

What do you do at Cloudera, and in what open-source projects are you involved?
I’m a support manager at Cloudera, and an Apache Sqoop committer and PMC member. I also contribute to the Apache Flume and Apache ZooKeeper mailing lists and organize and present at meetups, as well as speak at conferences,

Read More

Apache Hadoop in 2013: The State of the Platform

Categories: Avro CDH Flume Hadoop HBase HDFS Hive Hue Impala Mahout MapReduce Oozie Pig Sqoop YARN ZooKeeper

For several good reasons, 2013 is a Happy New Year for Apache Hadoop enthusiasts.

In 2012, we saw continued progress on developing the next generation of the MapReduce processing framework (MRv2), work that will bear fruit this year. HDFS experienced major progress toward becoming a lights-out, fully enterprise-ready distributed filesystem with the addition of high availability features and increased performance. And a hint of the future of the Hadoop platform was provided with the Beta release of Cloudera Impala,

Read More