Category Archives: Sqoop

New in CDH 5.3: Apache Sentry Integration with HDFS

Categories: Data Ingestion Security Sentry Sqoop

Starting in CDH 5.3, Apache Sentry integration with HDFS saves admins a lot of work by centralizing access control permissions across components that utilize HDFS.

It’s been more than a year and a half since a couple of my colleagues here at Cloudera shipped the first version of Sentry (now Apache Sentry (incubating)). This project filled a huge security gap in the Apache Hadoop ecosystem by bringing truly secure and dependable fine grained authorization to the Hadoop ecosystem and provided out-of-the-box integration for Apache Hive.

Read More

Using Apache Sqoop for Load Testing

Categories: Data Ingestion Guest Sqoop

Our thanks to Montrial Harrell, Enterprise Architect for the State of Indiana, for the guest post below.

Recently, the State of Indiana has begun to focus on how enterprise data management can help our state’s government operate more efficiently and improve the lives of our residents. With that goal in mind, I began this journey just like everyone else I know: with an interest in learning more about Apache Hadoop.

I started learning Hadoop via a virtual server onto which I installed CDH and worked through a few online tutorials.

Read More

How Apache Sqoop 1.4.5 Improves Oracle Database/Apache Hadoop Integration

Categories: Data Ingestion Guest Performance Sqoop

Thanks to Guy Harrison of Dell Inc. for the guest post below about time-tested performance optimizations for connecting Oracle Database with Apache Hadoop that are now available in Apache Sqoop 1.4.5 and later.

Back in 2009, I attended a presentation by a Cloudera employee named Aaron Kimball at the MySQL User Conference in which he unveiled a new tool for moving data from relational databases into Hadoop. This tool was to become,

Read More

How SQOOP-1272 Can Help You Move Big Data from Mainframe to Apache Hadoop

Categories: Guest Hadoop Sqoop

Thanks to M. Asokan, Chief Architect at Syncsort, for the guest post below.

Apache Sqoop provides a framework to move data between HDFS and relational databases in a parallel fashion using Hadoop’s MR framework. As Hadoop becomes more popular in enterprises, there is a growing need to move data from non-relational sources like mainframe datasets to Hadoop. Following are possible reasons for this:

  • HDFS is used simply as an archival medium for historical data living on the mainframe.

Read More

Sqooping Data with Hue

Categories: Hue Sqoop

Hue, the open source Web UI that makes Apache Hadoop easier to use, has a brand-new application that enables transferring data between relational databases and Hadoop. This new application is driven by Apache Sqoop 2 and has several user experience improvements, to boot.

Sqoop is a batch data migration tool for transferring data between traditional databases and Hadoop. The first version of Sqoop is a heavy client that drives and oversees data transfer via MapReduce.

Read More