Category Archives: Sqoop

Using Sqoop to Import Data from MySQL to Cloudera Data Warehouse

Categories: CDH Sqoop

Cloudera Data Warehouse offers a powerful combination of flexibility and cost-savings. Using Cloudera Data Warehouse, you can transform and optimize your current traditional data warehouse by moving select workloads to your CDH cluster. This article shows you how to transform your current setup into a modern data warehouse by moving some initial data over to Impala on your CDH cluster.

Moving mysql data to hdfs copy

Prerequisites

To use the following data import scenario,

Read more

New in CDH 5.3: Apache Sentry Integration with HDFS

Categories: Data Ingestion Platform Security & Cybersecurity Sentry Sqoop

Starting in CDH 5.3, Apache Sentry integration with HDFS saves admins a lot of work by centralizing access control permissions across components that utilize HDFS.

It’s been more than a year and a half since a couple of my colleagues here at Cloudera shipped the first version of Sentry (now Apache Sentry (incubating)). This project filled a huge security gap in the Apache Hadoop ecosystem by bringing truly secure and dependable fine grained authorization to the Hadoop ecosystem and provided out-of-the-box integration for Apache Hive.

Read more

Using Apache Sqoop for Load Testing

Categories: Data Ingestion Guest Sqoop

Our thanks to Montrial Harrell, Enterprise Architect for the State of Indiana, for the guest post below.

Recently, the State of Indiana has begun to focus on how enterprise data management can help our state’s government operate more efficiently and improve the lives of our residents. With that goal in mind, I began this journey just like everyone else I know: with an interest in learning more about Apache Hadoop.

I started learning Hadoop via a virtual server onto which I installed CDH and worked through a few online tutorials.

Read more

How Apache Sqoop 1.4.5 Improves Oracle Database/Apache Hadoop Integration

Categories: Data Ingestion Guest Performance Sqoop

Thanks to Guy Harrison of Dell Inc. for the guest post below about time-tested performance optimizations for connecting Oracle Database with Apache Hadoop that are now available in Apache Sqoop 1.4.5 and later.

Back in 2009, I attended a presentation by a Cloudera employee named Aaron Kimball at the MySQL User Conference in which he unveiled a new tool for moving data from relational databases into Hadoop. This tool was to become,

Read more

How SQOOP-1272 Can Help You Move Big Data from Mainframe to Apache Hadoop

Categories: Guest Hadoop Sqoop

Thanks to M. Asokan, Chief Architect at Syncsort, for the guest post below.

Apache Sqoop provides a framework to move data between HDFS and relational databases in a parallel fashion using Hadoop’s MR framework. As Hadoop becomes more popular in enterprises, there is a growing need to move data from non-relational sources like mainframe datasets to Hadoop. Following are possible reasons for this:

  • HDFS is used simply as an archival medium for historical data living on the mainframe.

Read more