At Cloudera, we’re always working to make it easier for you to work with Hadoop and to integrate Hadoop-based systems with your existing data sources. One example of how we accomplish this is Sqoop, a database import tool developed at Cloudera that allows you to easily copy data between databases and HDFS. We originally announced this tool in June, and we’ve been steadily improving it since then. It can now talk to several more databases than before, and its performance has improved considerably. Sqoop has demonstrated its usefulness quickly; several open source projects and many of our clients use Sqoop as part of their data pipelines. Last summer, our friend Pete Skomoroch demonstrated how to integrate it into his Wikipedia Trending Topics project (blog tutorial).
This talk at Hadoop World NYC by Cloudera engineer Aaron Kimball introduces Sqoop, describes its use cases, and gives some technical details of how it works.