Apache Sqoop (incubating) provides an efficient approach for transferring big data between Hadoop related systems (such as HDFS, Hive, and HBase) and structured data stores (such as relational databases, data warehouses, and NoSQL systems). The extensible architecture used by Sqoop allows support for a data store to be added as a so-called connector. By default, Sqoop comes with connectors for a variety of databases such as MySQL, PostgreSQL, Oracle, SQL Server, and DB2.
This post provides a high-level overview of Apache Sqoop (incubating). It discusses the general problem addressed by Sqoop and provides simple examples on how to use it. This post is written by Arvind Prabhakar, who is a Sqoop committer.
This post was contributed by The Global Biodiversity Information Facility development team.
The Global Biodiversity Information Facility is an international organization, whose mission is to promote and enable free and open access to biodiversity data worldwide. Part of this includes operating a search, discovery and access system, known as the Data Portal; a sophisticated index to the content shared through GBIF. This content includes both complex taxonomies and occurrence data such as the recording of specimen collection events or species observations.