Inaugural Sqoop Meetup
This post was originally published on the Apache Blog.
Over 30 people attended the inaugural Sqoop Meetup on the eve of Hadoop World in NYC. Faces were put to names, troubleshooting tips were swapped, and stories were topped – with the table-to-end-all-tables weighing in at 28 billion rows.
I started off the scheduled talks by discussing “Habits of Effective Sqoop Users.” One tip for making your next debugging session more effective was to provide more information up front on the mailing list, such as the versions in use and the output of a run with the --verbose flag enabled. I also pointed out workarounds for common MySQL and Oracle errors.
Next up was Eric Hernandez’s “Sqooping 50 Million Rows a Day from MySQL,” where he displayed battle scars from creating a single data source for analysts to mine. Key lessons learned were: (1) Develop an incremental import when sqooping in large, active tables. (2) Limit the number of part files in which the data is stored in HDFS. (3) Compress data in HDFS.
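To make those three lessons concrete, here is a minimal sketch of how they might map onto a single sqoop import invocation. It is not taken from Eric's talk. The helper name build_sqoop_import, the JDBC URL, the table, the check column, and the target directory are all hypothetical placeholders. The sketch assumes an append-style incremental import keyed on an increasing id column, and it relies on each mapper writing one part file, so capping the mapper count also caps the number of parts.

```python
import shlex


def build_sqoop_import(connect, table, check_column, last_value,
                       target_dir, num_mappers=4):
    """Assemble an argument list for `sqoop import` (illustrative only)."""
    return [
        "sqoop", "import",
        "--verbose",                      # extra detail for mailing-list reports
        "--connect", connect,
        "--table", table,
        "--incremental", "append",        # lesson 1: import only new rows
        "--check-column", check_column,
        "--last-value", str(last_value),
        "--num-mappers", str(num_mappers),  # lesson 2: one part file per mapper
        "--compress",                     # lesson 3: compress output in HDFS
        "--target-dir", target_dir,
    ]


# Hypothetical connection details; substitute your own.
cmd = build_sqoop_import(
    connect="jdbc:mysql://db.example.com/sales",
    table="orders",
    check_column="id",
    last_value=0,
    target_dir="/data/orders",
)
print(shlex.join(cmd))
```

Building the command as a list keeps each flag and its value together, and the list can be handed to subprocess.run on a machine where Sqoop is installed. On later runs, --last-value would be set to the highest id already imported.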
The final talk of the night was given by Joey Echeverria on “Scratching Your Own Itch.” Joey methodically walked future Sqoop committers through the science of contributing: finding a Sqoop bug, filing a JIRA, coding a patch, submitting it for review, revising accordingly, and finally earning the “Ship it” ‘+1’ approval.
With the conclusion of the scheduled talks, the hallway talks commenced and went well into the night. Sqoop Committer Aaron Kimball was even rumored to have shed a tear over the healthy turnout and impending momentum barreling towards the next Sqoop Meetup on the Left Coast. See you there!
Photos from Masatake Iwasaki and Kate Ting.