Wednesday, November 9th, 2011
This session will focus on the challenges of replacing existing Relational DataBase and Data Warehouse technologies with Open Source components. Jason Han will base his presentation on his experience migrating Korea Telecom (KT’s) CDR data from Oracle to Hadoop, which required converting many Oracle SQL queries to Hive HQL queries. He will cover the differences between SQL and HQL; the implementation of Oracle’s basic/analytics functions with MapReduce; the use of Sqoop for bulk loading RDB data into Hadoop; and the use of Apache Flume for collecting fast-streamed CDR data. He’ll also discuss Lucene and ElasticSearch for near-realtime distributed indexing and searching. You’ll learn tips for migrating existing enterprise big data to open source, and gain insight into whether this strategy is suitable for your own data.