Hadoop World 2011: A Glimpse into Enterprise Architecture


The Enterprise Architecture track at Hadoop World 2011 will provide insight into how Hadoop is powering today’s advanced data management ecosystems and how Hadoop fits into modern enterprise environments. Speakers will discuss architectures and models, demonstrating how Hadoop connects to surrounding platforms. Attendees of the Enterprise Architecture track will learn about Hadoop deployment design patterns; enterprise models and system architecture; the types of systems that manage data transferred into Hadoop using Apache Sqoop and Apache Flume; and how to publish data via Apache Hive, Apache HBase and Apache Sqoop to systems that consume data from Hadoop.

Preview of Enterprise Architecture Track Sessions

Building Realtime Big Data Services at Facebook with Hadoop and HBase
Jonathan Gray, Facebook, Inc.

Abstract: Facebook has one of the largest Apache Hadoop data warehouses in the world, primarily queried through Apache Hive for offline data processing and analytics. However, the need for realtime analytics and end-user access has led to the development of several new systems built using Apache HBase. This talk will cover specific use cases and the work done at Facebook to build large-scale, low-latency, high-throughput realtime services with Hadoop and HBase. This includes several significant contributions to existing projects as well as the release of new open source projects.

Extending the Enterprise Data Warehouse with Hadoop
Jonathan Seidman, Orbitz Worldwide
Rob Lancaster, Orbitz Worldwide

Abstract: Hadoop provides the ability to extract business intelligence from extremely large, heterogeneous data sets that were previously impractical to store and process in traditional data warehouses. The challenge now is in bridging the gap between the data warehouse and Hadoop. In this talk we’ll discuss some steps that Orbitz has taken to bridge this gap, including examples of how Hadoop and Hive are used to aggregate data from large data sets, and how that data can be combined with relational data to create new reports that provide actionable intelligence to business users.

Leveraging Hadoop for Legacy Systems
Mathias Herberts, Crédit Mutuel Arkéa

Abstract: Since many companies in the financial sector still rely on legacy systems for their daily operations, Hadoop can only be truly useful in those environments if it fits alongside COBOL, VSAM, MVS and other legacy technologies. In this session, we will detail how Crédit Mutuel Arkéa solved this challenge and successfully combined the mainframe with Hadoop.

Replacing RDB/DW with Hadoop and Hive for Telco Big Data
Jason Han, NexR Inc.

Abstract: This session will focus on the challenges of replacing existing relational database and data warehouse technologies with open source components. Jason Han will base his presentation on his experience migrating call detail record (CDR) data at Korea Telecom (KT) from Oracle to Hadoop, which required converting many Oracle SQL queries to Hive HQL queries. He will cover the differences between SQL and HQL; the implementation of Oracle’s basic and analytic functions with MapReduce; the use of Sqoop for bulk loading relational data into Hadoop; and the use of Apache Flume for collecting fast-streaming CDR data. He’ll also discuss Lucene and ElasticSearch for near-realtime distributed indexing and searching. You’ll learn tips for migrating existing enterprise big data to open source, and gain insight into whether this strategy is suitable for your own data.

WibiData: Entity-centric Analysis with HBase
Aaron Kimball, Odiago

Abstract: WibiData is a collaborative data mining and predictive modeling platform for large-scale, multi-structured, user-centric data. It leverages HBase to combine batch analysis and realtime access within the same system, and integrates with existing BI, reporting and analysis tools. WibiData offers a set of libraries for common user-centric analytic tasks, as well as more advanced data mining libraries for personalization, recommendation, and other predictive modeling applications. Developers can also write their own reusable libraries, which data scientists and analysts can then use alongside the WibiData libraries. In this talk, we will provide a technical overview of WibiData and show how we used it to build FoneDoktor, a mobile app that collects data about device performance and app resource usage to offer users personalized recommendations for improving battery life and performance.

Storing and Indexing Social Media Content in the Hadoop Ecosystem
Lance Riedel, Jive Software

Abstract: Jive is using Flume to deliver the content of a social web (250M messages/day) to HDFS and HBase. Flume’s flexible architecture allows us to stream data both to our production data center and to an Amazon Web Services data center. We periodically build and merge Lucene indices with Hadoop jobs and deploy them to Katta to provide near-realtime search results. This talk will explore our infrastructure and the decisions we’ve made to handle a fast-growing set of realtime data feeds. We will also explore other uses for Flume throughout Jive, including log collection and our distributed event bus.