Hadoop Summit Europe is coming up in Amsterdam next week, so this is an appropriate time to make you aware of the Cloudera speaker program there (all three talks on Thursday, March 21):
- Apache HBase Sizing Notes (Lars George, Solutions Architect/HBase PMC Member – also Track Chair of Hadoop Summit’s “Integrating Hadoop” track)
This talk will address valuable lessons learned with the current versions of HBase. There are inherent architectural features that warrant careful evaluation of the data schema and of how to scale out a cluster. The audience will get a best-practices summary of where the design of HBase has limitations and how to avoid them. In particular, we will discuss issues like proper memory tuning (for reads and writes), optimal flush file sizing, compaction tuning, and the number of write-ahead logs required. Furthermore, there will be a discussion of the theoretical write performance in comparison to that observed on real clusters. A collection of cheat sheets and example calculations for cluster sizing rounds out the talk.
- HBase Storage Internals, Present and Future (Matteo Bertozzi, Software Engineer/HBase Committer)
Apache HBase is the Hadoop open-source, distributed, versioned storage manager well suited for random, realtime read/write access. This talk will give an overview of how HBase achieves random I/O, focusing on the storage layer internals, starting from how the client interacts with Region Servers and the Master and going into WAL, MemStore, Compactions, and on-disk format details.
- Hadoop and the Enterprise Data Warehouse (Patrick Angeles, Solutions Architect)
The Data Warehouse has been a staple in data-driven organizations for decades. As a result, the ecosystem, architecture, processes, and methodologies around data warehousing are extremely mature. The arrival of Hadoop and Big Data has brought new life into traditional data warehousing by proposing new architectures and processes that upend existing norms. This presentation goes over several variants of how Hadoop interplays with existing data warehouses to solve modern problems.
There will be other Cloudera employees at the summit as well, so be on the lookout for anyone wearing Cloudera Blue!