The Apache HBase Medium Object Storage (MOB) feature was introduced by HBASE-11339. This feature improves low-latency read and write access for moderately sized values (ideally from 100 KB to 10 MB, based on our testing results), making it well suited for storing documents, images, and other moderately sized objects. The Apache HBase MOB feature achieves this improvement by separating the IO paths for file references and MOB objects and by applying different compaction policies to MOBs, thus reducing the write amplification created by HBase’s compactions.
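To make the idea of separate paths for references and objects concrete, here is a toy in-memory model in Python. It is only a sketch of the concept and not how HBase implements MOB: the class `ToyMobStore`, the `MobRef` pointer, the `mob-` key prefix, and the 100 KB cutoff are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

# Assumed cutoff for illustration, taken from the low end of the range quoted above.
MOB_THRESHOLD_BYTES = 100 * 1024


@dataclass(frozen=True)
class MobRef:
    """Small pointer kept in the main store in place of a large value."""
    mob_key: str
    size: int


@dataclass
class ToyMobStore:
    """Hypothetical toy model: small cells and large objects live in separate maps."""
    threshold: int = MOB_THRESHOLD_BYTES
    main: dict = field(default_factory=dict)  # small values and MobRef pointers
    mob: dict = field(default_factory=dict)   # large values on their own path

    def put(self, row: str, value: bytes) -> None:
        mob_key = f"mob-{row}"
        if len(value) > self.threshold:
            # Large value: store it separately and keep only a reference.
            self.mob[mob_key] = value
            self.main[row] = MobRef(mob_key, len(value))
        else:
            # Small value: store inline and drop any stale large object.
            self.mob.pop(mob_key, None)
            self.main[row] = value

    def get(self, row: str) -> bytes:
        cell = self.main[row]
        if isinstance(cell, MobRef):
            return self.mob[cell.mob_key]  # follow the reference
        return cell

    def compact_main(self) -> int:
        """Simulate rewriting only the main store; return bytes rewritten."""
        rewritten = 0
        for cell in self.main.values():
            # A reference costs a few bytes to rewrite, not the full object.
            rewritten += len(cell.mob_key) if isinstance(cell, MobRef) else len(cell)
        return rewritten
```

In this model, compacting the main store touches only small cells and references, so the cost of a rewrite does not grow with the size of the large objects. That is the intuition behind the reduced write amplification described above.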
[Update: A new package for Apache Phoenix 4.7.0 on CDH 5.7 was released in June 2016.]
New Cloudera Labs packages for Apache Phoenix 4.5.2 (which includes Apache Spark integration) are now available for CDH 5.4.x and CDH 5.5.x.
Earlier this year, Cloudera announced the inclusion of Apache Phoenix in Cloudera Labs.
To recap: Phoenix adds SQL to Apache HBase,
Combining CDH with a business execution engine can serve as a solid foundation for complex event processing on big data.
Event processing involves tracking and analyzing streams of data from events to support better insight and decision making. With the recent explosion in data volume and diversity of data sources, this goal can be quite challenging for architects to achieve.
Complex event processing (CEP) is a type of event processing that combines data from multiple sources to identify patterns and complex relationships across various events.
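As a rough illustration of what "combining data from multiple sources to identify patterns" can mean, the following Python sketch merges two time-ordered event streams and flags one simple pattern: a failed login followed by a transfer for the same user within a time window. The event fields, the two sources, the pattern itself, and the function name are hypothetical and chosen only for this example; a real CEP engine offers far richer rule languages.

```python
import heapq
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple


class Event(NamedTuple):
    ts: float    # event time in seconds
    source: str  # which system emitted the event (hypothetical names)
    user: str
    kind: str


def detect_failed_login_then_transfer(
    auth_events: Iterable[Event],
    payment_events: Iterable[Event],
    window_seconds: float = 60.0,
) -> Iterator[Tuple[Event, Event]]:
    """Yield (login_failure, transfer) pairs for the same user within the window.

    Both inputs are assumed to be sorted by timestamp already.
    """
    last_failure: Dict[str, Event] = {}
    # Interleave the two sorted streams into one stream ordered by time.
    for event in heapq.merge(auth_events, payment_events, key=lambda e: e.ts):
        if event.kind == "login_failed":
            last_failure[event.user] = event
        elif event.kind == "transfer":
            failure = last_failure.get(event.user)
            if failure is not None and event.ts - failure.ts <= window_seconds:
                yield failure, event


auth = [
    Event(0.0, "auth", "alice", "login_failed"),
    Event(5.0, "auth", "bob", "login_failed"),
]
payments = [
    Event(30.0, "payments", "alice", "transfer"),
    Event(200.0, "payments", "bob", "transfer"),
]
matches = list(detect_failed_login_then_transfer(auth, payments))
```

Here only the pair for `alice` falls inside the 60-second window. The transfer for `bob` arrives too late to match, which shows how the relationship between events, and not any single event, drives the result.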
Learn the details about using Impala alongside Kudu.
Kudu (currently in beta), the new storage layer for the Apache Hadoop ecosystem, is tightly integrated with Impala, allowing you to insert, query, update, and delete data from Kudu tablets using Impala’s SQL syntax, as an alternative to using the Kudu APIs to build a custom Kudu application. In addition, you can use JDBC or ODBC to connect existing or new applications written in any language,
Learn how to use OCR tools, Apache Spark, and other Apache Hadoop components to process PDF images at scale.
Optical character recognition (OCR) technologies have advanced significantly over the last 20 years. However, during that time, there has been little or no effort to marry OCR with distributed architectures such as Apache Hadoop to process large numbers of images in near-real time.
In this post, you will learn how to use standard open source tools along with Hadoop components such as Apache Spark,