Cloudera’s new “Designing and Building Big Data Applications” is a great springboard for writing apps for an enterprise data hub.
Cloudera’s vision of an enterprise data hub as a central, scalable repository for all your data is changing the notion of data warehousing. The best way to gain value from all of your data is by bringing more workloads to where the data lives. That place is Apache Hadoop.
Our thanks to Janos Matyas, CTO and Founder of SequenceIQ, for the guest post below about his company’s use case for Morphlines (part of the Kite SDK).
SequenceIQ has an Apache Hadoop-based platform and API that consume and ingest various types of data from different sources to offer predictive analytics and actionable insights. Our datasets are structured, unstructured, log files, and communication records, and they require constant refining,
CDK has a new monicker, but the goals remain the same.
We are pleased to announce a new name for the Cloudera Development Kit (CDK): Kite. We’ve just released Kite version 0.10.0, which is purely a rename of CDK 0.9.0.
The new repository and documentation are here:
Why the rename?
In my previous post you learned how to index email messages in batch mode, and in near real time, using Apache Flume with MorphlineSolrSink. In this post, you will learn how to index emails using Cloudera Search with Apache HBase and Lily HBase Indexer, maintained by NGDATA and Cloudera. (If you have not read the previous post, I recommend you do so for background before reading on.)
Which near-real-time method to choose,
In software development, there is no substitute for having choices. Furthermore, freedom of choice – between frameworks, APIs, and languages — is a major fuel source for platform adoption across any successful ecosystem.
In the case of development on CDH, the open source core of Cloudera’s Big Data platform containing Apache Hadoop and related ecosystem projects, the choices have expanded dramatically in the past three weeks: