Category Archives: Kite SDK

How-to: Process Data using Morphlines (in Kite SDK)

Categories: Kite SDK Use Case

Our thanks to Janos Matyas, CTO and Founder of SequenceIQ, for the guest post below about his company’s use case for Morphlines (part of the Kite SDK).

SequenceIQ has an Apache Hadoop-based platform and API that consume and ingest various types of data from different sources to offer predictive analytics and actionable insights. Our datasets include structured and unstructured data, log files, and communication records, and they require constant refining,

Read more

Cloudera Development Kit is Now "Kite SDK"

Categories: Kite SDK

CDK has a new moniker, but the goals remain the same.

We are pleased to announce a new name for the Cloudera Development Kit (CDK): Kite. We’ve just released Kite version 0.10.0, which is purely a rename of CDK 0.9.0.

The new repository and documentation are here:

Why the rename?

Read more

Email Indexing Using Cloudera Search and HBase

Categories: HBase Kite SDK Search Use Case

In my previous post you learned how to index email messages in batch mode, and in near real time, using Apache Flume with MorphlineSolrSink. In this post, you will learn how to index emails using Cloudera Search with Apache HBase and Lily HBase Indexer, maintained by NGDATA and Cloudera. (If you have not read the previous post, I recommend you do so for background before reading on.)

Which near-real-time method to choose,

Read more

Cascading, Spring, and Spark: Development Choices for CDH Users Expand

Categories: CDH Hadoop Kite SDK

In software development, there is no substitute for having choices. Furthermore, freedom of choice among frameworks, APIs, and languages is a major fuel source for platform adoption across any successful ecosystem.

In the case of development on CDH, the open source core of Cloudera’s Big Data platform containing Apache Hadoop and related ecosystem projects, the choices have expanded dramatically in the past three weeks:

  • Spark + CDH

    Cloudera has announced direct support for Apache Spark (incubating) with CDH.

Read more

Collection Aliasing: Near Real-Time Search for Really Big Data

Categories: General Kite SDK Search

The rise of Big Data has been pushing search engines to handle ever-increasing amounts of data. While building Cloudera Search, one of the things we considered in Cloudera Engineering was how we would incorporate Apache Solr with Apache Hadoop in a way that would enable near-real-time indexing and searching on really big data.

Eventually, we built Cloudera Search on Solr and Apache Lucene,

Read more