Category Archives: Kite SDK

New Training: Design and Build Big Data Applications

Categories: Kite SDK Training

Cloudera’s new “Designing and Building Big Data Applications” training is a great springboard for writing apps for an enterprise data hub.

Cloudera’s vision of an enterprise data hub as a central, scalable repository for all your data is changing the notion of data warehousing. The best way to gain value from all of your data is by bringing more workloads to where the data lives. That place is Apache Hadoop.

For engineers,

Read More

How-to: Process Data using Morphlines (in Kite SDK)

Categories: Kite SDK Use Case

Our thanks to Janos Matyas, CTO and Founder of SequenceIQ, for the guest post below about his company’s use case for Morphlines (part of the Kite SDK).

SequenceIQ has an Apache Hadoop-based platform and API that consume and ingest various types of data from different sources to offer predictive analytics and actionable insights. Our datasets include structured and unstructured data, log files, and communication records, and they require constant refining,

Read More

Cloudera Development Kit is Now "Kite SDK"

Categories: Kite SDK

CDK has a new moniker, but the goals remain the same.

We are pleased to announce a new name for the Cloudera Development Kit (CDK): Kite. We’ve just released Kite version 0.10.0, which is purely a rename of CDK 0.9.0.

The new repository and documentation are here:

Why the rename?

Read More

Email Indexing Using Cloudera Search and HBase

Categories: HBase Kite SDK Search Use Case

In my previous post you learned how to index email messages in batch mode, and in near real time, using Apache Flume with MorphlineSolrSink. In this post, you will learn how to index emails using Cloudera Search with Apache HBase and Lily HBase Indexer, maintained by NGDATA and Cloudera. (If you have not read the previous post, I recommend you do so for background before reading on.)

Which near-real-time method to choose,

Read More

Cascading, Spring, and Spark: Development Choices for CDH Users Expand

Categories: CDH Hadoop Kite SDK

In software development, there is no substitute for having choices. Furthermore, freedom of choice between frameworks, APIs, and languages is a major fuel source for platform adoption across any successful ecosystem.

In the case of development on CDH, the open source core of Cloudera’s Big Data platform containing Apache Hadoop and related ecosystem projects, the choices have expanded dramatically in the past three weeks:

  • Spark + CDH

    Cloudera has announced direct support for Apache Spark (incubating) with CDH.

Read More