Category Archives: Kite SDK

Collection Aliasing: Near Real-Time Search for Really Big Data

Categories: General Kite SDK Search

The rise of Big Data has been pushing search engines to handle ever-increasing amounts of data. While building Cloudera Search, one of the things we in Cloudera Engineering considered was how to integrate Apache Solr with Apache Hadoop in a way that would enable near-real-time indexing and searching on really big data.

Eventually, we built Cloudera Search on Solr and Apache Lucene,

Email Indexing Using Cloudera Search

Categories: Flume Hadoop Kite SDK Search Use Case

Why would any company be interested in searching through its vast trove of email? A better question is: Why wouldn’t everybody be interested? 

Email has become the most widespread method of communication we have, so there is much value to be extracted by making all emails searchable and readily available for further analysis. Common use cases for email analysis include fraud detection, customer sentiment and churn analysis, and lawsuit prevention, and that’s just the tip of the iceberg.

Next Stops for The Cloudera Sessions: Jersey City, Miami, Denver, Milwaukee

Categories: Events Hadoop Kite SDK Use Case

In the first leg of its tour of the United States earlier this year (see photos here), The Cloudera Sessions proved to be an invaluable single-day event for business and technical leaders exploring practical applications of Apache Hadoop. So valuable, in fact, that we’ve extended the tour with new dates and cities this September and October.

Based on feedback from previous attendees, we’ve customized the agenda to be even more targeted for real-world use cases.

This Month in the Ecosystem

Categories: Community General Hadoop HBase Hive Kite SDK Sqoop

The ecosystem is evolving at a rapid pace – so rapidly that important developments often pass through the public attention zone too quickly. Thus, we think it might be helpful to bring you a regular digest (by no means complete!) of our favorite highlights. (This effort, by the way, has different goals than the fine Hadoop Weekly newsletter, which has a more expansive view – and which you should subscribe to immediately,

Introducing Morphlines: The Easy Way to Build and Integrate ETL Apps for Hadoop

Categories: Hadoop Kite SDK Search

This post is the first in a series about Cloudera Morphlines, a new command-based framework that simplifies data preparation for Apache Hadoop workloads. To check it out or contribute, you can find the code here.

Cloudera Morphlines is a new open source framework that reduces the time and effort necessary to integrate, build, and change Hadoop processing applications that extract, transform, and load data into Apache Solr,
