Category Archives: Search

How-to: Index and Search Data with Hue’s Search App

Categories: How-to Hue Search

You can use Hue and Cloudera Search to build your own integrated Big Data search app.

In a previous post, you learned how to analyze data using Apache Hive via Hue’s Beeswax and Catalog apps. This time, you’ll see how to make Yelp Dataset Challenge data searchable by indexing it and building a customizable UI with the Hue Search app.

Indexing Data in Cloudera Search

Indexing data in Cloudera Search involves:

  • Setting up SolrCloud to partition your dataset into multiple indexes and processes
  • Configuring SolrCloud collections to hold indexes
  • Specifying the schema by which indexes will be created
  • Feeding relevant data into SolrCloud
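
As a rough illustration of the last step, here is a minimal sketch of how a small batch of documents could be posted to a Solr collection through its JSON update handler. The host, the collection name (`reviews`), and the document fields are assumptions made up for this example, not values from the post, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_update_request(solr_base_url, collection, docs, commit=True):
    """Build (but do not send) an HTTP POST that adds docs to a Solr collection."""
    url = f"{solr_base_url.rstrip('/')}/{collection}/update"
    if commit:
        # Ask Solr to commit right away so the documents become searchable.
        url += "?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document shaped loosely like a review record; the field
# names must match whatever schema the collection was created with.
docs = [{"id": "r1", "stars": 4, "text": "Great tacos."}]
request = build_update_request("http://localhost:8983/solr", "reviews", docs)

# Against a running Solr instance, the request could then be sent with:
# urllib.request.urlopen(request)
```

For large datasets, a batch indexing tool is likely a better fit than one HTTP request per handful of documents; the sketch only shows the shape of the update call.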


Read More

How-to: Add Cloudera Search to Your Cluster using Cloudera Manager

Categories: Cloudera Manager HBase How-to Search

Cloudera Manager 4.7 added support for managing Cloudera Search 1.0. As a result, Cloudera Manager users can easily deploy all components of Cloudera Search (including Apache Solr) and manage all related services, just like every other service included in CDH (Cloudera’s distribution of Apache Hadoop and related projects).

In this how-to, you will learn the steps involved in adding Cloudera Search to a Cloudera Enterprise (CDH + Cloudera Manager) cluster.

Read More

Email Indexing Using Cloudera Search and HBase

Categories: HBase Kite SDK Search Use Case

In my previous post, you learned how to index email messages in batch mode and in near real time using Apache Flume with MorphlineSolrSink. In this post, you will learn how to index emails using Cloudera Search with Apache HBase and Lily HBase Indexer, which is maintained by NGDATA and Cloudera. (If you have not read the previous post, I recommend you do so for background before reading on.)

Which near-real-time method to choose,

Read More

Collection Aliasing: Near Real-Time Search for Really Big Data

Categories: General Kite SDK Search

The rise of Big Data has been pushing search engines to handle ever-increasing amounts of data. While building Cloudera Search, one of the things we considered in Cloudera Engineering was how we would incorporate Apache Solr with Apache Hadoop in a way that would enable near-real-time indexing and searching on really big data.

Eventually, we built Cloudera Search on Solr and Apache Lucene,

Read More

Secrets of Cloudera Support: Impala and Search Make the Customer Experience Even Better

Categories: CDH Hadoop HBase Impala Search Use Case

In December 2012, we described how Cloudera Support Interface (CSI), an internal application built on CDH that drastically improves Cloudera’s ability to support our customers, is a unique and instructive use case for Apache Hadoop. In this post, we’ll follow up by describing two new differentiating CSI capabilities that have made Cloudera Support even more responsive for customers:

  • How Cloudera Impala has turbo-charged CSI with support for real-time log file analysis and visualization
  • How Cloudera Search enables interactive data exploration of multiple sources simultaneously from within CSI


Read More