Cloudera Developer Blog · Search Posts

New in CDH 5.1: Document-level Security for Cloudera Search

Cloudera Search now supports fine-grain access control via document-level security provided by Apache Sentry.

In my previous blog post, you learned about index-level security in Apache Sentry (incubating) and Cloudera Search. Although index-level security is effective when the access control requirements for documents in a collection are homogenous, often administrators want to restrict access to certain subsets of documents in a collection.

Index-Level Security Comes to Cloudera Search

The integration of Apache Sentry with Apache Solr helps Cloudera Search meet important security requirements.

As you have learned in previous blog posts, Cloudera Search brings the power of Apache Hadoop to a wide variety of business users via the ease and flexibility of full-text querying provided by Apache Solr. We have also done significant work to make Cloudera Search easy to add to an existing Hadoop cluster:

Secrets of Cloudera Support: Inside Our Own Enterprise Data Hub

Cloudera’s own enterprise data hub is yielding great results for providing world-class customer support.

Here at Cloudera, we are constantly pushing the envelope to give our customers world-class support. One of the cornerstones of this effort is the Cloudera Support Interface (CSI), which we’ve described in prior blog posts (here and here). Through CSI, our support team is able to quickly reason about a customer’s environment, search for information related to a case currently being worked, and much more.

How-to: Index and Search Multilingual Documents in Hadoop

Learn how to use Cloudera Search along with RBL-JE to search and index documents in multiple languages.

Our thanks to Basis Technology for providing the how-to below!

How-to: Index and Search Data with Hue’s Search App

You can use Hue and Cloudera Search to build your own integrated Big Data search app.

In a previous post, you learned how to analyze data using Apache Hive via Hue’s Beeswax and Catalog apps. This time, you’ll see how to make Yelp Dataset Challenge data searchable by indexing it and building a customizable UI with the Hue Search app.

Indexing Data in Cloudera Search

How-to: Add Cloudera Search to Your Cluster using Cloudera Manager

Cloudera Manager 4.7 added support for managing Cloudera Search 1.0. Thus Cloudera Manager users can easily deploy all components of Cloudera Search (including Apache Solr) and manage all related services, just like every other service included in CDH (Cloudera’s distribution of Apache Hadoop and related projects).

In this how-to, you will learn the steps involved in adding Cloudera Search to a Cloudera Enterprise (CDH + Cloudera Manager) cluster.

Installing the SOLR Parcel

Email Indexing Using Cloudera Search and HBase

In my previous post you learned how to index email messages in batch mode, and in near real time, using Apache Flume with MorphlineSolrSink. In this post, you will learn how to index emails using Cloudera Search with Apache HBase and Lily HBase Indexer, maintained by NGDATA and Cloudera. (If you have not read the previous post, I recommend you do so for background before reading on.)

Which near-real-time method to choose, HBase Indexer or Flume MorphlineSolrSink, will depend entirely on your use case, but below are some things to consider when making that decision:

Collection Aliasing: Near Real-Time Search for Really Big Data

The rise of Big Data has been pushing search engines to handle ever-increasing amounts of data. While building Cloudera Search, one of the things we considered in Cloudera Engineering was how we would incorporate Apache Solr with Apache Hadoop in a way that would enable near-real-time indexing and searching on really big data.

Eventually, we built Cloudera Search on Solr and Apache Lucene, both of which have been adding features at an ever-faster pace to aid in handling more and more data. However, there is no silver bullet for dealing with extremely large-scale data. A common answer in the world of search is “it depends,” and that answer applies in large-scale search as well. The right architecture for your use case depends on many things, and your choice will generally be guided by the requirements and resources for your particular project.

Secrets of Cloudera Support: Impala and Search Make the Customer Experience Even Better

In December 2012, we described how an internal application built on CDH called Cloudera Support Interface (CSI), which drastically improves Cloudera’s ability to optimally support our customers, is a unique and instructive use case for Apache Hadoop. In this post, we’ll follow up by describing two new differentiating CSI capabilities that have made Cloudera Support yet more responsive for customers:

Email Indexing Using Cloudera Search

Why would any company be interested in searching through its vast trove of email? A better question is: Why wouldn’t everybody be interested? 

Email has become the most widespread method of communication we have, so there is much value to be extracted by making all emails searchable and readily available for further analysis. Some common use cases that involve email analysis are fraud detection, customer sentiment and churn, lawsuit prevention, and that’s just the tip of the iceberg. Each and every company can extract tremendous value based on its own business needs. 

Older Posts