Category Archives: Search

How-to: Secure Apache Solr Collections and Access Them Programmatically

Categories: Platform Security & Cybersecurity, Search, Sentry

Learn how to secure your Solr data in a policy-based, fine-grained way.

Data security is more important than ever before. At the same time, risk is increasing due to the relentlessly growing number of device endpoints, the continual emergence of new types of threats, and the commercialization of cybercrime. And with Apache Hadoop already instrumental in supporting the growth of data volumes that fuel mission-critical enterprise workloads, mastering the available security mechanisms is vital for organizations participating in that paradigm shift.

Read More

New in Cloudera Enterprise 5.8: SQL Editor and Other Productivity Improvements

Categories: CDH, Hue, Search, Sentry

Cloudera Enterprise 5.8 includes the latest release of Hue (3.10), the web UI that makes Apache Hadoop easier to use.

As part of Cloudera’s continuing investments in user experience and productivity, Cloudera Enterprise 5.8 includes a new release of Hue that makes several common tasks much easier. In the remainder of this post, we’ll provide a summary of the main improvements. (Hue 3.10 is also available for a quick try in one click on

New SQL Editor

Hue’s new code editor is a single-page app that is much simpler to use than the previous editor.

Read More

Resolving Lock Contention in Apache Solr: A Performance-Analysis Detective Story

Categories: Performance, Search, Testing

This case study is an instructive example of how performance analysis is a multi-faceted process that often leads one in surprising directions. 

Apache Solr Near Real Time (NRT) Search allows Solr users to search documents indexed just seconds ago. It’s a critical feature in many real-time analytics applications. As Solr indexes more and more documents in near real time, end-user expectations for performance get higher and higher.


Read More

How-to: Ingest Email into Apache Hadoop in Real Time for Analysis

Categories: Data Ingestion, Flume, Hadoop, Kafka, Search, Spark, Use Case

Apache Hadoop is a proven platform for long-term storage and archiving of structured and unstructured data. Related ecosystem tools, such as Apache Flume and Apache Sqoop, allow users to easily ingest structured and semi-structured data without requiring the creation of custom code. Unstructured data, however, is a more challenging subset of data that typically lends itself to batch-ingestion methods. Although such methods are suitable for many use cases,

Read More

How-to: Process and Index Medical Images with Apache Hadoop and Apache Solr

Categories: CDH, Guest, Search, Use Case, ZooKeeper

Thanks to Karthik Vadla, Abhi Basu, and Monica Martinez-Canales of Intel Corp. for the following guest post about using CDH for cost-effective processing/indexing of DICOM (medical) images.

Medical imaging has rapidly become the best non-invasive method to evaluate a patient and determine whether a medical condition exists. Imaging is used to assist in the diagnosis of a condition and, in most cases, is the first step of the journey through the modern medical system.

Read More