Tag Archives: Search

How-to: Use Apache Solr to Query Indexed Data for Analytics

Categories: How-to Search

Bet you didn’t know this: In some cases, Solr offers lightning-fast response times for business-style queries.

If you were to ask well informed technical people about use cases for Solr, the most likely response would be that Solr (in combination with Apache Lucene) is an open source text search engine: one can use Solr to index documents, and after indexing, these same documents can be easily searched using free-form queries in much the same way as you would query Google.

Read more

Ibis on Impala: Python at Scale for Data Science

Categories: Cloudera Labs Data Science Impala

This new Cloudera Labs project promises to deliver the great Python user experience and ecosystem at Hadoop scale.

Across the user community, you will find general agreement that the Apache Hadoop stack has progressed dramatically in just the past few years. For example, Search and Impala have moved Hadoop beyond batch processing, while developers are seeing significant productivity gains and additional use cases by transitioning from MapReduce to Apache Spark.

Thanks to such advances in the ecosystem,

Read more

New in CDH 5.4: Sensitive Data Redaction

Categories: CDH Cloudera Manager Platform Security & Cybersecurity

The best data protection strategy is to remove sensitive information from everyplace it’s not needed.

Have you ever wondered what sort of “sensitive” information might wind up in Apache Hadoop log files? For example, if you’re storing credit card numbers inside HDFS, might they ever “leak” into a log file outside of HDFS? What about SQL queries? If you have a query like select * from table where creditcard = ‘1234-5678-9012-3456’,

Read more

How-to: Install Hue on a Mac

Categories: How-to Hue

Learn how to set up Hue, the open source GUI that makes Apache Hadoop easier to use, on your Mac.

You might have already all the prerequisites installed but we are going to show how to start from a fresh Yosemite (10.10) install and end up with running Hue on your Mac in almost no time!

We are going to be using the official Quickstart VM from Cloudera that already packs all the Apache Hadoop ecosystem components your Hue will talk to.

Read more

How Testing Supports Production-Ready Security in Cloudera Search

Categories: Platform Security & Cybersecurity Search Sentry Testing

Security architecture is complex, but these testing strategies help Cloudera customers rely on production-ready results.

Among other things, good security requires user authentication and that authenticated users and services be granted access to those things (and only those things) that they’re authorized to use. Across Apache Hadoop and Apache Solr (which ships in CDH and powers Cloudera Search), authentication is accomplished using Kerberos and SPNego over HTTP and authorization is accomplished using Apache Sentry (the emerging standard for role-based fine grain access control,

Read more