Category Archives: CDH

Quicker Insight into Apache Solr and Collection Health

Categories: CDH, Cloudera Manager, How-to, Search

Successful cluster administration can be very difficult without a real-time view of the state of the cluster. Solr itself provides neither aggregated views of its state nor historical usage data, both of which are necessary to understand how the service is used and how it is performing. Knowing the throughput and capacity not only helps detect errors and troubleshoot issues, but is also useful for capacity planning.

Questions may arise, such as:

  • What is the size of my cluster and each collection?


implyr: R Interface for Apache Impala

Categories: CDH, Data Science, HBase, HDFS, Impala, Kudu, Tools

New R package implyr enables R users to query Impala using dplyr.

Apache Impala (incubating) enables low-latency interactive SQL queries on data stored in HDFS, Amazon S3, Apache Kudu, and Apache HBase. With the availability of the R package implyr on CRAN and GitHub, it’s now possible to query Impala from R using the popular package dplyr.

dplyr provides a grammar of data manipulation,


Wrangle, Powered by Cloudera July 20th

Categories: CDH

Slots are still open for our annual Wrangle data science conference in San Francisco on July 20th at the Chapel. This is our third year, and by far the best lineup we’ve had. Don’t miss the opportunity to hear from these amazing speakers about the latest innovations and challenges in data science.

Here’s our lineup for the day:

  • Drew Conway, Alluvium: “What Would A CIA Data Scientist Do?”
  • Tyler Schnoebelen,


What’s New in Cloudera Director 2.5?

Categories: CDH, Cloud, Cloudera Manager

Cloudera Director 2.5 brings cluster auto-repair functionality and improved support for AWS Spot instances. Support for Cloudera Manager’s external account feature has been added along with S3Guard support.

Cloudera Director helps you deploy, scale, and manage Apache Hadoop clusters in the cloud of your choice. Its enterprise-grade features deliver a reliable mechanism for establishing production-ready clusters in the cloud for big-data workloads and applications in a simple,
