Category Archives: Cloudera Manager

Check Out Those New and Improved Cloudera Docs

Categories: CDH, Cloudera Manager, General

Cloudera has given its documentation set a facelift, and we think you’ll like the new look. We use more whitespace and a font that is easier to read and skim, and your pages load much faster. But the improvements go beyond the merely aesthetic.

While electronic documentation has been around for decades, most online documentation is still presented as if it were printed in books. There is a table of contents that assumes you will read the content from start to finish.

Read More

Cloudera Enterprise 5.7 is Released

Categories: CDH, Cloudera Manager, Cloudera Navigator, Hive, Spark

Cloudera Enterprise 5.7 is now generally available (comprising CDH 5.7, Cloudera Manager 5.7, and Cloudera Navigator 2.6).

Cloudera is excited to announce the general availability of Cloudera Enterprise 5.7! Main highlights of this release include production-ready Hive-on-Spark functionality, which will help users accelerate their use of Apache Spark as a data processing standard; 4x performance gains for Apache Impala (incubating); easier cluster configuration and utilization reporting; and end-to-end encryption for Apache Spark data.

Read More

The Cloudera Developer Program: The Low-cost, Low-risk Way to Develop on Cloudera

Categories: CDH, Cloudera Manager, General

The Cloudera Developer Program is kind of amazing. Here’s why.

For those with a desire to build new applications on Cloudera’s platform, historically there’s been a gap to cross between pure bootstrapping on CDH (whether via a small on-premise cluster, in the public cloud, or using Cloudera Live) and obtaining full-blown support for a complete enterprise data hub with all the fixings (including Cloudera Navigator). For individuals who have moved beyond self-learning and are getting “serious,”

Read More

Making Python on Apache Hadoop Easier with Anaconda and CDH

Categories: CDH, Cloudera Manager, Data Science, Spark

Enabling Python development on CDH clusters (for PySpark, for example) is now much easier thanks to new integration with Continuum Analytics’ Python platform (Anaconda).

Python has become an increasingly popular tool for data analysis, including data processing, feature engineering, machine learning, and visualization. Data scientists and data engineers enjoy Python’s rich numerical and analytical libraries—such as NumPy, pandas, and scikit-learn—and have long wanted to apply them to large datasets stored in Apache Hadoop clusters.

Read More

How-to: Build a Real-Time Search System using StreamSets, Apache Kafka, and Cloudera Search

Categories: Cloudera Manager, Guest, How-to, Hue, Kafka, Search

Thanks to Jonathan Natkins, a field engineer from StreamSets, for the guest post below about using StreamSets Data Collector (open source, GUI-driven ingest technology for developing and operating data pipelines with a minimum of code) together with Cloudera Search and Hue to build a real-time search environment.

As pressure mounts on data engineers to deliver more data from more sources in less time, StreamSets Data Collector can serve as a linchpin in the data management process,

Read More