A common design pattern emerges when teams begin to stitch together existing systems and an EDH cluster: file dumps, typically in a format like CSV, are regularly uploaded to EDH, where they are unpacked, transformed into an optimized query format, and tucked away in HDFS where various EDH components can use them. When these file dumps are large or arrive frequently, these simple steps can significantly slow down an ingest pipeline. Part of this delay is inevitable;
Cloudera Director 2.2 is now available. This new release adds support for Amazon EBS volumes and the ability to diagnose cluster bootstrap errors quickly.
Cloudera Director provides a simple, reliable, enterprise-grade way to deploy, scale, and manage Apache Hadoop in the cloud of your choice. Cloudera Director enables you to deploy production-ready clusters for big data applications and successfully run workloads in the cloud.
Cloudera Director makes it easier for customers to:
- Deploy clusters in line with patterns native to cloud infrastructure
- Use a single interface to define, in one place, the desired cluster specification all the way down to the operating system
- Repeatedly and programmatically instantiate these cluster definitions
- Adapt to the dynamic nature of cloud infrastructure
Cloudera Director 2.2 provides additional mechanisms to get that initial cluster definition right and the ability to diagnose errors and iterate quickly.
Cloudera Enterprise 5.9 includes the latest release of Hue (3.11), the web UI that makes Apache Hadoop easier to use.
As part of Cloudera’s continuing investments in user experience and productivity, Cloudera Enterprise 5.9 includes a new release of Hue. Hue continues its focus on SQL and now also makes interacting with the cloud easier (Amazon S3, specifically, in this first version). We’ll summarize the main improvements in the rest of this blog post.
HDFS now includes (shipping in CDH 5.8.2 and later) a comprehensive storage capacity-management approach for moving data across nodes.
In HDFS, the DataNode spreads data blocks across local filesystem directories, which can be specified using dfs.datanode.data.dir in hdfs-site.xml. In a typical installation, each directory, called a volume in HDFS terminology, is on a different device (for example, separate HDDs and SSDs).
When writing new blocks to HDFS, the DataNode uses a volume-choosing policy to decide which volume receives each block; both a round-robin policy and an available-space policy are supported.
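These settings live in hdfs-site.xml; the following minimal Scala sketch expresses them through Hadoop's Configuration API simply to make the key names explicit. The directory paths are illustrative assumptions, and the available-space policy class name should be verified against your Hadoop release. In practice you would edit hdfs-site.xml (or use Cloudera Manager) rather than set these programmatically.

```scala
import org.apache.hadoop.conf.Configuration

// Minimal sketch of the DataNode volume settings discussed above.
// Directory paths are hypothetical; adjust them to your mount points.
object DataNodeVolumeConfig {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()

    // Each comma-separated directory is one "volume", typically on its own device.
    conf.set("dfs.datanode.data.dir",
      "/mnt/hdd0/dfs/dn,/mnt/hdd1/dfs/dn,/mnt/ssd0/dfs/dn")

    // Optional: pick the volume with the most free space for new blocks
    // instead of the default round-robin policy.
    conf.set("dfs.datanode.fsdataset.volume.choosing.policy",
      "org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy")

    // List the configured volumes.
    conf.getTrimmedStrings("dfs.datanode.data.dir").foreach(println)
  }
}
```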
Today, Cloudera announced the availability of an Apache Spark 2.0 Beta release for users of the Cloudera platform.
- The Dataset API further enhances Spark’s claim as the best tool for data engineering by providing compile-time type safety along with the benefits of a query-optimization engine (see the first sketch after this list).
- The Structured Streaming API enables the modeling of streaming data as a continuous DataFrame and expresses operations on that data with a SQL-like API (see the second sketch after this list).
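To make the first point concrete, here is a minimal, self-contained Scala sketch of a typed Dataset. The Order case class and its fields are hypothetical; the point is that the lambda passed to filter is checked at compile time, while Spark's query optimizer still plans the job.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type used only for illustration.
case class Order(id: Long, customer: String, amount: Double)

object DatasetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DatasetSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A typed Dataset: misspelling a field or comparing it to the wrong type
    // is a compile error, not a runtime failure.
    val orders = Seq(
      Order(1L, "acme", 120.0),
      Order(2L, "globex", 75.5)
    ).toDS()

    val bigOrders = orders.filter(o => o.amount > 100.0)
    bigOrders.show()

    spark.stop()
  }
}
```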
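And here is a minimal sketch of the Structured Streaming API, using the socket source for simplicity (the host and port are assumptions for the demo). The word counts are expressed as ordinary DataFrame/Dataset operations over a stream that Spark treats as a continuously growing table.

```scala
import org.apache.spark.sql.SparkSession

object StreamingWordCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StreamingWordCountSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Treat lines arriving on a socket as an unbounded DataFrame.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")   // assumed host/port for the demo source
      .option("port", 9999)
      .load()

    // Ordinary DataFrame/Dataset operations describe the streaming computation.
    val words = lines.as[String].flatMap(_.split(" "))
    val counts = words.groupBy("value").count()

    // Continuously print the updated counts to the console.
    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```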