Livy, which streamlines Spark architecture for web/mobile apps, is the newest addition to Cloudera Labs.
With respect to the impact of Apache Spark on the Apache Hadoop ecosystem, its virtually overnight adoption as the default data processing engine (and as a standard for powering advanced analytic applications) speaks for itself. But that’s not to say that there isn’t work yet to be done, particularly in the areas of performance at scale/under multi-tenancy,
Cloudera Enterprise 5.8 is now generally available (comprising CDH 5.8, Cloudera Manager 5.8, and Cloudera Navigator 2.7).
Cloudera is excited to announce the general availability of Cloudera Enterprise 5.8! Main highlights of this release include Impala read/write support on Amazon S3, a redesigned SQL query editor GUI, the expansion of role-based access control functionality to Cloudera Search, and the GA of Cloudera Navigator Optimizer to facilitate and optimize workload migrations.
This new release includes, among other things, support for “slicing and dicing” workloads by user/application/report, workload breakdown by similar queries, and alerts for Apache Hive and Apache Impala (incubating) best practices.
Cloudera Navigator Optimizer enables database architects and database administrators (DBAs) to gain an in-depth understanding of their SQL workloads running in data warehouse environments or on Apache Hadoop. Navigator Optimizer makes planning offload projects more predictable by assessing risk and reducing development costs.
The new cluster templates feature in Cloudera Manager 5.7 makes creating clusters faster and easier.
Often, after an Apache Hadoop cluster has been configured correctly, its admin will want to replicate the configuration in one or more clusters—whether for promoting a dev or staging cluster to production, or setting up a new production cluster with the same configuration as an existing one.
For Cloudera customers, until recently the process for replicating cluster configurations was manual and error-prone.
The following post (Part 2 of two parts) by Vik Paruchuri, founder of data science learning platform Dataquest, offers detailed and instructive insight into the data science workflow (regardless of the tech stack involved, but in this case, using Python). We re-publish it here for your convenience.
Before we dive into exploring the data [see Part 1 for steps relating to data preparation], we’ll want to set the context,