Cloudera Data Science Workbench provides freedom for data scientists. It gives them the flexibility to work with their favorite libraries using isolated environments with a container for each project.
In the JVM world (Java or Scala, for example), using your favorite packages on a Spark cluster is easy. Each application bundles its preferred packages into a fat JAR, which gives it an independent environment on the Spark cluster. Many data scientists prefer Python to Scala for data science,
An unmodified TPC-DS-based performance benchmark shows Impala’s leadership compared to a traditional analytic database (Greenplum), especially for multi-user concurrent workloads. Additionally, the benchmark continues to demonstrate a significant performance gap between analytic databases and SQL-on-Hadoop engines like Hive LLAP, Spark SQL, and Presto.
The past year has been one of the biggest for Apache Impala (incubating). Not only has the team continued to work on ever-growing scale and stability, but a number of key capabilities have been rolled out that further solidify Impala as the open standard for high-performance BI and SQL analytics.
The emergence of “Big Data” has made machine learning much easier because the key burden of statistical estimation—generalizing well to new data after observing only a small amount of data—has been considerably lightened. In a typical machine learning task, the goal is to design the features to separate the factors of variation that explain the observed data. However, a major source of difficulty in many real-world artificial intelligence applications is that many of the factors of variation influence every single piece of data we can observe.
Cloudera Director 2.4 improves support for long-running clusters by syncing with upgrades and topology changes made via Cloudera Manager, and adds support for Spark 2 and Kudu. Cloudera Director, along with Cloudera Manager and CDH 5.11, adds support for Microsoft Azure Data Lake Store (ADLS) and for pausing clusters with Amazon EBS volumes.
Cloudera Director helps you deploy, scale, and manage Apache Hadoop clusters in the cloud of your choice.
Cloudera Enterprise 5.11 is Now Available
Cloudera is pleased to announce that Cloudera Enterprise 5.11 is now generally available (GA). The highlights of this release include lineage support for Apache Spark, Apache Kudu security integration, embedded data discovery for self-service BI, and new cloud capabilities for Microsoft ADLS and Amazon S3.
As usual, there are also a number of quality enhancements, bug fixes, and other improvements across the stack. Here is a partial list of what’s included (see the Release Notes for a full list):
- Core Platform and Cloud
- Amazon S3 Consistency: S3Guard ensures that operations on Amazon S3 are immediately visible to other clients,