At Cloudera, we’re always working to provide our customers and the Apache Spark community with the most robust and reliable software possible. This article describes recent engineering work on [SPARK-8425] that is available in CDH 5.10 and CDH 5.11, as well as in upstream Apache Spark starting with the 2.2 release.
The work concerns the Blacklist Tracker mechanism in Spark’s scheduler, which was also the subject of a recent Spark Summit talk.
Organizations analyze logs for many reasons. Typical use cases include predicting server failures, analyzing customer behavior, and fighting cybercrime. One of the most overlooked use cases, however, is helping companies write better software. In the digital age, most companies write applications, whether for their own employees or for external users. The cost of faulty software can be severe, ranging from customer churn to the complete demise of a firm, as happened to Knight Capital in 2012.
This article is syndicated with permission from the Apache HBase blog. It highlights a collaboration between our partners at Intel and Alibaba engineering, completed in time for “Singles Day”, the biggest online shopping day of the year. For more on HBase, mark your calendars: on June 12th, 2017, the Apache HBase community will host its annual HBaseCon.
HBase is the core storage system in Alibaba’s Search Infrastructure.
Before CDH 5.10, every CDH cluster had to have its own Apache Hive Metastore (HMS) backend database. This model works well when each cluster stores its data locally alongside its metadata. In the cloud, however, many CDH clusters run directly on a shared object store (such as Amazon S3), so the data can be shared across multiple clusters and outlive any single cluster. In this scenario, each cluster must individually regenerate and coordinate metadata for the same underlying shared data.
The Apache Hadoop project announced the release of 3.0.0-alpha2 on January 25th, 2017. This is the second alpha release in the 3.0.0 release series leading up to 3.0.0 GA, and incorporates 857 new fixes, improvements, and features since 3.0.0-alpha1 last September. It’s worth reading our previous blog post about 3.0.0-alpha1; in this post, we’ll discuss the new improvements that landed in alpha2.
Classpath Isolation for Hadoop Client Jars
Many Java developers have felt the pain caused by a lack of classpath isolation.