Apache ZooKeeper 3.5.3-beta Has Been Released

Categories: ZooKeeper

The Apache ZooKeeper team has announced that Apache ZooKeeper 3.5.3-beta is now available! This is the first beta release in the 3.5 series and covers 77 issues, thirteen of which were considered blockers. Here are some highlights:

New Feature

  • ZOOKEEPER-2719 Enable creation of TTL nodes, which are znodes that are not tied to a session and are cleaned up automatically once they expire.

Security Fixes

  • ZOOKEEPER-2014 Only admin roles should be allowed to reconfigure a cluster
  • ZOOKEEPER-2693 Prevent DOS attack on wchp/wchc four letter words (4lw)

Critical Bug Fixes

  • ZOOKEEPER-2383 Solve startup race in ZooKeeperServer
  • ZOOKEEPER-2172 Cluster crashes when reconfig a new node as a participant
  • ZOOKEEPER-2737 NettyServerCnxFactory leaks connection if exception happens while writing to a channel
  • ZOOKEEPER-2247 ZooKeeper service becomes unavailable when leader fails to write transaction log
  • ZOOKEEPER-2080 Fix deadlock in dynamic reconfiguration
  • ZOOKEEPER-2687 Deadlock while shutting down the Leader server

Stability,

Read More

BigDL on CDH and Cloudera Data Science Workbench

Categories: CDH How-to Spark

Introduction

As companies strive to implement modern solutions based on deep learning frameworks, the need to deploy them on existing hardware infrastructure in a scalable and distributed manner comes to the fore. Recognizing this need, Cloudera’s and Intel’s Big Data Technologies engineering teams jointly detail Intel’s BigDL Apache Spark deep learning library on the latest release of Cloudera’s Data Science Workbench. This collaborative effort allows customers to build new deep learning applications with the BigDL Spark library by leveraging their existing homogeneous compute capacity of Xeon servers running Cloudera Enterprise, without having to invest in expensive GPU farms or bring up parallel frameworks such as TensorFlow or Caffe.

Read More

Use your favorite Python library on a PySpark cluster with Cloudera Data Science Workbench

Categories: CDH Data Science How-to Spark

Cloudera Data Science Workbench provides freedom for data scientists. It gives them the flexibility to work with their favorite libraries using isolated environments with a container for each project.

In the JVM world, with languages such as Java or Scala, using your favorite packages on a Spark cluster is easy. Each application manages its preferred packages using fat JARs, which gives it an independent environment on the Spark cluster. Many data scientists prefer Python to Scala for data science,

Read More

Apache Impala Leads Traditional Analytic Database

Categories: CDH Impala Performance

An unmodified TPC-DS-based performance benchmark shows Impala’s leadership compared to a traditional analytic database (Greenplum), especially for multi-user concurrent workloads. Additionally, the benchmark continues to demonstrate a significant performance gap between analytic databases and SQL-on-Hadoop engines like Hive LLAP, Spark SQL, and Presto.

The past year has been one of the biggest for Apache Impala (incubating). Not only has the team continued to work on ever-growing scale and stability, but a number of key capabilities have been rolled out that further solidify Impala as the open standard for high-performance BI and SQL analytics.

Read More

Deep Learning Frameworks on CDH and Cloudera Data Science Workbench

Categories: CDH Data Science Hadoop

The emergence of “Big Data” has made machine learning much easier because the key burden of statistical estimation—generalizing well to new data after observing only a small amount of data—has been considerably lightened. In a typical machine learning task, the goal is to design the features to separate the factors of variation that explain the observed data. However, a major source of difficulty in many real-world artificial intelligence applications is that many of the factors of variation influence every single piece of data we can observe.

Read More