Cloudera Engineering Blog
Big Data best practices, how-to's, and internals from Cloudera Engineering and the community
Wow, a ton of news for such a short month:
Thanks to Matthew Dixon, principal consultant at Quiota LLC and Professor of Analytics at the University of San Francisco, and Mohammad Zubair, Professor of Computer Science at Old Dominion University, for this guest post that demonstrates how to easily deploy exposure calculations on Apache Spark for in-memory analytics on scenario data.
Since the 2007 global financial crisis, financial institutions now more accurately measure the risks of over-the-counter (OTC) products. It is now standard practice for institutions to adjust derivative prices for the risk of the counter-party’s, or one’s own, default by means of credit or debit valuation adjustments (CVA/DVA).
The Kite project recently released a stable 1.0!
Providing Hadoop-as-a-Service to your internal users can be a major operational advantage.
Cloudera Director (free to download and use) is designed for easy, on-demand provisioning of Apache Hadoop clusters in Amazon Web Services (AWS) environments, with support for other cloud environments in the works. It allows for provisioning clusters in accordance with the Cloudera AWS Reference Architecture.
Cloudera recently announced formal support for Apache Kafka. This simple use case illustrates how to make web log analysis, powered in part by Kafka, one of your first steps in a pervasive analytics journey.
If you are not looking at your company’s operational logs, then you are at a competitive disadvantage in your industry. Web server logs, application logs, and system logs are all valuable sources of operational intelligence, uncovering potential revenue opportunities and helping drive down the bottom line. Whether your firm is an advertising agency that analyzes clickstream logs for customer insight, or you are responsible for protecting the firm’s information assets by preventing cyber-security threats, you should strive to get the most value from your data as soon as possible.
A Hive-on-Spark beta is now available via CDH parcel. Give it a try!
The Hive-on-Spark project (HIVE-7292) is one of the most watched projects in Apache Hive history. It has attracted developers from across the ecosystem, including from organizations such as Intel, MapR, IBM, and Cloudera, and gained critical help from the Spark community.
The Cloudera HBase Team are proud to be members of Apache HBase’s model community and are currently AWOL, busy celebrating the release of the milestone Apache HBase 1.0. The following, from release manager Enis Soztutar, was published today in the ASF’s blog.
Cloudera Director 1.1 introduces new features and improvements that provide more options for creating and managing cloud deployments of Apache Hadoop. Here are details about how they work.
Cloudera Director, which was released in October of 2014, delivers production-ready, self-service interaction with Apache Hadoop clusters in cloud environments. You can find background information about Cloudera Director’s purpose and fundamental features in our earlier introductory blog post and technical overview blog post.
With Kafka now formally integrated with, and supported as part of, Cloudera Enterprise, what’s the best way to deploy and configure it?
Earlier today, Cloudera announced that, following an incubation period in Cloudera Labs, Apache Kafka is now fully integrated into Cloudera’s Big Data platform, Cloudera Enterprise (CDH + Cloudera Manager). Our customers have expressed strong interest in Kafka, and some are already running Kafka in production.
Having a good grasp of HDFS recovery processes is important when running or moving toward production-ready Apache Hadoop.
An important design requirement of HDFS is to ensure continuous and correct operations to support production deployments. One particularly complex area is ensuring correctness of writes to HDFS in the presence of network and node failures, where the lease recovery, block recovery, and pipeline recovery processes come into play. Understanding when and why these recovery processes are called, along with what they do, can help users as well as developers understand the machinations of their HDFS cluster.