As a warm-up to Spark Summit West in San Francisco (June 6-8), we’ve added a new project to Cloudera Labs that makes building Spark Streaming pipelines considerably easier.
Spark Streaming is the go-to engine for stream processing in the Cloudera stack. It allows developers to build stream data pipelines that harness the rich Spark API for parallel processing, expressive transformations, fault tolerance, and exactly-once processing. But it requires a programmer to write code,
This new guide steps you through the process of configuring that platform to manage client connections to Impala for HA in business-critical BI applications.
For production deployments of Apache Impala (incubating), using a load-balancing proxy server has several advantages:
- Applications connect to a single well-known host and port, rather than keeping track of the hosts where the impalad daemon is running.
- If any host running the impalad daemon becomes unavailable,
Cloudera considers the handling and reporting of security vulnerabilities a very serious matter. In this post, learn the processes involved.
In addition to expecting enterprise-class standards for stability and reliability, Cloudera’s customers also have expectations for industry-standard processes around the discovery, fix, and reporting of security issues. In this post, I will describe how Cloudera addresses such issues in our software.
An overview of the process looks like this flowchart:
The first step in the life cycle of a security vulnerability is that it is discovered and reported to Cloudera.
For Cloudera, Apache HBase has grown into a stable, scalable, mature, and critical component of the Apache Hadoop stack.
HBase adds the ability to do low-latency random read/write across your big data. While it is a key piece of the Apache Hadoop ecosystem, HBase itself has an ecosystem of projects and products that use it as a storage engine for systems such as time series database (OpenTSDB), or SQL-style databases (Apache Phoenix,
Thanks to Karthik Vadla, Abhi Basu, and Monica Martinez-Canales of Intel Corp. for the following guest post about using CDH for cost-effective processing/indexing of DICOM (medical) images.
Medical imaging has rapidly become the best non-invasive method to evaluate a patient and determine whether a medical condition exists. Imaging is used to assist in the diagnosis of a condition and, in most cases, is the first step of the journey through the modern medical system.