Tag Archives: Apache HBase

Kudu: New Apache Hadoop Storage for Fast Analytics on Fast Data

Categories: Hadoop HBase HDFS Impala Kudu Performance Spark

This new open source complement to HDFS and Apache HBase is designed to fill gaps in Hadoop’s storage layer that have given rise to stitched-together, hybrid architectures.

The set of data storage and processing technologies that define the Apache Hadoop ecosystem is expansive and ever-improving, covering a very diverse set of customer use cases in mission-critical enterprise applications. At Cloudera, we’re constantly pushing the boundaries of what’s possible with Hadoop, making it faster,

RecordService: For Fine-Grained Security Enforcement Across the Hadoop Ecosystem

Categories: Hadoop Impala Platform Security & Cybersecurity Sentry

This new core security layer provides a unified data access path for all Hadoop ecosystem components, while improving performance.

We’re thrilled to announce the beta availability of RecordService, a distributed, scalable data access service for unified access control and enforcement in Apache Hadoop. RecordService is Apache-licensed open source software that we intend to transition to the Apache Software Foundation. In this post, we’ll explain the motivation, system architecture,

YCSB, the Open Standard for NoSQL Benchmarking, Joins Cloudera Labs

Categories: Cloudera Labs HBase Performance

YCSB, the open standard for comparative performance evaluation of data stores, is now available to CDH users for their Apache HBase deployments via new packages from Cloudera Labs.

Many factors go into deciding which data store should be used for production applications, including basic features, data model, and the performance characteristics for a given type of workload. It’s critical to have the ability to compare multiple data stores intelligently and objectively so that you can make sound architectural decisions.

Cloudera Engineering Interns Got Talent

Categories: Careers Cloudera Life Spark

As is their custom, Cloudera Engineering’s interns made innovation, especially for Apache Spark, the theme of the summer season.

Cloudera has a long-standing tradition of searching far and wide for the smartest summer engineering interns it can find. Alumni of the program have become start-up co-founders, faculty at top-tier CS departments, and employees at other prominent technology companies (including Google, Databricks, Uber, and LinkedIn), and many are current employees at Cloudera.

Inside Santander’s Near Real-Time Data Ingest Architecture

Categories: Flume HBase Kafka

Learn about the near real-time data ingest architecture for transforming and enriching data streams using Apache Flume, Apache Kafka, and RocksDB at Santander UK.

Cloudera Professional Services has been working with Santander UK to build a near real-time (NRT) transactional analytics system on Apache Hadoop. The objective is to capture, transform, enrich, count, and store a transaction within a few seconds of a card purchase taking place. The system receives the bank’s retail customer card transactions and calculates the associated trend information, aggregated by account holder and across a number of dimensions and taxonomies.
