Tag Archives: analytics

Scalability of Kafka Messaging using Consumer Groups

Categories: Data Ingestion Flume Kafka Use Case

Traditional messaging models fall into two categories: Shared Message Queues and Publish-Subscribe models. Both models have their own pros and cons. Neither could successfully handle big data ingestion at scale due to limitations in their design. Apache Kafka implements a publish-subscribe messaging model which provides fault tolerance, scalability to handle large volumes of streaming data for real-time analytics. It was developed at LinkedIn in 2010 to meet its growing data pipeline needs. Apache Kafka bridges the gaps that traditional messaging models failed to achieve.

Read more

Production Recommendation Systems with Cloudera

Categories: CDH Data Science

Many types of business problems boil down to making recommendations, and machine learning is the special sauce that makes these problems solvable. Machine learning for recommendations is a challenging endeavor in its own right, but it is just one part of the recommendation system, which must move, store, process, and update data, in production, across several different components. In this post we show how to use Cloudera’s distribution of open source software to build a production scale recommendation system,

Read more

A Technical Overview of Cloudera Altus Analytic DB

Categories: Altus Analytic Database Cloud

A few weeks back, we announced the upcoming beta of Cloudera Altus Analytic DB for cloud-based data warehousing. As promised, the beta is now available and we wanted to spend some time describing the unique architecture.

Architecture of Cloudera Altus Analytic DB

Altus Analytic DB is built on the Cloudera Altus platform-as-a-service foundation, which also supports the Altus Data Engineering service. The architecture of Cloudera Altus is based around a few simple but important premises —

Read more

New in Cloudera Data Science Workbench 1.2: Usage Monitoring for Administrators

Categories: CDH Cloudera Data Science Workbench Data Science Performance

Cloudera Data Science Workbench (CDSW) provides data science teams with a self-service platform for quickly developing machine learning workloads in their preferred language, with secure access to enterprise data and simple provisioning of compute. Individuals can request schedulable resources (e.g. compute, memory, GPUs) on a shared cluster that is managed centrally.

While self-service provisioning of resources is critical to the rapid interaction cycle of data scientists, it can pose a challenge to administrators.

Read more

Getting Started with Cloudera’s Cybersecurity Solution

Categories: CDH How-to Platform Security & Cybersecurity

A quick conversation with most Chief Information Security Officers (CISOs) reveals they understand they need to modernize their security architecture and the correct answer is to adopt a machine learning and analytics platform as a fundamental and durable part of their data strategy. However, many CISOs fear deployment of an initial use case will be somewhat daunting. Cloudera has partnered along with Arcadia Data and StreamSets to make it easier than ever for CISOs to take the first step and deploy basic use cases leveraging data sources common to many environments.

Read more