Traditional messaging models fall into two categories: shared message queues and publish-subscribe. Each has its own pros and cons, and neither could handle big data ingestion at scale because of limitations in its design. Apache Kafka implements a publish-subscribe messaging model that provides the fault tolerance and scalability needed to handle large volumes of streaming data for real-time analytics. It was developed at LinkedIn in 2010 to meet the company's growing data pipeline needs. Apache Kafka fills the gaps that traditional messaging models left open.
Many types of business problems boil down to making recommendations, and machine learning is the special sauce that makes these problems solvable. Machine learning for recommendations is a challenging endeavor in its own right, but it is just one part of the recommendation system, which must move, store, process, and update data, in production, across several different components. In this post we show how to use Cloudera’s distribution of open source software to build a production-scale recommendation system,
A few weeks back, we announced the upcoming beta of Cloudera Altus Analytic DB for cloud-based data warehousing. As promised, the beta is now available and we wanted to spend some time describing the unique architecture.
Architecture of Cloudera Altus Analytic DB
Altus Analytic DB is built on the Cloudera Altus platform-as-a-service foundation, which also supports the Altus Data Engineering service. The architecture of Cloudera Altus is based around a few simple but important premises —
Cloudera Data Science Workbench (CDSW) provides data science teams with a self-service platform for quickly developing machine learning workloads in their preferred language, with secure access to enterprise data and simple provisioning of compute. Individuals can request schedulable resources (e.g. compute, memory, GPUs) on a shared cluster that is managed centrally.
While self-service provisioning of resources is critical to the rapid iteration cycle of data scientists, it can pose a challenge to administrators.
A quick conversation with most Chief Information Security Officers (CISOs) reveals that they understand the need to modernize their security architecture, and that the correct answer is to adopt a machine learning and analytics platform as a fundamental and durable part of their data strategy. However, many CISOs fear that deploying an initial use case will be daunting. Cloudera has partnered with Arcadia Data and StreamSets to make it easier than ever for CISOs to take the first step and deploy basic use cases that leverage data sources common to many environments.