Tag Archives: Hive

Cloudera Engineering Interns Got Talent

Categories: Careers Cloudera Life Spark

As is their custom, Cloudera Engineering’s interns made innovation, especially for Apache Spark, the theme of the summer season.

Cloudera has a long-standing tradition of searching far and wide for the smartest summer engineering interns it can find. Alumni of the program have become start-up co-founders, faculty at top-tier CS departments, and employees at other prominent technology companies (including Google, Databricks, Uber, and LinkedIn), and many now work at Cloudera.

How-to: Run Apache Mesos on CDH

Categories: CDH Cloudera Manager Guest Ops and DevOps

Big Industries, a Cloudera systems integration and reseller partner for Belgium and Luxembourg, has developed an integration of Apache Mesos and CDH that can be deployed and managed through Cloudera Manager. In this post, Big Industries’ Rob Gibbon explains the benefits of deploying Mesos on your cluster and walks you through the process of setting it up.

[Editor’s Note: Mesos integration is not currently supported by Cloudera, thus the setup described below is not recommended for production use.]

Apache Mesos is a distributed,

Designing Fraud-Detection Architecture That Works Like Your Brain Does

Categories: Flume HBase Kafka Spark Use Case

To design effective fraud-detection architecture, look no further than the human brain (with some help from Spark Streaming and Apache Kafka).

At its core, fraud detection is about detecting whether people are behaving “as they should,” otherwise known as catching anomalies in a stream of events. This goal is reflected in diverse applications such as detecting credit-card fraud, flagging patients who are doctor shopping to obtain a supply of prescription drugs,

How-to: Install Apache Zeppelin on CDH

Categories: General Guest How-to Spark

Our thanks to Karthik Vadla and Abhi Basu, Big Data Solutions engineers at Intel, for permission to re-publish the following (which was originally available here).

Data science is not a new discipline. However, with the growth of big data and the adoption of big data technologies, the demand for better-quality data has grown exponentially. Today data science is applied to every facet of life: product validation through fault prediction,

Getting Started with Ibis and How to Contribute

Categories: Cloudera Labs Impala

Learn about the architecture of Ibis, the roadmaps for Ibis and Impala, and how to get started and contribute.

We created Ibis, a new Python data analysis framework now incubating in Cloudera Labs, with the goal of enabling data scientists and data engineers to be as productive working with big data as they are working with small and medium data today. In doing so, we will enable Python to become a true first-class language for Apache Hadoop,
