Category Archives: How-to

How-to: Configure SAP HANA with Apache Impala (incubating)

Categories: How-to Impala

Combining HANA and Impala can unlock a variety of new use cases that span the full range of enterprise data. Here’s how to do it.

Information is growing at an exponential rate driven by enterprise applications and databases, and often takes the form of new types of data from sources such as social media, sensors, and mobile devices. Because it is not cost-effective to store and process all this information in an in-memory database,

Read More

How-to: Deploy a Secure Enterprise Data Hub on AWS

Categories: CDH Cloud How-to Ops and DevOps

Learn how to use Cloudera Director, Microsoft Active Directory, and Centrify Express to deploy a secure EDH cluster for workloads in the public cloud.

There are several best practices for deploying a secure Apache Hadoop-powered enterprise data hub (EDH) cluster on Amazon Web Services (AWS), including use of Centrify Express for Linux-to-Active Directory host integration and Microsoft Active Directory as the core integration point for identity, authentication, authorization, and public key infrastructure (PKI).

Read More

How-to: Use Impala and Kudu Together for Analytic Workloads

Categories: Data Science Hadoop How-to Impala Kudu Performance

Using Apache Impala (incubating) on top of Apache Kudu (incubating) has significant performance benefits.

Apache Kudu (incubating) is the newest addition to the set of storage engines that integrate with the Apache Hadoop ecosystem. The promise of Kudu is to deliver high scan performance for analytical workloads while allowing users to concurrently insert, update, and delete records. With these properties, Kudu becomes a viable alternative to existing combinations of HDFS and/or Apache HBase to achieve similar results with less complicated ETL pipelines,

Read More

How-to: Build a Real-Time Search System using StreamSets, Apache Kafka, and Cloudera Search

Categories: Cloudera Manager Guest How-to Hue Kafka Search

Thanks to Jonathan Natkins, a field engineer from StreamSets, for the guest post below about using StreamSets Data Collector (open source, GUI-driven ingest technology for developing and operating data pipelines with a minimum of code), Cloudera Search, and Hue to build a real-time search environment.

As pressure mounts on data engineers to deliver more data from more sources in less time, StreamSets Data Collector can serve as a linchpin in the data management process,

Read More

How-to: Install Cloudera Enterprise on Microsoft Azure (Part 1)

Categories: Cloud Guest How-to

Recently, GoDataDriven installed a Cloudera Enterprise (CDH + Cloudera Manager) cluster on Microsoft Azure. This two-part series, written by Alexander Bij and Tünde Alkemade and republished with permission, covers the use case, design, and installation.

Processing large amounts of unstructured data requires serious computing power as well as maintenance effort. As load on computing power typically fluctuates due to time and seasonal influences and/or processes running at certain times,

Read More