Tag Archives: devops

How-to: Prepare Your Apache Hadoop Cluster for PySpark Jobs

Categories: CDH Hadoop How-to Spark

Proper configuration of your Python environment is a critical precondition for using Apache Spark’s Python API.

One of the most enticing aspects of Apache Spark for data scientists is the set of APIs it provides for non-JVM languages: Python (via PySpark) and R (via SparkR). There are a few reasons that these language bindings have generated a lot of excitement: Most data scientists think writing Java or Scala is a drag,

Read more

Doing DevOps with Cloudera Manager

Categories: Cloudera Manager General Ops and DevOps

More and more customers are using automation/configuration management frameworks alongside Cloudera Manager.

As Apache Hadoop clusters continue to grow in size, complexity, and business importance as the foundational infrastructure for an Enterprise Data Hub, the use cases for a robust and mature management console expand. 

As those clusters become larger and more complex, many operators are looking to use configuration management/automation frameworks like Ansible,

Read more

How-to: Install Cloudera Manager and Cloudera Search with Ansible

Categories: Cloudera Manager Guest Ops and DevOps Search

The following guest post is re-published here courtesy of Gerd König, a System Engineer with YMC AG. Thanks, Gerd!

Cloudera Manager is a great tool to orchestrate your CDH-based Apache Hadoop cluster. You can use it for everything from installing the cluster, deploying configurations, and restarting daemons to monitoring each cluster component. Starting with version 4.6, Cloudera Manager supports the integration of Cloudera Search, which is currently in beta.

Read more

How-to: Deploy Hadoop Clusters Automatically with Dell Crowbar and Cloudera Manager

Categories: Cloudera Manager General Guest

The following guest post, from Mike Pittaro of Dell’s Cloud Software Solutions team, describes his team’s use of the Dell Crowbar tool in conjunction with the Cloudera Manager API to automate cluster provisioning. Thanks, Mike!

Deploying, managing, and operating Apache Hadoop clusters can be complex at all levels of the stack, from the hardware on up. To hide this complexity and reduce deployment time, Dell has been using Dell Crowbar in conjunction with Cloudera Manager since 2011 to deploy the Dell | Cloudera Solution for Apache Hadoop for joint customers.

Read more

How-to: Use Vagrant to Set Up a Virtual Hadoop Cluster (updated for CDH 5)

Categories: CDH Cloudera Manager Guest Ops and DevOps

This guest post, which is now updated for CDH 5, comes to us from David Greco.

Vagrant is a tool for programmatically managing many virtual machines (VMs) on a single physical machine. It natively supports VirtualBox and also provides plugins for VMware Fusion and Amazon EC2, so it can manage VMs in those environments as well.

Vagrant provides a very easy-to-use, Ruby-based internal DSL that allows the user to define one or more virtual machines together with their configuration parameters.

Read more