Tag Archives: Oozie

How-to: Use the New Apache Oozie Database Migration Tool

Categories: How-to Oozie

Updated 11/22/16 – Important: All features below are working on CDH 5.9.0 and CM 5.9.0 and above. 

This tool makes Oozie migrations off Apache Derby (or any other supported database) easy, in addition to streamlining upgrades.

The Apache Oozie server is a stateless web application by design, with all information about running and completed workflows, coordinator jobs, and bundle jobs stored in a relational database.

Read More

Cloudera Engineering Interns Got Talent

Categories: Careers Cloudera Life Spark

As is their custom, Cloudera Engineering’s interns made innovation, especially for Apache Spark, the theme of the Summer season.

Cloudera has a long-time tradition of searching far and wide for the smartest summer engineering interns that it can find. Alumni of the program have become start-up co-founders, faculty at top-tier CS departments, employees at other prominent technology companies (including Google, Databricks, Uber, LinkedIn), as well as many current employees at Cloudera.

Read More

Checklist for Painless Upgrades to CDH 5

Categories: CDH Cloudera Manager Ops and DevOps

Following these best practices can make your upgrade path to CDH 5 relatively free of obstacles.

Upgrading the software that powers mission-critical workloads can be challenging in any circumstance. In the case of CDH, however, Cloudera Manager makes upgrades easy, and the built-in Upgrade Wizard, available with Cloudera Manager 5, further simplifies the upgrade process. The wizard performs service-specific upgrade steps that, previously, you had to run manually, and also features a rolling restart capability that reduces downtime for minor and maintenance version upgrades.

Read More

How-to: Build Re-usable Spark Programs using Spark Shell and Maven

Categories: Data Science How-to Spark

Set up your own, or even a shared, environment for doing interactive analysis of time-series data.

Although software engineering offers several methods and approaches to produce robust and reliable components, a more lightweight and flexible approach is required for data analysts—who do not build “products” per se but still need high-quality tools and components. Thus, recently, I tried to find a way to re-use existing libraries and datasets stored already in HDFS with Apache Spark.

Read More

How-to: Do Real-Time Log Analytics with Apache Kafka, Cloudera Search, and Hue

Categories: Data Ingestion How-to Hue Kafka Search

Cloudera recently announced formal support for Apache Kafka. This simple use case illustrates how to make web log analysis, powered in part by Kafka, one of your first steps in a pervasive analytics journey.

If you are not looking at your company’s operational logs, then you are at a competitive disadvantage in your industry. Web server logs, application logs, and system logs are all valuable sources of operational intelligence,

Read More