Cloudera’s new “Designing and Building Big Data Applications” is a great springboard for writing apps for an enterprise data hub.
Cloudera’s vision of an enterprise data hub as a central, scalable repository for all your data is changing the notion of data warehousing. The best way to gain value from all of your data is by bringing more workloads to where the data lives. That place is Apache Hadoop.
For engineers, this means changes to code and data sources. It means learning how to access your data with the best tool for the job. Finally, it means learning how to drive business processes and enable analysis over much larger data sets now that you can access all of them at once.
We are pleased to announce Cloudera University’s newest training course, Designing and Building Big Data Applications. In this class, you will gain experience developing converged applications with the various components of an enterprise data hub. You’ll create end-to-end solutions that address the full data lifecycle: acquiring diverse data sets, processing them with a choice of tools, and presenting the results to users through an easy-to-use web interface.
Code, Customize, Converge, Compile
As in any large enterprise, your data sources and formats vary widely. The hands-on exercises in the Big Data Applications course have data coming in from web servers, network services, databases, and files. The curriculum replicates scenarios we see among our customers, particularly working with multiple data formats at a time, including HTML, JSON, XML, fixed-width data, and plain text. You will gain real experience building a data ingestion pipeline with Apache Flume, storing and staging massive multi-format data in HDFS, and creating data products that leverage the analytical capabilities of Apache Crunch along with your own user-defined functions for Apache Hive and Cloudera Impala. Together, these form the core processing stages of a Big Data solution that scales in production to many business users.
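To make the user-defined-function idea concrete: Hive can stream rows through an external script with its TRANSFORM clause, which pipes each row to the script as tab-separated text and reads transformed rows back. A minimal sketch (the column names and the URL-normalization task are invented for illustration, not an exercise from the course):

```python
# Sketch of a streaming transform usable from Hive's TRANSFORM clause.
# Hive sends each row to the script as tab-separated text on stdin and
# reads the transformed row back from stdout.
from urllib.parse import urlparse

def transform_line(line):
    """Turn 'user_id<TAB>raw_url' into 'user_id<TAB>hostname'."""
    user_id, raw_url = line.rstrip("\n").split("\t")
    host = urlparse(raw_url).netloc or "unknown"
    return user_id + "\t" + host

# As a complete Hive script, the entry point would be:
#   import sys
#   for line in sys.stdin:
#       print(transform_line(line))
```

From HiveQL this would be invoked along the lines of `SELECT TRANSFORM(user_id, raw_url) USING 'normalize_url.py' AS (user_id, host) FROM visits;`, where the table and script names are again placeholders.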
To perform these jobs, we also include Apache Avro, the Kite SDK, Cloudera Search, Morphlines, Apache Oozie, and Hue in our Big Data toolkit. You will learn how to choose the right tool for the right job:
- Dealing with XML? You’ll use the Kite SDK to transform XML to Avro format, which offers significantly better performance and broad compatibility with tools across the Hadoop ecosystem.
- Need to search vast numbers of HTML documents? You’ll use Morphlines to extract the relevant data and index it with Search without writing a line of code.
- Need a complex Hadoop workflow that repeats on a schedule? You’ll use Hue and Oozie to easily create repeatable processes over data at petabyte scale.
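For the second bullet, a morphline is a configuration file describing a chain of transformation commands that runs at index time. A rough sketch of its shape, with command names drawn from the Kite Morphlines library but ids and details invented rather than taken from a tested configuration (check the Morphlines reference guide before using):

```
morphlines : [
  {
    id : indexHtmlPages
    importCommands : ["org.kitesdk.**", "org.apache.solr.morphlines.**"]
    commands : [
      # Let Tika detect and parse the HTML, mapping content to Solr fields
      { detectMimeType {} }
      { solrCell { solrLocator : ${SOLR_LOCATOR} } }
      # Drop fields the Solr schema does not define, then index the record
      { sanitizeUnknownSolrFields { solrLocator : ${SOLR_LOCATOR} } }
      { loadSolr { solrLocator : ${SOLR_LOCATOR} } }
    ]
  }
]
```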
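And for the third bullet, the workflows you assemble in Hue are ultimately expressed as Oozie XML. A coordinator that reruns a workflow once a day looks roughly like this, with the application path and dates as placeholders rather than a tested configuration:

```
<!-- Sketch of an Oozie coordinator that reruns a workflow daily. -->
<coordinator-app name="daily-ingest" frequency="${coord:days(1)}"
                 start="2014-01-01T00:00Z" end="2015-01-01T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>${nameNode}/user/app/workflows/ingest</app-path>
    </workflow>
  </action>
</coordinator-app>
```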
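The XML-to-Avro conversion in the first bullet comes down to mapping XML elements onto a typed record schema, which the Kite SDK and the Avro libraries then serialize for you. The mapping itself can be sketched in a few lines; the `<order>` layout and the schema below are invented for illustration:

```python
# Sketch of the XML-to-record mapping behind an XML-to-Avro conversion.
# The <order> document shape and the schema are illustrative only; the
# Kite SDK and the Avro libraries handle the real serialization.
import json
import xml.etree.ElementTree as ET

# An Avro schema is itself just JSON: a named record with typed fields.
ORDER_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "id",     "type": "long"},
    {"name": "sku",    "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
""")

def xml_to_record(xml_text):
    """Map one <order> XML document onto a dict shaped like ORDER_SCHEMA."""
    root = ET.fromstring(xml_text)
    return {
        "id": int(root.findtext("id")),
        "sku": root.findtext("sku"),
        "amount": float(root.findtext("amount")),
    }
```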
Engineer a Big Data Solution
Over the course of just four days, class participants will have access to and work with:
- 26 exercises and bonus exercises
- 20 Eclipse projects
- 27,000 lines of sample solution code
Want to be part of the future and build the Big Data engineering skills that define the most successful information-driven enterprises in the world? Find a Designing and Building Big Data Applications session near you, or request a private training engagement for your entire team at your location.
Jesse Anderson is an instructor and curriculum designer for Cloudera University.