New Training: Design and Build Big Data Applications

Cloudera’s new “Designing and Building Big Data Applications” course is a great springboard for writing apps for an enterprise data hub.

Cloudera’s vision of an enterprise data hub as a central, scalable repository for all your data is changing the notion of data warehousing. The best way to gain value from all of your data is by bringing more workloads to where the data lives. That place is Apache Hadoop.

For engineers, this means changes to code and data sources. It means learning how to access your data with the best tool for the job. Finally, it means learning how to drive business processes and enable analysis on much larger data sets now that you can access all of it at once.

We are pleased to announce Cloudera University’s newest training course, Designing and Building Big Data Applications. In this class, you will gain experience developing converged applications with the various components of an enterprise data hub. You’ll create end-to-end solutions that address the full data lifecycle: acquiring diverse data sets, processing them with a choice of tools, and presenting the results to users through an easy-to-use web interface.

Code, Customize, Converge, Compile

As in any large enterprise, your data sources and formats vary widely. The hands-on exercises in the Big Data Applications course have data coming in from web servers, network services, databases, and files. The curriculum replicates scenarios we see among our customers, particularly working with multiple data formats at a time, including HTML, JSON, XML, fixed-width data, and text. You will gain real experience building a data ingestion pipeline using Apache Flume, storing and staging massive multi-format data with HDFS, and creating data products that leverage the analytical capabilities of Apache Crunch and your own customized user-defined functions for Apache Hive and Cloudera Impala. Together, these make up the core processing stages of a Big Data solution that can scale in production to many business users.
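To make one of those formats concrete, here is a minimal plain-Java sketch of parsing a fixed-width record into named fields. It is not taken from the course materials: the class name, field names, and column widths are all assumptions chosen for illustration, and it uses only the JDK. Logic along these lines could plausibly sit behind a custom function or an ingestion step, but how the course wires it into Hive, Impala, or Flume is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the course code.
class FixedWidthParser {

    // Assumed record layout: field names and the width of each column.
    static final String[] NAMES = {"id", "name", "amount"};
    static final int[] WIDTHS = {5, 10, 8};

    // Slices one fixed-width line into trimmed, named fields.
    // Fields that fall past the end of a short line come back empty.
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        int start = 0;
        for (int i = 0; i < NAMES.length; i++) {
            int end = Math.min(start + WIDTHS[i], line.length());
            String value = start < line.length()
                    ? line.substring(start, end).trim()
                    : "";
            fields.put(NAMES[i], value);
            start += WIDTHS[i];
        }
        return fields;
    }

    public static void main(String[] args) {
        // Build a sample line that matches the assumed 5/10/8 layout.
        String line = String.format("%-5s%-10s%8s", "00042", "Widget", "19.99");
        System.out.println(parse(line));
    }
}
```

Keeping the layout in two parallel arrays makes the sketch short; a real pipeline would more likely read the layout from a schema or configuration file.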

To perform these jobs, we also include Apache Avro, the Kite SDK, Cloudera Search, Morphlines, Apache Oozie, and Hue in our Big Data toolkit. You will learn how to choose the right tool for the right job:

  • Dealing with XML? You’ll use the Kite SDK to transform XML to Avro format, which offers significantly better performance and compatibility with the range of Hadoop ecosystem tools.
  • Need to search vast amounts of HTML documents? You’ll use Morphlines to extract the relevant data and index it with Search without writing a line of code.
  • Need to create a complex Hadoop workflow that gets repeated on a schedule? You’ll use Hue and Oozie to easily create repeatable processes with data at petabyte scale.
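As a rough illustration of the XML case in the list above, the sketch below pulls the child elements of an XML document into a flat, ordered set of named fields, which is the shape a record needs before it can be written in a format such as Avro. It deliberately uses only the JDK’s built-in DOM parser rather than the Kite SDK or Avro APIs, and the class name, element names, and sample document are all made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; this is not the Kite SDK API.
class XmlToRecord {

    // Maps each direct child element of the root to its trimmed text content.
    static Map<String, String> flatten(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> record = new LinkedHashMap<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace and comment nodes; keep elements only.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    record.put(child.getNodeName(), child.getTextContent().trim());
                }
            }
            return record;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample document.
        String xml = "<order><id>42</id><customer>Acme</customer>"
                + "<total>19.99</total></order>";
        System.out.println(flatten(xml));
    }
}
```

The sketch only handles one level of nesting and treats every value as a string; a typed schema, which is part of what Avro provides, would replace the string map in practice.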

Data is of very little use if you can’t gain value from it. The last exercise in the class takes the various data products from the previous exercises and displays them in a web application. This user interface (see example below) combines the results from interactive queries on Impala with those from the Cloudera Search REST interface to power a dashboard that visualizes key business data. The web application uses a JavaScript charting library to create charts with data served by Impala, giving us an active view into, and real-time feedback about, all areas of the business.

Engineer a Big Data Solution

To perform the hands-on exercises and get the most from Designing and Building Big Data Applications, participants should have intermediate-level Java knowledge and recent Java experience, as programming makes up the core of the curriculum. The course assumes you have previous Hadoop knowledge and does not cover the basics of MapReduce or HDFS. However, HTML, JSP, JavaScript, and web application experience is not required. Knowledge of SQL or HiveQL is helpful, but not required. If you have practical experience writing and executing MapReduce jobs in Java, then this class is for you. If not, we highly recommend you begin with the excellent Cloudera Developer Training for Apache Hadoop, our most popular class.

Over the course of just four days, class participants will have access to and work with:

  • 26 exercises and bonus exercises
  • 20 Eclipse projects
  • 27,000 lines of sample solution code

Want to be part of the future and work towards the Big Data engineering skills that are defining the most successful information-driven enterprises in the world? Find a Designing and Building Big Data Applications session near you, or request a private training engagement for your entire team at your location.

Jesse Anderson is an instructor and curriculum designer for Cloudera University.
