Tag Archives: tweet

Converting Apache Avro Data to Parquet Format in Apache Hadoop

Categories: Avro Guest Hadoop Parquet

Thanks to Big Data Solutions Architect Matthieu Lieber for allowing us to republish the post below.

A customer of mine wants the best of both worlds: to keep working with his existing Apache Avro data, with all of the advantages that it confers, while also using the predicate push-down features that Parquet provides. How can the two be reconciled?

For more information about combining these formats,

Read more

Customer Spotlight: ISS’ Wes Caldwell Speaks at Cloudera Sessions in Denver

Categories: Events Use Case

This week’s Cloudera Sessions roadshow arrives in Denver, Colo., on Thursday, where the customer Fireside Chat will feature Wes Caldwell, Chief Architect of Global Enterprise Solutions at Intelligent Software Solutions (ISS). ISS helps many government organizations, including several within the U.S. Department of Defense, deploy next-generation data management and analytic solutions using a combination of systems integration expertise and custom-built software.

Read more

Announcing Parquet 1.0: Columnar Storage for Hadoop

Categories: Community Guest Hadoop Impala Parquet

We’re very happy to re-publish the following post from Twitter analytics infrastructure engineering manager Dmitriy Ryaboy (@squarecog).

In March we announced the Parquet project, the result of a collaboration between Twitter and Cloudera intended to create an open-source columnar storage format library for Apache Hadoop.

Today, we’re happy to tell you about a significant Parquet milestone: a 1.0 release, which includes major features and improvements made since the initial announcement.

Read more

How-to: Analyze Twitter Data with Hue

Categories: Data Science Flume Hive How-to Hue

Hue 2.2, the open source web-based interface that makes Apache Hadoop easier to use, lets you interact with Hadoop services from within your browser without having to go to a command-line interface. It features applications such as an Apache Hive editor and an Apache Oozie dashboard and workflow builder.

This post is based on our “Analyzing Twitter Data with Hadoop” sample app and details how the same results can be achieved through Hue in a simpler way.

Read more

How-To: Schedule Recurring Hadoop Jobs with Apache Oozie

Categories: Guest Hive Oozie

Our thanks to guest author Jon Natkins (@nattyice) of WibiData for the following post!

Today, many (if not most) companies have ETL or data enrichment jobs that are executed on a regular basis as data becomes available. In this scenario, it is important to minimize the lag between data being created and being ready for analysis.

CDH, Cloudera’s open-source distribution of Apache Hadoop and related projects,

Read more