Category Archives: Kite SDK

How-to: Ingest Data Quickly Using the Kite CLI

Categories: Guest How-to Kite SDK

Thanks to Ben Harden of CapTech for allowing us to republish the post below.

Getting delimited flat-file data ingested into Apache Hadoop and ready for use is a tedious task, especially when you want to take advantage of file compression, partitioning, and the performance gains you get from using the Avro and Parquet file formats.

In general, you have to go through the following steps to move data from a local file system to HDFS.

How-to: Write Apache Hadoop Applications on OpenShift with Kite SDK

Categories: Cloud Hadoop How-to Kite SDK

The combination of OpenShift and Kite SDK turns out to be an effective one for developing and testing Apache Hadoop applications.

At Cloudera, our engineers develop a variety of applications on top of Hadoop to meet our own data needs. More recently, we’ve started to look at streamlining our development process by using a PaaS (Platform-as-a-Service) for some of these applications. Having single-click deployment and updates to consistent development environments lets us onboard new developers more quickly…

How-to: Use Kite SDK to Easily Store and Configure Data in Apache Hadoop

Categories: HBase HDFS How-to Kite SDK

Organizing your data inside Hadoop doesn’t have to be hard — Kite SDK helps you try out new data configurations quickly in either HDFS or HBase.

Kite SDK is a Cloudera-sponsored open-source project that makes it easier for you to build applications on top of Apache Hadoop. Its premise is that you shouldn’t need to know how Hadoop works to build your application on it, even though that’s an unfortunately common requirement today (because the Hadoop APIs are low-level …)

New Training: Design and Build Big Data Applications

Categories: Kite SDK Training

Cloudera’s new “Designing and Building Big Data Applications” training course is a great springboard for writing apps for an enterprise data hub.

Cloudera’s vision of an enterprise data hub as a central, scalable repository for all your data is changing the notion of data warehousing. The best way to gain value from all of your data is by bringing more workloads to where the data lives. That place is Apache Hadoop.

For engineers, …