The Kite project recently released a stable 1.0!
This milestone means that Kite’s data API and command-line tools are ready for long-term use.
The 1.0 data modules and API are no longer rapidly changing. From 1.0 on, Kite will be strict about breaking compatibility and will use semantic versioning to signal what compatibility guarantees you can expect from a given release.
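As a rough illustration of what that semantic-versioning promise means for a project that depends on Kite, here is a minimal sketch that decides whether moving between two releases is allowed to break compatibility. It assumes plain `MAJOR.MINOR.PATCH` version strings and follows the general semantic versioning rules: from 1.0.0 on, only a major-version increase may introduce incompatible changes, while 0.y.z releases carry no such guarantee. The function names are hypothetical and are not part of Kite.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def may_break_compatibility(current: str, candidate: str) -> bool:
    """Return True if upgrading from `current` to `candidate` is allowed
    to include incompatible API changes under semantic versioning."""
    cur = parse_version(current)
    cand = parse_version(candidate)
    if cur[0] == 0:
        # 0.y.z is initial development: any release may change the API.
        return cand != cur
    # From 1.0.0 on, only a major-version bump may break compatibility.
    return cand[0] > cur[0]
```

Under these rules, an upgrade from 1.0.0 to 1.1.0 or 1.0.1 should be safe, whereas 1.x to 2.0.0 (or any change within the 0.x series) may require code changes.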
Thanks to Ben Harden of CapTech for allowing us to re-publish the post below.
Getting delimited flat-file data ingested into Apache Hadoop and ready for use is a tedious task, especially when you want to take advantage of file compression, partitioning, and the performance gains you get from using the Avro and Parquet file formats.
In general, you have to go through the following steps to move data from a local file system to HDFS.
The combination of OpenShift and Kite SDK turns out to be an effective one for developing and testing Apache Hadoop applications.
At Cloudera, our engineers develop a variety of applications on top of Hadoop to solve our own data needs (here and here). More recently, we’ve started to look at streamlining our development process by using a PaaS (Platform-as-a-Service) for some of these applications. Having single-click deployment and updates to consistent development environments lets us onboard new developers more quickly,
Kite SDK’s latest release contains improvements that make working with data easier.
Recently, Kite SDK, the open source toolset that helps developers build systems on the Apache Hadoop ecosystem, reached version 0.15.0. In this post, you’ll get an overview of several new features and bug fixes.
Working with Datasets by URI
The new Datasets class lets you work with datasets based on individual dataset URIs.
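To make the idea of addressing a dataset with a single URI more concrete, here is a minimal sketch that splits such a URI into its storage scheme and location. It assumes URIs shaped like `dataset:<storage>:<location>`, for example `dataset:hdfs:/tmp/data/users`; check the Kite documentation for the exact URI patterns your release accepts. `DatasetUri` and `parse_dataset_uri` are hypothetical helpers for illustration only. Kite resolves URIs itself when you pass them to the `Datasets` class.

```python
from typing import NamedTuple


class DatasetUri(NamedTuple):
    storage: str   # storage scheme, e.g. "hdfs" or "file" (assumed shape)
    location: str  # everything after the storage scheme


def parse_dataset_uri(uri: str) -> DatasetUri:
    """Split an assumed 'dataset:<storage>:<location>' URI into its parts."""
    prefix = "dataset:"
    if not uri.startswith(prefix):
        raise ValueError(f"not a dataset URI: {uri!r}")
    # partition() splits on the first ':' after the 'dataset:' prefix,
    # so any ':' inside the location (e.g. a port number) is preserved.
    storage, sep, location = uri[len(prefix):].partition(":")
    if not sep or not storage or not location:
        raise ValueError(f"missing storage scheme or location: {uri!r}")
    return DatasetUri(storage, location)
```

Because the whole storage choice lives in the URI, switching a dataset from the local file system to HDFS is, in this model, just a matter of changing the string rather than the code that reads it.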
Organizing your data inside Hadoop doesn’t have to be hard — Kite SDK helps you try out new data configurations quickly in either HDFS or HBase.
Kite SDK is a Cloudera-sponsored open source project that makes it easier for you to build applications on top of Apache Hadoop. Its premise is that you shouldn’t need to know how Hadoop works to build your application on it, even though that’s an unfortunately common requirement today (because the Hadoop APIs are low-level;