Cloudera Engineering Blog
Big Data best practices, how-to's, and internals from Cloudera Engineering and the community
Applications using HDFS, such as Impala, will be able to read data up to 59x faster thanks to this new feature.
Server memory capacity and bandwidth have increased dramatically over the last few years. Beefier servers make in-memory computation quite attractive, since a lot of interesting data sets can fit into cluster memory, and memory is orders of magnitude faster than disk.
Cloudera Community forums are proving their value as an important contributor to a rich user experience.
It’s been almost exactly one year since the debut of the Cloudera Community forums. In addition to doing the birthday shout-out, I thought it would be interesting to bring you up to date about adoption and usage patterns.
Meet Sravya Tirukkovalur (@sravsatuluri), a Software Engineer working on Apache Hadoop security at Cloudera.
What do you do at Cloudera, and in which Apache projects are you involved?
An improved Search app in Hue 3.6 makes the Hadoop user experience even better.
Hue 3.6 (now packaged in CDH 5.1) has brought the second version of the Search App up to even higher standards. The user experience has been greatly improved, as the app now provides a very easy way to build custom dashboards and visualizations.
Kite SDK’s new release contains new improvements that make working with data easier.
Recently, Kite SDK, the open source toolset that helps developers build systems on the Apache Hadoop ecosystem, became a 0.15.0. In this post, you’ll get an overview of several new features and bug fixes.
Working with Datasets by URI
Spark 1.0 reflects a lot of hard work from a very diverse community.
Cloudera’s latest platform release, CDH 5.1, includes Apache Spark 1.0, a milestone release for the Spark project that locks down APIs for Spark’s core functionality. The release reflects the work of hundreds of contributors (including our own Diana Carroll, Mark Grover, Ted Malaska, Colin McCabe, Sean Owen, Hari Shreedharan, Marcelo Vanzin, and me).
With this new release, setting up a separate MIT KDC for cluster authentication services is no longer necessary.
Kerberos (initially developed by MIT in the 1980s) has been adopted by every major component of the Apache Hadoop ecosystem. Consequently, Kerberos has become an integral part of the security infrastructure for the enterprise data hub (EDH).
Cloudera Search now supports fine-grain access control via document-level security provided by Apache Sentry.
In my previous blog post, you learned about index-level security in Apache Sentry (incubating) and Cloudera Search. Although index-level security is effective when the access control requirements for documents in a collection are homogenous, often administrators want to restrict access to certain subsets of documents in a collection.
While the new Spark Developer training from Cloudera University is valuable for developers who are new to Big Data, it’s also a great call for MapReduce veterans.
When I set out to learn Apache Spark (which ships inside Cloudera’s open source platform) about six months ago, I started where many other people do: by following the various online tutorials available from UC Berkeley’s AMPLab, the creators of Spark. I quickly developed an appreciation for the elegant, easy-to-use API and super-fast results, and was eager to learn more.