Cloudera Developer Blog

Big Data best practices, how-to's, and internals from Cloudera Engineering and the community


How-to: Process Data using Morphlines (in Kite SDK)

Our thanks to Janos Matyas, CTO and Founder of SequenceIQ, for the guest post below about his company’s use case for Morphlines (part of the Kite SDK).

SequenceIQ has an Apache Hadoop-based platform and API that consume and ingest various types of data from different sources to offer predictive analytics and actionable insights. Our datasets include structured and unstructured data, log files, and communication records, and they require constant refining, cleaning, and transformation.

This Month in the Ecosystem (March 2014)

Welcome to our seventh edition of “This Month in the Ecosystem,” a digest of highlights from March 2014 (never intended to be comprehensive; for completeness, see the excellent Hadoop Weekly).

More good news for the ecosystem!

Sneak Preview: "Features & Internals" Track at HBaseCon 2014

The HBaseCon 2014 “Features & Internals” track covers the newest developments in Apache HBase functionality.

The HBaseCon 2014 (May 5, 2014 in San Francisco) agenda has something for everyone – particularly developers building apps on HBase. Thanks again, Program Committee!

Cloudera Enterprise 5 is Now Generally Available!

The GA release of Cloudera Enterprise 5 signifies the evolution of the platform from a mere Apache Hadoop distribution into an enterprise data hub.

We are thrilled to announce the GA release of Cloudera Enterprise 5 (comprising CDH 5.0 and Cloudera Manager 5.0). 

How-to: Use the HBase Thrift Interface, Part 3 – Using Scans

The conclusion to this series covers how to use scans, and considerations for choosing the Thrift or REST APIs.

In this series of how-tos, you have learned how to use Apache HBase’s Thrift interface. Part 1 covered the basics of the API, working with Thrift, and some boilerplate code for connecting to Thrift. Part 2 showed how to insert and get multiple rows at a time. In this third and final post, you will learn how to use scans, along with some considerations for choosing between REST and Thrift.

Scanning with Thrift

How Impala Brings Real-Time, Big Data Analytics to Digital Reasoning’s Users

The following post, by Sarah Cannon of Digital Reasoning, was originally published on that company’s blog. Digital Reasoning has graciously permitted us to republish it here for your convenience.

At the beginning of each release cycle, engineers at Digital Reasoning are given time to explore the latest in Big Data technologies, examining how the frequently changing landscape might be best adapted to serve our mission. As we sat down in the early stages of planning for Synthesys 3.8, one of the biggest issues we faced was reconciling the tradeoff between flexibility and performance. How can users quickly and easily retrieve knowledge from Synthesys without being tied to one strict data model?

Index-Level Security Comes to Cloudera Search

The integration of Apache Sentry with Apache Solr helps Cloudera Search meet important security requirements.

As you have learned in previous blog posts, Cloudera Search brings the power of Apache Hadoop to a wide variety of business users via the ease and flexibility of full-text querying provided by Apache Solr. We have also done significant work to make Cloudera Search easy to add to an existing Hadoop cluster:

Sneak Preview: HBaseCon 2014 "Operations" Track

HBaseCon 2014 “Operations” track reveals best practices used by some of the world’s largest production-cluster operators.

The HBaseCon 2014 (May 5, 2014 in San Francisco) agenda is particularly strong in the area of operations. Thanks again, Program Committee!

Meet the Data Scientist: David F. McCoy

Meet David F. McCoy, one of the first to have earned the title “CCP: Data Scientist” from Cloudera University.

Big Data success requires professionals who can prove their mastery of the tools and techniques of the Hadoop stack. However, experts predict a major shortage of advanced analytics skills over the next few years. At Cloudera, we’re drawing on our industry leadership and early corpus of real-world experience to address the Big Data talent gap with the Cloudera Certified Professional (CCP) program.

Where to Find Cloudera Tech Talks (Through June 2014)

Find Cloudera tech talks in Amsterdam, Boston, Berlin, Sao Paulo, Singapore, Zurich, and other cities across Europe and the US during the next calendar quarter.

Below please find our regularly scheduled quarterly update about where to find tech talks by Cloudera employees – this time, for the second calendar quarter of 2014 (April through June). Note that this list will be continually curated during the period; complete logistical information may not be available yet. And remember, many of these talks are in “free” venues (no cost of entry).
