Cloudera Developer Blog
Big Data best practices, how-to's, and internals from Cloudera Engineering and the community
Our thanks to Janos Matyas, CTO and Founder of SequenceIQ, for the guest post below about his company’s use case for Morphlines (part of the Kite SDK).
SequenceIQ has an Apache Hadoop-based platform and API that consume and ingest various types of data from different sources to offer predictive analytics and actionable insights. Our datasets are structured, unstructured, log files, and communication records, and they require constant refining, cleaning, and transformation.
More good news for the ecosystem!
The HBaseCon 2014 “Features & Internals” track covers the newest developments in Apache HBase functionality.
The GA release of Cloudera Enterprise 5 signifies the evolution of the platform from a mere Apache Hadoop distribution into an enterprise data hub.
We are thrilled to announce the GA release of Cloudera Enterprise 5 (comprising CDH 5.0 and Cloudera Manager 5.0).
The conclusion to this series covers how to use scans, and considerations for choosing the Thrift or REST APIs.
In this series of how-tos, you have learned how to use Apache HBase’s Thrift interface. Part 1 covered the basics of the API, working with Thrift, and some boilerplate code for connecting to Thrift. Part 2 showed how to insert and to get multiple rows at a time. In this third and final post, you will learn how to use scans and some considerations when choosing between REST and Thrift.
Scanning with Thrift
The following post, by Sarah Cannon of Digital Reasoning, was originally published in that company’s blog. Digital Reasoning has graciously permitted us to re-publish here for your convenience.
At the beginning of each release cycle, engineers at Digital Reasoning are given time to explore the latest in Big Data technologies, examining how the frequently changing landscape might be best adapted to serve our mission. As we sat down in the early stages of planning for Synthesys 3.8 one of the biggest issues we faced involved reconciling the tradeoff between flexibility and performance. How can users quickly and easily retrieve knowledge from Synthesys without being tied to one strict data model?
The integration of Apache Sentry with Apache Solr helps Cloudera Search meet important security requirements.
As you have learned in previous blog posts, Cloudera Search brings the power of Apache Hadoop to a wide variety of business users via the ease and flexibility of full-text querying provided by Apache Solr. We have also done significant work to make Cloudera Search easy to add to an existing Hadoop cluster:
HBaseCon 2014 “Operations” track reveals best practices used by some of the world’s largest production-cluster operators.
Meet David F. McCoy, one of the first to have earned the title “CCP: Data Scientist” from Cloudera University.
Big Data success requires professionals who can prove their mastery with the tools and techniques of the Hadoop stack. However, experts predict a major shortage of advanced analytics skills over the next few years. At Cloudera, we’re drawing on our industry leadership and early corpus of real-world experience to address the Big Data talent gap with the Cloudera Certified Professional (CCP) program.
Find Cloudera tech talks in Amsterdam, Boston, Berlin, Sao Paulo, Singapore, Zurich, and other cities across Europe and the US during the next calendar quarter.
Below please find our regularly scheduled quarterly update about where to find tech talks by Cloudera employees – this time, for the second calendar quarter of 2014 (April through June). Note that this list will be continually curated during the period; complete logistical information may not be available yet. And remember, many of these talks are in “free” venues (no cost of entry).