Thanks to former Cloudera intern Jose Cambronero for the post below about his summer project, which involved contributions to MLlib in Apache Spark.
Data can come in many shapes and forms, and can be described in many ways. Statistics like the mean and standard deviation of a sample provide descriptions of some of its important qualities. Less commonly used statistics such as skewness and kurtosis provide additional perspective into the data’s profile.
Learn how to build an Impala table around data that comes from non-Impala, or even non-SQL, sources.
As data pipelines start to include more aspects such as NoSQL or loosely specified schemas, you might encounter situations where you have data files (particularly in Apache Parquet format) where you do not know the precise table definition. This tutorial shows how you can build an Impala table around data that comes from non-Impala or even non-SQL sources,
The following post from Julien Le Dem, a tech lead at Twitter, originally appeared in the Twitter Engineering Blog. We bring it to you here for your convenience.
ASF, the Apache Software Foundation, recently announced the graduation of Apache Parquet, a columnar storage format for the Apache Hadoop ecosystem. At Twitter, we’re excited to be a founding member of the project.
Apache Parquet is built to work across programming languages,
Security architecture is complex, but these testing strategies help Cloudera customers rely on production-ready results.
Among other things, good security requires user authentication and that authenticated users and services be granted access to those things (and only those things) that they’re authorized to use. Across Apache Hadoop and Apache Solr (which ships in CDH and powers Cloudera Search), authentication is accomplished using Kerberos and SPNego over HTTP and authorization is accomplished using Apache Sentry (the emerging standard for role-based fine grain access control,
Welcome to our 17th edition of “This Month in the Ecosystem,” a digest of highlights from February 2015 (never intended to be comprehensive; for that, see the excellent Hadoop Weekly).
Wow, a ton of news for such a short month:
- 4,500 people converged for the first Strata + Hadoop World San Jose, ever (with a special appearance by Barack Obama,