Tag Archives: apache hive

Getting Started with Ibis and How to Contribute

Categories: Cloudera Labs Impala

Learn about the architecture of Ibis, the roadmaps for Ibis and Impala, and how to get started and contribute.

We created Ibis, a new Python data analysis framework now incubating in Cloudera Labs, with the goal of enabling data scientists and data engineers to be as productive working with big data as they are working with small and medium data today. In doing so, we will enable Python to become a true first-class language for Apache Hadoop,

Read more

New in CDH 5.4: Sensitive Data Redaction

Categories: CDH Cloudera Manager Platform Security & Cybersecurity

The best data protection strategy is to remove sensitive information from every place it’s not needed.

Have you ever wondered what sort of “sensitive” information might wind up in Apache Hadoop log files? For example, if you’re storing credit card numbers inside HDFS, might they ever “leak” into a log file outside of HDFS? What about SQL queries? If you have a query like select * from table where creditcard = '1234-5678-9012-3456',

Read more
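
To make the credit card question in the excerpt above concrete, here is a minimal Python sketch of pattern-based redaction applied to a log line. It is an illustration only, not the CDH 5.4 redaction feature: the pattern, the `redact` helper, and the replacement string are all assumptions made for this example.

```python
import re

# Assumed pattern for illustration: 16 digits in four groups,
# optionally separated by hyphens or spaces.
CARD_PATTERN = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")


def redact(line, replacement="XXXX-XXXX-XXXX-XXXX"):
    """Return the line with anything that looks like a card number masked."""
    return CARD_PATTERN.sub(replacement, line)


# The query text from the excerpt, as it might appear in a log file.
query = "select * from table where creditcard = '1234-5678-9012-3456'"
print(redact(query))
```

A single regular expression like this will miss other card formats and can mask unrelated 16-digit values, so treat it as a starting point rather than a complete rule set.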

Security, Hive-on-Spark, and Other Improvements in Apache Hive 1.2.0

Categories: Community Hive Spark

Apache Hive 1.2.0, although not a major release, contains significant improvements.

Recently, the Apache Hive community moved to a more frequent, incremental release schedule. So, a little while ago, we covered the Apache Hive 1.0.0 release and explained how it was renamed from 0.14.1 with only minor feature additions since 0.14.0.

Shortly thereafter, Apache Hive 1.1.0 was released (renamed from Apache Hive 0.15.0), which added more significant features, notably Hive-on-Spark.

Read more

Graduating Apache Parquet

Categories: Guest Parquet

The following post from Julien Le Dem, a tech lead at Twitter, originally appeared in the Twitter Engineering Blog. We bring it to you here for your convenience.

The Apache Software Foundation (ASF) recently announced the graduation of Apache Parquet, a columnar storage format for the Apache Hadoop ecosystem. At Twitter, we’re excited to be a founding member of the project.

Apache Parquet is built to work across programming languages,

Read more

How-to: Read FIX Messages Using Apache Hive and Impala

Categories: Hadoop Hive How-to Impala

Learn how to read FIX message files directly with Hive, create a view to simplify user queries, and use a flattened Apache Parquet table to enable fast user queries with Impala.

The Financial Information eXchange (FIX) protocol is used widely by the financial services industry to communicate various trading-related activities. Each FIX message is a record that represents an action by a financial party, such as a new order or an execution report.
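
As a rough illustration of what "each FIX message is a record" means, the Python sketch below splits one message into its fields. It is not the Hive or Impala approach the post describes. It assumes the standard tag=value encoding, in which fields are delimited by the SOH character (0x01), and the sample message and the `parse_fix` helper are made up for this example.

```python
SOH = "\x01"  # standard FIX field delimiter


def parse_fix(message):
    """Split a tag=value FIX message into a dict keyed by tag number.

    Simplification: a tag that repeats (as in repeating groups) keeps
    only its last value.
    """
    fields = {}
    for pair in message.strip(SOH).split(SOH):
        if not pair:
            continue  # skip empty segments, e.g. from an empty message
        tag, _, value = pair.partition("=")
        fields[tag] = value
    return fields


# Hypothetical new-order message: 35=D is the MsgType for a new single order.
sample = SOH.join(["8=FIX.4.2", "35=D", "55=IBM", "54=1", "38=100"]) + SOH
order = parse_fix(sample)
print(order["35"], order["55"], order["38"])
```

Keying the record by tag number mirrors how a map-typed column could hold the same data; a flattened table would instead promote frequently used tags to named columns.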

Read more