Apache Hive Archives

April 23, 2020 | Business

EMR workloads + CDP = better performance and lower costs

The first thing that comes to mind when talking about synergy is how 2+2=5. Being the writer that he is, Mark Twain described it a lot more eloquently as “the bonus that is achieved when things work together harmoniously”. There is a multitude of product and business examples to illustrate the point and I particularly […]

by Wim Stoop 3 min read

April 6, 2020 | Business

Hadoop: Decade Two, Day Zero*

This blog was originally published on Medium. The Data Cloud: Powered By Hadoop. One key aspect of the Cloudera Data Platform (CDP), which is just beginning to be understood, is how much of a recombinant-evolution it represents, from an architectural standpoint, vis-à-vis Hadoop in its first decade. I’ve been having a blast showing CDP to […]

by Cloudera 6 min read

Apache Hadoop Apache Hive Apache Impala Apache Yarn Cloud Hortonworks Data Platform Hybrid Cloud Cloudera Data Platform Data Hub Private Cloud Public Cloud Data Science Modernize Architecture

February 14, 2020 | Technical

Benchmarking Ozone: Cloudera’s next generation Storage for CDP

Apache Hadoop Ozone was designed to address the scale limitation of HDFS with respect to small files and the total number of file system objects. On current data center hardware, HDFS has a limit of about 350 million files and 700 million file system objects. Ozone’s architecture addresses these limitations[4]. This article compares the performance […]

by Istvan Fajth, Mukul Kumar Singh 4 min read

Apache HDFS Apache Hive Apache Ozone Apache Yarn Cloudera Data Platform Data Hub Data Warehouse Modernize Architecture Performance

December 20, 2019 | Business

SQL Analytics at Scale: Selecting the Right SQL Engine for the Right Job

We are all hungry for data. Not just more data… also new types of data so that we can best understand our products, customers, and markets. We are looking for real-time insight on the newest available data in all shapes and sizes, structured and unstructured. We want to embrace the new generation of business and […]

by Cloudera 6 min read

Apache Hive Apache Impala Apache Spark Data Warehouse Modernize Architecture

October 7, 2019 | Technical

Creating an Open Standard: Machine Learning Governance using Apache Atlas

Machine learning (ML) has become one of the most critical capabilities for modern businesses to grow and stay competitive today. From automating internal processes to optimizing the design, creation and marketing processes behind virtually every product consumed, ML models have permeated almost every aspect of our work and personal lives — and for businesses, the […]

by Cloudera, Alex Breshears 9 min read

Apache Atlas Apache HBase Apache Hive Data Science Data Science Workbench Data Engineering Enterprise AI Governance Security, Risk, & Compliance

September 22, 2019 | Technical

Shared Transactional Tables: The Foundation of Next Generation Big Data Warehousing

The next generation of big data warehousing is being built on transactional tables. Transactions, of course, enable new use cases that require updating, deleting, and merging rows of data. But more importantly, a transaction-centric design enables advanced features such as materialized views, aggressive data caching, and efficient replication between warehouses, which are critical for modern […]

by Cloudera, Sanjay Radia 7 min read

Apache Atlas Apache Hive Apache Impala Apache Ranger Apache Spark

August 1, 2019 | Technical

Extending Hive Replication: Transactional Tables, External Tables, and Statistics

With every release, Hive’s built-in replication is expanding its territory by improving support for different table types. In this blog post, we will discuss the recent additions, i.e., replication of transactional tables (a.k.a. ACID tables), external tables, and statistics associated with all kinds of tables. Transactional table replication: Transactional tables in Hive support ACID properties. […]

by Cloudera 7 min read

Apache Atlas Apache Hive Data backup and recovery Hortonworks Data Platform Cloudera Data Platform

June 26, 2019 | Technical

Cloudera at ACM SIGMOD/PODS 2019

The annual ACM SIGMOD/PODS Conference is a leading international forum for database researchers, practitioners, developers, and users to explore cutting-edge ideas and results, and to exchange techniques, tools, and experiences. This year ACM SIGMOD/PODS will be held in Amsterdam, The Netherlands on June 30th – July 5th, 2019, and Cloudera will be present at the conference, contributing […]

by Cloudera, Slim Bougerra 2 min read

Apache Hive

June 10, 2019 | Technical

HDFS Erasure Coding in Production

HDFS erasure coding (EC), a major feature delivered in Apache Hadoop 3.0, is also available in CDH 6.1 for use in certain applications like Spark, Hive, and MapReduce. The development of EC has been a long collaborative effort across the wider Hadoop community. Including EC with CDH 6.1 helps customers adopt this new feature by […]

by Cloudera, Xiao Chen, Sammi Chen, Jian Zhang 15 min read

Apache Hadoop Apache Hive Apache Spark Cloudera Enterprise MapReduce

May 7, 2019 | Technical

Partition Management in Hadoop

Guest blog post written by Adir Mashiach. In this post I’ll talk about the problem of Hive tables with a lot of small partitions and files and describe my solution in detail. A little background: In my organization, we keep a lot of our data in HDFS. Most of it is the raw data but […]

by Cloudera 8 min read

Apache Hadoop Apache Hive
