This blog post was published on Hortonworks.com before the merger with Cloudera. Some links, resources, or references may no longer be accurate. 1. Motivation The HiveWarehouseConnector (HWC) is an open-source library which provides new interoperability capabilities between Hive and Spark. In practice, Hive and Spark are often leveraged together by companies to provide a scalable […]
Background Apache Hive is a widely adopted data warehouse engine that runs on Apache Hadoop. Features that improve Hive performance can significantly improve the overall utilization of resources on the cluster. Hive processes data using a chain of operators within the Hive execution engine. These operators are scheduled in the various tasks (for example, MapTask, […]
This blog post was published on Hortonworks.com before the merger with Cloudera. Some links, resources, or references may no longer be accurate. Organizations commonly use a plethora of data storage and processing systems today. These different systems offer cost-effective performance for their respective use cases. Besides traditional RDBMSs such as Oracle DB, Teradata, or PostgreSQL, […]
This blog post was published on Hortonworks.com before the merger with Cloudera. Some links, resources, or references may no longer be accurate. Our last few blogs as part of the Kafka Analytics blog series focused on the addition of Kafka Streams to HDP and HDF and how to build, secure, monitor Kafka Streams apps / […]
This blog post was published on Hortonworks.com before the merger with Cloudera. Some links, resources, or references may no longer be accurate. Special thanks to Bill Preachuk and Brandon Wilson for reviewing and providing their expertise. Introduction Columnar storage is an often-discussed topic in the big data processing and storage world today – there are […]
This blog post was published on Hortonworks.com before the merger with Cloudera. Some links, resources, or references may no longer be accurate. Earlier we talked about reasons for integrating Druid and Hive in a three-part series (Part 1, Part 2, Part 3) on doing ultra-fast OLAP analytics with Apache Hive and Druid. Since […]
This blog post was published on Hortonworks.com before the merger with Cloudera. Some links, resources, or references may no longer be accurate. Guest Author: Greg Kincade, MBA, is an electrical engineer and Sr. Ecosystem Enablement Program Manager for the Micron Storage Solutions Center. We used to build data lakes. Now we fill data oceans. As […]
Enterprises are increasingly moving portions or entire datacenters to the cloud in order to minimize their physical footprint, minimize operational overhead, and shorten their infrastructure acquisition cycles. An incidental benefit is that cloud services, like cloud-based object storage, bring a new set of tools to a Hadoop architect. At Hortonworks, our customers use a number […]
This blog post was published on Hortonworks.com before the merger with Cloudera. Some links, resources, or references may no longer be accurate. Two weeks ago, we announced the GA of HDF 3.1, and to share more details about this milestone release we started the HDF 3.1 Blog Series. In this installment of the series, we’ll […]
This blog post was published on Hortonworks.com before the merger with Cloudera. Some links, resources, or references may no longer be accurate. Overview As more and more workloads are being brought onto modern hardware in the cloud, it’s important for us to understand how to pick the best databases that can leverage the best hardware. […]