As part of the drumbeat for Spark Summit West in San Francisco (June 6-8), learn how analyzing stats from professional sports leagues is an instructive use case for data analytics using Apache Spark with SQL.
In the United States, many diehard sports fans morph into amateur statisticians to get an edge over the competition in their fantasy sports leagues. Depending on one’s technical chops, this “edge” is usually no more sophisticated than simple spreadsheet analysis…
Combining HANA and Impala can unlock a variety of new use cases that span the full range of enterprise data. Here’s how to do it.
Information is growing exponentially, driven by enterprise applications and databases, and it often takes the form of new types of data from sources such as social media, sensors, and mobile devices. Because it is not cost-effective to store and process all this information in an in-memory database…
Learn how to use Cloudera Director, Microsoft Active Directory, and Centrify Express to deploy a secure EDH cluster for workloads in the public cloud.
There are several best practices for deploying a secure Apache Hadoop-powered enterprise data hub (EDH) cluster on Amazon Web Services (AWS), including the use of Centrify Express for Linux-to-Active Directory host integration and Microsoft Active Directory as the core integration point for identity, authentication, authorization, and public key infrastructure (PKI).
Using Apache Impala (incubating) on top of Apache Kudu (incubating) has significant performance benefits.
Apache Kudu (incubating) is the newest addition to the set of storage engines that integrate with the Apache Hadoop ecosystem. The promise of Kudu is to deliver high scan performance for analytical workloads while allowing users to concurrently insert, update, and delete records. With these properties, Kudu becomes a viable alternative to existing combinations of HDFS and/or Apache HBase, achieving similar results with less complicated ETL pipelines…
Thanks to Jonathan Natkins, a field engineer from StreamSets, for the guest post below about using StreamSets Data Collector (open source, GUI-driven ingest technology for developing and operating data pipelines with a minimum of code), along with Cloudera Search and Hue, to build a real-time search environment.
As pressure mounts on data engineers to deliver more data from more sources in less time, StreamSets Data Collector can serve as a linchpin in the data management process…