When most people think of Big Data, often they imagine loads of unstructured data. However, there is always some sort of structure or relationships within this data. Based on these relationships there are one or more representation schemes best suited to handle this type of data. A common pattern seen in the field is hierarchy/relationship representation. This form of representation is adept in handling scenarios like complex business models, chain of event or plans, chain of stock orders in banks,
When it comes to self-service business intelligence and exploratory analytics, Cloudera has continued to push limits and innovate to help our customers expedite this journey and get the most value from their data. Over the past year, we have made a number of significant advancements in Hue to provide a more powerful user experience for SQL developers and make them more productive for their every day self-service BI tasks and workflows.
With the recent release of Cloudera 5.12,
This article introduces a new Apache Hadoop feature called S3Guard. S3Guard addresses one of the major challenges with running Hadoop on Amazon’s Simple Storage Service (S3), eventual consistency. We outline the problem of S3’s eventual consistency, how it affects Hadoop workloads, and explain how S3Guard works.
Although Apache Hadoop has support for using Amazon Simple Storage Service (S3) as a Hadoop filesystem, S3 behaves different than HDFS. One of the key differences is in the level of consistency provided by the underlying filesystem.
In late 2016, Ben Lorica of O’Reilly Media declared that “2017 will be the year the data science and big data community engage with AI technologies.” Deep learning on GPUs has pervaded universities and research organizations prior to 2017, but distributed deep learning on CPUs is now beginning to gain widespread adoption in a diverse set of companies and domains. While GPUs provide top-of-the-line performance in numerical computing, CPUs are also becoming more efficient and much of today’s existing hardware already has CPU computing power available in bulk.
Cloudera Search (that is Apache Solr integrated with the Apache Hadoop eco-system) now supports (as of C5.9) a backup and disaster recovery capability for Solr collections.
In this post we will cover the basics of the backup and disaster recovery capability in Solr and hence in Cloudera Search. In the next post we will cover the design of the Solr snapshots functionality and its integration with the Hadoop ecosystem as well as public cloud platforms (e.g.