At Cloudera, we’re known for making innovative technological solutions that drive change and impact the world. Our mission is to make data and analytics easy and accessible to everyone. And that doesn’t end with our customer base. We also aim to provide equitable access to career opportunities within data and analytics to the workforce of […]
Introduction The snapshots feature of the Apache Hadoop Distributed Filesystem (HDFS) enables you to capture point-in-time copies of the file system and protect your important data against corruption, user-, or application errors. This feature is available in all versions of Cloudera Data Platform (CDP), Cloudera Distribution for Hadoop (CDH) and Hortonworks Data Platform (HDP). Regardless […]
Introduction Ozone is an Apache Software Foundation project to build a distributed storage platform that caters to the demanding performance needs of analytical workloads, content distribution, and object storage use cases. The Ozone Manager is a critical component of Ozone. It is a replicated, highly-available service that is responsible for managing the metadata for all […]
Introduction For more than a decade now, the Hive table format has been a ubiquitous presence in the big data ecosystem, managing petabytes of data with remarkable efficiency and scale. But as the data volumes, data variety, and data usage grows, users face many challenges when using Hive tables because of its antiquated directory-based table […]
Iceberg is an emerging open-table format designed for large analytic workloads. The Apache Iceberg project continues developing an implementation of Iceberg specification in the form of Java Library. Several compute engines such as Impala, Hive, Spark, and Trino have supported querying data in Iceberg table format by adopting this Java Library provided by the Apache […]
Apache Impala and Apache Kudu make a great combination for real-time analytics on streaming data for time series and real-time data warehousing use cases. More than 200 Cloudera customers have implemented Apache Kudu with Apache Spark for ingestion and Apache Impala for real-time BI use cases successfully over the last decade, with thousands of nodes […]
The recent failures of regional banks in the US, such as Silicon Valley Bank (SVB), Silvergate, Signature, and First Republic, were caused by multiple factors. To ensure the stability of the US financial system, the implementation of advanced liquidity risk models and stress testing using (MI/AI) could potentially serve as a protective measure. Technology alone […]