The world is awash with data, no more so than in the telecommunications (telco) industry. With some Cloudera customers ingesting multiple petabytes of data every single day— that’s multiple thousands of terabytes!—there is the potential to understand, in great detail, how people, businesses, cities and ecosystems function. This information is essential for the management of […]
Introduction The snapshots feature of the Apache Hadoop Distributed Filesystem (HDFS) enables you to capture point-in-time copies of the file system and protect your important data against corruption, user-, or application errors. This feature is available in all versions of Cloudera Data Platform (CDP), Cloudera Distribution for Hadoop (CDH) and Hortonworks Data Platform (HDP). Regardless […]
Iceberg is an emerging open-table format designed for large analytic workloads. The Apache Iceberg project continues developing an implementation of Iceberg specification in the form of Java Library. Several compute engines such as Impala, Hive, Spark, and Trino have supported querying data in Iceberg table format by adopting this Java Library provided by the Apache […]
What is a dead letter queue (DLQ)? Cloudera SQL Stream builder gives non-technical users the power of a unified stream processing engine so they can integrate, aggregate, query, and analyze both streaming and batch data sources in a single SQL interface. This allows business users to define events of interest for which they need to […]
With a microservices architecture, an application is built as independent service components. The hardest part about them is data: These services often need to propagate data and events amongst each other. To provide reliability it is important to have well defined consistency models.
Cloudera DataFlow Designer empowers developers to create data flows for all their data distribution requirements using a visual, no-code interface that supports agile development.