The world is awash with data, nowhere more so than in the telecommunications (telco) industry. With some Cloudera customers ingesting multiple petabytes of data every single day (that’s multiple thousands of terabytes!), there is the potential to understand, in great detail, how people, businesses, cities and ecosystems function. This information is essential for the management of […]
Apache Impala and Apache Kudu make a great combination for real-time analytics on streaming data for time series and real-time data warehousing use cases. More than 200 Cloudera customers have implemented Apache Kudu with Apache Spark for ingestion and Apache Impala for real-time BI use cases successfully over the last decade, with thousands of nodes […]
Data has become an essential driver for new monetization initiatives in the financial services industry. The vast amount of data collected from customers, transactions, and market movements, among other sources, offers financial institutions tremendous potential to extract valuable insights that can inform business decisions, improve customer service, and create new revenue […]
With a microservices architecture, an application is built as independent service components. The hardest part of such an architecture is data: these services often need to propagate data and events among each other. To provide reliability, it is important to have well-defined consistency models.
In this blog post we describe how to use the REST API of Apache Solr in CDP Public Cloud directly or through Apache Knox Gateway. We show indicative performance measurements, demonstrating the performance improvement one can achieve by reusing cookies set by the Knox Gateway.
Inevitably, in any production deployment, the number of Kafka nodes required to maintain the cluster changes. Balancing performance and cloud costs requires that administrators scale up or down accordingly.