I met Matthew in New York City about a year ago. We sat in a private conference room and he told me the story of his pharma startup. A small group of researchers set out to solve the black-box enigma of certain kinds of vicious cancers. There are so many cancers, so their vision was […]
Cloudera service logs offer a breadth of information to assist in cluster maintenance, from security checks and auditing tasks to validation for performance tuning and testing, to name a few. However, log records generated by these services do not hold the same value for every organisation. For example, cyber teams may find […]
Data Discovery and Exploration (DDE) was recently released in tech preview in Cloudera Data Platform in public cloud. In this blog we will go through the process of indexing data from S3 into Solr in DDE with the help of NiFi in Data Flow. The scenario is the same as it was […]
CDP for Azure introduces fine-grained authorization for access to Azure Data Lake Storage using Apache Ranger policies. Cloudera and Microsoft have been working together closely on this integration, which greatly simplifies the security administration of access to ADLS-Gen2 cloud storage. Apache Ranger provides a centralized console to manage authorization and view audits of access to […]
This blog post presents a simple “hello world” example of how to get data stored in S3 indexed and served by an Apache Solr service hosted in a Data Discovery and Exploration cluster in CDP. For the curious: DDE is a pre-templated, Solr-optimized cluster deployment option in CDP, and recently […]
From A to Z in 10 minutes! If you have previous experience setting up, sizing, and deploying a distributed search engine service, it is hard to believe that this is possible. Imagine how many times IT has lost valuable time spending hours trying to understand Apache Solr application requirements and map them into how to […]
We are continuing our blog series about implementing real-time log aggregation with the help of Flink. In the first part of the series we reviewed why it is important to gather and analyze logs from long-running distributed jobs in real-time. We also looked at a fairly simple solution for storing logs in Kafka using […]
Many of us have experienced the feeling of hopelessly digging through log files on multiple servers to fix a critical production issue. We can probably all agree that this is far from ideal. Locating and searching log files is even more challenging when dealing with real-time processing applications where the debugging process itself can […]
This blog post was published on Hortonworks.com before the merger with Cloudera. Some links, resources, or references may no longer be accurate. We are excited to announce the immediate availability of HDP Search 4.0. As you are aware, HDP Search offers a performant, scalable, and fault-tolerant enterprise search solution. With HDP Search 4.0, we have added […]
Cloudera Search now supports fine-grained access control via document-level security provided by Apache Sentry. In my previous blog post, you learned about index-level security in Apache Sentry (incubating) and Cloudera Search. Although index-level security is effective when the access control requirements for documents in a collection are homogeneous, often administrators want to restrict access to […]