Thanks to Karthik Vadla, Abhi Basu, and Monica Martinez-Canales of Intel Corp. for the following guest post about using CDH for cost-effective processing/indexing of DICOM (medical) images.
Medical imaging has rapidly become the best non-invasive method to evaluate a patient and determine whether a medical condition exists. Imaging is used to assist in the diagnosis of a condition and, in most cases, is the first step of the journey through the modern medical system.
Thanks to Jonathan Natkins, a field engineer from StreamSets, for the guest post below about using StreamSets Data Collector—open source, GUI-driven ingest technology for developing and operating data pipelines with a minimum of code—and Cloudera Search and HUE to build a real-time search environment.
As pressure mounts on data engineers to deliver more data from more sources in less time, StreamSets Data Collector can serve as a linchpin in the data management process,
Learn how to use OCR tools, Apache Spark, and other Apache Hadoop components to process PDF images at scale.
Optical character recognition (OCR) technologies have advanced significantly over the last 20 years. However, during that time, there has been little or no effort to marry OCR with distributed architectures such as Apache Hadoop to process large numbers of images in near-real time.
In this post, you will learn how to use standard open source tools along with Hadoop components such as Apache Spark,
Bet you didn’t know this: In some cases, Solr offers lightning-fast response times for business-style queries.
If you were to ask well informed technical people about use cases for Solr, the most likely response would be that Solr (in combination with Apache Lucene) is an open source text search engine: one can use Solr to index documents, and after indexing, these same documents can be easily searched using free-form queries in much the same way as you would query Google.
Cloudera Search combines the speed of Apache Solr with the scalability of CDH. Our newest training course covers this exciting technology in depth, from indexing to user interfaces, and is ideal for developers, analysts, and engineers who want to learn how to effectively search both structured and unstructured data at scale.
Despite being nearly 10 years old, Apache Hadoop already has an interesting history. Some of you may know that it was inspired by the Google File System and MapReduce papers,