When people think of Big Data, they often imagine loads of unstructured data. However, there is always some sort of structure or set of relationships within this data. Based on these relationships, one or more representation schemes are best suited to handling this type of data. A common pattern seen in the field is hierarchy/relationship representation. This form of representation is adept at handling scenarios like complex business models, chains of events or plans, chains of stock orders in banks,
Successful cluster administration can be very difficult without a real-time view of the state of the cluster. Solr itself does not provide aggregated views of its state or any historical usage data, both of which are necessary to understand how the service is used and how it is performing. Knowing the throughput and capacities not only helps detect errors and troubleshoot issues, but is also useful for capacity planning.
Questions may arise, such as:
- What is the size of my cluster and each collection?
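As a rough illustration of the question above, here is a minimal sketch that summarizes a response shaped like the output of Solr's Collections API `CLUSTERSTATUS` action. It only counts live nodes, shards, and replicas; it does not measure on-disk index size or provide any historical data. The function names, the sample payload, and the host names are assumptions made for illustration and are not taken from the original post.

```python
from urllib.parse import urlencode


def clusterstatus_url(base_url):
    """Build a Collections API CLUSTERSTATUS request URL for a Solr base URL."""
    query = urlencode({"action": "CLUSTERSTATUS", "wt": "json"})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


def summarize_cluster(status):
    """Count live nodes, plus shards and replicas for each collection."""
    cluster = status.get("cluster", {})
    summary = {
        "live_nodes": len(cluster.get("live_nodes", [])),
        "collections": {},
    }
    for name, collection in cluster.get("collections", {}).items():
        shards = collection.get("shards", {})
        # Flatten the replicas of every shard into one list.
        replicas = [
            replica
            for shard in shards.values()
            for replica in shard.get("replicas", {}).values()
        ]
        summary["collections"][name] = {
            "shards": len(shards),
            "replicas": len(replicas),
            "active_replicas": sum(
                1 for replica in replicas if replica.get("state") == "active"
            ),
        }
    return summary


# Hypothetical, hand-written payload in the shape of a CLUSTERSTATUS response.
SAMPLE_STATUS = {
    "cluster": {
        "live_nodes": ["host1:8983_solr", "host2:8983_solr"],
        "collections": {
            "logs": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {"state": "active"},
                            "core_node2": {"state": "down"},
                        }
                    },
                    "shard2": {
                        "replicas": {"core_node3": {"state": "active"}}
                    },
                }
            }
        },
    }
}

summary = summarize_cluster(SAMPLE_STATUS)
```

In practice the payload would come from fetching the URL returned by `clusterstatus_url` and parsing the JSON body. Sampling that summary on a schedule is one simple way to start building the historical view that Solr does not keep on its own.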
Learn how to use Cloudera to spin up Apache Hadoop clusters across multiple cloud providers to take advantage of competing prices and avoid infrastructure lock-in.
Why is a multi-cloud strategy important?
In the early days of Cloudera, it was a fair assumption that our software would be running on industry-standard servers that were purchased, owned, and operated by the client in their own data center. In the last few years,
Cloudera Data Science Workbench provides data scientists with secure access to enterprise data with Python, R, and Scala. In the previous article, we introduced how to use your favorite Python libraries on an Apache Spark cluster with PySpark. In the Python world, data scientists often want to use libraries, such as XGBoost, that include C/C++ extensions. This post shows how to solve this problem by creating a conda recipe with a C extension.
Cloudera Search (that is, Apache Solr integrated with the Apache Hadoop ecosystem) now supports, as of C5.9, a backup and disaster recovery capability for Solr collections.
In this post, we will cover the basics of the backup and disaster recovery capability in Solr, and hence in Cloudera Search. In the next post, we will cover the design of the Solr snapshots functionality and its integration with the Hadoop ecosystem as well as public cloud platforms (e.g.