Apache Spark is a very popular analytics engine used for large-scale data processing. It is widely used for many big data applications and use cases. CDP Operational Database Experience Experience (COD) is a CDP Public Cloud service that lets you create and manage operational database instances and it is powered by Apache HBase and Apache […]
Data pipelines are in high demand in today’s data-driven organizations. As critical elements in supplying trusted, curated, and usable data for end-to-end analytic and machine learning workflows, the role of data pipelines is becoming indispensable. To keep up, data pipelines are being vigorously reshaped with modern tools and techniques. At Cloudera, we recently introduced several […]
Apache YuniKorn (Incubating) has just released 0.10.0 (release announcement). As part of this release, a new feature called Gang Scheduling has become available. By leveraging the Gang Scheduling feature, Spark jobs scheduling on Kubernetes becomes more efficient. What is Apache YuniKorn (Incubating)? Apache YuniKorn (Incubating) is a new Apache incubator project that offers rich scheduling […]
Update August 2021: Starting with CDE v1.9, you can now use the python-env resource (Option 2) for all Python packages, including those dependent on C base libraries such as Pandas, Pyarrow, etc. Use custom-runtime-image (Option 3) only for custom libraries & more advanced scenarios. Apache Spark is now widely used in many enterprises for building […]
What is Cloudera Data Engineering (CDE) ? Cloudera Data Engineering is a serverless service for Cloudera Data Platform (CDP) that allows you to submit jobs to auto-scaling virtual clusters. CDE enables you to spend more time on your applications, and less time on infrastructure. CDE allows you to create, manage, and schedule Apache Spark jobs […]
Cloudera Operational Database is an operational database-as-a-service that brings ease of use and flexibility to Apache HBase. Cloudera Operational Database enables developers to quickly build future-proof applications that are architected to handle data evolution. In the previous blog posts, we looked at application development concepts and how Cloudera Operational Database (COD) interacts with other CDP […]
In the previous blog post, we looked at some of the application development concepts for the Cloudera Operational Database (COD). In this blog post, we’ll see how you can use other CDP services with COD. COD is an operational database-as-a-service that brings ease of use and flexibility to Apache HBase. Cloudera Operational Database enables developers […]
Cloudera Operational Database is now available in three different form-factors in Cloudera Data Platform (CDP). If you are new to Cloudera Operational Database, see this blog post. And, check out the documentation here. In this blog post, we’ll look at Apache HBase and Apache Phoenix concepts relevant to developing applications for Cloudera Operational Database. But […]
In this installment, we’ll discuss how to do Get/Scan Operations and utilize PySpark SQL. Afterward, we’ll talk about Bulk Operations and then some troubleshooting errors you may come across while trying this yourself. Read the first blog here. Get/Scan Operations Using Catalogs In this example, let’s load the table ‘tblEmployee’ that we made in the […]
When it comes to machine learning (ML) in the enterprise, there are many misconceptions about what it actually takes to effectively employ machine learning models and scale AI use cases. When many businesses start their journey into ML and AI, it’s common to place a lot of energy and focus on the coding and data […]