Apache Spark Archives - Page 2 of 9

June 10, 2021 | Technical

How to use Apache Spark with CDP Operational Database Experience

Apache Spark is a very popular analytics engine used for large-scale data processing. It is widely used for many big data applications and use cases. CDP Operational Database Experience (COD) is a CDP Public Cloud service that lets you create and manage operational database instances, and it is powered by Apache HBase and Apache […]

by Gokul Kamaraj, Liliana Kadar, Santhosh Gowda 4 min read

June 2, 2021 | Business

Modernizing Data Pipelines using Cloudera Data Platform – Part 1

Data pipelines are in high demand in today’s data-driven organizations. As critical elements in supplying trusted, curated, and usable data for end-to-end analytic and machine learning workflows, the role of data pipelines is becoming indispensable. To keep up, data pipelines are being vigorously reshaped with modern tools and techniques. At Cloudera, we recently introduced several […]

by Joydeep Das 3 min read

Apache Airflow Apache Spark CDP Public Cloud Cloudera Data Platform (CDP) Data Engineering Customer Analytics Data Ingestion Data Science Machine Learning Modernize Architecture

May 5, 2021 | Technical

Spark on Kubernetes – Gang Scheduling with YuniKorn

Apache YuniKorn (Incubating) has just released 0.10.0 (release announcement). As part of this release, a new feature called Gang Scheduling has become available. By leveraging the Gang Scheduling feature, Spark job scheduling on Kubernetes becomes more efficient. What is Apache YuniKorn (Incubating)? Apache YuniKorn (Incubating) is a new Apache incubator project that offers rich scheduling […]

by WeiWei Yang, Wilfred Spiegelenburg, Kinga Marton 6 min read

Apache Spark Apache Yunikorn Kubernetes Cloudera Data Platform (CDP) Data Engineering Ops and DevOps Performance

April 30, 2021 | Technical

Managing Python dependencies for Spark workloads in Cloudera Data Engineering

Update August 2021: Starting with CDE v1.9, you can now use the python-env resource (Option 2) for all Python packages, including those dependent on C base libraries such as Pandas, Pyarrow, etc. Use custom-runtime-image (Option 3) only for custom libraries & more advanced scenarios. Apache Spark is now widely used in many enterprises for building […]

by Vijay Karthikeyan 7 min read

Apache Spark CDP Private Cloud CDP Public Cloud Cloudera Data Platform (CDP) Data Engineering

April 14, 2021 | Technical

Cloudera Data Engineering – Integration steps to leverage Spark on Kubernetes

What is Cloudera Data Engineering (CDE)? Cloudera Data Engineering is a serverless service for Cloudera Data Platform (CDP) that allows you to submit jobs to auto-scaling virtual clusters. CDE enables you to spend more time on your applications, and less time on infrastructure. CDE allows you to create, manage, and schedule Apache Spark jobs […]

by Harsh Shah, Ashish Shah, Shaun Ahmadian 5 min read

Apache Spark Kubernetes CDP Public Cloud Cloudera Data Platform (CDP) Data Engineering Data Ingestion Modernize Architecture Ops and DevOps Performance

February 26, 2021 | Technical

Sample applications for Cloudera Operational Database

Cloudera Operational Database is an operational database-as-a-service that brings ease of use and flexibility to Apache HBase. Cloudera Operational Database enables developers to quickly build future-proof applications that are architected to handle data evolution. In the previous blog posts, we looked at application development concepts and how Cloudera Operational Database (COD) interacts with other CDP […]

by Gokul Kamaraj, Liliana Kadar, Krishna Maheshwari 5 min read

Apache Kafka Apache NiFi Apache Phoenix Apache Ranger Apache Spark Cloudera Data Platform (CDP) Operational DB Ops and DevOps

February 16, 2021 | Technical

Using other CDP services with Cloudera Operational Database

In the previous blog post, we looked at some of the application development concepts for the Cloudera Operational Database (COD). In this blog post, we’ll see how you can use other CDP services with COD. COD is an operational database-as-a-service that brings ease of use and flexibility to Apache HBase. Cloudera Operational Database enables developers […]

by Gokul Kamaraj, Liliana Kadar, Krishna Maheshwari 4 min read

Apache Kafka Apache NiFi Apache Phoenix Apache Ranger Apache Spark Cloud Enterprise data cloud Machine Learning Cloudera Data Platform (CDP) Data Engineering DataFlow Machine Learning Operational DB SDX Technologies Ops and DevOps

February 9, 2021 | Technical

Cloudera Operational Database application development concepts

Cloudera Operational Database is now available in three different form-factors in Cloudera Data Platform (CDP). If you are new to Cloudera Operational Database, see this blog post. And, check out the documentation here. In this blog post, we’ll look at Apache HBase and Apache Phoenix concepts relevant to developing applications for Cloudera Operational Database. But […]

by Gokul Kamaraj, Liliana Kadar 6 min read

Apache HBase Apache Kafka Apache NiFi Apache Phoenix Apache Ranger Apache Spark Cloudera Data Platform (CDP) Operational DB SDX Technologies Data Ingestion Data Science Ops and DevOps

January 13, 2021 | Technical

Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 2: Querying/Loading Data

In this installment, we’ll discuss how to do Get/Scan Operations and utilize PySpark SQL. Afterward, we’ll talk about Bulk Operations and then some troubleshooting errors you may come across while trying this yourself. Read the first blog here. Get/Scan Operations Using Catalogs In this example, let’s load the table ‘tblEmployee’ that we made in the […]

by Manas Chakka 5 min read

Apache Spark Machine Learning Cloudera Data Platform (CDP) Cloudera Data Science Workbench Machine Learning Operational DB Machine Learning Modernize Architecture

December 17, 2020 | Business

Enabling The Full ML Lifecycle For Scaling AI Use Cases

When it comes to machine learning (ML) in the enterprise, there are many misconceptions about what it actually takes to effectively employ machine learning models and scale AI use cases. When many businesses start their journey into ML and AI, it’s common to place a lot of energy and focus on the coding and data […]

by Santiago Giraldo 5 min read

Apache Spark Machine Learning Cloudera Data Platform (CDP) Data Engineering Machine Learning SDX Technologies Data Science Machine Learning Modernize Architecture Ops and DevOps
