Learn the details about using Impala alongside Kudu.
Kudu (currently in beta), the new storage layer for the Apache Hadoop ecosystem, is tightly integrated with Impala, allowing you to insert, query, update, and delete data from Kudu tablets using Impala’s SQL syntax, as an alternative to using the Kudu APIs to build a custom Kudu application. In addition, you can use JDBC or ODBC to connect existing or new applications written in any language,
This post from the HUE team about using HUE (the open source web GUI for Apache Hadoop), Apache Spark, and SQL for analytics was initially published in the HUE project’s blog.
Apache Spark is getting popular and HUE contributors are working on making it accessible to even more users. Specifically, by creating a Web interface that allows anyone with a browser to type some Spark code and execute it.
This new core security layer provides a unified data access path for all Hadoop ecosystem components, while improving performance.
We’re thrilled to announce the beta availability of RecordService, a distributed, scalable, data access service for unified access control and enforcement in Apache Hadoop. RecordService is Apache Licensed open source that we intend to transition to the Apache Software Foundation. In this post, we’ll explain the motivation, system architecture,
Learn how to build an Impala table around data that comes from non-Impala, or even non-SQL, sources.
As data pipelines start to include more aspects such as NoSQL or loosely specified schemas, you might encounter situations where you have data files (particularly in Apache Parquet format) where you do not know the precise table definition. This tutorial shows how you can build an Impala table around data that comes from non-Impala or even non-SQL sources,
The super-active Apache Spark community is exerting a strong gravitational pull within the Apache Hadoop ecosystem. I recently had that opportunity to ask Cloudera’s Apache Spark committers (Sean Owen, Imran Rashid [PMC], Sandy Ryza, and Marcelo Vanzin) for their perspectives about how the Spark community has worked and is working together, and the work to be done via the One Platform initiative to make the Spark stack enterprise-ready.
Recently, Apache Spark has become the most currently active project in the Apache Hadoop ecosystem (measured by number of contributors/commits over time),