Learn about the new functionality coming aboard Cloudera Navigator, the trail-blazing solution for metadata management and lineage in Apache Hadoop.
More than two years ago, Cloudera introduced Cloudera Navigator 1.0, which was the first offering to unify auditing across enterprise Apache Hadoop deployments. About a year later, Cloudera released Cloudera Navigator 2.0, which introduced another first for Hadoop: comprehensive metadata management and lineage. Today, more than 200 customers across numerous industries use Cloudera Navigator in production to deliver trust and visibility to their Hadoop deployments.
Today we are announcing exciting news for Cloudera Navigator: Cloudera Navigator has joined the Cloudera Accelerator Program, a partner program designed to expedite the development and certification of partner applications. We have enlisted many of our leading data management and governance partners into this program, with even more partners to follow. This collaboration between leading data management and governance providers will ensure seamless interoperability and provide a unified foundation for governance and data management that extends beyond Hadoop and across the entire enterprise.
In previous posts, we’ve discussed the unique challenges of governing Hadoop and how Cloudera Navigator addresses these challenges head-on. We also looked at how some of our partners, such as Informatica and CapTech Consulting, have used Cloudera Navigator to help deliver enterprise-wide governance solutions. In this post, I’ll recap some of the most important Cloudera Navigator capabilities that have shipped so far, and reveal what’s in store in the near future, too. (As with all forward-looking statements about roadmaps, keep in mind that these plans are always subject to change.)
Report Card for Cloudera Navigator 2.0, 2.1, 2.2, and 2.3
Delivered in 2.0
The 2.0 release focused on comprehensive, turnkey governance and compliance for Hadoop. In this release, we shipped:
- Comprehensive, automatically collected metadata and lineage for Apache Hive, HDFS, MapReduce, MapReduce 2, Apache Pig, Apache Oozie, and Apache Sqoop, including table-, file-, and column-level lineage.
- Flexible custom metadata tagging, named value pairs, and definitions for files, tables, columns, and operations.
Delivered in 2.1
The 2.1 release expanded Cloudera Navigator’s governance and compliance capabilities.
- User roles: we made it easy to let data stewards, curators, and business analysts start using Cloudera Navigator. These user groups might need to tag or search metadata, but likely should not get access to audit history.
- Integrated auditing interface: we enhanced the auditing user interface and moved it from Cloudera Manager over to Cloudera Navigator, thereby unifying all of Cloudera Navigator’s governance capabilities into a single, integrated interface.
- Expanded coverage: comprehensive auditing support for Apache Sentry (incubating).
Delivered in 2.2
The 2.2 release significantly expanded Cloudera Navigator’s feature set to include important capabilities for data stewardship and curation.
- Data policy and lifecycle management: built on top of its rich, unified metadata foundation, Cloudera Navigator 2.2’s policy engine lets you automate crucial data stewardship and curation activities, such as metadata classification, data archiving, and retention, or even invoke partner products for additional data preparation and transformation.
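To make the idea of a metadata-driven policy more concrete, here is a minimal Python sketch. It is not Cloudera Navigator's policy engine or API: the `Entity` and `Policy` classes, the `/data/pii/` path, the one-year cutoff, and the tag names are all hypothetical, chosen only to illustrate how a policy might pair a metadata condition with an action such as classification or archiving.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Callable, List, Set


@dataclass
class Entity:
    """Hypothetical stand-in for a metadata record about a file or table."""
    path: str
    last_accessed: date
    tags: Set[str] = field(default_factory=set)


@dataclass
class Policy:
    """Pairs a metadata condition with an action to run on each match."""
    name: str
    matches: Callable[[Entity], bool]
    action: Callable[[Entity], None]


def run_policies(entities: List[Entity], policies: List[Policy]) -> None:
    """Apply every policy, in order, to every entity it matches."""
    for policy in policies:
        for entity in entities:
            if policy.matches(entity):
                policy.action(entity)


def make_example_policies(today: date) -> List[Policy]:
    """Build two illustrative policies: one classifies, one flags for archiving."""
    cutoff = today - timedelta(days=365)
    return [
        # Classification: tag anything under a (hypothetical) PII directory.
        Policy(
            name="classify-pii",
            matches=lambda e: e.path.startswith("/data/pii/"),
            action=lambda e: e.tags.add("sensitive"),
        ),
        # Archiving: flag data not accessed in the past year. A real system
        # would move or compress the data rather than only tag it.
        Policy(
            name="archive-stale",
            matches=lambda e: e.last_accessed < cutoff,
            action=lambda e: e.tags.add("archive-candidate"),
        ),
    ]
```

Calling `run_policies(entities, make_example_policies(date.today()))` would tag each matching entity in place. In a real deployment the action would more likely trigger a job or call out to a partner product than edit a tag set.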
Delivered in 2.3
The 2.3 release introduced self-service discovery capabilities for data scientists and business analysts.
- Self-service discovery: we introduced a brand-new point-and-click search interface for Cloudera Navigator’s metadata that enables end users to find, trust, and analyze new data sets.
- Expanded component coverage: we vastly expanded coverage across the platform, adding Impala lineage, Cloudera Search auditing, and full support for Apache Avro and Apache Parquet schemas.
Coming Later in 2015
Here’s what you can expect to see before the end of the calendar year:
- Expanded lifecycle management: built-in policy actions for retention, encryption, and more
- Expanded Partner SDK: a Java SDK that will complement our REST APIs with lineage and metadata augmentation APIs
- Expanded coverage: auditing, lineage, and metadata support for Hive on Spark
Coming in 2016
- Cloudera Navigator Optimizer integration: we have some exciting plans to leverage technology from our acquisition of Xplain for active data optimization.
- Fine-grained access control: you’ll be able to fine-tune and curate the data sets that users can discover through Cloudera Navigator based on their role within your organization.
- Even deeper integration with Sentry: you’ll be able to set Sentry authorization policies based on Cloudera Navigator’s custom metadata. For example, you could grant specific user groups access to data that is tagged as “sensitive.”
- Expanded coverage: you can expect that we’ll continue to lead the market in breadth and depth of component coverage, including enhanced Apache Spark and Apache Kafka support, as well as unique offerings for cloud-based Hadoop deployments.
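As a rough illustration of the tag-based authorization idea in the list above, the following Python sketch checks access against metadata tags. It does not use Sentry's or Cloudera Navigator's actual APIs, and since the feature is still on the roadmap, its real semantics may differ. The `TAG_GRANTS` table, the group names, and the `can_read` function are hypothetical.

```python
from typing import Dict, Set

# Hypothetical rules: metadata tag -> user groups allowed to read data
# carrying that tag.
TAG_GRANTS: Dict[str, Set[str]] = {
    "sensitive": {"compliance", "fraud-analysts"},
}


def can_read(user_groups: Set[str], data_tags: Set[str],
             grants: Dict[str, Set[str]] = TAG_GRANTS) -> bool:
    """Return True if the user may read data carrying the given tags.

    Every restricted tag on the data must be granted to at least one of the
    user's groups. Tags with no rule are treated as unrestricted, which is
    an assumption of this sketch.
    """
    for tag in data_tags:
        allowed = grants.get(tag)
        if allowed is not None and not (allowed & user_groups):
            return False
    return True
```

With these rules, a user in the `compliance` group could read data tagged “sensitive,” while a user who belongs only to a group with no grant for that tag could not.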
We’re proud that Cloudera Navigator has become the de facto standard for governing and managing data in Hadoop. No other Hadoop distribution matches Cloudera’s breadth and depth of governance capabilities, from vast component coverage to detailed column-level lineage and auditing. In the coming years, we’ll continue to lead the market in these areas with unique and compelling offerings.
Let us know what you think about Cloudera Navigator—either in the comments below, or on our brand-new community forum for Cloudera Navigator.
Mark Donsky leads data management solutions at Cloudera. Prior to Cloudera, Mark was at Silver Spring Networks, where he managed big data analytics solutions that reduced greenhouse gas emissions by millions of dollars annually.