4 responses on “New in Cloudera Enterprise: Interactive Data Lineage Exploration”

  1. Ruslan

    That’s really cool.
    I noticed there is no Spark in the list “Apache Hive, Apache Pig, and Apache Impala (incubating)”.
    Do you guys plan to add Spark column-level data lineage too? We use Spark in our production jobs extensively.

    1. Mark Donsky

      Thanks for the feedback, Ruslan. We’re hard at work on Spark column-level lineage right now — stay tuned and you’ll see it in an upcoming release! Also keep in mind that we already have several APIs, including the Navigator SDK at https://github.com/cloudera/navigator-sdk, that allow you to add column-level lineage with compute engines, such as MR, when it’s not possible to automatically collect column-level lineage.

  2. irfan aziz

    This is quite informative. However, I could not find any article on collecting, using, and configuring the custom metadata used by the CDH Navigator. I am new to CDH, and I want to set up a standard template describing the information to be added as custom metadata (business metadata) for each source coming into CDH.