
Home | OpenLineage
OpenLineage is an open platform for collection and analysis of data lineage. It tracks metadata about datasets, jobs, and runs, giving users the information required to identify the root cause …
Getting Started - OpenLineage
This guide covers how you can quickly get started collecting dataset, job, and run metadata using OpenLineage. We'll show how to collect run-level metadata as OpenLineage events using …
About OpenLineage
OpenLineage is an open framework for data lineage collection and analysis. At its core is an extensible specification that systems can use to interoperate with lineage metadata.
Python - OpenLineage
To try out the client, follow the steps below to install and explore OpenLineage, Marquez (the reference implementation of OpenLineage), and the client itself. Then, the instructions will …
Tracing Data Lineage with OpenLineage and Apache Spark
Nov 5, 2021 · The goal of OpenLineage is to reduce issues and speed up recovery by exposing those hidden dependencies and informing both producers and consumers of data about the …
Resources - OpenLineage
Data lineage is the foundation for a new generation of powerful, context-aware data tools and best practices. OpenLineage enables consistent collection of lineage metadata, creating a deeper …
Using OpenLineage with Spark
The Spark integration from OpenLineage offers users insights into graphs of datasets stored in object stores like S3, GCS, and Azure Blob Storage, as well as BigQuery and relational …
Example Lineage Events | OpenLineage
Dec 28, 2020 · "_schemaURL": "https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/SqlJobFacet", …
Object Model - OpenLineage
OpenLineage was designed to enable large-scale observation of datasets as they move through a complex pipeline. Because of this, it integrates with various tools with the aim of emitting real …
Data Lineage with Snowflake | OpenLineage
Apr 27, 2022 · With OpenLineage’s open standard and extensible backend, users can easily identify the root causes of slow or failing jobs and issues with data quality in their ecosystems …
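
Several entries above describe collecting dataset, job, and run metadata as OpenLineage events, and the "Example Lineage Events" snippet shows part of a SQL job facet. The following is a minimal sketch of what such an event might look like when assembled as plain JSON with the Python standard library. It does not use the OpenLineage Python client. The helper name `build_run_event`, the producer URL, and every namespace and dataset name are hypothetical placeholders. The field layout is an assumption based on the published specification, and some fields of a complete event (such as the top-level schema URL) are omitted.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical producer identifier; a real integration would use its own URL.
PRODUCER = "https://example.com/my-lineage-producer"

# Schema URL copied from the "Example Lineage Events" snippet above.
SQL_FACET_SCHEMA = (
    "https://raw.githubusercontent.com/OpenLineage/OpenLineage/"
    "main/spec/OpenLineage.json#/definitions/SqlJobFacet"
)


def build_run_event(event_type, run_id, job_namespace, job_name, sql, inputs, outputs):
    """Assemble a run event as a plain dict (illustrative sketch, not the client API).

    `inputs` and `outputs` are lists of (namespace, name) tuples.
    """
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {
            "namespace": job_namespace,
            "name": job_name,
            "facets": {
                "sql": {
                    "_producer": PRODUCER,
                    "_schemaURL": SQL_FACET_SCHEMA,
                    "query": sql,
                }
            },
        },
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
        "producer": PRODUCER,
    }


if __name__ == "__main__":
    event = build_run_event(
        event_type="START",
        run_id=str(uuid.uuid4()),
        job_namespace="example-namespace",      # hypothetical
        job_name="example.daily_orders",        # hypothetical
        sql="INSERT INTO orders_daily SELECT * FROM orders",
        inputs=[("example-db", "public.orders")],
        outputs=[("example-db", "public.orders_daily")],
    )
    print(json.dumps(event, indent=2))
```

A run is usually described by more than one event sharing the same `runId`, for example a start event followed by a completion event, so the run ID is generated once by the caller and passed in rather than created inside the helper.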