The Lineage Graph Is Free Now, Right Up to Where It Hurts
Airflow and dbt will draw your pipeline for nothing. The arrow still dies one hop short of the meeting where the number gets used.
In 2023 two things made automated lineage close to a commodity: OpenLineage graduated within the LF AI & Data Foundation in September, and Apache Airflow 2.7 made OpenLineage a built-in provider in August. The pipeline now narrates itself. The arrow still stops at the last hop, which is the only hop a regulator or an executive ever asks about.
OpenLineage's September 2023 graduation confirmed an open standard for collecting lineage as jobs run, with built-in integrations for Airflow, Spark, Flink, dbt, and warehouses. Weeks earlier, Airflow 2.7 had turned OpenLineage from a brittle external plugin into an officially maintained provider that you install and enable through configuration. Capturing how a table was built is no longer the expensive part.
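To make "collecting lineage as jobs run" concrete, here is a minimal sketch of the kind of run event an OpenLineage integration emits when a job finishes. It builds the payload by hand with the Python standard library rather than through the OpenLineage client, and it sends nothing anywhere. The namespaces, job name, table names, and producer URL are invented placeholders, and the spec version in schemaURL is an assumption to check against the release you target.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical namespaces, used only for illustration.
JOB_NAMESPACE = "example-scheduler"
DATASET_NAMESPACE = "example-warehouse"


def build_run_event(job_name, inputs, outputs, event_type="COMPLETE"):
    """Build a dict shaped like an OpenLineage run event.

    Nothing here talks to a real backend; it only shows the shape:
    one run of one job, reading some datasets and writing others.
    """
    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": JOB_NAMESPACE, "name": job_name},
        "inputs": [{"namespace": DATASET_NAMESPACE, "name": n} for n in inputs],
        "outputs": [{"namespace": DATASET_NAMESPACE, "name": n} for n in outputs],
        # Placeholder producer URI; a real integration identifies itself here.
        "producer": "https://example.com/hypothetical-producer",
        # Assumed spec version; verify against the spec you actually use.
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent",
    }


event = build_run_event(
    job_name="daily_revenue",
    inputs=["raw.orders"],
    outputs=["analytics.revenue_daily"],
)
print(json.dumps(event, indent=2))
```

Every edge in the lineage graph comes from an event like this one: a job declaring its inputs and outputs. A dashboard or spreadsheet that never emits such an event never gets an edge.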
This matters because it relocates the hard part without removing it. The graph reliably covers the orchestrated, transform-shaped middle — the part written in code that already emits events. It thins out at the two ends humans actually argue over: the source system that emits nothing, and the dashboard, the exported spreadsheet, the metric in the board deck where the figure is finally believed and acted on.
What the gap reveals is which work the org was willing to automate and which it wasn't. Code-to-code lineage got standardized because engineers wanted it and machines could supply it. The last mile stayed manual because it requires a person to declare that this number on this slide is that column, and to keep saying so as both drift. That is stewardship, and stewardship resists being shipped as a provider.
Watch where the graph ends in your own tool, and whether anyone notices. A lineage map that terminates at the warehouse boundary is honest infrastructure; a lineage map sold as "end-to-end" that quietly terminates there is theater with good production values. The gap is not a missing connector. It is the hop nobody was assigned to walk.
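One way to watch where the graph ends is to export its edges and list the nodes with nothing downstream. The sketch below assumes a hypothetical export of (upstream, downstream) pairs and an invented naming convention in which consumer nodes start with "dashboard:", "report:", or "export:". Neither comes from any particular lineage tool, so adapt both to whatever yours produces.

```python
from collections import defaultdict

# Hypothetical edges (upstream, downstream), as a lineage tool might export them.
EDGES = [
    ("crm.accounts", "raw.accounts"),
    ("raw.accounts", "analytics.revenue_daily"),
    ("raw.orders", "analytics.revenue_daily"),
    ("analytics.revenue_daily", "dashboard:revenue_overview"),
    ("raw.orders", "analytics.churn_scores"),
]

# Assumed convention: nodes with these prefixes are places a number gets read.
CONSUMER_PREFIXES = ("dashboard:", "report:", "export:")


def find_dead_ends(edges, consumer_prefixes=CONSUMER_PREFIXES):
    """Return nodes with no downstream edge that are not known consumers.

    These are the spots where the arrow stops at the warehouse boundary.
    """
    has_downstream = set()
    nodes = set()
    for upstream, downstream in edges:
        has_downstream.add(upstream)
        nodes.update((upstream, downstream))
    return sorted(
        n for n in nodes
        if n not in has_downstream and not n.startswith(consumer_prefixes)
    )


def find_silent_sources(edges):
    """Return nodes with no upstream edge, i.e. where the graph begins."""
    upstreams = defaultdict(set)
    nodes = set()
    for upstream, downstream in edges:
        upstreams[downstream].add(upstream)
        nodes.update((upstream, downstream))
    return sorted(n for n in nodes if not upstreams.get(n))


print("Ends without a consumer:", find_dead_ends(EDGES))
print("Starts without an origin:", find_silent_sources(EDGES))
```

In this toy graph the churn table is the dead end: it is built, and nothing recorded says who reads it. The list does not say whether that table is unused or feeds a spreadsheet nobody registered. Answering that is the hop someone has to be assigned to walk.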
Automated lineage covers the part machines can see; trust lives in the last hop they can't.
OpenLineage graduated within the LF AI & Data Foundation in September 2023 as an open standard with built-in integrations for Airflow, Spark, Flink, and dbt.
Apache Airflow 2.7 (August 2023) made OpenLineage a built-in provider rather than a separately maintained plugin.
Standardized, automated lineage centers on the code-driven transform layer, leaving source-system and downstream BI/last-mile coverage comparatively thin.