Lineage is mandatory for the audit and partial in practice.
A graph that stops at the warehouse door explains everything except where the number came from.
Every regulator wants the same thing: trace the number in the report back to where it began. Most lineage diagrams answer a narrower question, which is where the number was last seen.
What happened is that lineage matured as a standard faster than it matured as coverage. OpenLineage is an open framework for collecting lineage metadata across pipelines, with Marquez as its reference implementation under the Linux Foundation's LF AI & Data. The tooling to record provenance is real, governed, and adopted. The question is how far up the chain the recording actually reaches.
It matters because the audit need is specific. Column-level lineage tracks how an individual field was computed and transformed, not merely that one table fed another. Table-level lineage tells an auditor that customer data reached a report; column-level tells them which field, through which transformation, produced the figure under review. The gap between those two sentences is the entire point of the exercise.
What it reveals about the field is a familiar boundary problem. The Basel Committee's risk-data principles, in force since 2013, require banks to aggregate risk data fully and accurately, and a 2023 progress report found additional work required at all assessed banks to attain or sustain full compliance. Lineage that begins where data lands in the warehouse quietly excludes the spreadsheets, hand-edits, and source systems where the number was actually shaped.
What to watch is where the graph terminates. A lineage program that stops at the warehouse door is documenting custody, not origin. The reveal: lineage explains everything except the one thing the audit asked, which is where the number came from and who is accountable for the transformation that made it.
Lineage that ends at the warehouse documents custody, not origin. Audit accountability lives in the transformations it skips.
OpenLineage is an open standard for lineage metadata collection, with Marquez as its reference implementation under LF AI & Data.
supports01Column-level lineage traces how a specific field was computed and transformed, where table-level lineage only shows that tables are connected.
supports02Regulators require banks to trace and aggregate risk data accurately, and lineage provides the audit trail used to verify it.
Nearly a decade after the principles took effect, a 2023 review found additional work required at all assessed banks to reach or sustain full compliance.
context03
No notes yet. The margin is open.
Sign in to add a note. The margin is moderated — we keep it useful, not cruel.
The obligation assumes an inventory the organization skipped. The inventory is the project.
Owner MissingThe Generative AI Profile treats provenance as a control — but admits most builders cannot say what they trained on.
Owner MissingThe catalog filled with products. The org chart did not move an inch.