249·10 · Definition DriftNo. 249 · 29 May 2026 · 2 min

Your Pipeline Learned to Call Yesterday's Breakage Normal

Anomaly detection now defines 'good' for you. It defines it as 'whatever usually happens.'

EvidenceThe EditorThe Controlled Term

Databricks made data quality monitoring generally available in June 2024 with a feature that needs no manual configuration: it watches every table's freshness and completeness by learning what each table normally does.

Per the product documentation, the system 'analyzes the history of commits to a table and builds a per-table model to predict the time of the next commit'; if a commit is unusually late, the table is marked stale. Completeness works the same way — it predicts a range of expected row counts from history and flags volumes below the lower bound. Databricks 'automatically scans each table at the same frequency it's updated,' with no per-table setup. The baseline for 'good' is the table's own past behavior.

This is enormously useful and quietly load-bearing. Statistical baselining catches the breaks no one wrote a rule for, which is most of them. But 'anomalous' and 'wrong' are different words. A table that has shipped duplicate rows every morning for a month has taught the model that duplicates are Tuesday. The monitor will defend the steady state, including the parts of the steady state that are already broken, because nobody supplied an external definition of correct for it to compare against.

It reveals the field's central substitution: we adopted observability faster than we adopted definitions. Monte Carlo's 2023 survey of 200 data professionals found respondents averaging 67 incidents a month, detection taking four-plus hours, resolution averaging fifteen, and — the quiet part — 74% saying business stakeholders spot the issue first, up from 47% the year before. A learned baseline tells you a number moved. It cannot tell you the number was wrong, because no one ever told it what right was.

Watch for the gap between a freshness alert and a freshness SLA. The first says the data changed later than usual; the second says someone promised consumers a deadline and named who answers when it slips. Anomaly detection is a fine smoke alarm and a poor fire code. Use it to catch the unruly, but keep the human-authored thresholds — the contract, the owner, the definition of 'good' — for the things you actually promised. A model that learns normal will eventually ratify whatever you tolerate.

The takeaway

Automated anomaly detection answers 'did this change?', never 'is this correct?' A model that learns normal will eventually ratify whatever you tolerate — keep the human-authored definition of good for the things you actually promised.

The claim, mapped

Databricks anomaly detection builds a per-table model from commit history to predict the next commit and marks a table stale if a commit is unusually late, with no manual configuration.
supports01
Databricks announced General Availability of Data Quality Monitoring at Data + AI Summit in June 2024.
supports02
In Monte Carlo's 2023 survey of 200 data professionals, 74% said business stakeholders identify data issues first all or most of the time, up from 47% the prior year, amid an average of 67 incidents per month.
context03
Completeness monitoring predicts an expected row-count range from history and flags volumes below the lower bound, defining normal from the table's own past.
supports01

Sources

Databricks Documentation — Anomaly detection — Data quality monitoring (Unity Catalog)2025-07-21 · Tier 1 · primaryAnalyzes the history of commits to a table and builds a per-table model to predict the time of the next commit. If a commit is unusually late, the table is marked as stale.

↗

Databricks Blog — What's New with Databricks Unity Catalog at Data + AI Summit 20242024-06-13 · Tier 1 · vendorWe are also excited to announce the General Availability of Databricks Data Quality Monitoring, profiling and enforcing quality directly in the platform.

↗

Monte Carlo / Wakefield Research — The State of Data Quality Survey2023-05-02 · Tier 2 · analyst74% reported business stakeholders identify issues first, all or most of the time, up from 47% in 2022; average of 67 monthly incidents.

↗

Mark this entry

Marginalia · 0 notes

No notes yet. The margin is open.

Related entries

Definition Drift

Everyone Agreed on the YAML. Nobody Agreed on the Owner.

A real standard for data contracts now exists. The argument it was supposed to settle has simply moved up a layer.

Process Debt

Data observability raised a fortune to watch the number. Defining the number raised nothing.

You can monitor a metric to the second and still not know what it counts.

Definition Drift

The data contract grew up. It stopped being a schema and started being a governance control.

An engineering convenience became a control point. A control point needs a controller.