Vol. I · No. 251
Mon · 8 Jun
A Daily Lexicon of Trustworthy Data

The Lexicon

429 terms of data management, each led by the meeting-safe punchline. The formal definition still appears with the little glasses on.

005·8

access controls

A least-privilege model in which forty-one people have privilege, because requesting removal is harder than requesting access.

The mechanisms governing who may read, write, or export a given dataset, ideally on a least-privilege basis.
Privacy & Compliance
011·12

access recertification

The quarterly ritual of approving every entry on a list too long to read, thereby renewing access nobody remembers granting.

The periodic review in which managers confirm that each person's system access still matches their current role.
Governance
658·40

accountability

A quality every initiative is said to require and every org chart is careful never to locate.

The obligation of a single named party to answer for a decision and its consequences; not delegable, unlike the work itself.
Governance
005·76

accuracy

Agreement with the source system, which everyone has quietly agreed to call reality.

The degree to which a recorded value correctly describes the real-world entity or event it represents.
Quality
021·41

acting data steward

The only person who knows what the field means, formally prevented from being the one who knows what the field means.

A steward appointed on a temporary basis to maintain definitions and quality for a domain pending a formal assignment.
Roles & Theater
006·31

actionable insight

Term struck pending the definitions of action, owner, and decision right it claims to carry; absent those, the insight is observed but not actionable.
Analytics
006·31

ad hoc analysis

A one-off query that becomes a weekly dependency the moment someone forwards the screenshot.

A one-off investigation answering a specific question outside the standing reporting set.
Analytics
005·14

additive measure

A number the dashboard sums across every dimension, including the two where summing it means nothing.

A fact that can be summed meaningfully across every dimension, as opposed to semi-additive or non-additive measures with restricted aggregation.
Modeling
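A minimal sketch of the difference, using hypothetical daily account snapshots. Deposits are additive; balance is semi-additive, meaning it can be summed across accounts but not across days. The row layout and function names are illustrative assumptions, not any tool's API.

```python
# Hypothetical daily snapshot rows: (date, account, balance, deposits_that_day)
rows = [
    ("2024-01-01", "A", 100, 100),
    ("2024-01-01", "B", 50, 50),
    ("2024-01-02", "A", 120, 20),
    ("2024-01-02", "B", 50, 0),
]


def total_deposits(rows):
    """Deposits are additive: summing across date and account is meaningful."""
    return sum(deposits for _, _, _, deposits in rows)


def closing_balance(rows):
    """Balance is semi-additive: sum across accounts, but across time use
    each account's latest snapshot instead of adding the days together."""
    latest = {}
    for _, account, balance, _ in sorted(rows):  # sorted by date, so later days overwrite
        latest[account] = balance
    return sum(latest.values())


naive_balance = sum(balance for _, _, balance, _ in rows)  # adds the days together
```

Summing the balance column across both days gives 320, counting each account twice; taking each account's latest snapshot gives the 170 actually held.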
006·343

agentic systems

Software empowered to act on decisions no human had been assigned to make.

Systems in which a model plans and executes multi-step actions across tools with limited human intervention.
AI / ML

agentic workflow

A workflow that gained autonomy before the approval matrix did.

A process in which an AI system plans and performs a sequence of tasks with limited human direction.
AI / ML
006·31

ai bill of materials

The parts list everyone agrees is essential and assembles the week after the audit is scheduled.

An inventory of the models, datasets, and dependencies that compose an AI system, kept for provenance and risk.
Governance
006·30

ai center of excellence

The committee formed to govern the three models the other teams already shipped without it.

A central team chartered to set AI standards, build reusable capability, and advise teams on adoption.
Roles & Theater
006·33

ai gateway

A toll booth on the model traffic that three teams are already routing around.

A control point that routes, logs, and rate-limits requests to language models across an organization.
Architecture
006·314

ai governance

Term struck pending the decision rights, review cadence, and accountability for training data and outcomes it presumes someone already assigned.
AI / ML
006·313

ai maturity

Term struck: cannot be measured until the field defines the quality, ownership, and monitoring the score is meant to rate.
AI / ML
005·75

ai pipeline strategy

Term struck pending the definitions of ingestion, ownership, idempotency, lineage, and a data freshness SLA it depends on.
Pipelines
006·315

ai policy

A document that prohibits the thing three teams shipped last quarter, dated next quarter.

A written standard governing acceptable model uses, data handling, disclosure, and human review.
AI / ML
006·31

ai readiness

A green slide presented before anyone has profiled the data the slide describes.

The degree to which an organization's data is profiled, documented, owned, and fit for a stated model use.
AI / ML
342·085

ai readiness of our data

Struck: the possessive is the only part established; classification, retention, lawful basis, and ownership are not.
Privacy & Compliance
006·30

ai strategy

Term struck pending the definitions of quality, ownership, lineage, and decision rights it depends on.
AI / ML
006·31

ai-readiness assessment

Struck. It rates the ownership, quality, lineage, and decision rights it quietly presumes already exist.
Roles & Theater
006·31

ai-ready data model

Term struck pending the definitions of grain, ownership, conformed dimensions, and metric meaning it claims to already possess.
Modeling
005·75

alert fatigue

A channel so loud it has achieved the silence it was built to prevent.

The diminished responsiveness that results when a pipeline emits so many low-value alerts that operators stop distinguishing the actionable ones.
Pipelines
658·40

alignment

The state of having agreed on the word 'alignment.'

A shared agreement among stakeholders on a definition, priority, or decision, documented so it can be referenced later.
Governance
658·82

analyst relations

The budget that moves the dot on the quadrant without moving the product.

The function that manages a company's standing with the industry research firms whose ratings influence buyers.
Roles & Theater
005·76

anomaly detection

A system that reliably notices the unusual and forwards it to an inbox that treats everything as usual.

The automated identification of data points or patterns that deviate materially from an established expectation or historical baseline.
Quality
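A minimal sketch of one common baseline, assuming "expectation" means recent history and the test is a z-score. Real detectors often model seasonality and trend, which this omits; the threshold of 3 and the sample row counts are illustrative.

```python
import statistics


def is_anomalous(history, value, threshold=3.0):
    """Flag `value` if it lies more than `threshold` standard deviations
    from the mean of `history` (needs at least two historical points)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation
    if stdev == 0:
        return value != mean  # flat history: any change is a deviation
    return abs(value - mean) / stdev > threshold


daily_row_counts = [1000, 1020, 990, 1010, 1005, 995, 1015]
```

Detection is the easy half; the punchline's inbox is where the alert still needs an owner.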
342·0853

anonymization

A claim made about a dataset roughly until the first person re-identifies someone in it for a conference talk.

Irreversibly altering data so that no individual can be identified, removing it from the scope of privacy law.
Privacy & Compliance
011·09

approved use

The use described in the request form, as distinct from the use, which began the following Tuesday.

The set of purposes for which a dataset may lawfully and contractually be used, recorded at the point of access.
Governance
657·45

audit finding

The annual rediscovery of last year's finding, now with a new tracking number.

A documented gap between a control's stated design and its observed operation, raised by an independent reviewer.
Governance
657·45

audit trail

A complete and immutable record of every action, retained for exactly thirty days, then quietly rotated out.

A tamper-evident chronological record of who accessed or changed data, and when.
Privacy & Compliance
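One common way to make a trail tamper-evident is hash chaining: each entry stores a SHA-256 hash of its own fields plus the previous entry's hash. The sketch below assumes a hypothetical event shape. It detects edits but does not stop someone from rewriting or truncating the whole chain, so a real deployment would also need to store the latest hash somewhere the writer cannot alter.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    # Canonical JSON (sorted keys) so the same fields always hash the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_event(trail, actor, action, target, at):
    """Append an entry whose hash covers its fields and the previous entry's hash."""
    prev = trail[-1]["hash"] if trail else GENESIS
    body = {"actor": actor, "action": action, "target": target, "at": at, "prev": prev}
    trail.append({**body, "hash": _digest(body)})


def verify(trail):
    """Recompute the chain; an edited entry, or a gap or reordering in the middle, breaks it."""
    prev = GENESIS
    for entry in trail:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```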
005·75

backfill

Re-running history to match the present, and discovering history had a different opinion about what 'customer' meant.

Reprocessing historical data through a pipeline to populate or correct records for periods before a job existed or after it failed.
Pipelines
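A minimal sketch of an idempotent backfill, assuming a table partitioned by day (modeled here as a dict) and a hypothetical `compute_partition` function standing in for the pipeline's logic. Overwriting each partition rather than appending means a failed or repeated run can simply be started again.

```python
from datetime import date, timedelta


def daterange(start, end):
    """Yield each date from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def backfill(target, compute_partition, start, end):
    """Recompute each daily partition and overwrite it in place, so running
    the backfill twice leaves the same result as running it once."""
    for day in daterange(start, end):
        target[day] = compute_partition(day)
    return target
```

Idempotence does not settle the punchline's problem: if `compute_partition` encodes today's meaning of "customer", history is rewritten in today's terms.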
005·75

batch processing

Running it overnight, so the data is ready by morning and wrong by lunch.

Processing data in bounded, scheduled groups rather than continuously, typically optimizing throughput over latency.
Pipelines
658·40

benchmark

The industry average chosen because the company was already above it.

A reference value, often external, against which a metric is compared to judge performance.
Analytics
006·326

benchmark saturation

Everyone got an A, so the test resigned.

The point at which models score near-perfectly on a benchmark, so it no longer distinguishes their real capability.
AI / ML
658·40

best practice

Someone else's context, repackaged as a rule and applied without the conditions that made it work for them.

A method shown to produce good results in comparable settings, adopted to avoid relearning what is already known.
Roles & Theater
005·52

bounded context

The fence inside which 'customer' means one thing, drawn precisely so the fight happens at the gate.

A boundary within which a particular domain model and its terms are defined and consistent, borrowed from domain-driven design.
Architecture
342·0858

breach notification

The seventy-two-hour clock that starts the moment someone decides whether it counts as a breach.

The obligation to inform regulators and affected individuals of a personal-data breach within a defined window.
Privacy & Compliance
342·0858

breach response plan

A binder, last opened during the tabletop exercise, that names a coordinator who left in the spring.

The documented procedure assigning roles, decisions, and timelines for handling a data breach.
Privacy & Compliance
005·76

break

A difference between two numbers, resolved by writing down why it is fine and changing neither.

In reconciliation, a discrepancy between two datasets that should agree; each break must be identified, explained, and resolved.
Quality
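A minimal reconciliation sketch, assuming both datasets can be reduced to {key: amount}; the invoice keys and the `tolerance` parameter are illustrative. It reports all three kinds of break: missing on the left, missing on the right, and present on both sides but disagreeing.

```python
def find_breaks(left, right, tolerance=0.0):
    """Compare two {key: amount} mappings that should agree. Return
    (key, left_value, right_value) for each key missing from one side
    or whose amounts differ by more than `tolerance`."""
    breaks = []
    for key in sorted(left.keys() | right.keys()):
        left_val, right_val = left.get(key), right.get(key)
        if left_val is None or right_val is None or abs(left_val - right_val) > tolerance:
            breaks.append((key, left_val, right_val))
    return breaks


ledger = {"INV-1": 100.0, "INV-2": 250.0, "INV-3": 75.0}
warehouse = {"INV-1": 100.0, "INV-2": 249.5, "INV-4": 10.0}
```

The code only identifies breaks; explaining and resolving them remains human work.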
005·8

break-glass access

An emergency-only door, propped open since the incident in March, with the review still pending.

An emergency mechanism granting exceptional access in a crisis, designed to be heavily logged and reviewed afterward.
Privacy & Compliance
048·43

bridge table

The table built to connect two things that were promised never to relate that way.

An intermediary table resolving a many-to-many relationship between a fact and a dimension, often carrying an allocation weight.
Modeling
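A minimal sketch using a hypothetical joint-account scenario: the bridge lists which customers share each account and an allocation weight per pair. Multiplying by the weight keeps customer totals summing to the fact total; a plain join without weights would count ACC-1's 1000.0 twice.

```python
# Hypothetical fact (balance per account) and bridge (account, customer, weight).
fact_balance = {"ACC-1": 1000.0, "ACC-2": 400.0}
bridge = [
    ("ACC-1", "alice", 0.5),
    ("ACC-1", "bob", 0.5),
    ("ACC-2", "alice", 1.0),
]


def balance_by_customer(fact, bridge):
    """Resolve the many-to-many relationship through the bridge, applying
    the allocation weight to each fact row a customer shares."""
    totals = {}
    for account, customer, weight in bridge:
        totals[customer] = totals.get(customer, 0.0) + fact[account] * weight
    return totals
```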
005·75

brittle job

A job documented entirely in the muscle memory of the one person who knows which step to re-run first.

A pipeline task that fails on minor, expected variation in its inputs or environment, requiring manual intervention to recover.
Pipelines
005·19

bronze layer

The pile, renamed to suggest a podium finish.

The raw landing zone of a medallion design, holding source data as ingested with minimal transformation.
Architecture
005·50

brownfield architecture

Where the architecture meets the people who depend on it and loses.

A design that must integrate with and evolve existing legacy systems, accommodating their constraints, data, and dependencies.
Architecture
658·31

bus factor

Usually one, usually unaware, usually about to take leave.

The number of people who would have to be lost before a system or definition became unmaintainable; a lower number is a higher risk.
Roles & Theater
025·49

business glossary

A glossary in which the term 'owner' is the only entry without one.

The authoritative, owned register of agreed business term definitions and their relationships to data.
Governance
006·31

business intelligence

Two words that survived the rebrand to analytics, the rebrand to data science, and the rebrand back.

The practice and tooling that turn operational data into reports, dashboards, and analysis for decisions.
Analytics
005·14

business key

The identifier the business uses in conversation, which is to say a different one in each conversation.

The natural identifier the business recognizes for an entity, carried alongside the surrogate to preserve traceability to the source.
Modeling
005·75

business logic drift

The slow distance between what the query computes and what anyone still means by it, measured in pull requests no one reviewed.

The gradual divergence between the rules encoded in a transformation and the current definitions the business actually uses, accumulating as logic is copied without review.
Pipelines
005·14

canonical data model

The one true model every system maps to, plus the per-system exceptions that quietly restore the chaos it replaced.

A single agreed representation of a business concept that systems translate to and from, so integrations share one definition rather than negotiating pairwise.
Modeling
658·40

center of excellence

Three people, a shared inbox, and a name chosen before the headcount was approved.

A central team that sets practices, builds reusable capability, and advises domains on a governance discipline.
Governance
005·34

centralization

What the last reorg was moving away from and the next one is named after.

Consolidating data ownership, modeling, and decision authority into a single team or platform for consistency and control.
Architecture
005·74

certified report

A report stamped trustworthy by whoever was last willing to put their name on it.

A report formally reviewed and endorsed as accurate, with a named owner accountable for its definitions.
Analytics
005·75

change data capture

Capturing every change the source makes, including the ones it would prefer you had not noticed.

A technique that detects and propagates row-level changes from a source database, usually by reading its transaction log, rather than re-copying full tables.
Pipelines
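A minimal sketch of the consuming side, assuming change events have already been read from the source's log and normalized to a hypothetical (op, key, row) shape; real CDC tools emit richer, tool-specific formats. Events must be applied in log order, since an update followed by a delete means something different from the reverse.

```python
def apply_changes(table, events):
    """Apply an ordered stream of row-level change events to `table`,
    a dict keyed by primary key. Each event is (op, key, row) with op
    one of 'insert', 'update', or 'delete'."""
    for op, key, row in events:
        if op in ("insert", "update"):
            table[key] = row  # upsert: the latest change for a key wins
        elif op == "delete":
            table.pop(key, None)  # tolerate deletes for keys never loaded
        else:
            raise ValueError(f"unknown op: {op}")
    return table
```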
658·40

change management

The workstream that explains the new tool to the people who will still lack the authority the old tool was missing.

The discipline of helping people adopt a new way of working: communication, training, and support that make a change stick.
Roles & Theater
658·40

chief data officer

An accountability spread across an average tenure shorter than a remediation plan.

The executive accountable for an organization's data strategy, governance, quality, and value realization.
Governance
006·333

chunking

Cutting the policy in half exactly where the exception lived.

Splitting source documents into passages sized for retrieval and model context windows.
AI / ML
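A minimal sketch of fixed-size chunking with overlap. It counts whitespace-separated words as a stand-in for tokens, which is an assumption; production splitters usually count model tokens or follow document structure such as headings. The overlap means text near a boundary lands in two adjacent chunks instead of being cut cleanly in half.

```python
def chunk_words(text, size=200, overlap=40):
    """Split text into passages of at most `size` words, each sharing
    `overlap` words with the previous passage."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

Overlap reduces the punchline's risk but does not remove it: an exception longer than the overlap can still be split.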
658·45

circle back

To place a decision in a holding pattern phrased as motion. The circle is reliable; the back is theoretical.

To return to an item later, after gathering the input or authority a decision requires.
Roles & Theater
006·31

citizen analyst

The role created so the analytics team could decline the request and still call it enablement.

A business user who builds reports and analyses without a formal analytics or data role.
Analytics
005·76

completeness

The percentage of fields that are populated, of which a smaller, unmeasured percentage is populated correctly.

The extent to which all required values are present in a dataset, typically measured as the proportion of non-null entries against an expected population.
Quality
022·02

completeness check

A test confirming the rows that arrived are all there, while remaining serenely unaware of the rows that did not.

A test verifying that all expected records and required fields are present for a given load or period.
Quality
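The punchline's point, in code: a check can find missing rows only if it compares against an independent expectation. This minimal sketch assumes a hypothetical list of expected IDs (for example, taken from the source system's own record count or manifest) and a list of required fields.

```python
def completeness_report(expected_ids, rows, required_fields):
    """Compare a load against what should have arrived, and flag arrived
    rows whose required fields are null or empty."""
    arrived = {row["id"] for row in rows}
    missing_rows = sorted(set(expected_ids) - arrived)
    incomplete_rows = sorted(
        row["id"]
        for row in rows
        if any(row.get(field) in (None, "") for field in required_fields)
    )
    return {"missing_rows": missing_rows, "incomplete_rows": incomplete_rows}
```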
657·45

compliance attestation

A signed affirmation that the controls work, dated the same week three of them were turned off for the migration.

A formal statement, often annual, in which an organization affirms it meets a defined set of controls.
Privacy & Compliance
657·45

compliance gate

A checkbox that has never once been left unchecked.

A mandatory checkpoint in a delivery process at which work must demonstrate conformance before proceeding.
Governance
342·0851

compliant by default

Struck: 'by default' names no classification, no access control, and no retention to be compliant against.
Privacy & Compliance
005·46

composable data stack

Some assembly required, sold as a philosophy.

An architecture assembled from interchangeable, best-of-breed components connected through open standards rather than a single vendor suite.
Architecture
006·322

concept drift

The thing being measured moved; the dashboard, helpfully, stayed green.

A shift in the underlying relationship a model predicts, so the same input now implies a different outcome.
AI / ML
005·14

conformed dimension

A dimension every department agreed to share, then quietly forked the moment the meeting ended.

A dimension shared with identical structure and meaning across multiple fact tables, enabling consistent reporting across business processes.
Modeling
005·75

connector

A component marked 'maintained' until the source changes its API, after which it is marked 'a meeting'.

A reusable component that handles authentication, extraction, and protocol details for a specific source or destination system.
Pipelines
342·082

consent

A freely given, specific, informed agreement obtained through a banner engineered so that 'Accept All' is the only restful option.

A freely given, specific, informed, and unambiguous indication of a data subject's agreement to processing.
Privacy & Compliance
005·76

consistency

The state of two systems agreeing, usually because no one has joined them yet.

The degree to which the same fact holds the same value across systems, records, and points in time without contradiction.
Quality
006·35

context engineering

Requirements-gathering, performed live, in a text box, by whoever is holding the laptop.

The practice of assembling the instructions, examples, and retrieved facts a model needs for a task within its context window.
AI / ML
006·334

context window

A budget the organization fills with everything except the one document that decided the matter.

The maximum span of tokens a model can attend to in a single request, bounding how much it can consider at once.
AI / ML
011·07

control attestation

A signature confirming that someone read the sentence describing the control, which is legally distinct from the control existing.

A signed assertion by an accountable owner that a stated control operated as designed over a defined period.
Governance
025·49

controlled vocabulary

The aspiration; the practice is mostly the uncontrolled kind, which is why this exists.

A curated, governed set of approved terms that constrains how concepts are named and used to keep meaning stable.
Governance
006·344

copilot

A second opinion on data the org never gave a first opinion on.

An assistant embedded in a workflow that suggests or drafts actions for a human to accept or edit.
AI / ML
005·75

cron job

A line of asterisks on a server in the closet, quietly running the business since a date no one present can confirm.

A task scheduled to run automatically at fixed times or intervals, defined by a cron expression in a host's crontab.
Pipelines
343·099

cross-border transfer

A transfer mechanism on file, and a logging endpoint in a region nobody filed.

The movement of personal data across jurisdictions, lawful only under an approved transfer mechanism.
Privacy & Compliance
005·75

dag

Acyclic in the diagram. The on-call rotation reports a cycle.

Directed acyclic graph: a representation of pipeline tasks and their dependencies in which no task can depend on itself, directly or transitively.
Pipelines
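A minimal sketch of how an orchestrator could derive a run order, using Kahn's algorithm on a hypothetical {task: [upstream tasks]} mapping. If tasks remain unscheduled once nothing is ready, they sit on a cycle, and the function raises rather than returning a partial order.

```python
from collections import deque


def topological_order(deps):
    """Return a run order for tasks given deps = {task: [upstream tasks]}.
    Raise ValueError if some task depends on itself, directly or transitively."""
    tasks = set(deps) | {u for ups in deps.values() for u in ups}
    indegree = {t: 0 for t in tasks}
    downstream = {t: [] for t in tasks}
    for task, ups in deps.items():
        for upstream in ups:
            indegree[task] += 1
            downstream[upstream].append(task)
    ready = deque(sorted(t for t in tasks if indegree[t] == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in sorted(downstream[task]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(tasks):
        raise ValueError("cycle detected among: " + ", ".join(sorted(tasks - set(order))))
    return order
```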
005·83

dark data

Most of it, kept on the theory that volume is a strategy.

Data an organization collects and stores but never uses, classifies, or governs.
Architecture
343·0721

dark pattern

The 'Manage Preferences' button rendered in the exact shade that the eye has been trained to skip.

An interface designed to steer users toward choices against their interest, such as broader data sharing.
Privacy & Compliance
006·31

dashboard

A wall of charts no one is assigned to act on.

A curated visual surface that presents selected metrics for monitoring and decision support.
Analytics
006·31

dashboard adoption

Measured by logins, because measuring whether it changed a decision was harder.

The degree to which intended users actually open and act on a dashboard after it ships.
Analytics
006·31

dashboard graveyard

Where dashboards go when no one will decide to delete them and no one will open them.

The accumulation of unused dashboards that remain published, indexed, and occasionally cited.
Analytics
005·55

data as a product

Term struck pending the named owner, consumer, quality SLA, and decision rights that distinguish a product from a table with ambition.
Architecture
005·42

data catalog

A complete index of assets nobody has agreed are correct.

A searchable inventory of an organization's data assets, enriched with descriptions, ownership, lineage, and usage to aid discovery and trust.
Architecture
005·76

data certification

A badge granted on a Tuesday and never re-checked against the Wednesday data.

A formal attestation that a dataset has met defined quality criteria and is approved for a stated class of use.
Quality
658·40

data champion

A title awarded in lieu of time, headcount, or authority, on the understanding that enthusiasm is a kind of budget.

A volunteer within a team who promotes good data practice and links the central program to local reality.
Roles & Theater
658·40

data citizenship

A responsibility assigned to everyone, which is the established method of assigning it to no one.

The shared responsibility of all employees to handle, interpret, and share data according to agreed standards.
Governance
005·74

data classification

A four-tier scheme under which everything is filed as 'Internal' to avoid the conversation.

The assignment of data to sensitivity tiers, such as public or restricted, to drive handling, access, and retention rules.
Governance
005·76

data cleansing

Repairing the data instead of the system that keeps breaking it, monthly, in perpetuity.

The correction or removal of inaccurate, incomplete, or improperly formatted records to bring a dataset within its quality requirements.
Quality
005·75

data contract

A written agreement about what the source will send, honored until the source has a deadline.

An agreed, enforceable specification of the schema, semantics, and quality a data producer guarantees to consumers, versioned and tested at the pipeline boundary.
Pipelines
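A minimal sketch of enforcing the schema half of a contract at the pipeline boundary. The contract format (field name mapped to type and a required flag) and the field names are hypothetical; a real contract also covers semantics, quality rules, and SLAs, none of which this checks.

```python
# Hypothetical contract for an "orders" feed: field -> (type, required).
ORDERS_CONTRACT_V1 = {
    "order_id": (str, True),
    "amount": (float, True),
    "currency": (str, True),
    "coupon_code": (str, False),
}


def violations(record, contract):
    """List how one record breaks the contract: missing required fields,
    wrong types, and fields the contract does not mention."""
    problems = []
    for field, (expected_type, required) in contract.items():
        if record.get(field) is None:
            if required:
                problems.append(f"missing required field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, got {type(record[field]).__name__}"
            )
    for field in sorted(record.keys() - contract.keys()):
        problems.append(f"unexpected field: {field}")
    return problems
```

Rejecting or quarantining records that return a non-empty list is the enforcement; the version suffix is where a breaking change would be negotiated.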
005·22

data contract

A promise with a schema, signed by no one with the authority to keep it.

A versioned, enforceable agreement specifying the schema, semantics, quality, and SLAs a producer guarantees to its consumers.
Architecture
658·40

data culture

The thing a poster is bought to fix, on the model of curing a leak by framing a photograph of a dry ceiling.

The shared habits and incentives by which an organization actually defines, trusts, and acts on its data, beyond what any policy states.
Roles & Theater
658·40

data democracy

Struck: absent an access policy, an accountable owner, and decision rights, it is an unmanaged free-for-all with a hopeful name.
Governance
006·31

data democratization

Granting everyone access to the warehouse, then asking why everyone got a different total.

Broadening access to data and analytics tools beyond a central specialist team.
Analytics
005·74

data dictionary

A faithful description of the column named 'cust_status_2_final', whose meaning died with its author.

A catalog of a system's data elements — their names, types, formats, allowed values, and meanings — describing structure as built.
Modeling
005·741

data discovery

The tool whose first scan is reclassified internally as an incident.

The automated scanning of systems to locate and classify sensitive data wherever it has accumulated.
Privacy & Compliance
651·56

data disposition

The least-rehearsed verb in the data lifecycle, and the first one a regulator asks you to demonstrate.

The defensible destruction or transfer of records at the end of their retention period.
Privacy & Compliance
005·24

data domain

A boundary three teams drew differently and one team enforces.

A bounded business area — customer, order, payment — that organizes data ownership and modeling around a coherent subject.
Architecture
658·40

data domain owner

Accountable for the domain, consulted on the budget, informed of the deadline.

The accountable leader for a data domain, holding authority over its definitions, access, and quality priorities.
Governance
005·76

data downtime

The hours the data was wrong, billed entirely to whoever used it, not whoever produced it.

The cumulative period during which data is missing, late, or wrong — the data analogue of system downtime, measured in hours of untrustworthy state.
Quality
021·31

data evangelist

One who brings the good news that the data exists, and is not asked whether it is correct.

A role focused on internal promotion of data initiatives and tooling, measured by enthusiasm generated rather than data quality delivered.
Roles & Theater
005·17

data fabric

The word chosen because every other textile was taken.

An integration layer that uses active metadata, knowledge graphs, and automation to connect data across distributed sources for unified access.
Architecture
006·312

data for ai

Struck: it claims the warehouse already holds the profiling, ownership, rights clearance, and fitness-for-use no one has done.
AI / ML
005·74

data freshness

The timestamp everyone trusts and no one checks.

How recently the data in a report reflects the source system, relative to the decision it supports.
Analytics
005·75

data freshness sla

A promise about how recent the data is, signed by no one and measured by whoever is annoyed first.

A committed bound on how stale data may be, stating the maximum acceptable lag between an event in the source and its availability downstream.
Pipelines
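A minimal sketch, assuming each downstream record carries its source event timestamp. Measuring lag from the source event rather than from load time matters: a job that reloads stale data on schedule still looks fresh by load time. Timezone-aware datetimes are used because Python refuses to subtract a naive datetime from an aware one.

```python
from datetime import datetime, timedelta, timezone


def check_freshness(latest_event_at, now, max_lag):
    """Return (lag, breached) for a dataset whose newest downstream record
    has source event time `latest_event_at`; `max_lag` is the SLA bound."""
    lag = now - latest_event_at
    return lag, lag > max_lag
```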
005·74

data governance

The meeting where everyone agrees data is important and no one agrees who owns it.

The exercise of decision rights and accountability over data assets, expressed through agreed policies, roles, and standards.
Governance
658·40

data governance charter

A two-page mandate granting a committee the authority to request authority.

A founding document defining a governance program's scope, authority, membership, and escalation path.
Governance
658·40

data governance council

A monthly quorum convened to ratify decisions no attendee has the authority to make.

A standing cross-functional body chartered to set data policy, arbitrate disputes, and assign accountability across domains.
Governance
005·74

data governance framework

A wheel diagram printed, framed, and never opened past the wheel diagram.

A reference structure of domains, roles, and practices, such as DAMA-DMBOK or DCAM, used to organize a governance program.
Governance
005·74

data governance tooling

Bought to make the decisions; capable only of cataloguing the fact that no one will.

The platforms that catalog, classify, and track data and policy, supporting governance work but not performing its decisions.
Governance
006·318

data labeling

The step where the team discovers it never agreed on the categories.

The process of attaching agreed categories or values to examples so a model has something to learn from.
AI / ML
005·15

data lake

A folder, but you capitalized it.

A repository holding raw data in native formats at scale, with schema applied on read rather than on write.
Architecture
Governance

data lineage

A family tree for numbers, requested immediately after the audit discovers everyone was adopted.

A record of where data came from, how it changed, and where it moved across systems and processes.
Governance
657·45

data lineage for audit

A diagram that is accurate up to the join where someone exported it to a spreadsheet and emailed it.

A traceable record of where personal data originated, how it moved, and where it now resides, for accountability.
Privacy & Compliance
006·31

data literacy

The training program assigned to readers to explain why the numbers never matched.

The ability to read, interpret, and reason about data and the limits of a given measure.
Analytics
021·67

data literacy program

Training that teaches everyone to read the dashboard and no one to ask who built it.

An organized effort to raise staff competence in reading, interpreting, and reasoning with data.
Roles & Theater
005·30

data marketplace

A storefront stocked with products that have no buyers and a returns desk no one staffs.

An internal or external venue where data products are published, discovered, and subscribed to under defined terms.
Architecture
005·75

data mart

The version of the numbers a team built so it would stop having to agree with the other teams.

A curated, subject-area subset of the warehouse, modeled and maintained for a specific team or function's reporting needs.
Pipelines
005·824

data masking

The careful obscuring of the production data, in the one environment, that a different team had already copied somewhere else.

The replacement of sensitive values with realistic but non-sensitive substitutes for use in lower environments.
Privacy & Compliance
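A minimal sketch of deterministic masking for one field, assuming email addresses and a secret key kept out of the lower environment. Keyed hashing (HMAC-SHA256 from Python's standard library) maps the same address to the same substitute everywhere, so joins survive masking. The result is pseudonymous, not anonymous: anyone holding the key can test guessed addresses.

```python
import hashlib
import hmac


def mask_email(email, key):
    """Replace an email with a realistic-looking, deterministic substitute.
    `key` is a bytes secret; normalizing case and whitespace first means
    'Alice@Corp.com' and 'alice@corp.com ' mask identically."""
    normalized = email.strip().lower().encode()
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}@example.com"
```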
658·41

data maturity, level 3

The level every organization rates itself: precisely one above wherever it actually is.

A self-assessed stage on a capability model indicating defined, repeatable practice across the organization.
Governance
005·11

data mesh

A reorganization announced as an architecture.

A sociotechnical approach that distributes ownership of analytical data to the domains that produce it, supported by self-serve platform infrastructure and federated governance.
Architecture
342·085

data minimization

The principle that lost, by unanimous vote, to a roadmap item called 'data we might need to train on later.'

The principle that collection be limited to data adequate, relevant, and necessary for a stated purpose.
Privacy & Compliance
005·76

data observability

The ability to watch the data break in real time, in high resolution, on a screen no one is assigned to.

The practice of monitoring data systems for freshness, volume, schema, and distribution so that incidents are detected before consumers are affected.
Quality
658·32

data onboarding

The two weeks a new hire spends discovering which of the four dashboards is the real one.

The process by which a new team member gains the access and context to work productively with the data.
Roles & Theater
658·40

data owner

A field on a slide, populated last quarter with the name of whoever wasn't there to refuse.

The accountable party with authority to approve access, definitions, and acceptable use for a data asset, and who answers for its outcomes.
Governance
005·75

data pipeline

A sequence of steps that worked when one person built it and has been load-bearing ever since.

A defined sequence of steps that moves and transforms data from a source to a destination on a schedule.
Pipelines
005·47

data pipeline architecture

A diagram of arrows, one of which is doing something nobody documented.

The structural arrangement of stages — extract, transform, load, orchestrate — through which data moves from source to consumer.
Architecture
005·28

data platform

It did not fail to create ownership. It succeeded at revealing there was none.

An integrated set of services for storing, processing, governing, and serving data, offered to internal teams as shared infrastructure.
Architecture
005·74

data policy

A document whose enforcement mechanism is the hope that someone reads it.

A documented, enforceable rule governing how data may be defined, classified, accessed, retained, or shared.
Governance
005·12

data product

A dataset with a landing page.

A reusable dataset with named ownership, a quality SLA, documented consumers, and decision rights attached.
Architecture
006·31

data product

A dashboard given a roadmap so no one has to retire it.

In analytics, a maintained dataset or metric set with a named owner, an SLA, and documented definitions for consumers.
Analytics
658·48

data product manager

A title invented so a dataset could have someone to attend its roadmap review.

The role accountable for a data product's consumers, roadmap, quality, and value.
Roles & Theater
005·76

data profiling

The one-time exercise of learning what the data actually contains, scheduled for after launch.

The systematic examination of a dataset's structure, content, and statistics to discover its true shape, ranges, patterns, and anomalies.
Quality
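
To make "discover its true shape" concrete, here is a minimal profiling sketch in Python. It assumes records arrive as a list of dicts; the function name `profile_column` and the fields it returns are illustrative choices, not the output of any particular profiling tool.

```python
from collections import Counter
from typing import Any, Dict, List


def profile_column(rows: List[Dict[str, Any]], column: str) -> Dict[str, Any]:
    """Summarize one column: row count, nulls, distinct values, range, most common value."""
    values = [row.get(column) for row in rows]  # a missing key counts as null
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(counts),
        # min/max assume the non-null values are mutually comparable
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "most_common": counts.most_common(1)[0] if counts else None,
    }


rows = [{"age": 34}, {"age": None}, {"age": 34}, {"age": 127}, {}]
print(profile_column(rows, "age"))
```

In the invented sample, the 127 surfaces in `max`, which is the moment profiling earns its keep: someone now has to decide whether it is an age.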
342·0852

data protection impact assessment

An assessment performed before the activity begins, scheduled three weeks after the activity began.

A structured analysis of privacy risks for a high-risk processing activity, performed before that activity begins.
Privacy & Compliance
005·76

data quality

A program everyone funds, no one owns, and the dashboard reports as green.

The degree to which a dataset is fit for its intended use, measured across dimensions such as accuracy, completeness, timeliness, consistency, validity, and uniqueness.
Quality
005·76

data quality debt

The interest paid every quarter on the validation skipped to make one deadline two years ago.

The accumulated future cost of deferred quality work — unfixed defects, skipped validation, and undefined rules — that compounds as systems build on top of it.
Quality
005·76

data quality dimension

One of six words a vendor promises to measure and a scorecard promises to average into a single reassuring number.

A named axis along which quality is assessed — accuracy, completeness, timeliness, consistency, validity, or uniqueness — each with its own rules and measures.
Quality
021·59

data quality initiative

The quarterly observation that the data has quality, followed by the closing of the observation.

A time-boxed program to identify, prioritize, and remediate data-quality defects across one or more domains.
Roles & Theater
005·76

data quality rule

A test that fires, raises a ticket, and waits to learn whether anyone meant it.

A codified, testable assertion about data — a constraint, range, or relationship — whose pass or fail is evaluated against incoming records.
Quality
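
A data quality rule can be sketched as a named predicate evaluated against each record. The `Rule` class and `failing_counts` helper below are hypothetical names; the two example rules illustrate a range constraint and a relationship between fields.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """A named, testable assertion evaluated against a single record."""
    name: str
    check: Callable[[Record], bool]


def failing_counts(rules: List[Rule], records: List[Record]) -> Dict[str, int]:
    """Evaluate every rule against every record; return the number of failures per rule."""
    failures = {rule.name: 0 for rule in rules}
    for record in records:
        for rule in rules:
            if not rule.check(record):
                failures[rule.name] += 1
    return failures


rules = [
    Rule("amount_non_negative", lambda r: r["amount"] >= 0),  # range
    Rule("shipped_on_or_after_order", lambda r: r["shipped"] >= r["ordered"]),  # relationship
]
orders = [
    {"amount": 10, "ordered": date(2024, 1, 2), "shipped": date(2024, 1, 3)},
    {"amount": -5, "ordered": date(2024, 1, 2), "shipped": date(2024, 1, 4)},
    {"amount": 3, "ordered": date(2024, 1, 5), "shipped": date(2024, 1, 1)},
]
print(failing_counts(rules, orders))  # {'amount_non_negative': 1, 'shipped_on_or_after_order': 1}
```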
021·61

data quality scorecard

A panel of green squares whose thresholds were set after the squares were already green.

A periodic summary of data-quality metrics—completeness, validity, timeliness—reported to stakeholders by domain.
Roles & Theater
343·099

data residency

A map showing where the data lives, drawn before anyone asked the CDN where it caches.

A requirement that data be stored within a specified geographic or jurisdictional boundary.
Privacy & Compliance
651·56

data retention schedule

A document specifying when data will be deleted, ratified by an organization that has never deleted anything.

A documented policy stating how long each record class is kept and when it must be disposed of.
Privacy & Compliance
343·099

data sovereignty

A principle observed everywhere except the vendor's support tier, which is global and reads everything.

The principle that data is subject to the laws of the nation in which it is collected or stored.
Privacy & Compliance
005·74

data standard

One of the several standards each team independently considers the standard.

An agreed, documented specification for how a class of data is to be named, formatted, classified, or validated.
Governance
658·40

data steward

The one person who reads the policy, applied as a layer on top of an existing full-time job.

A named individual accountable for the definition, quality, and correct use of a defined set of data within their domain.
Governance
005·76

data stewardship

A responsibility added to a job description, never to a calendar.

The assigned operational responsibility for the quality, definition, and correct use of a defined set of data within its domain.
Quality
006·31

data storytelling

The discipline of making a number persuasive before it is correct.

The practice of framing analysis as a narrative so an audience can follow the finding and its implication.
Analytics
342·0857

data subject access request

A request that, for the first time, forces the organization to find out where it actually keeps that person.

A formal request by which an individual obtains the personal data an organization holds about them.
Privacy & Compliance
005·16

data swamp

Term struck pending the definitions of quality, ownership, catalog, and lineage it was supposed to replace.

Term struck pending the definitions of quality, ownership, catalog, and lineage it was supposed to replace.
Architecture
005·29

data virtualization

The integration you postponed, now happening live, in front of the user.

A technique that provides unified, real-time access to data across sources through an abstraction layer, without physically moving or copying it.
Architecture
005·14

data warehouse

Where data goes to be agreed upon, eventually, by people who have since left.

A centralized store of integrated, modeled, historical data optimized for query and reporting across subjects.
Architecture
658·40

data-driven culture

Term struck pending the definitions of decision right, metric ownership, and the willingness to act against a preference the data contradicts.

Term struck pending the definitions of decision right, metric ownership, and the willingness to act against a preference the data contradicts.
Analytics
005·75

dbt model

A SQL file that turned a folder of opinions about 'revenue' into a folder of opinions under version control.

A version-controlled SQL select statement that defines a transformation, materialized by dbt as a table or view with declared dependencies and tests.
Pipelines
005·75

dead-letter queue

Where the records that could not be processed go to be inspected, in the same sense that a junk drawer is a filing system.

A holding location for messages or records a pipeline could not process, set aside for inspection rather than discarded.
Pipelines
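
A minimal dead-letter sketch, assuming messages arrive as JSON strings and that a valid message needs a numeric `amount` field (both assumptions made for illustration). Failures are set aside with their original payload and the error instead of being discarded.

```python
import json
from typing import Any, Dict, List, Tuple


def process_batch(raw_messages: List[str]) -> Tuple[List[Dict[str, Any]], List[Dict[str, str]]]:
    """Parse each message; route failures to a dead-letter list instead of dropping them."""
    processed: List[Dict[str, Any]] = []
    dead_letters: List[Dict[str, str]] = []
    for raw in raw_messages:
        try:
            event = json.loads(raw)
            event["amount"] = float(event["amount"])  # hypothetical required field
            processed.append(event)
        except (ValueError, KeyError, TypeError) as exc:
            # json.JSONDecodeError subclasses ValueError, so malformed JSON lands here too.
            # Keep the untouched payload and the reason so someone can inspect it later.
            dead_letters.append({"payload": raw, "error": repr(exc)})
    return processed, dead_letters


ok, dlq = process_batch(['{"amount": "12.5"}', "not json", '{"amt": 1}'])
print(len(ok), len(dlq))  # 1 2
```

A real pipeline would write the dead letters to a durable queue or table; a Python list stands in for it here.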
005·33

decentralization

The half of the pendulum currently swinging away from the slide that recommended the other half.

Distributing data ownership and decision authority to teams closer to the data, in place of a single central function.
Architecture
658·47

decision intelligence

The newest word for who decides, sold so the question can be procured instead of answered.

The discipline of designing how decisions are framed, made, and improved using data and models.
Analytics
658·45

decision log

The artifact whose existence is confirmed only by the meeting spent looking for it.

A maintained record of significant decisions, their rationale, owner, and date, kept so they can be revisited rather than relitigated.
Governance
658·40

decision rights

Who may decide, within what bounds — the one field every RACI leaves to a future workshop.

The explicit allocation of who may decide what, within which bounds, and who must be consulted before they do.
Governance
006·31

deduplication

The quarterly project to merge the duplicate customers, which itself runs in three teams under two names.

The identification and removal or merging of records that represent the same entity, reducing a set to its distinct members.
Modeling
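
The simplest form of deduplication picks a matching key and keeps one record per key value. The normalized-email rule in `email_key` is a hypothetical example; choosing the key is the hard part, and fuzzier matching across systems belongs under entity resolution.

```python
from collections.abc import Callable, Hashable
from typing import Any, Dict, List

Record = Dict[str, Any]


def deduplicate(records: List[Record], key: Callable[[Record], Hashable]) -> List[Record]:
    """Keep the first record for each key value, preserving input order."""
    seen = set()
    distinct = []
    for record in records:
        k = key(record)
        if k not in seen:
            seen.add(k)
            distinct.append(record)
    return distinct


def email_key(record: Record) -> str:
    # Hypothetical matching rule: same email, ignoring case and surrounding whitespace.
    return record["email"].strip().lower()


customers = [
    {"id": 1, "email": "Ana@Example.com "},
    {"id": 2, "email": "ana@example.com"},
    {"id": 3, "email": "bo@example.com"},
]
print([c["id"] for c in deduplicate(customers, email_key)])  # [1, 3]
```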
005·76

default value pollution

The reason a company believes a meaningful share of its customers were born on the first of January, 1900.

The degradation of accuracy that occurs when placeholder or default entries — 01/01/1900, 99999, 'N/A' — accumulate and are later treated as real values.
Quality
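
Placeholder pollution can be measured before the values are trusted. This sketch counts how many values match a hypothetical list of known defaults, seeded from the examples in the definition above; real systems accumulate their own.

```python
from datetime import date
from typing import Any, Iterable

# Hypothetical placeholder list, seeded from the definition's examples.
SUSPECT_DEFAULTS = {date(1900, 1, 1), 99999, "99999", "N/A"}


def placeholder_share(values: Iterable[Any]) -> float:
    """Fraction of values that match a known placeholder rather than a real reading."""
    values = list(values)
    if not values:
        return 0.0
    suspects = sum(1 for v in values if v in SUSPECT_DEFAULTS)  # values must be hashable
    return suspects / len(values)


birth_dates = [date(1900, 1, 1), date(1985, 4, 12), date(1900, 1, 1), date(1992, 7, 30)]
print(placeholder_share(birth_dates))  # 0.5
```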
005·14

definition of done

A list that names tests, deploys, and docs, and goes quiet on whether the number is right.

The agreed checklist a unit of work must satisfy before it is considered complete.
Quality
048·49

degenerate dimension

A dimension demoted to living in the fact table, named as though the modeling, and not the org, were at fault.

A dimension attribute, such as an invoice or order number, stored in the fact table itself because it has no other attributes to warrant its own dimension.
Modeling
005·14

denormalization

Copying the value into six tables so the dashboard loads faster, then learning which copy is wrong six tables at a time.

The deliberate introduction of redundancy into a schema to improve read performance, accepting update complexity in exchange for fewer joins.
Modeling
658·40

digital transformation

A standing program whose deliverable is its own continuation. It does not finish; it renews, like a season pass to a process no one owns.

A coordinated change to how an organization operates, using technology to do work that was previously manual, slow, or impossible.
Roles & Theater
005·14

dimension table

Where the company keeps every adjective it has ever applied to a customer, none of them governed.

A table of descriptive attributes that give business context to facts and supply the labels used to filter and group them.
Modeling
005·14

dimensional modeling

Deciding what the business measures and what it measures by, then discovering nobody agrees on either.

A design technique that organizes data into facts and dimensions to optimize a warehouse for query and analysis rather than transaction processing.
Modeling
343·0721

do not sell

A right whose definition of 'sell' the company has worked harder to interpret than to honor.

A consumer right to opt out of the sale or sharing of their personal data, with a clear mechanism to exercise it.
Privacy & Compliance
005·15

documentation debt

The distance between the wiki and the warehouse, compounding every sprint and paid by whoever onboards next.

The accumulating gap between how a system actually behaves and what its documentation describes.
Quality
005·23

domain-oriented ownership

A diagram in which every box owns the data and every arrow owns the blame.

The principle that the business domain producing data owns its modeling, quality, and lifecycle rather than a central team.
Architecture
658·40

dotted line

The line on the chart that carries all of the responsibility and none of the headcount. It is dotted because solid would imply someone could be held to it.

A secondary reporting relationship: influence or coordination without direct authority over a person or function.
Roles & Theater
006·31

drill-down

The feature requested in every demo and used the day after the number is already wrong.

Navigation from an aggregate figure into its underlying detail to locate the source of a change.
Analytics
006·31

drill-through

The path from the green number to the spreadsheet that disagrees with it.

A link from a summary visual to a detailed report filtered to the selected context.
Analytics
347·072

ediscovery

The annual event at which the org learns, under oath, how many places it kept the same message.

The identification, collection, and production of electronically stored information for legal proceedings.
Privacy & Compliance
005·75

elt

ETL, but the part nobody documented now runs where the compute bill can see it.

Extract, load, transform: raw data lands in the warehouse first and is reshaped in place, deferring transformation to the destination.
Pipelines
006·330

embeddings

Coordinates that faithfully encode whatever bias the source corpus never declared.

Numeric vector representations of text or other data, positioned so that similar items sit near each other.
AI / ML
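
Nearness between embeddings is commonly measured with cosine similarity. The sketch below implements it in plain Python; the three-dimensional vectors are toy values invented for illustration, since real embeddings come from a model and have far more dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: near 1 means they point the same way."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors, invented for illustration; a real model would produce these.
invoice = [0.9, 0.1, 0.0]
bill = [0.8, 0.2, 0.1]
holiday = [0.0, 0.3, 0.9]
print(round(cosine_similarity(invoice, bill), 3))     # 0.984
print(round(cosine_similarity(invoice, holiday), 3))  # 0.035
```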
658·40

enablement

The function that hands teams a self-service portal to a decision that still requires three approvals they are not on.

The work of equipping teams to do something for themselves: training, documentation, templates, and support.
Roles & Theater
005·82

encryption at rest

The control most cited in the questionnaire and least relevant to the breach, which came in through a login.

The encoding of stored data so that it is unreadable without a key, protecting it if the storage is compromised.
Privacy & Compliance
006·31

entity resolution

Deciding whether these four customers are one customer, a question the org would rather bill four times than answer once.

The process of determining which records across systems refer to the same real-world entity and linking or merging them.
Modeling
005·76

error budget

A budget you only discover you had when someone explains it was already spent.

The maximum amount of quality failure tolerated within a period before remediation must take priority over new work.
Quality
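
An error budget reduces to arithmetic over a period. This sketch assumes quality is measured as failed pipeline runs against an allowed failure rate; the function names and the 1% figure are hypothetical.

```python
def error_budget_remaining(allowed_failure_rate: float, total_runs: int, failed_runs: int) -> float:
    """Share of the period's error budget still unspent; zero or below means it is spent."""
    allowed_failures = allowed_failure_rate * total_runs
    if allowed_failures <= 0:
        raise ValueError("no error budget: zero allowed failure rate or no runs recorded")
    return 1 - failed_runs / allowed_failures


def remediation_first(allowed_failure_rate: float, total_runs: int, failed_runs: int) -> bool:
    """Once the budget is spent, remediation takes priority over new work."""
    return error_budget_remaining(allowed_failure_rate, total_runs, failed_runs) <= 0


# Hypothetical period: 1,000 runs with 1% allowed to fail (a 99% target, 10 failures).
print(error_budget_remaining(0.01, 1000, 4))  # 0.6
print(remediation_first(0.01, 1000, 12))      # True
```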
658·40

escalation

Sending a decision upward in search of an owner, where it is acknowledged, admired for its complexity, and sent back down for alignment.

The act of raising an unresolved issue to a level with the authority to decide it.
Roles & Theater
658·40

escalation path

A documented route to the person who will say it isn't their decision either.

The defined sequence of authorities a governance issue follows until it reaches someone empowered to resolve it.
Governance
005·75

etl

Three verbs, four teams, and no agreement on which one owns the part that broke.

Extract, transform, load: a pattern that moves data from source systems, reshapes it, then writes it to a destination warehouse.
Pipelines
006·325

eval harness

A scoreboard everyone trusts and no one has read the rules of.

The reusable apparatus that runs a model against fixed test cases and records scored outputs.
AI / ML
006·36

eval set

The exam the model is allowed to study before every retake.

A fixed, labeled collection of cases used to measure a model's quality consistently across versions.
AI / ML
005·39

event-driven architecture

Decoupling, sold as a feature, until someone asks which service owns the order.

A design in which components communicate by producing and reacting to events, decoupling producers from consumers in time and identity.
Architecture
005·75

exactly-once

A guarantee printed on the slide and approximated in the system.

A delivery guarantee that each event is processed one time and one time only, with no loss and no duplication, typically requiring coordinated state and deduplication.
Pipelines
658·40

executive buy-in

Term struck pending a definition of the thing bought into, the price paid, and the moment the buyer would be asked to refund it.

Term struck pending a definition of the thing bought into, the price paid, and the moment the buyer would be asked to refund it.
Roles & Theater
658·40

executive dashboard

The one dashboard that must never be red while the report is open.

A summary view of high-level indicators built for senior leadership to track performance at a glance.
Analytics
658·40

executive sponsor

The name on the kickoff slide, available again at the one-year retrospective.

The senior leader who funds a governance program, owns its mandate, and is accountable for its outcomes at the executive level.
Governance
658·40

executive summary

The only slide that gets read, written by the person who least wanted to read the others.

A short framing of an analysis that states the finding and recommendation before the detail.
Analytics
006·352

explainability

A chart explaining a decision no one had defined well enough to question.

The degree to which a model's outputs can be attributed to understandable factors a human can audit.
AI / ML
005·14

fact table

The one table everyone trusts, named for a property no one has verified.

A table holding the measurements of a business process at a declared grain, with foreign keys to its dimensions.
Modeling
658·40

fail fast

A slogan repeated until the first failure, after which it is replaced by the question of who approved the experiment.

A practice of testing ideas in small, cheap experiments so the wrong ones are abandoned early and the learning is kept.
Roles & Theater
005·76

false positive rate

The reason the real alert was muted along with the eighty before it.

The proportion of problem-free cases that quality checks nonetheless flag as problems, eroding trust in the detection system that raised the alerts.
Quality
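
The false positive rate is conventionally FP / (FP + TN): false alarms divided by all cases that had no real problem. It is easily confused with the false discovery rate, FP / (FP + TP), the share of raised alerts that were false. A small sketch with hypothetical counts shows why the two diverge.

```python
def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """FP / (FP + TN): share of problem-free cases that still raised an alert."""
    negatives = false_positives + true_negatives
    if negatives == 0:
        raise ValueError("no problem-free cases to measure against")
    return false_positives / negatives


def false_discovery_rate(false_positives: int, true_positives: int) -> float:
    """FP / (FP + TP): share of raised alerts that turned out to be false alarms."""
    alerts = false_positives + true_positives
    if alerts == 0:
        raise ValueError("no alerts were raised")
    return false_positives / alerts


# Hypothetical month: 10,000 checks on healthy data raised 80 false alarms,
# and the one genuine incident raised 1 true alert.
print(false_positive_rate(80, 9_920))         # 0.008
print(round(false_discovery_rate(80, 1), 3))  # 0.988
```

When real incidents are rare, a false positive rate under 1% can still mean nearly every alert is noise, which is how the real one gets muted.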
006·336

feature leakage

The model knew the answer because the answer was in the question.

The contamination of training data with information that will not be available at prediction time, which inflates measured accuracy.
AI / ML
006·335

feature store

A shared shelf where four teams store four columns named 'revenue'.

A managed repository of curated model inputs with consistent definitions shared across training and serving.
AI / ML
005·53

federated governance

Central authority for writing the policy, local authority for ignoring it.

A model in which global standards and policies are set centrally while domains apply and enforce them locally, balancing autonomy and consistency.
Architecture
005·25

federation

The word for centralization once the central team admits it cannot make anyone do anything.

An arrangement in which independent systems or teams retain local control while agreeing on shared standards for interoperation.
Architecture
006·339

fine-tuning

Teaching the model the house style before deciding the house had one.

Further training a pretrained model on task-specific examples to specialize its behavior.
AI / ML
005·76

fitness for use

The clause that lets data fail every dimension and still pass, because no one wrote down the use.

The principle that quality is defined relative to a specific consumer and purpose, not as an absolute property of the data.
Quality
006·353

foundation model

A general-purpose engine bought to solve a problem the org had not yet agreed to define.

A large model pretrained on broad data that serves as a base for many downstream tasks.
AI / ML
005·76

freshness

How recently the pipeline ran, which is not the same as how recently it was right.

A measure of how recently a dataset was last updated relative to the cadence its consumers expect.
Quality
005·75

freshness check

A test that confirms the data is old, then emails someone who has muted the channel.

An automated test that flags a table as stale when its most recent record is older than a defined threshold.
Pipelines
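
A freshness check compares the newest record's timestamp with a threshold. This sketch assumes the caller has already looked up that timestamp (for example, the maximum of some hypothetical `updated_at` column) and passes it in as a timezone-aware datetime; `is_stale` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(latest_record_at: datetime, max_age: timedelta, now: Optional[datetime] = None) -> bool:
    """True when the table's newest record is older than the allowed threshold."""
    # Both datetimes must be timezone-aware; mixing naive and aware raises TypeError.
    now = now if now is not None else datetime.now(timezone.utc)
    return now - latest_record_at > max_age


check_time = datetime(2024, 6, 8, 9, 0, tzinfo=timezone.utc)
daily = timedelta(hours=24)
print(is_stale(datetime(2024, 6, 7, 6, 0, tzinfo=timezone.utc), daily, now=check_time))  # True: 27 hours old
print(is_stale(datetime(2024, 6, 8, 3, 0, tzinfo=timezone.utc), daily, now=check_time))  # False: 6 hours old
```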
022·04

freshness sla

A promise about data age that is measured precisely, breached quietly, and reported only in the incident review.

A committed maximum age for data in a table, beyond which it is considered stale and consumers are notified.
Quality
657·45

full audit coverage

Struck: there is no covering a data inventory, audit trail, and lineage that were never drawn.

Struck: there is no covering a data inventory, audit trail, and lineage that were never drawn.
Privacy & Compliance
025·49

glossary owner

A role defined in the glossary it was supposed to be assigned in.

The named party accountable for the completeness, accuracy, and approval workflow of business glossary entries.
Governance
005·20

gold layer

The tier the dashboard reads from, which is why it is always green.

The business-ready, aggregated tier of a medallion design, modeled for direct consumption by reports and applications.
Architecture
006·31

golden record

The one true version of the customer, assembled from five wrong ones by a rule no one in the room can recite.

The single best, reconciled version of an entity assembled from multiple sources according to defined survivorship rules.
Modeling
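
Survivorship rules decide which source wins for each attribute. This sketch implements one simple rule, "the most recently updated non-null value wins", purely as an illustration; the sources, fields, and the rule itself are assumptions, and in practice each attribute may carry a different rule.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def golden_record(sources: List[Record], attributes: List[str]) -> Record:
    """For each attribute, the newest non-null value across sources survives."""
    # Newest first; ISO-8601 date strings sort chronologically as text.
    newest_first = sorted(sources, key=lambda r: r["updated_at"], reverse=True)
    return {
        attribute: next(
            (r[attribute] for r in newest_first if r.get(attribute) is not None),
            None,  # no source supplied a value
        )
        for attribute in attributes
    }


crm = {"source": "crm", "updated_at": "2024-03-01", "email": "ana@example.com", "phone": None}
billing = {"source": "billing", "updated_at": "2024-05-10", "email": None, "phone": "555-0100"}
support = {"source": "support", "updated_at": "2023-11-20", "email": "ana@old.example", "phone": "555-0199"}
print(golden_record([crm, billing, support], ["email", "phone"]))
# {'email': 'ana@example.com', 'phone': '555-0100'}
```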
658·40

governance by committee

A method of distributing a decision until no single person can be found to have made it.

An approach that routes governance decisions through group consensus rather than assigning them to accountable individuals.
Governance
658·40

governance council meeting

The governance council met. The definition did not. Minutes were taken; a decision was scheduled for the meeting that takes minutes about this one.

A recurring forum where data leaders ratify standards, settle disputes over definitions, and assign ownership.
Roles & Theater
658·40

governance office

The function accountable for everything and authorized for the calendar invites.

The standing function that coordinates policy, stewardship, issue management, and reporting across a governance program.
Governance
658·40

governance scorecard

A dashboard reporting every metric except whether anything was decided.

A periodic summary of governance metrics, such as stewardship coverage, issue aging, and policy conformance.
Governance
005·14

grain

The question 'what does one row mean?', asked far too late and answered by whoever wrote the query.

The precise level of detail represented by a single row in a fact table; the first and most consequential modeling decision.
Modeling
005·76

green dashboard

Green because nobody had agreed what red would have required someone to do.

A quality monitoring display showing all indicators within tolerance, signaling that no metric currently requires intervention.
Quality
005·49

greenfield architecture

A clean design, defined by the absence of anyone using it yet.

A design built from scratch with no constraints from existing systems, free to adopt preferred patterns and technologies.
Architecture
006·317

ground truth

One analyst's spreadsheet, promoted to truth because no one else volunteered.

The reference labels treated as correct, against which a model's outputs are measured.
AI / ML
006·328

grounding

Asking the model to cite a source, which works only if the organization keeps one.

Constraining a model's output to retrieved, citable evidence rather than its parametric memory alone.
AI / ML
006·342

guardrails

Rules that enforce a policy, written for the policy the org has not yet enforced anywhere else.

Programmatic constraints that filter or block model inputs and outputs against defined safety and policy rules.
AI / ML
006·327

hallucination

The model answering a question the organization had also never answered, only faster.

A model output that is fluent and confident but not supported by its inputs or any verifiable source.
AI / ML
658·34

handoff

The moment a responsibility becomes a diagram of an arrow, with no one standing at either end of it.

The transfer of responsibility for a task or system from one team or person to another.
Roles & Theater
005·31

headless bi

Business intelligence with the part that asked the business removed.

An architecture that exposes governed metric definitions through an API, decoupling the metrics layer from any single visualization tool.
Architecture
006·31

hierarchy

The official rollup of the org, redrawn each reorg, never matching the one Finance is actually using.

An ordered parent-child structure within a dimension — such as date or organization — that supports rollup and drill-down across levels.
Modeling
658·40

hippo

The variable the dashboard was quietly fitted to predict.

Highest-Paid Person's Opinion: the decision input that overrides the analysis when the two disagree.
Analytics
006·345

human in the loop

The named human who will be blamed for the output they were given two seconds to approve.

A design in which a person reviews or approves model outputs before they take effect.
AI / ML
005·75

idempotency

The property everyone assumed the job had until the day it ran twice.

The property that running a pipeline operation multiple times produces the same result as running it once, so retries and replays do not duplicate or corrupt data.
Pipelines
033·03

idempotent load

A load that is safe to rerun any number of times, which is why it has now been rerun any number of times.

A load designed so that running it multiple times produces the same result as running it once.
Pipelines
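
One way to make a load idempotent is to write by key instead of appending. This sketch uses Python's built-in `sqlite3` module and SQLite's `INSERT OR REPLACE`, so a rerun overwrites the same rows; the `orders` table and its columns are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def load_orders(conn: sqlite3.Connection, batch: List[Tuple[int, float]]) -> None:
    """Write by primary key so rerunning the same batch leaves the table unchanged."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            # SQLite-specific: replace any existing row that has the same primary key.
            "INSERT OR REPLACE INTO orders (id, amount) VALUES (?, ?)",
            batch,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
batch = [(1, 9.99), (2, 25.0)]
load_orders(conn, batch)
load_orders(conn, batch)  # a retry or replay of the same batch
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 2
```

A plain `INSERT` would raise an integrity error on the retry here, and against a table with no primary key it would silently double the rows.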
005·76

incident postmortem

A document that names the cause precisely, assigns the corrective action vaguely, and recurs on schedule.

A structured review after a quality incident that documents the timeline, impact, root cause, and corrective actions to prevent recurrence.
Quality
005·75

incremental load

Processing only what changed, defined by a watermark everyone trusts and no one can explain.

A load strategy that processes only new or changed records since the last run, rather than reloading the full dataset.
Pipelines
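
A watermark-based incremental load can be sketched in a few lines. It assumes each source row carries an `updated_at` timestamp stored as an ISO-8601 string in one fixed format, so text comparison matches time order; the function and field names are hypothetical.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def incremental_load(source: List[Record], target: Dict[int, Record], watermark: str) -> str:
    """Copy only rows changed after the watermark and return the advanced watermark."""
    changed = [row for row in source if row["updated_at"] > watermark]
    for row in changed:
        target[row["id"]] = row  # keyed write, so replaying a batch does not duplicate
    # Advance only as far as the newest row actually loaded.
    return max((row["updated_at"] for row in changed), default=watermark)


source = [
    {"id": 1, "updated_at": "2024-06-01T10:00"},
    {"id": 2, "updated_at": "2024-06-02T09:00"},
    {"id": 3, "updated_at": "2024-06-03T08:00"},
]
target: Dict[int, Record] = {}
watermark = incremental_load(source, target, "2024-06-01T23:59")
print(sorted(target), watermark)  # [2, 3] 2024-06-03T08:00
```

The strict `>` is where watermarks bite: a row committed later but stamped at or before the saved watermark, such as late-arriving data, is silently skipped.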
006·346

inference

The part priced per call, unlike the definitions it relies on, which are free and missing.

The act of running a trained model on new inputs to produce predictions in production.
AI / ML
005·75

ingestion

Accepting whatever a source sends, on the working theory that the schema it sent last week is the schema it will send today.

The process of bringing data from external or upstream systems into a data platform for storage and processing.
Pipelines
005·44

ingestion layer

The door, drawn as a layer to justify a team.

The tier responsible for acquiring data from sources and landing it into the platform, batch or streaming, with basic validation.
Architecture
006·31

insight backlog

The pile of insights that were actionable until everyone agreed they would not act.

The queue of analyses that have been produced but not yet acted on by a decision owner.
Analytics
006·31

insights dashboard

A dashboard renamed 'insights' the year charts stopped impressing the board.

A dashboard positioned to surface findings and recommendations rather than raw measures.
Analytics
006·319

inter-annotator agreement

The metric that quietly reports how undefined the term was all along.

A measure of how consistently independent labelers assign the same label to the same example.
AI / ML
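
Cohen's kappa is one standard measure of agreement between two labelers: it discounts the agreement expected by chance, given each labeler's label frequencies. A plain-Python sketch with invented labels:

```python
from collections import Counter
from typing import List


def cohens_kappa(labels_a: List[str], labels_b: List[str]) -> float:
    """Cohen's kappa for two labelers: (observed - expected) / (1 - expected)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: both labelers picking the same label at their own base rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both labelers use one identical label")
    return (observed - expected) / (1 - expected)


labeler_1 = ["spam", "spam", "ham", "ham", "spam", "ham"]
labeler_2 = ["spam", "ham", "ham", "ham", "spam", "spam"]
print(round(cohens_kappa(labeler_1, labeler_2), 3))  # 0.333
```

The two labelers agree on four of six examples, but labelers guessing at these base rates would agree half the time, so kappa lands at about 0.33.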
021·37

interim owner

The permanent owner, pending the search that was never opened.

A person designated to hold accountability for an asset or process until a permanent owner is identified.
Roles & Theater
005·35

interoperability

Term struck pending the shared definitions, contracts, and identifiers two systems must agree on before they can be said to operate together at all.

Term struck pending the shared definitions, contracts, and identifiers two systems must agree on before they can be said to operate together at all.
Architecture
005·74

issue log

A spreadsheet sorted by date opened, scrolled to the bottom only during audits.

The authoritative record of open and closed data issues, each with an owner, severity, and target resolution date.
Governance
005·74

issue management

Moving a problem from a person's worry to a system's backlog, where it is now safe.

The disciplined intake, triage, ownership, and resolution of data problems against tracked service levels.
Governance
005·75

it runs on dave's laptop

The architecture diagram for a critical daily process, redrawn every time Dave takes a vacation.

An informal note that a production-relevant process executes on an individual workstation rather than managed infrastructure, with no redundancy or oversight.
Pipelines
005·14

junk dimension

The drawer where the model keeps every yes/no flag a stakeholder swore was critical and never queried.

A dimension that consolidates assorted low-cardinality flags and indicators into one table to keep the fact table narrow.
Modeling
342·085

keep everything for ai

Data minimization's opposite, sponsored at the executive level, filed under strategy rather than under risk.

An informal retention posture in which deletion is deferred indefinitely on the theory that any data may later have model value.
Privacy & Compliance
658·40

key performance indicator

A number promoted to 'key' the quarter it finally looked good.

A quantified measure tied to an objective, used to judge whether performance is on track.
Analytics
005·14

keys and identity

The unanswered question of whether two rows are the same person, deferred until a regulator asks it for you.

The discipline of deciding what uniquely distinguishes one record from another, and which attribute the model trusts to do so over time.
Modeling
006·31

knowledge graph

A graph of how every entity relates to every other, built atop a 'customer' table no two systems define the same way.

A graph-structured model of entities and their relationships, typically grounded in an ontology, that links facts across a domain for query and inference.
Modeling
006·320

label drift

Definition drift, now with a confidence interval.

Change over time in how a target category is defined or applied, degrading models trained on the older meaning.
AI / ML
658·40

lagging indicator

The number that tells you what happened, reported the week after you could have changed it.

A measure that confirms an outcome after it has occurred, useful for accountability but not steering.
Analytics
005·13

lakehouse

A lake that read the warehouse's performance review and panicked.

An architecture that adds warehouse-style management — transactions, schema, governance — over low-cost lake storage, aiming to serve both analytics and machine learning from one tier.
Architecture
658·83

land and expand

The pricing model in which the second invoice is the strategy.

A sales motion that begins with a small, low-friction deployment and grows usage and spend over time.
Roles & Theater
005·75

late-arriving data

Data that shows up after the meeting that needed it, with a perfectly good explanation no one waited to hear.

Records that reach the pipeline after the time window they belong to has already been processed, requiring late updates or reprocessing.
Pipelines
342·0853

lawful basis

The ground selected after processing begins, by choosing whichever one the activity already happens to fit.

The legal ground that legitimizes a given processing activity, selected before processing begins.
Privacy & Compliance
658·40

leading indicator

A metric called 'leading' until the outcome it predicted failed to follow.

A measure that tends to change before an outcome, used to anticipate rather than confirm.
Analytics
005·8

least privilege

A principle interpreted as 'the least we can grant without anyone filing a ticket,' which trends upward.

The principle that each identity receives only the access strictly required to perform its function.
Privacy & Compliance
347·064

legal hold

The one instruction that the retention schedule, otherwise inert, has always obeyed instantly.

A directive suspending normal deletion of records relevant to anticipated or active litigation.
Privacy & Compliance
005·75

lineage

A map of where the number came from, consulted exclusively after it is already wrong.

A traced record of how data flows through transformations from source to destination, showing each column's upstream dependencies.
Pipelines
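
Tracing a column's upstream dependencies is a graph walk. This sketch assumes lineage has already been captured as a mapping from each column to its direct parents; the column names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def upstream(lineage: Dict[str, List[str]], column: str) -> Set[str]:
    """Every column the given column depends on, directly or transitively."""
    found: Set[str] = set()
    queue = deque(lineage.get(column, []))
    while queue:
        parent = queue.popleft()
        if parent not in found:  # skip ancestors already reached by another path
            found.add(parent)
            queue.extend(lineage.get(parent, []))
    return found


lineage = {
    "dash.revenue": ["mart.orders.net_amount"],
    "mart.orders.net_amount": ["raw.orders.amount", "raw.refunds.amount"],
}
print(sorted(upstream(lineage, "dash.revenue")))
# ['mart.orders.net_amount', 'raw.orders.amount', 'raw.refunds.amount']
```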
006·37

llm-as-judge

Outsourcing whether the answer is good to a second system that also does not know.

Using one language model to score another model's outputs against a rubric, in place of human grading.
AI / ML
005·14

logical data model

The clean diagram approved at kickoff, preserved unchanged precisely because the database stopped resembling it in week two.

A technology-independent representation of entities, attributes, and relationships that captures business meaning before physical implementation.
Modeling
006·31

master data

The handful of nouns the entire company depends on and no single function will admit it owns.

The shared core entities of a business — customers, products, suppliers, accounts — maintained for consistent use across processes and systems.
Modeling
006·31

master data management

A program funded to decide who owns 'customer', staffed by everyone who is certain it is not them.

The discipline and tooling that govern master data — its definitions, stewardship, survivorship rules, and distribution — to keep core entities consistent.
Modeling
658·40

maturity assessment

A survey that asks a team to grade itself and is then surprised by the grade.

A structured evaluation of a program against a maturity model, producing a current-state score and improvement targets.
Governance
658·40

maturity model

A five-level scale on which every organization assesses itself at a confident level three.

A staged framework rating a capability's development from ad hoc to optimized, used to plan and benchmark improvement.
Governance
005·18

medallion architecture

A medal ceremony for data that has not yet competed.

A layered design that refines data through successive bronze, silver, and gold zones of increasing cleanliness and business readiness.
Architecture
005·74

metadata

Everything we know about the data except whether anyone agreed what it means.

Data that describes other data — its meaning, structure, origin, ownership, and quality — enabling discovery, governance, and trust.
Modeling
005·41

metadata layer

Where the ownership field is required and the ownership is not.

The tier that captures and serves information about data — schemas, lineage, ownership, definitions — to people and systems that need context.
Architecture
Governance

metadata management

Writing down what the data is so future employees do not have to summon the one person who remembers.

The discipline of maintaining descriptive, technical, operational, and business context about data assets.
Governance
005·74

metric certification

A badge that certifies who to ask, not whether the number is right.

A formal review that endorses a metric's definition and computation as the approved version.
Analytics
005·74

metric definition

The document everyone cites and no one has read since the analyst who wrote it left.

The exact logic, filters, and grain that determine how a measure is computed and what it excludes.
Analytics
005·74

metric layer

A single place to disagree about 'active user', now with version control.

A centralized service that defines metrics once and serves consistent values to every downstream report and tool.
Analytics
006·31

metric sprawl

Nine definitions of 'churn', each defended by the team that pays for the tool that computes it.

The accumulation of redundant or near-duplicate metrics that lack a single agreed definition.
Analytics
658·40

metric tree

A diagram proving the north star depends on twelve numbers no one owns.

A hierarchy that decomposes a top-line metric into the inputs that drive it.
Analytics
005·54

metrics store

A store for the definitions everyone agreed to centralize and continued to override locally.

A centralized layer that defines, computes, and serves business metrics consistently across tools from a single set of definitions.
Architecture
658·40

minimum viable governance

The minimum, identified correctly, then funded at half.

The smallest set of roles, policies, and decision rights needed to manage data risk without stalling delivery.
Governance
006·347

mlops

The operating discipline a model needs, requested only after the model is already in production.

The discipline of versioning, deploying, monitoring, and retraining models with reproducible, owned pipelines.
AI / ML
006·349

model card

The honest paragraph about limitations that the deployment slide forgot to copy across.

A short document stating a model's intended use, training data, evaluation, and known limitations.
AI / ML
006·338

model collapse

The org photocopied the photocopy until 'customer' meant nothing at all.

Progressive degradation when models are trained largely on the outputs of earlier models rather than original data.
AI / ML
006·321

model drift

The model did not change. The world did, and no one was watching the gap.

Degradation of a deployed model's accuracy as live data diverges from the data it was trained on.
AI / ML
006·324

model evaluation

The benchmark the model passes, chosen after seeing which benchmark it passes.

The structured measurement of a model's accuracy, calibration, and failure modes against a held-out reference.
AI / ML
006·323

model monitoring

A panel that shows the model is healthy by reporting every signal except whether it is still right.

Ongoing measurement of a deployed model's inputs, outputs, and accuracy against expected behavior.
AI / ML
006·348

model registry

A list of every model in production except the three a team is quietly running from a notebook.

A catalog of model versions with their lineage, metrics, approvals, and deployment status.
AI / ML
005·27

modern data stack

A stack whose defining feature is the year you bought it.

A loosely coupled set of cloud-native, mostly managed tools — ingestion, warehouse, transformation, BI — assembled around a central store.
Architecture
005·14

natural key

A key the business swore was unique and permanent, right up until the merger, the typo, and the reissued account number.

An identifier drawn from a record's real-world business attributes, as opposed to a system-generated surrogate.
Modeling
005·14

normalization

Storing each fact exactly once so that no one can find it, then joining nine tables to prove it was there all along.

The process of organizing tables to reduce redundancy and dependency, so each fact is stored once and updated in one place.
Modeling
658·40

north star

A fixed point chosen for navigation and consulted by no one steering. It is admired at the offsite and unreachable from the desk, which is rather the point of a star.

A single guiding metric or aim meant to keep distributed teams pointed at the same long-term outcome.
Roles & Theater
005·48

north-star architecture

A destination chosen for its property of never being arrived at.

An aspirational end-state design used to align investment and sequencing toward a shared long-term target.
Architecture
658·40

north-star metric

The metric everyone steers by until the next reorg picks a different star.

A single metric chosen to align an organization on the value it delivers to customers.
Analytics
658·40

offsite

The annual event where a year of deferred decisions is recast as a fresh set of priorities and deferred again, now with a venue.

A multi-day gathering held away from the workplace to set strategy, resolve tension, and produce decisions hard to reach in the daily flow.
Roles & Theater
658·40

okr

Last quarter's aspirations, reformatted as this quarter's data model.

Objectives and Key Results: a goal-setting frame pairing a qualitative aim with measurable outcomes.
Analytics
658·40

okr-as-data

Automating the status update so the goal can drift in real time without anyone typing it.

The practice of wiring OKR key results directly to live metrics so progress updates automatically.
Analytics
658·40

okrs

A quarterly ritual that converts work into numbers, then converts the numbers back into the work that was always going to happen anyway. Graded green by the same team that set them.

Objectives and Key Results: a method for setting an ambitious aim and the few measurable outcomes that would prove it reached.
Roles & Theater
005·75

on-call rotation

The one place the org reliably assigns pipeline ownership: at 3 a.m., to whoever is holding the pager.

A schedule assigning responsibility for responding to pipeline incidents to a named person during a given window.
Pipelines
006·31

ontology

The most rigorous way to model a business while declining to define a single word the business uses today.

A formal specification of concepts, their properties, and the relationships among them within a domain, expressive enough to support inference.
Modeling
658·40

operating model

Called 'federated' when no one wanted to fund 'centralized.'

The chosen arrangement of governance authority across the organization, typically centralized, federated, or hybrid.
Governance
658·40

operating rhythm

A schedule of recurring meetings mistaken for the work they were meant to coordinate. The rhythm is reliable; the operating is pending.

The regular cadence of reviews and decisions by which an organization stays coordinated week to week.
Roles & Theater
005·75

orchestration

The discipline of deciding which job runs first, now practiced by three tools that each believe they are in charge.

The coordinated scheduling and sequencing of pipeline tasks, including dependencies, retries, and execution order.
Pipelines
005·75

orchestrator sprawl

The state of having three tools to schedule jobs and a fourth, undocumented one to schedule the other three.

The accumulation of multiple, overlapping orchestration tools across an organization, each scheduling a subset of jobs with no single source of truth.
Pipelines
658·33

perennial q3 item

An item with the rare distinction of being permanently next.

A planned deliverable that is repeatedly deferred to a future quarter without ever being cancelled.
Roles & Theater
342·08

personally identifiable information

The fields everyone agreed were sensitive, in the one table nobody scanned.

Data that, alone or combined with other data, can be linked to a specific individual.
Privacy & Compliance
005·14

physical data model

What was actually built, which the logical model has agreed never to look at directly.

The implementation-specific model defining tables, columns, types, keys, indexes, and constraints as deployed in a given platform.
Modeling
005·75

pipeline monitoring

A wall of green panels confirming the job ran, and silent on whether it was right.

The instrumentation that tracks pipeline runs, durations, failures, and data volumes, surfacing problems before consumers report them.
Pipelines
005·75

pipeline ownership

A field in the catalog set to the name of someone who left in March.

Named accountability for a pipeline's correctness, uptime, and downstream contracts, including who responds when it fails.
Pipelines
005·56

platform engineering

Building the self-service so thoroughly that no one notices service was never assigned.

The discipline of building and operating internal self-service platforms that abstract infrastructure so product teams can ship faster.
Architecture
005·75

platform migration

Rebuilding every pipeline on the new platform, then keeping the old one running for the four jobs nobody can find.

The project of moving pipelines from one orchestration or processing platform to another, typically motivated by cost, scale, or vendor consolidation.
Pipelines
658·46

poc purgatory

Where a project is funded forever for almost working.

The state in which a proof of concept neither fails nor reaches production, persisting indefinitely as a pilot.
Roles & Theater
005·74

policy exception

A temporary exemption now entering its fourth year of being temporary.

A documented, time-bound waiver permitting a deviation from policy, granted with a remediation date and an accountable approver.
Governance
342·0851

privacy by design

A principle implemented, in practice, as privacy by retrofit, two sprints after the feature shipped.

The practice of building privacy safeguards into systems and processes from inception rather than bolting them on later.
Privacy & Compliance
342·0853

privacy notice

A document written to be technically complete and structurally unreadable, which is treated as the same achievement.

A statement informing individuals what data is collected, why, and how it will be used and shared.
Privacy & Compliance
005·76

proactive data quality

Term struck pending a single budgeted owner — until prevention has a name and a calendar, it is firefighting described in the future tense.

Term struck pending a single budgeted owner — until prevention has a name and a calendar, it is firefighting described in the future tense.
Quality
006·340

prompt engineering

Writing the requirements doc you skipped, one sentence at a time, into a text box.

The practice of structuring instructions, context, and examples to steer a model toward reliable outputs.
AI / ML
006·341

prompt injection

Discovering the assistant trusts the document more than the policy that approved it.

An attack that smuggles adversarial instructions into a model's input so it overrides its intended task.
AI / ML
006·34

prompt library

The folder everyone copies from and no one was assigned to keep working.

A shared, versioned collection of approved prompts for reuse across teams.
AI / ML
344·04

protected health information

Data protected by everyone who assumed someone else was protecting it.

Individually identifiable health data held or transmitted by a covered entity, subject to statutory safeguards.
Privacy & Compliance
342·09

purpose creep

The reason the privacy notice had to be written broad enough to survive the next idea.

The gradual reuse of data for ends beyond the purpose for which it was originally collected and consented.
Privacy & Compliance
342·086

purpose limitation

The requirement that the stated purpose be broad enough to cover any future purpose nobody has thought of yet.

The requirement that data collected for one purpose not be reused for an incompatible one without a fresh basis.
Privacy & Compliance
005·76

quality scorecard

A page that averages six things no one acts on into one number no one questions.

A summary view that aggregates quality measures across dimensions and datasets into scores intended to guide attention and investment.
Quality
005·76

quality sla

A promise about data with a number, a period, and no clause describing what happens when it breaks.

A service-level agreement that commits a data producer to specified quality targets — freshness, completeness, accuracy — over a stated period.
Quality
005·76

quality slo

The target you report against when you would rather not name the agreement you report to.

A service-level objective: an internal, measurable quality target an organization holds itself to, typically expressed as a percentage over a window.
Quality
005·76

quality threshold

A number set just below the current measurement, so the check passes on the day it ships.

The boundary value at which a quality measure is deemed acceptable or unacceptable, separating a passing dataset from a failing one.
Quality
658·40

quick win

A dashboard delivered fast to defer the slow question of who acts on it. The win is quick because the hard part was left out of scope.

A small, visible result delivered early to build momentum and confidence for the larger effort behind it.
Roles & Theater
658·40

raci matrix

A grid that makes authority explicit by listing four people beside a decision none of them can make alone.

A grid mapping who is Responsible, Accountable, Consulted, and Informed for each task, so authority and answerability are explicit.
Roles & Theater
021·19

raci theater

A grid in which every cell is Consulted and no cell is ever Accountable.

The production of a responsibility-assignment matrix as the deliverable, after which the assignments are neither enforced nor revisited.
Roles & Theater
658·40

rag status

Amber, the color of a team that has seen the number and would prefer not to be quoted on it.

A red-amber-green indicator summarizing whether a metric or initiative is on track.
Analytics
342·0853

re-identification

The quiet proof that the word 'anonymized' was doing a great deal of unsupervised work.

The process of matching supposedly anonymous records back to the individuals they describe.
Privacy & Compliance
006·31

real-time dashboard

Term struck pending the definition of the decision its latency serves; until a choice depends on the second, real-time is a refresh rate, not a requirement.

Term struck pending the definition of the decision its latency serves; until a choice depends on the second, real-time is a refresh rate, not a requirement.
Analytics
005·76

reconciliation

The monthly ritual of explaining why two numbers differ, after which both are kept.

The process of comparing two or more datasets that should agree and identifying, explaining, and resolving every difference between them.
Quality
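The identification step of reconciliation can be sketched in a few lines. This minimal Python sketch assumes both datasets fit in memory as hypothetical `{key: value}` dictionaries and uses exact equality. It only lists the differences. Explaining and resolving them is left to people.

```python
def reconcile(left, right):
    """Compare two {key: value} datasets that should agree.

    Returns keys present on only one side and keys whose values differ.
    Values are compared with exact equality; a real check might need a tolerance.
    """
    left_keys, right_keys = set(left), set(right)
    return {
        "only_in_left": sorted(left_keys - right_keys),
        "only_in_right": sorted(right_keys - left_keys),
        "mismatched": sorted(
            k for k in left_keys & right_keys if left[k] != right[k]
        ),
    }


# Illustrative figures only.
ledger = {"INV-1": 100.0, "INV-2": 250.0, "INV-3": 75.0}
warehouse = {"INV-1": 100.0, "INV-2": 205.0, "INV-4": 40.0}
print(reconcile(ledger, warehouse))
# {'only_in_left': ['INV-3'], 'only_in_right': ['INV-4'], 'mismatched': ['INV-2']}
```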
651·5

records management

A lifecycle with a thorough creation stage, a generous storage stage, and a disposition stage observed mainly in theory.

The discipline of controlling records through their lifecycle, from creation to authorized disposition.
Privacy & Compliance
006·39

red-teaming theater

A documented attempt to break the model, scoped to end just before it succeeds.

Adversarial testing of an AI system performed and documented mainly to satisfy a review rather than to change the system.
AI / ML
005·26

reference architecture

A diagram everyone references and no system resembles.

A reusable template capturing recommended structures, patterns, and component choices for a class of systems.
Architecture
658·81

reference customer

The one deployment that worked, available for the call, unavailable for the details.

An existing customer a vendor cites as evidence its product succeeds in production.
Roles & Theater
006·31

reference data

The list of approved values, maintained in a spreadsheet, owned by an analyst who left in 2019.

Standardized sets of permissible values — country codes, currencies, status codes — used to classify and constrain other data.
Modeling
022·07

referential integrity

The guarantee that every reference points somewhere real, enforced everywhere except the warehouse where it was switched off for performance.

The guarantee that every foreign key points to a row that actually exists in the referenced table.
Quality
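When the database does not enforce this guarantee, it can still be checked after the fact by looking for orphaned foreign keys. The minimal Python sketch below assumes rows are hypothetical in-memory dictionaries. Following SQL's default matching for single-column foreign keys, it does not count a NULL (`None`) reference as a violation.

```python
def orphaned_keys(child_rows, fk, parent_rows, pk):
    """Return child rows whose foreign key matches no parent primary key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [
        row
        for row in child_rows
        # A None foreign key is skipped, as SQL does not check NULL references.
        if row[fk] is not None and row[fk] not in parent_keys
    ]


customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},  # no customer 3 exists
]
print(orphaned_keys(orders, "customer_id", customers, "customer_id"))
# [{'order_id': 11, 'customer_id': 3}]
```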
005·74

refresh cadence

How often the numbers change, set independently of how often anyone looks.

The scheduled interval at which a report's underlying data is reloaded and recomputed.
Analytics
343·07

regulatory reporting

The one report whose definition of 'customer' is finally agreed, because the penalty for drift is statutory.

The periodic submission of mandated data to a regulator in a prescribed format and cadence.
Privacy & Compliance
005·76

remediation

Fixing the rows now and scheduling the cause for a quarter that does not arrive.

The corrective work that brings a defective dataset back within its quality requirements and, ideally, addresses the cause that produced the defect.
Quality
005·76

remediation backlog

The list of things everyone agrees are broken, sorted by how long they have been agreed.

The accumulated queue of identified quality defects awaiting correction, prioritized against competing engineering demands.
Quality
005·74

remediation plan

A plan to fix the finding, due the week after the auditor's next visit.

A documented set of corrective actions, owners, and dates intended to close a finding or resolve a data issue.
Governance
658·84

renewal

The only product review with a hard deadline, run by the one team that never opened the tool.

The point at which a customer decides whether to continue paying for a product for another term.
Roles & Theater
658·40

reorg

The act of redrawing the boxes to avoid defining what goes in them. The ambiguity is conserved; only its address changes.

A restructuring of reporting lines and responsibilities intended to fix how an organization works.
Roles & Theater
005·75

replay

Living the same afternoon again, this time with the join written correctly.

Reprocessing a recorded stream of events from a stored offset to recover from failure or apply corrected logic.
Pipelines
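Replay from a stored offset can be sketched with a plain list. In this minimal Python sketch, the list is a hypothetical stand-in for a durable event log in which each event's index is its offset. Replay reprocesses events that earlier runs already saw, so the handler should be idempotent, or its previous output should be reset first.

```python
def replay(log, from_offset, handler):
    """Re-run handler over stored events starting at from_offset.

    log is an append-only list whose indexes serve as offsets.
    Returns the offset from which normal consumption should resume.
    """
    for offset in range(from_offset, len(log)):
        handler(log[offset])
    return len(log)


events = [{"amount": 5}, {"amount": -2}, {"amount": 7}]
totals = []
# Reprocess from offset 1 with corrected logic that records every amount.
next_offset = replay(events, 1, lambda e: totals.append(e["amount"]))
print(totals, next_offset)  # [-2, 7] 3
```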
006·31

report sprawl

Forty dashboards answering one question, each slightly, confidently wrong.

The uncontrolled proliferation of overlapping reports and dashboards across teams and tools.
Analytics
006·351

responsible ai

Term struck pending a named owner for each of fairness, accountability, transparency, and review — qualities currently assigned to everyone, which means no one.

Term struck pending a named owner for each of fairness, accountability, transparency, and review — qualities currently assigned to everyone, which means no one.
AI / ML

retrieval augmented generation

A method for making the model read the documents the organization has not finished governing.

A pattern that grounds a model's answer by retrieving relevant context from an external corpus before generation.
AI / ML
006·329

retrieval-augmented generation

A pipeline that retrieves the company's contradictory documents and asks the model to sound certain about all of them.

An architecture that retrieves relevant documents at query time and conditions a model's answer on them.
AI / ML
005·75

retry storm

A pipeline responding to a small problem with the enthusiasm of a much larger one.

A cascading overload in which many failed tasks retry simultaneously, amplifying load on an already-degraded system and prolonging the outage.
Pipelines
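A retry storm occurs when failed tasks retry in lockstep. One widely used mitigation, which the definition does not itself name, is capped exponential backoff with full jitter. The minimal Python sketch below uses illustrative values for the `base` and `cap` parameters and accepts an optional seeded random generator for reproducibility.

```python
import random


def backoff_delays(attempts, base=0.5, cap=30.0, rng=random):
    """Capped exponential backoff with full jitter, in seconds.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)], so
    tasks that failed together spread their retries out instead of
    hitting the degraded system again at the same instant.
    """
    return [
        rng.uniform(0, min(cap, base * 2 ** attempt))
        for attempt in range(attempts)
    ]


rng = random.Random(42)
print([round(d, 2) for d in backoff_delays(6, rng=rng)])
```

The ceiling doubles with each attempt until it reaches `cap`, and the actual delay is random below that ceiling. The cap keeps waits from growing without limit, and the jitter spreads the retries out so they do not arrive together.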
005·75

reverse etl

Sending the warehouse's answer back to the system that asked, so the two can be wrong in unison, at last.

The practice of moving modeled data from the warehouse back into operational systems such as CRMs and ad platforms to power workflows.
Pipelines
658·72

rfp

A two-hundred-question instrument for selecting a tool to fix a problem the questions never required anyone to define.

A request for proposal: a structured document inviting suppliers to bid against stated requirements so options can be compared on the merits.
Roles & Theater
342·0858

right to be forgotten

A right honored in production, in staging, in the warehouse, in the lake, in the backups, and in the seventeen exports nobody mapped.

A data subject's right to have personal data erased where there is no overriding lawful ground to retain it.
Privacy & Compliance
651·56

right-sized retention

Term struck pending the definition of the retention schedule and disposition practice that would make any size defensible.

Term struck pending the definition of the retention schedule and disposition practice that would make any size defensible.
Privacy & Compliance
658·40

roadmap

A map drawn for a vehicle that has not been built, on a road that resets each quarter. Everything important lives just past the visible horizon.

A sequenced plan of what an organization intends to deliver and roughly when, used to set expectations and coordinate work.
Roles & Theater
005·76

root cause analysis

The search for the underlying cause, which terminates the moment it reaches a team that can defend itself.

The disciplined investigation that traces a quality failure back from its symptom to the underlying defect in process, system, or definition that produced it.
Quality
658·49

rubber-stamp review

A gate with a hinge and no latch.

An approval step required by process that, in practice, grants approval without substantive scrutiny.
Governance
005·14

scd type 2

The full audit trail of every value the business changed its mind about, kept forever, consulted never.

A slowly-changing-dimension method that preserves history by inserting a new versioned row on each change, with effective-date and current-flag columns.
Modeling
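The versioning step described here can be sketched in Python. This minimal sketch assumes the dimension is a hypothetical list of dictionaries with `valid_from` and `valid_to` effective dates and an `is_current` flag. A change closes the current row and appends a new one. An unchanged value adds nothing.

```python
from datetime import date


def apply_scd2(history, key, new_attrs, as_of):
    """Record a change SCD Type 2 style: close the current row, insert a new one.

    history is a list of dicts with 'key', 'attrs', 'valid_from',
    'valid_to' (None while open) and 'is_current'. Mutates and returns it.
    """
    current = next(
        (row for row in history if row["key"] == key and row["is_current"]), None
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return history  # nothing changed, so no new version
        current["valid_to"] = as_of
        current["is_current"] = False
    history.append(
        {"key": key, "attrs": dict(new_attrs), "valid_from": as_of,
         "valid_to": None, "is_current": True}
    )
    return history


rows = []
apply_scd2(rows, "C-42", {"city": "Leeds"}, date(2023, 1, 1))
apply_scd2(rows, "C-42", {"city": "York"}, date(2024, 6, 1))
for row in rows:
    print(row["attrs"]["city"], row["valid_from"], row["valid_to"], row["is_current"])
# Leeds 2023-01-01 2024-06-01 False
# York 2024-06-01 None True
```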
005·75

scheduler

The component that decides when work happens, currently outvoted by a cron entry no one will admit to writing.

A component that triggers pipeline runs based on time intervals, calendar rules, or upstream events.
Pipelines
005·14

schema design

An afternoon of careful structure followed by two years of columns named after the ticket that demanded them.

The deliberate structuring of tables, columns, types, and relationships to represent a domain and serve its access patterns.
Modeling
005·75

schema drift

The source renamed a column and told no one, in the same spirit it once promised the contract was stable.

Unannounced changes to a source's structure — added, removed, or retyped fields — that a downstream pipeline was not designed to accommodate.
Pipelines
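The three kinds of drift named here (added, removed, and retyped fields) can be detected by comparing an expected schema with the observed one. This minimal Python sketch assumes each schema is a hypothetical `{column: type}` dictionary with type names as plain strings.

```python
def schema_drift(expected, observed):
    """Compare two {column: type} schemas and report what changed."""
    return {
        "added": sorted(set(observed) - set(expected)),
        "removed": sorted(set(expected) - set(observed)),
        "retyped": sorted(
            col for col in set(expected) & set(observed)
            if expected[col] != observed[col]
        ),
    }


expected = {"id": "int", "email": "string", "signup_date": "date"}
observed = {"id": "string", "email_address": "string", "signup_date": "date"}
print(schema_drift(expected, observed))
# {'added': ['email_address'], 'removed': ['email'], 'retyped': ['id']}
```

A silent rename appears here as one removal plus one addition. Name comparison alone cannot tell a rename from a dropped column and a new one.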
005·14

schema evolution

The fossil record of the data model: every layer a column someone added and no one was ever allowed to remove.

The managed change of a schema over time — adding, deprecating, or altering fields — while preserving compatibility for existing consumers.
Modeling
658·40

scorecard

A scorecard that keeps the score but never names the player who lost.

A compact tabular view that tracks a fixed set of metrics against their targets over time.
Analytics
005·14

self-describing schema

Term struck pending the business glossary, owner, and agreed meanings a schema cannot describe about itself.

Term struck pending the business glossary, owner, and agreed meanings a schema cannot describe about itself.
Modeling
006·31

self-service analytics

Unsupported.

A model that lets business users build their own queries and reports without filing a request to a central team.
Analytics
005·74

self-service governance

Self-service, until two of the services certify the same number differently.

A model where analysts build freely within shared definitions, certifications, and review enforced by the platform.
Analytics
006·32

semantic cache

A faster way to return last week's answer to a definition that changed on Tuesday.

A store of prior model responses keyed by meaning, returned to avoid recomputing similar queries.
Architecture
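"Keyed by meaning" usually means a lookup by embedding similarity rather than exact text. The minimal Python sketch below uses cosine similarity and a linear scan. The toy three-number vectors are hypothetical stand-ins for the output of an embedding model, and the 0.95 threshold is arbitrary.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Return a stored response when a new query's embedding is close enough."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, embedding):
        best = max(self.entries, key=lambda e: cosine(e[0], embedding), default=None)
        if best is not None and cosine(best[0], embedding) >= self.threshold:
            return best[1]
        return None  # cache miss: caller should compute a fresh response

    def put(self, embedding, response):
        self.entries.append((embedding, response))


cache = SemanticCache()
cache.put([1.0, 0.0, 0.0], "Revenue is booked at invoice date.")
print(cache.get([0.99, 0.05, 0.0]))  # close enough: returns the stored answer
print(cache.get([0.0, 1.0, 0.0]))    # unrelated: None
```

This sketch has no expiry or invalidation. A cached answer survives any later change to the definition it was built on.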
025·49

semantic drift

The slow process by which 'customer' comes to mean six things, none of them on file.

The gradual divergence of a term's working meaning across teams and time, eroding trust in shared data.
Governance
005·74

semantic layer

The place where the company agreed what 'revenue' means, except for the four teams that didn't get the invite.

A modeling tier that maps physical tables to shared business definitions so every tool computes a metric the same way.
Analytics
005·74

semantic model

The layer where the business finally agrees what 'revenue' means, defined four times to match four dashboards.

A layer that maps physical tables to business concepts, metrics, and dimensions so consumers query meaning rather than raw schema.
Modeling
006·332

semantic search

Search that understands the question better than the org understands the answer.

Retrieval that ranks results by learned meaning similarity rather than exact keyword match.
AI / ML
005·45

serving layer

The exit, drawn as a layer to balance the door.

The tier that exposes prepared data to consumers — applications, reports, APIs — optimized for fast, concurrent reads.
Architecture
006·350

shadow ai

The actual deployment, running one floor below the AI strategy describing it.

Models or assistants adopted by teams outside any inventory, review, or governance process.
AI / ML
005·54

shadow analytics

The numbers the business actually uses, kept in the one place governance agreed not to look.

Reporting and metrics produced outside the governed platform, typically in spreadsheets, that inform real decisions.
Analytics
005·86

shadow copy

The copy that the deletion request, the lineage diagram, and the retention schedule all politely declined to know about.

An unmanaged duplicate of governed data, created outside official systems and absent from the inventory.
Privacy & Compliance
005·75

shadow pipeline

The pipeline the org does not know it depends on, discovered the week the person who ran it stops answering.

An undocumented pipeline built outside the governed platform, often a spreadsheet macro or personal script, that feeds production reporting.
Pipelines
006·31

single pane of glass

One pane of glass over eleven dashboards — the same view, now load-bearing and impossible to close.

A unified interface that consolidates metrics from many systems into one consistent view.
Analytics
005·74

single source of truth

Term struck pending the definitions of ownership, metric, and certification it depends on; without them the source is single but the truth is plural.

Term struck pending the definitions of ownership, metric, and certification it depends on; without them the source is single but the truth is plural.
Analytics
005·76

single version of the truth

The truth that everyone agreed to, and then maintained a personal copy of, just in case.

The architectural goal of one authoritative, agreed source for each business fact, eliminating conflicting figures across reports and systems.
Quality
005·14

slowly changing dimension

An attribute the business called permanent in the requirements meeting and changed twice before launch.

A dimension whose attribute values change over time, managed by a defined policy for whether history is overwritten, versioned, or preserved.
Modeling
005·14

snowflake schema

What a star schema becomes after each team adds the one extra table that only their report needs.

A star schema whose dimensions are normalized into additional related tables, reducing redundancy at the cost of more joins.
Modeling
658·40

stakeholder

A title that grants a seat at the review and immunity from the result. Every stakeholder holds the stake; no stakeholder holds the decision.

A person or group with a legitimate interest in an outcome, whose needs a decision must account for.
Roles & Theater
005·76

staleness

The state a dashboard reaches quietly, while still rendering its last good number with full confidence.

The condition of data that has aged past its acceptable freshness window and may no longer reflect the current state of the world.
Quality
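A staleness check compares the data's age with its freshness window. This minimal Python sketch assumes the load time is recorded as a timezone-aware UTC timestamp. The 24-hour window in the example is illustrative.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_loaded_at, freshness_window, now=None):
    """True if data loaded at last_loaded_at is older than freshness_window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_loaded_at > freshness_window


loaded = datetime(2024, 6, 8, 6, 0, tzinfo=timezone.utc)
check_time = datetime(2024, 6, 9, 9, 0, tzinfo=timezone.utc)
print(is_stale(loaded, timedelta(hours=24), now=check_time))  # True (27 hours old)
```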
005·14

star schema

A clean diagram at the center of the deck, surrounded on all sides by the joins that did not fit on the slide.

A dimensional design in which a central fact table joins directly to denormalized dimension tables, forming a star-shaped query path.
Modeling
658·40

steering committee

A body that steers by approving the direction already taken.

A senior body that sets governance direction, approves funding, and removes organizational obstacles for the program.
Governance
658·40

stewardship raci

Three columns fully staffed; the Accountable column reserved for the hire the org is still recruiting.

A matrix mapping who is Responsible, Accountable, Consulted, and Informed for each governance activity, designed to remove ambiguity from ownership.
Governance
658·40

stewardship coverage

Reported at ninety percent because the unassigned ten percent had no one to report it.

The proportion of critical data elements or domains that have a named, active steward assigned.
Governance
005·32

storage and compute separation

Two invoices where there used to be one.

A design that decouples persistent storage from processing resources so each can scale and be billed independently.
Architecture
658·40

strategic priority

An item declared a top priority alongside the other nine top priorities. Priority is a count of one; the plural is the tell.

An aim ranked high enough to attract scarce time, money, and attention ahead of competing work.
Roles & Theater
005·75

stream processing

Real-time delivery of a number no downstream meeting is scheduled to act on for another quarter.

Processing data continuously as unbounded events arrive, optimizing for low latency over batch throughput.
Pipelines
005·40

streaming architecture

Real-time delivery of a number no one has agreed how to define yet.

A design that processes data continuously as it arrives, supporting low-latency analytics and reaction over unbounded streams.
Architecture
342·087

subprocessor

A name on page nine of a list nobody re-reads, who in turn keeps a list of their own that you will never see.

A third party engaged by a processor to carry out specific processing on the controller's behalf.
Privacy & Compliance
005·14

surrogate key

A number we invented so we would stop arguing about which real-world thing it points to. The argument moved one table over.

A system-generated identifier with no business meaning, used as a stable primary key independent of changing natural attributes.
Modeling
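Surrogate keys are typically assigned the first time a natural key is seen and then reused. This minimal Python sketch keeps the mapping in memory under a hypothetical class name. In practice the mapping would be a persisted key table, and deciding that two natural keys denote the same entity remains a separate decision.

```python
from itertools import count


class SurrogateKeyMap:
    """Assign stable integer surrogate keys to natural keys on first sight."""

    def __init__(self):
        self._next = count(1)
        self._keys = {}

    def key_for(self, natural_key):
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._next)
        return self._keys[natural_key]


keys = SurrogateKeyMap()
print(keys.key_for(("CRM", "ACME-001")))  # 1
print(keys.key_for(("ERP", "0004471")))   # 2
print(keys.key_for(("CRM", "ACME-001")))  # 1, same natural key, same surrogate
```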
006·31

survivorship

The logic that picks which of three addresses is real, optimized so the survivor is always the one entered most recently and least correctly.

The rules that decide which attribute value wins when merging duplicate records into a single golden record.
Modeling
005·76

survivorship rule

The logic that picks which department was right, then routes the complaint to the other one.

A defined policy that decides which value wins when multiple sources disagree about the same attribute of an entity.
Quality
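The sketch below shows one common shape of survivorship rule, under stated assumptions. Sources are ranked by trust, and for each attribute the highest-ranked non-null value survives. The source names and ranking in `SOURCE_PRIORITY` are hypothetical. Recency, completeness, or a separate ranking per attribute would be equally valid rules.

```python
from typing import Any

# Hypothetical trust ranking per source: lower number wins.
SOURCE_PRIORITY = {"crm": 0, "billing": 1, "web_form": 2}


def survive(records: list[dict[str, Any]], attributes: list[str]) -> dict[str, Any]:
    """Build one golden record from duplicates: for each attribute, keep the
    non-null value from the highest-priority source that has one."""
    ranked = sorted(records, key=lambda r: SOURCE_PRIORITY[r["source"]])
    return {
        attr: next((r[attr] for r in ranked if r.get(attr) is not None), None)
        for attr in attributes
    }


duplicates = [
    {"source": "web_form", "email": "x@example.com", "phone": "555-0101"},
    {"source": "crm", "email": "x@corp.example", "phone": None},
]
golden = survive(duplicates, ["email", "phone"])
```

The golden record mixes sources. The email comes from the CRM, and the phone comes from the web form because the CRM had none.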
658·40

synergy

The named benefit of a merger, claimed before either side has agreed what 'customer' means in their respective tables.

The combined effect of two efforts exceeding the sum of their separate results.
Roles & Theater
006·337

synthetic data

Invented to fill a gap, and shaped exactly like the assumption that left it.

Artificially generated records used to augment or stand in for scarce or sensitive real data.
AI / ML
005·76

system of record

Term struck pending the definitions of accuracy, ownership, and reconciliation it claims to settle — a record is not authoritative because it was named first.

Term struck pending the definitions of accuracy, ownership, and reconciliation it claims to settle — a record is not authoritative because it was named first.
Quality
658·45

take it offline

The polite retirement of the only question that mattered, to a venue that is never named and a time that never comes.

To move a detailed or contentious topic out of a meeting into a smaller venue with the right people.
Roles & Theater
005·51

target-state architecture

A state the organization is permanently transitioning toward, on a slide updated quarterly.

The documented future-state design an organization intends to reach, against which current-state gaps and a transition plan are measured.
Architecture
005·75

task dependency

The reason the report is late, traced backward through eleven tasks to a file someone renamed.

A declared relationship requiring one pipeline task to complete successfully before a downstream task begins.
Pipelines
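This sketch shows how declared dependencies become an execution order, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The task names and the `run` helper are hypothetical. Failure is simulated by naming tasks in `failing`, and any task downstream of a failure is skipped rather than run.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join_orders_customers": {"extract_orders", "extract_customers"},
    "build_report": {"join_orders_customers"},
}


def run(tasks: dict[str, set[str]], failing: frozenset[str] = frozenset()) -> list[str]:
    """Walk tasks in dependency order and return the ones that succeeded.

    A task runs only if every upstream task succeeded; tasks named in
    `failing` simulate a failure, so everything downstream of them is skipped.
    """
    succeeded: list[str] = []
    for task in TopologicalSorter(tasks).static_order():
        if not tasks[task] <= set(succeeded):
            continue  # an upstream task failed or was skipped
        if task in failing:
            continue  # simulated failure
        succeeded.append(task)  # stand-in for actually executing the task
    return succeeded
```

`run(dependencies)` completes all four tasks and runs `build_report` last. `run(dependencies, frozenset({"extract_orders"}))` returns only `["extract_customers"]`, because the join and the report both sit downstream of the failure.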
006·31

taxonomy

A tree of categories the steering committee approved in March and routed around by April.

A hierarchical classification scheme that arranges concepts into broader and narrower categories with defined membership.
Modeling
005·75

temporary job

Term struck pending the removal date promised at creation, now three years past and load-bearing in four reports.

Term struck pending the removal date promised at creation, now three years past and load-bearing in four reports.
Pipelines
658·40

the abstract owner

Everyone agreed it was owned. No one could say by whom. The asset has a guardian the way the weather has a manager.

A placeholder of accountability assigned to a team, function, or committee rather than a person who can be reached on a Tuesday.
Roles & Theater
658·00

the business

The party to whom accountability is assigned precisely because it is not a person. The requirement came from the business, was approved by the business, and was owned, in the end, by no one who answers to that name.

The operating side of an organization that uses data to make and sell things, as distinct from the functions that supply and govern that data.
Roles & Theater
005·14

third normal form

A standard the schema fully met on the day of the design review and has been drifting away from ever since.

A normalization level in which every non-key attribute depends on the key, the whole key, and nothing but the key.
Modeling
342·087

third-party data sharing

A defined legal basis covering the vendor, the vendor's vendor, and the analytics tag nobody put in the contract.

The disclosure of personal data to an external party under a defined legal basis and contract.
Privacy & Compliance
658·45

thought leadership

Term struck pending the discovery of either the thought or the leadership; the field reports only the hyphen.

Term struck pending the discovery of either the thought or the leadership; the field reports only the hyphen.
Roles & Theater
021·53

tiger team

The same five people, summoned again, because the org has confused availability with capacity.

A small group pulled from their regular duties to resolve an urgent, high-visibility problem under compressed timelines.
Roles & Theater
005·76

timeliness

The window between when the data was right and when anyone looked at it.

The degree to which data is available within the window required for its intended decision or process.
Quality
006·354

token

The smallest thing in the program with a clearly assigned cost and owner.

The sub-word unit a model reads and generates, and the unit by which usage is billed.
AI / ML
005·824

tokenization

A reversible privacy control whose entire safety rests on a vault that three integrations have read access to 'temporarily.'

Substituting a sensitive value with a non-sensitive token, with the mapping held in a separately controlled vault.
Privacy & Compliance
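Below is a toy sketch of vault-based tokenization. It assumes an in-memory dictionary stands in for the separately controlled vault. The `TokenVault` class and the `tok_` prefix are hypothetical. Tokens are random rather than derived from the value. A repeated value maps to the same token, so joins on the tokenized column still line up.

```python
import secrets


class TokenVault:
    """Toy vault (hypothetical): swaps sensitive values for random tokens.
    In practice the mapping lives in a separately controlled, audited store."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Same input, same token, so joins on the tokenized column keep working.
        if value not in self._value_to_token:
            token = "tok_" + secrets.token_hex(8)  # random, so not derivable from the value
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenize(self, token: str) -> str:
        # The reversible step: whoever can call this holds the real data.
        return self._token_to_value[token]


vault = TokenVault()
t = vault.tokenize("4111 1111 1111 1111")
```

`detokenize` is the reversible step the entry warns about. The control is only as strong as the list of callers allowed to reach it.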
006·38

tool sprawl

Forty tools the agent may call, eleven that do the same thing, none that anyone will deprecate.

The accumulation of overlapping tools an agent can call, added without a registry of what already exists.
AI / ML
658·45

town hall

A broadcast styled as a conversation: the questions are pre-submitted, the hard one is read aloud, and the answer is taken offline, where it remains.

An all-hands gathering where leadership shares direction and the wider organization can ask questions.
Roles & Theater
006·316

training data

Whatever was easiest to export, now load-bearing.

The labeled or observed examples a model learns from, whose coverage and quality bound everything the model can do.
AI / ML
005·75

transformation

Where the business logic lives, defined by whoever happened to write the join and unavailable for comment.

The step in a pipeline that applies business logic to reshape, clean, join, or aggregate raw data into a form fit for analysis.
Pipelines
658·40

tribal knowledge

The real documentation, sharded across the people most likely to resign.

Operational understanding held informally in people's heads rather than in documentation or systems.
Roles & Theater
005·76

trusted dataset

A dataset trusted because it was labeled trusted, by the team that did the labeling.

A dataset designated as reliable for decision-making on the basis of documented quality controls, lineage, and ownership.
Quality
005·76

uniqueness

The dimension that is perfect right up to the first merger.

The degree to which each real-world entity is represented by exactly one record, with no unintended duplicates.
Quality
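This sketch shows one way to score uniqueness. It assumes that the hard part, deciding when two records describe the same entity, has been reduced to a match key. Here the hypothetical key is a trimmed, lower-cased email. The score is distinct entities divided by records, so 1.0 means no duplicates were found under that key.

```python
from collections import Counter
from typing import Callable


def uniqueness(records: list[dict], match_key: Callable[[dict], str]) -> tuple[float, dict[str, int]]:
    """Return (distinct entities / records, {match key: count} for keys seen more than once)."""
    if not records:
        return 1.0, {}
    counts = Counter(match_key(r) for r in records)
    duplicates = {key: n for key, n in counts.items() if n > 1}
    return len(counts) / len(records), duplicates


customers = [{"email": "A@example.com"}, {"email": "a@example.com "}, {"email": "b@example.com"}]
score, dupes = uniqueness(customers, lambda r: r["email"].strip().lower())
```

The score is only as good as the match key. With the exact, untrimmed email as the key, the same three rows would score a perfect 1.0.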
005·14

universal data model

Term struck pending the definition of a single business term it claims to model for every business at once.

Term struck pending the definition of a single business term it claims to model for every business at once.
Modeling
005·76

validation

Confirming the data is the right shape, and inferring nothing about whether it is true.

The act of checking data against defined rules at the point of entry, transformation, or load to prevent invalid records from propagating.
Quality
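Here is a minimal sketch of rule-based validation at load time. It assumes rules are plain predicates keyed by field. The field names and allowed currencies are hypothetical. The three rules illustrate the type-and-range, domain, and format checks the glossary lists under validity. Rows that fail are set aside with their reasons instead of being loaded.

```python
import re
from typing import Any, Callable

# Hypothetical rules for an orders feed: field -> predicate that must hold.
RULES: dict[str, Callable[[Any], bool]] = {
    "order_id": lambda v: isinstance(v, int) and v > 0,  # type and range
    "currency": lambda v: v in {"EUR", "USD", "GBP"},    # domain of allowed values
    "email": lambda v: (
        isinstance(v, str) and re.fullmatch(r"[^@\s]+@[^@\s]+", v) is not None
    ),                                                   # format (shape only)
}


def validate(rows: list[dict]) -> tuple[list[dict], list[tuple[dict, list[str]]]]:
    """Split rows into those passing every rule and those failing, with failed fields."""
    valid, rejected = [], []
    for row in rows:
        failures = [field for field, rule in RULES.items() if not rule(row.get(field))]
        if failures:
            rejected.append((row, failures))  # held back instead of loaded downstream
        else:
            valid.append(row)
    return valid, rejected


rows = [
    {"order_id": 7, "currency": "EUR", "email": "a@example.com"},
    {"order_id": -1, "currency": "eur", "email": "not-an-email"},
]
valid, rejected = validate(rows)
```

A row can pass every rule and still be wrong. `a@example.com` is well formed whether or not it belongs to the customer.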
005·76

validity

Proof that a value followed the rules, and no proof at all that the rules described anything true.

The degree to which a value conforms to its defined format, type, range, or domain of allowed values.
Quality
658·40

vanity funnel

A funnel widened at the top until the conversion rate finally cleared the target.

A conversion funnel whose stages are defined to flatter the figure rather than to model the buyer.
Analytics
658·40

vanity metric

A number that goes up to make a slide feel better and a quarter look busier.

A figure that reliably trends upward but does not inform any decision or change any behavior.
Analytics
005·76

vanity quality metric

The metric that goes up while trust goes down.

A quality measure chosen because it reliably reads well rather than because it reflects fitness for use or drives any decision.
Quality
006·331

vector database

The new system of record for documents no one would govern in the old system of record.

A store optimized to index embeddings and return nearest-neighbor matches for similarity search.
AI / ML
658·80

vendor demo

A film in which your problem is played by a stand-in with clean data, good lighting, and a definition of customer everyone in the scene shares.

A guided presentation in which a supplier shows its product solving a problem, ideally one the buyer actually has.
Roles & Theater
025·49

vocabulary court

The hearing where five teams arrive certain that 'active user' is obvious and leave with six definitions.

An informal review forum where conflicting definitions of a contested term are reconciled into one agreed entry.
Governance
658·40

war room

The room convened in place of the ownership that would have prevented the incident. The reality tax, billed in conference-room hours.

A temporary, intensive gathering of the right people to resolve a critical problem in real time.
Roles & Theater
005·75

watermark

A single stored number that is the difference between an incremental load and a four-hour incident.

A tracked marker, often a timestamp or sequence value, recording how far a pipeline has processed so the next run knows where to resume.
Pipelines
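The sketch below implements a timestamp high-water mark. It assumes each source row carries an `updated_at` value that only moves forward. The function name and row shape are hypothetical. Persisting the returned watermark between runs is left to the caller.

```python
from datetime import datetime


def incremental_load(source_rows: list[dict], watermark: datetime) -> tuple[list[dict], datetime]:
    """Return rows changed since the stored watermark, plus the watermark to store next."""
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]
    if not new_rows:
        return [], watermark  # nothing new: keep the old marker
    return new_rows, max(r["updated_at"] for r in new_rows)


rows = [
    {"id": 1, "updated_at": datetime(2024, 6, 1)},
    {"id": 2, "updated_at": datetime(2024, 6, 3)},
]
batch, wm = incremental_load(rows, datetime(2024, 6, 2))  # picks up row 2 only
again, wm2 = incremental_load(rows, wm)                    # rerun: nothing to load
```

The strict `>` comparison stops reruns from reloading old rows. It also silently skips any row that arrives late with a timestamp at or below the stored watermark. Pipelines commonly subtract a lookback window and deduplicate to cover that case.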
658·40

working group

A standing invitation that survives the problem it was formed to solve, named for the one activity it most reliably defers.

A small cross-functional team convened to make progress on a specific problem between formal decisions.
Roles & Theater
044·04

write-audit-publish

A pattern that writes, audits, and publishes, in which the audit step is a boolean permanently pinned to true.

A pattern that writes data to a staging area, runs quality checks, and publishes only if those checks pass.
Architecture
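This sketch shows the pattern under two assumptions: tables are entries in a dictionary, and checks are named predicates over the staged rows. The table names `orders` and `orders_staging` and the two checks are hypothetical. The pattern lives in the branch: production is replaced only when the list of failed checks is empty.

```python
from typing import Callable

# Hypothetical checks: name -> predicate over the staged rows.
CHECKS: dict[str, Callable[[list[dict]], bool]] = {
    "not_empty": lambda rows: len(rows) > 0,
    "ids_unique": lambda rows: len({r["id"] for r in rows}) == len(rows),
}


def write_audit_publish(batch: list[dict], tables: dict[str, list[dict]]) -> list[str]:
    """Stage, check, and publish a batch; return the names of any failed checks."""
    tables["orders_staging"] = list(batch)          # write: land where no consumer reads
    staged = tables["orders_staging"]
    failed = [name for name, check in CHECKS.items() if not check(staged)]  # audit
    if failed:
        return failed                               # no publish; staging kept for inspection
    tables["orders"] = tables.pop("orders_staging") # publish: promote the audited batch
    return []


tables = {"orders": [{"id": 1}]}
first = write_audit_publish([{"id": 2}, {"id": 2}], tables)   # duplicate ids: blocked
second = write_audit_publish([{"id": 2}, {"id": 3}], tables)  # clean: published
```

The audit earns its name only if `failed` can come back non-empty. A check that always returns `True` reproduces the punchline above.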
005·43

zero-copy architecture

Zero copies of the data and the same number of agreements about what it means.

A design that grants access to data in place — through sharing, views, or references — without duplicating it across systems.
Architecture
044·09

zero-copy clone

A free instant copy of production, of which there are now forty, each owned by someone who has left.

A storage feature that creates an independent, writable copy of a dataset by sharing underlying data until it is changed.
Architecture
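Below is a toy, in-memory model of copy-on-write cloning. It assumes a table is a list of references to immutable blocks. Real platforms do this in storage metadata rather than in Python lists, and the `Table` class here is hypothetical. The sketch only shows why a clone is instant and why it stays independent once written to.

```python
class Table:
    """Toy table held as a list of references to immutable blocks (tuples of rows)."""

    def __init__(self, blocks: list[tuple]) -> None:
        self.blocks = list(blocks)  # new list, same block objects

    def clone(self) -> "Table":
        # Zero-copy: the clone gets its own list of references; no row data is copied.
        return Table(self.blocks)

    def update_block(self, index: int, rows: list) -> None:
        # Copy-on-write: only this table's reference is repointed to a new block,
        # so other clones keep seeing the original.
        self.blocks[index] = tuple(rows)


prod = Table([(1, 2), (3, 4)])
dev = prod.clone()
dev.update_block(1, [30, 40])
```

After the write, the two tables still share the untouched first block. Only the changed block exists twice.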
005·76

zero-defect data

Term struck pending the definition of fitness-for-use it depends on — there is no defect without a stated requirement to violate.

Term struck pending the definition of fitness-for-use it depends on — there is no defect without a stated requirement to violate.
Quality
005·75

zero-ETL

Term struck pending an account of where the transform went, who owns the source schema now, and what happens to the join when it drifts.

Term struck pending an account of where the transform went, who owns the source schema now, and what happens to the join when it drifts.
Pipelines