Vol. I · No. 251
Mon · 8 Jun
A Daily Lexicon of Trustworthy Data
The Lexicon

006·324

model evaluation

/ˈmɒdəl ɪˌvæljuˈeɪʃən/ - n.

1. [colloq.] The benchmark the model passes, chosen after seeing which benchmark it passes.

Working definition

2. The structured measurement of a model's accuracy, calibration, and failure modes against a held-out reference.
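A minimal sketch of the working definition, assuming a classifier whose held-out predictions each come with a confidence score in [0, 1]. The function names (`accuracy`, `expected_calibration_error`, `error_rate_by_slice`) and the slice tags are hypothetical. The ten equal-width bins are one common way to estimate calibration error, not the only one.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def _check_lengths(*seqs: Sequence) -> int:
    """Require non-empty sequences of equal length; return that length."""
    n = len(seqs[0])
    if n == 0 or any(len(s) != n for s in seqs):
        raise ValueError("inputs must be non-empty and of equal length")
    return n


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of held-out examples whose prediction matches the reference label."""
    n = _check_lengths(labels, preds)
    return sum(y == p for y, p in zip(labels, preds)) / n


def expected_calibration_error(
    labels: Sequence[int],
    preds: Sequence[int],
    confidences: Sequence[float],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE: bin-size-weighted mean of |bin accuracy - bin mean confidence|."""
    n = _check_lengths(labels, preds, confidences)
    bins = [[] for _ in range(n_bins)]
    for y, p, c in zip(labels, preds, confidences):
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence {c} is outside [0, 1]")
        # A confidence of exactly 1.0 is placed in the top bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((y == p, c))
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        bin_acc = sum(correct for correct, _ in b) / len(b)
        bin_conf = sum(c for _, c in b) / len(b)
        ece += (len(b) / n) * abs(bin_acc - bin_conf)
    return ece


def error_rate_by_slice(
    labels: Sequence[int], preds: Sequence[int], slices: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Error rate per slice tag, to surface failure modes an aggregate score hides."""
    _check_lengths(labels, preds, slices)
    totals: Dict[Hashable, int] = defaultdict(int)
    errors: Dict[Hashable, int] = defaultdict(int)
    for y, p, s in zip(labels, preds, slices):
        totals[s] += 1
        errors[s] += y != p
    return {s: errors[s] / totals[s] for s in totals}


if __name__ == "__main__":
    labels = [1, 0, 1, 1]
    preds = [1, 0, 0, 1]
    confidences = [0.9, 0.8, 0.6, 0.7]
    slices = ["short", "short", "long", "long"]  # hypothetical slice tags
    print(accuracy(labels, preds))  # 0.75
    print(expected_calibration_error(labels, preds, confidences))  # approximately 0.3
    print(error_rate_by_slice(labels, preds, slices))  # {'short': 0.0, 'long': 0.5}
```

Reporting the three numbers side by side is the point: a single accuracy figure can hide both overconfidence and a slice where the model fails outright.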

See also
  • eval harness: A scoreboard everyone trusts and no one has read the rules of.
  • ground truth: One analyst's spreadsheet, promoted to truth because no one else volunteered.
  • hallucination: The model answering a question the organization had also never answered, only faster.
  • model drift: The model did not change. The world did, and no one was watching the gap.