Vol. I · No. 251
Mon · 8 Jun
A Daily Lexicon of Trustworthy Data
The Lexicon

006·325

eval harness

/ɪˈvæl ˈhɑːrnəs/ - n.

1. [colloq.] A scoreboard everyone trusts and no one has read the rules of.

Working definition

2. The reusable apparatus that runs a model against fixed test cases and records scored outputs.
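A minimal sketch of the working definition follows. It assumes a "model" is any function from a prompt string to an output string, and that scoring is a case-insensitive exact match. The names `EvalCase`, `ScoredOutput`, `exact_match`, `run_harness`, and `mean_score` are hypothetical and do not come from any particular evaluation library. The scorer is the part of the harness that encodes "the rules" of the scoreboard in sense 1.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    """One fixed test case: an input prompt and its reference output (hypothetical type)."""
    case_id: str
    prompt: str
    expected: str


@dataclass(frozen=True)
class ScoredOutput:
    """One recorded result: what the model produced and how it scored."""
    case_id: str
    output: str
    score: float


def exact_match(output: str, expected: str) -> float:
    """Assumed scoring rule: 1.0 if whitespace- and case-normalized strings match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_harness(
    model: Callable[[str], str],
    cases: Sequence[EvalCase],
    scorer: Callable[[str, str], float] = exact_match,
) -> list[ScoredOutput]:
    """Run the model on every fixed case and record each scored output."""
    records = []
    for case in cases:
        output = model(case.prompt)
        records.append(ScoredOutput(case.case_id, output, scorer(output, case.expected)))
    return records


def mean_score(records: Sequence[ScoredOutput]) -> float:
    """Collapse the recorded scores into the single number on the scoreboard."""
    return sum(r.score for r in records) / len(records) if records else 0.0


if __name__ == "__main__":
    cases = [
        EvalCase("c1", "2 + 2 =", "4"),
        EvalCase("c2", "Capital of France?", "Paris"),
    ]
    # A stand-in "model": a lookup table, purely for illustration.
    answers = {"2 + 2 =": "4", "Capital of France?": "Lyon"}
    records = run_harness(lambda p: answers.get(p, ""), cases)
    for r in records:
        print(r)
    print("mean score:", mean_score(records))
```

Because the cases and the scorer are fixed, the harness is reusable: swapping in a different `model` callable yields directly comparable records. That comparison only means something if the scoring rule is actually read.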

Filed
See also
  • benchmark saturation: Everyone got an A, so the test resigned.
  • ground truth: One analyst's spreadsheet, promoted to truth because no one else volunteered.
  • model evaluation: The benchmark the model passes, chosen after seeing which benchmark it passes.
  • model registry: A list of every model in production except the three a team is quietly running from a notebook.