Skip to content
Vol. I · No. 251
Mon · 8 Jun
A Daily Lexicon of Trustworthy Data
The Lexicon

006·326

benchmark saturation

/ˈbɛntʃmɑːrk ˌsætʃəˈreɪʃən/ - n.

1 [colloq.] Everyone got an A, so the test resigned.Keep. Punchy.This is the problem.

Working definition

2. The point at which models score near-perfectly on a benchmark, so it no longer distinguishes their real capability.

Filed
See also
  • ai maturityTerm struck: cannot be measured until the field defines the quality, ownership, and monitoring the score is meant to rate.
  • eval harnessA scoreboard everyone trusts and no one has read the rules of.
  • ground truthOne analyst's spreadsheet, promoted to truth because no one else volunteered.
  • model evaluationThe benchmark the model passes, chosen after seeing which benchmark it passes.