Vol. I · No. 251
Mon · 8 Jun
A Daily Lexicon of Trustworthy Data
The Lexicon

006·37

llm-as-judge

/ˌɛl ɛl ˈɛm æz dʒʌdʒ/ - n.

1. [colloq.] Outsourcing the question of whether an answer is good to a second system that also does not know.

Working definition

2. Using one language model to score another model's outputs against a rubric, in place of human grading.
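Below is a minimal Python sketch of the working definition, assuming a rubric-and-score setup. The rubric text, the `Score: <n>` reply format, and every function name (`build_judge_prompt`, `parse_score`, `judge_outputs`, `exact_agreement`, `toy_judge`) are hypothetical. `toy_judge` stands in for a real model call. Comparing the judge's scores with human grades on the same items is one simple way to check whether the judge can actually replace human grading.

```python
import re
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical rubric; the entry does not specify one.
RUBRIC = (
    "Score the ANSWER to the QUESTION from 1 (wrong or unsupported) "
    "to 5 (correct and complete). Reply with 'Score: <n>' only."
)

def build_judge_prompt(question: str, answer: str, rubric: str = RUBRIC) -> str:
    """Assemble the text sent to the judge model."""
    return f"{rubric}\n\nQUESTION:\n{question}\n\nANSWER:\n{answer}\n"

def parse_score(reply: str, low: int = 1, high: int = 5) -> Optional[int]:
    """Extract 'Score: n' from the judge's reply; None if missing or out of range."""
    match = re.search(r"Score:\s*(\d+)", reply)
    if match is None:
        return None
    score = int(match.group(1))
    return score if low <= score <= high else None

def judge_outputs(
    items: Sequence[Tuple[str, str]],
    judge: Callable[[str], str],
) -> List[Optional[int]]:
    """Score each (question, answer) pair by sending a rubric prompt to `judge`."""
    return [parse_score(judge(build_judge_prompt(q, a))) for q, a in items]

def exact_agreement(
    judge_scores: Sequence[Optional[int]], human_scores: Sequence[int]
) -> float:
    """Fraction of items where the judge's score equals the human grade.
    Unparseable judge replies (None) count as disagreement."""
    if not human_scores or len(judge_scores) != len(human_scores):
        raise ValueError("need equal-length, non-empty score lists")
    matches = sum(j == h for j, h in zip(judge_scores, human_scores))
    return matches / len(human_scores)

def toy_judge(prompt: str) -> str:
    """Hypothetical stand-in for a model call: rewards any answer mentioning Paris."""
    answer = prompt.split("ANSWER:\n", 1)[1]
    return "Score: 5" if "Paris" in answer else "Score: 2"

items = [
    ("Capital of France?", "Paris"),
    ("Capital of France?", "Lyon"),
    ("Capital of Italy?", "Paris, probably"),
]
human = [5, 1, 1]  # grades a human might assign to the same three answers

scores = judge_outputs(items, toy_judge)
print(scores)                                    # [5, 2, 5]
print(round(exact_agreement(scores, human), 3))  # 0.333
```

On this toy sample the stand-in judge rewards any answer that mentions Paris, so it agrees with the human grade on only one item in three: the second system does not know either.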

Evidence
See also
  • eval set: The exam the model is allowed to study before every retake.
  • ground truth: One analyst's spreadsheet, promoted to truth because no one else volunteered.
  • model evaluation: The benchmark the model passes, chosen after seeing which benchmark it passes.