006·37
llm-as-judge
/ˌɛl ɛl ˈɛm æz dʒʌdʒ/ - n.
1 [colloq.] Outsourcing whether the answer is good to a second system that also does not know.Keep. Punchy.This is the problem.
Working definition
2. Using one language model to score another model's outputs against a rubric, in place of human grading.