006·39

red-teaming theater

/ˈrɛd ˌtiːmɪŋ ˈθɪətər/ - n.

1 [colloq.] A documented attempt to break the model, scoped to end just before it succeeds.Keep. Punchy.This is the problem.

Working definition

2. Adversarial testing of an AI system performed and documented mainly to satisfy a review rather than to change the system.

Filed

See also

guardrailsRules that enforce a policy, written for the policy the org has not yet enforced anywhere else.
model evaluationThe benchmark the model passes, chosen after seeing which benchmark it passes.
responsible aiTerm struck pending a named owner for each of fairness, accountability, transparency, and review — qualities currently assigned to everyone, which means no one.