AI evaluations and standards (or "evals") are processes that check or audit AI models. Evaluations can focus on how powerful models are ("capability evaluations") or on whether models exhibit dangerous behaviors or are misaligned ("alignment evaluations" or "safety evaluations"). Working on AI evaluations might involve developing standards and enforcing compliance with them. Evaluations can help labs determine whether it's safe to deploy new models, and can help with AI governance and regulation.

Further reading

LessWrong (2023) AI Evaluation posts.

Karnofsky, Holden (2022) Racing through the minefield, Cold Takes, December 22.

Karnofsky, Holden (2022) AI Safety Seems Hard to Measure, Cold Takes, December 8.

Alignment Research Center (2023) Evals: A project of the non-profit Alignment Research Center focused on evaluating the capabilities and alignment of advanced ML models.

Barnes, Beth (2023) Safety evaluations and standards for AI, EAG Bay Area, March 20.

Related entries