Company
Anthropic
Date Published
July 1, 2024
Author
-
Word count
1877
Language
English
Hacker News points
2

Summary

A new initiative has been launched to fund third-party model evaluations, with the goal of developing high-quality, safety-relevant evaluations for assessing AI capabilities and risks. Focus areas include AI Safety Level (ASL) assessments, advanced capability and safety metrics, and infrastructure and tooling for developing evaluations. Proposals will be prioritized by how accurately their evaluations measure real-world risks, with particular emphasis on cybersecurity, CBRN (chemical, biological, radiological, and nuclear) risks, model autonomy, social manipulation, misalignment risks, advanced science, harmfulness and refusals, multilingual capabilities, societal impacts, and evaluation infrastructure. The initiative aims to provide tools that benefit the broader AI ecosystem, relying on collaboration and iteration to refine evaluations for maximum impact.