Company
Anthropic
Date Published
Aug. 8, 2024
Author
-
Word count
681
Language
English
Hacker News points
None

Summary

We're expanding our model safety bug bounty program to identify and mitigate universal jailbreaks: exploits that consistently bypass AI safety guardrails across a wide range of areas, including critical domains such as chemical, biological, radiological, and nuclear (CBRN) threats and cybersecurity. The new initiative will test our next-generation system of AI safety mitigations in a controlled environment before its public deployment, offering bounty rewards of up to $15,000 for novel attacks. We're inviting interested researchers to apply to the program and work with us to strengthen AI safety in high-risk areas, in line with the responsible AI development commitments we have signed alongside other AI companies.