AI Safety Bounties

PatrickL

8 min read · Aug 24, 2023

Comments 2

JakubK

Note that in the US, the National Defense Authorization Act (NDAA) for FY2024 might direct the Secretary of Defense to establish an AI bug bounty program for "models being integrated into Department of Defense missions and operations." Here is the legislative text.

PatrickL

Good find, thanks! I'm not keen on instructing teams to run bug bounties rather than other mechanisms, so I'm not particularly enthusiastic about this.

It looks like this would focus on the information security of the AI systems being used (i.e., can this weapon's AI be hacked?) rather than on testing for potential vulnerabilities arising from the AI systems themselves.

	1. Evals-based	2. Subjectively judged, organized by labs	3. Trusted bug hunters test private systems
Target systems	A wide range of AI systems – preferably with the system developers’ consent and buy-in	Testing of a particular AI model – with its developer’s consent and engagement	Testing of a particular AI model – preferably with its developer’s consent and buy-in
Prize criteria	Demonstrate (potentially dangerous) capabilities beyond those revealed by testers already partnering with labs, such as ARC Evals	Convince a panel of experts that the issue is worth dedicating resources toward solving, or demonstrate examples of behaviors which the AI model’s developer attempted to avoid through their alignment techniques.	A broad range of criteria is possible (including those in the previous two models).
Disclosure model – how private are submissions?	Coordinated disclosure (Organizers default to publishing all submissions which are deemed safe)	Coordinated disclosure	Coordinated- or non-disclosure
Participation model	Public	Public	Invite only
Access level	Public APIs	Public APIs	Invited participants have access to additional resources – e.g., additional non-public information or tools within a private version of the API
Who manages the program	Evals organization (e.g., ARC Evals), a new org., or an existing platform (e.g., HackerOne).	AI organization, or a collaboration with an existing bounty platform (e.g., HackerOne).	AI organization, or a collaboration with an existing bounty platform (e.g., HackerOne).
Program duration	Ongoing	Ongoing	Time-limited
Prize scope (how broad are the metrics for winning prizes)	Targeted	Expansive	Medium
Financial reward per prize	High (up to $1m)	Low (up to $10k)	Medium (up to $100k)
Pre- or post- deployment	Post-deployment	Post-deployment	Potentially pre-deployment

AI Safety Bounties

Short summary

Long summary

Introduction and bounty program recommendations

Why and how to run AI safety bounties

Other recommended practices for bounty organizers

Acknowledgements