*Up to $500 for alignment contest ideas*
Olivia Jimenez and I are composing questions for an AI alignment talent search contest. We want to use (or come up with) a frame of the alignment problem that is accessible to smart high schoolers/college students and people without ML backgrounds.
$20 for links to existing framings of the alignment problem (or subproblems) that we find helpful.
$500 for coming up with a new framing that meets our criteria or that we end up using (see below for details; also feel free to send us a FB message if you want to work on this and have questions).
We’ll also consider up to $500 for anything else we find helpful.
-- More context --
We like Eliezer’s strawberry problem: How can you get an AI to place two identical (down to the cellular but not molecular level) strawberries on a plate, and then do nothing else?
Nate Soares noted that the strawberry problem captures two core alignment challenges: (1) Directing a capable AGI towards an objective of your choosing and (2) Ensuring that the AGI is low-impact, conservative, shutdownable, and otherwise corrigible.
We also think that if we ask someone this question and they *notice* that these challenges are what make the problem difficult, and perhaps approach the problem from an interesting angle as a result, that’s a really good signal about their thinking.
However, we worry that if we ask exactly this question in a contest, people will get lost thinking about AI capabilities, molecular biology, etc. We also don’t like that, short of a full solution to the alignment problem, there aren’t many impressive answers. So, we want to come up with a similar question/frame that is more contest-friendly.
Ideal criteria for the question/frame (though we can imagine great questions not meeting all of these):
- It can be explained in a few sentences or pictures.
- It implicitly gets at one or more core challenges of the alignment problem.
- It is comprehensible to smart high schoolers/college students and not easily misunderstood. (Ideally the question can be visualized.)
- People don’t need an ML background to understand or answer the question.
- There are good answers besides solving the entire alignment problem.
- Answers might reveal people’s abilities to notice the hard parts of the alignment problem, avoid assuming these hard parts away, reason clearly, rule out bad/incomplete solutions, think independently, and think creatively.
- People could write a response in under a few hours, or in several hundred words.
More examples we like:
- ARC’s Eliciting Latent Knowledge Problem, because it has clear visuals, is approachable to people without ML backgrounds, doesn’t bog people down in thinking about capabilities, and encourages people to demonstrate their thought process (with builder/breaker moves). Limitations: It’s long, it usually takes a long time to develop proposals, and it focuses on how ARC approaches alignment.
- The Sorcerer’s Apprentice Problem from Disney’s Fantasia, because it has clear visuals, is accessible to quite young people and can be understood quickly, and might get people out of the headspace of ML solutions. Limitations: The connection to alignment is not obvious without a lot of context, and the magical/animated setting might come across as childish.