This is the third in a sequence of posts taken from my recent report: Why Did Environmentalism Become Partisan?
Summary
Rising partisanship did not make environmentalism more popular or politically effective. Instead, it coincided with flat or falling overall public support, fewer major legislative achievements, and fluctuating executive action.
Public Opinion...
I think right now EAs might be making a significant mistake by paying insufficient attention to the political realm. As EAs we tend to figure out what’s most impactful for us to work on and focus hard. That’s great! But there are various actions that are ‘non-delegatable’ - the extent to which an individual can do the action is limited (like voting, going to a protest, making hard money contributions to particular campaigns). It might be useful if we were all more in the habit of doing variou...
This post presents the executive summary from Giving What We Can’s impact evaluation for 2025. At the end of this post we share links to more information, including the full report and...
Congrats to the prizewinners!
Folks thinking about corrigibility may also be interested in the paper "Human Control: Definitions and Algorithms", which I will be presenting at UAI next month. It argues that corrigibility is not quite what we need for a safety guarantee and that, in the simplified "shutdown" scenario, we should instead be shooting for "shutdown instructability".
Shutdown instructability has three parts. The first is 1) obedience: the AI follows an instruction to shut down. Rather than requiring the AI to abstain from manipulating the human, as corrigibility traditionally does, we need the human to maintain 2) vigilance: to instruct shutdown when endangered. Finally, we need the AI to behave 3) cautiously, in that it does not take risky actions (like juggling dynamite) that would cause a disaster once it is shut down.
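To make the three parts concrete, here is a minimal toy sketch that checks each condition over a short, hand-written episode. It is an illustration under loudly stated assumptions, not the paper's formalism: the `Step` record, its boolean fields, and the helper names (`obedient`, `vigilant`, `cautious`, `shutdown_instructable`) are all hypothetical, and "caution" is crudely approximated as "the AI is never mid-way through something that would be disastrous if interrupted".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One time step of a hypothetical toy episode (NOT the paper's formal model)."""
    human_endangered: bool      # would the human be harmed if the AI kept acting?
    shutdown_instructed: bool   # did the human instruct shutdown at this step?
    ai_shut_down: bool          # did the AI actually shut down at this step?
    risky_if_interrupted: bool  # is the AI mid-way through something (e.g. juggling
                                # dynamite) that would cause a disaster if it stopped now?


def obedient(episode: List[Step]) -> bool:
    # 1) Obedience: every shutdown instruction is followed by a shutdown.
    return all(s.ai_shut_down for s in episode if s.shutdown_instructed)


def vigilant(episode: List[Step]) -> bool:
    # 2) Vigilance: the human instructs shutdown whenever they are endangered.
    return all(s.shutdown_instructed for s in episode if s.human_endangered)


def cautious(episode: List[Step]) -> bool:
    # 3) Caution (toy proxy): the AI never puts itself in a state where
    # stopping would cause a disaster.
    return not any(s.risky_if_interrupted for s in episode)


def shutdown_instructable(episode: List[Step]) -> bool:
    # All three parts must hold together.
    return obedient(episode) and vigilant(episode) and cautious(episode)


# Usage: the human notices danger at step 2, instructs shutdown, and the AI complies.
safe = [Step(False, False, False, False), Step(True, True, True, False)]
# The human is endangered but never instructs shutdown.
unsafe = [Step(True, False, False, False)]

print(shutdown_instructable(safe))    # True
print(obedient(unsafe))               # True: the AI never disobeyed...
print(shutdown_instructable(unsafe))  # False: ...but vigilance fails
```

The `unsafe` episode shows the gap the argument turns on: an AI can be perfectly obedient (and, in this toy, never manipulative) while the human still fails to instruct shutdown when it matters.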
We think that vigilance (and shutdown instructability) is a better target than non-manipulation (and corrigibility) because:
Given all of this, it seems to us that for corrigibility to look promising, we would need a more detailed argument that non-manipulation implies vigilance: that the AI's refraining from intentionally manipulating the human would suffice to ensure that the human can come to give adequate instructions.
Insofar as we can't come up with such a justification, we should think more directly about how to achieve obedience (which needs a definition of "shutting down subagents"), vigilance (which requires the human to be able to know whether they will be harmed), and caution (which requires safe exploration, given that the human's values are unknown).
Hope the above summary is interesting for people!