“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)

Thanks to Evan Hubinger for discussion, here. ↩︎
This is a point emphasized, for example, by proponents of "shard theory" – see e.g. this summary. ↩︎
Though note that "autopilot" can still encode a non-sphex-ish policy. ↩︎
This is a point made in an entry to the Open Philanthropy worldviews contest which, to my knowledge, remains unpublished. ↩︎
I'm adapting this example from one suggested to me in conversation with Paul Christiano. ↩︎
Though one can imagine cases where, after a takeover, a schemer continues executing these heuristics to some extent, at least initially, because it hasn't yet been able to fully "shake off" all that training. Relatedly, one can imagine cases where these heuristics, etc., play some ongoing role in shaping the schemer's values. ↩︎
Plus we're positing additional claims about training-gaming being a good instrumental strategy because it prevents goal-modification and leads to future escape/take-over opportunities, which feels additionally conjunctive. ↩︎

"Clean" vs. "messy" goal-directedness

Does scheming require a higher standard of goal-directedness?