Success without dignity: a nearcasting story of avoiding catastrophe by luck

Holden Karnofsky

Comments 3


Thanks for laying out these points! After having engaged with many people's thoughts on these issues I'm similarly unconvinced about the very unfavourable odds many people seem to assign, so I really look forward to the discussion here.

I'm particularly curious about this point, because when I think about AI risk scenarios I put a fair amount of stock in the potential for very direct government intervention once the risks are more obvious and more clearly a near-term problem:

Some decisive demonstration of danger is achieved, and AIs also help to create a successful campaign to persuade key policymakers to aggressively work toward a standards and monitoring regime. (This could be a very aggressive regime if some particular government, coalition or other actor has a lead in AI development that it can leverage into a lot of power to stop others’ AI development.)

AI already seems to me to be clearly among the top geopolitical priorities for the US, and when that is the case the space of options open to a government seems fairly unrestricted.

Vasco Grilo🔸

Thanks for the post!

More broadly, it seems to me like essentially all attempts to make the most important century go better also risk making it go a lot worse, and for anyone out there who might’ve done a lot of good to date, there are also arguments that they’ve done a lot of harm (e.g., by raising the salience of the issue overall).
Even “Aligned AI would be better than misaligned AI” seems merely like a strong bet to me, not like a >95% certainty, given what I see as the appropriate level of uncertainty about topics like “What would a misaligned AI actually do, incorporating acausal trade considerations and suchlike?”; “What would humans actually do with intent-aligned AI, and what kind of universe would that lead to?”; and “How should I value various outcomes against each other, and in particular how should I think about hopes of very good outcomes vs. risks of very bad ones?”
To reiterate, on balance I come down in favor of aligned AI, but I think the uncertainties here are massive - multiple key questions seem broadly “above our pay grade” as people trying to reason about a very uncertain future.

I really like these points. It is often easy to forget how uncertain the future is.

Michael_Cohen

How “natural” are intended generalizations (like “Do what the supervisor is hoping I’ll do, in the sense that most humans would mean this phrase rather than in a precise but malign sense”) vs. unintended ones (like “Do whatever maximizes reward”)?

I think this is an important point. I consider the question in this paper, published last year in AI Magazine. See the "Competing Models of the Goal" section, and in particular the "Arbitrary Reward Protocols" subsection. (2500 words)

I think there's something missing from the discussion here, which is the key point of that section. First, I claim that sufficiently advanced agents will likely need to engage in hypothesis testing between multiple plausible models of what worldly events lead to reinforcement, or else they would fail at certain tasks. So even if the "intended generalization" is quite a bit more plausible to the agent than the unintended one, as long as it is cheap to test them, and as long as the agent has a long horizon, it would likely deem wireheading to be worth trying out, just in case. That said, in some situations (I mention a chess game in the paper) I expect the intended generalization to be so much simpler that the unintended one isn't even worth trying out.

Just a warning before you read it: I use the word "reward" a bit differently than you appear to. In my terminology, I would phrase this as "Do what the supervisor is hoping" vs. "Do whatever maximizes the relevant physical signal", and the agent would essentially wonder which of the two constitutes "reward", rather than being a priori sure that its past rewards "are" those physical signals.
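
A toy illustration of the "cheap to test, long horizon" argument above. This is a minimal sketch under simplifying assumptions, not taken from the paper: the function name, parameters, and numbers are all hypothetical. It assumes a one-off test cost, a constant per-step gain if the unintended model turns out to be the right one, and no discounting.

```python
def worth_testing(prior_unintended, test_cost, gain_per_step, horizon):
    """Return True if a one-off test of the unintended model pays off in expectation.

    All parameters are hypothetical and exist only for illustration:
      prior_unintended: the agent's credence that the unintended model is correct
      test_cost:        reward forgone by running the test once
      gain_per_step:    extra reward per step if the unintended model is correct
      horizon:          number of remaining steps over which the gain accrues
    """
    expected_gain = prior_unintended * gain_per_step * horizon
    return expected_gain > test_cost


# With a long horizon, even a 5% credence makes the test worthwhile.
print(worth_testing(prior_unintended=0.05, test_cost=1.0, gain_per_step=0.1, horizon=1000))

# With a short horizon, the same credence does not justify the test.
print(worth_testing(prior_unintended=0.05, test_cost=1.0, gain_per_step=0.1, horizon=100))
```

In this toy model the expected payoff grows with the horizon while the cost of the test stays fixed, so a low prior on the unintended model is not enough by itself to rule out testing it.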

