Video and Transcript of Presentation on Existential Risk from Power-Seeking AI

Joe_Carlsmith

Comments 7


Hey there!

And then finally there are actually some formal results where we try to formalize a notion of power-seeking in terms of the number of options that a given state allows a system. This is work [...] which I'd encourage folks to check out. And basically you can show that for a large class of objectives defined relative to an environment, there's a strong reason for a system optimizing those objectives to get to the states that give them many more options.

After spending a lot of time on understanding that work, my impression is that the main theorems in the paper are very complicated and are limited in ways that were not reasonably explained. (To the point that, probably, very few people understand the main theorems and which environments they apply to, even though the work has been highly praised within the AI alignment community.)
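A toy illustration of the quoted intuition that optimizing most objectives favors states with more options. This is a hypothetical sketch, not the paper's formal POWER definitions: it assumes a deterministic choice between branch A, which leads to one terminal outcome, and branch B, which leads to three, with the objective drawn as independent uniform random rewards on all outcomes. An optimal agent picks B whenever the best of B's rewards beats the best of A's, which by symmetry happens with probability n_b / (n_a + n_b), here 3/4. The function name and parameters are made up for illustration, and the toy says nothing about the stochastic-environment and cycle limitations Ofer raises in this thread.

```python
import random


def fraction_preferring_more_options(n_a=1, n_b=3, trials=100_000, seed=0):
    """Estimate how often an optimal agent chooses branch B over branch A.

    Toy assumptions (hypothetical, not from the paper): deterministic
    environment, branch A reaches n_a terminal outcomes, branch B reaches
    n_b, and every outcome's reward is drawn i.i.d. uniform on [0, 1].
    """
    rng = random.Random(seed)
    prefer_b = 0
    for _ in range(trials):
        # The optimal agent compares the best outcome reachable in each branch.
        best_a = max(rng.random() for _ in range(n_a))
        best_b = max(rng.random() for _ in range(n_b))
        if best_b > best_a:
            prefer_b += 1
    return prefer_b / trials


# Exact value by symmetry is n_b / (n_a + n_b) = 3 / 4; the estimate lands near it.
print(fraction_preferring_more_options())
```

Changing the counts (e.g. `n_a=2, n_b=2`) moves the estimate toward n_b / (n_a + n_b), which is the "more options, more often optimal" pattern in its simplest form.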

RyanCarey

Have you explained your thoughts somewhere? It'd be more productive to hash out the disagreement rather than generically casting shade!

Ofer

Thanks, you're right. There's this long thread, but I'll try to explain the issues here more concisely. I think the theorems have the following limitations that were not reasonably explained in the paper (and some accompanying posts):

The theorems are generally not applicable to stochastic environments (despite the paper and some related posts suggesting otherwise).
The theorems may not be applicable if there are cycles in the state graph of the MDP (other than self-loops in terminal states); for example:
- The theorems are not applicable in states from which a reversible action can be taken.
- The theorems are not applicable in states from which only one action (which is not POWER-seeking) makes it possible to reach a cycle of a given length.

I'm not arguing that the theorems don't prove anything useful. I'm arguing that it's very hard for the readers of the paper (and some accompanying posts) to understand what the theorems actually prove. To understand the theorems, readers need to understand about 20 formal definitions that build on each other. I also argue that the lack of explanations about what the theorems actually prove, and some of the informal claims that were made about the theorems, are not reasonable (and cause the theorems to appear more impressive). Here's an example of such an informal claim (taken from this post):

Not all environments have the right symmetries

But most ones we think about seem to

Charles He

To onlookers, I want to say that:

This isn't exactly what Ofer is complaining about, but one take on the issue, that math can be overstated, poorly socialized, misleading or overbearing, is a common critique in domains that use a lot of applied math (theoretical econ, interdisciplinary biology) that borrows from pure math, physics, etc.
- It depends on things (well, sort of your ideology, style, and academic politics TBH) but I think the critique can often be true.
Although, to be fair, this particular critique seems much more specific, and it seems like Ofer might be talking past Alex Turner and his meaning (but I have no actual idea of the math or the claims).
The tone of the original post is pretty normal or moderate, and isn't "casting shade".
- but it might be consistent with issues like:
  - this person has some agenda that is unhelpful and unreasonable;
  - they are just a gadfly;
  - they don't really "get it" but know enough to fool themselves and pick at things forever.
- But these issues apply to my account too. To me, the tone seems pretty good.

Jeremy

I noticed a typo in the transcript that is pretty confusing. Probably important to fix, since this article is being used in several curricula for AI alignment.

"power is useful for loss of objectives"

should be

"power is useful for lots of objectives"

Michael St Jules 🔸

Thanks for sharing this! I think this is the best exposition of the arguments I've come across so far: transparent and explicit about assumptions, good illustrative examples, reasonably accessible (at least to me), and short and to the point.

Andrea_Miotti

Thanks for sharing the presentation, great work!

Regarding the third question from the audience, "What kind of resource could we share to a random person on the street if we want to introduce them to AI x-risk?", in addition to the resources you mention I think Stuart Russell's 2021 BBC Reith Lectures series, "Living with Artificial Intelligence", is an excellent introduction for a generalist audience.

In addition to being accessible, the talks have the institutional gravitas of being from a prestigious lecture series from the BBC and an established academic, which makes them more likely to convince a generalist audience.
