I was listening to https://80000hours.org/podcast/episodes/paul-christiano-ai-alignment-solutions recently, and found it very helpful. However, I still have a question at the end of it.

What is the plan that organizations like OpenAI have to prevent bad outcomes from AGI? From how Paul Christiano frames it, it seems like it's "create AGI, and make sure it's aligned."

But I don't understand how this plan accounts for competition. To use a stupid analogy, if I was concerned that cars in America weren't safe, I might start my own car company to manufacture and sell safer cars. Maybe I spend a lot of time engineering a much safer car. But my efforts would be for naught if my cars weren't very popular (and hence my company wasn't very successful), even if they were groundbreakingly safe.

It seems like this latter part is most of the trick, at least in the domain of cars.

I'd like to understand in more detail how this analogy breaks down. I can imagine several ways, but would love to hear it direct from the horse's mouth.





Paul Christiano has a notion of competitiveness, which seems relevant. "Directions and desiderata for AI control" seems to be the place it's stated most clearly.

The following quote (emphasis in the original) is one of the reasons he gives for desiring competitiveness, and seems to be in the same ballpark as the reason you gave:

You can’t unilaterally use uncompetitive alignment techniques; we would need global coordination to avoid trouble. If we _don’t_ know how to build competitive benign AI, then users/designers of AI systems have to compromise efficiency in order to maintain reliable control over those systems. The most efficient systems will by default be built by whoever is willing to accept the largest risk of catastrophe (or perhaps by actors who consider unaligned AI a desirable outcome).

It may be possible to avert this kind of race to the bottom by effective coordination by e.g. enforcing regulations which mandate adequate investments in alignment or restrict what kinds of AI are deployed. Enforcing such controls domestically is already a huge headache. But internationally things are even worse: a country that handicapped its AI industry in order to proceed cautiously would face the risk of being overtaken by a less prudent competitor, and avoiding that race would require effective international coordination.

Ultimately society will be able and willing to pay some efficiency cost to reliably align AI with human interests. But the higher that cost, the harder the coordination problem that we will need to solve. I think the research community should be trying to make that coordination problem as easy as possible.

Thanks for the link. So I guess I should amend what Paul and OpenAI's goal seems like to me, to "create AGI, make sure it's aligned, and make sure it's competitive enough to become widespread."

Seems right, though I don't know to what extent Paul's view is representative of OpenAI's overall view.

I'd like to understand in more detail how this analogy breaks down.

I think the important disanalogy is that once you've created a safe AGI of sufficient power, you win. (Because it's an AGI, so it can go around doing powerful AGI stuff – other projects could be controlled or purchased, etc.)

It's not certain that whoever is first past the post will end up the eventual winner, but being first past the post is probably a big advantage. Bostrom has some discussion of this in the multipolar / singleton section of Superintelligence, if I recall correctly.

Drexler's Comprehensive AI Services is an alternative framing for what we mean by AGI. Probably relevant here, though I haven't engaged closely with it yet.

OK, this is what I modeled AI alignment folks as believing. But doesn't the idea of first-past-the-post-is-the-winner rely on a "hard takeoff" scenario? This is a view I associate with Eliezer. But Paul in the podcast says that he thinks a gradual takeoff is more likely, and envisions a smooth gradient of AI capability such that human-level AI comes into existence in a world where slightly stupider AIs already exist.

The relevant passage:

and in particular, when someone develops human level AI, it’s not going to emerge in a world like the world of today where we can say that indeed, having human level AI today would give you a decisive strategic advantage. Instead, it will emerge in a world which is already much, much crazier than the world of today, where having a human AI gives you some more modest advantage.

So I get why you would drop everything and race to be the first to build an aligned AGI if you're Eliezer. But if you're Paul, I'm not sure why you would do this, since you think it will only give you a modest advantage.

(Also, if the idea is to build your AGI first and then use it to stop everyone else from building their AGIs -- I feel like that second part of the plan should be fronted a bit more! "I'm doing research to ensure AI does what we tell it to" is quite a different proposition from "I'm doing research to ensure AI does what we tell it to, so that I can build an AI and tell it to conquer the world for me.")

I feel like that second part of the plan should be fronted a bit more!

Would probably incur a lot of bad PR.

... why you would drop everything and race to be the first to build an aligned AGI if you're Eliezer. But if you're Paul, I'm not sure why you would do this, since you think it will only give you a modest advantage.

Good point. Maybe another thing here is that under Paul's view, working on AGI / AI alignment now increases the probability that the whole AI development ecosystem heads in a good direction. (Prestigious + safe AI work increases the incentives for others to do safe AI work, so that they appear responsible.)

Speculative: perhaps the motivation for a lot of OpenAI's AI development work is to increase its clout in the field, so that other research groups take the AI alignment stuff seriously. It may also be about drawing in talented researchers, to increase the overall proportion of AI researchers who work in a group that takes safety seriously.

From how Paul Christiano frames it, it seems like it's "create AGI, and make sure it's aligned."

I think that's basically right. I believe something like that was Eliezer's plan too, way back in the day, but then he updated to believing that we don't have the basic ethical, decision-theoretic, and philosophical stuff figured out that's prerequisite to actually making a safe AGI. More on that in his Rocket Alignment Dialogue.

Consumers care somewhat about safe cars, and if safety is mostly an externality then legislators may be willing to regulate it. There are also only so many developers, and if the moral case is clear enough and the costs low enough, then the leaders might all make that investment.

At the other extreme, if you have no idea how to build a safe car, then there is no way that anyone is going to use a safe car no matter how much people care. Success is a combination of making safety easy and getting people to care / regulating / etc.

Here is the post I wrote about this.

If you have "competitive" solutions, then the required social coordination may be fairly mild. As a stylized example, if the leaders in the field are willing to invest in safety, then you could imagine surviving a degree of non-competitiveness in line with the size of their lead (though the situation is a bit messier than that).

Thanks, this and particularly the Medium post was helpful.

So to restate what I think your model around this is, it's "the efficiency gap determines how tractable social solutions will be (if < 10% they seem much more tractable), and technical safety work can change the efficiency gap."

+1 to Paul's 80,000 Hours interview being awesome.

It's definitely an important question.

In this case, the equivalent is a "car safety" nonprofit that goes around to all the car companies to help them make safe cars. The AI safety initiatives would attempt to make sure that they can help or advise whatever groups do make an AGI. However, knowing how to advise those companies does require making a few cars internally for experimentation.

I believe that OpenAI basically publicly stated that they are willing to work with any groups close to AGI, but I forget where they mentioned this.

It's in their charter:

Therefore, if a value-aligned, safety-conscious project comes close to building AGI before we do, we commit to stop competing with and start assisting this project. We will work out specifics in case-by-case agreements, but a typical triggering condition might be “a better-than-even chance of success in the next two years.”

It's also possible there won't be much competition. There may only be 3-6 entities with serious chances of making an AGI. One idea is to have safety researchers in almost every entity.

There are probably people who can answer better, but here's my crack at it (from most to least important):

1. If people who care about AI safety also happen to be the best at making AI, then they'll try to align the AI they make. (This is already turning out to be a pretty successful strategy: OpenAI is an industry leader that cares a lot about risks.)

2. If somebody figures out how to align AI, other people can use their methods. They'd probably want to, if they buy that misaligned AI is dangerous to them, but this could fail if aligned methods are less powerful or more difficult than not-necessarily-aligned methods.

3. Credibility and public platform: People listen to Paul Christiano because he's a serious AI researcher. He can convince important people to care about AI risk.
