I was listening to https://80000hours.org/podcast/episodes/paul-christiano-ai-alignment-solutions recently, and found it very helpful. However, I came away from it with a question.
What is the plan that organizations like OpenAI have to prevent bad outcomes from AGI? From how Paul Christiano frames it, it seems like it's "create AGI, and make sure it's aligned."
But I don't understand how this plan accounts for competition. To use a stupid analogy, if I were concerned that cars in America weren't safe, I might start my own car company to manufacture and sell safer cars. I might spend a lot of time engineering a much safer car. But my efforts would be for naught if my cars weren't very popular (and hence my company wasn't very successful), even if they were groundbreakingly safe.
It seems like this latter part is most of the trick, at least in the domain of cars.
I'd like to understand in more detail how this analogy breaks down. I can imagine several ways it might, but would love to hear it straight from the horse's mouth.
It's definitely an important question.
In this case, the equivalent would be a "car safety" nonprofit that goes around to all the car companies and helps them make safer cars. The AI safety initiatives would try to make sure they can help or advise whichever groups end up building an AGI. However, knowing how to advise those companies does require building a few cars internally for experimentation.
I believe OpenAI has publicly stated that they are willing to work with any group that gets close to AGI, though I forget where they mentioned this.
It's in their charter: