The argument I’m going to make here is obviously not original to me, but I do think that while AI governance is talked about, and is one of the career paths that 80,000 Hours recommends, it is not sufficiently highlighted as an important cause area. Further, I think the specific issue I’m pointing to is even more neglected than AI governance generally.

 

Suppose we create an AI superior to humans that does what we tell it to.

This creates the possibility of immediately transforming into a post-scarcity economy. Nobody needs to work. Everybody gets as much as they want of the stuff that used to be scarce due to limited human labor.

This would be amazing!

Each of us could own a yacht, or two. The limit on how many yachts we can have is space on the water, not how expensive the boat is. We can own as many clothes as we want. If anyone wants an authentic 18th-century costume, it can be hand-sewn immediately.

If anyone gets sick, they will have access to a flawless surgeon using the most advanced medical knowledge, and they won’t need to wait for an appointment to talk to a doctor. Every single child will have access to private tutors who know more than all the tutors any child has ever had, combined.

An actual superintelligence could give us delicious zero-suffering meat and probably a malaria vaccine; it could eliminate all, or nearly all, childhood diseases.

AI could create a world where nobody needs to desperately work and hustle just to get by, and where no one ever needs to worry about whether there will be food and shelter.

Etc, etc, etc.

There is a reason that techno-optimist transhumanists think friendly AI can create an amazingly good world. 

 

So, let’s assume for a moment that DeepMind, OpenAI, or a black project run by the Chinese government successfully creates a superhuman intelligence that does what they ask and will not betray them.

Does this super awesome world actually come into existence?


 

Reasons to worry (definitely not exhaustive!)


 

  • Power corrupts
    • Capabilities researchers have been reading about how awesome singletons are.
      • Even worse, some of the suits might have read about it too. Sergey and Elon definitely have.
  • The Leftist critique: A limited number of powerful people making decisions about the fate of people who have no say in those decisions. They will treat our collective human resources as their own private toys.
    • I take this issue very seriously.
    • A good world is a world where everyone has control over their own fate, and is no longer at the mercy of impersonal forces that they can neither understand nor manipulate. 
    • Further, a good world is one in which people in difficult, impoverished, and non-normative circumstances are able to make choices that let their lives go well, as they see it.
  • The Nationalism problem
    • Suppose AI developed in the US successfully stays under democratic control, and it is used purely to aggrandize the wealth and well-being of Americans by locking in America’s dominance of all resources in the solar system and the light cone, forever.
      • Poor Malawians are still second- or third-class citizens on Earth, still receiving only drips of charity from those who silently consider themselves their betters.
      • We could have fixed poverty forever instead.
    • Suppose AI is developed in China 
      • They establish a regime with communist principles and social control over communication everywhere on the planet. This regime keeps everyone, everywhere, forever parroting communist slogans.
    • Worse: Suppose they don’t give any charity to the poor? There is precedent for dominant groups to simply treat the poor around them as parasites or work animals. Perhaps whoever controls the AI will starve or directly kill all other humans.


 

Summary: Misaligned people controlling the AI would be bad.

 

This issue is connected to the agency problem (of which aligning AI itself is an example). 

  • How do we make sure that the people or institutions in power act with everyone's best interests in mind?
  • What systems can we put in place to hold the people/institutions in power accountable to the promises they make?
  • How do we shape the incentive landscape in such a way that the people/institutions in power act to maximise wellbeing (while not creating adverse effects)?
  • How do we make sure we give power only to the actors who have a sufficient understanding of the most pressing issues and are committed to tackling them?


 

So what can we do about this? A few approaches that I have heard about:


 

  • Limiting the financial upside of developing AI to a finite quantity that is small relative to the output of a Dyson swarm (a toy sketch follows this list).
    • Windfall clauses
    • Profit-capping arrangements, like the one I believe OpenAI has
    • Ad hoc, after-the-fact government taxes and seizures
  • The Moon Treaty, and giving the whole global community collective ownership of outer space resources
  • Making sure that if AI is developed, it only comes out of a limited number of highly regulated, government-controlled entities, where part of the regulatory framework ensures a broad distribution of the benefits, at least to the citizens of the country where it was built. This centralization might also have substantial safety benefits.
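
To make the windfall-clause / profit-cap idea concrete, here is a toy sketch in Python. The cap and the numbers are made up for illustration; they are not drawn from any actual proposal or from OpenAI’s real structure. The idea is simply that profits up to some cap flow to investors as usual, and everything above the cap is earmarked for broad distribution.

```python
# Toy sketch of a windfall clause / profit cap.
# The cap value and profit figure below are purely illustrative.

def split_windfall(profit: float, cap: float) -> tuple[float, float]:
    """Return (investor_share, public_share) for a given profit."""
    investor_share = min(profit, cap)
    public_share = max(profit - cap, 0.0)
    return investor_share, public_share

# Example: a $1 trillion windfall against a $100 billion cap.
investors, public = split_windfall(profit=1e12, cap=1e11)
print(f"Investors keep ${investors:,.0f}; public fund receives ${public:,.0f}")
```

The arithmetic is trivial; the hard part, as the rest of this essay argues, is who can actually enforce the split once the developer controls a singleton.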

 


 

The problem with any approach to controlling AI after it is developed is that we cannot trust the legal system to constrain the behavior of someone in control of a singleton following a fast-takeoff scenario. There need to be safeguards embedded in these companies that are capable of physically forcing the group that built the AI to do what they promised to do with it, and these safeguards need to be built into the structure of how any AI that might develop into a singleton is trained and built.

This should be part of the AI safety regulatory framework, and might be used as part of what convinces the broader public that AI safety regulation is necessary in the first place (it would actually be bad, even if you are a libertarian, if AI is just used to satisfy the desires of rich people). 

All of this only becomes a problem if we actually solve the general alignment problem of creating a system that does what its developers actually want it to do. Your estimate of p(AGI doom) will therefore drive whether you think this is worth working on.

This is also an effective and possibly tractable place to focus on systemic change. A world system that ensures everyone gets a sufficient share of global resources to meet their needs after full automation will likely require major legal and institutional changes, possibly of the same magnitude as a switch to communism or anarcho-capitalism would require.

The value of improving a post-aligned-AI future is multiplied by the probability that we actually reach that future. So if you think the odds are one in a million that AI is safely developed, the expected value of efforts in this direction is far lower than if you believe the odds of AI killing us all are one in a million.
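
To make that multiplication explicit, here is a minimal toy calculation (the probabilities and the normalization are illustrative, not anyone’s actual estimates):

```python
# Toy expected-value comparison; probabilities below are purely illustrative.
# The value of post-alignment governance work scales linearly with the
# probability that the alignment problem gets solved at all.

def ev_of_governance_work(p_alignment_solved: float, value_if_solved: float = 1.0) -> float:
    """Expected value of improving the post-aligned-AI future."""
    return p_alignment_solved * value_if_solved

# If safe AI is a one-in-a-million shot, this work looks nearly worthless...
print(ev_of_governance_work(1e-6))      # 1e-06
# ...but if doom is the one-in-a-million case, it retains almost all its value.
print(ev_of_governance_work(1 - 1e-6))  # 0.999999
```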

But if we meet that (possibly unlikely) bar of not dying from AI, there will still be more work needed to create utopia.

I'd like to thank Milan, Laszlo, Marta, Gergo, Richard and David for their comments on the draft text of this essay.

 


 

