harfe

902 karma

Comments

Once upon a time, some people were arguing that AI might kill everyone, and that EA resources should address that problem instead of fighting malaria. So OpenPhil poured millions of dollars into orgs such as EpochAI (which got 9 million). Now three people from EpochAI have created a startup that provides training data to help AI replace human workers. Some people are worried that this startup increases AI capabilities, and therefore increases the chance that AI will kill everyone.

However, a model trained to obey the RLHF objective will expect negative reward if it decided to take over the world

If an AI takes over the world, there is no one around to give it a negative reward. So the AI will not expect a negative reward for taking over the world.

The issue is not whether the AI understands human morality. The issue is whether it cares.

The arguments from the "alignment is hard" side that I was exposed to don't rely on the AI misinterpreting what humans want. In fact, superhuman AI is assumed to be better than humans at understanding human morality. It could still do things that go against human morality. Overall, I get the impression you misunderstand what alignment is about (or maybe you just have different associations with words like "alignment" than I do).

Whether a language model can play a nice character that would totally give back its dictatorial powers after a takeover is barely any evidence about whether an actual super-human AI system would step back from its position of world dictator after it has accomplished some tasks.

How is that better than individuals just donating to wherever they think makes sense on the margin?

I think the comment already addresses that here:

moreover, rule by committee enables deliberation and information transfer, so that persuasion can be used to make decisions and potentially improve accuracy or competence at the loss of independence.

This article has a lot of downvoting (net karma of 39 from 28 votes)

This does not seem to be an unusual amount of downvoting to me. The net karma is even higher than the number of votes!

As a more general point, I think people should worry less about downvotes on posts with a high net karma.
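For concreteness, here is a minimal sketch of that arithmetic in Python. The helper `avg_upvote_weight` is hypothetical, and the vote weights are assumptions: actual Forum vote strength depends on voter karma, so this is an illustration, not the site's algorithm.

```python
# A toy calculation, not the Forum's real vote-weighting: vote strength on
# the EA Forum scales with voter karma, so the weights here are assumptions.

def avg_upvote_weight(n_votes: int, net_karma: int, n_down: int,
                      down_weight: float = 1.0) -> float:
    """Average upvote weight needed to reach `net_karma`, assuming `n_down`
    of the `n_votes` voters downvoted with weight `down_weight`."""
    n_up = n_votes - n_down
    if n_up <= 0:
        raise ValueError("need at least one upvote")
    # net = n_up * avg_up - n_down * down_weight  =>  solve for avg_up
    return (net_karma + n_down * down_weight) / n_up

# Net karma 39 from 28 votes: even assuming 5 downvotes, the 23 upvotes only
# need to average ~1.9 karma each -- ordinary votes, not heavy downvoting.
print(avg_upvote_weight(28, 39, 5))  # ~1.91
```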

Answer by harfe

As for existential risk from AI takeover, I don't think having a self-sustaining civilization on Mars would help much.

If an AI has completed a takeover on Earth and killed all humans there, taking over Mars too does not sound that hard, especially since the human civilization on Mars is likely quite fragile. (There might be some edge cases where you solve the AI control problem well enough to guarantee that all advanced AIs leave Mars alone, but not well enough for AI to leave Australia alone, but I think scenarios like these are extremely unlikely.)

For other existential risks, it might in principle be useful, but practically very difficult. Building a self-sustaining city on Mars would take a lot of time and resources. On the scale of centuries, though, it seems like a viable option.

At the same time though I don't think you mean to endorse 1).

I have read or skimmed some of his posts and my sense is that he does endorse 1). But at the same time he says:

critics seem to frequently conflate my arguments with other, simpler positions that can be more easily dismissed.

so maybe this is one of those cases, and I should be more careful.

the AI won't ever have more [...] capabilities to hack and destroy infrastructure than Russia, China or the US itself.

Having better hacking capability than China seems like a low bar for super-human AGI. The AGI would only need to be better at writing and understanding code than a small group of talented humans, and to have access to some servers. This sounds easy if you accept the premise of smarter-than-human AGI.

Merely listing EA under "Memetics adjacence" does not support the claim that he "is also an avowed effective altruist."
