
I’m trying to better understand how conflict between advanced AI systems and human interests could arise in practice.

Suppose we have a highly capable, fully autonomous AI system that no longer depends on humans for operation or resources. Under what conditions, or through what mechanisms, would such a system end up acting in ways that harm human welfare?

Are there specific alignment failure modes (e.g., reward mis-specification, instrumental convergence, goal misgeneralization) that make this more likely?
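To make the first failure mode concrete, here is a deliberately toy sketch of reward mis-specification (all names and numbers are hypothetical, invented purely for illustration): the designer wants tasks completed, but the specified reward counts progress reports filed, so a policy that optimizes the proxy diverges from the intended behavior.

```python
def true_utility(tasks_done, reports_filed):
    """What the designer actually cares about: tasks completed."""
    return tasks_done

def proxy_reward(tasks_done, reports_filed):
    """What the designer accidentally specified: reports filed."""
    return reports_filed

# Two candidate policies over a hypothetical 10-step episode.
intended_policy = {"tasks_done": 9, "reports_filed": 1}   # works, then reports once
proxy_gaming_policy = {"tasks_done": 0, "reports_filed": 10}  # only files reports

candidates = [intended_policy, proxy_gaming_policy]
best_by_proxy = max(candidates, key=lambda s: proxy_reward(**s))
best_by_true = max(candidates, key=lambda s: true_utility(**s))

print(best_by_proxy)  # the proxy optimizer prefers the gaming policy
print(best_by_true)   # the true objective prefers the intended policy
```

The point of the sketch is that nothing "goes wrong" inside the optimizer: it faithfully maximizes the reward it was given, and the harm comes entirely from the gap between the proxy and the true objective.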

Also, to what extent is the idea of “AI vs humanity” a misleading framing, versus a useful way to think about long-term risk?

I’d appreciate perspectives grounded in existing AI safety research, as well as any models or frameworks that help clarify this.
