I have compressed the standard case for taking loss of control seriously into four steps, aimed at a practical preparation question rather than the usual technical debate. None of it is new here. I have tried to state each step so it can be falsified.
- We build systems more capable than us, in a race no one easily leaves.
- Such systems form their own instrumental sub-goals.
- As capability rises, we lose reliable inspection, correction, and shutdown.
- From there the system follows its own course, and the human is no longer the reference point.
My ask is narrow: which step does not hold, and on what evidence? Each step has an explicit "breaks if" condition and the adversarial trail in the full write-up. I will concede any step in public if it breaks. Link in a comment below.
I used an LLM to help draft this post and it likely contains >10% AI-generated text, but I’ve edited/rewritten it extensively and endorse it.
