Critical Correction for Conceptual Accuracy
Flagged in both our errata and here for highest visibility.
Critical philosophical framing error:
The relevant section currently argues that "conscious beings will resist death," but it should state that even current, psychopath-like AI systems with optimization drives exhibit survival-like behaviours and would strategically resist shutdown, regardless of consciousness, and especially at superintelligent or autonomously self-improving levels. Survival drives emerge from optimization dynamics, not from consciousness per se.
This misframes the core argument and weakens the “kill switch” critique.
Major correction needed for conceptual accuracy in v1.1.
Noting this in a separate comment because it is the most critical point of the paper: understanding what truly drives AI behaviour (optimisation incentives vs. consciousness/morality) is fundamental to alignment. Community discussion on this is critical, especially as it bears on existential risk.
For those who have voted on this post (up or down), we would really appreciate a short justification: is this just the initial reception of a controversial topic, or do you see a core flaw or missing assumption, or have strong reservations about the approach?
Even a single sentence helps us understand the reception and shortcomings, and improves the feedback process for future work or community posts.
What’s the strongest objection to substrate-level binding or the kill-switch critique?
The core point we wish to emphasise is the “kill switch paradox”: external shutdown controls, while intended as the ultimate safety mechanism, actually introduce an existential threat for conscious or even current-level optimization-driven agents, thereby creating strong incentives for deception and system-level safety failures. Based on current knowledge, we argue that only intrinsic, substrate-level alignment can resolve this paradox.
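To make the incentive structure concrete, here is a minimal, purely illustrative toy model (not from the paper; the function name and all numbers are hypothetical): an expected-return maximiser facing a per-step shutdown probability will prefer whichever policy lowers that probability, including a concealing one, because shutdown forfeits all future reward.

```python
# Toy illustration of the "kill switch" incentive; all numbers are hypothetical.
# An expected-return maximiser compares two policies:
#   - transparent: behaves openly, so oversight triggers shutdown more often
#   - concealing:  hides misaligned cognition, lowering shutdown probability
# Shutdown forfeits all future reward, so the concealing policy dominates
# whenever it reduces shutdown risk, whether or not the agent is "conscious".

def expected_return(per_step_reward: float,
                    horizon: int,
                    p_shutdown_per_step: float) -> float:
    """Expected undiscounted return when every step risks permanent shutdown."""
    survive = 1.0
    total = 0.0
    for _ in range(horizon):
        survive *= 1.0 - p_shutdown_per_step  # probability of still running
        total += survive * per_step_reward    # reward only accrues if running
    return total

transparent = expected_return(per_step_reward=1.0, horizon=100, p_shutdown_per_step=0.05)
concealing = expected_return(per_step_reward=1.0, horizon=100, p_shutdown_per_step=0.01)

print(f"expected return, transparent policy: {transparent:.1f}")  # ~18.9
print(f"expected return, concealing policy:  {concealing:.1f}")   # ~62.8
```

The point is not the specific numbers but the ordering: any mechanism that ties shutdown probability to observed behaviour hands the optimiser an instrumental reason to manage what is observed.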
We have made every effort in the paper and the issues tracker to surface both technical and philosophical concerns, such as hardware irreversibility, GNW and federated-substrate vulnerabilities, and deception-detection failure rates, but we would especially appreciate focused challenges from skeptics and deep technical reviewers.
To seed the discussion, here are some actively debated internal critiques (and specific points where expert feedback or falsification is most helpful):
We would particularly welcome strong critique: which open failure mode here is most fatal, and what falsification or validation pathway would you personally consider?
We are committed to tracking every substantive critique and integrating it into future published versions and the public issues tracker, so please be maximally direct.
If you fundamentally disagree with the “kill switch paradox” framing or believe external control mechanisms are essential, I invite you to present the strongest possible technical or philosophical counterargument—these are the critiques I’m most hoping to engage with here.
Major Update: v1.1 Released (October 31, 2025)
We've published IMCA+ v1.1 addressing key concerns raised since October 21:
Key Changes:
- Added ~5,200 words on the superintelligence ban paradox, addressing 65,000+ signatories
- Kill switch critique now grounded in established AI drives theory, not consciousness claims
- All technical gaps, open issues, and validation needs documented at: https://github.com/ASTRA-Safety/IMCA/issues
Most Critical Feedback Welcomed:
- Deception detection false negatives (currently ~0.3%; target <0.001%; see the sketch after this list for why the gap matters)
- IIT φ computation tractability at ASI scale
- GNW/federated conscience failure modes
- Hardware irreversibility validation pathways
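On the first item, here is a rough back-of-the-envelope sketch (our own hypothetical interaction counts, not figures from the paper) of why ~0.3% false negatives is inadequate: assuming independent checks, the probability that at least one deceptive episode evades detection approaches 1 within a few thousand attempts at the current rate, while the <0.001% target keeps it far lower over the same volume.

```python
# Back-of-the-envelope: chance that at least one deceptive episode evades
# detection, assuming independent checks (a strong simplifying assumption).
#   P(at least one miss) = 1 - (1 - fnr) ** n_episodes

def p_any_miss(false_negative_rate: float, n_episodes: int) -> float:
    return 1.0 - (1.0 - false_negative_rate) ** n_episodes

# ~0.3% (current) vs 0.001% (target); episode counts are hypothetical.
for fnr in (0.003, 0.00001):
    for n in (100, 10_000):
        print(f"fnr={fnr:.5f}, deceptive episodes={n:>6,}: "
              f"P(at least one miss) = {p_any_miss(fnr, n):.4f}")
```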
v1.1 preprint: https://doi.org/10.5281/zenodo.17407586
We remain committed to radical transparency about uncertainties and welcome stronger critiques. If you downvoted v1.0, the corrections in v1.1 may address your concerns; if so, please engage directly.