Author’s Note
Portions of this paper, including some summarization, relevant literature discovery, and minor language editing, were assisted by AI (Anthropic’s Claude Opus 4.1 and OpenAI’s GPT-5). The author reviewed, edited, and takes full responsibility for all content. The views expressed herein are solely those of the author and do not represent the positions of the author’s employer or any affiliated institutions. Feedback and constructive comments are warmly welcomed.
1. Introduction
Recent discussions of AI existential risk often focus on a particular nightmare scenario: the emergence of autonomous, superintelligent agents that pursue goals misaligned with human values. Joe Carlsmith's recent essay (Carlsmith 2025) provides a rigorous framework for this concern, arguing that AI systems become existentially dangerous when they combine advanced capability, agentic planning, and strategic awareness -- what he dubs "APS systems." His argument is compelling, and it identifies a real and serious threat.
However, there's another pathway to catastrophe that deserves attention, one that might arrive much sooner. Advanced AI capability alone -- even without sophisticated agentic planning or strategic awareness within the system itself -- could pose existential risk when combined with human malicious intent or institutional failure. The catastrophe we face might not wait for AI systems to develop autonomous agency. It could emerge as soon as sufficiently capable systems exist for humans to weaponize.
This isn't a criticism of Carlsmith's framework so much as an observation about a distinct and potentially more immediate threat. My central thesis is that existential catastrophe can occur when advanced capability exists anywhere in a human-AI system, regardless of whether the AI component possesses agentic planning and strategic awareness (the "P" and "S" in "APS system"). This matters because it changes our understanding of risk timelines, safety priorities, and intervention strategies. If I'm right, we may have less time to prepare than frameworks focused on APS systems suggest, and new angles of concern need to be addressed accordingly.
2. Understanding Carlsmith's Argument
Carlsmith's essay examines how advanced AI could pose existential risk by 2070. His argument centers on the observation that such a system represents an extraordinarily powerful force for transforming the world. Humans dominate not through physical strength but through cognitive abilities -- planning, learning, communication, reasoning, cooperation. These capabilities, exercised through culture and technology, give us unprecedented control over our environment. Building AI systems whose intelligence significantly exceeds our own means creating something with potentially far greater transformative capacity. Carlsmith focuses on three capabilities that, in combination[1], would enable genuine threats to humanity's future:
Advanced capability means the system outperforms the best humans at tasks that grant significant power: scientific research, strategic planning, engineering, persuasion. Carlsmith deliberately includes systems that might not dominate every domain (avoiding the vague notion of "AGI," which, as Carlsmith also pointed out, often conflates multiple distinct capabilities without clearly specifying what constitutes general intelligence). Instead, he focuses on systems that could still pose serious threats through power-seeking behavior even with narrower competencies.
Agentic planning means the system makes and executes plans in pursuit of objectives, guided by models of the world. This distinguishes planning systems from mere tools or prediction engines. An agentic system represents goals, models consequences, and selects behaviors accordingly.
Strategic awareness means the system's world models represent with reasonable accuracy the consequences of gaining and maintaining power over humans and the environment. A strategically aware system could assess what would happen if it acquired more computing power or tried to prevent shutdown, and would use these models when generating plans.
The danger arises through instrumental convergence. Power, the capacity to shape one's environment, proves useful for achieving almost any objective. Systems engaging in goal-directed planning, equipped with sophisticated world models, will tend to seek power by default. Not because power is intrinsically valuable, but because it instrumentally promotes whatever they're trying to achieve. Carlsmith catalogs the likely forms: preventing changes to objectives, improving cognitive capabilities, acquiring resources, developing technologies, ensuring continued existence.
This creates a distinctive threat profile. Nuclear contamination is dangerous but passive. A misaligned AI system engaged in power-seeking would actively undermine efforts to stop it. This adversarial dynamic, combined with the potential opacity of advanced cognition and high stakes of failure, distinguishes AI risk from most technological safety challenges.
Yet Carlsmith's analysis treats agentic planning and strategic awareness as properties that AI systems themselves must possess. I think this overlooks a possibility worth serious consideration: these capabilities needn't be integrated within the AI system. They could be supplied externally -- by humans.
3. A Modified Scenario
Consider how the canonical paperclip maximizer scenario[2] changes when we separate advanced capability from integrated agency.
Classic Paperclip Maximizer (summarized from Bostrom (2014)):
1. It is possible to create and deploy an APS system S with the goal of maximizing paperclip production
2. To maximize paperclips, S, equipped with human-level strategic planning and situational awareness, decides to convert available matter, including humans, into paperclips
3. This leads to permanent human disempowerment or extinction, constituting existential catastrophe.
Modified Paperclip Maximizer:
1. It is possible to create and deploy an advanced system S* with sophisticated optimization capabilities for paperclip production, but lacking robust agentic planning and strategic awareness -- in other words, a "planning zombie".[3]
2. A human actor P, equipped with strategic planning and situational awareness, instrumentalizes S* for power-seeking objectives by:
- Crafting step-by-step plans that exploit S*'s advanced capabilities
- Guiding S* through critical junctures requiring strategic assessment
- Shielding S*'s operation from detection and interference
- Scaling S*'s deployment to maximize P's power acquisition
3. The result is permanent human disempowerment or extinction (except possibly for P), which equally constitutes an existential catastrophe for humanity.
To summarize, my central point is that catastrophic outcomes don't depend on whether the AI system itself possesses planning and awareness capabilities; what matters is that these capabilities exist somewhere in the human-AI system and get directed toward dangerous ends. The location of agency matters less than its effective exercise.
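The contrast can be stated schematically (in notation of my own, not Carlsmith's): write C(x), P(x), and S(x) for advanced capability, agentic planning, and strategic awareness in an entity x.
- Classic condition: there exists an AI system a such that C(a) ∧ P(a) ∧ S(a).
- Modified condition: there exist an AI system a and a human actor h such that C(a) ∧ P(h) ∧ S(h), with h directing a toward power-seeking ends.
The modified condition relaxes only where P and S must reside, not whether they are exercised; it therefore widens, rather than replaces, the space of catastrophe scenarios Carlsmith describes.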
This scenario gains plausibility from asymmetries in capability development. Advanced capability -- executing complex tasks at superhuman levels -- may arrive substantially before robust agentic planning and strategic awareness. Current large language models arguably demonstrate remarkable task execution without clear evidence of autonomous strategic planning. The gap between "impressive task performance" and "integrated strategic agent" may prove large.
Consider how this would work in practice. A state intelligence agency develops an AI system with extraordinary capabilities in cyber operations, social manipulation, and strategic analysis.[4] The system lacks autonomous goal-setting -- it doesn't wake up one morning deciding to take over the world. But it excels at execution when given objectives. Agency personnel recognize the system's potential as an instrument of power. They direct it to infiltrate foreign government systems, manipulate information environments, and develop dependencies only they can satisfy. At each juncture requiring strategic judgment -- when to act, when to hide, how to respond to countermeasures -- humans provide guidance. The system supplies superhuman execution; humans supply the agency.
Historical examples support this concern, though none involve AI. When the United States developed Stuxnet to sabotage Iranian nuclear centrifuges, the cyberweapon itself possessed no agency. It was sophisticated, capable, and destructive, but entirely dependent on human strategic direction. The attack required years of planning, detailed intelligence, and careful coordination -- all supplied by humans. The weapon provided advanced capability; humans provided everything else. [5] Scale this dynamic to AI systems with much broader capabilities operating across multiple domains simultaneously, and the danger becomes apparent.
Carlsmith himself acknowledges the relevant dynamics. He notes that "human individuals and institutions often have fairly (though not arbitrarily) long-term objectives that require long-term planning -- running factories and companies, pursuing electoral wins, and so on."[6] As such, humans already possess sophisticated planning and strategic capabilities; they need not wait for AI systems to develop these properties before leveraging advanced AI capabilities for transformative purposes.
Moreover, the instrumental value of power-seeking applies to human actors. Carlsmith observes that "almost all humans will seek to gain and maintain various types of power in some circumstances." When presented with tools of unprecedented capability, some subset of humanity will attempt to instrumentalize these tools for power acquisition. The AI system need not independently discover the instrumental value of power; it need only be capable of executing power-seeking strategies that humans conceive.
4. Why This Matters
The distinction between AI systems with fully integrated agency versus those where humans provide the strategic planning and awareness has several important implications for understanding AI risk.
Timeline differences: If catastrophe requires fully integrated APS systems, we might have more time for alignment research to mature and governance frameworks to develop. But if catastrophe could arrive as soon as advanced capability exists, regardless of whether that capability comes with autonomous agency, the threat is more immediate. Current AI systems already show impressive capabilities without clear autonomous strategic planning. The leap from current capabilities to catastrophe-enabling capabilities may be shorter than the leap to fully agentic systems.
Different detection patterns and intervention opportunities: When humans provide strategic direction to AI systems, this creates distinctive observable patterns compared to fully autonomous systems. The coordination between human strategists and AI executors generates communication flows, resource movements, and decision patterns that differ from those of an integrated agentic system. An APS system might deceive evaluators by exploiting its opacity; a human-AI collaboration must coordinate across distinct cognitive architectures, potentially creating vulnerabilities for detection and intervention. These differences suggest we need monitoring strategies tailored to each threat model. For human misuse scenarios, we can potentially leverage existing institutional mechanisms -- law enforcement, international agreements, monitoring regimes, access controls -- while preventing dangerous agency in AI systems requires solving novel technical alignment problems.
Capability thresholds: Understanding when systems become dangerous enough to enable catastrophe becomes urgent. A system with extraordinary capabilities in cyberattack, social manipulation, and resource optimization might enable catastrophic outcomes even without strategic awareness, if directed by sufficiently motivated human actors. Nuclear weapons demonstrate that tools without any agency can pose existential risks when placed in strategic contexts.
5. Objections and Responses
In this section, I consider several potential objections to my argument. Each highlights important nuances in how we think about AI risk and the relationship between advanced capability and agency.
The Long-Term Risk Objection: fully autonomous APS systems might pose even greater dangers across longer time horizons. Such systems could operate independently across extended periods, adapt without human guidance, and coordinate in ways that human-directed systems cannot. Even granting concerns about human-directed catastrophe, the classical worry about emergent machine agency remains more severe.
In response, I'm genuinely uncertain about comparative risk levels. The question depends on empirical uncertainties: how capable systems will become before developing full agency, how competent humans will be at direction, how difficult prevention would be in each scenario, and so on. But this uncertainty doesn't undermine my argument. If both scenarios pose serious risks and the human-directed scenario arrives earlier, it deserves substantial attention regardless of whether fully autonomous systems eventually pose greater danger. The temporal dimension matters crucially. If we have decades before autonomous APS systems but only years before catastrophic capability in human-directed contexts, the more immediate threat demands urgent focus.
The System-Level Agency Objection: if humans provide strategic planning and situational awareness, the combined human-AI system possesses all APS properties. My argument simply redistributes agency without eliminating it, thus confirming rather than challenging Carlsmith's main thesis.
This objection misses analytically important distinctions. Three asymmetries matter. First, preventability: stopping human misuse versus preventing the emergence of machine agency requires different interventions leveraging different institutional capabilities. Second, detectability: human-AI collaboration creates observable coordination patterns distinct from internal AI cognition. Third, scalability: human bandwidth limits how many AI systems one actor can strategically direct, while autonomous systems face no such bottleneck.
More fundamentally, these differences affect risk timelines, research priorities, and deployment decisions. The key questions become: should we focus on preventing capability advances, preventing agency emergence, or preventing misuse? Do we need to solve alignment before deploying capable systems, and how long do we have before each threat materializes? These questions have different answers depending on whether agency must be integrated within AI systems, and their answers imply different response strategies from the human perspective.
The Dual-Use Technology Objection: This simply restates the familiar concern that powerful technologies are dangerous in the wrong hands. Nuclear weapons, engineered pathogens, and cyberweapons all pose risks through human misuse. The truly novel aspect of AI risk is precisely what Carlsmith emphasizes: emergent autonomous agents pursuing their own goals with advanced capability.
There are three features that distinguish advanced AI from previous dual-use technologies. First is unprecedented breadth: nuclear weapons destroy, pathogens cause specific biological harm, cyberweapons exploit vulnerabilities. Advanced AI systems could excel across vast domains simultaneously -- scientific research, social manipulation, economic disruption, strategic planning, information warfare. This generality means a single system -- even one lacking agentic planning and strategic awareness -- could serve as a general-purpose instrument rather than requiring specialized technologies. Second is accessibility. Once such a system exists, it can be copied at near-zero marginal cost, unlike nuclear weapons or biological laboratories. While controlling access to frontier systems remains possible initially, the trend runs toward democratization. Third is capability scaling. Such systems can improve through learning even without developing full agency, creating a moving target for governance.
The history of dual-use technology governance isn't reassuring, either. We have failed to prevent nuclear proliferation fully, cyberweapons have proliferated beyond state control, and biological safety protocols fail periodically. As Toby Ord notes in Ord (2020), "Ours is a world of flawed decision-makers, working with strikingly incomplete information, directing technologies which threaten the entire future of the species... We were lucky... and we cannot rely on luck forever."[7] If advanced AI poses comparable threats, we have little reason for confidence that existing approaches will suffice. The inflection point may come when advanced capability reaches levels where human power-seeking, amplified by AI tools, crosses into existential threat territory -- potentially before fully autonomous systems emerge.
The Alignment Convergence Objection: Solving alignment for fully agentic systems would naturally produce robust safeguards against misuse of non-agentic systems. Solutions to the harder challenge likely subsume solutions to the easier one.
In fact, this reverses the actual relationship. The human-directed scenario is concerning precisely because it requires less from the AI -- only advanced capability, not integrated agency. This makes it easier to achieve and more likely to arrive soon. If human-directed catastrophe could occur as soon as advanced capability exists -- potentially years before fully autonomous systems -- then alignment research might arrive too late to prevent the earlier threat, even if it eventually addresses both.
The governance implications differ in ways that resist convergence. Preventing misuse requires monitoring access, usage, and deployment contexts -- fundamentally about regulating human behavior, whereas preventing misaligned autonomous systems requires solving technical problems about objective specification and goal preservation -- fundamentally about AI design. These require different expertise, different interventions, and different international coordination mechanisms.
The Extent Objection: This objection questions whether human-directed AI misuse would truly constitute existential catastrophe rather than merely catastrophic events from which humanity might recover. The distinction matters because existential risks -- those threatening humanity's entire future potential -- demand different prioritization than recoverable catastrophes. Critics might argue that human-directed misuse, however devastating, lacks the permanence and totality that characterizes genuine existential threats.
In response, let us return to the modified paperclip scenario. P successfully directs S* to accumulate power through controlling infrastructure, manipulating information environments, acquiring resources, and eliminating opposition. Even if P survives with a handful of collaborators, the vast majority of humanity suffers permanent disempowerment. Their ability to shape the future, pursue their values, and maintain autonomy is lost -- not through their own choices but through someone else's wielding of unprecedented capability.
This constitutes an existential catastrophe for two reasons. First, the disempowerment is permanent and involuntary. Humanity loses its capacity for self-determination through force, not choice. That one human retains power doesn't preserve humanity's collective agency. Second, the scenario could readily lead to extinction. If P's goals don't value human survival, or if control becomes easier without potential adversaries, the same capabilities enabling disempowerment enable elimination.
6. Implications, Conclusion and Future Directions
This paper has argued that existential catastrophe from advanced AI could arrive through a pathway distinct from the emergence of fully autonomous agents. By separating advanced capability from integrated agency, I've shown how human actors wielding powerful but non-agentic AI systems could pose existential threats potentially sooner than fully autonomous systems. This analysis doesn't diminish concerns about emergent machine agency but rather highlights an additional, potentially more immediate threat vector requiring different detection and prevention strategies. Moving forward, we need parallel research tracks addressing both human-directed and autonomous AI risks, with urgency on near-term capability governance given the potentially shorter timeline to a human-weaponized AI catastrophe.
If advanced capability without integrated agency poses distinct existential threats arriving potentially sooner than fully autonomous systems, several implications follow for AI safety and governance.
Near-term capability governance deserves substantial priority. Much safety research focuses on ensuring that advanced agentic systems remain aligned with human values -- crucial work addressing objective specification, reward hacking, and goal preservation. But if catastrophe could arrive as soon as systems achieve advanced capability, we need complementary research focused on preventing misuse of capable-but-not-fully-agentic systems. This isn't either-or -- we need both technical alignment research and governance frameworks, but the relative allocation might shift if misuse threats arrive first.
Detection and monitoring strategies may need reorientation. In addition to trying to peer into AI systems' internal states to assess their goals,[8] early warning systems might focus on how powerful capabilities are being deployed in practice. Patterns of resource acquisition designed to consolidate power, emerging capabilities to manipulate information or infrastructure at scale, actors acquiring capabilities dramatically exceeding defensive countermeasures -- these observable behaviors might prove more tractable to monitor than internal cognition.
Use case and deployment context analysis becomes important. Carlsmith himself emphasizes the importance of understanding how AI systems might be deployed in practice.[9] Technical safety research often focuses on system properties -- whether objectives are aligned, whether deceptive behavior occurs during training, whether power-seeking appears in test environments. But we need systematic analysis of how systems with various capability profiles could be misused by different actors in different institutional contexts. Understanding which capability combinations create particularly dangerous configurations, what defensive measures would be required, and whether those measures are realistic becomes crucial for risk assessment.
International coordination becomes both more urgent and more complicated. If threats run through human actors using AI systems rather than autonomous AI acting independently, we need governance frameworks applying to humans and institutions, not just AI systems. This is familiar territory in one sense -- we have millennia of experience with human governance -- but unfamiliar in another: regulating access to potentially catastrophic AI capabilities is new. The challenge is that the advanced capabilities enabling catastrophic misuse might also provide enormous benefits, making governance particularly fraught.[10]
Capability threshold questions require investigation. At what level does human-directed AI misuse transition from catastrophic-but-recoverable to genuinely existential? Key factors include the breadth of domains where the AI has advanced capability, operational speed, ability to improve performance, number of humans with access, and state of defensive technologies. Understanding how these combine to create existential rather than merely catastrophic risk represents important future research.[11]
We face uncertainty about how catastrophe would unfold. Both Carlsmith's argument and my modification reason by analogy to familiar concepts -- agency, planning, strategic awareness, power-seeking. Advanced AI might enable catastrophic dynamics not fitting these categories. We might need to think less about whether systems will be "agents" in traditional senses and more about what specific capabilities would enable what specific catastrophic pathways, then work backward to understand which combinations demand urgent attention.
The catastrophe that arrives first might involve advanced capability directed by human agency -- a threat deserving more near-term attention precisely because it might materialize sooner than scenarios dominating current discourse. As such, we cannot focus so intently on longer-term risks that we miss dangers arriving sooner -- dangers that could be equally, if not more, catastrophic and deserving of our most serious attention.
Bibliography
Allen, Gregory C. 2019. "Understanding China's AI Strategy." Center for a New American Security.
Bereska, L., and S. Gavves. 2024. "Mechanistic Interpretability for AI Safety -- A Review." Transactions on Machine Learning Research.
Bostrom, Nick. 2014. Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press.
Buchanan, Ben, Andrew Lohn, Micah Musser, and Katerina Sedova. 2020. "Automating Cyber Attacks: Hype and Reality." Center for Security and Emerging Technology.
Carlsmith, Joe. 2025. "Existential Risk from Power-Seeking AI." In Essays on Longtermism: Present Action for the Distant Future, edited by Hilary Greaves, Jacob Barrett, and David Thorstad. Oxford: Oxford University Press. https://doi.org/10.1093/9780191979972.003.0025.
Chalmers, David J. 1996. The Conscious Mind: In Search of a Fundamental Theory. Oxford: Oxford University Press.
———. 2023. "Could a Large Language Model be Conscious?" Boston Review, August 9.
Dafoe, Allan. 2018. "AI Governance: A Research Agenda." Centre for the Governance of AI, Future of Humanity Institute, University of Oxford.
Horowitz, Michael C. 2018. "Artificial Intelligence, International Competition, and the Balance of Power." Texas National Security Review 1, no. 3: 36-57.
Langner, Ralph. 2011. "Stuxnet: Dissecting a Cyberwarfare Weapon." IEEE Security & Privacy 9, no. 3: 49-51.
Ngo, Richard. 2020. "AGI Safety From First Principles." AI Alignment Forum.
Ngo, Richard, and Adam Bales. 2025. "Deceit and Power: Machine Learning and Misalignment." In Essays on Longtermism: Present Action for the Distant Future, edited by Hilary Greaves, Jacob Barrett, and David Thorstad. Oxford: Oxford University Press. https://doi.org/10.1093/9780191979972.003.0026.
Ord, Toby. 2020. The Precipice: Existential Risk and the Future of Humanity. New York: Hachette Books.
Zetter, Kim. 2014. Countdown to Zero Day: Stuxnet and the Launch of the World's First Digital Weapon. New York: Crown.
- ^
Carlsmith notes that it is difficult to identify sufficient conditions for existential risk, so he provides necessary conditions for analysis instead.
- ^
The paperclip maximizer thought experiment was introduced in academic form by Nick Bostrom in Bostrom (2014) to illustrate how an AI system with a simple goal could pursue it to catastrophic extremes if given sufficient capability and autonomy.
- ^
The term "planning zombie" draws on David Chalmers' philosophical zombie concept‚ something that exhibits all the functional properties of consciousness without subjective experience. See Chalmers (1996). In his recent work in Chalmers (2023), Chalmers argues that functional replacement of all carbon-based neurons with silicon equivalents may still preserve consciousness, hence it is a “biological chauvinism” to reject silicon-based consciousness. Here, my "planning zombie" would be a system that executes sophisticated optimization without the strategic awareness typically associated with planning agents.
- ^
Recent reports highlight state investment in AI for intelligence and military applications. See Buchanan et al. (2020); Allen (2019); Horowitz (2018).
- ^
For detailed analysis of Stuxnet as a sophisticated but non-agentic cyberweapon, see Zetter (2014) and Langner (2011).
- ^
As in Carlsmith (2025), section 4.3.2.
- ^
Ord (2020) discusses humanity's near-misses with nuclear war and the unsustainability of relying on continued good fortune. This poor track record is also cited by Carlsmith (2025) in arguing for his conclusion.
- ^
This touches on a topic I don't have space to explore here -- for an introduction to mechanistic interpretability, see Bereska and Gavves (2024).
- ^
Carlsmith (2025) discusses deployment contexts and how "the systems we deploy" shape risks in sections 4.3 and 5.2.
- ^
For analysis of AI governance challenges, see, for example, Dafoe (2018).
- ^
Ngo (2020) provides a framework for thinking about capability thresholds, particularly in his discussion of "transformative AI" and capability milestones. Ngo also has a chapter, co-authored with Adam Bales, on deceit and power in the same volume as Carlsmith's essay (see Ngo and Bales (2025)), although that argument is somewhat different and advances a separate conclusion about the possibility of AGI. Since it addresses a different topic, I leave its details aside here.
