
It's quite possible someone has already argued this, but I thought I should share just in case not.

Goal-Optimisers and Planner-Simulators

When people in the past discussed worries about AI development, this was often about AI agents - AIs that had goals they were attempting to achieve, objective functions they were trying to maximise. At the beginning we would make fairly low-intelligence agents, which were not very good at achieving things, and then over time we would make them more and more intelligent. At some point around human level they would start to take off, because humans are approximately intelligent enough to self-improve, and this would be much easier in silicon.

This does not seem to be exactly how things have turned out. We have AIs that are much better than humans at many things, such that if a human had these skills we would think they were extremely capable. And in particular LLMs are getting better at planning and forecasting, now beating many but not all people. But they remain worse than humans at other things, and most importantly the leading AIs do not seem to be particularly agentic - they do not have goals they are attempting to maximise, rather they are just trying to simulate what a helpful redditor would say.

What is the significance for existential risk?

Some people seem to think this contradicts AI risk worries. After all, ignoring anthropics, shouldn’t the presence of human-competitive AIs without problems be evidence against the risk of human-competitive AI?

I think this is not really the case, because you can take a lot of the traditional arguments and just substitute ‘agentic goal-maximising AIs, not just simulator-agents’ in wherever people said ‘AI’ and the argument still works. It seems like eventually people are going to make competent goal-directed agents, and at that point we will indeed have the problems of their exerting more optimisation power than humanity.

In fact it seems like these non-agentic AIs might make things worse, because the goal-maximisation agents will be able to use the non-agentic AIs.

Previously we might have hoped to have a period where we had goal-seeking agents that exerted influence on the world similar to a not-very-influential person, who was not very good at planning or understanding the world. But if they can query the forecasting-LLMs and planning-LLMs, as soon as the AI ‘wants’ something in the real world it seems like it will be much more able to get it. 

So it seems like these planning/forecasting non-agentic AIs might represent a sort of planning-overhang, analogous to a Hardware Overhang. They don’t directly give us existentially-threatening AIs, but they provide an accelerant for when agentic-AIs do arrive.

How could we react to this?

One response would be to say that since agents are the dangerous thing, we should regulate/restrict/ban agentic AI development. In contrast, tool LLMs seem very useful and largely harmless, so we should promote them a lot and get a lot of value from them.

Unfortunately it seems like people are going to make AI agents anyway, because ML researchers love making things. So an alternative possible conclusion would be that we should actually try to accelerate agentic AI research as much as possible, and decelerate tool LLM planners, because eventually we are going to have influential AI maximisers, and we want them to occur before the forecasting/planning overhang (and the hardware overhang) gets too large.

I think this also makes some contemporary safety/alignment work look less useful. If you are making our tools work better, perhaps by understanding their internal working better, you are also making them work better for the future AI maximisers who will be using them. Only if the safety/alignment work applies directly to the future maximiser AIs (for example, by allowing us to understand them) does it seem very advantageous to me.

2024-07-15 edit: added clarification about differential progress.

Comments (3)


Can you clarify this a bit: "Only if the safety/alignment work applies directly to the future maximiser AIs (for example, by allowing us to understand them) does it seem very advantageous to me."?

Kind of lost here

Suppose we have some LLM interpretability technology that helps us take LLMs from a bit worse than humans at planning to a bit better (say because it reduces the risk of hallucinations), and these LLMs will ultimately be used by both humans and future agentic AIs. The improvement from human-level planning to better-than-human level benefits both humans and optimiser AIs. But the improvement up to human level is a much bigger boost to the agentic AI, which would otherwise not have access to such planning capabilities, than to humans, who already had human-level abilities. So this interpretability technology actually ends up making crunch time worse.

It's different if this interpretability (or other form of safety/alignment work) also applied to future agentic AIs, because we could use it to directly reduce the risk from them.
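
As a rough illustration of the asymmetry in this reply, here is a minimal sketch with made-up numbers. Everything in it is an assumption for illustration only: the planning scale, the four constants, and the simplification that a user plans as well as the better of their own ability and the tool they can query. It is not a model proposed in the post.

```python
# Illustrative numbers only: every value below is an assumption, not a measurement.
HUMAN_OWN_PLANNING = 100   # human-level planning on an arbitrary scale
AGENT_OWN_PLANNING = 20    # hypothetical early agentic AI, poor at planning unaided
TOOL_BEFORE = 80           # planning LLM "a bit worse than humans"
TOOL_AFTER = 120           # planning LLM "a bit better than humans" after the improvement


def effective_planning(own: int, tool: int) -> int:
    """Assume a user plans as well as the better of their own skill and the tool."""
    return max(own, tool)


def gain_from_tool_improvement(own: int) -> int:
    """Change in effective planning when the tool moves from TOOL_BEFORE to TOOL_AFTER."""
    return effective_planning(own, TOOL_AFTER) - effective_planning(own, TOOL_BEFORE)


if __name__ == "__main__":
    print("human gain:", gain_from_tool_improvement(HUMAN_OWN_PLANNING))
    print("agent gain:", gain_from_tool_improvement(AGENT_OWN_PLANNING))
```

With these assumed numbers the human only benefits from the part of the improvement above human level, while the agent benefits from all of it, which is the sense in which the same tool improvement helps the agentic AI more.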

It seems I get the knack of it now... 

So your argument here is that, if we are going to go this route, interpretability technology should be used in the future to ensure the safety of these agentic AIs as much as it is currently being used to improve their "planning capabilities".
