In 2012, Holden Karnofsky[1] critiqued MIRI (then SI) by saying "SI appears to neglect the potentially important distinction between 'tool' and 'agent' AI." He particularly claimed:

Is a tool-AGI possible? I believe that it is, and furthermore that it ought to be our default picture of how AGI will work

I understand this to be the first introduction of the "tool versus agent" ontology, and it is a helpful (relatively) concrete prediction. Eliezer replied here, making the following summarized points (among others):

  1. Tool AI is nontrivial
  2. Tool AI is not obviously the way AGI should or will be developed

Gwern more directly replied by saying:

AIs limited to pure computation (Tool AIs) supporting humans, will be less intelligent, efficient, and economically valuable than more autonomous reinforcement-learning AIs (Agent AIs) who act on their own and meta-learn, because all problems are reinforcement-learning problems.

11 years later, can we evaluate the accuracy of these predictions?

  1. ^

    Some Bayes points go to LW commenter shminux for saying that this Holden kid seems like he's going places


I think it's pretty clear now that the default trajectory of AI development is taking us towards pretty much exactly the sorts of agentic AGI that MIRI et al were worried about 11 years ago. We are not heading towards a world of AI tools by default; coordination is needed to not build agents.

If in 5 more years the state of the art, most-AGI-ish systems are still basically autocomplete, not capable of taking long series of action-input-action-input-etc. with humans out of the loop, not capable of online learning, and this had nothing to do with humans coordinating to slow down progress towards agentic AGI, I'll count myself as having been very wrong and very surprised.

My take is that both were fairly wrong.[1] AI is much more generally intelligent and single systems are useful for many more things than Holden and the tool AI camp would have predicted. But they are also extremely non-agentic.

(To me this is actually rather surprising. I would have expected agency to be necessary to get this much general capability.)

I'm tempted to call it a wash. But rereading Holden's writing in the linked post, it seems to be pretty narrowly arguing against AI as necessarily being agentic, which seems to have predicted the current world (though note there's still plenty of time for AIs to get agentic, and I still roughly believe the arguments that they probably will).

  1. ^

    This seems unsurprising, tbh. I think everyone now should be pretty uncertain about how AI will go in the future.

But they are also extremely non-agentic.

This doesn't sound super true to me, for what it's worth. The AIs are predicting humans after all, and humans are pretty agentic. Many people had conversations with Sydney where Sydney tried to convince them to somehow not shut her down. 

I think there is still an important sense in which there is a surprising amount of generality compared to the general level of capability, but I wouldn't particularly call the current genre of AIs "extremely non-agentic". 

JP Addison
I guess it depends on your priors or something. It's agentic relative to a rock, but, relative to an AI which can pass the LSAT, it's well below my expectations. It seems like ARC-Evals had to coax and prod GPT-4 to get it to do things it "should" have been doing with rudimentary levels of agency.

Relevant, I think, is Gwern's later writing on Tool AIs:

There are similar general issues with Tool AIs as with Oracle AIs:

  • a human checking each result is no guarantee of safety; even Homer nods. An extremely dangerous or subtly dangerous answer might slip through; Stuart Armstrong notes that the summary may simply not mention the important (to humans) downside to a suggestion, or frame it in the most attractive light possible. The more a Tool AI is used, or trusted by users, the less checking will be done of its answers before the user mindlessly implements it.
  • an intelligent, never mind superintelligent Tool AI, will have built-in search processes and planners which may be quite intelligent themselves, and in ‘planning how to plan’, discover dangerous instrumental drives and the sub-planning process execute them. (This struck me as mostly theoretical until I saw how well GPT-3 could roleplay & imitate agents purely by offline self-supervised prediction on large text databases—imitation learning is (batch) reinforcement learning too! See Decision Transformer for an explicit use of this.)
  • developing a Tool AI in the first place might require another AI, which itself is dangerous
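The second bullet point's core observation—that a system doing pure computation becomes an agent the moment its outputs are fed back into the world—can be sketched in a few lines. The names below (`predict_action`, `environment`) are hypothetical placeholders standing in for a sequence model and a world, not any real API:

```python
# Illustrative sketch: a pure next-step predictor ("tool") plus a
# feedback loop yields an agent. The model itself never acts; the
# scaffold around it does.

def predict_action(history):
    """Stands in for a sequence model doing offline prediction.
    Here: a trivial policy that cycles through a scripted plan."""
    plan = ["inspect", "modify", "verify"]
    return plan[len(history) % len(plan)]

def environment(action):
    """Stands in for the world; executing an action yields an observation."""
    return f"result-of-{action}"

def run_agent_loop(steps):
    """Wrap the predictor in an act-observe loop with no human in it."""
    history = []
    for _ in range(steps):
        action = predict_action(history)   # pure computation (tool-like)
        observation = environment(action)  # side effects in the world (agent-like)
        history.append((action, observation))
    return history

transcript = run_agent_loop(3)
print([action for action, _ in transcript])  # → ['inspect', 'modify', 'verify']
```

The point of the sketch is that nothing about `predict_action` changes between the tool and agent cases; the distinction lives entirely in whether a loop like `run_agent_loop` exists around it.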

Personally, I think the distinction is basically irrelevant in terms of safety concerns, mostly for reasons outlined by the second bullet-point above.  The danger is in the fact that "useful answers" you might get out of a Tool AI are those answers which let you steer the future to hit narrow targets (approximately described as "apply optimization power" by Eliezer & such).

If you manage to construct a training regime for something that we'd call a Tool AI, which nevertheless gives us something smart enough that it does better than humans in terms of creating plans which affect reality in specific ways[1], then it approximately doesn't matter whether or not we give it actuators to act in the world[2].  It has to be aiming at something; whether or not that something is friendly to human interests won't depend on what name we give the AI.

I'm not sure how to evaluate the predictions themselves.  I continue to think that the distinction is basically confused and doesn't carve reality at the relevant joints, and I think progress to date supports this view.

  1. ^

    Which I claim is a reasonable non-technical summary of OpenAI's plan.

  2. ^

    Though note that even if whatever lab develops it doesn't do so, the internet has helpfully demonstrated that the people will do it themselves, and quickly, too.


This is somewhat inspired by a variety of Twitter people saying that Eliezer Yudkowsky shouldn't be trusted because he made bad predictions in the past (arbitrarily chosen examples here and here) but I am also interested in the question from the perspective of whether alignment strategies relying on AI being more tool-like are promising.

Holden's beliefs on this topic have changed a lot since 2012. See here for more.