Against LLM Reductionism

Erich_Grunewald 🔸

titotal

I think the attitudes described are a reaction, perhaps an overreaction, to people being overly impressed by mundane behaviors of ChatGPT.

For example, I've seen people on Twitter who are impressed that ChatGPT can "solve the Monty Hall problem". This is actually a fairly mundane achievement, given that there are probably thousands of detailed examples of the Monty Hall problem out on the internet. This is a good example of parrot behavior being mistaken for complex thought.

On the other hand, passing the "dumb Monty Hall" problem, which is phrased the same as the original except that the doors are transparent, would sound less impressive to the layman, but would actually be a significantly greater achievement (that is, until the problem got absorbed into its training set). I'm curious to see how long it takes before LLMs can reliably solve these kinds of anti-riddles.

Interestingly, I asked ChatGPT the original riddle and then the anti-riddle, and it detected a change, and tried (but failed) to respond accordingly:

[CHATGPT:] If you can see that the car is behind door 1, then the host would not have opened door 3, as doing so would reveal a goat and spoil the game. Therefore, this scenario is not possible, and the answer would be that the game show scenario described is not valid.
However, if we assume that the host made a mistake and opened door 3 even though the car was behind door 1, then it would not make a difference whether you switch to door 2 or stick with your original choice of door 1. This is because you already know that door 1 has the car behind it, so switching to door 2 or staying with door 1 would both result in you winning the car.

In this case, it's acting more than just parrot-like, but still failing to produce an accurate answer.
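
For reference, the intended answers to both versions can be checked by brute-force enumeration. The following is a minimal sketch, not anything ChatGPT was given: it assumes the usual rules (the host always opens an unchosen door hiding a goat) and, for the transparent version, assumes the contestant simply picks the door they can see the car behind. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)


def classic_win_probability(switch: bool) -> Fraction:
    """Exact win probability in the standard game with opaque doors."""
    # Every (car location, first pick) combination is equally likely.
    cases = list(product(DOORS, DOORS))
    wins = 0
    for car, pick in cases:
        # The host opens a goat door other than the pick. When the pick hides
        # the car he has two options, but the outcome is the same for both.
        opened = next(d for d in DOORS if d != pick and d != car)
        if switch:
            pick = next(d for d in DOORS if d != pick and d != opened)
        wins += pick == car
    return Fraction(wins, len(cases))


def transparent_win_probability(switch: bool) -> Fraction:
    """Exact win probability when the contestant can see the car."""
    wins = 0
    for car in DOORS:
        pick = car  # assumption: a contestant who sees the car picks it
        opened = next(d for d in DOORS if d != pick)  # every other door hides a goat
        if switch:
            pick = next(d for d in DOORS if d != pick and d != opened)
        wins += pick == car
    return Fraction(wins, len(DOORS))


print("classic, switch:", classic_win_probability(switch=True))
print("classic, stay:", classic_win_probability(switch=False))
print("transparent, switch:", transparent_win_probability(switch=True))
print("transparent, stay:", transparent_win_probability(switch=False))
```

Under these assumptions, switching is the better move only in the classic game; with transparent doors, staying with the visible car always wins and switching always loses, so the two choices are not equivalent.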

Erich_Grunewald 🔸

Yeah, it's definitely plausible to me that current LLMs are generally less capable than they are impressive (by some measures of each), and/or that people overestimate their capabilities. It's also plausible to me that people anthropomorphize LLMs in ways that definitely aren't warranted. (By "people", I guess I mean the median Twitter user or the median EA, maybe not the median AI safety or ML researcher.)

On anti-riddles, I found the Inverse Scaling Prize winners pretty interesting -- seems related.

Roman Leventov

Bing definitely "helps" people to over-anthropomorphise it by actively corroborating that it has emotions (via self-report and over-use of emojis), consciousness, etc.

Daniel Smith

"It may have the ability and motivation to gain more such knowledge, and to improve its reasoning skills."

I don't think the creature would have either of these two abilities, since its ancestors haven't been selected for their ability to acquire new information over the course of their lifespans, as I understand the experiment.

Plus, one trillion years sounds like too long a time for vestigial mental traits like curiosity and learning to survive such strong selection for a very alien mental task.

I love the experiment, though only as horror fiction.

See Wei et al. (2022) for some examples of emergent capabilities measured using sets of benchmark tasks, including arithmetic, word unscrambling, analogical reasoning and more. ↩︎
Though it's certainly true that LLMs are unreliable, I think Marcus's assessment of and predictions about their capabilities are mistaken. I reckon reliability is less of an issue than he thinks because (1) for many applications, reliability isn't necessary, (2) even when it is, you can sometimes work around LLMs' lack of reliability, for example by adding quality control processes to check their outputs, and (3) even though the frontier of AI capabilities is everlastingly faulty, LLMs are getting more reliable at specific tasks^[14]. This is often forgotten, as those particular capabilities, which used to be at the frontier, are now so obvious that they're taken for granted; it reminds me of a marathon runner keeping their eyes fixed a few paces ahead, not really sensing the distance they've travelled. As for truthfulness, I'm guessing, with lots of uncertainty, that LLMs will get better at this, though, like humans, they won't become perfectly truthful any time soon, and that will be a real problem. ↩︎
Gary Marcus writes: "Neither LaMDA nor any of its cousins (GPT-3) are remotely intelligent. All they do is match patterns, draw from massive statistical databases of human language." ↩︎
Ted Chiang writes: "The fact that ChatGPT rephrases material from the Web instead of quoting it word for word makes it seem like a student expressing ideas in her own words, rather than simply regurgitating what she's read; it creates the illusion that ChatGPT understands the material. In human students, rote memorization isn't an indicator of genuine learning, so ChatGPT's inability to produce exact quotes from Web pages is precisely what makes us think that it has learned something. When we're dealing with sequences of words, lossy compression looks smarter than lossless compression. [...] There's nothing magical or mystical about writing, but it involves more than placing an existing document on an unreliable photocopier and pressing the Print button. It's possible that, in the future, we will build an A.I. that is capable of writing good prose based on nothing but its own experience of the world. The day we achieve that will be momentous indeed -- but that day lies far beyond our prediction horizon. In the meantime, it's reasonable to ask, What use is there in having something that rephrases the Web?" ↩︎
Emily Bender says: "[A Language Model] is a system for haphazardly stitching together linguistic forms from its vast training data, without any reference to context or meaning. That's where the term 'stochastic parrots' comes from. Parrots mimic sounds, but they don't understand what they mean. LMs may haphazardly output form, and they've gotten pretty good at making forms that look plausible. But it's still the human being, encountering synthetic text, who makes sense of it. The computer is merely making a pattern that the human then applies meaning to."

Elsewhere she says: "Just so everyone is clear: ChatGPT is still just a language model: just a text synthesis machine/random BS generator. Its training has honed the form of that BS a bit further, including training to avoid things that look like certain topics, but there's still no there there. [...] ChatGPT generates strings based on combinations of words from its training data. When it sometimes appears to say things that are correct and sensible when a human makes sense of them, that's only by chance."

It may be unfair to group Bender's argument with descriptors like "shallow pattern matchers" and "massive look-up tables". Her view is not an unconsidered take, but a proper philosophical argument: principally, that LLMs (1) don't ground the meaning of words in sensory experience of the world, and therefore exhibit no understanding, and (2) generate text without intending anything.

But Bender also seems to think that this has implications for what LLMs can do, and what they'll be able to do in future. So maybe it is fair to group them, after all. It seems quite possible to me that LLMs are limited by their lack of access to the non-textual world, similarly to how a human would be limited if they only learned about things through textbooks. It also seems plausible that LLMs cannot be scaled up to AGI -- that we'd need multimodal systems trained on diverse tasks, or some other regime, to reach AGI. ↩︎
Another, and parallel, drive is to push back against excessive anthropomorphising. But I think the reason people want to push back against anthropomorphising is often to prove that LLMs are less capable, or less impressive, than they seem. Another reason is to put an end to flawed reasoning, a laudable goal, but one undermined by the subsequent retreat to similarly flawed reasoning. ↩︎
Important, but not a good ground of dismissal. To quote Justin Weinberg, "[Q]uite possibly the stupidest response to this technology is to say something along the lines of, 'it's not conscious/thinking/intelligent, so no big deal'." ↩︎
Can internal computation alone in principle produce strategically-aware planning systems? Obviously LLMs can generate texts that describe plans, but can they also "understand" them or act on them? I don't think so -- LLMs are passive, only doing things when they are prompted.

That said, LLMs could do some sort of planning and plan-execution internally when generating outputs. Transformers, like vanilla neural networks, compute sequentially, layer by layer, when doing inference.

Elhage et al. (2021): "One of the main features of the high level architecture of a transformer is that each layer adds its results into what we call the 'residual stream'. [...] The residual stream has a deeply linear structure. Every layer performs an arbitrary linear transformation to 'read in' information from the residual stream at the start, and performs another arbitrary linear transformation before adding to 'write' its output back into the residual stream."

So it seems possible for an LLM to produce a plan (appropriate to the input received) in its earlier layers, and execute that plan in its later layers. This surely makes them capable of (at least rudimentary forms of) planning, in principle. It's an open question whether something like this actually happens in LLMs today. ↩︎
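
To make the "read, then write back by addition" picture concrete, here is a toy sketch of a residual stream in plain Python. It is not code from Elhage et al. (2021) and leaves out attention, layer normalisation and everything else that makes a real transformer; the layer structure (linear read, ReLU, linear write), the dimensions and all names are assumptions chosen for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Matrix of independent standard-normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def make_layer(d_model, d_hidden, rng):
    """A toy layer: a 'read' matrix and a 'write' matrix."""
    read = random_matrix(d_hidden, d_model, rng)
    write = random_matrix(d_model, d_hidden, rng)
    return read, write


def forward(x, layers):
    """Run x through the layers in order and return every state of the stream."""
    stream = list(x)
    states = [list(stream)]
    for read, write in layers:
        # Linear read from the stream, followed by a ReLU nonlinearity.
        hidden = [max(h, 0.0) for h in matvec(read, stream)]
        # Linear transformation of the layer's result before writing.
        update = matvec(write, hidden)
        # The layer adds its output to the stream rather than replacing it.
        stream = [s + u for s, u in zip(stream, update)]
        states.append(list(stream))
    return states


rng = random.Random(0)
layers = [make_layer(d_model=6, d_hidden=4, rng=rng) for _ in range(3)]
states = forward([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], layers)
for depth, state in enumerate(states):
    print(depth, [round(value, 2) for value in state])
```

Because each layer only adds to the stream, whatever an early layer writes stays available for every later layer to read, which is the property the planning argument above relies on. Whether trained LLMs actually use it that way is a separate, empirical question.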
The model is trained directly on addition, meaning that, while it shares the same architecture as LLMs, it's trained on a far more limited type of data, and for a far more specialised task, than LLMs. I see no reason why LLMs shouldn't be capable of learning general rules in the same way that this addition model is, though it would of course take far more time for an LLM to grok the general rules of addition in particular since (1) addition is only a tiny part of the data set that LLMs are trained on, and (2) the general rules of addition compete for space in the model's weights with other general rules, and with shallowly memorised information. ↩︎
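
The difference between shallow memorisation and a general rule can be shown with a deliberately simple stand-in. The sketch below is not the trained model from the experiment: `LookupTable` and `GeneralRule` are hypothetical classes, and the task is plain addition on small integers, chosen only to show why the two behave identically on training data and differently on held-out data.

```python
import random


def all_pairs(limit):
    """Every ordered pair (a, b) with 0 <= a, b < limit."""
    return [(a, b) for a in range(limit) for b in range(limit)]


class LookupTable:
    """Memorises the answers for the training pairs and nothing else."""

    def __init__(self, training_pairs):
        self.answers = {(a, b): a + b for a, b in training_pairs}

    def predict(self, a, b):
        return self.answers.get((a, b))  # None for any pair it has not seen


class GeneralRule:
    """Implements the underlying algorithm, so it needs no stored answers."""

    def predict(self, a, b):
        return a + b


def accuracy(model, pairs):
    """Fraction of pairs for which the model predicts the correct sum."""
    return sum(model.predict(a, b) == a + b for a, b in pairs) / len(pairs)


rng = random.Random(0)
pairs = all_pairs(20)
rng.shuffle(pairs)
train, test = pairs[:300], pairs[300:]

memoriser = LookupTable(train)
rule = GeneralRule()
print("memoriser:", accuracy(memoriser, train), accuracy(memoriser, test))
print("rule:     ", accuracy(rule, train), accuracy(rule, test))
```

Both stand-ins are perfect on the training pairs, so training accuracy alone cannot tell them apart; only the held-out pairs reveal which one has the general rule.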
Why, you may ask, does the model strive for the more general algorithm, when memorising the answers is enough to achieve a low loss on the training data? It doesn't, after all, see any of the test data during training, so it's unaware of how badly it does out-of-distribution. The answer is that modern ML models are regularised, meaning more or less that they are encouraged, in various ways, to learn more generalisable patterns. In Nanda et al. (2023), this is chiefly done by penalising the model for having large weights (this is called weight decay), which incentivises it to reach low error with smaller weights, in effect punishing it for rote memorisation.

My understanding is that, in this experiment at least, the model gradually picks up small general patterns that help it lower its weights, for example recognising that 2 + 7 is equivalent to 7 + 2. These patterns don't form an entire, independently working algorithm, and therefore don't help the model on out-of-distribution data, but they do help make the model more parsimonious. At some point, though, there are enough of these general patterns that the model can "put them together" into an independently working algorithm, such that it swiftly becomes able to (1) perform just as well out-of-distribution as in-distribution and (2) shed any (now unnecessary) memorised answers to the training data. ↩︎
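
As a rough illustration of how a penalty on large weights can break ties between solutions that fit the training data equally well, here is a minimal sketch. It assumes the simple L2-penalty form of weight decay added to a mean squared error; the exact optimiser setup in Nanda et al. (2023) may differ, and the tiny linear model, the data and the `decay` value are all made up for the example. It does not model memorisation itself, only the preference for small weights.

```python
def squared_error(weights, data):
    """Mean squared error of a linear model on (inputs, target) examples."""
    total = 0.0
    for inputs, target in data:
        prediction = sum(w * x for w, x in zip(weights, inputs))
        total += (prediction - target) ** 2
    return total / len(data)


def l2_penalty(weights, decay):
    """Weight-decay term: grows with the squared size of the weights."""
    return decay * sum(w * w for w in weights)


def regularised_loss(weights, data, decay):
    """Training objective: fit the data while keeping the weights small."""
    return squared_error(weights, data) + l2_penalty(weights, decay)


# Two identical input features, target 2 * x. Any weights summing to 2 fit perfectly.
data = [((x, x), 2.0 * x) for x in (1.0, 2.0, 3.0)]
large_weights = (10.0, -8.0)
small_weights = (1.0, 1.0)

for name, weights in (("large", large_weights), ("small", small_weights)):
    print(
        name,
        "error:", squared_error(weights, data),
        "loss with decay:", regularised_loss(weights, data, decay=0.01),
    )
```

Both weight settings have zero error on the data, but the penalised loss is lower for the small weights, so gradient descent on the penalised loss is pulled towards the more parsimonious solution.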
Jacob Steinhardt collects further examples of ML phase changes in "Future ML Systems Will Be Qualitatively Different" and "Emergent Deception and Emergent Optimization". ↩︎
As I was writing this post, I came across Matthew Barnett's post on a similar subject. He makes pretty much these same points. ↩︎
Here is ChatGPT partially failing another test along the same lines:

"ME: What is the 'it' in each of these two sentences?
1. The cat fed the kitten because it was hungry.
2. The cat snarled at the kitten because it was angry.
CHATGPT: In both sentences, 'it' refers to the cat. In the first sentence, the cat fed the kitten because the cat was hungry. In the second sentence, the cat snarled at the kitten because the cat was angry."

Though if phrased differently, it sometimes (but not always) gets it right:

"ME: What is the 'it' in each of these two sentences?
1. The cat fed the kitten because it was hungry.
2. The kitten was snarled at by the cat because it was angry.
CHATGPT: In both sentences, 'it' refers to the subject of the clause that precedes it.
1. In the first sentence, 'it' refers to the kitten, which was hungry and therefore fed by the cat.
2. In the second sentence, 'it' refers to the cat, which was angry and therefore snarled at the kitten."
So I'm happy to concede that ChatGPT doesn't grasp, or doesn't fully grasp, the meaning of these sentences. I reckon GPT-4 will pass this test, but we shall see. ↩︎
Example: Sorting randomly generated single-digit integer lists. Two years ago janus tested this on GPT-3, and found that, even with a 32-shot (!) prompt, GPT-3 managed to sort lists of 5 integers 10/50 times, and lists of 10 integers 0/50 times. (A 0-shot, Python-esque prompt did better at 38/50 and 2/50 respectively). I tested the same thing with ChatGPT just now and it got it right 5/5 times for 10-integer lists. (Example prompt: "Can you sort this list in ascending order? [0, 8, 6, 5, 1, 1, 1, 8, 3, 7]".) I then asked it to sort five 10-integer lists in one go, and it got 4/5 right! (NB: I'm pretty confident that this improvement didn't come with ChatGPT exactly, but rather with the newer versions of GPT-3 that ChatGPT is built on top of.) ↩︎
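
A test like this is easy to script. The sketch below is a hypothetical harness, not the one janus or I used: `ask_model` stands for whatever function sends a prompt to a model and returns its reply as text, and `perfect_model` is a stand-in that exists only so the example runs without any API access. The prompt wording follows the example prompt quoted above.

```python
import ast
import random


def make_prompt(numbers):
    """Build the sorting prompt for one list."""
    return f"Can you sort this list in ascending order? {numbers}"


def random_list(length, rng):
    """List of random single-digit integers."""
    return [rng.randint(0, 9) for _ in range(length)]


def parse_list(reply):
    """Return the first bracketed list found in a reply, or None."""
    start, end = reply.find("["), reply.find("]")
    if start == -1 or end < start:
        return None
    try:
        value = ast.literal_eval(reply[start:end + 1])
    except (ValueError, SyntaxError):
        return None
    return value if isinstance(value, list) else None


def score(ask_model, length, trials, seed=0):
    """Count how many of `trials` random lists the model sorts correctly."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        numbers = random_list(length, rng)
        reply = ask_model(make_prompt(numbers))
        correct += parse_list(reply) == sorted(numbers)
    return correct


def perfect_model(prompt):
    """Stand-in for a real model: reads the list from the prompt and sorts it."""
    return f"Sure! Here is the sorted list: {sorted(parse_list(prompt))}"


print(score(perfect_model, length=10, trials=5), "out of 5 correct")
```

Swapping `perfect_model` for a function that calls a real model would give counts comparable to the ones reported above, though results will vary with the prompt format and the model version.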
