(Related text posted to Twitter; this version is edited and has a more advanced final section.)
Imagine yourself in a box, trying to predict the next word - assign as much probability mass to the next token as possible - for all the text on the Internet.
Koan: Is this a task whose difficulty caps out as human intelligence, or at the intelligence level of the smartest human who wrote any Internet text? What factors make that task easier, or harder? (If you don't have an answer, maybe take a minute to generate one, or alternatively, try to predict what I'll say next; if you do have an answer, take a moment to review it inside your mind, or maybe say the words out loud.)
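For concreteness, "assign as much probability mass to the next token as possible" is just the standard autoregressive cross-entropy objective; a generic formalization (nothing here is specific to any particular model's actual training setup):

$$
\mathcal{L}(\theta) \;=\; -\,\mathbb{E}_{x_{1:T}\sim\text{Internet text}}\left[\sum_{t=1}^{T} \log p_\theta\!\left(x_t \mid x_{<t}\right)\right]
$$

In expectation this is minimized exactly when $p_\theta(\cdot \mid x_{<t})$ matches the true conditional distribution of the next token given everything that came before - whatever process actually generated that text.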
Consider that somewhere on the internet is probably a list of thruples: <product of 2 prime numbers, first prime, second prime>.
GPT obviously isn't going to predict that successfully for significantly-sized primes, but it illustrates the basic point:
There is no law saying that a predictor only needs to be as intelligent as the generator, in order to predict the generator's next token.
Indeed, in general, you've got to be more intelligent to predict particular X, than to generate realistic X. GPTs are being trained to a much harder task than GANs.
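To make the asymmetry concrete, here is a minimal sketch of the thruple example above (illustrative code, assuming sympy is available): generating an entry takes one multiplication, while predicting the last two fields from the first is integer factorization, which is believed intractable for large semiprimes.

```python
# Sketch of the generate/predict asymmetry for <p*q, p, q> thruples.
# Generating is one multiplication; predicting p and q from the product
# alone is integer factorization.
from sympy import randprime, factorint

def generate_thruple(bits):
    """Easy direction: pick two random primes and multiply them."""
    p = randprime(2 ** (bits - 1), 2 ** bits)
    q = randprime(2 ** (bits - 1), 2 ** bits)
    return p * q, p, q

n, p, q = generate_thruple(bits=32)   # tiny primes, so the demo terminates
print(sorted(factorint(n)))           # hard direction: recover the primes from n alone
print(sorted({p, q}))                 # matches the line above; at 1024 bits the
                                      # factorint call would not finish in practice
```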
Same spirit: <Hash, plaintext> pairs, which you can't predict without cracking the hash algorithm, but which you could far more easily generate typical instances of if you were trying to pass a GAN's discriminator on such pairs (assuming a discriminator that had learned to compute hash functions).
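In the same spirit, a sketch with Python's standard hashlib: producing a typical <hash, plaintext> pair is one forward computation, while predicting the plaintext from the hash alone amounts to inverting SHA-256.

```python
# Sketch of the same asymmetry for <hash, plaintext> pairs.
import hashlib
import os

def generate_pair():
    """Easy direction: sample a plaintext, then hash it."""
    plaintext = os.urandom(16).hex()
    return hashlib.sha256(plaintext.encode()).hexdigest(), plaintext

def predict_plaintext(digest, candidate_guesses):
    """Hard direction: with no preimage attack on SHA-256, a predictor can
    only guess-and-check candidate plaintexts, which is hopeless at scale."""
    for guess in candidate_guesses:
        if hashlib.sha256(guess.encode()).hexdigest() == digest:
            return guess
    return None

digest, plaintext = generate_pair()
print(predict_plaintext(digest, ["hello", "world"]))     # None: prediction fails
print(predict_plaintext(digest, ["hello", plaintext]))   # "succeeds" only because we cheated
```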
Consider that some of the text on the Internet isn't humans casually chatting. It's the results section of a science paper. It's news stories that say what happened on a particular day, where maybe no human would be smart enough to predict the next thing that happened in the news story in advance of it happening.
As Ilya Sutskever compactly put it, to learn to predict text, is to learn to predict the causal processes of which the text is a shadow.
Lots of what's shadowed on the Internet has a *complicated* causal process generating it.
Consider that sometimes human beings, in the course of talking, make errors.
GPTs are not being trained to imitate human error. They're being trained to *predict* human error.
Consider the asymmetry between you, who makes an error, and an outside mind that knows you well enough and in enough detail to predict *which* errors you'll make.
If you then ask that predictor to become an actress and play the character of you, the actress will guess which errors you'll make, and play those errors. If the actress guesses correctly, it doesn't mean the actress is just as error-prone as you.
Consider that a lot of the text on the Internet isn't extemporaneous speech. It's text that people crafted over hours or days.
GPT-4 is being asked to predict it in 200 serial steps or however many layers it's got, as if a human were extemporizing their immediate thoughts.
A human can write a rap battle in an hour. A GPT loss function would like the GPT to be intelligent enough to predict it on the fly.
Or maybe simplest:
Imagine somebody telling you to make up random words, and you say, "Morvelkainen bloombla ringa mongo."
Imagine a mind of a level - where, to be clear, I'm not saying GPTs are at this level yet -
Imagine a Mind of a level where it can hear you say 'morvelkainen bloombla ringa', and maybe also read your entire social media history, and then manage to assign 20% probability that your next utterance is 'mongo'.
The fact that this Mind could double as a really good actor playing your character, does not mean They are only exactly as smart as you.
When you're trying to be human-equivalent at writing text, you can just make up whatever output, and it's now a human output because you're human and you chose to output that.
GPT-4 is being asked to predict all that stuff you're making up. It doesn't get to make up whatever. It is being asked to model what you were thinking - the thoughts in your mind whose shadow is your text output - so as to assign as much probability as possible to your true next word.
Figuring out that your next utterance is 'mongo' is not mostly a question, I'd guess, of that mighty Mind being hammered into the shape of a thing that can simulate arbitrary humans, and then some less intelligent subprocess being responsible for adapting the shape of that Mind to be you exactly, after which it simulates you saying 'mongo'. Figuring out exactly who's talking, to that degree, is a hard inference problem which seems like noticeably harder mental work than the part where you just say 'mongo'.
When you predict how to chip a flint handaxe, you are not mostly a causal process that behaves like a flint handaxe, plus some computationally weaker thing that figures out which flint handaxe to be. It's not a problem that is best solved by "have the difficult ability to be like any particular flint handaxe, and then easily figure out which flint handaxe to be".
GPT-4 is still not as smart as a human in many ways, but it's naked mathematical truth that the task GPTs are being trained on is harder than being an actual human.
And since the task that GPTs are being trained on is different from and harder than the task of being a human, it would be surprising - even leaving aside all the ways that gradient descent differs from natural selection - if GPTs ended up thinking the way humans do, in order to solve that problem.
GPTs are not Imitators, nor Simulators, but Predictors.
(crossposted from Alignment Forum)
While the claim - that the task ‘predict the next token on the internet’ absolutely does not imply that learning it caps out at human-level intelligence - is true, some parts of the post, and of the reasoning leading to the claims at its end, are confused or wrong.
Let’s start from the end and try to figure out what goes wrong.
From a high-level perspective, it is clear that the post's concluding claim - that the task GPTs are being trained on is harder than the task of being an actual human - is just wrong. Part of what human brains are doing is minimising prediction error with regard to sensory inputs. The unbounded version of that task is of basically the same generality and difficulty as what GPT is doing, and is roughly equivalent to understanding everything that is understandable in the observable universe. For example: a friend of mine worked on analysing data from the LHC, leading to the Higgs detection paper. Doing this type of work basically requires a human brain to have a predictive model of aggregates of the outputs of a very large number of collisions of high-energy particles, processed by a complex configuration of computers and detectors.
Where GPT and humans differ is not in some general mathematical fact about the task, but in what data each is trying to predict (sensory inputs versus Internet text), and in cognitive architecture and the ways the systems are bounded. The different landscape of boundedness and architecture can lead both to convergent cognition (thinking the way a human would) and to the opposite: predicting what the human would output in a highly non-human way.
Overall, boundedness is a central concept here. Neither humans nor GPTs are attempting to solve ‘how to predict stuff with unlimited resources’; both face a problem of cognitive economy - how to allocate limited computational resources to minimise prediction error.
If I try to imagine a mind which is able to predict my next word when I'm asked to make up random words, and to succeed at assigning 20% probability to my true output, I’m firmly in the realm of weird and incomprehensible Gods. If the Mind is imaginably bounded and smart, it seems likely it would not devote much cognitive capacity to modelling in detail strings prefaced by a context like ‘this is a list of random numbers’, in particular if inverting the process generating the numbers looked really costly. Being this good at this task would require so much data and cheap computation that it lies well beyond superintelligence, in the realm of philosophical thought experiments.
Overall, I think this is a really unfortunate way to think about the problem: a system which is moderately hard to comprehend (like GPT) gets replaced by something much more incomprehensible. It also seems a bit like a reverse intuition pump - I’m pretty confident most people's intuitive thinking about this ’simplest’ thing will be utterly confused.
How did we get here?
Apart from the fact that humans are also able to rap battle or improvise on the fly, notice that “what the loss function would like the system to do” in principle tells you very little about what the system will do. For example, the human loss function makes some people attempt to predict winning lottery numbers. This is an impossible task for humans, and you can’t say much about humans based on it. You can speculate about minds which would be able to succeed at this task, but you soon end up in the realm of Gods, outside of physics.
Again, from the cognitive-economy perspective, predicting my errors would often be wasteful. With some simplification, you can imagine I make two types of errors - systematic and random. Often the simplest way to predict a systematic error is to emulate the process which led to the error. Random errors are ... random, and a mind which knows me in enough detail to predict which random errors I’ll make seems a bit like the mind predicting the lottery numbers.
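One way to sharpen “random errors are random”: for a genuinely random component of my output, no predictor, however smart, can push its expected log-loss below the entropy of that randomness. For any model distribution $q$ and true distribution $p$,

$$
\mathbb{E}_{x\sim p}\left[-\log q(x)\right] \;=\; H(p) + D_{\mathrm{KL}}(p\,\|\,q) \;\ge\; H(p),
$$

so once $q$ matches $p$, extra intelligence spent on the random part buys nothing; only the systematic part of my errors is worth modelling in detail.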
The general claim that some predictions are really hard and you need superhuman powers to be good at them is true, but notice that this does not inform us about what GPT-x will learn.
Yes, this is clearly true: in the limit, the task is of unlimited difficulty.