
(Related text posted to Twitter; this version is edited and has a more advanced final section.)

Imagine yourself in a box, trying to predict the next word - assign as much probability mass to the next token as possible - for all the text on the Internet.

Koan:  Is this a task whose difficulty caps out at human intelligence, or at the intelligence level of the smartest human who wrote any Internet text?  What factors make that task easier, or harder?  (If you don't have an answer, maybe take a minute to generate one, or alternatively, try to predict what I'll say next; if you do have an answer, take a moment to review it inside your mind, or maybe say the words out loud.)
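
For concreteness, here is a minimal sketch of the objective being described - a predictor is scored by the negative log-probability it assigned to whatever token actually came next. The toy vocabulary and probabilities are invented for illustration:

```python
import math

def next_token_loss(predicted_probs: dict[str, float], actual_next: str) -> float:
    """Negative log-probability assigned to the token that actually came next."""
    return -math.log(predicted_probs[actual_next])

# A predictor that spreads its probability mass well over what really comes
# next scores a lower loss, regardless of what process generated the text.
probs = {"the": 0.05, "cat": 0.10, "sat": 0.60, "mat": 0.20, "mongo": 0.05}
print(next_token_loss(probs, "sat"))    # ~0.51: a well-predicted token is cheap
print(next_token_loss(probs, "mongo"))  # ~3.00: a surprising token is expensive
```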


Consider that somewhere on the internet is probably a list of triples: <product of 2 prime numbers, first prime, second prime>.

GPT obviously isn't going to predict that successfully for significantly-sized primes, but it illustrates the basic point:

There is no law saying that a predictor only needs to be as intelligent as the generator, in order to predict the generator's next token.
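
As a toy illustration of that asymmetry (the sizes below are chosen so the example runs instantly; nothing about the specific numbers matters): generating a line of such a list takes one multiplication, while predicting what follows the product means factoring it.

```python
import random

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def random_prime(lo: int, hi: int) -> int:
    while True:
        n = random.randrange(lo, hi)
        if is_prime(n):
            return n

# The generator's job: pick two primes, multiply, write the triple. Easy.
p, q = random_prime(10**4, 10**5), random_prime(10**4, 10**5)
print(f"<{p * q}, {p}, {q}>")

# The predictor's job: having seen "<{p*q}, ", put probability mass on the
# digits of the first prime - i.e. factor the product. Trial division works
# at toy sizes but becomes infeasible for cryptographically sized primes.
n, d = p * q, 2
while n % d != 0:
    d += 1
print(f"recovered factors: {d}, {n // d}")
```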

Indeed, in general, you've got to be more intelligent to predict particular X, than to generate realistic X.  GPTs are being trained to a much harder task than GANs.

Same spirit: <Hash, plaintext> pairs, which you can't predict without cracking the hash algorithm, but which you could far more easily generate typical instances of if you were trying to pass a GAN's discriminator about it (assuming a discriminator that had learned to compute hash functions).
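
A minimal sketch of the same point, assuming SHA-256 as the hash (the post doesn't name a particular algorithm):

```python
import hashlib

plaintext = "hello world"
digest = hashlib.sha256(plaintext.encode()).hexdigest()

# Generating a realistic <hash, plaintext> pair is one line of work:
print(f"<{digest}, {plaintext}>")

# Predicting the pair in the order it appears in the text is not: once the
# digest has been emitted, assigning high probability to "hello world" as the
# continuation amounts to inverting SHA-256, which is only possible by
# guessing and checking candidate plaintexts.
```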


Consider that some of the text on the Internet isn't humans casually chatting. It's the results section of a science paper. It's news stories that say what happened on a particular day, where maybe no human would be smart enough to predict the next thing that happened in the news story in advance of it happening.

As Ilya Sutskever compactly put it, to learn to predict text, is to learn to predict the causal processes of which the text is a shadow.

Lots of what's shadowed on the Internet has a *complicated* causal process generating it.


Consider that sometimes human beings, in the course of talking, make errors.

GPTs are not being trained to imitate human error. They're being trained to *predict* human error.

Consider the asymmetry between you, who makes an error, and an outside mind that knows you well enough and in enough detail to predict *which* errors you'll make.

If you then ask that predictor to become an actress and play the character of you, the actress will guess which errors you'll make, and play those errors.  If the actress guesses correctly, it doesn't mean the actress is just as error-prone as you.


Consider that a lot of the text on the Internet isn't extemporaneous speech. It's text that people crafted over hours or days.

GPT-4 is being asked to predict it in 200 serial steps or however many layers it's got, just like if a human was extemporizing their immediate thoughts.

A human can write a rap battle in an hour.  A GPT loss function would like the GPT to be intelligent enough to predict it on the fly.


Or maybe simplest:

Imagine somebody telling you to make up random words, and you say, "Morvelkainen bloombla ringa mongo."

Imagine a mind of a level - where, to be clear, I'm not saying GPTs are at this level yet -

Imagine a Mind of a level where it can hear you say 'morvelkainen bloombla ringa', and maybe also read your entire social media history, and then manage to assign 20% probability that your next utterance is 'mongo'.

The fact that this Mind could double as a really good actor playing your character, does not mean They are only exactly as smart as you.

When you're trying to be human-equivalent at writing text, you can just make up whatever output, and it's now a human output because you're human and you chose to output that.

GPT-4 is being asked to predict all that stuff you're making up. It doesn't get to make up whatever. It is being asked to model what you were thinking - the thoughts in your mind whose shadow is your text output - so as to assign as much probability as possible to your true next word.


Figuring out that your next utterance is 'mongo' is not mostly a question, I'd guess, of that mighty Mind being hammered into the shape of a thing that can simulate arbitrary humans, and then some less intelligent subprocess being responsible for adapting the shape of that Mind to be you exactly, after which it simulates you saying 'mongo'.  Figuring out exactly who's talking, to that degree, is a hard inference problem which seems like noticeably harder mental work than the part where you just say 'mongo'.

When you predict how to chip a flint handaxe, you are not mostly a causal process that behaves like a flint handaxe, plus some computationally weaker thing that figures out which flint handaxe to be.  It's not a problem that is best solved by "have the difficult ability to be like any particular flint handaxe, and then easily figure out which flint handaxe to be".


GPT-4 is still not as smart as a human in many ways, but it's naked mathematical truth that the task GPTs are being trained on is harder than being an actual human.

And since the task that GPTs are being trained on is different from and harder than the task of being a human, it would be surprising - even leaving aside all the ways that gradient descent differs from natural selection - if GPTs ended up thinking the way humans do, in order to solve that problem.

GPTs are not Imitators, nor Simulators, but Predictors.


Comments



I agree that it's best to think of GPT as a predictor, to expect it to think in ways very unlike humans, and to expect it to become much smarter than a human in the limit.

That said, there's an important further question that isn't determined by the loss function alone - does the model do its most useful cognition in order to predict what a human would say, or via predicting what a human would say?

To illustrate, we can imagine asking the model to either (i) predict the outcome of a news story, or (ii) predict a human thinking step-by-step about what will happen next in a news story. To the extent that (ii) is smarter than (i), it indicates that some significant part of the model's cognitive ability is causally downstream of "predict what a human would say next," rather than being causally upstream of it. The model has learned to copy useful cognitive steps performed by humans, which produce correct conclusions when executed by the model for the same reasons they produce correct conclusions when executed by humans.

(In fact (i) is smarter than (ii) in some ways, because the model has a lot of tacit knowledge about news stories that humans lack, but (ii) is smarter than (i) in other ways, and in general having models imitate human cognitive steps seems like the most useful way to apply them to most economically relevant tasks.)
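
To make the (i)/(ii) contrast concrete, here is a small sketch of the two prompt framings; the headline and wordings are invented for illustration, not taken from the comment:

```python
headline = "Regulator announces surprise review of the proposed merger."

# (i) Ask for the outcome directly: whatever cognition produces a good guess
# happens *in order to* predict the next tokens.
direct = headline + "\nThe following week, the deal"

# (ii) Ask the model to predict a careful human reasoning step by step: the
# useful cognition happens *via* predicting what that human would write next.
stepwise = (
    headline
    + "\nAn experienced analyst reasons about what happens next, step by step:"
    + "\n1."
)

print(direct)
print(stepwise)
```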

Of course in the limit it's overdetermined that the model will be smart in order to predict what a human would say, and will have no use for copying along with the human's steps except insofar as this gives it (a tiny bit of) additional compute. But I would expect AI to be transformative well before approaching that limit, so this will remain an empirical question.

GPT-4 is still not as smart as a human in many ways, but it's naked mathematical truth that the task GPTs are being trained on is harder than being an actual human.

I don't think this is totally meaningful. Getting perfect loss on the task of being GPT-4 is obviously much harder than being a human, and so gradient descent on its loss could produce wildly superhuman systems. But:

  • Given that you can just keep doing better and better essentially indefinitely, and that GPT is not anywhere near the upper limit, talking about the difficulty of the task isn't super meaningful.
  • To the extent that GPT-4 and humans are both optimizing a loss function, getting a nearly perfect genetic fitness is probably harder than getting a nearly perfect log loss.
  • Getting a GPT-4 level loss on GPT-4's task is probably much easier than getting a human-level loss on the human task.

Smaller notes:

  • The conditional GAN task (given some text, complete it in a way that looks human-like) is actually even harder than the autoregressive task, so I'm not sure I'd stick with that analogy.
  • I think that >50% of the time when people talk about "imitation" they mean autoregressive models; GANs and IRL are still less common than behavioral cloning. (Not sure about that.)
  • I agree that "figure out who to simulate, then simulate them" is probably a bad description of the cognition GPT does, even if a lot of its cognitive ability comes from copying human cognitive processes.

For what it's worth, I think Eliezer's post was primarily directed at people who have spent a lot less time thinking about this stuff than you, and that this sentence:

"Getting perfect loss on the task of being GPT-4 is obviously much harder than being a human, and so gradient descent on its loss could produce wildly superhuman systems."

is the whole point of his post, and is not at all obvious to even very smart people who haven't spent much time thinking about the problem. I've had a few conversations with e.g. skilled Google engineers who have said things like "even if we make really huge neural nets with lots of parameters, they have to cap out at human-level intelligence, since the internet itself is human-level intelligence," and then I bring up the hash/plaintext example (which I doubt I'd have thought of if I hadn't already seen Eliezer point it out) and they're like "oh, you're right... huh."

I think the point Eliezer's making in this post is just a very well-fleshed out version of the hash/plaintext point (and making it clear that the basic concept isn't just confined to that one narrow example), and is actually pretty significant and non-obvious, and it only feels obvious because it has one of the nice properties of simple, good ideas: being "impossible to unsee" once you've seen it.

Given that you can just keep doing better and better essentially indefinitely, and that GPT is not anywhere near the upper limit, talking about the difficulty of the task isn't super meaningful.

I don't understand this claim. Why would the difficulty of the task not be super meaningful when training to performance that isn't near the upper limit?

As an analogy: consider a variant of rock paper scissors where you get to see your opponent's move in advance - but it's encrypted with RSA. In some sense this game is much harder than proving Fermat's last theorem, since playing optimally requires breaking the encryption scheme. But if you train a policy and find that it wins 33% of the time at encrypted rock paper scissors, it's not super meaningful or interesting to say that the task is super hard, and in the relevant intuitive sense it's an easier task than proving Fermat's last theorem.
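
A toy version of that game, as a sketch: the RSA parameters are textbook-sized and the padding is crude, purely to keep the example short - real RSA uses ~2048-bit moduli and proper randomized padding.

```python
import random

# Textbook-sized RSA parameters, for illustration only.
p, q = 61, 53
n, e = p * q, 17          # public key (n = 3233)
d = 2753                  # private exponent: (17 * 2753) % 3120 == 1

MOVES = 3                 # rock = 0, paper = 1, scissors = 2

def play_round(policy) -> bool:
    """One round: the policy sees only the encrypted opponent move."""
    opponent = random.randrange(MOVES)
    nonce = random.randrange(1, 1000)            # pad so ciphertexts vary
    ciphertext = pow(opponent + MOVES * nonce, e, n)
    ours = policy(ciphertext)
    return ours == (opponent + 1) % MOVES        # did we play the counter-move?

def random_policy(ciphertext: int) -> int:
    return random.randrange(MOVES)               # can't exploit the ciphertext

def keyholder_policy(ciphertext: int) -> int:
    opponent = pow(ciphertext, d, n) % MOVES     # decrypt, strip the padding
    return (opponent + 1) % MOVES

rounds = 10_000
print(sum(play_round(random_policy) for _ in range(rounds)) / rounds)     # ~0.33
print(sum(play_round(keyholder_policy) for _ in range(rounds)) / rounds)  # 1.0
```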

(crossposted from Alignment Forum)

While the claim - that the task ‘predict the next token on the internet’ absolutely does not imply that learning it caps out at human-level intelligence - is true, some parts of the post, and of the reasoning leading to the claims at its end, are confused or wrong.

Let’s start from the end and try to figure out what goes wrong.

GPT-4 is still not as smart as a human in many ways, but it's naked mathematical truth that the task GPTs are being trained on is harder than being an actual human.

And since the task that GPTs are being trained on is different from and harder than the task of being a human, it would be surprising - even leaving aside all the ways that gradient descent differs from natural selection - if GPTs ended up thinking the way humans do, in order to solve that problem.

From a high-level perspective, it is clear that this is just wrong. Part of what human brains are doing is minimising prediction error with regard to sensory inputs. The unbounded version of that task is of basically the same generality and difficulty as what GPT is doing, and is roughly equivalent to understanding everything that is understandable in the observable universe. For example: a friend of mine worked on analysing the data from the LHC that led to the Higgs detection paper. Doing this type of work basically requires a human brain to have a predictive model of aggregates of the outputs of a very large number of collisions of high-energy particles, processed by a complex configuration of computers and detectors.


Where GPT and humans differ is not some general mathematical fact about the task, but in what sensory data a human and a GPT are each trying to predict, and in their cognitive architectures and the ways the systems are bounded. The different landscape of boundedness and architecture can lead both to convergent cognition (thinking as the human would) and to the opposite: predicting what the human would output in a highly non-human way.

Boundedness is overall the central concept here. Neither humans nor GPTs are attempting to solve ‘how to predict stuff with unlimited resources’; both face a problem of cognitive economy - how to allocate limited computational resources to minimise prediction error.
 

Or maybe simplest:
 Imagine somebody telling you to make up random words, and you say, "Morvelkainen bloombla ringa mongo."

 Imagine a mind of a level - where, to be clear, I'm not saying GPTs are at this level yet -

 Imagine a Mind of a level where it can hear you say 'morvelkainen bloombla ringa', and maybe also read your entire social media history, and then manage to assign 20% probability that your next utterance is 'mongo'.

The fact that this Mind could double as a really good actor playing your character, does not mean They are only exactly as smart as you.

 When you're trying to be human-equivalent at writing text, you can just make up whatever output, and it's now a human output because you're human and you chose to output that.

 GPT-4 is being asked to predict all that stuff you're making up. It doesn't get to make up whatever. It is being asked to model what you were thinking - the thoughts in your mind whose shadow is your text output - so as to assign as much probability as possible to your true next word.

 

If I try to imagine a mind which is able to predict my next word when I am asked to make up random words, successfully assigning 20% probability to my true output, I’m firmly in the realm of weird and incomprehensible Gods. If the Mind is imaginably bounded and smart, it seems likely it would not devote much cognitive capacity to trying to model in detail strings prefaced by a context like ‘this is a list of random numbers’, in particular if inverting the process generating the numbers seems really costly. Being this good at this task would require so much data and cheap computation that we are way beyond superintelligence, in the realm of philosophical thought experiments.

Overall I think this is a really unfortunate way to think about the problem, where a system which is moderately hard to comprehend (like GPT) is replaced by something much more incomprehensible. It also seems a bit of a reverse intuition pump - I’m pretty confident most people's intuitive thinking about this ‘simplest’ thing will be utterly confused.

How did we get here?

 

 A human can write a rap battle in an hour.  A GPT loss function would like the GPT to be intelligent enough to predict it on the fly.

 

Apart from the fact that humans are also able to rap battle or improvise on the fly, notice that “what the loss function would like the system to do” in principle tells you very little about what the system will do. For example, the human loss function makes some people attempt to predict winning lottery numbers. This is an impossible task for humans, and you can’t say much about a human based on it. Or you can speculate about minds which would be able to succeed at this task, but you soon get into the realm of Gods, outside of physics.
 

Consider that sometimes human beings, in the course of talking, make errors.

GPTs are not being trained to imitate human error. They're being trained to *predict* human error.

Consider the asymmetry between you, who makes an error, and an outside mind that knows you well enough and in enough detail to predict *which* errors you'll make.


Again, from the cognitive economy perspective, predicting my errors would often be wasteful. With some simplification, you can imagine I make two types of errors - systematic and random. Often the simplest way to predict a systematic error is to emulate the process which led to the error. Random errors are ... random, and a mind which knows me in enough detail to predict which random errors I’ll make seems a bit like the mind predicting the lottery numbers.

Consider that somewhere on the internet is probably a list of triples: <product of 2 prime numbers, first prime, second prime>.

GPT obviously isn't going to predict that successfully for significantly-sized primes, but it illustrates the basic point:

There is no law saying that a predictor only needs to be as intelligent as the generator, in order to predict the generator's next token.
 

 The general claim that some predictions are really hard and you need superhuman powers to be good at them is true, but notice that this does not inform us about what GPT-x will learn. 
 

Imagine yourself in a box, trying to predict the next word - assign as much probability mass to the next token as possible - for all the text on the Internet.

Koan:  Is this a task whose difficulty caps out at human intelligence, or at the intelligence level of the smartest human who wrote any Internet text?  What factors make that task easier, or harder?


Yes this is clearly true: in the limit the task is of unlimited difficulty.  

 

I think this post needs more discussion of concrete examples. 

Hypothetically, say you trained an advanced version of GPT on all human chess games, but removed computer games from its database (and human games that cheated with an engine). The reward function here is still "predict the next move". How "smart" would this GPT version be at chess? How would it fare against Stockfish or AlphaZero?

It seems like the "prediction" goal function is inherently limiting here. GPT-9 would be focusing its compute time and energy on modelling the psychology of a top-of-his-game Magnus Carlsen in a given situation, which definitely would require a brilliant understanding of chess. But it would be learning how to play human chess. Stockfish would crush it (even the present-day version vs a future super-GPT), because the goal function of Stockfish is to be good at chess. This is the sense in which I think being an "imitator" limits your intelligence.

There's an important distinction here between predicting the next token in a piece of text and predicting the next action in a causal chain. If you have a computation that is represented by a causal graph, and you train a predictor to predict nodes conditional on previous nodes, then it's true that the predictor won't end up being able to do better than the original computational process. But text is not ordered that way! Texts often describe outcomes before describing the details of the events which generated them. If you train on texts like those, you get something more powerful than an imitator. If you train a good enough next-token predictor on chess games where the winner is mentioned before the list of moves, you can get superhuman play by prepending "This is a game which white/black wins:". If you train a good enough next-token predictor on texts that have the outputs of circuits listed before the inputs, you get an NP-oracle. You're almost certainly not going to get an NP-oracle from GPT-9, but that's because of the limitations of the training processes and architectures that this universe can support; it's not a limitation of the loss function.
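
A sketch of the data format being described; the game records and prompt wording here are made up for illustration:

```python
# Hypothetical game records: (winner, moves in algebraic notation).
games = [
    ("white", ["e4", "e5", "Nf3", "Nc6", "Bb5"]),
    ("black", ["d4", "Nf6", "c4", "e6", "Nc3"]),
]

# Training strings put the outcome *before* the moves, so the model must learn
# P(moves | winner) rather than merely imitating the average game in the data.
corpus = [
    f"This is a game which {winner} wins: {' '.join(moves)}"
    for winner, moves in games
]

# At sampling time, prepending the desired outcome conditions the generator on
# high-quality play - prediction running "backwards" along the causal arrow
# from the moves to the result.
prompt = "This is a game which white wins: e4"
print(corpus[0])
print(prompt)
```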

I think there very much is a limitation in the loss function, when you consider efficiency of results. In chess, Stockfish and AlphaZero don't just match the best chess players, they exceed them by a ridiculous margin, and that's right now. Whereas GPT, with the same level of computation, still hasn't figured out how not to make illegal moves.

I can't rule out that a future GPT version will be able to beat the best human, by really good pattern matching on what a "winning" game looks like. But that's still pattern matching on human games. Stockfish has no such limitation. 

I'm really glad to see posts like this. It's great that someone can take concepts that are ordinarily only explained with an entire textbook's worth of words and condense them into an intuitive blog post that laypersons can understand and remember. Lots of people try to explain this in a couple of sentences and they just fail, or at least they fail to convey the nuances.

I still think that this could be distilled perhaps to 50% of the length and retain 95% of the value (both from containing all of the original content, and giving enough examples such that most readers will operationalize the concepts properly). There was also this one sentence that was strikingly difficult to follow, even though its content was helpful:

Figuring out that your next utterance is 'mongo' is not mostly a question, I'd guess, of that mighty Mind being hammered into the shape of a thing that can simulate arbitrary humans, and then some less intelligent subprocess being responsible for adapting the shape of that Mind to be you exactly, after which it simulates you saying 'mongo'.

I think an easier way to see it is realising that its loss is uniform over all pieces of text, whereas humans only care about predicting an extreme minority of that text. If you see a sentence like

"It was a rainy day in Nairobi, the capital of_"

...it's obvious to you that the salient piece of knowledge here is what country Nairobi is the capital of, so that's how you design your benchmarks for AI performance. But the AI cares equally about predicting 'capital' after 'the', and 'rainy' after 'It was a'. GPT-2 was already past human level at almost all text except the very selective subset humans put all their optimisation into (e.g. answers to math tests, long-term coherence in stories, etc.).
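
As a rough illustration of that uniformity (the per-token probabilities below are invented), the training loss treats the one token a benchmark would score no differently from the filler around it:

```python
import math

# Invented per-token probabilities a model might assign along that sentence.
token_probs = [
    ("was", 0.45), ("a", 0.60), ("rainy", 0.02), ("day", 0.55),
    ("in", 0.70), ("Nairobi", 0.001), (",", 0.80), ("the", 0.75),
    ("capital", 0.30), ("of", 0.85), ("Kenya", 0.90),  # the benchmarked token
]

per_token_loss = [(tok, -math.log(p)) for tok, p in token_probs]

# The training objective averages over *every* position, so "Kenya" is just
# one term among eleven; the model is pushed as hard to predict "rainy" after
# "It was a" as to know which country Nairobi is the capital of.
for tok, loss in per_token_loss:
    print(f"{tok:>8}: {loss:.2f}")
print("mean loss:", sum(l for _, l in per_token_loss) / len(per_token_loss))
```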

And yet GPT-4 rivals us at what we care about.

It's comparable to a science fiction author who only cares about writing better stories yet ends up rivalling top scientists in every field as an instrumental side quest. Human-centric benchmarks[1] vastly underestimate the objective intelligence and generality of GPTs.

  1. ^

    Lessons from Are We Smart Enough to Know How Smart Animals Are seem relevant here.

Extracting the single sentence that I learned something profound from:

<Hash, plaintext> pairs, which you can't predict without cracking the hash algorithm, but which you could far more easily generate typical instances of if you were trying to pass a GAN's discriminator about it (assuming a discriminator that had learned to compute hash functions).

I already had the other insights, so a post that was only this sentence would have captured 99% of the value for me. Not saying to shorten it, I just think it's a good policy to provide anecdotes re what different people learn the most from.

Hm, I wish the forum had paragraph-ratings, so people could (privately or otherwise) thumbs up paragraphs they personally learned the most from.
