
Epistemic Status

Highlighting a thesis in Janus' "Simulators" that I think is insufficiently appreciated.

 

Thesis

In the limit, models optimised for minimising predictive loss on humanity's text corpus converge towards general intelligence[1].


Preamble

From Janus' Simulators:

Something which can predict everything all the time is more formidable than any demonstrator it predicts: the upper bound of what can be learned from a dataset is not the most capable trajectory, but the conditional structure of the universe implicated by their sum (though it may not be trivial to extract that knowledge).


Introduction

I affectionately refer to the above quote as the "simulators thesis". Reading and internalising that passage was an "aha!" moment for me. I was already aware (by July 2020 at the latest) that language models were modelling reality. I was persuaded by arguments of the following form:

Premise 1: Modelling is transitive. If X models Y and Y models Z, then X models Z.

Premise 2: Language models reality. "Dogs are mammals" occurs more frequently in text than "dogs are reptiles" because dogs are in actuality mammals and not reptiles. This statistical regularity in text corresponds to a feature of the real world. Language is thus a map (albeit flawed) of the external world.

Premise 3: GPT-3 models language. That is how it predicts text.

Conclusion: GPT-3 models the external world.

But I hadn't yet fully internalised all the implications of what it means to model language and hence our underlying reality: namely, the limit that optimisation for minimising predictive loss on humanity's text corpus will converge to. I am belatedly making those updates.


Interlude: The Requisite Capabilities for Language Modelling

Janus again:

If loss keeps going down on the test set, in the limit – putting aside whether the current paradigm can approach it – the model must be learning to interpret and predict all patterns represented in language, including common-sense reasoning, goal-directed optimization, and deployment of the sum of recorded human knowledge

Its outputs would behave as intelligent entities in their own right. You could converse with it by alternately generating and adding your responses to its prompt, and it would pass the Turing test. In fact, you could condition it to generate interactive and autonomous versions of any real or fictional person who has been recorded in the training corpus or even could be recorded (in the sense that the record counterfactually “could be” in the test set). 


Implications

The limit of predicting text is predicting the underlying processes that generated said text. If said underlying processes are agents, then sufficiently capable language models can predict agent (e.g., human) behaviour to arbitrary fidelity[2]. If it turns out to be the case that the most efficient way of predicting the behaviour of conscious entities (as discriminated via text records) is to instantiate conscious simulacra, then such models may perpetrate mindcrime.

 

Furthermore, the underlying processes that generate text aren't just humans, but the world which we inhabit. That is, a significant fraction of humanity's text corpus reports on empirical features of our external environment or the underlying structure of reality:

  • Timestamps
    • And other empirical measurements
  • Log files
  • Database files
    • Including CSVs and similar
  • Experiment records
  • Research findings
  • Academic journals in quantitative fields
  • Other reports
  • Etc. 

Moreover, such text is often clearly distinguished from other kinds of text (fiction, opinion pieces, etc.) via its structure, formatting, titles, etc. In the limit of minimising predictive loss on such text, language models must learn the underlying processes that generated them — the conditional structure of the universe.
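
One way to make "the limit of minimising predictive loss" concrete is the standard decomposition of expected cross-entropy loss. This is only a sketch, and the notation is an assumption for illustration rather than anything taken from Simulators: $p$ is the (assumed) true distribution over the next token $x$ given a context $c$, $q_\theta$ is the model's predicted distribution, $H$ is Shannon entropy and $D_{\mathrm{KL}}$ is the Kullback-Leibler divergence.

```latex
\begin{align*}
\mathcal{L}(\theta)
  &= \mathbb{E}_{c}\,\mathbb{E}_{x \sim p(\cdot \mid c)}
     \big[-\log q_\theta(x \mid c)\big] \\
  &= \mathbb{E}_{c}\Big[\, H\big(p(\cdot \mid c)\big)
     + D_{\mathrm{KL}}\big(p(\cdot \mid c)\,\big\|\,q_\theta(\cdot \mid c)\big) \Big]
\end{align*}
```

The entropy term does not depend on the model, and the KL term is non-negative and zero only when $q_\theta(\cdot \mid c) = p(\cdot \mid c)$ (for contexts that occur with nonzero probability). So, under these assumptions, driving the loss to its floor means matching the conditional distribution of whatever process generated the text, rather than imitating any single contributor to the corpus.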

The totality of humanity's recorded knowledge about the world — our shared world model — is a lower bound on what language models can learn in the limit[3]. We would expect that sufficiently powerful language models would be able to synthesise said shared world model and make important novel inferences about our world that are implicit in humanity's recorded knowledge, but which have not yet been explicitly synthesised by anyone[4].

The idea that the capabilities of language models are bounded by the median human contributor to their text corpus or even the most capable human contributor is completely laughable. In the limit, language models are capable of learning the universe[5].

 

Text prediction can scale to superintelligence[6].

This is a very nontrivial claim. Sufficiently hard optimisation for performance on most cognitive tasks (e.g. playing Go) will not converge towards selecting for generally intelligent systems (let alone strongly superhuman general intelligences). Text prediction is quite special in this regard.

This specialness suggests that text prediction is not an inherently safe optimisation target; future language models (or simulators more generally) may be dangerously capable[7].


Caveats

Humanity's language corpus embeds the majority of humanity's accumulated explicit knowledge about our underlying reality. There does exist knowledge possessed by humans that hasn't been represented in text anywhere. It is probably the case that the majority of humanity's tacit knowledge hasn't been explicitly codified anywhere, and even among the knowledge that has been recorded in some form, a substantial fraction may be hard to access or not be organised/structured in formats suitable for consumption by language models.

I suspect that most useful (purely) cognitive work that humans do is communicated via language to other humans and thus is accessible for learning via text prediction. Most of our accumulated cultural knowledge and our shared world model(s) do seem to be represented in text. However, it's not necessarily the case that pure text prediction is sufficient to learn arbitrary capabilities of human civilisation.

Moreover, the diversity and comprehensiveness of the dataset a language model is trained on will limit the capabilities it can actually attain in deployment. So will the limitations imposed by the architecture of whatever model we are training. In other words, that a particular upper bound exists in principle does not mean it will be realised in practice.

 

Furthermore, the limit of text prediction does not necessarily imply learning the conditional structure of our particular universe, but rather a (minimal?) conditional structure that is compatible with our language corpus. That is, humanity's language corpus may not uniquely pick out our universe, but only a set of universes of which ours is a member. The aspects of humanity's knowledge about our external world that are not represented in text may be crucial missing information needed to single out our universe (or even just humanity's shared model of our universe). Similarly, it may not be possible, even in principle, to learn features of our universe that humanity is completely ignorant of[8].

For similar reasons, it may turn out to be the case that it is possible to predict text generated by conscious agents to arbitrarily high fidelity without instantiating conscious simulacra. That is, humans may have subjective experiences and behaviour that cannot be fully captured/discriminated within language. Any aspects of the human experience/condition that are not represented (at least implicitly by reasonable inductive biases) are underdetermined in the limit of text prediction.


Conclusions

Ultimately, while I grant the aforementioned caveats some weight, and those arguments did update me significantly downwards on the likelihood of mindcrime in sufficiently powerful language models[9], I still fundamentally expect text prediction to scale to superintelligence in the limit.

I think humanity's language corpus is a sufficiently comprehensive record of humanity's accumulated explicit knowledge, and a sufficiently rich representation of our shared world model, that arbitrarily high accuracy in predicting text necessarily requires strongly superhuman general intelligence.

  1. ^

    In particular, strongly superhuman general intelligence. Henceforth "superintelligence".

  2. ^

    At least to degrees of fidelity that can be distinguished via text.

  3. ^

    More specifically, the world model implicit in our recorded knowledge.

  4. ^

    A ring theorist was able to coax ChatGPT to develop new nontrivial, logically sound mathematical concepts and generate examples of them. Extrapolating further, I would expect that sufficiently powerful language models will be able to infer many significant novel theoretical insights that could be in principle located given the totality of humanity's recorded knowledge.

  5. ^

    That is, they can learn an efficient map of our universe and successfully navigate said map to make useful predictions about it. Sufficiently capable language models should be able to predict, e.g., research write-ups, academic reports and similar.

  6. ^

    At least in principle, leaving aside whether current architectures will scale that far. Sufficiently strong optimisation on the task of text prediction is in principle capable of creating vastly superhuman generally intelligent systems.

  7. ^

    That is, sufficiently powerful language models are capable enough to a degree that they could — under particular circumstances — be existentially dangerous. I do not mean to imply that they are independently (by their very nature) existentially dangerous.

  8. ^

    That is, features of our universe that are not captured, not even implicitly, not even by interpolation/extrapolation in our recorded knowledge.

  9. ^

    This point may not matter that much, as future simulators will probably be multimodal. It seems much more likely that the limit of multimodal prediction of conscious agents would necessitate instantiating conscious simulacra.

    But this post was specifically about the limit of large language models, and I do think the aspects of human experience not represented in text are a real limitation to the suggestion that in the limit language models might instantiate conscious simulacra.
