A Guide to Forecasting AI Science Capabilities

Eleni_A


I am currently trying to form my own views on AI risk and, having skimmed this post, think I will find it very useful, thank you very much.

Particular aspects of this post that helped:

Bullet point lists
Clear, fairly precise summaries of important ideas (eg: FOOM) with links to learn more.
Outlining alternatives for the core ideas (eg: hard vs soft takeoff)
Expressing opinion (implicitly) on the most important parts of each issue. Often these lists will end up listing every reasonable post on a subject (eg: in the concrete AI stories section). This isn't helpful for a newcomer because you need guidance on where to focus your reading.


The following contains resources that I (Eleni) curated to help the AI Science team of AI Safety Camp 2023 prepare for the second half of the project, i.e., forecasting science capabilities. Suggestions for improvement of this guide are welcome.

Key points and readings

for forecasting in general:

What is a vignette?: https://www.lesswrong.com/posts/jusSrXEAsiqehBsmh/vignettes-workshop-ai-impacts
Start using Metaculus and Manifold (if you haven’t already)
Book review of Superforecasting
Actually possible: thoughts on Utopia
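If you start forecasting on a platform, it helps to know how probabilistic forecasts are commonly scored. Below is a minimal sketch of the Brier score, the accuracy measure discussed in Superforecasting. The function name and the example forecasts are hypothetical, and individual platforms may use different scoring rules.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and binary outcomes.

    forecasts: probabilities in [0, 1] that each event happens
    outcomes:  1 if the event happened, 0 if it did not
    Lower is better; 0.0 is a perfect score.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equally many forecasts and outcomes (at least one)")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical track record: four questions that have already resolved.
my_forecasts = [0.9, 0.7, 0.2, 0.5]
resolutions = [1, 1, 0, 0]

print("my score:        ", brier_score(my_forecasts, resolutions))
# Baseline: answering 50% on every question.
print("always-50% score:", brier_score([0.5] * 4, resolutions))
```

Comparing your score against the always-50% baseline is a quick way to check whether your forecasts carry any information at all.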

for forecasting AI in particular:

What is a hard takeoff?: “A hard takeoff (or an AI going "FOOM") refers to AGI expansion in a matter of minutes, days, or months. It is a fast, abruptly, local increase in capability. This scenario is widely considered much more precarious, as this involves an AGI rapidly ascending in power without human control. This may result in unexpected or undesired behavior (i.e. Unfriendly AI). It is one of the main ideas supporting the Intelligence explosion hypothesis.”
Read more: https://www.lesswrong.com/posts/tjH8XPxAnr6JRbh7k/hard-takeoff
What is a soft takeoff?: “A soft takeoff refers to an AGI that would self-improve over a period of years or decades. This could be due to either the learning algorithm being too demanding for the hardware or because the AI relies on experiencing feedback from the real-world that would have to be played out in real time.”
What is a sharp left turn?: the transition from a slower to a faster scaling regime as defined here (6:27 to 7:50).
Pivotal acts: acts that we, humans, take that make a big difference in terms of making x-risk less likely.
Pivotal acts from Math AIs: use AI to solve alignment at the formal/mathematical level.
Objection: AI alignment doesn’t seem like a purely mathematical problem - it requires knowledge about different aspects of reality that we haven’t been able to formalize (yet) e.g., how do agents (both human and artificial) represent values.
Follow-up objection: if we knew how to formalize these problems then we wouldn’t need much help from AI science models, i.e., if we knew what questions to ask and how to ask them.
Such a question would be: “Can you solve this [mathematical puzzle that is difficult for humans yet solvable, e.g., a matrix multiplication]?” (see the STEM AGI section below).
How far can we predict?:
- According to the Event Horizon Thesis, whatever happens once we have superintelligence is unpredictable and can’t compare to other technological advances we have already observed/studied.
- What are the alternative scenarios, and how likely are they?: https://www.yudkowsky.net/singularity/schools
How much detail can we have?: If the Event Horizon Thesis is correct, detailed predictions probably stop right before the singularity.
Aim for mechanistic explanations for any claim or prediction, meaning always offer a detailed account of how a process or system works in terms of its components, their interactions, and any underlying principles governing their behavior.
There’s a lot to say about explanation; you might say, well, we’re trying to predict, not to explain. To explain is more like the opposite of predicting (you may have read that explanation = retrodiction). The two processes have a lot in common, but what I’m trying to say is: stick to the details!

If you’d like to read some posts to inform your thinking about explanation, I highly recommend:

Specific changes to consider:

Compute → how will chip production change? What are the costs of running GPT-x?

Bio-anchors is compute-centric: if I were to rerun evolution, how much compute would it take to get to agents that do science?
How many chips will we have, how many FLOPs, and how much money will the various actors be willing to spend?

Data → are there any reasons to expect that we will run out of good quality data before having powerful models?
Algorithms → are there reasons to expect that current algorithms, methods, architectures will stop working?

Algorithms evolve to reduce compute/computational costs
Watch the AI Triad
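To make the compute questions above concrete, here is a rough Fermi-estimate sketch. It assumes the common rule of thumb that training a dense transformer takes about 6 FLOPs per parameter per training token. The chip throughput, utilization, and hourly price are hypothetical placeholders, not figures for any real hardware or provider, and the function names are made up for illustration.

```python
def training_flops(n_params, n_tokens):
    """Approximate training compute via the ~6 * N * D rule of thumb."""
    return 6.0 * n_params * n_tokens


def training_cost_usd(total_flops, peak_flops_per_sec, utilization, usd_per_chip_hour):
    """Convert a FLOP budget into chip-hours and then into dollars."""
    effective_flops_per_sec = peak_flops_per_sec * utilization
    chip_hours = total_flops / effective_flops_per_sec / 3600.0
    return chip_hours * usd_per_chip_hour


# A model with 70 billion parameters trained on 1.4 trillion tokens.
flops = training_flops(n_params=70e9, n_tokens=1.4e12)

# Hypothetical accelerator: 3e14 peak FLOP/s, 40% utilization, $2 per chip-hour.
cost = training_cost_usd(
    flops, peak_flops_per_sec=3e14, utilization=0.4, usd_per_chip_hour=2.0
)

print(f"training compute: {flops:.2e} FLOPs")
print(f"estimated cost:   ${cost:,.0f}")
```

Changing one input at a time (cheaper chips, better utilization, larger budgets) shows which assumption your forecast is most sensitive to.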

How does change happen?

Scaling Laws: more high-quality data get us better results than more parameters.
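As a sketch of what the scaling-laws claim can look like in numbers, the snippet below uses a parametric loss of the form L(N, D) = E + A/N^alpha + B/D^beta, where N is the parameter count and D the number of training tokens. The constants are approximately the fitted values reported in the Chinchilla paper (Hoffmann et al., 2022) as I recall them; treat them as illustrative assumptions. The starting point (280B parameters, 300B tokens) is likewise an assumed example of a large model trained on relatively little data.

```python
# Approximate fitted constants (assumption; illustrative only).
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def predicted_loss(n_params, n_tokens):
    """Parametric scaling law: irreducible loss + model-size term + data term."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


# Assumed starting point: 280B parameters trained on 300B tokens.
n, d = 280e9, 300e9

base = predicted_loss(n, d)
double_params = predicted_loss(2 * n, d)
double_data = predicted_loss(n, 2 * d)

print(f"baseline loss:          {base:.4f}")
print(f"gain from 2x parameters: {base - double_params:.4f}")
print(f"gain from 2x data:       {base - double_data:.4f}")
```

Under these assumptions, doubling the data lowers the predicted loss more than doubling the parameters. The comparison depends on the starting point: for a small model already trained on a lot of data, the parameter term can dominate instead.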

Concrete AI stories

Choose your ending:
- Good: https://forum.effectivealtruism.org/posts/AuRBKFnjABa6c6GzC/what-success-looks-like
- Pretty bad: https://gwern.net/fiction/clippy
- Even worse: probably something involving s-risks (that I thankfully couldn't find). For ideas on what this could mean, see the section “Would maximally efficient work be fun?” in Superintelligence.
- In progress - no end (yet): https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-like
- Would a neutral end make sense? What would that even mean? E.g., there is a superintelligence, but it leaves us alone.

Understanding STEM AGI scenarios

The main idea is laid out in STEM AGI: train science models in formal language, not natural language, keep them in a box, and make sure they have no cognitive/epistemic tools to develop situational awareness.
But also, we want these models to be really good at tackling complex scientific tasks while not being autonomous in any harmful ways.
Can we know what the AIs will want?
It seems strange to want to have effective solutions for problems of the physical world without any input from the physical world.
In which world do we live?
Looks like we are in a world where the exact opposite is happening: we mostly have very successful NLP models.
GPT-x models have already read the internet many times and probably have enough information to at least simulate the human world in different degrees of detail, depending on the prompt.
We don’t exactly know what GPT-x models know, but here are some hypotheses:

The model operates according to a 1-1 correspondence with the external world through information it retrieves from the internet. Each token is a representation of an object/concept in the human/natural world.
The model is a simulation of the external world. This simulation is like the world of a board game: it features properties of the external world in the form of inferences from human text the model has been trained on, including agentic and non-agentic entities, e.g., you’re playing a WW2 game and you’re in the position of the Allies while your friend plays in the position of Germany. The rules of the game are not the rules of the war that actually took place, but rather what the model inferred about the rules from the training dataset.

Forecasting AGI and science models overlap

See, for example, Connor Leahy’s view on AI predictions.
Current AI models write well enough to be published in scientific journals.
The question is not “when will we have AIs that can trick reviewers into thinking that a paper is good” (that’s a measure of the reviewer’s stupidity); we’re asking when the model can actually do science.
When can an AI system publish a paper that satisfies the criteria of “good science”, which could entail, e.g., lots of citations?
By the time that happens, it’s already too late for alignment.
Connor Leahy predicts:

1) that AIs will be able to publish correct science before they can load dishwashers.

2) the world ends before more than 10% of cars on the street are autonomous.

Continuity vs discontinuity in AI progress

Meta-scientific considerations

How will the field of AI alignment evolve?
Different research agendas → growing into a “mature” science/transition to a paradigmatic phase.
Will it become clear that one available research agenda is better than the others?
For example, if it turns out that some AI systems significantly help us with alignment, this will impact how research will move forward.
How do we draw the line between working on capabilities and working on alignment?