The following contains resources that I (Eleni) curated to help the AI Science team of AI Safety Camp 2023 prepare for the second half of the project, i.e., forecasting science capabilities. Suggestions for improvement of this guide are welcome.
Key points and readings
For forecasting in general:
- What is a vignette?: https://www.lesswrong.com/posts/jusSrXEAsiqehBsmh/vignettes-workshop-ai-impacts
- Start using Metaculus and Manifold (if you haven’t already)
- Book review of Superforecasting
- Actually possible: thoughts on Utopia
For forecasting AI in particular:
- What is a hard takeoff?: “A hard takeoff (or an AI going "FOOM") refers to AGI expansion in a matter of minutes, days, or months. It is a fast, abrupt, local increase in capability. This scenario is widely considered much more precarious, as this involves an AGI rapidly ascending in power without human control. This may result in unexpected or undesired behavior (i.e. Unfriendly AI). It is one of the main ideas supporting the Intelligence explosion hypothesis.”
- Read more: https://www.lesswrong.com/posts/tjH8XPxAnr6JRbh7k/hard-takeoff
- What is a soft takeoff?: “A soft takeoff refers to an AGI that would self-improve over a period of years or decades. This could be due to either the learning algorithm being too demanding for the hardware or because the AI relies on experiencing feedback from the real world that would have to be played out in real time.”
- What is a sharp left turn?: the transition from a slower to a faster scaling regime as defined here (6:27 to 7:50).
- Pivotal acts: actions that we humans take that make a big difference in reducing x-risk.
- Pivotal acts from Math AIs: use AI to solve alignment at the formal/mathematical level.
- Objection: AI alignment doesn’t seem like a purely mathematical problem. It requires knowledge about aspects of reality that we haven’t been able to formalize (yet), e.g., how agents (both human and artificial) represent values.
- Follow-up objection: if we knew how to formalize these problems, i.e., if we knew what questions to ask and how to ask them, then we wouldn’t need much help from AI science models.
- Such questions would be: “Can you solve this [input a mathematical puzzle that is difficult for humans yet solvable, e.g., a matrix multiplication]?” (see “Understanding STEM AGI scenarios” below).
- How far can we predict?:
- How much detail can we have?: If the Event Horizon Thesis is correct, detailed predictions probably stop right before the singularity.
- Aim for mechanistic explanations for any claim or prediction, meaning always offer a detailed account of how a process or system works in terms of its components, their interactions, and any underlying principles governing their behavior.
- There’s a lot to say about explanation; you might say, well, we’re trying to predict, not to explain. To explain is more like the opposite of predicting (you may have read that explanation = retrodiction). The two processes have a lot in common, but what I’m trying to say is: stick to the details!
If you’d like to read some posts to inform your thinking about explanation, I highly recommend:
Specific changes to consider:
- Compute → how will chip production change? What are the costs of running GPT-x?
- Bio-anchors is compute-centric: if I were to rerun evolution, how much compute would it take to get to agents that do science?
- How many chips will we have, how many FLOPs, and how much money will the various actors be willing to spend?
- Data → are there any reasons to expect that we will run out of good quality data before having powerful models?
- Algorithms → are there reasons to expect that current algorithms, methods, architectures will stop working?
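To make the compute questions above concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes the common rule of thumb that training compute is roughly 6 × parameters × tokens. The function names, the chip throughput, the utilization, and the price per chip-hour are all hypothetical placeholders (not figures for any real model or vendor), so swap in your own estimates.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute using the rule of thumb C ~ 6 * N * D.

    N is the parameter count and D is the number of training tokens.
    This is an approximation, not an exact accounting.
    """
    return 6.0 * n_params * n_tokens


def training_cost_usd(
    total_flops: float,
    peak_flops_per_chip: float,
    utilization: float,
    usd_per_chip_hour: float,
) -> float:
    """Convert a FLOP budget into a rough dollar cost.

    All hardware inputs are assumptions supplied by the forecaster.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    effective_flops_per_second = peak_flops_per_chip * utilization
    chip_seconds = total_flops / effective_flops_per_second
    chip_hours = chip_seconds / 3600.0
    return chip_hours * usd_per_chip_hour


if __name__ == "__main__":
    # Hypothetical model: 100B parameters trained on 2T tokens.
    flops = training_flops(n_params=1e11, n_tokens=2e12)
    # Hypothetical chip: 1e15 peak FLOP/s, 40% utilization, $2 per chip-hour.
    cost = training_cost_usd(
        flops,
        peak_flops_per_chip=1e15,
        utilization=0.4,
        usd_per_chip_hour=2.0,
    )
    print(f"Estimated training compute: {flops:.2e} FLOPs")
    print(f"Estimated training cost: ${cost:,.0f}")
```

The point of the sketch is less the final number than the decomposition: a forecast about "how much will it cost" splits into separate forecasts about model size, data, hardware efficiency, and price, each of which you can argue about mechanistically.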
How does change happen?
Scaling laws: for a fixed compute budget, training on more high-quality data can get us better results than adding more parameters.
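One way to make the data-versus-parameters trade-off concrete is the compute-optimal allocation reported in the Chinchilla paper (Hoffmann et al., 2022). The sketch below assumes training compute C ≈ 6·N·D and a ratio of roughly 20 training tokens per parameter. Both are rules of thumb brought in for illustration, and the function name is hypothetical.

```python
import math


def compute_optimal_split(
    compute_flops: float, tokens_per_param: float = 20.0
) -> tuple[float, float]:
    """Split a compute budget between parameters (N) and tokens (D).

    Assumes C ~ 6 * N * D and D = tokens_per_param * N, which gives
    N = sqrt(C / (6 * tokens_per_param)). The default ratio of 20 is an
    approximate rule of thumb, not an exact constant.
    """
    if compute_flops <= 0 or tokens_per_param <= 0:
        raise ValueError("compute_flops and tokens_per_param must be positive")
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Hypothetical budgets spanning two orders of magnitude.
    for budget in (1e22, 1e23, 1e24):
        n, d = compute_optimal_split(budget)
        print(f"C={budget:.0e} FLOPs -> N={n:.2e} params, D={d:.2e} tokens")
```

Under these assumptions, both N and D grow with the square root of the budget, so a 100× increase in compute calls for about 10× more parameters and 10× more data. That is why the question of whether we run out of good-quality data matters for forecasts.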
- Choose your ending:
- Good: https://forum.effectivealtruism.org/posts/AuRBKFnjABa6c6GzC/what-success-looks-like
- Pretty bad: https://gwern.net/fiction/clippy
- Even worse: probably something involving s-risks (that I thankfully couldn't find). For ideas on what this could mean, see “Would maximally efficient work be fun?” in Superintelligence.
- In progress - no end (yet): https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-like
- Would a neutral end make sense? What would that even mean? E.g., there is a superintelligence, but it leaves us alone.
Understanding STEM AGI scenarios
- The main idea is in STEM AGI, i.e., train science models in formal language rather than natural language, keep them in a box, and make sure they have no cognitive/epistemic tools to develop situational awareness.
- But also, we want these models to be really good at tackling complex scientific tasks while not being autonomous in any harmful ways.
- Can we know what the AIs will want?
- It seems strange to want to have effective solutions for problems of the physical world without any input from the physical world.
- In which world do we live?
- Looks like we are in a world where the exact opposite is happening: we mostly have very successful NLP models.
- GPT-x models have already read the internet many times and probably have enough information to at least simulate the human world in different degrees of detail, depending on the prompt.
- We don’t exactly know what GPT-x models know, but here are some hypotheses:
- The model operates according to a 1-1 correspondence with the external world through information it retrieves from the internet. Each token is a representation of an object/concept in the human/natural world.
- The model is a simulation of the external world. This simulation is like the world of a board game: it features properties of the external world in the form of inferences from human text the model has been trained on, including agentic and non-agentic entities, e.g., you’re playing a WW2 game and you’re in the position of the Allies while your friend plays in the position of Germany. The rules of the game are not the rules of the war that actually took place, but rather what the model inferred about the rules from the training dataset.
Forecasting AGI and science models overlap
- See, for example, Connor Leahy’s view on AI predictions.
- Current AI models write well enough to be published in scientific journals.
- The question is not “when will we have AIs that can trick reviewers into thinking that a paper is good” (that’s a measure of the reviewer’s stupidity); we’re asking when the model can do science.
- When can an AI system publish a paper that satisfies the criteria of “good science”, which could entail, e.g., lots of citations?
- By the time that happens, it’s already too late for alignment.
- Connor Leahy predicts that:
1) AIs will be able to publish correct science before they can load dishwashers.
2) the world will end before more than 10% of cars on the street are autonomous.
Continuity vs discontinuity in AI progress
- Will AI undergo discontinuous progress?
- Discontinuous progress in history: an update
- Likelihood of discontinuous progress around the development of AGI
- How will the field of AI alignment evolve?
- Different research agendas → growing into a “mature” science/transition to a paradigmatic phase.
- Will it become clear that one available research agenda is better than the others?
- For example, if it turns out that some AI systems significantly help us with alignment, this will impact how research will move forward.
- How do we draw the line between working on capabilities and working on alignment?