That man is born merely for a few, who thinks only of the people of his own generation. Many thousands of years and many thousands of peoples will come after you; it is to these that you should have regard.
— Lucius Annaeus Seneca
Future Matters is a newsletter about longtermism and existential risk. Each month we collect and summarize relevant research and news from the community, and feature a conversation with a prominent researcher. You can also subscribe on Substack, listen on your favorite podcast platform and follow on Twitter. Future Matters is also available in Spanish.
Ajeya Cotra’s biological anchors model to forecast AGI timelines consists of three parts: an estimate of the compute required to train AGI with 2020 algorithms, a projection of how these compute requirements decrease over time due to algorithmic progress, and a forecast of how the size of training runs will increase over time due to declining hardware costs and increased investment in AI training. Tom Davidson’s What a compute-centric framework says about AI takeoff speeds extends Cotra’s framework to incorporate a more sophisticated model of how R&D investment translates into algorithmic and hardware progress, and also to capture the “virtuous circle” whereby AI progress leads to more automation in AI R&D and in turn faster AI progress. This results in a model of AI takeoff speed, defined here as the time between AI being able to automate 20% of cognitive tasks and AI being able to automate 100% of cognitive tasks. Davidson’s median estimate for AI takeoff is approximately three years. This is an impressive and significant piece of research, which we cannot summarize adequately here; we hope to feature a conversation with the author in a future issue to explore it in more depth. The full report is available here. Readers are encouraged to play around with the neat interactive model.
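To make the three-part structure concrete, here is a toy sketch in Python. It is our own illustration, not code from either report: the function name `first_feasible_year` and every number passed to it are hypothetical placeholders, and it ignores the feedback loops that Davidson's model adds. It simply finds the first year in which the largest affordable training run meets a compute requirement that shrinks with algorithmic progress.

```python
def first_feasible_year(
    start_year,
    required_flop,      # compute needed with start-year algorithms (hypothetical)
    halving_years,      # years for algorithmic progress to halve that requirement (hypothetical)
    largest_run_flop,   # size of the largest training run in the start year (hypothetical)
    doubling_years,     # years for the largest training run to double (hypothetical)
    horizon=200,        # how many years ahead to search
):
    """Return the first year in which available compute meets the requirement, or None."""
    for offset in range(horizon + 1):
        # Part 2: the requirement falls as algorithms improve.
        requirement = required_flop * 0.5 ** (offset / halving_years)
        # Part 3: training runs grow with cheaper hardware and more spending.
        available = largest_run_flop * 2.0 ** (offset / doubling_years)
        if available >= requirement:
            return start_year + offset
    return None


# Placeholder inputs only: a 12-order-of-magnitude gap that closes by
# one doubling per year (1/3 from algorithms, 2/3 from larger runs).
print(first_feasible_year(2020, 1e36, 3.0, 1e24, 1.5))  # 2060
```

The answer is extremely sensitive to the inputs, which is one reason the actual reports work with probability distributions over each quantity rather than point estimates.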
AGI and the EMH, by Trevor Chow, Basil Halperin, and J. Zachary Mazlish, highlights the tension between the efficient market hypothesis and the hypothesis that transformative AI will arrive in the next few decades. Transformative AI will either raise economic growth rates if aligned or raise the risk of extinction if unaligned. But either of these disjuncts implies much higher real interest rates. (This implication follows from both intuition and mainstream economic theory.) Since we are not observing higher real interest rates, we should conclude either that timelines are longer than generally assumed by the EA and alignment communities, or that markets are radically underestimating how soon transformative AI will arrive.
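One standard way to see why both disjuncts push in the same direction is the textbook Ramsey rule for the real interest rate, extended with a hazard term. This is our own gloss on the "mainstream economic theory" mentioned above, not necessarily the exact derivation the authors use:

```latex
r \;=\; \rho + \delta + \theta\, g
```

Here $r$ is the real interest rate, $\rho$ is the rate of pure time preference, $\delta$ is the perceived annual probability that the saver will not be around to consume in the future, $\theta$ is the inverse of the elasticity of intertemporal substitution, and $g$ is expected consumption growth. Aligned transformative AI raises $g$, since people expecting to be much richer have less reason to save. Unaligned AI raises $\delta$, since people expecting a higher chance of extinction also have less reason to save. Either way, $r$ rises.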
Zac Hatfield-Dodds shares some Concrete reasons for hope about AI safety [🔉]. A researcher at Anthropic (writing in a personal capacity), he takes existential risks from AI seriously, but pushes back on recent pronouncements that AI catastrophe is pretty much inevitable. Hatfield-Dodds highlights some of the promising results from the nascent efforts at figuring out how to align and interpret large language models. The piece is intended to “rebalance the emotional scales” in the AI safety community, which he feels have recently tipped too far towards a despair that he feels is both unwarranted and unconstructive.
Holden Karnofsky's Transformative AI issues (not just misalignment) [🔉] surveys some of the high-stakes issues raised by transformative AI, particularly those that we should be thinking about ahead of time in order to make a lasting difference to the long-term future. These include not just existential risk from misalignment, but also power imbalances, early AI applications, new life forms, and persistent policies and norms. Karnofsky is inclined to prioritize the first two issues, since he feels very uncertain about the sign of interventions focused on the remaining ones.
Lizka Vaintrob argues that we should Beware safety-washing [🔉] by AI companies, akin to greenwashing, where companies misrepresent themselves as being more environmentally conscious than they actually are, rather than taking costly actions to reduce their environmental impact. This could involve misleading not just consumers, but investors, employees, regulators, etc. on whether an AI project took safety concerns seriously. One promising way to address this would be developing common standards for safety, and trustworthy methods for auditing and evaluating companies against these standards.
In How we could stumble into AI catastrophe [🔉], Holden Karnofsky describes a concrete scenario of how unaligned AI might result in a global catastrophe. The scenario is described against two central assumptions (which Karnofsky discusses in previous writings): that we will soon develop very powerful AI systems and that the world will otherwise be very similar to today's when those systems are developed. Karnofsky's scenario draws heavily from Ajeya Cotra's post, Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover (see FM#4 for a summary of the article and FM#5 for our conversation with Cotra).
In Managing the transition to widespread metagenomic monitoring [🔉], Chelsea Liang and David Manheim outline a vision for next-generation biosurveillance using current technologies. An ambitious program of widespread metagenomic sequencing would be great for managing pandemic risk, by serving as an early warning for identifying novel outbreaks. But getting to this stage requires first addressing a number of important obstacles including high costs and privacy concerns.
In Technological stagnation: why I came around [🔉], Jason Crawford outlines some arguments for the ‘great stagnation’ hypothesis: the view that technological and scientific progress have slowed down substantially since the 1970s. Crawford’s main argument is qualitative: while we have seen significant innovation in IT since the 1970s, we haven’t had many major breakthroughs in manufacturing, energy, and transportation, whereas previous industrial revolutions have been characterized by innovation across all major sectors. Crawford also offers some quantitative arguments, pointing to US GDP and TFP growth rates. This was a readable post, but we remain to be convinced of the stagnation hypothesis: the qualitative arguments were hand-wavy, and the macro data looks pretty inconclusive for the most part (see also Alexey Guzey’s criticism of an influential paper on the topic).
Holden Karnofsky's Spreading messages to help with the most important century [🔉] considers various messaging strategies for raising awareness about the risks posed by transformative AI. Karnofsky favors approaches that help others develop a gears-level understanding of the dangers of AI, that communicate that AI alignment research is uniquely beneficial, and that focus on the threats AI poses for all humans. By contrast, he believes we should de-emphasize messages stressing the importance and potential imminence of powerful AI, and those that stress the dangers of AI without explaining why it is dangerous.
Literature review of transformative artificial intelligence timelines, by Keith Wynroe, David Atkinson and Jaime Sevilla, is a comprehensive overview of various attempts to forecast the arrival of transformative AI. The authors summarize five model-based forecasts and five judgement-based forecasts, and produce an aggregate of each of these two forecast types based on Epoch members' subjective weightings. The Epoch website also lets readers input their weights and see the resulting aggregate forecasts. We found this literature review very useful and consider it the best existing summary of what is currently known about AI timelines.
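For readers curious about what weighting forecasts can look like in practice, below is a minimal sketch of one common approach: a weighted average of cumulative probabilities, sometimes called a linear opinion pool. We do not know whether this matches the procedure Epoch uses; the function name `aggregate_cdf`, the forecast labels and all of the numbers are hypothetical.

```python
def aggregate_cdf(forecasts, weights):
    """Weighted average of cumulative probabilities, year by year.

    forecasts: {name: {year: P(transformative AI by that year)}}
    weights:   {name: non-negative weight}
    Only years present in every forecast are aggregated.
    """
    total = sum(weights[name] for name in forecasts)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    shared_years = sorted(set.intersection(*(set(f) for f in forecasts.values())))
    return {
        year: sum(weights[name] * forecasts[name][year] for name in forecasts) / total
        for year in shared_years
    }


# Hypothetical inputs: two made-up forecasts and made-up weights.
forecasts = {
    "model_based_example": {2030: 0.10, 2050: 0.50, 2100: 0.80},
    "judgement_based_example": {2030: 0.20, 2050: 0.60, 2100: 0.90},
}
weights = {"model_based_example": 1.0, "judgement_based_example": 3.0}

pooled = aggregate_cdf(forecasts, weights)
for year, probability in pooled.items():
    print(year, round(probability, 3))
```

A linear pool keeps the result a valid cumulative distribution as long as each input is one, but other choices (for example, averaging log-odds) weight extreme forecasts differently.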
Misha Yagudin, Jonathan Mann & Nuño Sempere share an Update to Samotsvety AGI timelines. In aggregate, the forecasting group places 10% on AGI by 2026, and 50% by 2041. This represents a shortening of timelines since Samotsvety last published similar numbers, which put ~32% on AGI by 2042.
Eli Dourado's Heretical thoughts on AI [🔉] argues that artificial intelligence may fail to have a transformative economic impact even if it transforms other aspects of human life. Dourado notes that, for many of the largest sectors in the economy, such as housing, energy, transportation and health, growth has been slow primarily because of regulation, litigation and public opposition. Progress in capabilities, however impressive, may thus fail to precipitate an economic transformation.
In Longtermism and animals [🔉], Heather Browning and Walter Veit argue that the interests of non-human animals should be incorporated in longtermist priority-setting, and that this could meaningfully affect decision-making about the long-term future. As the authors mention, this is closely related to questions about the ethics of digital minds.
- Paul Christiano shares Thoughts on the impact of RLHF research [🔉] (reinforcement learning from human feedback), which was a focus of his alignment work at OpenAI in 2017–20.
- Nuño Sempere shares his highly personal skepticism braindump on existential risk from AI.
- Richard Chappell’s Text, Subtext, and Miscommunication is a particularly thoughtful discussion of the recent Nick Bostrom debacle.
- Generative language models and automated influence operations [🔉], by Josh Goldstein and collaborators, investigates the impacts of large language models on efforts to influence public opinion and considers possible interventions to mitigate these risks.
- In We need holistic AI macrostrategy [🔉], Nick Gabs argues that research on macrostrategic questions related to AI alignment should be a top priority.
- MIRI released a conversation between Scott Alexander and Eliezer Yudkowsky [🔉] covering analogies to human moral development, “consequentialism”, acausal trade, and alignment research opportunities.
- In Air safety to combat global catastrophic biorisks [🔉], Jam Kraprayoon, Gavriel Kleinwaks, Alastair Fraser-Urquhart, and Josh Morrison argue that extending indoor air quality standards to include airborne pathogen levels could significantly reduce global catastrophic biological risks.
Applications for the 2023 PIBBSS Summer Research Fellowship are open until Feb 5th.
Jack Clark gave an educational presentation on AI policy to the US Congress’s AI Caucus.
Kelsey Piper predicts what will likely happen with AI in 2023 [🔉]: better text generators, better image models, more widespread adoption of coding assistants, takeoff of AI personal assistants, and more.
Everyone’s least favorite tool for communicating existential risk, the Doomsday Clock, has been set to 90 seconds to midnight this year.
Michaël Trazzi interviewed [🔉] DeepMind senior research scientist Victoria Krakovna about arguments for AGI ruin, paradigms of AI alignment, and her co-written article 'Refining the Sharp Left Turn threat model'.
David Krueger talked about existential safety, alignment, and specification problems for the Machine Learning Safety Scholars summer program.
Applications are open for the course “Economic Theory & Global Prioritization”, taught primarily by Phil Trammell and sponsored by the Forethought Foundation, to be held in Oxford in August 2023. Apply now.
Meanwhile, OpenAI has received a new $10 billion investment from Microsoft.
The New York Times asks Are we living in a computer simulation, and can we hack it?
The RAND Corporation is accepting applications for the Stanton Nuclear Security Fellows Program, open to postdoctoral students and tenure track junior faculty, as well as to doctoral students working primarily in nuclear security. Apply now.
Aisafety.training is a useful new website collecting information on AI safety programs, conferences and events.
Epoch, a research group forecasting the development of transformative artificial intelligence, has released a report summarizing their main achievements in 2022.
Conversation with Lukas Finnveden
Lukas Finnveden is a research analyst at Open Philanthropy, where he focuses on potential risks from advanced AI and ways to reduce them. Previously, he was a research scholar at the Future of Humanity Institute. Lukas has a B.S. in computer science from KTH Royal Institute of Technology.
Future Matters: You recently co-authored a report on AGI and lock-in, which will be the focus of this conversation. To begin, could you clarify what you mean by those two terms?
Lukas Finnveden: Yes. We were looking at a fairly strong definition of AGI: AI that can do all relevant tasks at least as well and as cheaply as humans can do them. In reality, the first system like that will probably be significantly superhuman at most tasks, since I’d expect AI systems to have a different spread in their abilities compared with humans, but in the report we only assume roughly human-level abilities. And when I say 'relevant tasks', that's basically a way to say all tasks without actually having to commit to all tasks. Maybe there's some random task that humans are better at, but no: we're looking at the tasks that are actually relevant for lock-in, which you can hopefully glean from the report and the arguments. So that's what we mean by AGI.
For lock-in, we're operating with a somewhat different definition than some other people, so it's definitely good to clarify this. The thing that we're looking at is predictable stability: some property of the world has been locked in if it's very probable that that property of the world will hold for a very large amount of time, where probable is supposed to be read in a pseudo-objective sense. It's not about purely subjective probability (as though lock-in happened if someone believes that lock-in happened) because we want to take into account the possibility that people can be wrong. But also objective probability is a bit of a fraught concept. So we're looking at what a highly informed, reasonable observer would believe. That's the basic definition we're using for lock-in.
This contrasts with the definition Toby Ord uses in The Precipice: some aspect is locked in if it is almost impossible to change. The reason why we didn't use his definition is that it implies some sort of distinction between things that civilization could change but doesn't want to change, and things that are impossible to change. But in our report, a big part of the story of how lock-in could happen is that AI systems could be designed to have particular, stable desires and preferences. And in that context, I don’t think it makes sense to strongly distinguish a dispreference for changing things from an inability to change things.
Future Matters: You begin the report by describing five different claims one could make about the long run trajectory of intelligent life. Can you list them and briefly elaborate on each of these?
Lukas Finnveden: The report starts off with four examples of claims that some might make about the future, to contrast with the claim we make about lock-in:
(A) Humanity will almost certainly go extinct in the next million years.
(B) Under Darwinian pressures, intelligent life will predictably spread throughout the stars and rapidly evolve towards maximal reproductive fitness.
(C) Through moral reflection, intelligent life will reliably be driven to pursue some specific higher non-reproductive goal, such as maximizing the happiness of all creatures. [This is something you might believe if you are a strong moral realist.]
(D) The choices of intelligent life are fundamentally uncertain and unpredictable, so much so that even over millions of years of history, at no point will you be able to predict any important features about what will happen in the next 1000 years or something. Things will just keep changing.
And then finally the claim that we want to defend:
(E) It is possible to stabilize many features of society for millions or trillions of years. But it is possible to stabilize them into many different shapes, so civilization's long-term behavior is contingent on what happens early on.
I think it's worth noticing that, unlike E, claims A to D have one thing in common: they're fairly confident that the future will turn out in a particular way. (Or in the case of D, fairly confident that we can at no point be confident about how the future will turn out.) And so we want to contrast these confident claims with the claim that actually some fairly wide set of futures are at least possible, right now, but that this could change with the coming of AGI.
Importantly, if someone wanted to argue about the probability that humanity goes extinct, or that Darwinian pressure pushes them towards maximal reproductive fitness, or that some other thing gets locked in very soon, then the report has less to say about the relative probabilities of those things.
Future Matters: The core of the report is three claims about lock-in, conditional on the arrival of AGI. Could you walk us through these claims one by one?
Lukas Finnveden: Yeah, so these three assertions are a short breakdown of why we think that lock-in will be possible with the coming of AGI.
Assertion 1 is that it will be possible to preserve highly nuanced specifications of values and goals far into the future, without losing any information. A necessary component for some complex values to be locked-in is that you can at least preserve information about what those values are. I think this will be fulfilled because with AGI you could basically just store values in the form of minds with very nuanced, detailed goals. And with the help of error correction you could store these for an extremely long time without randomly losing information.
Assertion 2 is that with sufficient investments it will be feasible to develop AGI-based institutions that, with high probability, competently and faithfully pursue the specified values, at least until an external source stops them or the values themselves recommend that this should stop. This is closely related to AI alignment (the possibility of designing AI systems aligned with particular goals), combined with the claim that those AIs could be built and used in ways such that they would be very unlikely to ever drift from those goals.
Here it’s important to flag that the report only discusses whether lock-in would be possible given a very large amount of coordination and investment. While I think that alignment is probably solvable given enough investment, in reality it’s very unclear whether this will happen. Which is worrying, since without some form of alignment, the likely alternative is that AI takes over and disempowers humanity.
And then, since assertion 2 considers the possibility of an external source stopping them, assertion 3 holds that if the world's economic and military powers agreed to set up such an institution, and gave it the power and ability to defend itself against any external threats, then that institution could pursue its agenda for at least millions and possibly for trillions of years. This assertion basically follows from assertions 1 and 2, I think, but is worth flagging separately.
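[Editors' illustration: Assertion 1 leans on error correction to keep stored information intact. The Python sketch below shows the simplest version of that idea, which is to keep several copies and repair them by majority vote. It is our own toy example with a hypothetical function name, `repair`, and is not taken from the report. Real storage systems use far more efficient codes, but the principle is the same: independent errors in a minority of copies can be detected and undone.]

```python
from collections import Counter


def repair(copies):
    """Rebuild a record from redundant copies by majority vote at each byte position."""
    if not copies:
        raise ValueError("need at least one copy")
    length = len(copies[0])
    if any(len(copy) != length for copy in copies):
        raise ValueError("all copies must have the same length")
    # zip(*copies) yields, for each position, the byte stored there in every copy.
    return bytes(Counter(column).most_common(1)[0][0] for column in zip(*copies))


# Five copies of a record, three of which suffer random damage.
original = b"preserve these values"
copies = [bytearray(original) for _ in range(5)]
copies[0][0] ^= 0xFF   # flip every bit of the first byte in copy 0
copies[1][3] ^= 0x01   # flip one bit of the fourth byte in copy 1
copies[2][0] ^= 0x0F   # damage the first byte of copy 2 in a different way

restored = repair([bytes(copy) for copy in copies])
print(restored == original)  # True
```

[Periodically rewriting every copy with the repaired version stops errors from accumulating, as long as fewer than half of the copies are damaged at any given position between repairs.]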
Future Matters: In the report, you make this point with reference to whole brain emulation (WBE) technologies, which you expect would arrive soon after AGI. Does the picture change substantially if faithful information preservation is attained via technologies other than whole brain emulation?
Lukas Finnveden: Could lock-in still happen without whole brain emulation? Yes, that seems likely, at least for values that didn’t urgently require judgments that would be hard to get without whole brain emulation.
The key point of bringing up whole brain emulation is that it could preserve information about what humans want. But if the goal is just to capture the current preferences of some group of humans (not what they’d think after a thousand years of thought), then it seems like an AGI system could also do this. By spending a great number of years speaking with those humans about their preferences, asking about lots of different edge cases, and really trying to nail down their psychology, such a system would be able to build a great predictive model of what those humans would think about different cases.
And then if it was important to at some point have whole brain emulations (for example to find out what the humans would think given the opportunity to think about something for 1000 years), those could in principle come later on, as long as brain preservation methods were solved fairly soon after AGI.
Future Matters: Turning to the second of the three assertions, the report notes that the form of alignment required for lock-in is easier to solve than the traditional problem of alignment. Can you explain why you think that is the case?
Lukas Finnveden: I think the most notable thing here is that when people speak about ‘solving alignment’, they're often (though not always) thinking about a competitive form of alignment. Competitive alignment is about being able to build aligned AI systems that are about as efficient and smart and useful as misaligned AI systems at accomplishing the same tasks.
One reason why competitive alignment techniques are important is that they’re a smaller ask. If important decision makers disagree about misalignment being a big risk, they might not be willing to implement incredibly expensive or inefficient alignment solutions. But a competitive alignment technique could still be worthwhile as long as they agreed that there were some risks.
Another reason is that if some misaligned AI systems appear in the world, it would be nice if aligned AI systems weren’t immediately overpowered because they're so much less efficient.
By contrast, for the lock-in hypothetical, we’re discussing feasibility, so we don’t engage with how people might mess up by underestimating alignment risks. And I’m imagining a scenario where the world is, for a time, coordinated enough that there are no misaligned AI systems to compete with. And then once there’s a stable institution in place, the report argues that such an institution could enforce a ban on dangerous technology, including misaligned AI, if that was necessary.
A related but distinct point is that lock-in wouldn’t require significantly superhuman intelligence: we only assume that AI systems have roughly human-level abilities. Roughly human-level AI systems seem much easier to align, since it should be possible for humans to understand what they're doing and provide good feedback.
Future Matters: Turning to the third assertion, you point out that governments tend to be unstable, and historic attempts to establish permanent regimes have typically failed quite quickly. Could you explain why you think the postulated long-lasting institutions could be significantly more robust to the sorts of threats that typically destabilize regimes? And could you also elaborate on the sources of instability that they would still face?
Lukas Finnveden: Sure, I’ll outline some reasons.
So firstly, one common way in which regimes have ended in the past is that a leader or group of leaders dies, and their successors are much less interested in pursuing their predecessors’ vision. By contrast, AI systems wouldn’t die of old age, and many copies of them could be stored in different places to prevent them from being vulnerable to accidents. This point applies more broadly too: for anything that the institution needs in order to survive, it could store many copies in redundant places. This would make the institution robust to everything except worldwide natural disasters or an intelligent, organized attempt to get rid of it. So let's talk about those.
Natural disasters are something that has destabilized some states in the past. But I think AI-based institutions could quite easily be robust to them. If you look at the short list of natural disasters that could potentially threaten human civilization as a whole, today, it looks like AGI-based systems should be a lot less vulnerable to them. For natural disasters like supervolcanoes or asteroids, most of the danger comes from dust blotting out the sun for a long time. But it seems that AGI-based institutions could survive such things by storing alternative forms of energy that don't require the sun. Biological pandemics wouldn't really be a problem for AGIs. I think computer-virus-style pandemics are one of the more plausible obstacles, but digital error correction would probably prevent them.
Intelligent, organized opposition is of course a very common reason that states lose power, whether because of internal power struggles, rebelling populations, or external states. But an AGI-based institution could build large numbers of AGI systems that share its goals, and be run entirely by such systems. It would thus not face significant internal opposition. And assuming that it started out with overwhelming economic and military power, it would also not face any dangerous external opposition. In particular, it's worth noting that the endless supply of loyal AI systems would give it really powerful surveillance capabilities, if it chose to use them.
Let’s talk about events that could spell the end of even the most stable institutions. One natural event where these arguments wouldn't apply would be the end of the universe, or the eventual end of accessible useful resources across the universe. And then in the intelligent opposition category, there’s the possibility of encountering alien civilizations. The assumption that the dominant institutions started out with large economic and military power wouldn't really matter in the case where they encountered an alien civilization, because the aliens could be similarly powerful.
Future Matters: As you note, the report focuses on the feasibility rather than the desirability of long-term lock-in. Do you have any thoughts on the latter question? How desirable would these scenarios be?
Lukas Finnveden: Some salient lock-in scenarios seem very scary, in that, at least in principle, they paint a picture where someone could pick some arbitrary set of values and impose them onto the world forever. If that happened in an unconsidered manner, or with input from just a small number of people, that would be a huge tragedy. The futures that I hope for are ones where everyone gets to have their say, where there are vigorous discussions about what values to prioritize, serious attempts to reach compromises, and so on.
That said, when I think about what scenarios are desirable or not, it actually doesn’t seem to me like “lock-in” is a very useful category — at least not our definition. It seems like some types of stability could be very good — e.g. stable institutions that make it hard for some small group of people to seize power and lock in their own regime, or that make extinction unlikely, or preserve some types of human rights. Also, recall the definition of “lock-in” that we use in the report — the definition that focuses on predictable stability. That would include situations where governance is very democratic and people are very open minded, but where they’ve just thought about things for a long time, already processed every argument that they could think of, such that despite their open mindedness, they’re unlikely to change their mind again. This would still count as a case where the future has become predictable, and many possible paths have been excluded, and in that sense count as lock-in.
But from a desirability perspective, such scenarios are incredibly different from a situation where a small group of people seize power and immediately lock in their favorite value. So at least our definition of “lock-in” is very much not up to the task of separating good futures from bad ones. Instead, for future thinking on this, I think we probably want to have more nuanced categories of different types of stability, and different types of governance mechanisms we may or may not want.
The reason that the report looks at these very extreme locked-in scenarios is just that they’re unusually easy to analyze, and that they tell us something important about what is and isn’t feasible here. The fact that it would be hypothetically possible to lock in almost any set of values demonstrates that there are many things that can happen, that there is potentially some path-dependency here, and that we could really benefit from thinking ahead about what we want. But this extreme end of the spectrum isn't necessarily where most of the important decision-relevant action is, once we've established that it's possible.
Future Matters: We are curious about the origins of this report. We remember seeing an unfinished draft on Jess Riedel’s website. Then, a while later, we see this report co-written by you, Jess and Carl Shulman. What’s the story behind it?
Lukas Finnveden: Yeah, so the story begins with Jess working on this topic some number of years ago, partly with input from Carl. He wrote this 50 page document of notes. Then, when Will MacAskill was writing What We Owe the Future, he was discussing value lock-in and thought that it would be great if Jess's work could be finished so he could reference it. Jess didn't have time to finish it himself, but both he and Carl were happy to be co-authors, so they were looking for someone else to write up a public-facing version. And that’s where I entered the picture.
Future Matters: Thank you, Lukas!
Lukas spoke to Future Matters in a personal capacity, and any views expressed are his own rather than those of his employer or co-authors.
We thank Leonardo Picón and Lyl Macalalad for editorial assistance.