tl;dr: The report underestimates the amount of compute used by evolution because it only looks at what it would take to simulate neurons, rather than neurons in agents inside a complex environment. It's not clear to me what the magnitude of the error is, but it could span many orders of magnitude. This makes it a less forceful outside view.

Background

Within Effective Altruism, Ajeya Cotra's report on artificial general intelligence (AGI) timelines has been influential in convincing members and organizations to work on AGI safety, or in justifying that work. The report has a section on the "evolutionary anchor", i.e., an upper bound on how much compute it would take to reach artificial general intelligence. The section can be found on pages 24-28 of this Google doc. As a summary, in the report's own words:

This hypothesis states that we should assume on priors that training computation requirements will resemble the amount of computation performed in all animal brains over the course of evolution from the earliest animals with neurons to modern humans, because we should expect our architectures and optimization algorithms to be about as efficient as natural selection.

This anchor isn't all that important in the report's own terms: it only gets a 10% probability assigned to it in the final weighted average. But this bound is personally important to me because I do buy that if you literally reran evolution, or if you use as much computation as evolution, you would have a high chance of producing something as intelligent as humans, and so I think that it is particularly forceful as an "outside view".

Explanation of my concern

I don't buy the details of how the author arrives at the estimate of the compute used by evolution:

The amount of computation done over evolutionary history can roughly be approximated by the following formula: (Length of time since earliest neurons emerged) * (Total amount of computation occurring at a given point in time). My rough best guess for each of these factors is as follows:

  • Length of evolutionary time: Virtually all animals have neurons of some form, which means that the earliest nervous systems in human evolutionary history likely emerged around the time that the Kingdom Animalia diverged from the rest of the Eukaryotes. According to timetree.org, an online resource for estimating when different taxa diverged from one another, this occurred around ~6e8 years ago. In seconds, this is ~1e16 seconds.
  • Total amount of computation occurring at a given point in time: This blog post attempts to estimate how many individual creatures in various taxa are alive at any given point in time in the modern period. It implies that the total amount of brain computation occurring inside animals with very few neurons is roughly comparable to the amount of brain computation occurring inside the animals with the largest brains. For example, the population of nematodes (a phylum of small worms including C. elegans) is estimated to be ~1e20 to ~1e22 individuals. Assuming that each nematode performs ~10,000 FLOP/s, the number of FLOP contributed by the nematodes every second is ~1e21 * 1e4 = ~1e25; this doesn't count non-nematode animals with similar or fewer numbers of neurons. On the other hand, the number of FLOP/s contributed by humans is (~7e9 humans) * (~1e15 FLOP/s / person) = ~7e24. The human population is vastly larger now than it was during most of our evolutionary history, whereas it is likely that the population of animals with tiny nervous systems has stayed similar. This suggests to me that the average ancestor across our entire evolutionary history was likely tiny and performed very few FLOP/s. I will assume that the "average ancestor" performed about as many FLOP/s as a nematode and the "average population size" was ~1e21 individuals alive at a given point in time. This implies that our ancestors were collectively performing ~1e25 FLOP every second on average over the ~1 billion years of evolutionary history.

This implies that the total amount of computation done over the course of evolution from the first animals with neurons to humans was (~1e16 seconds) * (~1e25 FLOP/s) = ~1e41 FLOP
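
As a quick check of the arithmetic in the quoted passage, here is a minimal Python sketch. The inputs are the report's rough figures as quoted above; the variable names and the seconds-per-year constant (a 365.25-day year) are my own assumptions for illustration.

```python
import math

# Assumption for illustration: a 365.25-day year.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Rough figures quoted from the report.
years_since_first_neurons = 6e8   # ~6e8 years since Animalia diverged
average_population = 1e21         # "average population size" of ancestors
flops_per_ancestor = 1e4          # nematode-level FLOP/s per individual

seconds = years_since_first_neurons * SECONDS_PER_YEAR
population_flops = average_population * flops_per_ancestor  # FLOP/s
total_flop = seconds * population_flops

# Modern humans, for comparison with the nematode-level figure.
human_flops = 7e9 * 1e15  # (~7e9 humans) * (~1e15 FLOP/s per person)

print(f"seconds of evolution: ~1e{round(math.log10(seconds))}")
print(f"ancestor FLOP/s:      ~1e{round(math.log10(population_flops))}")
print(f"total FLOP:           ~1e{round(math.log10(total_flop))}")
print(f"human FLOP/s today:   {human_flops:.0e}")
```

To the nearest order of magnitude, this reproduces the report's ~1e16 seconds, ~1e25 FLOP/s and ~1e41 FLOP. Note that it counts only computation inside neurons, which is the assumption questioned below.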

Unlike what the reader might suspect, I don't particularly want to take issue with the assumption that "...the total amount of brain computation occurring inside animals with very few neurons is roughly comparable to the amount of brain computation occurring inside the animals with the largest brains". I haven't looked at the literature on this, and the assumption seems prima facie plausible.

Rather, I don't buy the assumption that to simulate evolution, it would be enough to simulate the behaviour of all the neurons throughout history. I think that one would also have to simulate the stimuli to which these neurons are exposed in order to compute how the neurons behave, and for this one also has to simulate the environment. For a simple example, in the case of AlphaGo, one not only has to simulate the inner workings of the model, but also the state of the Go board. Likewise, to simulate evolution, one would not only have to simulate the neurons of the brains in it, but also the state of the world in which they are.

It's not clear to me how big a world one would need to spawn intelligent life. For instance, I think that Minecraft requires on the order of ~500 GFLOP/s to run[1]. Conversely, if we want a planet as large as Earth, with a complex atmosphere & so on, this might vastly exceed the computing power of current supercomputers[2]. Working through these details might be a neat research project, and perhaps a good fit for an organization such as AI Impacts or Epoch, though I'm not sure how many estimates other than my own it would change. In any case, this would involve:

  1. Coming up with estimates of what the least fine-grained world that we would expect might be able to produce intelligent life if we simulated natural selection in it.
  2. Calculating how much compute it would take to in fact simulate it.
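
To give a rough sense of the scales in the two reference points above, here is a small Python sketch. It uses only the numbers given in the footnotes (the recommended Minecraft hardware and the 60 petaFLOPS atmospheric supercomputer) and the report's ~1e25 FLOP/s for neurons. Treating peak hardware throughput as the cost of running a simulation is my simplifying assumption, and the variable and function names are illustrative.

```python
import math

minecraft_flops = (37.73 + 467.2) * 1e9   # CPU + GPU peak, from footnote 1
supercomputer_flops = 60e15               # 60 petaFLOPS, from footnote 2
neuron_flops = 1e25                       # report's estimate for all ancestors

def ooms_between(a, b):
    """Orders of magnitude by which b exceeds a."""
    return math.log10(b / a)

gap_low = ooms_between(minecraft_flops, supercomputer_flops)
gap_high = ooms_between(supercomputer_flops, neuron_flops)

print(f"Minecraft -> supercomputer: {gap_low:.1f} OOMs")
print(f"supercomputer -> neurons:   {gap_high:.1f} OOMs")
```

Neither reference point is a worked-out estimate of a world rich enough for natural selection to produce intelligence; the sketch only shows how far apart they are on a log scale.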

Propagation of beliefs

After reading this, I think that the reader should:

  1. Make AI timelines (slightly) wider to the right
  2. Slightly reduce credence in the argument chain in The Most Important Century series, which leans heavily on the argument that AGI will be developed this century.
  3. Slightly increase credence that something like The Patient Philanthropy Project, which pretty much conditions on no AGI soon, is a good idea.
  4. Make a meta update about whether arguments in favor of shorter timelines are found more often, and about how intensely Open Philanthropy reports have been scrutinized.

I think that the first three updates should be relatively small, on the order of 0.1% to 10%:

  • Something like 0.1% if you are keeping careful track of your AI timelines in a spreadsheet and don't particularly care about the evolutionary anchor, but want to update a bit to not fall prey to confirmation bias.
  • Something like 10% if you are replacing the whole of Ajeya Cotra's weight on the previous evolutionary anchor timelines with some unspecified but much longer timelines, and if you greatly favored Cotra's estimate over Carlsmith's or Davidson's.
  • You could also have a much larger update if you more predominantly cared about the evolutionary anchor.

I haven't given much thought to what the meta update should look like.

Discussion

Alexey Guzey mentions:

indeed by only considering neurons fired instead of information observed by the organisms you probably save yourself orders of magnitude of computational requirements. But you don't know a priori how exactly to make these neurons fire without looking at all of the environment and doing computation over it.

In conversation with Misha Yagudin, he mentions:

  1. that it is not clear whether the report is talking about "simulating all of evolution" when saying something like "we should expect our architectures and optimization algorithms to be about as efficient as natural selection."
  2. That he personally expects AI to require less compute than evolution.
  3. That even if you have to simulate an environment, this could have comparable computational requirements to computing the neurons firing in the ML model. But it could also be much greater. "It’s important if the computational requirements are comparable or greater. If it’s 10% more than baseline, this is within the model’s precision. If it’s ×1 — that’s concerning, if it’s ×10 it’s a mistake, ×10^10 — that’s a disaster."

I broadly agree with these points. On 3., I don't like to use the term "Knightian uncertainty", but I've given an overly broad range between "Minecraft" and "simulate all of the earth at similar levels of precision as we currently simulate the atmosphere".
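
One way to make Misha's thresholds concrete is to convert an environment overhead into a shift of the anchor in orders of magnitude. The Python sketch below assumes, as my own reading of the quote, that "×k" means the environment costs k times the neuron baseline, so that the total is (1 + k) times the baseline. The function name is hypothetical.

```python
import math

def anchor_shift_ooms(env_to_neuron_ratio):
    """OOM shift of total compute when environment cost is added to neuron cost."""
    return math.log10(1 + env_to_neuron_ratio)

# Overheads named in the quote, expressed as environment cost / neuron cost.
cases = [("10% more", 0.1), ("x1", 1), ("x10", 10), ("x10^10", 1e10)]

for label, ratio in cases:
    print(f"{label:>8}: +{anchor_shift_ooms(ratio):.2f} OOMs")
```

Under that reading, overheads up to about ×1 move the estimate by well under one order of magnitude, while the ×10^10 case moves it by about ten.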

Acknowledgements

Thanks to Alexey Guzey and Misha Yagudin for reviewing this post. Otherwise, my work is generally supported by the Quantified Uncertainty Research Institute.


[1] This page claims that the recommended requirements are an Intel Core i5 for the CPU and a Radeon 7 200 for the GPU. The Intel CPU has a processing speed of 37.73 GFLOP/s and the Radeon of 467.2 GFLOP/s. I'm not sure what the utilization of the GPU/CPU is, but in this case it doesn't matter much because the estimate is so low compared to the cost of computing neurons.

[2] E.g., see this article on a £1.2 billion, 60 petaFLOPS supercomputer built solely for better atmospheric prediction/simulation. Simulating the rest of the world below the atmosphere would require more computing power on top of that. I'm also not clear on what the resolution of the atmospheric grid that these supercomputers simulate is.



Ajeya's report addresses this in the "What if training data or environments will be a bottleneck?" section, in particular in the "Having computation to run training environments" subsection:

 

An implicit assumption made by all the biological anchor hypotheses is that the overwhelming majority of the computational cost of training will come from running the model that is being trained, rather than from running its training environment. 

This is clearly the case for a transformative model which only operates on text, code, images, audio, and video since in that case the “environment” (the strings of tokens or pixels being processed) requires a negligible amount of computation and memory compared to what is required for a large model. Additionally, as I mentioned above, it seems possible that some highly abstract mathematical environments which are very cheap to run could nonetheless be very rich and support extremely intelligent agents. I think this is likely to be sufficient for training a transformative model, although I am not confident.  

If reinforcement learning in a rich simulated world (e.g. complex physics or other creatures) is required to train a transformative model, it is less clear whether model computation will dominate the computation of the environment. Nonetheless, I still believe this is likely. My understanding is that the computation used to run video game-playing agents is currently in the same ballpark as the computation used to run the game engine. Given these models are far from perfect play, there is likely still substantial room to improve on those same environments with a larger model. It doesn’t seem likely that the computational cost of environments will need to grow faster than the computational cost of agents going forward.  (If several intelligent agents must interact with one another in the environment, it seems likely that all agents can be copies of the same model.) 

In the main report, I assume that the computation required to train a transformative model under this path can be well-approximated by FHKP^α, where F is the model’s FLOP / subj sec, H is the model’s horizon length in subjective seconds, P is the parameter count of the model, and α and K describe scaling behavior. I do not add an additional term for the computational cost of running the environment.

Thanks for pointing out that section. I agree that the section discusses the issue. But I am left unsatisfied about whether it defeats it.

In particular,

  • "I think this is likely to be sufficient for training a transformative model, although I am not confident. "
  • "it is less clear whether model computation will dominate the computation of the environment. Nonetheless, I still believe this is likely"

don't seem hugely confident. And that's fine, the report is already pretty long. 

But then even if the report is not at fault I am kind of unsatisfied about the evolutionary anchor part being used as an actual upper bound—not sure whether people are actually doing that all that often, but Eli's comment below seems to indicate that it might be, and I remember it being used that way on a couple of occasions.

This seems like a reasonable assumption for other anchors such as the Lifetime and the Neural Network Horizon anchors, which assume that training environments for TAI are similar to training environments used for AI today. But it seems much more difficult to justify for the evolution anchor, which Ajeya admits would be far more computationally intensive than storing text or simulating a deterministic Atari game.

This post argues that the evolutionary environment is similarly or more complex than the brains of the organisms within it, while the second paragraph of the above quotation disagrees. Neither argument seems detailed enough to definitively answer the question, so I’d be interested to read any further research on the two questions proposed in the post:

  1. Coming up with estimates of what the least fine-grained world that we would expect might be able to produce intelligent life if we simulated natural selection in it.
  2. Calculating how much compute it would take to in fact simulate it.

But it seems much more difficult to justify for the evolution anchor, which Ajeya admits would be far more computationally intensive than storing text or simulating a deterministic Atari game.

 

The evolution anchor involves more compute than the other anchors (because you need to get so many more data points and train the AI on them), but it's not obvious to me that it requires a larger proportion of compute spent on the environment than the other anchors. Like, it seems plausible to me that the evolution anchor looks more like having the AI play pretty simple games for an enormously long time, rather than having a complicated physically simulated environment.

Fair enough. Both seem plausible to me, we’d probably need more evidence to know which one would require more compute.

I did this analysis a while back, but it's worth doing again, let's see what happens:

If you are spending 1e25 FLOP per simulated second simulating the neurons of the creatures, you can afford to spend 4e24 FLOP per simulated second simulating the environment & it will just be a rounding error on your overall calculation so it won't change the bottom line. So the question is, can we make a sufficiently detailed environment for 4e24 FLOP per second?

There are 5e14 square meters on the surface of the Earth according to Wolfram Alpha.

So that's about 1e10 FLOP per second per square meter available. So, you could divide the world into 10x10 meter squares and then have a 1e12 FLOP computer assigned to each square to handle the physics and graphics. If I'm reading this chart right, that's about what a fancy high-end graphics card can do.  (Depends on if you want double or single-precision I think?). That feels like probably enough to me; certainly you could have a very detailed physics simulation at least. Remember also that you can e.g. use a planet 1 OOM smaller than Earth but with more habitable regions, and also dynamically allocate compute so that you have more of it where your creatures are and don't waste as much simulating empty areas. Also, if you think this is still maybe not enough, IIRC Ajeya has error bars of like +/- SIX orders of magnitude on her estimate, so you can just add 3 OOMs no problem without really changing the bottom line that much.

It would be harder if you wanted to assign a graphics card to each nematode worm, instead of each chunk of territory. There are a lot of nematode worms or similar tiny creatures; Ajeya says 1e21 alive at any given point of time. So that would only leave you with 10,000 FLOP per second per worm to do the physics and graphics! If you instead wanted a proper graphics card for each worm you'd probably have to add 7 OOMs to that, getting you up to a 100 GFLOP/s card. This would be a bit higher than Ajeya estimated; it would be +25 OOMs more than GPT-3 cost instead of +18.

Personally I don't think the worms matter that much, so I think the true answer is more likely to be along the lines of "a graphics card per human-sized creature" which would be something like 10 billion graphics cards which would let you have 5e14 FLOP per card which would let you create some near-photorealistic real time graphics for each human-sized creature. 

Then there's also all the various ways in which we could optimize the evolutionary simulation e.g. as described here. I wouldn't be surprised if this shaves off 6 OOMs of cost.
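
The back-of-the-envelope numbers in this comment can be reproduced with a short Python sketch. All inputs are the comment's own figures; the variable names are illustrative. Note that the exact quotients come out a little below the rounded values used in the text (e.g. 8e9 rather than 1e10 FLOP/s per square meter), which is well within the precision of the argument.

```python
import math

neuron_flops = 1e25       # FLOP per simulated second spent on neurons
env_budget = 4e24         # FLOP per simulated second allowed for the environment
earth_surface_m2 = 5e14   # surface area of the Earth in square meters

per_m2 = env_budget / earth_surface_m2   # budget per square meter
per_tile = per_m2 * 10 * 10              # budget per 10x10 meter square

per_worm = env_budget / 1e21             # one share per nematode-sized creature
per_human_sized = env_budget / 1e10      # one share per human-sized creature

# How much the environment budget moves the total, in orders of magnitude.
shift_ooms = math.log10((neuron_flops + env_budget) / neuron_flops)

print(f"per m^2:              {per_m2:.0e} FLOP/s")
print(f"per 10x10 m tile:     {per_tile:.0e} FLOP/s")
print(f"per worm:             {per_worm:.0e} FLOP/s")
print(f"per human-sized unit: {per_human_sized:.0e} FLOP/s")
print(f"shift in total:       +{shift_ooms:.2f} OOMs")
```

The last line is the sense in which the environment budget is a rounding error: adding 4e24 to 1e25 FLOP/s shifts the total by a small fraction of one order of magnitude.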



 

Note that this analysis is going to wildly depend on how progress on "environment simulation efficiency" compares to progress on "algorithmic efficiency". If you think it will be slower then the analysis above doesn't work.

If I understand you correctly, you are saying that the Evolution Anchor might not decrease in cost with time as fast as the various neural net anchors? Seems plausible to me, could also be faster, idk. I don't think this point undermines Ajeya's report though because (a) we are never going to get to the evolution anchor anyway, or anywhere close, so how fast it approaches isn't really relevant except in very long-timelines scenarios, and (b) her spreadsheet splits up algorithmic progress into different buckets for each anchor, so the spreadsheet already handles this nuance.

Meta: I feel like the conversation here and with Nuno's reply looks kinda like:

Nuno: People who want to use the evolutionary anchor as an upper bound on timelines should consider that it might be an underestimate, because the environment might be computationally costly.

You: It's not an underestimate: here's a plausible strategy by which you can simulate the environment.

Nuno / me: That strategy does not seem like it clearly supports the upper bound on timelines, for X, Y and Z reasons.

You: The evolution anchor doesn't matter anyway and barely affects timelines.

This seems bad:

  1. If you're going to engage with a subpoint that OP made that was meant to apply in some context (namely, getting an upper bound on timelines), stick within that context (or at least signpost that you're no longer engaging with the OP).
  2. I don't really understand why you bothered to do the analysis if you're not changing the analysis based on critiques that you agree are correct. (If you disagree with the critique then say that instead.)

If I understand you correctly, you are saying that the Evolution Anchor might not decrease in cost with time as fast as the various neural net anchors?

Yes, and in particular, the mechanism is that environment simulation cost might not decrease as fast as machine learning algorithmic efficiency. (Like, the numbers for algorithmic efficiency are anchored on estimates like AI and Efficiency, those estimates seem pretty unlikely to generalize to "environment simulation cost".)

her spreadsheet splits up algorithmic progress into different buckets for each anchor, so the spreadsheet already handles this nuance.

Just because someone could change the numbers to get a different output doesn't mean that the original numbers weren't flawed and that there's no value in pointing that out?

E.g. suppose I had the following timelines model:

Input: N, the number of years till AGI.

Output: Timeline is 2022 + N.

I publish a report estimating N = 1000, so that my timeline is 3022. If you then come and give a critique saying "actually N should be 10 for a timeline of 2032", presumably I shouldn't say "oh, my spreadsheet already allows you to choose your own value of N, so it handles that nuance".

To be clear, my own view is also that the evolution anchor doesn't matter, and I put very little weight on it and the considerations in this post barely affect my timelines. 

Thanks Rohin, I really appreciate this comment.

Did I come across as unfriendly and hostile? I am sorry if so, that was not my intent.

It seems like you think I was strongly disagreeing with your claims; I wasn't. I upvoted your response and said basically "Seems plausible idk. Could go either way." 

And then I said that it doesn't really impact the bottom line much, for reasons XYZ. And you agree. 

But now it seems like we are opposed somehow even though we seem to basically be on the same page.

For context: I think I didn't realize until now that some people actually took the evolution anchor seriously as an argument for AGI by 2100, not in the sense I endorse (which is as a loose upper bound on our probability distribution over OOMs of compute) but in the much stronger sense I don't endorse (as an actual place to clump lots of probability mass around, and to naively extrapolate Moore's law towards across many decades). I think insofar as people are doing that naive thing I don't endorse, they should totally stop. And yes, as Nuno has pointed out, insofar as they are doing that naive thing, then they should really pay more attention to the environment cost as well as the brain-simulation cost, because it could maaaybe add a few OOMs to the estimate which would push the extrapolated date of AGI back by decades or even centuries.

Did I come across as unfriendly and hostile? I am sorry if so, that was not my intent.

No, that's not what I meant. I'm saying that the conversational moves you're making are not ones that promote collaborative truth-seeking.

Any claim of actual importance usually has a giant tree of arguments that back it up. Any two people are going to disagree on many different nodes within this tree (just because there are so many nodes). In addition, it takes a fair amount of effort just to understand and get to the same page on any one given node.

So, if you want to do collaborative truth-seeking, you need to have the ability to look at one node of the tree in isolation, while setting aside the rest of the nodes.

In general when someone is talking about some particular node (like "evolution anchor for AGI timelines"), I think you have two moves available:

  1. Say "I think the actually relevant node to our disagreement is <other node>"
  2. Engage with the details of that particular node, while trying to "take on" the views of the other person for the other nodes

(As a recent example, the ACX post on underpopulation does move 2 for Sections 1-8 and move 1 for Section 9.)

In particular, the thing not to do is to talk about the particular node, then jump around into other nodes where you have other disagreements, because that's a way to multiply the number of disagreements you have and fail to make any progress on collaborative truth-seeking. Navigating disagreements is hard enough that you really want to keep them as local / limited as possible.

(And if you do that, then other people will learn that they aren't going to learn much from you because the disagreements keep growing rather than progress being made, and so they stop trying to do collaborative truth-seeking with you.)

Of course sometimes you start doing move (2) and then realize that actually you think your partner is correct in their assessment given their views on the other nodes, and so you need to switch to move (1). I think in that situation you should acknowledge that you agree with their assessment given their other views, and then say that you still disagree on the top-level claim because of <other node>.

Thanks for this thoughtful explanation & model.

(Aside: So, did I or didn't I come across as unfriendly/hostile? I never suggested that you said that, only that maybe it was true. This matters because I genuinely worry that I did & am thinking about being more cautious in the future as a result.)

So, given that I wanted to do both 1 and 2, would you think it would have been fine if I had just made them as separate comments, instead of mentioning 1 in passing in the thread on 2? Or do you think I really should have picked one to do and not done both?

The thing about changing my mind also resonates; that definitely happened to some extent during this conversation, because (as mentioned above) I didn't realize Nuno was talking about people who put lots of probability mass on the evolution anchor. For those people, a shift up or down by a couple OOMs really matters, and so the BOTEC I did, arguing that the environment can probably be simulated for less than 10^41 FLOP, needs to be held to a higher standard of scrutiny & could end up being judged insufficient.



 

So, did I or didn't I come across as unfriendly/hostile?

You didn't to me, but also (a) I know you in person and (b) I'm generally pretty happy to be in forceful arguments and don't interpret them as unfriendly / hostile, while other people plausibly would (see also combat culture). So really I think I'm the wrong person to ask.

So, given that I wanted to do both 1 and 2, would you think it would have been fine if I had just made them as separate comments, instead of mentioning 1 in passing in the thread on 2? Or do you think I really should have picked one to do and not done both?

I think you can do both, if it's clear that you're doing these as two separate things. (Which could be by having two different comments, or by signposting clearly in a single comment.)

In this particular situation I'm objecting to starting with (2), then switching to (1) after a critique without acknowledging that you had updated on (2) and so were going to (1) instead. When I see that behavior from a random Internet commenter I'm like "ah, you are one of the people who rationalizes reasons for beliefs, and so your beliefs do not respond to evidence, I will stop talking with you now". You want to distinguish yourself from the random Internet commenter.

(And if you hadn't updated on (2), then my objection would have been "you are bad at collaborative truth-seeking, you started to engage on one node and then you jumped to a totally different node before you had converged on that one node, you'll never make progress this way".)

OK. I'll DM Nuno.

Something about your characterization of what happened continues to feel unfair & inaccurate to me, but there's definitely truth in it & I think your advice is good so I will stop arguing & accept the criticism & try to remember it going forward. :)

Hey, thanks for sharing these. They seem like a good starting point. But I don't know whether to take them literally.

On a quick read, things I may not buy:

So that's about 1e10 FLOP per second per square meter available. So, you could divide the world into 10x10 meter squares and then have a 1e12 FLOP computer assigned to each square to handle the physics and graphics

Not sure if I buy this decomposition. For instance, taking into account that things can move from one 10x10m region to another/simulating the boundaries seems like it would be annoying. But you could have the world as a series of 10x10 rooms?

dynamically allocate compute so that you have more of it where your creatures are and don't waste as much simulating empty areas

I buy this, but worried about the world being consistent. There is also a memory tradeoff here.

Mmh, maybe I'm not so worried about FLOPs per se but about parallelizability/wall-clock time.

Well totally this thing would take a fuckton of wall-clock time etc. but that's not a problem, this is just a thought experiment -- "If we did this bigass computation, would it work?" If the answer is "Yep, 90% likely to work" then that means our distribution over OOMs should have 90% by +18.

Mmh, then OOMs of compute stops being predictive of timelines in this anchor, because we can't just think about how much compute we have but also about whether we can use it for this.

Sorta? Like, yeah, suppose you have 10% of your probability mass on the evolution anchor. Well, that means that like maaaaybe in 2090 or so we'll have enough compute to recapitulate evolution, and so maaaaybe you could say you have 10% credence that we'll actually build AGI in 2090 using the recapitulate evolution method. But that assumes basically no algorithmic progress on other paths to AGI. But anyhow if you were doing that, then yes it would be a good  counterargument that actually even if we had all the compute in 2090 we wouldn't have the clock time because latency etc. would make it take dozens of years at least to perform this computation. So then (that component of) your timelines would shift out even farther.

I think this matters approximately zero, because it is a negligible component of people's timelines and it's far away anyway so making it move even farther away isn't decision-relevant.

Well, I agree that this is pretty in the weeds, but personally this has made me view the evolutionary anchor as less forceful. 

Like, the argument isn't "ha, we're not going to be able to simulate evolution, checkmate AGI doomers", it's "the evolutionary anchor was a particularly forceful argument for giving a substantial probability to x-risk this century, even to people who might otherwise be very skeptical. The fact that it doesn't go through implies a variety of small updates, e.g., it marginally increases the value of non-x-risk longtermism".

Huh, I guess I didn't realize how much weight some people put on the evolution anchor.  I thought everyone was (like me) treating it as a loose upper bound basically, not something to actually clump lots of probability mass on.

In other words: The people I know who were using the evolutionary anchor (people like myself, Ajeya, etc.) weren't using it in a way that would be significantly undermined by having to push the anchor up 6 OOMs or so. Like I said, it would be a minor change to the bottom line according to the spreadsheet. Insofar as people were arguing for AGI this century in a way which can be undermined by adding 6 OOMs to the evolutionary anchor then those people are silly & should stop, for multiple reasons, one of which is that maaaybe environmental simulation costs mean that the evolution anchor really is 6 OOMs bigger than Ajeya estimates.

 

I'm entirely unconvinced that this is a relevant concern - if training data is the equivalent of complex environments, we kind-of get it for free, and even where we don't, we can simulate natural environments and other agents much more cheaply than nature.

if training data is the equivalent of complex environments, we kind-of get it for free

Don't disagree

we can simulate natural environments and other agents much more cheaply than nature

Also don't disagree, but this is a matter of degree, no? For example, I'm thinking that having an environment with many agents acting on each other and on the environment would make the training process less parallelizable.

Personally I found it pretty hard to give a number to "the least complex environment which could give rise to intelligent life"; if you have thoughts on how to bound this I'd be keen to hear them.

That makes sense, and I think we're mostly agreeing - it just seemed like you were skipping this entirely in your explanation. 

I would be keen to understand your comment! I'll also share my own understanding. Feel free to correct me if I'm incorrect.

--

This article suggests training data isn't free, and that AI models are already bottlenecked by data more than by compute:

https://www.lesswrong.com/posts/6Fpvch8RR29qLEWNH/chinchilla-s-wild-implications

Also an RL agent may need data that is in response to specific actions, not just unlabelled data from the internet. For instance if an agent wants to invent a chemical reactor, we won't find data of every theoretically possible way an agent can fail to build a reactor. However an RL agent may wish to progressively make attempts and improve their skills - each attempt consisting of a long non-random sequence of actions. This would require simulating an environment or having an embodied agent actually in a physical environment.

--

As for the claim that simulating brains may be more compute-intensive than simulating environments (can I confirm this is your claim?), it seems plausible but also non-obvious (at least to me). Even basic chemistry seems expensive to simulate, and so do biochemical pathways, and so do wind and sea currents (which affect many biological beings, but are also very chaotic mathematically). Quantum systems (such as chemical reactions) may still be simulated at some point using quantum computers. There have been attempts to simulate all of these, and I don't know of scaling laws for them such that any reasonable amount of compute (say 10^10 times more FLOPs than today) will make all of them much more simulable than today.

And maybe we don't need to simulate any of this to get human-level AI, but then that feels like it needs to be defended, ideally in a way that's really easy to verify. (Because I guess the point of bio anchors is to act as a set of arguments that is easier to verify than AI risk experts' existing timelines, which tend to rely a lot on private insights.)

First, yes, current ML is very sample-inefficient, but that's not a reason to say that the bound should be based on current sample efficiency. Second, the long-term feedback from series of actions is what policy nets in AlphaGo addressed, and I agree that it requires simulating an environment, but it says little about how rich that environment needs to be. And because running models is much cheaper than training them, self-play is tractable as a way to generate complex environments.

But overall, I agree that it's an assertion which would need to be defended.

Thank you for your reply!

I agree current sample efficiency shouldn't be assumed to be the limit. But forecasting how much sample efficiency will increase also feels hard to defend in an easy-to-verify argument.

I haven't thought much about how self-play alone could generate the environment, thank you for pointing this out! Typically IRL the environment looks very asymmetric compared to the agents, but that doesn't mean it necessarily has to be so to create AGI; mostly we don't seem to know with high certainty.

Mostly it's just, if you're going for easy-to-verify arguments, I'm unsure how airtight these arguments can be made. (Maybe they can and I'm just unaware of it!) If you're going for accuracy rather than being easy-to-verify then I feel like there are a lot of other insights one can also bring in (besides the ones we discussed), that make one's timelines less dependent on bio anchors.

This question might depend on what the intention of bio anchors is: are they supposed to be easier to verify than other views on AI timelines?

I think bio-anchors were supposed to be a single approach / base rate reference class, not necessarily the easiest to verify - but despite reservations, I don't know of other reference classes that are more easily verifiable.

Thank you for the reply! This makes sense.

It might be very costly, perhaps impractically costly, to collect training data that can make up for the responsiveness of a simulated environment to the choices an agent makes. An agent can actively test and explore their environment in a way collected training data won't allow them to do flexibly without possibly impractical amounts of it. You'd need to anticipate how the environment would respond, and you'd basically be filling the entries of a giant lookup table for the responses you anticipate.

AlphaGo was originally pretrained to mimic experts, but extra performance came from simulated self-play, and the next version skipped the expert mimicking pretraining.

It's plausible the environments don't need to be very complex or detailed, though, to the point that most of the operations are still in the AI.

You don't necessarily need to collect training data, that's why RL works. And simulating an environment is potentially cheap, as you noted. So again, I'm unconvinced that this is actually a problem with bio anchors, at least above and beyond what Cotra says in the report itself.

Another consideration that I think Ajeya thought about but didn't go into the report is that anthropic considerations might mean that our planet's evolutionary path is unusually short compared to arbitrarily selected evolutionary paths. Which means using our specific evolutionary path as an anchor underestimates timelines.

(Of course there are many factors that suggest the evolutionary anchor is an overestimate as well). 

(Originally sent as feedback privately regarding an earlier version, then Nuño requested publishing as a comment instead)

Overall I think it's a fair point and would be curious what the response would be from Ajeya. I think you downplay the importance others have placed in the evolution anchor more than needed and you also could engage a bit more with Ajeya's existing arguments in the report.

You say that you think others don't view the evolution anchor as that important, but I'm not so sure: see e.g. Holden's blog post here which emphasizes it as an upper bound. I'd also look through this shortform. In particular it notes that Ajeya mentions in the report "I expect that some ML researchers would want to argue that we would need substantially more computation than was performed in the brains of all animals over evolutionary history; while I disagree with this, it seems that the Evolution Anchor hypothesis should place substantial weight on this possibility."

There's also this section of the report, some parts of which come fairly close to directly addressing your concern, e.g.: "There are also some specific ways it seems that we could improve upon the “simulate natural selection” baseline, prima facie. For example, population sizes are a consequence of the carrying capacity of an ecological niche rather than being tuned to minimize the amount of computation used to evolve intelligent animals; it seems likely that they were far too large from a computational-efficiency standpoint. Additionally, the genetic fitness signal in the natural environment was often highly noisy, whereas we could plausibly design artificial environments which provide much cleaner tests of precisely the behaviors we are looking to select for."

Thanks for writing this post, have added it to my collection "AI timelines via bioanchors: the debate in one place".

A fairly minor point. You write:

After reading this, I think that the reader should:

//

3. Slightly increase credence that something like The Patient Philanthropy Project, which pretty much conditions on no AGI soon, is a good idea.

It seems to me that to the extent that one is persuaded by your argument, one should update in the direction of "patient longtermism". But that's different from patient philanthropy, as Owen Cotton-Barratt points out:

[T]here is no direct implication that "patient longtermists" should be less willing to spend money now than "urgent longtermists". Rather I think it's an open question which will depend on a lot of messy empirics (about giving opportunities) which position should be more in favour of saving money now. My current guess is to recommend spending rather than saving money at current margins to both patient and urgent longtermists.

This doesn't mean that what you wrote is strictly wrong, but it seems worth noticing.

Makes sense, thanks Stefan.

Not that important, but... in terms of what intuitions people have, the split of the computation into neurons/environment is not a reasonable model of how life works. Simple organisms do a ton of non-neuron-based computations distributed across many cells, and are able to solve pretty complex optimization problems. The neurons/environment split pushes this into the environment, and this means the environment was sort of complex in a way for which people don't have good intuitions (e.g. instead of mostly thinking about costs of physics simulation, they should include stuff like immune system simulations).

Great post! It does seem prima facie infeasible that recapitulating evolution would even be computable. Another thing to consider is that simulating evolution may not yield general intelligence if it is run just once. It may need to be simulated many times in order to stumble on general intelligence, which adds to the amount of computation needed if it turns out that arriving at general intelligence in any single run is very unlikely.
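To put a rough number on the "many runs" point: assuming each simulated run of evolution independently produces general intelligence with the same probability p (an assumption made here for illustration, not something the report states), the number of runs until the first success is geometrically distributed with mean 1/p. The function names below are hypothetical.

```python
import math

def expected_runs(p_success):
    """Expected number of independent runs until the first success (mean of a geometric distribution)."""
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    return 1.0 / p_success

def extra_ooms(p_success):
    """Orders of magnitude of compute added by having to rerun evolution, in expectation."""
    return math.log10(expected_runs(p_success))

if __name__ == "__main__":
    for p in (1.0, 0.1, 1e-3, 1e-6):
        print(f"p={p:g}: expected runs={expected_runs(p):g}, extra OOMs={extra_ooms(p):.1f}")
```

So under this simple model, a one-in-a-million chance per run would add about six orders of magnitude to the expected compute, on top of the cost of a single run.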

Even if simulating the environment is costly, we can be more efficient than evolution. So the environment implies that simulated evolution is harder, but other factors Ajeya doesn't consider imply that simulated evolution is much easier.

Yes, but I'm saying that the estimation in the report wasn't "estimate the compute which evolution took, and then adjust for human ingenuity", it was "estimate something else".

Really interesting. I think there are connections to the extended mind thesis - where mental processes are in part constituted by the external body and world, such that cognition can't be neatly circumscribed to the brain.  This seems a deeper system than is modelled by 'computation done by neurons + training data'. 

Another complication is the relationship between ecological or environmental complexity and the evolution of intelligence. Peter Godfrey-Smith's 'Environmental Complexity and the Evolution of Cognition' is a good read on this. Other comments on this post point to video game worlds and getting interaction by copying the evolving agent - but I think this may drastically understate the complexity of co-evolving sets of organisms in the real world.

I think it's unlikely that developing artificial intelligence requires these wrinkles of mind extension or environmental complexity. But I interpret the evolutionary anchor argument as a generous upper bound based on what we know evolution did at least once. For that purpose, our model should probably defer to evolution's wrinkles rather than assume they're irrelevant.