It's the first chapter in a new guide about how to help make AI go well (aimed at new audiences).
I think it's generally important for people who want to help to understand the strategic picture.
Plus in my experience the thing most likely to make people take AI risk more seriously is believing that powerful AI might happen soon.
I appreciate that talking about this could also wake more people up to AGI, but I expect the guide overall will boost the safety talent pool proportionally a lot more than the pool of people speeding up AI.
(And long term I t...
Yes I basically agree that's the biggest limiting factor at this point.
However, a better base model can improve agency via e.g. better perception (which is still weak).
And although reasoning models are good at science and math, they still make dumb mistakes when reasoning about other domains, and very high reliability is needed for agents. So I expect better reasoning models also help with agency quite a bit.
I feel subtweeted :p As far as I can tell, most of the wider world isn't aware of the arguments for shorter timelines, and my pieces are aimed at them, rather than people already in the bubble.
That said, I do think there was a significant shortening of timelines from 2022 to 2024, and many people in EA should reassess whether their plans still make sense in light of that (e.g. general EA movement building looks less attractive relative to direct AI work compared to before).
Beyond that, I agree people shouldn't be making month-to-month adjustments to their ...
I wouldn't totally defer to them, but I wouldn't totally ignore them either. (And this is mostly beside the point, since I'm overall critical of using their forecasts and my argument doesn't rest on this.)
I only came across this paper in the last few days! (The post you link to is from 5th April; my article was first published 21st March.)
I want to see more commentary on the paper before deciding what to do about it. My current understanding:
o3-mini seems to be a lot worse than o3 – it only got ~10% on Frontier Math, similar to o1. (Claude Sonnet 3.7 only gets ~3%.)
So the results actually seem consistent with Frontier Math, except they didn't test o3, which is significantly ahead of other models.
The other factor seems to be that they evaluated the quality of the proofs rather than the ability to get a correct numerical answer.
I'm not sure data leakage is a big part of the difference.
So, OpenAI is telling the truth when it says AGI will come soon and lying when it says AGI will not come soon?
I don't especially trust OpenAI's statements on either front.
The framing of the piece is "the companies are making these claims, let's dig into the evidence for ourselves" not "let's believe the companies".
(I think the companies are most worth listening to when it comes to specific capabilities that will arrive in the next 2-3 years.)
I discuss expert views here. I don't put much weight on the superforecaster estimates you mention at this point, because they were made in 2022, before the dramatic shortening in timelines due to ChatGPT (let alone reasoning models).
They also (i) made compute forecasts that were very wrong, (ii) don't seem to know that much about AI, and (iii) were selected for expertise in forecasting near-term political events, which might not generalise very well to longer-term forecasting of a new technology.
I agree we should consider the forecast, but I think it's ultimately...
Thank you!
I am roughly in agreement with this post by an AI expert responding to the other (less good) short-timeline article going around.
This post just points out that the AI 2027 article is an attempt to flesh out a particular scenario, rather than an argument for short timelines, which the authors of AI 2027 would agree with.
...I thought instead of critiquing the parts that I'm not an expert in, I might take a look at the part of this post that intersects with my field, when you mention material science discovery, and pour just a little bit of cold
I think Ege is one of the best proponents of longer timelines, and link to that episode in the article.
I don't put much stock in the forecasts of the AI researchers the graph is from. I see the skill of forecasting as very different from the skill of being a published AI researcher. A lot of their forecasts also seem inconsistent. A bit more discussion here: https://80000hours.org/2025/03/when-do-experts-expect-agi-to-arrive/
Financially, I'm already heavily exposed to short AI timelines via my investments.
Over the next few years, I expect AI revenues to continue to increase 2-4x per year, as they have recently, which gets you to those kinds of numbers in 2027.
There won't be widespread automation; rather, AI will make money from a few key areas with few barriers, especially programming.
You could then reach an inflection point where AI starts to help with AI research. AI inference gets mostly devoted to that task for a while. Major progress is made, perhaps reaching AGI, without further external deployment.
Revenues would then explode after that point, but OpenAI ...
This is my understanding too – some crucial questions going forward:
Pretty sure o1 and Gemini have access to the internet.
The main way it's potentially misleading is that it's not a log plot (most benchmark results will look like exponentials on a linear scale) – however, I expect Deep Research would still seem above trend even if it were. I also think it's helpful to new readers to see some of the charts on linear scales, since in some ways it's more intuitive.
Glad it's useful! I categorise RL on chain of thought as a type of post-training, rather than test time compute. (Sometimes people lump them together as both 'inference scaling', but I think that's confusing.) I agree RL opens up novel capabilities you can't get just from next token prediction on the internet.
For test-time compute, benchmark accuracy increases roughly linearly with the log of compute – i.e. you need multiplicative increases in compute to get linear increases in accuracy. It's similar to the pretraining scaling law.
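To make that relationship explicit, here's the rough functional form I have in mind (illustrative only, not taken from any specific paper; a and b are benchmark-specific constants):

```latex
% Illustrative test-time compute scaling: accuracy grows roughly linearly
% in log(compute), so each extra increment of accuracy requires
% multiplying the inference budget.
\text{accuracy}(C) \;\approx\; a + b \,\log_{10}\!\left(\frac{C}{C_0}\right),
\qquad C = \text{test-time compute},\quad C_0 = \text{a reference budget}
```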
I agree test time compute isn't especially explosive – it mainly serves to "pull forward" more advanced capabilities by 1-2 years.
More broadly, you can swap training for inference: https://epoch.ai/blog/trading-off-compute-in-training-and-inference
On brute force, I mainly took Toby's thread to be saying we don't clearly have enough information to know how effective test time compute is vs. brute force.
The response to Michael is an interesting point, but it only concerns diminishing returns in individual capabilities of new members.
Diminishing returns are mainly driven by the best opportunities being used up, rather than by the capabilities of new members.
IIRC, a 10x increase in resources to get a 3x increase in impact was a typical answer in the old coordination forum surveys.
In the past at 80k I'd often assume a 3x increase in inputs (e.g. advising calls) to get a 2x increase in outputs (impact-adjusted plan changes), and that seemed to be roughly consistent with th...
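As a back-of-the-envelope check (my own sketch, not a figure from either the survey or 80k's data), those two ratios imply broadly similar elasticities of impact with respect to inputs:

```python
import math

def implied_elasticity(input_multiple: float, output_multiple: float) -> float:
    """If outputs scale roughly as inputs**e, then an input_multiple increase
    in inputs giving an output_multiple increase in outputs implies
    e = log(output_multiple) / log(input_multiple)."""
    return math.log(output_multiple) / math.log(input_multiple)

# Coordination forum survey figure: 10x resources -> ~3x impact
print(round(implied_elasticity(10, 3), 2))  # 0.48

# 80k rule of thumb: 3x inputs (advising calls) -> ~2x outputs (plan changes)
print(round(implied_elasticity(3, 2), 2))   # 0.63
```

On this (very rough) power-law reading, both imply an elasticity well below 1 – clearly diminishing returns, but a long way from zero marginal value.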
Aside: A more compelling argument against growth in this area to me is something like "EA should focus on improving its brand and comms skills, and on making reforms & changing its messaging to significantly reduce the chance of something like FTX happening again, before trying to grow aggressively again"; rather than "the possibility of scandals means it should never grow".
Another one is "it's an even higher priority to grow other movements than EA", rather than "EA is net negative to grow".
Less importantly, I also feel less confident coordination benefits would mean impact per member goes up with the number of members.
I understand that the value of a social network like Facebook grows with the number of members. But many forms of coordination become much harder as the number of members grows.
As an analogy, it's significantly easier for 2 people to decide where to go to dinner than for 3 people to decide. And 10 people in a group discussion can take ages to come to consensus.
Or, it's much harder to get a new policy adopted in an organisation of 1...
Thanks for the analysis! I think it makes sense to me, but I'm wondering if you've missed an important parameter: diminishing returns to resources.
If there are 100 community members they can take the 100 most impactful opportunities (e.g. writing DGB, publicising that AI safety is even a thing), while if there are 1000 people, they will need to expand into opportunities 101-1000, which will probably be lower impact than the first 100 (e.g. becoming the 50th person working on AI safety).
I'd guess a 10x increase to labour or funding working on EA things (eve...
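To illustrate the shape of the effect I have in mind, here's a toy model (my own illustrative assumption, not part of your analysis): suppose the i-th best opportunity yields impact proportional to i^(-alpha), and n members take the n best opportunities.

```python
def total_impact(n_members: int, alpha: float = 0.5) -> float:
    """Toy model: the i-th best opportunity yields impact i**(-alpha),
    and n_members people take the n_members best opportunities."""
    return sum(i ** -alpha for i in range(1, n_members + 1))

# Going from 100 to 1,000 members (a 10x increase) in this toy model:
print(round(total_impact(1000) / total_impact(100), 1))  # ~3.3x total impact
```

With alpha = 0.5, a 10x increase in people gives roughly a 3.3x increase in total impact – in the same ballpark as the "10x resources for 3x impact" survey figure I mention elsewhere. Steeper values of alpha give sharper diminishing returns.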
Hey JWS,
These comments were off-hand and unconstructive, have been interpreted in ways I didn't intend, and twitter isn't the best venue for them, so I apologise for posting, and I'm going to delete them. My more considered takes are here. Hopefully I can write more in the future.
Hey Ben, I'll remove the tweet images since you've deleted them. I'll probably rework the body of the post to reflect that and happy to make any edits/retractions that you think aren't fair.
I apologise if you got unfair pushback as a result of my post, and regardless of your present/future affiliation with EA, I hope you're doing well.
I should maybe have been more cautious - how messaging will pan out is really unpredictable.
However, the basic idea is that if you're saying "X might be a big risk!" and then X turns out to be a damp squib, it looks like you cried wolf.
If there's a big AI crash, I expect there will be a lot of people rubbing their hands saying "wow those doomers were so wrong about AI being a big deal! so silly to worry about that!"
That said, I agree that if your messaging is just "let's end AI!", then there are some circumstances under which you could look better after a crash e...
I agree people often overlook that (and also future resources).
I think bio and climate change also have large cumulative resources.
But I see this as a significant reason in favour of AI safety: it has become less neglected on an annual basis recently, but it's still a very new field compared to the others.
Also a reason in favour of the post-TAI causes like digital sentience.
It's worth separating two issues:
The Foundation had long been a major funder in the field, and made some great grants, e.g. providing support to the programs that ultimately resulted in the Nunn-Lugar Act and Cooperative Threat Reduction (See Ben Soskis's report). Over the last few years of this program, the Foundation decided to make a "big bet" on "political and technical solutions that reduce the world’s reliance on ...
I don't focus exclusively on philanthropic funding. I added these paragraphs to the post to clarify my position:
...I agree that a full accounting of neglectedness should consider all resources going towards the cause (not just philanthropic ones), and that 'preventing nuclear war' more broadly receives significant attention from defence departments. However, even considering those resources, it still seems similarly neglected as biorisk.
And the amount of philanthropic funding still matters because certain important types of work in the space can only be funde
It might take more than $1bn, but around that level, you could become a major funder of one of the causes like AI safety, so you'd already be getting significant benefits within a cause.
Agree you'd need to average 2x for the last point to work.
Though note the three pathways to impact - talent, intellectual diversity, OP gaps - are mostly independent, so you'd only need one of them to work.
Also agree in practice there would be some funging between the two, which would limit the differences, that's a good point.
Intellectual diversity seems very important to figuring out the best grants in the long term.
If atm the community has, say, $20bn to allocate, you only need a 10% improvement to future decisions for that to be worth +$2bn.
Funder diversity also seems very important for community health, and therefore our ability to attract & retain talent. It's not attractive to have your org & career depend on such a small group of decision-makers.
I might quantify the value of the talent pool around another $10bn, so again, you only need a ~10% increase here to be worth a b...
One quick point is divesting, while it would help a bit, wouldn't obviously solve the problems I raise – AI safety advocates could still look like alarmists if there's a crash, and other investments (especially including crypto) will likely fall at the same time, so the effect on the funding landscape could be similar.
With divestment more broadly, it seems like a difficult question.
I share the concerns about it being biasing and making AI safety advocates less credible, and feel pretty worried about this.
On the other side, if something like TAI starts to h...
I want to be clear it's not obvious to me OP is making a mistake. I'd lean towards guessing AI safety and GCBRs are still more pressing than nuclear security. OP also have capacity constraints (which make it e.g. less attractive to pursue smaller grants in areas they're not already covering, since it uses up time that could have been used to make even larger grants elsewhere). Seems like a good fit for some medium-sized donors who want to specialise in this area.
Agree it's most likely already in the price.
Though I'd stand behind the idea that markets are least efficient when it comes to big booms and busts involving large asset classes (in contrast to relative pricing within a liquid asset class), which makes me less inclined to simply accept market prices in these cases.
You could look for investments that do neutral-to-well in a TAI world, but have low-to-negative correlation to AI stocks in the short term. That could reduce overall portfolio risk but without worsening returns if AI does well.
This seems quite hard, but the best ideas I've seen so far are:
My impression is that of EA resources focused on catastrophic risk, 60%+ are now focused on AI safety, or issues downstream of AI (e.g. even the biorisk people are pretty focused on the AI/Bio intersection).
AI has also seen dramatic changes to the landscape / situation in the last ~2 years, and my update was focused on how things have changed recently.
So for both reasons most of the updates that seemed salient to me concerned AI in some way.
That said, I'm especially interested in AI myself, so I focused more on questions there. It would be ideal to hear from more bio people.
I also briefly mention nuclear security, where I think the main update is the point about lack of funding.
Hi Wayne,
Those are good comments!
On the timing of the profits, my first estimate is for how far profits will need to eventually rise.
To estimate the year-by-year figures, I just assume revenues grow at the 5yr average rate of ~35% and check that's roughly in line with analyst expectations. That's a further extrapolation, but I found it helpful to get a sense of a specific plausible scenario.
(I also think that if Nvidia revenue growth looked to be under 20% p.a. over the next few quarters, the stock would sell off, though that's just a judgement call.)
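For the mechanics of that year-by-year extrapolation, here's a minimal sketch (the starting revenue figure below is a placeholder for illustration, not the number used in the post):

```python
def project_revenue(current_revenue_bn: float, growth_rate: float = 0.35, years: int = 5):
    """Extrapolate annual revenue forward at a constant growth rate."""
    return [current_revenue_bn * (1 + growth_rate) ** t for t in range(1, years + 1)]

# Hypothetical example: $100bn of current annual revenue growing at ~35%/yr
print([round(r) for r in project_revenue(100)])  # [135, 182, 246, 332, 448]
```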
On the ...
It's fair that I only added "(but not more)" to the forum version – it's not in the original article which was framed more like a lower bound. Though, I stand by "not more" in the sense that the market isn't expecting it to be *way* more, as you'd get in an intelligence explosion or automation of most of the economy. Anyway I edited it a bit.
I'm not taking revenue to be equivalent to value. I define value as max consumer willingness to pay, which is closely related to consumer surplus.
I agree risk also comes into it – it's not a risk-neutral expected value (I discuss that in the final section of the OP).
Interesting suggestion that the Big 5 are riskier than Nvidia. I think that's not how the market sees it – the Big 5 have lower price & earnings volatility and lower beta. Historically, chips have been very cyclical. The market also seems to think there's a significant chance Nvidia loses market share to TPUs or AMD. I think the main reason Nvidia has a higher PE ratio is due to its earnings growth.
I agree all these factors go into it (e.g. I discuss how it's not the same as the mean expectation in the appendix of the main post, and also the point about AI changing interest rates).
It's possible I should hedge more in the title of the post. That said, I think the broad conclusion actually holds up to plausible variation in many of these parameters.
For instance, margin is definitely a huge variable, but Nvidia's margin is already very high. More likely the margin falls, and that means the size of the chip market needs to be even bigger than the estimate.
I do think you should hedge more given the tower of assumptions underneath.
The title of the post is simultaneously very confident ("the market implies" and "but not more"), but also somewhat imprecise ("trillions" and "value"). It was not clear to me that the point you were trying to make was that the number was high.
Your use of "but not more" implies you were also trying to assert the point that it was not that high, but I agree with your point above that the market could be even bigger. If you believe it could be much bigger, that seems inconsistent with...
Minor, but I actually think DeepSeek was pretty on trend for algorithmic efficiency (as explained in the post). The main surprise was that it was a Chinese company near the forefront of algorithmic efficiency (though here, several months before, I suggested that the Chinese are close to the frontier there).