AI energy forecasts may be missing large-scale inference demand

Benevolent_Rain

Comments 3

Mo Putera

1mo

SemiAnalysis' recent newsletter provides some data points on token spend vs labor cost ROIs for actual 1-20 hour tasks.

SemiAnalysis has written and talked extensively about our Claude Code usage, but it is important to emphasize that agentic AI is no longer limited to just coding. Our analysts are using agents every day to convert excel models into dashboards, create charts for all our notes, build financial models and analyze company earnings, and much more. These are all tasks that either 1) we simply wouldn’t have been able to do before or 2) would’ve previously taken our junior analysts many hours, taking them away from far more value added tasks.
The table below shows a handful of real examples from our own workflows, comparing token spend against what the equivalent human labor would have cost:
... We estimate that the true blended price per million tokens for running Opus 4.7 on agentic tasks at $0.99 despite the sticker price being $5/$25 per MTok. Agentic workloads have extremely high input-to-output ratios (our Claude Code usage has a ratio of about 300:1) and high cache hit rates (90%+). Because cached input tokens only cost $0.50/MTok, most of the tokens end up in the cheapest tier. We walk through the full methodology here.
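The blended-price figure in the quote can be roughly reproduced from the numbers it gives. This is a minimal sketch, not SemiAnalysis' actual methodology (which is not shown here): it assumes only three price tiers (cached input, uncached input, output), ignores any cache-write surcharge, and the function name `blended_price_per_mtok` is made up for illustration.

```python
def blended_price_per_mtok(input_to_output_ratio, cache_hit_rate,
                           input_price, output_price, cached_input_price):
    """Average $ per million tokens across cached input, uncached input and output.

    Simplifying assumption: every input token is either a cache hit or a
    full-price input token; cache-write pricing is ignored.
    """
    input_tokens = float(input_to_output_ratio)  # input tokens per 1 output token
    output_tokens = 1.0
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    total_cost = (cached * cached_input_price
                  + uncached * input_price
                  + output_tokens * output_price)
    return total_cost / (input_tokens + output_tokens)


# Figures from the quote: 300:1 input-to-output, 90%+ cache hits,
# $5 / $25 sticker prices per MTok, $0.50 per MTok for cached input.
price_at_90 = blended_price_per_mtok(300, 0.90, 5.00, 25.00, 0.50)
price_at_91 = blended_price_per_mtok(300, 0.91, 5.00, 25.00, 0.50)
print(f"90% cache hits: ${price_at_90:.2f} per MTok")  # $1.03
print(f"91% cache hits: ${price_at_91:.2f} per MTok")  # $0.99
```

Under these assumptions the result is very sensitive to the cache hit rate: a one-point increase from 90% already moves the blended price from about $1.03 to about $0.99.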

Eyeballing, it looks like 8 hours of analyst-type work costs them $7-30 in Opus 4.7 token spend, so (very roughly) 7-30M tokens at their true blended price of ~$1 per M tokens. That is well below the post's 40-1,300M token estimate, and already squarely here. I expect token usage for a given task to drop further with more advanced models, and to vary a lot depending on (essentially) how much the big AI companies prioritise RLVR-ing them and on model jaggedness. I also expect doable tasks to get much more complicated, like this and more.
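The conversion from token spend to token count above is a single division. A minimal sketch, assuming the $7-30 spend range and the $0.99 blended price apply uniformly to an 8-hour task (the helper name `tokens_in_millions` is illustrative):

```python
def tokens_in_millions(spend_usd, price_per_mtok):
    """Tokens (in millions) bought by a given spend at a blended $/MTok price."""
    return spend_usd / price_per_mtok


BLENDED_PRICE = 0.99  # $ per million tokens, from the SemiAnalysis quote

low = tokens_in_millions(7, BLENDED_PRICE)
high = tokens_in_millions(30, BLENDED_PRICE)
print(f"Implied usage: {low:.1f}M to {high:.1f}M tokens per 8-hour task")
# Implied usage: 7.1M to 30.3M tokens per 8-hour task

# The post's range is 40-1,300M tokens, so even the high end of the implied
# range sits below the post's low end.
POST_LOW = 40.0
print(f"Post's low end is {POST_LOW / high:.1f}x the implied high end")  # 1.3x
```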

Epoch BOTEC-ed a related question last year, prior to Claude Code: How many digital workers could OpenAI deploy? My main takeaway was "worker equivalents is probably more misleading than helpful if people just skim headline numbers" (which everyone does, speaking as someone who sometimes needs to produce headline numbers).

On the tasks that AIs are able to perform today, how many “human-equivalent digital workers” could frontier AI labs deploy to work on them?
Based on a speculative back-of-the-envelope calculation, we estimate that companies like OpenAI have the hardware to deploy on the order of 7 million digital workers, with a wide 90% confidence interval of 400,000 to around 300 million. This doesn’t mean that OpenAI could do the jobs of 7 million human employees today, because AIs can’t fully substitute for humans. But as AI progress continues, AIs will be able to perform an increasing fraction of the tasks that humans currently do.

Benevolent_Rain

1mo

Thanks so much Mo! I am tempted to make the following updates already - does this seem roughly right? Or is this still too high?

Token usage at 8 hrs centered on 5M tokens, with an upper limit closer to 100M. The reasoning:
1. The upper limit of 100M reflects that more complex tasks might push usage higher (assuming the tasks in the study you quoted were low-hanging fruit), as the compiler example indicates.
2. Efficiency gains might push usage lower. Very, very crudely, usage already seems to be going lower compared with METR's GPT-5.1-Codex-Max work from <6 months ago.
Token price centered at $1 per million tokens, instead of $5. I could make this even lower, as the $1 figure might itself be on a downward trend. At the same time, this low price seems to be mostly due to cache tokens, which I had ignored in my analysis; the input and output tokens still seem priced at roughly the level I found.

At the same time, I also feel like these numbers might still be too high, especially the token price. The super helpful links you sent point at pretty steep downward trends in token cost, and point well taken on cache tokens being much cheaper.

Mo Putera

1mo

(I'm not at all an expert on any of this, please discount appropriately)

Agree with the reasoning for the directional adjustment and the bounds, but magnitude-wise it seems a bit overcorrected? SemiAnalysis' figures roughly suggest a 15M center. But you're on track to becoming correct given token efficiency trends anyhow.
I wish I had a more empirically grounded sense of how token usage varies by type of task, fixing task duration at 8 hours for a human professional (that you'd pay $400/day for, say). My guess from comparing model vs human jaggedness (e.g. this) is that leadership-level / early-employee / entrepreneurial / high-context / taste-heavy work would require way more tokens to get 8 hours of work done than the routine analyst-type / junior SWE etc. tasks typical of benchmarks.
My sense is that the global average cost per token will go down a lot due to the following, though the mix is very unclear:
1. A key driver of inference demand going forward will be agentic workflows that are very heavy on cached tokens.
2. A rising share of demand will be satisficing rather than maximising w.r.t. output quality, for an ever-growing share of tasks (e.g. plan with Opus -> code with Sonnet or even DeepSeek models at a 1-2 OOM cheaper price point).
3. Race-to-the-bottom pricing wars (DeepSeek again).

Model step	Input / assumption	Current central value
AI work scale	Workdays-equivalent per year	Scenario input
Tokens per workday-equivalent	PERT(5, 700, 2)*100000	~12M tokens
Token price	exp(PERT(log(0.05), log(9), log(1)))	~$1.4 / 1M tokens
Delivery cost share of token revenue	PERT(0.2, 1.2, 0.6)	~60%
Electricity share of delivery cost	PERT(0.05, 0.2, 0.11)	~12%
Electricity price	PERT(0.04, 0.08, 0.06)	~$0.06/kWh
Output	Inference electricity use	TWh/year
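The chain in the table above can be sketched as a small Monte Carlo run. This is an illustrative reconstruction, not the author's actual model: it assumes the table's PERT arguments are ordered (min, max, mode) with the standard shape parameter of 4, and it holds tokens per workday-equivalent fixed at the ~12M central value rather than sampling that row. The function names are made up for the sketch.

```python
import math
import random


def pert(minimum, maximum, mode, rng):
    """Draw one PERT sample: a beta distribution rescaled to [minimum, maximum]."""
    span = maximum - minimum
    alpha = 1 + 4 * (mode - minimum) / span
    beta = 1 + 4 * (maximum - mode) / span
    return minimum + span * rng.betavariate(alpha, beta)


def kwh_per_workday(rng, tokens_millions=12.0):
    """One sampled electricity use (kWh) per workday-equivalent of AI work."""
    price = math.exp(pert(math.log(0.05), math.log(9), math.log(1), rng))  # $/1M tokens
    delivery_share = pert(0.2, 1.2, 0.6, rng)        # delivery cost / token revenue
    electricity_share = pert(0.05, 0.2, 0.11, rng)   # electricity / delivery cost
    electricity_price = pert(0.04, 0.08, 0.06, rng)  # $/kWh
    revenue = tokens_millions * price                # $ per workday-equivalent
    return revenue * delivery_share * electricity_share / electricity_price


# Point estimate from the table's central values, for comparison.
point_estimate = 12.0 * 1.4 * 0.60 * 0.12 / 0.06     # about 20 kWh

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = sorted(kwh_per_workday(rng) for _ in range(20_000))
mean = sum(samples) / len(samples)
p5, median, p95 = (samples[int(q * len(samples))] for q in (0.05, 0.50, 0.95))

print(f"point estimate: {point_estimate:.1f} kWh per workday-equivalent")
print(f"mean {mean:.1f}, median {median:.1f}, 90% interval {p5:.1f} to {p95:.1f}")
```

Multiplying a per-workday figure by the scenario's workdays-equivalent per year and dividing by 1e9 converts kWh to TWh/year, which is the output row of the table.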

Source	URL	Evidence	Implied cost ratio	Role in model
OpenAI (H1 2025)	https://www.reuters.com/commentary/breakingviews/how-infer-method-to-openais-madness-2025-10-15/	$4.3B revenue vs $2.5B cost to deliver	58%	Lower bound of credible central range
OpenAI (adjusted)	https://www.reuters.com/technology/openai-sees-compute-spend-around-600-billion-by-2030-cnbc-reports-2026-02-20/	Adjusted gross margin ~33%	67%	Upper bound of credible central range
Anthropic	https://www.investing.com/news/stock-market-news/anthropic-trims-profit-margin-outlook-as-ai-operating-costs-rise--the-information-4459316	Gross margin ~40%	60%	Independent confirmation
DeepSeek	https://www.reuters.com/technology/chinas-deepseek-claims-theoretical-cost-profit-ratio-545-per-day-2025-03-01/	$87k cost vs $562k theoretical revenue	15.5%	Lower-bound / skew anchor
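The "Implied cost ratio" column follows from the evidence column in one of two ways: cost divided by revenue, or one minus gross margin. A short sketch that rederives the four values from the table's own figures (the helper names are illustrative):

```python
def ratio_from_cost_and_revenue(cost, revenue):
    """Delivery cost as a share of revenue."""
    return cost / revenue


def ratio_from_gross_margin(gross_margin):
    """Cost share implied by a gross margin (cost = revenue - gross profit)."""
    return 1.0 - gross_margin


cost_ratios = {
    "OpenAI (H1 2025)": ratio_from_cost_and_revenue(2.5, 4.3),   # $B cost vs $B revenue
    "OpenAI (adjusted)": ratio_from_gross_margin(0.33),
    "Anthropic": ratio_from_gross_margin(0.40),
    "DeepSeek": ratio_from_cost_and_revenue(87, 562),            # $k cost vs $k theoretical revenue
}

for source, ratio in cost_ratios.items():
    print(f"{source}: {ratio:.1%}")
# OpenAI (H1 2025): 58.1%
# OpenAI (adjusted): 67.0%
# Anthropic: 60.0%
# DeepSeek: 15.5%
```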

Source	Key numbers	Implied electricity share of full compute TCO
https://www.businessinsider.com/why-nvidia-worth-5-trillion-inside-35-billion-ai-datacenter-2025-10?utm_source=chatgpt.com	1 GW AI data center: $35B capex, $1.3B/year electricity	~10–16% (if amortized over 3–5 years, before other O&M)
https://en.wikipedia.org/wiki/Data_center?utm_source=chatgpt.com#cite_note-112	Electricity is >10% of total data center TCO (general baseline)	Establishes floor ≳10% in many cases
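The ~10–16% range in the first row can be rederived from its key numbers. A minimal sketch, assuming straight-line amortization of capex and, as the table notes, no other O&M costs (the function name is illustrative):

```python
def electricity_share_of_tco(capex, annual_electricity, amortization_years):
    """Electricity as a share of annual cost (amortized capex + electricity only)."""
    annual_capex = capex / amortization_years
    return annual_electricity / (annual_capex + annual_electricity)


CAPEX_BILLION = 35.0        # $B for a 1 GW AI data center
ELECTRICITY_BILLION = 1.3   # $B per year

share_3y = electricity_share_of_tco(CAPEX_BILLION, ELECTRICITY_BILLION, 3)
share_5y = electricity_share_of_tco(CAPEX_BILLION, ELECTRICITY_BILLION, 5)
print(f"3-year amortization: {share_3y:.1%}")  # 10.0%
print(f"5-year amortization: {share_5y:.1%}")  # 15.7%
```

Adding other operating costs to the denominator would lower both shares, so under these assumptions the range is an upper-leaning estimate for this data center.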

AI energy forecasts may be missing large-scale inference demand

Core claim

Epistemic status

What the model does

My motivation

Uncertainty and model limitations

Introduction

Use of Monte Carlo analysis and uncertainty handling

Calculation steps and uncertainties/assumptions

1 – How many worker equivalents might AI replace?

Ways this unit could make the model overstate inference energy demand:

Ways this unit could make the model understate inference energy demand:

2 – How to compare human and AI work (8 hr task length)

Why the time anchor matters

What would update me toward a shorter anchor?

What would update me toward a longer anchor?

Why METR-style time horizons are useful but limited

3 – Estimated token usage at 8 hr task length

Why token usage is hard to estimate

Distribution used in the model

What would update me lower?

What would update me higher?

Success rate and accepted output

4 – Cost of tokens

5 – Compute costs as share of token revenue

6 – Energy cost share of compute cost

7 – Converting $ to kWh

Estimates of future inference energy usage

Example: Token usage required for 50% labor replacement