All of Ben_West🔸's Comments + Replies

Thanks for doing this. This question is marked as required but I think should either be optional or have a "none" option:
 

1
Clara Torres Latorre 🔸
Yes, this is a mistake. We'll fix it asap. Should be fixed now. Thank you for flagging it.

To decompose your question into several sub-questions:

  1. Should you defer to price signals for cause prioritization?
    1. My rough sense is that price signals are about as good as the 80th percentile EA's cause prio, ranked by how much time they've spent thinking about cause prioritization.
    2. (This is mostly because most EAs do not think about cause prio very much. I think you could outperform by spending ~1 week thinking about it, for example.)
  2. Should you defer to price signals for choosing between organizations within a given cause?
    1. This mostly seems decent to me. F
... (read more)
3
mako yass
Yeah, I feel for the first time founders, who idealistically wish that this part of the problem didn't so much exist. It oughtn't, afaict.

Thanks, that's helpful. Do you have a sense of where we are on the current S-curve? E.g., if capabilities continue to progress in a straight line through the end of this year, is that evidence that we have found a new S-curve to stack on the current one?

4
Toby_Ord
That's a great question. I'd expect a bit of slowdown this year, though not necessarily much. e.g. I think there is a 10x or so gain possible for RL before RL-training-compute reaches the size of pre-training compute, and then we know they have enough to 10x again beyond that (since GPT-4.5 was already 10x more), so there are some gains still in the pipe there. And I wouldn't be surprised if METR timelines keep going up in part due to increased inference spend (i.e. my points about inference scaling not being that good are to do with costs exploding, so if a cost-insensitive benchmark is going on, it might not register on it all that much). There is also room for more AI-research or engineering improvements to these things, and a lump of new compute coming in, making it a bit messy. Overall, I'd say my predictions are more about appreciable slowing in 2027+ rather than 2026.

the strength of this tail-wind that has driven much of AI progress since 2020 will halve

I feel confused about this point because I thought the argument you were making implies a non-constant "tailwind." E.g. for the next generation these factors will be 1/2 as important as before, then the one after that 1/4, and so on. Am I wrong?

2
Toby_Ord
Yeah, it isn't just like a constant factor slow-down, but is fairly hard to describe in detail. Pre-training, RL, and inference all have their own dynamics, and we don't know if there will be new good scaling ideas that breathe new life into them or create a new thing on which to scale. I'm not trying to say the speed at any future point is half what it would have been, but that you might have seen scaling as a big deal, and going forward it is a substantially smaller deal (maybe half as big a deal).

Interesting ideas! For Guardian Angels, you say "it would probably be at least a major software project" - maybe we are imagining different things, but I feel like I have this already. 

e.g. I don't need a "heated-email guard plugin" which catches me in the middle of writing a heated email and redirects me because I don't write my own emails anyway. I would just ask an LLM to write the email and 1) it's unlikely that the LLM would say something heated and 2) for the kinds of mistakes that LLMs might make, it's easy enough to put something in the agents... (read more)

Related, from an OAI researcher.

The AI Eval Singularity is Near

  • AI capabilities seem to be doubling every 4-7 months
  • Humanity's ability to measure capabilities is growing much more slowly
  • This implies an "eval singularity": a point at which capabilities grow faster than our ability to measure them
  • It seems like the singularity is ~here in cybersecurity, CBRN, and AI R&D (supporting quotes below)
  • It's possible that this is temporary, but the people involved seem pretty worried
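The crossing point in the bullets above can be sketched with toy growth rates (the headroom and doubling times below are assumptions for illustration, not measurements):

```python
import math

# Assumed for illustration: evals currently have 16x "headroom" over
# current capabilities; capabilities double every 6 months, while
# eval coverage doubles only every 24 months.
headroom = 16
cap_doubling = 6    # months per capability doubling
eval_doubling = 24  # months per eval-coverage doubling

# Capabilities catch up when 2**(t/cap_doubling) == headroom * 2**(t/eval_doubling),
# i.e. when t * (1/cap_doubling - 1/eval_doubling) == log2(headroom).
t_cross = math.log2(headroom) / (1 / cap_doubling - 1 / eval_doubling)
print(f"eval 'singularity' in ~{t_cross:.0f} months")  # → ~32 months
```

With the faster end of the quoted range (a 4-month capability doubling) the crossing comes sooner; the point is just that any persistent gap in doubling times makes a crossing inevitable.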

Appendix - quotes on eval saturation

Opus 4.6

  • "For AI R&D capabilities, we found that Claude Opus 4.6 h
... (read more)

Related, from an OAI researcher.

Ah yes, this supports my pre-conceived belief that (1) we cannot reliably ascertain whether a model has catastrophically dangerous capabilities, and therefore (2) we need to stop developing increasingly powerful models until we get a handle on things.

Thanks for the article! I think if your definition of "long term" is "10 years," then EAAs actually do often think on this time horizon or longer, but maybe don't do so in the way you think is best. I think approximately all of the conversations about corporate commitments or government policy change that I have been involved in have operated on at least that timeline (sadly, this is how slowly these areas move). 

For example, you can see this CEA where @saulius projects out the impacts of broiler campaigns a cool 200 years, and links to estimates from... (read more)

1
James Brobin
Yeah, I guess my definition of long-term is more like twenty to a hundred years. Thanks for sharing your personal experience! As an outsider, I feel like I haven't seen many of those sorts of projects, but it's cool to see that people are actually doing them!

Ah yeah good point, I updated the text.

I'm excited about this series!

I would be curious what your take is on this blog post from OpenAI, particularly these two graphs:

Investment in compute powers leading-edge research and step-change gains in model capability. Stronger models unlock better products and broader adoption of the OpenAI platform. Adoption drives revenue, and revenue funds the next wave of compute and innovation. The cycle compounds.

While their argument is not very precise, I understand them to be saying something like, "Sure, it's true that the costs of both inference and training ... (read more)

I think this is a classic potential "correlation" problem. Probably OpenAI just cherry-picked data which looks good for them. They didn't pick a hypothesis to test, just put 2 graphs next to each other that look the same, which is very weak data interpretation. Sure, both compute and revenue might have increased at 3x a year for 2 years, but that doesn't tell us much. It doesn't mean they have that much to do with each other directly. My guess is that of course there's some relationship between increased compute and revenue, but how much we just don't... (read more)

That is quite a surprising graph — the annual tripling and the correlation between the compute and revenue are much more perfect than I think anyone would have expected. Indeed they are so perfect that I'm a bit skeptical of what is going on. 

One thing to note is that it isn't clear what the compute graph is of (e.g. is it inference + training compute,  but not R&D?). Another thing to note is that it is year-end figures vs year total figures on the right, but given they are exponentials with the same doubling time and different units, that is... (read more)
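The skepticism in this thread can be made concrete: any two series that each grow at a roughly steady exponential rate will correlate almost perfectly in levels, whatever the causal story. A toy sketch with made-up numbers:

```python
import math
import random

random.seed(0)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Two *independent* hypothetical series, each tripling annually with
# its own noise -- no causal link between them at all.
years = range(8)
compute = [3 ** t * random.uniform(0.8, 1.2) for t in years]
revenue = [3 ** t * random.uniform(0.8, 1.2) for t in years]

print(f"r = {pearson(compute, revenue):.3f}")  # near-perfect correlation
```

So a near-1 correlation between OpenAI's two graphs is roughly what you'd see even if the causal connection were weak; the shared exponential trend does almost all the work.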

EA Animal Welfare Fund almost as big as Coefficient Giving FAW now?

This job ad says they raised >$10M in 2025 and are targeting $20M in 2026. CG's public Farmed Animal Welfare 2025 grants are ~$35M.  

Is this right?

Cool to see the fund grow so much either way.

Agree that it’s really great to see the fund grow so much!

That said, I don’t think it’s right to say it’s almost as large as Coefficient Giving. At least not yet... :) 

The 2025 total appears to exclude a number of grants (including one to Rethink Priorities) and only runs through August of that year. By comparison, Coefficient Giving’s farmed animal welfare funding in 2024 was around $70M, based on the figures published on their website.

9
Jeff Kaufman 🔸
Specifically within animal welfare (this wasn't immediately clear to me, and I was very confused how CG's grants could be so low)

Hmm. I understood them to be saying they (semi-) voluntarily scaled back before phase 2 was complete, so we can't read that much into the fact that phases 2/3 (where the high quality journalism happens) didn't happen. Maybe I misunderstood?

2
NickLaing
Ah that's entirely possible.

My read of their phase one plan is that they were intending to get these pretty low quality tabloid stories as a springboard to getting higher quality stuff. Maybe that was a bad plan, but the fact that the bad tabloid articles were in fact bad tabloid articles doesn't seem to discredit that?

1
NickLaing
I think it partly does discredit that? It's a pretty low-probability bet that bad tabloid articles will graduate to more serious articles. Especially given that this campaign actually did get quite a lot of media (maybe even more than expected?), and that still didn't happen.

Is this just a statement that there is more low-hanging fruit in safety research? I.e., you can in some sense learn an equal amount from a two-minute rollout for both capabilities and safety, but capabilities researchers have already learned most of what was possible and safety researchers haven't exhausted everything yet. 

Or is this a stronger claim that safety work is inherently a more short-time horizon thing?

4
Rohin Shah
It is more like this stronger claim. I might not use "inherently" here. A core safety question is whether an AI system is behaving well because it is aligned, or because it is pursuing convergent instrumental subgoals until it can takeover. The "natural" test is to run the AI until it has enough power to easily take over, at which point you observe whether it takes over, which is extremely long-horizon. But obviously this was never an option for safety anyway, and many of the proxies that we think about are more short horizon.

This is great. 

Thinking through my own area of Data Generation and why I would put it earlier than you, I wonder if there should be an additional factor like "Importance of quantity over quality."

For example, my sense is that having a bunch of mediocre ideas about how to improve pretraining is probably not very useful to capabilities researchers. But having a bunch of trajectories where an attacker inserts a back door is pretty useful to control research, even if these trajectories are all things that would only fool amateurs and not experts. And if y... (read more)

1
Jan Wehner🔸
I quite like the "Importance of quantity over quality" factor and hadn't thought of this before! It's slightly entangled with verifiability (if you can cheaply assess whether something is good, then it's fine to have a large quantity of attempts even if many are bad). But I think quantity vs quality also matters beyond that. I'll add it as a possible additional factor in the post. I agree that this factor advantages Data Generation and AI Control. I think Dangerous Capability Evals also benefits from quantity a lot.

Thanks Sarah! Is something written up about the CEA donation system? I'm surprised that that's a priority, but obviously know zero details.

2
Sarah Cheng 🔸
Thanks for the question Ben! The main reason that this is a priority is to help EA Funds (which is now part of CEA) grow and diversify their donations, by making it easier to gather info from donors[1] and build relationships with them, and giving us more freedom to optimize the UX of the donation flow. AWF in particular has ambitious 2026 plans and a significant funding gap, and we’d be excited to help them reach their donation goal for this year! :) 1. ^ GWWC, the primary platform EA Funds has used historically, defaults to opt-out for donor data sharing. As far as I understand, this prevents us from being able to contact the majority of donors. We recently added the option of donating via every.org as well, which is opt-in by default so that’s improved the situation.

Note that some of the founders have pledged to donate 80% of their equity. 

5
Zac Hatfield-Dodds
All of the founders, actually!

Yeah, I agree that the hypothetical EA seems less like a "radical" abolitionist (for some definition of "radical"). 

1
Holly Elmore ⏸️ 🔸
Yeah, freeing slaves, but not necessarily abolishing the institution. I’m not trying to be difficult— I think this difference in goals is the point. And it’s fine if you want to bite the bullet and say you wouldn’t be a radical abolitionist, but most modern people think they would have.

Your hypothetical EA sounds kind of like an abolitionist to me :)

The Society formed a ways-and-means committee to deal with the difficulty that more than half of the members, including Troup and Jay, owned slaves (mostly a few domestic servants per household). The committee proposed a plan for gradual emancipation: members would free slaves younger than 28 when they reached the age of 35, slaves between 28 and 38 in seven years' time, and slaves over 45 immediately. This proposal failed however, and the committee was dissolved.

https://en.wikipedia.org/wiki/New_York_Manumission_Society 

2
Holly Elmore ⏸️ 🔸
This isn’t abolitionism. Manumission means “letting go”.

Thanks for finding this and writing it up! And thanks to FRI for updating their report. 

I thought this was an interesting point, thanks for writing it.

I feel confused about this response. You're asking for people to give you examples of a thing occurring, I'm asking by what date range you wish to see examples in.

Okay; I guess I was confused by your question because I thought I'd said that in the main doc.

To repeat and with added explanation:  Only opinions from before ChatGPT count.

This is because ChatGPT moved the Overton window and changed which sorts of opinions would earn you the horror of contemptuous looks and lowered status, and my negative model of OpenPhil is that they miraculously arrived at a set of opinions which would balance which sort of looks they got from a weighted set of people they cared about.  So whatever happened after the ChatGPT ... (read more)

What time frame are you interested in? E.g. if someone says that they have <30y timelines today, would that meet your criteria?

-35
EliezerYudkowsky

Thanks! Perhaps I phrased this poorly; a person being a patient or not isn't the relevant factor, it's whether or not they are licensed. E.g. if you look at the FDA authorization for the first product it says:

The ContaCT mobile application is intended to be used by neurovascular specialists, such as vascular neurologists, neuro-interventional specialists, or users with similar training who have been pre-authorized by their Healthcare Organization or Facility.

I'm actually not sure whether one could generously interpret "similar training" to include e.g. rad... (read more)

4
NickLaing
Yeah, as long as AI radiography interpretation isn't covered by insurance, forget about it. In general I think people massively underrate professional gatekeeping in slowing down AI automation and economic takeover in general. Doctors have gatekept for ages; they will only double down here. Like in many situations, you will basically need the full consent of the people whose jobs will be taken, for those jobs to be taken... Good luck with that. We've seen the first phase with Hollywood, drivers, and radiology, but I think even bigger resistance will come. Why would you not fight tooth and nail against AI when it's your own livelihood at stake?

I doubt that there are surveys of when people stayed home. You could maybe try to look at prediction markets, but I'm not sure what you would compare them to in order to see if the prediction market was more accurate than some other reference group.

2
Yarrow Bouchard 🔸
That seems like the crux of the matter!

Thanks for collecting this timeline! 

The version of the claim I have heard is not that LW was early to suggest that there might be a pandemic but rather that they were unusually willing to do something about it because they take small-probability high-impact events seriously. E.g. I suspect that you would say that Wei Dai was "late" because their comment came after the NYT article etc., but nonetheless they made a 700% return betting that covid would be a big deal.

I think it can be hard to remember just how much controversy there was at the time. E.g. you say of... (read more)

-9
Yarrow Bouchard 🔸

Congrats Samantha and the AIM team!

Your answer is the best that I know of, sadly.

A thing you could consider is that there are a bunch of EAGx's in warm/sunny places (Ho Chi Minh City, Singapore, etc.). These cities maybe don't meet the definition of "hub", but they have enough people for a conference, which possibly will meet your needs.

1
James Brobin
Gotcha! I appreciate the help!

Thanks Vasco, I hadn't seen that. Do you know if anyone has addressed Nathan's "Comparative advantage means I'm guaranteed work but not that that work will provide enough for me to eat" point? (Apart from Maxwell, who I guess concedes the point?)

2
Vasco Grilo🔸
I think Maxwell conceded Nathan's point, and I do not know about anyone disputing it in a mathematical sense (for all possible parameters of economic models). However, in practice, what matters is how automation will plausibly affect wages, and human welfare more broadly.

why are there fewer horses?

+1 to this being an important question to ask.

6
Vasco Grilo🔸
Hi Nathan and Ben. I liked Maxwell's follow-up post What About The Horses?. ---------------------------------------- I agree. The last section of the post above briefly discusses this. Also on comparative advantage, I liked Noah Smith's post Plentiful, high-paying jobs in the age of AI.

+1 to maintaining justification standards across cause areas, thanks for writing this post!

Fwiw I feel notably less clueless about WAW than about AI safety, and would have assumed the same is true of most people who work in AI safety, though I admittedly haven't talked to very many of them about this. (And also haven't thought about it that deeply myself.)

Is the amount which has been donated to the fund visible anywhere?

2
Toby Tremlett🔹
On the frontpage banner :) 

Sorry, I don't mean models that you consider to be better, but rather metrics/behaviors. Like what can V-JEPA-2 (or any model) do that previous models couldn't which you would consider to be a sign of progress?

1
Yarrow Bouchard 🔸
The V-JEPA 2 abstract explains this:

Again, the caveat here is that this is Meta touting their own results, so I take it with a grain of salt. I don't think higher scores on the benchmarks mentioned automatically imply progress on the underlying technical challenge. It's more about the underlying technical ideas in V-JEPA 2 — Yann LeCun has explained the rationale for these ideas — and where they could ultimately go given further research.

I'm very skeptical of AI benchmarks in general because I tend to think they have poor construct validity, depending how you interpret them, i.e., insofar as they attempt to measure cognitive abilities or aspects of general intelligence, they mostly don't measure those things successfully.[1]

The clearest and crudest example to illustrate this point is LLM performance on IQ tests. The naive interpretation is that if an LLM scores above average on an IQ test, i.e., above 100, then it must have the cognitive properties a human does when they score above average on an IQ test, that is, such an LLM must be a general intelligence. But many LLMs, such as GPT-4 and Claude 3 Opus, score well above 100 on IQ tests. Are GPT-4 and Claude 3 Opus therefore AGIs? No, of course not. So, IQ tests don't have construct validity when applied to LLMs if you think IQ tests measure general intelligence for AI systems.

I don't think anybody really believes IQ tests actually prove LLMs are AGIs, which is why it's a useful example. But people often do use benchmarks to compare LLM intelligence to human intelligence based on similar reasoning. I don't think the reasoning is any more valid with those benchmarks than it is for IQ tests.

Benchmarks are useful for measuring certain things; I'm not trying to argue with narrow interpretations. I'm specifically arguing with the use of benchmarks to put general intelligence on a number line, such that a lower score on a benchmark means an AI system is further away from general intelligence and a higher score

What are examples of what you would consider to be progress on "effective video prediction"?

0
Yarrow Bouchard 🔸
Possibly something like V-JEPA 2, but in that case I'm just going off of Meta touting its own results, and I would want to hear opinions from independent experts.

LLMs have made no progress on any of these problems

Can we bet on this? I propose: we give a video model of your choice from 2023 and one of my choice from 2025 two prompts (one your choice, one my choice) then ask some neutral panel of judges (I'm happy to just ask random people in a coffee shop) which model produced more realistic videos. 

0
Yarrow Bouchard 🔸
I’ll bet you a $10 donation to the charity of your/my choice that a judge we agree on with formal/credentialed expertise in deep learning research (e.g. an academic or corporate AI researcher) will say that typical autoregressive large language models like GPT-4/GPT-5 or Claude 2/Claude 4.5 have not non-trivially made or constituted progress on the AI research problem of learning from video data via approaches that don’t rely on pixel-level prediction. I’m open to counter-offers. I’ll also say yes to anyone who wants to take the other side of this bet.
0
Yarrow Bouchard 🔸
I didn’t say that pixel-to-pixel prediction or other low-level techniques haven’t made incremental progress. I said that this approach is ultimately forlorn — if the goal is human-level computer vision for robotics applications or AGI that can see — and LLMs didn’t make any progress on any alternative approaches.

I'm saying that the authors summarized their findings without caveats, and that those caveats would dramatically change how most people interpret the results. 

(Note that, despite the "MIT" name being attached, this isn't an academic paper, and doesn't seem to be trying to hold itself to those standards.)

4
Yarrow Bouchard 🔸
I recommend emailing the authors and asking for clarification. I’ve done this more than once in the past when I’ve had thoughts about papers I’ve read and have gotten some extremely helpful, illuminating replies. I always worry about bothering people, but I get the sense that, rather than being annoyed, people find it rewarding that anyone took an interest in their work, or at least don’t mind answering a quick email.

I agree that the authors encourage this misreading of the data by eg saying "95% of organizations are getting zero return" and failing to note the caveats listed in my comment. If you believe that this statement is referencing a different data set than the one I was quoting which doesn't have those caveats, I'd be interested to hear it.

2
Yarrow Bouchard 🔸
Are you saying the authors of the study are misreporting their own results?

95% of the time, AI fails to generate significant revenue for businesses that adopt it

I think this is a misreading of the study, though the article you link seems to make the same mistake. Here's the relevant graph:

The finding is that 5% of all companies (not just those that have adopted AI) had an executive who reported "a marked and sustained productivity and/or P&L impact" of a task-specific GenAI.

I think a more accurate summary of the paper is something like "80% of LLM pilots are reported as successful by executives."[1]

  1. ^

    Assuming that all successf

... (read more)
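One way to see how "only 5% of all companies saw impact" and "80% of pilots were reported successful" can both be true: it depends on how many companies ran a pilot at all. The pilot rate below is a hypothetical number for illustration, not a figure from the study:

```python
# Hypothetical arithmetic: "5% of all companies saw impact" does not
# imply "95% of adopters failed" unless most companies actually adopted.
companies = 1000
pilot_rate = 0.0625        # hypothetical: share of companies that ran a GenAI pilot
pilot_success_rate = 0.80  # share of pilots reported successful (the reading above)

pilots = companies * pilot_rate
successes = pilots * pilot_success_rate
share_of_all_companies = successes / companies

print(f"{pilot_success_rate:.0%} of pilots succeed, yet only "
      f"{share_of_all_companies:.0%} of all companies report impact")
# → 80% of pilots succeed, yet only 5% of all companies report impact
```

The headline "95% fail" reading only follows if you assume the denominator is adopters rather than all companies, which is exactly the distinction at issue.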
1
Yarrow Bouchard 🔸
Quote from the executive summary of the MIT Media Lab study, on page 3: Quote from page 6: Page 7:

When I worked at CEA, a standard question I would ask people was "what keeps you engaged with EA?" A surprisingly common answer was memes/shitposts.

This content has obvious downsides, but does solve many of the problems in OP (low time commitment, ~anyone can contribute, etc.).

+1, this seems more like a Task Y problem.

My impression is that if OP did want to write specialist blogposts etc. they would be able to do that (probably even better placed than a younger person, given their experience). (And conversely, 18 year olds who don't want to do specialist work or get involved in a social scene don't have that many points of attachment.)

I use DoneThat and like it, thanks for building it!

Thanks for writing this up - I think "you don't need to worry about reward hacking in powerful AI because solving reward hacking will be necessary for developing powerful AI" is an important topic. (Although your frame is more "we will fail to solve reward hacking and therefore fail to develop powerful AI," IIUC.)

I would find it helpful if you reacted more to the existing literature. E.g. I don't think anyone disagrees with your high-level point that it's hard to accurately supervise models, particularly as they get more capable, but also we have empirical... (read more)

I interpret OP's point about asymptotes to mean that he indeed bites this bullet and believes that the "compensation schedule" is massively higher even when the "instrument" only feels slightly worse?

Great points both and I agree that the kind of tradeoff/scenario described by @EJT and @bruce in his comment are the strongest/best/most important objections to my view (and the thing most likely to make me change my mind)

Let me just quote Bruce to get the relevant info in one place and so this comment can serve as a dual response/update. I think the fundamentals are pretty similar (between EJT and Bruce's examples) even though the exact wording/implementation is not:

A) 70 years of non-offsettable suffering, followed by 1 trillion happy huma

... (read more)

In his examples (the hyperreals, and ℝ² lexically ordered) there is no "most intense suffering which can be outweighed" (or "least intense suffering which can't be outweighed"). E.g. in the hyperreals, n·ε < 1 no matter how small the infinitesimal ε or how large the finite n.

S* is only a tiny bit worse than S

In his examples, between any S which can't be outweighed and S* which can, there are uncountably many additional levels of suffering! So I don't think it's correct to say it's only a tiny bit worse.

Oh yep nice point, though note that - e.g. - there are uncountably many reals between 1,000,000 and 1,000,001 and yet it still seems correct (at least talking loosely) to say that 1,000,001 is only a tiny bit bigger than 1,000,000.

But in any case, we can modify the argument to say that S* feels only a tiny bit worse than S. Or instead we can modify it so that S is degrees celsius of a fire that causes suffering that just about can be outweighed, and S* is degrees celsius of a fire that causes suffering that just about can't be outweighed.
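For readers who want the threshold point in this thread spelled out, here is one standard construction (a sketch, not necessarily the exact one being discussed): take suffering intensities to be pairs under the lexicographic order, with O the outweighable intensities and N the non-outweighable ones.

```latex
(a,b) \prec (a',b') \iff a < a' \ \text{or}\ \bigl(a = a' \ \text{and}\ b < b'\bigr)

O = \{(0,b) : b \in \mathbb{R}\}, \qquad
N = \{(a,b) : a > 0\}

% O has no greatest element, since (0,b) \prec (0,b+1) for every b;
% N has no least element, since (a/2,b) \prec (a,b) whenever a > 0.
```

So there is no "most intense suffering which can be outweighed" and no "least intense suffering which can't be outweighed", even though every element of N sits above every element of O.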

Thanks for writing this Seth! I agree it's possible that we will not see transformative effects from AI for a long time, if ever, and I think it's reasonable for people to make plans which only pay off on the assumption that this is true. More specifically: projects which pay off under an assumption of short timelines often have other downsides, such as being more speculative, which means that the expected value of the long timeline plans can end up being higher even after you discount them for only working on long timelines.[1]

That being said, I think you... (read more)

2
Seth Ariel Green 🔸
Hi Ben, I agree that there are a lot of intermediate weird outcomes that I don't consider, in large part because I see them as less likely than (I think) you do. I basically think society is going to keep chugging along as it is, in the same way that life with the internet is certainly different than life without it but we basically all still get up, go to work, seek love and community, etc. However I don't think I'm underestimating how transformative AI would be in the section on why my work continues to make sense to me if we assume AI is going to kill us all or usher in utopia, which I think could be fairly described as transformative scenarios ;)  If McDonalds becomes human-labor-free, I am not sure what effect that would have on advocating for cage-free campaigns. I could see it going many ways, or no ways. I still think persuading people that animals matter, and they should give cruelty-free options a chance, is going to matter under basically every scenario I could think of, including that one.