Ben_West🔸

Thanks for all your work Joey! If it is the case that your counterfactual impact is lower now, it is coming down from a high place, because I have been impressed with AIM for a while and my impression is that you were pivotal in founding and running it.

METR: Measuring AI Ability to Complete Long Tasks

Ben_West🔸8d4

Fair enough! My guess is that when the trend breaks it will be because things have gone super-exponential rather than sub-exponential (some discussion here) but yeah, I agree that this could happen!

METR: Measuring AI Ability to Complete Long Tasks

Ben_West🔸8d*4

Thanks for the question David! I expect that I can't summarize this more simply than the paper does; particularly: section 4 goes into more detail on what the horizon means and section 8.1 discusses some limitations of this approach.

METR: Measuring AI Ability to Complete Long Tasks

Ben_West🔸8d4

So the claim is:

The 50% trend will break down at some length of task $T$
The 80% trend will therefore break at $T / 4$
And maybe $T$ is large enough to cause some catastrophic risk, but $T / 4$ isn't

METR: Measuring AI Ability to Complete Long Tasks

Ben_West🔸8d18

Figure four averages across all models. I think figure six is more illuminating:

Basically, the 80% threshold is ~2 doublings behind the 50% threshold, or ~1 year. An extra year isn't nothing! But you're still not getting to 10+ year timelines.
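A quick sketch of the arithmetic behind this, assuming the 80% horizon sits at roughly a quarter of the 50% horizon (the $T / 4$ figure above) and a doubling time of about 6 months. The doubling time is an assumption backed out from "~2 doublings, or ~1 year", not a number taken from the paper, and the function names are only illustrative.

```python
import math


def doublings_behind(horizon_ratio: float) -> float:
    """Number of doublings separating two horizons whose lengths differ by `horizon_ratio`."""
    return math.log2(horizon_ratio)


def lag_in_years(horizon_ratio: float, doubling_time_months: float) -> float:
    """Calendar lag implied by that gap, if the exponential trend keeps holding."""
    return doublings_behind(horizon_ratio) * doubling_time_months / 12


# Assumption: 50% horizon is ~4x the 80% horizon (T vs. T / 4).
HORIZON_RATIO = 4.0
# Assumption: ~6 months per doubling, implied by "~2 doublings, or ~1 year".
DOUBLING_TIME_MONTHS = 6.0

gap = doublings_behind(HORIZON_RATIO)
lag = lag_in_years(HORIZON_RATIO, DOUBLING_TIME_MONTHS)
print(f"80% horizon trails by {gap:.0f} doublings, about {lag:.1f} year(s)")
```

If the trend went super-exponential, the calendar lag would be shorter than this; if it slowed down, longer.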

AI is not taking over material science (for now): an analysis and conference report

Ben_West🔸15d4

Thanks for writing this up! I really like when people do concrete empirical surveys like this; it's helpful to get a sense of how widely current tools are actually being used.

I'm curious if you have thoughts about what automation would actually speed you up. It sounds like maybe something like "current LLMs but without hallucination"?

Also, do you have a sense for how much investment has been made into AI tools in CEST? My impression is that DeepMind really loves getting into Nature/Science but has very little interest in actually commercializing these tools, so it feels not that surprising to me that the thing which got into science didn't actually get used.^[1] It would update me if they tried very hard to commercialize it but failed.

[1] I agree that this doesn't speak well of the editorial process though

From Comfort Zone to Frontiers of Impact: Pursuing A Late-Career Shift to Existential Risk Reduction

Ben_West🔸15d5

This was a great post, thanks for writing it up!

Habryka [Deactivated]'s Quick takes

Ben_West🔸24d*25

It feels appropriate that this post has a lot of hearts and simultaneously disagree reacts. We will miss you, even (perhaps especially) those of us who often disagreed with you.

I would love to reflect with you on the other side of the singularity. If we make it through alive, I think there's a decent chance that it will be in part thanks to your work.

Kurzgesagt video on factory farming

Ben_West🔸26d6

I was excited that they did this and thought it was well produced. The focus on cost-cutting feels like a double-edged sword: it absolves viewers of responsibility, which makes them more open to the message but also less likely to do anything. I scrolled through the first couple of pages of comments and saw a bunch of "corporations are greedy" complaints but couldn't find anyone suggesting a concrete behavioral change (for themselves or others).

I wonder if there's an adjacent version of this which keeps the viewer absolved of responsibility but still has a call to action. Plausible ideas:

Race to the top: e.g. specifically call out the worst corporate offender in the video
Political stuff, e.g. push for the EU Commission to keep their cage-banning promise
1. Maybe YouTube rules about politics prevent them from saying this, not sure

In any case, kudos to the Kurzgesagt team for making a video on this which (as of this writing) has 2M+ views!

Ben_West's Quick takes

Ben_West🔸1mo31

Opportunities

If you can get a better score than our human subjects did on any of METR's RE-Bench evals, send it to me and we will fly you out for an onsite interview.

Caveats:

you're employable (we can sponsor visas from most but not all countries)
you use the same hardware
honor system that you didn't take more time than our human subjects (8 hours). If you take more, still send it to me and we will probably still be interested in talking

(Crossposted from Twitter.)

Ben_West🔸

Bio

How others can help me

Posts 87

Sequences 3

Comments 1057

Topic contributions 6
