
Cross-posted to LessWrong
AI Safety veteran Holden Karnofsky thinks there’s at least a 49% chance his actions are making things worse.[1]
In 2025, Jesse Clifton even stepped down as executive director of the Center on Long-Term Risk for similar reasons.
Even top AI Safety strategists don’t know what will make things better, and what will make things worse.
Why is it so hard to improve humanity’s odds?
And what can you do to choose your actions?
1) Hidden Failure Lets You Fail Without Knowing It
In AI Safety, impact is hard to measure, and thus lack of impact is often invisible. We call this "hidden failure". With hidden failure, projects fail to have a positive impact but the people doing the project don’t realise it.
To understand where hidden failure comes from, it’s useful to understand reasons why projects fail in general. These reasons fall on a spectrum:
- Wrong problem: You're addressing something with little influence on x-risk. For example, researching AI fairness when the core risk is misalignment.
- Wrong solution: Your solution doesn't solve the problem, even when competently executed. E.g. interpretability research that's technically novel but isn’t actually helpful.
- Poor execution: Your problem-solution set could be impactful but you're not executing your solution competently enough.
These factors can cause problems with both of the things you need to be impactful – adoption and effectiveness:
- A lack of adoption is relatively easy to spot if you want to[2] and can be remedied by entrepreneurial iteration.
- A lack of impact-effectiveness,[3] in contrast, can be particularly hard to spot, and that’s what we’re calling “hidden failure” in this post.
With hidden failure, you might have users, citations, and funding (i.e. you have “adoption”), and still fail to have impact or even make things worse.
Let us put that more bluntly: It’s literally possible for all your friends to think you’re successful and still be making things worse. Even within AI Safety. Even outside of frontier labs.
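To make this concrete, here is a minimal sketch of the Impact Equation from the footnotes (Impact = Adoption x Effectiveness). The project names and numbers are invented purely for illustration, and treating effectiveness as a single signed number is our simplifying assumption, not a measurement method.

```python
def impact(adoption: float, effectiveness: float) -> float:
    """Impact Equation: Impact = Adoption x Effectiveness.

    Effectiveness is signed here (an assumption for illustration):
    a negative value means the project makes things worse.
    """
    return adoption * effectiveness


# Hypothetical projects; every number below is made up.
popular_but_harmful = impact(adoption=1000, effectiveness=-0.5)
popular_but_useless = impact(adoption=1000, effectiveness=0.0)
niche_but_helpful = impact(adoption=10, effectiveness=0.5)

for name, value in [
    ("popular but harmful", popular_but_harmful),
    ("popular but useless", popular_but_useless),
    ("niche but helpful", niche_but_helpful),
]:
    print(f"{name}: impact = {value}")
```

Adoption only scales whatever effectiveness you have. If effectiveness is zero or negative, more users, citations, and funding don't help, and can make the total worse.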
2) Why impact is harder than profit
Creating a profitable startup is hard. Achieving impact in AI Safety is even harder for several reasons:
- There is no clear (market) signal to guide you. In other words, it’s hard to measure success.
- To have impact, you need both adoption (like a for-profit)[4] AND effectiveness (unlike a standard for-profit).[5] In many ways, impact doesn’t just pose different challenges than profit. It poses extra challenges.
- AI Safety is largely pre-paradigmatic.
3) The pre-paradigmatic challenge
AI Safety doesn't have an established paradigm yet.[6] We can't predict with certainty what will be impactful. So why bother optimizing so deliberately?
First, imperfect predictions are still valuable. For example, AI Safety experts can often point out specific reasons why a given project or idea is unlikely to be impactful.[7]
Second, we argue the lack of a paradigm actually makes deliberate thinking about impact more important, not less. Without clear guides on what will lead to impact, you have to figure it out yourself.
The tools described in the next posts help you optimize for impact under uncertainty. The goal isn't to get it perfectly right or to cripple yourself with analysis paralysis.[8] But we do think most people would benefit from spending more time thinking about their impact.
So let's think strategically about impact. We’ll give a high-level overview of how to do that in an upcoming post, and we’ll help you measure your impact in another one.
Want to get notified of those upcoming posts? Subscribe at the Luc & Lens Academy substack https://lensacademy.substack.com/
[1] We’re paraphrasing his appearance on the 80,000 Hours podcast, around the 4:11:30 mark, where he said: “I think overall I would probably agree with you that the smaller you’re making the scope of where you’re hoping to have impact, the more reasonable it is to be like 60/40. But most people who go into AI are not going into it for that. Otherwise, if you want a small-scope, robustly positive impact, you should maybe work in a cause like farm animal welfare or global poverty. For the size of impact that tends to motivate people, I think it does get partially offset by this huge uncertainty about the sign.
I tend to think it’s worse than 51/49. I tend to think we’re always going to be prone to overestimate how robustly good our actions are. And the more we learn about all the galaxy-brained considerations that one should have had in one’s head, the more it’s going to be like 50+ε%. I think AI safety is a great cause to work in. I’m excited to work in it. I think it’s high impact. I am doing my best to do things that I will be proud to have done and hope for the best. But I really do have to live with the possibility that my ultimate impact on the utilons or whatever is going to be negative.”
[2] Though you shouldn’t underestimate your brain’s ability to make itself comfortable, satisfice, and employ motivated reasoning to have you accept mediocrity.
[3] We’re using “impact-effectiveness” as a synonym for “effectiveness” as meant by the Impact Equation: Impact = Adoption x Effectiveness.
[4] Here and elsewhere, we use “for-profits” to mean regular companies not aimed at AI Safety. Of course, an AI Safety project can be set up as a for-profit too.
[5] Although arguably, adoption is sometimes easier in a nonprofit setting. For example, the various fellowships have no trouble finding enough participants. In contrast, though, many products, tools, and blog posts do struggle to get adoption.
[6] See e.g. https://ai-safety-atlas.com/chapters/03/07 or https://www.thecompendium.ai/ai-safety. Rather than saying AI Safety is pre-paradigmatic, though, it’s more accurate to say that none of the existing paradigms is widely agreed to be sufficient for making the world safe, especially by higher-level researchers in that paradigm. In other words, we have a bunch of paradigms, but they’re all pretty limited, and all in all we don’t even know yet what approaches will be required to make the world safe enough.
[7] Though there are also areas where experts disagree. In such cases, it becomes even more important to assess the specific arguments they use.
[8] See e.g. Holden Karnofsky on the 80,000 Hours podcast, where he says: "When people ask me for career advice or whatever, the usual thing I’d say is: take a bunch of options that all seem competitive, and all seem like they could be the best thing, and that it’s not obvious which ones are better than others from an impact perspective. And from there I would say go with personal fit, go with the energy you feel to work on them."
