
Summary

Last updated 2024-11-20.

It's been a while since I last put serious thought into where to donate. Well, I'm putting thought into it this year, and I'm changing my mind on some things.

I now put more priority on existential risk (especially AI risk), and less on animal welfare and global priorities research. I believe I previously gave too little consideration to x-risk for emotional reasons, and I've managed to reason myself out of those emotions.

Within x-risk:

  • AI is the most important source of risk.
  • There is a disturbingly high probability that alignment research won't solve alignment by the time superintelligent AI arrives. Policy work seems more promising.
  • Specifically, I am most optimistic about policy advocacy for government regulation to pause/slow down AI development.

In the rest of this post, I will explain:

  1. Why I prioritize x-risk over animal-focused longtermist work and global priorities research.
  2. Why I prioritize AI policy over AI alignment research.
  3. My beliefs about what kinds of policy work are best.

Then I provide a list of organizations working on AI policy and my evaluation of each of them, and where I plan to donate.

Cross-posted to my website.

I don't like donating to x-risk

(This section is about my personal motivations. The arguments and logic start in the next section.)

For more than a decade I've leaned toward longtermism and I've been concerned about existential risk, but I've never directly donated to x-risk reduction. I dislike x-risk on an emotional level for a few reasons:

  • In the present day, aggregate animal welfare matters far more than aggregate human welfare (credence: 90%). Present-day animal suffering is so extraordinarily vast that on some level it feels irresponsible to prioritize anything else, even though rationally I buy the arguments for longtermism.
  • Animal welfare is more neglected than x-risk (credence: 90%).[1]
  • People who prioritize x-risk often disregard animal welfare (or the welfare of non-human beings, whatever shape those beings might take in the future). That makes me distrust their reasoning on cause prioritization. (This isn't universally true—I know some people who care about animals but still prioritize x-risk.)
  • I find it distasteful the way people often talk about "human extinction", which seemingly ignores the welfare of all other sentient beings.[2]

For a while, I donated to animal-focused orgs that looked good under longtermism, like Sentience Institute. In recent years, I've avoided thinking about cause prioritization by supporting global priorities research (such as by donating to the Global Priorities Institute)—pay them to think about cause prioritization so I don't have to. I still believe there's a good case for that sort of research, but the case for existential risk is stronger (more on this below).

I've spent too long ignoring my rationally-formed beliefs about x-risk because they felt emotionally wrong. I'm normally pretty good at biting bullets. I should bite this bullet, too.

This decision to prioritize x-risk (mostly[3]) didn't happen because I changed my mind. It happened because I realized I was stupidly letting my emotional distaste toward x-risk sway my decision-making.

On the other hand, I've become more worried about AI in the last few years. My P(doom) hasn't really gone up, but the threat of misaligned AI has become more visceral. I believe unaligned AI is my most likely cause of death, and I'd rather not die.[4]

Cause prioritization

S-risk research and animal-focused longtermism

I believe animal-focused (or non-human-focused) longtermist work is important (credence: 95%), and that it's far more neglected than (human-focused) x-risk reduction (credence: 99%). I believe the same about s-risk research (and s-risks heavily overlap with animal-focused longtermism, so that's not a coincidence). But I also believe:

  1. At equivalent levels of funding, marginal work on x-risk is more cost-effective (credence: 75%) because non-human welfare is likely to turn out okay if we develop friendly AI.
  2. The cost-effectiveness of x-risk funding diminishes slowly enough that it's better even at current funding levels (credence: 65%), especially because some of the most promising sub-fields within x-risk remain poorly funded.

Improving animal welfare likely has good flow-through effects into the distant future. But I think those flow-through effects don't have a huge expected value compared to x-risk reduction because they only matter under certain conditions (I discussed these conditions a while back in Is Preventing Human Extinction Good? and On Values Spreading.)

This judgment is hard to make with confidence because it requires speculating about what the distant future will look like.

In Marcus Abramovitch's excellent writeup on where he donated in 2023, he said,

I don't think many x-risk organizations are fundamentally constrained on dollars and several organizations could be a lot more frugal and have approximately equal results.

I basically agree with this, but I think there are some x-risk orgs that need more funding, and they're among the most promising orgs.

X-risk vs. global priorities research

A dilemma:

  1. We can't fully align AI until we solve some foundational problems in ethics and other fields.
  2. If we don't align AI, we will go extinct before we solve those foundational problems.

(One proposed solution is to conduct a long reflection, and basically put a superintelligent AI on standby mode until that's done. This proposal has some issues but I haven't heard anything better.)

So, should we focus on x-risk or global priorities research?

Ultimately, I think x-risk is the higher priority (credence: 70%). If we build a (mostly) friendly AI without really figuring out some details of AI alignment, maybe we can work things out from there. But the problems in global priorities research seem so complex that we have essentially no chance of solving them before AGI arrives (unless AGI turns out to take much longer than expected), regardless of how much funding goes to global priorities research.

Prioritization within x-risk

Among the various existential risks, AI risk stands out as clearly the most important. I believe this for essentially the same reasons that most people believe it.

In short:

  • Natural risks are less concerning than human-caused risks (credence: 98%).
  • Climate change is not a serious existential threat (credence: 90%). See John Halstead's report Climate Change & Longtermism.
  • Engineered pandemics are considerably less likely to cause extinction than AI (credence: 95%). I've heard biologists in the x-risk space claim that it would be very hard for a pandemic to cause total extinction.
  • Nuclear war is worrisome but less of an extinction risk than AI (credence: 85%). See 80,000 Hours' table of x-risk estimates for nuclear war.

For more, see Michael Aird's database of existential risk estimates.

AI safety technical research vs. policy

There are a few high-level strategies for dealing with AI risk. We can broadly classify them into (1) technical research and (2) policy. Basically:

  1. technical research = figure out how to prevent AI from killing everyone
  2. policy = increase the probability that policies/regulations will reduce x-risk

(You could further divide policy into research vs. advocacy—i.e., figure out what makes for good regulations vs. advocate for regulations to be enacted. I'll talk more about that later.)

I don't have any expertise in AI[5] and I don't know what kinds of alignment research are most promising, but experts can't seem to agree either—some think prosaic alignment will work, others think we need fundamentally new paradigms. (I lean toward the latter (credence: 70%).)

But I don't see how we are going to solve AI alignment. The best existing research seems like maybe it has some chance of someday leading to some method that could eventually solve alignment with enough work, perhaps.[6] Our best hope is that either (1) AGI turns out to be much harder to develop than it looks or (2) solving alignment turns out to be really easy for some unforeseen reason.[7] But those hopes are non-actionable—I can't increase their probability by donating money.

Ever since I became concerned about AI risk (about a decade ago), I've weakly believed that we were not on pace to solve alignment before AGI arrived. But I thought perhaps technical research would become sufficiently popular as the dangers of AI became more apparent. By now, it's clear that that isn't happening, and we're not going to solve AI alignment in time unless the problem turns out to be easy.

I used to be even more pessimistic about AI policy than technical research,[8] but now I think it's the more promising approach (credence: 80%). Surprisingly (to me), AI safety (as in notkilleveryoneism) is now kinda sorta mainstream, and there's some degree of political will for creating regulations that could prevent AI from killing everyone. SB 1047, which might have meaningfully decreased x-risk, saw widespread support. (Unfortunately, one particular guy with veto power did not support it.[9])

Another consideration: bad policy work can backfire. I don't know much about policy and I'm relatively bad at understanding people, so on priors, I don't expect to be good at figuring out which policy efforts will work. I used to think I should defer to people with better social skills. But now I've seen some of the poor results produced by policy orgs that care a lot about reputation management, and I've seen how messaging about extinction is much more palatable than the people with good social skills predicted (e.g., as demonstrated by public opinion polling), so I think I overrated others' judgment and underrated my own. As a consequence, I feel more confident that I can identify which policy orgs are doing good work.

In summary:

  • I don't think technical research is going to work.
  • Policy might work.
  • I think I'm qualified enough to evaluate policy orgs.

So I want to donate to something related to AI policy.

Quantitative model on research vs. policy

I built a coarse quantitative model of the expected value of donations to technical research vs. policy. The model inputs are very rough, but the model illustrates some important principles.

(Disclaimer: First I decided to donate to AI policy, then I built the model, not the other way around.[10] If the model had disagreed with my beliefs, then I would have changed the model. But if I couldn't find a reasonable way to make the model fit my beliefs, then I would have changed my beliefs.)

Preventing x-risk works like voting. All the expected value of your vote comes from the situation where the outcome is exactly tied and your vote breaks the tie. If the expected vote share is close to 50/50, your vote has a high EV. If it's far from 50/50, there's an extremely small probability that your vote will matter.

If I believed that it would cost (say) $10 billion to solve AI alignment, and also the total spending without my donation would be close to $10 billion, then my donation to alignment research has a high EV. But in fact I believe it probably costs much more than we're going to spend[11] (assuming no regulations).

On a naive view, that means a donation to alignment research has extremely low EV. But that's not correct, because it doesn't account for uncertainty. My median guess is that solving AI alignment will cost maybe $100 billion, and that we will only actually spend $1 billion.[12] If my credence intervals for those two numbers followed normal distributions, then the probability of making a difference would be incredibly small (like, one-in-the-number-of-atoms-in-the-solar-system small) because normal distributions have extremely little probability mass in the tails. But my beliefs have wide credence intervals, and they're not normally distributed. So my distribution for cost-to-solve-alignment heavily overlaps with total-spending-on-alignment.

It's hard to have good intuitions for the probability that a donation makes a difference when that probability depends on the intersection of two overlapping fat-tailed distributions. That's the sort of thing a quantitative model can help with, even if you don't take the numbers too seriously.
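
As a toy illustration of that point (this is not my actual model, and all the numbers and parameters below are made up), here's a sketch of how you might estimate the probability that a marginal donation matters when both quantities are wide lognormals:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000

# Made-up illustrative inputs, not my real model: median cost to solve
# alignment ~$100 billion, median total spending ~$1 billion, both modeled
# as wide (fat-tailed) lognormal distributions.
cost_to_solve = rng.lognormal(mean=np.log(100e9), sigma=2.0, size=n)
total_spending = rng.lognormal(mean=np.log(1e9), sigma=1.5, size=n)

# Like a tied vote, a donation only matters in worlds where spending would
# otherwise fall just short of the required cost. Estimate the chance that
# a marginal $10 million closes the gap, then express it per dollar.
bracket = 10e6
pivotal = (cost_to_solve > total_spending) & (cost_to_solve - total_spending <= bracket)
print(f"P(a marginal $10M is pivotal): {pivotal.mean():.2e}")
print(f"Per-dollar probability of mattering: {pivotal.mean() / bracket:.2e}")

# If both inputs were thin-tailed normal distributions with the same medians,
# this overlap would be astronomically smaller; the fat tails are what keep
# the expected value of a marginal donation non-negligible.
```

The specific output isn't the point; the point is that the answer is dominated by how much the two distributions' tails overlap, which is exactly the kind of thing intuition handles poorly and a model handles well.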

Ultimately, I think AI policy has higher EV than technical research. According to my made-up numbers, donating to policy is ~3x more cost-effective than donating to research. Somewhat more pessimistic inputs can change the ratio to >1000x.[13] The ratio flips to <1x if you think there's close to a 50/50 chance that we solve alignment without any government intervention.

"Man versus man" conflicts within AI policy

Some prioritization decisions are positive sum: people can work on technical research and policy at the same time. But others are zero sum. I'm wary of "man versus man" conflict—of working in opposition to other (loosely) value-aligned people. But in policy, sometimes you have to engage in "man versus man" conflict. I want to think extra carefully before doing so.

There are a few such conflicts within AI policy.

  • Strategy A: Do capabilities and safety research in parallel; we need more advanced models to better understand what AGI will look like.
    Strategy B: Slow down capabilities research to buy more time for safety research.
  • Strategy A: Don't push for regulations; they will excessively harm technological development.
    Strategy B: Push for regulations because the risk is worth it.
  • Strategy A: Take the time to figure out nuanced regulations that won't impede the good parts of AI.
    Strategy B: Push for regulations ASAP, even if they have worse side effects.
  • Strategy A: Cooperate with big AI companies to persuade them to behave more safely.
    Strategy B: Work against AI companies to stop their dangerous behaviors.
  • Strategy A: Diplomatically develop political connections, and later use those to push for AI safety policies.
    Strategy B: Loudly argue for AI safety policies now, even if it makes us look weird.

In every case, I like strategy B better. But how sure am I that I'm on the right side?

Parallel safety/capabilities vs. slowing AI

What are the arguments for advancing capabilities?

If we don't advance capabilities, China will. Or some other company that doesn't care about safety will.

This is the strongest argument by my assessment, but I still think it's wrong (credence: 90%).

  • The companies that have done the most to accelerate AI development have all done so in the name of safety. If they didn't believe advancing the frontier was the safest move, the world would be in a much safer position right now.
  • It's likely that international treaties could prevent arms races—they worked well(ish) for nuclear weapons.
  • China is behind the US on AI. There is no need to race harder when you're ahead.
  • By my amateurish judgment, China doesn't seem as interested in an arms race as the US. You don't need to race against someone who's not racing.
  • How sure are you that it's better for the United States to develop AI first? (China is less interested in controlling world politics than the US, and the Chinese government seems more concerned about AI risk than the US government.)
  • Who develops superintelligent AI first only matters in the narrow scenario where AI alignment is easy but also the AI can and will be used by its creators to take over the world:
    • If AI alignment is hard, it doesn't matter who develops it first because everyone dies either way.
    • If the AI is fully aligned, it will refuse to fulfill any unethical requests its creator makes (such as taking over the world).
  • I don't think we as a society have a good grasp on the game theory of arms races but I feel like the solution isn't "push the arms race forward even faster".

AI companies need to build state-of-the-art (SOTA) models so they can learn how to align those models.

I've heard people at Anthropic make this argument. But it's disingenuous (or at least motivated)[14] because Anthropic is accelerating capabilities, not just matching the capabilities of pre-existing models—and they have a history of almost-but-not-technically lying about whether they were going to advance capabilities.[15] And the argument doesn't really make sense because:

  • They could satisfy their stated goal nearly as well by training models that are (say) 3/4 as good as the state of the art (credence: 90%).
  • Or they could make deals with other companies to do safety research on their pre-existing SOTA models. This would satisfy their stated goal (credence: 98%). Companies might not be willing to cooperate like this, but surely it's worth trying (and then trying harder).
  • There are many types of plausibly-productive alignment research that don't require SOTA models (credence: 90%).[16]
  • Having SOTA models doesn't differentially improve alignment—it teaches you just as much about how to improve capabilities (credence: 60%).

If the argument that companies need SOTA models to do alignment research is correct, then AI companies should:

  1. Temporarily stop AI development.
  2. Learn everything they possibly can about AI alignment with the current model.
  3. Publish a report on how they would use a more capable AI to improve alignment.
  4. Get review from third-party alignment researchers.
  5. If reviewers have a strong consensus that the report is reasonable, only then resume AI development.

That is the sort of behavior you would see from a company that takes existential risk appropriately seriously.

We need to develop AI as soon as possible because it will greatly improve people's lives, and every year of delay is a huge opportunity cost.

This argument only makes sense if you have a very low P(doom) (like <0.1%) or if you place minimal value on future generations. Otherwise, it's not worth recklessly endangering the future of humanity to bring utopia a few years (or maybe decades) sooner. The math on this is really simple—bringing AI sooner only benefits the current generation, but extinction harms all future generations. You don't need to be a strong longtermist; you just need to accord significant value to people who aren't born yet.

I've heard a related argument that the size of the accessible lightcone is rapidly shrinking, so we need to build AI ASAP even if the risk is high. If you do the math, this argument doesn't make any sense (credence: 95%). The value of the outer edge of the lightcone is extremely small compared to its total volume.[17]
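
To make that concrete, here's a crude back-of-envelope under the simplifying assumption that the reachable volume grows roughly like the cube of the time spent expanding (the numbers are assumed, but only the orders of magnitude matter):

```python
# Crude back-of-envelope with assumed numbers: if civilization can keep
# expanding for on the order of 10 billion years, and the reachable volume
# grows roughly like t^3, then a delay of dt years forfeits a fraction of
# about 3 * dt / t of the eventual lightcone.
t_total_years = 10e9   # assumed usable expansion time
delay_years = 100      # a century-long pause

fraction_of_lightcone_lost = 3 * delay_years / t_total_years
print(f"Fraction of lightcone lost: {fraction_of_lightcone_lost:.1e}")  # ~3e-8

# Even a 0.1 percentage point reduction in extinction risk (1e-3) is over
# four orders of magnitude larger than that loss.
```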

AI could be the best thing that's ever happened. But it can also be the best thing that's ever happened 10/20/100 years from now, and if delaying AI lets us greatly reduce existential risk, then it's worth the delay.

We should advance capabilities to avoid a "hardware overhang": a situation where AI can be improved purely by throwing more hardware at it, which is potentially dangerous because it could cause AI to leap forward without giving people time to prepare.

Sam Altman has made this argument. But he's disingenuous (credence: 95%) because he also wants to fund hardware advances, which will increase hardware overhang.[18] And:

  • This argument implies that AI companies should stop looking for algorithmic improvements (credence: 90%) because they don't produce overhang.
  • Pausing AI development would reduce demand for AI chips, slowing down hardware development.
  • Eliminating overhang only helps if we can meaningfully advance alignment using the higher level of capabilities. That seems unlikely to be worth the tradeoff because alignment has historically progressed very slowly. We are not on pace to solve alignment, with or without an overhang.
    • If we will be able to align a bigger model, shouldn't it be even easier to align the models we currently have? But we don't know how to align the models we have (beyond the superficial pseudo-alignment that RLHF produces).
  • An AI Impacts report described some examples of overhang in other industries. "None of them match the behavior that people seem to expect will happen with hardware overhang."

We need AGI to prevent some other existential risk from killing everyone.

Nearly every respectable person[19] who has estimated x-risk probabilities agrees that AI is by far the largest x-risk in the next century.

It's okay to advance capabilities because AI does not pose an existential risk.

This is a popular argument, but presumably everyone reading this already disagrees, so I'm not going to attempt to rebut it.


The parallel safety/capabilities side of the argument seems weak to me (and relies on a lot of what looks like motivated reasoning), so I feel comfortable supporting the pause side (credence: 85%).[20]

But there's some common ground:

  • Both sides should agree that slowing down hardware is good or at least neutral (credence: 75%).[21] This alleviates every concern except for the one about the opportunity costs of delaying development.
  • Both sides should support regulations and international treaties that restrict the speed of AI development (credence: 65%). International treaties alleviate concerns about arms races and about needing to stay on the cutting edge.

Freedom vs. regulation

Arguments against regulation:

Regulations to slow AI would require the government to take authoritarian measures.

This argument seems pretty wrong to me (credence: 95%). Other industries have much stricter regulations than AI without slipping into totalitarianism. If the regulations on AI GPUs were as strict as the ones on, say, pseudoephedrine, that would be sufficient to slow and monitor hardware development.[22]

Even if the regulations required individual people to turn in their GPUs to the government (and I don't know why that would be required because GPU manufacturing is pretty centralized), there's already precedent for that sort of thing in relatively free societies, e.g. the Australian government's mandatory buyback of newly banned firearms.

Regulations to slow AI might be nearly impossible to lift even if AI alignment gets solved, and then we won't get the glorious transhumanist future.

I do think this is a real concern. Ultimately, I believe it's worth the tradeoff. And it seems unlikely that excessive regulations would stay in place forever—I doubt we'd have the knowledge to develop friendly AI yet lack the regulatory freedom to use it for (say) a thousand years.

(The United States essentially stopped developing nuclear power in the 1990s due to onerous regulations, but it just opened a new plant last year.)

Slow nuanced regulation vs. fast coarse regulation

Some argue that we should advocate for regulation, but push nuanced messaging to make sure we don't hamstring economic development.

This disagreement largely comes down to P(doom) and AI timelines:

  • If P(doom) is low, it's worth accepting some extra risk to figure out how to write careful regulations.
  • If timelines are long, we have plenty of time to figure out regulations.

If delaying regulations increases P(doom) by a lowish number like one percentage point, I don't think it's worth it—economically stifling regulations are not 1% as bad as extinction.

I think it's unlikely that transformative AI comes in the next five years. But it's not unthinkable. Metaculus (1, 2, 3) and surveyed experts don't think it's unthinkable either. And there could be a long delay between pushing for regulations and those regulations being implemented.

Dario Amodei, CEO of Anthropic, believes human-level AI could arrive within 1–2 years. That's not enough time to figure out nuanced regulations. If he's right, we need regulations right now. (Actually, we needed regulations ten years ago, but the second-best time is now.)

Given that forecasts put a reasonable probability on transformative AI arriving very soon, I don't see how it makes sense to delay regulations any more than we already have.

(I believe a lot of people get this wrong because they're not thinking probabilistically. Someone has (say) a 10% P(doom) and a 10% chance of AGI within five years, and they round that off to "it's not going to happen so we don't need to worry yet." A 10% chance is still really really bad.)

And I'm not sure it's feasible to write the sort of nuanced regulations that some people want. The ideal case is writing regulations that enable beneficial AI while preventing dangerous AI, but in the limit, that amounts to "legalize aligned AI and ban unaligned AI". The less we know about AI alignment, the less likely the regulations are to do the right thing. And we know so little that I'm not sure there's any real advantage to adding nuance to regulations.

(SB 1047 serves as a baseline: writing regulation with that level of nuance takes approximately zero marginal time because it's already been done. Pushing to delay regulation only makes sense if you think we need something significantly more nuanced than SB 1047.)

Working with vs. against AI companies

I believe it's better (on the margin) to work against AI companies (credence: 80%). I am not aware of any strong arguments or evidence for one side or the other, but I have a few bits of weak evidence.

(For someone with a larger budget, it might be worthwhile to commission an investigation into the track record of working with vs. against companies on this sort of thing.)

There's a moderately strong argument in favor of cooperating with AI companies on policy:

If AI safety advocates make enemies with AI companies, those companies will get into a political fight with safety advocates, and companies are more powerful so they will probably win.

How much stock you put in that argument depends on how well you think industry-friendly regulations will reduce x-risk. It seems to me that they won't. Good regulations will cause AI companies to make less money. If you advocate for regulations that companies like, then the regulations won't be good.[23] I don't see a middle ground where regulations prevent AI from killing everyone but also don't impede companies' profits. (A superintelligent AI is always going to be more profitable than anything else, unless it kills everyone.)

If we get into a political fight with AI companies, we might lose. But if we concede and let AI companies get the regulations they want, we definitely lose.[24]

Alternatively, you could join an AI company or try to cooperatively influence it from the outside.

The (anecdotal) track record of working with AI companies so far:

  1. People who worked for OpenAI were forced to sign non-disparagement agreements, preventing them from dissenting publicly.
  2. OpenAI claimed it would dedicate 20% of its compute to alignment research. A year later, the heads of the alignment team complained that they never got their promised compute; OpenAI lied and said they meant something different than the thing they obviously meant; a lot of the alignment team quit; then OpenAI gave up on pretending not to have lied and disbanded its alignment team.
  3. Geoffrey Hinton quit Google because, among other reasons, he thought he was too constrained by "self-censorship."
  4. Altman attempted to remove board member Helen Toner after she published criticism of OpenAI's approach to safety.
  5. Open Philanthropy donated $30 million to OpenAI to buy a board seat. This board seat was probably instrumental in getting Altman fired for (essentially) disregarding safety, but the firing didn't stick, and then all the safety-conscious board members were pushed out.

In each of these cases, working within the company made things definitively worse. The last instance came close to making things better but ultimately made things worse—it made OpenAI $30 million richer without making it safer.

I know of at least one potential counterexample: OpenAI's RLHF was developed by AI safety people who joined OpenAI to promote safety. But it's not clear that RLHF helps with x-risk.[25]

Maybe OpenAI makes it uniquely hard to change the culture from within, and you'd fare better with other companies. I don't think that's true because two of the other big AI players, Meta and Google, are larger and have more inertia and therefore should be harder to change. Only Anthropic seems easier to influence.

(But the founding members of Anthropic are concerned with x-risk and they're still racing to build superintelligent AI as fast as possible while admitting that they have no idea how to make it safe. Influence doesn't seem helpful: AI safety memes are in the positions of greatest possible influence—namely, the brains of the founders—but still aren't making Anthropic safe.)

At the risk of over-updating on random authors who I know nothing about: In 1968, James C. Thomson wrote an article called How Could Vietnam Happen? An Autopsy (h/t Jonas V). He wrote that, essentially, dissenting insiders don't protest because they want to accumulate more influence first. They delay expressing their dissent, always wanting to increase the security of their position, and never get to a point where they actually use their position to do good. Former OpenAI employee Daniel Kokotajlo says he observed this happening at OpenAI.

PhD Comics observes the same phenomenon in academia.

Hat tip to Bryan Caplan, who adds commentary:

The classic story is that tenure protects dissenters. [...]

The flaw with the argument is that academic dissenters remain ultra-rare. Far too rare to justify the enormous downsides of the tenure system. And from a bird’s-eye view, the full effect of tenure on dissent is mixed at best. Remember: To get tenure, a dissenter normally has to spend a decade and a half impersonating a normal academic. If you start the process as a non-conformist, the system almost always either weeds you out or wins you over. By the time you get tenure, a creepy chorus of “One of us! One of us!” is in order.

Finally, "with" vs. "against" isn't necessarily mutually exclusive. There are already many AI safety advocates working inside AI companies. Having a more radical flank on the outside could be a useful complementary strategy. (I'm not confident about this last argument.)

Political diplomacy vs. advocacy

I could say some of the same things about politics that I said about working from inside AI companies: empirically, people get caught in the trap of accumulating social capital and then never actually spend that capital.

Relatedly, some people think you should talk about mundane risks of AI and avoid discussing extinction to not look weird. I have a strong prior toward honesty—telling people what you care about and what your true motivations are, rather than misrepresenting your beliefs (or lying by omission) to make them sound more palatable. And I have a moderate prior against accumulating power—both the good guys and the bad guys want to accumulate power. Honesty is an asymmetric weapon and power is symmetric.

Conflicts that aren't "man vs. man" but nonetheless require an answer

There are some other areas of debate where funding one side doesn't necessarily hurt the other side, but I have a finite amount of money and I need to decide which type of thing to fund.

Pause vs. Responsible Scaling Policy (RSP)

Originally I wrote a long diatribe on why I don't like RSPs and how badly-written AI companies' RSPs are. But after spending some more time reading pro-RSP commentary, I realized my criticisms didn't matter because RSP advocates don't seem to like RSPs much either. The biggest advocates have said things like (paraphrased) "a full pause would be better, but it's not feasible, so an RSP is a reasonable compromise." If I understand correctly, they see an RSP as essentially a weaker version of a pause, but without a pause's downsides. So the real disagreement is about how big the downsides of a pause are.

As far as I can tell, the main cruxes are:

  1. It's not time to pause yet.
  2. A pause is bad because it would create a hardware overhang.[26]
  3. If the US government mandates a pause, China will keep developing AI.
  4. A pause has negative EV because it delays the glorious transhumanist future.
  5. It's likely that we can trust companies to voluntarily implement good RSPs.

I've already talked about these and why I believe they're all incorrect. If you agree with my earlier arguments, then an unconditional pause makes more sense than an RSP.

I would be more optimistic about a government-enforced RSP than a voluntary RSP, but I believe that's not what people typically mean when they talk about RSPs.

Policy research vs. policy advocacy

Is it better to advocate for regulation/AI policy, or to do policy-relevant research?

I don't really know about this one. I get the sense that policy advocacy is more important, but I don't have much of an argument as to why.

  • The difference between no regulation vs. mediocre regulation is bigger than mediocre vs. good regulation. (credence: 70%)
  • Policy advocacy is more neglected (although they're both pretty neglected). (credence: 90%)
  • It doesn't seem that hard to write legislation to slow AI development. How much more research do we really need?[27] (credence: 50%)

I made some related arguments in slow nuanced regulation vs. fast coarse regulation. If nuanced regulation isn't worth it, then policy research likely isn't worth it because there's not much research to do. (Although you might still do research on things like which advocacy strategies are likely to be most effective.)

Advocacy directed at policy-makers vs. the general public

In favor of focusing on policy-makers:

  1. You get a bigger impact per person convinced, because policy-makers are the ones who actually enact regulations.

In favor of focusing on the public:

  1. A higher proportion of people will be receptive to your message. (And in fact, people are already broadly concerned about AI, so it might be less about convincing and more about motivating.)
  2. Policy-makers' activities are largely downstream of the public—they want to do what their constituents want.

I don't have much of an opinion about which is better—I think it depends on the specifics of the organization that's doing the advocacy. And both sorely need more funding.

Organizations

I'm not qualified to evaluate AI policy organizations[28] so I would like to delegate to an expert grantmaker. Unfortunately, none of the existing grantmakers work for me. Most focus on technical research.[29] Only Founders Pledge has up-to-date recommendations[30] on AI policy, but I didn't realize they existed until I had spent a lot of time looking into organizations on my own, and it turns out I have some significant disagreements with the Founders Pledge recs. (Three of the seven Founders Pledge recs are my three least favorite orgs among the ones I review below.)[31]

So I did it myself. I made a list of every org I could find that works on AI policy.[32] Then I did shallow evaluations of each of them.

Some preamble:

  • As a rule of thumb, I don't want to fund anything Open Philanthropy has funded. Not because it means they don't have room for more funding, but because I believe (credence: 80%) that Open Philanthropy has bad judgment on AI policy (as explained in this comment by Oliver Habryka and reply by Akash—I have similar beliefs, but they explain it better than I do). Open Philanthropy prefers to fund orgs that behave "respectably" and downplay x-risks, and does not want to fund any orgs that work against AI companies.[33] I don't want to fund any org that's potentially making it more difficult to communicate to policy-makers about AI x-risk or helping AI companies accelerate capabilities.
  • In the interest of making my life easier, I stopped investigating an organization as soon as I found a reason not to donate to it, so some of these writeups are missing obvious information.[34]
  • A lot of these orgs have similar names. I use an org's full name wherever its abbreviation would be ambiguous.[35]
  • There's an unfortunate dynamic where I won't donate to an org if I can't figure out what it's doing. But if an org spends a lot of time writing about its activities, that's time it could be spending on "real" work instead. I have no solution to this.[36]

Important disclaimers

  • When describing orgs' missions and activities, sometimes I quote or paraphrase from their materials without using quotation marks because the text gets messy otherwise. If I do quote without attribution, the source will be one of the links provided in that section.
  • I only spent 1–2 hours looking into each organization, so I could be substantially wrong in many cases.
  • It might have been good practice to share (parts of) this document with the reviewed organizations before publishing,[37] but I didn't do that, mainly because it would take a lot of additional work.[38] The only exception is if I referenced a private comment made by an individual, I asked permission from that individual before publishing it.
  • Potential conflict of interest: I have friends at METR and Palisade.
    • However, I didn't know I had a friend who worked at METR until after I had written the section on METR. I'm not good at keeping track of where my friends work.
  • I'm acquainted with some of the people at CSET, Lightcone, PauseAI, and Sentinel. I might have friends or acquaintances at other orgs as well—like I mentioned, I'm not good at knowing where people work.

AI Policy Institute

AI Policy Institute (mostly) runs public opinion polls on AI risks, some of which are relevant to x-risk. The polls cover some important issues and provide useful information to motivate policy-makers. Some examples:

  • 2.5x more voters support SB 1047 than oppose it. (source)
  • 56% of voters agreed it would be a good thing if AI progress was significantly slowed, vs. 27% disagreed. (source)
  • Voters' top priority on AI regulation is preventing catastrophic outcomes. (source)

This sort of work seems good. I'm not sure how big an impact it has on the margin. My intuition is that polls are good, but additional polls have rapidly diminishing returns, so I wouldn't consider AI Policy Institute a top donation candidate.

I could not find good information about its room for more funding. It did not respond to my inquiry on its funding situation.

AI Safety and Governance Fund

AI Safety and Governance Fund (which, to my knowledge, is a one-man org run by Mikhail Samin) wants to test and spread messages to reduce AI x-risk—see Manifund proposal. It plans to buy ads to test what sorts of messaging are most effective at communicating the arguments for why AI x-risk matters.

I like this project because:

  • Pushing for x-risk-relevant regulation is the most promising sort of intervention right now. But we don't have much data on what sorts of messaging are most effective. This project intends to give us that data.
  • Mikhail Samin, who runs the org, has a good track record of work on AI safety projects (from what I can see).
  • Mikhail has reasonable plans for what to do with this information once he gets it. (He shared his plans with me privately and asked me not to publish them.)
  • The project has room for more funding, but it shouldn't take much money to accomplish its goal.
  • The project received a speculation grant from the Survival and Flourishing Fund (SFF) and is reasonably likely to get more funding, but (1) it might not; (2) even if it does, I think it's useful to diversify the funding base; (3) I generally like SFF grants and I don't mind funging[39] SFF dollars.

AI Standards Lab

AI Standards Lab aims to accelerate the writing of AI safety standards (by standards bodies like ISO and NIST) by writing standards that orgs can adapt.

These standards are rarely directly relevant to x-risk. Improving standards on sub-existential risks may make it easier to regulate x-risks, but I would rather see an org work on x-risk more directly.

AI Standards Lab does not appear to be seeking donations.

Campaign for AI Safety

Campaign for AI Safety used to do public marketing and outreach to promote concern for AI x-risk. In early 2024, it got rolled into Existential Risk Observatory, and the former organizers of the Campaign for AI Safety now volunteer for Existential Risk Observatory.

Campaign for AI Safety still has a donations page, but as far as I can tell, there is no reason to donate to it rather than to Existential Risk Observatory.

Centre for Enabling EA Learning and Research (CEEALAR)

CEEALAR runs the EA Hotel. Recently, it has focused on supporting people who work on AI safety, including technical research and policy.

Something like the EA Hotel could end up accidentally accelerating AI capabilities, but I'm confident that won't happen because Greg Colbourn, who runs the EA Hotel, is appropriately cautious about AI (he has advocated for a moratorium on AI development).

You could make a case that CEEALAR has a large multiplicative impact by supporting AI safety people. That case seems hard to make well, and in the absence of a strong case, CEEALAR isn't one of my top candidates.

Center for AI Policy

The Center for AI Policy is a 501(c)(4) nonprofit designed to influence US policy to reduce existential and catastrophic risks from advanced AI (source).

They are serious about x-risk and well-aligned with my position:

Our current focus is building "stop button for AI" capacity in the US government.

Unlike some other orgs, it's not bogged down by playing politics. For example, it's willing to call out Sam Altman's bad behavior; and it focuses on conducting advocacy now, rather than amassing influence that can be used later (I'm generally averse to power-seeking).

The org has proposed model legislation that makes some non-trivial policy proposals (see summary pdf and full text pdf). The legislation would:

  • require the customers buying $30,000 advanced AI chips to fill out a one-page registration form;
  • issue permits to the most advanced AI systems based on the quality of their safety testing;
  • define a reasonable set of emergency powers for the government so that they can intervene and shut down an AI system that’s in the process of going rogue.

This is a breath of fresh air compared to most of the policy proposals I've read (none of which I've discussed yet, because I'm writing this list in alphabetical order). Most proposals say things like:

  • make the regulation be good instead of bad;
  • simultaneously promote innovation and safety (there is no such thing as a tradeoff);
  • sternly tell AI companies that they need to not be unsafe, or else we will be very upset.

I'm paraphrasing for humor[40], but I don't think I'm exaggerating—I've read proposals from AI policy orgs that were equivalent to these, but phrased more opaquely. (Like nobody explicitly said "we refuse to acknowledge the existence of tradeoffs", but they did, in fact, refuse to acknowledge the existence of tradeoffs.)

Center for AI Policy has a target budget of $1.6 million for 2025 (source), and its current funding falls considerably short of this goal, so it can make good use of additional money.

Center for AI Safety

Center for AI Safety does safety research and advocates for safety standards. It has a good track record so far:

  • It drafted the original version of SB 1047.
  • Its Statement on AI Risk got signatures from major figures in AI and helped bring AI x-risk into the Overton window.
  • It's done some other work (e.g., writing an AI safety textbook; buying compute for safety researchers) that I like but I don't think is as impactful. The given examples are about supporting alignment research, and as I've said, I'm not as bullish on alignment research.
  • Its Center for AI Safety Action Fund does lobbying, which might be good, but I can't find much public information about what it lobbies for. It did support SB 1047, which is good.
  • It's led by Dan Hendrycks. I've read some of his writings and I get the general sense that he's competent.

The Center for AI Safety has some work that I'm very optimistic about (most notably the Statement on AI Risk), but I'm only weakly to moderately optimistic about most of its activities.

It has received $9 million from Open Philanthropy (1, 2) and just under $1 million from the Survival and Flourishing Fund.

I have a good impression of the Center for AI Safety, but it's not one of my top candidates because (1) it's already well-funded and (2) it has done some things I really like, but those are diluted by a lot of things I only moderately like.

Center for Human-Compatible AI

Center for Human-Compatible AI does mostly technical research, and some advocacy. To my knowledge, the advocacy essentially consists of Stuart Russell using his influential position to advocate for regulation. While that's good, I don't think Stuart Russell is personally funding-constrained, so I don't think marginal donations to the org will help advocacy efforts.

Center for Long-Term Resilience

The Center for Long-Term Resilience is a think tank focused on reducing "extreme risks", which includes x-risks but also other things. It talks to policy-makers and writes reports. I'll focus on its reports because those are easier to assess.

About half of the org's work relates to AI risk. Some of the AI publications are relevant to x-risk (1); most are marginally relevant (2, 3) or not relevant (4).

I skimmed a few of its reports. Here I will give commentary on two of its reports, starting with the one I liked better.

I'm reluctant to criticize orgs that I think have good intentions, but I think it's more important to accurately convey my true beliefs. And my true belief is that these reports are not good (credence: 75%).

Transforming risk governance at frontier AI companies was my favorite report that I saw from the Center for Long-Term Resilience.

  • This was the only one of the org's recent reports that looked meaningfully relevant to x-risk.
  • The report correctly identifies some inadequacies with AI companies' risk processes. It proposes some high-level changes that I expect would have a positive impact.
  • That said, I don't think the changes would have a big impact. The proposal would make more sense for dealing with typical risks that most industries see, but it's not (remotely) sufficient to prepare for extinction risks. Indeed, the report proposes using "best practice" risk management. Best practice means standard, which means insufficient for x-risk. (And best practice means well-established, which means well-known, which means the marginal value of proposing it is small.)
  • The report implies that we should rely on voluntary compliance from AI companies. It proposes that companies should use external auditors, but not that those auditors should have any real power.
  • An illustrative quote from the Risk Oversight section: "Although they should not make the final decisions, the specialist risk and assurance [advisors] should play a 'challenger' role, pressure testing the business’s plans and decisions to ensure they are risk-informed." I disagree. Risk advisors should have veto power. The CEO should not have unilateral authority to deploy dangerous models.
  • The report has little in the way of concrete recommendations. Most of the recommendations are non-actionable—for example, "build consensus within business and civil society about the importance of more holistic risk management". Ok, how specifically does one do that?
  • Contrast this with the model legislation from the Center for AI Policy, where the one-page executive summary made proposals that were easier to understand, more concrete, and more relevant to x-risk.

Another example of a report, which I liked less: Response to ‘Establishing a pro-innovation approach to regulating AI’ (a reply to a request for proposals by the UK Office of AI).

  • The report makes four high-level proposals, all of which I dislike:
    1. "Promoting coherence and reducing inefficiencies across the regulatory regime" – Nobody needs to be told to reduce inefficiency. The only reason why any process is inefficient is because people don't know how to make it more efficient. How exactly am I supposed to reduce inefficiency? (This quote comes from the executive summary, where I can forgive some degree of vagueness, but the full report does not provide concrete details.)
    2. "Ensuring existing regulators have sufficient expertise and capacity" – Again, this is an applause light, not a real suggestion. No one thinks regulators should have insufficient expertise or capacity.
    3. "Ensuring that regulatory gaps can be identified and addressed" – More of the same.
    4. "Being sufficiently adaptive to advances in AI capabilities" – More of the same.
  • The report suggests regulating all AI with a single body rather than diffusely. I like this idea—if a regulatory body is going to prevent x-risk, it probably needs to have broad authority. (Except the report also says "we do not necessarily think [the regulator] needs to be a single body", which seems to contradict its earlier recommendation.)
  • The report says "It will become increasingly important to distribute responsibility across the entire supply chain of AI development". I think that's a good idea if it means restricting sales and exports of compute hardware. But it doesn't say that explicitly (in fact it provides no further detail at all), and I don't think policy-makers will interpret it that way.
  • "Recognise that some form of regulation may be needed for general-purpose systems such as foundation models in future." I would have written this as: "Recognize that strict regulation for general-purpose systems is urgently needed." Stop downplaying the severity of the situation.
  • If I were writing this report, I would have included evidence/reasoning on why AI risk (x-risk and catastrophic risk) is a major concern, and what this implies about how to regulate it. The report doesn't include any arguments that could change readers' minds.
  • In conclusion, this report is mostly vacuous. It contains some non-vacuous proposals in the full text (not represented in the executive summary), but the non-vacuous proposals aren't particularly concrete and aren't particularly useful for reducing x-risk.

An alternative interpretation is that the Center for Long-Term Resilience wants to build influence by writing long and serious-looking reports that nobody could reasonably disagree with. As I touched on previously, I'm not optimistic about this strategy. I disapprove of deceptive tactics, and I think it's a bad idea even on naive consequentialist grounds (i.e., it's not going to work as well as writing actionable reports would). And—perhaps more importantly—if the org's reports are low quality, then I can't trust that it does a good job when working with policy-makers.

Center for Security and Emerging Technology (CSET)

Center for Security and Emerging Technology does work on AI policy along with various other topics. It has received $105 million from Open Philanthropy.

I wouldn't donate to CSET because it has so much funding already, but I took a brief look at its publications.

The research appears mostly tangential or unrelated to x-risk, instead covering subjects like cybersecurity, deceptive/undesirable LLM output, and how the US Department of Defense can use AI to bolster its military power—this last report seems harmful on balance. Some of its reports (such as Enabling Principles for AI Governance) have the previously-discussed problem of being mostly vacuous/non-actionable.

CSET also works to put researchers into positions where they can directly influence policy (source).[41] Allegedly, CSET has considerable political influence, but I haven't identified any visible benefits from that influence (contrast with the Center for AI Safety, which wrote SB 1047). The most legible result I can find is that CSET has collaborated with the Department of Defense; without knowing the details, my prior is that collaborating with DOD is net negative. I would prefer the DOD to be less effective, not more. (Maybe CSET is convincing the DOD not to build military AI but I doubt it; CSET's reports suggest the opposite.)

CSET has the same issue as the Center for Long-Term Resilience: if your public outputs are low-quality (or even net harmful), then why should I expect your behind-the-scenes work to be any better?

Centre for Long-Term Policy

Centre for Long-Term Policy operates in Norway and focuses on influencing Norwegian policy on x-risk, longtermism, and global health.

I didn't look into it much because I think Norwegian AI policy is unlikely to matter—superintelligent AI will almost certainly not be developed in Norway, so Norwegian regulation has limited ability to constrain AI development.

From skimming its publications, they mostly cover subjects other than AI x-risk policy.

The Centre for Long-Term Policy received an undisclosed amount of funding from Open Philanthropy in 2024.

Centre for the Governance of AI

Centre for the Governance of AI does alignment research and policy research. It appears to focus primarily on the former, which, as I've discussed, I'm not as optimistic about. (And I don't like policy research as much as policy advocacy.)

Its policy research seems mostly unrelated to x-risk, for example it has multiple reports on AI-driven unemployment (1, 2).

My favorite of its published reports is Lessons from the Development of the Atomic Bomb. It's written by Toby Ord, who doesn't work there.

Centre for the Governance of AI has received $6 million from Open Philanthropy.

The org appears reasonably well-funded. I don't have major complaints about its work, but (1) the work does not look particularly strong and (2) it doesn't cover the focus areas that I'm most optimistic about.

CivAI

CivAI raises awareness about AI dangers by building interactive software to demonstrate AI capabilities, for example AI-powered cybersecurity threats.

This org is new which makes it difficult to evaluate. It appears to have the same theory of change as Palisade Research (which I review below), but I like Palisade better, for three reasons:

  1. None of CivAI's work so far appears relevant to x-risk. For example, its most recent demo focuses on generating fake images for deceptive purposes.
  2. I think Palisade's methods for demonstrating capabilities are more likely to get attention (credence: 65%).
  3. I'm more confident in Palisade's ability to communicate with policy-makers.

CivAI does not appear to be seeking donations. There is no option to donate through the website.

Control AI

Control AI runs advocacy campaigns on AI risk.

Its current campaign proposes slowing AI development such that no one develops superintelligence for at least the next 20 years, then using this time to establish a robust system for AI oversight. The campaign includes a non-vacuous proposal for the organizational structure of a regulatory body.

Control AI has a paper on AI policy that appears reasonable:

  • It acknowledges that voluntary commitments from AI companies are insufficient.
  • It proposes establishing an international regulatory body that (1) imposes a global cap on the computing power used to train an AI system and (2) mandates safety evaluations.
  • It proposes that regulators should have the authority to halt deployment of a model they deem excessively dangerous.

The campaign's proposal is similar. It lays out the most concrete plan I've seen for how to get to a place where we can solve AI alignment.

I listened to a podcast with Andrea Miotti, co-founder of Control AI. He mostly covered standard arguments for caring about AI x-risk, but he also made some insightful comments that changed my thinking a bit.[42]

I like the concept of Control AI's latest campaign, but I don't know how much impact it will have.[43]

Control AI's past campaigns (example) have received media coverage (example) and their policy objectives have been achieved, although it's not clear how much of a causal role Control AI played in achieving those objectives, or what Control AI actually did. Control AI clearly deserves some credit, or else news outlets wouldn't cite it.

Control AI might be as impactful as other advocacy orgs that I like, but I have more uncertainty about it, so it's not a top candidate. It would be fairly easy to change my mind about this.

I couldn't find any information about Control AI's funding situation, and I didn't inquire because it wasn't one of my top candidates.

Existential Risk Observatory

Existential Risk Observatory writes media articles on AI x-risk, does policy research, and publishes policy proposals (see pdf with a summary of proposals).

  • It appears to be having some success bringing public attention to x-risk via mainstream media, including advocating for a pause in TIME (jointly with Joep Meindertsma of PauseAI).
  • Its policy proposals are serious: it proposes implementing an AI pause, tracking frontier AI hardware, and explicitly recognizing extinction risk in regulations.
  • The research mainly focuses on public opinion, for example opinions on AI capabilities/danger (pdf) and message testing on an AI moratorium (pdf).

Existential Risk Observatory is small and funding-constrained, so I expect that donations would be impactful.

My primary concern is that it operates in the Netherlands. Dutch policy is unlikely to have much influence on x-risk—the United States is the most important country by far, followed by China. And a Dutch organization likely has little influence on United States policy. Existential Risk Observatory can still influence public opinion in America (for example via its TIME article), but I expect a US-headquartered org to have a greater impact.

Future of Life Institute (FLI)

FLI has done some good advocacy work like the 6-month pause letter (which probably reduced x-risk). It also has a $400 million endowment, so I don't think it needs any donations from me.

Future Society

The Future Society seeks to align AI through better governance. I reviewed some of its work, and it looks almost entirely irrelevant to x-risk.

Of The Future Society's recent publications, the most concrete is "List of Potential Clauses to Govern the Development of General Purpose AI Systems" (pdf). Some notes on this report:

  • The Future Society collected recommendations from industry staff, independent experts, and engineers from frontier labs. Engineers from frontier labs should not be trusted to produce recommendations, any more than petroleum engineers should be trusted to set climate change policy.
  • The proposals for mitigating harmful behavior are mostly vacuous and in some cases harmful. They largely amount to: keep building dangerous AI, but do a good job of making it safe.
  • "Use the most state-of-the-art editing techniques to erase capabilities and knowledge that are mostly useful for misuse." That's not going to work. (Palisade Research has demonstrated that it's easy to remove safeguards from LLMs.)
  • "Use state-of-the-art methods and tools for ensuring safety and trustworthiness of models, such as mechanistic interpretability." This sentence makes me think the authors don't have a good understanding of AI safety. The state of the art in mechanistic interpretability is nowhere close to being able to ensure the trustworthiness of models. We still have virtually no idea what's going on inside large neural networks.
  • The report proposes using the same industry-standard risk management model that the Center for Long-Term Resilience proposed. The same criticisms apply—this model is obvious enough that you don't need to propose it, and severely insufficient for mitigating extinction risks.
  • The report proposes "air gapping & sandboxing, no internet access" for powerful models. I feel like I shouldn't need to explain why that won't work.

Another report (pdf) submitted in response to the EU AI Act discussed seven challenges of "general-purpose AI". The second challenge is "generalization and capability risks, i.e. capability risks, societal risks and extinction risks". There is no further discussion of extinction risk, and this is the only place that the word "extinction" appears in any of The Future Society's materials. (The word "existential" appears a few times, but existential risks are not discussed.)

Horizon Institute for Public Service

Horizon Institute for Public Service runs a fellowship where it places people into positions in governments and think tanks. It claims to be reasonably successful. (I do not have much of an opinion as to how much credit Horizon Institute deserves for its fellows' accomplishments.)

Horizon Institute has received an undisclosed amount of funding from Open Philanthropy (along with some other big foundations).

Do Horizon fellows care about x-risk, and does their work reduce x-risk in expectation? Politico alleges that the Horizon Institute is a clandestine plot to get governments to care more about x-risk. I'm not a fan of clandestine plots, but that aside, should I expect Horizon fellows to reduce x-risk?

Most of their work is not legible, so I'm skeptical by default. Caring about x-risk is not enough to make me trust you. Some people take totally the wrong lessons from concerns about x-risk (especially AI risk) and end up increasing it instead. Case in point: OpenAI, DeepMind, and Anthropic all had founders who cared about AI x-risk, and two of those (OpenAI + Anthropic) were founded with the explicit mission of preventing extinction. And yet OpenAI is probably the #1 worst thing that has ever happened in terms of increasing x-risk, and DeepMind and Anthropic aren't much better.

I reviewed all the highlighted accomplishments of fellows that looked relevant to AI:

  1. AI risk management standards for NIST. Only marginally relevant to x-risk, but not bad.
  2. An article on how we shouldn't worry about x-risk (!!).
  3. Auditing tools for AI equity. Unrelated to x-risk.
  4. Detecting AI fingerprints. Marginally related to x-risk.
  5. Autonomous cyber defense. Increasing the capabilities of cybersecurity AI is plausibly net negative.[44]
  6. An article on the EU AI Act. Non-vacuous and discusses AI risk (not exactly x-risk, but close). Vaguely hints at slowing AI development.

In my judgment after taking a brief look, 3/6 highlighted writings were perhaps marginally useful for x-risk, 1/6 was irrelevant, and 2/6 were likely harmful. None were clearly useful.

Zvi Mowshowitz wrote:

In my model, one should be deeply skeptical whenever the answer to ‘what would do the most good?’ is ‘get people like me more money and/or access to power.’

I agree, but even beyond that, the Horizon fellows don't seem to be "people like me". They include people who are arguing against caring about x-risk.

I believe the world would be better off if Horizon Institute did not exist (credence: 55%).

And if I'm wrong about that, it still looks like Horizon fellows don't do much work related to x-risk, so the expected value of Horizon Institute is low.

Institute for AI Policy and Strategy

Institute for AI Policy and Strategy does policy research, focused on US AI regulations, compute governance, lab governance, and international governance with China.

I'm more optimistic about advocacy than policy research, so this org is not one of my top candidates. That said, having briefly read some of its research, I like it better than most AI policy research orgs.

Institute for AI Policy and Strategy has received just under $4 million from Open Philanthropy (1, 2), and is seeking additional funding.

Lightcone Infrastructure

Lightcone runs LessWrong and an office that Lightcone calls "Bell Labs for longtermism".

Lightcone has a detailed case for impact on Manifund. In short, Lightcone maintains LessWrong, and LessWrong is upstream of a large quantity of AI safety work.

I believe Lightcone has high expected value and it can make good use of marginal donations.

By maintaining LessWrong, Lightcone somewhat improves many AI safety efforts (plus efforts on other beneficial projects that don't relate to AI safety). If I were very uncertain about what sort of work was best, I might donate to Lightcone as a way to provide diffuse benefits across many areas. But since I believe (a specific sort of) policy work has much higher EV than AI safety research, I believe it makes more sense to fund that policy work directly.

An illustration with some made-up numbers: Suppose that

  1. There are 10 categories of AI safety work.
  2. Lightcone makes each of them 20% better.
  3. The average AI safety work produces 1 utility point.
  4. Well-directed AI policy produces 5 utility points.

Then a donation to Lightcone is worth 2 utility points, and my favorite AI policy orgs are worth 5 points. So a donation to Lightcone is better than the average AI safety org, but not as good as good policy orgs.
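
For readers who prefer it spelled out, here is a minimal sketch of the same back-of-the-envelope comparison, using only the made-up numbers above (the variable names are mine, not anything Lightcone or the policy orgs use):

```python
# Back-of-the-envelope comparison using the made-up numbers above.
n_categories = 10        # categories of AI safety work
lightcone_boost = 0.20   # Lightcone makes each category 20% better
avg_safety_value = 1.0   # utility points from the average AI safety work
policy_value = 5.0       # utility points from well-directed AI policy

# A donation to Lightcone improves every category a little.
lightcone_value = n_categories * lightcone_boost * avg_safety_value

print(lightcone_value)   # 2.0 -> better than the average AI safety org (1.0)
print(policy_value)      # 5.0 -> but below well-directed policy work
```

Under these assumptions, the diffuse improvement beats the average direct donation but falls short of the best policy work, which is the point of the illustration.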

Machine Intelligence Research Institute (MIRI)

MIRI used to do exclusively technical research. In 2024, it pivoted to focus on policy advocacy—specifically, advocating for shutting down frontier AI development. MIRI changed its mind at around the same time I changed my mind.

Some observations:

  • MIRI gets considerable credit for being the first to recognize the AI alignment problem.
  • I have a high opinion of the general competence of MIRI employees.
  • Historically, I have agreed with MIRI's criticisms of most technical alignment approaches, which suggests they have good reasoning processes. (With the caveat that I don't really understand technical alignment research.)
  • Eliezer Yudkowsky's TIME article publicly argued for an AI pause and brought some attention to the issue (both positive and negative). My sense is that the article was valuable, but who knows.
  • Eliezer personally has a strong track record of influencing (some subset of) people with the LessWrong sequences and Harry Potter and the Methods of Rationality.
  • I know that MIRI is serious about existential risk and isn't going to compromise its values.
  • Eliezer believes animals are not moral patients, which is kind of insane but probably not directly relevant. (Rule thinkers in, not out.)
  • MIRI (or at least Eliezer) says P(doom) > 95%.[45] Some people say this is crazy high and that it makes MIRI want to do dumb stuff like shutting down AI. I do think 95% is too high, but I think most people are kind of crazy about probability—they treat probabilities less than 50% as essentially 0%. Like if your P(doom) is 40%, you should be doing the same thing that MIRI is doing.[46] You should not be trying to develop AI as fast as possible while funding a little safety research on the side.

MIRI's new communications strategy has produced few results so far. We know that MIRI is working on a new website that explains the case for x-risk; a book; and an online reference. It remains to be seen how useful these will be. They don't seem like obviously good ideas to me,[47] but I expect MIRI will correct course if a strategy isn't working.

Until recently, MIRI was not seeking funding because it received some large cryptocurrency donations in 2021ish. Now it's started fundraising again to pay for its new policy work.

I consider MIRI a top candidate. It only recently pivoted to advocacy so there's not much to retrospectively evaluate, but I expect its work to be impactful.

Manifund

Manifund does not do anything directly related to AI policy. It's a fundraising platform. But I'm including it in this list because I'm impressed by how it's changed the funding landscape.

Many orgs have written fundraising pitches on Manifund, and some of these pitches are much higher quality than what I'm used to. I'm not sure why—maybe Manifund's prompt questions draw out good answers.

For example, originally I was skeptical that donations to Lightcone Infrastructure could be competitive with top charities, but its Manifund page changed my mind. I donated $200 just as a reward for the excellent writeup.

Many of the orgs on my list (especially the smaller ones) wrote detailed pitches on Manifund that helped me decide where to donate. Manifund deserves part of the credit for that.

Manifund is free to use, but it sometimes asks large donors to give a percentage of their donations to cover its operating costs. Manifund didn't ask me to do that, so I didn't.

Model Evaluation and Threat Research (METR)

METR evaluates large AI models to look for potentially dangerous capabilities. Its most obvious theory of change—where it finds a scary result and then the AI company pauses development—mainly depends on (1) AI companies giving access to METR (which they often don't) and (2) AI companies ceasing model development when METR establishes harmful capabilities (which they probably won't—if there's any ambiguity, they will likely choose the interpretation that lets them keep making more money).

There's an indirect but more promising theory of change where METR demonstrates a template for capability evaluation which policy-makers then rely on to impose safety regulations. To that end, METR has engaged with NIST's AI risk management framework (pdf). This sounds potentially promising but it's not where I would put money on the margin because:

  1. I don't think we should wait to figure out a solid evaluation framework before writing regulations.
  2. Evaluations are helpful if we want to conditionally pause AI in the future, but not relevant if we want to unconditionally pause AI right now, and I believe we should do the latter.

Palisade Research

Palisade builds demonstrations of the offensive capabilities of AI systems, with the goal of illustrating risks to policy-makers.

Some thoughts:

  • Demonstrating capabilities is probably a useful persuasion strategy.
  • Palisade has done some good work, like removing safety fine-tuning from Meta's LLM.
  • I know some of the Palisade employees and I believe they're competent.
  • Historically, Palisade has focused on building out tech demos. I'm not sure how useful this is for x-risk, since you can't demonstrate existentially threatening capabilities until it's too late. Hopefully, Palisade's audience can extrapolate from the demos to see that extinction is a serious concern.
  • Soon, Palisade plans to shift from primarily building demos to primarily using those demos to persuade policy-makers.
  • Palisade has a smallish team and has reasonable room to expand.

Palisade has not been actively fundraising, but I believe it can put funding to good use—it has limited runway and wants to hire more people.

I think the work on building tech demos has rapidly diminishing utility, but Palisade is hiring for more policy-oriented roles, so I believe that's mostly where marginal funding will go.

PauseAI Global

(PauseAI Global and PauseAI US share the same mission and used to be part of the same org, so most of my comments on PauseAI Global also apply to PauseAI US.)

From Manifund:

PauseAI is a grassroots community of volunteers which aim to inform the public and politicians about the risks from superhuman AI and urge them to work towards an international treaty that prevents the most dangerous AI systems from being developed.

PauseAI is largely organised through local communities which take actions to spread awareness such as letter writing workshops, peaceful protests, flyering and giving presentations.

Historically, I've been skeptical of public protests. I think people mainly protest because it's fun and it makes them feel like they're contributing, not because it actually helps.[48] But PauseAI has been appropriately thoughtful (1, 2) about whether and when protests work, and it makes a reasonable case that protesting can be effective.

(See also the Protest Outcomes report by Social Change Lab. The evidence for the effectiveness of protests is a bit stronger than I expected.[49])

I'm skeptical of the evidence because I don't trust sociology research (it has approximately the worst replication record of any field). But I like PauseAI because:

  • Approximately zero percent of AI dollars go to AI safety, and within AI safety, approximately zero percent of dollars go to public advocacy.
  • Polls suggest that there's widespread public support for pausing AI, and PauseAI has a good shot at converting that public support into policy change.
  • The people running PauseAI seem to have a good idea of what they're doing, and it's apparent that they are seriously concerned about existential risk (for most AI policy orgs, I can't tell whether they care).
  • My impression is that the PauseAI founders went through a similar reasoning process as I did, and concluded that public advocacy was the most promising approach.
  • I've listened to interviews and read articles by leaders from a number of AI policy orgs, and I like the vibes of the PauseAI leaders the best. Many people working in AI safety have missing moods, but the PauseAI people do not. I don't put too much weight on vibes, but they still get nonzero weight.

Broadly speaking, I'm a little more optimistic about advocacy toward policy-makers than advocacy toward the public, simply because it's more targeted. But PauseAI is still a top candidate because its approach is exceptionally neglected.

PauseAI Global has no full-time employees; it focuses on supporting volunteers who run protests.

PauseAI US

PauseAI US organizes protests to advocate for pausing AI development.

Unlike PauseAI Global which has no full-time employees, PauseAI US has a small full-time staff who run protests and political lobbying efforts. I like PauseAI US a little better than PauseAI Global because most major AI companies are headquartered in the US, so I expect a US-based org to have more potential for impact.

PauseAI US also does grassroots lobbying (e.g., organizing volunteers to write letters to Congress) and direct lobbying (talking to policy-makers).

Grassroots lobbying makes sense as a neglected intervention. Direct lobbying isn't quite as neglected but it's still one of my favorite interventions. PauseAI US only has a single lobbyist right now, Felix De Simone. He's more junior than the lobbyists at some other policy orgs, but based on what I know of his background, I expect him to do a competent job.[50] PauseAI US is performing well on obvious surface-level metrics like "number of meetings with Congressional offices per person per month".

Sentinel rapid emergency response team

Sentinel monitors world events for potential precursors to catastrophes. It publishes a weekly newsletter with events of interest (such as "Iran launched a ballistic missile attack on Israel" or "Two people in California have been infected with bird flu").

Sentinel's mission is to alert relevant parties so that looming catastrophes can be averted before they happen.

You can read more information on Sentinel's Manifund page (short) and fundraising memo (long).

Some thoughts:

  • I believe almost nobody would do a good job of running Sentinel because it's hard to identify early warning signals of catastrophes. But Sentinel is run by members of Samotsvety Forecasting, who I expect to be unusually good at this.
  • The value of Sentinel depends on who's paying attention to its reports, and I don't know who that is.
  • Sentinel isn't immediately relevant to AI policy, but it could be extremely valuable in certain situations. Namely, it could provide early warning if AI x-risk rapidly increases due to some series of events.
  • AI x-risk aside, I still think Sentinel has high EV because it could significantly reduce catastrophic risk on a small budget. I haven't investigated those cause areas, but Sentinel is tentatively my #1 donation pick for reducing nuclear and biological x-risks.

Sentinel currently has four team members working part-time. With additional funding, they could work full-time, and Sentinel could hire more people and do more comprehensive monitoring.

Simon Institute for Longterm Governance

Simon Institute supports policies to improve coordination, reduce global catastrophic risks, and embed consideration for future generations. It specifically focuses on influencing United Nations policy. (See the Year One Update from 2022.)

Most of the org's work does not appear very relevant to x-risk.

This work seems reasonably good, but not as high-impact as work that directly targets x-risk reduction.

Stop AI

Like PauseAI, Stop AI protests the development of superintelligent AI. Unlike PauseAI, Stop AI uses disruptive tactics like blocking entrances to OpenAI offices and blocking traffic.

This is a higher-variance strategy. I find it plausible that Stop AI's tactics are especially effective, but it also seems likely that they will backfire and decrease public support. So in the absence of supporting evidence, I'm inclined not to support Stop AI.

Stop AI's proposal seems overreaching (it wants to permanently ban AGI development) and it makes weak arguments.

From listening to an interview, I get the impression that the Stop AI founders aren't appropriately outcome-oriented and don't have a well-formulated theory of change. In the interview, they would offer reasoning for why they took a particular action, and when the interviewer pointed out that the justification didn't explain their behavior, they would switch to a different explanation. An example (paraphrased):

Stop AI: We blocked an entrance to OpenAI's office to make it harder for employees to build AGI. We feel that this is necessary to stop OpenAI from killing everyone.

Interviewer: Then why did you block traffic, since that affects innocent bystanders, not the people building AGI?

Stop AI: We need to block traffic to raise awareness.

This pattern occurred a few times. They have reasonable concerns about the dangers of AI, but they don't seem to have a good justification for why disruptive protests are the best way to handle those concerns.

(I can see an argument for blocking entrances to AI company offices, but I think the argument for blocking traffic is much weaker.)

In short, Stop AI is spiritually similar to PauseAI but with worse reasoning, worse public materials, and worse tactics.

Where I'm donating

My top candidates:

  1. AI Safety and Governance Fund
  2. PauseAI US
  3. Center for AI Policy
  4. Palisade
  5. MIRI

A classification of every other org I reviewed:[51]

Good but not funding-constrained: Center for AI Safety, Future of Life Institute

Would fund if I had more money: Control AI, Existential Risk Observatory, Lightcone Infrastructure, PauseAI Global, Sentinel

Would fund if I had a lot more money, but might fund orgs in other cause areas first:[52] AI Policy Institute, CEEALAR, Center for Human-Compatible AI, Manifund

Might fund if I had a lot more money: AI Standards Lab, Centre for the Governance of AI, Centre for Long-Term Policy, CivAI, Institute for AI Policy and Strategy, METR, Simon Institute for Longterm Governance

Would not fund: Center for Long-Term Resilience, Center for Security and Emerging Technology, Future Society, Horizon Institute for Public Service, Stop AI

Prioritization within my top five

Here's why I ordered my top five the way I did.

#1: AI Safety and Governance Fund

This is my top choice because:

  • It could greatly improve the value of future communications efforts.
  • It's cheap, so even a modest impact would make it highly cost-effective.

It would drop down the list quickly if it received more funding, but right now it's #1.

#2: PauseAI US

I would expect advocacy toward policy-makers to be more impactful than public advocacy if they had similar levels of funding (credence: 60%). But pause protests are extremely neglected, so I believe they're the most promising strategy on the margin. And PauseAI US is my favorite org doing protests because it operates in the United States and it appears appropriately competent and thoughtful.

Protests are especially unpopular among institutional funders, which makes them more promising for individual donors like me.

#3: Center for AI Policy

This is one of only three orgs (along with Palisade and PauseAI US) that meet four criteria:

  1. works to persuade policy-makers
  2. focuses on AI x-risk over other less-important AI safety concerns[53]
  3. focuses on United States policy[54]
  4. is funding-constrained[55]

I'm nearly indifferent between Center for AI Policy and Palisade. I slightly prefer the former because (1) its employees have more experience in politics and (2) its mission/messaging seems less palatable to institutional funders so I expect it to have a harder time raising money.

#4: Palisade

Palisade meets the same four criteria as Center for AI Policy. As a little twist, Palisade also builds tech demos with the purpose of demonstrating the dangers of AI to policy-makers. Those demos might help or they might not be worth the effort—both seem equally likely to me—so this twist doesn't change my expectation of Palisade's cost-effectiveness. I only slightly favor Center for AI Policy for the two reasons mentioned previously.

I personally know people at Palisade, which I think biases me in its favor, and I might put Palisade at #3 if I wasn't putting in mental effort to resist that bias.

#5: MIRI

MIRI plans to target a general audience, not policy-makers (update 2024-11-20: see correction below). That means they can reach more people but it's also lower leverage. My guess is that targeting a general audience is worse on balance.

I put PauseAI US higher than the two lobbying orgs because it has such a small budget. Like PauseAI US, MIRI's strategies are also neglected, but considerably less so.[56] I expect policy-maker outreach to be more effective than MIRI's approach (credence: 60%).

Lest I give the wrong impression, MIRI is still my #5 candidate out of 28 charities. I put it in the top five because I have a high opinion of MIRI leadership—I expect them to have reasonable prioritization and effective execution.

CORRECTION: MIRI's technical governance team does research to inform policy, and MIRI has spoken to policy-makers in the US government. This bumps up my evaluation of the org but I'm keeping it at #5 because working with policy-makers is only one part of MIRI's overall activities.

Where I'm donating (this is the section in which I actually say where I'm donating)

I agree with the standard argument that small donors should give all their money to their #1 favorite charity. That's how I've done it in the past, but this year I'm splitting my donations a little bit. I plan on donating:

  • $5,000 to AI Safety and Governance Fund
  • $5,000 to PauseAI Global[57]
  • $30,000 to PauseAI US

Here's why I'm splitting my donations:

  1. AI Safety and Governance Fund is small, and I don't want to represent too big a portion of its budget.
  2. I donated to PauseAI Global before writing this post, and my prioritization changed somewhat after writing it.[57:1]
  3. That leaves PauseAI US as my top candidate, so the rest of my donations will go there.

I already donated $5,000 to PauseAI Global, but I haven't made the other donations yet, so commenters have a chance to convince me to change my mind.

If you wish to persuade me privately (or otherwise discuss in private), you can email me at donations@mdickens.me or message me on the EA Forum.


  1. At least the highly effective kinds of animal welfare. Things like animal shelters get a lot of funding but they're not highly effective. ↩︎

  2. I believe humans are the most important species, but only because we will shape the future, not because we matter innately more.

    To be precise, I believe any individual human's welfare probably innately matters more than an individual animal of any other species. But there are so many more animals than humans that animals matter much more in aggregate. ↩︎

  3. I did change my mind in one relevant way—I used to think AI policy advocacy was very unlikely to work, and now I think it has a reasonable chance of working. More on this later. ↩︎

  4. So maybe I was being reasonable before and now I'm over-weighting AI risk because I'm worried about getting killed by AI? If I'm being irrational right now, how would I know? ↩︎

  5. I studied computer science in university and I took three AI classes: Intro to AI, Intro to Machine Learning, and Convolutional Neural Networks for Natural Language Processing. My grades were below average but not terrible. ↩︎

  6. Some alignment researchers think we're not that far away from solving AI alignment. I won't go into detail because I don't think I can do a great job of explaining my views. An informed alignment researcher could probably write >100,000 words detailing the progress in various subfields and predicting future progress to estimate how close we are to solving alignment—something like this or this, but with more analysis and prediction—and some other informed alignment researcher could do the same thing and come up with a totally different answer.

    Feel free to change the numbers on my quantitative model and write a comment about what answer you got. ↩︎

  7. Like if "just use reinforcement learning to teach the AI to be ethical" turns out to work. (Which I doubt, but some people seem to think it will work so idk.) ↩︎

  8. I don't remember exactly what I used to believe. Maybe something like, "AI policy advocacy could be a good idea someday once AI looks more imminent and there's more political will, but it's not a good idea right now because people will think you're crazy." ↩︎

  9. "Replace Gavin Newsom with a governor with more integrity" might be an effective intervention, but probably not cost-effective—there's already too much money in state elections. ↩︎

  10. I realized midway through writing this post that I had made this major cause prioritization decision without even making up some numbers and slapping them into a Monte Carlo simulation, which was very out of character for me. ↩︎

  11. More accurately, I believe we live in one of two worlds:

    1. Prosaic alignment works, in which case we will probably not have any lack of funding for AI alignment, and my marginal donation has a small chance of making a difference.
    2. Alignment is hard, in which case we will probably not have nearly enough funding (assuming no regulation), and my marginal donation has a small chance of making a difference.

    And really there's a continuum between those, so there's a small chance that AI alignment is at just the right level of difficulty for marginal donations to make a difference. ↩︎

  12. There's a lot of spending on so-called safety research that's really just safetywashing. I wouldn't count that as part of my estimate. ↩︎

  13. For example, with the default inputs, the cost to solve alignment is set to a log-normal distribution with 25th/75th percentiles at $1 billion and $1 trillion. If you tighten the distribution to $10 billion to $1 trillion, the marginal value of spending on alignment research drops to ~0. ↩︎

  14. I've met individual researchers who said this and I don't think they were lying, but I think their beliefs were motivated by a desire to build SOTA models because SOTA models are cool. ↩︎

  15. I'm slightly concerned that I shouldn't pick on Anthropic because they're the least unethical of the big AI companies (as far as I can tell). But I think when you're building technology that endangers the lives of every sentient being who lives and who ever will live, you should be held to an extremely high standard of honesty and communication, and Anthropic falls embarrassingly, horrifyingly short of that standard. As they say, reality does not grade on a curve. ↩︎

  16. I would actually go further than that—I think the best types of alignment research don't require SOTA models. But that's more debatable, and it's not required for my argument. ↩︎

  17. Suppose conservatively that the lightcone will be usable for another billion years, and that we need to delay superintelligent AI by 100 years to make it safe. The volume of the lightcone is proportional to time cubed. Therefore, assuming a constant rate of expansion, delaying 100 years means we can only access 99.99997% of the lightcone instead of 100%. Even at an incredibly optimistic P(doom) (say, 0.001%), accelerating AI isn't worth it on a naive longtermist view. ↩︎

  18. I'm making an ad hominem argument on purpose. Altman's arguments seem bad to me. Maybe I'm missing something because he understands AI better than I do, but it looks like the better explanation isn't that I'm missing something, but that Altman is genuinely making bad arguments because his reasoning is motivated—and he's a known liar, so I'm perfectly happy to infer that he's lying about this issue too. ↩︎

  19. Perhaps I'm using a circular definition of "respectable" but I don't consider someone respectable (on this particular issue) if they estimate P(doom) at <1%. ↩︎

  20. I put roughly 80% credence on each of five arguments. If each argument is independent, that means I'm probably wrong about at least one of them, but I don't think they're independent.

    And some of the arguments are more decisive than others. For example, if I'm wrong about the opportunity cost argument (#4), then I should switch sides. But if I'm wrong about the hardware overhang argument (#1) and overhang is indeed a serious concern, that doesn't necessarily mean we shouldn't slow AI development, it just means a slowdown improves safety in one way and harms safety in another way, and it's not immediately clear which choice is safer. ↩︎

  21. At least for some reasons people cite for not wanting to pause, they should agree with me on this. There are still some counter-arguments, like "we can't delay AI because that delays the glorious transhumanist future", but I consider those to be the weakest arguments. ↩︎

  22. You could argue that pseudoephedrine is over-regulated, and in fact I would agree. But I don't think those regulations are a particularly big problem, either. ↩︎

  23. Companies make more money from more powerful models, and powerful models are more dangerous. Power and safety directly trade off against each other until you can figure out how to build powerful models that aren't dangerous—which means you need to solve alignment first. ↩︎

  24. In some sense, if we get strong regulations, the companies win, because all the companies' employees and shareholders don't get killed by unfriendly AI. But they'll be unhappy in the short term because they irrationally prioritize profit over not getting killed.

    I don't understand what's going on here psychologically—according to the expressed beliefs of people like Dario Amodei and Shane Legg, they're massively endangering their own lives in exchange for profit. It's not even that they disagree with me about key facts, they're just doing things that make no sense according to their own (expressed) beliefs. ↩︎

  25. I'm inclined to say it doesn't matter at all, but some smart alignment researchers think it does, and they know more than me. Changing my mind wouldn't materially change my argument here.

    Paul Christiano, who ~invented RLHF, seems to believe that it is not a real solution to alignment, but improvements on the method might lead to a solution. (He wrote something like this in a comment I read a few days ago that now I can't find.)

    On the other hand, RLHF makes the AI look more aligned even if it isn't, and this might hurt by misleading people into thinking it's aligned, so they proceed with expanding capabilities when really they shouldn't.

    RLHF also makes LLMs less likely to say PR-damaging things. Without RLHF, AI companies might develop LLMs more cautiously out of fear of PR incidents. ↩︎

  26. This doesn't make sense as a crux because an RSP also creates a hardware overhang if it triggers, but I've already talked about why I dislike the "hardware overhang" argument in general. ↩︎

  27. "[thing I don't understand] must be simple, right?" -famous last words ↩︎

  28. At least that's what I thought when I first wrote this sentence, before I had looked into any AI policy orgs. After having looked into them, I found it pretty easy to strike some orgs off the list. I don't know what conversations orgs are having with policy-makers and how productive those conversations are, but I can read their public reports, and I can tell when their public reports aren't good. And if their reports aren't good, they probably don't do a good job of influencing policy-makers either. ↩︎

    • The Long-Term Future Fund (LTFF) has given very little money to AI policy.
    • The AI Risk Mitigation Fund is a spinoff of LTFF that focuses exclusively on AI safety. As of this writing, it hasn't made any grants yet, but I assume it will behave similarly to LTFF.
    • Longview Philanthropy's Emerging Challenges Fund and the Survival and Flourishing Fund have given some grants on AI policy, but mostly on other cause areas.
    • Manifund has a regranting program, but 3 out of 6 regranters are current or former employees at AI companies which makes me disinclined to trust their judgment; and their grants so far mostly focus on alignment research, not policy.
    • Larks used to write reviews of AI safety orgs, but they haven't done it in a while—and they primarily reviewed alignment research, not policy.
    • Nuño Sempere did some shallow investigations three years ago, but they're out of date.
    ↩︎
  29. As of this writing (2024-11-02), the recommendations are:

    1. Horizon Institute for Public Service
    2. Institute for Law and AI
    3. Effective Institutions Project's work on AI governance
    4. FAR AI
    5. Centre for Long-Term Resilience
    6. Center for Security and Emerging Technology
    7. Center for Human-Compatible AI
    ↩︎
  30. I don't know exactly what's going on with the difference between Founders Pledge recs and my top donation candidates. It looks to me like Founders Pledge puts too much stock in the "build influence to use later" theory of change, and it cares too much about orgs' legible status / reputation. ↩︎

  31. Some of these orgs don't exactly work on policy, but do do work that plausibly helps policy. There are some orgs fitting that description that I didn't review (e.g., AI Impacts and Epoch AI). I had to make judgment calls on reviewing plausibly-relevant orgs vs. saving time, and a different reviewer might have made different calls.

    In fact, I think I would be more likely to donate to (e.g.) AI Impacts than (e.g.) METR, so why did I write about METR but not AI Impacts? Mainly because I had already put some thought into METR and I figured I might as well write them down, but I haven't put much thought into AI Impacts. ↩︎

  32. Some Open Philanthropy employees stand to make money if AI companies do well; and Holden Karnofsky (who no longer works at Open Philanthropy, but used to run it) has expressed that he expects us to avert x-risk by an AI company internally solving alignment. ↩︎

  33. It's nice that AI safety is so much better funded than it used to be, but for my own sake, I kind of miss the days when there were only like five x-risk orgs. ↩︎

  34. The names made it difficult for me to edit this post. Many times I would be re-reading a sentence where I referenced some org, and I wouldn't remember which org it was. "Centre for the Governance of AI? Wait, is that the one that runs polls? Or the one that did SB 1047? Er, no, it's the one that spun out of the Future of Humanity Institute."

    Shout-out to Lightcone, Palisade, and Sentinel for having memorable names. ↩︎

  35. But maybe that's a silly thing to worry about. Compare: "I only invest in companies that don't have managers. Managers' salaries just take away money from the employees who do the real work." ↩︎

  36. I'm not convinced that it's good practice, but at least some people believe it is. ↩︎

  37. Also, I'm not sure how much consideration to give this, but I have a vague sense that sharing criticism with the orgs being criticized would hurt my epistemics. Like, maybe if I talk to them, I will become overly predisposed toward politeness and end up deleting accurate criticisms that I should've left in. ↩︎

  38. The word "funge" is not in the dictionary; I would define it as "causing a fungible good [in this case, money] to be used for a different purpose." That is, causing SFF to give some of its money to a different nonprofit. ↩︎

  39. To be honest it's not that funny given the stakes, but I try to find a little humor where I can. ↩︎

  40. The linked source isn't why I believe this claim; I believe it based on things I've heard in personal communications. ↩︎

  41. Examples:

    • Andrea brought up the classic argument that AI becomes really dangerous once it's self-improving. But, he said, it's not clear what exactly counts as self-improving. It's already something like self-improving because many ML engineers use LLMs to help them with work tasks. Andrea proposed that the really dangerous time starts once AI is about as competent as remote workers, because that's when you can massively accelerate the rate of progress. I don't have a strong opinion on whether that's true, but it made me think.
    • Andrea said it's a big problem that we don't have a "science of intelligence". We don't really know what it means for AIs to be smart, all we have is a hodgepodge of benchmarks. We can't properly evaluate AI capabilities unless we have a much better understanding of what intelligence is.
    ↩︎
  42. To be clear: sometimes, when people say "... but I don't know if X", that's a polite way of saying "I believe not-X." In this case, that's not what I mean—what I mean is that I don't know. ↩︎

  43. The report is about defending against cyber-attacks, not about executing cyber-attacks. But it's also about how to increase AI capabilities. (And an AI that's smarter about defending cyber-attacks might also be better at executing them.) I can see a good argument that this work is net negative but there's an argument the other way, too. ↩︎

  44. Several sources claim Eliezer's P(doom) > 95% but their source is a news article and the news article doesn't cite a source. I could not find any direct quote. ↩︎

  45. At least you should have the same goals, if not the same tactics. ↩︎

  46. I would not have thought "write Harry Potter fan fiction" was a good strategy, but I turned out to be wrong on that one. ↩︎

  47. Although to be fair, that's also why most AI "safety" researchers do capabilities research. ↩︎

  48. For example, various studies have looked at natural experiments where protests do or do not occur based on whether it rains, and they find that protesters' positions are slightly more popular when it does not rain. The effect shows up repeatedly across multiple studies of different movements. ↩︎

  49. I don't have meaningful insight into whether any particular person would be good at lobbying. I think I can identify that most people would be bad at it, so the best I can do is fail to find any reasons to expect someone to be bad. I don't see any reasons to expect Felix to be bad, except that he's junior but that's a weak reason. ↩︎

  50. Outside of the top five, I didn't think about these classifications very hard. ↩︎

  51. For example, I would probably fund Good Food Institute ahead of most of these. ↩︎

  52. That screens off most of the policy orgs on my list. ↩︎

  53. That screens off Control AI and Existential Risk Observatory. ↩︎

  54. That screens off Center for AI Safety and Future of Life Institute. ↩︎

  55. Only MIRI is pursuing the sorts of strategies that MIRI is pursuing, but it has >100x more money than PauseAI. ↩︎

  56. When I made the donation to PauseAI Global, I was under the impression that PauseAI Global and PauseAI US were one organization. That was true at one point, but they had split by the time I made this donation. If I had known that, I would have donated to PauseAI US instead. But I'm not bothered by it because I still think donations to PauseAI Global have high expected value.

    Also, when I donated the money, I wasn't planning on writing a whole post. It wasn't until later that I decided to do a proper investigation and write what you're currently reading. That's why I didn't wait before donating. ↩︎ ↩︎

Comments

As a rule of thumb, I don't want to fund anything Open Philanthropy has funded. Not because it means they don't have room for more funding, but because I believe (credence: 80%) that Open Philanthropy has bad judgment on AI policy (as explained in this comment by Oliver Habryka and reply by Akash—I have similar beliefs, but they explain it better than I do).

This seems like a bizarre position to me. Sure, maybe you disagree with them (I personally have a fair amount of respect for the OpenPhil team and their judgement, but whatever, I can see valid reasons to criticise), but to consider their judgement not just irrelevant, but actively such strong negative evidence as to make an org not worth donating to, seems kinda wild. Why do you believe this? Reversed stupidity is not intelligence. Is the implicit model that all of x risk focused AI policy is pushing on some 1D spectrum such that EVERY org in the two camps is actively working against the other camp? That doesn't seem true to me.

I would have a lot more sympathy with an argument that eg other kinds of policy work is comparatively neglected, so OpenPhil funding it is a sign that it's less neglected.

I do have a lot of respect for the Open Phil team; I just think they are making some critical mistakes, which is fully compatible with respectability.

The most straightforward reason is that Open Phil seemingly does not want to fund any AI policy org that explicitly prioritizes x-risk reduction, and doesn't want to fund any org that works against AI companies, and I want to fund orgs that do both of those things. So, even putting neglectedness aside, Open Phil funding an AI policy org is evidence that the org is following a strategy that I don't expect to be effective. That said, this consideration ended up not really being a factor in my decision-making because it's screened off by looking at what orgs are actually doing (I don't need to use heuristics for interpreting orgs' activities if I look at their actual activities).

I do have a lot of respect for the Open Phil team; I just think they are making some critical mistakes, which is fully compatible with respectability

Sorry, my intention wasn't to imply that you didn't respect them, I agree that it is consistent to both respect and disagree.

Re the rest of your comment, my understanding of what you meant is as follows:

You think the most effective strategies for reducing AI x risk are explicitly blacklisted by OpenPhil. Therefore OpenPhil funding an org is strong evidence they don't follow those strategies. This doesn't necessarily mean that the org's work is neutral or negative impact, but it's evidence against being one of your top things. Further, this is a heuristic rather than a confident rule, and you made the time for a shallow investigation into some orgs funded by OpenPhil anyway, at which point heuristics are screened off and can be ignored.

Is this a correct summary?

It's an approximately correct summary except it overstates my confidence. AFAICT Open Phil hasn't explicitly blacklisted any x-risk strategies; and I would take Open Phil funding as weak to moderate evidence, not strong evidence.

Thanks for clarifying! I somewhat disagree with your premises, but agree this is a reasonable position given your premises

"Reversed stupidity is not intelligence" is a surprisingly insightful wee concept that I hadn't heard of before. Had a look at the stuff on LessWrong about it and found it helpful thanks!

I thought it seemed worth flagging that Open Philanthropy recently recommended a grant to Palisade Research. I investigated the grant, and am happy to see that Michael is also excited about their work and included them in his top five.

 

MIRI plans to target a general audience, not policy-makers.

(Writing this while on a flight with bad Wi-Fi, so I’ll keep it brief.)

Just wanted to quickly drop a note to say that we also do work targeted at policymakers, e.g., 

  • we have a new small team at MIRI working on technical governance topics whose work is mostly targeted at folks doing related work and policymakers;
  • I just spent the past week in DC (my third such trip this year) mostly meeting with policymakers; and
  • while our comms work is generally targeted at a more general audience, we usually have policymakers in mind as one (of many) target audiences.

(In general our work more directly targeted at policymakers has been less visible to date. That will definitely continue to be the case for some of it, but I’m hopeful that we’ll have more publicly observable outputs in the future.)

Thank you for the comment! I've added a correction to the post.

Note that we've only received a speculation grant from the SFF and haven’t received any s-process funding. This should be a downward update on the value of our work and an upward update on a marginal donation's value for our work.

I'm waiting for feedback from SFF before actively fundraising elsewhere, but I'd be excited about getting in touch with potential funders and volunteers. Please message me if you want to chat! My email is ms@contact.ms, and you can find me everywhere else or send a DM on EA Forum.

On other organizations, I think:

  • MIRI’s work is very valuable. I’m optimistic about what I know about their comms and policy work. As Malo noted, they work with policymakers, too. Since 2021, I’ve donated over $60k to MIRI. I think they should be the default choice for donations unless they say otherwise.
  • OpenPhil risks increasing polarization and making it impossible to pass meaningful legislation. But while they make IMO obviously bad decisions, not everything they/Dustin fund is bad. E.g., Horizon might place people who actually care about others in places where they could have a huge positive impact on the world. I’m not sure, I would love to see Horizon fellows become more informed on AI x-risk than they currently are, but I’ve donated $2.5k to Horizon Institute for Public Service this year.
  • I’d be excited about the Center for AI Safety getting more funding. SB-1047 was the closest we got to a very good thing, AFAIK, and it was a coin toss on whether it would’ve been signed or not. They seem very competent. I think the occasional potential lack of rigor and other concerns don't outweigh their results. I’ve donated $1k to them this year.
  • By default, I'm excited about the Center for AI Policy. A mistake they plausibly made makes me somewhat uncertain about how experienced they are with DC and whether they are capable of avoiding downside risks, but I think the people who run it are smart and have very reasonable models. I'd be excited about them having as much money as they can spend and hiring more experienced and competent people.
  • PauseAI is likely to be net-negative, especially PauseAI US. I wouldn’t recommend donating to them. Some of what they're doing is exciting (and there are people who would be a good fit to join them and improve their overall impact), but they're incapable of avoiding actions that might, at some point, badly backfire.

    I’ve helped them where I could, but they don’t have good epistemics, and they’re fine with using deception to achieve their goals.

    E.g., at some point, their website represented the view that it’s more likely than not that bad actors would use AI to hack everything, shut down the internet, and cause a societal collapse (but not extinction). If you talk to people with some exposure to cybersecurity and say this sort of thing, they’ll dismiss everything else you say, and it’ll be much harder to make a case for AI x-risk in the future. PauseAI Global’s leadership updated when I had a conversation with them and edited the claims, but I'm not sure they have mechanisms to avoid making confident wrong claims. I haven't seen evidence that PauseAI is capable of presenting their case for AI x-risk competently (though it's been a while since I've looked).

    I think PauseAI US is especially incapable of avoiding actions with downside risks, including deception[1], and donations to them are net-negative. To Michael, I would recommend, at the very least, donating to PauseAI Global instead of PauseAI US; to everyone else, I'd recommend ideally donating somewhere else entirely.

  • Stop AI's views include the idea that a CEV-aligned AGI would be just as bad as an unaligned AGI that causes human extinction. I wouldn't be able to pass their ITT, but yep, people should not donate to Stop AI. The Stop AGI person participated in organizing the protest described in the footnote. 
  1. ^

    In February this year, PauseAI US organized a protest against OpenAI "working with the Pentagon", while OpenAI only collaborated with DARPA on open-source cybersecurity tools and is in talks with the Pentagon about veteran suicide prevention. Most participants wanted to protest OpenAI because of AI x-risk and not because of Pentagon, but those I talked to have said they felt it was deceptive upon discovering the nature of OpenAI's collaboration with the Pentagon. Also, Holly threatened me trying to prevent the publication of a post about this and then publicly lied about our conversations, in a way that can be easily falsified by looking at the messages we've exchanged.

Update: I've received feedback from the SFF round; we got positive evaluations from two recommenders (so my understanding is the funding allocated to us in the s-process was lower than the speculation grant) and one piece of negative feedback. The negative feedback mentioned that our project might lead to EA getting swamped by normies with high inferential distances, which can have negative consequences; and that because of that risk, "This initiative may be worthy of some support, but unfortunately other orgs in this rather impressive lineup must take priority".

If you're considering donating to AIGSI/AISGF, please reach out! My email is ms@contact.ms.

Thanks for the comment! Disagreeing with my proposed donations is the most productive sort of disagreement. I also appreciate hearing your beliefs about a variety of orgs.


A few weeks ago, I read your back-and-forth with Holly Elmore about the "working with the Pentagon" issue. This is what I thought at the time (IIRC):

  • I agree that it's not good to put misleading messages in your protests.
  • I think this particular instance of misleadingness isn't that egregious, it does decrease my expectation of the value of PauseAI US's future protests but not by a huge margin. If this was a recurring pattern, I'd be more concerned.
  • Upon my first reading, it was unclear to me what your actual objection was, so I'm not surprised that Holly also (apparently) misunderstood it. I had to read through twice to understand.
  • Being intentionally deceptive is close to a dealbreaker for me, but it doesn't look to me like Holly was being intentionally deceptive.
  • I thought you both could've handled the exchange better. Holly included misleading messaging in the protest and didn't seem to understand the problem, and you did not communicate clearly and then continued to believe that you had communicated well in spite of contrary evidence. Reading the exchange weakly decreased my evaluation of both your work and PauseAI US's, but not by enough to change my org ranking. You both made the sorts of mistakes that I don't think anyone can avoid 100% of the time. (I have certainly made similar mistakes.) Making a mistake once is evidence that you'll make it more, but not very strong evidence.

I re-read your post and its comments just now and I didn't have any new thoughts. I feel like I still don't have great clarity on the implications of the situation, which troubles me, but by my reading, it's just not as big a deal as you think it is.

General comments:

  • I think PauseAI US is less competent than some hypothetical alternative protest org that wouldn't have made this mistake, but I also think it's more competent than most protest orgs that could exist (or protest orgs in other cause areas).
  • I reviewed PauseAI's other materials, although not deeply or comprehensively, and they seemed good to me. I listened to a podcast with Holly and my impression was that she had an unusually clear picture of the concerns around misaligned AI.
I think PauseAI US is less competent than some hypothetical alternative protest org that wouldn't have made this mistake, but I also think it's more competent than most protest orgs that could exist (or protest orgs in other cause areas).

Yes. In a short-timelines, high p(doom), world, we absolutely cannot let perfect be the enemy of the good. Being typical hyper-critical EAs might have lethal consequences[1]. We need many more people in advocacy if we are going to move the needle, so we shouldn't be so discouraging of the people who are actually doing things. We should just accept that they won't get everything right all the time. 

In a short-timelines world, where inaction means very high p(doom), the bar for counterfactual net-negative[2] is actually pretty high. PauseAI is very far from reaching it.

  1. ^

    Or maybe I should say, "might actually be net-negative in and of itself"(!)

  2. ^

    This term is over-used in EA/LW spaces, to the point where I think people often don't fully think through what they are actually saying when they use it. Is it actually net negative, integrating over all expected future consequences in worlds where it both does and doesn't happen? Or is it just negative?

My top candidates:

  1. AI Safety and Governance Fund
  2. PauseAI US
  3. Center for AI Policy
  4. Palisade
  5. MIRI

A classification of every other org I reviewed:

Good but not funding-constrained: Center for AI Safety, Future of Life Institute

Would fund if I had more money: Control AI, Existential Risk Observatory, Lightcone Infrastructure, PauseAI Global, Sentinel

Would fund if I had a lot more money, but might fund orgs in other cause areas first: AI Policy Institute, CEEALAR, Center for Human-Compatible AI, Manifund

Might fund if I had a lot more money: AI Standards Lab, Centre for the Governance of AI, Centre for Long-Term Policy, CivAI, Institute for AI Policy and Strategy, METR, Simon Institute for Longterm Governance

Would not fund: Center for Long-Term Resilience, Center for Security and Emerging Technology, Future Society, Horizon Institute for Public Service, Stop AI

Your ranking is negatively correlated with my (largely deference-based) beliefs (and I think weakly negatively correlated with my inside view). Your analysis identifies a few issues with orgs-I-support that seem likely true and important if true. So this post will cause me to develop more of an inside view or at least prompt the-people-I-defer-to with some points you raise. Thanks for writing this post. [This is absolutely not an endorsement of the post's conclusions. I have lots of disagreements. I'm just saying parts of it feel quite helpful.]

People who prioritize x-risk often disregard animal welfare (or the welfare of non-human beings, whatever shape those beings might take in the future). ... This isn't universally true—I know some people who care about animals but still prioritize x-risk.

For what it's worth this hasn't been my experience: most of the people I know personally who are working on x-risk (where I know their animal views) think animal welfare is quite important. And for the broader sample where I only know their diet, the majority are at least vegetarian.

My impression is that CLTR mostly adds value via its private AI policy work. I agree its AI publications seem not super impressive but maybe that's OK.

Probably same for The Future Society and some others.

I appreciate the effort you’ve put into this, and your analysis makes sense based on publicly available data and your worldview. However, many policy organizations are working on initiatives that haven’t been/can't be publicly discussed, which might lead you to draw some incorrect conclusions. For example, I'm glad Malo clarified in this comment thread that MIRI does indeed work with policymakers.

Tone is difficult to convey online, so I want to clarify I'm saying the next statement gently: I think if you do this kind of report--that a ton of people are reading and taking seriously--you have some responsibility to send your notes to the mentioned organizations for fact checking before you post.

I also want to note: the EA community does not have good intuitions around how politics works or what kind of information is net productive for policy organizations to share. The solution is not to blindly defer to people who say they understand politics, but I am worried that our community norms actively work against us in this space. Consider checking some of your criticisms of policy orgs with a person who has worked for the US Government; getting an insider's perspective on what makes sense/seems suspicious could be useful. 

I think it's reasonable for a donor to decide where to donate based on publicly available data and to share their conclusions with others. Michael disclosed the scope and limitations of his analysis, and referred to other funders having made different decisions. The implied reader of the post is pretty sophisticated and would be expected to know that these funders may have access to information on initiatives that haven’t been/can't be publicly discussed.

While I appreciate why orgs may not want to release public information on all initiatives, the unavoidable consequence of that decision is that small/medium donors are not in a position to consider those initiatives when deciding whether to donate. Moreover, I think Open Phil et al. are capable of adjusting their own donation patterns in consideration of the fact that some orgs' ability to fundraise from the broader EA & AIS communities is impaired by their need for unusually-low-for-EA levels of public transparency.

"Run posts by orgs" is ordinarily a good practice, at least where you are conducting a deep dive into some issue on which one might expect significant information to be disclosed. Here, it seems reasonable to assume that orgs will have made a conscious decision about what general information they want to share with would-be small/medium donors. So there isn't much reason to expect that an inquiry (along with notice that the author is planning to publish on-Forum) would yield material additional information.[1] Against that, the costs of reaching out to ~28 orgs is not insignificant and would be a significant barrier to people authoring this kind of post. The post doesn't seem to rely on significant non-public information, accuse anyone of misconduct, or have other characteristics that would make advance notice and comment particularly valuable. 

Balancing all of that, I think the opportunity for orgs to respond to the post in comments was and is adequate here.

  1. ^

    In contrast, when one is writing a deep dive on a narrower issue, the odds seem considerably higher that the organization has material information that isn't published because of opportunity costs, lack of any reason to think there would be public interest, etc. But I'd expect most orgs' basic fundraising ask to have been at least moderately deliberate.

Here, it seems reasonable to assume that orgs will have made a conscious decision about what general information they want to share with would-be small/medium donors. So there isn't much reason to expect that an inquiry (along with notice that the author is planning to publish on-Forum) would yield material additional information.[1]

This seems quite false to me. Far from "isn't much reason", we already know that such an inquiry would have yielded additional information, because Malo almost definitely would have corrected Michael's material misunderstanding about MIRI's work.

Additionally, my experience of writing similar posts is that there are often many material small facts that small orgs haven't disclosed but would happily explain in an email. Even basic facts like "what publications have you produced this year" would be impossible to determine otherwise. Small orgs just aren't that strategic about what they disclose!

you have some responsibility to send your notes to the mentioned organizations for fact checking before you post

I spent a good amount of time thinking about whether I should do this and I read various arguments for and against it, and I concluded that I don't have that responsibility. There are clear advantages to running posts by orgs, and clear disadvantages, and I decided that the disadvantages outweighed the advantages in this case.

Thanks for being thoughtful about this! Could you clarify what your cost benefit analysis was here? I'm quite curious!

I did it in my head and I haven't tried to put it into words so take this with a grain of salt.

Pros:

  • Orgs get time to correct misconceptions.

(Actually I think that's pretty much the only pro but it's a big pro.)

Cons:

  • It takes a lot longer. I reviewed 28 orgs; it would take me a long time to send 28 emails and communicate with potentially 28 people. (There's a good chance I would have procrastinated on this and not gotten my post out until next year, which means I would have had to make my 2024 donations without publishing this writeup first.)
  • Communicating beforehand would make me overly concerned about being nice to the people I talked to, and might prevent me from saying harsh but true things because I don't want to feel mean.
  • Orgs can still respond to the post after it's published; it's not as if it's impossible for them to respond at all.

Here are some relevant EA Forum/LW posts (the comments are relevant too):

It takes a lot longer. I reviewed 28 orgs; it would take me a long time to send 28 emails and communicate with potentially 28 people. 

This is quite a scalable activity. When I used to do this, I had a spreadsheet to keep track, generated emails from a template, and had very little back and forth - orgs just saw a draft of their section, had a few days to comment, and then I might or might not take their feedback into account.

IIRC didn’t you somewhat frequently remove sections if the org objected because you didn’t have enough time to engage with them? (which I think was reasonably costly)

I remember removing an org entirely because they complained, though in that case they claimed they didn't have enough time to engage with me (rather than the opposite). It's also possible there are other cases I have forgotten. To your point, I have no objections to Michael's "make me overly concerned about being nice" argument which I do think is true.

Cool, I might just be remembering that one instance. 

Thanks for clarifying! Really appreciate you engaging with this. 

Re: It takes a lot longer. It seems like it takes a lot of time for you to monitor the comments on this post and update your top level post in response. The cost of doing that after you post publicly, instead of before, is that people who read your initial post are a lot less likely to read the updated one. So I don't think you save a massive amount of time here, and you increase the chance other people become misinformed about orgs.

Re: Orgs can still respond to the post after it's published. Some orgs aren't posting some information publicly on purpose, but they will tell you things in confidence if you ask privately. If you publicly blast them on one of these topics, they will not publicly respond. I know EAs can be allergic to this kind of dynamic, but politics is qualitatively different from ML research; managing relationships with multiple stakeholders with opposing views is delicate, and there are a bunch of bad actors working against AI safety in DC. You might be surprised by what kind of information is very dangerous for orgs to discuss publicly.

I'm just curious, have you discussed any of your concerns with somebody who has worked in policy for the US Government? 

I just wanted to say I really liked this post and consider it a model example of reasoning transparency!

Centre for the Governance of AI does alignment research and policy research. It appears to focus primarily on the former, which, as I've discussed, I'm not as optimistic about. (And I don't like policy research as much as policy advocacy.)

I'm confused, the claim here is that GovAI does more technical alignment than policy research?

That's the claim I made, yes. Looking again at GovAI's publications, I'm not sure why I thought that at the time since they do look more like policy research. Perhaps I was taking a strict definition of "policy research" where it only counts if it informs policy in some way I care about.

Right now it looks like my past self was wrong but I'm going to defer to him because he spent more time on it than I'm spending now. I'm not going to spend more time on it because this issue isn't decision-relevant, but there's a reasonable chance I was confused about something when I wrote that.

Your past self is definitely wrong -- GovAI does way more policy work than technical work -- but maybe that's irrelevant since you prioritize advocacy work anyway (and GovAI does little of that).

This is great, thanks so much for writing it.

(I can see an argument for blocking entrances to AI company offices, but I think the argument for blocking traffic is much weaker.)

I think Stop AI have taken this criticism onboard (having encountered it from a number of places). Their plan for the last couple of months has been to keep blocking OpenAI's gates until they have their day in court[1] where they can make a "necessity" case for breaking the law to prevent a (much much) larger harm from occurring (or to prevent OpenAI from recklessly endangering everyone). Winning such a case would be huge.

  1. ^

    They feature heavily in this recent documentary that is well worth a watch.

I don’t have anything to counter the track record you stated, but knowing the model and some of the people, this feels a little harsh on HIPS. I would say that DC is a little unusual with respect to the ‘people like me should have decision-making power’ concern, in the sense that there are people making important decisions right now who are really bad at this (as with the great example you mentioned in California). I am not an expert in this, but the conclusion doesn’t feel right to me.

This argument only makes sense if you have a very low P(doom) (like <0.1%) or if you place minimal value on future generations. Otherwise, it's not worth recklessly endangering the future of humanity to bring utopia a few years (or maybe decades) sooner. The math on this is really simple—bringing AI sooner only benefits the current generation, but extinction harms all future generations. You don't need to be a strong longtermist, you just need to accord significant value to people who aren't born yet.

Here's a counter-argument that relies on the following assumptions:

First, suppose you believe unaligned AIs would still be conscious entities, capable of having meaningful, valuable experiences. This could be because you think unaligned AIs will be very cognitively sophisticated, even if they don't share human preferences.

Second, assume you're a utilitarian who doesn't assign special importance to whether the future is populated by biological humans or digital minds. If both scenarios result in a future full of happy, conscious beings, you’d view them as roughly equivalent. In fact, you might even prefer digital minds if they could exist in vastly larger numbers or had features that enhanced their well-being relative to biological life.

With those assumptions in place, consider the following dilemma:

  1. If AI is developed soon, there’s some probability p that billions of humans will die due to misaligned AI—an obviously bad outcome. However, if these unaligned AIs replace us, they would presumably still go on to create a thriving and valuable civilization from a utilitarian perspective, even though humanity would not be part of that future.

  2. If AI development is delayed by several decades to ensure safety, billions of humans will die in the meantime from old age who could otherwise have been saved by accelerated medical advancements enabled by earlier AI. This, too, is clearly bad. However, humanity would eventually develop AI safely and go on to build a similarly valuable civilization, just after a significant delay.

Given these two options, a utilitarian doesn't have strong reasons to prefer the second approach. While the first scenario carries substantial risks, it does not necessarily endanger the entire long-term future. Instead, the primary harm seems to fall on the current generation: either billions of people die prematurely due to unaligned AI, or they die from preventable causes like aging because of delayed technological progress. In both cases, the far future—whether filled with biological or digital minds—remains intact and flourishing under these assumptions.

In other words, there simply isn't a compelling utilitarian argument for choosing to delay AI in this dilemma.

Do you have any thoughts on the assumptions underlying this dilemma, or its conclusion?

I want to add that I think the argument you present here is better than any of the arguments I'd considered in the relevant section.

I think it's quite unlikely that a misaligned AI would create an AI utopia. Much more likely that it would create something that resembles a paperclip maximizer / something that has either no conscious experience, or experiences with no valence, or experiences with random valences.

What's your credence that humans create a utopia in the alternative? Depending on the strictness of one's definition, I think a future utopia is quite unlikely either way, whether we solve alignment or not.

It seems you expect future unaligned AIs will either be unconscious or will pursue goals that result in few positive conscious experiences being created. I am not convinced of this myself. At the very least, I think such a claim demands justification.

Given the apparent ubiquity of consciousness in the animal kingdom, and the anticipated sophistication of AI cognition, it is difficult for me to imagine a future essentially devoid of conscious life, even if that life is made of silicon and it does not share human preferences.

You should talk to David Pearce. His view of physicalism (phenomenal binding) precludes consciousness in digital minds[1].

  1. ^

    But he also goes further and claims that world-ending ASI is impossible for the reason of it requiring, yet lacking, unitary conscious experience (whereas I think that a "blind idiot god" is perfectly capable of destroying everything we value).

I think the position I'm arguing for is basically the standard position among AI safety advocates so I haven't really scrutinized it. But basically, (many) animals evolved to experience happiness because it was evolutionarily useful to do so. AIs are not evolved so it seems likely that by default, they would not be capable of experiencing happiness. This could be wrong—it might be that happiness is a byproduct of some sort of information processing, and sufficiently complex reinforcement learning agents necessarily experience happiness (or something like that).

Also: According to the standard story where an unaligned AI has some optimization target and then kills all humans in the interest of pursuing that target (e.g. a paperclip maximizer), it seems unlikely that this AI would experience much happiness (granting that it's capable of happiness) because its own happiness is not the optimization target.

(Note: I realize I am ignoring some parts of your comment, I'm intentionally only responding to the central point so my response doesn't get too frayed.)

According to the standard story where an unaligned AI has some optimization target and then kills all humans in the interest of pursuing that target (e.g. a paperclip maximizer), it seems unlikely that this AI would experience much happiness (granting that it's capable of happiness) because its own happiness is not the optimization target.

I agree that this is the standard story regarding AI risk, but I haven’t seen convincing arguments that support this specific model.

In other words, I see no compelling evidence to believe that future AIs will have exclusively abstract, disconnected goals—like maximizing paperclip production—and that such AIs would fail to generate significant amounts of happiness, either as a byproduct of their goals or as an integral part of achieving them.

(Of course, it’s crucial to avoid wishful thinking. A favorable outcome is by no means guaranteed, and I’m not arguing otherwise. Instead, my point is that the core assumption underpinning this standard narrative seems weakly argued and poorly substantiated.)

The scenario I find most plausible is one in which AIs have a mixture of goals, much like humans. Some of these goals will likely be abstract, while others will be directly tied to the AI’s internal experiences and mental states.

Just as humans care about their own happiness but also care about external reality—such as the impact they have on the world or what happens after they’re dead—I expect that many AIs will place value on both their own mental states and various aspects of external reality.

This ultimately depends on how AIs are constructed and trained, of course. However, as you mentioned, there are some straightforward reasons to anticipate parallels between how goals emerge in animals and how they might arise in AIs. For example, robots and some other types of AIs will likely be trained through reinforcement learning. While RL on computers isn’t identical to the processes by which animals learn, it is similar enough in critical ways to suggest that these parallels could have significant implications.

(I believe a lot of people get this wrong because they're not thinking probabilistically. Someone has (say) a 10% P(doom) and a 10% chance of AGI within five years, and they round that off to "it's not going to happen so we don't need to worry yet." A 10% chance is still really really bad.)

Yes! I've been saying this for a while, but most EAs still seem to be acting as if the median forecast is what is salient. If your regulation/alignment isn't likely to be ready until the median forecasted date for AGI/TAI/ASI, then in half of all worlds you (we) don't make it! When put like that, you can see that what seems like the "moderate" position is anything but - instead it is reckless in the extreme.

This is brilliant. I agree with almost all of it[1] - it's a good articulation of how my own thinking on this has evolved over the last couple of years[2]. My timelines might be shorter, and my p(doom) higher, but it's good to see an exposition for how one need not have such short timelines or high p(doom) to still draw the same conclusions. I recently donated significant amounts to PauseAI Global and PauseAI US. Your $30k to PauseAI US will get them to 5/6 of their current fundraising target - thank you!

  1. ^

    Some points of disagreement, additional information, and emphasis in other comments I made as I read through.

  2. ^

    Actually to be fair, it's more detailed!

We need to develop AI as soon as possible because it will greatly improve people's lives and we're losing out on a huge opportunity cost.

This argument only makes sense if you have a very low P(doom) (like <0.1%) or if you place minimal value on future generations. Otherwise, it's not worth recklessly endangering the future of humanity to bring utopia a few years (or maybe decades) sooner. The math on this is really simple—bringing AI sooner only benefits the current generation, but extinction harms all future generations. You don't need to be a strong longtermist, you just need to accord significant value to people who aren't born yet.

I've heard a related argument that the size of the accessible lightcone is rapidly shrinking, so we need to build AI ASAP even if the risk is high. If you do the math, this argument doesn't make any sense (credence: 95%). The value of the outer edge of the lightcone is extremely small compared to its total volume.[17]

 

Accelerationists seem to not get to this part of Bostrom's Astronomical Waste[1], which is in fact the most salient part [my emphasis in bold]:

III. The Chief Goal for Utilitarians Should Be to Reduce Existential Risk

In light of the above discussion, it may seem as if a utilitarian ought to focus her efforts on accelerating technological development. The payoff from even a very slight success in this endeavor is so enormous that it dwarfs that of almost any other activity. We appear to have a utilitarian argument for the greatest possible urgency of technological development.

However, the true lesson is a different one. If what we are concerned with is (something like) maximizing the expected number of worthwhile lives that we will create, then in addition to the opportunity cost of delayed colonization, we have to take into account the risk of failure to colonize at all. We might fall victim to an existential risk, one where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential. Because the lifespan of galaxies is measured in billions of years, whereas the time-scale of any delays that we could realistically affect would rather be measured in years or decades, the consideration of risk trumps the consideration of opportunity cost. For example, a single percentage point of reduction of existential risks would be worth (from a utilitarian expected utility point-of-view) a delay of over 10 million years.

Therefore, if our actions have even the slightest effect on the probability of eventual colonization, this will outweigh their effect on when colonization takes place. For standard utilitarians, priority number one, two, three and four should consequently be to reduce existential risk. The utilitarian imperative “Maximize expected aggregate utility!” can be simplified to the maxim “Minimize existential risk!”.

  1. ^

    TIL that highlighting a word and pasting (cmd-V) a URL makes it a link.
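For concreteness, here is a rough back-of-the-envelope sketch of the trade-off in the quoted passage, in Python. The per-year depreciation rate of the cosmic endowment is my own illustrative assumption (roughly consistent with Bostrom's "over 10 million years" figure), not a number taken from the paper:

```python
# Back-of-the-envelope for the risk-vs-delay trade-off in the quoted passage.
# ASSUMPTION: each year of delayed colonization costs roughly 1e-9 of the total
# expected value of the cosmic endowment (e.g. resources slipping beyond the
# reachable horizon). This rate is assumed for illustration, not quoted from
# "Astronomical Waste".

loss_per_year_of_delay = 1e-9   # fraction of total expected value lost per year of delay
risk_reduction = 0.01           # a one-percentage-point reduction in existential risk

# Delay whose opportunity cost equals the expected benefit of that risk reduction:
break_even_delay_years = risk_reduction / loss_per_year_of_delay
print(f"{break_even_delay_years:.0e} years")  # 1e+07 years, i.e. ~10 million years
```

Under these assumed numbers, even a small reduction in extinction risk dominates decades of delay, which is exactly the part of the argument accelerationists skip.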

I know of at least one potential counterexample: OpenAI’s RLHF was developed by AI safety people who joined OpenAI to promote safety. But it’s not clear that RLHF helps with x-risk.

I'd go further and say that it's not actually a counterexample. RLHF allowed OpenAI to be hugely profitable - without it they wouldn't've been able to publicly release their models and get their massive userbase.

I really liked this post and would be extremely happy to see more of those, especially if there is substantial disagreement. 

To push back a little on the claim that some policy orgs put out "vacuous" reports: when I once complained about the same thing regarding some anti-safety advocates, someone quoted me this exact passage from Harry Potter and the Order of the Phoenix (vanilla, not the fanfic):

[Umbridge makes an abstract sounding speech]
[Hermione said] 'It explained a lot.'
'Did it?' said Harry in surprise. 'Sounded like a load of waffle to me.'
'There was some important stuff hidden in the waffle,' said Hermione grimly.
'Was there?' said Ron blankly.
'How about: "progress for progress's sake must be discouraged"? How about: "pruning wherever we find practices that ought to be prohibited"?'
'Well, what does that mean?' said Ron impatiently.
'I'll tell you what it means,' said Hermione through gritted teeth. 'It means the Ministry's interfering at Hogwarts.'

I've discussed this with people in "big institutions" and they pretty much confirmed my suspicion. Officials who play the "real game" have two channels of communication. There is the "face", often a mandated role that they haven't chosen, meant to represent interests that can (and sometimes do) conflict with their own. Then there is the "real person", with their own political opinions and nationality.

The two are articulated together through careful word selection and PR management. Except behind closed doors, actors will not give the "person's" reasons for doing something, as this could lead to serious trouble (the media alone is enough of a threat). They will generate rationalizations in line with the "face" that, in some instances, may suspiciously align with the "person's" reasons, and in other instances, could serve as dogwhistles. However, their interlocutor is usually aware that these are rationalizations, and will push back with other rationalizations. There is, to some extent, a real person-to-person exchange, and I expect orgs that are good at this game to appear vacuous from the outside.

There are exceptions to this strategy, of course (think Donald Trump, Mr Rogers, or, for a very French example, Elise Lucet). Yet even those exceptions are not naive and take for granted that some degree of hypocrisy is being displayed by the counterpart.

It might be that most communication on x-risk really is happening; it's just happening in Umbridgese. This may be a factor you've already taken into consideration, however.

I considered this argument when writing the post and had several responses although they're a bit spread out (sorry I know it's very long—I even deleted a bunch of arguments while editing to make it shorter). See the second half of this section, this section, and the commentary on some orgs (1, 2, 3).

In short:

  • If an org's public outputs are low-quality (or even net harmful), and its defense is "I'm doing much better things in private where you can't see them, trust me", then why should I trust it? All available evidence indicates otherwise.
  • I generally oppose hiding your true beliefs on quasi-deontological grounds. (I'm a utilitarian but I think moral rules are useful.)
  • Openness appears to work, e.g. the FLI 6-month pause letter got widespread support.
  • It's harder to achieve your goals if you're obfuscating them.

Responding to only one minor point you made: the 6-month pause letter seems like the type of thing you oppose. It's not able to help with the risk; it just does deceptive PR that aligns with the goal of pushing against AI progress, while getting support from people who disagree with the actual goal.

I had remembered that the pause letter talked about extinction. Reading again, it doesn't use the word extinction; it does say "Should we risk loss of control of our civilization?" which is similar but somewhat ambiguous. CAIS' Statement on AI Risk would have been a better example.

Thanks for mentioning Sentinel. Two points:

  • Who is paying attention to reports? Although people reading our minutes is a nice side effect (and a rod for serendipity, etc.), the point is also that we have an emergency response team that could act in the event of an incipient catastrophe.
  • Sentinel is also valuable if the generators for being worried about AI x-risk are right, but the specifics are wrong. In some sense we expect to be surprised.

As an aside, the reports end up being critical in worlds where the rapid response is needed, since they show ongoing attention, and will be looked at in retrospect. But they can also be used more directly to galvanize news coverage on key topics and as evidence by policy orgs. Promoting that avenue for impact seems valuable.

I don't understand what's going on here psychologically—according to the expressed beliefs of people like Dario Amodei and Shane Legg, they're massively endangering their own lives in exchange for profit. It's not even that they disagree with me about key facts, they're just doing things that make no sense according to their own (expressed) beliefs.

Does anyone know what's going on here? Dan Fagella says it's a "Sardanapalus urge", to want to be destroyed by their sand god (not anyone else's), but I suspect it's something more like extreme hubris[1] - irrational overconfidence. This is a very Silicon Valley / entrepreneurial trait: you pretty much have to go against the grain and against all the odds to win really big. But it's one thing to gamble with making money, and another with your life (and yet another with everyone else on the planet's lives too!).

I strongly believe that if Amodei, Altman, Legg and Hassabis were sat round a table with Omega and a 6 shooter with even 1 bullet in the chamber, they wouldn't play a game of actual Russian Roulette with a prize of utopia/the glorious transhumanist future, let alone such a game with a prize of a mere trillion dollars.

  1. ^

    The biggest cases of hubris in the history of the known universe.

You may want to add something like [AI Policy] to the title to clue readers into the main subject matter and whether they'd like to invest the time to click on and read it. There's the AI tag, but that doesn't show up on the frontpage, at least on my mobile.

Good idea, I'll do that!

Edit: I decided I didn't like the way it looked so I reverted it. I do agree there is a tension between wanting a short title and wanting a descriptive title; I feel like short is better in this case.

Thanks, Michael.

I believe unaligned AI is my most likely cause of death, and I'd rather not die.

What do you think is the probability of human extinction over the next 10 years? How about the probability of human population reaching 1 billion or less in the middle of one of the next 10 years (2025 to 2034)? How low would these probabilities (or others you prefer to specify) have to be for you to donate to animal welfare?

I haven't put serious thought into probabilities and I notice how AI experts' probabilities are all over the map, which makes me think nobody has a great model. That said, my off-the-cuff probabilities are something like

  • if humanity doesn't get its act together, 75% chance that AI causes extinction
  • unconditional 30% chance that AI causes extinction
  • when that happens depends on AI timelines, and I defer to Metaculus / AI Impacts / etc. on predicting timelines; maybe a 20% chance of superintelligent AI within 10 years

Qualitatively, I think the prosaic alignment folks are too optimistic about alignment being easy so my P(doom) is higher than theirs.

Thanks, Michael. I think your numbers suggest your unconditional probability of human extinction over the next 10 years is 6 % (= 0.3*0.2). Power laws fitted to battle deaths per war have a mean tail index of 1.60, such that battle deaths have a probability of 2.51 % (= 0.1^1.60) of becoming 10 times as large. Applying this to your estimate would suggest a probability of a 10 % population drop over the next 10 years of 2.39 (= 0.06/0.0251), i.e. impossibly high. What is your guess for an AI catastrophe killing 10 % of the population in the next 10 years? I suspect it is not that different from your guess for the probability of extinction, whereas this should be much lower according to typical power laws describing catastrophe deaths?

For reference, I think the probability of human extinction over the next 10 years is lower than 10^-6. Somewhat relatedly, fitting a power law to Metaculus' community predictions about small AI catastrophes, I estimated a probability of human extinction before 2100 due to an AI malfunction of 0.004 %. 
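For concreteness, here is a minimal sketch of the arithmetic in the comment above (the pairing of the 20% and 30% figures follows the comment; the variable names are mine):

```python
# Sketch of the power-law extrapolation above, using the numbers as stated.
p_si_10y = 0.20                  # ~20% chance of superintelligent AI within 10 years
p_ext_given_si = 0.30            # ~30% chance AI then causes extinction (as paired in the comment)
p_extinction_10y = p_si_10y * p_ext_given_si   # = 0.06

tail_index = 1.60                # mean tail index of power laws fitted to battle deaths per war
# Under a Pareto tail, P(deaths >= 10x) / P(deaths >= x) = 10 ** (-tail_index)
ratio_10x = 10 ** (-tail_index)  # ~0.0251

# Working backwards from extinction (100% of population) to a >=10% population loss:
implied_p_10pct_drop = p_extinction_10y / ratio_10x
print(round(ratio_10x, 4), round(implied_p_10pct_drop, 2))  # 0.0251 2.39 -> a "probability" above 1
```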

Applying this to your estimate would suggest a probability of a 10 % population drop over the next 10 years of 2.39

Tell me if I'm understanding this correctly:

  1. My (rough) numbers suggest a 6% chance that 100% of people die
  2. According to a fitted power law, that implies a 239% chance that 10% of people die

I disagree but I like your model and I think it's a pretty good way of thinking about things.

On my plurality model (i.e. the model to which I assign the plurality of subjective probability), superintelligent AI (SAI) either kills 100% of people or it kills no people. I don't think the outcome of SAI fits a power law.

A power law is typically generated by a combination of exponentials, which might be a good description of battle deaths, but I don't think it's a good description of AI. I think power laws are often a decent fit for combinations of heterogeneous events (such as mass deaths from all causes combined), but maybe not a great fit, so I wouldn't put too much credence in the power law model in this case.

I think it's very unlikely that an AI catastrophe kills 10% of the population in the next 10 years (not 10^-6 unlikely, more like 10^-3 unlikely). I can think of a few ways this could happen (e.g., a country gives an autonomous AI control over its nuclear arsenal and the AI decides to nuke a bunch of cities), but they seem much less likely than an SAI deciding to completely extinguish humanity.

I estimated a probability of human extinction before 2100 due to an AI malfunction of 0.004 %.

Even if you put 99% credence in this model, surely P(extinction) will be dominated by other models? Even within the model, P(extinction) should be higher than that based on uncertainty about the value of the alpha parameter.

Tell me if I'm understanding this correctly:

  1. My (rough) numbers suggest a 6% chance that 100% of people die
  2. According to a fitted power law, that implies a 239% chance that 10% of people die

On 1, yes, and over the next 10 years or so (20 % chance of superintelligent AI over the next 10 years, times 30 % chance of extinction quickly after superintelligent AI)? On 2, yes, for a power law with a tail index of 1.60, which is the mean tail index of the power laws fitted to battle deaths per war here.

I think it's very unlikely that an AI catastrophe kills 10% of the population in the next 10 years (not 10^-6 unlikely, more like 10^-3 unlikely).

I meant to ask about the probability of the human population becoming less than (not around) 90 % as large as now over the next 10 years, which has to be higher than the probability of human extinction. Since 10^-3 << 6 %, I guess your probability of a population loss of 10 % or more is just slightly higher than your probability of human extinction.

Even if you put 99% credence in this model, surely P(extinction) will be dominated by other models? Even within the model, P(extinction) should be higher than that based on uncertainty about the value of the alpha parameter.

I think using a power law will tend to overestimate the probability of human extinction, as my sense is that tail distributions usually start to decay faster as severity increases. This is the case for the annual conflict deaths as a fraction of the global population, and arguably annual epidemic/pandemic deaths as a fraction of the global population. The reason is that the tail distribution has to reach 0 for a 100 % population loss, whereas a power law will predict that going from 8 billion to 16 billion deaths is as likely as going from 4 billion to 8 billion deaths.
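To spell out that scale-invariance point: with a pure power-law (Pareto) tail for the death toll D and tail index alpha, the relative penalty for doubling the toll is the same at every scale. A minimal sketch in LaTeX:

```latex
\[
\frac{P(D \ge 16\,\text{billion})}{P(D \ge 8\,\text{billion})}
= \frac{P(D \ge 8\,\text{billion})}{P(D \ge 4\,\text{billion})}
= 2^{-\alpha}
\]
```

So a pure power law treats a jump to a physically impossible death toll (more deaths than people) exactly like a feasible one, which is why a realistic tail has to decay faster than a power law as it approaches 100 % of the population.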

How do you feel about EAs investing in AI companies in their personal portfolios?

It depends. I think investing in publicly-traded stocks has a smallish effect on helping the underlying company (see Harris (2022), Pricing Investor Impact). I think investing in private companies is probably much worse and should be avoided.

Isn't the more important point about having a conflict of interest with pauseAI efforts?

Yes that's also fair. Conflicts of interest are a serious concern and this might partially explain why big funders generally don't support efforts to pause AI development.

I think it's ok to invest a little bit into public AI companies, but not so much that you'd care if those companies took a hit due to stricter regulations etc.
