
I often think about "the road to hell is paved with good intentions".[1] I'm unsure to what degree this is true, but it does seem that people trying to do good have caused more negative consequences in aggregate than one might naively expect.[2] "Power corrupts" and "power-seekers using altruism as an excuse to gain power" are two often-cited reasons for this, but I think they don't explain all of it.

A more subtle reason is that even when people are genuinely trying to do good, they're not entirely aligned with goodness. Status-seeking is a powerful motivation for almost all humans, including altruists, and we frequently award social status to people for merely trying to do good, before seeing all of the consequences of their actions. This is in some sense inevitable as there are no good alternatives. We often need to award people with social status before all of the consequences play out, both to motivate them to continue to try to do good, and to provide them with influence/power to help them accomplish their goals.

A person who consciously or subconsciously cares a lot about social status will not optimize strictly for doing good, but also for appearing to do good. One way these two motivations diverge is in how to manage risks, especially risks of causing highly negative consequences. Someone who wants to appear to do good would be motivated to hide or downplay such risks, from others and perhaps from themselves, as fully acknowledging such risks would often amount to admitting that they're not doing as much good (on expectation) as they appear to be.

How to mitigate this problem

Individually, altruists (to the extent that they endorse actually doing good) can make a habit of asking themselves and others what risks they may be overlooking, dismissing, or downplaying.[3]

Institutionally, we can rearrange organizational structures to take these individual tendencies into account, for example by creating positions dedicated to or focused on managing risk. These could be risk management officers within organizations, or people empowered to manage risk across the EA community.[4]

Socially, we can reward people/organizations for taking risks seriously, or punish (or withhold rewards from) those who fail to do so. This is tricky because, due to information asymmetry, we can easily create "risk management theater" akin to "security theater" (which, come to think of it, is a type of risk management theater). But I think we should at least take notice when someone or some organization fails, in a clear and obvious way, to acknowledge risks or to do good risk management, for example by not writing down and keeping updated a list of important risks to be mindful of, or by avoiding/deflecting questions about risk.[5] More optimistically, we can try to develop a culture where people and organizations are monitored and held accountable for managing risks substantively and competently.

  1. ^

    due in part to my family history

  2. ^

    Normally I'd give some examples here, but we can probably all think of some from the recent past.

  3. ^

    I try to do this myself in the comments.

  4. ^

    an idea previously discussed by Ryan Carey and William MacAskill

  5. ^

    However, see this comment.

Comments (5)

My main altruistic endeavor involves thinking and writing about ideas that seem important and neglected. Here is a list of the specific risks that I'm trying to manage/mitigate in the course of doing this. What other risks am I overlooking or not paying enough attention to, and what additional mitigations should I be doing?

  1. Being wrong or overconfident, distracting people or harming the world with bad ideas.
    1. Think twice about my ideas/arguments. Look for counterarguments/risks/downsides. Try to maintain appropriate uncertainties and convey them in my writings.
  2. The idea isn't bad, but some people take it too seriously or too far.
    1. Convey my uncertainties. Monitor subsequent discussions and try to argue against people taking my ideas too seriously or too far.
  3. Causing differential intellectual progress in an undesirable direction, e.g., speeding up AI capabilities relative to AI safety, spreading ideas that are more useful for doing harm than doing good.
    1. Check ideas/topics for this risk. Self-censor ideas or switch research topics if the risk seems high.
  4. Being first to talk about some idea, but not developing/pursuing it as vigorously as someone else might if they were first, thereby causing a net delay in intellectual or social progress.
    1. Not sure what to do about this one. So far not doing anything except to think about it.
  5. PR/political risks, e.g., talking about something that damages my reputation or relationships, and in the worst case harms people/causes/ideas associated with me.
    1. Keep this in mind and talk more diplomatically or self-censor when appropriate.

There's also the unilateralist's curse: suppose someone publishes an essay about a dangerous, viral idea that they misjudge to be net-positive, even though 20 other people had also thought about it and judged it to be net-negative.

Individually, altruists [...] can make a habit of asking themselves and others what risks they may be overlooking, dismissing, or downplaying.

Institutionally, we can rearrange organizational structures to take these individual tendencies into account, for example by creating positions dedicated to or focused on managing risk.

I’ve been surprised by how this seems to be a bit of a blind spot in our community.[1] I’ve previously written a couple of comments—excerpted below—on this theme, about the state of community building. These garnered a decent number of upvotes, but I don’t think they led to any concrete actions or changes. (For instance, the second comment never received a reply from Open Phil.)

My attempts to raise this concern [about optimizing for numbers/hype at the expense of i) cause prio, ii) addressing particular talent bottlenecks, and iii) mitigating downside risks] with other community builders, including those above me, were mostly dismissed. This worried me. It seemed like the community building machine was not open to the hypothesis that (some of) what it was doing might be ineffective, or, worse, net negative. (More on the latter below.) On top of this, there seemed to be a tricky second-order effect at play: evaporative cooling whereby the community builders who developed concerns like mine exited, only to be replaced by more bullish community builders. The result: a disproportionately bullish community building machine. And there didn't appear to be any countermeasures in place. For example, there was plenty of funding available if one wanted a paid role doing community building. But, in addition to the social disincentive, there was no funding available for evaluating/critiquing the impact of community building—at least, not that I was aware of.

(link)


There was near-consensus that Open Phil should generously fund promising AI safety community/movement-building projects they come across

Would you be able to say a bit about to what extent members of this working group have engaged with the arguments around AI safety movement-building potentially doing more harm than good? For instance, points 6 through 11 of Oli Habryka's second message in the “Shutting Down the Lightcone Offices” post (link). If they have strong counterpoints to such arguments, then I imagine it would be valuable for these to be written up.

(link)

  1. ^

    I mean, if one has a high prior on one’s actions being robustly positive, then it makes sense to continue full steam ahead without worrying about risks. (Because there is a tradeoff: spending time considering risks means spending less time acting.) However, I don’t think this level of confidence is warranted for the vast majority of longtermist interventions. For more, see this comment by Linch.

While drafting this post, I wrote down and then deleted an example of "avoiding/deflecting questions about risk". The person I asked such a question is probably already trying to push their organization to take risks more seriously, and probably had their own political considerations for not answering my question. I don't want to single them out for criticism, and I also don't want to damage my relationship with this person or make them want to engage less with me or people like me in the future.

Trying to enforce good risk management via social rewards/punishments might be pretty difficult for reasons like these.

Individually, altruists (to the extent that they endorse actually doing good) can make a habit of asking themselves and others what risks they may be overlooking, dismissing, or downplaying.

I think this works well when done in private, but asking around among friends is difficult for people who don't have an extensive EA network, and it carries the risk that they inadvertently ask only within their filter bubble.

Asking around publicly, e.g., on the Forum, is something that I and probably others too have mostly come to regret. Currently it's still uncommon to try to red-team your own interventions publicly, so when someone does do it, the intervention is not perceived as particularly well red-teamed but as particularly risky.

This could be avoided by making such red-teaming a lot more common, but that is hard. Perhaps a dedicated subforum could help too, one where only people interested in helping with such red-teaming efforts see the posts.
