ofer

1095 karma · Joined Jun 2017

Bio

Send me anonymous feedback: https://docs.google.com/forms/d/1qDWHI0ARJAJMGqhxc9FHgzHyEFp-1xneyl9hxSMzJP0/viewform

Any type of feedback is welcome, including arguments that a post/comment I wrote is net negative.


I'm interested in ways to increase the EV of the EA community by mitigating downside risks from EA-related activities. Without claiming originality, I think that:

  • Complex cluelessness is a common phenomenon in the domains of anthropogenic x-risks and meta-EA (due to an abundance of crucial considerations). It is often very hard to judge whether a given intervention is net-positive or net-negative.
  • The EA community is made up of humans. Human judgement tends to be influenced by biases and self-deception, which is a serious source of risk given the previous point.
    • Some potential mitigations involve improving some aspects of how EA funding works, e.g. with respect to conflicts of interest. Please don't interpret my interest in such mitigations as accusations of corruption etc.

Feel free to reach out by sending me a PM. (Update: I've turned off email notifications for private messages. If you send me a time sensitive PM, consider also pinging me about it via the anonymous feedback link above.)

Comments (182)

Answer by ofer · Sep 29, 2022

What are the upsides and downsides of doing AI governance research at an AI company, relative to doing it at a non-profit EA organization?

Reasonably determining whether an anthropogenic x-risk related intervention is net-positive or net-negative is often much more difficult[1] than identifying the intervention as potentially high-impact. With less than 2 minutes to think, one can usually do the latter but not the former. People in EA can easily be unconsciously optimizing for impact (which tends to be much easier and aligned with maximizing status & power) while believing they're optimizing for EV. Using the term "impact" to mean "EV" can exacerbate this problem.


  1. Due to an abundance of crucial considerations. ↩︎

ofer · 19d

I haven’t seen anything that makes me think that someone in EA doesn’t care about the sign of their impact

It's not about people not caring about the sign of their impact (~everyone in EA cares); it's about a tendency to behave in a way that is aligned with maximizing impact (rather than EV).

I’d certainly be interested in any evidence of that

Consider this interview with one of the largest funders in EA (the following is based on the transcript from the linked page):

Rob: "What might be distinctive about your approach that will allow you to find things that all the other groups haven’t already found or are going to find?"

[...]

SBF: But having gotten that out of the way, I think that being really willing to give significant amounts is a real piece of this. Being willing to give 100 million and not needing anything like certainty for that. We’re not in a position where we’re like, “If you want this level of funding, you better effectively have proof that what you’re going to do is great.” We’re happy to give a lot with not that much evidence and not that much conviction — if we think it’s, in expectation, great. Maybe it’s worth doing more research, but maybe it’s just worth going for. I think that is something where it’s a different style, it’s a different brand. And we, I think in general, are pretty comfortable going out on a limb for what seems like the right thing to do.

[...]

Rob Wiblin: OK, so with that out of the way, what’s a mistake you think at least some nontrivial fraction of people involved in effective altruism are making?

[...]

SBF: Then the last thing is thinking about grantmaking. This is definitely a philosophical difference that we have as a grantmaking organization. And I don’t know that we’re right on it, but I think it’s at least interesting how we think about it. Let’s say we evaluate a grant for 48 seconds. After 48 seconds, we have some probability distribution of how good it’s going to be, and it’s quite good in expected value terms. But we don’t understand it that well; there’s a lot of fundamental questions that we don’t know the answer to that would shift our view on this.

Then we think about it for 33 more seconds, and we’re like, “What might this probability distribution look like after 12 more hours of thinking?” And in 98% of those cases, we would still decide to fund it, but it might look materially different. We might have material concerns if we thought about it more, but we think they probably won’t be big enough that we would decide not to fund it.

Rob Wiblin: Save your time.

SBF: Right. You can spend that time, do that, or you could just say, “Great, you get the grant, because we already know where this is going to end up.” But you say that knowing that there are things you don’t know and could know that might give you reservations, that might turn out to make it a mistake. But from an expected value of impact perspective —

Rob Wiblin: It’s best just to go ahead.

SBF: Yeah, exactly. I think that’s another example of this, where being completely comfortable doing something that in retrospect is a little embarrassing. They’ll go, “Oh geez, you guys funded that. That was obviously dumb.” I’m like, “Yeah, you know, I don’t know.” That’s OK.

[...]

Rob Wiblin: Yeah. It’s so easy to get stuck in that case, where you are just unwilling to do anything that might turn out to be negative.

SBF: Exactly. And a lot of my response in those cases is like, “Look, I hear your concerns. I want you to tell me — in writing, right now — whether you think it is positive or negative expected value to take this action. And if you write down positive, then let’s do it. If you write down negative, then let’s talk about where that calculation’s coming from.” And maybe it will be right, but let’s at least remove the scenario where everyone agrees it’s a positive EV move, but people are concerned about some…

Notably, the FTX Foundation's regranting program "gave over 100 people access to discretionary budget" (and I'm not aware of them using a reasonable mechanism to resolve the obvious unilateralist's curse problem). One of the resulting grants was a $215,000 grant for creating an impact market. They wrote:

This regrant will support the creation of an “impact market.” The hope is to improve charity fundraising by allowing profit-motivated investors to earn returns by investing in charitable projects that are eventually deemed impactful.

A naive impact market is a mechanism that incentivizes people to carry out risky projects—that might turn out to be beneficial—while regarding potential harmful outcomes as if they were neutral. (The certificates of a project that ended up being harmful are worth as much as the certificates of a project that ended up being neutral, namely nothing.)
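To make the incentive problem concrete, here is a minimal sketch (with made-up numbers, not taken from the grant or the linked post) of how a profit-motivated investor would evaluate a risky project under such a naive market, where harmful and neutral outcomes both leave the certificate worth nothing:

```python
# Toy example of a naive impact market (all numbers are hypothetical).
# Each outcome: (label, probability, value to the world).
outcomes = [
    ("very beneficial", 0.3, +100),
    ("neutral",         0.4,    0),
    ("very harmful",    0.3, -100),
]

# Society's expected value counts harm as negative.
society_ev = sum(p * v for _, p, v in outcomes)

# Under a naive impact market, the certificate of a harmful project is worth
# as much as that of a neutral one: nothing. So the investor's payoff is
# floored at zero.
investor_ev = sum(p * max(v, 0) for _, p, v in outcomes)

print(f"EV to the world:    {society_ev:+.1f}")   # +0.0
print(f"EV to the investor: {investor_ev:+.1f}")  # +30.0
```

With these illustrative numbers the project is worthless in expectation for the world, yet clearly profitable in expectation for the investor; with a slightly worse downside it would be net-negative for the world and still profitable to fund.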

I strongly agree with this comment, except that I don't think this issue is minor.

IMO, this issue is related to a very troubling phenomenon that EA has seemingly been undergoing over the past few years: people in EA sometimes do not think much about their EV, and instead strive to have as much impact as possible. "Impact" is a sign-neutral term ("COVID-19 had a large impact on international travel"). It's very concerning that many people in EA now use it interchangeably with "EV", as if EA interventions in anthropogenic x-risk domains cannot possibly be harmful. One can call this phenomenon "sign neglect".
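As a minimal sketch of what "sign neglect" means in practice (hypothetical numbers, purely for illustration), compare a signed EV calculation with a sign-neutral "expected impact" calculation for two made-up interventions:

```python
# Two hypothetical interventions; each outcome is (probability, signed value to the world).
intervention_a = [(0.5, +10), (0.5, -12)]  # flashy and risky
intervention_b = [(0.9, +1), (0.1, 0)]     # modest and safe

def ev(outcomes):
    """Signed expected value: harm counts as negative."""
    return sum(p * v for p, v in outcomes)

def expected_impact(outcomes):
    """Sign-neutral 'impact': only the magnitude of the change matters."""
    return sum(p * abs(v) for p, v in outcomes)

for name, outcomes in [("A", intervention_a), ("B", intervention_b)]:
    print(f"{name}: EV = {ev(outcomes):+.1f}, expected impact = {expected_impact(outcomes):.1f}")
# A: EV = -1.0, expected impact = 11.0
# B: EV = +0.9, expected impact = 0.9
```

Optimizing for "impact" favors A; optimizing for EV favors B. Conflating the two terms makes it easier to end up doing the former while believing one is doing the latter.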

Having a major EA organization named "EV" (as an acronym for something that is not "expected value") may exacerbate this problem by further decreasing the usage of the term "EV", and making people use sign-neutral language instead.

It's a very important question.

However, it probably doesn't make sense to keep this information to oneself since other people can begin to work on research and mitigation if they are aware of the risk.

I don't think this is always the case. In anthropogenic x-risk domains, it can be very hard to decrease the chance of an existential catastrophe from a certain technology, and very easy to inadvertently increase it (by drawing attention to an info hazard). Even if the researchers (within EA) are very successful, their work can easily be ignored by the relevant actors in the name of competitiveness ("our for-profit public-benefit company takes the risk much more seriously than the competitors, so it's better if we race full speed ahead", "regulating companies in this field would make China get that technology first", etc.).

(See also: The Vulnerable World Hypothesis.)

Hi there!

Your website says:

Encultured AI is a for-profit video game company with a public benefit mission: to develop technologies promoting the long-term survival and flourishing of humanity and other sentient life.

Can you share any information about the board of directors, the investors, and any governance mechanisms that aim to ensure the company makes good decisions when facing conflicts between its financial goals and its EA-aligned goals?

Hi there!

There could be harms to engaging in work around atomically precise manufacturing. For example, if the technology would truly be harmful overall, then speeding up its development through raising interest in the topic could cause harm.

I agree. Was there a meta effort to evaluate whether the potential harms from publishing such an article ("written for an audience broadly unfamiliar with EA") outweigh the potential benefits?

I'm not sure what exactly we disagree on. I think we agree that it's extremely important to appreciate that [humans tend to behave in a way that is aligned with their local incentives] when considering meta interventions related to anthropogenic x-risks and EA.

groups of people aren't single agents

I agree. But that doesn't mean that the level of coordination and the influence of conflicts of interest in EA are not extremely important factors to consider/optimize.

deciding that the goal of a movement should be chosen even if it turns out that it is fundamentally incompatible with human, economic, and other motives leads to horrific things.

Can you explain this point further?

What does Effective Altruism look like if it is successful?

Ideally, a very well-coordinated group that acts like a single, rational, wise, EA-aligned agent (rather than a poorly coordinated set of individuals who compete for resources and status by unilaterally doing/publishing impressive, risky things related to anthropogenic x-risks, while being subject to severe conflicts of interest).
