ofer

Send me anonymous feedback: https://docs.google.com/forms/d/1qDWHI0ARJAJMGqhxc9FHgzHyEFp-1xneyl9hxSMzJP0/viewform

Any type of feedback is welcome, including arguments that a post/comment I wrote is net negative.


I'm interested in ways to increase the EV of the EA community by mitigating downside risks from EA-related activities (primarily ones that are related to anthropogenic x-risks). I think that:

  • Complex cluelessness is a common phenomenon in the realm of anthropogenic x-risks. It is often very hard to judge whether a high-impact intervention is net-positive or net-negative.
  • The EA community is made up of humans. Humans' judgement tends to be influenced by biases and self-deception. That is a serious source of risk, considering the previous point.
    • Some potential mitigations involve improving some aspects of how EA funding works, e.g. with respect to conflicts of interest.
      • Please don't interpret my interest in such mitigations as accusations of corruption etc.

Feel free to reach out by sending me a PM here or on my website.


Comments

Some unfun lessons I learned as a junior grantmaker

In general, what do you think of the level of conflicts of interest within EA grantmaking?

My best guess, based on public information, is that CoIs within longtermism grantmaking are being handled with less-than-ideal strictness. For example, generally speaking, if a project related to anthropogenic x-risks would not get funding without the vote of a grantmaker who is a close friend of the applicant, it seems better to not fund the project.

(For example, Anthropic raised a big Series A from grantmakers closely related to their president Daniela Amodei’s husband, Holden Karnofsky!)

My understanding is that Anthropic is not a nonprofit, and it received funding from investors rather than grantmakers. That said, Anthropic can still cause CoI issues related to Holden's decision-making about longtermism funding. Holden said in an interview:

Anthropic is a new AI lab, and I am excited about it, but I have to temper that or not mislead people because Daniela, my wife, is the president of Anthropic. And that means that we have equity, and so [...] I’m as conflict-of-interest-y as I can be with this organization.


Do you think COIs pose a significant threat to the EA community’s epistemic standards?

I think CoIs can easily influence decision making (in general, not specifically in EA). In the realm of anthropogenic x-risks, judging whether a high-impact intervention is net-positive or net-negative is often very hard due to complex cluelessness. Therefore, CoI-driven biases and self-deception can easily influence decision making and cause harm.

How should grantmakers navigate potential COIs? How should this be publicly communicated?

I think grantmakers should not be placed in a position where they need to decide how to navigate potential CoIs. Rather, the way grantmakers handle CoIs should be dictated by a detailed CoI policy (that should probably be made public).

Some unfun lessons I learned as a junior grantmaker

Thank you for the info!

I understand that you recently replaced Jonas as the head of the EA Funds. In January, Jonas indicated that the EA Funds intended to publish a polished CoI policy. Is there still such an intention?

Some unfun lessons I learned as a junior grantmaker

Hi Linch, thank you for writing this!

I started off with a policy of recusing myself from even small CoIs. But these days, I mostly accord with (what I think is) the equilibrium: a) definite recusal for romantic relationships, b) very likely recusal for employment or housing relationships, c) probable recusal for close friends, d) disclosure but no self-recusal by default for other relationships.

In January, Jonas Vollmer published a beta version of the EA Funds' internal Conflict of Interest policy. Here are some excerpts from it:

Any relationship that could cause significantly biased judgment (or the perception of that) constitutes a potential conflict of interest, e.g. romantic/sexual relationships, close work relationships, close friendships, or living together.

[...]

The default suggestion is that you recuse yourself from discussing the grant and voting on it.

[...]

If the above means we can’t evaluate a grant, we will consider forwarding the application to another high-quality grantmaker if possible. If delegating to such a grantmaker is difficult, and this policy would hamper the EA community’s ability to make a good decision, we prefer an evaluation with conflict of interest over none (or one that’s significantly worse). However, the chair and the EA Funds ED should carefully discuss such a case and consider taking additional measures before moving ahead.

Is this consistent with the current CoI policy of the EA Funds?

Optimizing Public Goods Funding with blockchain tech and clever incentive design (RETROX)

Suppose Alice is working on a dangerous project that involves engineering a virus for the purpose of developing new vaccines. Fortunately, the dangerous stage of the project is completed successfully (the new virus is destroyed before it has a chance to leak), and now we have new vaccines that are extremely beneficial. At this point, observing that the project had a huge positive impact, will Retrox retroactively fund the project?

We Ran an AI Timelines Retreat

We aimed for participants to form evidence-based views on questions such as:

[...]

  • What are the most probable ways AGI could be developed?

A smart & novel answer to this question can be an information hazard, so I'd recommend consulting with relevant people before raising it in a retreat.

Optimizing Public Goods Funding with blockchain tech and clever incentive design (RETROX)

Suppose Alice is working on a risky project that has a 50% chance of ending up extremely beneficial and a 50% chance of ending up extremely harmful. If the project ends up being extremely beneficial, will Retrox allow Alice to make a lot of money from her project?

The biggest risk of free-spending EA is not optics or motivated cognition, but grift

Grifters are optimizing only to get themselves money and power; EAs are optimizing for improving the world.

I think it is not so binary in reality. It's likely that almost no one thinks of themselves as a grifter; and almost everyone in EA is at least somewhat biased towards actions that will cause them to have more money and power (on account of being human). So, while I think this post points at an extremely important problem, I wouldn't use the grifters vs. EAs dichotomy.

When is AI safety research harmful?

Option value considerations dictate that we continue doing AI safety research even if we’re unsure of its value because it’s much easier to stop a research program than to start one.

I think the opposite is often true. Once there are people who get compensated for doing X, it can be very hard to stop X. (Especially if it's harder for impartial people, who are not experts in X, to evaluate X.)

Video and Transcript of Presentation on Existential Risk from Power-Seeking AI

Thanks, you're right. There's this long thread, but I'll try to explain the issues here more concisely. I think the theorems have the following limitations that were not reasonably explained in the paper (and some accompanying posts):

  1. The theorems are generally not applicable to stochastic environments (despite the paper and some related posts suggesting otherwise).
  2. The theorems may not be applicable if there are cycles in the state graph of the MDP (other than self-loops in terminal states); for example (see the toy sketch below):
    • The theorems are not applicable in states from which a reversible action can be taken.
    • The theorems are not applicable in states from which only one action (that is not POWER-seeking) allows reaching a cycle of a given length.
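To make the structural conditions in point 2 concrete, here is a minimal sketch in Python. It is my own toy encoding of a deterministic state graph, not anything from the paper; the state names and helper functions are hypothetical and only illustrate what "a reversible action" and "a cycle other than a terminal-state self-loop" look like.

```python
# A minimal toy sketch (my own encoding, not from the paper): a deterministic
# MDP's state graph as {state: {action: next_state}}. "terminal" only has a self-loop.
state_graph = {
    "start": {"go_left": "left", "go_right": "right"},
    "left": {"go_back": "start", "stay": "left"},   # "go_back" undoes "go_left": a reversible action
    "right": {"finish": "terminal"},
    "terminal": {"noop": "terminal"},               # self-loop in a terminal state
}

def has_reversible_action(graph):
    """True if some action s -> s' can be undone by an action s' -> s."""
    for s, actions in graph.items():
        for s_next in actions.values():
            if s_next != s and s in graph.get(s_next, {}).values():
                return True
    return False

def has_nontrivial_cycle(graph):
    """True if the state graph contains a cycle other than a self-loop."""
    def dfs(node, path):
        for nxt in graph.get(node, {}).values():
            if nxt == node:
                continue  # ignore self-loops
            if nxt in path or dfs(nxt, path | {nxt}):
                return True
        return False
    return any(dfs(s, {s}) for s in graph)

print(has_reversible_action(state_graph))  # True: "start" <-> "left"
print(has_nontrivial_cycle(state_graph))   # True: start -> left -> start
```

Under this toy encoding, the existence of a reversible action immediately creates a cycle of length 2, which is the kind of structure (beyond self-loops in terminal states) that the bullets above refer to.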

I'm not arguing that the theorems don't prove anything useful. I'm arguing that it's very hard for readers of the paper (and some accompanying posts) to understand what the theorems actually prove. Readers need to understand about 20 formal definitions that build on each other to understand the theorems. I also argue that the lack of explanation about what the theorems actually prove, and some of the informal claims that were made about the theorems, are not reasonable (and cause the theorems to appear more impressive than they are). Here's an example of such an informal claim (taken from this post):

Not all environments have the right symmetries

  • But most ones we think about seem to

Video and Transcript of Presentation on Existential Risk from Power-Seeking AI

Hey there!

And then finally there are actually some formal results where we try to formalize a notion of power-seeking in terms of the number of options that a given state allows a system. This is work [...] which I'd encourage folks to check out. And basically you can show that for a large class of objectives defined relative to an environment, there's a strong reason for a system optimizing those objectives to get to the states that give them many more options.

After spending a lot of time trying to understand that work, my impression is that the main theorems in the paper are very complicated and are limited in ways that were not reasonably explained. (To the point that, probably, very few people understand the main theorems and which environments they are applicable to, even though the work has been highly praised within the AI alignment community.)
