Evan Hubinger (he/him/his)

See: https://www.alignmentforum.org/users/evhub

You can talk to EA Funds before applying

Academic projects are definitely the sort of thing we fund all the time. I don't know if the sort of research you're doing is longtermist-related, but if you have an explanation of why you think your research would be valuable from a longtermist perspective, we'd love to hear it.

You can talk to EA Funds before applying

Since it was brought up to me, I also want to clarify that EA Funds can fund essentially anyone, including:

  • people who have a separate job but want to spend extra time doing an EA project,
  • people who don't have a Bachelor's degree or any other sort of academic credentials,
  • kids who are in high school but are excited about EA and want to do something,
  • fledgling organizations,
  • etc.
You can now apply to EA Funds anytime! (LTFF & EAIF only)

I'm one of the grant evaluators for the LTFF, and I don't think I would have any qualms about funding a project 6–12 months in advance.

Concerns with ACE's Recent Behavior

To be clear, I agree with a lot of the points that you're making—the point of sketching out that model was just to show the sort of thing I'm doing; I wasn't actually trying to argue for a specific conclusion. The actual correct strategy for figuring out the right policy here, in my opinion, is to carefully weigh all the different considerations like the ones you're mentioning, which—at the risk of crossing object and meta levels—I suspect to be difficult to do in a low-bandwidth online setting like this.

Maybe it'll still be helpful to just give my take using this conversation as an example. In this situation, I expect that:

  • My models here are complicated enough that I don't expect to be able to convey them here to the point where you'd understand them without a lot of effort.
  • I expect I could properly convey them in a more high-bandwidth conversation (e.g. offline, not text) with you, which I'd be willing to have with you if you wanted.
  • To the extent that we try to do so online, I think there are systematic biases in the format which will lead to beliefs (of at least the readers) being systematically pushed in incorrect directions. As an example, I expect arguments/positions that use simple, universalizing arguments (e.g. "Bayesian reasoning says we should do this, therefore we should do it") to win out over arguments that involve summing up a bunch of pros and cons and then concluding that the result is above or below some threshold (which in my opinion is what most actual true arguments look like).
Concerns with ACE's Recent Behavior

I think you're imagining that I'm doing something much more exotic here than I am. I'm basically just advocating for cooperating on what I see as a prisoner's-dilemma-style game (I'm sure you could also cast it as a stag hunt, or build a really complex game-theoretic model to capture all the nuances—I'm not trying to do that here; my point is just to explain the sort of thing that I'm doing).


A and B can each choose:

  • (public) publicly argue against the other
  • (private) privately discuss the right thing to do

And they each have utility functions such that

  • A = public; B = private:
    • u_A = 3
    • u_B = 0
    • Why: A is able to argue publicly that A is better than B and therefore gets a bunch of resources, but this costs resources, and overall some of their shared value is destroyed because public argument doesn't direct resources very effectively.
  • A = private; B = public:
    • u_A = 0
    • u_B = 3
    • Why: ditto except the reverse.
  • A = public; B = public:
    • u_A = 1
    • u_B = 1
    • Why: Both A and B argue publicly that they're better than each other, which consumes a bunch of resources and leads to a suboptimal allocation.
  • A = private; B = private:
    • u_A = 2
    • u_B = 2
    • Why: Neither A nor B argues publicly that they're better than the other, which consumes fewer resources and allows for a better overall resource allocation.

Then, I'm saying that in this sort of situation you should play (private) rather than (public)—and that therefore we shouldn't punish people for playing (private), since punishing people for playing (private) has the effect of forcing us to the Nash equilibrium and ensuring that people always play (public), destroying overall welfare.
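To make the game's structure concrete, here's a small sketch (the payoff table is taken directly from the bullets above; the helper names are my own) that enumerates the pure-strategy Nash equilibria and checks the Pareto comparison:

```python
from itertools import product

# Payoff table from the bullets above: (u_A, u_B) for each strategy pair.
PAYOFFS = {
    ("public", "public"):   (1, 1),
    ("public", "private"):  (3, 0),
    ("private", "public"):  (0, 3),
    ("private", "private"): (2, 2),
}
STRATEGIES = ("public", "private")

def is_pure_nash(a, b):
    """True if neither player can gain by unilaterally deviating."""
    u_a, u_b = PAYOFFS[(a, b)]
    a_best = all(PAYOFFS[(a2, b)][0] <= u_a for a2 in STRATEGIES)
    b_best = all(PAYOFFS[(a, b2)][1] <= u_b for b2 in STRATEGIES)
    return a_best and b_best

nash = [pair for pair in product(STRATEGIES, STRATEGIES) if is_pure_nash(*pair)]
print(nash)  # [('public', 'public')] — the unique pure-strategy Nash equilibrium

# (private, private) gives both players 2 > 1, so it Pareto-dominates the
# Nash outcome — exactly the structure of a prisoner's dilemma.
```

Running this confirms that (public, public) is the only pure-strategy equilibrium even though both players prefer (private, private), which is why a norm (or punishment for playing private) that pushes everyone toward the equilibrium destroys welfare.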

Concerns with ACE's Recent Behavior

For example, would you really not have thought worse of MIRI (the Singularity Institute at the time) if it had labeled Holden Karnofsky's public criticism "hostile" and refused to respond to it, citing that its time could be better spent elsewhere?

To be clear, I think that ACE calling the OP “hostile” is a pretty reasonable thing to judge them for. My objection is only to judging them for the part where they don't want to respond any further. So as for the example, I definitely would have thought worse of MIRI if they had labeled Holden's criticisms as “hostile”—but not just for not responding. Perhaps a better example here would be MIRI still not having responded to Paul's arguments for slow takeoff—in my opinion, Paul's arguments should update you, but MIRI not having responded shouldn't.

Would you update in a positive direction if an organization does effectively respond to public criticism?

I think you should update on all the object-level information that you have, but not update on the meta-level information coming from an inference like “because they chose not to say something here, that implies they don't have anything good to say.”

Do you update on the existence of the criticism itself, before knowing whether or how the organization has chosen to respond?


Concerns with ACE's Recent Behavior

That's a great point; I agree with that.

Concerns with ACE's Recent Behavior

I disagree, obviously, though I suspect that little will be gained by hashing it out further here. To be clear, I have certainly thought about this sort of issue in great detail as well.

Concerns with ACE's Recent Behavior

It clearly is actual, boring, normal, Bayesian evidence that they don't have a good response. It's not overwhelming evidence, but someone declining to respond sure is screening off the worlds where they had a great, low-inferential-distance reply that was cheap to shoot off and addressed all the concerns. Of course I am going to update on that.

I think that you need to be quite careful with this sort of naive-CDT-style reasoning. Pre-commitments/norms against updating on certain types of evidence can be quite valuable—it is just not the case that you should always update on all evidence available to you.[1]

  1. To be clear, I don't think you need UDT or anything to handle this sort of situation; you just need CDT plus the ability to make pre-commitments. ↩︎

Concerns with ACE's Recent Behavior

To be clear, I think it's perfectly reasonable for you to want ACE to respond if you expect that information to be valuable. The question is what you do when they don't respond. The response in that situation that I'm advocating for is something like “they chose not to respond, so I'll stick with my previous best guess” rather than “they chose not to respond, therefore that says bad things about them, so I'll update negatively.” I think that the latter response is not only corrosive in terms of pushing all discussion into the public sphere even when that makes it much worse, but it also hurts people's ability to feel comfortably holding onto non-public information.
