Summer 2022 Research Intern @ Center for Human-Compatible AI
Working (0-5 years experience)
1781 · Berkeley, CA, USA · Joined Jul 2016
Hello, I'm Michael! I first learned about effective altruism from reading 80,000 Hours back in middle school, sometime after my brother (an avid reader of fanfiction) recommended to me Harry Potter and the Methods of Rationality. I co-founded EA at Georgia Tech in April 2021 and was a Summer 2022 research intern at the Center for Human-Compatible AI.


What do you think are the main reasons for wanting to deploy your own model instead of using an API? Some reasons I can think of:

For anyone interested, the Center for AI Safety is offering up to $500,000 in prizes for benchmark ideas: SafeBench (mlsafety.org)

Just so I understand, are all four of these quotes arguing against preference utilitarianism?

I'm curious whether the reason why EA may be perceived as a cult while, e.g., environmentalist and social justice activism are not, is primarily that the concerns of EA are much less mainstream.

I appreciate the suggestions on how to make EA less cultish, and I think they are valuable to implement, but I don't think they would have a significant effect on public perception of whether EA is a cult.

I agree, that seems concerning. Ultimately, since the AI developers are designing the AIs, I would guess that they would try to align the AI to be helpful to the users/consumers or to the concerns of the company/government, if they succeed at aligning the AI at all. As for your suggestions "Alignment with whoever bought the AI? Whoever uses it most often? Whoever might be most positively or negatively affected by its behavior? Whoever the AI's company's legal team says would impose the highest litigation risk?" – these all seem plausible to me.

On the separate question of handling conflicting interests: there's some work on this (e.g., "Aligning with Heterogeneous Preferences for Kidney Exchange" and "Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning"), though perhaps not as much as we would like.

But I sometimes have a fear in the back of my mind that some of the attendees who are intrigued by these ideas are later going to look up effective altruism, get the impression that the movement’s focus is just about existential risks these days, and feel duped. Since EA pitches don’t usually start with longtermist ideas, it can feel like a bait and switch.

To avoid the feeling of a bait and switch, I think one solution is to introduce existential risk in the initial pitch. For example, when introducing my student group Effective Altruism at Georgia Tech, I tend to say something like: "Effective Altruism at Georgia Tech is a student group which aims to empower students to pursue careers tackling the world's most pressing problems, such as global poverty, animal welfare, or existential risk from climate change, future pandemics, or advanced AI." It's totally fine to mention existential risk – students still seem pretty interested and happy to sign up for our mailing list.

I think AI alignment isn't really about designing AI to maximize the preference satisfaction of a certain set of humans. I think an aligned AI would look more like an AI which:

  • is not trying to cause an existential catastrophe or take control of humanity
  • has had undesirable behavior trained out or adversarially filtered
  • learns from human feedback about what behavior is more or less preferable
    • In this case, we would hope the AI would be aligned to the people who are allowed to provide feedback
  • has goals which are corrigible
  • is honest, non-deceptive, and non-power-seeking
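To make the "learns from human feedback" point above concrete, here is a minimal sketch of one common approach: fitting a scalar preference score to pairwise human comparisons via the Bradley-Terry model, the same basic idea behind reward modeling from human feedback. Everything here is a toy assumption for illustration: behaviors are 2-dimensional feature vectors, and the human labeler is simulated by a hidden weight vector `TRUE_W`.

```python
import math
import random

random.seed(0)

# Hypothetical hidden preferences the simulated human labeler uses.
TRUE_W = [2.0, -1.0]

def score(w, x):
    """Scalar preference score of behavior x under weights w."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def prefer_prob(w, a, b):
    # Bradley-Terry: P(a preferred over b) = sigmoid(score(a) - score(b))
    return sigmoid(score(w, a) - score(w, b))

def train(pairs, lr=0.1, epochs=100):
    """Fit preference weights by SGD on the pairwise log-likelihood."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for a, b, label in pairs:  # label = 1 if the human preferred a
            grad = label - prefer_prob(w, a, b)
            for i in range(len(w)):
                w[i] += lr * grad * (a[i] - b[i])
    return w

# Simulate human feedback on random pairs of behaviors.
pairs = []
for _ in range(200):
    a = [random.uniform(-1, 1) for _ in range(2)]
    b = [random.uniform(-1, 1) for _ in range(2)]
    label = 1 if score(TRUE_W, a) > score(TRUE_W, b) else 0
    pairs.append((a, b, label))

w = train(pairs)

# Check how much of the feedback the learned model reproduces.
agree = sum(
    1 for a, b, label in pairs
    if (score(w, a) > score(w, b)) == (label == 1)
)
print(agree / len(pairs))
```

The point relevant to the comment above: the learned scores reflect only the preferences of whoever provided the comparisons, which is why the AI ends up aligned to the people allowed to give feedback.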