Evan R. Murphy
AI Alignment Researcher at Independent/Non-profit
Working (6-15 years of experience)

Formerly a software engineer at Google, now I'm doing independent AI alignment research.

Because of my focus on AI alignment, I tend to post more on LessWrong and AI Alignment Forum than I do here.

I'm always happy to connect with other researchers or people interested in AI alignment and effective altruism. Feel free to send me a private message!


Open Thread: Spring 2022

In particular, the analogy with alchemy seems apropos given that concepts like sentience are very ill posed.

I took another look at that section, interesting to learn more about the alchemists.

I think most AI alignment researchers consider 'sentience' to be unimportant for questions of AI existential risk - it doesn't turn out to matter whether or not an AI is conscious or has qualia or anything like that. [1] What matters a lot more is whether AI can model the world and gain advanced capabilities, and AI systems today are making pretty quick progress along both these dimensions. 

What would you say are good places to get up to speed on what we've learned about AI risk and the alignment problem in the past 8 years?

My favorite overview of the general topic is the AGI Safety Fundamentals course from EA Cambridge. I found taking the actual course to be very worthwhile, but they also make the curriculum freely available online. Weeks 1-3 are mostly about AGI risk and link to a lot of great readings on the topic. The weeks after that are mostly about looking at different approaches to solving AI alignment.

As for what has changed specifically in the last 8 years, I probably can't do the topic justice, but a couple of things jump out at me:

  • The "inner alignment" problem has been identified and articulated. Most of the problems from Bostrom's Superintelligence (2014) fall under the category of what we now call "outer alignment", as the inner alignment problem wasn't really known at that time. Outer alignment isn't solved yet, but substantial work has been done on it. Inner alignment, on the other hand, is something many researchers consider to be more difficult.

    Links on inner alignment: Canonical post on inner alignment, Article explainer, Video explainer
  • AI has advanced more rapidly than many people anticipated. People used to point to many things that ML models and other computer programs couldn't do yet as evidence that we were a long way from having anything resembling AI. But AI has now passed many of those milestones.

    Here I'll list out some of those previously unsolved problems along with AI advances since 2015 that have solved them: Beating humans at Go (AlphaGo), beating humans at StarCraft (AlphaStar), biological protein folding (AlphaFold), having advanced linguistic/conversational abilities (GPT-3, PaLM), generalizing knowledge to competence in new tasks (XLand), artistic creation (DALL·E 2), multi-modal capabilities like combined language + vision + robotics (SayCan, Socratic Models, Gato).

    Because of these rapid advances, many people have updated their estimates of when transformative AI will arrive to many years sooner than they previously thought. This cuts down on the time we have to solve the alignment problem.


[1]: It matters a lot whether the AI is sentient for moral questions around how we should treat advanced AI. But those are separate questions from AI x-risk.

People in bunkers, "sardines" and why biorisks may be overrated as a global priority

That's a very good point.

Under the assumption of longtermist ethics that I mentioned in the post, though, I think the difference in likelihoods has to be very large to matter, because valuing future human lives equally with present ones makes extinction risks astronomically worse than catastrophic non-extinction risks.

(I don't 100% subscribe to longtermist ethics, but that was the frame I was taking for this post.)

Open Thread: Spring 2022

You may have better luck getting responses to this posting on LessWrong with the 'AI' and 'AI Governance' (https://www.lesswrong.com/tag/ai-governance) tags, and/or on the AI Alignment Slack.

I skimmed the article. IMO it looks like a piece from circa 2015 dismissive of AI risk concerns. I don't have time right now to go through each argument, but it looks pretty easily refutable esp. with all that we've continued to learn about AI risk and the alignment problem in the past 8 years.

Was there a part of that link you found particularly compelling?

You don’t have to respond to every comment

Agreed. The trend of writing "Epistemic status" as one of the first things in a post without a definition or explanation (kudos to Lizka for including one) has bothered me for some time. It immediately and unnecessarily alienates readers by making them feel like they need to be familiar with the esoteric word "epistemic", which usually has nothing to do with the rest of the post.

Would be happy to see this frequent jargon replaced with something like "How much you should trust me", "Author confidence" or "Post status" (maybe there's a better phrase, just some examples that come to mind).

AGI Ruin: A List of Lethalities

Welcome to the field! Wow, I can imagine this post would be an intense crash course! :-o

There are some people who spend time on these questions. It's not something I've spent a ton of time on, but I think you'll find interesting posts related to this on LessWrong and AI Alignment Forum, e.g. using the value learning tag. Posts discussing 'ambitious value learning' and 'Coherent Extrapolated Volition' should be pretty directly related to your two questions.

Most students who would agree with EA ideas haven't heard of EA yet (results of a large-scale survey)

Really interesting observations.

I would say the conversion rate is actually shockingly low. Maybe CEA has more information on this, but I would be surprised if more than 5% of people who do Introductory EA fellowships make a high impact career change.

Do you have any sense of how many of those people are earning to give, or end up making donations to effective causes a significant part of their lives? I wonder if 5% is at least a little pessimistic for the "retention" of effective altruists if it's not accounting for people who take this path to making an impact.

EA is more than longtermism

I learned a lot from reading this post and some of the top comments, thanks for the useful analysis.

Throughout the post and comments people are tending to classify AI safety as a "longtermist" cause. This isn't wrong, but for anyone less familiar with the topic, I just want to point out that there are many of us who work in the field and consider AI to be a near-to-medium term existential risk.

I mention this just in case "longtermism" gave anyone the wrong impression that AI x-risk is something we definitely won't be confronted with for 100+ years. Many of us think it will be much sooner than that (though there is still considerable uncertainty and disagreement about timelines).

See the related post "Long-Termism" vs. "Existential Risk" by Scott Alexander.

aogara's Shortform

You're right, that paragraph was confusing. I just edited it to try and make it more clear.
