Three Key Issues I’ve Changed My Mind About

Holden Karnofsky

This is a linkpost for https://www.openphilanthropy.org/blog/three-key-issues-ive-changed-my-mind-about

Philanthropy - especially hits-based philanthropy - is driven by a large number of judgment calls. At the Open Philanthropy Project, we’ve explicitly designed our process to put major weight on the views of individual leaders and program officers in decisions about the strategies we pursue, causes we prioritize, and grants we ultimately make. As such, we think it’s helpful for individual staff members to discuss major ways in which our personal thinking has changed, not only about particular causes and grants, but also about our background worldviews.

I recently wrote up a relatively detailed discussion of how my personal thinking has changed about three interrelated topics: (1) the importance of potential risks from advanced artificial intelligence, particularly the value alignment problem; (2) the potential of many of the ideas and people associated with the effective altruism community; (3) the properties to look for when assessing an idea or intervention, and in particular how much weight to put on metrics and “feedback loops” compared to other properties. My views on these subjects have changed fairly dramatically over the past several years, contributing to a significant shift in how we approach them as an organization.

I’ve posted my full writeup as a personal Google doc. A summary follows.

Changing my mind about potential risks from advanced artificial intelligence

I first encountered the idea of potential risks from advanced artificial intelligence - and in particular, the value alignment problem - in 2007. There were aspects of this idea I found intriguing, and aspects I felt didn’t make sense. The most important question, in my mind, was “Why are there no (or few) people with relevant-seeming expertise who seem concerned about the value alignment problem?”

I initially guessed that relevant experts had strong reasons for being unconcerned, and were simply not bothering to engage with people who argued for the importance of the risks in question. I believed that the tool-agent distinction was a strong candidate for such a reason. But as I got to know the AI and machine learning communities better, saw how Superintelligence was received, heard reports from the Future of Life Institute’s safety conference in Puerto Rico, and updated on a variety of other fronts, I changed my view.

I now believe that there simply is no mainstream academic or other field (as of today) that can be considered to be “the locus of relevant expertise” regarding potential risks from advanced AI. These risks involve a combination of technical and social considerations that don’t pertain directly to any recognizable near-term problems in the world, and aren’t naturally relevant to any particular branch of computer science. This is a major update for me: I’ve been very surprised that an issue so potentially important has, to date, commanded so little attention - and that the attention it has received has been significantly (though not exclusively) due to people in the effective altruism community.

More detail on this topic

Changing my mind about the effective altruism (EA) community

Note: This section focuses on the parts of the effective altruist community that I did not initially encounter as people donating to, or spreading the word about, GiveWell and its top charities.

I’ve had a longstanding interest in the effective altruism community. I identify as part of this community, and I share some core values with it (in particular, the goal of doing as much good as possible). However, for a long time, I placed very limited weight on the views of a particular subset of the people I encountered through this community. This was largely because they seemed to have a tendency toward reaching very unusual conclusions based on seemingly simple logic unaccompanied by deep investigation. I had the impression that they tended to be far more willing than I was to “accept extraordinary claims without extraordinary evidence” in some sense, a topic I’ve written about several times (here, here and here).

A number of things have changed.

Potential risks from advanced AI, discussed above, is one topic I’ve changed my mind about: I previously saw this as a strange preoccupation of the EA community, and now see it as a major case where the community was early to highlight an important issue.
More generally, I’ve seen the outputs from a good amount of cause selection work at the Open Philanthropy Project. I now believe that the preponderance of the causes that I’ve seen the most excitement about in the effective altruism community are outstanding by our criteria of importance, neglectedness and tractability. These causes include farm animal welfare and biosecurity and pandemic preparedness in addition to potential risks from advanced artificial intelligence. They aren’t the only outstanding causes we’ve identified, but overall, I’ve increased my estimate of how well excitement from the effective altruism community predicts what I will find promising after more investigation.
I’ve seen EA-focused organizations make progress on galvanizing interest in effective altruism and growing the community. I’ve seen some effects of this directly, including more attention, donors, and strong employee candidates for GiveWell and the Open Philanthropy Project.
I’ve gotten to know some community members better generally, and my views on some general topics (below) have changed in ways that have somewhat reduced my skepticism of the kinds of ideas effective altruists pursue.

I now feel the EA community contains the closest thing the Open Philanthropy Project has to a natural “peer group” - a set of people who consistently share our basic goal (doing as much good as possible), and therefore have the potential to help with that goal in a wide variety of ways, including both collaboration and critique. I also value other sorts of collaboration and critique, including from people who question the entire premise of doing as much good as possible, and can bring insights and abilities that we lack. But people who share our basic premises have a unique sort of usefulness as both collaborators and critics, and I’ve come to feel that the effective altruism community is the most logical place to find such people.

This isn’t to say I support the effective altruism community unreservedly; I have concerns and objections regarding many ideas associated with it and some of the specific people and organizations within it. But I’ve become more positive compared to my early impressions.

More detail on this topic

Changing my mind about general properties of promising ideas and interventions

Of the topics discussed here, this is the one where the evolution of my thinking is hardest to trace and hardest to summarize.

I used to think one should be pessimistic about any intervention or idea that doesn’t involve helpful “feedback loops” (trying something, seeing how it goes, making small adjustments, and trying again many times) or useful selective processes (where many people try different ideas and interventions, and the ones that are successful in some tangible way become more prominent, powerful, and imitated). I was highly skeptical of attempts to make predictions and improve the world based primarily on logic and reflection, when unaccompanied by strong feedback loops and selective processes.

I still think these things (feedback loops, selective processes) are very powerful and desirable; that we should be more careful about interventions that don’t involve them; that there is a strong case for preferring charities (such as GiveWell’s top charities) that are relatively stronger in terms of these properties; and that much of the effective altruism community, including the people I’ve been most impressed by, continues to underweight these considerations. However, I have moderated significantly in my view. I now see a reasonable degree of hope for having strong positive impact while lacking these things, particularly when using logical, empirical, and scientific reasoning.

Learning about the history of philanthropy - and learning more about history more broadly - has been a major factor in changing my mind. I’ve come across many cases where a philanthropist, or someone else, seems to have had remarkable prescience and/or impact primarily through reasoning and reflection. Even accounting for survivorship bias, my impression is that these cases are frequent and major enough that it is worth trying to emulate this sort of impact. This change in viewpoint has both influenced and been influenced by the two topics discussed above.

More detail on this topic

Conclusion

Over the last several years, I have become more positive on the cause of potential risks from advanced AI, on the effective altruism community, and on the general prospects for changing the world through relatively speculative, long-term projects grounded largely in intellectual reasoning (sometimes including reasoning that leads to “wacky” ideas) rather than direct feedback mechanisms. These changes in my thinking have been driven by a number of factors, and each change has also reinforced the others.

One cross-cutting theme is that I’ve become more interested in arguments with the general profile of “simple, logical argument with no clear flaws; has surprising and unusual implications; produces reflexive dissent and discomfort in many people.” I previously was very suspicious of arguments like this, and expected them not to hold up on investigation. However, I now think that arguments of this form are generally worth paying serious attention to until and unless flaws are uncovered, because they often represent positive innovations.

The changes discussed here have caused me to shift from being a skeptic of supporting work on potential risks from advanced AI and effective altruism organizations to being an advocate, which in turn has been a major factor in the Open Philanthropy Project’s taking on work in these areas. As discussed at the top of this post, I believe that sort of relationship between personal views and institutional priorities is appropriate given the work we’re doing.

I’m not certain that I’ve been correct to change my mind in the ways described here, and I still have a good deal of sympathy for people whose current views are closer to my former ones. But hopefully I have given a sense of where the changes have come from.

More detail is available here:

Some Key Ways in Which I’ve Changed My Mind Over the Last Several Years
