Intellectual Diversity in AI Safety

by KR, 4 min read, 22nd Jul 2020, 8 comments

There is an undercurrent in the way I hear people talk about anyone not already under the AI-safety umbrella: the implication that outsiders are not worth talking to until they understand all the basic premises, where “basic premises” means something like “all of Superintelligence and some of Yudkowsky”. If you talk to these AI safety people, they’re generally willing to acknowledge some version of this pretty explicitly.

No one wants to rehash the same arguments a million times. (“So, like Skynet? Killer robots? Come on, you can just unplug it.”) But if everyone has to be more-or-less on board with some mandatory reading as the price of entry, you’re going to get a more homogeneous field than you otherwise could have gotten.

Why do I think that drawing in a wide variety of viewpoints is important?

The less varied the intellectual pedigree of AI safety is, the more likely it is that everyone is making correlated mistakes.

In my opinion, the landscape of AI’s future is dominated by unknown unknowns. We have not yet even thought of all of the ways it could go, let alone which are more likely or how to deal with them.

In part, I think the homogeneity of people’s background worldviews is an effect of how recently a small number of people drew a reasonably large group’s attention to the issue, which is only to their credit (otherwise, there might be no conversation to speak of, homogeneous or otherwise). But if you’re trying to do creative work and come up with as many possibilities as you can, you want intellectual diversity in the people who are thinking about the problem. If everyone’s first exposure to AI safety involved foom, for instance, they’re going to be thinking very different thoughts from someone who’s never heard of it. Even if they disagree with it, it might color their later intuitions.

It seems to me that AI safety has already allowed weak, confused, or just plain incorrect arguments to stand due to insufficient questioning of shared assumptions. Ben Garfinkel argues in “On Classic Arguments for AI Discontinuities” that the classic arguments fail to adequately distinguish between a sudden jump to AGI and a sudden jump from AGI to superintelligent systems. By arguing for the latter while assuming the former, they overestimate the likelihood of a catastrophic jump from AGI to superintelligence.

That’s one set of assumptions that someone has put in effort to untangle. I would be very surprised if there weren’t a lot more buried in our fundamental understanding of the issues.

Disadvantages

The obvious counter-argument is that most fields do not work like this and seem to be the better for it. No one’s going to take a biologist seriously if they’re running around quoting Lamarck. Deriving your own physics from first principles is the domain of crackpots. In general, discarding the work of previous thinkers wholesale is not often a good idea.

Why do I think it’s worth trying here? AI safety is a pre-paradigmatic science that is much newer than biology and physics. As it stands, it is also much less grounded in testable facts. A lot of intellectual progress in the basic underpinnings seems to be made when someone says “I thought of a way that AI could go, here’s a blog post about why I think so”. If it’s a good, persuasive-seeming argument, some people integrate it into their worldviews and treat it as a scenario that needs to be prepared for.

Other downsides:

  • It is harder to talk to someone who doesn’t have your shared concepts.
  • The ratio of interesting outsider opinions to arguments that have well-established counter-arguments, or are otherwise very obviously flawed, is low.
  • Not being introduced to previous work means that people will spend a lot of time rederiving existing concepts.

Conclusion

I don’t think all the existing arguments are bad, or that we should jettison everything and start over, or anything so dramatic. The current state of knowledge is the work of a lot of very smart people that have created something very valuable. But I do think it would be helpful to aim for a wider variety of viewpoints.

Some possible actions:

  • I’m not sure how you would get people from outside the existing structures on board with the basic program without exposing them to the existing arguments, but it seems like an interesting experiment to try. What do you get when you take some intelligent ML people, evolutionary biologists, economists, philosophers, or people from whatever background you find interesting, and ask them to think through the problem without priming them with the large number of established concepts already floating around?

Obviously this is kind of dumb as presented: no one does math by teaching people basic algebra and then going “okay, now rederive modern mathematics”. But I suspect there’s a better-thought-out version of this proposal that might have interesting results.

  • Someone thinking within the existing intellectual framework might benefit from talking through their ideas with someone they respect who hasn’t engaged with it much.
  • It might be worth people’s time to try to really understand the criticism of well-informed outsiders, and to check whether the disagreement comes down to different fundamental assumptions.

Comments

It seems like lots of active AI safety researchers, even a majority, are aware of Yudkowsky and Bostrom's views but only agree with parts of what they have to say (e.g. Russell, Amodei, Christiano, the teams at DeepMind, OpenAI, etc).

There may still not be enough intellectual diversity, but having the same perspective as Bostrom or Yudkowsky isn't a filter to involvement.

We should also mention Stuart Russell here, since he’s certainly very aware of Bostrom and MIRI but has different views on the details and is very grounded in ML.

I started working on AI safety prior to reading Superintelligence, and despite knowing about MIRI et al., since I didn’t like their approach. So I don’t think I agree with your initial premise that the field is as much a monoculture as you suggest.

I'm curious what your experience was like when you started talking to AI safety people after already coming to some of your own conclusions. E.g., I'm curious whether you think you missed major points that the AI safety people had spotted and that felt obvious in hindsight, or whether there were topics on which you disagreed with the AI safety people and think you turned out to be right.

I think mostly I arrived with a different set of tools and intuitions, in particular a better sense for numerical algorithms (Paul has that too, of course) and thus intuition about how things should work with finite errors and how to build toy models that capture the finite error setting.

I do think a lot of the intuitions built by Bostrom and Yudkowsky are easy to fix into a form that works in the finite error model (though not all of it), so I don’t agree with some of the recent negativity about these classical arguments. That is, some fixing is required to make me like those arguments, but it doesn’t feel like the fixing is particularly hard.

In the other direction, I started to think about this stuff in detail at the same time I started working with various other people and definitely learned a ton from them, so there wasn’t a long period where I had developed views but hadn’t spent months talking to Paul.

My impression is that people like you are pretty rare, but all of this is based off subjective impressions and I could be very wrong.

Have you met a lot of other people who came to AI safety from some background other than the Yudkowsky/Superintelligence cluster?

Well, part of my job is making new people that qualify, so yes to some extent. This is true both in my current role and in past work at OpenAI (e.g., https://distill.pub/2019/safety-needs-social-scientists).