In this 2017 talk, Daniel Dewey presents the Open Philanthropy Project's work and thinking on advanced artificial intelligence. He also gives an overview of the field, distinguishing between strategic risks - related to how influential actors will react to the rise of advanced AI systems - and misalignment risks - related to whether AI systems will reliably do what we want them to do.
The transcript below is lightly edited for readability.
I'm the program officer at the Open Philanthropy Project in charge of potential risks from advanced AI. This is an area we've been spending a lot of our senior staff time on recently, so I wanted to give an update on the work that we're doing in this area, how we think about it, and what our plans are going forward.
So, there are four basic concepts that I want to really make sure to drive home during the course of this talk, and if you watch out for these, I think they'll help you understand how we're thinking about this area.
I think there are a lot of different ways to frame potential risks from advanced AI that can inform different kinds of approaches, interventions, and activities. And it can be a bit hard to understand why we're doing the things we're doing without understanding the way we're thinking about them. Also, I should mention, I didn't frame this talk as the perfect introduction to this area if you're not already somewhat familiar with it.
These are the four basic concepts:
- Transformative AI, which is how we think broadly about the future impacts of AI that we care most about and that shape our activities;
- Strategic risks, having to do with how the most influential actors in the world will react to the prospect of transformative AI;
- Misalignment risks, which have to do with being able to build AI systems that reliably do what their operators want them to do;
- Our strategy in this area. The way we're currently planning on making a difference, which is field building.
So, to start off, there's this idea of transformative AI. Basically looking ahead at the kinds of impacts we expect AI to have. We think there are a lot of things that could happen and there's a lot of uncertainty about precisely what is going to happen. But something that seems reasonable is to expect AI to have an impact that is comparable to or larger than that of the Industrial or Agricultural Revolutions. And that's intended to capture a lot of possible sorts of scenarios that could happen.
So, we might see AI progress lead to automated science and technology development, which could lead to a really rapid increase in technological progress. We might see artificial general intelligence (sometimes abbreviated AGI), meaning AI systems that can do anything that a human can do, roughly. And that would really change the dynamics of the economy and how the economy functions. We might see systems that can do anything that a human or a group of humans can do. So AI systems could operate organizations autonomously. Maybe companies, non-profits, parts of government.
And then sort of looming over all of this is the idea that we shouldn't really expect AI to stop at the point of human-level competence, but we should expect the development of super-intelligent AI systems. It's not clear exactly what the distribution of capabilities of these systems would be and there are a lot of different possibilities.
The reason I've chosen this picture in the slides is that it shows the change in the way human influence was wielded on the world during the Industrial Revolution. You can see traditional biofuel usage down at the bottom, and then over the course of the Industrial Revolution, that became a very small percentage of the overall influence that humanity wielded. Most of what we were doing in the world came to depend on these new energy sources.
The idea of transformative impact comes from AI becoming a really large percentage of how humanity influences the world. That most of the influence we have could be via AI systems that are hopefully acting on our behalf.
Based on the conversations we've had with a lot of AI researchers, it's pretty reasonable to think that this could happen sometime in the next 20 years. I'm saying greater than 10% chance by 2036 because we said 20 years last year and so we don't want to always be saying 20 years later as years continue.
So there’s this really big change in the world, there's a lot of variation in what could happen, and it's hard to predict exactly what is going to be most critical and what kinds of things we might want to make a difference on.
So here is our general strategy in this area. We can imagine two different worlds. One of them is a world where transformative AI comes somewhat by surprise, maybe relatively early, and there aren't a lot of people who have been spending much of their career thinking full time about these problems, really caring about long-term outcomes for humanity. And then there's an alternate world where those professionals have existed for a while. They're working in fields with each other. They're critiquing each other's work.
And we think that the prospect of good outcomes is a lot more likely in cases where these fields have existed for a while, where they're really vibrant. They have some of the best people in policy, some of the best people in machine learning and AI research in them. And where those people have been thinking really specifically about how transformative AI could affect the long-run trajectory of human civilization.
So, our basic plan is field building: to try to move these fields ahead, in terms of quality and in terms of size. A tricky thing here is that if you want to affect the long-term trajectory of civilization, you don't really get to run several experiments to see which interventions are going to work well. So it's really hard to get feedback on whether what you're doing is helping.
So, what we'd like to do is start really keeping track of how these fields grow over time so that we can tell which kinds of interventions are making a difference. And it's not a sure thing that field growth is the correct strategy to pursue but it at least gives us something to measure and track to see if what we're doing is making a difference.
I'm starting with strategic risks because I think they have historically been less emphasized in the EA community. By strategic risks, I mean risks that could be caused by the way major, influential actors in the world react to the prospect of artificial general intelligence, or super-intelligence, or other kinds of transformative AI. And the way that they choose to use these technologies to affect the world. So sort of the policies and strategies they adopt.
For example, if you expect this big curve of human influence in the world to be mostly about artificial intelligence in the future, then that's a big opportunity for different actors to have more influence in the future than they do today, or an opportunity for that influence to be rebalanced - maybe between different countries, or between different industries. It feels like there's a strong chance that, as influential actors start noticing that this might happen, there could be preemptive conflict. There could be arms races or development races between governments or between companies.
If a government or company gains a really strong advantage in artificial intelligence, they might use it in a way that isn't in the best interest of the most people. So we could see a shift in the way resources and rights are distributed in the future. I classify that as a misuse of artificial intelligence. We want to make sure that transformative AI is used in a way that benefits the most people the most.
And then a final thing to think about is the possibility of accidental risks: risks of building AI systems that malfunction and do things that don't really benefit anyone, that weren't intentional. Racing to develop artificial intelligence could greatly increase that risk, because time, money, and resources spent on racing are resources not spent on making systems safer.
What we'd like to do is build up a field of people who are trying to answer the key question of what influential actors should do in different scenarios, depending on how AI development plays out. It's important to consider different scenarios because there's a lot of variation in how the future could go.
And there are a lot of existing relevant areas of expertise, knowledge and skill that seem like they're really relevant to this problem. So, geopolitics, global governance. It seems important for AI strategists to have pretty good working knowledge of AI and machine learning techniques and to be able to understand the forecasts that AI developers are making. And there's a lot of history in technology policy and the history of transformative technologies such that I hope that there are lessons that we could take from those. And of course, there's existing AI risk thought. So, Nick Bostrom's Superintelligence, things that have been done by other groups in the effective altruist community.
And so, our activities in this area of AI strategic risk right now, how are they going? I think that the frank summary is that we're not really sure how to build this field. Open Philanthropy Project isn't really sure. It's not really clear where we're going to find people who have the relevant skills. There's not, as far as we can tell, a natural academic field or home that already has the people who know all of these things and look at the world in this way. And so, our activities right now are pretty scattered and experimental. We're funding the Future of Humanity Institute and I think that makes sense to do, but we're also interacting a lot with government groups, think tanks, companies, people who work in technology policy, and making a few experimental grants to people in academia and elsewhere just to see who is going to be productive at doing this work.
I think it's really unclear and something I'd love to talk to people about more. Like how are we going to build this AI strategy field so that we can have professional AI strategists who can do the important work when it's most timely?
So, the other category of risk that I wanna talk about is misalignment risks. I've used a picture of a panda. This is an adversarial example. It's a crafted image that's designed to make an AI system make an incorrect decision. And it's been sort of a recent, really hot topic in machine learning because it shows the fragility of some kinds of machine learning models that are really popular right now.
This kind of fragility is not a full picture of the problems of AI misalignment. It’s not a full picture of when AI systems don't reliably do the things that their operators want them to do, but I think it's a good simple, straightforward example. The intent of training a neural network on these images was to get the neural network to make the same classifications that humans would. And it turns out to not be very hard to come up with a situation where the neural network will just do something completely different from what any human would say.
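As a concrete illustration of how such adversarial images are crafted, here is a minimal sketch of the fast gradient sign method (FGSM), the kind of technique behind the panda example, applied to a toy logistic-regression model rather than a real image classifier. The toy model, its dimensions, and the function names are illustrative assumptions, not details from the talk.

```python
import numpy as np

def fgsm_perturb(x, w, b, y, epsilon):
    """Adversarially perturb input x against a logistic-regression model.

    x: input vector; w, b: model weights and bias;
    y: true label in {0, 1}; epsilon: per-dimension perturbation size.
    Nudges every dimension of x slightly in the direction that
    increases the model's loss.
    """
    logit = w @ x + b
    p = 1.0 / (1.0 + np.exp(-logit))   # model's predicted P(y = 1)
    grad = (p - y) * w                  # gradient of log-loss w.r.t. x
    return x + epsilon * np.sign(grad)  # tiny per-dimension step up the loss

# Toy demo: a high-dimensional input where each per-dimension change is
# tiny, yet the prediction flips, mirroring the panda example.
rng = np.random.default_rng(0)
w = rng.normal(size=1000)               # hypothetical trained weights
b = 0.0
x = rng.normal(size=1000) * 0.01        # a "natural" input near the boundary
y = 1 if w @ x + b > 0 else 0           # the model's (correct) label for x
x_adv = fgsm_perturb(x, w, b, y, epsilon=0.02)
```

The point to notice is that each individual input dimension moves by at most epsilon (here 0.02), yet the classifier's decision flips - the same fragility the panda image demonstrates at image scale.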
So, broadly speaking, misalignment risks refer to situations where we can make really influential AI systems, and most of our influence over the world is flowing through these AI systems, but we can't make these systems reliably pursue the objectives that their operators intend. So suppose we see a graph shaped like the one from the Industrial Revolution, where almost everything that humans are doing in the world is going through AI systems, and most of the way the world goes in the future depends on those decisions lining up well with what humans want. Then it's a really bad situation if we're not really sure whether AI systems are going to do the things we want them to do, if they misinterpret what we want them to do, or if they're going to act unreliably in situations we haven't anticipated before.
So, we've been talking a lot to groups like the Machine Intelligence Research Institute and the Future of Humanity Institute, to technical advisors of ours who are at industrial research labs like OpenAI and DeepMind, and also to people in academia - machine learning researchers.
And there are a couple of priority areas of research that we think are really important if you want to advance the technical capability of building AI systems that reliably do the things that their operators want them to do: reward learning and reliability.
So reward learning starts from the idea that it would be quite bad if we could only build AI systems that pursue easily specifiable goals - things you can measure in the world, like how much money is in a bank account, or rewards coming in through a particular channel flowing back to the AI. Most of the things humans care about in the world aren't easily measured in that way. So there's a question of whether we can get AI systems to learn a task by interacting with humans in a way that makes them cooperatively refine their understanding of what our goals are, and act conservatively in cases where they have a lot of uncertainty and where the impact on the world could be very great if they've made the wrong evaluation of their operator's objectives.
And then on the reliability side, there's this question of how we train AI systems in really limited subsets of the situations that they'll eventually be functioning in. So if we want AI systems to make important decisions in the world, especially if the world is changing rapidly and dramatically, we need to be really sure that AI systems are not going to function dramatically differently in those situations than they did in training.
At Open Philanthropy Project, we've encountered a bunch of different models and ideas about how hard AI alignment will be. There are some people we've talked to who think that AI alignment is really closely related to all of the things that we'll need to do in order to make AI systems effective in the world in the first place - those problems are just going to be solved along the way. On this view, maybe it doesn't hurt to get started ahead of time, but it's not an urgent issue. And we've talked to other people who think that there are a ton of open, unsolved problems that we have no idea how to make traction on, and that we needed to get started yesterday on solving them. Probably the majority of AI and machine learning researchers are somewhere in between.
So, we're highly uncertain about how hard alignment will be and we think that it makes a lot of sense to get started on this academic field building in this area. If the worst case scenario is that we build this field and the problems turn out to be easier than we expected, that seems pretty good.
I think we're a lot clearer how misalignment field building will go than we are about how strategic risk field building will go. In reward learning and reliability, and then in AI alignment more broadly, I think that the academic field of AI and machine learning research contains the people who have the kinds of skills and capabilities that we need for AI alignment research already. And this is an area where philanthropic funding can just directly have an impact. There's a bit of a funding puzzle to do with having all these different chickens and eggs that you need in order to get a good research field up and running. And that includes having professors who can host students, having students who are interested in working on these problems and having workshops and venues that can coordinate the research community and kind of weave people together so that they can communicate about what questions are most important.
I think it's obvious that this kind of field building work could pay off in the longer term. If you imagine this AI alignment community building up over many decades, it's obvious. But actually, I think that even if we want to develop experts who will be ready to make essential contributions on short timelines, this is among the best ways to do that, because we're finding PhD students who have a lot of the necessary skills already and getting them to start thinking about and working on these problems as soon as we can.
So, this is an area where we've done a pretty significant amount of grant making so far and we have some more in the works. There have been a couple of big grants to senior academics in artificial intelligence and machine learning. The biggest one went to Stuart Russell and his co-investigators, several other professors, at the Center for Human-Compatible AI, which is based in Berkeley and also has branches at a couple of other universities. Another big grant went to Yoshua Bengio and a bunch of his co-investigators at the Montreal Institute for Learning Algorithms. That's a fairly recent grant; there are more students coming into that institute in the fall who we're hoping to get involved with this research.
With other professors, we're making some planning grants so that we can spend time interacting with those professors and talking with them a lot about their research interests and how they intersect with our interests in this area. Overall, we're taking a really personal, hands-on approach with grants to academic researchers in this area because I think our interests and the research problems we think are most important are a little bit unusual and a little bit difficult to communicate about.
So, I think it's important for us to do these sort of relationship-based grants and to really spend the time talking to the students and professors in order to figure out what kinds of project would be most effective for them to do.
So far, the main support that we've lent to students is via their professors - often academic grants will support part of a professor's time and much of several of their students' time. But this fall we're hoping to offer a fellowship for PhD students, which is a major way that machine learning PhD students are supported.
I'm quite bullish on this. I think that it's reasonable to expect a lot of the really good research and ideas to come from these PhD students who will have started thinking about these things earlier in their careers and had more opportunity to explore a really wide variety of different problems and approaches. But again, offering a PhD fellowship is not something we've done before so I think it's going to be sort of experimental and iterative to figure out how exactly it's going to work.
As far as workshops, we've held a workshop at Open Philanthropy Project for a bunch of grantees and potential grantees. Basically, as an experiment to see what happens when you bring together these academics and ask them to give talks about the AI alignment problem. We were quite happy with this. I think that people quickly jumped on board with these problems and are exploring a set of ideas that are closely related to the fields that they were working on before, but are approaching them from an angle that's closer to what we think might be required to handle AI alignment.
There are also workshops like Reliable Machine Learning in the Wild that have been in academic machine learning conferences, which are the major way that academics communicate with each other and publish results. Conferences dominate over journals in the field of machine learning. So we think supporting workshops at conferences is a good way to build up this community.
And it really depends on being able to communicate these problems to professors and students because they're the primary organizing force in these workshops.
There are other developments that I think you guys might be especially interested in. There's the Open Philanthropy Project partnership with OpenAI, which I think Holden talked about a little bit yesterday. We're quite excited about this. It's an unusual grant because it's not the case that we're just contributing money to a group and then letting them pursue the activities that they were going to pursue anyway. It's like a really active partnership between us and them to try to pool our talents and resources to pursue better outcomes from transformative AI.
So, I'm really excited about that. It's not clear exactly what kinds of results and updates and communications it makes sense to expect from that because it's still pretty early, but I have high hopes for it. We funded the Machine Intelligence Research Institute last year and we're still in a lot of conversations with them about their particular outlook on this problem and the work that they're doing.
There's a collaboration between OpenAI and DeepMind. This is something that the Open Philanthropy Project isn't funding or playing a role in directly, but I think it's an exciting development for people who care about this area. OpenAI is a nonprofit and DeepMind is part of Google, so in theory they could be viewed as competitors for producing artificial general intelligence. I think it's really encouraging to see their safety teams working together and producing research on the alignment problem. I think that's a robustly positive thing to do.
I also happen to think that the research that they did jointly publish, which is about learning from human feedback - so, having an AI system demonstrate a series of behaviors and having a human rate those behaviors and using those ratings to guide the learning of the AI system - I think this is a really promising research direction. A lot of this research is related to Paul Christiano's concept of act-based agents, which personally I'm really optimistic about as a new direction in the AI alignment problem.
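As a rough illustration of the learning-from-human-feedback idea described above, here is a heavily simplified, hypothetical sketch: the system proposes behaviors, a human rates them, and the ratings are used to fit a reward model that then guides the system's choices. The actual OpenAI/DeepMind work learns a neural-network reward model from pairwise preferences over trajectory segments; this toy version uses a linear model, scalar ratings, and made-up function names purely for clarity.

```python
import numpy as np

def fit_reward_model(features, ratings):
    """Least-squares fit of a linear reward model to human ratings."""
    w, *_ = np.linalg.lstsq(features, ratings, rcond=None)
    return w

def choose_behavior(candidate_features, w):
    """Pick the candidate behavior the learned reward model scores highest."""
    return int(np.argmax(candidate_features @ w))

# Stand-in "human" rater: secretly prefers behaviors high on feature 0
# and mildly dislikes feature 1.
def human_rating(f):
    return 2.0 * f[0] - 0.5 * f[1]

rng = np.random.default_rng(1)
demos = rng.normal(size=(50, 2))                   # demonstrated behaviors
ratings = np.array([human_rating(f) for f in demos])
w = fit_reward_model(demos, ratings)               # learn from the ratings

candidates = np.array([[1.0, 0.0], [0.0, 1.0]])    # two new options
best = choose_behavior(candidates, w)              # guided by learned reward
```

Because the stand-in human's ratings here are an exact linear function of the features, least squares recovers the preference; in practice the hard questions are about noisy, inconsistent feedback and behaviors outside the demonstrated distribution.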
Our Strategy in this Area
So, overall, the takeaway here: last year we published a blog post on the philanthropic opportunity that we saw from transformative AI. Looking back on that a year later, I think that short timelines still look plausible. This greater than 10% chance of developing transformative AI over the next 20 years seems really real. And additionally, we increasingly think that Open Philanthropy Project can make the biggest difference in a world where timelines are short in that way. So, a major criterion that we apply to the work that we're doing is: would this be useful if AGI were developed within the next 20 years or so?
Neglectedness still looks really high. We haven't seen a lot of other funders jumping into this space over the past year, and I think it was really possible, given the increase in attention to artificial general intelligence, that this space would become much more crowded. I think Open Philanthropy Project and this community are still in a pretty unusual position to influence outcomes in this area just because it is so neglected.
And after having done some experiments in strategy and in field building in technical AI alignment research, I think tractability looks higher than it did before. It's probably within the general range that we thought it was in, but maybe more concentrated at the high end. As we've gone on and talked to more and more AI researchers, it's been easier than expected to communicate the things that we're interested in, and to find common ground between what they think they could do productive research on and what we think would make the biggest difference for the future trajectory of human civilization.
So those remain high priorities for us. We're still spending a lot of senior staff time on this, and I think it's a cause area that makes sense to pay attention to if you're interested in the long-term trajectory of human civilization.
I'll take questions now, and thanks for your time.
Question: Do you think that we should slow the advance of AI until some of these areas that you're investing in can mature - and is that even possible?
Daniel Dewey: I think that's a good question. My current guess is that we don't have very good levers for affecting the speed of AI development. I think there's so much money and so much pressure in the rest of society to develop artificial intelligence that it’s not in a place where we have a particularly strong advantage. Slowing down technology is, I think, quite difficult to do and it would take a really concerted effort on the part of a much larger community.
But on top of that, I think it's a really open question how much it makes sense to think of this as a race between two totally separate technologies - capabilities and safety. My experience has been that you need a certain amount of capability in order to really do a lot of the research on AI safety.
So, yeah. It doesn't seem that tractable to me and even if it were more tractable, I think it's still sort of an open strategic question.
Question: Okay. Great. Next question.
Given the massive advantage that someone or some group could gain from winning the AI race, let's say, it seems to this questioner that the strategic considerations are perhaps the biggest risk. So, how does the field building that you're engaged in help us avoid this sort of arms race scenario in AI?
Daniel Dewey: I don't want to express too much confidence about this, but the way that I currently see the strategic field building work playing out is that we don't really want people making up their strategies on the fly, in a panic at the last minute. And if there are people who have done work ahead of time and gained expertise in the strategic considerations that are going on here, I think that we can have much better, more detailed, more well worked out plans for groups to coordinate with each other to achieve their shared interests.
And then also if there are some groups that we think will use AI more responsibly, or some governmental structures that we think would be more conducive to overall flourishing, I think that's not something you can work out at the last minute. So, I see developing a strategy for mitigating harms from misuse or from racing as something that we need these strategy experts to do. I don't think it's something that we can do in our spare time or something that people can do casually while they're working on something else. I think it's something that you really want people working on full time.
So I guess that's my perspective. Since we don't know what to do, that we should develop these experts.
Question: Another question that touches on several of the themes that you just mentioned there. How do you expect that AI development will impact human employment, and how do you think that will then impact the way that governments choose to engage with this whole area?
Daniel Dewey: Yeah. This is a super good question.
I don't have a good answer to this question. I think that there are interesting lessons from self-driving cars where I think most people who have been keeping up with self-driving cars, with the raw technological progress, have been a little bit surprised by the slowness of this technology to roll out into the world.
So, I think one possibility that's worth considering is that it takes so long to bring a technology from a proof of concept in the lab to broad scale in the world that there could be a delay that causes a big jump in effective capabilities: maybe we have, in the lab, the technology to replace a lot of human labor, but it takes a long time to restructure the marketplace, or to pass regulatory barriers, or to handle other mundane obstacles to applying a new technology.
But I think it's absolutely worth considering, and it's an important strategic question, whether there are going to be things like employment effects or autonomous weapons that will cause governments to react dramatically to AI in the really short term. In the US the big example is truck driving. Is autonomous truck driving going to cause some concerted reaction from the US government? I don't really know. I think this is a question we would like to fund people to answer.
Question: Obviously, there's a lot of debate between openness and more closed approaches in AI research.
Daniel Dewey: Yeah.
Question: The grant to OpenAI's a big bet, obviously, on the open side of that ledger. How are you thinking about open and closed or that continuum between those two extremes and how does your bet on OpenAI fit into that?
Daniel Dewey: So, I don't actually think that the bet on OpenAI is a strong vote in favor of openness. I think that their philosophy, as I understand it in this area, is that openness is something that they think is a good heuristic. Like it's a good place to start from in some sense. That if one of the things you're worried about is uneven distribution of power, there's this powerful mechanism of distributing information and capabilities and technology more widely.
But if you go and look at what they've written about it, especially more recently, they've been pretty clear that they're going to be pragmatic and flexible and that if they're sitting around a table and they've developed something and their prediction is that releasing it openly would cause horrible consequence, they're not going to be like, "Well, we committed to being open. I guess we have to release this even though we know it's going to be awful for the world."
My perspective on openness is that - I mean, this is a boring answer - it's one of these strategic questions where you can do a shallow analysis and say: if you're worried about the risk of a small group of people taking a disproportionate chunk of influence, and that that would be really bad, then maybe you want to be more open. If you're mostly worried about offense beating defense, where only one hostile actor could cause immense harm, then you're probably going to be more excited about closedness than openness.
But I think we need to move past this shallow strategic analysis. Like, we need people working in a real way on the detailed, nitty-gritty aspects of how different scenarios would play out, because I don't think there's a simple conceptual answer to whether openness or closedness is the right call.
Question: Well, we'll have to leave it there for today. A round of applause for Daniel Dewey.
Daniel Dewey: Cool. Thank you.