Markus Anderljung and Ben Garfinkel: Fireside chat on AI governance

The Talk

Getting involved in AI governance

Markus: Ben, how did you get into the field of AI governance?

Ben: I majored in physics and philosophy at Yale, and was considering working in the philosophy of physics. I started to think that it wasn’t the most useful or employable field. At the same time, I got interested in EA [effective altruism]. Then Allan Dafoe was transitioning to focus on a new Centre for the Governance of AI, and he was looking for researchers. I seized the opportunity; it seemed important, with not enough people working in [the field]. That’s how I got involved.

Markus: What happened next?

Ben: I was a Research Assistant there for about a year, and, at the same time, held a job at the Centre for Effective Altruism. I then had the opportunity to transition to the AI governance area full-time.

Markus: Sounds a bit random — just having this opportunity pop up.

Ben: There was indeed an element of randomness, but it wasn’t a random thing to do. I got really interested in long-termism and EA, so AI governance was on my radar, and, yes, an opportunity lined up at the same time.

Markus: How has your work in the field changed?

Ben: Quite a lot. I still have a broad focus in the area, but when I started, there was this sense that AI was going to be a very important field. It was around 2016. AlphaGo had just come out. Superintelligence had been written, so a good fraction of long-termist concern was on AI. AI seemed to be this really transformative technology, with risks we didn’t understand very well yet.

Almost no one was working on AI safety at the time or thinking about long-term AI governance challenges, and not that much AI governance was going on. So the early questions were “What is going on here?” and “What are we trying to do?” A lot of the early research was probably more naive, with people transitioning to the field and not knowing that much about AI.

Markus: And you’re now doing a DPhil at Oxford?

Ben: Yes, I just started [working toward a DPhil] in international relations. My highest credential so far in the area [of AI governance] remains an undergraduate degree in an unrelated field, so it seemed useful to have a proper degree in a domain more relevant to governance than physics.

Let’s turn the question on you, Markus. How did you get involved?

Markus: It’s a slightly convoluted story. I was first involved in EA issues at university, around 2011-2013. I got involved with Giving What We Can and [other EA] organizations. Then I spent a few years moving from [the belief that] long-termism (or the closest [approximation] to it back then) was true, but that there wasn’t much to do about it in the emerging-technology field, to becoming increasingly convinced that there were things to do.

When I graduated from university in 2016, I moved back to Sweden. I considered building career capital, so I went into management consulting for a few years, which was very fun and interesting. After a while it felt that the trajectory had leveled off, and that I could do more in the cause areas I cared about.

I transitioned into work for EA Sweden. Building a community seemed like a good idea, and heading up the organization was a great opportunity. I got funding for that and did it for about a year. I then became convinced that I could do more with my career outside of Sweden.

I was looking for other options, thinking that my comparative advantage was “I’m a management consultant who understands the research, someone who gets it.” I applied to organizations where I thought my broader skill set of operations and recruitment could help. GovAI was one of those organizations.

It wasn’t like AI governance was the most important thing and I had to help there; it was just part of a broader class of areas that seemed useful.

The relative importance of AI governance research

Markus: Let’s talk about your recent research. At EA Global: London 2018, you gave a talk.

Ben: “How sure are we about this AI stuff?”

Markus: Right. So, I want to ask: How sure are we about this AI stuff?

Ben: There are two ways to be sure. One is robust. [It centers on the question] “If we had all of the facts and considerations, would this still be a top priority for EA?” The other [centers on the question] “Given our limited information and the expectations we have, are we sure it still makes sense to put a lot of resources into this area?”

With the first question, it’s really hard to be sure. There’s so much we don’t know: what future AI systems will look like, the rate of progress, which institutions will matter, the timelines. That’s true of any transformative technology; we don’t have a great picture.

Markus: Will it be different for AI versus other causes?

Ben: If you compare AI to climate change, there’s a lot of uncertainty in climate models. We don’t know everything about the feedback loops, or how to think about extreme events: is there a one-in-100 probability or a one-in-1,000 probability? But we still have a basic sense of the parameters of the problem, such as how hot things will get (to some extent) and how bad it is.

With AI, if human labor becomes essentially unnecessary in the long term, as it’s replaced by AI systems, we don’t know what that world looks like or how it’s organized. It’s very difficult to picture. It’s like being in the 1500s and describing the internet in very rough terms, as [something that will be] more efficient [and enable] faster communication and information retrieval. You could have reasoned a bit about this future; maybe there would be larger businesses, since you can communicate on a larger scale. But you wouldn’t be visualizing the internet; you’d be picturing very fast carrier pigeons or something. You’re going to be way off. It’s hard to even find single dimensions where the answer is clear.

I think that’s about where we are with AI, which is a long-winded way of saying that it’s hard to be sure.

I actually feel pretty good about the weaker standard (“How sure are we, given these considerations, that a decent chunk of the EA movement should be focused on this?”). Overall, I think a smaller portion of the EA portfolio should be focused on this than is currently the case, but at least a few dozen people should be actively thinking about long-term applications of AI, and we’re not far from that number at the moment.

Markus: That sounds smaller than I expected. When you say a smaller portion of the EA portfolio should be focused on AI, what’s your current ballpark estimate of what that percentage is?

Ben: I think it’s really hard to think about the spread between things. Maybe something along the lines of one in five fully engaged, long-term-oriented people thinking about AI. That would be great.

Markus: Whereas now, you think the number is more like three or four in five people?

Ben: Yeah, it feels to me that it might be more than half, but I’m not sure that’s correct.

Markus: What interventions would you like these people to do instead?

Ben: There’s a lot of uncertainty. A lot of it is based on the skill set that these people have and the sort of work they’d be inclined to do. There being a lot of math and computer science [in EA] may justify the strong AI focus, but let’s imagine completely fungible people who are able to work on anything.

I really think fundamental cause prioritization research is still pretty neglected. There's a lot of good work being done at the Global Priorities Institute. There are some broad topics that seem relevant to long-term thinking that not many people are considering. They include questions like those I was working on: “Should we assume that we don’t have really great opportunities to influence the future now, relative to what future people might have if we save our money?” and “Should we pass resources on?” These seem crucial for the long-termist community.

The importance and difficulty of meta-level research

Ben: Even within AI, there are strangely few people thinking at the meta level about the pathways for influence in AI safety and governance. What exactly is the nature of the risk? I think there’s a handful of people doing this sort of work, part-time, on the side of other things.

For example, Rohin Shah is doing a lot of good [by thinking through] “What exactly is the case for AI risk?” But there are not that many people on that list compared to the set of people working on AI safety. There’s an abstract argument to be made: Before you put many resources into object-level work, it’s quite useful, early on, to put them toward prioritizing different kinds of object-level work, in order to figure out what, exactly, is motivating the object-level work.

Markus: One of your complaints in [your past] talk was that people seemed to be putting a lot of research into this topic, but haven’t produced many proper writeups laying out the argument motivating people’s choices. Do you think we’ve seen improvement there? There are a few things that have been published since then.

Ben: Yeah, there’s definitely been an improvement since I gave the talk. I think the time at which I gave the talk was kind of a low point.

There was an initial period of time, after the publication of Superintelligence, when the motivation for AI governance, for a lot of people, corresponded to the nature of AI risk. Over time, there was some sort of transition; people have very different visions of this. The change happened along a lot of dimensions. One of them is that Superintelligence focuses on a very discrete transition to advanced AI, in which not much is happening, and then we transition to quite advanced systems in a matter of days or weeks. A lot of people moved away from that.

Also, a lot of people, myself included, started thinking about risks that weren’t specifically safety-oriented. Superintelligence discusses these, but they’re not its main focus.

Markus: What do you mean by “not safety-oriented”?

Ben: There are many reasons you might worry that the future of AI won’t necessarily be great. For example, in a scenario in which human labor and government functions have been automated, it may not be a great world in terms of democracy and representation of the will of the people.

Another category is ethical [and concerns] the moral status of AI systems. Maybe decisions about their moral status are made wrongly, or deviate substantially from the best possible case.

Markus: So these are risks that don’t [involve] accidents with very powerful systems.

Ben: We’ve had major technological transitions in history which haven’t been uniformly good. The classic one is the Neolithic Revolution — the introduction of agriculture — having a few aftereffects like the rise of the state. It’s difficult to do an overall assessment. Some of the results were positive, and some were very much not, like slavery becoming a massive institution, disease, and the establishment of hierarchies instead of decentralized decision making.

It’s not hard to imagine that if there’s another transition, in which human labor is replaced by capital, that [transition] may have various effects that aren’t exactly what we want.

Markus: Yes, and in these previous transitions, the bad consequences have been permanent structural effects, like slavery being more economically viable.

So [the time of your EA Global talk] — November 2018 — was the low point? In what sense?

Ben: It was the low point in the sense of people having changed their justifications quite a bit in a lot of different areas, [without those changes being] reflected in much published writing [other than] maybe some blog posts.

There hasn’t been a massive improvement, but there definitely has been useful stuff published since then. For example, Paul Christiano wrote a series of blog posts arguing for AI safety even in the context of a continuous transition; Richard Ngo did some work to taxonomize different arguments and lay out [current thinking in] the space; Tom Sittler did similar work; and Rohin Shah presented a case for AI risk in a good sequence called “Value Learning,” a series of essays that laid out the nature of the alignment problem.

I think that was after —

Markus: “Reframing Superintelligence”?

Ben: Yeah, Eric Drexler’s work at FHI [the Future of Humanity Institute] also came out, laying out his quite different picture of AI progress and the nature of the risks. Also, MIRI [the Machine Intelligence Research Institute] put out a paper on what they call “mesa-optimization,” which corresponds to one of their main arguments for why they’re worried about AI risk, and which wasn’t in Superintelligence.

There has been a decent number of publications, but quite a bit fewer than I would ideally want. There are still a lot of viewpoints that aren’t captured in any existing writing, and a lot of writing is fairly short blog posts. Those are useful, but I’m not very comfortable with putting a lot of time and work into an area where justifications are short blog posts.

It’s obviously very difficult to communicate clearly about this. We don’t have the right picture of how things will go. It’s not uncommon to have arguments about what a given post is actually saying, which is not a great signal for our community being on the same page about the landscape of arguments and considerations.

Markus: Why do you think this is? Is it due to individual mistakes? Not spending enough time on this meta-level question?

Ben: To some extent, yes. There are complications. Working on this stuff is fairly difficult. It requires an understanding of the current landscape, of arguments in this area, of what’s going on in AI safety and in machine learning. It also requires the ability to do conceptual analysis and synthesis. There are perhaps not that many people right now [who meet these criteria].

Another complicating factor is that most people currently working in this area have just recently come into it, so there’s an unfortunate dynamic where people have the false sense that the problem framing is a lot more [advanced] than it actually is — that it just hasn’t been published, and that the people in the know have a good understanding.

When you enter an area, you aren’t usually in a great position to do this high-level framing work, because you don’t really know what exists in terms of unpublished Google Docs. It’s quite easy, and maybe sensible, when entering the area, to not do high-level [thinking and writing], and instead pick an object-level topic to work on.

Some of us might be better off dropping the object-level research program and [addressing] more high-level problems. Some have been doing this in their spare time, while their [main area of study] is object-level. It does seem like a difficult transition: to stop an object-level project and embark on a loose, “what-are-we-even-doing” project.

Markus: Are there particular viewpoints that you feel haven’t been accurately represented, or written up thoroughly enough?

Ben: Paul Christiano, for example, has written a few blog posts. One is called “What Failure Looks Like.” It shows what a bad outcome would look like, even in a scenario with a slow, gradual transition. However, there’s a limit on how thoroughly you can communicate in the form of a blog post. There is still a lot of ambiguity about what is being described — an active disaster? A lost opportunity? What is the argument for this being plausible? There’s a lot more that could be done there.

I feel similarly about this idea of mesa-optimization, which is now, I think, one of the primary justifications that MIRI has for assigning a high probability to AI safety risk. I saw a lot of ambiguity around what this concept exactly is. Maybe different people are trying to characterize it differently or are misunderstanding the paper, or the paper is ambiguous. It doesn’t seem like everyone is on the same page about what exactly mesa-optimization is. The paper argues that there might be this phenomenon called mesa-optimization, but doesn’t try to make the argument that, because this phenomenon might arise, then we should view it as a plausible existential risk. I think that work still hasn’t been done.

Markus: So the arguments that are out there ought to be scrutinized more. Are there arguments or classes of viewpoints that you feel don’t even have an initial writeup?

Ben: I think there are a couple. For example, Allan [Dafoe] is thinking quite a lot about structural risks. Maybe the situation starts getting bad or disappointing in terms of our current values. It’s a bit nebulous, like the Neolithic Revolution: not a concrete disaster. Some structural forces could push things in a direction you ideally wouldn’t want.

Similarly, there’s not much writing on whether AI systems will eventually have some sort of moral status. If they do, is that a plausible risk, and one that will be important enough for longtermists to focus on?

Those are probably two of the main things that stand out in my mind as plausible justifications for focusing on AI, but where I can’t point to a longer writeup.

Markus: What if I were to turn this around on you? You’re here, you’re working on these issues. What is your stab at a justification?

Ben: The main point is that it’s very likely that AI will be really transformative. We will eventually get to the point where human labor is no longer necessary for most things, and that world will look extremely different.

Then, there are around a half-dozen vaguely sketched arguments for why there might be some areas with EA leverage that would make the future much better or much worse — or why it could go either way.

It is hard to [determine] which topics may have long-term significance. I don’t think they [comprise] a large set, and there’s value in [surfacing] that information right now, in getting clearer on “what’s going on in AI.” It seems like one of the few places where there’s currently the opportunity to do useful, [future-focused] work.

Markus: So the argument is: “If you’re a long-termist, you should work in the areas that hold great leverage over the future, and this seems like one of the best bets.”

Ben: Yeah, that’s basically my viewpoint. The long-term influence of historical events is extremely ambiguous. How plausible is it for us to know what impact our actions today will have 100 years from now? In the 1200s, people’s focus may have been on stopping Genghis Khan. I think that would have been ethically justified, but from a long-termist perspective, things are less clear. Genghis Khan’s impact may have ultimately been good because of the trade networks established and [other such factors]. We’re unable to discern good from bad from a long-termist perspective. Pick any century from more than five centuries ago, and you’ll be in the same position.

I think we should have a strong prior in favor of working on issues for their present-day influence, but from a long-termist perspective, we shouldn’t prioritize issues where we can’t predict what difference they’ll make hundreds of years in the future.

There are not many candidates for relevant long-term interventions. Insofar as there are semi-plausible arguments for why AI could be one of them, I think it’s more useful to put resources into figuring out what is going on in the space and [improving] the value of information, rather than putting resources into object-level issues.

The role of GovAI

Ben: So, Markus, could you tell me what GovAI is currently up to in this space?

Markus: Broadly, we’re working on AI governance questions from a long-termist perspective, and I spend most of my time doing what I can to make sure we build an organization that does the best research.

In practice, we have a lot of different research projects going on. I personally spend a lot of time with recruiting, growing the organization. We run a GovAI Fellowship, where people spend three months doing research on topics that relate to the kinds of things we’re interested in. That’s a path into the AI governance space, and something we’ll continue doing for the foreseeable future. We [award] 10 fellowships every year. We’ll see whether we’ll be able to bring people [on-site] this summer. I’m pretty excited so far about this as a way of getting people into the field.

Since 2016, we’ve not been able to build up institutions in the field that provide clear pathways for people. I think this fellowship is one example of how we can do that.

My hope is to have, a few years down the line, a team of a dozen great researchers in Oxford doing great research. In terms of the research that we’ll do, we’re [an unusual] organization, in that we could define the very broad problem of AI governance as “everything required to make AI go well that isn’t technical AI safety.” That’s a lot.

It will span fields ranging from economics, to law, to policy — a tremendous number of different topics. I’m hoping that, over time, we’ll build narrower expertise as we get clearer on the meta picture and the specific fields that people need to work on.

A few years down the line, I’d really like for us and others in this space to have at least somewhat solid [recommendations] in [situations] like a corporation asking what their publication norms should be, or what sorts of internal mechanisms they should have to make sure they’re held accountable to the beautiful principles they’ve written up (e.g., “benefit humanity with our research”).

I don’t think we have that yet. Those are the kinds of questions I’m hoping we can make progress on in the next few years.

Career recommendations

Ben: Besides just applying for the GovAI Fellowship, do you have other career recommendations for people interested in [entering or exploring whether to enter] this space?

Markus: In general, I [subscribe to the view] that if you’re trying to figure out if you should be doing something, then do a bit of it. Find some bit of research that you can do and try it.

There aren’t a lot of opportunities that look like the GovAI Fellowship in the long-termist AI governance space, but there are similar ones at the Center for Security and Emerging Technology, based in Washington, DC. There are also junior roles at DeepMind and OpenAI, where you might do [some exploratory] research. But there aren’t many such roles; probably fewer than a dozen a year.

I would encourage people to think much more broadly. You might try to work at a wider set of technology companies like Microsoft or Facebook as they start building up ethics and policy teams. It would be awesome to have people in these types of “council” roles.

Another good idea would be to use your studies to dip your toe into the water. You could do your bachelor’s or master’s dissertation on a relevant topic. Some are looking into PhDs in this area as well, and that may be a good way to expand the field.

The other main tip is to engage with a lot of the research. Read everything coming out of institutions like ours, or CSET [the Center for Security and Emerging Technology]. Try to really engage with it, keeping in mind that the authors may be smart and good at what they do, but aren’t oracles and don’t have that much knowledge. There’s a lot of uncertainty in this space, so read things with an open mind. Consider what may be wrong, stay critical, and try to form your own views instead of directly [adopting] other people’s conclusions. Form your own model of how you think the world could go.

Ben: Sounds good. Very wise.

Markus: Do you have any tips? What would you have told your past self?

Ben: I can’t think of anything beyond what you just said, other than to have checked my email in case Allan Dafoe was looking for research assistants in this area, with a fairly non-competitive process at that time. Not many people were interested in it back then.

Markus: Right, so get really lucky.

Ben: Get really lucky — that’s exactly what I would tell my past self.

Markus: Cool.

Ben: Well, it’s been fun chatting!

Markus: Yes, and I’ll add my new sign-off: Stay safe, stay sane, and see you later!