A personal take on longtermist AI governance

lukeprog

Jul 16, 2021

Comments

weeatquince🔸

Thank you Luke for sharing your views. I just want to pick up one thing you said where your experience of the longtermist space seems sharply contrary to mine.

You said: "We lack the strategic clarity ... [about] intermediate goals". That is a great point, and I fully agree. I am also super pleased to hear you have been working on this. You then said:

I caution that several people have tried this ... such work is very hard

This surprised me when I read it. In fact, my intuition is that such work is highly neglected, that almost no one has done any of it, and that it is reasonably tractable. On reflection, I came up with three reasons for this intuition.

1. Reading longtermist research and not seeing much work of this type.

I have seen some really impressive forecasting and trend analysis, but if anyone had worked on setting intermediate goals, I would expect to see some evidence of basic steps, such as listing a range of plausible intermediate goals, or consensus-building exercises to set viable short- and mid-term visions of what AI governance progress looks like (maybe it's there and I've just not seen it). If anyone had made a serious stab at this, I would expect to have seen thorough exploration exercises to map out and describe possible near-term futures, assumption-based planning, scenario-based planning, strategic analysis of a variety of options, tabletop exercises, etc. I have seen very little of this.

2. Talking to key people in the longtermist space and being told this research is not happening.

For a policy research project I was recently considering, I talked to a bunch of longtermists about research gaps (e.g. at GovAI, CSET, FLI, CSER). I was told time and time again that policy research (which I see as a combination of setting intermediate goals and working out what policies are needed to get there) was not happening, was a task for another organisation, was a key bottleneck that no one was working on, etc.

3. I have found it fairly easy to make progress on identifying intermediate goals and short-term policy goals that seem net-positive for long-run AI governance

One intermediate goal I use is: key actors in positions of influence over AI governance are well equipped to make good decisions if needed (at an AI crunch time). This leads to specific policies such as ensuring clear lines of responsibility exist in military procurement of software/AI; ensuring that, if regulation happens, it is expert-driven, outcome-based regulation; or some of the ideas here. I would be surprised if longtermists looking into this (or other intermediate goals I routinely use) disagreed with the above intermediate goal, or with the claim that these policy suggestions move us towards it. I would say this work has not been difficult.

– –

So why are our experiences of the longtermist space so different? One hunch I have is that we are thinking of different things when we consider "strategic clarity on intermediate goals".

Supporting governments to make long-term decisions has given me a sense of what long-term decision making and "intermediate goal setting" involve. This colours what I would expect to see if the longtermist community were really trying to do this kind of work, and I compare longtermists' work to what I understand to be best practice in other long-term fields (from forestry to tech policy to risk management). This approach leaves me thinking that there is almost no longtermist "intermediate goal setting" happening. Yet maybe you have a very different idea of what "intermediate goal setting" involves, based on other fields you have worked in.

It might also be that we read different materials and talk to different people. It might be that this work has happened and I've just missed it or not read the right material.

– –

Does this matter? I guess I would be much more encouraging about someone doing this work than you are, and much more positive about how tractable such work is. I would advise that anyone doing this work have a really good grasp of how wicked problems are addressed, how long-term decision making works in a range of non-EA fields, and the various tools that can be used.

lukeprog

As far as I know it's true that there isn't much of this sort of work happening at any given time, though over the years there has been a fair amount of non-public work of this sort, and it has usually failed to convince people who weren't already sympathetic to the work's conclusions (about which intermediate goals are vs. aren't worth aiming for, or about the worldview cruxes underlying those disagreements). There isn't even consensus about intermediate goals such as the "make government generically smarter about AI policy" goals you suggested, though in some (not all) cases the objection to that category is less "it's net harmful" and more "it won't be that important / decisive."

weeatquince🔸

Thank you Luke – great to hear this work is happening but still surprised by the lack of progress and would be keen to see more such work out in public!

(FWIW, a minor point, but I am not sure I would phrase a goal as "make government generically smarter about AI policy": just being "smart" is not enough. Ideally you want a combination of smart + good incentives + space to take action. To be more precise when planning, I often use COM-B models, as used in international development governance reform work, to ensure all three factors are captured and balanced.)
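To make the three-factor check concrete, a COM-B-style planning checklist can be sketched in a few lines of Python. This is a minimal illustration, not a formal COM-B tool: mapping "smart" to capability, "good incentives" to motivation, and "space to take action" to opportunity is an assumption about how the factors above line up with the model, and the policy entry is hypothetical.

```python
from enum import Enum


class ComB(Enum):
    """The three COM-B components that jointly enable a behaviour."""
    CAPABILITY = "capability"    # e.g. being "smart" / well informed
    OPPORTUNITY = "opportunity"  # e.g. having space to take action
    MOTIVATION = "motivation"    # e.g. having good incentives


def missing_components(policies: dict[str, set[ComB]]) -> set[ComB]:
    """Return the COM-B components that no listed policy addresses."""
    covered: set[ComB] = set()
    for components in policies.values():
        covered |= components
    return set(ComB) - covered


# Hypothetical tagging of a single policy idea, for illustration only
smarter_government = {
    "train policymakers on AI": {ComB.CAPABILITY},
}
print(sorted(c.value for c in missing_components(smarter_government)))
```

A goal phrased only as "make government smarter" covers capability alone, so the check flags opportunity and motivation as gaps to plan for.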

Michael_Wiebe

Our AI focus area is part of our longtermism-motivated portfolio of grants,^[2] and we focus on AI alignment and AI governance grantmaking that seems especially helpful from a longtermist perspective. On the governance side, I sometimes refer to this longtermism-motivated subset of work as "transformative AI governance" for relative concreteness, but a more precise concept for this subset of work is "longtermist AI governance."^[3]

What work is "from a longtermist perspective" doing here? (This phrase is used 8 times in the article.) Is it this: longtermists have a pure time preference of 0 while neartermists have one above 0, so longtermists care a lot more about extinction than neartermists do (because they care more about future generations)? Hence longtermist AI governance means focusing on extinction-level AI risks, while neartermist AI governance is about non-extinction AI risks (e.g. racial discrimination in predicting recidivism).

If so, I think this is misleading. Neartermists also care a lot about extinction, because everyone dying is really bad.

Is there another interpretation that I'm missing? E.g. would neartermists and longtermists have different focuses within extinction-level AI risks?

Michael_Wiebe

One possible response is about long vs short AI timelines, but that seems orthogonal to longtermism/neartermism.

Vasco Grilo🔸

Hi Michael,

Neartermists also care a lot about extinction, because everyone dying is really bad.

I think this only makes sense for high extinction risk. If extinction risk is less than 1 % per century (roughly 10^-4 per year), the expected time until extinction is about 10 k years or more. This is nothing on a cosmological timescale, but much longer than the current human life expectancy. If a generation lasts 30 years, it would take about 333 generations (= 10^4/30) to reach 10 k years. So extinction risk has a pretty small impact on one's life expectancy, and on those of one's children.
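The arithmetic above can be checked with a short Python sketch. It assumes, as the comment implicitly does, a constant annual extinction probability p, so that the time until extinction follows a geometric distribution with mean 1/p. The 80-year lifetime in the last line is a hypothetical figure for illustration, not one from the comment.

```python
def annual_risk_from_century_risk(century_risk: float) -> float:
    """Constant annual probability p with (1 - p) ** 100 == 1 - century_risk."""
    return 1 - (1 - century_risk) ** (1 / 100)


def expected_years_until_extinction(annual_risk: float) -> float:
    """Mean waiting time of a geometric distribution with per-year probability p."""
    return 1 / annual_risk


def risk_within(years: float, annual_risk: float) -> float:
    """Probability that extinction happens at some point within `years` years."""
    return 1 - (1 - annual_risk) ** years


if __name__ == "__main__":
    p = 1e-4  # the per-year figure used in the comment
    print(f"1% per century ~ {annual_risk_from_century_risk(0.01):.3e} per year")
    years = expected_years_until_extinction(p)
    print(f"expected years until extinction: {years:,.0f}")
    print(f"30-year generations: {years / 30:.0f}")
    # Hypothetical 80-year lifetime, not a figure from the comment
    print(f"risk within an 80-year life: {risk_within(80, p):.2%}")
```

With p = 10^-4, the expected time until extinction is 10,000 years (about 333 thirty-year generations), and the chance of extinction within an 80-year life comes out just under 1%.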



Open Phil's AI governance work so far (a recap)

First, some key points from my previous post:

In practice, Open Phil's grantmaking in Potential Risks from Advanced Artificial Intelligence is split in two:

One part is our grantmaking in "AI alignment," defined here as "the problem of creating AI systems that will reliably do what their designers want them to do even when AI systems become much more capable than their designers across a broad range of tasks."^[1]
The second part, which I lead, is our grantmaking in "AI governance," defined here as "local and global norms, policies, laws, processes, politics, and institutions (not just governments) that will affect social outcomes from the development and deployment of AI systems."

Our AI focus area is part of our longtermism-motivated portfolio of grants,^[2] and we focus on AI alignment and AI governance grantmaking that seems especially helpful from a longtermist perspective. On the governance side, I sometimes refer to this longtermism-motivated subset of work as "transformative AI governance" for relative concreteness, but a more precise concept for this subset of work is "longtermist AI governance."^[3]

It's difficult to know which “intermediate goals” we could pursue that, if achieved, would clearly increase the odds of eventual good outcomes from transformative AI (from a longtermist perspective). As such, our grantmaking so far tends to focus on:

…research that can help clarify how AI technologies may develop over time, and which intermediate goals are worth pursuing.
…research and advocacy aimed at the few intermediate goals we've come to think are clearly worth pursuing, such as particular lines of AI alignment research, and creating greater awareness of the difficulty of achieving high assurance in the safety and security of increasingly complex and capable AI systems.
…broad field-building activities, for example scholarships, career advice for people interested in the space, professional networks, etc.^[4]
…training, advice, and other support for actors with plausible future impact on transformative AI outcomes, as always with a focus on work that seems most helpful from a longtermist perspective.

Key bottlenecks

Since our AI governance grantmaking began in ~2015,^[5] we have struggled to find high-ROI grantmaking opportunities that would allow us to move grant money into this space as quickly as we'd like to.^[6] As I see it, there are three key bottlenecks to our AI governance grantmaking.

Bottleneck #1: There are very few longtermism-sympathetic people in the world,^[7] and even fewer with the specific interests, skills, and experience to contribute to longtermist AI governance issues.

As a result, the vast majority of our AI governance grantmaking has supported work by people who are (as far as I know) not sympathetic to longtermism (and may have never heard of it). However, it's been difficult to find high-ROI grantmaking opportunities of this sort, too, because, as mentioned above:

Bottleneck #2: We lack the strategic clarity and forecasting ability to know which "intermediate goals" are high-ROI or even net-positive to pursue (from a longtermist perspective). If we had more clarity on intermediate goals, we could fund more people who are effectively pursuing those goals, whether they are sympathetic to longtermism or not.

In the past few years, I've spent hundreds of hours discussing possible high-value intermediate goals with other "veterans" of the longtermist AI governance space. Thus I can say with some confidence that there is very little consensus on which intermediate goals are net-positive to pursue, or even on more fundamental questions such as "Is AI x-risk lower if AI progress is faster or slower?" or "Could a broadly superhuman 'prosaic' AGI system be robustly aligned with human values at all?" or "What are the most likely paths by which AI technology could lead to existential catastrophe or eucatastrophe?" In fact, there might be even less consensus on such questions than there was a few years ago.^[8]

To some extent, such lack of consensus should be expected for almost any "wicked problem," especially those requiring long-range forecasting (which it's not clear anyone can do successfully). However, my sense is that the challenge of reaching strategic consensus (or at least strategic clarity^[9]) has been even harder in longtermist AI governance than it has been in, for example, longtermism-motivated biosecurity and pandemic preparedness (another Open Phil focus area).

So, can we fund work that will help to clarify which intermediate goals are worth pursuing (from a longtermist perspective)?^[10] We've done some of this, but (per bottleneck #1) there aren't many longtermists who can be funded to do this work, and also:

Bottleneck #3: It's difficult to define and scope research projects that can help clarify which intermediate goals are worth pursuing from a longtermist perspective even if those research projects are done by people who are not themselves thinking about the issues from a longtermist perspective.^[11]

My advice to longtermists interested in AI governance

To be clear, I think we're making progress on these bottlenecks, and we may see major breakthroughs on one or more of them in the next few years. But given current bottlenecks, what should longtermists do today? My intuitions about this have evolved considerably over the last few years, and they will no doubt continue to evolve. In general, the advice below is still fairly tentative, and I hope that people interested in helping to mitigate AI x-risk will try to think through the situation themselves and (in some cases) pursue strategies that are quite different from what I advise below (while taking care to avoid causing accidental harm).

That said, as of today, here is some of my advice:

Given the lack of strategic clarity/consensus in longtermist AI governance, project and career path decisions should be especially influenced by factors like experimentation / learning about yourself, aptitude development, and other career capital development.

In the absence of greater strategic clarity/consensus (via strategy-relevant research and by just seeing how things play out in the world, and thus which scenarios we're most likely headed toward), one high-EV option is to prioritize future impact, e.g. by building credibility (both deserved and perceived) over many years of dedicated service, while continuing to study and think through how you can best have a positive impact in potential future scenarios. Once there is more strategic clarity/consensus, you may then be in a position to help increase the degree to which wise and broadly beneficial AI governance ideas are actually implemented (by governments, companies, etc.). Also, your experience in positions of generally increasing credibility and responsibility will help you think more about the details of the incentives and constraints facing people in and near such roles, thus improving your ability to contribute to the quest for greater strategic clarity (see the advice on contributing to strategy research below).

It might be especially valuable to focus on potential impact during a plausible "AI crunch time,"^[12] i.e. a period lasting 1-20 years when the decisions most impactful on TAI outcomes might be made. Of course, it's difficult to know when such an AI crunch time might occur, how long it might last, and which kinds of credibility and roles might lend themselves to helpful impact, but I think we have enough information now to make some reasonable guesses. Personally, it seems to me^[13] that if there is an AI crunch time before 2100, it is most likely to begin between 2025 and 2060, it is most likely to last 2-15 years, and many of the most impactful decisions will be made by (i) technology and national security policymakers in the US, China, and a few other jurisdictions,^[14] (ii) AI-leading tech firms with access to very large compute resources,^[15] and perhaps (iii) a few key semiconductor firms.^[16]
I hope to write more in the future about how best to pursue these particular career paths and more generally optimize for AI crunch time impact. In the meantime, see e.g. 80,000 Hours on AI Policy and US AI Policy, two articles on working in Congress, Science and Technology Policy Opportunities, AI policy careers in the EU, and AI Governance Career Paths for Europeans (but I don't endorse everything in those posts). I wish there were similar guides on how to work toward impactful roles outside the US/European government and think tank space, but I don't yet have any to recommend.^[17]
If you might be interested in a US AI policy path, consider applying to the Open Philanthropy Technology Policy Fellowship. It was designed to accommodate people who don't yet know much about the policy process or which paths might interest them.

Another thing you could do in the absence of greater strategic clarity is to help increase the number of thoughtful longtermists with particular aptitudes and other career capital that is most relevant to working on AI governance, e.g. by promoting longtermism to key audiences and by helping longtermists develop relevant aptitudes and career capital. (Prioritizing future impact can also help with this, because e.g. it might get you into roles with hiring authority.) Some ideas for how to do this are in the post Open Philanthropy is seeking proposals for outreach projects.

You can also contribute to research that may provide greater strategic clarity in the future. However, I caution that several people have tried this (including me) over the past several years, and it's not clear we have much better strategic clarity or consensus than we did a few years ago. I think some recent work has produced small amounts of added strategic clarity,^[18] but it's not yet sufficient to make it much clearer which intermediate goals are robustly good to pursue — in part because answers to that question are dependent on many factors, only a few of which have been studied in depth so far. Such work is very hard to do in a way that is likely correct, convincing to others, and thorough enough to be action-guiding.

If you want to try this kind of work, in most cases I recommend that you first (1) study many different relevant topics, (2) discuss the issues in depth with "veterans" of the topic, and (3) gain experience working in relevant parts of key governments and/or a top AI lab (ideally both) so that you acquire a detailed picture of the opportunities and constraints those actors operate with.
Or, you can try to help answer one or more narrowly-scoped questions that an AI x-risk motivated person who is closer to having those advantages has identified as especially action-informing (even if you don't have the full context as to why).
I hope to write more in the future about which research projects I personally think might be the most action-informing on the current margin. In the meantime, if you're interested in this path, you could work on learning as much as you can about longtermism-motivated AI alignment and AI governance issues, and also study broader relevant topics such as machine learning, security studies, and how other high-stakes technology industries use a "defense in depth" approach to avoiding catastrophic failure.^[19]
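The intuition behind layering defenses (the "Swiss cheese model" mentioned in footnote 19) can be illustrated with a toy probability model in Python. It is a sketch under strong simplifying assumptions, not a method from this post: each layer is assumed to miss a hazard independently with a given probability, and a single hypothetical `common_cause` term stands in for correlated, partially-overlapping layers. The layer probabilities are made-up numbers.

```python
from math import prod


def breach_probability(miss_probs: list[float], common_cause: float = 0.0) -> float:
    """Probability that a hazard gets through every defense layer.

    miss_probs:   per-layer probability that the layer fails to stop the hazard,
                  assuming the layers fail independently of one another.
    common_cause: probability of a shared failure that defeats all layers at once
                  (a crude stand-in for correlated, overlapping layers).
    """
    independent = prod(miss_probs)  # every "hole" lines up by chance
    return common_cause + (1 - common_cause) * independent


# Hypothetical numbers: three layers that each miss 10% of hazards
layers = [0.1, 0.1, 0.1]
print(f"independent layers:  {breach_probability(layers):.4f}")
print(f"with shared failure: {breach_probability(layers, common_cause=0.01):.4f}")
```

Under independence, three layers that each miss 10% of hazards let through about 0.1% of them; adding even a 1% chance of a shared failure raises that to about 1.1%. This is one way to see why the approach stresses diverse, genuinely independent layers.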

Notes

1. I borrow this definition from here, except that I've replaced the term "users" with "designers." A case can be made for either phrasing and I don't mean to take a strong stance between them in this post.
2. See also the update here on the distinction between Open Phil's longtermist work and our Global Health and Wellbeing (formerly "near-termist") work.
3. I didn't quite use the phrase "longtermist AI governance" in the previous post, but e.g. see footnote 15 there. By longtermist AI governance I mean "longtermism-motivated AI governance/strategy/policy research, practice, advocacy, and talent-building." As far as I know, the term originates with Allan Dafoe.
4. See also Allan Dafoe on the "asset-decision model of research impact" and the "field building model of research" here.
5. There wasn't as clear a distinction between our AI alignment work and our AI governance work in 2015, and I wasn't leading our AI governance work at the time. For a chronological list of our grants related to AI governance going back to 2015, see footnote 23 here.
6. Because we want to help others as much as possible per dollar with our grantmaking, we aim to only recommend grants above some threshold of expected benefit per dollar. If a grant opportunity doesn't meet that threshold, then we'd prefer to recommend those dollars to a higher-ROI opportunity (perhaps in another focus area), or save the dollars for a future time when we expect higher-ROI opportunities to be available. For more detail on our traditional “100x bar” (now more like a 1000x bar) for benefit produced per dollar, see GiveWell’s Top Charities Are (Increasingly) Hard to Beat. We use a different threshold for our longtermist grantmaking, but our thoughts on what that threshold should be are still under development, and we haven't yet written much about how we think about the ROI threshold for our longtermist grantmaking. However, for some basic context on how we think about "worldview diversification" and our "last dollar," see here and here. For more on how "sign uncertainty" interacts with the ROI threshold for our longtermist grantmaking, see footnotes 19-20 here.
7. I estimate there are a few thousand people who would self-report as "primarily longtermist," based on results from the 2019 Effective Altruism Survey conducted by Rethink Priorities, specifically that 40% of surveyed effective altruists listed the long-term future as their top cause area, and this post estimates that there are ~6500 "active" effective altruists. There are several reasons this could be an underestimate or an overestimate, but after a brief exchange with David Moss of Rethink Priorities, I think "a few thousand" is a reasonable guess. Compare this to other relatively niche, relatively "philosophical" communities, e.g. to ~700,000 registered Libertarian voters in the U.S. (see "voter registration totals" here; for estimates of Libertarians in a looser sense see here), or to ~3.3 million vegetarians in the U.S. alone (using the "1% of adults both self-identify as vegetarians and report never consuming meat" estimate here), or to the member count of the subreddit for antinatalism (currently ~105,000, though presumably only a fraction of that number would self-identify as antinatalist).
8. This might be partly because there are more veterans of the longtermist AI governance space than there used to be, and hence more views.
9. Expert consensus is not required for Open Phil to feel it has enough strategic clarity to devote large amounts of time and money to pursuing particular intermediate goals, but strategic consensus is probably correlated with strategic clarity, and in the case of longtermist AI governance I think the lack of consensus reflects an empirically difficult-to-resolve lack of strategic clarity on fundamental questions and on potential intermediate goals.
10. See also this post on "disentanglement research."
11. In a minority of cases, this is in part due to information hazards associated with some kinds of research projects. More often, it's because one's angle of attack on a question varies depending on whether one is thinking about outcomes from a longtermist perspective vs. some other perspective.
12. Even if there is an "AI crunch time" during your lifetime, many of the most impactful decisions might be made prior to AI crunch time, though it may be difficult to know that far in advance which decisions they are or which options are most beneficial.
13. I won't argue for these guesses here.
14. Other potentially-key jurisdictions include the UK, the EU, the Netherlands, South Korea, and Japan. (The latter three are leaders in key parts of the semiconductor supply chain.)
15. E.g. Google, Microsoft, and Amazon.
16. This includes companies playing different roles in the semiconductor supply chain, e.g. TSMC, ASML, Google's semiconductor team(s), and perhaps particular startups such as Graphcore (though it's harder to predict which of these will be influential in the future).
17. See also 80,000 Hours on China-focused careers, though it's not focused on AI.
18. Post-2015 research that I've found especially informative for thinking about longtermist AI governance issues includes: (1) recent OpenAI and Open Phil research on AI timelines (e.g. AI and Compute, AI and Efficiency, GPT-3, Scaling Laws for Autoregressive Generative Modeling, Scaling Laws for Transfer at OpenAI, and Draft report on AI timelines, Modeling the Human Trajectory, Report on Whether AI Could Drive Explosive Economic Growth, and Report on Semi-informative Priors at Open Phil), (2) CSET's work on semiconductor supply chains and related policy options (culminating in e.g. The Semiconductor Supply Chain and Securing Semiconductor Supply Chains), and (3) Robin Hanson's book The Age of Em.
19. The "defense in depth" concept originated in military strategy (Chierici et al. 2016; Luttwak et al. 2016, ch. 3; Price 2010), and has since been applied to reduce risks related to a wide variety of contexts, including nuclear reactors (International Nuclear Safety Advisory Group 1996, 1999, 2017; International Atomic Energy Agency 2005; Modarres & Kim 2010; Knief 2008, ch. 13), chemical plants (see "independent protection layers" and "layers of protection analysis" in Center for Chemical Process Safety 2017), aviation (see "Swiss cheese model" in Shappell & Wiegmann 2000), space vehicles (Dezfuli 2015), cybersecurity and information security (McGuiness 2021; National Security Agency 2002 & 2010; Amoroso 2011; Department of Homeland Security 2016; Riggs 2003; Lohn 2019), software development (including for purposes beyond software security, e.g. software resilience; Adkins et al. 2020, ch. 8), laboratories studying dangerous pathogens (WHO 2020; CDC 2020; Rappert & McLeish 2007; National Academies 2006, which use different terms for "defense in depth"), improvised explosive devices (see "web of prevention" in Revill 2016), homeland security (Echevarria II & Tussing 2003), hospital security (see "layers of protection" in York & MacAlister 2015), port security (McNicholas 2016, ch. 10), physical security in general (Patterson & Fay 2017, ch. 11), control system safety in general (see "layers of protection" in Barnard 2013; Baybutt 2013), mining safety (Bonsu et al. 2016), oil rig safety (see "Swiss cheese model" in Ren et al. 2008), surgical safety (Collins et al. 2014), fire management (Okray & Lubnau II 2003, pp. 20-21), health care delivery (Vincent et al. 1998), and more. Related (and in some cases near-identical) concepts include the "web of prevention" (Rappert & McLeish 2007; Revill 2016), the "Swiss cheese model" (Reason 1990; Reason et al. 2006; Larouzee & Le Coze 2020), "layers of protection" (Center for Chemical Process Safety 2017), "multilayered defense" or "diversity of defense" (Chapple et al. 2018, p. 352), "onion skin" or "lines of defense" (Beaudry 2016, p. 388), or "layered defense" (May et al. 2006, p. 115). Example partially-overlapping "defense layers" for high-stakes AI development and deployment projects might include: (1) tools for blocking unauthorized access to key IP, e.g. secure hardware enclaves for model weights, (2) tools for blocking unauthorized use of developed/trained IP, akin to the PALs on nuclear weapons, (3) tools and practices for ensuring safe and secure behavior by the humans with access to key IP, e.g. via training, monitoring, better interfaces, etc., (4) methods for scaling human supervision and feedback during and after training high-stakes ML systems, (5) technical methods for gaining high confidence in certain properties of ML systems, and properties of the inputs to ML systems (e.g. datasets), at all stages of development (a la Ashmore et al. 2019), (6) background checks & similar for people being hired or promoted to certain types of roles, (7) legal mechanisms for retaining developer control of key IP in most circumstances, (8) methods for avoiding or detecting supply chain attacks, (9) procedures for deciding when and how to engage one's host government to help with security/etc., (10) procedures for vetting and deciding on institutional partners, investors, etc. (11) procedures for deciding when to enter into some kinds of cross-lab (and potentially cross-state) collaborations, tools for executing those collaborations, and tools for verifying another party's compliance with such agreements, (12) risk analysis and decision support tools specific to high-stakes AI system developers, (13) whistleblowing / reporting policies, (14) other features of high-reliability organizations, a la Dietterich (2018) and Shneiderman (2020), (15) procedures for balancing concerns of social preference / political legitimacy and ethical defensibility, especially for deployment of systems with a large and broad effect on society as a whole, e.g. see Rahwan (2018); Savulescu et al. (2021), (16) special tools for spot-checking / double-checking / cross-checking whether all of the above are being used appropriately, and (17) backup plans and automatic fail-safe mechanisms for all of the above.
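The back-of-the-envelope estimate in footnote 7 can be reproduced in a few lines of Python. The sketch uses only the figures quoted in that footnote; the size ratios in the loop are an added illustration. As the footnote notes, only a fraction of the subreddit's members would self-identify as antinatalist, so that particular ratio overstates the comparison.

```python
# Figures quoted in footnote 7
active_eas = 6_500          # estimated "active" effective altruists
share_longtermist = 0.40    # share listing the long-term future as top cause area

longtermists = active_eas * share_longtermist
print(f"~{longtermists:,.0f} primarily longtermist people")

# Comparison communities from the same footnote
comparisons = {
    "registered US Libertarian voters": 700_000,
    "US vegetarians": 3_300_000,
    "r/antinatalism members": 105_000,  # upper bound on self-identified antinatalists
}
for name, size in comparisons.items():
    print(f"{name}: ~{size / longtermists:,.0f}x larger")
```

Forty percent of ~6,500 gives about 2,600, consistent with the footnote's "a few thousand."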