
Summary and Introduction

People who want to improve the trajectory of AI sometimes think their options for object-level work are (i) technical work on AI alignment or (ii) non-technical work on AI governance. But there is a whole other category of options: technical work in AI governance. This is technical work that mainly boosts AI governance interventions, such as norms, regulations, laws, and international agreements that promote positive outcomes from AI. This piece provides a brief overview of some ways to do this work—what they are, why they might be valuable, and what you can do if you’re interested. I discuss:

  • Engineering technical levers to make AI coordination/regulation enforceable (through hardware engineering, software/ML engineering, or heat/electromagnetism-related engineering)
  • Information security: Developing and implementing systems and best practices for securing model weights and other AI technology
  • Forecasting AI development
  • Technical standards development
  • Grantmaking or management to get others to do the above well
  • Advising on the above
  • Other work

[Update] Additional categories which the original version of this piece (from 2022) under-emphasized or missed are:

  • AI control: Developing systems and best practices for overseeing and constraining AI systems that may not be trustworthy (example)
  • Model evaluations: Developing technical evaluations of the safety of AI systems (discussion, examples)
  • Forecasting hardware trends (examples)
  • Cooperative AI: Research in game theory, ML, and decision theory for designing AI systems in ways that avoid costly coordination failures (discussion, examples)

I expect one or more resources providing more comprehensive introductions to many of these topics to appear in early 2024. For now, see the above links to learn more about the topics added in the update, and see below for more discussion of the originally listed topics.

Acknowledgements

Thanks to Lennart Heim, Jamie Bernardi, Gabriel Mukobi, Girish Sastry, and others for their feedback on this post. Mistakes are my own.

Context

What I mean by “technical work in AI governance”

I’m talking about work that:

  1. Is technical (e.g. hardware/ML engineering) or draws heavily on technical expertise; and
  2. Contributes to AI’s trajectory mainly by improving the chances that AI governance interventions succeed[1] (as opposed to by making progress on technical safety problems or building up the communities concerned with these problems).

Neglectedness

As of writing, there are (by one involved expert’s estimate) ~8-15 full-time equivalents doing this work with a focus on especially large-scale AI risks.[2]

Personal fit

For strong personal fit with this type of work, technical skills (including but not necessarily in ML) are of course useful, and an interest in the intersection of technical work and governance interventions presumably makes this work more exciting.

Also, whatever it takes to make progress on mostly uncharted problems in a tiny sub-field[3] is probably pretty important for this work now, since that’s the current nature of these fields. That might change in a few years. (But that doesn’t necessarily mean you should wait; time’s ticking, someone has to do this early-stage thinking, and maybe it could be you.)

What I’m not saying

I’m of course not saying this is the only or main type of work that’s needed. (Still, it does seem particularly promising for technically skilled people, especially under the debatable assumption that governance interventions tend to be more high-leverage than direct work on technical safety problems.)

Types of technical work in AI governance

Engineering technical levers to make AI coordination/regulation enforceable

To help ensure AI goes well, we may need good coordination and/or regulation.[4] To bring about good coordination/regulation on AI, we need politically acceptable methods of enforcing them (i.e. catching and penalizing/stopping violators).[5] And to design politically acceptable methods of enforcement, we need various kinds of engineers, as discussed in the next several sections.[6]

Hardware engineering for enabling AI coordination/regulation

To help enforce AI coordination/regulation, it might be possible to create certain on-chip devices for AI-specialized chips or other devices at data centers. As a non-exhaustive list of speculative examples:

  • Devices on network switches that identify especially large training runs could be helpful.
    • They could help enforce regulations that apply only to training runs above a certain size (which, among other benefits, seem much easier politically than trying to regulate all uses of compute).
  • If there were on-chip devices tracking the number of computations done on chips, that could help an agency monitor how much compute various data centers and organizations are using (a toy sketch of this idea appears after the list of desired device properties below).
    • That could help enforce regulations whose application depends on the amount of compute being used by an AI developer or data center (which, among other benefits, seems much easier politically than trying to regulate everyone who uses compute).
  • Dead man’s switches on AI hardware (or other hardware-enabled authorization requirements) could peacefully keep rogue organizations from harmful AI development or deployment (e.g. by interfering early on in a training run).

Part of the engineering challenge here is that, ideally (e.g. for political acceptability), we may want such devices not only to work but also to be (potentially among other desired features):

  • Secure;
  • Privacy-preserving;
  • Cheap;
  • Tamper-indicating; and
  • Tamper-proof.[7]
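
To make the flavor of this concrete, below is a toy Python sketch (not a real proposal or an existing product) of how a compute-tracking device might report cumulative usage in a way that is authenticated and somewhat tamper-evident. The class, key handling, and report format are all invented for illustration; real designs would need hardware roots of trust, careful key management, and much more.

```python
# Toy sketch only: a hypothetical compute-usage counter that emits signed,
# chained reports, loosely analogous to what an on-chip monitoring device
# might do. All names, parameters, and the report format are invented.
import hashlib
import hmac
import json
import time


class ComputeUsageCounter:
    """Toy model of an on-chip counter that emits tamper-evident usage reports."""

    def __init__(self, device_id: str, signing_key: bytes):
        self.device_id = device_id
        self._key = signing_key  # on real hardware this would sit in a secure element
        self._total_flop = 0
        self._prev_digest = hashlib.sha256(b"genesis").hexdigest()  # chain reports

    def record(self, flop: int) -> None:
        """Accumulate FLOPs observed since the device was provisioned."""
        self._total_flop += flop

    def signed_report(self) -> dict:
        """Emit a report a regulator could verify without seeing workload details."""
        payload = {
            "device_id": self.device_id,
            "cumulative_flop": self._total_flop,
            "timestamp": time.time(),
            "prev_digest": self._prev_digest,  # makes dropped reports detectable
        }
        body = json.dumps(payload, sort_keys=True).encode()
        mac = hmac.new(self._key, body, hashlib.sha256).hexdigest()
        self._prev_digest = hashlib.sha256(body).hexdigest()
        return {"payload": payload, "mac": mac}


def verify(report: dict, signing_key: bytes) -> bool:
    """What a regulator holding the shared key would run to check authenticity."""
    body = json.dumps(report["payload"], sort_keys=True).encode()
    expected = hmac.new(signing_key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, report["mac"])


counter = ComputeUsageCounter("chip-0001", signing_key=b"demo-key")
counter.record(3_200_000_000_000)  # made-up FLOP count for one job
print(verify(counter.signed_report(), b"demo-key"))  # True
```

The point is only that desiderata like "secure" and "tamper-indicating" translate into concrete engineering choices (here, signed and chained reports), not that this particular design would be adequate.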

Software/ML engineering for enabling AI coordination/regulation

Software (especially ML) engineering could help enforce AI coordination/regulation in various ways[8], including the following:

  • Methods/software for auditing ML models could help determine when and how regulations should be applied (e.g. it could help determine that some model may not be deployed yet because it has capabilities that current safety methods do not address) (see here for an example of such work);
  • ML applications to satellite imagery (visual and infrared) could help identify secret data centers;
  • Software (maybe ML) for analyzing hardware devices or perhaps video data could help detect efforts to tamper with the hardware devices discussed in the previous section; and
  • ML applications to open-source data or other types of data could help identify violations.

Heat/electromagnetism-related engineering for enabling AI coordination/regulation

For enforcing AI coordination/regulation against particularly motivated violators, it could be helpful to be able to identify hidden chips or data centers by their heat and electromagnetic signatures. People who know a lot about heat and electromagnetism could presumably help design equipment or methods that do this (e.g. mobile equipment usable at data centers, equipment that could be installed at data centers, methods for analyzing satellite data, and methods for analyzing data collected about a facility from a nearby road).

Part of the challenge here is that these methods should be robust to efforts to conceal heat and electromagnetic signatures.
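
As a very rough illustration of the kind of signal analysis involved (a sketch, not a workable detection method), the snippet below flags unusually hot cells in a 2-D grid of thermal readings. The data, units, and threshold are all invented.

```python
# Toy sketch only: flag grid cells whose readings are anomalously high relative
# to a robust estimate of the scene background. Real detection would have to
# handle sensor noise, revisit rates, legitimate heat sources, and deliberate
# concealment.
import numpy as np


def flag_thermal_anomalies(thermal: np.ndarray, z_threshold: float = 6.0) -> np.ndarray:
    """Return a boolean mask of cells far above the scene's typical emission."""
    background = np.median(thermal)
    spread = np.median(np.abs(thermal - background)) + 1e-9  # robust scale estimate (MAD)
    z_scores = (thermal - background) / spread
    return z_scores > z_threshold


# Toy usage: a mostly-uniform scene with one persistent hotspot.
scene = np.random.normal(290.0, 1.0, size=(100, 100))  # made-up background readings
scene[40:44, 60:64] += 25.0                             # hypothetical facility waste heat
print("candidate cells:", int(flag_thermal_anomalies(scene).sum()))
```

Dealing with concealment and confounders is exactly where the relevant domain expertise would matter.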

Information security

Information security could matter for AI in various ways, including the following:

  • It would be bad if people stole unsafe ML models and deployed them. It would also be bad if AI developers rushed to deploy their own models (e.g. with little testing or use of safety methods) out of fear that, if they waited too long, someone else would steal their models and deploy them first. Sufficiently good information security at AI developers (including their external infrastructure) would mitigate these problems (a toy sketch of one small piece of this appears at the end of this section).
  • Information security in regulatory agencies might help enable coordination/regulations on AI to be enforced in a politically acceptable way; it could assure AI developers that their compliance will be verified without revealing sensitive information, while assuring a regulator that the data they are relying on is authentic.
    • This could include the use of cryptographic techniques in the hardware devices, model evaluation software, and other equipment discussed above.
  • Information security in hardware companies could help keep the semiconductor supply chain concentrated in a small number of allied countries, which might help enable governance of this supply chain.

See here, here, and here (Sections 3.3 and 4.1), and listen here [podcast] for more information. As these sources suggest, information security overlaps with—but extends beyond—the engineering work mentioned above.
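
As one narrow illustration of the flavor of this work (a sketch under invented assumptions, not drawn from the sources above), the snippet below checks model-weight files against a previously recorded manifest of digests to flag tampering, deletion, or unexpected copies. The paths and manifest format are hypothetical, and a real deployment would pair this with access controls, key management, and monitoring.

```python
# Toy sketch only: compare weight files on disk against a recorded manifest of
# SHA-256 digests ({"relative/path": "sha256hex", ...}) and report discrepancies.
import hashlib
import json
from pathlib import Path


def digest(path: Path) -> str:
    """Stream a file through SHA-256 so large weight shards don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def check_weights(weights_dir: Path, manifest_path: Path) -> dict:
    """Compare files on disk to the recorded manifest; return discrepancies."""
    manifest = json.loads(manifest_path.read_text())
    found = {str(p.relative_to(weights_dir)): digest(p)
             for p in weights_dir.rglob("*") if p.is_file()}
    return {
        "modified": [p for p, d in found.items() if p in manifest and manifest[p] != d],
        "unexpected": [p for p in found if p not in manifest],
        "missing": [p for p in manifest if p not in found],
    }
```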

Forecasting AI development

AI forecasters answer questions about what AI capabilities are likely to emerge when. This can be helpful in several ways, including:

  • Helping AI governance researchers account for ways in which near-term advances in AI will change the strategic landscape (e.g. through the introduction of new tools or new threats, or by increasing the attention various actors pay to AI);
  • Helping determine the urgency and acceptable timelines for various kinds of work; and
  • Helping set parameters for (coordinated) AI regulations (e.g. if some regulation would only apply to models trained with at least some amount of compute, precisely how many FLOPs should be treated as highly risky? What are the cost penalties of decentralized training, which might change what regulators need to look for at each data center?)

Typically, this work isn’t engineering or classic technical research; it often involves measuring and extrapolating AI trends, and sometimes it is more conceptual/theoretical. Still, familiarity with relevant software or hardware often seems helpful for knowing what trends to look for and how to find relevant data (e.g. “How much compute was used to train recent state-of-the-art models?”), as well as for being able to assess and make arguments on relevant conceptual questions (e.g. “How analogous is gradient descent to natural selection?”).
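
As a toy example of the "measuring and extrapolating" flavor of this work, the sketch below fits an exponential trend to a handful of made-up (year, training-FLOP) points and asks when the trend would cross a hypothetical regulatory threshold. None of the numbers are real estimates.

```python
# Toy sketch only: extrapolate a compute trend from invented data points.
import numpy as np

years = np.array([2018.0, 2020.0, 2022.0, 2023.0])
train_flop = np.array([1e22, 3e23, 1e24, 5e25])  # made-up values, not real estimates

# Fit log10(FLOP) as a linear function of year, i.e. assume exponential growth.
slope, intercept = np.polyfit(years, np.log10(train_flop), deg=1)
print(f"implied growth: ~{10**slope:.1f}x per year")

# When would the trend cross a hypothetical regulatory threshold of 1e26 FLOP?
threshold = 1e26
crossing_year = (np.log10(threshold) - intercept) / slope
print(f"trend crosses {threshold:.0e} FLOP around {crossing_year:.1f}")
```

Real forecasting work wrestles with which data points to trust, whether the trend will hold, and how wide the uncertainty is; the curve-fitting itself is the easy part.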

See here (Section I) and here[9] for some collections of relevant research questions; see [1], [2], [3], [4], and [5] for some examples of AI forecasting work; and listen here [podcast] for more discussion.

Technical standards development

One AI risk scenario is that good AI safety methods will be discovered, but they won’t be implemented widely enough to prevent bad outcomes.[10] Translating AI safety work into technical standards (which regulations can then reference, as is often done) could help address this. Relatedly, standard-setting could be a way for AI companies to set guardrails on their AI competition without violating antitrust laws.

Technical expertise (specifically, in AI safety) could help standards developers (i) identify safety methods that it would be valuable to standardize, and (ii) translate safety methods into safety standards (e.g. by precisely specifying them in widely applicable ways, or designing testing and evaluation suites for use by standards[11]).
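
To illustrate what a testing and evaluation suite might look like in code (a minimal sketch with invented requirements, thresholds, and model interface, not an actual standard), consider:

```python
# Toy sketch only: the skeleton of a standards-style conformance suite.
# The checks, thresholds, and the `model` interface are all hypothetical;
# a real standard would specify these precisely.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Requirement:
    name: str
    check: Callable[[object], float]  # returns a score for the model under test
    minimum: float                    # pass/fail threshold fixed by the standard


def run_conformance_suite(model, requirements: list[Requirement]) -> dict:
    results = {}
    for req in requirements:
        score = req.check(model)
        results[req.name] = {"score": score, "passed": score >= req.minimum}
    return results


# Hypothetical usage with a placeholder check:
def refusal_rate_on_prohibited_prompts(model) -> float:
    return 0.97  # placeholder; a real check would run a specified prompt set


suite = [Requirement("refuses-prohibited-uses", refusal_rate_on_prohibited_prompts, 0.95)]
print(run_conformance_suite(model=None, requirements=suite))
```

Much of the hard work is in specifying the checks and thresholds precisely enough that different auditors get the same answer; the harness itself is simple.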

Additionally, strengthened cybersecurity standards for AI companies, AI hardware companies, and other companies who process their data could help address some of the information security issues mentioned above.

See here for more information.

Grantmaking or management to get others to do the above well

Instead of doing the above kinds of work yourself, you might be able to use your technical expertise to organize others to do it (as a grantmaker or manager). Some of the problems here appear to be standard, legible technical problems, so you may well be able to get contractors, grantees, employees, or prize-challenge participants to solve them, even if those people aren’t very familiar with or interested in the bigger picture.

Couldn’t non-experts do this well? Not necessarily; it might be much easier to judge project proposals, candidates, or execution if you have subject-matter expertise. Expertise might also be very helpful for formulating shovel-ready technical problems.

Advising on the above

Some AI governance researchers and policymakers may want to bet on certain assumptions about the feasibility of certain engineering or infosec projects, on AI forecasts, or on relevant industries. By advising them with your relevant expertise, you could help allies make good bets on technical questions. A lot of this work could be done in a part-time or “on call” capacity (e.g. while spending most of your work time on what the above sections discussed, working at a relevant hardware company, or doing other work).

Others?

I’ve probably missed some kinds of technical work that can contribute to AI governance, and across the kinds of technical work I identified, I’ve probably missed many examples of specific ways they can help.

Potential next steps if you’re interested

Contributing in any of these areas will often require you to have significant initiative; there aren’t yet very streamlined career pipelines for doing most of this work with a focus on large-scale risks. Still, there is plenty you can do; you can:

  • Learn more about these kinds of work, e.g. by following the links in the above sections (as well as this link, which overlaps with several hardware-related areas).

  • Test your fit for these areas, e.g. by taking an introductory course in engineering or information security, or by trying a small, relevant project (say, on the side or in a research internship).

  • Build relevant expertise, e.g. by extensively studying or working in a relevant area.

    • Grantmakers like the Long-Term Future Fund might be interested in supporting relevant self-education projects.
  • Learn about and pursue specific opportunities to contribute, especially if you have a serious interest in some of this work or relevant experience, e.g.:

    • Reach out to people who work in related areas (e.g. cold-email authors of relevant publications, or reach out at community conferences).
    • Apply for funding if you have a project idea.
      • Georgetown’s Center for Security and Emerging Technology (CSET) might be interested in funding relevant projects (though, speculating based on a public announcement from the relevant grantmaker, they might have limited capacity in this area for the next few months).
    • Keep an eye out for roles on relevant job boards.
  • Feel free to reach out to the following email address if you have questions or want to coordinate with some folks who are doing closely related work[12]:

    • technical-ai-governance [ät] googlegroups [döt] com


Notes


  1. This includes creating knowledge that enables decision-makers to develop and pursue more promising AI governance interventions (i.e. not just boosting interventions that have already been decided on). ↩︎

  2. Of course, there are significantly more people doing most of these kinds of work with other concerns, but such work might not be well-targeted at addressing the concerns of many on this forum. ↩︎

  3. courage? self-motivation? entrepreneurship? judgment? analytical skill? creativity? ↩︎

  4. To elaborate, a major (some would argue central) difficulty with AI is the potential need for coordination between countries or perhaps labs. In the absence of coordination, unilateral action and race-to-the-bottom dynamics could lead to highly capable AI systems being deployed in (sometimes unintentionally) harmful ways. By entering enforceable agreements to mutually refrain from unsafe training or deployments, relevant actors might be able to avoid these problems. Even if international agreements are infeasible, internal regulation could be a critical tool for addressing AI risks. One or a small group of like-minded countries might lead the world in AI, in which case internal regulation by these governments might be enough to ensure highly capable AI systems are developed safely and used well. ↩︎

  5. To elaborate, international agreements and internal regulation both must be enforceable in order to work. The regulators involved must be able to catch and penalize (or stop) violators—as quickly, consistently, and harshly as is needed to prevent serious violations. But agreements and regulations don’t “just” need to be enforceable; they need to be enforceable in ways that are acceptable to relevant decision-makers. For example, decision-makers would likely be much more open to AI agreements or regulations if their enforcement (a) would not expose many commercial, military, or personal secrets, and (b) would not be extremely expensive. ↩︎

  6. After all, we currently lack good enough enforcement methods, so some people (engineers) need to make them. (Do you know of currently existing and politically acceptable ways to tell whether AI developers are training unsafe AI systems in distant data centers? Me neither.) Of course, we also need others, e.g. diplomats and policy analysts, but that is outside the scope of this post. As a motivating (though limited) analogy, the International Atomic Energy Agency relies on a broad range of equipment to verify that countries follow the Treaty on the Non-Proliferation of Nuclear Weapons. ↩︎

  7. Literally “tamper-proof” might be infeasible, but “prohibitively expensive to tamper with at scale” or “self-destructs if tampered with” might be good enough. ↩︎

  8. This overlaps with cooperative AI. ↩︎

  9. Note that the author of this now considers it a bit outdated. ↩︎

  10. In contrast, some other interventions appear to be more motivated by the worry that there won’t be time to discover good safety methods before harmful deployments occur. ↩︎

  11. This work might be similar to the design of testing and evaluation suites for use by regulators, mentioned in the software/ML engineering section. ↩︎

  12. I’m not managing this email; a relevant researcher who kindly agreed to coordinate some of this work is. They have a plan that I consider credible for regularly checking what this email account receives. ↩︎

Comments (3)



+1 to this proposal and focus.

On 'technical levers to make AI coordination/regulation enforceable': there is a fair amount of work suggesting that new technological avenues for unilateral monitoring (or for cooperative but non-intrusive monitoring - e.g. sensors on missile factories under the US-USSR INF Treaty) have often been instrumental in enabling arms control agreements (see Coe and Vaynmann 2020).

That doesn't mean it's always an unalloyed good: there are cases where new monitoring capabilities introduce new security or escalation risks (e.g. Vaynmann 2021), and they can also perversely hold up negotiations. For example, Richard Burns (link, introduction) discusses a case where the involvement of engineers in designing a monitoring system for the Comprehensive Test Ban Treaty actually held up negotiations of the regime, basically because the engineers focused excessively on technical perfection of the monitoring system [beyond the level of assurance that would've been strictly politically required by the contracting parties], which enabled opponents of the treaty to paint it as not giving sufficiently good guarantees.

Still, beyond improving enforcement, there's interesting work on ways that AI technology could speed up and support the negotiation of treaty regimes (Deeks 2020, 2020b, Maas 2021), both for AI governance specifically and for international cooperation more broadly.

I am a software engineer who transitioned to tech/AI policy/governance. I strongly agree with the overall message (or at least title) of this article: that AI governance needs technical people/work, especially for the ability to enforce regulation.

However, in the 'types of technical work' you lay out, I see some gaping governance questions/gaps. You outline various tools that could be built to improve the capabilities of actors in the governance space, but there are many such actors, and tools by their nature are dual use - where is the piece on who these tools would be wielded by, and how they can be used responsibly? I would be more excited about new initiatives in this space that clearly set out which actors they would work with, for which kinds of policy issues, which not, and why. There is also a big hole around conflicts of interest, etc. There are lots of unavoidable legal issues that crop up when you need to actually use such tools in any context beyond a voluntary initiative of a company (which does not give as many guarantees as things that apply to all current and future companies, like regulations or, to some extent, standards). There is, and will increasingly be, a huge demand for companies with practical AI auditing expertise - this is a big opportunity to start trying to fill that gap.

I think the section on 'advising on the above' could be fleshed out a whole lot more. At least I've found that, because this area is very new, there is a lot of talking to do with lots of different people, and a lot of translation, before getting to actually do these things... it helps if you're the kind of technical person who is willing to learn how to communicate with a non-technical audience and to learn from people with other backgrounds about the constraints and complexities of the policymaking world, and who derives satisfaction from this. I think this is hugely worthwhile though - and if you're that kind of person and looking for work in the area, do get in touch as I have some opportunities (in the UK).

Finally, I'll now highlight more explicitly the risk of technical people being used for the aims of others (which may or may not lead to good outcomes) in this space. In my view, if you really want to work at this intersection, you should be asking the above questions about anything you build: who will use this thing and how, what are the risks, and can I reduce them? And when you advise powerful actors, bringing your technical knowledge and expertise, do not be afraid to also give decision-makers your opinions on what might lead to what kinds of real-world outcomes, to ask questions about the application aims, and to improve those aims.

Thanks for the comment! I agree these are important considerations and that there's plenty my post doesn't cover. (Part of that is because I assumed the target audience of this post, technical readers of this forum, would have limited interest in governance issues and would already be inclined to think about the impacts of their work. Though maybe I'm being too optimistic with the latter assumption.)

Were there any specific misuse risks involving the tools discussed in the post that stood out to you as being especially important to consider?
