Recently I've been thinking about the pros and cons of working on near-term technical AI safety and assurance. This includes topics such as interpretability for near-term systems, generalizability / robustness, AI security, testing, verification, and the like.

Here are my own considerations so far:

(Note: In what follows I use the term Transformative AI (TAI) very loosely to mean any type of AI that has a decent chance of leading to a global catastrophe if safety challenges are not addressed first.)


Pros:

  1. Some approaches to these topics might actually turn out to work directly for TAI, especially where those approaches may not be pursued given the default trajectory (i.e., without EA intervention) of research from industry / government / academia.
  2. This kind of research directly helps create a set of tools, techniques, organizations, regulations, etc., that iteratively builds on itself in the way that technology tends to do, such that whenever TAI becomes a real problem we will already have solutions or the resources to quickly find solutions.
  3. Promoting this kind of research in industry / gov't / academia helps influence others in those communities to create a set of tools, techniques, organizations, regulations, etc., such that whenever TAI becomes a real problem we will already have solutions or the resources to quickly find solutions.
  4. Research into these topics fosters a broader concern for AI safety topics in the general public (either directly or as a side effect of researchers / gov't / etc. respecting those topics more), which could lead to public pressure on industry / gov't to develop solutions, and that may help mitigate risks from TAI.

(For whatever it's worth, my personal inside view leans towards 3 as the most plausibly important from an EA point of view.)


Cons:

  1. Research into these topics, if successful, would remove some very large barriers that are currently preventing AI from being deployed in many applications that would be extremely valuable to industry or government (including the military). Removing these barriers would dramatically increase the value of AI to industry and government, which would accelerate AI development in general, potentially leading to TAI arriving before we're ready for it.
  2. Research into these topics, if only partially successful, might remove enough barriers for industry / government to start deploying AI systems that eventually prove to be unsafe. Plausibly, those AI systems might become part of an ecosystem of other AIs which together have the potential to lead to a catastrophe (along the lines of Paul Christiano's "going out with a whimper" or "going out with a bang" scenarios).
  3. Dramatically increasing the value of AI could also potentially lead to arms races between corporations or governments, which could lead to one side or another cutting safety corners as they're developing TAI (races to the bottom).
  4. If you are concerned about lethal autonomous weapons (LAWs), then removing these barriers might greatly increase the chance that governments deploy them. This holds even if you're not working for a government, since governments follow industry developments closely.


I'm also interested in how these pros and cons might change if you're doing research for large organizations (industry or government) that might plausibly have the capacity to eventually build TAI-type systems, but where the research you do will not be publicly available due to proprietary or secrecy reasons. If it makes a difference, let's assume that you're working at a place that is reasonably ethical (as corporations and governments go) and that is at least somewhat aware of AI ethics and safety concerns.

I think that in this situation both the value of the pros and the potential damage of the cons would be reduced, for the same reason: your solutions won't spread beyond your organization, at least for some time. But it seems to me that the cons are still mostly there, and possibly made worse. The lowered barriers to deployment would still probably lead your organization to press its advantage, which would increase the market (or strategic) value of AI as perceived by competitors and so lead to more resources being poured into AI research in general. Only now the competition might not have the best safety solutions available to it, because they're proprietary.


I'm curious what others think about all this. I would also appreciate links to good previous discussions of these topics. The only one I know of at the moment is this post, which discusses some of these considerations but not all.





There's been quite a bit written on the "pro" side: 

Also ARCHES, Concrete Problems in AI Safety, etc.

But not so much on the "con" side - people have generally just thought about opportunity cost. Your point that it might speed up harmful (due to safety, misuse or structural risks) applications is a really useful and important one! Would be hard to weigh things up - getting into tricky differential technological development territory. Would love for there to be more thinking on this topic.