High Impact Careers in Formal Verification: Artificial Intelligence

quinn

As someone who's worked both in ML for formal verification with security motivations in mind, and (now) directly on AGI alignment, I think most EA-aligned folk who would be good at formal verification will be close enough to being good at direct AGI alignment that it will be higher impact to work directly on AGI alignment. It's possible this would change in the future if there are a lot more people working on theoretically-motivated prosaic AGI alignment, but I don't think we're there yet.

Rohin Shah

Planned summary for the Alignment Newsletter:

This post considers the applicability of formal verification techniques to AI alignment. Now in order to “verify” a property, you need a specification of that property against which to verify. The author considers three possibilities:
1. **Formally specifiable safety:** we can write down a specification for safe AI, _and_ we’ll be able to find a computational description or implementation.
2. **Informally specifiable safety:** we can write down a specification for safe AI mathematically or philosophically, but we will not be able to produce a computational version
3. **Nonspecifiable safety:** we will never write down a specification for safe AI.
Formal verification techniques are applicable only to the first case. Unfortunately, it seems that no one expects the first case to hold in practice: even CHAI, with its mission of building provably beneficial AI systems, is talking about proofs in the informal specification case (which still includes math), on the basis of comments like [these](https://www.alignmentforum.org/posts/nd692YfFGfZDh9Mwz/an-69-stuart-russell-s-new-book-on-why-we-need-to-replace?commentId=4LhBaSuYPyFvTnDrQ) in Human Compatible. In addition, it currently seems particularly hard for experts in formal verification to impact actual practice, and there doesn’t seem to be much reason to expect that to change. As a result, the author is relatively pessimistic about formal verification as a route to reduce existential risk from failures of AI alignment.

quinn

3 year update: I consider this 2 year update to be a truncated version of the post, but it's actually too punchy and even superficial.

My opinion lately / these days is too confused and nuanced to write about.

jtcbrule

I think this post makes some excellent points; for brevity, I'm just going to mention my (potential) disagreements.

The punchline is, of course, that the world goes on turning! Tony Hoare, an early pioneer of formal methods, wrote in 1995: “It has turned out that the world just does not suffer significantly from the kind of problem that our research was originally intended to solve.”

I suspect (although it's hard to check) this is because most software is mostly used by people who aren't trying to break it. Weird edge cases, for a lot of software, usually don't have that bad of a consequence.

Throw ML into the mix and any divergence between your intended (possibly implicit) utility function and the one you actually programmed can be ruthlessly exploited. In other words: in the past, the world just didn't suffer significantly from the kind of problem that formal methods were intended to solve. But that was when human beings were responsible for the logic.

A particularly colorful example of this is the story about the (RL?) agent that was being trained to land an aircraft on a carrier in simulation. Due to an overflow error, it learned a strategy of crashing into the deck at very high velocity. Most human programmers would not think to try that; it may be the case that formal methods were mostly unnecessary in the past, but become critical for guaranteeing the safety of autonomous systems.
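To make the failure mode concrete, here is a minimal sketch of how an overflow can turn a penalty into a loophole. Everything in it is an assumption for illustration: the 16-bit register, the speed-squared force model, and the names `wrap_int16`, `landing_penalty`, `best_speed`, and `first_monotonicity_violation` are hypothetical and are not taken from the actual simulator in the anecdote.

```python
def wrap_int16(x: int) -> int:
    """Interpret x as a 16-bit two's-complement integer (wraps on overflow)."""
    return (x + 2**15) % 2**16 - 2**15


def landing_penalty(impact_speed: int) -> int:
    """Hypothetical penalty: impact force kept in a 16-bit register, floored at 0."""
    force = wrap_int16(impact_speed * impact_speed)  # assumed force model
    return max(force, 0)


def best_speed(candidates):
    """Stand-in for the optimizer: pick whichever speed is penalized least."""
    return min(candidates, key=landing_penalty)


def first_monotonicity_violation(max_speed: int):
    """Check the informal spec 'a faster impact is never penalized less'.

    Returns the first speed v such that v + 1 gets a smaller penalty, or None.
    """
    for v in range(1, max_speed):
        if landing_penalty(v + 1) < landing_penalty(v):
            return v
    return None


if __name__ == "__main__":
    print(best_speed(range(1, 301)))          # a gentle landing is not chosen
    print(first_monotonicity_violation(300))  # where the spec first breaks
```

In this toy setup the least-penalized speed is 182, the first one whose squared value no longer fits in the register, and the monotonicity check flags the break at 181. Writing the property down and checking it exhaustively over the input range finds the loophole without anyone having to guess that "crash harder" is worth testing.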

Your point about not knowing what the specification for safe AI looks like is still well taken.

formally verify an entire project end to end is often not the ask

Leslie Lamport has suggested that most of the value of formal methods is in getting programmers to actually write down specifications. I believe it's possible that:

1. Formal verification of AGI is not viable
2. Many critical components of AGI can be formally specified
3. Better tooling can help understand/explore/check specifications

An anecdote from industry: Amazon actually budgets engineer hours to write TLA+, which is not used to formally verify implementations of algorithms, but does let programmers model check their specifications, e.g. prove the absence of deadlocks. Speculatively, we might imagine improved formal methods tools and techniques that allow AI researchers to check that their specifications are sound, even while end-to-end verification remains intractable.
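For readers who haven't seen model checking, here is a rough sketch of the idea in Python rather than TLA+. It is a toy explicit-state checker, not how TLC works internally and not anything Amazon uses; the two-lock spec and the names `make_spec` and `find_deadlock` are made up for illustration. It enumerates every reachable state of a small specification and reports a state in which no process can take a step even though the processes are not finished.

```python
from collections import deque


def make_spec(orders):
    """Toy spec: process i acquires the locks in orders[i], then releases them all.

    A state is (pcs, owners): pcs[i] counts the steps process i has taken, and
    owners[k] is the index of the process holding lock k (or None if free).
    """
    locks = sorted({name for order in orders for name in order})
    init = (tuple(0 for _ in orders), tuple(None for _ in locks))

    def successors(state):
        pcs, owners = state
        for i, order in enumerate(orders):
            pc = pcs[i]
            next_pcs = pcs[:i] + (pc + 1,) + pcs[i + 1:]
            if pc < len(order):
                # Acquire step: enabled only if the wanted lock is free.
                k = locks.index(order[pc])
                if owners[k] is None:
                    yield next_pcs, owners[:k] + (i,) + owners[k + 1:]
            elif pc == len(order):
                # Release step: free every lock held by process i.
                yield next_pcs, tuple(None if o == i else o for o in owners)

    def done(state):
        pcs, _ = state
        return all(pc == len(order) + 1 for pc, order in zip(pcs, orders))

    return init, successors, done


def find_deadlock(init, successors, done):
    """Breadth-first search of the state space; return a deadlocked state or None."""
    seen = {init}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        nexts = list(successors(state))
        if not nexts and not done(state):
            return state  # nobody can move, yet the processes are not finished
        for nxt in nexts:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


if __name__ == "__main__":
    print(find_deadlock(*make_spec([("A", "B"), ("B", "A")])))  # opposite orders
    print(find_deadlock(*make_spec([("A", "B"), ("A", "B")])))  # same order
```

With opposite acquisition orders the search finds the state where each process holds one lock and waits for the other; with a consistent order it exhausts the state space and returns `None`. Note that this checks the specification only, which is the point of the anecdote: nothing here says anything about whether an implementation matches it.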

Overall, I'm still mildly pessimistic about formal verification being a high-impact career, largely for the reasons that you describe.

lukeprog

Is it easy to dig up a source for the RL agent that learned to crash into the deck?

jtcbrule

I don't remember where I initially read about it, but I found a survey paper that mentions it (and several other, similar anecdotes), along with some citations:

https://arxiv.org/abs/1803.03453v1

mako yass

in which a minor slip-up means instant death for everyone so a 1 – epsilon probability of success is unacceptable.

Oh, does Eliezer still think (speak?) that way? I think that would be the first clear reasoning error (that can't just be written off as a sort of opinionated specialization) I've seen him make about AI strategy. In a situation where there's a certain yearly baseline risk of the deployment of misaligned AGI occurring (which is currently quite low and so this wouldn't be active yet!), it does actually become acceptable to deploy a system that has a well-estimated risk of being misaligned. Techniques that only have a decent chance of working are actually useful and should be collected enthusiastically.

I don't know that he is still taking a zero-risk policy; I've been seeing a lot more "no it will almost certainly be misaligned" recently, but it could have given rise to a lot of erroneous inferences.
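The baseline-risk comparison above can be written as a few lines of arithmetic. This is only a sketch under strong assumptions that are mine and not from the post: the baseline risk is constant and independent from year to year, deploying the imperfect system ends that baseline risk, and `years` is how long one would otherwise wait for a better technique. The function names are hypothetical.

```python
def risk_of_waiting(baseline_yearly_risk: float, years: int) -> float:
    """Chance that someone deploys a misaligned AGI at some point within `years`,
    assuming a constant, independent risk each year (an assumption)."""
    return 1 - (1 - baseline_yearly_risk) ** years


def deploying_is_better(own_risk: float, baseline_yearly_risk: float, years: int) -> bool:
    """True when the estimated risk of our own system is below the risk of waiting."""
    return own_risk < risk_of_waiting(baseline_yearly_risk, years)


if __name__ == "__main__":
    # High baseline risk: a system with a 20% chance of misalignment looks acceptable.
    print(deploying_is_better(0.20, 0.10, 5))
    # Low baseline risk (the "not active yet" case): the same system does not.
    print(deploying_is_better(0.20, 0.001, 5))
```

Under these assumptions a 10% yearly baseline over five years adds up to about a 41% chance of catastrophe from waiting, so a 20% technique wins; at a 0.1% yearly baseline the same technique loses.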
