I recently co-authored a paper with Pablo Moreno, Implications of Quantum Computing for Artificial Intelligence alignment research, which is available on arXiv.
Our paper focuses on analyzing the interaction between Quantum Computing (QC) and the current landscape of research in Artificial Intelligence (AI) alignment, and we weakly conclude that knowledge of QC is unlikely to be helpful to address current bottlenecks in AI alignment.
In this post I intend to very briefly summarize the generator of the main arguments of the paper, convey the main conclusions, and invite the reader to read the full report if they wish to get a deeper intuition or see our list of open questions.
It might be tempting to conclude that QC has important implications for AI alignment since there are some promising avenues of research in Quantum Machine Learning, so QC might end up being an integral component of future AI systems.
However, we argue that for the most part QC can be abstracted away as a black box accelerator that exponentially speeds up certain computations - the so-called quantum speedup. This is relevant because we believe that current research in alignment should feel free to invoke oracles of this kind when discussing formal solutions to the different problems of the field, and worry about their concrete, efficient implementation later down the line.
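To make the black-box framing concrete, here is a toy comparison (not from the paper) of oracle query counts for unstructured search, where Grover's algorithm gives a quadratic speedup; the exponential speedups come from other algorithms such as Shor's. The point is that alignment work can reason about such an accelerator purely through query counts, without modeling the quantum mechanism:

```python
# Toy illustration: a "quantum oracle" as a black box characterized only by
# how many queries it needs. Grover's algorithm searches an unstructured
# list of N items in roughly (pi/4)*sqrt(N) oracle queries, versus ~N
# classically. These are standard asymptotic estimates, not benchmarks.

import math

def classical_search_queries(n: int) -> int:
    """Worst-case oracle queries for classical unstructured search."""
    return n

def grover_queries(n: int) -> int:
    """Approximate oracle queries for Grover search: (pi/4) * sqrt(N)."""
    return math.ceil((math.pi / 4) * math.sqrt(n))

for n in [10**3, 10**6, 10**9]:
    print(f"N={n:>10}: classical ~{classical_search_queries(n):>10}, "
          f"Grover ~{grover_queries(n):>6}")
```

From the alignment side, only the speedup profile matters; the internal workings of the accelerator can be treated as opaque.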
The biggest challenge that QC poses for AI alignment purposes is what we called quantum obfuscation - the fact that reading the contents of a quantum computer is hard to do classically, which may render useless some of the oversight mechanisms we might design.
However, most research agendas and problems AI alignment researchers are working on have little to do with the actual implementation of low-level oversight mechanisms, and focus instead on aligning the incentives of AI systems to cooperatively send information to their operators in an interpretable way.
Furthermore, there might be direct analogues of classical oversight in the quantum realm, so research conducted in this stage may be rescued later instead of wasted.
We have also looked into reasons why QC might be a good tool for solving some AI alignment subproblems, and identified a couple of cases. They are, however, not especially promising.
First, we identified the possibility of using access to quantum computing to amplify an overseer, so that it can verify behavior or provide rewards in a way that is hard for the agent being overseen to understand - we call this exploiting quantum asymmetry.
Second, we might be able to exploit quantum isolation - the fact that a quantum computer has to remain isolated from its environment to achieve quantum speedups - to monitor quantum agents. This might point in the direction of a tripwire that would allow us to detect whether a system has interacted with the outside world without our consent. Although we have not looked into this in depth, we weakly argue against the possibility of an efficient scheme of this type.
Long story short, we do not believe that QC is a critical area of knowledge for advancing current research agendas of technical AI alignment, and I would weakly recommend against pursuing a career in it for this purpose or funding research in this intersection.
For the full discussion of our reasoning and a list of open questions, I refer the reader to our paper.
This post was written by Jaime Sevilla, summer fellow at the Future of Humanity Institute. I want to thank Pablo Moreno for working with me on this topic and his feedback on this summary.
First off, I really appreciate the straight-shooting conclusion that 'QC is unlikely to be helpful to address current bottlenecks in AI alignment,' even though you both spent many hours looking into it.
Second, I'm curious to hear any thoughts on the amateur speculation I threw at Pablo in a chat at the last AI Safety Camp:
Would quantum computing afford the mechanisms for improved prediction of the actions that correlated agents would decide on?
As a toy model, I'm imagining hundreds of almost-homogeneous reinforcement learning agents within a narrow distribution of slightly divergent maps of the state space, probability weightings/policies, and environmental inputs. Would current quantum computing techniques, assuming the hardware to run them on is available, be able to more quickly/precisely derive the percentages of those agents in, say, State1 that would take Action1, Action2, or Action3?
I have a broad, vague sense that if that set-up works out, you could leverage it to create a 'regulator agent' for monitoring some 'multi-agent system' composed of quasi-homogeneous autonomous 'selfish agents' (e.g. each negotiating on behalf of its respective human interest group) that has a meaningful influence on our physical environment. This regulator would interface directly with a few of the selfish agents. If an agent in that subset is about to select Action1, the regulator will predict what percentage of the other, slightly divergent algorithms would also decide on Action1. If the regulator prognoses that an excessive number of Action1s will be taken – leading to reduced rewards for the collective or reduced robustness (e.g. a Tragedy of the Commons case of overutilisation of local resources) – it would override that decision by commanding a compensating number of the agents to instead select the collectively-conservative Action2.
That's a lot of jargon, half of which I feel I have little clue about... But curious to read any arguments you have on how this would (not) work.
I think so! But I also think that you can do it easily with a bunch of GPUs. Let me explain: the idea is parallelizing the process of the agents and then just sampling from the agents. You can do that using "quantum parallelism", but I feel it will be simpler to just use GPUs for that.
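For what it's worth, the GPU version of this idea can be sketched classically. The code below is a minimal illustration of "parallelize the agents, then sample": all names and parameter values are hypothetical choices for the toy model above, not anything from the paper or thread:

```python
# Classical sketch of the sampling idea: simulate many near-homogeneous
# agents whose policies at State1 are small perturbations of a shared base
# policy, and estimate what fraction would pick each of three actions.
# On real hardware, a GPU (or "quantum parallelism") would batch this loop.

import numpy as np

rng = np.random.default_rng(0)

n_agents = 500
base_logits = np.array([2.0, 1.0, 0.5])             # shared preference over 3 actions
noise = rng.normal(scale=0.3, size=(n_agents, 3))   # slight policy divergence
logits = base_logits + noise

# Softmax policy per agent; each agent then picks its most-preferred action.
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
actions = probs.argmax(axis=1)

fractions = np.bincount(actions, minlength=3) / n_agents
print({f"Action{i + 1}": round(float(f), 3) for i, f in enumerate(fractions)})
```

A hypothetical regulator would compare `fractions[0]` against a threshold and command some agents to switch to Action2; the estimation step itself is embarrassingly parallel, which is why commodity GPUs suffice.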
I believe that you might be able to get some (polyno