Aligned AI is an Oxford-based startup focused on applied alignment research. Our goal is to implement scalable solutions to the alignment problem and distribute these solutions to actors developing powerful, transformative artificial intelligence (related Alignment Forum post here).
We are led by Stuart Armstrong and Rebecca Gorman, and advised by Dylan Hadfield-Menell, Adam Gleave, Justin Shovelain, Charles Pattison, and Anders Sandberg.
We think AI poses an existential risk to humanity, and that reducing the chance of this risk is one of the most impactful things we can do with our lives. Here we focus not on the premises behind that claim, but rather on why we're particularly excited about Aligned AI's approach to reducing AI existential risk.
- We believe AI Safety research is bottlenecked by a core problem: how to extrapolate values from one context to another.
- We believe solving value extrapolation is necessary and almost sufficient for alignment.
- Value extrapolation research is neglected, both in the mainstream AI community and the AI safety community. Note that there is a lot of overlap between value extrapolation and many fields of research (e.g. out-of-distribution detection, robustness, transfer learning, multi-objective reinforcement learning, active reward learning, reward modelling...) which provide useful research resources. However, we've found that we've had to generate most of the key concepts ourselves.
- We believe value extrapolation research is tractable (and we've had success generating the key concepts).
- We believe distributing (not just creating) alignment solutions is critical for aligning powerful AIs.
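To make the value extrapolation problem concrete, here is a toy sketch (our own illustrative construction, not a description of Aligned AI's actual method): two candidate reward functions that fit the training data equally well because two features are perfectly correlated in-distribution, but that diverge sharply once that correlation breaks out of distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training contexts: two features that happen to be perfectly correlated
# in-distribution. The "true" value depends on feature a, but feature b
# matches a on every training state, so both proxies fit the data equally.
X_train = rng.uniform(0, 1, size=100)
states_train = np.stack([X_train, X_train], axis=1)  # a == b during training

def reward_a(state):
    return state[..., 0]  # proxy that values feature a

def reward_b(state):
    return state[..., 1]  # proxy that values feature b

# In-distribution: the two reward functions are indistinguishable.
in_dist_gap = np.max(np.abs(reward_a(states_train) - reward_b(states_train)))

# Out of distribution the correlation breaks (a and b decouple), and the
# proxies disagree sharply -- this is the extrapolation ambiguity.
states_ood = np.stack(
    [rng.uniform(0, 1, 100), rng.uniform(0, 1, 100)], axis=1
)
ood_gap = np.max(np.abs(reward_a(states_ood) - reward_b(states_ood)))

print(in_dist_gap)  # 0.0: no amount of training data distinguishes the proxies
print(ood_gap)      # large: the proxies diverge off-distribution
```

No amount of additional in-distribution data resolves the ambiguity; choosing how the learned values should extend to the new contexts is the extrapolation step itself.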
How we'll do this
Solving value extrapolation will involve solving multiple subproblems. Therefore, our research groups will iterate through sub-projects, like the ones presented here. The aim is to generate sub-projects that are close to current published research in machine learning, but whose solutions are designed to generalise. Our groups will take these projects, implement them in code, and build solutions for the relevant problem. At that point, we will either extend the project to investigate it in more depth, or write up the results and move on - passing the results to the library development team as needed.
At a high level, our research is structured around a linear pipeline, starting from theory and becoming progressively more applied. Each stage of the pipeline has tight feedback loops and also informs the other stages (e.g. theory leads to experiments, which lead to revised theory). The following sections describe how such a process might go.
Once a sub-component is deemed sufficiently promising, we will want to test it in code. To do so, we will generate "sub-project" ideas designed to be simple to implement but scalable to larger environments and models.
Minimum viable (sub-)project
We will start each sub-project with an "MVP": the simplest implementation that captures the core of our approach.
Test sub-projects in higher dimensional settings
After implementing a successful MVP, we will iteratively experiment in increasingly high-dimensional settings (think scaling from Deep Reinforcement Learning from Human Preferences to Learning to Summarize from Human Feedback).
We will employ a "red-teaming" methodology similar to that of the Alignment Research Center, considering worst case scenarios and how a given approach handles them.
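As a toy illustration of that worst-case mindset (a hedged sketch of generic adversarial evaluation, not the Alignment Research Center's actual methodology), one simple red-teaming loop searches over perturbed inputs for the state on which two supposedly equivalent reward models disagree most:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two stand-in learned reward models (hypothetical; any state -> float
# callables would work here).
def reward_model_1(s):
    return np.tanh(s[0] + 0.5 * s[1])

def reward_model_2(s):
    return np.tanh(0.9 * s[0] + 0.6 * s[1])

def red_team(base_state, n_candidates=1000, radius=2.0):
    """Random-search red-teaming: return the perturbed state on which the
    two models disagree most -- the worst case within the search radius."""
    worst_state, worst_gap = base_state, 0.0
    for _ in range(n_candidates):
        candidate = base_state + rng.uniform(-radius, radius,
                                             size=base_state.shape)
        gap = abs(reward_model_1(candidate) - reward_model_2(candidate))
        if gap > worst_gap:
            worst_state, worst_gap = candidate, gap
    return worst_state, worst_gap

state, gap = red_team(np.zeros(2))
print(state, gap)  # the candidate failure case to inspect by hand
```

The point of the exercise is not the search method (gradient-based or human-driven attacks would be stronger) but the framing: an approach is evaluated on the worst input the red team can find, not on average-case behaviour.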
What we plan to produce
If we believe we can commercialize a successful sub-project responsibly (without differentially enhancing AI capabilities), it will be incorporated into our product and marketed to potential adopters (e.g. tech companies meeting regulatory requirements for fairness, robustness, etc).
We will attempt to publish compelling sub-project results at machine learning conferences, both to gain credibility and to promote the adoption of our methods. This will be subject to info-hazards considerations.
We will continue to post our alignment theory to LessWrong/Alignment Forum and solicit feedback from the alignment community. This will also be subject to info-hazards considerations.
Documenting our failures is crucial for saving other researchers time. We will aim to document theoretical failures (e.g. idea x won't work because y) and empirical failures (e.g. it was harder to implement idea x with architecture y because z).
Our core principles
These are the core principles of Aligned AI.
- We do alignment research, not capability research.
- Aligned AI will be a benefit corporation, with AI alignment as the social benefit.
- We will develop our corporate structure to constrain the company against unaligned behaviour.
- We have an ethics board to advise us on AI safety decisions.
- We are developing an info-hazards policy.
We're currently near the top of our pipeline and are looking to hire research and engineering staff who can implement existing project ideas and help generate new ones. Our website is here, with the direct applications link here.
Who we are
Dr. Stuart Armstrong has been working on AI risks for years, at the Future of Humanity Institute and MIRI (following on from a wandering academic life of mathematics, quantum gravity, bio-computation medicine, and general existential risk research). He pioneered many key concepts in AI alignment, including interruptibility, low-impact AIs, counterfactual Oracle AIs, the difficulty/impossibility of learning human preferences without assumptions, and how to nevertheless learn these preferences. On top of journal and conference publications, he’s been extensively posting about his research on the Alignment Forum.
Rebecca Gorman, Co-Founder and CEO of Aligned AI, is an AI safety researcher, technologist, and businessperson. Since 2017 she has been researching safety for AI systems that are designed to consider humans part of their 'environment', including information systems and social media, and has co-authored several research papers on the subject.
Oliver Daniels-Koch is a bright, early-career ML practitioner and former intern at Charles River Analytics. At Charles River, his research focused on using Pearlian causal models for explainable, competency-aware reinforcement learning agents.
Thank you for your time and feedback!
While founding this company, we talked with many researchers and experienced people, both within the AI research community and outside it. These conversations have all been awesome and useful - thanks! All mistakes and views expressed in this post are our own.
We are especially keen on getting feedback from the Effective Altruist community.