Aligned AI is an Oxford-based startup focused on applied alignment research. Our goal is to implement scalable solutions to the alignment problem and distribute these solutions to actors developing powerful, transformative artificial intelligence (related Alignment Forum post here).

We are led by Stuart Armstrong and Rebecca Gorman, and advised by Dylan Hadfield-Menell, Adam Gleave, Justin Shovelain, Charles Pattison, and Anders Sandberg.

Our Premises

We think AI poses an existential risk to humanity, and that reducing this risk is one of the most impactful things we can do with our lives. Here we focus not on the premises behind that claim, but on why we're particularly excited about Aligned AI's approach to reducing AI existential risk.

  1. We believe AI safety research is bottlenecked by a core problem: how to extrapolate values from one context to another.
  2. We believe solving value extrapolation is necessary and almost sufficient for alignment.
  3. Value extrapolation research is neglected, both in the mainstream AI community and in the AI safety community. Note that value extrapolation overlaps substantially with many fields of research (e.g. out-of-distribution detection, robustness, transfer learning, multi-objective reinforcement learning, active reward learning, reward modelling...), which provide useful research resources. However, we've found that we've had to generate most of the key concepts ourselves.
  4. We believe value extrapolation research is tractable (and we've had success generating the key concepts).
  5. We believe distributing (not just creating) alignment solutions is critical for aligning powerful AIs.
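As a toy illustration of the value-extrapolation problem (a hypothetical example we constructed for this post, not Aligned AI's actual method): two reward hypotheses can agree perfectly on all training contexts, yet diverge sharply once the context shifts. Extrapolating values means resolving that ambiguity in the direction the designer intended.

```python
# Toy illustration: two candidate reward functions that fit the
# training data equally well but disagree out of distribution.
# All names and states here are hypothetical.

def reward_a(state):
    # Hypothesis A: "collect the coin"
    return 1.0 if state["has_coin"] else 0.0

def reward_b(state):
    # Hypothesis B: "reach the right edge" (in training, the coin
    # always sat at the right edge, so B fits the data just as well)
    return 1.0 if state["at_right_edge"] else 0.0

# Training contexts: coin position and right edge are perfectly
# correlated, so the two hypotheses are indistinguishable.
train_states = [
    {"has_coin": True, "at_right_edge": True},
    {"has_coin": False, "at_right_edge": False},
]

# New context: the coin moves away from the edge, and the two
# hypotheses diverge. Value extrapolation asks which behaviour
# the designer actually intended.
ood_state = {"has_coin": True, "at_right_edge": False}

agree_in_distribution = all(reward_a(s) == reward_b(s) for s in train_states)
print(agree_in_distribution)                      # True
print(reward_a(ood_state), reward_b(ood_state))   # 1.0 0.0
```

Standard training cannot distinguish the two hypotheses; only extra structure (human feedback, conservatism, or explicit extrapolation machinery) breaks the tie out of distribution.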

How we'll do this

Solving value extrapolation will involve solving multiple subproblems. Our research groups will therefore iterate through sub-projects, like the ones presented here. The aim is to generate sub-projects that are close to current published machine learning research, but whose solutions are designed to generalise. Our groups will take these projects, implement them in code, and build solutions to the relevant problem. At that point, we will either extend the project to investigate it in more depth, or write up the results and move on - passing the results to the library development team as needed.

Research methodology

At a high level, our research is structured around a linear pipeline, starting from theory and becoming progressively more applied. Each stage of the pipeline has tight feedback loops and also informs the other stages (e.g. theory leads to experiments, which lead to revised theory). The following sections describe how such a process might go.

Sub-project generation

Once a sub-component is deemed sufficiently promising, we will want to test it in code. To do so, we will generate "sub-project" ideas designed to be simple to implement but scalable to larger environments and models.

Minimum viable (sub-)project

We will start each sub-project with an "MVP": the simplest implementation that captures the core of our approach.

Test sub-projects in higher-dimensional settings

After implementing a successful MVP, we will iteratively experiment in increasingly high-dimensional settings (think of the progression from Deep Reinforcement Learning from Human Feedback to Learning to Summarize from Human Feedback).

Red-teaming

We will employ a "red-teaming" methodology similar to that of the Alignment Research Center, considering worst-case scenarios and how a given approach handles them.

What we plan to produce

Software library

If we believe we can commercialize a successful sub-project responsibly (without differentially enhancing AI capabilities), it will be incorporated into our product and marketed to potential adopters (e.g. tech companies meeting regulatory requirements for fairness, robustness, etc.).

ML publications

We will attempt to publish compelling sub-project results to machine learning conferences, both to gain credibility and promote the adoption of our methods. This will be subject to info-hazards considerations.

Theory blogposts

We will continue to post our alignment theory to LessWrong/Alignment Forum and solicit feedback from the alignment community. This will also be subject to info-hazards considerations.

Failures blog

Documenting our failures is crucial for saving other researchers time. We will aim to document theoretical failures (e.g. idea x won't work because y) and empirical failures (e.g. it was harder to implement idea x with architecture y because z).

Our core principles

These are the core principles of Aligned AI.

  • We do alignment research, not capability research.
  • Aligned AI will be a benefit corporation, with AI alignment as the social benefit.
  • We will develop our corporate structure to guard against unaligned behaviour by the company.
  • We have an ethics board to advise us on AI safety decisions.
  • We are developing an info-hazards policy.

Hiring

We're currently near the top of our pipeline and are looking to hire research and engineering staff who can implement existing project ideas and help generate new ones. Our website is here, with the direct application link here.

Who we are

Dr. Stuart Armstrong has been working on AI risks for years, at the Future of Humanity Institute and MIRI (following on from a wandering academic life of mathematics, quantum gravity, bio-computation medicine, and general existential risk research). He pioneered many key concepts in AI alignment, including interruptibility, low-impact AIs, counterfactual Oracle AIs, the difficulty/impossibility of learning human preferences without assumptions, and how to nevertheless learn these preferences. On top of journal and conference publications, he’s been extensively posting about his research on the Alignment Forum.

Rebecca Gorman, Co-Founder and CEO of Aligned AI, is an AI safety researcher, technologist, and businessperson. Since 2017 she has been researching safety for AI systems that are designed to consider humans part of their 'environment', including information systems and social media, and has co-authored several research papers on the subject.

Oliver Daniels-Koch is a bright early-career ML practitioner and former intern at Charles River Analytics. At Charles River, his research focused on using Pearlian causal models for explainable, competency-aware reinforcement learning agents.

Thank you for your time and feedback!

While founding this company, we talked with many researchers and experienced people, both within the AI research community and outside it. These conversations have all been awesome and useful - thanks! All mistakes and views expressed in this post are our own.

We are especially keen on getting feedback from the Effective Altruist community. 

Contact information


8 comments

What is the difference between this, ARC, Redwood Research, MIRI, and Anthropic?


Different approaches. ARC, Anthropic, and Redwood seem to be more in the "prosaic alignment" field (see eg Paul Christiano's post on that). ARC seems to be focusing on eliciting latent knowledge (getting human relevant information out of the AI that the AI knows but has no reason to inform us of). Redwood is aligning text-based systems and hoping to scale up. Anthropic is looking at a lot of interlocking smaller problems that will (hopefully) be of general use for alignment. MIRI seems to focus on some key fundamental issues (logical uncertainty, inner alignment, corrigibility), and, undoubtedly, a lot of stuff I don't know about. (Apologies if I have mischaracterised any of these organisations).

Our approach is to solve value extrapolation, which we see as a comprehensive and fundamental problem, and to address the other specific issues as applications of this solution (MIRI's work being the main exception - value extrapolation has pretty weak connections with logical uncertainty and inner alignment).

But the different approaches should be quite complementary - progress by any group should make the task easier for the others.

This is very helpful, thank you!


For more discussion, see the comment threads on this LessWrong post and this Alignment Forum post.


Congratulations, very exciting!

(FYI the hyperlink from "buildaligned.ai" doesn't work for some reason, but pasting that URL into one's address bar does.)

Thanks! Should be corrected now.