Aligned AI is an Oxford-based startup focused on applied alignment research. Our goal is to implement scalable solutions to the alignment problem, and to distribute these solutions to actors developing powerful, transformative artificial intelligence (related Alignment Forum post here).
In the tradition of AI safety startups, Aligned AI will be doing an AMA this week, from today, Tuesday the 1st of March, till Friday the 4th, inclusive. It will be mainly me, Stuart Armstrong, answering these questions, though Rebecca Gorman and Oliver Daniels-Koch may also answer some of them. GPT-3 will not be invited.
From our post introducing Aligned AI:
We think AI poses an existential risk to humanity, and that reducing the chance of this risk is one of the most impactful things we can do with our lives. Here we focus not on the premises behind that claim, but rather on why we're particularly excited about Aligned AI's approach to reducing AI existential risk.
- We believe AI Safety research is bottlenecked by a core problem: how to extrapolate values from one context to another.
- We believe solving value extrapolation is necessary and almost sufficient for alignment.
- Value extrapolation research is neglected, both in the mainstream AI community and in the AI safety community. Note that there is a lot of overlap between value extrapolation and many fields of research (e.g. out-of-distribution detection, robustness, transfer learning, multi-objective reinforcement learning, active reward learning, reward modelling...), which provide useful research resources. However, we've found that we've had to generate most of the key concepts ourselves.
- We believe value extrapolation research is tractable (and we've had success generating the key concepts).
- We believe distributing (not just creating) alignment solutions is critical for aligning powerful AIs.
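To make the value extrapolation bullet above concrete, here is a minimal toy sketch, assuming a hypothetical two-feature setting; it illustrates the problem only, not Aligned AI's method. Two reward hypotheses can fit every training context equally well, then come apart in a new context, and a conservative extrapolation can surface that ambiguity rather than silently committing to one hypothesis.

```python
# Toy sketch of the value extrapolation problem (illustration only, with
# made-up features; not Aligned AI's actual method).

training_contexts = [
    {"task_done": 1, "human_approves": 1},
    {"task_done": 0, "human_approves": 0},
]
new_context = {"task_done": 1, "human_approves": 0}  # the features decouple here


def reward_task(ctx):
    # Hypothesis A: completing the task is what's valued.
    return ctx["task_done"]


def reward_approval(ctx):
    # Hypothesis B: human approval is what's valued.
    return ctx["human_approves"]


# The two hypotheses are indistinguishable on the training distribution...
assert all(reward_task(c) == reward_approval(c) for c in training_contexts)

# ...but they disagree out of distribution. A conservative extrapolation keeps
# both hypotheses and scores the new context pessimistically, surfacing the
# ambiguity instead of resolving it arbitrarily.
conservative_score = min(reward_task(new_context), reward_approval(new_context))
print(conservative_score)  # 0: "task done" is not treated as clearly good here
```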
I am wondering what you think about the notion that people develop their values in response to the systems they exist in, which may be suboptimal, so that suboptimal values could be developed. For example, in a situation of scarcity or external abuse, people may seek to dominate others to stay safe, whereas in a scenario of abundance and overall consideration, they may seek to develop considerate relationships with others to increase their own and others' wellbeing.

Assuming that some perceived scarcity and abuse currently exists in various environments, it could be suboptimal for humanity's long-term potential (if that is measured by overall enjoyment in pursuing 'the most good' objectives) to extrapolate values now and have AI reinforce them. One solution could be to offer individuals an understanding of various situations and let them decide which ones they would prefer (e.g. a person living in scarcity, offered an understanding of abundance, might choose the ability to enjoy being enjoyed instead of the ability to threaten). This could work if all individuals are asked and all possibilities are shareably understood. Since that is challenging, an alternative is to ask persons about their perspective on an optimal system they would like to see exist (rather than one which would benefit them personally), considering the objectives of all individuals under perfect awareness of alternatives.

What do you think about these thoughts on gathering values to extrapolate? Are you going to implement something like this, or look for research in this area of understanding values under overall consideration and perfect awareness of alternatives? I would also appreciate any comments on my Widespread values brainstorming draft, which was developed using this reasoning.
Ok, that makes sense. Rhetorically, how would one differentiate the terminal values worth keeping from those worth updating? For example, a hospitality 'requirement' versus the free ability to choose to be hospitable versus the ability to choose environments with various attitudes toward hospitality. I would really offer the emotional understanding of all options and let individuals freely decide. This should resolve the issue of persons favoring their environments due to limited awareness of alternatives, or the fear of the consequences of choosing an alternative. Then, yo...