Summary
I believe that advanced AI systems will likely be aligned with the goals of their human operators, at least in a narrow sense. I’ll give three main reasons for this:
- The transition to AI may happen in a way that does not give rise to the alignment problem as it’s usually conceived of.
- While work on the alignment problem appears neglected at this point, it’s likely that large amounts of resources will be used to tackle it if and when it becomes apparent that alignment is a serious problem.
- Even if the previous two points do not hold, we have already come up with a couple of smart approaches that seem fairly likely to lead to successful alignment.
This argument lends some support to work on non-technical interventions like moral circle expansion or improving AI-related policy, as well as work on specific aspects of AI safety like decision theory or worst-case AI safety measures.
What do you think about technical work on these problems, and about "moral uncertainty expansion" as a more cooperative alternative to "moral circle expansion"?
Working on these problems makes a lot of sense; to be clear, I'm not claiming that the philosophical issues around what "human values" means are likely to be solved by default.
I think increasing philosophical sophistication (or "moral uncertainty expansion") is a very good idea from many perspectives. (A direct comparison to moral circle expansion would also need to take relative tractability and importance into account, both of which seem unclear to me.)