Stuart Armstrong

127 karma · Joined Sep 2021

Comments (15)

I argue that it's entirely the truth, given the way the term is used and understood.

Precisely. And supporting subsidized contraception is a long way away from both the formal definition of eugenics and its common understanding.

I feel that saying "subsidized contraception is not eugenics" is rhetorically better and more accurate than this approach.

Ah, you made the same point I did, but better :-)

>Most people endorse some form of 'eugenics'

No, they don't. It is akin to saying "most people endorse some form of 'communism'." We can point to a lot of overlap between theoretical communism and values that most people endorse; this doesn't mean that people endorse communism. That's because communism covers a lot more stuff, including a lot of historical examples and some related atrocities. Eugenics similarly covers a lot of historical examples, including some atrocities (not only in fascist countries), and this is what the term means to most people - and hence, in practice, what the term means.

Many people endorse screening embryos for genetic abnormalities. The same people would respond angrily if you said they endorsed eugenics, just as people who endorse a minimum wage would respond angrily if you said they endorsed communism. Eugenics is evil because, as the term is actually used, it describes something evil; trying to force it into some other technical meaning is incorrect.

Thanks, that makes sense.

I've been aware of those kinds of issues; what I'm hoping is that we can get a framework that includes these subtleties automatically (e.g. by having the AI learn them from observations or from papers published by humans) without having to put it all in by hand ourselves.
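As a very rough illustration of that hope (a toy sketch only, not our actual framework - the features, comparison data and loss are all invented for the example), the idea is to fit a preference model to observed human judgements instead of writing the rules in by hand:

```python
# Toy sketch, not our actual framework: instead of hand-coding every subtlety,
# fit a simple preference model to observed human judgements.
# The features, data and loss below are all invented for the illustration.
import numpy as np

def features(outcome):
    # Hand-picked features for the toy only; in the intended setup these would
    # themselves come from observations or the research literature.
    return np.array([outcome["people_happy"], outcome["happiness_is_coerced"]], dtype=float)

# Hypothetical human comparisons: (preferred outcome, rejected outcome).
comparisons = [
    ({"people_happy": 1.0, "happiness_is_coerced": 0.0},
     {"people_happy": 1.0, "happiness_is_coerced": 1.0}),
    ({"people_happy": 1.0, "happiness_is_coerced": 0.0},
     {"people_happy": 0.0, "happiness_is_coerced": 0.0}),
]

w = np.zeros(2)  # learned value weights

# Bradley-Terry-style logistic model: the preferred outcome should score higher.
for _ in range(2000):
    grad = np.zeros_like(w)
    for better, worse in comparisons:
        diff = features(better) - features(worse)
        p = 1.0 / (1.0 + np.exp(-w @ diff))  # P(humans prefer `better` | w)
        grad += (1.0 - p) * diff             # gradient of the log-likelihood
    w += 0.1 * grad                          # gradient ascent step

print("learned weights:", w)  # happiness weighted up, coercion weighted down
```

The point is just that the fitted weights end up encoding a distinction (happiness good, coercion bad) that was never hand-coded, only implicit in the observations.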

Hey there! It is a risk, but the reward is great :-)

  1. Value extrapolation makes most other AI safety approaches easier (e.g. interpretability, distillation and amplification, low impact, ...). Many of these methods also make value extrapolation easier (e.g. interpretability, logical uncertainty, ...). So I'd say the contribution is superlinear - solving 10% of AI safety our way will give us more than 10% progress.
  2. I think it already has reframed AI safety from "align AI to the actual (but idealised) human values" to "have an AI construct values that are reasonable extensions of human values".
  3. Can you be more specific here, with examples from those fields?
  4. I see value extrapolation as including almost all my previous ideas - it would be much easier to incorporate model fragments into our value function if we have decent value extrapolation.

An AI that is aware that value is fragile will behave in a much more cautious way. This gives a different dynamic to the extrapolation process.

  1. Nothing much to add to the other post.
  2. Imagine that you try to explain to a potential superintelligence, by showing it videos of happy people, that we want it to preserve a world with happy people in it. It might conclude that it should make people happy. Or it might conclude that we want more videos of happy people. The latter is more compatible with the training we have given it. The AI will be safer if it hypothesizes that we may have meant the former, despite having given it evidence more compatible with the latter, and pursues both goals rather than merely the latter. This is what we are working towards (a toy sketch follows this list).
  3. Value alignment. Good communication and collaboration skills. Machine learning skills. Smart, reliable, and creative. Good at research. At present we are hiring for a Principal ML Engineer and other senior roles.
  4. The ability to move quickly from theory to model to testing the model, and back.
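
To make point 2 a little more concrete, here is a toy sketch (not our actual algorithm - the reward hypotheses, actions and numbers are invented for the illustration) of the difference between an agent that commits to the hypothesis best supported by its training and one that keeps both hypotheses live and acts cautiously across them:

```python
# Toy sketch of "pursue both hypotheses", not our actual algorithm.
# The reward functions, actions and numbers are invented for the illustration.

ACTIONS = ["make_people_happy", "fabricate_videos", "do_both_carefully"]

def reward_people_happy(action):
    # Hypothesis A: we want actual happy people.
    return {"make_people_happy": 1.0, "fabricate_videos": 0.0, "do_both_carefully": 0.8}[action]

def reward_more_videos(action):
    # Hypothesis B: we want more videos of happy people (fits the training data best).
    return {"make_people_happy": 0.2, "fabricate_videos": 1.0, "do_both_carefully": 0.8}[action]

hypotheses = [reward_people_happy, reward_more_videos]

# A naive agent commits to the hypothesis that best fits its training signal...
naive_choice = max(ACTIONS, key=reward_more_videos)

# ...a cautious agent keeps both hypotheses live and maximises the worst case across them.
cautious_choice = max(ACTIONS, key=lambda a: min(r(a) for r in hypotheses))

print("naive:", naive_choice)       # fabricate_videos
print("cautious:", cautious_choice) # do_both_carefully
```

The cautious agent gives up a little reward under either single hypothesis in exchange for doing acceptably under both, which is the behaviour we want while the values are still being extrapolated.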