Anthropic's new post "When AI builds itself" is interesting for a number of reasons, but its focus on judgement caught my attention. At one point they say:
"An area of human comparative advantage, for now, is research taste and judgment, including choosing which problems matter, which results to trust, and when an approach is a dead end."
https://lnkd.in/gYGXw4V8

In choosing "which problems matter", the model may need to weigh incommensurable values, and a lot may turn on whether these systems get better at handling novel questions of incommensurability. If you're unsure what incommensurability means, read the very short "Apples and Oranges" paper below!

Inspired by the work of the late David Hodgson, an Australian judge who also managed to publish three philosophy books with Oxford University Press (one on utilitarianism, one on consciousness and another on free will), I published this short piece on AI and incommensurability in the journal AI & Society last year.

The models might get better at dealing with novel issues of incommensurability. A lot (including alignment) seems to turn on whether they do, and it might be worth trying to find a way to see if they are improving: an incommensurability benchmark.
With this in mind, in the paper I propose a sort of Turing Test of incommensurability here:

https://lnkd.in/gfQG56fJ

What do others think? What might be a useful way of working out whether agentic AIs are able to respond rationally to incommensurable values when they make decisions? What would be a good benchmark?
