I can plausibly see such sensors for physical pain but not for emotional pain. Emotional pain is the far more potent teacher of what is valuable and what is not, what is important and what is not. Intelligence needs direction of this sort for learning.
So, can you build embodied AGI with emotional responses built in: responses that last the way emotions do, and so can serve as teachers the way emotions do? Building empathy (for both happiness and suffering) and the pain of disapproval into AGI would be crucial.
I think we could try to build AGI, but I am skeptical it could be anything useful or helpful (a broad alignment problem), both because of vague or inapt success criteria and because of AGI's lack of embodiment (so it won't get beaten up by the world generally or have emotional/affective learning). Because of these problems, I think we shouldn't try (1).
Further, I am trying this line of argument out to see if it will encourage (3) (not building AGI), because these concerns cast doubt on the value of AGI to us (and thus the incentives to build it).
This takes on additional potency if we embrace the shift to thinking about "should" and not just "can" in scientific and technological development generally. So that brings us to the question I think we should be asking: how to encourage a properly responsible approach to AI, rather than how to shift credences on the Future Fund's propositions.
Does that make sense?
Let me then be more specific. Take Bostrom's definition. What are all the cognitive tasks in all the domains of human interest? I think this is super vague and ill-defined. We can train up AI on specific tasks we want to accomplish (and have ways of training AI to do so, because success or failure can be made clear). But there is no "all cognitive tasks in the domains of human interest" training because we have no such list, and for some crucial tasks (e.g. ethics) we cannot even define success clearly.
GPT-3 is impressive for writing, and other AI is impressive for image production, but both also produce amusingly wrong outputs at times. More impressive is AI that is useful for well-defined tasks, like the protein-folding success.
Our success has been evolutionarily framed (survival, spread of species), and tested against a punishing world. But AGI will not be embodied. So what counts as success or failure for non-task specific AI? Back to Bostrom's definition, we have no such set of cognitive tasks defined or delineated.
So what are the incentives for such a creation? You say they are immense. I want to press on the idea that they are even substantial.