compassionml.com
My perspective is that even though current meat production is quite efficient, from fundamental physics there's no way that growing a whole living being, with a brain and bones and all that, is the most efficient possible way of producing meat (and immune systems are irrelevant if you have good enough isolation). I do agree that at our current tech level it seems like synthetic meat won't be competitive anytime soon. While vegan alternatives are delicious to many people, they're not exactly the same (though wanting to eat animals for psychological reasons is definitely part of it). Though I do agree that these issues are uncertain!
The intent was that, conditional on AI sharing most but not all human values, the AIs wouldn't change their own values later.
You could have a world where all humans die and the AIs later change their own values, and you could also have worlds where partially aligned AIs don't wipe out humanity but change their values to be better (e.g. internalizing the goal of being aligned) or worse (e.g. internalizing a paperclip-maximizer goal) by our measures.
In worlds where the first TAIs share most but not all human values, what do you think most likely happens?
Thanks Dawn, taking these in turn:
1: "Robust alignment" is a deliberately vague term; it's meant to incorporate your views about how hard alignment is (e.g. UDT vs. well-intentioned).
4: It's a hard question; our perspective is that the backfire -> cluelessness -> don't act chain can be thought of as low tractability.
5: By "stable under reflection" we meant the AI reflecting on its own values (while interacting with the world), where agreement means they wouldn't change their values much (stylistically: an AI that shares 70% of our values in 2030 has those same values in 3030). But you're right that how AIs interact (beyond competition, handled in the last question) is important.
7: S-risks do break the scale and we couldn't find a good simple way to deal with that (though we'll do other polls more directly on that later). The intent of "will" was to match 100% expected probability to 100% agree on the scale.
Some people believe that if we get partial alignment (i.e. AI that cares about what we want, but also cares about other things) then we can get decent outcomes for the future (analogous to humans being partially aligned to each other). But others think that if we don't get alignment perfect, ASIs will have an incentive to take over, and then will either have value drift towards something orthogonal to humans or will deliberately reformat their own values. "Stable under reflection" is the opinion that this wouldn't happen: that ASIs that care somewhat about humans would continue to care somewhat about humans in the long term.
That was the intervention class we had in mind, though there could be other pretraining interventions that don't fall cleanly into good/bad values (e.g. promoting risk aversion).