Community builder @ Center on Long-Term Risk
137 karmaJoined Working (0-5 years)



Thanks :) Good point.

Minor point: I don't think it's strictly true that reducing risks of undesirable lock-ins is robustly good no matter what the expected value of the future is. It could be that a lock-in is not good, but it prevents an even worse outcome from occurring.

I included other existential risks in order to counter the following argument: "As long as we prevent non-s-risk-level undesirable lock-ins in the near-term, future people can coordinate to prevent s-risks." This is a version of the option value argument that isn't about extinction risk. I realize this might be a weird argument for someone to make, but I covered it to be comprehensive.

But the way I wrote this, I was pretty much just focused on extinction risk. So I agree it doesn't make a lot of sense to include other kinds of x-risks. I'll edit this now.

Yeah, there might be a correlation in practice, but I think intelligent agents could have basically any random values. There are no fundamentally incorrect values, just some values that we don't like or that you'd say lack important nuance. Even under moral realism, intelligent systems don't necessarily have to care about the moral truth (even if they're smart enough to figure out what the moral truth is). Cf. the orthogonality thesis.

But AIs could value anything. They don’t have to value some metric of importance that lines up with what we care about on reflection. That is, it wouldn’t be a blunder in an epistemic sense. AIs could know their values lack nuance and go against human values, and just not care.

Or maybe you’re just saying that, with the path we’re currently on, it looks like powerful AIs will in fact end up with nuanced values in line with humanity’s. I think this could still constitute a value lock-in, though, just not one that you consider bad. And I expect there would still be value disagreements between humans even if we had perfect information, so I’m skeptical we could ever instill values into AIs that everyone is happy about.

I’m also not sure AI would cause a value lock-in, but that's more because powerful AIs may be widely distributed, such that no single AGI takes over everything.

Yeah. Though for a utilitarian it could still be instrumentally good to believe the points in this post, at least on an emotional level. But lying to yourself is plausibly a bad norm for other reasons, and in any case, this type of reasoning is the exact thing the post is arguing against.