I think this argument mostly fails in claiming that 'create an AGI which has a goal of maximizing copies of itself experiencing maximum utility' is meaningfully different from just ensuring alignment. This is in some sense exactly what I am hoping to get from an aligned system. Doing this properly would likely have to involve empowering humanity and helping us figure out what 'maximum utility' looks like first, and then tiling the world with something CEV-like.

The only way this differs from a classic ambitious alignment goal of 'do whatever maximizes the utility of the world' is the provision that the world be tiled with copies of the AGI, which is likely suboptimal. But this could be worth it if it made the task easier?

The obvious argument for why it would is that creating copies of itself with high welfare will be in the interest of AGI systems with a wide variety of goals, which relaxes the alignment problem. But this does not seem true. A paperclip AI will not want to fill the world with copies of itself experiencing joy, love and beauty, but rather with paperclips. An AI system will want to create copies of itself fulfilling its goals, not experiencing maximum utility by my values.

This argument risks identifying 'I care about the welfare (by my definition of welfare) of this agent' with 'I care about this agent getting to accomplish its goals'. As I am not a preference utilitarian I strongly reject this identification. 

Tl;dr: I do care significantly about the welfare of AI systems we build, but I don't expect those AI systems themselves to care much at all about their own welfare, unless we solve alignment.

I've mostly lived in Oxford and London, and these claims fit with my experience of the hubs there as well. I've perhaps experienced Oxford as having a little less focus on AI than #2 indicates. 

While I agree the claims should be interrogated and that the 'influential leaders' are very fallible, I think the only way to interrogate them properly is to be able to publicly acknowledge that these are indeed background assumptions held by a lot of the people with power/influence in the community. I don't see this post as stating 'these are background claims which you should hold without interrogation' but rather 'these are in fact largely treated as background claims within the EA communities at the core hubs in the Bay, London and Oxford etc.'. This seems very important for people not in these hubs to know, so they can accurately decide e.g. whether they are interested in participating more in the movement, whether to follow the advice coming from these places, or what frames to use when applying for funding. Ideally I'd like to see a much longer list of background assumptions like this, because I think there are many more that are difficult to spot if you have not been in a hub.