I really like the way Derek Parfit distinguishes between consequentialist and non-consequentialist theories in 'Reasons and Persons'.
All moral theories give people aims. A consequentialist theory gives everyone the same aims (e.g. maximize total happiness). A non-consequentialist theory gives different people different aims (e.g. look after your own family).
There is a really important difference there. Not all moral theories are consequentialist.
Thanks for writing this up, this is a really interesting idea.
Personally, I find points 4, 5, and 6 really unconvincing. Are there any stronger arguments for these that don't consist of pointing to a weird example and then appealing to the intuition that "it would be weird if this thing were conscious"?
My own intuition tells me that all of these examples would be conscious, which means I find the arguments unconvincing, but also hard to argue against!
But overall I get that, given the uncertainty around what consciousness is, it might be a good idea to use implementation considerations to hedge our bets. This is a nice post.
I think this is an interesting question, and I don't know the answer.
I think two quite distinct ideas are being conflated in your post though: (i) 'earning to give' and (ii) the GWWC 10% pledge.
These concepts are very different in my head.
'Earning to give': When choosing a career with the aim of doing good, some people should pick a career to maximize their income (perhaps subject to some ethical constraints), and then give a lot of it away to effective causes (probably a lot more than 10%). This idea tells you which jobs you should decide to work in.
GWWC pledge: Pretty much whoever you are, if you've got a decent income in a rich country, you should give 10% of it away to effective causes. This idea says nothing about which jobs you should be working in.
'Earning to give' gets a lot of criticism from people outside EA, but I don't see much criticism of the idea of donating 10% of your income. Sure, you can call the amount arbitrary and dispute the extent to which it is an obligation, but I think even major critics of EA often concede that the 10% pledge is still an admirable thing to do.
Thank you! This is exactly what I wanted to read!
Thanks for this reply! That makes sense. Do you know how likely people in the field think it is that AGI will come from just scaling up LLMs vs requiring some big new conceptual breakthrough? I hear people talk about this question but don't have much sense about what the consensus is among the people most concerned about AI safety (if there is a consensus).
I've seen people already building AI 'agents' using GPT. One crucial component seems to be giving it a scratchpad to have an internal monologue with itself, rather than forcing it to immediately give you an answer.
If the path to agent-like AI ends up emerging from this kind of approach, wouldn't that make AI safety really easy? Couldn't we just read their minds and check what their intentions are?
Holden Karnofsky talks about 'digital neuroscience' being a promising approach to AI safety, where we figure out how to read the minds of AI agents. And for current GPT agents, it seems completely trivial to do that: you can literally just read their internal monologue in English and see exactly what they're planning!
I'm sure there are lots of good reasons not to get too hopeful based on this early property of AI agents, although for some of the immediate objections I can think of, I can also think of responses. I'd be interested to read a discussion of what the implications of current GPT 'agents' are for AI safety prospects.
A few reasons I can think of for not being too hopeful, and my thoughts:
A final thought/concern/question: if 'digital neuroscience' did turn out to be really easy, I'd be much less concerned about the welfare of humans, and I'd start to be a lot more concerned about the welfare of the AIs themselves. It would make them very easily exploitable, and if they were sentient as well then it seems like there's a lot of scope for some pretty horrific abuses here. Is this a legitimate concern?
Sorry this is such a long comment, I almost wrote this up as a forum post. But these are very uninformed naive musings that I'm just looking for some pointers on, so when I saw this pinned post I thought I should probably put it here instead! I'd be keen to read comments from anyone who's got more informed thoughts on this!
I really like this argument. I think there's another way of framing it that occurred to me when reading it, that I also found insightful (though it may already be obvious):
If as a group you do something with value O(1), then the value of individual contributions should usually be O(1/N), since value (even in expectation) is additive.
The lower bound this gives seems less strict (X/(2N) in the case that p = 1/2, instead of X/N), but it helps me understand intuitively why the answer has to come out this way, and why the value of contributing to voting is directly analogous to the value of contributing to, say, Parfit's water tank for the injured soldiers, even though there are no probabilities involved there.
Point taken, although I think this is analogous to saying: Counterfactual analysis will not leave us predictably worse off if we get the probabilities of others deciding to contribute right.
Thank you for this correction, I think you're right! I had misunderstood how to apply Shapley values here, and I appreciate you taking the time to work through this in detail.
If I understand correctly now, the right way to apply Shapley values to this problem (with X=8, Y=2) is not to work with N (the number of players who end up contributing, which is unknown), but instead to work with N', the number of 'live' players who could contribute (known with certainty here, not something you can select), and then:
Have I understood the Shapley value approach correctly? If so, I think my final conclusion still stands (even if for the wrong reasons): a Shapley value analysis will lead to a sub-optimal N (number of players deciding to participate), since the optimal N here is 2 (or 1, which has the same value).
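In case it's useful, here's a sketch of that computation in Python. The characteristic function is my own assumption (everyone in a coalition contributes, each paying cost Y, and the intervention succeeds unless all of them fail), so it may not match the correction exactly. With N' = 3 live players and X = 8, Y = 2, every player's Shapley value comes out positive, so a rule of "participate iff your Shapley value is positive" sends all three players in, even though N = 2 maximizes expected value:

```python
from itertools import permutations

X, Y = 8, 2    # the illustrative numbers from this thread
N_LIVE = 3     # N': the number of 'live' players who could contribute

def v(coalition_size: int) -> float:
    # Assumed characteristic function: everyone in the coalition contributes,
    # each paying cost Y; the intervention succeeds unless all of them fail.
    if coalition_size == 0:
        return 0.0
    return (1 - 0.5 ** coalition_size) * X - coalition_size * Y

def shapley_value(player: int, n: int = N_LIVE) -> float:
    # Average marginal contribution of `player` over all arrival orders.
    orders = list(permutations(range(n)))
    total = 0.0
    for order in orders:
        k = order.index(player)          # players arriving before `player`
        total += v(k + 1) - v(k)
    return total / len(orders)

print([shapley_value(i) for i in range(N_LIVE)])
# each player's Shapley value is 1/3 (they are symmetric, so it's v(3)/3)
```

By symmetry the three values are equal, and they sum to v(3), as Shapley values must.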
As for whether the framing of the problem makes sense, with N as something we can select: the point I was making was that in a lot of real-world situations, N might well be something we can select. If a group of people have the same goals, they can coordinate to choose N, and then you're not really in a game-theory situation at all. (This wasn't a central point of my original comment, but it was the point I was defending in the comment you're responding to.)
Even if you don't all have exactly the same goals, or if there are a lot of actors, it seems like you'll often be able to benefit by communicating and coordinating, and then you'll be able to improve on the approach of everyone deciding independently according to a Shapley value estimate: e.g. GiveWell recommending a funding allocation split between their top charities.
Edit: Vasco Grilo has pointed out a mistake in the final paragraph of this comment (see thread below), as I had misunderstood how to apply Shapley values, although I think the conclusion is not affected.
If the value of success is X, and the cost of each group pursuing the intervention is Y, then ideally we would want to pick N (the number of groups that will pursue the intervention) from the possible values 0,1,2 or 3, so as to maximize:
(1-(1/2)^N) X - N Y
i.e., to maximize expected value.
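As a quick sanity check, here's that maximization in Python, using the X = 8, Y = 2 numbers from the example later in this comment (nothing beyond the formula above is assumed):

```python
# Expected value when N of the 3 groups pursue the intervention:
# each succeeds independently with probability 1/2, success (by anyone)
# is worth X, and every pursuing group pays cost Y.
X, Y = 8, 2  # the illustrative numbers used later in this comment

def expected_value(n: int) -> float:
    return (1 - 0.5 ** n) * X - n * Y

for n in range(4):
    print(n, expected_value(n))
# N = 1 and N = 2 tie at 2.0; N = 3 drops to 1.0
```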
If all 3 groups have the same goals, they'll all agree on what N should be. If N is not 0 or 3, then the best thing for them to do is to get together and decide which of them will pursue the intervention and which of them won't, in order to get the optimum N. They can base their decision about how to allocate the groups on secondary factors (or on chance, if everything else really is equal). If they all have the same goals then there's no game theory here: they'll all be happy with this, and they'll all be maximizing their own individual counterfactual expected value by taking part in this coordination.
This is what I mean by coordination. The fact that their individual approaches are different is irrelevant to them benefiting from this form of coordination.
'Maximize Shapley value' will perform worse than this strategy. For example, suppose X is 8, Y is 2. The optimum value of N for expected value is then 2 (2 groups pursue intervention, 1 doesn't). But using Shapley values, I think you find that whatever N is, the Shapley value of your contribution is always >2. So whatever every other group is doing, each group should decide to take part, and we then end up at N=3, which is sub-optimal.