In this post, I explore and evaluate an internal bargaining (IB) approach to moral uncertainty. On this account, the appropriate decision under moral uncertainty is the one that would be reached as the result of negotiations between agents representing the interests of each moral theory, who are awarded your resources in proportion to your credence in the theory they represent. This has only been discussed so far by Greaves and Cotton-Barratt (2019), who give a technical account of the approach and tentatively conclude that the view is inferior to the leading alternative approach to moral uncertainty, maximise expected choiceworthiness (MEC). I provide a more intuitive sketch of how internal bargaining works, and do so across a wide range of cases. On the basis of these cases, as well as considering some challenges for the view and its theoretical features, I tentatively conclude it is superior to MEC. I close by noting one implication relevant to effective altruists: while MEC seems to push us towards a (fanatical) adherence to longtermism, internal bargaining would provide a justification for something like worldview diversification.
Notes to reader: (1) I’m deliberately writing this in a fairly rough-and-ready way rather than as a piece of polished philosophy. If I had to write it as the latter, I don’t think it would get written for perhaps another year or two. I’ll shortly begin working on this topic with Harry Lloyd, an HLI Summer Research Fellow, and I wanted to organise and share my thoughts before doing that. In the spirit of Blaise Pascal, I should say that if I had had more time, I would have written something shorter. (2) This can be considered a ‘red-team’ of current EA thinking.
When philosophers introduce the idea of moral uncertainty - uncertainty about what we ought, morally, to do - they often quickly point out that we are used to making decisions in the face of empirical uncertainty all the time.
Here’s the standard case of empirical uncertainty: it might rain tomorrow, but it might not. Should you pack an umbrella? The standard response to this is to apply expected utility theory: you need to think about the chance of it raining, the cost of carrying an umbrella, and the cost of getting wet if you don’t carry an umbrella. Or, more formally, you need to assign credences (strengths of belief) and utilities (numbers representing value) to the various outcomes.
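To make this concrete, here is a minimal sketch of the expected utility calculation; all the numbers (the rain probability, the hassle and wetness costs) are assumptions purely for illustration:

```python
# Hypothetical numbers: 30% chance of rain, carrying the umbrella costs
# 1 unit of hassle, and getting caught in the rain without it costs 10.
p_rain = 0.3
eu_take = -1.0             # you carry the umbrella whether or not it rains
eu_leave = p_rain * -10.0  # expected cost of getting wet

# Taking the umbrella has higher expected utility (-1.0 vs -3.0),
# so expected utility theory says: pack it.
print(eu_take, eu_leave)
```

With different numbers the verdict flips: if rain were only 5% likely, leaving the umbrella at home would cost just 0.5 in expectation, and you should leave it behind.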
Hence, when it’s pointed out that we’re also often uncertain about what we ought to do - should we, for example, be consequentialists or deontologists? - the standard thought is that our account of moral uncertainty should probably work much like our account of empirical uncertainty. The analogous account for moral uncertainty is called maximise expected choiceworthiness (MEC) (MacAskill, Bykvist, and Ord, 2020). The basic idea is that we need to assign credences to the various theories as well as a numerical value on how choiceworthy the relevant options are on those theories. The standard case to illuminate this is:
Meat or Salad: You are choosing whether to eat meat from a factory farm or have a salad instead. You have a 40% credence in View A, a deontological theory on which eating meat is seriously morally wrong, and a 60% credence in View B, a consequentialist theory on which both choices are permissible. You’d prefer to eat meat.
Intuitively, you ought to eat the salad. Why? Even though you have less credence in A than B, when we consider the relative stakes for each view, we notice that View A cares much more about avoiding the meat. Hence, go for the salad as that maximises choiceworthiness.
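The MEC arithmetic behind this verdict can be sketched with some made-up choiceworthiness scores; the particular numbers below are assumptions for illustration, not anything fixed by the case:

```python
# Hypothetical choiceworthiness scores. On View A (deontology), eating meat
# is seriously wrong (-100); on View B (consequentialism), both options are
# permissible and you mildly prefer the meat (+1 vs 0).
credences = {"A": 0.4, "B": 0.6}
choiceworthiness = {
    "meat":  {"A": -100.0, "B": 1.0},
    "salad": {"A": 0.0,    "B": 0.0},
}

def expected_choiceworthiness(option):
    # Weight each theory's score by your credence in that theory.
    return sum(credences[t] * choiceworthiness[option][t] for t in credences)

# meat: 0.4 * -100 + 0.6 * 1 = -39.4; salad: 0. MEC picks the salad.
for option in ("meat", "salad"):
    print(option, expected_choiceworthiness(option))
```

Because View A’s stakes dwarf View B’s mild preference, the salad wins despite A’s lower credence.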
MEC is subject to various objections (see MacAskill, Bykvist, and Ord, 2020 for discussions and ultimately a defence of the view). First, the problem of intertheoretic comparisons: there is no broadly accepted way to make comparisons of choiceworthiness across theories. For instance, a deontological theory might say murder is morally worse than theft, but say nothing about how much worse the former is; how would we compare this against a utilitarian theory that compares outcomes in units of welfare losses and gains?
Second, MEC also leads to problematic forms of fanaticism, where a theory in which one has very low credence, but according to which there is an enormous amount at stake, can dictate what one should do. For instance, consider:
Lives or Souls: You’re about to donate to the Against Malaria Foundation and you know that your donation would save one child’s life in expectation. Someone then shows you evidence that the Against Hell Foundation reliably converts one person to a certain religion and thereby saves their soul from eternal damnation. You have almost no credence in that religion, but know that saving a soul is, on that religion, incomparably more valuable than saving one life. You realise that MEC implies you ought to give to the Against Hell Foundation instead, but this seems wrong.
Relatedly, MEC is worryingly fanatical not only about what you should do, but how much you should do (MacAskill, Bykvist, and Ord, 2020, chapter 2). Consider:
Partial Singerian: You have a 10% credence in the view advocated by Peter Singer that citizens of rich countries ought to give a large proportion of their resources to help those in poverty. The remainder of your credence is in common-sense views of morality on which doing this, whilst laudable, is not required.
What does MEC conclude? Well, the Singerian view holds there is a lot at stake if you give - think of all the lives you could save! - so it seems that you are pressed to accept a very demanding moral theory anyway. What seems odd and objectionable about MEC is the way that, in a certain sense, one small part of you can ‘bully’ the rest of you into doing what it wants if it cares enough.
Are there any alternatives to MEC? The other commonly-discussed approach is My Favourite Theory (MFT), which tells you to follow the moral theory you think is most likely to be true. But MFT seems worse than MEC.
First, MFT is vulnerable to how theories are individuated, i.e. divided up (Gustafsson and Torpman, 2014, section 5; MacAskill and Ord, 2018, pp. 8-9). Note that in Meat or Salad, MFT currently recommends you choose the meat, as you have a 60% credence in the consequentialist theory. However, you then realise there are two versions of consequentialism (e.g. act-consequentialism and rule-consequentialism) and you have a 30% credence in each. Now MFT tells you to eat the salad. So, to function at all, MFT needs a theory of theory-individuation that is robust to these sorts of objections.
Second, even supposing such a theory could be found, MFT is also insensitive to the stakes. If you had a 51% credence in a theory on which X is slightly better than Y, and a 49% credence in another theory on which X was enormously worse than Y, it would still recommend X.
What’s more, neither MEC nor MFT can account for the strong intuition many people have that, in the face of our moral uncertainty, we should sometimes ‘split the pot’ and diversify our resources across different options. Prominently, Holden Karnofsky of Open Philanthropy has advocated for worldview diversification, which involves “putting significant resources behind each worldview [one] considers plausible”. As a motivating case, consider:
Poverty or Robots: You have 60% of your credence in a total utilitarian moral theory on which the greatest priority is preventing existential risks to humanity, such as those posed by unsafe artificial intelligence. But you have 40% of your credence in the person-affecting view on which morality is about “making people happy, not making happy people” (Narveson, 1967) and that, according to this view, helping those in global poverty is the priority. On the total utilitarian theory, using money to reduce poverty is good, but hundreds of times less cost-effective than funding existential risk.
MEC and MFT will both recommend that all the resources go towards existential risk reduction. In fact, MEC would still recommend this even if you only had 1% credence in the Total Utilitarian view (cf. Greaves and Ord, 2019). But many people feel the right thing to do is, nevertheless, to split your resources rather than give the whole pot to either cause.
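To see why MEC is so insensitive to credence here, suppose (purely as an illustrative assumption) that a pound spent on existential risk is worth 500 units on the total utilitarian view but nothing on the person-affecting view, while a pound spent on poverty is worth 1 unit on both:

```python
# Hypothetical per-pound values under each theory (illustrative only).
VALUES = {
    "xrisk":   {"total_util": 500.0, "person_affecting": 0.0},
    "poverty": {"total_util": 1.0,   "person_affecting": 1.0},
}

def mec_value(option, credence_total_util):
    # Expected choiceworthiness of a pound spent on `option`.
    p = credence_total_util
    v = VALUES[option]
    return p * v["total_util"] + (1 - p) * v["person_affecting"]

# Even at a mere 1% credence in total utilitarianism, x-risk dominates:
# roughly 5.0 vs 1.0 per pound.
print(mec_value("xrisk", 0.01), mec_value("poverty", 0.01))
```

On these numbers, the 500x multiplier swamps any realistic credence: x-risk only stops dominating once your credence in total utilitarianism drops below about 0.2%.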
That we don’t have a justification for ‘worldview diversification’ is a problem. Many people practice worldview diversification and seem to believe it is an appropriate response to moral uncertainty. At the very least, we want an account of moral uncertainty that could make sense of this.
Those are some challenges for both MEC and MFT. I will now suggest an approach to moral uncertainty that might do better. This has been introduced by Greaves and Cotton-Barratt (2019) who call it bargaining-theoretic, but I prefer, and will use, the more intuitive internal bargaining (IB). As noted above, on this approach, decision-making under moral uncertainty is modelled as if it were a case of bargaining between different parts of the agent, where each sub-agent is committed to a moral view and resources are awarded to sub-agents in proportion to the credence the agent has in those views. In other words, we can say the appropriate response to ethical uncertainty is what is decided in the moral marketplace.
Greaves and Cotton-Barratt (2019) provide a technical treatment of how this view would work which draws on bargaining theory in economics. It is sufficiently technical that I expect only mathematically sophisticated readers will come away from it feeling confident they understood how internal bargaining would work and are able to form a view on how plausible an approach it is. Here, I aim to provide a commonsense account of roughly how the view works by relying on nothing more than our ordinary intuitions about bargaining. Because we are so used to bargaining in ordinary life, this should carry us a long way. Hence, I don’t aim to say exactly how the view works - that requires the sort of further detail in Greaves and Cotton-Barratt (2019). I’m not sure I have a substantially different view of how internal bargaining would play out - although it’s possible I’ve misread their account. However, I am more enthusiastic about how suitable internal bargaining is as a response to moral uncertainty.
2. What would happen in the moral marketplace?
Here’s a familiar scenario. You and a friend are deciding where to go for dinner. You prefer Italian food. They prefer Indian food. But you both prefer to spend time with each other. What do you do? Maybe you compromise on a third option, Mexican. Maybe you take it in turns so each of you can go to your preferred option. But maybe there’s no compromise that works for you both and so you each do your own thing. This is deeply familiar and very intuitive and, to give it a fancy term, it is interpersonal bargaining. We engage in interpersonal bargaining in our personal lives, our work lives, in politics, and so on.
Suppose we conceive of decision-making under moral uncertainty as the result of intrapersonal bargaining between different theories - I call this internal bargaining. You divide yourself up into sub-agents, each of which is fully committed to a moral theory. You then allocate your resources - your money and time - to these sub-agents in proportion to how strongly you believe in each theory, and do what your sub-agents collectively decide.
To see how this might work, I’m proposing a taxonomy of the various scenarios and what would happen in each. For simplicity, I’ll only consider an agent who has credence in two moral theories, although I don’t think this changes the results.
I’ll go through these one at a time, starting with three scenarios where you can divide the resources between your sub-agents. What will differ is whether the sub-agents have convergent priorities (you and I want the same thing), conflicting priorities (we want different things) or unrelated priorities (it doesn’t make any difference to me if you get more of what you want and vice versa). I’ll then consider what happens if resources aren’t divisible.
Table 1: Taxonomy of moral uncertainty scenarios

| | Convergent priorities | Conflicting priorities | Unrelated priorities |
|---|---|---|---|
| Divisible resources | Unity | Intrapersonal moral trade | Split the pot (aka worldview diversification) |
| Indivisible resources | Unity | Intrapersonal moral trade; or, perhaps, overpowering | - |

Note: There is no ‘indivisible resources, unrelated priorities’ scenario. If resources aren’t divisible, then one sub-agent getting its preferred option inevitably comes at the cost of the other getting its preferred option.
Divisible resources, convergent priorities
Example: Charitable Consensus
Both moral views agree on which charity would do the most good.
All the resources go to one charity. There are no disagreements or any need for bargaining.
Divisible resources, conflicting priorities
Example: More or Fewer
On theory A, life-saving interventions are the priority. On theory B, the Earth is overpopulated and the priority is family planning interventions to reduce the population size. However, A and B each think that funding a third intervention, e.g. alleviating poverty, is nearly as effective as their own top choice. You have equal credence in theories A and B.
Result: Intrapersonal moral trade
The sub-agents realise that, if they each pursue their own preferred option, they will effectively cancel each other out - A would increase the total population and B would reduce it. Hence, by the lights of their own theory, they would each prefer it if they collectively funded poverty reduction, so that’s what they choose to do.
(In fact, I’m oversimplifying here. It won’t always be the case that sub-agents agree to engage in moral trade. That will depend on their relative resources and how good they think the available options are. For instance, if theory A has £1,000 and theory B has £1, then theory A might prefer putting all its money towards saving lives, even with B slightly counteracting it, rather than agreeing to a compromise. In this case, the agents will split the pot, as I elaborate on in the next example.)
Divisible resources, unrelated priorities
Example: Poverty or Robots (as given above)
You have 60% of your credence in a total utilitarian moral theory on which the greatest priority is preventing existential risks to humanity, such as those posed by unsafe artificial intelligence. But you have 40% of your credence in the person-affecting view on which morality is about “making people happy, not making happy people” (Narveson, 1967) and that, according to this view, helping those in global poverty is the priority. On the total utilitarian theory, using money to reduce poverty is good, but hundreds of times less cost-effective than funding existential risk.
In this case, we might suppose, the sub-agents have effectively unrelated priorities: money to existential risk doesn’t really impact poverty and vice versa. What’s more, the sub-agents can’t find any scope for moral trade with the other: they each conclude that the best option, by their own lights, would be to do their own thing.
Result: Split the pot (aka worldview diversification)
Each sub-agent allocates all of its resources to its preferred option. In other words, the outcome is effectively worldview diversification, so we’ve identified a straightforward way of justifying this.
There are a couple of other observations to make here. Because IB leads to pot-splitting when theories disagree about the priority, it also offers a very natural way to resist fanaticism. Recall Lives or Souls earlier, where you had a tiny credence in a view on which saving souls was the priority. Yet, because you only have a tiny credence in that view, that sub-agent only gets allocated a trivial amount of resources. Hence, IB is non-fanatical: it doesn’t ignore fanatical views altogether, but they have little impact on decision-making precisely because resources are awarded in proportion to credences and the agent has so little credence in them. This strikes me as a serious advantage of IB over MEC. If we take MEC seriously, we’d seemingly need to account for all sorts of ‘weird’ fanatical views. IB safely contains them. I’ll return to whether this lack of fanaticism may be a problem in a later section.
For similar reasons, IB also seems able to defuse the issues that MEC faces about demandingness. Recall the Partial Singerian case from before, where you have a 10% credence in the view that you should give away a large proportion of your resources to make others better off, and a 90% credence in common-sense morality. There’s no obvious bargain to be struck here, so we might imagine each sub-agent would just allocate its own share of the pot as it sees fit. Roughly, then, the Singerian sub-agent would give away all of its 10% share and the common-sense sub-agent a little of its share, with the result that the person ends up giving around (perhaps a bit over) 10% to charity.
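Under IB, the overall giving rate is just each sub-agent’s giving rate weighted by its share of the pot. The rates below are assumptions for illustration (the Singerian sub-agent gives everything, the common-sense one a token 2%):

```python
# Each sub-agent controls a share of resources equal to your credence in
# its theory, and gives away some fraction of that share (assumed rates).
credences   = {"singerian": 0.10, "common_sense": 0.90}
giving_rate = {"singerian": 1.00, "common_sense": 0.02}

total_given = sum(credences[v] * giving_rate[v] for v in credences)
# 0.10 * 1.00 + 0.90 * 0.02 ≈ 0.118, i.e. a bit over 10% of resources
print(total_given)
```

Note how smoothly this scales with credence: on the same assumed rates, a 30% Singerian would give roughly 31% and a 1% Singerian roughly 3%, rather than facing MEC’s all-or-nothing verdict.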
This seems a good response to the demandingness of morality, and far more palatable than the responses given by MEC or MFT. The former pushes us to give all our spare resources away, even if we have little credence in the Singerian theory. The latter cannot account for the non-trivial Singerian belief that we ought to do as much as we can to help others. What’s more, IB is sensitive to - perhaps even respectful of - our credences: those who are very sympathetic to the Singerian view that morality is demanding will still conclude they should do lots, but those who are less sympathetic are not pushed to do so.
Let’s now turn to cases where you don’t seem to be able to divide resources between the sub-agents. Again, cases can vary by how aligned the sub-agents’ priorities are.
Indivisible resources, convergent priorities
Example: Cake or Death
A runaway trolley is on course to run over someone tied to a track. You can pull a lever to switch the trolley to another line, but if you do, it will squash a nice piece of chocolate cake you were hoping to eat. Both moral theories agree you ought to pull the lever.
Result: Unity - you pull the lever
The salient difference, in contrast to the cases above, is that you can’t split your resources; you have to choose between the options. However, as with Charitable Consensus, this is straightforward because both parties agree.
Indivisible resources (at a time), conflicting priorities
Example: Meat or Salad
You are choosing whether to eat meat from a factory farm or have a salad instead. You have a 40% credence in View A on which eating the meat is seriously morally wrong and a 60% credence in View B on which each choice is permissible. You’d prefer to eat meat.
This is a more interesting case for internal bargaining and I’ll consider four ways this could go.
1. Try to split the pot
An initial thought, taking inspiration from Poverty or Robots, is that you should try to split the pot: you could ask the restaurant to give you 40% of a normal meat order and 60% of a normal salad order. But, leaving aside how impractical this is, it seems to get the wrong answer: on View A, this is still much worse than if you’d only had salad, but only barely better on View B. This is a poor compromise because it is insufficiently sensitive to the stakes.
2. Use a lottery

Another possible option would be to use a lottery: there’s a 40% chance that View A wins and you eat the salad, and a 60% chance that View B wins and you eat the meat. This is similarly unappealing, again because it doesn’t account for the stakes.
3. Intrapersonal bargaining over time
However, a more creative option, along the lines of More or Fewer is open: the sub-agents can strike a bargain, but in this case, they do so over time (aka intrapersonal intertemporal moral trade). To see this, let’s first consider how an interpersonal version of this case could play out:
Vegan friend: Two friends would like to meet for dinner. A has a really strong preference for vegan food, and will only eat at vegan restaurants. B would prefer to eat meat, but really isn’t that fussed either way.
Plausibly, the real-life bargain is that they would agree to keep meeting but only do so at vegan restaurants. And that could be the end of the story. However, we can also imagine that B would negotiate for something in return: “Hey, we always go to the restaurants you want - and that’s fine, you know I don’t care what I eat. But next time we go out for drinks, I’m choosing the bar”. Exactly whether and what B negotiates for in return, and what they end up agreeing to, will depend on what A and B each care about.
Hence, in Meat or Salad, we can imagine, metaphorically speaking, that View A would protest to B about how it really matters to them that you end up ordering the salad. Plausibly, B would agree that they order the salad, but negotiate for something in return at a later date.
What might B negotiate for? Admittedly, this is much less intuitive to think about in cases of moral theories than people, but it’s not impossible to sketch a result that they would agree to. Suppose theory A is a deontological theory on which there is a very strong duty not to harm others, but a relatively weak duty to benefit others (i.e. of beneficence). Theory B is classical utilitarianism. Potentially, the bargain would be that you don’t eat meat but, in return, you end up doing more charity work than you otherwise would have.
A key point to note is that internal bargaining - like MEC - is able to get the intuitively right result about what to do in Meat or Salad. It does this by allowing each sub-agent to be sensitive to the stakes according to their own theory and then bargain to get more of what they want. It tells a different story of how moral uncertainty works and requires us to be a bit inventive in thinking about how internal bargaining might play out.
4. Overpowering

However, there’s (at least) one more option. In ordinary interpersonal bargaining, a live option is for the stronger party to force the weaker to do what it wants. To extend the metaphor, we might suppose the ‘stronger’ theory, i.e. the one in which you have more credence, could just ‘overpower’ the weaker one and pick its preferred option. In this case, IB would function much like My Favourite Theory (MFT): theory B would ‘win’ and you would eat the meat.
This raises the question of whether IB would function like MFT all of the time. If so, that would be a poor result for IB, given already noted problems with MFT. It seems we should say something like this: we suppose your sub-agents are entitled to their ‘share’ of your current and future resources - your resources being your money and time. Hence, while theory B could perhaps impose their will in this situation, the foreseeable result is that, in turn, A won’t cooperate with B in some later situation where B would really want A’s resources. Therefore, B may well conclude it’s in their own best interest to ‘play nice’: whilst they’d rather eat the meat and could insist on it, because they really don’t mind, they strike a bargain with A and ask for something in return. As such, overpowering may be more of the exception than the rule. 
When would overpowering occur? Perhaps unsurprisingly, a moral theory will get overpowered if you have low credence in it and it wants something different from what the rest of your sub-agents want. In other words, it is precisely fanatical theories that will get overpowered: we seem to use the term ‘fanatical’ to describe a theory that you have low credence in and that attaches very high stakes to some option that the theories you have most credence in disprefer. Given this, it might be better to describe IB as anti-fanatical, rather than merely non-fanatical.
I don’t think it would be useful to discuss at much greater length how the bargaining could play out: it is only metaphorical and there is scope to ‘tweak’ the metaphor and get different outcomes - of course, we would want to try to provide some rationale for these tweaks. 
3. Problems and open questions
IB seems to have handled the cases well so far - there wasn’t a scenario where it clearly got the intuitively ‘wrong answer’. Let’s turn to problems and open questions for the view. I’ll only discuss these briefly; it’s beyond the scope of this essay to solve them.
Can IB account for the value of moral information?
In cases of empirical uncertainty, the value of information is the amount a decision-maker would pay to get closer to the truth. For instance, an ice-cream seller might be prepared to pay for a weather forecast: if he knows it will rain, he might do something else instead. Importantly, information only has value if it improves the quality of your future decision.
When it comes to morality, many people have the view that studying ethics is useful in order to make them more confident about what they ought to do. But IB doesn’t seem able to account for the possibility there is value of moral information (MacAskill, Bykvist and Ord, 2020, ch. 9). To see this, consider:
Wise Philanthropist: You are 40% confident in theory A (on which AI safety is the priority) and 60% confident in theory B (on which poverty alleviation is the priority). You could decide your future allocation of resources now. Or you could study more moral philosophy. If you study more, you suspect you will end up 50% confident in each theory.
Recall that the sub-agents are certain in their theory and resources are allocated in proportion to the agent’s credences. Given this, we can see that A would be wholly against studying - they stand only to lose - and B wholly for it - they stand only to gain. 
There are a couple of puzzles here. First, what would the agent do? Would A metaphorically overpower B and stop you from hitting the books? Or would A use the time it has allocated to pursue A’s priority, whereas B would use some of its time allocation to study philosophy? As we’ve seen, bargaining is complicated.
Second, how can we capture the idea that we are able to gain moral information if all the sub-agents are already certain? Perhaps we should extend the metaphor by proposing the existence of some undecided sub-agents: the value of moral information relates to them forming a view.
How grand are the bargains?
Bargaining gets more complicated the more we try to account for it. Consider:
Road Ahead: 40% credence in theory A on which reducing existential risk is the priority, and 60% in theory B on which the priority is alleviating poverty. You are a student and face two choices: you can opt for a career in either AI safety research or international development. You can also donate your spare resources to AI safety or poverty alleviation. On theory A, both your money and time are five times more valuable if put towards AI safety rather than poverty. On theory B, your resources towards poverty are twice as valuable.
We could take these choices in isolation, in which case, naively, you split the pot with both your donations and your career, i.e. you spend 60% of your career in international development and 40% in AI safety. However, splitting one’s career seems impractical: if you try, you are unlikely to have much success in either field.
That suggests the sub-agents may well prefer to be able to negotiate both at the same time. In this case, you might agree to specialise in one career but then donate considerably more to the other cause, a quite different outcome.
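We can sketch why both sub-agents prefer the grand bargain. The multipliers come from the case (5x for AI safety on theory A, 2x for poverty on theory B); the 50% penalty for a split career is my own assumption:

```python
# Stylised per-unit values from the case: on A, resources to AI safety are
# worth 5x those to poverty; on B, poverty is worth 2x AI safety.
VALUE = {"A": {"ai": 5.0, "poverty": 1.0},
         "B": {"ai": 1.0, "poverty": 2.0}}

def score(theory, career_ai, donate_ai, career_penalty=0.0):
    # Career and donations each contribute one unit of resources;
    # a split career loses `career_penalty` of its value (assumption).
    v = VALUE[theory]
    career = career_ai * v["ai"] + (1 - career_ai) * v["poverty"]
    career *= 1 - career_penalty
    donate = donate_ai * v["ai"] + (1 - donate_ai) * v["poverty"]
    return career + donate

# Naive pot-splitting, with a 50% penalty on the split career:
naive = {t: score(t, career_ai=0.4, donate_ai=0.4, career_penalty=0.5)
         for t in ("A", "B")}
# Grand bargain: specialise in development, donate everything to AI safety:
bargain = {t: score(t, career_ai=0.0, donate_ai=1.0) for t in ("A", "B")}
print(naive, bargain)  # both theories score the bargain higher
```

On these (made-up) numbers, the grand bargain is a Pareto improvement: A scores it 6.0 vs roughly 3.9 for the naive split, and B scores it 3.0 vs roughly 2.4, so both sub-agents prefer negotiating the choices together.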
Greaves and Cotton-Barratt (2019) identify a challenge for IB: what we choose may depend on whether we consider a ‘small world’ (a simple set of options) or a more complicated ‘grand world’. Whilst the maximally grand world is, in principle, the appropriate one to consider, this is often impractical. Hence, we should want our decision theory to give approximately the same answer in the small world context as it would in the large one.
This doesn’t strike me as a reason against, in principle, using an IB approach to moral uncertainty, so much as a reminder that decision-making is complicated in practice. After all, even if we are morally certain, planning our future lives is already complicated; it doesn’t seem to follow that we should give up and follow simplistic rules.
Fanaticism and the analogy to empirical uncertainty
Two further possible objections to IB are that it is insufficiently fanatical and it fails to be analogous to empirical uncertainty. I’ll take these together and I suspect they are related.
MacAskill, Bykvist, and Ord (2020) respond to the accusation that MEC is objectionably fanatical by pointing out that fanaticism is equally a problem for expected utility theory in empirical uncertainty. If we are happy with expected utility theory - despite its fanatical results - then, by extension, we should be equally happy with MEC. So, fanaticism is an intuitive cost, but it’s one we should expect to pay. Greaves and Cotton-Barratt (2019) similarly state it’s unclear if fanaticism is objectionable or can be reasonably avoided.
A couple of quick replies. Firstly, most people seem to think that fanaticism is a problem, both for MEC and expected utility theory. Hence, insofar as one finds fanaticism problematic, it is an advantage for IB that it avoids fanaticism.
Second, if moral uncertainty and empirical uncertainty are analogous, then we should expect equivalent theories in each case. But, how confident should we be that this is the right analogy? Perhaps they are relevantly different. Here's a case that may motivate this. Consider two choices:
Lives: You can pick (A) to save 1 life for certain, or (B) a 1 in a million chance of saving 1 billion lives.
Life or Soul: You can pick (A) to save 1 life for certain, or (B) a 100% chance to convert someone to an unspecified religion. According to this religion, converting someone saves their soul and is equivalent to saving 1 billion lives. You assign a probability of 1 in a million that the claims of this religion are true.
On the surface, each case offers one ‘safe bet’ option and one fanatical, i.e. low-probability, high-stakes option. However, it doesn’t seem a mistake to choose (B) in Lives, but it does to pick (B) in Life or Soul. Yet MEC would treat these as structurally equivalent, with option (B) in each case being 1,000 times as choiceworthy as (A). Intuitively, there is a difference between (a) a low probability of a high payoff where that payoff certainly has value and (b) certainty of a payoff that you think is very likely to have no value. There seems to be something additionally troubling about choosing the fanatical option in the face of moral uncertainty. IB is able to avoid this.
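Numerically, MEC sees no difference between the two cases; the arithmetic below just restates the figures from the cases, treating a saved soul as 1 billion life-equivalents:

```python
# Lives: a 1-in-a-million chance of saving 1 billion lives.
lives_b = 1e-6 * 1e9
# Life or Soul: certain conversion, but only a 1-in-a-million credence that
# the religion's claims are true and a soul is worth 1 billion lives.
soul_b = 1.0 * (1e-6 * 1e9)
safe = 1.0  # save one life for certain, in either case

# Both gambles come to 1,000 expected life-equivalents, so MEC ranks each
# option (B) as 1,000 times as choiceworthy as the safe option (A).
print(lives_b, soul_b, safe)
```

The intuitive asymmetry, if there is one, must therefore lie somewhere MEC cannot see: in the difference between an empirical gamble over a certainly valuable payoff and a moral gamble over a payoff that probably has no value at all.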
What about regress?
Once we realise we are uncertain about morality, we face an apparent challenge of infinite regress: presumably, we should be uncertain in our theory of moral uncertainty too. What would we do then?
I’m not sure if IB helps with this worry. IB makes sense of moral uncertainty by saying we need to distribute our resources to internal sub-agents who are certain in their view. It doesn’t make sense to then ask what the sub-agents should do if they are uncertain: after all, we’ve stipulated that they are. However, when considering the options for moral uncertainty, your credence will still be somewhat split between IB, MFT, and MEC (and whatever else is on the table) as the ways to resolve moral uncertainty.
4. Taking stock of internal bargaining
Now we’ve got a sense of how IB would function in a range of scenarios and considered some problems, we can take more of a view of how plausible it is as an approach to moral uncertainty. I think most people would agree these are the desirable theoretical features, even if they disagree about their relative importance:
- Stake sensitivity (decisions can change if stakes change)
- Credence sensitivity (decisions can change if credences change)
- Does not require intertheoretic comparisons of value
- Non-fanatical (decisions not dictated by low-credence, very high-stake theories)
- Robustness to individuation (decisions not changed by how moral theories are individuated)
- Provides a justification for worldview diversification (beyond non-moral considerations e.g. diminishing marginal returns)
- Accounts for the value of moral information
- Avoids regress
Hence, if we put our three contender theories - MFT, MEC, and IB - side by side, it seems they would look something like this. I accept that I haven’t explained all these fully here.
| | MFT | MEC | IB |
|---|---|---|---|
| Intertheoretical value comparisons unnecessary | ✓ | x | ✓ |
| Robustness to individuation | x | ✓ | ✓ |
| Justifies worldview diversification | x | x | ✓ |
| Accounts for the value of moral information | ? | ✓ | ? |
On this basis, internal bargaining looks like an appealing option and ‘scores’ better than the main alternative, MEC.
5. A practical implication
What difference would it make in practice to adopt IB, rather than the alternatives of MFT or MEC? It’s not easy to say much about this: the specifics will obviously depend on how plausible someone finds different moral theories, their views on the empirical facts, and the details of how the different approaches to moral uncertainty are implemented.
That said, I do want to draw out and close on one practical implication we’ve noticed in passing already. One important recent idea within effective altruist thinking is longtermism, the view that improving the long-term future is the moral priority (Greaves and MacAskill, 2021). What’s the case? To quote Moorhouse (2021):
Three ideas come together to suggest this view. First, future people matter. Our lives surely matter just as much as those lived thousands of years ago — so why shouldn’t the lives of people living thousands of years from now matter equally? Second, the future could be vast. Absent catastrophe, most people who will ever live have not yet been born. Third, our actions may predictably influence how well this long-term future goes. In sum, it may be our responsibility to ensure future generations get to survive and flourish.
If longtermism is true, it would seem to imply that efforts to reduce existential risk, e.g. from unsafe AI, are a higher priority than efforts to help people alive today, e.g. by reducing poverty.
At first glance, a key part of the argument is, speaking roughly, the moral priority we should attach to people alive today compared to future lives. After all, people alive today actually exist and we can make them better off. But future people may never exist, and it’s puzzling to think that we really benefit them by creating them. Hence - and again very roughly - we might assume that longtermism won't get off the ground if we don't give future people that much weight.
However, it would seem that, according to MEC, these doubts are practically irrelevant: after all, surely we have some credence in a Total Utilitarian view and, on this view, we value present lives as much as merely possible future lives. Given how many lives there could be in the future and the fact we can, presumably, affect the long-term, it seems we end up being pushed to longtermism even if we only have a tiny credence in such a view (cf. Greaves and Ord, 2017). Therefore, we should abandon all our other, non-longtermist altruistic projects. This will strike many as objectionably fanatical.
Saliently, internal bargaining does not seem to get this result. Instead, it seems the appropriate response is to engage in worldview diversification and split one’s resources: you should commit your resources to longtermism in proportion to the credence you have in moral views on which longtermism is true, and the rest to non-longtermist causes. More generally, and very roughly, it suggests that you should commit your resources to a given cause in proportion to the credence you have in moral views on which that cause is the priority. At least, you should do this if the priorities of those theories are unrelated in the sense described earlier. If the moral theories have conflicting priorities, then intrapersonal moral trade or overpowering are live options - which occurs would depend on the details. As it happens, if you look at what effective altruists actually prioritise, i.e. existential risk reduction, global poverty, and factory farming, these do seem like unrelated priorities: it's not, e.g. that reducing factory farming makes much difference to the risk of rogue AI and vice versa.
I should stress two further points. First, IB would not prevent someone from committing (nearly) all their resources to longtermism if they have (nearly) all their credence in moral views on which it is the priority. IB merely avoids the fanatical result that everyone, almost no matter what their beliefs, should commit all their resources to it. Second, and conversely, IB may nevertheless imply that even people who are not enthusiastic about longtermism, i.e. they only have a small credence in moral views on which it is the priority, should allocate some of their resources on the basis of it. Hence, IB in itself does not provide grounds to ignore longtermism altogether - longtermists may consider this a victory of sorts. I leave it to further work to consider exactly how internal bargaining may play out here.
I'd like to thank Caspar Kaiser, Samuel Dupret, Barry Grimes, and Patrick Kaczmarek for their feedback. As ever, all remaining errors are Patrick's (cf. Plant 2019, p4)
Now, there is a justification for diversifying that does not appeal to moral uncertainty. Suppose you think that the interventions exhibit diminishing marginal returns, so you do the most good by funding one intervention and then switching to another. The total utilitarian might believe that spending a little bit of money on AI safety research goes a long way, but once all the low-hanging fruit has been picked it’s more impactful to give to poverty. Of course, the facts might be different. You could think that spending extra money on existential risk reduction is always going to do more good than spending money to reduce poverty. In this case, the total utilitarian would tell you not to diversify.
But notice that this discussion of whether to diversify makes no reference to moral uncertainty and takes place within a single moral theory: we might call this intra-worldview diversification. What we don’t yet have is a justification for splitting our resources based on uncertainty about morality, what we can call inter-worldview diversification. I take it that when most people appeal to worldview diversification they have the ‘inter’ version in mind, and this is how I will use the term here.
It also safely handles the challenge of infectious comparability (MacAskill, 2013).
IB behaves like MFT because there are only two theories at play. If we accounted for the fact that our credence could be split amongst many theories, then the metaphorical result would be that the option with the largest proportion of sub-agents in favour is the one that gets chosen. In this case, IB would work like an alternative approach to moral uncertainty in the literature, My Favourite Option, which I have not raised so far. The obvious objection to this approach is that it is insensitive to the stakes.
This mirrors the result that in a one-shot Prisoner’s Dilemma, it is rational for each party to defect, but that in a repeated Prisoner’s Dilemma, the rational strategy is to cooperate.
As Greaves and Cotton-Barratt (2019) point out, in ordinary bargaining theory, a disagreement point represents what would happen if the bargaining parties cannot agree. However, when it comes to intrapersonal bargaining, because the bargaining is only metaphorical, there is no clear empirical matter of fact about what the disagreement point would be.
Patrick Kaczmarek makes the interesting suggestion that agents will also be motivated to acquire non-moral information that convinces the other agents to support their existing preferences. This is analogous to the way people will often seek information that will convince others to agree with them.
I'm happy to see more discussion of bargaining approaches to moral uncertainty, thanks for writing this! Apologies, this comment is longer than intended -- I hope you don't mind me echoing your Pascalian slogan!
My biggest worry is with the assumption that resources are distributed among moral theories in proportion to the agent's credences in the moral theories. It seems to me that this is an outcome that should be derived from a framework for decision-making under moral uncertainty, not something to be assumed at the outset. Clearly, credences should play a role in how we should make decisions under moral uncertainty but it's not obvious that this is the right role for them to play. In Greaves and Cotton-Barratt (2019), this isn't the role that credences play. Rather, credences feed into the computation of the asymmetric Nash Bargaining Solution (NBS), as in their equation (1). Roughly, credences can be thought to correspond to the relative bargaining power of the various moral theories. There's no guarantee that the resulting bargaining solution allocates resources to each theory in proportion to the agent's credences and this formal bargaining approach seems much more principled than allocating resources in proportion to credences, so I prefer the former. I doubt your conclusions significantly depend on this but I think it's important to be aware that what you described isn't the same as the bargaining procedure in Greaves and Cotton-Barratt (2019).
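To make the commenter's point concrete, here is a rough sketch of the asymmetric NBS idea (not Greaves and Cotton-Barratt's exact formalism; the utility functions and numbers below are purely illustrative assumptions). The bargaining solution maximises the credence-weighted product of the theories' utility gains over the disagreement point, and the resulting split need not be proportional to credences:

```python
# Sketch: asymmetric Nash Bargaining Solution over one unit of divisible
# resources, with credences as bargaining weights and disagreement point 0.
# All utility functions here are illustrative assumptions.

def nash_allocation(u1, u2, c1, c2, steps=10_000):
    """Grid-search the asymmetric NBS split of one unit of resources.

    u1, u2: each theory's utility as a function of its share.
    c1, c2: the agent's credences, used as bargaining weights.
    Returns theory 1's share of the resources.
    """
    best_x, best_val = 0.0, -1.0
    for i in range(1, steps):
        x = i / steps
        val = (u1(x) ** c1) * (u2(1 - x) ** c2)
        if val > best_val:
            best_x, best_val = x, val
    return best_x

# With linear utilities, the NBS does split in proportion to credences...
print(nash_allocation(lambda x: x, lambda x: x, 0.5, 0.5))         # ~0.5
# ...but not in general: here theory 1 has diminishing returns (sqrt)
# and ends up with only about a third of the pot despite 50% credence.
print(nash_allocation(lambda x: x ** 0.5, lambda x: x, 0.5, 0.5))  # ~0.33
```

This illustrates why "allocate in proportion to credences" and "compute the credence-weighted NBS" are genuinely different procedures, even though they can coincide in simple cases.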
I like how you go through how a few different scenarios might play out in Section 2 but while I think intuition can be a useful guide, I think it's hard to know how things would play out without taking a more formal approach. My guess is that if you formalised these decisions and computed the NBS, things would often but not always work out as you hypothesise (e.g. divisible resources with unrelated priorities won't always lead to worldview diversification; there will be cases in which all resources go to one theory's preferred option).
I'm a little uncomfortable with the distinction between conflicting priorities and unrelated priorities because unrelated priorities are conflicting once you account for opportunity costs: any dollars spent on theory A's priority can't be spent on theory B's priority (so long as these priorities are different). However, I think you're pointing at something real here and that cases you describe as "conflicting priorities" will tend to lead to spending resources on compromise options rather than splitting the pot, and that the reverse is true for cases you describe as "unrelated priorities".
The value of moral information consideration is interesting. It should be possible to provide a coherent account of the value of moral information for IB because the definition of the value of information doesn't really depend on the details of how the agent makes a decision. Acquiring moral information can be seen as an act/option etc. just like any other and all the moral theories will have views about how good it would be and IB can determine whether the agent should choose that option vs other options. In particular, if the agent is indifferent (as determined by IB) between 1. acquiring some moral information and paying $x and 2. not acquiring the information and paying nothing, then we can say that the value of the information to the agent is $x. Actually computing this will be hard because it will depend on all future decisions (as changing credences will change future bargaining power), but it's possible in principle and I don't think it's substantially different to/harder than the value of moral information on MEC. However, I worry that IB might give quite an implausible account of the value of moral information, for some of the reasons you mention. Moral information that increases the agent's credence in theory A will give theory A greater bargaining power in future decisions, so theory A will value such information. But if that information lowers theory B's bargaining power, then theory B will be opposed to obtaining the information. It seems likely that the agent will problematically undervalue moral information in some cases. I haven't thought through the details of this though.
I didn't find the small vs grand worlds objection in Greaves and Cotton-Barratt (2019) very compelling and agree with your response. It seems to me to be analogous to the objections to utilitarianism based on the infeasibility of computing utilities in practice (which I don't find very compelling).
On regress: perhaps I'm misunderstanding you, but this seems to me to be a universal problem in that we will always be uncertain about how we should make decisions under moral uncertainty. We might have credences in MFT, MEC and IB, but which of these (if any) should we use to decide what to do under uncertainty about what to do under moral uncertainty (and so on...)?
I think you have a typo in the table comparing MFT, MEC and IB: MEC shouldn't be non-fanatical. Relatedly, my reading of Greaves and Cotton-Barratt (2019) is that IB is more robust to fanaticism but still recommends fanatical choices sometimes (and whether it does so in practice is an open question), so a tick here might be overly generous (though I agree that IB has an advantage over MEC here, to the extent that avoiding fanaticism is desirable).
One concern with IB that you don't mention is that the NBS depends on a "disagreement point" but it's not clear what this disagreement point should be. The disagreement point represents the utilities obtained if the bargainers fail to reach an agreement. I think the random dictator disagreement point in Greaves and Cotton-Barratt (2019) seems quite natural for many decision problems, but I think this dependence on a disagreement point counts against bargaining approaches.
I don't think I understand the thinking here. It seems fairly natural to say "I am 80% confident in theory A, so that gets 80% of my resources, etc.", and then to think about what would happen after that. It's not intuitive to say "I am 80% confident in utilitarianism, so that gets 80% 'bargaining power'". But I accept it's an open question, if we want to do something like internal bargaining, what the best version of that is.
I do mention the challenge of the disagreement point (see footnote 7). Again, I agree that this is the sort of thing that merits further inquiry. I'm not sold on the 'random dictator point', which, if I understood correctly, is identical to running a lottery where each theory has a X% chance of getting their top choice (where X% represents your credence in that theory). I note in part of section 2 that bargaining agents will likely think it preferable, by their own lights, to bargain over time rather than resolve things with lotteries. It's for this reason I'm also inclined to prefer a 'moral marketplace' over a 'moral parliament': the former is what the sub-agents would themselves prefer.
Hello Aidan. Thanks for all of these, much food for thought. I'll reply in individual comments to make this more manageable.
I didn't really explain myself here, but there might be better vs worse regress problems. I haven't worked out my thoughts enough yet to write something useful.
Agree the distinction could be tightened up. And yes, the important bit seems to be whether agents will just 'do their own thing' vs consider moral trade (and moral 'trade wars').
I don't really disagree. However, as I stated, my purpose was to give people 'a feel' for the view I doubt they would get from Greaves and Cotton-Barratt's paper (and which I certainly didn't get when I read it). The idea was to sketch a 'quick-and-dirty' version of the view to see if it was worth doing with greater precision.
There's a bunch of work in cognitive science on "virtual bargaining" -- or how people bargain in their head with hypothetical social partners when figuring out how to make social/ethical decisions (see https://www.sciencedirect.com/science/article/pii/S1364661314001314 for a review; Nick Chater is the person who's done the most work on this). This is obviously somewhat different from what you're describing, but it seems related -- you could imagine people assigning various moral intuitions to different hypothetical social partners (in fact, that's kind of explicitly what Karnofsky does) in order to implement the kind of bargaining you describe. Could be worth checking out that literature.
Oh wow, that is a really great paper! Thank you very much for linking it.
There are slides here by Greaves on the IB approach, which she describes as a moral parliament approach. Newberry and Ord, 2021, more recently, develop and discuss multiple parliamentary approaches.
Allocating proportionally with credences is in particular very similar to the proportional chances voting parliamentary approach, preferred and defended by Newberry and Ord, 2021, and according to which votes in a fictional parliament are distributed in proportion to our credences in each theory, voters can bargain and trade, and the voter (theory) to decide on a motion is selected randomly in proportion to the votes. Furthermore, if we consider voting on what to do with each unit of resources independently and resources are divisible into a large number of units, proportional chances voting will converge to a proportional allocation of resources.
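The convergence claim at the end of the paragraph above is just the law of large numbers, and a quick simulation makes it vivid (the theories, credences, and unit count below are illustrative assumptions): if each unit of a divisible resource is awarded by a lottery weighted by credences, the realised shares approach the credences as the number of units grows.

```python
# Sketch: proportional chances voting applied unit-by-unit to a divisible
# resource converges to a proportional allocation of resources.
import random

random.seed(0)
credences = {"T1": 0.6, "T2": 0.3, "T3": 0.1}  # illustrative credences
units = 100_000                                 # number of resource units

# Each unit goes to a theory chosen by lottery, weighted by credence.
winners = random.choices(list(credences), weights=list(credences.values()), k=units)
shares = {t: winners.count(t) / units for t in credences}
print(shares)  # each realised share lands close to the corresponding credence
```

With few units (or a few large, lumpy decisions), of course, the lottery and the proportional split can come apart substantially.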
To prevent minority views from gaining control and causing extreme harm by the lights of the majority, Newberry and Ord propose imagining voters believe they will be selected at random in proportion to their credences and so they act accordingly, compromising and building coalitions, but then the winner is just chosen by plurality, i.e. whichever theory gets the most votes. I wouldn't endorse this, and would instead recommend looking for alternatives.
I also hadn't seen these slides, thanks for posting! (And thanks to Michael for the post, I thought it was interesting/thought-provoking.)
Ah, I wasn't aware of the slides. I actually had a section discussing the distinction between IB and moral parliaments which I cut at the last minute because I wasn't sure what to say. In the Greaves and Cotton-Barratt paper, they make no reference to moral parliaments. Newberry and Ord (2021) cite Greaves and Cotton-Barratt but don't explain what they take the distinction between the moral parliament and bargaining theoretic approach to be.
Thinking about it now, both approaches capture the intuition we should imagine what would happen if we distributed resources to representatives of moral views and then some bargaining occurs. The differences appear to be in the specifics: do we allow agents to bargain over time vs only at an instance? Do we allow grand bargains vs restrict the discussion to a limited set of issues? What are the decision-making procedure and the disagreement point? What resources are awarded to the agents (i.e. votes vs money/time)?
One problem with the parliamentary approach, where we have to specify a voting procedure, is how we could possibly justify one procedure over another. After all, to say it was right on the basis of some first-order moral theory would be question-begging.
Relatedly, as I note in a comment above, agents may well prefer to be able to engage in grand, life-long bargains rather than be restricted (as Newberry and Ord suggest) to convening parliaments now and then to deal with particular issues. So we can see why agents, starting from something like a 'veil of ignorance', would not agree to it.
It's for these reasons I suspect the internal bargaining version is superior. I suspect there would be more to say about this after further reflection.
I posed a puzzle in Is the potential astronomical waste in our universe too small to care about?, which seems to arise in any model of moral uncertainty where the representatives of different moral theories can trade resources, power, or influence with each other, like here. (Originally my post assumed a Moral Parliament where the delegates can trade votes with each other.)
(There was some previous discussion of my puzzle on this forum, when the Greaves and Cotton-Barratt preprint came out.)
This worry about internal bargaining & moral parliament approaches strikes me as entirely well-taken. The worry basically being that on the most obvious way of implementing these proposals for dealing with moral uncertainty, it looks as though a decision maker will wind up governed by the dead hand of her past empirical evidence. Suppose outcome X looks unlikely, and my subagents A and B strike a deal that benefits A (in which I have low credence) just in case X obtains. Now I find out that outcome X only looked unlikely because my old evidence wasn't so great. In fact, X now obtains! Should I now do as A recommends, even though I have much higher credence in the moral theory that B represents?
What I think this worry shows us is that we should not implement the internal bargaining proposal in the most obvious way. Instead, we should implement it in a slightly less obvious way. I outline what I have in mind more formally in work in progress that I'd be very happy to share with anyone interested, but here's the basic idea: the 'contracts' between subagents / parliamentarians that a decision maker should regard herself as bound by in the present choice situation are not the contracts that those subagents / parliamentarians agreed to earlier in the decision maker's lifetime based on the evidence that she had then. Instead, the decision maker should regard herself as bound by the contracts that those subagents would have agreed earlier in the decision maker's lifetime if they had then had the empirical evidence that they have now. This should resolve Wei_Dai's puzzle.
If you found this post helpful, please consider completing HLI's 2022 Impact Survey.
Most questions are multiple-choice and all questions are optional. It should take you around 15 minutes depending on how much you want to say.
Here's a comment on the value of moral information. First a quibble: the setup in the Wise Philanthropist case strikes me as slightly strange. If one thinks that studying more would improve one's evidence, and one suspects that studying more would increase one's credence in A relative to B, then this in itself should already be shifting one's credence from B to A (cf. e.g. Briggs 2009).
Still, I think that the feature of Wise Philanthropist that I am quibbling about here is inessential to the worry that the internal bargaining approach will lead us to undervalue moral information. Suppose that I'm 50% confident in moral theory T1, 50% confident in T2, and that I can learn for some small fee $x from some oracle which theory is in fact correct. Intuitively, I should consult the oracle. But if the T1 and T2 subagents each think that there's a 50% chance the oracle will endorse T1 and a 50% chance she'll endorse T2, then each subagent might well think that she has just as much to lose by consulting the oracle as she has to gain, so might prefer not to spend the $x.
Fortunately for the internal bargaining theory, I think that Michael (the OP) opens himself up to these results only by being unfaithful to the motivating idea of internal bargaining. The motivating idea is that each subagent is certain in the moral theory that she represents. But in that case, in the oracle example the T1 and T2 subagents should each be certain that the oracle will endorse their preferred theory! Each subagent would then be willing to pay quite a lot in order to consult the oracle. Hence -- as is intuitive -- it is indeed appropriate for the uncertain decision maker to do so.
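A toy calculation makes the contrast between the two readings concrete (the pot size and fee are purely hypothetical numbers, not anything from the original example): a subagent's expected share of the pot, if the oracle is consulted, is its belief that the oracle endorses it times the post-fee pot, which it compares against its current share.

```python
# Toy numbers (all assumed): a lifetime pot of $100, two theories at 50%
# credence each, and an oracle who reveals the true theory for a fee.
pot, fee = 100.0, 10.0
status_quo = 0.5 * pot  # each subagent currently controls $50

# Reading 1: subagents are 50/50 on what the oracle will say.
uncertain_ev = 0.5 * (pot - fee)  # $45 < $50, so both refuse to pay the fee
# Reading 2: each subagent is certain the oracle will endorse its theory.
certain_ev = 1.0 * (pot - fee)    # $90 > $50, so both gladly pay the fee

print(uncertain_ev < status_quo < certain_ev)  # True
```

On the first reading, any positive fee makes the oracle look like a bad deal to every subagent; on the second, each subagent would pay anything up to its entire current share, which is the intuitive result.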
(FYI, I develop this line of thought more formally in work in progress that I'd be very happy to share with anyone interested :) )
Ah, this is very interesting! My only comment on this, which develops the idea, is that the agents would realise that the other agents are also certain, so won't change their minds whatever the oracle pronounces. If we conceive of there as being undecided sub-agents, then the decided sub-agents could think about the value of getting information that might convince them.
Hmm, I don't think that we need for any theory-agents to change their minds when the oracle delivers her pronouncement though -- we just need for the theory-agents' resource endowments to be sensitive to what the oracle says. We can think of all the same theory-agents still hanging around after the oracle delivers her pronouncement, still just as certain in the theory they represent -- it's just that now only one of them ever gets endowed with any resources.
The value of a longtermist view depends on the control you believe that you can exert over the future. While you might find moral value in creating an actual future, a hypothetical future that you believe that you will not create has no moral significance for your present actions, in terms of number of lives present, circumstances present at that future time, or any other possible feature of it.
Put differently: to declare a possibility that your actions turn out to be necessary (or even sufficient) causes of future events, but without believing that those future events will necessarily occur after your actions, is to imply that the consequences of your actions lack moral significance to you. And that's longtermism in a nutshell, just actions in pursuit of an implausible future.
How do you derive the credence you give to each moral view you hold, by the way, those numbers like 60%? What do those percentages mean to you? Are they a historical account of the frequency of your actual moral views arbitrarily occurring, one then another in some sequence, or are they a subjective experience of the amount of belief in each view that you hold during a particular instance of comparing different moral views, or something else? Are they a "belief in your beliefs"? Are you assigning probabilities to the outputs of your self-awareness?