Just as a side note, Harsanyi's result is not directly applicable to a formal setup involving subjective uncertainty, such as Savage's or the Jeffrey-Bolker framework underlying evidential and causal decision theory. There are results for the Savage setup too, though, e.g., https://www.jstor.org/stable/10.1086/421173, and Caspar Oesterheld and I are working on a similar result for the Jeffrey-Bolker framework. In this setup, to get useful results, the indifference axiom can only be applied to a restricted class of propositions on which everyone agrees in their beliefs.

I don't think Romeo even has to deny any of the assumptions. Harsanyi's result, derived from the three assumptions, is not enough to determine how to do intersubjective utility comparisons. It merely states that social welfare will be *some* linear combination of individual utilities. While this already greatly restricts the way in which utilities are aggregated, it does not specify which weights to use for this sum.

Moreover, arguing that weights should be equal based on the veil of ignorance, as I believe Harsanyi does, is not sufficient, since utility functions are only determined up to positive affine transformations, which include rescalings. (This point has been made in the literature as a criticism of preference utilitarianism, I believe.) So there seems to be no way to determine what equal weights should look like without settling on a way to normalize utility functions, e.g., by range normalization or variance normalization. I think the debate about intersubjective utility comparisons comes in at the point where you ask how to normalize utility functions.
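To make the normalization point concrete, here is a minimal sketch. The two agents, the four outcomes, and all utility numbers are made up for illustration, and variance normalization is taken with respect to a uniform distribution over outcomes, which is itself an assumption. It shows that the "equal weights" winner changes when one agent's utility function is rescaled, and that range and variance normalization can disagree.

```python
from statistics import mean, pstdev

def range_normalize(u):
    """Rescale utilities so the worst outcome is 0 and the best is 1."""
    lo, hi = min(u), max(u)
    return [(x - lo) / (hi - lo) for x in u]

def variance_normalize(u):
    """Rescale utilities to mean 0 and standard deviation 1
    (assuming a uniform distribution over outcomes)."""
    m, s = mean(u), pstdev(u)
    return [(x - m) / s for x in u]

def best_outcome(*utilities):
    """Index of the outcome maximizing the equal-weight sum of utilities."""
    totals = [sum(vals) for vals in zip(*utilities)]
    return max(range(len(totals)), key=totals.__getitem__)

# Hypothetical utilities over four outcomes A, B, C, D (indices 0..3).
alice = [20, 9, 0, 0]
bob = [1, 2, 1, 0]

# Same preferences for Bob, different scale.
bob_rescaled = [100 * x for x in bob]

raw_choice = best_outcome(alice, bob)
rescaled_choice = best_outcome(alice, bob_rescaled)
range_choice = best_outcome(range_normalize(alice), range_normalize(bob))
variance_choice = best_outcome(variance_normalize(alice), variance_normalize(bob))

print(raw_choice, rescaled_choice, range_choice, variance_choice)
```

With these numbers, the raw sum and range normalization select outcome A, while rescaling Bob's utilities or using variance normalization selects outcome B, even though neither agent's preferences have changed.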

Of course, if you are not using a kind of preference utilitarianism but instead just aggregate some quantities you believe to have an absolute scale, such as happiness and suffering, then you could argue that utility functions should just correspond to this one absolute scale, with the same scaling for everyone. Though I think this is also not a trivial argument: there are potentially different ways to get from this absolute scale or axiology to behavior towards risky gambles, and that behavior in turn determines the utility functions.

And it turns out that the utilitarian approach of adding up utilities is *not* a bargaining solution, because it violates Pareto-optimality in some cases. Does that "disprove" total utilitarianism?

I'm not sure this is right. As soon as you maximize a weighted sum with non-negative coefficients (not all zero), your solution will be weakly Pareto optimal. As soon as all coefficients are strictly positive, it will be strongly Pareto optimal. The axioms mentioned above don't imply non-negative coefficients, so theoretically they are also satisfied by "anti-utilitarianism", which counts everyone's utility negatively. But one can add stronger Pareto axioms to force all coefficients to be strictly positive.
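As a small sanity check of the claim about weights and Pareto optimality, here is a sketch over a finite set of outcomes. The outcome set and the weights are hypothetical, and the function names are mine. "Weakly Pareto optimal" is taken to mean that no alternative is strictly better for everyone, and "strongly Pareto optimal" that no alternative is at least as good for everyone and strictly better for someone.

```python
def weighted_sum_choice(outcomes, weights):
    """Outcome maximizing the weighted sum (first maximizer on ties)."""
    return max(outcomes, key=lambda u: sum(w * x for w, x in zip(weights, u)))

def weakly_pareto_optimal(choice, outcomes):
    """True if no alternative is strictly better for every agent."""
    return not any(
        all(o > c for o, c in zip(other, choice)) for other in outcomes
    )

def strongly_pareto_optimal(choice, outcomes):
    """True if no alternative is at least as good for all and better for some."""
    return not any(
        all(o >= c for o, c in zip(other, choice))
        and any(o > c for o, c in zip(other, choice))
        for other in outcomes
    )

# Hypothetical utility pairs (agent 1, agent 2) for four outcomes.
outcomes = [(1, 0), (1, 1), (0, 1), (0, 0)]

positive = weighted_sum_choice(outcomes, (1, 1))    # strictly positive weights
one_zero = weighted_sum_choice(outcomes, (1, 0))    # non-negative, one zero
negative = weighted_sum_choice(outcomes, (-1, -1))  # "anti-utilitarian"

print(positive, strongly_pareto_optimal(positive, outcomes))
print(one_zero, weakly_pareto_optimal(one_zero, outcomes),
      strongly_pareto_optimal(one_zero, outcomes))
print(negative, weakly_pareto_optimal(negative, outcomes))
```

In this toy case the strictly positive weights pick a strongly Pareto optimal outcome, the weights with a zero entry pick one that is only weakly Pareto optimal, and the negative weights pick an outcome that everyone would prefer to leave.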

The problem with the utilitarian bargaining solution is that it is not independent of affine transformations of utility functions. Just summing up utility functions is underspecified; one also needs to choose a scaling for the utility functions. A second criterion that might not be satisfied by the utilitarian solution (depending on the scaling chosen) is individual rationality, which means that no one ends up worse off under the bargaining solution than under some disagreement outcome.

Your argument seems to combine SSA-style anthropic reasoning with CDT. I believe this is a questionable combination, as it gives different answers from an ex-ante rational policy or from updateless decision theory (see e.g. https://www.umsu.de/papers/driver-2011.pdf). The combination is probably also Dutch-bookable.

Consider the different degrees of hingeyness of times as the different possible worlds and your different real or simulated versions as your possible locations in that world. Say both worlds are equally likely a priori and there is one real version of you in both worlds, but the hingiest one also has 1000 subjectively indistinguishable simulations (which don't have an impact). Then SSA tells you that you are much less likely to be a real person in the hingiest time than to be a real person in the 20th hingiest time. Using these probabilities to calculate your CDT-EV, you conclude that the effects of your actions on the 20th hingiest time dominate.

Alternatively, you could combine CDT with SIA. Under SIA, being a real person in either time is equally likely. Or you could combine the SSA probabilities with EDT. EDT would recommend acting as if you were controlling all simulations and the real person at once, no matter whether you are in the simulation or not. In either case, you would conclude that you should do what is best for the hingiest time (given that they are equally likely a priori).

Unlike the SSA+CDT approach, either of these latter approaches would (in this case) yield the actions recommended by someone coordinating everyone's actions ex ante.
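The arithmetic behind this example can be sketched as follows. The priors and the 1000 simulations are from the example above; the impact values of 10 and 1 are hypothetical placeholders (any ratio between 1 and 1001 gives the same pattern), and an "action" is simplified to choosing which time to optimize for, paying off only if the real person in that time takes it.

```python
from fractions import Fraction

PRIOR = {"hingiest": Fraction(1, 2), "20th": Fraction(1, 2)}
SIMS = {"hingiest": 1000, "20th": 0}   # simulated copies per world
VALUE = {"hingiest": 10, "20th": 1}    # hypothetical impact if the real person acts

def p_real_ssa(world):
    """SSA: keep the prior over worlds, then be a random observer within the world."""
    return PRIOR[world] * Fraction(1, 1 + SIMS[world])

def p_real_sia(world):
    """SIA: weight each world by its number of observers, then renormalize."""
    total = sum(PRIOR[w] * (1 + SIMS[w]) for w in PRIOR)
    return PRIOR[world] / total

def cdt_choice(p_real):
    """CDT: my action only matters if I am the real person in the targeted time."""
    return max(PRIOR, key=lambda w: p_real(w) * VALUE[w])

def edt_choice_with_ssa():
    """EDT with SSA: my choice is also the real copy's choice in whichever world
    is actual, so only the prior over worlds matters. This coincides with the
    ex-ante policy evaluation."""
    return max(PRIOR, key=lambda w: PRIOR[w] * VALUE[w])

print(p_real_ssa("hingiest"), p_real_ssa("20th"))   # 1/2002 vs 1/2
print(p_real_sia("hingiest"), p_real_sia("20th"))   # 1/1002 vs 1/1002
print(cdt_choice(p_real_ssa), cdt_choice(p_real_sia), edt_choice_with_ssa())
```

Under these assumptions, SSA+CDT favors the 20th hingiest time, while SIA+CDT and SSA+EDT both favor the hingiest time.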

Thanks a lot for this article! I just wanted to link to Lukas Gloor's new paper on Fail-Safe AI, which discusses the reduction of "quality future-risks" in the context of AI safety. It turns out that there might be interventions that are less directed at achieving a perfect outcome, but instead try to avoid the worst outcomes. And those interventions might be more tractable (because they don't aim at such a tiny spot in value-space) and more neglected than other work on the control problem. https://foundational-research.org/wp-content/uploads/2016/08/Suffering-focused-AI-safety.pdf

(Edit: I no longer endorse negative utilitarianism or suffering-focused ethics.)

Thank you! Cross-posting my reply as well:

If we adopt more of a preference-utilitarian view, we end up producing contradictory conclusions in the same scenarios that I discussed in my original essay—you can't claim that AMF saves 35 DALYs without knowing AMF's population effects.

Shouldn't this be fixed by negative preference utilitarianism? There could be value in not violating the "preference-equivalent" of dying one year earlier, but no value in creating additional "life-year" preferences. A YLL would be equivalent to a violated life-preference, then. You could avert YLLs by not having children, of course, which seems plausible to me (if no one is born, whose preference is violated by dying from malaria?). Being born and dying from malaria would be worse than non-existence, so referring to your "Bigger Problem" scenarios, A < B < C and C = D.

Regarding EV: I agree, there has to be one ranking mapping world-states onto real numbers (or R^n if you drop the continuity axiom). So you're right in the sense that the supposed GiveWell ranking of world-states that you assume doesn't work out. I still think that there might be a way to make a creative mapping in the real world so that the GiveWell focus on DALYs without regard to population size can somehow be translated into a utility function. Anyway, I would kind of agree that AMF turns out to be less effective than previously thought, both from an SFE and a classical view :)

One way I imagine dealing with this is that there is an oracle that tells us with certainty, for two algorithms and their decision situations, what the counterfactual possible joint outputs are. The smoothness then comes from our uncertainty about (i) the other agents' algorithms, (ii) their decision situations, and (iii) potentially the outputs of the oracle. The correlations vary smoothly as we vary our probability distributions over these things, but for a fully specified algorithm, situation, etc., the algorithms are always either logically identical or not.

Unfortunately, I don't know what the oracle would be doing in general. I could also imagine that, when formulated this way, the conclusion is that humans never correlate with anything, for instance.