All of EJT's Comments + Replies

You should read the post! Section 4.1.1 makes the move that you suggest (rescuing PAVs by de-emphasising axiology). Section 5 then presents arguments against PAVs that don't appeal to axiology. 

Lukas_Gloor
17d
Sorry, I hate it when people comment on something that has already been addressed. FWIW, though, I had read the paper the day it was posted on the GPI fb page. At that time, I didn't feel like my point about "there is no objective axiology" fit into your discussion. I feel like even though you discuss views that are "purely deontic" instead of "axiological," there are still some assumptions from the axiology-based framework that underlie your conclusion about how to reason about such views. Specifically, when explaining why a view says that it would be wrong to create only Amy but not Bobby, you didn't say anything that suggests understanding of "there is no objective axiology about creating new people/beings."

That said, re-reading the sections you point to, I think it's correct that I'd need to give some kind of answer to your dilemmas, and what I'm advocating for seems most relevant to this paragraph:

At the very least, I owe you an explanation of what I would say here. I would indeed advocate for what you call the "intermediate wide view," but I'd motivate this view a bit differently. All else equal, IMO, the problem with creating Amy and then not creating Bobby is that these specific choices, in combination, and if it would have been low-effort to choose differently (or the other way around), indicate that you didn't consider the interests of possible people/beings even to a minimum degree. Considering them to a minimum degree would mean being willing to at least take low-effort actions to ensure your choices aren't objectionable from their perspective (the perspective of possible people/beings). Adding someone with +1 when you could've easily added someone else with +100 just seems careless.

If Amy and Bobby sat behind a veil of ignorance, not knowing which of them will be created with +1 or +100 (if someone gets created at all), the one view they would never advocate for is "only create the +1 person." If they favor anti-natalist views, they advocate f

I think my objections still work if we 'go anonymous' and remove direct information about personal identity across different options. We just need to add some extra detail. Let the new version of One-Shot Non-Identity be as follows. You have a choice between: (1) combining some pair of gametes A, which will eventually result in the existence of a person with welfare 1, and (2) combining some other pair of gametes B, which will eventually result in the existence of a person with welfare 100. 

The new version of Expanded Non-Identity is then the same as ... (read more)

Here's my understanding of the dialectic here:

Me: Some wide views make the permissibility of pulling both levers depend on whether the levers are lashed together. That seems implausible. It shouldn't matter whether we can pull the levers one after the other.

Interlocutor: But lever-lashing doesn't just affect whether we can pull the levers one after the other. It also affects what options are available. In particular, lever-lashing removes the option to create both Amy and Bobby, and removes the option to create neither Amy nor Bobby. So if a wide view has ... (read more)

MichaelStJules
18d
Ah, I should have read more closely. I misunderstood and was unnecessarily harsh. I'm sorry. I think your response to Risberg is right. I would still say that permissibility could depend on lever-lashing (in some sense?) because it affects what options are available, though, but in a different way. Here is the view I'd defend:

Here are the consequences in your thought experiments:

1. In the four button case, the "Just Amy" button is impermissible, because there's a "Just Bobby" button.
2. In the lashed levers case, it's impermissible to pull either, because this would give "Just Amy", and the available alternative is "Just Bobby".
3. In the unlashed levers case,
   1. Ahead of time, each lever is permissible to pull and permissible to not pull, as long as you won't pull both (or leave both pulled, in case you can unpull). Ahead of time, pulling both levers is impermissible, because that would give "Just Amy", and "Just Bobby" is still available. This agrees with 1 and 2.
   2. But if you have already pulled one lever (and this is irreversible), then "Just Bobby" is no longer available (either Amy is/will be created, or Bobby won't be created), and pulling the other is permissible, which would give "Just Amy". "Just Amy" is therefore permissible at this point.

As we see in 3.b., "Just Bobby" gets ruled out, and then "Just Amy" becomes permissible after and because of that, but only after "Just Bobby" is ruled out, not before. Permissibility depends on what options are still available, specifically if "Just Bobby" is still available in these thought experiments. "Just Bobby" is still available in 2 and 3.a.

In your post, you wrote:

This is actually true ahead of time, in 2 and 3.a, with pulling both together impermissible. But already having pulled a lever and then pulling the other is permissible, in 3.b. Maybe this is getting pedantic and off-track, but "already having pulled a lever" is not an action available to you, it's just a state of the world.

Thanks! I'd like to think more at some point about Dasgupta's approach plus resolute choice. 

MichaelStJules
15d
I wrote a bit more about Dasgupta's approach and how to generalize it here.

In Parfit's case, we have a good explanation for why you're rationally required to bind yourself: doing so is best for you.

Perhaps you're morally required to bind yourself in Two-Shot Non-Identity, but why? Binding yourself isn't better for Amy. And if it's better for Bobby, it seems that can only be because existing is better for Bobby than not-existing, and then there's pressure to conclude that we're required to create Bobby in Just Bobby, contrary to the claims of PAVs.

And suppose that (for whatever reason) you can't bind yourself in Two-Shot Non-Ident... (read more)

MichaelStJules
18d
The more general explanation is that it's best according to your preferences, which can also reflect or just be your moral views. It's not necessarily a matter of personal welfare, narrowly construed. We have similar thought experiments for total utilitarianism. As long as you

1. expect to do more to further your own values/preferences with your own money than the driver would to further your own values/preferences with your money,
2. don't disvalue breaking promises (or don't disvalue it enough), and
3. can't bind yourself to paying and know this,

then you'd predict you won't pay and be left behind.

Generically, if and because you hold a wide PAV, and it leads to the best outcome ahead of time on that view. There could be various reasons why someone holds a wide PAV. It's not about it being better for Bobby or Amy. It's better "for people", understood in wide person-affecting terms.

One rough argument for wide PAVs could be something like this, based on Frick, 2020 (but without asymmetry):

1. If a person A existed, exists or will exist in an outcome,[1] then the moral standard of "A's welfare" applies in that outcome, and its degree of satisfaction is just A's lifetime (or future) welfare.
2. Between two outcomes, X and Y, if 1) standard x applies in X and standard y applies in Y (and either x and y are identical standards or neither applies in both X and Y), 2) standards x and y are of "the same kind", 3) x is at least as satisfied in X as y is in Y, and 4) all else is equal, then X ≿ Y (X is at least as good as Y).
   1. If keeping promises matters in itself, then it's better to make a promise you'll keep than a promise you'll break, all else equal.
   2. With 1 (and assuming different people result in "the same kind" of welfare standards with comparable welfare), "Just Bobby" is better than "Just Amy", because the moral standard of Bobby's welfare would be more satisfied than the moral standard of Amy's welfare.
   3. This is basically Pareto for st

Interesting, thanks! I hadn't come across this argument before.

Matthew Rendall
12d
It's in his book Inequality, chapter 9. Ingmar Persson makes a similar argument about the priority view here: https://link.springer.com/article/10.1023/A:1011486120534.

Yes, nice points. If one is committed to contingent people not counting, then one has to say that C is worse than B. But it still seems to me like an implausible verdict, especially if one of B and C is going to be chosen (and hence those contingent people are going to become actual). 

It seems like the resulting view also runs into problems of sequential choice. If B is best out of {A, B, C}, but C is best out of {B, C}, then perhaps what you're required to do is initially choose B and then (once A is no longer available) later switch to C, even if doing so is costly. And that seems like a bad feature of a view, since you could have costlessly chosen C in your first choice.

MichaelStJules
20d
I think you'd still just choose A at the start here if you're considering what will happen ahead of time and reasoning via backwards induction on behalf of the necessary people. (Assuming C is worse than A for the original necessary people.) If you don't use backwards induction, you're going to run into a lot of suboptimal behaviour in sequential choice problems, even if you satisfy expected utility theory axioms in one-shot choices.

Taken as an argument that B isn't better than A, this response doesn't seem so plausible to me. In favour of B being better than A, we can point out: B is better than A for all of the necessary people, and pretty good for all the non-necessary people. Against B being better than A, we can say something like: I'd regret picking B over C. The former rationale seems more convincing to me, especially since it seems like you could also make a more direct, regret-based case for B being better than A: I'd regret picking A over B.

But taken as an argument that A is permissible, this response seems more plausible. Then I'd want to appeal to my arguments against deontic PAVs.

MichaelStJules
20d
A steelman could be to just set it up like a hypothetical sequential choice problem consistent with Dasgupta's approach:

1. Choose between A and B.
2. If you chose B in 1, choose between B and C.

or

1. Choose between A and (B or C).
2. If you chose B or C in 1, choose between B and C.

In either case, "picking B" (including "picking B or C") in 1 means actually picking C, if you know you'd pick C in 2, and then use backwards induction. The fact that A is at least as good as (or not worse than and incomparable to) B could follow because B actually just becomes C, which is equivalent to A once we've ruled out B. It's not just facts about direct binary choices that decide rankings ("betterness"), but the reasoning process as a whole and how we interpret the steps.

At any rate, I don't think it's that important whether we interpret the rankings as "betterness", as usually understood, with its usual sensitivities and only those. I think you've set up a kind of false dichotomy between permissibility and betterness as usually understood. A third option is rankings not intended to be interpreted as betterness as usual. Or, we could interpret betterness more broadly. Having separate rankings of options apart from or instead of strict permissibility facts can still be useful, say because we want to adopt something like a scalar consequentialist view over those rankings.

I still want to say that C is "better" than B, which is consistent with Dasgupta's approach. There could be other options like A, with the same 100 people, but everyone gets 39 utility instead of 40, and another where everyone gets 20 utility instead. I still want to say 39 is better than 20, and ending up with 39 instead of 40 is not so bad, compared to ending up with 20, which would be a lot worse.
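To make the backwards-induction step concrete, here's a minimal sketch (my own illustration; the rankings are placeholder numbers, not anything claimed above): resolve the later choice node first, then evaluate the earlier options by the outcomes they actually lead to.

```python
# Backwards induction over the two-step problem sketched above:
# step 1: choose between A and B; step 2: if B was chosen, choose between B and C.
def backwards_induction(rank_step2, rank_step1):
    # Resolve the later node first: what would actually be picked from {B, C}?
    later_pick = max(["B", "C"], key=rank_step2)
    # At the earlier node, choosing "B" effectively means ending up with later_pick.
    outcome_of = {"A": "A", "B": later_pick}
    first_pick = max(["A", "B"], key=lambda opt: rank_step1(outcome_of[opt]))
    return outcome_of[first_pick]

# Placeholder rankings: C beats B at the later node; A and C tie at the earlier
# node, so "picking B" just amounts to picking C, and A is (weakly) chosen.
step2_rank = {"B": 0, "C": 1}.get
step1_rank = {"A": 1, "B": 2, "C": 1}.get
print(backwards_induction(step2_rank, step1_rank))  # -> "A"
```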

Yes, nice point. I argue against this kind of dependence in footnote 16 of the paper. Here's what I say there:

Here’s a possible reply, courtesy of Olle Risberg. What we’re permitted to do depends on lever-lashing, but not because lever-lashing precludes pulling the levers one after the other. Instead, it’s because lever-lashing removes the option to create both Amy and Bobby, and removes the option to create neither Amy nor Bobby. If we have the option to create both and the option to create neither, then creating just Amy is permissible. If we don’t have

... (read more)
MichaelStJules
20d
EDIT: Actually my best reply is that just Amy is impermissible whenever just Bobby is available, ahead of time considering all your current and future options (and using backwards induction). The same reason applies for all of the cases, whether buttons, levers, or lashed levers.

EDIT2: I think I misunderstood and was unfairly harsh below.

---

I do still think the rest of this comment below is correct in spirit as a general response, i.e. a view can make different things impermissible for different reasons. I also think you should have followed up to your own reply to Risberg or anticipated disjunctive impermissibility in response, since it seems so obvious to me, given its simplicity, and I think it's a pretty standard way to interpret (im)permissibility. Like I would guess Risberg would have pointed out the same (but maybe you checked?). Your response seems uncharitable/like a strawman. Still, the reasons are actually the same in the cases here, but for a more sophisticated reason that seems easier to miss, i.e. considering all future options ahead of time.

---

I agree that my/Risberg's reply doesn't help in this other case, but you can have different replies for different cases. In this other case, you just use the wide view's solution to the nonidentity problem, which tells you to not pick just Amy if just Bobby is available. Just Amy is ruled out for a different reason. And the two types of replies fit together in a single view, which is a wide view considering the sequences of options ahead of time and using backwards induction (everyone should use backwards induction in (finite) sequential choice problems, anyway). This view will give the right reply when it's needed.

Or, you could look at it like if something is impermissible for any reason (e.g. via either reply), then it is impermissible period, so you treat impermissibility disjunctively. As another example, someone might say each of murder and lying are i

I'm quite surprised that superforecasters predict nuclear extinction is 7.4 times more likely than engineered pandemic extinction, given that (as you suggest) EA predictions usually go the other way. Do you know if this is discussed in the paper? I had a look around and couldn't find any discussion.

Vasco Grilo
5mo
Hi EJT,

I was also curious to understand why superforecasters' nuclear extinction risk was so high. Sources of agreement, disagreement and uncertainty, and arguments for low and high estimates are discussed on pp. 298 to 303. I checked these a few months ago, and my recollection is that the forecasters have the right qualitative considerations in mind, but I do believe they are arriving at an overly high extinction risk. I recently commented about this.

Note that domain experts guessed an even higher nuclear extinction probability by 2100 of 0.55 %, 7.43 (= 0.0055/0.00074) times that of the superforecasters. This is especially surprising considering:

* The pool of experts drew more heavily from the EA community than the pool of superforecasters. "The sample drew heavily from the Effective Altruism (EA) community: about 42% of experts and 9% of superforecasters reported that they had attended an EA meetup".
* I would have expected people in the EA community to guess a lower nuclear extinction risk. 0.55 % is 5.5 times Toby Ord's guess given in The Precipice for nuclear existential risk from 2021 to 2120 of 0.1 %, and extinction risk should be lower than existential risk.

That all sounds approximately right but I'm struggling to see how it bears on this point:

If we want expected-utility-maximisation to rule anything out, we need to say something about the objects of the agent's preference. And once we do that, we can observe violations of Completeness.

Can you explain?

The only thing that matters is whether the agent's resulting behaviour can be coherently described as maximising a utility function.

If you're only concerned with externals, all behaviour can be interpreted as maximising a utility function. Consider an example: an agent pays $1 to trade vanilla for strawberry, $1 to trade strawberry for chocolate, and $1 to trade chocolate for vanilla. Considering only externals, can this agent be represented as an expected utility maximiser? Yes. We can say that the agent's preferences are defined over entire histories of ... (read more)
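As a toy illustration of that 'only externals' point (my own sketch, not part of the original exchange): if utility is defined over entire histories, we can always rationalize the observed trades just by ranking the observed history highest.

```python
from itertools import product

# Each history records, for three successive trade offers
# (vanilla->strawberry, strawberry->chocolate, chocolate->vanilla),
# whether the agent paid $1 to accept the swap.
histories = list(product([True, False], repeat=3))

# The behaviour we observed: all three trades accepted, ending with vanilla, $3 poorer.
observed = (True, True, True)

# A 'rationalizing' utility function over histories: rank the observed history
# above every alternative. Nothing about flavours or money constrains it,
# because whole histories are the primitive objects of preference here.
def utility(history):
    return 1.0 if history == observed else 0.0

assert max(histories, key=utility) == observed
print("Observed behaviour maximises this utility function over histories.")
```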

Lucius Bushnaq
8mo
Yes, it indeed can be. However, the less coherent the agent acts, the more cumbersome it will be to describe it as an expected utility maximiser. Once your utility function specifies entire histories of the universe, its description length goes through the roof. If describing a system as a decision theoretic agent is that cumbersome, it's probably better to look for some other model to predict its behaviour. A rock, for example, is not well described as a decision theoretic agent. You can technically specify a utility function that does the job, but it's a ludicrously large one. The less coherent and smart a system acts, the longer the utility function you need to specify to model its behaviour as a decision theoretic agent will be.

In this sense, expected-utility-maximisation does rule things out, though the boundary is not binary. It's telling you what kind of systems you can usefully model as "making decisions" if you want to predict their actions.

If you would prefer math that talks about the actual internal structures agents themselves consist of, decision theory is not the right field to look at. It just does not address questions like this at all. Nowhere in the theorems will you find a requirement that an agent's preferences be somehow explicitly represented in the algorithms it "actually uses" to make decisions, whatever that would mean. It doesn't know what these algorithms are, and doesn't even have the vocabulary to formulate questions about them.

It's like saying we can't use theorems for natural numbers to make statements about counting sheep, because sheep are really made of fibre bundles over the complex numbers, rather than natural numbers. The natural numbers are talking about our count of the sheep, not the physics of the sheep themselves, nor the physics of how we move our eyes to find the sheep. And decision theory is talking about our model of systems as agents that make decisions, not the physics of the systems themselves and how some parts

Thanks, Danny! This is all super helpful. I'm planning to work through this comment and your BCA update post next week.

I think this paper is missing an important distinction between evolutionarily altruistic behaviour and functionally altruistic behaviour.

  • Evolutionarily altruistic behaviour: behaviour that confers a fitness benefit on the recipient and a fitness cost on the donor.
  • Functionally altruistic behaviour: behaviour that is motivated by an intrinsic concern for others' welfare.

These two forms of behaviour can come apart.

A parent's care for their child is often functionally altruistic but evolutionarily selfish: it is motivated by an intrinsic concern for the child'... (read more)

I wouldn't call a small policy like that 'democratically unacceptable' either. I guess the key thing is whether a policy goes significantly beyond citizens' willingness to pay not only by a large factor but also by a large absolute value. It seems likely to be the latter kinds of policies that couldn't be adopted and maintained by a democratic government, in which case it's those policies that qualify as democratically unacceptable on our definition.

suggests that we are not too far apart.

Yes, I think so!

I guess this shows that the case won't get through with the conservative rounding off that you applied here, so future developments of this CBA would want to go straight for the more precise approximations in order to secure a higher evaluation.

And thanks again for making this point (and to weeatquince as well). I've written a new paragraph emphasising a more reasonable, less conservative estimate of benefit-cost ratios. I expect it'll probably go in the final draft, and I'll edit the post here to incl... (read more)

Thanks for this! All extremely helpful info.

Naively, a benefit-cost ratio of >1 to 1 suggests that a project is worth funding. However, given the overhead costs of government policy, governments' propensity to make even cost-effective projects go wrong, and public preferences for money in hand, it may be more appropriate to apply a higher bar for cost-effective government spending. I remember I used to have a 3 to 1 ratio, perhaps picked up when I worked in Government, although I cannot find a source for this now.

This is good to know. Our BCR of 1.6 is bas... (read more)
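A quick arithmetic check of how the two bars interact (the 1.6 BCR and the 1:1 and 3:1 bars are the figures mentioned in this thread; the snippet itself is just my illustration):

```python
bcr = 1.6  # benefit-cost ratio from the draft CBA discussed above
for bar in (1.0, 3.0):  # naive 1:1 bar vs the stricter 3:1 bar mentioned
    print(f"BCR {bcr} clears a {bar:.0f}:1 bar: {bcr >= bar}")
# -> clears 1:1 but not 3:1, which is why a higher bar changes the verdict
```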

Though I agree that refuges would not pass a CBA, I don't think they are an example of something that would impose extreme costs on those alive today; I suspect significant value could be obtained with $1 billion.

I think this is right. Our claim is that a strong longtermist policy as a whole would place extreme burdens on the present generation. We expect that a strong longtermist policy would call for particularly extensive refuges (and lots of them) as well as the other things that we mention in that paragraph.

We also focus on the risk of global catastrophes,

... (read more)

Maybe an obvious point, but I think we shouldn't lose sight of the importance of providing EA funding for catastrophe-preventing interventions, alongside attempts to influence government. Attempts to influence government may fail / fall short of what is needed / take too long given the urgency of action.

Yep, agreed! 

Should we just get on with developing refuges ourselves?

My impression is that this is being explored. See, e.g., here.

Second, the argument overshoots.

The argument we mean to refer to here is the one that we call the ‘best-known argument’ elsewhere: the one that says that the non-existence of future generations would be an overwhelming moral loss because the expected future population is enormous, the lives of future people are good in expectation, and it is better if the future contains more good lives. We think that this argument is liable to overshoot.

I agree that there are other compelling longtermist arguments that don’t overshoot. But my concern is that governments c... (read more)

Toby_Ord
1y
Thanks for the clarifications!

But CBA cares about marginal cost effectiveness and presumably the package can be broken into chunks of differing ex-ante cost-effectiveness (e.g. by intervention type, or by tranches of funding in each intervention). Indeed you suggest this later in the piece. Since the average only just meets the bar, if there is much variation, the marginal work won’t meet the bar, so government funding would cap out at something less than this, perhaps substantially so.

Yes, this is an important point. If we were to do a more detailed cost-benefit analysis of catastroph... (read more)

Toby_Ord
1y
Thanks Elliott, I guess this shows that the case won't get through with the conservative rounding off that you applied here, so future developments of this CBA would want to go straight for the more precise approximations in order to secure a higher evaluation.

Re the possibility of international agreements, I agree that they can make it easier to meet various CBA thresholds, but I also note that they are notoriously hard to achieve, even when in the interests of both parties. That doesn't mean that we shouldn't try, but if the CBA case relies on them then the claim that one doesn't need to go beyond it (or beyond CBA-plus-AWTP) becomes weaker.

That said, I think some of our residual disagreement may be to do with me still not quite understanding what your paper is claiming. One of my concerns is that I'm worried that CBA-plus-AWTP is a weak style of argument — especially with elected politicians. That is, arguing for new policies (or treaties) on grounds of CBA-plus-AWTP has some sway for fairly routine choices made by civil servants who need to apply government cost-effectiveness tests, but little sway with voters or politicians. Indeed, many people who would be benefited by such cost-effectiveness tests are either bored by — or actively repelled by — such a methodology. But if you are arguing that we should only campaign for policies that would pass such a test, then I'm more sympathetic. In that case, we could still make the case for them in terms that will resonate more broadly.

On the first, I think we should use both traditional CBA justifications as well as longtermist considerations

I agree with this. What we’re arguing for is a criterion: governments should fund all those catastrophe-preventing interventions that clear the bar set by cost-benefit analysis and altruistic willingness to pay. One justification for funding these interventions is the justification provided by CBA itself, but it need not be the only one. If longtermist justifications help us get to the place where all the catastrophe-preventing interventions that cl... (read more)

at other times you seem to trade on the idea that there is something democratically tainted about political advocacy on behalf of the people of the future — this is something I strongly reject.

I reject that too. We don’t mean to suggest that there is anything democratically tainted about that kind of advocacy. Indeed, we say that longtermists should advocate on behalf of future generations, in order to increase the present generation’s altruistic willingness to pay for benefits to future generations.

What we think would be democratically unacceptable is gov... (read more)

Toby_Ord
1y
I'm not so sure about that. I agree with you that it would be normatively problematic in the paradigm case of a policy that imposed extreme costs on current society for very slight reduction in total existential risk — let's say, reducing incomes by 50% in order to lower risk by 1 part in 1 million. But I don't know that it is true in general.

First, consider a policy that was inefficient but small — e.g. one that cost $10 million to the US govt, but reduced the number of statistical lives lost in the US by only 0.1. I don't think I'd say that this was democratically unacceptable. Policies like this are enacted all the time in safety contexts and are often inefficient and ill-thought-out, and I'm not generally in favour of them, but I don't find them to be undemocratic. I suppose one could argue that all US policy that doesn't pass a CBA is undemocratic (or democratically unacceptable), but that seems a stretch to me.

So I wonder whether it is correct to count our intuitions on the extreme example as counting against all policies that are inefficient in traditional CBA terms or just against those that impose severe costs.

Thanks, these comments are great! I'm planning to work through them later this week. 

I agree with pretty much all of your bulletpoints. With regards to the last one, we didn't mean to suggest that arguing for greater concern about existential risks is undemocratic. Instead, we meant to suggest that (in the world as it is today) it would be undemocratic for governments to implement policies that place heavy burdens on the present generation for the sake of small reductions in existential risk.

Thanks for the tip! Looking forward to reading your paper.

but surely to be authorized by wider consultation

What do you mean by this?

Matt Boyd
1y
Thanks. I guess this relates to your point about democratically acceptable decisions of governments. If a government is choosing to neglect something (e.g. because its probability is low, or because they have political motivations for doing so, vested interests, etc.), then they should only do so if they have information suggesting the electorate has/would authorize this. Otherwise it is an undemocratic decision.

Thanks for the comment!

There are clear moral objections against pursuing democratically unacceptable policies

What we mean with this sentence is that there are clear moral objections against governments pursuing [perhaps we should have said 'instituting'] democratically unacceptable policies. We don't mean to suggest that there's anything wrong with citizens advocating for policies that are currently democratically unacceptable with the aim of making them democratically acceptable.

Richard Y Chappell
1y
OK, thanks for clarifying! I guess there's a bit of ambiguity surrounding talk of "the goal of longtermists in the political sphere", so maybe worth distinguishing immediate policy goals that could be implemented right away, vs. external (e.g. "consciousness-raising") advocacy aimed at shifting values.

It's actually an interesting question when policymakers can reasonably go against public opinion. It doesn't seem necessarily objectionable (e.g. to push climate protection measures that most voters are too selfish or short-sighted to want to pay for). There's a reason we have representative rather than direct democracy. But the key thing about your definition of "democratically unacceptable" is that it specifies the policy could not possibly be maintained, which more naturally suggests a feasibility objection than a moral one, anyhow. But I'm musing a bit far afield now. Thanks for the thought-provoking paper!

Great post!

For example, maybe, according to you, you’re an “all men are created equal” type. That is, you treat all men equally. Maybe you even write a fancy document about this, and this document gets involved in the founding of a country, or something.

There’s a thing philosophy can do, here, which is to notice that you still own slaves. Including: male slaves. And it can do that whole “implication” thing, about how, Socrates is a man, you treat all men equally, therefore you treat Socrates equally, except oh wait, you don’t, he’s your slave.

Charles Mills... (read more)

I think all of these objections would be excellent if I were arguing against this claim: 

  • Agents are rationally required to satisfy the VNM axioms.

But I’m arguing against this claim:

  • Sufficiently-advanced artificial agents will satisfy the VNM axioms.

And given that, I think your objections miss the mark. 

On your first point, I’m prepared to grant that agents have no reason to rule out option A- at node 2. All I need to claim is that advanced artificial agents might rule out option A- at node 2. And I think my argument makes that... (read more)

There's a complication here related to a point that Rohin makes: if we can only see an agent's decisions and we know nothing about its preferences, all behavior can be rationalized as EU maximization.

But suppose we set that complication aside. Suppose we know this about an agent's preferences: 

  • There is some option A such that the agent strictly prefers A+$1

Then we can observe violations of Completeness. Suppose that we first offer our agent a choice between A and some other option B, and that the agent chooses A. Then we give the agent the chance to ... (read more)

Ah I see! Yep, agree with that.

Nice point. The rough answer is 'Yes, but only once the agent has turned down a sufficiently wide array of options.' Depending on the details, that might never happen or only happen after a very long time. 

I've had a quick think about the more precise answer, and I think it is: 

  • The agent's preferences will be functionally complete once and only once it is the case that, for all pairs of options between which the agent has a preferential gap, the agent has turned down an option that is strictly preferred to one of the options in the pair.
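Here's a minimal sketch (my own toy formalisation, not code from the post) of the policy under discussion, applied to the standard two-step money pump: the agent has a preferential gap between B and each of A and A+, strictly prefers A+ to A, and refuses any option strictly dispreferred to something it previously turned down.

```python
# Strict preferences, given as a set of (better, worse) pairs.
# A+ is strictly preferred to A; B is gappy with both A and A+.
strict_prefs = {("A+", "A")}

def strictly_prefers(x, y):
    return (x, y) in strict_prefs

def permitted(option, turned_down):
    """The policy: never choose an option strictly dispreferred to
    something you previously turned down."""
    return not any(strictly_prefers(x, option) for x in turned_down)

# Money-pump attempt: the agent starts with A+, trades it for B (thereby
# turning A+ down), and is then offered A in exchange for B.
turned_down = ["A+"]
print(permitted("A", turned_down))  # False: A is strictly worse than the turned-down A+
print(permitted("B", turned_down))  # True: the agent keeps B, so it never ends up with A
```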
Nick_Anyos
1y
I had a similar thought to Shiny. Am I correct that an agent following your suggested policy ("‘if I previously turned down some option X, I will not choose any option that I strictly disprefer to X.’ ") would never *appear* to violate completeness from the perspective of an observer that could only see their decisions and not their internal state? And assuming completeness is all we need to get to full utility maximization, does that mean an agent following your policy would act like a utility maximizer to an observer?

I didn’t mean to suggest it was new! I remember that part of your book.

Your second point seems to me to get the dialectic wrong. We can read coherence arguments as saying:

  • Sufficiently-advanced artificial agents won't pursue dominated strategies, so they'll  have complete preferences.

I’m pointing out that that inference is poor. Advanced artificial agents might instead avoid dominated strategies by acting in accordance with the policy that I suggest.

I’m still thinking about your last point. Two quick thoughts:

  • It seems like most humans aren’t consequentialists.
  • Advanced artificial agents could have better memories of their past decisions than humans.
Johan Gustafsson
1y
But my argument against proposals like yours is not that agents wouldn't have sufficiently good memories. The objection (following Broome and others) is that the agents at node 2 have no reason at that node for ruling out option A- with your policy. The fact that A could have been chosen earlier should not concern you at node 2. A- is not dominated by any of the available options at node 2.

Regarding the inference being poor, my argument in the book has two parts: (1) the money pump for Completeness, which relies on Decision-Tree Separability, and (2) the defence of Decision-Tree Separability. It is (2) that rules out your proposal.

Regarding your two quick thoughts, lots of people may be irrational. So that argument does not work.

it seems like we agree on the object-level facts

I think that’s right.

Often a lot of the important "assumptions" in a theorem are baked into things like the type signature of a particular variable or the definitions of some key terms; in my toy theorem above I give two examples (completeness and lack of time-dependence). You are going to lose some information about what the theorem says when you convert it from math to English; an author's job is to communicate the "important" parts of the theorem (e.g. the conclusion, any antecedents that the reader may no

... (read more)
Rohin Shah
1y
Upon rereading I realize I didn't state this explicitly, but my conclusion was the following: Transitivity depending on completeness doesn't invalidate that conclusion.

Thanks for the comment! In this context, where we're arguing about whether sufficiently-advanced artificial agents will satisfy the VNM axioms, I only have to give up Decision-Tree Separability*:

Sufficiently-advanced artificial agents’ dispositions to choose options at a choice node will not depend on other parts of the decision tree than those that can be reached from that node. 

And Decision-Tree Separability* isn't particularly plausible. It’s false if any sufficiently-advanced artificial agent acts in accordance with the following policy: ‘if I pre... (read more)

Johan Gustafsson
1y
What you are suggesting is what I called "The Conservative Approach" to resolute choice, which I discuss critically on pages 73–74. It is not a new idea. Note also that avoiding money pumps for Completeness cannot alone motivate your suggested policy, since you can also avoid them by satisfying Completeness. So that argument does not work (without assuming the point at issue). Finally, I guess I don't see why consequentialism would be less plausible for artificial agents than other agents.

So, you would agree that the following is an English description of a theorem:

If an agent has complete, transitive preferences, and it does not pursue dominated strategies, then it must be representable as maximizing expected utility.

Yep, I agree with that.

I feel pretty fine with justifying the transitive part via theorems basically like the one I gave above.

Note that your money-pump justifies acyclicity (the agent does not strictly prefer A to B, B to C, and C to A) rather than the version of transitivity necessary for the VNM and Complete Class the... (read more)
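Spelling out the contrast in standard notation (my gloss of the two conditions, restricted to the three options in the example):

```latex
% Acyclicity (what the three-option money pump rules out):
\neg\,\bigl(A \succ B \;\wedge\; B \succ C \;\wedge\; C \succ A\bigr)

% Transitivity of weak preference (the form used by VNM and the Complete Class Theorem):
(A \succsim B \;\wedge\; B \succsim C) \;\Rightarrow\; A \succsim C
```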

Rohin Shah
1y
Okay, it seems like we agree on the object-level facts, and what's left is a disagreement about whether people have been making a major error. I'm less interested in that disagreement so probably won't get into a detailed discussion, but I'll briefly outline my position here.

The main way in which this claim is false (on your way of using words) is that it fails to note some of the antecedents in the theorem (completeness, maybe transitivity). But I don't think this is a reasonable way to use words, and I don't think it's reasonable to read the quotes in your appendix as claiming what you say they claim.

Converting math into English is a tricky business. Often a lot of the important "assumptions" in a theorem are baked into things like the type signature of a particular variable or the definitions of some key terms; in my toy theorem above I give two examples (completeness and lack of time-dependence). You are going to lose some information about what the theorem says when you convert it from math to English; an author's job is to communicate the "important" parts of the theorem (e.g. the conclusion, any antecedents that the reader may not agree with, implications of the type signature that limit the applicability of the conclusion), which will depend on the audience. As a result when you read an English description of a theorem, you should not expect it to state every antecedent. So it seems unreasonable to me to critique a claim in English about a theorem existing purely because it didn't list all the antecedents.

I think it is reasonable to critique a claim in English about a theorem on the basis that it didn't highlight an important antecedent that limits its applicability. If you said "AI alignment researchers should make sure to highlight the Completeness axiom when discussing coherence theorems" I'd be much more sympathetic (though personally my advice would be "AI alignment researchers should make sure to either argue for or highlight as an assumption t

Theorems are typically of the form "Suppose X, then Y"; what is X if not an assumption?

X is an antecedent.

Consider an example. Imagine I claim:

  • Suppose James is a bachelor. Then James is unmarried.

In making this claim, I am not assuming that James is a bachelor. My claim is true whether or not James is a bachelor.

I might temporarily assume that James is a bachelor, and then use that assumption to prove that James is unmarried. But when I conclude ‘Suppose James is a bachelor. Then James is unmarried’, I discharge that initial assumption. My conclusion no lo... (read more)
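In standard notation, the point about discharging the assumption is just conditional proof (the deduction theorem); this is my gloss rather than a quotation:

```latex
% From a derivation of "James is unmarried" under the temporary assumption
% "James is a bachelor" (given the definition of 'bachelor'), we conclude the
% conditional outright, with no remaining assumption about James:
\mathrm{Bachelor}(j) \;\vdash\; \mathrm{Unmarried}(j)
\quad\Longrightarrow\quad
\vdash\; \mathrm{Bachelor}(j) \rightarrow \mathrm{Unmarried}(j)
```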

Thanks, I understand better what you're trying to argue.

The part I hadn't understood was that, according to your definition, a "coherence theorem" has to (a) only rely on antecedents of the form "no dominated strategies" and (b) conclude that the agent is representable by a utility function. I agree that on this definition there are no coherence theorems. I still think it's not a great pedagogical or rhetorical move, because the definition is pretty weird.

I still disagree with your claim that people haven't made this critique before.

From your discussion:

[T

... (read more)

Two points, made in order of importance:

(1) How we define the term ‘coherence theorems’ doesn’t matter. What matters is that Premise 1 (striking out the word ‘coherence’, if you like) is false.

(2) The way I define the term ‘coherence theorems’ seems standard.

Now making point (1) in more detail:

Reserve the term ‘coherence theorems’ for whatever you like. Premise 1 is false: there are no theorems which state that, unless an agent can be represented as maximizing expected utility, that agent is liable to pursue strategies that are dominated by some other avai... (read more)


Using 'coherence theorems' with a meaning that is as standard as any, and explaining that meaning within two sentences, seems fine to me.


I would have hoped you reached the second sentence before skimming! I define what I mean (and what I take previous authors to mean) by 'coherence theorems' there.

I think your title might be causing some unnecessary consternation. "You don't need to maximise utility to avoid domination" or something like that might have avoided a bit of confusion.

D0TheMath
1y
Ah, ok. Why don't you just respond with markets then!

I’m following previous authors in defining ‘coherence theorems’ as

theorems which state that, unless an agent can be represented as maximizing expected utility, that agent is liable to pursue strategies that are dominated by some other available strategy.

On that definition, there are no coherence theorems. VNM is not a coherence theorem, nor is Savage’s Theorem, nor is Bolker-Jeffrey, nor are Dutch Book Arguments, nor is Cox’s Theorem, nor is the Complete Class Theorem.

there are theorems that are relevant to the question of agent coherence

I'd have no proble... (read more)

Habryka
1y
Can you be concrete about which previous authors' definition you are using here? A Google search for your definition returns no results except this post, and this is definitely not a definition of "coherence theorems" that I would use.
D0TheMath
1y
Spoiler (don't read if you want to work on a fun puzzle or test your alignment mettle).

I haven't read this post yet, but it sounds like you might be interested in this paper on existential risks from a Thomist Christian perspective if you haven't seen it already.


Nice post! Consider this a vote for more summaries.

JackM
1y
Thanks Elliott! I wasn’t sure how you’d react to these summaries. I’m very happy to continue to make them. It’s also for my benefit so I can easily remind myself what a paper said. I think I’ll get back in touch with you or Rossa in the near future to offer if I can do anything else with regards to helping GPI research get heard.

All good points, but Tarsney's argument doesn't depend on the assumption that longtermist interventions cannot accidentally increase x-risk. It just depends on the assumption that there's some way that we could spend $1 million that would increase the epistemic probability that humanity survives the next thousand years by at least 2x10^-14.
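Schematically (my paraphrase of the break-even reasoning, with V and b as placeholder quantities rather than figures from the paper): spending the $1 million on the longtermist intervention beats the benchmark whenever

```latex
\Delta p \cdot V \;\ge\; b
\qquad\text{i.e.}\qquad
\Delta p \;\ge\; \frac{b}{V}
```

where Δp is the increase in the probability that humanity survives the next thousand years, V is the expected value of that survival, and b is the expected value of spending the $1 million on the neartermist benchmark instead; on the assumptions at issue here, that threshold is the 2x10^-14 figure above.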

Thanks! This is valuable feedback.

By 'persistent difference', Tarsney doesn't mean a difference that persists forever. He just means a difference that persists for a long time in expectation: long enough to make the expected value of the longtermist intervention greater than the expected value of the neartermist benchmark intervention.

Perhaps you want to know why we should think that we can make this kind of persistent difference. I can talk a little about that in another comment if so.
