All of Elliott Thornley (EJT)'s Comments + Replies

I said a little in another thread. If we get aligned AI, I think it'll likely be a corrigible assistant that doesn't have its own philosophical views that it wants to act on. And then we can use these assistants to help us solve philosophical problems. I'm imagining in particular that these AIs could be very good at mapping logical space, tracing all the implications of various views, etc. So you could ask a question and receive a response like: 'Here are the different views on this question. Here's why they're mutually exclusive and jointly exhaustive. He... (read more)

4
Wei Dai
The argument tree (arguments, counterarguments, counter-counterarguments, and so on) is exponentially sized, and we don't know how deep or wide we need to expand it before some problem can be solved. We do know that different humans looking at the same partial tree (i.e., philosophers who have read the same literature on some problem) can have very different judgments as to what the correct conclusion is. There's also a huge amount of intuition/judgment involved in choosing which part of the tree to focus on or expand further.

With AIs helping to expand the tree for us, there are potential advantages like you mentioned, but also potential disadvantages, like AIs not having good intuition/judgment about which lines of argument to pursue, or the argument tree (or AI-generated philosophical literature) becoming too large for any humans to read and think about in a relevant time frame. Many will be very tempted to just let AIs answer the questions / make the final conclusions for us, especially if AIs also accelerate technological progress, creating many urgent philosophical problems related to how to use them safely and beneficially. And if humans do try to make the conclusions themselves, they can easily get them wrong despite AI help with expanding the argument tree.

So I think undergoing the AI transition without solving metaphilosophy, or without making AIs autonomously competent at philosophy (good at getting correct conclusions by themselves), is enormously risky, even if we have corrigible AIs helping us.

I'm not sure but I think maybe I also have a different view than you on what problems are going to be bottlenecks to AI development. e.g. I think there's a big chance that the world would steam ahead even if we don't solve any of the current (non-philosophical) problems in alignment (interpretability, shutdownability, reward hacking, etc.).

try to make them "more legible" to others, including AI researchers, key decision makers, and the public

Yes, I agree this is valuable, though I think it's valuable mainly because it increases the probability that people use future AIs to solve these problems, rather than because it will make people slow down AI development or try very hard to solve them pre-TAI.

4
Elliott Thornley (EJT)
I'm not sure but I think maybe I also have a different view than you on what problems are going to be bottlenecks to AI development. e.g. I think there's a big chance that the world would steam ahead even if we don't solve any of the current (non-philosophical) problems in alignment (interpretability, shutdownability, reward hacking, etc.).

I don't think philosophical difficulty adds that much to the overall difficulty of alignment, mainly because I think that AI developers should (and likely will) aim to make AIs corrigible assistants rather than agents with their own philosophical views that they try to impose on the world. And I think it's fairly likely that we can use these assistants (if we succeed in getting them and aren't disempowered by a misaligned AI instead) to help a lot with these hard philosophical questions.

I didn't mean to imply that Wei Dai was overrating the problems' importance. I agree they're very important! I was making the case that they're also very intractable.

If I thought solving these problems pre-TAI would be a big increase to the EV of the future, I'd take their difficulty to be a(nother) reason to slow down AI development. But I think I'm more optimistic than you and Wei Dai about waiting until we have smart AIs to help us on these problems.

2
Wei Dai
Do you want to talk about why you're relatively optimistic? I've tried to explain my own concerns/pessimism at https://www.lesswrong.com/posts/EByDsY9S3EDhhfFzC/some-thoughts-on-metaphilosophy and https://forum.effectivealtruism.org/posts/axSfJXriBWEixsHGR/ai-doing-philosophy-ai-generating-hands.

I'm a philosopher who's switched to working on AI safety full-time. I also know there are at least a few philosophers at Anthropic working on alignment.

With regards to your Problems in AI Alignment that philosophers could potentially contribute to:

  • I agree that many of these questions are important and that more people should work on them.
  • But a fair number of them are discussed in conventional academic philosophy, e.g.:
    • How to resolve standard debates in decision theory?
    • Infinite/multiversal/astronomical ethics
    • Fair distribution of benefits
    • What is the nature o
... (read more)

This reads to me like you're saying "these problems are hard [so Wei Dai is over-rating the importance of working on them]", whereas the inference I would make is "these problems are hard, so we need to slow down AI development, otherwise we won't be able to solve them in time."

5
Wei Dai
I wrote a post that I think was partly inspired by this discussion. The implication of it here is that I don't necessarily want philosophers to directly try to solve the many hard philosophical problems relevant to AI alignment/safety (especially given how few of them are in this space or concerned about x-safety), but initially just to try to make them "more legible" to others, including AI researchers, key decision makers, and the public. Hopefully you agree that this is a more sensible position.
4
Wei Dai
I agree that many of the problems on my list are very hard and probably not the highest-marginal-value work to be doing from an individual perspective. Keep in mind that the list was written 6 years ago, when it was less clear when the AI takeoff would start in earnest, or how many philosophers would become motivated to work on AI safety once AGI became visibly closer. I still had some hope that when the time came, a significant fraction of all philosophers would become self-motivated or would be "called to arms" by a civilization-wide AI safety effort, and would be given sufficient resources including time, so the list was aiming to be comprehensive (listing every philosophical problem that I thought relevant to AI safety) rather than prioritizing.

Unfortunately, the reality is nearly the complete opposite of this. Currently, one of my main puzzles is why philosophers with public AI x-risk estimates still have numbers in the 10% range, despite reality being near the most pessimistic end of my range of expectations, despite it looking like the AI takeoff/transition will occur while most of these philosophical problems remain wide open or in a state of total confusion, and despite AI researchers seeming almost completely oblivious to or unconcerned about this. Why are they not making the same kind of argument that I've been making: that philosophical difficulty is a reason that AI alignment/x-safety is harder than many think, and an additional reason to pause/stop AI?

Makes sense! Unfortunately any x-risk cost-effectiveness calculation has to be a little vibes-based because one of the factors is 'By how much would this intervention reduce x-risk?', and there's little evidence to guide these estimates.

Whether longtermism is a crux will depend on what we mean by 'long,' but I think concern for future people is a crux for x-risk reduction. If future people don't matter, then working on global health or animal welfare is the more effective way to improve the world. The more optimistic of the calculations that Carl and I do suggest that, by funding x-risk reduction, we can save a present person's life for about $9,000 in expectation. But we could save about 2 present people if we spent that money on malaria prevention, or we could mitigate the suffering of ... (read more)

3
Matrice Jacobine🔸🏳️‍⚧️
This seems clearly wrong. If you believe that it would take a literal Manhattan Project for AI safety ($26 billion, adjusted for inflation) to reduce existential risk by a mere 1%, and you only care about the 8 billion people alive today, then you can save a present person's life for $325, swamping any GiveWell-recommended charity.
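Spelling out the arithmetic behind that $325 figure, using the numbers in the comment above and reading the 1% as a one-percentage-point reduction in extinction risk applied to the 8 billion people alive today:

\[
\frac{\$26\ \text{billion}}{0.01 \times 8\ \text{billion lives}} \;=\; \frac{\$2.6 \times 10^{10}}{8 \times 10^{7}\ \text{lives}} \;=\; \$325\ \text{per expected life saved}.
\]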
3
cb
Yep, I was being imprecise. I think the most plausible (and actually believed-in) alternative to longtermism isn't "no care at all for future people", but "some >0 discount rate", and I think xrisk reduction will tend to look good under small >0 discount rates.

I do also agree that there are some combinations of social discount rate and cost-effectiveness of longtermist interventions such that xrisk reduction isn't competitive with other ways of saving lives. I don't yet think this is clearly the case, even given the numbers in your paper — afaik the amount of existential risk reduction you predicted was pretty vibes-based, so I don't really take the cost-effectiveness calculation it produces seriously. (And I haven't done the math myself on discount rates and cost-effectiveness.)

Even if xrisk reduction doesn't look competitive with e.g. donating to AMF, I think it would be pretty reasonable for some people to spend more time thinking about it to figure out if they could identify more cost-effective interventions. (Especially if they seemed like poor fits for E2G or direct work.)

Oops, yes: the fundamentals of my and Bruce's cases are very similar. I should have read Bruce's comment!

The claim we're discussing - about the possibility of small steps of various kinds - sounds kinda like a claim that gets called 'Finite Fine-Grainedness'/'Small Steps' in the population axiology literature. It seems hard to convincingly argue for, so in this paper I present a problem for lexical views that doesn't depend on it. I sort of gestured at it above with the point about risk without making it super precise. The one-line summary is that expected welfare levels are finitely fine-grained.

Oh yep nice point, though note that - e.g. - there are uncountably many reals between 1,000,000 and 1,000,001 and yet it still seems correct (at least talking loosely) to say that 1,000,001 is only a tiny bit bigger than 1,000,000.

But in any case, we can modify the argument to say that S* feels only a tiny bit worse than S. Or instead we can modify it so that S is the temperature in degrees Celsius of a fire that causes suffering that just about can be outweighed, and S* is the temperature in degrees Celsius of a fire that causes suffering that just about can't be outweighed.

4
Ben_West🔸
I interpret OP's point about asymptotes to mean that he indeed bites this bullet and believes that the "compensation schedule" is massively higher even when the "instrument" only feels slightly worse?

Nice post! Here's an argument that extreme suffering can always be outweighed.

Suppose you have a choice between:

(S+G): The most intense suffering S that can be outweighed, plus a population G that's good enough to outweigh it, so that S+G is good overall: better than an empty population.

(S*+nG): The least intense suffering S* that can't be outweighed, plus a population that's n times better than the good population G.

If extreme suffering can't be outweighed, we're required to choose S+G over S*+nG, no matter how big n is. But that seems implausible. S* is ... (read more)

9
Ben_West🔸
In his examples (ℝ* and ℝⁿ, lexically ordered) there is no "most intense suffering which can be outweighed" (or "least intense suffering which can't be outweighed"). E.g. in the hyperreals, for all n₁, n₂ ∈ ℝ we have n₁ω > n₂, no matter how small n₁ or how large n₂. In his examples, between any S which can't be outweighed and S* which can, there are an uncountably infinite number of additional levels of suffering! So I don't think it's correct to say it's only a tiny bit worse.
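Spelled out, with the implicit assumption that n₁ is positive (as the "no matter how small n₁" phrasing suggests):

\[
n_1, n_2 \in \mathbb{R},\ n_1 > 0 \;\Longrightarrow\; n_1\omega > n_2,
\]

since n₁ω is an infinite hyperreal (it exceeds every real number) while n₂ is a finite real.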

Note also that you can accept outweighability and still believe that extreme suffering is really bad. You could - e.g. - think that 1 second of a cluster headache can only be outweighed by trillions upon trillions of years of bliss. That would give you all the same practical implications without the theoretical trouble.

+1 to this, this echoes some earlier discussion we've had privately and I think it would be interesting to see it fleshed out more, if your current view is to reject outweighability in theory

More importantly I think this points to a potentia... (read more)

Nice point, but I think it comes at a serious cost.

To see how, consider a different case. In X, ten billion people live awful lives. In Y, those same ten billion people live wonderful lives. Clearly, Y is much better than X. 

Now consider instead Y* which is exactly the same as Y except that we also add one extra person, also with a wonderful life. As before, Y* is much better than X for the original ten billion people. If we say that the value of adding the extra person is undefined and that this undefined value renders the value of the whole change f... (read more)

Wait, all the LLMs get 90+ on ARC? I thought LLMs were supposed to do badly on ARC.

It's an unfortunate naming clash; there are two different ARC Challenges:

ARC-AGI (Chollet et al) - https://github.com/fchollet/ARC-AGI

ARC (AI2 Reasoning Challenge) - https://allenai.org/data/arc

The reported benchmark scores are for the second of the two.

LLMs (at least without scaffolding) still do badly on ARC-AGI, and I'd wager Llama 405B still doesn't do well on the ARC-AGI challenge. It's telling that all the big labs release the 95%+ number they get on AI2-ARC, and not whatever default result they get with ARC-AGI...

(Or in general, reporting benchmarks where they... (read more)

You should read the post! Section 4.1.1 makes the move that you suggest (rescuing PAVs by de-emphasising axiology). Section 5 then presents arguments against PAVs that don't appeal to axiology. 

4
Lukas_Gloor
Sorry, I hate it when people comment on something that has already been addressed. FWIW, though, I had read the paper the day it was posted on the GPI fb page. At that time, I didn't feel like my point about "there is no objective axiology" fit into your discussion. I feel like even though you discuss views that are "purely deontic" instead of "axiological," there are still some assumptions from the axiology-based framework that underlie your conclusion about how to reason about such views. Specifically, when explaining why a view says that it would be wrong to create only Amy but not Bobby, you didn't say anything that suggests an understanding of "there is no objective axiology about creating new people/beings."

That said, re-reading the sections you point to, I think it's correct that I'd need to give some kind of answer to your dilemmas, and what I'm advocating for seems most relevant to this paragraph: At the very least, I owe you an explanation of what I would say here.

I would indeed advocate for what you call the "intermediate wide view," but I'd motivate this view a bit differently. All else equal, IMO, the problem with creating Amy and then not creating Bobby is that these specific choices, in combination, and if it would have been low-effort to choose differently (or the other way around), indicate that you didn't consider the interests of possible people/beings even to a minimum degree. Considering them to a minimum degree would mean being willing to at least take low-effort actions to ensure your choices aren't objectionable from their perspective (the perspective of possible people/beings). Adding someone with +1 when you could've easily added someone else with +100 just seems careless.

If Amy and Bobby sat behind a veil of ignorance, not knowing which of them will be created with +1 or +100 (if someone gets created at all), the one view they would never advocate for is "only create the +1 person." If they favor anti-natalist views, they advocate f

I think my objections still work if we 'go anonymous' and remove direct information about personal identity across different options. We just need to add some extra detail. Let the new version of One-Shot Non-Identity be as follows. You have a choice between: (1) combining some pair of gametes A, which will eventually result in the existence of a person with welfare 1, and (2) combining some other pair of gametes B, which will eventually result in the existence of a person with welfare 100. 

The new version of Expanded Non-Identity is then the same as ... (read more)

Here's my understanding of the dialectic here:

Me: Some wide views make the permissibility of pulling both levers depend on whether the levers are lashed together. That seems implausible. It shouldn't matter whether we can pull the levers one after the other.

Interlocutor: But lever-lashing doesn't just affect whether we can pull the levers one after the other. It also affects what options are available. In particular, lever-lashing removes the option to create both Amy and Bobby, and removes the option to create neither Amy nor Bobby. So if a wide view has ... (read more)

2
Michael St Jules 🔸
Ah, I should have read more closely. I misunderstood and was unnecessarily harsh. I'm sorry. I think your response to Risberg is right. I would still say that permissibility could depend on lever-lashing (in some sense?) because it affects what options are available, though, but in a different way. Here is the view I'd defend: Here are the consequences in your thought experiments:

1. In the four button case, the "Just Amy" button is impermissible, because there's a "Just Bobby" button.
2. In the lashed levers case, it's impermissible to pull either, because this would give "Just Amy", and the available alternative is "Just Bobby".
3. In the unlashed levers case,
   a. Ahead of time, each lever is permissible to pull and permissible to not pull, as long as you won't pull both (or leave both pulled, in case you can unpull). Ahead of time, pulling both levers is impermissible, because that would give "Just Amy", and "Just Bobby" is still available. This agrees with 1 and 2.
   b. But if you have already pulled one lever (and this is irreversible), then "Just Bobby" is no longer available (either Amy is/will be created, or Bobby won't be created), and pulling the other is permissible, which would give "Just Amy". "Just Amy" is therefore permissible at this point.

As we see in 3.b., "Just Bobby" gets ruled out, and then "Just Amy" becomes permissible after and because of that, but only after "Just Bobby" is ruled out, not before. Permissibility depends on what options are still available, specifically if "Just Bobby" is still available in these thought experiments. "Just Bobby" is still available in 2 and 3.a.

In your post, you wrote: This is actually true ahead of time, in 2 and 3.a, with pulling both together impermissible. But already having pulled a lever and then pulling the other is permissible, in 3.b. Maybe this is getting pedantic and off-track, but "already having pulled a lever" is not an action available to you, it's just a state of the world.

Thanks! I'd like to think more at some point about Dasgupta's approach plus resolute choice. 

2
Michael St Jules 🔸
I wrote a bit more about Dasgupta's approach and how to generalize it here.

In Parfit's case, we have a good explanation for why you're rationally required to bind yourself: doing so is best for you.

Perhaps you're morally required to bind yourself in Two-Shot Non-Identity, but why? Binding yourself isn't better for Amy. And if it's better for Bobby, it seems that can only be because existing is better for Bobby than not-existing, and then there's pressure to conclude that we're required to create Bobby in Just Bobby, contrary to the claims of PAVs.

And suppose that (for whatever reason) you can't bind yourself in Two-Shot Non-Ident... (read more)

2
Michael St Jules 🔸
The more general explanation is that it's best according to your preferences, which can also reflect or just be your moral views. It's not necessarily a matter of personal welfare, narrowly construed. We have similar thought experiments for total utilitarianism. As long as you

1. expect to do more to further your own values/preferences with your own money than the driver would to further your own values/preferences with your money,
2. don't disvalue breaking promises (or don't disvalue it enough), and
3. can't bind yourself to paying and know this,

then you'd predict you won't pay and be left behind.

Generically, if and because you hold a wide PAV, and it leads to the best outcome ahead of time on that view. There could be various reasons why someone holds a wide PAV. It's not about it being better for Bobby or Amy. It's better "for people", understood in wide person-affecting terms.

One rough argument for wide PAVs could be something like this, based on Frick, 2020 (but without asymmetry):

1. If a person A existed, exists or will exist in an outcome,[1] then the moral standard of "A's welfare" applies in that outcome, and its degree of satisfaction is just A's lifetime (or future) welfare.
2. Between two outcomes, X and Y, if 1) standard x applies in X and standard y applies in Y (and either x and y are identical standards or neither applies in both X and Y), 2) standards x and y are of "the same kind", 3) x is at least as satisfied in X as y is in Y, and 4) all else is equal, then X ≿ Y (X is at least as good as Y).
   1. If keeping promises matters in itself, then it's better to make a promise you'll keep than a promise you'll break, all else equal.
   2. With 1 (and assuming different people result in "the same kind" of welfare standards with comparable welfare), "Just Bobby" is better than "Just Amy", because the moral standard of Bobby's welfare would be more satisfied than the moral standard of Amy's welfare.
   3. This is basically Pareto for st

Interesting, thanks! I hadn't come across this argument before.

1
Matthew Rendall
It's in his book Inequality, chapter 9. Ingmar Persson makes a similar argument about the priority view here: https://link.springer.com/article/10.1023/A:1011486120534.

Yes, nice points. If one is committed to contingent people not counting, then one has to say that C is worse than B. But it still seems to me like an implausible verdict, especially if one of B and C is going to be chosen (and hence those contingent people are going to become actual). 

It seems like the resulting view also runs into problems of sequential choice. If B is best out of {A, B, C}, but C is best out of {B, C}, then perhaps what you're required to do is initially choose B and then (once A is no longer available) later switch to C, even if doing so is costly. And that seems like a bad feature of a view, since you could have costlessly chosen C in your first choice.

2
Michael St Jules 🔸
I think you'd still just choose A at the start here if you're considering what will happen ahead of time and reasoning via backwards induction on behalf of the necessary people. (Assuming C is worse than A for the original necessary people.) If you don't use backwards induction, you're going to run into a lot of suboptimal behaviour in sequential choice problems, even if you satisfy expected utility theory axioms in one-shot choices.

Taken as an argument that B isn't better than A, this response doesn't seem so plausible to me. In favour of B being better than A, we can point out: B is better than A for all of the necessary people, and pretty good for all the non-necessary people. Against B being better than A, we can say something like: I'd regret picking B over C. The former rationale seems more convincing to me, especially since it seems like you could also make a more direct, regret-based case for B being better than A: I'd regret picking A over B.

But taken as an argument that A is permissible, this response seems more plausible. Then I'd want to appeal to my arguments against deontic PAVs.

3
Michael St Jules 🔸
A steelman could be to just set it up like a hypothetical sequential choice problem consistent with Dasgupta's approach:

1. Choose between A and B.
2. If you chose B in 1, choose between B and C.

or

1. Choose between A and (B or C).
2. If you chose B or C in 1, choose between B and C.

In either case, "picking B" (including "picking B or C") in 1 means actually picking C, if you know you'd pick C in 2, and then use backwards induction. The fact that A is at least as good as (or not worse than and incomparable to) B could follow because B actually just becomes C, which is equivalent to A once we've ruled out B. It's not just facts about direct binary choices that decide rankings ("betterness"), but the reasoning process as a whole and how we interpret the steps.

At any rate, I don't think it's that important whether we interpret the rankings as "betterness", as usually understood, with its usual sensitivities and only those. I think you've set up a kind of false dichotomy between permissibility and betterness as usually understood. A third option is rankings not intended to be interpreted as betterness as usual. Or, we could interpret betterness more broadly. Having separate rankings of options apart from or instead of strict permissibility facts can still be useful, say because we want to adopt something like a scalar consequentialist view over those rankings.

I still want to say that C is "better" than B, which is consistent with Dasgupta's approach. There could be other options like A, with the same 100 people, but everyone gets 39 utility instead of 40, and another where everyone gets 20 utility instead. I still want to say 39 is better than 20, and ending up with 39 instead of 40 is not so bad, compared to ending up with 20, which would be a lot worse.
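Here's a minimal sketch of the backwards-induction step described above. The option labels match the comment; the numeric values are placeholders made up purely for illustration (roughly: C at least as good as A, both better than B), not anything from the original discussion.

```python
# Toy backwards induction on the two-stage choice problem sketched above.
# A decision node is a list of child nodes; a leaf is an outcome label.

def resolve(node, value):
    """Resolve a decision tree by backwards induction: later choices are settled
    first, so an earlier choice of a branch stands for the option you'd actually
    end up with inside that branch."""
    if isinstance(node, str):                      # leaf outcome
        return node
    resolved = [resolve(child, value) for child in node]
    return max(resolved, key=value)                # pick the best resolved child

# Stage 1: choose between A and (B or C); Stage 2 (if B-or-C): choose between B and C.
tree = ["A", ["B", "C"]]

# Placeholder values: C ranked above B, A tied with C.
value = {"A": 2, "B": 1, "C": 2}.get

print(resolve(["B", "C"], value))  # -> 'C': "picking B or C" at stage 1 really means picking C
print(resolve(tree, value))        # -> 'A' here (ties broken by order); the live candidates are A and C, never just B
```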

Yes, nice point. I argue against this kind of dependence in footnote 16 of the paper. Here's what I say there:

Here’s a possible reply, courtesy of Olle Risberg. What we’re permitted to do depends on lever-lashing, but not because lever-lashing precludes pulling the levers one after the other. Instead, it’s because lever-lashing removes the option to create both Amy and Bobby, and removes the option to create neither Amy nor Bobby. If we have the option to create both and the option to create neither, then creating just Amy is permissible. If we don’t have

... (read more)
2
Michael St Jules 🔸
EDIT: Actually my best reply is that just Amy is impermissible whenever just Bobby is available, ahead of time considering all your current and future options (and using backwards induction). The same reason applies for all of the cases, whether buttons, levers, or lashed levers.

EDIT2: I think I misunderstood and was unfairly harsh below.

----------------------------------------

I do still think the rest of this comment below is correct in spirit as a general response, i.e. a view can make different things impermissible for different reasons. I also think you should have followed up to your own reply to Risberg or anticipated disjunctive impermissibility in response, since it seems so obvious to me, given its simplicity, and I think it's a pretty standard way to interpret (im)permissibility. Like I would guess Risberg would have pointed out the same (but maybe you checked?). Your response seems uncharitable/like a strawman. Still, the reasons are actually the same in the cases here, but for a more sophisticated reason that seems easier to miss, i.e. considering all future options ahead of time.

----------------------------------------

I agree that my/Risberg's reply doesn't help in this other case, but you can have different replies for different cases. In this other case, you just use the wide view's solution to the nonidentity problem, which tells you to not pick just Amy if just Bobby is available. Just Amy is ruled out for a different reason. And the two types of replies fit together in a single view, which is a wide view considering the sequences of options ahead of time and using backwards induction (everyone should use backwards induction in (finite) sequential choice problems, anyway). This view will give the right reply when it's needed. Or, you could look at it like this: if something is impermissible for any reason (e.g. via either reply), then it is impermissible, period, so you treat impermissibility disjunctively. As another example, someone might say each of murder and lying are i

I'm quite surprised that superforecasters predict nuclear extinction is 7.4 times more likely than engineered pandemic extinction, given that (as you suggest) EA predictions usually go the other way. Do you know if this is discussed in the paper? I had a look around and couldn't find any discussion.

2
Vasco Grilo🔸
Hi EJT,

I was also curious to understand why superforecasters' nuclear extinction risk was so high. Sources of agreement, disagreement and uncertainty, and arguments for low and high estimates are discussed on pp. 298 to 303. I checked these a few months ago, and my recollection is that the forecasters have the right qualitative considerations in mind, but I do believe they are arriving at an overly high extinction risk. I recently commented about this.

Note domain experts guessed an even higher nuclear extinction probability by 2100 of 0.55 %, 7.43 (= 0.0055/0.00074) times that of the superforecasters. This is especially surprising considering:

  • The pool of experts drew more heavily from the EA community than the pool of superforecasters: "The sample drew heavily from the Effective Altruism (EA) community: about 42% of experts and 9% of superforecasters reported that they had attended an EA meetup".
  • I would have expected people in the EA community to guess a lower nuclear extinction risk. 0.55 % is 5.5 times Toby Ord's guess in The Precipice of 0.1 % for nuclear existential risk from 2021 to 2120, and extinction risk should be lower than existential risk.

That all sounds approximately right but I'm struggling to see how it bears on this point:

If we want expected-utility-maximisation to rule anything out, we need to say something about the objects of the agent's preference. And once we do that, we can observe violations of Completeness.

Can you explain?

The only thing that matters is whether the agent's resulting behaviour can be coherently described as maximising a utility function.

If you're only concerned with externals, all behaviour can be interpreted as maximising a utility function. Consider an example: an agent pays $1 to trade vanilla for strawberry, $1 to trade strawberry for chocolate, and $1 to trade chocolate for vanilla. Considering only externals, can this agent be represented as an expected utility maximiser? Yes. We can say that the agent's preferences are defined over entire histories of ... (read more)

4
Lucius Bushnaq
Yes, it indeed can be. However, the less coherent the agent acts, the more cumbersome it will be to describe it as an expected utility maximiser. Once your utility function specifies entire histories of the universe, its description length goes through the roof. If describing a system as a decision theoretic agent is that cumbersome, it's probably better to look for some other model to predict its behaviour. A rock, for example, is not well described as a decision theoretic agent. You can technically specify a utility function that does the job, but it's a ludicrously large one. The less coherent and smart a system acts, the longer the utility function you need to specify to model its behaviour as a decision theoretic agent will be.

In this sense, expected-utility-maximisation does rule things out, though the boundary is not binary. It's telling you what kind of systems you can usefully model as "making decisions" if you want to predict their actions.

If you would prefer math that talks about the actual internal structures agents themselves consist of, decision theory is not the right field to look at. It just does not address questions like this at all. Nowhere in the theorems will you find a requirement that an agent's preferences be somehow explicitly represented in the algorithms it "actually uses" to make decisions, whatever that would mean. It doesn't know what these algorithms are, and doesn't even have the vocabulary to formulate questions about them.

It's like saying we can't use theorems for natural numbers to make statements about counting sheep, because sheep are really made of fibre bundles over the complex numbers, rather than natural numbers. The natural numbers are talking about our count of the sheep, not the physics of the sheep themselves, nor the physics of how we move our eyes to find the sheep. And decision theory is talking about our model of systems as agents that make decisions, not the physics of the systems themselves and how some parts

Thanks, Danny! This is all super helpful. I'm planning to work through this comment and your BCA update post next week.

I think this paper is missing an important distinction between evolutionarily altruistic behaviour and functionally altruistic behaviour.

  • Evolutionarily altruistic behaviour: behaviour that confers a fitness benefit on the recipient and a fitness cost on the donor.
  • Functionally altruistic behaviour: behaviour that is motivated by an intrinsic concern for others' welfare.

These two forms of behaviour can come apart.

A parent's care for their child is often functionally altruistic but evolutionarily selfish: it is motivated by an intrinsic concern for the child'... (read more)

I wouldn't call a small policy like that 'democratically unacceptable' either. I guess the key thing is whether a policy goes significantly beyond citizens' willingness to pay not only by a large factor but also by a large absolute value. It seems likely to be the latter kinds of policies that couldn't be adopted and maintained by a democratic government, in which case it's those policies that qualify as democratically unacceptable on our definition.

suggests that we are not too far apart.

Yes, I think so!

I guess this shows that the case won't get through with the conservative rounding off that you applied here, so future developments of this CBA would want to go straight for the more precise approximations in order to secure a higher evaluation.

And thanks again for making this point (and to weeatquince as well). I've written a new paragraph emphasising a more reasonable, less conservative estimate of benefit-cost ratios. I expect it'll probably go in the final draft, and I'll edit the post here to incl... (read more)

Thanks for this! All extremely helpful info.

Naively, a benefit-cost ratio of >1 to 1 suggests that a project is worth funding. However, given the overhead costs of government policy, governments' propensity to make even cost-effective projects go wrong, and public preferences for money in hand, it may be more appropriate to apply a higher bar for cost-effective government spending. I remember I used to have a 3 to 1 ratio, perhaps picked up when I worked in Government, although I cannot find a source for this now.

This is good to know. Our BCR of 1.6 is bas... (read more)

Though I agree that refuges would not pass a CBA, I don't think they are an example of something that would impose extreme costs on those alive today - I suspect significant value could be obtained with $1 billion.

I think this is right. Our claim is that a strong longtermist policy as a whole would place extreme burdens on the present generation. We expect that a strong longtermist policy would call for particularly extensive refuges (and lots of them) as well as the other things that we mention in that paragraph.

We also focus on the risk of global catastrophes,

... (read more)

Maybe an obvious point, but I think we shouldn't lose sight of the importance of providing EA funding for catastrophe-preventing interventions, alongside attempts to influence government. Attempts to influence government may fail / fall short of what is needed / take too long given the urgency of action.

Yep, agreed! 

Should we just get on with developing refuges ourselves?

My impression is that this is being explored. See, e.g., here.

Second, the argument overshoots.

The argument we mean to refer to here is the one that we call the ‘best-known argument’ elsewhere: the one that says that the non-existence of future generations would be an overwhelming moral loss because the expected future population is enormous, the lives of future people are good in expectation, and it is better if the future contains more good lives. We think that this argument is liable to overshoot.

I agree that there are other compelling longtermist arguments that don’t overshoot. But my concern is that governments c... (read more)

3
Toby_Ord
Thanks for the clarifications!

But CBA cares about marginal cost effectiveness and presumably the package can be broken into chunks of differing ex-ante cost-effectiveness (e.g. by intervention type, or by tranches of funding in each intervention). Indeed you suggest this later in the piece. Since the average only just meets the bar, if there is much variation, the marginal work won’t meet the bar, so government funding would cap out at something less than this, perhaps substantially so.

Yes, this is an important point. If we were to do a more detailed cost-benefit analysis of catastroph... (read more)

2
Toby_Ord
Thanks Elliott, I guess this shows that the case won't get through with the conservative rounding off that you applied here, so future developments of this CBA would want to go straight for the more precise approximations in order to secure a higher evaluation.

Re the possibility of international agreements, I agree that they can make it easier to meet various CBA thresholds, but I also note that they are notoriously hard to achieve, even when in the interests of both parties. That doesn't mean that we shouldn't try, but if the CBA case relies on them then the claim that one doesn't need to go beyond it (or beyond CBA-plus-AWTP) becomes weaker.

That said, I think some of our residual disagreement may be to do with me still not quite understanding what your paper is claiming. One of my concerns is that I'm worried that CBA-plus-AWTP is a weak style of argument — especially with elected politicians. That is, arguing for new policies (or treaties) on grounds of CBA-plus-AWTP has some sway for fairly routine choices made by civil servants who need to apply government cost-effectiveness tests, but little sway with voters or politicians. Indeed, many people who would be benefited by such cost-effectiveness tests are either bored by — or actively repelled by — such a methodology.

But if you are arguing that we should only campaign for policies that would pass such a test, then I'm more sympathetic. In that case, we could still make the case for them in terms that will resonate more broadly.

On the first, I think we should use both traditional CBA justifications as well as longtermist considerations

I agree with this. What we’re arguing for is a criterion: governments should fund all those catastrophe-preventing interventions that clear the bar set by cost-benefit analysis and altruistic willingness to pay. One justification for funding these interventions is the justification provided by CBA itself, but it need not be the only one. If longtermist justifications help us get to the place where all the catastrophe-preventing interventions that cl... (read more)

at other times you seem to trade on the idea that there is something democratically tainted about political advocacy on behalf of the people of the future — this is something I strongly reject.

I reject that too. We don’t mean to suggest that there is anything democratically tainted about that kind of advocacy. Indeed, we say that longtermists should advocate on behalf of future generations, in order to increase the present generation’s altruistic willingness to pay for benefits to future generations.

What we think would be democratically unacceptable is gov... (read more)

4
Toby_Ord
I'm not so sure about that. I agree with you that it would be normatively problematic in the paradigm case of a policy that imposed extreme costs on current society for very slight reduction in total existential risk — let's say, reducing incomes by 50% in order to lower risk by 1 part in 1 million. But I don't know that it is true in general.

First, consider a policy that was inefficient but small — e.g. one that cost $10 million to the US govt, but reduced the number of statistical lives lost in the US by only 0.1. I don't think I'd say that this was democratically unacceptable. Policies like this are enacted all the time in safety contexts and are often inefficient and ill-thought-out, and I'm not generally in favour of them, but I don't find them to be undemocratic. I suppose one could argue that all US policy that doesn't pass a CBA is undemocratic (or democratically unacceptable), but that seems a stretch to me.

So I wonder whether it is correct to count our intuitions on the extreme example as counting against all policies that are inefficient in traditional CBA terms or just against those that impose severe costs.

Thanks, these comments are great! I'm planning to work through them later this week. 

I agree with pretty much all of your bulletpoints. With regards to the last one, we didn't mean to suggest that arguing for greater concern about existential risks is undemocratic. Instead, we meant to suggest that (in the world as it is today) it would be undemocratic for governments to implement policies that place heavy burdens on the present generation for the sake of small reductions in existential risk.

Thanks for the tip! Looking forward to reading your paper.

but surely to be authorized by wider consultation

What do you mean by this?

2
Matt Boyd
Thanks. I guess this relates to your point about democratically acceptable decisions of governments. If a government is choosing to neglect something (eg because its probability is low, or because they have political motivations for doing so, vested interests etc), then they should only do so if they have information suggesting the electorate has/would authorize this. Otherwise it is an undemocratic decision. 

Thanks for the comment!

There are clear moral objections against pursuing democratically unacceptable policies

What we mean with this sentence is that there are clear moral objections against governments pursuing [perhaps we should have said 'instituting'] democratically unacceptable policies. We don't mean to suggest that there's anything wrong with citizens advocating for policies that are currently democratically unacceptable with the aim of making them democratically acceptable.

8
Richard Y Chappell🔸
OK, thanks for clarifying! I guess there's a bit of ambiguity surrounding talk of "the goal of longtermists in the political sphere", so maybe worth distinguishing immediate policy goals that could be implemented right away, vs. external (e.g. "consciousness-raising") advocacy aimed at shifting values.

It's actually an interesting question when policymakers can reasonably go against public opinion. It doesn't seem necessarily objectionable (e.g. to push climate protection measures that most voters are too selfish or short-sighted to want to pay for). There's a reason we have representative rather than direct democracy. But the key thing about your definition of "democratically unacceptable" is that it specifies the policy could not possibly be maintained, which more naturally suggests a feasibility objection than a moral one, anyhow.

But I'm musing a bit far afield now. Thanks for the thought-provoking paper!

Great post!

For example, maybe, according to you, you’re an “all men are created equal” type. That is, you treat all men equally. Maybe you even write a fancy document about this, and this document gets involved in the founding of a country, or something.

There’s a thing philosophy can do, here, which is to notice that you still own slaves. Including: male slaves. And it can do that whole “implication” thing, about how, Socrates is a man, you treat all men equally, therefore you treat Socrates equally, except oh wait, you don’t, he’s your slave.

Charles Mills... (read more)

I think all of these objections would be excellent if I were arguing against this claim: 

  • Agents are rationally required to satisfy the VNM axioms.

But I’m arguing against this claim:

  • Sufficiently-advanced artificial agents will satisfy the VNM axioms.

And given that, I think your objections miss the mark. 

On your first point, I’m prepared to grant that agents have no reason to rule out option A- at node 2. All I need to claim is that advanced artificial agents might rule out option A- at node 2. And I think my argument makes that... (read more)

There's a complication here related to a point that Rohin makes: if we can only see an agent's decisions and we know nothing about its preferences, all behavior can be rationalized as EU maximization.

But suppose we set that complication aside. Suppose we know this about an agent's preferences: 

  • There is some option A such that the agent strictly prefers A+$1 to A.

Then we can observe violations of Completeness. Suppose that we first offer our agent a choice between A and some other option B, and that the agent chooses A. Then we give the agent the chance to ... (read more)

Nice point. The rough answer is 'Yes, but only once the agent has turned down a sufficiently wide array of options.' Depending on the details, that might never happen or only happen after a very long time. 

I've had a quick think about the more precise answer, and I think it is: 

  • The agent's preferences will be functionally complete once and only once it is the case that, for all pairs of options between which the agent has a preferential gap, the agent has turned down an option that is strictly preferred to one of the options in the pair.
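Here's a toy sketch of how the policy under discussion ("if I previously turned down some option X, I will not choose any option that I strictly disprefer to X") constrains later choices. The options and the strict-preference relation are hypothetical, and resolving preferential gaps by random choice is just one way of filling in the details:

```python
import random

# strictly_preferred[(x, y)] means the agent strictly prefers x to y.
# Pairs absent from the dict are preferential gaps (no strict preference either way).
strictly_preferred = {
    ("A+", "A"): True,   # a sweetened version of A is strictly preferred to A
    ("B+", "B"): True,   # likewise for B
}

def strictly_dispreferred(x, y):
    """True if the agent strictly prefers y to x."""
    return strictly_preferred.get((y, x), False)

turned_down = set()  # options the agent has previously rejected

def choose(menu):
    """Pick from `menu`, never an option strictly dispreferred to a rejected option."""
    permissible = [o for o in menu
                   if not any(strictly_dispreferred(o, x) for x in turned_down)]
    if not permissible:                       # forced choices fall back to the full menu
        permissible = list(menu)
    choice = random.choice(permissible)       # preferential gaps resolved arbitrarily
    turned_down.update(set(menu) - {choice})  # remember everything that was rejected
    return choice

# Example: offered {"A+", "B"}, the agent may choose "B" across a preferential gap,
# so "A+" joins turned_down. From then on "A" is filtered out of any later menu,
# since "A" is strictly dispreferred to the rejected "A+". As more options are
# turned down, more pairs get settled and behaviour looks increasingly complete.
```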
4
Nick_Anyos
I had a similar thought to Shiny. Am I correct that an agent following your suggested policy ("‘if I previously turned down some option X, I will not choose any option that I strictly disprefer to X.’ ") would never *appear* to violate completeness from the perspective of an observer that could only see their decisions and not their internal state? And assuming completeness is all we need to get to full utility maximization, does that mean an agent following your policy would act like a utility maximizer to an observer?