Econ PhD student at Oxford and research associate at the Global Priorities Institute. I'm slightly less ignorant about economic theory than about everything else.

To answer the first question, no, the argument doesn’t rely on SIA. Let me know if the following is helpful.

Suppose your prior (perhaps after studying plate tectonics and so on, but not after considering the length of time that’s passed without an extinction-inducing supervolcano) is that there’s probability “P(A)”=0.5 that the risk of an extinction-inducing supervolcano at the end of each year is 1/2 and probability “P(B)”=0.5 that the risk is 1/10. Suppose that the world lasts at least 1 year and at most 3 years regardless.

Let “A1” be the possible world in which the risk was 1/2 per year and we blew up at the end of year 1, “A2” be that in which the risk was 1/2 per year and we blew up at the end of year 2, and “A3” be that in which the risk was 1/2 per year and we never blew up, so that we got to exist for 3 years. Define B1, B2, B3 likewise for the risk=1/10 worlds.

Suppose there’s one “observer per year” before the extinction event and zero after, and let “Cnk”, with k<=n, be observer #k in world Cn (C can be A or B). So there are 12 possible observers: A11, A21, A22, A31, A32, A33, and likewise for the Bs.

If you are observer Cnk, your evidence is that you are observer #k. The question is what Pr(A|k) is; what probability you should assign to the annual risk being 1/2 given your evidence.

Any Bayesian, whether following SIA or SSA (or anything else), agrees that

**Pr(A|k) = Pr(k|A)Pr(A)/Pr(k),**

where Pr(.) is the credence an observer should have for an event according to a given anthropic principle. The anthropic principles disagree about the values of these credences, but here the disagreements cancel out. Note that we do not necessarily have Pr(A)=P(A): in particular, if the prior P(.) assigns equal probability to two worlds, SIA will recommend assigning higher credence Pr(.) to the one with more observers, e.g. by giving an answer of Pr(coin landed heads) = 1/3 in the Sleeping Beauty problem, where in this notation P(coin landed heads) = 1/2.

On SSA, your place among the observers is in effect generated first by randomizing among the worlds according to your prior and then by randomizing among the observers in the chosen world. So **Pr(A)=0.5**, and

Pr(1|A) = 1/2 + 1/4*1/2 + 1/4*1/3 = 17/24

(since Pr(n=1|A)=1/2, in which case k=1 for sure; Pr(n=2|A)=1/4, in which case k=1 with probability 1/2; and Pr(n=3|A)=1/4, in which case k=1 with probability 1/3);

**Pr(2|A)** = 1/4*1/2 + 1/4*1/3 = **5/24**; and

Pr(3|A) = 1/4*1/3 = 2/24.

For simplicity we can focus on the k=2 case, since that’s the case analogous to people like us, in the middle of an extended history. Going through the same calculation for the B worlds gives Pr(2|B) = 63/200, so **Pr(2)** = 0.5*5/24 + 0.5*63/200 = **157/600**.

So **Pr(A|2) = 125/314 ≈ 0.4.**
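As a check on the SSA arithmetic above, here is a minimal Python sketch using exact fractions. The helper names (`world_probs`, `ssa_pr_k_given_theory`, `ssa_posterior_A`) are made up for illustration; the priors, annual risks, and 3-year cap are the ones from the example.

```python
from fractions import Fraction as F

prior = {"A": F(1, 2), "B": F(1, 2)}   # P(A), P(B)
risk = {"A": F(1, 2), "B": F(1, 10)}   # annual extinction risk under each theory

def world_probs(r, max_years):
    """P(world lasts exactly n years | annual risk r), for n = 1..max_years."""
    probs, survive = {}, F(1)
    for n in range(1, max_years):
        probs[n] = survive * r         # extinction at the end of year n
        survive *= 1 - r
    probs[max_years] = survive         # the world ends at the cap regardless
    return probs

def ssa_pr_k_given_theory(k, r, max_years):
    """SSA: pick a world by its prior, then an observer uniformly within it."""
    return sum(p * F(1, n) for n, p in world_probs(r, max_years).items() if k <= n)

def ssa_posterior_A(k, max_years):
    """Pr(A|k) by Bayes' rule, with Pr(A) = P(A) as SSA recommends."""
    joint = {t: prior[t] * ssa_pr_k_given_theory(k, risk[t], max_years) for t in prior}
    return joint["A"] / sum(joint.values())

print(ssa_pr_k_given_theory(2, risk["A"], 3))  # 5/24
print(ssa_pr_k_given_theory(2, risk["B"], 3))  # 63/200
print(ssa_posterior_A(2, 3))                   # 125/314
```

Using `Fraction` rather than floats keeps the results directly comparable to the hand-computed values.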

On SIA, your place among the observers is generated by randomizing among the observers, giving proportionally more weight to observers in worlds with proportionally higher prior probability, so that the probability of being observer Cnk is

1/12*P(Cn) / [sum over possible observers, labeled “Dmj”, of (1/12*P(Dm))].

This works out to Pr(2|A) = 2/7 [6 possible observers given A, but the one in the n=1 world “counts for double” since that world is twice as likely as the n=2 or n=3 worlds a priori];

Pr(A) = 175/446 [less than 1/2 since there are fewer observers in expectation when the risk of early extinction is higher], and

Pr(2) = 140/446, so

**Pr(A|2) = 5/14 ≈ 0.36**.
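The SIA numbers can be checked the same way. This is a sketch under the reading that each of the 12 possible observers is weighted by the prior probability of the world it lives in; `world_probs` and `sia_weights` are illustrative names, not anything from the original argument.

```python
from fractions import Fraction as F

prior = {"A": F(1, 2), "B": F(1, 2)}   # P(A), P(B)
risk = {"A": F(1, 2), "B": F(1, 10)}   # annual extinction risk under each theory

def world_probs(r, max_years):
    """P(world lasts exactly n years | annual risk r), for n = 1..max_years."""
    probs, survive = {}, F(1)
    for n in range(1, max_years):
        probs[n] = survive * r         # extinction at the end of year n
        survive *= 1 - r
    probs[max_years] = survive         # the world ends at the cap regardless
    return probs

def sia_weights(max_years):
    """Unnormalized SIA weight of observer (theory, n, k): the prior of its world."""
    w = {}
    for t in prior:
        for n, p in world_probs(risk[t], max_years).items():
            for k in range(1, n + 1):
                w[(t, n, k)] = prior[t] * p
    return w

w = sia_weights(3)
total = sum(w.values())
pr_A = sum(v for (t, n, k), v in w.items() if t == "A") / total
pr_2 = sum(v for (t, n, k), v in w.items() if k == 2) / total
pr_A_and_2 = sum(v for (t, n, k), v in w.items() if t == "A" and k == 2) / total

print(len(w))             # 12
print(pr_A)               # 175/446
print(pr_2)               # 70/223, i.e. 140/446
print(pr_A_and_2 / pr_A)  # 2/7
print(pr_A_and_2 / pr_2)  # 5/14
```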

So in both cases you update on the fact that a supervolcano did not occur at the end of year 1, from assigning probability 0.5 to the event that the underlying risk is 1/2 to assigning some lower probability to this event.

But I said that the disagreements canceled out, and here it seems that they don’t cancel out! This is because the anthropic principles disagree about Pr(A|2) for a reason other than the evidence provided by the lack of a supervolcano at the end of year 1: namely the *possible existence of year 3*. How to update on the fact that you’re in year 2 when you “could have been” in year 3 gets into doomsday argument issues, which the principles do disagree on. I included year 3 in the example because I worried it might seem fishy to make the example all about a 2-period setting where, in period 2, the question is just “what *was* the underlying probability we would make it here”, with no bearing on what probability we should assign to making it to the next period. But since this is really the example that isolates the anthropic shadow consideration, observe that if we simplify things so that the world lasts at most 2 years (and there are 6 possible observers), SSA gives

Pr(2|A) = 1/4, Pr(A) = 1/2, Pr(2) = 7/20 -> Pr(A|2) = 5/14.

and SIA gives

Pr(2|A) = 1/3, Pr(A) = 15/34, Pr(2) = 14/34 -> Pr(A|2) = 5/14.
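To see the cancellation concretely, here is a short Python sketch of the 2-year case. It computes Pr(A|2) under SSA and under SIA, and compares both with an ordinary Bayesian update on "no eruption at the end of year 1". That last comparison is an addition for illustration, and all variable names are made up.

```python
from fractions import Fraction as F

prior = {"A": F(1, 2), "B": F(1, 2)}   # P(A), P(B)
risk = {"A": F(1, 2), "B": F(1, 10)}   # annual extinction risk under each theory

def world_probs(r):
    """Two-year cap: the world lasts 1 year with probability r, else 2 years."""
    return {1: r, 2: 1 - r}

# SSA: choose a world by its prior, then an observer uniformly within that world.
ssa_joint = {}
for t in prior:
    pr_2_given_t = sum(p * F(1, n) for n, p in world_probs(risk[t]).items() if n >= 2)
    ssa_joint[t] = prior[t] * pr_2_given_t
ssa_pr_2 = sum(ssa_joint.values())
ssa_post = ssa_joint["A"] / ssa_pr_2

# SIA: weight every possible observer by the prior of the world it lives in.
sia_w = {}
for t in prior:
    for n, p in world_probs(risk[t]).items():
        for k in range(1, n + 1):
            sia_w[(t, n, k)] = prior[t] * p
sia_total = sum(sia_w.values())
sia_pr_2 = sum(v for (t, n, k), v in sia_w.items() if k == 2) / sia_total
sia_A_and_2 = sum(v for (t, n, k), v in sia_w.items() if t == "A" and k == 2) / sia_total
sia_post = sia_A_and_2 / sia_pr_2

# Ordinary Bayesian update on surviving year 1, with no anthropic machinery.
plain = prior["A"] * (1 - risk["A"]) / sum(prior[t] * (1 - risk[t]) for t in prior)

print(ssa_post, sia_post, plain)  # 5/14 5/14 5/14
```

The intermediate credences differ between the two principles, but the posterior agrees with the plain update in both cases.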

____________________________

An anthropic principle that *would* assign a different value to Pr(A|2)--for the extreme case of sustaining the “anthropic shadow”, a principle that would assign Pr(A|2)=Pr(A)=1/2--would be one in which your place among the observers is generated by

- first randomizing among times k (say, assigning k=1 and k=2 equal probability);
- then over worlds with an observer alive at k, maintaining your prior of Pr(A)=1/2;
- [and then perhaps over observers at that time, but in this example there is only one].

This is more in the spirit of SSA than SIA, but it is not SSA, and I don't think anyone endorses it. SSA randomizes over worlds and then over observers within each world, so that observing that you’re late in time is indeed evidence that “most worlds last late”.

I also found this remarkably clear and definitive--a real update for me, to the point of coming with some actual relief! I'm afraid I wasn't aware of the existing posts by Toby Crisford and Jessica Taylor.

I suppose if there's a sociological fact here it's that EAs and people who are nerdy in similar sorts of ways, myself absolutely included, can be quick to assume a position is true because it sounds reasonable and other seemingly thoughtful people who have thought about the question more have endorsed it. I don't think this single-handedly demonstrates we're *too* quick; not everyone can dig into everything, so at least to some extent it makes sense to specialize and defer despite the fact this is bound to happen now and then.

Of course argument-checking is also something one can specialize in, and one thing about the EA community which I think is uncommon and great is hiring people like Teru to dig into its cultural background assumptions like this...

To my mind, the first point applies to whatever resources are used throughout the future, whether it’s just the earth or some larger part of the universe.

I agree that the number/importance of welfare subjects in the future is a crucial consideration for how much to do longtermist as opposed to other work. But when comparing longtermist interventions—say, splitting a budget between lowering the risk of the world ending and proportionally increasing the fraction of resources devoted to creating happy artificial minds—it would seem to me that the “size of the future” typically multiplies the value of both interventions equally, and so doesn’t matter.

Ok--at Toby's encouragement, here are my thoughts:

This is a very old point, but to my mind, at least from a utilitarian perspective, the main reason it's worth working on promoting AI welfare is the risk of foregone upside. I.e. without actively studying what constitutes AI welfare and advocating for producing it, we seem likely to have a future that's very comfortable for ourselves and our descendants--fully automated luxury space communism, if you like--but which contains a very small proportion of the value that could have been created by creating lots of happy artificial minds. So concern for creating AI welfare seems likely to be the most important way in which utilitarian and human-common-sense moral recommendations differ.

It seems to me that the amount of value we could create if we really optimized for total AI welfare is probably greater than the amount of disvalue we'll create if we just use AI tools and allow for suffering machines by accident, since in the latter case the suffering would be a byproduct, not something anyone optimizes for.

But AI welfare work (especially if this includes moral advocacy) just for the sake of avoiding this downside also seems valuable enough to be worth a lot of effort on its own, even if suffering AI tools are a long way off. The animal analogy seems relevant: it's hard to replace factory farming once people have started eating a lot of meat, but in India, where Hinduism has discouraged meat consumption for a long time, less meat is consumed and so factory farming is evidently less widespread.

So in combination, I expect AI welfare work of some kind or another is probably very important. I have almost no idea what the best interventions would be or how cost-effective they would be, so I have no opinion on exactly how much work should go into them. I expect no one really knows at this point. But at face value the topic seems important enough to warrant at least doing exploratory work until we have a better sense of what can be done and how cost-effective it could be, only stopping in the (I think unlikely) event that we can say with some confidence that the best AI welfare work to be done is worse than the best work that can be done in other areas.

The point that it's better to save people with better lives than people with worse lives, all else equal, does make sense (at least from a utilitarian perspective). So you're right that [$ / lives saved] is not a perfect approach. I do think it's worth acknowledging this...!

But the right correction isn't to use VSLs. The way I'd put it is: a person's VSL--assuming it's been ideally calculated for each individual, putting aside issues about how governments estimate it in practice--is how many dollars they value as much as slightly lowering their chance of death. So the fact that VSLs differ across people mixes together two things: a rich person might have a higher VSL than a poor person (1) because the rich person values their life more, or (2) because the rich person values a dollar less. The first thing is right to correct for (from a utilitarian perspective), but as other commenters have noted, the second isn't.

My guess is that the second factor baked into the VSL is bigger in most real-world comparisons we might want to make, so that it's less of a mistake to just try to minimize [$ / lives saved] than to try to minimize [$ / (lives saved * VSL)].
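A toy illustration of the two factors, in Python. It assumes the simplification that a person's VSL is roughly (how much they value their life, in utils) divided by (how much they value a dollar, in utils per dollar). All numbers and field names are hypothetical and chosen only to make the decomposition visible.

```python
from fractions import Fraction as F

# Hypothetical people: the rich person values life 20% more,
# but values a marginal dollar 100 times less.
people = {
    "rich": {"life_value_utils": F(6, 5), "utils_per_dollar": F(1, 1000)},
    "poor": {"life_value_utils": F(1), "utils_per_dollar": F(1, 10)},
}

def vsl(person):
    """Dollars valued as much as one's life, under the simplification above."""
    return person["life_value_utils"] / person["utils_per_dollar"]

vsl_ratio = vsl(people["rich"]) / vsl(people["poor"])
welfare_ratio = people["rich"]["life_value_utils"] / people["poor"]["life_value_utils"]

print(vsl_ratio, welfare_ratio)  # 120 6/5
```

With these made-up numbers, weighting lives by VSL would favour the rich person 120 to 1, while the welfare difference that a utilitarian would want to correct for is only 6 to 5. The rest of the gap comes entirely from the difference in how much each values a dollar.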

I don't follow--are you saying that (i) AI safety efforts so far have obviously not actually accomplished much risk-reduction, (ii) that this is largely for risk compensation reasons, and (iii) that this is worth emphasizing in order to prevent us from carrying on the same mistakes?

If so, I agree that if (i)-(ii) are true then (iii) seems right, but I'm not sure about (i) and (ii). But if you're just saying that it would be good to know whether (i)-(ii) are true because if they are then it would be good to do (iii), I agree.

Good to hear, thanks!

I’ve just edited the intro to say: it’s not obvious to me one way or the other whether it's a big deal in the AI risk case. I don't think I know enough about the AI risk case (or any other case) to have much of an opinion, and I certainly don't think anything here is specific enough to come to a conclusion in any case. My hope is just that something here makes it easier for people who do know about particular cases to get started thinking through the problem.

If I have to make a guess about the AI risk case, I'd emphasize my conjecture near the end, just before the "takeaways" section, namely that (as you suggest) there currently isn't a ton of restraint, so (b) mostly fails, but that this has a good chance of changing in the future:

Today, while even the most advanced AI systems are neither very capable nor very dangerous, safety concerns are not constraining much below . If technological advances unlock the ability to develop systems which offer utopia if their deployment is successful, but which pose large risks, then the developer’s choice of at any given is more likely to be far below , and the risk compensation induced by increasing is therefore more likely to be strong.

If lots/most of AI safety work (beyond evals) is currently acting more "like evals" than like pure "increases to S", great to hear--concern about risk compensation can just be an argument for making sure it stays that way!

I’m not sure I understand the second question. I would have thought both updates are in the same direction: the fact that we’ve survived on Earth a long time tells us that this is a planet hospitable to life, both in terms of its life-friendly atmosphere/etc and in terms of the rarity of supervolcanoes.

We can say, on anthropic grounds, that it would be confused to think other planets are hospitable on the basis of Earth’s long and growing track record. But as time goes on, we get more evidence that we really are on a life-friendly planet, and haven’t just had a long string of luck on a life-hostile planet.

The anthropic shadow argument was an argument along the lines of, “no, we shouldn’t get ever more convinced we’re on a life-friendly planet over time (just on the evidence that we’re still around). It *is* actually plausible that we’ve just had a lucky streak that’s about to break, and this lack of update is in some way because no one is around to observe anything in the worlds that blow up”.