All of tobycrisford's Comments + Replies

Against Anthropic Shadow

I think that makes sense!

There is another independent aspect to anthropic reasoning too, which is how you assign probabilities to 'indexical' facts. This is the part of anthropic reasoning I always thought was more contentious. For example, if two people are created, one with red hair and one with blue hair, and you are one of these people, what is the probability that you have red hair (before you look in the mirror)? We are supposed to use the 'Self-Sampling Assumption' here, and say the answer is 1/2, but if you just naively apply that rule too widely t... (read more)

Against Anthropic Shadow

I think that's a good summary of where our disagreement lies. I think that your "sample worlds until the sky turns out blue" methodology for generating a sample is very different to the  existence/non-existence case, especially if there is actually only one world! If there are many worlds, it's more similar, and this is why I think anthropic shadow has more of a chance of working in that case (that was my 'Possible Solution #2').

I find it very interesting that your intuition on the Russian roulette is the other way round to mine. So if there are two g... (read more)

2Jonas Moss23d
Here's a rough sketch of how we could, potentially, think about anthropic problems. Let P_t be a sequence of true, bird's-eye view probability measures and Q_t your own measures, trying to mimic P_t as closely as possible. These measures aren't defined on the same sigma-algebra. The sequence of true measures is defined on some original sigma-algebra Σ, but your measure is defined only on the sigma-algebra {A ∩ {ω : the sky is blue at time t}}. Now, the best-known probability measure defined on this set is the conditional probability Q_t(A) = P_t(A | the sky is blue at time t). This is, in a sense, the probability measure that most closely mimics P_t. On the other hand, the measure that mimics P_t most closely is Q_t(A) = P_t(A), hands down. This measure has a problem though, namely that max Q_t(A) < 1, hence it isn't a probability measure anymore. I think the main reason why I intuitively want to condition on the color of the sky is that I want to work with proper probability measures, not just measures bounded by 0 and 1. (That's why I'm talking about, e.g., being "uncomfortable pretending we could have observed non-existence".) But your end goal is to have the best measure on the data you can actually observe, taking into account possibilities you can't observe. This naturally leads us to Q_t(A) = P_t(A) instead of Q_t(A) = P_t(A | the sky is blue at time t).
2Jonas Moss23d
The roulette example might get to the heart of the problem with the worm's-eye view! From the worm's-eye view, the sky will always be blue, so P(sky color = green) = 0, making it impossible to deal with problems where the sky might turn green in the future. In the roulette example, we're effectively dealing with an expected utility problem where we condition on existence when learning about the probability, but not when we act. That looks incoherent to me; we can't condition and uncondition on an event willy-nilly: either we live in a world where the event holds, or we don't. So yeah, it seems like you're right, and we're effectively treating existence as a certainty when looking at the problem from the worm's-eye view. As I see it, this strongly suggests we should take the bird's-eye view, as you proposed, and not the worm's-eye view. Or something else entirely; I'm still uncomfortable pretending we could have observed non-existence.
Against Anthropic Shadow

Thank you for your comment! I agree with you that the difference between the bird's-eye view and the worm's eye view is very important, and certainly has the potential to explain why the extinction case is not the same as the blue/green sky case. It is this distinction that I was referring to in the post when asking whether the 'anthropicness' of the extinction case could explain why the two arguments should be treated differently.

But I'm not sure I agree that you are handling the worm's-eye case in the correct way. I could be wrong, but I think the explan... (read more)

1Jonas Moss24d
I wouldn't say you treat existence as certainty, as you could certainly be dead, but you have to condition on it when you're alive. You have to condition on it since you will never find yourself outside the space of existence (or blue skies!) in anthropic problems. And that's the purpose / meaning of conditioning; you restrict your probability space to the subset of basic events you can possibly see. Then again, there might be nothing very special about existence here. Let's revisit the green sky problem again, but consider it from a slightly different point of view. Instead of living in the world with a blue or a green sky, imagine yourself living outside of that whole universe. I promise to give you a sample of a world, with registered catastrophes and all, but I will not show you a world with a green sky (i.e., I will sample worlds until the sky turns out blue). In this case, the math is clear. You should condition on the sky being blue. Is there a relevant difference between the existence scenario and this scenario? Maybe there is? You are not guaranteed to see a world at all in the "existence" scenario, as you will not exist if the world turns out to be a green-sky world, but you are guaranteed an observation in the "outside view" scenario. Does this matter though? I don't think it does, as you can't do an analysis either way if you're dead, but I might be wrong. Maybe this is where our disagreement lies? I don't find the Russian roulette objection persuasive at all. Intuition shouldn't be trusted in probability, as e.g. the Monty Hall experiment tells us, and least of all in confusing anthropic problems. We should focus on getting the definitions, concepts, and math right without stopping to think about how intuitive different solutions are. (By the way, I don't even find the Russian roulette experiment weird or counterintuitive. I find it intuitive and obvious. Strange? Maybe not. Philosophical intuitions aren't as widely shared as one w
Don’t Be Comforted by Failed Apocalypses

I can see that is a difference between the two cases. What I'm struggling to understand is why that leads to a different answer.

My understanding of the steps of the anthropic shadow argument (possibly flawed or incomplete) is something like this:

You are an observer -> We should expect observers to underestimate the frequency of catastrophic events on average, if they use the frequency of catastrophic events in their past -> You should revise your estimate of the frequency of catastrophic events upwards

But in the coin/tile case you could make an exact... (read more)

Don’t Be Comforted by Failed Apocalypses

In the tile case, the observers who see a blue tile are underestimating on average. If you see a blue tile, you then know that you belong to that group, who are underestimating on average. But that still should not change your estimate. That's weird and unintuitive, but true in the coin/tile case (unless I've got the maths badly wrong somewhere).

I get that there is a difference in the anthropic case. If you kill everyone with a red tile, then you're right, the observers on average will be biased, because it's only the observers with a blue tile who are lef... (read more)

1ColdButtonIssues1mo
No, because it's possible to observe either a blue tile or a red tile. But you either observe things (alive) or don't observe things (not alive). In the first situation, the observer knows multiple facts about the world could be observed. Not so in the second case.
Don’t Be Comforted by Failed Apocalypses

Thanks for your reply!

If 100 people do the experiment, the ones who end up with a blue tile will, on average, have fewer heads than they should, for exactly the same reason that most observers will live after comparatively fewer catastrophic events.

But in the coin case that still does not mean that seeing a blue tile should make you revise your naive estimate upwards. The naive estimate is still, in bayesian terms, the correct one.

I don't understand why the anthropic case is different.
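
To make the coin/tile point concrete, here is a quick Monte Carlo sketch. It is my own toy version, so the exact tile rule is an assumption: I take the tile to be painted blue exactly when the final flip lands tails, and red otherwise (blue playing the role of "still alive"). Part (a) reproduces the selection effect on the group average; part (b) is the Bayesian point that seeing the blue tile should not move your estimate beyond what the flips themselves already imply.

```python
import numpy as np

rng = np.random.default_rng(0)
n_worlds, n_flips = 200_000, 10

# (a) The selection effect: with a fixed true q, blue-tile observers
# do see fewer heads than q on average.
q_true = 0.2
flips = rng.random((n_worlds, n_flips)) < q_true       # True = heads ("catastrophe")
blue = ~flips[:, -1]                                    # blue tile iff final flip is tails
print(flips.mean(), flips[blue].mean())                 # ~0.20 vs ~0.18

# (b) But no upward revision is warranted: draw q from a uniform prior, and
# among blue-tile observers who saw k heads, the average true q still matches
# the ordinary posterior mean (k+1)/(n+2), because the tile colour is a
# function of the flips and adds no extra information.
q = rng.random(n_worlds)
flips = rng.random((n_worlds, n_flips)) < q[:, None]
blue = ~flips[:, -1]
k = flips.sum(axis=1)
for heads in range(6):
    sel = blue & (k == heads)
    print(heads, round(q[sel].mean(), 3), round((heads + 1) / (n_flips + 2), 3))
```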

1ColdButtonIssues1mo
In the tile case, the observers on average will be correct. Some will get too many heads, some too few. But the observers on average will be correct. You won't know whether you should adjust your personal estimate. In the anthropic case, the observers on average will see zero apocalypses no matter how common apocalypses are. Imagine if in the tile case, everyone who was about to get more heads than average was killed by an assassin and the assassin told you what they were doing. Then when you did the experiment and lived, you would know your estimate was biased.
Don’t Be Comforted by Failed Apocalypses

I've never understood the bayesian logic of the anthropic shadow argument. I actually posted a question about this on the EA forum before, and didn't get a good answer. I'd appreciate it if someone could help me figure out what I'm missing. When I write down the causal diagram for this situation, I can't see how an anthropic shadow effect could be possible.

Section 2 of the linked paper shows that the probability of a catastrophic event having occurred in some time frame in the past, given that we exist now, P(B_2|E), is smaller than its actual probability o... (read more)
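
For anyone who wants that inequality in a few lines, here is a toy calculation. The specific probabilities are made-up illustrative numbers of my own, not the paper's; B_2 and E are the paper's notation as quoted above.

```python
# P(B2) = prior probability a catastrophe occurred in the window;
# E = "observers exist now". Survival is less likely given a catastrophe.
p_B2 = 0.3
p_E_given_B2, p_E_given_not_B2 = 0.2, 0.9

p_E = p_E_given_B2 * p_B2 + p_E_given_not_B2 * (1 - p_B2)
p_B2_given_E = p_E_given_B2 * p_B2 / p_E
print(p_B2_given_E)   # ~0.087, smaller than the prior of 0.3
```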

1ColdButtonIssues1mo
Hi Toby, Can't we imagine 100 people doing that experiment? People will get different results - some more heads than they "should" and some fewer heads than they "should." But the sample means will cluster around the real rate of heads. So any observer won't know if their result has too many heads or too few. So they go with their naive estimate. With apocalypses, you know by definition you're one of the observers that wasn't wiped out. So I do think this reasoning works. If I'm wrong or my explanation makes no sense, please let me know!
Is the reasoning of the Repugnant Conclusion valid?

"I would say exactly the same for this. If these people are being freshly created, then I don't see the harm in treating them as identical."

I think you missed my point. How can 1,000 people be identical to 2,000 people? Let me give a more concrete example. Suppose again we have 3 possible outcomes:

(A) (Status quo): 1 person exists at high welfare +X

(B): Original person has welfare reduced to X - 2, 1000 new people are created at welfare +X

(C): Original person has welfare reduced only to X - ε, 2000 new people are created, 1000 at welfare ε, and ... (read more)

1Adithya2mo
"Let me give a more concrete example." Ah, I understand now. Certainly then there is ambiguity that needs to be sorted out. I'd like to say again that this is not something the original theory was designed to handle. Everything I've been saying in these comments is off the cuff rather than premeditated - it's not surprising that there are flaws in the fixes I've suggested. It's certainly not surprising that the ad hoc fixes don't solve every conceivable problem. And again, it would appear to me that there are plenty of plausible solutions. I guess really that I just need to spend some time evaluating which would be best and then tidy it up in a new post. "No, they wouldn't, because the people in (B) are different to the people in (C). You can assert that you treat them the same, but you can't assert that they are the same. The (B) scenario with different people and the (B) scenario with the same people are both distinct, possible, outcomes, and your theory needs to handle them both. It can give the same answer to both, that's fine, but part of the set up of my hypothetical scenario is that the people are different." Then yes, as I did say in the rather lengthy explanation I gave: "The route of disappearing 1000 people and replacing them with 1000 new people is one of the worse routes." If you insist that we must get rid of 1000 people and replace them with 1000 different people, then sure, (B) is worse than (C). So now I will remind myself what your objection regarding this was in an earlier comment. I'll try explaining again briefly. With this theory, don't think of the (B),(C) etc. as populations but rather as "distributions" the status quo population could take. Thus, as I said: "(B) is a hypothetical which may be achieved by any route. Whether the resulting people of (B) in the hypothetical are real or imaginary depends on which route you take." When a population is not the status quo, it is simply representing a population distribution that you can get
The COILS Framework for Decision Analysis: A Shortened Intro+Pitch

Where would unintended consequences fit into this?

E.g. if someone says:

"This plan would cause X, which is good. (Co) X would not occur without this plan, (I) We will be able to carry out the plan by doing Y, (L) the plan will cause X to occur, and (S) X is morally good."

And I reply:

"This plan will also cause Z, which is morally bad, and outweights the benefit of X"

Which of the 4 categories of claim am I attacking? Is it 'implementation'?

2Harrison Durland2mo
"This plan will also cause Z, which is morally bad" is its own disadvantage/con. "... and outweighs the benefit of X" relates to the caveat listed in footnote 3 [https://forum.effectivealtruism.org/posts/gwQNdY6Pzr6DF9HKK/the-coils-framework-for-decision-analysis-a-shortened-intro?commentId=BiWxDn7ToSNP6mEuD#fnxm85gtgbcvm] : you are no longer attacking/challenging the advantage itself ("this plan causes X"), but rather just redirecting towards a disadvantage. (Unless you are claiming something like "the benefits of X are not as strong as you suggested," in which case you're attacking it on significance.)
Is the reasoning of the Repugnant Conclusion valid?

You can assert that you consider the 1000 people in (B) and (C) to be identical, for the purposes of applying your theory. That does avoid the non-identity problem in this case. But the fact is that they are not the same people. They have different hopes, dreams, personalities, memories, genders, etc.

By treating these different people as equivalent, your theory has become more impersonal.  This means you can no longer appeal to one of the main arguments you gave to support it: that your recommendations always align with the answer you'd get if you ask... (read more)

1Adithya2mo
"You can assert that you consider the 1000 people in (B) and (C) to be identical, for the purposes of applying your theory. That does avoid the non-identity problem in this case. But the fact is that they are not the same people. They have different hopes, dreams, personalities, memories, genders, etc." But you stated that they don't exist yet (that they are "created"). Thus, we have no empirical knowledge of their hopes and dreams, so the most sensible prior seems to be that they are all identical. I apologise if I am coming across as obtuse, but I really do not see how non-identity causes issues here. "The people in (B) would not want to move to (C), and vice versa, because that would mean they no longer exist." Sorry, but this is quite incorrect. The people in (C) would want to move to (B). Bear in mind that when we are evaluating this decision, we now set (C) as the status quo. So the 1000 people at welfareεare considered to be wholly real. If you stipulate that in going to (B), these 1000 people are to be eradicated then replaced with (imaginary) people at high welfare, then naturally the people of (C) should say no. However, if you instead take the more reasonable route of getting from (C) to (B) via raising the real welfare of the 1000 and slightly reducing the welfare of one person, then clearly (B) is better than (C). I think I realise what the issue may be here. When I say "going from (C) to (B)" or similar, I do not mean that (C),(B) are existent populations and (C) is suddenly becoming (B). That way, we certainly do run into issues of non-identity. Rather, (C) is a status quo and (B) is a hypothetical which may be achieved by any route. Whether the resulting people of (B) in the hypothetical are real or imaginary depends on which route you take. Naturally, the best routes involve eradicating as few real people as possible. In this instance, we can get from (C) to (B) without getting rid of anyone. The route of disappearing 1000 people and replacing
Is the reasoning of the Repugnant Conclusion valid?

"We minimise our loss of welfare according to the methodology and pick B, the 'least worst' option."

But (B) doesn't minimise our loss of welfare. In B we have welfare X - 2, and in C we have welfare X - ε, so wouldn't your methodology tell us to pick (C)? And this is intuitively clearly wrong in this case. It's telling us not to make a negligible sacrifice to our welfare now in order to improve the lives of future generations, which is the same problematic conclusion that the non-identity problem gives to certain theories of population ethics.

I'm interes... (read more)

1Adithya2mo
Sorry, I misread (B) and (C). You are correct that, as written in the post, (C) would then be the better choice. However, continuing with what I meant to imply when I realised this was a forced decision, we can note that whichever of (B),(C) is picked, 1000 people will come into existence with certainty. Thus, in this case, I would argue they are effectively real. This is contrasted with the case in which the decision is not forced -- then, there are no 1000 new people necessarily coming into existence, and as you correctly interpreted, the status quo is preferable (since the status quo (A) is actually an option this time). Regarding non-identity, I would consider these 1000 new people in either (B),(C) to be identical. I am not entirely sure how non-identity is an issue here. I am still not quite sure what you mean by uncertainty, but I feel that the above patches up (or more accurately, correctly generalises) the model at least with regards to the example you gave. I'll try to think of counterexamples myself. By the way, this would also be my answer to Parfit's "depletion" problem, which I briefly glanced over. There is no way to stop hundreds of millions of people continuing to come into existence without dramatically reducing welfare (a few nuclear blasts might stop population growth but at quite a cost to welfare). Thus, these people are effectively real. Hence, if the current generation depleted everything, this would necessarily cause a massive loss of welfare to a population which may not exist yet but is nevertheless effectively real. So we shouldn't do that. (That doesn't rule out a 'slower depletion', but I think that's fine.)
Is the reasoning of the Repugnant Conclusion valid?

I understood your rejection of the total ordering on populations, and as I say, this is an idea that others have tried to apply to this problem before.

But the approach others have tried to take is to use the lack of a precise "better than" relation to evade the logic of the repugnant conclusion arguments, while still ultimately concluding that population Z is worse than population A. If you only conclude that Z is not worse than A, and A is not worse than Z (i.e. we should be indifferent about taking actions which transform us from world A to world Z), the... (read more)

1Adithya2mo
"Or are you saying that your theory tells us not to transform ourselves to world Z? Because we should only ever do anything that will make things actually better?" Yes - and the previous description you gave is not what I intended. "If so, how would your approach handle uncertainty? What probability of a world Z should we be willing to risk in order to improve a small amount of real welfare?" This is a reasonable question, but I do not think this is a major issue so I will not necessarily answer it now. "And there's another way in which your approach still contains some form of the repugnant conclusion. If a population stopped dealing in hypotheticals and actually started taking actions, so that these imaginary people became real, then you could imagine a population going through all the steps of the repugnant conclusion argument process, thinking they were making improvements on the status quo each time, and finding themselves ultimately ending up at Z. In fact it can happen in just two steps, if the population of B is made large enough, with small enough welfare." This is true, and I noticed this myself. However, actually, this comes from the assumption that more net imaginary welfare is always a good thing, which was one of the "WLOG" assumptions I made not needed for the refutation of the Repugnant Conclusion. If we instead take an averaging or more egalitarian approach with imaginary welfare, I think the problem doesn't have to appear. For instance, suppose we now stipulate that any decision (given the constraints on real welfare) that has average welfare for the imaginary population at least equal to the average of the real population is better than any decision without this property, then the problem is gone. (Remark: we still do need the real/imaginary divide here to avoid the Repugnant Conclusion.) This may seem rather ad hoc, and it is, but it could be framed as, "A priority is that the average welfare of future populations is at least as good as it
Is the reasoning of the Repugnant Conclusion valid?

It sounds like I have misunderstood how to apply your methodology. I would like to understand it though. How would it apply to the following case?

Status quo (A): 1 person exists at very high welfare +X

Possible new situation (B): Original person has welfare reduced to X - 2, 1000 people are created with very high welfare +X

Possible new situation (C): Original person has welfare X - ε, 1000 people are created with small positive welfare ε.

I'd like to understand how your theory would answer two cases: (1) We get to choose between all of A,B,C... (read more)

1Adithya2mo
"From your reply it sounds like you're coming up with a different answer when comparing (B) to (C), because both ways round the 1000 people are always considered imaginary, as they don't literally exist in the status quo? Is that right?" If the status quo is A, then with my methodology, you cannot compare B and C directly, and I don't think this is a problem. As I said previously, "... in particular, if you are a member of A, it's not relevant that the population of Z disagree which is better". Similarly, I don't think it's necessary that the people of A can compare B and C directly. The issue is that some of your comparisons do not have (A) as the status quo. To fully clarify, if you are a member of X (or equivalently, X is your status quo), then you can only consider comparisons between X and other populations. You might find that B is better than X and C is not better than X. Even then, you could not objectively say B is better than C because you are working from your subjective viewpoint as a member of X. In my methodology, there is no "objective ordering" (which is what I perhaps inaccurately was referring to as a total ordering). Thus, "(A) is not better than (B) or (C) because to change (B) or (C) to (A) would cause 1000 people to disappear (which is a lot of negative real welfare)." is true if you take the status quos to be (B),(C) respectively - but this is not our status quo. (Similarly for the third bullet point.) "Neither (B) nor (C) are better than (A), because an instantaneous change from (A) to (B) or (C) would reduce real welfare (of the one already existing person)." This is true from our viewpoint as a member of A. Hence, if we are forced to go from A to one of B or C, then it's always a bad thing. We minimise our loss of welfare according to the methodology and pick B, the 'least worst' option.
Is the reasoning of the Repugnant Conclusion valid?

P.S. Thinking about this a bit more, doesn't this approach fail to give sensible answers to the non-identity problem as well? Almost all decisions we make about the future will change not just the welfare of future people, but which future people exist. That means every decision you could take will reduce real welfare, and so under this approach no decision can be better than any other, which seems like a problem!

1Adithya2mo
I'm afraid I don't understand this. In my framework, future people are imaginary, whether they are expected to come into existence with the status quo or not. Thus, they only contribute (negative) real welfare if they are brought into the world with negative welfare. I don't see why this would be true for almost all decisions, let alone most. It seems to me that the non-identity problem is completely independent of this. Either way, I am treating all future, or more generally imaginary, people as identical. "That means every decision you could take will reduce real welfare, and so under this approach no decision can be be better than any other, which seems like a problem!" As I say, I believe the premise to be false (I may have of course misunderstood), but nevertheless, in this case, you would take the decision that minimises the loss of real welfare and maximises imaginary welfare (I'll assume the infima and suprema are part of the decision space, since this isn't analysis). Then, such a decision is better than every other. I don't understand why the premise would lead to no decision being better than any other.
Is the reasoning of the Repugnant Conclusion valid?

This is an interesting approach. The idea that we can avoid the repugnant conclusion by saying that B is not better than A, and neither is A better than B, is I think similar to how Parfit himself thought we might be able to avoid the repugnant conclusion: https://onlinelibrary.wiley.com/doi/epdf/10.1111/theo.12097

He used the term "evaluative imprecision" to describe this. Here's a quote from the paper:

"Precisely equal is a transitive relation. If X and Y are precisely equally good, and Y and Z are precisely equally good, X and Z must be precisely equally ... (read more)

1Adithya2mo
Hi, Toby. One of the core arguments here, which perhaps I didn't fully illuminate, is that (I believe that) the "better than" operation is fundamentally unsuitable for population ethics. If you are a member of A, then Z is not better than A. So Z is worse than A but only if you are a member of A. If you are a member of Z, you will find that A is not better than Z. So, sure, A is worse than Z, but only if you are a member of Z. In other words, my population ethics depends on the population. In particular, if you are a member of A, it's not relevant that the population of Z disagree which is better. Indeed, why would you care? The fallacy that every argument re: Repugnant Conclusion commits is assuming that we require a total ordering of populations' goodness. This is a complete red herring. We don't. Doesn't it suffice to know what is best for your particular population? Isn't that the purpose of population ethics? I argue that the Repugnant Conclusion is merely the result of an unjustified fixation on a meaningless mathematical ideal (total ordering). I said something similar in my reply to Max Daniel's comment; I am not sure if I phrased it better here or there. If this idea was not clear in the post (or even in this reply), I would appreciate any feedback on how to make it more apparent.
7tobycrisford2mo
P.S. Thinking about this a bit more, doesn't this approach fail to give sensible answers to the non-identity problem as well? Almost all decisions we make about the future will change not just the welfare of future people, but which future people exist. That means every decision you could take will reduce real welfare, and so under this approach no decision can be better than any other, which seems like a problem!
An uncomfortable thought experiment for anti-speciesist non-vegans

I think this point of view makes a lot of sense, and is the most reasonable way an anti-speciesist can defend not being fully vegan.

But I'd be interested to hear more about what the very strong 'instrumental' reasons are for humans not subjugating humans, and why they don't apply to humans subjugating non-humans?

(Edit: I'm vegan, but my stance on it has softened a bit since being won round by the total utilitarian view)

How does the simulation hypothesis deal with the 'problem of the dust'?

This is a very interesting and weird problem. It feels like the solution should have something to do with the computational complexity of the mapping? E.g. is it a mapping that could be calculated in polynomial or exponential time? If the mapping function is as expensive to compute as just simulating the brain in the first place, then the dust hasn't really done any of the computational work.

Another way of looking at this: if you do take the dust argument seriously, why do you even need the dust at all? The mapping from dust to mental states exists in the ... (read more)

3Eli Rose4mo
Hmm. Thanks for the example of the "pure time" mapping of t --> mental states. It's an interesting one. It reminds me of Max Tegmark's mathematical universe hypothesis [https://en.wikipedia.org/wiki/Mathematical_universe_hypothesis] at "level 4," where, as far as I understand, all possible mathematical structures are taken to "exist" equally. This isn't my current view, in part because I'm not sure what it would mean to believe this. I think the physical dust mapping is meaningfully different from the "pure time" mapping. The dust mapping could be defined by the relationships between dust specks. E.g. at each time t, I identify each possible pairing of dust specks with a different neuron in George Soros's brain, then say "at time t+1, if a pair of dust specks is farther apart than it was at time t, the associated neuron fires; if a pair is closer together, the associated neuron does not fire." This could conceivably fail if there's not enough pairs of dust specks in the universe to make the numbers work out. The "pure time" mapping could never fail to work; it would work (I think) even in an empty universe containing no dust specks. So it feels less grounded, and like an extra leap. ... I agree that it seems like there's something around "how complex is the mapping." I think what we care about is the complexity of the description of the mapping, though, rather than the computational complexity. I think George Soros mapping is pretty quick to compute once defined? All the work seems hidden in the definition — how do I know which pairs of dust specks should correspond to which neurons?

"deciding, based on reason, that Exposure A is certain to have no effect on Outcome X, and then repeatedly running RCTs for the effect of exposure A on Outcome X to obtain a range of p values"

If the p-values have been calculated correctly and you run enough RCTs, then we already know what the outcome of this experiment will be: p<0.05 will occur 5% of the time, p<0.01 will occur 1% of the time, and so on for all values of p between 0 and 1.

The other way round is more interesting: it will tell you what the "power" of your test was (https://en.wikipedia.org/... (read more)
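
A minimal simulation of that first point, under assumptions of my own choosing: a simple two-arm trial analysed with a t-test, where the exposure genuinely does nothing, repeated many times.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n_per_arm = 10_000, 50

# Each trial: control and treated outcomes drawn from the same distribution,
# i.e. the null hypothesis is true by construction.
pvals = np.array([
    stats.ttest_ind(rng.normal(size=n_per_arm),
                    rng.normal(size=n_per_arm)).pvalue
    for _ in range(n_trials)
])

print((pvals < 0.05).mean())  # ~0.05
print((pvals < 0.01).mean())  # ~0.01
```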

1Matt_Sharp6mo
"The one you should use depends on context. It should depend on how much you care about false positives vs false negatives in that particular case" Yep, exactly! Assume you're a doctor, have a bunch of patients with a disease that is definitely going to kill them tomorrow, and there is a new, very low-cost, possible cure. Even if there's only one study of this possible cure showing a p-value of 0.2, you really should still recommend it!
Reasons and Persons: Watch theories eat themselves

From my memory of Reasons+Persons, Parfit does say that common-sense morality being collectively directly self-defeating refutes common-sense morality, but he doesn't think that consequentialism being indirectly self-defeating refutes consequentialism. This is because it isn't an aim of consequentialism that people have consequentialist temperaments, or even that they believe in consequentialism, and because any theory will be indirectly self-defeating in some circumstances (the Satan thought experiment proves that).

I really like this summary, but just wan... (read more)

Saving Average Utilitarianism from Tarsney - Self-Indication Assumption cancels solipsistic swamping.

I think this is a really interesting observation.

But I don't think it's fair to say that average utilitarianism  "avoids the repugnant conclusion".

If the world contains only a million individuals whose lives are worse than not existing (-100 utils each), and you are considering between two options: (i) creating a million new individuals who are very happy (50 utils each) or (ii) creating N new individuals whose lives are barely worth living (x utils each), then for any x, however small, there is some N where (ii) is preferred, even under average utili... (read more)
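
A few lines of arithmetic make the point. The numbers are just the ones from my example, with x = 0.01 as an arbitrary "barely worth living" level:

```python
def average_utility(groups):
    # groups: list of (number of people, utils per person) pairs
    return sum(n * u for n, u in groups) / sum(n for n, _ in groups)

existing = (1_000_000, -100)                      # lives worse than not existing

option_i = average_utility([existing, (1_000_000, 50)])
print(option_i)                                   # -25.0

x = 0.01                                          # lives barely worth living
for N in (1_000_000, 10_000_000, 1_000_000_000):
    print(N, average_utility([existing, (N, x)]))
# the average tends towards x as N grows, so for large enough N
# option (ii) beats option (i) under average utilitarianism
```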

1wuschel1y
I think you are correct, that there are RC-like problems that AU faces (like the ones you describe), but the original RC (for any population leading happy lives, there is a bigger population leading lives barely worth living, whose existence would be better) can be refuted.
6MichaelStJules1y
Indeed, whether AU avoids the RC in practice depends on your beliefs about the average welfare in the universe. In fact, average utilitarianism reduces to critical-level utilitarianism with the critical level being the average utility, in a large enough world [https://globalprioritiesinstitute.org/christian-tarsney-and-teruji-thomas-non-additive-axiologies-in-large-worlds/] (in uncertainty-free cases). Personally, I find the worst part of AU to be the possibility that, if the average welfare is already negative, adding bad lives to the world can make things better, and this is what rules it out for me.
Incompatibility of moral realism and time discounting

This is a beautiful thought experiment, and a really interesting argument. I wonder if saying that it shows an incompatibility between moral realism and time discounting is too strong though? Maybe it only shows an incompatibility between time discounting and consequentialism?

Under non-consequentialist moral theories, it is possible for different moral agents to be given conflicting aims. For example, some people believe that we have a special obligation towards our own families. Suppose that in your example, Anna and Christoph are moving towards their res... (read more)

What are some low-information priors that you find practically useful for thinking about the world?

I think I disagree with your claim that I'm implicitly assuming independence of the ball colourings.

I start by looking for the maximum entropy distribution within all possible probability distributions over the 2^100 possible colourings. Most of these probability distributions do not have the property that balls are coloured independently. For example, if the distribution was a 50% probability of all balls being red, and 50% probability of all balls being blue, then learning the colour of a single ball would immediately tell you the colour of all of t... (read more)

1NunoSempere2y
As a side-note, the maximum entropy principle would tell you to choose the maximum entropy prior given the information you have, and so if you intuit the information that the balls are likely to be produced by the same process, you'll get a different prior than if you don't have that information. I.e., your disagreement might stem from the fact that the maximum entropy principle gives different answers conditional on different information. I.e., you actually have information to differentiate between drawing n balls and flipping a fair coin n times.
What are some low-information priors that you find practically useful for thinking about the world?

I think I disagree that that is the right maximum entropy prior in my ball example.

You know that you are drawing balls without replacement from a bag containing 100 balls, which can only be coloured blue or red. The maximum entropy prior given this information is that every one of the 2^100 possible colourings {Ball 1, Ball 2, Ball 3, ...} -> {Red, Blue} is equally likely (i.e. from the start the probability that all balls are red is 1 over 2^100).

I think the model you describe is only the correct approach if you make an additional assumption that all b... (read more)

1AidanGoth2y
Thanks for the clarification - I see your concern more clearly now. You're right, my model does assume that all balls were coloured using the same procedure, in some sense - I'm assuming they're independently and identically distributed. Your case is another reasonable way to apply the maximum entropy principle, and I think it points to another problem with the maximum entropy principle, but I'd frame it slightly differently. I don't think that the maximum entropy principle is actually directly problematic in the case you describe. If we assume that all balls are coloured by completely different procedures (i.e. so that the colour of one ball doesn't tell us anything about the colours of the other balls), then seeing 99 red balls doesn't tell us anything about the final ball. In that case, I think it's reasonable (even required!) to have a 50% credence that it's red and unreasonable to have a 99% credence, if your prior was 50%. If you find that result counterintuitive, then I think that's more of a challenge to the assumption that the balls are all coloured in such a way that learning the colour of some doesn't tell you anything about the colour of the others rather than a challenge to the maximum entropy principle. (I appreciate you want to assume nothing about the colouring processes rather than making the assumption that the balls are all coloured in such a way that learning the colour of some doesn't tell you anything about the colour of the others, but in setting up your model this way, I think you're assuming that implicitly.) Perhaps another way to see this: if you don't follow the maximum entropy principle and instead have a prior of 30% that the final ball is red and then draw 99 red balls, in your scenario, you should maintain 30% credence (if you don't, then you've assumed something about the colouring process that makes the balls not independent). If you find that counterintuitive, then the issue is with the assumption that the balls are all c
What are some low-information priors that you find practically useful for thinking about the world?

The maximum entropy principle can give implausible results sometimes though. If you have a bag containing 100 balls which you know can only be coloured red or blue, and you adopt a maximum entropy prior over the possible ball colourings, then if you randomly drew 99 balls from the bag and they were all red, you'd conclude that the next ball is red with probability 50/50. This is because in the maximum entropy prior, the ball colourings are independent. But this feels wrong in this context. I'd want to put the probability on the 100th ball being red much higher.
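
Here is a small sketch of the two priors side by side, using a 10-ball bag of my own choosing so the 2^n colourings can be enumerated exactly (the second prior is the Laplace rule-of-succession model discussed in the replies below). Under the first prior the red draws tell you nothing about the last ball; under the second they push you to near-certainty, which is the behaviour I'd want here.

```python
from itertools import product
from fractions import Fraction

n = 10  # a 10-ball bag instead of 100, so all 2**n colourings can be listed

# Prior 1: maximum entropy over colourings -- every one of the 2**n
# red/blue assignments is equally likely.
colourings = list(product([0, 1], repeat=n))            # 1 = red, 0 = blue
first_all_red = [c for c in colourings if all(c[:-1])]  # first n-1 balls drawn red
print(Fraction(sum(c[-1] for c in first_all_red), len(first_all_red)))  # 1/2

# Prior 2: one unknown red-probability p with a uniform prior, balls coloured
# i.i.d. given p (Laplace's rule of succession):
# P(next red | n-1 reds) = integral of p^n dp / integral of p^(n-1) dp = n/(n+1)
print(Fraction(n, n + 1))                               # 10/11
```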

4AidanGoth2y
The maximum entropy principle does give implausible results if applied carelessly but the above reasoning seems very strange to me. The normal way to model this kind of scenario with the maximum entropy prior would be via Laplace's Rule of Succession, as in Max's comment below. We start with a prior for the probability that a randomly drawn ball is red and can then update on 99 red balls. This gives a 100/101 chance that the final ball is red (about 99%!). Or am I missing your point here? Somewhat more formally, we're looking at a Bernoulli trial - for each ball, there's a probability p that it's red. We start with the maximum entropy prior for p, which is the uniform distribution on the interval [0,1] (= beta(1,1)). We update on 99 red balls, which gives a posterior for p of beta(100,1), which has mean 100/101 (this is a standard result, see e.g. conjugate priors [https://en.wikipedia.org/wiki/Conjugate_prior] - the beta distribution is a conjugate prior for a Bernoulli likelihood). The more common objection to the maximum entropy principle comes when we try to reparametrise. A nice but simple example is van Fraassen's cube factory (edit: new link [https://plato.stanford.edu/entries/probability-interpret/]): a factory manufactures cubes up to 2x2x2 feet, what's the probability that a randomly selected cube has side length less than 1 foot? If we apply the maximum entropy principle (MEP), we say 1/2 because each cube has length between 0 and 2 and MEP implies that each length is equally likely. But we could have equivalently asked: what's the probability that a randomly selected cube has face area less than 1 foot squared? Face area ranges from 0 to 4, so MEP implies a probability of 1/4. All and only those cubes with side length less than 1 have face area less than 1, so these are precisely the same events but MEP gave us different answers for their probabilities! We could do the same in terms of volume and get a different answer again. This inconsistency is the
What is the reasoning behind the "anthropic shadow" effect?

Thank you for your answer!

I think I agree that there is a difference between the extinction example and the coin example, to do with the observer bias, which seems important. I'm still not sure how to articulate this difference properly though, and why it should make the conclusion different. It is true that you have perfect knowledge of Q, N, and the final state marker in the coin example, but you do in the (idealized) extinction scenario that I described as well. In the extinction case I supposed that we knew Q, N, and the fact that we haven't ... (read more)