Forum? I'm against 'em!
What are the arguments for why someone should work in AI safety over wild animal welfare? (Holding constant personal fit, etc.)
I imagine a proof (by contradiction) would work something like this:
Suppose you place > 1/x probability on your credence increasing by a factor of x. Then that outcome alone contributes > prior * x * 1/x = prior to the expectation of your future beliefs. With our remaining probability mass, can we anticipate some evidence in the other direction, such that our beliefs still satisfy conservation of expected evidence? The lowest our credence can go is 0, but even if we place our remaining < 1 - 1/x probability on 0, we would still find the expectation of future beliefs > prior * x * 1/x + 0 * [remaining probability] = prior. So we would necessarily violate conservation of expected evidence, and we conclude that Joe's rule holds.
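Written out a bit more formally, the argument above might look like the following. This is only a sketch: the symbols p (prior), C (future credence), and q (probability of the increase) are my own notation, and I am reading "moving by a factor of x" as "ending up at x times the prior or higher." It has the same shape as Markov's inequality for a non-negative random variable.

```latex
Let $p > 0$ be the prior, $C \in [0,1]$ the future credence, and $x > 1$.
Conservation of expected evidence says $\mathbb{E}[C] = p$.

Suppose, for contradiction, that $q = \Pr(C \ge x p) > 1/x$.
Since $C \ge 0$ on the remaining probability mass,
\[
  \mathbb{E}[C] \;\ge\; x p \cdot q + 0 \cdot (1 - q)
  \;=\; x p \, q
  \;>\; x p \cdot \frac{1}{x}
  \;=\; p ,
\]
which contradicts $\mathbb{E}[C] = p$. Hence $\Pr(C \ge x p) \le 1/x$.
```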
Note that all of these comments apply, symmetrically, to people nearly certain of doom. 99.99%? OK, so less than 1% that you ever drop to 99% or lower?
But I don't think this proof works for beliefs decreasing (because we don't have the lower bound of 0). Consider this counterexample:
prior = 10%
probability of decreasing to 5% (factor of 2) = 60% > 1/2, so this violates the rule
probability of increasing to 17.5% = 40%
Then, expectation of future beliefs = 5% * 60% + 17.5% * 40% = 10%
So conservation of expected evidence doesn't seem to imply Joe's rule in this direction? (Maybe it holds once you introduce some restrictions on your prior, like in his 99.99% example, where you can't place the remaining probability mass any higher than 1, so the rule still bites.)
This asymmetry seems weird?? Would love for someone to clear this up.
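For anyone who wants to check the arithmetic, here is a minimal Python sketch of both directions. The function and variable names (`expected_credence`, `decrease_case`, `increase_case`) are made up for illustration, and the 60%/40% split in the increasing case is just my choice of numbers to mirror the counterexample. Exact fractions are used so that rounding doesn't muddy the comparison.

```python
from fractions import Fraction


def expected_credence(outcomes):
    """Expectation of future credence over (credence, probability) pairs."""
    if sum(prob for _, prob in outcomes) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(credence * prob for credence, prob in outcomes)


prior = Fraction(10, 100)
factor = 2

# Decreasing direction: the counterexample from the comment.
decrease_case = [
    (Fraction(5, 100), Fraction(60, 100)),     # drop to 5% with probability 60%
    (Fraction(175, 1000), Fraction(40, 100)),  # rise to 17.5% with probability 40%
]
decrease_expectation = expected_credence(decrease_case)
prob_of_halving = sum(
    prob for credence, prob in decrease_case if credence <= prior / factor
)

# Increasing direction (illustrative numbers): put 60% on doubling and the
# remaining 40% on 0, the lowest value a credence can take.
increase_case = [
    (prior * factor, Fraction(60, 100)),
    (Fraction(0), Fraction(40, 100)),
]
increase_expectation = expected_credence(increase_case)

print("decrease case expectation:", float(decrease_expectation))
print("probability of halving:", float(prob_of_halving))
print("increase case expectation:", float(increase_expectation))
```

In the decreasing case the expectation comes out equal to the prior even though the probability of halving exceeds 1/2. In the increasing case the expectation already exceeds the prior with all the remaining mass sitting at 0, so there is no way to balance it.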
Thanks for the thorough response! I agree with a lot of what you wrote, especially the third section on Epistemic Learned Helplessness: "Bayesianism + EUM, but only when I feel like it" is not a justification in any meaningful sense.
I agree that we can construct thought experiments (Pascal's Mugging, acausal trade) with arbitrarily high stakes to swamp commonsense priors (even without religious scenarios or infinite value, which are so contested I think it would be difficult to extract a sociological lesson from them).
I still think a lot of speculative conclusions we encounter in the wild suffer from undiscovered evidence and model uncertainty, and even barring this we might want to defer taking action until we've had a chance to learn more.
Your response jumps over these cases to those where we have "~all the evidence we’re ever going to have," but I'm skeptical these cases exist. Even with religion, we might expect some future miracles or divine revelations to provide new evidence; we have some impossibility theorems in ethics, but new ideas might come to light that resolve paradoxes or avoid them completely. In fact, soteriological research and finding the worldview that best acausally benefits observers are proposals to find new evidence.
But ok, yes, I think we can probably come up with cases where we do have ~all the evidence and still refrain from acting on speculative + fanatical conclusions.
From here on, I'm abandoning the justification thing. I agree that we've found some instances where the Fourth Principle holds without Bayesian + EUM justification. Instead, I'm getting more into the semantics of what is a "norm."
The problem is that the support for this behavior among EAs comes from niche pieces of philosophy like Pascal's Mugging, noncausal decision theory, and infinite ethics, ideas that are niche not just relative to the general population, but also within EA. So I feel like the Fourth Principle amounts to "the minority of EAs who are aware of these edge cases behave this way when confronted with them," which doesn't really seem like a norm about EA.
(This is also not a justification; it's an observation about the Fourth Principle.)
The first three principles capture ways that EA differs from other communities. The Fourth Principle, on the other hand, seems like the kind of thing that all people do? For example, a lot of people write off earning to give when they first learn about it because it looks speculative and fanatical. Now, maybe EAs differ from other people on which crazy train stop they deem "speculative," and I think that would qualify as a norm, but relative to each person's threshold for "speculative," I think this is more of a human-norm than an EA-norm.
Would love your thoughts on this, and I'm looking forward to your April post :)
Thanks for the excellent post!
I think you are right that this might be a norm/heuristic in the community, but in the spirit of a "justificatory story of our epistemic practices," I want to look a little more at
4. When arguments lead us to conclusions that are both speculative and fanatical, treat this as a sign that something has gone wrong.
First, I'm not sure that "speculative" is an independent reason that conclusions are discounted, in the sense of a filter that is applied ex-post. In your 15AI thought experiment, for example, I think that expected value calculations would get you most of the way toward explaining an increase in fanaticism; the probability that we can solve the problem might increase on net, despite the considerations you note about replication. The remaining intuition might be explained by availability/salience bias, to which EA is not immune.
Now, "speculative" scenarios might be discounted during the reasoning process if we are anchored to commonsense priors, but this would fall under typical Bayesian reasoning. The priors we use and the weight we grant various pieces of evidence are still epistemic norms worthy of examination! But they are a different kind than the one suggested by the Fourth Principle.
Suppose "speculative" arguments are discounted ex post in EA. I think this practice can still be redeemed on purely Bayesian grounds as a correction to the following problems:
Even after accounting for these considerations, we might find that the EV of pursuing the speculative path warrants fanaticism. In this event, discounting the speculative conclusion might be a pragmatic move to deprioritize actions on this front in anticipation of new evidence that will come to light, including evidence that will bear on model uncertainty. (We might treat this as a motivation for imprecise credences, prioritizing views with sharper credences over speculative views with fuzzier ones.)
Suppose someone is an ethical realist: the One True Morality is out there, somewhere, for us to discover. Is it likely that AGI will be able to reason its way to finding it?
What are the best examples of AI behavior we have seen where a model does something "unreasonable" to further its goals? Hallucinating citations?