110 karma · Joined Aug 2020


Forum? I'm against 'em!


Suppose someone is an ethical realist: the One True Morality is out there, somewhere, for us to discover. Is it likely that AGI will be able to reason its way to finding it? 

What are the best examples of AI behavior we have seen where a model does something "unreasonable" to further its goals? Hallucinating citations?

What are the arguments for why someone should work in AI safety over wild animal welfare? (Holding personal fit etc. constant.)

  • If someone thinks wild animals live positive lives, is it reasonable to think that AI doom would mean human extinction but leave ecosystems intact? Or does AI doom threaten animals as well?
  • Does anyone have BOTECs on numbers of wild animals vs numbers of digital minds?

At least we can have some confidence in the total weight of meat consumed on average by a Zambian per year and the life expectancy at birth in Zambia.


We should also think about these on the margin. I.e., the lives averted might have been shorter than average and consumed less meat than average.

I imagine a proof (by contradiction) would work something like this:

Suppose you place > 1/x probability on your credence increasing by a factor of x. Then that probability mass alone contributes > prior * x * 1/x = prior to the expectation of your future beliefs, so on its own it would push your expected credence above your prior. With our remaining probability mass, can we anticipate some evidence in the other direction, such that our beliefs still satisfy conservation of expected evidence? The lowest our credence can go is 0, but even if we place all of our remaining < 1 - 1/x probability on 0, we would still find that the expectation of future beliefs is > prior * x * 1/x + 0 * [remaining probability] = prior. So we would necessarily violate conservation of expected evidence, and we conclude that Joe's rule holds.
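To spell the inequality out, here is a sketch in my own notation (the symbols are assumptions for illustration, not Joe's): write p > 0 for the prior, C for the future credence, and q = Pr(C ≥ xp) for the probability of an increase by at least a factor of x. Assuming q > 1/x and using only that C can never fall below 0:

```latex
\begin{align*}
\mathbb{E}[C] &\ge q \cdot (x\,p) + (1 - q) \cdot 0 \\
              &>   \tfrac{1}{x} \cdot x\,p \\
              &=   p
\end{align*}
```

This contradicts conservation of expected evidence, which requires the expectation of C to equal p, so q can be at most 1/x.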

Note that all of these comments apply, symmetrically, to people nearly certain of doom. 99.99%? OK, so less than 1% that you ever drop to 99% or lower?

But I don't think this proof works for beliefs decreasing (because we don't have the lower bound of 0). Consider this counterexample:

prior = 10%

probability of decreasing to 5% (factor of 2) = 60% > 1/2, which violates the rule

probability of increasing to 17.5% = 40%

Then, expectation of future beliefs = 5% * 60% + 17.5% * 40% = 10%

So conservation of expected evidence doesn't seem to imply Joe's rule in this direction? (Maybe it holds once you introduce some restrictions on your prior, like in his 99.99% example, where you can't place the remaining probability mass any higher than 1, so the rule still bites.)
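Here is a quick check of the arithmetic in the counterexample, as a minimal sketch. The helper name `expected_credence` is made up for illustration. The last few lines compute a bound on the complement, (1 - prior) / (1 - new credence), which is what you get if you run the same argument on 1 - credence, using the fact that credence cannot exceed 1. That framing is my assumption about how the parenthetical above could be made precise, not something from Joe's post.

```python
from fractions import Fraction


def expected_credence(outcomes):
    """Expectation of future credence over (probability, credence) pairs."""
    total = sum(p for p, _ in outcomes)
    assert total == 1, "probabilities must sum to 1"
    return sum(p * c for p, c in outcomes)


prior = Fraction(10, 100)
outcomes = [
    (Fraction(60, 100), Fraction(5, 100)),     # 60% chance of dropping to 5%
    (Fraction(40, 100), Fraction(175, 1000)),  # 40% chance of rising to 17.5%
]

factor = 2
expected = expected_credence(outcomes)

# Probability that credence falls by at least the given factor.
prob_decrease = sum(p for p, c in outcomes if c <= prior / factor)

# The counterexample keeps the expectation at the prior while putting
# more than 1/factor probability on the decrease.
conserves_expected_evidence = expected == prior
exceeds_one_over_factor = prob_decrease > Fraction(1, factor)

# Bound from applying the same argument to (1 - credence):
# Pr(credence <= c) <= (1 - prior) / (1 - c).
complement_bound = (1 - prior) / (1 - prior / factor)
within_complement_bound = prob_decrease <= complement_bound

print(expected, prob_decrease, complement_bound)
```

If that framing is right, the bound on a decrease from 10% to 5% is about 95%, so it barely constrains anything here, and it only starts to bite when the prior is close to 1, as in the 99.99% example.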

This asymmetry seems weird?? Would love for someone to clear this up.

I would endorse all of this based on experience leading EA fellowships for college students! These are good principles not just for public media discussions, but also for talking to peers.

Answer by utilistrutil · Apr 20, 2023 · 20

THB that EA-minded college freshmen should study Computer Science over Biology

Thanks for the thorough response! I agree with a lot of what you wrote, especially the third section on Epistemic Learned Helplessness: "Bayesianism + EUM, but only when I feel like it" is not a justification in any meaningful sense.

On Priors

I agree that we can construct thought experiments (Pascal's Mugging, acausal trade) with arbitrarily high stakes to swamp commonsense priors (even without religious scenarios or infinite value, which are so contested I think it would be difficult to extract a sociological lesson from them).

On Higher Order Evidence

I still think a lot of speculative conclusions we encounter in the wild suffer from undiscovered evidence and model uncertainty, and even barring this we might want to defer taking action until we've had a chance to learn more. 

Your response jumps over these cases to those where we have "~all the evidence we’re ever going to have," but I'm skeptical these cases exist. Even with religion, we might expect some future miracles or divine revelations to provide new evidence; we have some impossibility theorems in ethics, but new ideas might come to light that resolve paradoxes or avoid them completely. In fact, soteriological research and finding the worldview that best acausally benefits observers are proposals to find new evidence.

But ok, yes, I think we can probably come up with cases where we do have ~all the evidence and still refrain from acting on speculative + fanatical conclusions. 

Problem 1: Nicheness

From here on, I'm abandoning the justification thing. I agree that we've found some instances where the Fourth Principle holds without Bayesian + EUM justification. Instead, I'm getting more into the semantics of what is a "norm."

The problem is that the support for this behavior among EAs comes from niche pieces of philosophy like Pascal's Mugging, noncausal decision theory, and infinite ethics, ideas that are niche not just relative to the general population, but also within EA. So I feel like the Fourth Principle amounts to "the minority of EAs who are aware of these edge cases behave this way when confronted with them," which doesn't really seem like a norm about EA.

Problem 2: Everyone's Doing It

(This is also not a justification; it's an observation about the Fourth Principle.)

The first three principles capture ways that EA differs from other communities. The Fourth Principle, on the other hand, seems like the kind of thing that all people do? For example, a lot of people write off earning to give when they first learn about it because it looks speculative and fanatical. Now, maybe EAs differ from other people on which crazy train stop they deem "speculative," and I think that would qualify as a norm, but relative to each person's threshold for "speculative," I think this is more of a human-norm than an EA-norm.

Would love your thoughts on this, and I'm looking forward to your April post :)

Thanks for the excellent post!

I think you are right that this might be a norm/heuristic in the community, but in the spirit of a "justificatory story of our epistemic practices," I want to look a little more at:

4. When arguments lead us to conclusions that are both speculative and fanatical, treat this as a sign that something has gone wrong.  

First, I'm not sure that "speculative" is an independent reason that conclusions are discounted, in the sense of a filter that is applied ex-post. In your 15AI thought experiment, for example, I think that expected value calculations would get you most of the way toward explaining an increase in fanaticism; the probability that we can solve the problem might increase on net, despite the considerations you note about replication. The remaining intuition might be explained by availability/salience bias, to which EA is not immune.

Now, "speculative" scenarios might be discounted during the reasoning process if we are anchored to commonsense priors, but this would fall under typical Bayesian reasoning. The priors we use and the weight we grant various pieces of evidence are still epistemic norms worthy of examination! They are just a different kind of norm than the one suggested by the Fourth Principle.

Suppose "speculative" arguments are discounted ex-post in EA. I think this practice can still be redeemed on purely Bayesian grounds as a correction to the following problems:

  1. Undiscovered Evidence: An argument seems speculative not just insofar as it is divorced from empirical observations, but also insofar as we have not thought about it very much. It seems that AI risk has become less speculative as people spend more time thinking about it, holding constant actual progress in AI capabilities. We have some sense of the space of possible arguments that might be made and evidence that might be uncovered, given further research on a topic. And these undiscovered arguments/evidence might not enter neatly into our initial reasoning process. We want some way to say "I haven't thought of it yet, but I bet there's a good reason this is wrong," as we might respond to some clever conspiracy theorist who presents a superficially bulletproof case for a crazy theory we haven't encountered before. And discounting speculative conclusions is one way to achieve this. 
    1. This point is especially relevant for speculative conclusions because they often rely on chains of uncertain premises, making our credence in their conclusions all the more sensitive to new information that could update multiple steps of the argument.
  2. Model Uncertainty: Even in a domain where we have excavated all the major arguments available to us, we may still suffer from "reasoning in the dark," i.e., in the absence of solid empirics. When reasoning about extremely unlikely events, the probability our model is wrong can swamp our credence in its conclusion. Discounting speculative conclusions allows us to say "we should be fanatical insofar as my reasoning is correct, but I am not confident in my reasoning."
    1. We can lump uncertainty in our axiology, epistemology, and decision theory under this section. That is, a speculative conclusion might look good only under total utilitarian axiology, Bayesian epistemology, and causal decision theory, but a more conventional conclusion might be more robust to alternatives in these categories. (Note that this is a prior question to the evidential-hedging double bind set up in Appendix B.)
    2. Chains of uncertain premises also make model uncertainty doubly important for speculative conclusions. As Anders Sandberg points out, "if you have a long argument, the probability of there being some slight error somewhere is almost 1."
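To make the "chains of uncertain premises" point concrete, here is a rough sketch with made-up numbers. Both function names and all the inputs (10 premises at 90% each, a 5% chance of error per step over 100 steps) are hypothetical, and the calculation assumes the premises and errors are independent, which real arguments often are not.

```python
def chain_credence(premise_credences):
    """Credence in a conclusion that needs every premise to hold,
    assuming the premises are independent."""
    result = 1.0
    for credence in premise_credences:
        result *= credence
    return result


def prob_some_error(per_step_error, steps):
    """Probability that at least one step of an argument contains an error,
    assuming independent errors with the same per-step probability."""
    return 1 - (1 - per_step_error) ** steps


# Hypothetical inputs, chosen only for illustration.
ten_premises = chain_credence([0.9] * 10)
long_argument = prob_some_error(0.05, 100)

print(f"{ten_premises:.3f}")   # roughly 0.35
print(f"{long_argument:.3f}")  # above 0.99
```

Even with fairly confident individual steps, the conjunction ends up well below 50%, and a small update to each premise compounds across the whole chain.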

Even after accounting for these considerations, we might find that the EV of pursuing the speculative path warrants fanaticism. In this event, discounting the speculative conclusion might be a pragmatic move to deprioritize actions on this front in anticipation of new evidence that will come to light, including evidence that will bear on model uncertainty. (We might treat this as a motivation for imprecise credences, prioritizing views with sharper credences over speculative views with fuzzier ones.)
