Mark Xu



I do alignment research at the Alignment Research Center. Learn more about me at


I think this model is kind of misleading, and that the original astronomical waste argument is still strong. It seems to me that a ton of the work in this model is being done by the assumption of constant risk, even in post-peril worlds. I think this is pretty strange. Here are some brief comments:

  • If you're talking about the probability of a universal quantifier, such as "for all humans x, x will die", then it seems really weird to say that this remains constant, even when the thing you're quantifying over grows larger.
    • For instance, it seems clear that if there were only 100 humans, the probability of x-risk would be much higher than if there were 10^6 humans. So it seems like if there are 10^20 humans, it should be harder to cause extinction than 10^10 humans.
  • Assuming constant risk has the implication that human extinction is guaranteed to happen at some point in the future, which puts sharp bounds on the goodness of existential risk reduction.
  • It's not that hard to get exponentially decreasing probability on universal quantifiers if you assume independence in survival amongst some "unit" of humanity. In computing applications, it's not that hard to drive down the probability of error exponentially in the resources allocated, because each unit of resource can ~halve the probability of error. Naively, each human doesn't want to die, so there are (# of humans) rolls for surviving/solving x-risk.
  • It seems like the probability of x-risk ought to be inversely proportional to the current estimated amount of value at stake. This seems to follow if you assume that civilization acts as a "value maximizer" and it's not that hard to reduce x-risk. Haven't worked it out, so wouldn't be surprised if I was making some basic error here.
  • Generally, it seems like most of the risk is going to come from worlds where the chance of extinction isn't actually a universal quantifier, and there's some correlation amongst seemingly independent rolls for survival. In particularly bad cases, humans go extinct if there exists someone who wants to destroy the universe, so we actually see an extremely rapidly increasing probability of extinction as we get more humans. These worlds would require extremely strong coordination and governance solutions.
    • These worlds are also slightly physically impossible because parts of humanity will rapidly become causally isolated from each other. I don't know enough cosmology to have an intuition for which way the functional form will ultimately go.
  • Generally, the naive view seems to be that as humans get richer/smarter, they'll allocate more and more resources towards not dying. At equilibrium, it seems reasonable to assume, to first order, that we'll drive existential risk down until the marginal cost equals the marginal benefit, so the key question is how this equilibrium behaves. My guess is that it will depend heavily on the total amount of value available in the future, determined by physical constraints (and potentially more galaxy-brained considerations).
    • This view seems to let you recover more of the naive astronomical waste perspective.
  • This makes me feel like the model makes kind of strong assumptions about the amount it will ultimately cost to drive down existential risk. E.g. you seem to imply that r_l = 0.0001 is small, but an independent chance that large each century suggests that the probability humanity survives for ~10^10 years (~10^8 centuries) is ~0. This feels quite absurd to me.
    • You write: "Note that for the Pessimist, this is a reduction of 200,000%", but humans routinely reduce the probabilities of failures by more than 200,000% via engineering efforts and produce highly complex artifacts like computers, airplanes, rockets, satellites, etc. It feels like you should naively expect "breaking" human civilization to be harder than breaking an airplane, especially when civilization is actively trying to ensure that it doesn't go extinct.
  • Also, you seem to assume each century eventually has some constant value v, which seems reasonable to me, but the implication "Warming (slightly) on short-termist cause areas" relies on an assumption that the current century is close to value v, when even pretty naive bounds (e.g. the percent of the sun's energy we use) suggest that the current century is not even within a factor of 10^9 of the long-run value-per-century humanity could reach.
    • Assuming that value grows quadratically also seems quite weird, because analyses like "Eternity in Six Hours" seem to imply that a resource-maximizing civilization will undergo a period of incredibly rapid expansion to achieve per-century rates of value much higher than the current century, and then have nowhere else to go. A better model from my perspective is logistic growth of value, with the upper bound given by some weak proxy like "suppose that value is linear in the amount of energy a civilization uses, then take the total amount of value in the year 2020", with the ultimate unit being "value in 2020". This would produce much higher numbers, and give a more intuitive sense of "astronomical waste."
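The arithmetic behind the constant-risk bullets can be checked with a rough Python sketch. It assumes risk is independent across centuries. The function names and the "risk halves every century" schedule are illustrative assumptions, not taken from the model under discussion; the halving schedule only stands in for the idea that risk could fall as more resources go towards survival.

```python
import math


def log10_survival_constant(risk_per_century, centuries):
    """log10 of the chance of surviving `centuries` centuries
    at a constant, independent per-century extinction risk."""
    return centuries * math.log1p(-risk_per_century) / math.log(10)


def survival_decaying(initial_risk, decay, centuries):
    """Chance of surviving when per-century risk is multiplied by
    `decay` each century (a hypothetical schedule, for contrast)."""
    log_p = 0.0
    risk = initial_risk
    for _ in range(centuries):
        log_p += math.log1p(-risk)
        risk *= decay
    return math.exp(log_p)


r_l = 1e-4          # post-peril risk per century quoted in the comment
centuries = 10**8   # ~10^10 years

# Constant risk: the probability is too small for a float, so report log10.
log10_p = log10_survival_constant(r_l, centuries)
p_constant = math.exp(centuries * math.log1p(-r_l))  # underflows to 0.0

# Assumed halving schedule: 1000 centuries is already enough to converge.
p_decaying = survival_decaying(r_l, 0.5, 1000)

print(f"constant risk: log10(P(survive)) = {log10_p:.1f}")
print(f"halving risk:  P(survive) = {p_decaying:.6f}")
```

Under these assumptions, constant risk gives a survival probability of roughly 10^-4343, while the halving schedule leaves it above 0.999. The only point is how much of the result hangs on the constancy assumption.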

I like the process of proposing concrete models for things as a substrate for disagreement, and I appreciate that you wrote this. It feels much better to articulate objections like "I don't think this particular parameter should be constant in your model" than to have abstract arguments. I also like how it's now clearer that if you do believe that risk in post-peril worlds is constant, then the argument for longtermism is much weaker (although I think still quite strong because of my comments about v).

I expect 10 people donating 10% of their time to be less effective than 1 person using 100% of their time because you don't get to reap the benefits of learning for the 10% people. Example: if people work for 40 years, then 10 people donating 10% of their time gives you 10 labor-years with 0 years of experience, 10 with 1 year, 10 with 2 years, and 10 with 3 years; however, if someone is doing EA work full-time, you get 1 labor-year with 0 years of experience, 1 with 1, 1 with 2, etc. I expect 1 labor-year with 20 years of experience to plausibly be as good/useful as 10 with 3 years of experience. Caveats to the simple model:

  • labor-years might be more valuable during the present
  • if you're volunteering for a thing that is similar to what you spend the other 90% of your time doing, then you still get better at the thing you're volunteering for
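The simple model above can be written out as a short Python sketch. The bucketing of labor-years by whole years of experience follows the example in the comment. The linear `productivity` function is purely a placeholder assumption, since the comment does not specify how usefulness scales with experience.

```python
from collections import Counter


def experience_histogram(people, fraction, career_years):
    """Count labor-years by years of prior experience, assuming each
    person contributes `fraction` of a `career_years`-long career."""
    hist = Counter()
    labor_years_each = round(fraction * career_years)
    for _ in range(people):
        for experience in range(labor_years_each):
            hist[experience] += 1
    return hist


def total_output(hist, productivity):
    """Sum output over all labor-years under a given productivity curve."""
    return sum(count * productivity(exp) for exp, count in hist.items())


def productivity(experience):
    # Hypothetical curve: 10% more output per year of experience.
    return 1.0 + 0.1 * experience


part_time = experience_histogram(people=10, fraction=0.1, career_years=40)
full_time = experience_histogram(people=1, fraction=1.0, career_years=40)

print("part-time labor-years by experience:", dict(part_time))
print("part-time output:", total_output(part_time, productivity))
print("full-time output:", total_output(full_time, productivity))
```

Both groups supply 40 labor-years, so any gap in output comes entirely from the assumed productivity curve. A steeper curve widens the gap, a flat one removes it.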

I make a similar argument here.

One key difference is that "continuing school" usually has a specific mental image attached, whereas "drop out of school" is much vaguer, making the two difficult to compare.

Many people in EA depart from me here: they see choices that do not maximize impacts as personal mistakes. Imagine a button that, if you press it, would cause you to always take the impact-maximizing action for the rest of your life, even if it entails great personal sacrifice. Many (most?) longtermist EAs I talk to say they would press this button – and I believe them. That’s not true of me; I’m partially aligned with EA values (since impact is an important consideration for me), but not fully aligned.

I think there are people (e.g. me) that value things besides impact and would also press the button because of golden-rule type reasoning. Many people optimize for impact to the point where it makes them less happy.

A title like "How many lives might have been saved given an earlier COVID-19 vaccine rollout?" would have given me much more information about what the post was about than the current title, which I find very vague.

Kindles are smaller and have backlights, and the Kindle store is a good user experience.

Note: I work for ARC.

I would consider someone a "pretty good fit" (whatever that means) for alignment research if they started out with a relatively technical background, e.g. an undergrad degree in math/CS, but had not really engaged with alignment before, and they were able to come up with a decent proposal after:

  • ~10 hours of engaging with the ELK doc.
  • ~10 hours of thinking about the document and resolving confusions they had, which might involve asking some questions to clarify the rules and the setup.
  • ~10 hours of trying to come up with a proposal.

If someone starts from having thought about alignment a bunch, I would consider them a potentially "pretty good researcher" if they were able to come up with a decent proposal in 2-8 hours. I expect many existing (alignment) researchers to be able to come up with proposals in <1 hour.

Note that I'm saying "if (can come up with proposal in N hours), then (might be good alignment researcher)" and not saying the converse also holds, e.g. it is not the case that "if (might be good alignment researcher), then (can come up with proposal in N hours)".

Can confirm we would be interested in hearing what you came up with.
