Zach Stein-Perlman

Research @ AI Impacts
4089 karma
Joined Nov 2020
Working (0-5 years)
Berkeley, CA, USA



AI forecasting & strategy at AI Impacts. Blog: Not Optional.


Topic Contributions

  1. Yep, AI safety people tend to oppose sharing model weights for future dangerous AI systems.
  2. But it's not certain that (operator-aligned) open-source powerful AI entails doom. To a first approximation, it entails doom iff "offense" is much more efficient than "defense," which depends on context. But absent super monitoring to make sure that others aren't making weapons/nanobots/whatever, or super efficient defenses against such attacks, I intuit that offense is heavily favored.

An undignified way for everyone to die: an AI lab produces clear, decisive evidence of AI risk/scheming/uncontrollability, freaks out, and tells the world. A less cautious lab ends the world a year later.

A possible central goal of AI governance: cause "an AI lab produces decisive evidence of AI risk/scheming/uncontrollability, freaks out, and tells the world" to quickly result in rules that stop all labs from ending the world.

I don't know how we can pursue that goal.

I don't want to try to explain now, sorry.

(This shortform was intended more as starting-a-personal-list than as a manifesto.)

What's the best thing to read on "Zvi's writing on EAs confusing the map for the territory"? Or at least something good?

Thanks for the engagement. Sorry for not really engaging back. Hopefully someday I'll elaborate on all this in a top-level post.

Briefly: by axiological utilitarianism, I mean classical (total, act) utilitarianism, as a theory of the good, not as a decision procedure for humans to implement.

Thanks. I agree that the benefits could outweigh the costs, certainly at least for some humans. There are sophisticated reasons to be veg(etari)an. I think those benefits aren't cruxy for many EA veg(etari)ans, or many veg(etari)ans I know.

Or me. I'm veg(etari)an for selfish reasons — eating animal corpses or feeling involved in the animal-farming-and-killing process makes me feel guilty and dirty.

I certainly haven't done the cost-benefit analysis on veg(etari)anism, on the straightforward animal-welfare consideration or the considerations you mention. For example, if I were veg(etari)an for the straightforward reason (for agent-neutral consequentialist reasons), I'd do the cost-benefit analysis, and do things like:

  • Eat meat that would otherwise go to waste (when that wouldn't increase anticipated demand for meat in the future)
  • Try to reduce others' meat consumption, and try to reduce the supply of meat or improve the lives of farmed animals, when that's more cost-effective than personal veg(etari)anism
  • Notice whether eating meat would substantially boost my health and productivity, and go back to eating meat if so

I think my veg(etari)an friends are mostly like me — veg(etari)an for selfish reasons. And they don't notice this.

Written quickly, maybe hard-to-parse and imprecise.

(I agree it is reasonable to have a bid-ask spread when betting against capable adversaries. I think the statements-I-object-to are asserting something else, and the analogy to financial markets is mostly irrelevant. I don't really want to get into this now.)

Thanks. I agree! (Except with your last sentence.) Sorry for failing to communicate clearly; we were thinking about different contexts.


Some people say things like "my doom-credence fluctuates between 10% and 25% day to day"; this is dutch-book-able and they'd make better predictions if they reported what they feel like on average rather than what they feel like today, except insofar as they have new information.
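A minimal sketch of the Dutch-book point, under assumptions that are mine rather than from the comment: the agent treats today's credence as a fair price for a contract paying 1 if the event happens, trades with no bid-ask spread, and gets no new information between the two days. The function name and the two prices are illustrative only.

```python
def agent_net_payoff(buy_price, sell_price, event_occurs):
    """Net payoff to an agent who buys a pays-1-if-event contract at
    buy_price and later sells an identical contract at sell_price."""
    payout = 1.0 if event_occurs else 0.0
    bought_leg = payout - buy_price   # contract bought on the high-credence day
    sold_leg = sell_price - payout    # contract sold on the low-credence day
    return bought_leg + sold_leg      # the payout cancels out


# Assumed credences: 25% on one day, 10% on the next, with no new information.
high_day, low_day = 0.25, 0.10
for event_occurs in (True, False):
    print(event_occurs, round(agent_net_payoff(high_day, low_day, event_occurs), 2))
```

The agent ends up about 0.15 worse off whether or not the event happens, which is what makes the fluctuation exploitable. Reporting the average credence on both days would remove the guaranteed loss.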

Common beliefs/attitudes/dispositions among [highly engaged EAs/rationalists + my friends] which seem super wrong to me:


  • Giving a range of probabilities when you should give a probability + giving confidence intervals over probabilities + failing to realize that probabilities of probabilities just reduce to simple probabilities
    • But thinking in terms of probabilities over probabilities is sometimes useful, e.g. you have a probability distribution over possible worlds/models and those worlds/models are probabilistic
  • Unstable beliefs about stuff like AI timelines, in the sense of "I'd be pretty likely to say something pretty different if you asked tomorrow"
    • Instability in the sense of being likely to change beliefs if you thought about it more is fine; fluctuating predictably (dutch-book-ably) is not
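One way to read the "probabilities of probabilities" bullet above is as the law of total probability: if you put weights on several models and each model assigns the event a probability, your single probability for the event is the weighted average. The sketch below assumes that reading; `collapse` and the example numbers are hypothetical.

```python
def collapse(weighted_models):
    """Reduce a distribution over models to one probability for the event.

    weighted_models: list of (weight_on_model, prob_model_assigns_to_event).
    Weights are assumed to sum to 1.
    """
    total_weight = sum(w for w, _ in weighted_models)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights over models must sum to 1")
    # Law of total probability: P(event) = sum_i P(model_i) * P(event | model_i)
    return sum(w * p for w, p in weighted_models)


# Hypothetical: 50% on a model that says 10%, 50% on a model that says 30%.
single_probability = collapse([(0.5, 0.1), (0.5, 0.3)])
print(round(single_probability, 3))
```

For a one-off bet on the event, only this single number matters. The spread across models still matters for how you would update on new evidence, which fits the sub-bullet about such distributions sometimes being useful.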


  • Axiologies besides ~utilitarianism
    • Possibly I'm actually noticing sloppy reasoning about how to go from axiology to decision procedure, possibly including just not taking axiology seriously
  • Veg(etari)anism for terminal reasons; veg(etari)anism as ethical rather than as a costly indulgence
  • Thinking personal flourishing (or something else agent-relative) is a terminal goal worth comparable weight to the impartial-optimization project

Cause prioritization:

  • Cause prioritization that doesn't take seriously that the cosmic endowment is astronomical (likely worth >10^60 happy human lives) and that we can nontrivially reduce x-risk
  • Deciding in advance to boost a certain set of causes [what determines that set??], or a "portfolio approach" without justifying the portfolio-items
    • E.g. multiple CEA staff donate by choosing some cause areas and wanting to help in each of those areas
    • Related error: agent-relativity
    • Related error: considering difference from status quo rather than outcomes in a vacuum
    • Related error: risk-aversion in your personal contributions (much more egregious than risk-averse axiology)
    • Instead you should just argmax — find the marginal value of your resources in each cause (for your resources that can funge between causes), then use them in the best possible way
  • Intra-cause offsetting: if you do harm in area X [especially if it's avoidable/unnecessary/intentional], you should fix your harm in that area, even if you could do more good in another area
    • Maybe very few of my friends actually believe this
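The "just argmax" bullet above can be sketched as a greedy allocation: give each unit of resources to whichever cause currently has the highest marginal value. This is only an illustration under my own assumptions: resources come in equal units, fully funge between causes, and marginal value is known and non-increasing in the amount already given. The cause names and value functions are hypothetical.

```python
def allocate(budget_units, marginal_value_fns):
    """Greedily assign each unit to the cause with the highest marginal value.

    marginal_value_fns: dict mapping cause name to a function that takes the
    units already allocated to that cause and returns the value of one more.
    """
    allocation = {cause: 0 for cause in marginal_value_fns}
    for _ in range(budget_units):
        best = max(
            marginal_value_fns,
            key=lambda cause: marginal_value_fns[cause](allocation[cause]),
        )
        allocation[best] += 1
    return allocation


# Hypothetical causes with diminishing returns.
causes = {
    "A": lambda units: 10 / (1 + units),
    "B": lambda units: 3 / (1 + units),
}
print(allocate(4, causes))
```

With these made-up numbers the first three units go to A and the fourth to B, because A's marginal value has dropped below B's by then. If marginal values were constant (as is roughly the case for a small donor), every unit would go to the single best cause, which is the contrast with choosing a portfolio in advance.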


  • Not noticing big obvious problems with impact certificates/markets
  • Naively using calibration as a proxy for forecasting ability
  • Thinking you can (good-faith) bet on the end of the world by borrowing money
    • Many examples, e.g. "How to place a bet on the end of the world"
    • I think most of us understand the objection "you can do better by just borrowing money at market rates." I think many people miss that utility is about ∫consumption, not ∫bankroll (note the bettor typically isn't liquidity-constrained). The bet only makes sense if you spend all your money before you'd have to pay back.
  • [Maybe something deep about donations; not sure]
  • [Maybe something about compensating altruists or compensating for roles often filled by altruists; not sure]
  • [Maybe something about status; not sure]
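On the calibration bullet in the list above, one possible reading is that calibration ignores how informative forecasts are. The sketch below compares two hypothetical forecasters on made-up outcomes: one always says the base rate, the other gives sharp and slightly underconfident forecasts. The calibration measure here is a simple one I chose for illustration (count-weighted gap between each stated probability and the observed frequency at that probability), not a standard from the post.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def calibration_error(forecasts, outcomes):
    """Count-weighted mean gap between each distinct stated probability
    and the observed frequency among forecasts with that probability."""
    buckets = {}
    for f, o in zip(forecasts, outcomes):
        buckets.setdefault(f, []).append(o)
    n = len(forecasts)
    return sum(
        len(obs) / n * abs(f - sum(obs) / len(obs)) for f, obs in buckets.items()
    )


# Made-up data: four questions, two of which resolve yes.
outcomes = [1, 0, 1, 0]
base_rate_forecaster = [0.5, 0.5, 0.5, 0.5]  # always says the base rate
sharp_forecaster = [0.9, 0.1, 0.9, 0.1]      # informative, a bit underconfident

for name, forecasts in [("base rate", base_rate_forecaster), ("sharp", sharp_forecaster)]:
    print(
        name,
        round(calibration_error(forecasts, outcomes), 3),
        round(brier_score(forecasts, outcomes), 3),
    )
```

In this toy case the base-rate forecaster has zero calibration error but a Brier score of about 0.25, while the sharp forecaster has a calibration error of about 0.1 and a Brier score of about 0.01. Ranking by calibration alone would pick the less useful forecaster.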

Possibly I'm wrong about which attitudes are common.

For now I'm just starting a list, not trying to be legible, much less change minds. I know I haven't explained my views.

Edit: I'm sharing controversial beliefs, without justification and with some framed provocatively. If one of these views makes you think worse of me to a nontrivial degree, please ask for elaboration; maybe there's miscommunication or it's more reasonable than it seems. Edit 2: there are so many comments; I may not respond to requests-for-elaboration but will at least notice them as a bid-for-elaboration-at-some-point.
