Zach Stein-Perlman

Undergraduate and EA organizer at Williams. Currently thinking about the interaction between government and AI, especially how government could affect an intelligence explosion. I'm just starting to do research; comments and suggestions are greatly appreciated.

Some things I'd be excited to talk about:

  • What happens after an intelligence explosion
  • What happens if most people appreciate AI
  • International relations in the context of powerful AI
  • Policy responses to AI — what's likely to happen and what would be good

EA Forum Prize: Winners for May-July 2021

If you have ideas along these lines, add a comment here.

I don't love the idea of themed prizes like the creative writing contest. But if you're going to do them, I'd like to see something for taxonomies/trees. I think high-quality taxonomies, like this, can have great analytic value. And taxonomies are rare and not widely appreciated, so a contest, by making the format visible and highlighting good examples, would make people aware of the format and more likely to use it when it would be useful, in addition to producing content that they otherwise wouldn't have thought to create.

Edit: importantly, taxonomies/trees have a very high idea-to-length ratio, since the taxonomy/tree itself is essentially just a list of relations. I would happily read every taxonomy/tree produced in a contest, and likely learn something or gain a new perspective from each.

EA Forum Prize: Winners for May-July 2021

We still plan to use our budget to incentivize and reward strong writing

And elsewhere, you've said the goal is to "promote the creation of excellent Forum content." I think it's a mistake to see the goal as causing more good content at the expense of identifying and helping spread good content. Good writing is only as effective as the attention it gets. LessWrong has curation and an annual review for this purpose; the EA Forum just has the newsletter, which isn't sufficiently selective to serve the same role. Regardless of whether there's money attached, I wish the EA Forum did more to identify great content.

Can human extinction due to AI be justified as good?

Can we imagine scenarios in which human extinction due to AI is good? Sure; under reasonable empirical and normative assumptions, great futures look like "filling the universe with posthumans" or "filling the universe with digital minds" (or maybe weirder stuff, like involving acausal trade). But since value is fragile, almost all possible futures in which AI replaces us involve the AI doing stuff that is not morally important. So it's certainly not enough to say that if agenty AI is sufficiently powerful to destroy us, we morally ought to be OK with that. Even if the AI "has moral status," by default it doesn't do morally valuable stuff.

And good futures involving human extinction probably look more like "we all choose to ascend to posthumanity" or "after a long reflection, we choose for our descendants—no, successors—to be nonhuman" than "AI kills us all against our will." In the latter case, we've messed up our AI development by definition; it's unlikely that the AI-controlled future is good. So I would quibble with your suggestion that the good-future-from-AI looks like "sacrificing humans."

Suffering-Focused Ethics (SFE) FAQ

I strongly encourage you/everyone to not call "practical SFE" SFE. It's much better (analytically) to distinguish the value of causing happiness and preventing suffering from empirical considerations. Under your definition, if (say) utilitarianism is true, then SFE is true given certain empirical circumstances but not others. This is an undesirable definition. Anything called SFE should contain a suffering-focused ranking of possible worlds (for a SF theory of the good) or ranking of possible actions (for a SF theory of the right), not merely a contingent decision procedure. Otherwise the fact that someone accepts SFE is nearly meaningless; it does not imply that they would be willing to sacrifice happiness to prevent suffering, that they should be particularly concerned with S-risks, etc.

Practical SFE views . . . are compatible with a vast range of ethical theories. To adopt a practical SFE view, one just needs to believe that suffering has a particularly high practical priority.

This makes SFE describe the options available to us, rather than how to choose between those options. That is not what an ethical theory does. We could come up with a different term to describe the practical importance of preventing suffering at the margin, but I don't think it would be very useful: given an ethical theory, we should compare different specific possibilities rather than saying "preventing suffering tends to be higher-leverage now, so let's just focus on that." That is, "practical SFE" (roughly defined as the thesis that the best currently-available actions in our universe generally decrease expected suffering much more than they increase expected happiness) has quite weak implications: it does not imply that the best thing we can do involves preventing suffering; to get that implication, we would need to have the truth of "practical SFE" be a feature of each agent (and the options available to them) rather than the universe.

Edit: there are multiple suffering-related ethical questions we could ask. One is "what ought we—humans in 2021 in our particular circumstances—to do?" Another is "what is good, and what is right?" The second question is more general (we can plug empirical facts into an answer to the second to get an answer to the first), more important, and more interesting, so I want an ethical theory to answer it.

We're Redwood Research, we do applied alignment research, AMA

When choosing between projects, we’ll be thinking about questions like “to what extent is this class of techniques fundamentally limited? Is this class of techniques likely to be a useful tool to have in our toolkit when we’re trying to align highly capable systems, or is it a dead end?”


We’re trying to take a language model that has been fine-tuned on completing fiction, and then modify it so that it never continues a snippet in a way that involves describing someone getting injured. (source)

Suppose you successfully modify GPT models as desired, at moderate cost in compute and human classification. How might your process generalize?

We're Redwood Research, we do applied alignment research, AMA

"What needs to happen in order for the field of x-risk-motivated AI alignment research to employ a thousand ML researchers and engineers"?

How would you run the Petrov Day game?

I would do something similar to the present version, but emphasize that it's a game, the stakes are low, and, while you shouldn't destroy the homepage without reason, it's not a big deal if you do. We don't need to pretend that losing the homepage for a day is a disaster in order to get value from the game. I would happily risk the homepage for a day once a year to see whether it gets destroyed and, if so, why: malice, accidents, deception, bargaining failure (e.g., someone demands something to not destroy the homepage and their demand is not fulfilled), other coordination failure (in more complex variants), or something else.

Edit: also, I don't get how the game can have much to do with trust as long as defectors are socially punished.

Does the Forum Prize lead people to write more posts?

Going forward, we no longer plan to run the “standard” Forum Prize process.

Some things I liked about the old Forum Prize:

  • Highlighting great content
  • Recognizing great work

It doesn't need to have money attached, but I hope a regular way to emphasize good work of all sorts (beyond karma, and more selective than inclusion in the Forum newsletter) reappears, for the sake of both authors and readers. As a new writer (hypothetically; I haven't yet written any polished posts), I would find winning very valuable, and while it probably wouldn't affect my writing much, the existence of the Forum Prize would make me more excited about writing polished posts. As a reader, I have found excellent content outside of the stuff I normally read thanks to a regular prize.

What is your favorite EA meme?

I like the idea, but I think this would be better without specific assertions of effectiveness. Very few people will agree that MIRI is 180% and ALLFED is 240% as effective as GiveDirectly, for example (many people would say much higher; many people would say much lower), and this assertion is totally unnecessary for the value of this image.
