Importance of the digital minds stuff compared to regular AI safety; how many early-career EAs should be going into this niche? What needs to happen between now and the arrival of digital minds? In other words, what kind of plan does Carl have in mind for making the arrival go well? Also, since Carl clearly has well-developed takes on moral status: what criteria does he think could determine whether an AI system deserves moral status, and to what extent?
Additionally—and this one's fueled more by personal curiosity than by impact—Carl's beliefs on consciousness. Like Wei Dai, I find the case for anti-realism as the answer to the problem of consciousness weak, yet this is Carl's position (according to this old Brian Tomasik post, at least), and so I'd be very interested to hear Carl explain his view.
Thank you for engaging. I don’t disagree with what you’ve written; I think you have interpreted me as implying something stronger than what I intended, and so I’ll now attempt to add some colour.
That Emily and other relevant people at OP have not fully adopted Rethink’s moral weights does not puzzle me. As you say, to expect that is to apply an unreasonably high funding bar. I am, however, puzzled that Emily and co. appear to have not updated at all towards Rethink’s numbers. At least, that’s the way I read:
- We don’t use Rethink’s moral weights.
- Our current moral weights, based in part on Luke Muehlhauser’s past work, are lower. We may update them in the future; if we do, we’ll consider work from many sources, including the arguments made in this post.
If OP has not updated at all towards Rethink’s numbers, then I see three possible explanations, all of which I find unlikely, hence my puzzlement. First possibility: the relevant people at OP have not yet given the Rethink report a thorough read, and have therefore not updated. Second: the relevant OP people have read the Rethink report and have updated their internal models, but have not yet gotten around to updating OP’s actual grantmaking allocation. Third: OP believes the Rethink work is low quality or otherwise critically undermined by one or more errors. I’d be very surprised if either of the first two is true, given that moral weight is arguably the most important consideration in neartermist grantmaking allocation. I’d also be surprised if the third is true, given how well Rethink’s moral weight sequence has been received on this forum (see, e.g., comments here and here).[1] OP people may disagree with Rethink’s approach at the independent-impression level, but surely, given that Rethink’s moral weights work is the most extensive work done on this topic by anyone(?), the Rethink results should be given substantial (or at least non-trivial) weight in their all-things-considered views?
(If OP people believe there are errors in the Rethink work that render the results ~useless, then, considering the topic’s importance, I think some sort of OP write-up would be well worth the time: both at the object level, so that future moral weight researchers can avoid similar mistakes and so that the community can hold OP’s reasoning to a high standard, and at the meta level, so that potential donors can update appropriately regarding Rethink’s general quality of work.)
Additionally (and this is less important), I’m puzzled at the meta level by the way we’ve arrived here. As noted in the top-level post, Open Phil has been less than wholly open about its grantmaking, and it’s taken a pretty not-on-the-default-path sequence of events to surface the fact that OP does not give Rethink’s moral weights any weight: Ariel, someone who is not affiliated with OP and does not work on animal welfare for their day job, writing this big post; Emily from OP replying to the post and to a couple of the comments; and me, a Forum-goer who doesn’t work on animal welfare, spotting an inconsistency in Emily’s replies.
Here, you say, “Several of the grants we’ve made to Rethink Priorities funded research related to moral weights.” Yet in your initial response, you said, “We don’t use Rethink’s moral weights.” I respect your tapping out of this discussion, but at the same time I’d like to express my puzzlement as to why Open Phil would fund work on moral weights to inform grantmaking allocation, and then not take that work into account.
The "EA movement", however you define it, doesn't get to control the money and there are good reasons for this.
I disagree, for the same reasons as those given in the critique of the post you cite. Tl;dr: Trades have happened in EA in which many people cast aside careers with high earning potential to work on object-level problems. I think these people should get a say over where EA money goes.
Directionally, I agree with your points. On the last one, I’ll note that counting person-years (or animal-years) falls naturally out of empty individualism as well as open individualism, and so the point goes through under the (substantively) weaker claim of “either open or empty individualism is true”.[1]
(You may be interested in David Pearce’s take on closed, empty, and open individualism.)
For the casual reader: The three candidate theories of personal identity are empty, open, and closed individualism. Closed individualism is the common-sense view, but most people who have thought seriously about personal identity (e.g., Parfit) have concluded that it must be false (tl;dr: because nothing, not even memory, can “carry” identity in the way needed for closed individualism to make sense). Of the remaining two candidates, open individualism appears to be the fringe view; supporters include Kolak, Johnson, Vinding, and Gomez-Emilsson (although Kolak's response to Cornwall makes it unclear to what extent he is indeed a supporter). Proponents of (what we now call) empty individualism include Parfit, Nozick, Shoemaker, and Hume.
There was near-consensus that Open Phil should generously fund promising AI safety community/movement-building projects they come across
Would you be able to say a bit about the extent to which members of this working group have engaged with the arguments that AI safety movement-building could do more harm than good? For instance, points 6 through 11 of Oli Habryka's second message in the “Shutting Down the Lightcone Offices” post (link). If they have strong counterpoints to such arguments, then I imagine it would be valuable for these to be written up.
(Probably the strongest response I've seen to such arguments is the post “How MATS addresses ‘mass movement building’ concerns”. But this response is MATS-specific and doesn't cover concerns around other forms of movement building, for example, ML upskilling bootcamps or AI safety courses operating through broad outreach.)
I enjoyed this post, thanks for writing it.
Is there any crucial consideration I’m missing? For instance, are there reasons to think agents/civilizations that care about suffering might – in fact – be selected for and be among the grabbiest?
I think I buy your overall claim in your “Addressing obvious objections” section that there is little chance of agents/civilizations who disvalue suffering (hereafter: non-PUs) winning a colonization race against positive utilitarians (PUs). (At least, not without causing equivalent expected suffering.) However, my next thought is that non-PUs will generally work this out, as you have, and that some fraction of technologically advanced non-PUs—probably mainly those who disvalue suffering the most—might act to change the balance of realized upside- vs. downside-focused values by triggering false vacuum decay (or by doing something else with a similar switching-off-a-light-cone effect).
In this way, it seems possible to me that suffering-focused agents will beat out PUs. (Because there’s nothing a PU agent—or any agent, for that matter—can do to stop a vacuum decay bubble.) This would reverse the post’s conclusion. Suffering-focused agents may in fact be the grabbiest, albeit in a self-sacrificial way.
(It also seems possible to me that suffering-focused agents will mostly act cooperatively, only triggering vacuum decays at a frequency that matches the ratio of upside- vs. downside-focused values in the cosmos, according to their best guess for what the ratio might be.[1] This would neutralize my above paragraph as well as the post's conclusion.)
My first pass at what this looks like in practice, from the point of view of a technologically advanced, suffering-focused (or, more broadly, non-PU) agent/civilization: I consider what fraction of agents/civilizations like me should trigger vacuum decays in order to realize the cosmos-wide values ratio. Then, I use a random number generator to tell me whether I should switch off my light cone.
Additionally, one wrinkle worth acknowledging is that some universes within the inflationary multiverse, if indeed it exists and allows different physics in different universes, are not metastable. PUs likely cannot be beaten out in these universes, because vacuum decays cannot be triggered. Nonetheless, this can be compensated for through suffering-focused/non-PU agents in metastable universes triggering vacuum decays at a correspondingly higher frequency.
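For concreteness, here's a minimal sketch of what that randomized, cooperative decision rule could look like. The numbers and variable names are hypothetical placeholders I've made up for illustration, not estimates from the post or this comment; the metastability adjustment reflects the wrinkle in the previous paragraph.

```python
import random

# Toy sketch of the cooperative decision rule described above.
# All numbers below are hypothetical placeholders, purely for illustration.

# My best guess at the cosmos-wide fraction of value that is downside-focused,
# i.e. the share of light cones that "should" be switched off under the
# cooperative scheme.
target_fraction_switched_off = 0.3

# If only some universes are metastable, agents in those universes compensate
# by triggering vacuum decay at a correspondingly higher frequency.
fraction_of_universes_metastable = 0.6

trigger_probability = min(
    1.0, target_fraction_switched_off / fraction_of_universes_metastable
)

# Each suffering-focused civilization draws independently; in aggregate,
# roughly the target fraction of light cones ends up switched off.
if random.random() < trigger_probability:
    print("Trigger vacuum decay (switch off this light cone).")
else:
    print("Do nothing.")
```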
This is a good post; I’m happy it exists. One thing I notice, which I find a little surprising, is that the post doesn’t seem to include what I'd consider the classic example of controlling the past: evidentially cooperating with beings/civilizations that existed in past cycles of the universe.[1]
This example does rely on a cyclic (e.g., Big Bounce) model of cosmology,^ which has a couple of issues. First, such a cosmological model is, in my all-things-considered view, much less likely to be true than eternal inflation. Second, within a cyclic model there isn't a clearly meaningful notion of time across cycles. However, I don't think these issues undercut the example. Controlling faraway events through evidential cooperation is no less possible in an eternally inflating multiverse; it's just that space does more of the work than time (which makes it a less classic example of controlling the past). Also, while to an observer within a cycle the notion of time outside their cycle may not hold much meaning, I think that from a God's eye view there is a material sense in which the cycles occur sequentially, with some in the past of others.
In addition, the example can be adapted, I believe, to fit the simulation hypothesis. Sequential universe cycles become sequential simulation runs,* and the God’s eye view is now the point of view of the beings in the level of reality one above ours, whether that be base reality or another simulation. *(It seems likely to me that simulation runs would be massively, but not entirely, parallelized. Moreover, even if runs are entirely parallelized, it would be physically impossible—so long as the level-above reality has physical laws that remotely resemble ours—for two or more simulations to happen in the exact same spatial location. Therefore, there would be frames of reference in the base reality from which some simulation runs take place in the past of others.)
^ (One type of cyclic model, conformal cyclic cosmology, allows causal as well as evidential influence between universes, though in this model one universe can only causally influence the next one(s) in the sequence (i.e., causally controlling the past is not possible). For more on this, see "What happens after the universe ends?".)
there are important downsides to the “cause-first” approach, such as a possible lock-in of main causes
I think this is a legitimate concern, and I’m glad you point to it. An alternative framing is lock-out of potentially very impactful causes. Dynamics of lock-out, as I see it, include:
A recent shortform by Caleb Parikh, discussing the specific case of digital sentience work, feels related. In Caleb’s words:
I think aspects of EA that make me more sad is that there seems to be a few extremely important issues on an impartial welfarist view that don’t seem to get much attention at all, despite having been identified at some point by some EAs.
Personal anecdote: Part of the reason, if I’m to be honest with myself, for my move from nuclear weapons risk research to AI strategy/governance is that it became increasingly socially difficult to be an EA working on nuclear risk. (In my sphere, at least.) Many of my conversations with other EAs, even in non-work situations and even with me trying to avoid this conversation area, turned into me having to defend my not focusing on AI risk, on pain of being seen as “not getting it”.
I don’t think that observing lots of condemnation and little support is all that much evidence for the premise you take as given—that SBF’s actions were near-universally condemned by the EA community—compared to meaningfully different hypotheses like “50% of EAs condemned SBF’s actions.”
There was, and still is, a strong incentive to hide any opinion about SBF’s fraud-for-good ideology other than condemnation (e.g., support, genuine uncertainty), out of legitimate fear of becoming a witch-hunt victim. By the law of prevalence, I therefore expect the number of EAs who don’t fully condemn SBF’s actions to be far greater than the number who publicly express opinions other than full condemnation.
(Note: I’m focusing on the morality of SBF’s actions, and not on executional incompetence.)
Anecdotally, of the EAs I’ve spoken to about the FTX collapse with whom I’m close—and who therefore have less incentive to hide what they truly believe from me—I’d say that between a third and a half fall into the genuinely uncertain camp (on the moral question of fraud for good causes), while the number in the support camp is small but not zero.[1]
And of those in my sample in the condemn camp, by far the most commonly cited reason is timeless decision theory / pre-committing to cooperative actions, which I don’t think is the kind of reason one jumps to when one hears that EAs condemn fraud-for-good-type thinking.