Associate researcher, animal welfare @ Rethink Priorities
Working (0-5 years experience)
7354 karma · Joined May 2016


Associate researcher in animal welfare at Rethink Priorities. Writing on behalf of myself only.

Also interested in global priorities research and reducing s-risks.

My background is mostly in pure math, computer science and deep learning, but also some in statistics/econometrics and agricultural economics.

I'm a suffering-focused moral antirealist.

My shortform.


Topic Contributions

Imo, the main risk from working on AI capabilities is moving forward the frontier of research, and so basically the work of the main AI labs (or possible new big AI labs to come). I could easily imagine the average AI capabilities work actually slowing things down by distracting people with work that doesn't move the frontier forward. OTOH, it could bring more people into AI generally, and so more of them working near the frontier.

So, I would judge this specific project on that basis.

One main question: could it allow AI researchers at the big labs to do research faster? Maybe? It seems like it could help them automate part of their own work. Would they actually use it, or some other project influenced by it? I'm not sufficiently well-informed to guess.

Or, maybe even if it doesn't move forward the frontier of research, it will affect how likely dangerous AI is to be deployed.

I did have in mind that non-experientialist goods could count, but as you suggest, experientialist goods (goods that depend on being acknowledged in experience to count, including non-hedonic ones, i.e. other than pleasure) would probably be weakened during torture, so that could introduce a confounder. The comparison would now mostly be between hedonic bads and non-experientialist goods.

Another issue is how to determine the weight of non-experientialist goods, especially if we don't want to be paternalistic or alienating. If we do so by subjective appreciation, then it seems like we're basically just turning them back into experientialist goods. If we do so via subjective weights (even if someone can't appreciate a good at the time, they might still insist it's very important and we could infer how good it is for them), its subjective weight could also be significantly reduced during torture. So we still wouldn't necessarily be comparing the disvalue of torture to the maximum value of non-experientialist goods using Tim's judgement while being tortured.

Instead, if we do still want to use subjective weights, we might consider the torture and non-hedonic goods happening at different times (and in different orders?), for equal durations, and ask Tim during the torture, during the non-hedonic goods and at other times whether the non-hedonic goods make up for the torture. If the answers agree, then great. But if they disagree, this could be hard to resolve, because Tim's answer could be biased in each situation: he underweights non-hedonic goods during torture and otherwise while not focusing on them, and he underweights torture while not being tortured.

EDIT: On the other hand, if I tried to come up with a cognitively plausible objective cardinal account of subjective weights and value, I'd expect torture to be able to reach the max or get close to it, and that would be enough to say that negative hedonic welfare can be at least about as bad as goods can be good (in aggregate, in a moment).

As one example, I think Richard Yetter Chappell, an objective list theorist, would say torture with maximal non-hedonic goods at the same time would be bad overall per moment: https://rychappell.substack.com/p/a-multiplicative-model-of-value-pluralism

Thanks for sharing this!

Given that they've been breaking the law and that hasn't stopped them, does it seem very likely that there'd be a court injunction to stop or that a court injunction would change anything? Why hasn't the law itself stopped them? Are fines the only penalty for violating the law? If they violate the court injunction, would the only penalty again be fines, or could executives face jail time?

One of my main high-level hesitations with AI doom and futility arguments is something like this, from Katja Grace:

My weak guess is that there’s a kind of bias at play in AI risk thinking in general, where any force that isn’t zero is taken to be arbitrarily intense. Like, if there is pressure for agents to exist, there will arbitrarily quickly be arbitrarily agentic things. If there is a feedback loop, it will be arbitrarily strong. Here, if stalling AI can’t be forever, then it’s essentially zero time. If a regulation won’t obstruct every dangerous project, then it is worthless. Any finite economic disincentive for dangerous AI is nothing in the face of the omnipotent economic incentives for AI. I think this is a bad mental habit: things in the real world often come down to actual finite quantities. This is very possibly an unfair diagnosis. (I’m not going to discuss this later; this is pretty much what I have to say.)

"Omnipotent" is the impression I get from a lot of the characterization of AGI.

Another recent specific example here.

Similarly, I've had the impression that specific AI takeover scenarios don't engage enough with the ways they could fail for the AI. Some are based primarily on nanotech or engineered pathogens, but from what I remember of the presentations and discussions I saw, they don't typically directly address enough of the practical challenges for an AI to actually pull them off, e.g. access to the materials and a sufficiently sophisticated lab/facility with which to produce these things, little or poor verification of the designs before running them through the lab/facility (if done by humans), attempts by humans to defend ourselves (e.g. the military) or hide, ways humans can disrupt power supplies and electronics, and so on. Even if AI takeover scenarios are disjunctive, so are the ways humans can defend ourselves and the ways such takeover attempts could fail, and we have a huge advantage through access to and control over stuff in the outside world, including whatever the AI would "live" on and what powers it. Some failure modes could be common across significant shares of otherwise promising takeover plans, potentially placing a limit on how far an AI can get by considering or trying more and more such plans or more complex plans.

I've seen it argued that it would be futile to try to make the AI more risk-averse (e.g. sharply decreasing marginal returns), but this argument didn't engage with how risks for the AI from human detection and possible shutdown, threats by humans or the opportunity to cooperate/trade with humans would increasingly disincentivize such an AI from taking extreme action the more risk-averse it is.

I've also heard an argument (in private, and not by anyone working at an AI org or otherwise well-known in the community) that AI could take over personal computers and use them, but distributing computations that way seems extremely impractical for computations that run very deep, so there could be important limits on what an AI could do this way.

That being said, I also haven't personally engaged deeply with these arguments or read a lot on the topic, so I may have missed where these issues are addressed, but this is in part because I haven't been impressed by what I have read (among other reasons, like concerns about backfire risks, suffering-focused views and very low probabilities of the typical EA or me in particular making any difference at all).

What follows are some probability-of-sentience- and rate-of-subjective-experience-adjusted welfare range estimates.

The probability of sentience is multiplied through here, right? Some of these animals are assigned <50% probability of sentience but have nonzero probability-of-sentience-adjusted welfare ranges at the median. Another way to present this would be to construct the random variable that's 0 if they're not sentient, and otherwise equal to the random variable representing their moral weight conditional on sentience. This would be your actual distribution of welfare ranges for the animal, accounting for their probability of sentience. That being said, what you have now might be more useful for representing a range of expected moral weights for (approximately) risk-neutral EV-maximizing utilitarians, e.g. to capture deep uncertainty or credal fragility.
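As a sketch of the mixture construction I have in mind (all numbers here are hypothetical, chosen only for illustration, not RP's actual figures or distributions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: probability of sentience, and samples of the
# welfare range conditional on sentience.
p_sentience = 0.3
n = 100_000
welfare_if_sentient = rng.lognormal(mean=-3.0, sigma=1.0, size=n)

# Mixture: 0 if not sentient, otherwise the conditional welfare range.
sentient = rng.random(n) < p_sentience
welfare = np.where(sentient, welfare_if_sentient, 0.0)

# With p_sentience < 0.5, the median of the mixture is 0,
# even though the mean (the risk-neutral expected value) is not.
print(np.median(welfare))  # 0.0
print(welfare.mean())      # ~ p_sentience * E[welfare | sentient]
```

This illustrates the point above: a median-based summary of the mixture collapses to 0 whenever the probability of sentience is below 50%, while the mean still tracks the probability-weighted welfare range.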

Rethink Priorities’ (RP’s) median welfare range estimates, given in this post from Bob Fischer, for:

  • Black soldier flies, 0.013.
  • Silkworms, 0.002.


It's worth noting that most arthropods by population are significantly smaller, have significantly smaller brains and would probably have less sophisticated behaviour (at least compared to adult black soldier flies; I'm not familiar with silkworm and other larval behaviour), so would probably score lower on both probability of sentience and welfare range. So, if you're including all arthropods and using these figures for all arthropods, you should probably think of these numbers (or at least the BSF ones) as providing an overestimate of the arthropod welfare effects.

Hi Vasco, thanks for writing this! I'm glad to see more cross-cause research, and this seems like a useful starting point.

Some quick thoughts on why the deforestation rate assumptions might be too high:

Net change in forest area per capita in 2015[3] (m2/person), in each of the countries analysed by GiveWell for its top charities here[4]. I calculated this from the ratio between:

  • Net change in forest area in 2015 by country (ha), based on these data from Our World in Data (OWID).
  • Population in 2015 by country, based on these data from OWID.
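The per-capita figure from these two quantities is just a ratio plus a unit conversion; with hypothetical numbers (not the post's data):

```python
# Hypothetical figures for one country, for illustration only.
net_change_ha = -50_000   # net change in forest area, hectares/year
population = 10_000_000   # people

# 1 hectare = 10,000 m^2
M2_PER_HA = 10_000
net_change_m2_per_person = net_change_ha * M2_PER_HA / population
print(net_change_m2_per_person)  # -50.0 m^2/person (net deforestation)
```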


This is only accurate to the extent the annual impact on net forest area of the people saved by GiveWell’s top charities is similar to that of the mean citizens of their countries in 2015.

This assumption would not hold if some of the major causes of deforestation are limited by factors not very sensitive to population size. For example, some deforestation may be driven by international demand for products that are produced in those countries, so the effect of having more people available to work on these products (from the lives saved) should be tempered by elasticity effects. Deforestation could also be limited by capital, which GiveWell beneficiaries are unlikely to provide given their poverty.

Deforestation for agriculture for domestic consumption or for living area would be sensitive to the population size, but, again, GiveWell beneficiaries may be unrepresentative, a possibility you implicitly acknowledge by assuming it is not the case.

Furthermore, with increasing deforestation, there will be less land left to deforest, and that land may be harder to deforest (because of practical or political challenges). Each of these points towards the marginal effect of population being smaller than the average effect.

I haven't looked into any of this in detail or tried to verify any of these possibilities, though.

I wonder if your original description is compatible with "extraordinarily intense physical pain". Or, maybe it could still be extraordinary, but well within what's bearable and far from torture. Could he carry on a conversation with his high school sweetheart if the pain were intense enough, or be excited to marry her, or think about his promise to his grandfather? When pain reaches a certain level of intensity, I expect it to be very difficult to focus attention on other things. And, to the extent that he is focusing his attention on other things and away from the pain, the pain's hedonic intensity is reduced.

Imagine he's also or instead having a panic attack. Or, instead of the trip, he's undergoing waterboarding as a cultural rite of passage (with all the same goods, similar ones or greater ones from your original description). How long could the average person be voluntarily waterboarded for any purpose? (EDIT: Or for some personal positive good in particular? I can imagine a sense of duty, e.g. to prevent greater harm to others or harm to loved ones in particular, allowing someone to last a long time, and those may figure into welfare range, but I probably wouldn't describe these in terms of positive goods, or maybe even bads, so that they can't be put on a ratio scale, only an interval scale at most.) These seem intense enough to consume attention. (I honestly don't have much intuitive sense of how painful frostbite can be.)

(I also don't expect added hunger to make much difference to his hedonic welfare in the moment, because he'll be focusing on the much more intense pain. Less sure about financial costs, although those would probably not really affect his welfare during the trip itself.)

Are you making short term (e.g. 1 year) AI-related forecasts, too? It would be helpful to figure out how much weight to give to your forecasts in this domain (and between individual forecasters), and may also give useful feedback for your longer range forecasts.
