I'm a PhD student at the Center for Human-Compatible AI (CHAI) at UC Berkeley. I edit and publish the Alignment Newsletter, a weekly publication with recent content relevant to AI alignment. In the past, I ran the EA UC Berkeley and EA at the University of Washington groups.

Wiki Contributions


General vs specific arguments for the longtermist importance of shaping AI development

Finally, I personally think that the strongest case that we can currently make for the longtermist importance of shaping AI development is fairly general - something along the lines of the most important century series - and yet this doesn't seem to be the "default" argument (i.e. the one presented in key EA content/fellowships/etc. when discussing AI).

I agree that the general argument is the strongest one, in the sense that it is most likely to be correct / robust.

The problem with general arguments is that they tell you very little about how to solve the problem. "The climate is messed up, probably due to human activity" doesn't tell you much about what to do to fix the climate. In contrast, "On average the Earth is warming due to increased concentrations of greenhouse gases" tells you a lot more (e.g. reduce emissions of GHGs, take them out of the atmosphere, find a way of cooling the Earth to balance it out).

If I were producing key EA content/fellowships/etc, I would be primarily interested in getting people to solve the problem, which suggests a focus on specific arguments, even though the general argument is "stronger".

Note that general arguments can motivate you to learn more about the problem to develop more specific arguments, which you can then solve. (E.g. if you didn't know about the greenhouse effect, the observation that the Earth is warming can motivate you to figure out why, before attempting a solution.) So if you're trying to get people to produce novel specific arguments for AI risk, then talking about the general argument makes sense.

Is it crunch time yet? If so, who can help?

I do not think it is crunch time. I think people in the reference class you're describing should go with some "normal" plan such as getting into the best AI PhD program you can get into, learning how to do AI research, and then working on AI safety.

(There are a number of reasons you might do something different. Maybe you think academia is terrible and PhDs don't teach you anything, and so instead you immediately start to work independently on AI safety. That all seems fine. I'm just saying that you shouldn't make a change like this because of a supposed "crunch time" -- I would much prefer having significantly better help in 5 or 10 years, rather than not-very-good help now.)

That being said, I feel confident that there are other AI safety researchers who would say it is crunch time or very close to it. I expect this would be a minority (i.e. < 50%).

Seeking social science students / collaborators interested in AI existential risks

Planned summary for the Alignment Newsletter:

This post presents a list of research questions around existential risk from AI that can be tackled by social scientists. The author is looking for collaborators to expand the list and tackle some of the questions on it, and is aiming to provide some mentorship for people getting involved.

The motivated reasoning critique of effective altruism

It’s so easy to collapse into the arms of “if there’s even a small chance X will make a very good future more likely …” As with consequentialism, I totally buy the logic of this! The issue is that it’s incredibly easy to hide motivated reasoning in this framework. Figuring out what’s best to do is really hard, and this line of thinking conveniently ends the inquiry (for people who want that).

I have seen something like this happen, so I'm not claiming it doesn't, but it feels pretty confusing to me. The logic pretty clearly doesn't hold up. Even if you accept that "very good future" is all that matters, you still need to optimize for the action that most increases the probability of a very good future, and that's still a hard question, and you can't just end the inquiry with this line of thinking.

The motivated reasoning critique of effective altruism

Yeah, I agree that would also count (and as you might expect I also agree that it seems quite hard to do).

Basically with (b) I want to get at "the model does something above and beyond what we already had with verbal arguments"; if it substantially affects the beliefs of people most familiar with the field that seems like it meets that criterion.

The motivated reasoning critique of effective altruism

The obvious response here is that I don't think longtermist questions are more amenable to explicit quantitative modeling than global poverty, but I'm even more suspicious of other methodologies here.

Yeah, I'm just way, way more suspicious of quantitative modeling relative to other methodologies for most longtermist questions.

I think we might just be arguing about different things here?

Makes sense, I'm happy to ignore those sorts of methods for the purposes of this discussion.

Medicine is less amenable to empirical testing than physics, but that doesn't mean that clinical intuition is a better source of truth for the outcomes of drugs than RCTs.

You can't run an RCT on arms races between countries, whether or not AGI leads to extinction, whether totalitarian dictatorships are stable, whether civilizational collapse would be a permanent trajectory change vs. a temporary blip, etc.

What's the actual evidence for this?

It just seems super obvious in almost every situation that comes up? I also don't really know how you expect to get evidence; it seems like you can't just "run an RCT" here, when a typical quantitative model for a longtermist question takes ~a year to develop (and that's in situations that are selected for being amenable to quantitative modeling).

For example, here's a subset of the impact-related factors I considered when I was considering where to work:

  1. Lack of non-xrisk-related demands on my time
  2. Freedom to work on what I want
  3. Ability to speak publicly
  4. Career flexibility
  5. Salary

I think incorporating just these factors into a quantitative model is a hell of an ask (and there are others I haven't listed here -- I haven't even included the factors for the academia vs industry question). A selection of challenges:

  1. I need to make an impact calculation for the research I would do by default.
  2. I need to make that impact calculation comparable with donations (so somehow putting them in the same units).
  3. I need to predict the counterfactual research I would do at each of the possible organizations if I didn't have the freedom to work on what I wanted, and quantify its impact, again in similar units.
  4. I need to model the relative importance of technical research that tries to solve the problem vs. communication.
  5. To model the benefits of communication, I need to model field-building benefits, legitimizing benefits, and the benefit of convincing key future decision-makers.
  6. I need to quantify the probability of various kinds of "risks" (the org I work at shuts down, we realize AI risk isn't actually a problem, a different AI lab reveals that they're going to get to AGI in 2 years, unknown unknowns) in order to quantify the importance of career flexibility.

I think just getting a framework that incorporates all of these things is already a Herculean effort that really isn't worth it, and even if you did make such a framework, I would be shocked if you could set the majority of the inputs based on actually good reference classes rather than just "what my gut says". (And that's all assuming I don't notice a bunch more effects I failed to mention initially that my intuitions were taking into account but that I hadn't explicitly verbalized.)

It seems blatantly obvious that the correct choice here is not to try to get to the point of "quantitative model that captures the large majority of the relevant considerations with inputs that have some basis in reference classes / other forms of legible evidence", and I'd be happy to take a 100:1 bet that you wouldn't be able to produce a model that meets that standard (as I evaluate it) in 1000 person-hours.

I have similar reactions for most other cost effectiveness analyses in longtermism. (For quantitative modeling in general, it depends on the question, but I expect I would still often have this reaction.)

Eg, weird to use median staff member's views as a proxy for truth

If you mean that the weighting on saving vs. improving lives comes from the median staff member, note that GiveWell has been funding research that aims to set these weights in a manner with more legible evidence, because the evidence didn't exist. In some sense this is my point -- that if you want to get legible evidence, you need to put in large amounts of time and money in order to generate that evidence; this problem is worse in the longtermist space and is rarely worth it.

The motivated reasoning critique of effective altruism

Replied to Linch -- TL;DR: I agree this is true compared to global poverty or animal welfare, and I would defend this as simply the correct way to respond to actual differences in the questions asked in longtermism vs. those asked in global poverty or animal welfare.

You could move me by building an explicit quantitative model for a popular question of interest in longtermism that (a) didn't previously have models (so e.g. patient philanthropy or AI racing doesn't count), (b) has an upshot that we didn't previously know via verbal arguments, (c) doesn't involve subjective personal guesses or averages thereof for important parameters, and (d) I couldn't immediately tear a ton of holes in that would call the upshot into question.

The motivated reasoning critique of effective altruism

My guess is that longtermist EAs ( like almost all humans) have never been that close to purely quantitative models guiding decisions

I agree with the literal meaning of that, because it is generally a terrible idea to just do what a purely quantitative model tells you (and I'll note that even GiveWell isn't doing this). But imagining the spirit of what you meant, I suspect I disagree.

I don't think you should collapse it into the single dimension of "how much do you use quantitative models in your decisions". It also matters how amenable the decisions are to quantitative modeling. I'm not sure how you're distinguishing between the two hypotheses:

  1. Longtermists don't like quantitative modeling in general.
  2. Longtermist questions are not amenable to quantitative modeling, and so longtermists don't do much quantitative modeling, but they would if they tackled questions that were amenable to quantitative modeling.

(Unless you want to defend the position that longtermist questions are just as easy to model as, say, those in global poverty? That would be... an interesting position.)

Also, just for the sake of actual evidence, here are some attempts at modeling, biased towards AI since that's the space I know. Not all are quantitative, and none of them are cost effectiveness analyses.

  1. Open Phil's reports on AI timelines: Biological anchors, Modeling the Human Trajectory, Semi-informative priors, brain computation, probability of power-seeking x-risk
  2. Races: racing to the precipice, followup
  3. Mapping out arguments: MTAIR and its inspiration

 going from my stereotype of weeatquince(2020)'s views

Fwiw, my understanding is that weeatquince(2020) is very pro modeling, and is only against the negation of the motte. The first piece of advice in that post is to use techniques like assumption based planning, exploratory modeling, and scenario planning, all of which sound to me like "explicit modeling". I think I personally am a little more against modeling than weeatquince(2020).

The motivated reasoning critique of effective altruism

Overall great post, and I broadly agree with the thesis. (I'm not sure the evidence you present is all that strong though, since it too is subject to a lot of selection bias.) One nitpick:

Most of the posts’ comments were critical, but they didn’t positively argue against EV calculations being bad for longtermism. Instead they completely disputed that EV calculations were used in longtermism at all!

I think you're (unintentionally) running a motte-and-bailey here.

Motte: Longtermists don't think you should build explicit quantitative models, take their best guess at the inputs, chug through the math, and do whatever the model says, irrespective of common sense, verbal arguments, model uncertainty, etc.

Bailey: Longtermists don't think you should use numbers or models (and as a corollary don't consider effectiveness).

(My critical comment on that post claimed the motte; later I explicitly denied the bailey.)

Load More