I wrote this several months ago for LessWrong, but it seemed useful to have crossposted here.

It's a writeup of several informal conversations I had with Andrew Critch (of the Berkeley Existential Risk Initiative) about what considerations are important for taking AI Risk seriously, based on his understanding of the AI landscape. (The landscape has changed slightly in the past year, but I think most concerns are still relevant)





[meta: this comment is written in a more argumentative way than my actual position warrants (I have more uncertainty); it seems more useful to state the disagreement than to describe the uncertainties]

If I correctly understand the model of how the AI safety field should grow that this text implicitly advocates, the model seems possibly wrong/harmful.

As I understand it, the proposed model is something like:

  • people should earn enough money to create a comfortable runway for themselves
  • they should study hard in isolation
  • ?
  • do research
  • (?) upload good papers to arXiv, or at least write impressive posts on LW
  • (?) then they get noticed and start talking to other people in the field

This seems strange.

I. The way this path into the field filters people is roughly a conjunction of: "ability to earn enough money to create a runway" AND "really, really strong motivation to work on safety" AND "ability to work for a long period of time in isolation" AND "ability to turn insights into papers".

This is filtering on a somewhat arbitrary set of criteria, likely dropping talented people who, for example,

  • would have to pay high opportunity costs by working on "creating financial runway"
  • get depressed working in isolation on x-risk problems
  • are, at least initially, more motivated by interesting research problems than by existential-risk worries
  • (and many more!)

II. Doing research in this field "without talking to people" is probably quite hard. The situation is improving, but many important ideas and considerations are still implicit and not shared publicly.

III. It encourages people to make a kind of moral sacrifice (the opposite of a moral hazard: you take the risk, but mostly others benefit). It would at least be fair to point this out. It also seems relatively easy for the community to decrease these personal risks, but this way of thinking does not lead anyone to actually do so.

IV. The model seems to suggest people should learn and start doing research in a very different way from other intellectual enterprises like physics or machine learning. Why should that be the case?

Ah. So I'm not sure I can represent Critch here off-the-cuff, but my interpretation of this post is a bit different from what you've laid out here.

This is not a proposal for how the field overall should grow. There should be infrastructural efforts made to onboard people via mentorship, things like AI Safety Camp, things like MIRI Fellows, etc.

This post is an on-the-margin recommendation to some subset of people. I think there were a few intents here:

1. If your basic plan is to donate, consider trying to become useful for direct work instead. Getting useful at direct work probably requires at least some chunk of time for thinking about and understanding the problem, and some chunk of time for learning new skills.

2. The "take time off to think" thing isn't meant to be "do solo work" (like writing papers) It's more specifically for learning about the AI Alignment problem and landscape. From there, maybe the thing you do is write papers (solo or at an org), or maybe it's apply for a managerial or ops position at an org, or maybe it's founding a new project.

3. I think (personal opinion, although I expect Critch would agree) that when it comes to learning skills, there are probably better ways to go about it than "just study independently." (Note the sub-sections on taking advantage of being in school.) This will vary from person to person.

4. Not really covered in the post, but I personally think there's a "mentorship bottleneck". It's obviously better to have mentors and companions, and the field should try to flesh that out. The filter for people who can work at least somewhat independently and figure things out for themselves is a filter of necessity, not an ideal situation.

5. I think Critch was specifically trying to fill a particular gap on the margin: "people who can be trusted to flesh out the middle-tier hierarchy", who can be entrusted to launch and run new projects competently without needing to be constantly double-checked. This is necessary to grow the field for people who do still need mentorship or guidance. (My read from recent 80k posts is that the field is still somewhat "management bottlenecked".)

I suggest using full names like "Andrew Critch" rather than more ingroup-y nicknames.

Agreed but with the caveat that I'd rather see something imperfect than nothing at all. I like the post and I'm glad it was cross-posted.

Fair, and it's easy to fix. Updated, and I added a link to BERI to give people some more context on who Andrew Critch is and why you might care.

Yeah, I think having the full name in the title of the crosspost is all that's needed.

Even if you’re not interested in orienting your life around helping with x-risk – if you just want to not be blindsided by radical changes that may be coming


We don't know exactly what will happen, but I expect serious changes of some sort over the next 10 years. Even if you aren't committing to saving the world, I think it's in your interest just to understand what is happening, so that in a decade or two you aren't completely lost.

And even 'understanding the situation' is complicated enough that I think you need to be able to quit your day-job and focus full-time, in order to get oriented.

Raymond, do you or Andrew Critch have any concrete possibilities in mind for what "orienting one's life"/"understanding the situation" might look like from a non-altruistic perspective? I'm interested in hearing concrete ideas for what one might do; the only suggestions I can recall seeing so far were mentioned in the 80,000 Hours podcast episode with Paul Christiano, to save money and invest in certain companies. Is this the sort of thing you had in mind?

The way I am imagining it, a person thinking about this from a non-altruistic perspective would then think about the problem for several years and would narrow this list down (or add new things to it) and act on some subset of them (e.g. maybe they would think about which companies to invest in and decide how much money to save, but to not implement some other idea). Is this an accurate understanding of your view?

(Off the cuff thoughts, which are very low confidence. Not attributed to Critch at all)

So, this depends quite a bit on how you think the world is shaped (which is a complex enough question that Critch made the recommendation to just think about it for weeks or months). But the four classes of answer I can think of are:

a) in many possible worlds, the selfish and altruistic answers are just the same. The best way to survive a fast or even moderate takeoff is to ensure a positive singularity, and just pouring your efforts and money into maximizing the chance of that is really all you can do.

b) in some possible worlds (perhaps like Robin Hanson's Age of Em), it might matter that you have skills that can go into shaping the world (such as skilled programming). Though this is realistically only an option for some people.

c) for the purposes of flourishing in the intervening years (if we're in a slowish takeoff over the next 1-4 decades), owning stock in the right companies or institutions might help. (Though some caution: over the actual takeoff period, worrying about this may be more of a distraction than a help.)

d) relatedly, simply getting yourself psychologically ready for the world to change dramatically may be helpful in and of itself, and/or be useful to make yourself ready to take on sudden opportunities as they arise.
