Exploring different research directions to find out where in the x-risk research ecosystem I best fit in. Part of the 2018-2020 cohort in FHI's Research Scholars Programme. Previously Executive Director of the Foundational Research Institute, a project by the Effective Altruism Foundation (but I don't endorse that organization's 'suffering-focused' view on ethics).

Max_Daniel's Comments

Max_Daniel's Shortform

[Is longtermism bottlenecked by "great people"?]

Someone very influential in EA recently claimed in conversation with me that there are many tasks X such that (i) we currently don't have anyone in the EA community who can do X, (ii) the bottleneck for this isn't credentials or experience or knowledge but person-internal talent, and (iii) it would be very valuable (specifically from a longtermist point of view) if we could do X. And that therefore what we most need in EA are more "great people".

I find this extremely dubious. (In fact, it seems so crazy to me that it seems more likely than not that I significantly misunderstood the person who I think made these claims.) The first claim is of course vacuously true if, for X, we choose some ~impossible task such as "experience a utility-monster amount of pleasure" or "come up with a blueprint for how to build safe AGI that is convincing to benign actors able to execute it". But of course more great people don't help with solving impossible tasks.

Given the size and talent distribution of the EA community my guess is that for most apparent X, the issue is either that (a) X is ~impossible, (b) there are people in EA who could do X, but the relevant actors cannot identify them, or (c) acquiring the ability to do X is costly (e.g. perhaps you need time to acquire domain-specific expertise), even for maximally talented "great people", and the relevant actors either are unable to help pay that cost (e.g. by training people themselves, or giving them the resources to allow them to get training elsewhere) or make a mistake by not doing so.

My best guess for the genesis of the "we need more great people" perspective: Suppose I talk a lot to people at an organization that thinks there's a decent chance we'll develop transformative AI soon but it will go badly, and that as a consequence tries to grow as fast as possible to pursue various ambitious activities which they think reduce that risk. If these activities are scalable projects with short feedback loops on some intermediate metrics (e.g. running some super-large-scale machine learning experiments), then I expect I would hear a lot of claims like "we really need someone who can do X". I think it's just a general property of a certain kind of fast-growing organization that's doing practical things in the world that everything constantly seems like it's on fire. But I would also expect that, if I poked a bit at these claims, it would usually turn out that X is something like "contribute to this software project at the pace and quality level of our best engineers, w/o requiring any management time" or "convince some investors to give us much more money, but w/o anyone spending any time transferring relevant knowledge". If you see that things break because X isn't done, even though something like X seems doable in principle (perhaps you see others do it), it's tempting to think that what you need is more "great people" who can do X. After all, people generally are the sort of stuff that does things, and maybe you've actually seen some people do X. But it still doesn't follow that in your situation "great people" are the bottleneck ...

Curious if anyone has examples of tasks X for which the original claims seem in fact true. That's probably the easiest way to convince me that I'm wrong.

Cotton‐Barratt, Daniel & Sandberg, 'Defence in Depth Against Human Extinction'

Thank you for sharing your reaction!

Would be interested to hear if the authors have thought through this.

I haven't, but it's possible that my coauthors have. I generally agree that it might be worthwhile to think along the lines you suggested.

Max_Daniel's Shortform

Thanks for sharing your reaction! There is some chance that I'll write up these and maybe other thoughts on AI strategy/governance over the coming months, but it depends a lot on my other commitments. My current guess is that it's maybe only 15% likely that I'll think this is the best use of my time within the next 6 months.

Long-term investment fund at Founders Pledge

That sounds great! I find the arguments for giving (potentially much) later intriguing and underappreciated. (If I had to allocate a large amount of money myself, I'm not sure what I'd end up doing. But overall it seems good to me if there is at least the option to invest.) I'd be very excited for such a fund to exist - partly because I expect that setting it up and running it will provide a bunch of information on empirical questions relevant for deciding whether investing into such a fund beats giving now.

Max_Daniel's Shortform

[Some of my tentative and uncertain views on AI governance, and different ways of having impact in that area. Excerpts, not in order, from things I wrote in a recent email discussion, so not a coherent text.]

1. In scenarios where OpenAI, DeepMind etc. become key actors because they develop TAI capabilities, our theory of impact will rely on a combination of affecting (a) 'structure' and (b) 'content'. By (a) I roughly mean what the relevant decision-making mechanisms look like irrespective of the specific goals and resources of the actors the mechanism consists of; e.g., whether some key AI lab is a nonprofit or a publicly traded company; who would decide by what rules/voting scheme how Windfall profits would be redistributed; etc. By (b) I mean something like how much the CEO of a key firm, or their advisors, care about the long-term future. -- I can see why relying mostly on (b) is attractive, e.g. it's arguably more tractable; however, some EA thinking (mostly from the Bay Area / the rationalist community to be honest) strikes me as focusing on (b) for reasons that seem ahistoric or otherwise dubious to me. So I don't feel convinced that what I perceive to be a very stark focus on (b) is warranted. I think that figuring out if there are viable strategies that rely more on (a) is better done from within institutions that have no ties with key TAI actors, and also might be best done by people that don't quite match the profile of the typical new EA that got excited about Superintelligence or HPMOR. Overall, I think that making more academic research in broadly "policy relevant" fields happen would be a decent strategy if one ultimately wanted to increase the amount of thinking on type-(a) theories of impact.

2. What's the theory of impact if TAI happens in more than 20 years? More than 50 years? I think it's not obvious whether it's worth spending any current resources on influencing such scenarios (I think they are more likely but we have much less leverage). However, if we wanted to do this, then I think it's worth bearing in mind that academia is one of few institutions (in a broad sense) that has a strong track record of enabling cumulative intellectual progress over long time scales. I roughly think that, in a modal scenario, no-one in 50 years is going to remember anything that was discussed on the EA Forum or LessWrong, or within the OpenAI policy team, today (except people currently involved); but if AI/TAI was still (or again) a hot topic then, I think it's likely that academic scholars will read academic papers by Dafoe, his students, the students of his students etc. Similarly, based on track records I think that the norms and structure of academia are much better equipped than EA to enable intellectual progress that is more incremental and distributed (as opposed to progress that happens by way of 'at least one crisp insight per step'; e.g. the Astronomical Waste argument would count as one crisp insight); so if we needed such progress, it might make sense to seed broadly useful academic research now. 


My view is closer to "~all that matters will be in the specifics, and most of the intuitions and methods for dealing with the specifics are either sort of hard-wired or more generic/have different origins than having thought about race models specifically". A crux here might be that I expect most of the tasks involved in dealing with the policy issues that would come up if we got TAI within the next 10-20 years to be sufficiently similar to garden-variety tasks involved in familiar policy areas that as a first pass: (i) if theoretical academic research was useful, we'd see more stories of the kind "CEO X / politician Y's success was due to idea Z developed through theoretical academic research", and (ii) prior policy/applied strategy experience is the background most useful for TAI policy, with usefulness increasing with the overlap in content and relevant actors; e.g.: working with the OpenAI policy team on pre-TAI issues > working within Facebook on a strategy for how to prevent the government from splitting up the firm in case a left-wing Democrat wins > business strategy for a tobacco company in the US > business strategy for a company outside of the US that faces little government regulation > academic game theory modeling. That's probably too pessimistic about the academic path, and of course it'll depend a lot on the specifics (you could start in academia to then get into Facebook etc.), but you get the idea.


Overall, the only somewhat open question for me is whether ideally we'd have (A) ~only people working quite directly with key actors or (B) a mix of people working with key actors and more independent ones e.g. in academia. It seems quite clear to me that the optimal allocation will contain a significant share of people working with key actors [...]

If there is a disagreement, I'd guess it's located in the following two points: 

(1a) How big are countervailing downsides from working directly with, or at institutions having close ties with, key actors? Here I'm mostly concerned about incentives distorting the content of research and strategic advice. I think the question is broadly similar to: If you're concerned about the impacts of the British rule on India in the 1800s, is it best to work within the colonial administration? If you want to figure out how to govern externalities from burning fossil fuels, is it best to work in the fossil fuel industry? I think the cliche left-wing answer to these questions is too confident in "no" and is overlooking important upsides, but I'm concerned that some standard EA answers in the AI case are too confident in "yes" and are overlooking risks. Note that I'm most concerned about kind of "benign" or "epistemic" failure modes: I think it's reasonably easy to tell people with broadly good intentions apart from sadists or even personal-wealth maximizers (at least in principle -- whether this will get implemented is another question); I think it's much harder to spot cases like key people incorrectly believing that it's best if they keep as much control for themselves/their company as possible because after all they are the ones with both good intentions and an epistemic advantage (note that all of this really applies to a colonial administration with little modification, though here in cases such as the "Congo Free State" even the track record of "telling personal-wealth maximizers apart from people with humanitarian intentions" maybe isn't great -- also NB I'm not saying that this argument would necessarily be unsound; i.e. I think that in some situations these people would be correct).

(1b) To what extent do we need (a) novel insights as opposed to (b) an application of known insights or common-sense principles? E.g., I've heard claims that the sale of telecommunication licenses by governments is an example where post-1950 research-level economics work in auction theory has had considerable real-world impact, and AFAICT this kind of auction theory strikes me as reasonably abstract and in little need of having worked with either governments or telecommunication firms. Supposing this is true (I haven't really looked into this), how many opportunities of this kind are there in AI governance? I think the case for (A) is much stronger if we need little to no (a), as I think the upsides from trust networks etc. are mostly (though not exclusively) useful for (b). FWIW, my private view actually is that we probably need very little of (a), but I also feel like I have a poor grasp of this, and I think it will ultimately come down to what high-level heuristics to use in such a situation.

Max_Daniel's Shortform

[Some of my high-level views on AI risk.]

[I wrote this for an application a couple of weeks ago, but thought I might as well dump it here in case someone was interested in my views. / It might sometimes be useful to be able to link to this.]

[In this post I generally state what I think before updating on other people’s views – i.e., what’s sometimes known as ‘impressions’ as opposed to ‘beliefs.’]


  • Transformative AI (TAI) – the prospect of AI having impacts at least as consequential as the Industrial Revolution – would plausibly (~40%) be our best lever for influencing the long-term future if it happened this century, which I consider to be unlikely (~20%) but worth betting on.
  • The value of TAI depends not just on the technological options available to individual actors, but also on the incentives governing the strategic interdependence between actors. Policy could affect both the amount and quality of technical safety research and the ‘rules of the game’ under which interactions between actors will play out.

Why I'm interested in TAI as a lever to improve the long-run future

I expect my perspective to be typical of someone who has become interested in TAI through their engagement with the effective altruism (EA) community. In particular,

  • My overarching interest is to make the lives of as many moral patients as possible go as well as possible, no matter where or when they live; and
  • I think that in the world we find ourselves in – it could have been otherwise –, this goal entails strong longtermism, i.e. the claim that “the primary determinant of the value of our actions today is how those actions affect the very long-term future.”

Less standard but not highly unusual (within EA) high-level views I hold more tentatively:

  • The indirect long-run impacts of our actions are extremely hard to predict and don’t ‘cancel out’ in expectation. In other words, I think that what Greaves (2016) calls complex cluelessness is a pervasive problem. In particular, evidence that an action will have desirable effects in the short term generally is not a decisive reason to believe that this action would be net positive overall, and neither will we be able to establish the latter through any other means.
  • Increasing the relative influence of longtermist actors is one of the very few strategies we have good reasons to consider net positive. Shaping TAI is a particularly high-leverage instance of this strategy, where the main mechanism is reaping an ‘epistemic rent’ from having anticipated TAI earlier than other actors. I take this line of support to be significantly more robust than any particular story on how TAI might pose a global catastrophic risk including even broad operationalizations of the ‘value alignment problem.’

My empirical views on TAI

I think the strongest reasons to expect TAI this century are relatively outside-view-based (I talk about this century just because I expect that later developments are harder to predictably influence, not because I think a century is a particularly meaningful time horizon or because I think TAI would be less important later):

  • We’ve been able to automate an increasing number of tasks (with increasing performance and falling cost), and I’m not aware of a convincing argument for why we should be highly confident that this trend will stop short of full automation – i.e., AI systems being able to do all tasks more economically efficiently than humans –, despite moderate scientific and economic incentives to find and publish one.
  • Independent types of weak evidence such as trend extrapolation and expert surveys suggest we might achieve full automation this century.
  • Incorporating full automation into macroeconomic growth models predicts – at least under some assumptions – a sustained higher rate of economic growth (e.g. Hanson 2001, Nordhaus 2015, Aghion et al. 2017), which arguably was the main driver of the welfare-relevant effects of the Industrial Revolution.
  • Accelerating growth this century is consistent with extrapolating historic growth rates, e.g. Hanson (2000[1998]).

I think there are several reasons to be skeptical, but that the above succeeds in establishing a somewhat robust case for TAI this century not being wildly implausible.

My impression is that I’m less confident than the typical longtermist EA in various claims around TAI, such as:

  • Uninterrupted technological progress would eventually result in TAI;
  • TAI will happen this century;
  • we can currently anticipate any specific way of positively shaping the impacts of TAI;
  • if the above three points were true then shaping TAI would be the most cost-effective way of improving the long-term future.

My guess is this is due to different priors, and due to frequently having found extant specific arguments for TAI-related claims (including by staff at FHI and Open Phil) less convincing than I would have predicted. I still think that work on TAI is among the few best shots for current longtermists.

Max_Daniel's Shortform

What's the right narrative about global poverty and progress? Link dump of a recent debate.

The two opposing views are:

(a) "New optimism:" [1] This is broadly the view that, over the last couple of hundred years, the world has been getting significantly better, and that's great. [2] In particular, extreme poverty has declined dramatically, and most other welfare-relevant indicators have improved a lot. Often, these effects are largely attributed to economic growth.

  • Proponents in this debate were originally Bill Gates, Steven Pinker, and Max Roser. But my loose impression is that the view is shared much more widely.
  • In particular, it seems to be the orthodox view in EA; cf. e.g. Muehlhauser listing one of Pinker's books in his My worldview in 5 books post, saying that "Almost everything has gotten dramatically better for humans over the past few centuries, likely substantially due to the spread and application of reason, science, and humanism."

(b) Hickel's critique: Anthropologist Jason Hickel has criticized new optimism on two grounds:

  • 1. Hickel has questioned the validity of some of the core data used by new optimists, claiming e.g. that "real data on poverty has only been collected since 1981. Anything before that is extremely sketchy, and to go back as far as 1820 is meaningless."
  • 2. Hickel prefers to look at different indicators than the new optimists. For example, he has argued for different operationalizations of extreme poverty or inequality.

Link dump (not necessarily comprehensive)

If you only read two things, I'd recommend (1) Hasell's and Roser's article explaining where the data on historic poverty comes from and (2) the take by economic historian Branko Milanovic.

By Hickel (i.e. against "new optimism"):

By "new optimists":

Commentary by others:

My view

  • I'm largely unpersuaded by Hickel's charge that historic poverty data is invalid. Sure, it's way less good than contemporary data. But based on Hasell's and Roser's article, my impression is that the data is better than I would have thought, and its orthodox analysis and interpretation more sophisticated than I would have thought. I would be surprised if access to better data would qualitatively change the "new optimist" conclusion.
  • I think there is room for debate over which indicators to use, and that Hickel makes some interesting points here. I find it regrettable that the debate around this seems so adversarial.
  • Still, my sense is that there is an important, true, and widely underappreciated (particularly by people on the left, including my past self) core of the "new optimist" story. I'd expect looking at other indicators could qualify that story, or make it less simplistic, point to important exceptions etc. - but I'd probably consider a choice of indicators that painted an overall pessimistic picture as quite misleading and missing something important.
  • On the other hand, I would quite strongly want to resist the conclusion that everything in this debate is totally settled, and that the new optimists are clearly right about everything, in the same way in which orthodox climate science is right about climate change being anthropogenic, or orthodox medicine is right about homeopathy not being better than placebo. But I think the key uncertainties are not in historic poverty data, but in our understanding of wellbeing and its relationship to environmental factors. Some examples of why I think it's more complicated:
    • The Easterlin paradox
    • The unintuitive relationship between (i) subjective well-being in the sense of the momentary affective valence of our experience on one hand and (ii) reported life satisfaction. See e.g. Kahneman's work on the "experiencing self" vs. "remembering self".
    • On many views, the total value of the world is very sensitive to population ethics, which is notoriously counterintuitive. In particular, on many plausible views, the development of the total welfare of the world's human population is dominated by its increasing population size.
  • Another key uncertainty is the implications of some of the discussed historic trends for the value of the world going forward, about which I think we're largely clueless. For example, what are the effects of changing inequality on the long-term future?

[1] It's not clear to me if "new optimism" is actually new. I'm using Hickel's label just because it's short and it's being used in this debate anyway, not to endorse Hickel's views or make any other claim.

[2] There is an obvious problem with new optimism, which is that it's anthropocentric. In fact, on many plausible views, the total axiological value of the world at any time in the recent past may be dominated by the aggregate wellbeing of nonhuman animals; even more counterintuitively, it may well be dominated by things like the change in the total population size of invertebrates. But this debate is about human wellbeing, so I'll ignore this problem.

Rethink Priorities Impact Survey

Thanks for posting this! I'd really like to see more organizations evaluate their impact, and publish about their analysis.

Just a quick note: You mention that I indicated I "found [y]our work on nuclear weapons somewhat useful". This is correct. I'd like to note that the main reason why I don't find it very useful simply is that I currently don't anticipate working on nuclear security personally, or making any decisions that depend on my understanding of nuclear security. In general, how "useful" people find your work is a mix of their focus and the quality of your work (which in this case AFAICT is very high, though I haven't reviewed it in detail), which might make it hard to interpret the results.

Assumptions about the far future and cause priority

Regarding your "outside view" point: I agree with what you say here, but think it cannot directly undermine my original "outside view" argument. These clarifications may explain why:

  • My original outside view argument appealed to the process by which certain global health interventions such as distributing bednets have been selected rather than their content. The argument is not "global health is a different area from economic growth, therefore a health intervention is unlikely to be optimal for accelerating growth"; instead it is "an intervention that has been selected to be optimal according to some goal X is unlikely to also be optimal according to a different goal Y".
    • In particular, if GiveWell had tried to identify those interventions that best accelerate growth, I think my argument would be moot (no matter what interventions they had come up with, in particular in the hypothetical case where distributing bednets had been the result of their investigation).
    • In general, I think that selecting an intervention that's optimal for furthering some goal needs to pay attention to all of importance, tractability, and neglectedness. I agree that it would be bad to exclusively rely on the heuristic "just focus on the most important long-term outcome/risk" when selecting longtermist interventions, just as it would be bad to just rely on the heuristic "work on fighting whatever disease has the largest disease burden globally" when selecting global health interventions. But I think these would just be bad ways to select interventions, which seems orthogonal to the question of when an intervention selected for X will also be optimal for Y. (In particular, I don't think that my original outside view argument commits me to the conclusion that in the domain of AI safety it's best to directly solve the largest or most long-term problem, whatever that is. I think it does recommend deliberately selecting an intervention optimized for reducing AI risk, but this selection process should also take into account feedback loops and all the other considerations you raised.)
  • The main way I can see to undermine this argument would be to argue that a certain pair of goals X and Y is related in such a way that interventions optimal for X are also optimal for Y (e.g., X and Y are positively correlated, though this in itself wouldn't be sufficient). For example, in this case, such an argument could be of the type "our best macroeconomic models predict that improving health in currently poor countries would have a permanent rate effect on growth, and empirically it seems likely that the potential for sustained increases in the growth rate is largest in currently poor countries" (I'm not saying this claim is true, just that I would want to see something like this).

Assumptions about the far future and cause priority

The "inside view" point is that Christiano's estimate only takes into account the "price of a life saved". But in truth GiveWell's recommendations for bednets or deworming are to a large measure driven by their belief, backed by some empirical evidence, that children who grow up free of worms or malaria become adults who can lead more productive lives. This may lead to better returns than what his calculations suggest. (Micronutrient supplementation may also be quite efficient in this respect.)

I think this is a fair point. Specifically, I agree that GiveWell's recommendations are only partly (in the case of bednets) or not at all (in the case of deworming) based on literally averting deaths. I haven't looked at Paul Christiano's post in sufficient detail to say for sure, but I agree it's plausible that this way of using "price of a life saved" calculations might effectively ignore other benefits, thus underestimating the benefits of bednet-like interventions compared to GiveWell's analysis.

I would need to think about this more to form a considered view, but my guess is this wouldn't change my mind on my tentative belief that global health interventions selected for their short-term (say, anything within the next 20 years) benefits aren't optimal growth interventions. This is largely because I think the dialectical situation looks roughly like this:

  • The "beware suspicious convergence" argument implies that it's unlikely (though not impossible) that health interventions selected for maximizing certain short-term benefits are also optimal for accelerating long-run growth. The burden of proof is thus with the view that they are optimal growth interventions.
  • In addition, some back-of-the-envelope calculations suggest the same conclusion as the first bullet point.
  • You've pointed out a potential problem with the second bullet point. I think it's plausible to likely that this significantly to totally removes the force of the second bullet point. But even if the conclusion of the calculations were completely turned on their head, I don't think they would by themselves succeed in defeating the first bullet point.