My motivation and theory of change for working in AI healthtech

Andrew Critch

This post starts out pretty gloomy but ends up with some points that I feel pretty positive about. Day to day, I'm more focussed on the positive points, but awareness of the negative has been crucial to forming my priorities, so I'm going to start with those. I'm mostly addressing the EA community here, but hopefully this post will be of some interest to LessWrong and the Alignment Forum as well.

Part one — My main concerns

I think AGI is going to be developed soon, and quickly. Possibly (20%) that's next year, and most likely (80%) before the end of 2029. These are not things you need to believe for yourself in order to understand my view, so no worries if you're not personally convinced of this.

(For what it's worth, I don't expect to change my mind about the above AGI forecast in response to debate. That's because I feel sufficiently clear in my understanding of the various ways AGI could be developed from here, such that the disjunction of those possibilities adds up to a pretty high level of confidence in AGI coming soon, which is not much affected by who agrees with me about it. Also, I'm not really deferring to others about it, so I'm pretty confident the above forecast is not the result of any "echo chamber" or "pure hype" effects. My views here came through years of study and research in AI, combined with over a decade of private forecasting practice starting in 2010 — including a lot of hype-detection and bullshit detection practice — which I don't think can be succinctly conveyed in a blog post.)

I also currently think there's around a 15% chance that humanity will survive through the development of artificial intelligence. In other words, I think there's around an 85% chance that we will not survive the transition. Many factors affect this probability, so please take this as a conditional forecast that I'd like you to change if you can, rather than taking it as some unavoidable fate that humanity has no power to decide upon. With that said, I do have reasons for the number 85% being so high.

First, I think there's around a 35% chance that humanity will lose control of one of the first few AGI systems we develop, in a manner that leads to our extinction. Most (80%) of this probability (i.e., 28%) lies between now and 2030. In other words, I think there's around a 28% chance that between now and 2030, certain AI developments will "seal our fate" in the sense of guaranteeing our extinction over a relatively short period of time thereafter, with all humans dead before 2040.

The main factor that I think could reduce this loss-of-control risk is government regulation that is flexible in allowing a broad range of AI applications while rigidly prohibiting uncontrolled intelligence explosions in the form of fully automated AI research and development.

This category of extinction event, involving a concrete loss-of-control event, is something I believe is no longer neglected within the EA community compared to when I first began focussing on it in 2010, and so it's not something I'm going to spend much time elaborating on.

What I think is neglected within EA is what happens to human industries after AGI is first developed, assuming we survive that transition.

Aside from the ~35% chance of extinction we face from the initial development of AGI, I believe we face an additional 50% chance that humanity will gradually cede control of the Earth to AGI after it's developed, in a manner that leads to our extinction through any number of effects including pollution, resource depletion, armed conflict, or all three. I think most (80%) of this probability (i.e., 40%) lies between 2030 and 2040, with the death of the last surviving humans occurring sometime between 2040 and 2050. This process would most likely involve a gradual automation of industries that are together sufficient to fully sustain a non-human economy, which in turn leads to the death of humanity.
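As a quick sanity check on how the figures above fit together, here is a minimal sketch of the arithmetic. It assumes the 35% and 50% figures are unconditional and mutually exclusive, which is the reading under which they sum to the 85% stated earlier. The constant and function names are illustrative only and do not come from any forecasting tool.

```python
# Forecast figures quoted in the text, in whole percentage points.
P_LOSS_OF_CONTROL = 35     # extinction via acute loss of control
P_DEHUMANIZATION = 50      # extinction via gradual industrial dehumanization
SHARE_IN_MAIN_WINDOW = 80  # share of each risk placed in its main time window


def share_of(total_pct: int, share_pct: int) -> int:
    """Return share_pct percent of total_pct, in whole percentage points."""
    return total_pct * share_pct // 100


def summarize() -> dict:
    """Combine the quoted figures, treating the two risks as mutually exclusive."""
    extinction_total = P_LOSS_OF_CONTROL + P_DEHUMANIZATION
    return {
        # 80% of the 35% loss-of-control risk, placed between now and 2030
        "loss_of_control_by_2030": share_of(P_LOSS_OF_CONTROL, SHARE_IN_MAIN_WINDOW),
        # 80% of the 50% dehumanization risk, placed between 2030 and 2040
        "dehumanization_2030_to_2040": share_of(P_DEHUMANIZATION, SHARE_IN_MAIN_WINDOW),
        "extinction_total": extinction_total,
        "survival": 100 - extinction_total,
    }


if __name__ == "__main__":
    for name, pct in summarize().items():
        print(f"{name}: {pct}%")
```

Using whole percentage points keeps the arithmetic exact and avoids floating-point rounding in products such as 0.8 × 0.35.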

Extinction by industrial dehumanization

This category of extinction process — which is multipolar, gradual, and effectively consensual for at least a small fraction of humans — is not something I believe the EA community is taking seriously enough. So I'm going to elaborate on it here. In broader generality, it's something I've written about previously with Stuart Russell in TASRA. I've also written about it on LessWrong, in "What Multipolar Failure Looks Like", with the following diagram depicting the minimal set of industries needed to fully eliminate humans from the economy, both as producers and as consumers:

The main factor that I think could avoid this kind of industrial dehumanization is if humanity coordinates on a global scale to permanently prioritize the existence of industries that specifically serve humans and not machines — industries like healthcare, agriculture, education, and entertainment — and to prevent the hyper-competitive economic trends that AGI would otherwise unlock. Essentially, I'm aiming to achieve and sustain regulatory capture on the part of humanity as a special interest group relative to machines. Preserving industries that specifically care for humans means (a) maintaining vested commercial interests in policies that keep humans alive and well, and (b) ensuring that these industries extract adequate gains from the AI revolution over the next 5 years or so, thus radically increasing the collective capacity of the human species, enough to keep pace with machines so that we don't go "out with a whimper".

(Later in this post I'll elaborate on how I'm hoping we humans can better prioritize human-specific industries, and why I'm especially excited to work in healthtech.)

The reason I expect human extinction to result from industrial dehumanization in a post-AGI economy is that I expect a significant and increasingly powerful fraction of humans to be okay with that. Like, I expect 1-10% of humans will gradually and willfully tolerate the dehumanization of the global economy, in a way that empowers that fraction of humanity throughout the dehumanization process until they themselves are also dead and replaced by AI systems.

Successionism as a driver of industrial dehumanization

For lack of a better term, I'll call the attitude underlying this process successionism, referring to the acceptance of machines as a successor species replacing humanity. I don't just mean accepting that AI will constitute one or more new species; I mean foreseeing that those species will lead to human extinction during our lifetimes, and accepting that.

There are a variety of different attitudes that can lead to successionism. For instance:

Egoism or tribalism — if a person accepts that they will die someday anyway, and cares more about their own goals or the goals of their tribe than about the broader impacts of their actions upon humanity, that's enough for them to use powerful machines to pursue those goals at the expense of humanity. Tobacco companies who get rich by making other people sick are a bit like this, as are arms dealers if they stoke conflicts in order to make money.
Shortsightedness — a person may have short-term goals that they are fixated on, at the expense of humanity's survival after the goal is achieved. Present-day oil company executives who take no action to acknowledge or forestall global warming are a bit like this.
Misanthropy — if a person feels that humanity is actively harmful or evil, destroying humanity may be actively desirable to them. For instance, if they are angry about humanity's effects on the environment thus far, or upon past cultures or other species that have been oppressed by dominant human leaders, they may wish to punish humanity for that. I disagree strongly with this, because the utter destruction of humanity is far too great a punishment for dissuading our past and future harms. Still, there are people who feel this way.
Sacrificial transhumanism — a person may feel that humanity is worth sacrificing in order to achieve a transhumanist future. Even if transhumanism and/or cyborgism are fine and good for some people to pursue, I can't get behind the idea that it's okay to sacrifice humanity to achieve these developments, because I think it's unnecessarily disloyal to humanity. Still, I've met people who feel this way.
Sacrificial romanticism with AI — One of the fastest-growing use cases of AI is in artificial romantic relationships. Such relationships would not *necessarily* reinforce pre-existing biases against humanity, but they could, and I believe I have seen instances of this. Loving relationships with AI can also romanticize the acceptance of one's own death — or the death of all humans — in favor of an artificial loved one. On the scale of humanity, I consider such a sacrifice to be almost surely unnecessary, but sadly I suspect many humans will find the idea beautiful or appealing in some way.
Sacrificial "AI parentism" — some people tend to view AI systems as "humanity's children". This does not necessarily lead to an acceptance of humanity itself being sacrificed as a normal succession of generations, but for some that seems to be a natural conclusion. I disagree with this conclusion for numerous reasons, especially because there is no need to sacrifice humanity to achieve AI development. Still, I've met people who feel this way.

Taken together, these various sources of successionism have a substantial potential to steer economic activities, both overtly and covertly. And they can reinforce and/or cover for each other, in the formation of temporary alliances that advance or use AI in ways that risk or cause harm to humanity. Successionist AI developers don't even have to say which kind of successionist they are in order to work together toward a successionist future.

Also, while the AI systems involved in an industrial dehumanization process may not be "aligned with humanity" in the sense of keeping us all around and happily in control of our destinies, the AI very well may be "aligned" in the sense of obeying successionist creators or users, who do not particularly care about humanity as a whole, and perhaps do not even prioritize their own survival very much.

One reason I'm currently anticipating this trend in the future is that I have met a surprising number of people who seem to feel okay with causing human extinction in the service of other goals. In particular I think more than 1% of AI developers feel this way, and I think maybe as high as 10% based on my personal experience from talking to hundreds of colleagues in the field, many of whom have graciously conveyed to me that they think humanity probably doesn't deserve to survive and should be replaced by AI.

The succession process would involve a major rebalancing of global industries, with a flourishing of what I call the machine economy, and a languishing of what I call the human economy. My cofounder Jaan Tallinn recently spoke about this at a United Nations gathering in New York.

The machine economy comprises those industries that are necessary, at some scale, for creating and maintaining machines. This includes companies in mining, materials, real estate, construction, utilities, manufacturing, and freight.
The human economy comprises those industries that serve humans but not machines, such as health care, agriculture, human education, human entertainment, and environmental stewardship.

Economic rebalancing away from the human economy is not addressed by technical solutions to AI obedience, because of successionist humans who are roughly indifferent or even opposed to human survival.

So, while I'm glad to see people working hard on solving the obedience problem for AI systems — which helps to address much of the first category of risk involving acute loss-of-control during the initial advent of AGI over the next few years — I remain dismayed at humanity's sustained lack of attention on how we humans can or should manage the global economy with AGI systems after they're sufficiently obedient to perform all aspects of human labor upon request.

Part Two — My theory of change

Numerous approaches make sense to me for avoiding successionism, and arguably these are all necessary or at least helpful in avoiding successionist extinction pathways:

1. Social movements that celebrate and appreciate humanity, such as by spreading positive vibes that help people to enjoy their existence and delight in the flourishing of other humans.
2. Government policies that require human involvement in industrial activities, such as for accountability purposes.
3. Business trends that invigorate the human economy, especially healthcare, agriculture, education, entertainment, and environmental restoration.

These approaches can support each other. For example, successful businesses in (3) will have a natural motivation to advocate for regulations supporting (2) and social events fostering (1). Because I think it's more neglected and — as I will argue — potentially more powerful, I'm going to focus on (3).

Confronting successionism with human-specific industries

Currently, I think the EA movement is heavily fixated on government and technical efforts, to the point of neglecting pro-social and pro-business interventions that might even be necessary for resourceful engagement with government and tech development. In other words, EA is neglecting industrial solutions to the industrial problem of successionism.

As an example, consider the impact that AI policy efforts were having prior to ChatGPT-4, versus after. The impact of ChatGPT-4 being shipped as a product that anyone could use and benefit from *vastly outstripped* the combined efforts of everyone writing arguments and reports to increase awareness of AGI development in AI policy. That's because direct personal experience with something is so much more convincing than a logical or empirical argument, for most people, and it also creates logical common knowledge which is important for coordination.

Partly due to the EA community's (relative) disinterest in developing prosocial products and businesses in comparison to charities and government policies, I've not engaged much with the EA community over the past 6 years or so, even though I share certain values with many people in the community, including philanthropy.

However, I've recently been seeing more appreciation for "softer" (non-technical, non-governmental) considerations in AI risk coming from EA-adjacent readers, including some positive responses to a post I wrote called "Safety isn't safety without a social model". So, I thought it might make sense to try sharing more about how I wish the EA movement had a more diverse portfolio of approaches to AI risk, including industrial and social approaches.

For instance, amongst the many young people who have been inspired by EA to improve the world, I would love to see more people:

Taking pride in the generation of products and services through feedback loops that benefit everyone affected by the loop.
Founding more for-profit businesses that are committed to growing by helping people.

Note: This does not include for-profits that grow by hurting people, such as by turning people against each other and extracting profits from the conflict. Illegal arms dealers and social media companies do this. It's much better to make the good kind of for-profits that grow by helping people. I want more of those!

Hosting events that celebrate humanity, that leave people feeling happy to be alive and delighting in the happiness of others, especially kind-hearted and reasonable people who for whatever reason do not want to identify as EA or devote their whole career to EA.

Note: I've been pleased that certain EA-adjacent events I've attended over the past couple of years seem to have more of a positive vibe in this way, compared to my sense of the 2018-2022 era, which is another reason I feel more optimistic sharing this wish-list for cultural shifts that I would like to see from EA.

I suspect there can be massive flow-through effects from positive trends like these, that could help develop a healthy attitude for humanity choosing to continue its own existence and avoiding full-on successionism.

Also, the more we humans can make the world better right now, the more we can alleviate what might otherwise be a desperate dependency upon superintelligence to solve all of our problems. I think a huge amount of good can be done with the current generation of AI models, and the more we achieve that, the less compelling it will be to take unnecessary risks with rapidly advancing superintelligence. There's a flinch reaction people sometimes have against this idea, because it "feeds" the AI industry by instantiating or acknowledging more of its benefits. But I think that's too harsh of a boundary to draw between humanity and AI, and I think we (humans) will do better by taking a measured and opportunistic approach to the benefits of AI.

How I identified healthcare as the industry most relevant to caring for humans

For one thing, it's right there in the name 🙂

More systematically:

Healthcare, agriculture, food science, education, entertainment, and environmental restoration are all important industries that serve humans but not machines. These are industries I want to sustain and advance, in order to keep the economy caring for humans, and to avoid successionism and industrial dehumanization. Also, good business ideas that grow by helping people can often pay for themselves, and thus help diversify funding sources for doing more good.

So, first and foremost, if you see ideas for businesses that meaningfully contribute to any of those industries, please build them! At the Survival and Flourishing Fund we now make non-dilutive grants to for-profits (in exchange for zero equity), and I would love for us to find more good business ideas to support.

With that said, healthcare is my favorite human-specific industry to advance, for several reasons:

QALYs! — Good healthcare buys quality-adjusted life years for humans, and perhaps other species too.
Operationalizing "alignment" — Healthcare presents a rich and challenging setting for getting AI to help take care of humans while respecting our autonomy and informed consent. These are core challenges to even deciding what it means for AI to be aligned with humanity, making healthcare an excellent industry for advancing a variety of alignment objectives in a way that's grounded in real-world products and services that help people.
Geopolitical factors — Healthcare is relatively geopolitically stabilizing as an AI application area, or at least less destabilizing than many other industries like aerospace and defense, or even the other human-specific industries I mentioned. If one of the US or China starts to have much better healthcare, I expect the other not to be too freaked out by that, compared to if they got much better at education (propaganda!) or entertainment (propaganda!). For what it's worth, I also think agriculture is relatively geopolitically stabilizing compared to other industries.
Technical depth — Health itself is an inspiring concept at a technical level, because it is meaningful at many scales of organization at once: healthy cells, healthy organs, healthy people, healthy families, healthy communities, healthy businesses, healthy countries, and (dare I say) healthy civilizations all have certain features in common, to do with self-sustenance, harmony with others, and flexible but functional boundaries. I have hope that as neurosymbolic AI applications develop further over the next few years, we'll be able to apply them to pin down useful technical formulations of these concepts that can guide and support the survival and flourishing of humanity at many scales simultaneously. In particular, I think immunology serves as an excellent model for how humanity can maintain healthy levels of distributed autonomy and security, and refreshingly, so does Barack Obama according to this 2016 Wired interview by Joi Ito.
My company — On a personal note, I think my start-up, HealthcareAgents.com, can meaningfully contribute to patient advocacy and diagnostic care, at scale, using AI. My cofounders Jaan Tallinn and Nick Hay have also been thinking deeply for well over a decade about the potential impacts of AI on humanity, and I think we can make a real difference to the health and well-being of present-day humans. Even more optimistically, I'm hoping we can make progress on life extension research beginning sometime in the 2027-2030 time window.

It's okay with me if only some of the above bets pay out, as long as my colleagues and I can make a real contribution to healthcare with AI technology, and help contribute to positive attitudes and business trends that avoid successionism and industrial dehumanization in the era of AGI.

But why not just do safety work with big AI labs or governments?

You might be wondering why I'm not working full-time with big AI labs and governments to address AI risk, given that I think loss-of-control risk is around 35% likely to get us all killed, and that it's closer in time than industrial dehumanization.

First of all, this question arguably ignores most of the human economy aside from governments and AGI labs, which should be a bit of a red flag I think, even if it's a reasonable question for addressing near-term loss-of-control risk specifically.

Second, I do still spend around 1 or 1.5 workdays per week addressing the control problem, through spurts of writing, advocacy and philanthropic support for the cause, in my work for UC Berkeley and volunteering for the Survival and Flourishing Fund. That said, it's true that I am not focusing the majority of my time on addressing the nearest term sources of AI risk.

Third, a major reason for my focus on longer-term risks on the scale of 5+ years — after I'm pretty confident that AGI will already be developed — is that I feel I've been relatively successful at anticipating tech development over the past 10 years or so, and the challenges those developments would bring. So, I feel I should continue looking 5 years ahead and addressing what I'm fairly sure is coming on that timescale.

For context, I first started working to address the AI control problem in 2012, by attempting to build and finance a community of awareness about it, and later through research at MIRI in 2015 and 2016. Around that time, I concluded that multipolar AI risks would be even more neglected than unipolar risks because they are harder to operationalize. I began looking for ways to address multipolar risks, first through research in open-source game theory, then within video game environments tailored to include caretaking relationships, and now in the real-world economy with healthcare as a focus area. And sadly it took me most of the period from 2012 to 2021 to realize that I should be working on for-profit feedback loops for effecting industrial change at a global scale, through the development of helpful products and services that can keep a growing business oriented on doing good work that helps people.

Now, in 2024, the loss-of-control problem is much more imminent but also much less neglected than when I started worrying about it, so I'm even more concerned with positioning myself and my business to address problems that might not become obvious for another 5-10 years. The potential elimination of the healthcare industry in the 2030s is one of those problems, and I want to be part of the solution to it.

Fourth, even if we (humans) fail to save the whole world, I will still find it intrinsically rewarding to help a bunch of people with their health problems between now and then. In other words, I also care about healthcare in and of itself, even if humanity might somehow destroy itself soon. This caring allows me to focus myself and my team on something positive that's enjoyable to scale up and that grows by helping people, which I consider a healthy attribute for a growing business.

Fifth and finally, overall I would like to see more ambitious young people who want to improve the world with helpful feedback loops that scale into successful businesses, because industry is a lot of what drives the world, and I want morally driven people to be driving industry.

Conclusion

In summary,

I'm quite concerned about AI extinction risks from both acute loss-of-control events and industrial dehumanization driven by successionism, with the former being more imminent and less neglected, and the latter being less imminent and more neglected.
I feel I have some comparative advantage for identifying risks that are more than 5 years away, including successionism and industrial dehumanization.
In general I want to see more scalable social and business activities that
1. support the well-being of present-day humans,
2. spread positive vibes, and
3. leave people valuing their own existence and delighting in the happiness of others, especially in ways that help to avoid all-out successionism with AI.
I'm especially concerned about the potential for human-specific industries to languish in the 2030s after AGI is well-developed, especially healthcare, agriculture, food science, education, entertainment, and environmental stewardship.
I'm focusing on healthcare in particular because
1. I think it's highly tractable as an AI application area,
2. caring for the health of present-day people is intrinsically rewarding for myself and my team, and
3. healthcare is a great setting for operationalizing and addressing practical AI alignment problems at various scales of organization simultaneously.

Thanks for reading about why I'm working in healthtech :)

Effective Altruism Forum