
Confidence level: Medium. This post reflects a mix of my own takes, things I’ve read, and conversations with others (especially at ConCon and elsewhere). I’m not claiming high confidence in any particular conclusion. 

See here for other work on digital minds cause prioritization

Background

A few months ago, I attended ConCon, a conference on AI consciousness and welfare run by Eleos AI. It was excellent: good vibes, thoughtful people, and many conversations I found clarifying. One of my main reasons for going was to understand how others are thinking about short-term welfare strategies for digital minds, and how those strategies fit into (cross- and intra-) cause prioritization.

After the conference—and after talking with people working in this space—I felt both better informed and more skeptical. I went in with fairly strong priors that digital minds might be a top-tier cause area under cause prioritization, but I came out thinking that, while the stakes could be enormous, our current levers may be weaker than I’d initially thought.

I was also a bit surprised to get the vibe that people were much less into the cause-prioritization questions than I was. Because of that, I wanted to write a post on some of the ideas/cruxes/arguments I've been thinking about, arguing about, and reading other people's takes on.

This post tries to organize the strategic landscape:

  • Why digital minds might matter enormously
  • What our actual levers are today
  • How different theories of change affect priorities
  • What the key cruxes are for ranking this area against alternatives like AI safety or animal welfare (especially insects)
  • Questions for further research
  • Places to learn more and get involved

Before getting into the substance, brief methodological notes:

First, doing fully quantitative cause prioritization here is very difficult—and, in my view, likely to be fairly uninformative at this stage—so I will focus primarily on qualitative arguments, introducing numbers only where they seem reasonable or helpful. This is not to say that more quantitative work shouldn’t be attempted.

Second, there’s simply too much ground to cover, so I’m prioritizing breadth over depth: I’ll lay out a wide set of questions and cruxes rather than going deep on a few. I expect many of these points could (and should) be analyzed much more thoroughly on their own.

Why Care About Digital Minds at All?

Before doing cause prioritization, it’s worth briefly motivating why people take digital minds seriously in the first place. There are many arguments here; I’ll keep this short and schematic.

  • Conscious AI seems possible in principle.
    Many philosophical views of consciousness (including many kinds of functionalism and certain kinds of dualism, among others) allow for the possibility of digital minds. Many philosophers and domain experts think digital minds are possible in principle, and some even think they could arrive relatively soon.
  • If conscious, digital minds plausibly deserve moral consideration (potentially in the near term future).
    Many moral theories imply that the capacity for consciousness confers moral patiency. Some argue that AIs could warrant moral consideration without consciousness.
  • There could be vast numbers of digital minds.
    Depending on where moral patienthood “resides” (e.g. psychologically continuous entities associated with models, hardware instances, etc), how expensive digital minds are to run, and how AI systems scale, the number of future digital minds could be enormous—possibly dwarfing biological minds.
  • Some views imply that most future moral weight lies in digital minds.
    On these views, the astronomical importance often attributed to reducing existential risk partly hinges on ensuring that future digital minds exist and have good welfare. There is also the possibility of super-beneficiaries (digital minds that can generate welfare with super-human efficiency or digital minds that can have welfare goods/bads that are much better/worse than any biological minds can undergo).
  • The field is extremely neglected.
    Relative to AI safety, global health, or animal welfare, very few people are working seriously on digital minds.

Taken together, this gives strong prima facie reason to investigate digital minds further. But cause prioritization requires more than plausibility and scale; we need to examine tractability, timing, and counterfactual impact.

Theories of Change

I find it useful to distinguish two broad theories of change (ToCs), which carve up much of the space for thinking about current interventions.

1. Influencing Near-Term Digital Minds (Short-Term ToC)

This looks promising if you think either:

  • (a) We will have substantial welfare capacity in digital minds relatively soon, constituting a large fraction of total influenceable moral weight; or
  • (b) Near-term digital minds don’t dominate moral value, but their welfare is especially tractable to influence (for instance, preventing the creation of digital minds in the first place).

The plausibility of this ToC depends on AI timelines, takeoff speed, and “altitude” (i.e. whether early digital minds are already numerous or welfare-significant), both in terms of how these parameters turn out by default and how tractable they are to influence.

2. Influencing Far-Future Digital Minds (Future ToC)

On this view, the main action is in shaping very long-run outcomes, where digital minds may vastly outnumber biological ones.

The main worries here are feasibility, tractability, and counterfactual impact. Further, if future treatment of digital minds depends largely on future institutions, technologies, or aligned AI systems, it’s unclear how much work now can robustly influence outcomes centuries down the line.

Quick opinionated hot take: While many have claimed that the future will contain vast numbers of digital minds, the argument for this claim—though superficially intuitive and often gestured at—has been underdeveloped. I think it needs substantially more work to be forceful.

Most interventions plausibly affect both ToCs to some extent, but their relative value can vary dramatically depending on which ToC you prioritize (e.g. ensuring that we have rigorous welfare evals now seems good on the short-term ToC and less clearly good on the future ToC).

Types of Interventions

This is not exhaustive or mutually exclusive, but it helps organize the space:

How These Interventions Might Work (and Why They Might Not)

Rather than evaluating each intervention in isolation, I’ll focus on how they connect to short- and long-term ToCs, along with the major critiques—ordered by my judgment of their plausibility and importance (which should be taken with a grain of salt). 

Foundational Research

Arguments for:

  • Even a small chance of discovering something uniquely important could have enormous expected value, given the potential scale of digital minds.
  • The field is young; there may be low-hanging fruit that existing related fields (consciousness science, AI research, etc.) have not picked because it was not directly useful to them.

Critiques:

  • For the far future, truly fundamental insights, if they are discovered/discoverable now, are likely to be rediscovered later—perhaps by future AIs—with higher probability and better tools.
  • For the near term, these problems may simply be too hard to resolve in time to matter.
  • There is also the possibility that we make progress on theoretical questions, but find it too difficult to put that progress into practice—either because there is no clear implementation path, or because the incentives against implementation are too strong.

Near-Term Research

Arguments for:

  • Early norms and epistemic standards can shape later industry practice.
  • There are plausible reference classes where early attention mattered (e.g. regulatory path dependence).
    • A related intuition pump I like: animal welfare likely would have gone better had there been more people paying attention at the beginning of large-scale factory farming.
  • Placing thoughtful people in key positions at labs could matter later.
  • Public and policymaker attitudes may be influenced by the existence of serious research/researchers early on.
  • In worlds where digital minds appear before advanced AI can “solve” the problem, near-term work may be crucial. Even in worlds where advanced AI can solve it, we would still need to trust its solution—which may require having done substantial work toward a solution ourselves.

Critiques:

  • Near-term digital minds may represent a much smaller fraction of total moral weight.
    • And as we look further out, early research might matter less: outcomes are harder to predict, more actors are involved, and the problem becomes less neglected. In worlds where digital minds dominate moral weight, outcomes may be driven by default dynamics or higher-leverage interventions. Given alternative ways to influence the far future, this weakens the case for prioritizing near-term digital-minds–specific work.
  • If AGI arrives first (and is at least somewhat aligned), it may do this research faster and better, reducing counterfactual research value now.
  • There could be poor generalization from early systems to later ones on questions of moral weight, making research on earlier systems less robust.

Communications, Lab Policy, and Governance

Arguments for:

  • It’s often easier to establish norms before issues become politically charged or subject to motivated reasoning—for example, before digital minds become mainstream and polarizing, or before AI becomes broadly transformative.
  • Early commitments may have first-mover advantages and partial lock-in, which can have downstream effects on short-term ToC and (more speculatively) future ToC.
    • As Andreas Mogensen notes, factory farming might persist largely because it became entrenched before serious moral reflection; had it been proposed de novo, it would probably have been rejected. This suggests that early norms around AI could become similarly locked in—even if future reflection (even with capable AI moral reasoners doing the reasoning) would judge them morally catastrophic.

Critiques:

  • Public attitudes may shift dramatically (in either direction) once AI causes large economic/social disruption and/or becomes much more powerful/human-like in ability.
    • Intuition pump: people will say whatever about digital minds until AI comes to take their jobs.
  • Under a less charitable view of moral progress, moral concern often collapses when real costs are introduced (think animal welfare). So the state of the field may depend very little on what people say or on the research, and much more on what the basic incentives are.
    • On the flip side, this means opportunities to change the default incentives (for instance, the incentives of frontier labs), if possible, could be very valuable.
  • Early advocacy risks politicization, backlash, or low-quality discourse, which could be extremely net-negative.

Strategy

Arguments for:

  • This could plausibly be among the most important strategic questions humans face; understanding it better seems valuable.
  • There has been very little strategic work so far.
  • Finding cheap yet high-value strategies would be extremely valuable.

Critiques:

  • We may lack sufficient information to do good strategy work now.
  • Other areas (e.g. existential risk reduction) may have stronger temporal advantages for strategy (e.g. because they specifically matter now).

I think field-building largely just inherits the strengths and weaknesses of whichever part of the above it supports.

Approaching Prioritization via Importance–Tractability–Neglectedness (ITN)

When thinking about cause prioritization, the ideal would be to have concrete numbers for scale, tractability, and neglectedness—plug them into a spreadsheet, and get a clean estimate of how much work in digital minds matters on the margin. Unfortunately, given how early this field is, that just isn’t possible yet.

Still, uncertainty is not a license to throw up our hands and defer entirely to gut instinct. Even in the absence of robust quantification, we can track and evaluate the main qualitative arguments, while being explicit about where uncertainty and disagreement remain. Where possible, we can also sprinkle in rough numbers when they are informative.
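
To make the “spreadsheet” framing concrete, here is a minimal sketch (in Python) of what such a calculation would look like. Every number in it is a placeholder I made up purely for illustration (none are estimates from this post or from any published source); the only point is how sensitive the ranking is to inputs we cannot currently pin down.

```python
# A minimal sketch of the "spreadsheet" exercise described above. Every number
# below is a placeholder assumption chosen for illustration only -- none of
# them are estimates from this post or from any published source.

CAUSES = {
    # cause:                        (importance, tractability, neglectedness)
    "digital minds (placeholder)":  (1e9, 1e-4, 1 / 5),
    "AI safety (placeholder)":      (1e9, 1e-3, 1 / 300),
    "insect welfare (placeholder)": (1e7, 1e-3, 1 / 2),
}

def marginal_value(importance: float, tractability: float, neglectedness: float) -> float:
    # Standard ITN-style product; units are arbitrary, so only the ranking
    # within this toy model means anything.
    return importance * tractability * neglectedness

for cause, (i, t, n) in CAUSES.items():
    print(f"{cause:30s} -> {marginal_value(i, t, n):.2g}")

# Takeaway: shifting any single input by an order of magnitude (easy, given the
# uncertainties discussed below) reorders the ranking, which is why this post
# leans on qualitative arguments rather than a clean quantitative estimate.
```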

  • AI Safety (AIS):
    If digital minds matter in the near term, that’s because we expect systems with substantial cognitive sophistication/autonomy/other stuff to arrive soon. But systems at that level are also plausibly transformative, which comes with many big risks. In that case, reducing catastrophic and existential risk becomes a prerequisite for any positive digital-minds future.
  • On the other hand, much of the astronomical value often attributed to AIS plausibly relies—implicitly or explicitly—on assumptions about future digital populations having morally significant lives.
    • Relative Scale:
      • For AIS, the value of reducing existential risk will be roughly (expected number of future lives) × (absolute reduction in existential risk); a toy worked example appears after this list. This is currently too uncertain to responsibly quantify. However, we can say some other things:
        • The long term scale claims around digital minds crucially depend on extremely uncertain parameters: future population sizes, the moral status of digital agents, and how much present-day interventions affect far-future trajectories. While some estimates suggest a far-future total population on the order of 10^54–58 individuals (mostly digital), these numbers are extremely sensitive to speculative assumptions and become rapidly intractable to influence with confidence.
        • For short-term digital minds, it will really depend on the probability that we get digital minds given the current state of AI architectures, how many and how soon, and which ToC you're going for. To give a sense of what experts think (although I should note that there is, in my view, likely some self-selection here, as these are experts already in the field, and so perhaps more likely to think this is worth thinking about):
    • Relative Tractability:
      • Compared to standard AI safety work, interventions aimed at digital minds are likely significantly harder to justify and execute under deep uncertainty.
        • This is because we lack reliable/robust indicators of consciousness and have a poor understanding of what welfare states models might be experiencing (even on particular views of welfare and/or consciousness), which makes it much harder to identify interventions that actually give us leverage over whether digital minds go well.
    • Relative Neglectedness:
      • Digital minds are extremely neglected: likely fewer than 50 full-time workers and funding is probably in the low millions. AIS currently probably has a few thousand full-time researchers and a few hundred million in funding.
      • It should be noted that, according to some world models, there is very little that can be done about AI Safety in the future (because many think that the chance of existential risk is likely to be highest now and then decrease), so AI Safety gets extra points for potential temporal neglectedness. While this is true under some circumstances for digital minds (i.e. early mover-effects, preparing for x-risk scenarios, etc), it seems much more unclear.
  • Animal (and insect) Welfare:
    • Broadly, AW has high tractability, enormous current scale, and stronger evidence of sentience—at least for now, since future experiments or engineering relevant to digital minds could change this. Also, there may be fewer people working on insect welfare than on digital minds. Relative priority, then, depends heavily on assumptions about future welfare capacity, timelines, and leverage over future welfare.
    • While insect welfare is not automatically tied to AI welfare, caring about AI welfare is probably tied to taking lower probabilities of consciousness seriously, making it useful for comparison.
    • Relative Scale:
      • There are about 90-100 billion factory-farmed (vertebrate) land animals killed each year. There are about 100-170 billion (vertebrate) fish killed every year. There are around 450 billion (invertebrate) shrimp slaughtered every year.
      • For insects, there's some early (uncertain) evidence that a massive uptick in farmed insects (mealworms + black soldier flies) is starting now-ish, as the farmed-insect industry takes shape -- a projection from Rethink Priorities sees the number going from between ~50 billion and ~2 trillion today to 1.8 trillion to 17.3 trillion by 2033. If you believe that there are large first-mover effects, this could be true for insects as well as digital minds (and it is unclear how TAI coming soon bears on this question).
    • Relative Tractability:
      • In AW, we can be more confident that certain interventions will actually have some effects, but it gets a bit more dodgy for insects (it's just too early to say which, if any, interventions work out here).
      • In a 2019 report from Rethink Priorities (though the numbers could be very different now for various reasons), Saulius Simcikas found that each $1 spent on corporate campaigns could affect 9-120 years of chicken life (excluding indirect effects, which could be very important too).
      • We may have greater leverage to engineer digital minds to have candidate morally significant states.
    • Relative Neglectedness:
      • While there are currently more full-time people in AW than in digital minds, there are probably a similar number of people in insect welfare -- someone gave me a number of 10-15 full-timers, but I don't know how accurate this is.
      • To the extent that the field of digital minds has fewer people working on it, my (and others’) impression is that it’s on an upward trend in both attention and funding.
        • This is, I think, partly because digital minds, as a field, has—or is likely to gain—more status and perceived coolness, which could drive a much larger increase in attention than we see for insect welfare (though the quality of these researchers is more uncertain). By contrast, I (unfortunately) don’t see insect welfare gaining comparable status anytime soon.
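
To make the rough formula from the AIS scale bullet concrete, here is a toy worked example. The 10^54 figure is the low end of the far-future population range cited above; the assumed absolute risk reduction of 10^-9 is a number I made up purely for illustration, not an estimate from this post or anyone else:

```latex
\mathrm{EV}(\text{x-risk reduction}) \approx N_{\text{future lives}} \times \Delta p_{\text{x-risk}}
\approx 10^{54} \times 10^{-9} = 10^{45} \ \text{expected future lives}
```

Shifting either input by a few orders of magnitude (well within the uncertainties above) moves the answer by the same amount, which is part of why I treat this arithmetic as illustrative rather than decision-relevant.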

Of course, when making career/related decisions, personal fit should play an important role in all of these considerations and could be the greatest determinant of impact. 

Key Cruxes and Open Questions

Below is a non-exhaustive list, roughly ordered by perceived importance (from myself + some outside view). I’m not an expert; treat this as a map of uncertainties, not a verdict.

  1. What is the default welfare of AI systems, and how much leverage do we have to change it?
  2. How much potential is there for attitude path-dependence (especially for the long-term future)?
    1. How useful might our historical reference classes here be and are they worth further study?
  3. How unstable are pre-TAI public attitudes toward AI welfare?
    1. Especially as public perception of AI more broadly shifts (from job automation to increasing capabilities/human-likeness).
  4. What are the temporal advantages of AI welfare work compared to AI safety?
  5. How much total welfare capacity might digital minds have relative to humans/other animals?
    1. Related questions include: the estimated scale of digital minds, moral weights-esque projects, which part of the model would have moral weight.
  6. How likely are different digital-mind takeoff scenarios?
  7. How much does any of this (but especially research) matter if aligned AGI arrives first? What are the chances that AGI happens first?
  8. Under what conditions do early public opinion/policy-maker beliefs matter vs not matter?
  9. How much (if at all) should we care about AIs without consciousness or welfare?
  10. Are there plausible lock-in scenarios for AI welfare?
  11. How, if at all, should we think about AI consciousness in preparation for a world where humans go extinct from AI?
  12. Are we going to get a GPT-3 moment or a “warning shot” for digital minds, in terms of certainty or public perception? If not, does this mean we will (more or less) stay at our current uncertainty levels?
  13. What are the relative levels of importance of over-attribution vs. under-attribution risks?
  14. How robust are interventions to tensions between AI safety and AI welfare?
  15. How robust are short-term vs future ToCs?
  16. How much should we care about harm reduction vs ensuring that (some) AIs have preferences that are really easy to satisfy?
  17. To what extent is the size of the digital minds field likely to affect what interventions it can effectively pursue?
  18. How tightly should digital minds and animal welfare be coupled?
  19. Arguably, lots of historical moral circle expansion could be explained with an analysis of costs and benefits (political, signalling, economic, etc) rather than philosophical beliefs and advocacy. To what extent will this be true here? If a lot, what is there to do about it?
  20. How should models communicate uncertainty about their own consciousness?
    1. Is the Anthropic model good (where a model says “I don’t know if I’m conscious”) or are there better alternatives (e.g. a fourth-wall break where the model says that it’s not supposed to respond because of Anthropic’s/experts’ uncertainty)?
  21. Are there robust public/other communication strategies that minimize backlash? How much can we say about communication strategies that different actors are more or less receptive to?
  22. How many people are actually working in this area, and how fast is it growing (there were ~150 people at the Eleos AI conference but far fewer FTEs)?
  23. How should experts respond to concerns about AI psychosis in relation to AI consciousness? 

Learning More and Getting Involved

Thanks to Brad Saad and Štěpán Los for providing useful comments and thank you to ChatGPT for helping rewrite parts of this and for some stylistic tweaks.
