
Epistemic Status

Unsure[1], partially noticing my own confusion. Hoping Cunningham's Law can help resolve it. 

 



Confusions About Arguments From Expected Utility Maximisation

Some MIRI people (e.g. Rob Bensinger) still highlight EU maximisers as the paradigm case for existentially dangerous AI systems. I'm confused by this for a few reasons:

  1. Not all consequentialist/goal directed systems are expected utility maximisers
    • E.g. humans
  2. Some recent developments make me sceptical that VNM expected utility maximisers are a natural form of generally intelligent systems
    1. Wentworth's subagents provide a model for inexploitable agents that don't maximise a simple unitary utility function
      1. The main requirement for subagents to be a better model than unitary agents is path dependent preferences or hidden state variables
      2. Alternatively, subagents natively admit partial orders over preferences (see the toy sketch after this list)
        1. If I'm not mistaken, utility functions seem to require a (static) total order over preferences
          1. This might be a very unreasonable ask; it does not seem to describe humans, animals, or even existing sophisticated AI systems
      3. I think the strongest implication of Wentworth's subagents is that expected utility maximisation is not the limit or idealised form of agency
    2. Shard Theory suggests that agents trained via reinforcement learning[2] form value "shards"
      1. Values are inherently "contextual influences on decision making"
        1. Hence agents do not have a static total order over preferences (which is what a utility function implies), because which preferences are active depends on the context
          1. Preferences are dynamic (change over time), and the ordering of them is not necessarily total
        2. This explains many of the observed inconsistencies in human decision making
      2. A multitude of value shards do not admit analysis as a simple unitary utility function
      3. Reward is not the optimisation target
        1. Reinforcement learning does not select for reward maximising agents in general
          1. Reward "upweight certain kinds of actions in certain kinds of situations, and therefore reward chisels cognitive grooves into agents"
      4. I'm thus very sceptical that systems optimised via reinforcement learning to be capable in a wide variety of domains/tasks converge towards maximising a simple expected utility function
  3. I am not aware that humanity actually knows of any training paradigm that selects for expected utility maximisers
    1. Our most capable/economically transformative AI systems are not agents and are definitely not expected utility maximisers
      1. Such systems might converge towards general intelligence under sufficiently strong selection pressure but do not become expected utility maximisers in the limit
        1. They do not become agents in the limit, and expected utility maximisation is a particular kind of agency
  4. I am seriously entertaining the hypothesis that expected utility maximisation is anti-natural to selection for general intelligence
    1. I'm not under the impression that systems optimised by stochastic gradient descent to be generally capable optimisers converge towards expected utility maximisers
    2. The generally capable optimisers produced by evolution aren't expected utility maximisers
    3. I'm starting to suspect that "search like" optimisation processes for general intelligence do not in general converge towards expected utility maximisers
      1. I.e. it may end up being the case that the only way to create a generally capable expected utility maximiser is to explicitly design one
        1. And we do not know how to design capable optimisers for rich environments
        2. We can't even design an image classifier by hand (we have to train one)
      2. I currently disbelieve the strong orthogonality thesis translated to practice
        1. While it may be in theory feasible to design systems at any intelligence level with any final goal
        2. In practice, we cannot design capable optimisers. 
        3. For intelligent systems created by "search like" optimisation, final goals are not orthogonal to cognitive ability
          1. Sufficiently hard optimisation for most cognitive tasks would not converge towards selecting for generally capable systems
            1. In the limit, what do systems selected for playing Go converge towards? 
              1. I posit that said limit is not "general intelligence"
          2. The cognitive tasks/domains a system was optimised to perform well on may place an upper bound on the system's general capabilities
            1. You do not need much optimisation power to attain optimal performance in logical tic-tac-toe (see the minimax sketch after this list)
              1. Systems selected for performance at logical tic-tac-toe should be pretty weak, narrow optimisers, because that's all that's required for optimality in that domain
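As a toy illustration of the partial-order point above (my own construction, loosely inspired by "Why Subagents?", not Wentworth's actual formalism): a committee of subagents that only switches between options when every member weakly prefers the switch, and at least one strictly prefers it, yields a preference relation that is only a partial order. Some option pairs come out incomparable, which no single static utility function (i.e. a total order) can represent. The option names and utilities below are made up for illustration.

```python
from itertools import combinations

# Two subagents with conflicting (hypothetical) utilities over four options.
options = ["A", "B", "C", "D"]
subagent_utilities = [
    {"A": 2, "B": 1, "C": 0, "D": -1},  # subagent 1: A > B > C > D
    {"A": 0, "B": 1, "C": 2, "D": -1},  # subagent 2: C > B > A > D
]

def committee_prefers(x, y):
    """The committee strictly prefers x to y only if no subagent objects
    and at least one subagent strictly prefers x (Pareto dominance)."""
    return (all(u[x] >= u[y] for u in subagent_utilities)
            and any(u[x] > u[y] for u in subagent_utilities))

for x, y in combinations(options, 2):
    if committee_prefers(x, y):
        print(f"{x} preferred to {y}")
    elif committee_prefers(y, x):
        print(f"{y} preferred to {x}")
    else:
        print(f"{x} and {y} are incomparable")  # no total order here
```

Running this gives a strict preference for A, B, and C over D, but leaves A, B, and C mutually incomparable: the committee is still inexploitable, yet it has no single unitary utility function.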

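A minimal sketch of the "reward chisels cognition" point, using a bare-bones REINFORCE-style update on a two-armed bandit (an illustrative toy of my own, not anyone's actual training setup). Reward enters only as a coefficient that upweights the log-probability of whatever action was just taken; nothing in the learned parameters explicitly represents "maximise reward".

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)            # policy parameters over two actions
true_reward_probs = [0.2, 0.8]  # hypothetical environment
lr = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(logits)
    action = rng.choice(2, p=probs)
    reward = float(rng.random() < true_reward_probs[action])  # 0/1 reward
    # Policy-gradient update: grad of log pi(action), scaled by the reward.
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0
    logits += lr * reward * grad_log_pi  # reward only scales the update

print("final action probabilities:", softmax(logits))
```

The trained policy ends up favouring the higher-reward arm, but the reward signal never appears inside the policy itself; it only shaped ("chiselled") which action tendencies got reinforced during training.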
 
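And a sketch of the tic-tac-toe point: plain memoised minimax solves the whole game exhaustively (the state space is tiny), so selecting hard for optimal play in this domain exerts essentially no pressure towards general capability.

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    """Value of the position for X under optimal play: +1 / 0 / -1."""
    w = winner(board)
    if w:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0  # draw
    nxt = "O" if player == "X" else "X"
    moves = [value(board[:i] + player + board[i + 1:], nxt)
             for i, cell in enumerate(board) if cell == "."]
    return max(moves) if player == "X" else min(moves)

print(value("." * 9, "X"))  # 0: tic-tac-toe is a draw under optimal play
```

An optimiser that saturates this objective has learned nothing transferable; the task's ceiling, not the optimisation pressure, bounds the resulting capabilities.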

I don't expect the systems that matter (in the par human or strongly superhuman regime) to be expected utility maximisers. I think arguments for AI x-risk that rest on expected utility maximisers are mostly disconnected from reality. I suspect that discussing the perils of expected utility maximisation in particular — as opposed to e.g. dangers from powerful (consequentialist?) optimisation processes — is somewhere between being a distraction and being actively harmful[3].

I do not think expected utility maximisation is the limit of what generally capable optimisers look like[4].


Arguments for Expected Utility Maximisation Are Unnecessary

I don't think the case for existential risk from AI rests on expected utility maximisation. I kind of stopped alieving in expected utility maximisers a while back (only recently have I synthesised explicit beliefs that reject them), but I still plan on working on AI existential safety, because I don't see the core threat as resulting from expected utility maximisation.

The reasons I consider AI an existential threat mostly rely on:

  • Instrumental convergence for consequentialist/goal directed systems
    • A system doesn't need to be a maximiser of a simple utility function to be goal directed (again, see humans)
  • Selection pressures for power seeking systems
    • Reasons
      • More economically productive/useful
      • Some humans are power seeking
      • Power seeking systems promote themselves/have better reproductive fitness
    • Human disempowerment is the immediate existential catastrophe scenario I foresee from power seeking
  • Bad game theoretic equilibria
    • This could lead towards dystopian scenarios in multipolar outcomes
  • Humans getting outcompeted by AI systems
    • Could slowly lead to an extinction

 

I do not actually expect extinction near term, but it's not the only "existential catastrophe":

  • Human disempowerment
  • Various forms of dystopia
  1. ^

    I optimised for writing this quickly, so my language may be stronger/more confident than I actually feel. I may not have spent as much time accurately communicating my uncertainty as may have been warranted.

  2. ^

    Correct me if I'm mistaken, but I'm under the impression that RL is the main training paradigm we have that selects for agents.

    I don't necessarily expect that our most capable systems would be trained via reinforcement learning, but I think our most agentic systems would be.

  3. ^

    There may be significant opportunity cost via diverting attention from other more plausible pathways to doom.

    In general, I think exposing people to bad arguments for a position is a poor persuasive strategy as people who dismiss said bad arguments may (rationally) update downwards on the credibility of the position.

  4. ^

    I don't necessarily think agents are that limit either. But as "Why Subagents?" shows, expected utility maximisers aren't the limit of idealised agency.

Comments



You're not the first person to notice this issue, and they didn't get a satisfying answer either, in my opinion. It seems like a holdover from the early days of AI theorizing, before we understood the power of machine learning/evolutionary algorithm techniques. I personally find it highly unlikely that we'll end up with single-minded consequentialist goal-function maximisers; it seems like a difficult thing to program with machine learning techniques, and one that would be unstable even if you could build it.
