Standard expected utility theory (EUT) assumes moral certainty, but embeds epistemic/ontological uncertainty about which state of the world will occur as a result of our actions. Harsanyi expected utility theory (HEUT) allows us to assign probabilities to our potential moral viewpoints, and thus gives us a mechanism for handling moral uncertainty.

Unfortunately, there are several problems with EUT and HEUT. First, the St. Petersburg paradox shows that unbounded utility valuations can justify almost any action, even if the probability of a good outcome is almost zero. For example, a banker may face a situation where the probability of a bank run is nearly one, but because the potential returns to being overleveraged in the near-zero-probability world without a run are so high, the banker may still foolishly choose to be overleveraged in order to maximize expected utility. Second, diminishing returns typically force us to produce or consume more in order to realize the same gain in utility; this is usually a recipe for consuming and producing in unsustainable ways. Third, as Herbert Simon noted, optimizing expected utility is often computationally intractable.
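To make the first problem concrete, here is a minimal sketch of the St. Petersburg game, assuming the textbook version in which round k pays 2**k with probability 2**-k. The function names and the cap value are illustrative assumptions, not part of the original argument. The sketch shows that the expected payoff grows without limit as more rounds are allowed, while a capped (bounded) payoff keeps the expectation small.

```python
from fractions import Fraction


def truncated_expected_payoff(max_rounds: int) -> Fraction:
    """Expected payoff when the game is cut off after max_rounds flips.

    Assumes round k pays 2**k with probability 2**-k, so every round
    contributes exactly 1 to the expectation.
    """
    return sum(
        (Fraction(1, 2**k) * 2**k for k in range(1, max_rounds + 1)),
        Fraction(0),
    )


def capped_expected_payoff(max_rounds: int, cap: int) -> Fraction:
    """Same game, but no payoff may exceed cap (a bounded valuation)."""
    return sum(
        (Fraction(1, 2**k) * min(2**k, cap) for k in range(1, max_rounds + 1)),
        Fraction(0),
    )


# The unbounded expectation equals the number of rounds allowed.
print(truncated_expected_payoff(10))  # 10
# With a hypothetical cap of 16, the expectation stays below 5
# however many rounds are allowed.
print(capped_expected_payoff(50, cap=16) < 5)  # True
```

Exact fractions are used so the comparison is not blurred by floating-point rounding.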

A response of early effective altruism research to these problems was maxipok (i.e., maximizing probabilities of an okay outcome). Under this construct, constraints of an okay outcome are identified, a probability of satisfying those constraints is assigned to each action, and the action that maximizes the probability of satisfying the constraints is adopted.
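The maxipok rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the action names and probabilities are hypothetical, and the function assumes that someone has already fixed what counts as an okay outcome and estimated each action's probability of achieving it.

```python
def maxipok(prob_okay: dict[str, float]) -> str:
    """Return the action with the highest probability of an okay outcome.

    prob_okay maps each action to the estimated probability that it
    satisfies the constraints of an okay outcome.
    """
    if not prob_okay:
        raise ValueError("at least one action is required")
    for action, prob in prob_okay.items():
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability for {action!r} must lie in [0, 1]")
    # Ties are broken in favor of the action listed first.
    return max(prob_okay, key=lambda action: prob_okay[action])


# Hypothetical numbers for the banker example.
choice = maxipok({"hold_reserves": 0.9, "overleverage": 0.05})
print(choice)  # hold_reserves
```

Note that the size of the payoff never enters the calculation; only the probability of clearing the "okay" threshold does.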

The problem with maxipok is that it assumes moral certainty about the constraints of what constitutes an okay outcome. For example, if we believe a trolley problem is inevitable, we might infer that someone dying is an okay outcome, given its unavoidability. On the other hand, if the trolley problem is avoidable, we may infer that someone dying is not okay. Thus, in that overall scenario, what constitutes an okay outcome is contingent on the probability we assign to the inevitability of the trolley problem.

Success maximization is a mechanism by which to generalize maxipok for moral uncertainty. Let $a_i$ be action $i$ from the set of actions $A = \{a_1, a_2, \dots, a_m\}$. Let $s_x$ be definition $x$ of moral success from the set $S = \{s_1, s_2, \dots, s_n\}$. Let $\pi_i(s_x)$ be the probability that action $a_i$ satisfies the constraints of $s_x$, so that $0 \le \pi_i(s_x) \le 1$. Let $p(s_x)$ be the estimated probability that $s_x$ is the correct definition of moral success, where $p(s_1) + p(s_2) + \dots + p(s_n) = 1$. Thus, the expected success of action $a_i$ satisfies $0 \le \pi_i(s_1)p(s_1) + \pi_i(s_2)p(s_2) + \dots + \pi_i(s_n)p(s_n) \le 1$. A success-maximizing agent will choose an action $a_j \in A$ such that $\pi_j(s_1)p(s_1) + \pi_j(s_2)p(s_2) + \dots + \pi_j(s_n)p(s_n) \ge \pi_i(s_1)p(s_1) + \pi_i(s_2)p(s_2) + \dots + \pi_i(s_n)p(s_n)$ for all $a_i \in A$ where $i \ne j$.
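The definition above can be sketched as a short program. Everything concrete here is an assumption made for illustration: the two definitions of success ("strict" and "lenient"), the two actions, and all of the numbers are hypothetical. The sketch computes each action's expected success as a credence-weighted sum and picks the maximizer.

```python
from typing import Mapping


def expected_success(pi: Mapping[str, float], p: Mapping[str, float]) -> float:
    """Credence-weighted sum of pi(s) * p(s) over definitions s, for one action."""
    return sum(pi[s] * p[s] for s in p)


def success_maximizing_action(
    pi_by_action: Mapping[str, Mapping[str, float]],
    p: Mapping[str, float],
) -> str:
    """Return the action with the highest expected success.

    pi_by_action[a][s] is the probability that action a satisfies the
    constraints of definition s; p[s] is the credence that s is correct.
    """
    if abs(sum(p.values()) - 1.0) > 1e-9:
        raise ValueError("credences over definitions of success must sum to 1")
    # Ties are broken in favor of the action listed first.
    return max(pi_by_action, key=lambda a: expected_success(pi_by_action[a], p))


# Hypothetical credences over two definitions of moral success.
p = {"strict": 0.25, "lenient": 0.75}
# Hypothetical probabilities that each action satisfies each definition.
pi_by_action = {
    "divert": {"strict": 0.0, "lenient": 1.0},
    "brake": {"strict": 0.5, "lenient": 0.5},
}

for action, pi in pi_by_action.items():
    print(action, expected_success(pi, p))
print(success_maximizing_action(pi_by_action, p))
```

With these made-up numbers, "divert" scores 0.75 and "brake" scores 0.5, so the agent picks "divert" even though it certainly fails the strict definition. When all credence sits on a single definition, the procedure reduces to maxipok.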

Success maximization resolves many of the problems of von Neumann-Morgenstern and Harsanyi expected utility theories. First, because success valuations are bounded between 0 and 1, we are much less likely to encounter St. Petersburg paradox situations, where any action can be justified by extremely high utility valuations despite near-zero probabilities of occurrence. Second, unsustainable behaviors produced by chasing diminishing returns are much less likely when maximizing probabilities of constraint satisfaction than when maximizing unbounded expected utilities. Third, because probabilities of success are bounded between zero and one, each term of the linear combination contributes at most $p(s_x)$ to the total, so terms where $p(s_x)$ is relatively low can often be ignored, making calculations quicker and more tractable.
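The third point can be illustrated with a rough sketch. The function name, the cutoff parameter, and the numbers are all hypothetical. The sketch drops definitions of success whose credence falls below a chosen cutoff and reports how much credence was dropped; because every probability of success is at most 1, the true expected success exceeds the truncated estimate by no more than that dropped credence.

```python
from typing import Mapping, Tuple


def truncated_expected_success(
    pi: Mapping[str, float],
    p: Mapping[str, float],
    min_credence: float,
) -> Tuple[float, float]:
    """Estimate expected success while ignoring low-credence definitions.

    Returns (estimate, dropped_mass). The exact expected success lies in
    the interval [estimate, estimate + dropped_mass].
    """
    kept = {s: w for s, w in p.items() if w >= min_credence}
    dropped_mass = sum(w for s, w in p.items() if s not in kept)
    estimate = sum(pi[s] * w for s, w in kept.items())
    return estimate, dropped_mass


# Hypothetical credences and success probabilities for a single action.
p = {"s1": 0.5, "s2": 0.375, "s3": 0.125}
pi = {"s1": 1.0, "s2": 0.5, "s3": 1.0}

estimate, dropped = truncated_expected_success(pi, p, min_credence=0.2)
exact = sum(pi[s] * p[s] for s in p)
print(estimate, dropped, exact)
```

One caveat about this shortcut: when the same definitions are dropped for every action, the ranking of two actions is guaranteed to be unchanged only if their truncated estimates differ by more than the dropped credence.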

Comments (3)


If I understand you correctly, what you're proposing is essentially a subset of classical decision theory with bounded utility functions. Recall that, under classical decision theory, we choose our action according to $a^* = \arg\max_{a \in A} E[U(a, \theta)]$, where $\theta$ is a random state of nature and $A$ an action space.

Suppose there are $n$ (infinitely many works too) moral theories $T_1, \dots, T_n$, each with probability $p_i$ and associated utility $U_i$. Then we can define $U(a, \theta) = \sum_{i=1}^{n} p_i U_i(a, \theta)$. This step gives us (moral) uncertainty in our utility function.

Then, as far as I understand you, you want to define some component utility functions as $U_i(a, \theta) = 1[(a, \theta) \text{ is acceptable under } T_i]$. As then $E[U_i(a, \theta)]$ is the probability of an acceptable outcome under $T_i$. And since we're taking the expected value of these bounded component utilities to construct $U$, we're in classical bounded utility function land.

That said, I believe that

  1. This post would benefit from a rewrite of the paragraph starting with "Success maximization is a mechanism by which to generalize maxipok". It states "Let $a_i$ be an action $i$ from the set of actions $A$." Is $a_i$ an action, $i$ an action, or both? I also don't understand what $x$ is. Are there states of nature in this framework? You say that $s_x$ is a moral theory, so it cannot be a state of nature $\theta$?
  2. You should add concrete examples. If you add one or two it might become easier to understand what you're doing despite the formal definition not being 100% clear.

Speaking as a non-expert: This is an interesting idea, but I'm confused as to how seriously I should take it. I'd be curious to hear:

  1. Your epistemic status on this formalism. My guess is you're at "seems like a good cool idea; others should explore this more", but maybe you want to make a stronger statement, in which case I'd want to see...
  2. Examples! Either a) examples of this approach working well, especially handling weird cases that other approaches would fail at. Or, conversely, b) examples of this approach leading to unfortunate edge cases that suggest directions for further work.

I'm also curious if you've thought about the parliamentary approach to moral uncertainty, as proposed by some FHI folks. I'm guessing there are good reasons they've pushed in that direction rather than the more straightforward "maxipok with p(theory is true)", which makes me think (outside-view) that there are probably some snarls one would run into here.

Inside-view, some possible tangles this model could run into:

  • Some theories care about the morality of actions rather than states. But I guess you can incorporate that into 'states' if the history of your actions is included in the world-state -- it just makes things a bit harder to compute in practice, and means you need to track "which actions I've taken that might be morally meaningful-in-themselves according to some of my moral theories." (Which doesn't sound crazy, actually!)
  • The obvious one: setting boundaries on "okay" states is non-obvious, and is basically arbitrary for some moral theories. And depending on where the boundaries are set for each theory, theories could increase or decrease in influence on one's actions. How should we think about okayness boundaries?
    • One potential desideratum is something like "honest bargaining." Imagine each moral theory as an agent that sets its "okayness level" independent of the others, and acts to maximize good from its POV. Then our formalism should lead to each agent being incentivized to report its true views. (I think this is a useful goal in practice, since I often do something like weighing considerations by taking turns inhabiting different moral views.)
      • I think this kind of thinking naturally leads to moral parliament models -- I haven't actually read the relevant FHI work, but I imagine it says a bunch of useful things, e.g. about using some equivalent of quadratic voting between theories. 
    • I think there's an unfortunate tradeoff here, where you either have arbitrary okayness levels or all the complexity of nuanced evaluations. But in practice maybe success maximization could function as the lower level heuristic (or middle level, between easier heuristics and pure act-utilitarianism) of a multi-level utilitarianism approach.