
If the ideas in “moral ~realism”[1] and/or “beyond Maxipok”[2] are broadly correct, this has implications for AI strategy. (Like those posts, this is largely about getting some old, imperfect ideas out of the door with a little polishing; again, the ideas owe credit to conversations and comments from several people.)

I want to draw out three threads:

  1. It may be important to develop AI that can help humanity enter the basin of good reflective governance
    • This makes applications like facilitating wisdom, cooperation, and reflective processes a relatively higher strategic priority for AI systems
  2. A plausible target for AI alignment could be to align with the good (i.e. get the AI systems into the basin of good reflective governance) rather than aligning with humans
    • In practice this is scary as a possibility, because it might not fail gracefully
    • However, it’s prima facie plausible that it’s easier to build a system that aligns with the good than one that aligns with humans. If so, we might prefer not to cut off the possibility of good outcomes by chasing after the too-hard task
  3. Even if we don’t reach enough confidence in that approach to prefer trying to align systems with the good as a primary goal, we might still want to use it to give a backup saving throw on AI alignment
    • Roughly: “Give your superintelligent servants the rudiments of a moral education so that if they do in some weird plot twist end up in control, there’s still a chance they work out the right thing to do”
      • There are a lot of details to be worked out here, and maybe it ends up wildly impractical; its appeal is that it seems like a fairly orthogonal line of attack to keeping systems corrigible
    • This should be more appealing the harder you think aligning with humans is
    • Honestly this still feels pretty speculative (more so than 1 or 2), but at this point I've been sitting on the idea for a couple of years without persuading myself either that it's a good idea or that it isn't, so I'll just share it as-is

Helping humans coordinate/reflect

Getting to the basin of good reflective governance may be tricky (humanity doesn’t seem to have managed it to date, at least in anything like a robust way). It’s possible that AI capabilities coming online could help this, in strategically important ways.

Strategically desirable capabilities

My top choices for important capabilities to develop from this perspective are:

  1. Automated negotiation
    • Learning the principal’s preferences and then handling negotiations with others (initially low-stakes ones?) could significantly increase coordination bandwidth (a toy sketch of the bargaining step appears after this list)
  2. Automation of identifying wise actions, and/or automation of philosophical progress (for which we may need better metaphilosophy)
    • If we can improve the thinking of individual human actors, we could lower the barriers for entering into the basin of good reflective governance
  3. Highly truthful AI
    • Very trustworthy systems could serve as reliable witnesses without leaking confidential information, thereby increasing trust and the ability to coordinate and make commitments (a classical cryptographic analogue is sketched after this list)
  4. Personal AI assistants
    • Summarizing and helping people to digest complex information (improving their models of the world)
    • Eventually could also implement the “automated negotiation” and “highly trustworthy witness” capabilities deployed on a personal level, but I guess that the early uses for those capabilities won’t be as personal assistants
    • However, I guess this is the least important of these four to prioritize, in part because I expect market forces to lead to significant investment in it anyway
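
To make the “automated negotiation” idea concrete, here is a minimal sketch, assuming each assistant has already learned scalar preference scores for its principal over a shared menu of candidate deals, and that the assistants settle on the Nash bargaining solution over that menu. The menu, scores, and function names are all illustrative, not references to any existing system.

```python
# Minimal sketch: two assistants, each holding preference scores learned for
# its principal, pick the deal maximizing the Nash product
# (u_a - d_a) * (u_b - d_b) over a finite menu of candidate deals.

def nash_bargain(candidates, utility_a, utility_b, disagree_a=0.0, disagree_b=0.0):
    """Return the candidate maximizing the Nash product; candidates worse
    than the disagreement point for either party are excluded."""
    best, best_product = None, float("-inf")
    for c in candidates:
        gain_a = utility_a[c] - disagree_a
        gain_b = utility_b[c] - disagree_b
        if gain_a <= 0 or gain_b <= 0:
            continue  # one party would rather walk away than accept this deal
        product = gain_a * gain_b
        if product > best_product:
            best, best_product = c, product
    return best

# Hypothetical example: scheduling a meeting on behalf of two principals,
# with scores in [0, 1] learned from their past behaviour.
slots = ["mon_am", "mon_pm", "tue_am"]
alice_prefs = {"mon_am": 0.9, "mon_pm": 0.4, "tue_am": 0.6}
bob_prefs = {"mon_am": 0.2, "mon_pm": 0.8, "tue_am": 0.7}
print(nash_bargain(slots, alice_prefs, bob_prefs))  # -> tue_am
```

The coordination-bandwidth gains would come from much richer versions of this (high-dimensional deal spaces, honest reporting of reservation values, delegation at scale); the point here is only the shape of the pipeline: learn preferences, then bargain mechanically.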
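
The “trustworthy witness” idea also has a classical cryptographic analogue worth keeping in mind: a hash commitment lets a party commit to private information now and prove later that it hasn't changed, without revealing it in the meantime. A trustworthy AI witness could go further (answering queries about the secret in real time), but this sketch, which is illustrative rather than a description of any proposed system, shows the basic trust-without-disclosure shape.

```python
import hashlib
import secrets

def commit(message: bytes) -> tuple[bytes, bytes]:
    """Return (commitment, opening). Publish the commitment; keep the opening."""
    nonce = secrets.token_bytes(32)  # random blinding so the message can't be guessed
    return hashlib.sha256(nonce + message).digest(), nonce

def verify(commitment: bytes, message: bytes, nonce: bytes) -> bool:
    """Check that a revealed message matches the earlier commitment."""
    return hashlib.sha256(nonce + message).digest() == commitment

c, opening = commit(b"our confidential negotiating position")
# ... later: reveal, and let the counterparty check nothing was changed ...
assert verify(c, b"our confidential negotiating position", opening)
```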

Taking humans out of the loop slowly, not abruptly

Here’s a related perspective on this part of the picture, supporting the conclusion that AI applications for coordination/cooperation should be a priority.

Currently, large actors in the world are not super coherent. I think AI will lead to more coherent large actors, and there are multiple ways we could get there: a single controlling intelligence could be a superintelligent AI system (perhaps aligned with some human principal, or perhaps not), or we could get better at getting coherent action from systems with many people and no single controlling intelligence.

Paths which go through empowering many people to make them smarter and better at coordinating seem safer to me. (Perhaps longer-term they coordinate and choose to hand off to a single controlling intelligence, but in that case the transition into a basin of reflection has come before that hand-off rather than accidentally when the hand-off occurs, which limits the downside risk.)

I think this remains true at limited scales. Even if we can’t get a grand bargain involving 8 billion people, a solution coordinated across 100,000 people is better than a solution coordinated across 100 people.

Convergent morality as a solution to alignment?

If the straightforward case for convergent morality goes through, then we’d get good outcomes from putting the future in the hands of any agent or collective with a good-enough starting point (i.e. in the basin of good reflective governance). I think that the key thing you’d need to instill would be the skill of moral reflection, and then to embed this as a core value of the system. I don’t know how one would do this … or rather, I guess it might be relatively straightforward to train systems to do moral reflection in easy cases, but I don’t know how you could reach anything like confidence that it would keep working correctly as the system scaled to superintelligence. I do have the intuition that moral reflection is in some meaningful sense simpler than values.

You’d also need to provide reasonable starting points for the moral reflection. I think two plausible strategies are:

  1. Just provide pointers to things humans value (in some ways similar to what we do with kids); perhaps, when combined with skill at moral reflection, this can be used to infer what’s really valued (a toy version of such inference is sketched after this list)
    • Comes with some risk that we failed to provide pointers to some important components (although we can potentially throw a lot of writing + people talking about what they value at it)
  2. Provide pointers to the idea of “evolved social intelligence”
    • I don’t know how to do this, but it seems like it’s probably a pretty prominent phenomenon in the multiverse, so it should have a description of fairly low K-complexity (the notion is spelled out after this list)
    • Comes with some risk that evolved social aliens wouldn’t converge to the same morality as us
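
As a toy version of strategy 1, here is a sketch of inferring latent values from observed choices, assuming a Bradley–Terry model in which the probability of preferring A over B is a logistic function of the difference in underlying values. The items, data, and fitting loop are purely illustrative; real value learning would face all the usual problems of misspecification and unrepresentative data.

```python
import math

def fit_values(items, comparisons, lr=0.1, steps=2000):
    """Fit latent values v by gradient ascent on the Bradley-Terry
    log-likelihood. `comparisons` is a list of (winner, loser) pairs."""
    v = {i: 0.0 for i in items}
    for _ in range(steps):
        grad = {i: 0.0 for i in items}
        for winner, loser in comparisons:
            # P(winner preferred over loser) under the current values
            p = 1.0 / (1.0 + math.exp(v[loser] - v[winner]))
            grad[winner] += 1.0 - p
            grad[loser] -= 1.0 - p
        for i in items:
            v[i] += lr * grad[i]
    return v

# Hypothetical data: noisy stated preferences over three toy "goods".
data = [("honesty", "comfort"), ("honesty", "status"),
        ("comfort", "status"), ("honesty", "comfort")]
values = fit_values(["honesty", "comfort", "status"], data)
print(sorted(values, key=values.get, reverse=True))  # honesty ranked first
```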
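
For reference, the K-complexity appealed to in strategy 2 is Kolmogorov complexity: relative to a fixed universal machine $U$, the complexity of an object $x$ is the length of the shortest program that produces it, so "low K-complexity" just means "simple to specify":

$$K_U(x) = \min \{\, \lvert p \rvert : U(p) = x \,\}$$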

Providing backup saving throws

Rather than gamble directly on aligning with the good (which seems pretty shaky to me), can we first try to align things with humans, but have it arranged such that if that fails, we’re not necessarily doomed?

[Image: Ink Drawings]

I think in principle, sure. We can build the types of agents that would ultimately engage in ECL (evidential cooperation in large worlds) or acausal trade with the multiverse (if that turns out to work properly): agents whose goals, if left to themselves, would be a product of moral reflection from some reasonable starting point, but whom in practice we ask to be deeply corrigible and to take actions aligned with humans, such that their own personal sense of the good doesn’t end up mattering.

In practice there are a number of potential barriers to arranging this:

  • Not knowing how to do it at all
    • Right now this is where we are, although I think we could invest significantly in trying to work that out (and shouldn’t write it off before such investment)
  • Possibility that it makes aligning with humans harder
    • This is the issue that seems most concerning to me
    • The basic issue is that adding motivations could cut against or undermine existing motivations; e.g. perhaps moral motives call for violating corrigibility in some circumstances
      • (Although in some cases a lack of alignment with users may be desirable. We probably don’t want AI systems to help people commit murders. It’s possible that there are some versions of “aligning with the good” that we’d feel straightforwardly happy about adding to systems which are primarily aimed at being aligned with users)
  • Political costliness
    • Presumably there’s something like an extra safety tax to be paid to give many/all sufficiently powerful systems our attempt at a moral education
    • If the tax isn’t exorbitant, perhaps we could rally support for it via traditional AI risk narratives; “moral education for AI” is a legible story given pop sci-fi
  1. ^ i.e. that there's a non-tiny basin we can aim for and expect approximate moral convergence to something we'd be happy with.

  2. ^ i.e. that a key determinant of how good the future will be is whether we can establish good reflective processes.
