OscarD🔸

Yeah, I think I agree with all this. Since 'we' have the AI policy/strategy training data anyway, compiling it seems relatively low effort and high value; and if we could somehow get access to the private notes of a bunch of international negotiators, that also seems very valuable! Perhaps actually asking top forecasters to record their working and meetings for use as training data later would be valuable, and I assume many people already do this by default (tagging @NunoSempere). Although of course, having better forecasting AIs seems more dual-use than some of the other AI tools.

Yes, I suppose I am trying to divide tasks/projects into two buckets based on whether they require high context, value-alignment, strategic thinking, and EA-ness. And I think my claim was/is that UI design is comparatively easy to outsource to someone without much of the relevant context and values, and that the comparative advantage of higher-context people is therefore to do things that are harder to outsource to lower-context people. But I know ~nothing about UI design; maybe being higher context is actually super useful there.

Nice post! I agree moral errors aren't only a worry for moral realists. But they do seem especially concerning for realists, as the moral truth may be very hard to discover, even for superintelligences. For antirealists, the first 100 years of a long reflection may get you most of the way towards wherever your views will converge after a billion years of reflecting on your values. But the first 100 years of a long reflection are less guaranteed to get you close to the realist moral truth. So a 100-year reflection might be, say, 90% likely to avoid massive moral errors for antirealists, but only 40% likely to do so for realists.

--

Often when there are long lists like this, I find it useful for my conceptual understanding to try to create some structure to fit each item into; here is my attempt.

A moral error is making a moral decision that is quite suboptimal. This can happen if:

  • The agent has correct moral views, but makes an error of judgement/rationality/empirics/decision theory and so chooses badly by their own lights.
  • The agent is adequately rational, but has incorrect views about ethics, namely the mapping from {possible universe trajectories} to {impartial value}. This could take the form of (written more compactly after the list):
    • A mistake in picking out who the moral patients are, {universe trajectory} --> {moral patients}. (animals, digital beings)
    • A mistake in assigning lifetime wellbeing scores to each moral patient, {moral patients} --> {list of lifetime wellbeings}. (theories of wellbeing, happiness vs suffering)
    • A mistake in aggregating correct wellbeing scores over the correct list of moral patients into the overall impartial value of the universe {list of lifetime wellbeings + possibly other relevant facts} --> {impartial value}. (population ethics, diversity, interestingness)
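
Written more compactly (the notation V, τ, P, w, A below is mine, not the post's), the claim is that the overall value mapping factors as a composition, and a mistake in any factor is a distinct kind of moral error:

$$V(\tau) \;=\; A\big(\,w(\,P(\tau)\,)\,\big)$$

where τ is a possible universe trajectory, P picks out the moral patients, w assigns each patient a lifetime wellbeing, and A aggregates the resulting list (plus possibly other relevant facts) into impartial value.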

--

Some minor points:

  • I think the fact that people wouldn't take bets involving near-certain death and a 1-in-a-billion chance of a long amazing life is more evidence that people are risk averse than that lifetime wellbeing is bounded above.
  • As currently written, choosing Variety over Homogeneity would only be a small moral error, not a massive one, as epsilon is small.

Great set of posts (including the 'how far' and 'how sudden' related ones). I only skimmed the parts I had read drafts of, but still have a few comments, mostly minor:

1. Accelerating progress framing

We define “accelerating AI progress” as “each increment of capability advancement (e.g. GPT-3 → GPT-4) happens more quickly than the last”.

I am a bit skeptical of this definition, both because it is underspecified and because I'm not sure it is pointing at the most important thing.

  • Underspecified: how many GPT jumps need to be in the 'each quicker than the last' regime? This seems more than just a semantic quibble, as clearly the one-time speedup leads to at least one GPT jump being faster, and the theoretical limits lead to this eventually stopping, but I'm not sure where within this you want to call it 'accelerating'.
  • Framing: Basically, we are trying to condense a whole graph into a few key numbers, so this might be quite loss-y, and we need to focus on the variables that are most strategically important, which I think are (see the toy sketch after this list):
    • Timeline: date that transition period starts
    • Suddenness: time in transition period
    • Plateau height: in effective compute, defining the plateau as when rate of progress drops back below 2025 levels.
    • Plateau date: how long it takes to get there.
    • I'm not sure there is an important further question of whether the line is curving up or down between the transition period and the plateau (or, more precisely, when it switches from curving up, as in the transition period, to curving down, as in the plateau). I suppose 'accelerating' could include plateauing quite quickly, and 'decelerating' could include still going very fast and reaching a very high plateau quickly, which to most people wouldn't intuitively feel like deceleration.
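
To make the 'condensing a whole graph into a few numbers' point concrete, here is a toy sketch; the curve shape, dates, and rates are entirely my own made-up illustrations, not anything from the post:

```python
# Toy sketch: summarising an assumed rate-of-progress curve (OOMs of effective
# compute per year) by the four numbers above. All dates and rates are made up
# purely for illustration.
import numpy as np

T_TRANSITION = 2027.0   # assumed date the transition period starts
T_PLATEAU = 2035.0      # assumed date the rate drops back below 2025 levels
BASELINE_RATE = 0.5     # assumed 2025 rate of progress (OOMs/year)

def rate_of_progress(t):
    """Piecewise-constant rate: baseline, then a faster transition, then a slower plateau."""
    if t < T_TRANSITION:
        return BASELINE_RATE
    if t < T_PLATEAU:
        return 5.0          # assumed faster rate during the transition period
    return 0.3              # assumed post-plateau rate, below the 2025 baseline

years = np.arange(2025.0, 2045.0, 0.1)
cumulative_ooms = np.cumsum([rate_of_progress(t) for t in years]) * 0.1

timeline = T_TRANSITION                                  # date the transition period starts
suddenness = T_PLATEAU - T_TRANSITION                    # time spent in the transition period
plateau_height = cumulative_ooms[years >= T_PLATEAU][0]  # effective compute (OOMs) at the plateau
plateau_date = T_PLATEAU                                 # how long it takes to get there
print(timeline, suddenness, round(plateau_height, 1), plateau_date)
```

Note that whether the curve bends up or down on the way from the transition to the plateau doesn't show up in these four numbers at all, which is why I'm not sure it is an important further question.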

2. Max rate of change

Theoretical limits for the speed of progress are 100X as fast as recent progress.

It would be good to flag in the main text that the justification for this is in Appendix 2 (initially I thought it was a bare assertion). Also, it is interesting that in @kokotajlod's scenario the 'wildly superintelligent' AI maxes out at a 1-million-fold AI R&D speedup; I commented to them on a draft that this seemed implausibly high to me. I have no particular take on whether 100x is too low or too high as the theoretical max, but it would be interesting to work out why there is this Forethought vs AI Futures difference.

3. Error in GPT OOMs calculations

  • Algorithmic improvements compound multiplicatively rather than additively, so I think the formula in column G should be 3^years rather than 3*years?
    • This also clears up the current mismatch between columns G and H. Most straightforward would be for column H to be log10(G), same as column F. But since log(a^b) = b*log(a), once you make the correction to column G you get column H = log(3^years) = years * log(3) = 0.48*years, which is close to the 0.4*years you currently have; I assume there was just a rounding error somewhere. (Quick numerical check after this list.)
  • This won't end up changing the main results though.
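
A quick numerical check of the column G/H point (assuming, per the spreadsheet, ~3x algorithmic efficiency gains per year; the choice of 4 years is just for illustration):

```python
import math

years = 4                               # illustrative number of years
column_g = 3 ** years                   # multiplicative compounding: 3^years, not 3*years
column_h = math.log10(column_g)         # OOMs, i.e. log10 of column G (as with column F)
print(column_h, years * math.log10(3))  # both print ~1.91, i.e. ~0.477 OOMs/year rather than 0.4
```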

4. Physical limits

Regarding the effective physical limits of each feedback loop, it is perhaps worth noting that your estimate for the chip production feedback loop is very well grounded and high-confidence, as we know more or less exactly the energy output of the Sun, whereas the other two are super speculative. Which is fine; they are just quite different types of estimates, so we should remember to put far less weight on them.

5. End of the transition period

  • Currently, this is set at when AIs are almost as productive (9/10) as humans, but it would make more sense to me to end it when AIs are markedly superior to humans, e.g. 10x.
    • Maybe I am misunderstanding elasticities though, I only have a lay non-economist's grasp of them.
  • Overall it might be more intuitive to define the transition period in terms of how useful one additional human researcher vs one additional AI researcher is, running from the human being 10x better to the AI being 10x better (toy sketch after this list).
    • Defining what 'one AI researcher' is could be tricky, maybe we could use the pace of human thought in tokens per second as a way to standardise.
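
A toy sketch of this alternative definition; the growth curve, the 2025 starting ratio, and the 10x-every-two-years growth rate are all made up purely for illustration:

```python
# Toy sketch: the transition period runs from when one marginal AI researcher is
# 10x *less* useful than one marginal human researcher to when it is 10x *more*
# useful, with 'one AI researcher' standardised to human thinking speed in
# tokens per second.
import numpy as np

years = np.arange(2025.0, 2045.0, 0.25)
# log10 of (marginal AI-researcher productivity / marginal human-researcher productivity),
# assumed to start at 0.01 in 2025 and grow 10x every two years.
log10_ratio = (years - 2025.0) / 2.0 - 2.0

transition_start = years[log10_ratio >= -1][0]  # AI researcher 10x worse than a human
transition_end = years[log10_ratio >= 1][0]     # AI researcher 10x better than a human
print(transition_start, transition_end)         # 2027.0 and 2031.0 with these made-up numbers
```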

(Finally, Fn2 is missing a link.)

Thanks for this, I hadn't thought much about the topic and agree it seems more neglected than it should be. But I am probably overall less bullish than you (as operationalised by e.g. how many people in the existential risk field should be making this a significant focus: I am perhaps closer to 5% than your 30% at present).

I liked your flowchart on 'Inputs in the AI application pipeline,' so using that framing:

  • Learning algorithms: I agree this is not very tractable for us[1] to work on.
  • Training data: This seems like a key thing for us to contribute, particularly at the post-training stage. By supposition, a large fraction of the most relevant work on AGI alignment, control, governance, and strategy has been done by 'us'. I could well imagine that it would be very useful to get project notes, meetings, early drafts etc as well as the final report to train a specialised AI system to become an automated alignment/governance etc researcher.
    • But my guess is that just compiling this training data doesn't take that much time. All it takes is, when the time comes, convincing a lot of the relevant people and orgs to share old Google Docs of notes/drafts/plans etc. paired with the final product.
      • There will be a lot of infosec considerations here, so maybe each org will end up training their own AI based on their own internal data. I imagine this is what will happen for a lot of for-profit companies.
    • Making sure we don't delete old draft reports and meeting notes and the like seems good here, but given that storing Google Docs is so cheap and culling files is time-expensive, I think by default almost everyone just keeps most of their (at least textual) digital corpus anyway. Maybe there is some small intervention to make this work better though?
  • Compute: It certainly seems great for more compute to be spent on automated safety work versus automated capabilities work. But this is mainly a matter of how much money each party has to pay for compute. So lobbying for governments to spend lots on safety compute, or regulation to get companies to spend more on safety compute, seems good, but I think this is a bit separate from/upstream of what you have in mind; it is more just 'get key people to care more about safety'.
  • Post-training enhancements: We will be very useful for providing RLHF to tell a budding automated AI safety researcher how good each of its outputs is. Research taste is key here. This feels somewhat continuous with just 'managing a fleet of AI research assistants'.
  • UI and complementary technologies: I don't think we have a comparative advantage here, and can just outsource this to human or AI contractors to build nice apps for us, or use generic apps on the market and just feed in our custom training data.

In terms of which applications to focus on, my guess is epistemic tools and coordination-enabling tools will mostly be built by default (though of course as you note additional effort can still speed them up some). E.g. politicians and business leaders and academics would all presumably love to have better predictions for which policies will be popular, what facts are true, which papers will replicate etc. And negotiation tools might be quite valuable for e.g. negotiating corporate mergers and deals.

So my take is that probably a majority of the game here is in 'automated AI safety/governance/strategy' because there will be less corporate incentive here, and it is also our comparative advantage to work on.

Overall, I agree differential AI tool development could be very important, but think the focus is mainly on providing high-quality training data and RLHF for automated AI safety research, which is somewhat narrower than what you describe.

I'm not sure how much we actually disagree though, would be interested in your thoughts!

  1. ^

    Throughout, I use 'us' to refer broadly to EA/longtermist/existential security type folks.

So if we take as given that I am at 53% and Alice is at 45%, that gives me some reason to do longtermist outreach, and gives Alice some reason to try to stop me, perhaps by making moral trades with me that get us more of what we both value. In this case, cluelessness doesn't bite, as Alice and I are still taking action towards our longtermist ends.

However, I think what you are claiming, or at least the version of your position that makes most sense to me, is that both Alice and I would be making a failure of reasoning if we assign these specific credences, and that we should both be 'suspending judgement'. And if I grant that, then yes, it seems cluelessness bites, as neither Alice nor I know at all what to do now.

So it seems to come down to whether we should be precise Bayesians.

Re judgement calls, yes, I think that makes sense, though I'm not sure it is such a useful category. I would think there is just a spectrum of arguments/pieces of evidence from 'very well empirically grounded and justified' through 'we have some moderate reason to think so' to 'we have roughly no idea', and what we are labelling judgement calls sits towards the far right of this spectrum. But surely there isn't a clear cut-off point.

29% disagree

I think there are a lot of thorny definitional issues here that mean this set of issues doesn't boil down that nicely to a 1D spectrum. But overall, extinction prevention will likely have a far broader coalition supporting it, while making the future large and amazing is far less popular, since most people aren't very ambitious about spreading flourishing through the universe (I tentatively am).

I also think average utilitarianism doesn't seem very plausible. I was just using it as an example of a non-linear theory (though as Will notes if any individual is linear in resources so is the world as a whole, just with a smaller derivative).

Interesting, is this the sort of thing you have in mind? It at least seems similar to me, and I remember thinking that post got at something important.

A bull case for convergence:

  • Factory farming, and to a lesser extent global poverty, persist because there are some costs to ending them, and the rich aren't altruistic enough (or the altruists aren't rich enough) to end them. Importantly, it will not just be that factory farming itself ends, but due to cognitive dissonance, people's moral views towards nonhumans will likely change a lot too once ~no-one is eating animals. So there will predictably be convergence on viewing c2025 treatment of animals as terrible.
  • There is an ongoing homogenization of global culture which will probably continue. As the educational and cultural inputs to people converge, it seems likely their beliefs (including moral beliefs) will also converge at least somewhat.
  • Some fraction of current disagreements about economic/political/moral questions are caused just by people not being sufficiently informed/rational. So those disagreements would go away when we have ~ideal post-human reasoners.
    • A more ambitious version of the above is that perhaps post-humans will take epistemic humility very seriously, and they will know that all their peers are also very rational, so they will treat their own moral intuitions as little evidence of what the true/best/idealised-upon-reflection moral beliefs are. Then, everyone just defers very heavily to the annual survey of all of (post)humanity's views on e.g. population axiology rather than backing their own intuition.
      • (Arguably this doesn't count as convergence if people's intuitions still differ, but I think if people's all-things-considered beliefs, and therefore their actions, converge that is enough.)

But I agree we shouldn't bank on convergence!
