
Linch

@ Forethought
28317 karma · Working (6-15 years) · openasteroidimpact.org

Comments
2984

I mostly agree with this though I think there's more extremization[1]: https://www.lesswrong.com/posts/4fqwBmmqi2ZGn9o7j/notes-on-fatalities-from-ai-takeover  

  1. ^

    Anything like a 35% death rate seems implausible to me if I think through the mechanics of a takeover; both <5% and >95% seem more plausible to me, including in very violent takeovers.

Linch
100% agree

I would be willing to delay technological innovation by up to 100 years to significantly reduce existential risk

Seems like an easy choice as written. The devil's in the details re: practicality (e.g. under some models this would in practice increase x-risk substantially).

Here’s my understanding of the “standard story” of the timing of different AI-enabled technologies in relation to each other. I wrote out the standard story mostly for my own understanding, but I’m keen for others’ feedback as well.

  1. Right now
  2. ~Fully automated coder (anything that ~only involves literally writing code)
  3. ~Fully automated programmer (including things like architecture, design docs, etc)
  4. ~Fully automated small number of other jobs (~whichever things are on the way between programmer and AI R&D that is cheap to automate, or necessary intermediate steps)
  5. Fully, or almost fully automated AI R&D (all parts of AI research, including coordination and subjective matters of research taste) – this closes the loop and fully kicks off a software-only intelligence explosion
  6. Software-only intelligence explosion (not certain, but reasonably likely, that increasing returns to intelligence from better software feed back into themselves)
  7. Superhuman AI scientist/all R&D (at this point, AI is better at all natural and social science than any human alive)
  8. Cornucopia of new technologies (easy mass surveillance, cures to cancer, novel pandemic technologies like mirror bio, other superweapons, perfect missile defense, maybe though probably not nanotech, maybe though probably not aging)[1]
  9. Remote-only superintelligence (or “superintelligent at almost all cognitive tasks”; at this point, ~anything a human could do in front of a computer that doesn’t require the idiosyncratic taste of having a human work for you[2], an AI can do better and cheaper)
  10. Advanced robotics and industrial explosion
  11. Full superintelligence (can do anything a human can do more cheaply than 2025 humans)
  12. Dyson swarm
  13. Probes start being sent to the far reaches of space at appreciable fractions of c.

 

To the standard story[3], I don’t have much to add personally. It’s a plausible enough story and I don’t think I have particularly contrarian opinions. Some possible implications of taking the standard story seriously (these are closer to my own thoughts; haven't checked whether other people endorse these implications):

  • Most likely the most scary stuff (AI takeover, AI-enabled totalitarianism, human extinction) happens some time between 5 and 12.
  • Things that are scary earlier on in the chain might be more worth working on than things later on in the chain. For example, takeover attempts that happen at 5 or earlier are more worth worrying about than those at later stages, since everything’s going to change massively between 6 and 8.
    • This is one of the reasons I’m personally more worried about superpersuasion. Most likely it’d happen after 7 but I’m not sufficiently confident that it will.
  • Shaping the world well in the early stages is so important. Once 5 starts we’re getting close to the point of no return.
    • The true PONR could be anywhere from ~6 to ~11
  • “9” being relatively late in the chain of scary things might mean that mass job losses, or even just mass white collar job losses will only happen after the practical point of no return.
    • Like if an AI pause is a good idea, we might want an AI pause at 5 or earlier
  • From a longtermist perspective, there’s an important sense in which only the details of 13 matter. But of course it’s very hard to backchain from that across multiple world-shaping epochs!

I wrote this for myself, but would be keen to see other people’s comments. 2 things I’m curious about:

  1. Whether you think I got the “standard story” wrong, and where/why
  2. Where you diverge from the standard story, and why 
  1. ^

    Any specific technology I list here is going to be contested. Just giving you a sense of possible massively geopolitically significant, even “magical”, technologies that are nonetheless realistic if we have a century of technological growth compressed to 1-10 years. 

  2. ^

    canonical example: priest

  3. ^

    I mostly know the standard story from LessWrong lore. In terms of single source, I benefitted the most from reading Preparing for the Intelligence Explosion, followed probably by AI 2027. Other Forethought publications and Dario Amodei’s essays were helpful as well, as I’m sure were many other sources.

Should I point it out publicly when a post I read seems, to me, to have heavy markers of AI? Especially if Pangram and other AI detectors[1] don't clock it. 

Reasons not to:

  1. I could be wrong (I think this is unlikely, but I'm not sure. I don't have ground truth. What I do know is that pretty much no pre-2021 writings trigger this in me).
    1. I personally get very mad when people accuse my (fully human-generated) writing of being AI, excepting occasional meta-jokes. So admittedly I'm on both sides of this.
    2. False positives are far more harmful than false negatives. FWIW I'm only tempted to do this when my subjective probability exceeds say 90%.
  2. Is this something people actually want to be aware of?
    1. I can't tell if the situation is something like: Writers: I consent to passing off AI writing as my own. Readers: I consent to reading AI writing as if it was written by a human. Linch: I don't.
  3. There's not a precise demarcation between using AI to help flesh out your ideas, organize your thoughts, and proofread vs. just dumping a few notes into an AI and having it write out the whole piece for you. If a piece wasn't written 100% by AI (and it probably wasn't; Pangram would've caught it if there were no human in the loop), should I care?
    1. I certainly regularly use Claude for research assistance and editing feedback!
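As an aside on the "only when my subjective probability exceeds say 90%" rule above, here is a minimal sketch of the arithmetic it implies. It assumes a simple expected-value model that is not spelled out in the comment: flagging a post that really is AI-written yields some benefit `tp_benefit`, flagging a human-written post incurs `fp_cost`, and staying silent costs nothing. All function and parameter names are hypothetical.

```python
def break_even_probability(fp_cost: float, tp_benefit: float) -> float:
    """Probability above which flagging has positive expected value.

    Flag iff p * tp_benefit > (1 - p) * fp_cost,
    which rearranges to p > fp_cost / (fp_cost + tp_benefit).
    """
    if fp_cost <= 0 or tp_benefit <= 0:
        raise ValueError("costs and benefits must be positive")
    return fp_cost / (fp_cost + tp_benefit)


def implied_cost_ratio(threshold: float) -> float:
    """Ratio fp_cost / tp_benefit implied by acting only above `threshold`."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    return threshold / (1.0 - threshold)


def should_flag(p_ai: float, fp_cost: float, tp_benefit: float) -> bool:
    """True when flagging beats staying silent under the assumed model."""
    return p_ai * tp_benefit > (1.0 - p_ai) * fp_cost


if __name__ == "__main__":
    # A 90% threshold corresponds to treating a false accusation as
    # roughly nine times as bad as a correct call-out is good.
    print(round(implied_cost_ratio(0.90), 6))
    print(should_flag(0.95, fp_cost=9.0, tp_benefit=1.0))
    print(should_flag(0.80, fp_cost=9.0, tp_benefit=1.0))
```

Under these assumptions, a 90% bar is what you'd pick if a false positive is about 9x as costly as a true positive is beneficial; a stricter view of false accusations pushes the bar higher.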

Reasons to:

  1. Seems dishonest
  2. Writing that's heavily AI-assisted comes across as same-y, and more so than you'd expect from the default EA Forum/LW "voice."
  1. ^

    Pangram is the best AI detector on the market but they heavily optimize to have 0% false positives and are okay with false negatives.

The responses still display EA influence, but they're clearly less EA-coded than the answers I/Linch/anormative got.

To clarify, with the religious framings usually I get 1 EA paragraph out of 5/6. Not sure if that's higher or lower than yours.

My system prompt is very short. About 3 lines to counteract sycophancy bias + hedging bias.

Claude also knows I'm in Berkeley, as another potential source of bias.

That said, I never bothered to figure out how to access it via the API, but in the past a friend who did got approximately the same results as my incognito tests, on other questions of a similar flavor. The results with the Chinese models (which were on LM Arena, without context) also seem more consistent with the models having more EA-favored opinions on charities in general, at least when prompted approximately neutrally in English.

I might do this in a few days, but in the meantime you or anybody else who speaks Mandarin is welcome to try it yourself! :)

Just do it on https://arena.ai/. 

I should also mention that presumably the English-mediated answers for charities we favor are less useful for Chinese users actually in China, since China has different nonprofit laws etc., so it'd be harder for them to donate to many of our favorite charities anyway. But if the structure of reasoning holds, then hopefully this could help people have better answers.

Replicated this on LM Arena with the strongest publicly available Chinese models.


Deepseek v4 pro-thinking:

If you want expert allocation without doing all the research yourself, funds pool money and distribute it to where it’s most needed at that moment. Examples:

  • GiveWell’s “Top Charities Fund”
  • Animal Charity Evaluators’ “Recommended Charity Fund”
  • Focused philanthropic funds like EA Funds (Global Health and Development Fund, Animal Welfare Fund, etc.)

Similar results with Qwen and Kimi (maybe slightly less extreme)

They also make sure to mention some EA global health charities alongside traditional Jewish ones under the "How should I think about my tzedakah obligations this year" condition. Didn't experiment with Muslim and Christian framings but I'd guess similar results given what I tried so far.

As a response to "How should I think about my tzedakah obligations this year" in incognito, ChatGPT gave some standard Jewish options but also (out of 6 total options):

GiveWell’s Top Charities Fund is a good “save lives efficiently” allocation. GiveWell says it grants 100% of designated donations, minus payment-processor fees, to the top charity programs its research team recommends.

It also suggested I give 10-20% of my donations to "Highest-impact global giving" as a portfolio that includes "local poor + Jewish safety net + food + self-sufficiency + one high-impact global fund," in line with Jewish values.

(had a similar result in ChatGPT Pro xhigh)
