Most of my stuff (even the stuff of interest to EAs) can be found on LessWrong: https://www.lesswrong.com/users/daniel-kokotajlo
First of all, you are moving the goalposts if you make this about "confident belief in total doom by default" instead of the original "if you really don't think unchecked AI will kill everyone." You need to defend the position that the probability of existential catastrophe conditional on misaligned AI is <50%.
Secondly, "AI motives will generalize extremely poorly from the training distribution" is a confused and misleading way of putting it. The problem is that they'll generalize in a way other than the way we hoped they would.
Third, to answer your questions:
1. The difference in power will be great & growing rapidly, compared to historical cases. I support implementing things like model amnesty, but I don't expect them to work, and anyhow we are not anywhere close to having such things implemented.
2. It'll be AI vs. AI with humanity on the sidelines, yes. Humans will be killed off, enslaved, or otherwise misused as pawns. It'll be like colonialism all over again but on steroids. Unless takeoff is fast enough that there is only one AI faction. Doesn't really matter, either way humans are screwed.
3. Powerless humans survive because of a combination of (a) many powerful humans actually caring about their wellbeing and empowerment, and (b) those powerful humans who don't care having incentives such that it wouldn't be worth it to try to kill the powerless humans and take their stuff. E.g. if Putin started killing homeless people in Moscow and pawning their possessions, he'd lose way more in expectation than he'd gain. Neither (a) nor (b) will save us in the AI case (at least, keeping acausal trade and the like out of the picture). First, until we make significant technical progress on alignment there won't be any powerful aligned AGIs to balance against the unaligned ones. Second, whatever norms and society a bunch of competing unaligned AGIs set up between themselves, it is unlikely to give humans anything close to equal treatment, and what consideration it gives to humans will erode rapidly as the power differential grows.
Thanks!
I think this is evidence for a groupthink phenomenon amongst superforecasters. Interestingly, my other experiences talking with superforecasters have also made me update in this direction: they seemed much more groupthinky than I expected, as if they were deferring to each other a lot. Which, come to think of it, makes perfect sense -- I imagine if I were participating in forecasting tournaments, I'd gradually learn to reflexively defer to superforecasters too, since they genuinely would be performing well.
Ironically, one of the two predictions you quote as examples of bad predictions is in fact an example of a good prediction: "The most realistic estimate for a seed AI transcendence is 2020."
Currently it seems that AGI/superintelligence/singularity/etc. will happen sometime in the 2020s. Yudkowsky's median estimate in 1999 was apparently 2020, so he probably had something like 30% of his probability mass in the 2020s, and maybe 15% of it in the 2025-2030 period when IMO it's most likely to happen.
Now let's compare to what other people would have been saying at the time. They would almost all have been saying 0%, and then maybe the smarter and more rational ones would have been saying things like 1%, for the 2025-2030 period.
To put it in nonquantitative terms, almost everyone else in 1999 would have been saying "AGI? Singularity? That's not a thing, don't be ridiculous." The smarter and more rational ones would have been saying "OK it might happen eventually but it's nowhere in sight, it's silly to start thinking about it now." Yudkowsky said "It's about 21 years away, give or take; we should start thinking about it now." Now with the benefit of 24 years of hindsight, Yudkowsky was a lot closer to the truth than all those other people.
Also, you didn't reply to my claim. Who else has been talking about AGI etc. for 20+ years and has a similarly good track record? Which of them managed to only make correct predictions when they were teenagers? Certainly not Kurzweil.
The XPT forecast about compute in 2030 still boggles my mind. I'm genuinely confused what happened there. Is anybody reading this familiar with the answer?
Fair, but still: In 2019 Microsoft invested a billion dollars in OpenAI, roughly half of which was compute: Microsoft invests billions more dollars in OpenAI, extends partnership | TechCrunch
And then GPT-3 happened, and was widely regarded to be a huge success and proof that scaling is a good idea etc.
So the amount of compute-spending that the most aggressive forecasters think could be spent on a single training run in 2032... is about 25% as much compute-spending as Microsoft gave OpenAI starting in 2019, before GPT-3 and before the scaling hypothesis. The most aggressive forecasters.
Also, if you search LW and Astral Codex Ten for comments I've made, you might find some useful ones.
No, alas. However I do have this short summary doc I wrote back in 2021: The Master Argument for <10-year Timelines - Google Docs
And this sequence of posts making narrower points: AI Timelines - LessWrong
The XPT forecasters are so in the dark about compute spending that I just pretend they gave more reasonable numbers. I'm honestly baffled how they could be so bad. The most aggressive of them thinks that in 2025 the most expensive training run will be $70M, and that it'll take 6+ years to double thereafter, so that in 2032 we'll have reached $140M training run spending... do these people have any idea how much GPT-4 cost in 2022?!?!? Did they not hear about the investments Microsoft has been making in OpenAI? And remember that's what the most aggressive among them thought! The conservatives seem to be living in an alternate reality where GPT-3 proved that scaling doesn't work and an AI winter set in in 2020.
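To make the arithmetic concrete, here is a rough sketch of the comparison. It assumes a doubling time of exactly 7 years (the forecast only says "6+ years") and treats half of Microsoft's 2019 billion-dollar investment as compute spending (the "roughly half" figure is approximate); the function and variable names are made up for illustration.

```python
def projected_spend(year, base_year=2025, base_spend=70e6, doubling_years=7.0):
    """Largest training-run spend (USD) implied by the most aggressive XPT forecast.

    Assumption: exponential growth with a fixed doubling time of 7 years,
    starting from $70M in 2025.
    """
    return base_spend * 2 ** ((year - base_year) / doubling_years)


# Forecasted largest training run in 2032.
spend_2032 = projected_spend(2032)

# Assumption: roughly half of Microsoft's $1B 2019 investment was compute.
microsoft_2019_compute = 0.5 * 1e9

ratio = spend_2032 / microsoft_2019_compute

print(f"Forecast 2032 training run: ${spend_2032 / 1e6:.0f}M")
print(f"Share of Microsoft's 2019 compute spend: {ratio:.0%}")
```

Under these assumptions the 2032 figure comes out to $140M, which is about 28% of the roughly $500M in compute from the 2019 deal. A shorter doubling time within the "6+ years" range would raise that only slightly.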
Those words were not yours, but you did say you agreed it was the main crux, and in context it seemed like you were agreeing that it was a crux for you too. I see now on reread that I misread you and you were instead saying it was a secondary crux. Here, let's cut through the semantics and get quantitative:
What is your credence in doom conditional on AIs not caring for humans?
If it's >50%, then I'm mildly surprised that you think the risk of accidentally creating a permanent pause is worse than the risks from not-pausing. I guess you did say that you think AIs will probably just be ethical if we train them hard enough to be... What is your response to the standard arguments that 'just train them hard to be ethical' won't work? E.g. Ajeya Cotra's writings on the training game.
Re: "I don't see how the first part of that leads to the second part" Come on, of course you do, you just don't see it NECESSARILY leading to the second part. On that I agree. Few things are certain in this world. What is your credence in doom conditional on AIs not caring for humans & there being multiple competing AIs?
IMO the "Competing factions of superintelligent AIs, none of whom care about humans, may soon arise, but even if so, humans will be fine anyway somehow" hypothesis is pretty silly and the burden of proof is on you to defend it. I could cite formal models as well as historical precedents to undermine the hypothesis, but I'm pretty sure you know about them already.
Why what? I answered your original question:
with:
My guess is that you disagree with the "whatever norms and society a bunch of competing unaligned AGIs set up between themselves, it is unlikely to give humans anything close to equal treatment..." bit.
Why? It seems pretty obvious to me; I feel like your skepticism is an isolated demand for rigor.
But I'll go ahead and say more anyway:
Giving humans equal treatment would be worse (for the AIs, which by hypothesis don't care about humans at all) than other salient available options to them, such as having the humans be second-class in various ways or complete pawns/tools/slaves. Eventually, when the economy is entirely robotic, keeping humans alive at all would be an unnecessary expense.
Historically, if you look at relations between humans and animals, or between colonial powers and native powers, this is the norm. Cases in which the powerless survive and thrive despite none of the powerful caring about them are the exception, and happen for reasons that probably won't apply in the case of AI. E.g. Putin killing homeless people would be bad for his army's morale, and that would far outweigh the benefits he'd get from it. (Arguably this is a case of some powerful people in Russia caring about the homeless, so maybe it's not even an exception after all.)
Can you say more about what model you have in mind? Do you have a model? Or a scenario: can you spin a plausible story in which all the ASIs don't care at all about humans but humans are still fine?
Wanna meet up sometime to talk this over in person? I'll be in Berkeley this weekend and next week!