All of kokotajlod's Comments + Replies

I agree that as time goes on states will take an increasing and eventually dominant role in AI stuff.

My position is that timelines are short enough, and takeoff is fast enough, that e.g. decisions and character traits of the CEO of an AI lab will explain more of the variance in outcomes than decisions and character traits of the US President.

4
Linch
2mo
Makes sense! I agree that fast takeoff + short timelines makes my position outlined above much weaker.  I want to flag that if an AI lab and the US gov't are equally responsible for something, then the comparison will still favor the AI lab CEO, as lab CEOs have much greater control of their company than the president has over the USG. 
  1. My understanding is that relatively few EAs are actual hardcore classic hedonist utilitarians. I think this is ~sufficient to explain why more haven't become accelerationists.
  2. Have you cornered a classic hedonist utilitarian EA and asked them? Have you cornered three? What did they say?
2
JWS
2mo
Don't know why this is being disagree-voted. I think point 1 is basically correct: it doesn't take diverging far from being a "hardcore classic hedonist utilitarian" to not support the case Matthew makes in the OP.

Thanks for discussing with me!

(I forgot to mention an important part of my argument, oops -- you wouldn't have said "at least 100 years off"; you would have said "at least 5000 years off." Because you are anchoring to recent-past rates of progress rather than looking at how rates of progress increase over time and extrapolating. (This is just an analogy / data point, not the key part of my argument, but look at GWP growth rates as a proxy for tech progress rates: According to this, GWP doubling time was something like 600 years back then, whereas it's more l... (read more)

2
titotal
5mo
That argument does make more sense, although it still doesn't apply to me, as I would never confidently state a 5000-year forecast due to the inherent uncertainty of long-term predictions. (My estimates for nanotech are also highly uncertain, for the record.)

No worries, I enjoyed the debate!

I agree with the claims "this problem is extremely fucking hard" and "humans aren't cracking this anytime soon" and I suspect Yudkowsky does too these days.

I disagree that nanotech has to predate taking over the world; that wasn't an assumption I was making or a conclusion I was arguing for at any rate. I agree it is less likely that ASIs will make nanotech before takeover than that they will make nanotech while still on earth.

I like your suggestion to model a more earthly scenario but I lack the energy and interest to do so right now.

My closing statement ... (read more)

2
titotal
5mo
Sorry, to be clear, I wasn't actually making a prediction as to whether nanotech predates AI takeover. My point is that these discussions are in the context of the question "can nanotech be used to defeat humanity". If AI can only invent nanotech after defeating humanity, that's interesting but has no bearing on the question. I also lack the energy or interest to do the modelling, so we'll have to leave it there.

My closing rebuttal: I have never stated that I am certain that nanotech is impossible. I have only stated that it could be impossible, impractical, or disappointing, and that the timelines for development are large, and would remain so even with the advent of AGI.

If I had stated in 1600 that flying machines, moving pictures, thinking machines, etc. were at least 100 years off, I would have been entirely correct and accurate. And for every great technological change that turned out to be true and transformative, there are a hundred great ideas that turned out to be prohibitively expensive, or impractical, or just plain not workable. And as for the ones that did work out, and did transform the world: it almost always took a long time to build them, once we had the ability to. And even then they started out shitty as hell, and took a long, long time to become as flawless as they are today.

I'm not saying new tech can't change the world, I'm just saying it can't do it instantly.

Cool. Seems you and I are mostly agreed on terminology then.

Yeah we definitely disagree about that crux. You'll see. Happy to talk about it more sometime if you like.

Re: galaxy vs. earth: The difference is one of degree, not kind. In both cases we have a finite amount of resources and a finite amount of time with which to do experiments. The proper way to handle this, I think, is to smear out our uncertainty over many orders of magnitude. E.g. the first OOM gets 5% of our probability mass, the second OOM gets 5% of the remaining probability mass, and so fo... (read more)
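For concreteness, here is a minimal sketch (an editorial illustration, not part of the original comment) of the smearing rule described above: each successive order of magnitude (OOM) of required inputs gets 5% of whatever probability mass remains, and the ten-OOM cutoff is arbitrary.

```python
# Illustrative sketch of spreading uncertainty over orders of magnitude (OOMs):
# each successive OOM of required inputs gets 5% of the remaining probability mass.
p_per_oom = 0.05
remaining = 1.0
for k in range(1, 11):
    mass = remaining * p_per_oom      # mass assigned to the k-th OOM
    remaining -= mass                 # mass left over for later OOMs
    print(f"OOM {k}: {mass:.3f} (cumulative {1 - remaining:.3f})")
```

Under this rule roughly 40% of the probability mass falls within the first ten OOMs, with a long tail beyond.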

2
titotal
5mo
It really seems to me like the galaxy thing is just going to mislead, rather than elucidate. I can make my judgements about a system where one planet is converted into computronium, one planet contains a store of every available element, one planet is tiled completely with experimental labs doing automated experiments, etc. But the results of that hypothetical won't scale down to what we actually care about. For example, it wouldn't account for the infrastructure that needs to be built to assemble any of those components in bulk.

If someone wants to try their hand at modelling a more earthly scenario, I'd be happy to offer my insights. Remember, this development of nanotech has to predate the AI taking over the world, or else the whole exercise is pointless. You could look at something like "AI blackmails the dictator of a small country into starting a research program" as a starting point.

Personally, I don't think there is very much you can be certain about, beyond: "this problem is extremely fucking hard", and "humans aren't cracking this anytime soon". I think building the physical infrastructure required to properly do the research in bulk could easily take more than a year on its own.

What if he just said "Some sort of super-powerful nanofactory-like thing?" 

He's not citing some existing literature that shows how to do it, but rather citing some existing literature which should make it plausible to a reasonable judge that a million superintelligences working for a year could figure out how to do it. (If you dispute the plausibility of this, what's your argument? We have an unfinished exchange on this point elsewhere in this comment section. Seems you agree that a galaxy full of superintelligences could do it; I feel like it's pretty plausible that if a galaxy of superintelligences could do it, a mere million also could do it.)

6
titotal
5mo
I would vastly prefer this phrasing, because it would be an accurate relaying of his beliefs, and would not involve the use of scientific terms that are at best misleading and at worst active misinformation.

As for the "millions of superintelligences", one of my main cruxes is that I do not think we will have millions of superintelligences in my lifetime. We may have lots of AGI, but I do not believe that AGI = superintelligence. Also, I think that if a few superintelligences come into existence they may prevent others from being built out of self-preservation. These points are probably out of scope here though.

I don't think a million superintelligences could invent nanotech in a year, with only the available resources on Earth. Unlike the galaxy, there is limited computational power available on Earth, and limited everything else as well. I do not think the sheer scale of experimentation required could be assembled in a year, without having already invented nanotech. The galaxy situation is fundamentally misleading.

Lastly, I think even if nanotech is invented, it will probably end up being disappointing or limited in some way. This tends to be the case with all technologies: did anyone predict that we would build an AI that could easily pass a simple Turing test but be unable to multiply large numbers together? Hypothetical technologies get to be perfect in our minds, but as something actually gets built, it accumulates shortcomings and weaknesses from the inevitable brushes with engineering.

I think the tech companies -- and in particular the AGI companies -- are already too powerful for such an informal public backlash to slow them down significantly.

Disagree. Almost every successful moral campaign in history started out as an informal public backlash against some evil or danger.

The AGI companies involve a few thousand people versus 8 billion, a few tens of billions of funding versus 360 trillion total global assets, and about 3 key nation-states (US, UK, China) versus 195 nation-states in the world. 

Compared to actually powerful industries, AGI companies are very small potatoes. Very few people would miss them if they were set on 'pause'.

4
Greg_Colbourn
6mo
I imagine it going hand in hand with more formal backlashes (i.e. regulation, law, treaties).

I said IMO. In context it was unnecessary for me to justify the claim, because I was asking whether or not you agreed with it.

I take it that not only do you disagree, you agree it's the crux? Or don't you? If you agree it's the crux (i.e. you agree that probably a million cooperating superintelligences with an obedient nation of humans would be able to make some pretty awesome self-replicating nanotech within a few years) then I can turn to the task of justifying the claim that such a scenario is plausible. If you don't agree, and think that even such a su... (read more)

What part of the scenario would you dispute? A million superintelligences will probably exist by 2030, IMO; the hard part is getting to superintelligence at all, not getting to a million of them (since you'll probably have enough compute to make a million copies).

I agree that the question is about the actual scenario, not the galaxy. The galaxy is a helpful thought experiment though; it seems to have succeeded in establishing the right foundations: How many OOMs of various inputs (compute, experiments, genius insights) will be needed? Presumably a galaxy's ... (read more)

8
titotal
6mo
This is a very wild claim to throw out with no argumentation to back it up. Cotra puts a 15% chance on transformative AI by 2036, and I find her assumptions incredibly optimistic about AI arrival. (It's also worth noting that transformative AI and superintelligence are not the same thing.) The other thing I dispute is that a million superintelligences would cooperate. They would presumably have different goals and interests: surely at least some of them would betray the others' plans for a leg-up from humanity.

You don't think some of the people of the "obedient nation" are gonna tip anyone off about the nanotech plan? Unless you think the AIs have some sort of mind-control powers, in which case why the hell would they need nanotech?

I also would like to see such breakdowns, but I think you are drawing the wrong conclusions from this example.

Just because Yudkowsky's first guess about how to make nanotech, as an amateur, didn't pan out, doesn't mean that nanotech is impossible for a million superintelligences working for a year. In fact it's very little evidence. When there are a million superintelligences they will surely be able to produce many technological marvels very quickly, and for each such marvel, if you had asked Yudkowsky to speculate about how to build it, he would have fai... (read more)

Thanks for this thoughtful and detailed deep dive!

I think it misses the main cruxes though. Yes, some people (Drexler and young Yudkowsky) thought that ordinary human science would get us all the way to atomically precise manufacturing in our lifetimes. For the reasons you mention, that seems probably wrong.

But the question I'm interested in is whether a million superintelligences could figure it out in a few years or less. (If it takes them, say, 10 years or longer, then probably they'll have better ways of taking over the world) Since that's the situatio... (read more)

5
EliezerYudkowsky
5mo
I broadly endorse this reply and have mostly shifted to trying to talk about "covalently bonded" bacteria, since using the term "diamondoid" (tightly covalently bonded CHON) causes people to panic about the lack of currently known mechanosynthesis pathways for tetrahedral carbon lattices.

Hey, thanks for engaging. I saved the AGI theorizing for last because it's the most inherently speculative: I am highly uncertain about it, and everyone else should be too. 

But the question I'm interested in is whether a million superintelligences could figure it out in a few years or less. (If it takes them, say, 10 years or longer, then probably they'll have better ways of taking over the world) Since that's the situation we'll actually be facing.

I would dispute that "a million superintelligences exist and cooperate with each other to invent MNT" is ... (read more)

OK, so our credences aren't actually that different after all. I'm actually at less than 65%, funnily enough! (But that's for doom = extinction. I think human extinction is unlikely for reasons to do with acausal trade; there will be a small minority of AIs that care about humans, just not on Earth. I usually use a broader definition of "doom" as "About as bad as human extinction, or worse.")

I am pretty confident that what happens in the next 100 years will straightforwardly translate to what happens in the long run. If humans are still well-cared-for in 2... (read more)

Those words were not yours, but you did say you agreed it was the main crux, and in context it seemed like you were agreeing that it was a crux for you too. I see now on reread that I misread you and you were instead saying it was a secondary crux. Here, let's cut through the semantics and get quantitative:

What is your credence in doom conditional on AIs not caring for humans? 

If it's >50%, then I'm mildly surprised that you think the risk of accidentally creating a permanent pause is worse than the risks from not-pausing. I guess you did say that ... (read more)

2
Denkenberger
7mo
Paul Christiano argues here that AI would only need to have "pico-pseudokindness" (caring about humans one part in a trillion) to take over the universe but not trash Earth's environment to the point of uninhabitability, and that at least this amount of kindness is likely.
2
Matthew_Barnett
7mo
How much do they care about humans, and what counts as doom? I think these things matter. If we're assuming all AIs don't care at all about humans and doom = human extinction, then I think the probability is pretty high, like 65%. If we're allowed to assume that some small minority of AIs cares about humans, or AIs care about humans to some degree, perhaps in the way humans care about wildlife species preservation, then I think the probability is quite a lot lower, at maybe 25%. For precision, both of these estimates are over the next 100 years, since I have almost no idea what will happen in the very long run.

In most of these stories, including in Ajeya's story IIRC, humanity just doesn't seem to try very hard to reduce misalignment? I don't think that's a very reasonable assumption. (Charitably, it could be interpreted as a warning rather than a prediction.) I think that as systems get more capable, we will see a large increase in our alignment efforts and monitoring of AI systems, even without any further intervention from longtermists.

I'm happy to meet up some time and explain in person. I'll try to remember to DM you later about that, but if I forget, then feel free to remind me.

First of all, you are goal-post-moving if you make this about "confident belief in total doom by default" instead of the original "if you really don't think unchecked AI will kill everyone." You need to defend the position that the probability of existential catastrophe conditional on misaligned AI is <50%.

Secondly, "AI motives will generalize extremely poorly from the training distribution" is a confused and misleading way of putting it. The problem is that it'll generalize in a way that wasn't the way we hoped it would generalize.

Third, to answer your... (read more)

7
Matthew_Barnett
7mo
I never said "I don't think unchecked AI will kill everyone". That quote was not from me. What I did say was, "Even if AIs end up not caring much for humans, it is dubious that they would decide to kill all of us." Google informs me that dubious means "not to be relied upon; suspect".

I don't see how the first part of that leads to the second part. Humanity could be on the sidelines in a way that doesn't lead to total oppression and subjugation. The idea that these things will necessarily happen just seems like speculation. I could speculate that the opposite will occur and AIs will leave us alone. That doesn't get us anywhere.

The question I'm asking is: why? You have told me what you expect to happen, but I want to see an argument for why you'd expect that to happen. In the absence of some evidence-based model of the situation, I don't think speculating about specific scenarios is a reliable guide.

Thanks! 

I think this is evidence for a groupthink phenomenon amongst superforecasters. Interestingly my other experiences talking with superforecasters have also made me update in this direction (they seemed much more groupthinky than I expected, as if they were deferring to each other a lot. Which, come to think of it, makes perfect sense -- I imagine if I were participating in forecasting tournaments, I'd gradually learn to reflexively defer to superforecasters too, since they genuinely would be performing well.)

Ironically, one of the two predictions you quote as an example of bad prediction is in fact an example of a good prediction: "The most realistic estimate for a seed AI transcendence is 2020."

Currently it seems that AGI/superintelligence/singularity/etc. will happen sometime in the 2020s. Yudkowsky's median estimate in 1999 was apparently 2020, so he probably had something like 30% of his probability mass in the 2020s, and maybe 15% of it in the 2025-2030 period, when IMO it's most likely to happen (see the rough sketch below).

Now let's compare to what other people would have been saying... (read more)
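To make those probability-mass figures concrete, here is a rough editorial sketch. The distribution shape (a lognormal over years-until-AGI as seen from 1999, with median 2020 and an arbitrarily chosen spread) is an assumption added purely for illustration; it is not something Yudkowsky or the comment above specifies.

```python
# Purely illustrative: a 1999 forecaster with a median of 2020, modeled as a
# lognormal over "years until AGI". The spread (s=0.5) is an arbitrary assumption.
from scipy.stats import lognorm

median_years = 2020 - 1999                    # median of 21 years
dist = lognorm(s=0.5, scale=median_years)     # scale = median for a lognormal

p_2020s = dist.cdf(2030 - 1999) - dist.cdf(2020 - 1999)
p_2025_2030 = dist.cdf(2030 - 1999) - dist.cdf(2025 - 1999)
print(f"P(2020-2030): {p_2020s:.0%}")         # ~28% under these assumptions
print(f"P(2025-2030): {p_2025_2030:.0%}")     # ~12% under these assumptions
```

With a wider or narrower spread the numbers move around, but something in the ballpark of the quoted 30% / 15% falls out of fairly natural assumptions.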

The XPT forecast about compute in 2030 still boggles my mind. I'm genuinely confused what happened there. Is anybody reading this familiar with the answer?

FWIW you can see more information, including some of the reasoning, on page 655 (# written on pdf) /  659 (# according to page searcher) of the report. (H/t Isabel.) See also page 214 for the definition of the question.

Some tidbits:

Experts started out much higher than superforecasters, but updated downwards after discussion. Superforecasters updated a bit upward, but less:

(Chart not reproduced here; the y-axis values are in billions.)

This was surprising to me. I think the experts' predictions look too low even before updating, and look much worse after updating!

The part of the ... (read more)

Reminds me of this:
A kind of conservativeness of "expert" opinion that doesn't correctly appreciate (rapid) exponential growth.

Fair, but still: In 2019 Microsoft invested a billion dollars in OpenAI, roughly half of which was compute: Microsoft invests billions more dollars in OpenAI, extends partnership | TechCrunch

And then GPT-3 happened, and was widely regarded to be a huge success and proof that scaling is a good idea etc.

So the amount of compute-spending that the most aggressive forecasters think could be spent on a single training run in 2032... is about 25% as much compute-spending as Microsoft gave OpenAI starting in 2019, before GPT-3 and before the scaling hypothesis. Th... (read more)

Also, if you do various searches on LW and Astral Codex Ten looking for comments I've made, you might see some useful ones maybe.

No, alas. However I do have this short summary doc I wrote back in 2021: The Master Argument for <10-year Timelines - Google Docs

And this sequence of posts making narrower points: AI Timelines - LessWrong


The XPT forecasters are so in the dark about compute spending that I just pretend they gave more reasonable numbers. I'm honestly baffled how they could be so bad. The most aggressive of them thinks that in 2025 the most expensive training run will be $70M, and that it'll take 6+ years to double thereafter, so that in 2032 we'll have reached $140M training run spending... do these people have any idea how much GPT-4 cost in 2022?!?!? Did they not hear about the investments Microsoft has been making in OpenAI? And remember that's what the most aggressive among them thought! The conservatives seem to be living in an alternate reality where GPT-3 proved that scaling doesn't work and an AI winter set in in 2020.

1
erickb
9mo
Remember these predictions were made in summer 2022, before ChatGPT, before the big Microsoft investment and before any serious info about GPT-4. They're still low, but not ridiculous.
3
kokotajlod
9mo
Perhaps this should be a top-level comment.
  • I haven’t considered all of the inputs to Cotra’s model, most notably the 2020 training computation requirements distribution. Without forming a view on that, I can’t really say that ~53% represents my overall view.

Sorry to bang on about this again and again, but it's important to repeat for the benefit of those who don't know: The training computation requirements distribution is by far the biggest cruxy input to the whole thing; it's the input that matters most to the bottom line and is most subjective. If you hold fixed everything else Ajeya inputs, but... (read more)
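To illustrate why this input dominates, here is a toy editorial sketch with entirely made-up numbers; it is not Ajeya Cotra's actual model, just the basic structure of compute-based timelines: effective compute grows by some number of OOMs per year, and AGI arrives when it crosses the (highly uncertain) requirement.

```python
# Toy sketch, made-up numbers: arrival year as a function of a hypothetical
# training-compute requirement, assuming effective compute for the largest run
# grows by a fixed number of OOMs per year. Each extra OOM of requirement
# delays arrival by 1 / ooms_per_year years, so a requirements distribution
# spanning many OOMs swings the median timeline by decades.
def arrival_year(required_log10_flop, start_year=2023,
                 start_log10_flop=26.0, ooms_per_year=0.5):
    """Year when effective compute first reaches the hypothetical requirement."""
    return start_year + max(0.0, required_log10_flop - start_log10_flop) / ooms_per_year

for req in (30, 33, 36, 39):  # hypothetical requirements, in log10 FLOP
    print(f"1e{req} FLOP needed -> arrival around {arrival_year(req):.0f}")
```

Disagreements about hardware, spending, or algorithmic progress shift ooms_per_year somewhat; disagreements about the requirement shift required_log10_flop by many OOMs, which is why (on this view) it carries the bottom line.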

8
Lukas Finnveden
9mo
It's the crux between you and Ajeya, because you're relatively more in agreement on the other numbers. But I think that adopting the XPT numbers on these other variables would lengthen your own timelines notably, because of the almost complete lack of increase in spending. That said, if the forecasters agreed with your compute requirements, they would probably also forecast higher spending.
4
JoshuaBlake
9mo
Do you have a write-up of your beliefs that lead you to 2030 as your median?

Don't apologise; I think it's a helpful point!

I agree that the training computation requirements distribution is more subjective and matters more to the eventual output.

I also want to note that while on your view of the compute reqs distribution, the hardware/spending/algorithmic progress inputs are a rounding error, this isn't true for other views of the compute reqs distribution. E.g. for anyone who does agree with Ajeya on the compute reqs distribution, the XPT hardware/spending/algorithmic progress inputs shift median timelines from ~2050 to ~2090, which... (read more)

Another nice story! I consider this to be more realistic than the previous one about open-source LLMs. In fact I think this sort of 'soft power takeover' via persuasion is a lot more probable than most people seem to think. That said, I do think that hacking and R&D acceleration are also going to be important factors, and my main critique of this story is that it doesn't discuss those elements and implies that they aren't important.

In addition to building more data centers, MegaAI starts constructing highly automated factories, which will produce the

... (read more)
3
Karl von Wendt
10mo
As replied on LW, we'll discuss this. Thanks!

I think it mostly means that you should be looking to get quick wins. When calculating the effectiveness of an intervention, don't assume things like "over the course of an 85-year lifespan this person will be healthier due to better nutrition now" or "this person will have better education and thus more income 20 years from now." Instead just think: how much good does this intervention accomplish in the next 5 years? (Or, if you want to get fancy, use e.g. a 10%/yr discount rate.)

See Neartermists should consider AGI timelines in their spending decisions - ... (read more)
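As a small editorial sketch of the discounting suggestion above (the benefit stream and units are hypothetical), a 10%/yr discount rate makes value that arrives decades out count for very little:

```python
# Minimal sketch: net present value of a constant annual benefit under a
# 10%/yr exponential discount. Numbers are hypothetical, for illustration only.
def discounted_value(annual_benefit, years, rate=0.10):
    """Sum of annual benefits, each discounted back to the present."""
    return sum(annual_benefit / (1 + rate) ** t for t in range(1, years + 1))

benefit = 1.0  # arbitrary units of good per year
print(f"5-year horizon:  {discounted_value(benefit, 5):.2f}")   # ~3.79
print(f"85-year horizon: {discounted_value(benefit, 85):.2f}")  # ~10.0
```

The 85-year stream is worth only about 2.6x the 5-year stream, because everything past roughly the first couple of decades is discounted away, which is the same practical upshot as just asking about the next 5 years.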

FWIW I think that it's pretty likely that AGI etc. will happen within 10 years absent strong regulation, and moreover that if it doesn't, the 'crying wolf' effect will be relatively minor, enough that even if I had 20-year medians I wouldn't worry about it compared to the benefits.

I also guess cry wolf-effects won't be as large as one might think - e.g. I think people will look more at how strong AI systems appear at a given point than at whether people have previously warned about AI risk.

Normally I'd say yes, but my AGI timelines are now 50% in ~4 years, so there isn't much time for R&D to make a difference. I'd recommend interventions that pay off quickly, therefore. Bed nets, GiveDirectly, etc.

Ouch, I wasn't aware of those rules, they do seem quite restrictive. If it's a website rather than an app, how easy would it be to set it up so that you can access it with a single button press? I guess you can have favorites, default sites, etc.

3
Aaron
1y
On both iOS and Android, you can add websites to your home screen like apps. It’s just usually not quite as nice an experience and not as intuitive for people.

But really, before anyone actually goes and invests significant effort into building this, you should coordinate with me + other people in this comment section.

Minimum acceptable features are the bits prior to "That's it really."


Oh dang. How about: When you press the button it doesn't donate the money right away, but just adds it to an internal tally, and then once a quarter you get a notification saying 'time to actually process this quarter's donations, press here to submit your face for scanning, sorry bout the inconvenience'

Oh dang. I definitely want it to be the former, not the latter. Maybe we can get around the iOS platform constraints somehow, e.g. when you press the button it doesn't donate the money right away, but just adds it to an internal tally, and then once a quarter you get a notification saying 'time to actually process this quarter's donations, press here to submit your face for scanning, sorry bout the inconvenience'

Everyone please downvote this comment of mine if they want to support the app idea but don't want to give me karma as a byproduct of my polling strategy; this cancels out the karma I get from the OP.

Hmmm. I really don't want the karma, I was using it as a signal of how good the idea is. Like, creating this app is only worth someone's time and money if it becomes a popular app that lots of people use. So if it only gets  like 20 karma then it isn't worth it, and arguably even if it gets 50 karma it isn't worth it. But if it blows up and hundreds of people like it, that's a signal that it's going to be used by lots of people.

Maybe I should have just asked "Comment in the comments if you'd use this app; if at least 30 people do so then I'll fund this app." Idk. If y'all think I should do something like that instead I'm happy to do so.

ETA: Edited the OP to remove the vote-brigady aspect.


Yep! I love when old threads get resurrected.

Not sure I'm assuming that. Maybe. The way I'd put it is, selection pressure towards grabby values seems to require lots of diverse agents competing over a lengthy period, with the more successful ones reproducing more / acquiring more influence / etc. Currently we have this with humans competing for influence over AGI development, but it's overall fairly weak pressure. What sorts of things are you imagining happening that would strengthen the pressure? Can you elaborate on the sort of scenario you have in mind?

1
Jim Buhler
1y
Right so assuming no early value lock-in and the values of the AGI being (at least somewhat) controlled/influenced by its creators, I imagine these creators to have values that are grabby to varying extents, and these values are competing against one another in the big tournament that is cultural evolution. For simplicity, say there are only two types of creators: the pure grabbers (who value grabbing (quasi-)intrinsically) and the safe grabbers (who are in favor of grabbing only if it is done in a "safe" way, whatever that means).

Since we're assuming there hasn't been any early value lock-in, the AGI isn't committed to some form of compromise between the values of the pure and safe grabbers. Therefore, you can imagine that the AGI allows for competition and helps both groups accomplish what they want proportionally to their size, or something like that. From there, I see two plausible scenarios:

A) The pure and safe grabbers are two cleanly separated groups running a space expansion race against one another, and we should -- all else equal -- expect the pure grabbers to win, for the same reasons why we should -- all else equal -- expect the AGI race to be won by the labs optimizing for AI capabilities rather than for AI safety.

B) The safe grabbers "infiltrate" the pure grabbers in an attempt to make their space-expansion efforts "safer", but are progressively selected against since they drag the pure-grabby project down. The few safe grabbers who might manage not to value drift and not to get kicked out of the pure grabbers are those who are complacent and not pushing really hard for more safety.

The reason why the intra-civ grabby values selection is currently fairly weak on Earth, as you point out, is that humans didn't even start colonizing space, which makes something like A or B very unlikely to have happened yet. Arguably, the process that may eventually lead to something like A or B hasn't even begun for real. We're unlikely to notice a selection for gr

Nice post! My current guess is that the inter-civ selection effect is extremely weak and that the intra-civ selection effect is fairly weak. N=1, but in our civilization the people gunning for control of AGI seem more grabby than average but not drastically so, and it seems possible for this trend to reverse e.g. if the US government nationalizes all the AGI projects.

3
Jim Buhler
1y
Thanks for the comment! :) You're assuming that the AGI's values will be pretty much locked-in forever once it is deployed such that the evolution of values will stop, right? Assuming this, I agree. But I can also imagine worlds where the AGI is made very corrigible (such that the overseers stay in control of the AGI's values) and where intra-civ value evolution continues/accelerates. I'd be curious if you see reasons to think these worlds are unlikely.

Nice. Well, I guess we just have different intuitions then -- for me, the chance of extinction or worse in the Octopcracy case seems a lot bigger than "small but non-negligible" (though I also wouldn't put it as high as 99%).

Human groups struggle against each other for influence/power/control constantly; why wouldn't these octopi (or AIs) also seek influence? You don't need to be an expected utility maximizer to instrumentally converge; humans instrumentally converge all the time.

Oh also you might be interested in Joe Carlsmith's report on power-seeking AI, it has a relatively thorough discussion of the overall argument for risk.

Nice analysis! 

I think a main point of disagreement is that I don't think systems need to be "dangerous maximizers" in the sense you described in order to predictably disempower humanity and then kill everyone. Humans aren't dangerous maximizers, yet we've killed off many species of animals, the Neanderthals, and various other human groups (genocide, wars, oppression of populations by governments, etc.). Katja's scenario sounds plausible to me except for the part where somehow it all turns out OK in the end for humans. :)

Another, related point of disagreement:... (read more)

3
Violet Hour
1y
thnx! : ) Your analogy successfully motivates the "man, I'd really like more people to be thinking about the potentially looming Octopcracy" sentiment, and my intuitions here feel pretty similar to the AI case. I would expect the relevant systems (AIs, von-Neumann-Squidwards, etc) to inherit human-like properties wrt human cognition (including normative cognition, like plan search), and a small-but-non-negligible chance that we end up with extinction (or worse).

On maximizers: to me, the most plausible reason for believing that continued human survival would be unstable in Grace's story either consists in the emergence of dangerous maximizers, or the emergence of related behaviors like rapacious influence-seeking (e.g., Part II of What Failure Looks Like). I agree that maximizers aren't necessary for human extinction, but it does seem like the most plausible route to 'human extinction' rather than 'something else weird and potentially not great'.

Alright, let's make it happen! I'll DM you + Timothy + anyone else who replies to this comment in the next few days, and we can arrange something.

3
demirev
1y
did you end up doing this? If it's still upcoming, I'd also be interested
3
Dan Valentine
1y
Also interested!
3
Isobel P
1y
+1 I'm interested :)
3
Quadratic Reciprocity
1y
+1, also interested
3
Angelina Li
1y
+1, I'd be interested in this if it happens :)
3
RachelM
1y
I'd join, time zones permitting.

Great list, thanks!
My current tentative expectation is that we'll see a couple things in 1, but nothing in 2+, until it's already too late (i.e. until humanity is already basically in a game of chess with a superior opponent, i.e. until there's no longer a realistic hope of humanity coordinating to stop the slide into oblivion, by contrast with today where we are on a path to oblivion but there's a realistic possibility of changing course.)

I definitely agree that near term, non-agentic AI will cause a lot of chaos. I just don't expect it to be so much chaos that the world as a whole feels significantly more chaotic than usual. But I also agree that might happen too.

I also agree that this sort of thing will have a warning-shot effect that makes a Covid-in-feb-2020-type response plausible.

It seems we maybe don't actually disagree that much?

Re: uncharitability: I think I was about as uncharitable as you were. That said, I do apologize -- I should hold myself to a higher standard.

I agree they might be impossible. (If it only finds some niche application in medicine, that means it's impossible, btw. Anything remotely similar to what Drexler described would be much more revolutionary than that.)

If they are possible though, and it takes (say) 50 years for ordinary human scientists to figure it out starting now... then it's quite plausible to me that it could take 2 OOMs less time than that, or po... (read more)

Oh, I thought you had much more intense things in mind than that. Malicious actor using LLMs in some hacking scheme to get security breaches seems probable to me.

But that wouldn't cause instability to go above baseline. Things like this happen every year. Russia invaded Ukraine last year, for example--for the world to generally become less stable there needs to be either events that are a much bigger deal than that invasion, or events like that invasion happening every few months.

2
Max_He-Ho
1y
I guess that really depends on how deep this particular problem runs. If it makes most big companies very vulnerable, since most employees use LLMs which are susceptible to prompt injections, I'd expect this to cause more chaos in the US than Russia's invasion of Ukraine.

I think we're talking slightly past each other though; I wanted to make the point that the baseline (non-existential) chaos from agentic AI should be high, since near-term, non-agentic AI may already cause a lot of chaos. I was not comparing it to other causes of chaos, though I'm very uncertain about how these will compare.

I'm surprised, btw, that you don't expect a (sufficient) fire alarm solely on the basis of short timelines. To me, the relevant issue seems more 'how many more misaligned AIs with what level of capabilities will be deployed before takeoff'. Since a lot more models with higher capabilities got deployed recently, it doesn't change the picture for me. If anything, I expect non-existential disasters before takeoff more than I did a few months ago, since AI companies seem to just release every model & new feature they've got. I'd also expect a slow takeoff of misaligned AI to raise the chances of a loud warning shot & the general public having a Covid-in-Feb-2020 wake-up moment on the issue.

If you can convince me of the "many many years" claim, that would be an update. Other than that you are just saying things I already know and believe.

I never claimed that nanotech would be the best plan, nor that it would be Yudkowky's bathtub-nanotech scenario instead of a scenario involving huge amounts of experimentation. I was just reacting to your terrible leaps of logic, e.g. "nanobots are a terrible method for world destruction given that they have not been invented yet" and "making nanobots requires experimentation and resources therefore AIs won't... (read more)

2
titotal
1y
These are both statements I still believe are true. None of them are "terrible leaps of logic", as I have patiently explained the logic behind them with arguments. I do not appreciate the lack of charity you have displayed here.

Well, I think there's a pretty decent chance that they are impossible. See this post for several reasons why. If they are possible, I would suspect it would take decades at the least to make something that is useful for anyone, and also that the results would still fail to live up to the nigh-magical expectations set by science fiction scenarios. The most likely scenario involves making a toy nanobot system in a lab somewhere that is stupidly expensive to make and doesn't work that well, which eventually finds some niche applications in medicine or something.

It seems a lot of people are interested in this one! For my part, the answer is "Infohazards kinda, but mostly it's just that I haven't gotten around to it yet." I was going to do it two years ago but never finished the story.

If there's enough interest, perhaps we should just have a group video call sometime and talk it over? That would be easier for me than writing up a post, and plus, I have no idea what kinds of things you find plausible and implausible, so it'll be valuable data for me to hear these things from you.

3
aaron_mai
1y
I'd be very interested in this!
3
Timothy Chan
1y
I'd be interested in this :)

A superintelligent AI will be able to do significant amounts of experimentation and acquire significant amounts of resources. 

3
titotal
1y
I'm not talking about tinkering in someone's backyard; making nanomachines feasible would require ridiculous amounts of funding and resources over many, many years. It's an extremely risky plan that carries a significant risk of exposure. Why would an AI choose this plan, instead of something with a much lower footprint like bio-weapons?