Owen Cotton-Barratt

9130 karma


Reflection as a strategic goal
On Wholesomeness
Everyday Longtermism



I think of the singularity hypothesis as being along the lines of "growth will accelerate a lot". I might operationalize this as predicting that the economy will go up by more than a factor of 10 in the period of a decade. (This threshold deliberately chosen to be pretty tame by singularity predictions, but pretty wild by regular predictions.)
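To make that threshold concrete, here is a minimal sketch of the arithmetic, assuming steady compound growth across the decade (the function name is purely illustrative): a tenfold increase in ten years works out to roughly 26% growth per year.

```python
def implied_annual_growth(factor: float, years: float) -> float:
    """Constant annual growth rate that compounds to `factor` over `years`.

    Assumes growth is spread evenly across the period, which is a
    simplification; accelerating growth would start lower and end higher.
    """
    return factor ** (1.0 / years) - 1.0


# The operationalization above: a 10x larger economy within one decade.
rate = implied_annual_growth(10, 10)
print(f"Implied annual growth: {rate:.1%}")
```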

I think this is pretty clearly stronger than your 1 but weaker than your 2. (It might be close to predicting that AI systems become much smarter than humans without access to computers or AI tools, but this is compatible with humans remaining easily and robustly in control.)

I think this growth-centred hypothesis is important and deserves a name, and "singularity" is a particularly good name for it. Your 1 and 2 also seem like they could use names, but I think they're easier to describe with alternate names, like "mass automation of labour" or "existential risk from misaligned AI".

FYI for interested readers: a different summary of this paper was previously posted on the forum by Nicholas Kruus. There is a bit of discussion of the paper in the comments there.

Ok, I guess around 1%? But this is partially driven by model uncertainty; I don't actually feel confident your number is too small.

I'm much higher (tens of percentage points) on "chance nature survives conditional on most humans being wiped out"; it's just that most of these scenarios involve some small number of humans being kept around so it's not literal extinction. (And I think these scenarios are a good part of things people intuitively imagine and worry about when you talk about human extinction from AI, even though the label isn't literally applicable.)

Thanks for asking explicitly about the odds, I might not have noticed this distinction otherwise.

Thanks for sharing your impressions. But even if many observers have this impression, it still seems like it could be quite valuable to track down exactly what was said, because there's some gap between:

(a) has nuanced models of the world and will strategically select different facets of those to share on different occasions; and

(b) will strategically select what to say on different occasions without internal validity or consistency.

... but either of these could potentially create the impression of inconsistency in observers. (Not to say that (a) is ideal, but I think that (b) is clearly more egregious.)

Yeah, I understood this. This is why I've focused on a particular case for it valuing nature which I think could be compatible with wiping out humans (not going into the other cases that Ryan discusses, which I think would be more likely to involve keeping humans around). I needed to bring in the point about humans surviving to address the counterargument "oh but in that case probably humans would survive too" (which I think is probable but not certain). Anyway maybe I was slightly overstating the point? Like I agree that in this scenario the most likely outcome is that nature doesn't meaningfully survive. But it sounded like you were arguing that it was obvious that nature wouldn't survive, which doesn't sound right to me.

Note that writing on this topic up to July 14th could be eligible for the essay prize on the automation of wisdom and philosophy.

I thought about where the logic in the post seemed to be going wrong, and it led me to write this quick take on why most possible goals of AI systems are partially concerned with process and not just outcomes.

Most possible goals for AI systems are concerned with process as well as outcomes.

People talking about possible AI goals sometimes seem to assume something like "most goals are basically about outcomes, not how you get there". I'm not entirely sure where this idea comes from, and I think it's wrong. The space of goals which are allowed to be concerned with process is much higher-dimensional than the space of goals which are just about outcomes, so I'd expect that on most reasonable senses of "most", process can have a look-in.

What's the interaction with instrumental convergence? (I'm asking because vibe-wise it seems like instrumental convergence is associated with an assumption that goals won't be concerned with process.)

  • Process-concerned goals could undermine instrumental convergence (since some process-concerned goals could be fundamentally opposed to some of the things that would otherwise get converged-to), but many process-concerned goals won't
  • Since instrumental convergence is basically about power-seeking, there's an evolutionary argument that you should expect the systems which end up with most power to have the power-seeking behaviours
    • I actually think there are a couple of ways for this argument to fail:
      1. If at some point you get a singleton, there's now no evolutionary pressure on its goals (beyond some minimum required to stay a singleton)
      2. A social environment can punish power-seeking, so that power-seeking behaviour is not the most effective way to arrive at power
        • (There are some complications to this I won't get into here)
    • But even if it doesn't fail, it pushes towards things which have Omohundro's basic AI drives (and so pushes away from process-concerned goals which could preclude those), but it doesn't push all the way to purely outcome-concerned goals

In general I strongly expect humans to try to instil goals that are concerned with process as well as outcomes. Even if that goes wrong, I mostly expect them to end up with something which has incorrect preferences about process, not something that doesn't care about process.

How could you get to purely outcome-concerned goals? I basically think this should be expected only if someone makes a deliberate choice to aim for that (though that might be possible via self-modification; the set of goals that would choose to self-modify to be purely outcome-concerned may be significantly bigger than the set of purely outcome-concerned goals). Overall I think purely outcome-concerned goals (or almost purely outcome-concerned goals) are a concern, and worth further consideration, but I really don't think they should be treated as a default.

I think this is a plausible consequence, but not a clear one.

Many people put significant value on conservation. It is plausible that some version of this would survive in an AI which was somewhat misaligned (especially since conservation might be a reasonably simple goal to point towards), such that it would spend some fraction of its resources towards preserving nature -- and one planet is a tiny fraction of the resources it could expect to end up with.

The most straightforward argument against this is that such an AI maybe wouldn't wipe out all humans. I tend to agree, and a good amount of my probability mass on "existential catastrophe from misaligned AI" does not involve human extinction. But I think there's some possible middle ground where an AI was not capable of reliably seizing power without driving humans extinct, but was capable if it allowed itself to do so, and could wipe them out without eliminating nature (which would presumably pose much less threat to its ascendancy).

Thanks, that helped me sharpen my intuitions about what triggers the "appalled" reaction.

I think I'm still left with: People may very reasonably say that fraud in the service of effective altruism is appalling. Then it's pretty normal and understandable (even if by my lights unreasonable) to label as "appalling" things which you think will predictably lead others to appalling action.
