OCB

Owen Cotton-Barratt

9142 karma · Joined

Sequences (3)

Reflection as a strategic goal
On Wholesomeness
Everyday Longtermism

Comments (814)

Topic contributions (3)
Shared you on the draft. Yes, it's related to those old blog posts, but it extends the analysis by making more assumptions about "default" growth functions for various key variables, and so gets to say something a bit more quantitative about the value of the speed-up.

Nice! Two comments:

  1. I agree with "beware point estimates", but it's not always right that point estimates cause people to overstate the value of working on things. They can also understate it -- if your point estimate is that things will fail even with your extra effort, then things look less bleak if you account for the possible worlds where things are easier than you think and you could actually make the difference.
  2. It's often the case with all-or-nothing problems that the relevant counterfactuals are not shifting it from "will never be solved" to "will be solved", but shifting forward in time the moment of the solution.
    • If anyone's interested in this, I have a largely-complete old paper draft ("on the marginal benefits of research") that I'm embarrassed to realise we never actually published, which has some simple first-pass economic modelling of the counterfactual value in this case (I've sketched the flavour of the model below this list). I could share it privately or check in with my coauthors about the possibility of posting a public link.
    • I guess the main cases where it's incorrect to model it as a speedup are where the solution is needed by a particular time.
      • Perhaps the obvious such case is differential technological development, although even there, uncertainty about the timelines to other technologies might smooth things out again, so that in expectation you still get a stream of benefits from solving earlier.
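To give a flavour of the kind of first-pass model I have in mind (a rough sketch, not the exact model in the draft): suppose the problem would otherwise be solved at time $T$, your work brings that forward to $T - \Delta$, and a solution unlocks a benefit stream worth $v$ per year, discounted at rate $r$. Then the counterfactual value of the speed-up is roughly

$$\int_{T-\Delta}^{T} v\,e^{-rt}\,dt \;=\; \frac{v}{r}\,e^{-rT}\left(e^{r\Delta}-1\right) \;\approx\; v\,\Delta\,e^{-rT} \quad\text{for small } \Delta,$$

i.e. the benefit stream over the length of the speed-up, discounted back to today, rather than the full value of the solution.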

I think of the singularity hypothesis as being along the lines of "growth will accelerate a lot". I might operationalize this as predicting that the economy will grow by more than a factor of 10 within a decade. (This threshold is deliberately chosen to be pretty tame by the standards of singularity predictions, but pretty wild by the standards of regular predictions.)
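For calibration, under the simple assumption of smooth compounding, a factor of 10 over a decade corresponds to an annual growth rate of

$$g = 10^{1/10} - 1 \approx 26\%\ \text{per year},$$

versus the few percent per year of recent global growth.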

I think this is pretty clearly stronger than your 1 but weaker than your 2. (It might be close to predicting that AI systems become much smarter than unaided humans, i.e. humans without access to computers or AI tools, but this is compatible with humans remaining easily and robustly in control.)

I think this growth-centred hypothesis is important and deserves a name, and "singularity" is a particularly good name for it. Your 1 and 2 also seem like they could use names, but I think they're easier to describe with alternate names, like "mass automation of labour" or "existential risk from misaligned AI".

FYI for interested readers: a different summary of this paper was previously posted on the forum by Nicholas Kruus. There is a bit of discussion of the paper in the comments there.

Ok, I guess around 1%? But this is partially driven by model uncertainty; I don't actually feel confident your number is too small.

I'm much higher (tens of percentage points) on "chance nature survives conditional on most humans being wiped out"; it's just that most of these scenarios involve some small number of humans being kept around, so it's not literal extinction. (And I think these scenarios are a good part of what people intuitively imagine and worry about when you talk about human extinction from AI, even though the label isn't literally applicable.)

Thanks for asking explicitly about the odds, I might not have noticed this distinction otherwise.

Thanks for sharing your impressions. But even if many observers have this impression, it still seems like it could be quite valuable to track down exactly what was said, because there's some gap between:

(a) has nuanced models of the world and will strategically select different facets of those to share on different occasions; and

(b) will strategically select what to say on different occasions without internal validity or consistency.

... but either of these could potentially create an impression of inconsistency in observers. (Not to say that (a) is ideal, but I think that (b) is clearly more egregious.)

Yeah, I understood this. This is why I've focused on a particular case for it valuing nature which I think could be compatible with wiping out humans (not going into the other cases that Ryan discusses, which I think would be more likely to involve keeping humans around). I needed to bring in the point about humans surviving to address the counterargument "oh but in that case probably humans would survive too" (which I think is probable but not certain). Anyway maybe I was slightly overstating the point? Like I agree that in this scenario the most likely outcome is that nature doesn't meaningfully survive. But it sounded like you were arguing that it was obvious that nature wouldn't survive, which doesn't sound right to me.

Note that writing on this topic up to July 14th could be eligible for the essay prize on the automation of wisdom and philosophy.

I thought about where the logic in the post seemed to be going wrong, and it led me to write this quick take on why most possible goals of AI systems are partially concerned with process and not just outcomes.

Most possible goals for AI systems are concerned with process as well as outcomes.

People talking about possible AI goals sometimes seem to assume something like "most goals are basically about outcomes, not how you get there". I'm not entirely sure where this idea comes from, and I think it's wrong. The space of goals which are allowed to be concerned with process is much higher-dimensional than the space of goals which are just about outcomes, so I'd expect that on most reasonable senses of "most", process gets a look-in.
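To make the counting intuition concrete, here's a toy sketch (my own framing, with made-up numbers, assuming a goal can be represented as a utility function over a small finite environment):

```python
# Toy dimension count: outcome-only goals vs. process-sensitive goals.
# Assumed toy numbers for a small finite environment.
n_states = 10    # possible world-states
n_actions = 4    # actions available at each step
horizon = 5      # steps per episode

# An outcome-only goal assigns a value to each possible final state,
# so it lives in a space of dimension n_states.
dim_outcome_only = n_states

# A process-sensitive goal may assign a value to each possible trajectory
# (a sequence of (state, action) pairs followed by a final state).
n_trajectories = (n_states * n_actions) ** horizon * n_states
dim_process_sensitive = n_trajectories

print(f"outcome-only goal space dimension:      {dim_outcome_only}")
print(f"process-sensitive goal space dimension: {dim_process_sensitive:,}")
# Even in this tiny environment, outcome-only goals are a vanishingly
# thin slice of the full goal space.
```

The numbers are arbitrary; the point is just that outcome-only goals form a tiny-dimensional subspace of the general goal space.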

What's the interaction with instrumental convergence? (I'm asking because vibe-wise it seems like instrumental convergence is associated with an assumption that goals won't be concerned with process.)

  • Process-concerned goals could undermine instrumental convergence (since some process-concerned goals could be fundamentally opposed to some of the things that would otherwise get converged-to), but many process-concerned goals won't
  • Since instrumental convergence is basically about power-seeking, there's an evolutionary argument that you should expect the systems which end up with most power to have the power-seeking behaviours
    • I actually think there are a couple of ways for this argument to fail:
      1. If at some point you get a singleton, there's now no evolutionary pressure on its goals (beyond some minimum required to stay a singleton)
      2. A social environment can punish power-seeking, so that power-seeking behaviour is not the most effective way to arrive at power
        • (There are some complications to this I won't get into here)
    • But even if it doesn't fail, it pushes towards things which have Omohundro's basic AI drives (and so pushes away from process-concerned goals which could preclude those), but it doesn't push all the way to purely outcome-concerned goals

In general I strongly expect humans to try to instil goals that are concerned with process as well as outcomes. Even if that goes wrong, I mostly expect the result to be something which has incorrect preferences about process, not something that doesn't care about process.

How could you get to purely outcome-concerned goals? I basically think this should only be expected if someone makes a deliberate choice to aim for it (though it might also arise via self-modification; the set of goals that would choose to self-modify to become purely outcome-concerned may be significantly bigger than the set of purely outcome-concerned goals). Overall I think purely outcome-concerned goals (or almost purely outcome-concerned goals) are a concern, and worth further consideration, but I really don't think they should be treated as a default.
