Owen Cotton-Barratt

I think there's something interesting to this argument, although I think it may be relying on a frame where AI systems are natural agents, in particular at this step:

a strategically and philosophically competent AI should seemingly have its own moral uncertainty and pursue its own "option value maximization" rather than blindly serve human interests/values/intent

It's not clear to me why the key functions couldn't be more separated, or whether the conflict you're pointing to persists across such separation. For instance, we might have a mix of:

  • Systems which competently pursue philosophy research (but do not have a sense of self that they are acting with regard to)
  • Systems which are strategic (including drawing on the fruits of the philosophy research), on behalf of human institutions or individuals
  • Systems which are instruction-following tools (which don't aspire to philosophical competence), rather than independent agents

I mean "not clear to me" very literally here -- I think that perhaps some version of your conflict will pose a challenge to such setups. But I'm responding with this alternate frame in the hope that it will be useful in advancing the conversation.

My belief is that the Open Philanthropy Project, EA generally, and Oxford EA particularly, had bad AI timelines and bad ASI ruin conditional probabilities; and that these invalidly arrived-at beliefs were in control of funding, and were explicitly publicly promoted at the expense of saner beliefs.

There is a surprising amount of normative judgment in here for a fact check. Are you looking just for disagreements that people held roughly the beliefs you later outline (I think you overstate things but are directionally correct in describing how beliefs differed from yours), or also disagreements about whether they were bad beliefs?

For flavour: as I ask that question, I'm particularly (but not only) thinking of the reports you cite, where you seem to be casting them as "OP really throwing its weight behind these beliefs", and I perceived them more as "earnest attempts by people at OP to figure out what was legit, and put their reasoning in public to let others engage". I certainly didn't just agree with them at the time, but I thought it was a good step forwards for collective epistemics to be able to have conversations at that level of granularity. Was it confounding that they were working at a big funder? Yeah, kinda -- but that seemed second order compared to it just being great that anyone at all was pushing the conversation forwards in this way, even if there were a bunch of aspects of them I wasn't on board with. I'm not sure if this is the kind of disagreement you're looking for. (Maybe it's just that I was on board with more of them than you were, and so I saw them as flawed-but-helpful rather than unhelpful? Then we get to the general question of what standards "bad" should be judged by, given our lack of access to ground truth.)

I think my impression is that the strategic upshots of this are directionally correct, but maybe not a huge deal? I'm not sure if you agree with that.

Sorry, I didn't mean mislabelled in terms of having the labels the wrong way around. I meant that the points you describe aren't necessarily the ends of the spectrum -- for instance, worse than just losing all alignment knowledge is losing all the alignment knowledge while keeping all of the knowledge about how to build highly effective AI.

At least that's what I had in mind at the time of writing my comment. I'm now wondering if it would actually be better to keep the capabilities knowledge, because it makes it easier to do meaningful alignment work as you do the rerun. It's plausible that this is actually more important than the more explicitly "alignment" knowledge. (Assuming that compute will be the bottleneck.)

You're discussing catastrophes that are big enough to set the world back by at least 100 years. But I'm wondering if a smaller threshold might be appropriate. Setting the world back by even 10 years could be enough to mean re-running a lot of the time of perils; and we might think that catastrophes of that magnitude are more likely. (This is my current view.)

With the smaller setbacks you probably have to get more granular in terms of asking "in precisely which ways is this setting us back?", rather than just analysing it in the abstract. But that can just be faced.

Why do you think alignment gets solved before reasonably good global governance? It feels to me pretty up in the air which target we should be aiming to hit first. (Hitting either would help us with the other. I do think that we likely want to get important use out of AI systems before we establish good global governance; but that we might want to then do the governance thing to establish enough slack to take the potentially harder parts of the alignment challenge slowly.)

On section 4, where you ask about retaining alignment knowledge:

  • It feels kind of like you're mislabelling the ends of the spectrum?
  • My guess is that rather than asking "how much alignment knowledge is lost?", you should be asking about the differential between how much AI knowledge is lost and how much alignment knowledge is lost
  • I'm not sure that's quite right either, but it feels a little bit closer?

For much of the article, you talk about post-AGI catastrophe. But when you first introduce the idea in section 2.1, you say:

the period from now until we reach robust existential security (say, stable aligned superintelligence plus reasonably good global governance)

It seems to me like this is a much higher bar than reaching AGI -- and one for which the arguments that we could still be exposed to subsequent catastrophes seem much weaker. Did you mean to just say AGI here?

Yeah roughly the thought is "assuming concentrated power, it matters what the key powerful actors will do" (the liberal democracy comment was an aside saying that I think we should be conditioning on concentrated power).

And then for making educated guesses about what the key powerful actors will do, it seems especially important to me what their attitudes will be at a meta-level: how they prefer to work out what to do, etc. 

I might have thought that some of the most important factors would be things like: 

  • How likely is leadership to pursue intelligence enhancement, given technological opportunity?
  • How likely is leadership to pursue wisdom enhancement, given technological opportunity? 

(Roughly because: either power is broadly distributed, in which case your comments about liberal democracy don't seem to have so much bite; or it's not, in which case it's really the values of leadership that matter.) But I'm not sure you really touch on these. Interested if you have thoughts.
