OCB

Owen Cotton-Barratt

Yep, I guess I'm into people trying to figure out what they think and which arguments seem convincing, and I think that it's good to highlight sources of perspectives that people might find helpful-according-to-their-own-judgement for that. I do think I have found Drexler's writing on AI singularly helpful for my inside-view judgements.

That said: absolutely seems good for you to offer counterarguments! Not trying to dismiss that (but I did want to explain why the counterargument wasn't landing for me).

On Dichotomy:

  • Because you've picked a particularly strong form of Maxipok to argue against, you're pushed into choosing a particularly strong form of Dichotomy that would be necessary to support it
  • But I think that this strong form of Dichotomy is relatively implausible to start with
    • And I would guess that Bostrom at the time of writing the article would not have supported it; certainly the passages you quote feel to me like they're supporting something weaker
  • Here's a weaker form of Dichotomy that I feel much more intuitive sympathy for:
    • Most things that could be "locked in" such that they have predictable long-term effects on the total value of our future civilization, and move us away from the best outcomes, actually constrain us to worlds which are <10% as good as the worlds without any such lock-in (and would therefore count as existential catastrophes in their own right)
  • The word "most" is doing work there, and I definitely don't think it's absolute (e.g. as you point out, the idea of dividing the universe up 50/50 between a civilization that will do good things with it and one that won't); but it could plausibly still be enough to guide a lot of our actions

Looking at the full article:

  • OK I think I much more strongly object to the frame in this forum post than in the research article -- in particular, the research article is clear that it's substituting in a precisification you call Maxipok for the original principle
  • But I'm not sure what to make of this substitution! Even when I would have described myself as generally bought into Maxipok, I'm not sure if I would have been willing to sign up to this "precisification", which it seems to me is much stronger
    • In particular, your version is a claim about the existence of actions which are (close to) the best in various ways; whereas in order to discard Maxipok I would have wanted not just an existence proof, but practical guidelines for finding better things
  • You do provide some suggestions for finding better things (which is great), but you don't directly argue that trying to pursue those would be better in expectation than trying to follow Maxipok (or argue about in which cases it would be better)
    • This makes me feel that there's a bit of a motte-and-bailey: you've set up a particularly strong precisification of Maxipok (which it's not clear to me that e.g. Bostrom would have believed at the time of writing the paper you are critiquing); then you argue somewhat compellingly against it; then you conclude that it would be better if people did {a thing you like but haven't really argued for} instead

(Having just read the forum summary so far) I think there's a bunch of good exploration of arguments here, but I'm a bit uncomfortable with the framing. You talk about "if Maxipok is false", but this seems to me like a type error. Maxipok, as I understand it, is a heuristic: it's never going to give the right answer 100% of the time, and the right lens for evaluating it is how often it gives good answers, especially compared to other heuristics the relevant actors might reasonably have adopted.

Quoting from the Bostrom article you link:

At best, maxipok is a rule of thumb or a prima facie suggestion.

It seems to me like when you talk about maxipok being false, you are really positing something like:

Strong maxipok: The domain of applicability of maxipok is broad, so that pretty much all impartial consequentialist actors should adopt it as a guiding principle

Whereas maxipok is a heuristic (which can't have truth values), strong maxipok (as I'm defining it here) is a normative claim, and can have truth values. I take it that this is what you are mostly arguing against -- but I'd be interested in your takes; maybe it's something subtly different.

I do think that this isn't a totally unreasonable move on your part. I think Bostrom writes in some ways in support of strong maxipok, and sometimes others have invoked it as though in the strong form. But I care about our collectively being able to have conversations about the heuristic, which I think may have a good amount of value even if strong maxipok is false, and I worry that in conflating the two you make it harder for people to hold or talk about those distinctions.

(FWIW I've also previously argued against strong maxipok, even while roughly accepting Dichotomy, on the basis that other heuristics may be more effective.)

I think Eric has been strong at making reasoned arguments about the shape of possible future technologies, and helping people to look at things for themselves. I wouldn't have thought of him (even before looking at this link[1]) as particularly good at making quantitative estimates about timelines, which in any case is something he doesn't seem to do much of.

Ultimately I am not suggesting that you defer to Drexler. I am suggesting that you may find reading his material a good time investment for spurring your own thoughts. This is something you can test for yourself (I'm sure that it won't be a good fit for everyone).

  1. ^

    And while I do think it's interesting, I'm wary of drawing overly strong conclusions from that for a couple of reasons:

    1. If, say, all this stuff now happened in the next 30 years, so that he was in some sense just off by a factor of two, how would you think his predictions had done? It seems to me this would be mostly a win for him; and I do think that it's quite plausible that it will mostly happen within 30 years (and more likely still within 60).
    2. That was 30 years ago; I'm sure that he is in some ways a different person now.

I'm not sure. I think there are versions of things here which are definitely not convergence (straightforward acausal trade between people who understand their own values is of this type), but I have some feeling like there might be extra reasons for convergence from people observing the host, and having that fact feed into their own reflective process. 

(Indeed, I'm not totally sure there's a clean line between convergence and trade.)

I think there's something interesting to this argument, although I think it may be relying on a frame where AI systems are natural agents, in particular at this step:

a strategically and philosophically competent AI should seemingly have its own moral uncertainty and pursue its own "option value maximization" rather than blindly serve human interests/values/intent

It's not clear to me why the key functions couldn't be more separated, or whether the conflict you're pointing to persists across such separation. For instance, we might have a mix of:

  • Systems which competently pursue philosophy research (but do not have a sense of self that they are acting with regard to)
  • Systems which are strategic (including drawing on the fruits of the philosophy research), on behalf of human institutions or individuals
  • Systems which are instruction-following tools (which don't aspire to philosophical competence), rather than independent agents

I mean "not clear to me" very literally here -- I think that perhaps some version of your conflict will pose a challenge to such setups. But I'm responding with this alternate frame in the hope that it will be useful in advancing the conversation.

My belief is that the Open Philanthropy Project, EA generally, and Oxford EA particularly, had bad AI timelines and bad ASI ruin conditional probabilities; and that these invalidly arrived-at beliefs were in control of funding, and were explicitly publicly promoted at the expense of saner beliefs.

There is a surprising amount of normative judgment in here for a fact check. Are you looking just for disagreements with the claim that people held roughly the beliefs you later outline (I think you overstate things but are directionally correct in describing how beliefs differed from yours), or also for disagreements about whether they were bad beliefs?

For flavour: as I ask that question, I'm particularly (but not only) thinking of the reports you cite, where you seem to be casting them as "OP really throwing its weight behind these beliefs", and I perceived them more as "earnest attempts by people at OP to figure out what was legit, and put their reasoning in public to let others engage". I certainly didn't just agree with them at the time, but I thought it was a good step forwards for collective epistemics to be able to have conversations at that level of granularity. Was it confounding that they were working at a big funder? Yeah, kinda -- but that seemed second-order compared to it just being great that anyone at all was pushing the conversation forwards in this way, even if there were a bunch of aspects of them I wasn't on board with. I'm not sure if this is the kind of disagreement you're looking for. (Maybe it's just that I was on board with more of them than you were, and so I saw them as flawed-but-helpful rather than unhelpful? Then we get to the general question of what standards "bad" should be judged by, given our lack of access to ground truth.)

I think my impression is that the strategic upshots of this are directionally correct, but maybe not a huge deal? I'm not sure if you agree with that.

Sorry, I didn't mean mislabelled in terms of having the labels the wrong way around. I meant that the points you describe aren't necessarily the ends of the spectrum -- for instance, worse than just losing all alignment knowledge is losing all the alignment knowledge while keeping all of the knowledge about how to build highly effective AI.

At least that's what I had in mind at the time of writing my comment. I'm now wondering if it would actually be better to keep the capabilities knowledge, because it makes it easier to do meaningful alignment work as you do the rerun. It's plausible that this is actually more important than the more explicitly "alignment" knowledge. (Assuming that compute will be the bottleneck.)
