All of Owen Cotton-Barratt's Comments + Replies

Yep, I guess I'm into people trying to figure out what they think and which arguments seem convincing, and I think that it's good to highlight sources of perspectives that people might find helpful-according-to-their-own-judgement for that. I do think I have found Drexler's writing on AI singularly helpful for my inside-view judgements.

That said: absolutely seems good for you to offer counterarguments! Not trying to dismiss that (but I did want to explain why the counterargument wasn't landing for me).

On Dichotomy:

  • Because you've picked a particularly strong form of Maxipok to argue against, you're pushed into choosing a particularly strong form of Dichotomy that would be necessary to support it
  • But I think that this strong form of Dichotomy is relatively implausible to start with
    • And I would guess that Bostrom at the time of writing the article would not have supported it; certainly the passages you quote feel to me like they're supporting something weaker
  • Here's a weaker form of Dichotomy that I feel much more intuitive sympathy for:
    • Most things that coul
... (read more)

Looking at the full article:

  • OK I think I much more strongly object to the frame in this forum post than in the research article -- in particular, the research article is clear that it's substituting in a precisification you call Maxipok for the original principle
  • But I'm not sure what to make of this substitution! Even when I would have described myself as generally bought into Maxipok, I'm not sure if I would have been willing to sign up to this "precisification", which it seems to me is much stronger
    • In particular, your version is a claim about the existen
... (read more)

(Having just read the forum summary so far) I think there's a bunch of good exploration of arguments here, but I'm a bit uncomfortable with the framing. You talk about "if Maxipok is false", but this seems to me like a type error. Maxipok, as I understand it, is a heuristic: it's never going to give the right answer 100% of the time, and the right lens for evaluating it is how often it gives good answers, especially compared to other heuristics the relevant actors might reasonably have adopted.

Quoting from the Bostrom article you link:

At best, maxipok is a

... (read more)
6
Owen Cotton-Barratt
On Dichotomy:

  • Because you've picked a particularly strong form of Maxipok to argue against, you're pushed into choosing a particularly strong form of Dichotomy that would be necessary to support it
  • But I think that this strong form of Dichotomy is relatively implausible to start with
    • And I would guess that Bostrom at the time of writing the article would not have supported it; certainly the passages you quote feel to me like they're supporting something weaker
  • Here's a weaker form of Dichotomy that I feel much more intuitive sympathy for:
    • Most things that could be "locked in" such that they have predictable long-term effects on the total value of our future civilization, and move us away from the best outcomes, actually constrain us to worlds which are <10% as good as the worlds without any such lock-in (and would therefore count as existential catastrophes in their own right)
    • The word "most" is doing work there, and I definitely don't think it's absolute (e.g. as you point out, the idea of dividing the universe up 50/50 between a civilization that will do good things with it and one that won't); but it could plausibly still be enough to guide a lot of our actions
6
Owen Cotton-Barratt
Looking at the full article:

  • OK I think I much more strongly object to the frame in this forum post than in the research article -- in particular, the research article is clear that it's substituting in a precisification you call Maxipok for the original principle
  • But I'm not sure what to make of this substitution! Even when I would have described myself as generally bought into Maxipok, I'm not sure if I would have been willing to sign up to this "precisification", which it seems to me is much stronger
    • In particular, your version is a claim about the existence of actions which are (close to) the best in various ways; whereas in order to discard Maxipok I would have wanted not just an existence proof, but practical guidelines for finding better things
    • You do provide some suggestions for finding better things (which is great), but you don't directly argue that trying to pursue those would be better in expectation than trying to follow Maxipok (or argue about in which cases it would be better)
  • This makes me feel that there's a bit of a motte-and-bailey: you've set up a particularly strong precisification of Maxipok (that it's not clear to me e.g. Bostrom would have believed at the time of writing the paper you are critiquing); then you argue somewhat compellingly against it; then you conclude that it would be better if people did {a thing you like but haven't really argued for} instead

I think Eric has been strong at making reasoned arguments about the shape of possible future technologies, and at helping people to look at things for themselves. I wouldn't have thought of him (even before looking at this link[1]) as particularly good at making quantitative estimates about timelines; which in any case is something he doesn't seem to do much of.

Ultimately I am not suggesting that you defer to Drexler. I am suggesting that you may find reading his material a good time investment for spurring your own thoughts. This is something you can t... (read more)

5
titotal
I guess this is kind of my issue, right? He's been quite strong at putting forth arguments about the shape of the future that were highly persuasive and yet turned out to be badly wrong.[1] I'm concerned that this does not seem to have affected his epistemic authority in these sorts of circles.

You may not be "deferring" to Drexler, but you are singling out his views as singularly important (you have not made similar posts about anybody else[2]). There are hundreds of people discussing AI at the moment, a lot of them with a lot more expertise, and a lot of whom have not been badly wrong about the shape of the future.

Anyway, I'm not trying to discount your arguments either; I'm sure you have found valuable stuff in it. But if this post is making a case for reading Drexler despite him being difficult, I'm allowed to make the counterargument.

1. ^ In answer to your footnote: If more than one of those things occurs in the next thirty years, I will eat a hat.
2. ^ If this is the first in a series, feel free to discount this.

I'm not sure. I think there are versions of things here which are definitely not convergence (straightforward acausal trade between people who understand their own values is of this type), but I have some feeling like there might be extra reasons for convergence from people observing the host, and having that fact feed into their own reflective process. 

(Indeed, I'm not totally sure there's a clean line between convergence and trade.)

I think there's something interesting to this argument, although I think it may be relying on a frame where AI systems are natural agents, in particular at this step:

a strategically and philosophically competent AI should seemingly have its own moral uncertainty and pursue its own "option value maximization" rather than blindly serve human interests/values/intent

It's not clear to me why the key functions couldn't be more separated, or whether the conflict you're pointing to persists across such separation. For instance, we might have a mix of:

  • Systems which co
... (read more)

My belief is that the Open Philanthropy Project, EA generally, and Oxford EA particularly, had bad AI timelines and bad ASI ruin conditional probabilities; and that these invalidly arrived-at beliefs were in control of funding, and were explicitly publicly promoted at the expense of saner beliefs.

There is a surprising amount of normative judgment in here for a fact check. Are you looking just for disagreements with the claim that people held roughly the beliefs you later outline (I think you overstate things but are directionally correct in describing how beliefs differe... (read more)

My view of the tragedy of OpenPhil is indeed that they were very earnest people trying to figure out what was legit, but ended up believing stuff like "biologically anchored estimates of AI timelines" that were facially absurd and wrong and ultimately self-serving, because the problem "end up with beliefs about AI timelines that aren't influenced by what plays well with our funders and friends" was hard and frankly out of their league and OpenPhil did not know that it was a hard problem or treat it with what I would consider seriousness.

If you'd like to vi... (read more)

I think my impression is that the strategic upshots of this are directionally correct, but maybe not a huge deal? I'm not sure if you agree with that.

Sorry, I didn't mean mislabelled in terms of having the labels the wrong way around. I meant that the points you describe aren't necessarily the ends of the spectrum -- for instance, worse than just losing all alignment knowledge is losing all the alignment knowledge while keeping all of the knowledge about how to build highly effective AI.

At least that's what I had in mind at the time of writing my comment. I'm now wondering if it would actually be better to keep the capabilities knowledge, because it makes it easier to do meaningful alignment work as you do the rerun. It's plausible that this is actually more important than the more explicitly "alignment" knowledge. (Assuming that compute will be the bottleneck.)

You're discussing catastrophes that are big enough to set the world back by at least 100 years. But I'm wondering if a smaller threshold might be appropriate. Setting the world back by even 10 years could be enough to mean re-running a lot of the time of perils; and we might think that catastrophes of that magnitude are more likely. (This is my current view.)

With the smaller setbacks you probably have to get more granular in terms of asking "in precisely which ways is this setting us back?", rather than just analysing it in the abstract. But that can just be faced.

2
OscarD🔸
Yes, I think the '100 years' criterion isn't quite what we want. E.g. if there is a catastrophic setback more than 100 years after we build an aligned ASI, then we don't need to rerun the alignment problem. (In practice, perhaps 100 years should be ample time to build good global governance and reduce catastrophic setback risk to near 0, but conceptually we want to clarify this.) And I agree with Owen that shorter setbacks also seem important. In fact, in a simple binary model we could just define a catastrophic setback to be one that takes you from a society that has built aligned ASI to one where all aligned ASIs are destroyed. I.e. the key thing is not how many years back you go, but whether you regress back beneath the critical 'crunch time' period.

Why do you think alignment gets solved before reasonably good global governance? It feels to me pretty up in the air which target we should be aiming to hit first. (Hitting either would help us with the other. I do think that we likely want to get important use out of AI systems before we establish good global governance; but that we might want to then do the governance thing to establish enough slack to take the potentially harder parts of the alignment challenge slowly.)

On section 4, where you ask about retaining alignment knowledge:

  • It feels kind of like you're mislabelling the ends of the spectrum?
  • My guess is that rather than think about "how much alignment knowledge is lost?", you should be asking about the differential between how much AI knowledge is lost and how much alignment knowledge is lost
  • I'm not sure that's quite right either, but it feels a little bit closer?
2
William_MacAskill
Okay, looking at the spectrum again, it still seems to me like I've labelled them correctly? Maybe I'm missing something. It's optimistic if we can retain knowledge of how to align AGI, because then we can just use that knowledge later and we don't face the same magnitude of risk from misaligned AI.

For much of the article, you talk about post-AGI catastrophe. But when you first introduce the idea in section 2.1, you say:

the period from now until we reach robust existential security (say, stable aligned superintelligence plus reasonably good global governance)

It seems to me like this is a much higher bar than reaching AGI -- and one for which the arguments that we could still be exposed to subsequent catastrophes seem much weaker. Did you mean to just say AGI here?

3
William_MacAskill
Thanks, that's a good catch. Really, in the simple model the relevant point in time for the first run should be when the alignment challenge has been solved, even for superintelligence. But that's before "reasonably good global governance". Of course, there's an issue that this is trying to model alignment as a binary thing for simplicity, even though really if a catastrophe came when half of the alignment challenge had been solved, that would still be a really big deal for similar reasons to the paper. One additional comment is that this sort of "concepts moving around" issue is one of the things that I've found most annoying from AI, and where it happens quite a lot. You need to try and uproot these issues from the text, and this was a case of me missing it.

Yeah roughly the thought is "assuming concentrated power, it matters what the key powerful actors will do" (the liberal democracy comment was an aside saying that I think we should be conditioning on concentrated power).

And then for making educated guesses about what the key powerful actors will do, it seems especially important to me what their attitudes will be at a meta-level: how they prefer to work out what to do, etc. 

I might have thought that some of the most important factors would be things like: 

  • How likely is leadership to pursue intelligence enhancement, given technological opportunity?
  • How likely is leadership to pursue wisdom enhancement, given technological opportunity? 

(Roughly because: either power is broadly distributed, in which case your comments about liberal democracy don't seem to have so much bite; or it's not, in which case it's really the values of leadership that matter.) But I'm not sure you really touch on these. Interested if you have thoughts.

2
OscarD🔸
Not sure I follow properly - why would liberal democracy not matter? I think whether biological humans are themselves enhanced in various ways matters less than whether they are getting superhuman (and perhaps super-wise) advice. Though possibly wisdom is different and you need the principal to themselves be wise, rather than just getting wise advice.

Thanks AJ!

My impression is that although your essay frames this as a deep disagreement, in fact you're reacting to something that we're not saying. I basically agree with the heart of the content here -- that there are serious failure modes to be scared of if attempting to orient to the long term, and that something like loop-preservation is (along with the various more prosaic welfare goods we discussed) essential for the health of even a strict longtermist society.

However, I think that what we wrote may have been compatible with the view that you have such a negative reaction to, and at minimum I wish that we'd spent some more words exploring this kind of dynamic. So I appreciate your response.

-2
AJ van Hoek
Thanks for the generous response. You write that we "may have been compatible" and I'm "reacting to something you're not saying." Here's my concern: I've come to recognize that reality operates as a dynamic network—nodes (people, institutions) whose capacity is constituted by the relationships among them. This isn't just a modeling choice; it's how cities function, how pandemics spread, how states maintain capacity. You don't work from this explicit recognition.

This creates an asymmetry. Once you see reality as a network, your Section 5 framework becomes incompatible with mine—not just incomplete, but incoherent. You explicitly frame the state as separate from people, optimizing for longtermist goals while managing preferences as constraints. But from the network perspective, this separation doesn't exist—the state's capacity just IS those relationships. You can't optimize one while managing the other.

Let me try to say this more directly: I've come to understand my own intelligence as existing not AT my neurons, but BETWEEN them—as a pattern of activation across connections. I am the edge, not the node. And I see society the same way: capacity isn't located IN institutions, it emerges FROM relationships. From this perspective, your Section 5 (state separate from people) isn't a simplification—it's treating edges as if they were nodes, which fundamentally misunderstands what state capacity is.

That's the asymmetry: your explicit framing (state separate from people) is incompatible with how I now understand reality. But if you haven't recognized the network structure, you'd just see my essay as "adding important considerations" rather than revealing a foundational incompatibility. Does this help clarify where I'm coming from?

That makes sense! 

(I'm curious how much you've invested in giving them detailed prompts about what information to assess in applying particular tags, or even more structured workflows, vs just taking smart models and seeing if they can one-shot it; but I don't really need to know any of this.)

If you want independent criteria-based judgements, it might realistically be a good option to have the judgements made by an LLM -- with the benefit of having the classification instantly (as a bonus you could publish the prompt used, so the judgements would be easier for people to audit).
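(Not a workflow anyone in this thread has built -- just a minimal sketch of what an auditable LLM tagging step could look like, assuming the OpenAI Python client; the model name and the TAG_PROMPT rubric are placeholders, not an actual Forum rubric.)

```python
# Hypothetical sketch: criteria-based tagging with a publishable prompt.
# TAG_PROMPT is illustrative only; any real deployment would need accuracy
# evaluation first (current accuracy is the concern Will raises below).
from openai import OpenAI

TAG_PROMPT = (
    "You are tagging EA Forum posts. Apply a tag only if the post's main "
    "subject matches that tag. Reply with a comma-separated list of tags "
    "drawn from: {tags}. Reply with 'none' if no tag applies."
)

def suggest_tags(post_text: str, tags: list[str], model: str = "gpt-4o-mini") -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": TAG_PROMPT.format(tags=", ".join(tags))},
            {"role": "user", "content": post_text[:8000]},  # truncate very long posts
        ],
        temperature=0,  # deterministic-ish judgements are easier to audit
    )
    return response.choices[0].message.content

# Publishing TAG_PROMPT and the tag list alongside the classifier is what would
# make the judgements auditable, which is the upside suggested above.
```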

4
Will Aldred
Fyi, the Forum team has experimented with LLMs for tagging posts (and for automating some other tasks, like reviewing new users), but so far none have been accurate enough to rely on. Nonetheless, I appreciate your comment, since we weren’t really tracking the transparency/auditing upside of using LLMs.

Ok thanks, I think it's fair to call me on this (I realise the question of what Thiel actually thinks is not super interesting to me, compared to "does this critique contain inspiration for things to be aware of that I wasn't previously really tracking"; but I get that most people probably aren't orienting similarly, and I was kind of assuming that they were when I suggested this was why it was getting sympathy).

I do think though that there's a more nuanced point here than "trying too hard to do good can result in harm". It's more like "over-claiming about ho... (read more)

I think that the theology is largely a distraction from the reason this is attracting sympathy, which I'd guess to be more like: 

  • If you have some ideas which are pretty good, or even very good, but they present as though they're the answer needed for everything, and they're not, that could be quite destructive (and potentially very-net-bad, even while the ideas were originally obviously-good)
  • This is at least a plausible failure mode for EA, and correspondingly worth some attention/wariness
  • This kind of concern hasn't gotten much airtime before (and is perhaps easier to express and understand as a serious possibility with some of the language-that-I-interpret-metaphorically); 
7
David T
Feels like the argument you've constructed is a better one than the one Thiel is actually making, which seems to be a very standard "evil actors often claim to be working for the greater good" argument with a libertarian gloss. Thiel doesn't think redistribution is an obviously good idea that might backfire if it's treated as too important; he actively loathes it.

I think the idea that trying too hard to do good things can end up doing harm is absolutely a failure mode worth considering, but it has far more value in the context of specific examples. It seems like quite a common theme in AGI discourse (it follows from standard assumptions like AGI being near and potentially either incredibly beneficial or destructive, research or public awareness either potentially solving the problem or starting a race, etc.), and the optimiser's curse is a huge concern for EA cause prioritization overindexing on particular data points. Maybe that deserves (even) more discussion.

But I don't think a guy who doubts we're on the verge of an AI singularity and couldn't care less whether EAs encourage people to make the wrong tradeoffs between malaria nets, education and shrimp welfare adds much to that debate, particularly not with a throwaway reference to EA in a list of philosophies popular with the other side of the political spectrum that he thinks are basically the sort of thing the Antichrist would say. I mean, he is also committed to the somewhat less insane-sounding "growth is good even if it comes with risks" argument, but you can probably find more sympathetic and coherent and less interest-conflicted proponents of that view.

is that you feel that moral statements are not as evidently subjective as say, 'Vanilla ice-cream is the best flavor' but not as objective as, say 'An electron has a negative charge', as living in some space of in-betweeness with respect to those two extremes

I think that's roughly right. I think that they are unlikely to be more objective than "blue is a more natural concept than grue", but that there's a good chance that they're about the same as that (and my gut take is that that's pretty far towards the electron end of the spectrum; but perhaps I'm conf... (read more)

See my response to Manuel -- I don't think this is "proving moral realism", but I do think it would be pointing at something deeper and closer-to-objective than "happen to have the same opinions".

I'm not sure what exactly "true" means here.

Here are some senses in which it would make morality feel "more objective" rather than "more subjective":

  • I can have the experience of having a view, and then hearing an argument, and updating. My stance towards my previous view then feels more like "oh, I was mistaken" (like if I'd made a mathematical error) rather than "oh, my view changed" (like getting myself to like the taste of avocado when I didn't use to).
  • There can exist "moral experts", whom we would want to consult on matters of morality. Broadly, we sh
... (read more)
1
Manuel Del Río Rodríguez 🔹
Terminology can be a bugger in these discussions. I think we are accepting, as per BB's own definition at the start of the thread, that Moral Realism would basically reduce to accepting a stance-independent view that moral truths exist. As for truth, I would mean it in the way it gets used when studying other, stance-independent objects, i.e., electrons exist and their existence is independent of human minds and/or of humans having ever existed, and saying 'electrons exist' is true because of their correspondence to objects of an external, human-independent reality.

What I take from your examples (correct me if I am wrong or if I misrepresent you) is that you feel that moral statements are not as evidently subjective as say, 'Vanilla ice-cream is the best flavor' but not as objective as, say 'An electron has a negative charge', as living in some space of in-betweeness with respect to those two extremes. I'd still call this anti-realism, as you're just switching from a maximally subjective stance (an individual's particular culinary tastes) to a more general, but still stance-dependent one (what a group of experts and/or human and some alien minds might possibly agree upon). I'd say again, an electron doesn't care for what a human or any other creature thinks about its electric charge.

As for each of the bullet points, what I'd say is:

1. I can see why you'd feel the change from a previous view can be seen as a mistake rather than a preference change (when I first started thinking about morality I felt very strongly inclined to the strongest moral realism, and I now feel that pov was wrong), but this doesn't imply moral realism as much as that it feels as if moral principles and beliefs have objective truth status, even if they were actually a reorganization of stance-dependent beliefs.
2. I, on the contrary, don't feel like there could be 'moral experts' - at most, people who seem to live up to their moral beliefs, whatever the knowledge and reasons for having

Locally, I think that often there will be some cluster of less controversial common values like "caring about the flourishing of society" which can be used to derive something like locally-objective conclusions about moral questions (like whether X is wrong).

Globally, an operationalization of morality being objective might be something like "among civilizations of evolved beings in the multiverse, there's a decently big attractor state of moral norms that a lot of the civilizations eventually converge on".

6
Neel Nanda
Less controversial is a very long way from objective - why do you think that "caring about the flourishing of society" is objectively ethical?

Re the idea of an attractor, idk, history has sure had a lot of popular beliefs I find abhorrent. How do we know there even is convergence at all rather than cycles? And why does being convergent imply objective? If you told me that the supermajority of civilization concluded that torturing criminals was morally good, that would not make me think it was ethical.

My overall take is that objective is just an incredibly strong word for which you need incredibly strong justifications, and your justifications don't seem close; they seem more about "this is a Schelling point" or "this is a reasonable default that we can build a coalition around"
3
Robi Rahman🔸
No, that wouldn't prove moral realism at all. That would merely show that you and a bunch of aliens happen to have the same opinions.
1
Manuel Del Río Rodríguez 🔹
I don't think I have much to object to that, but I do think that doesn't look at all like 'stance independent' if we're using that as the criterion for ethical realism. What you're saying seems to boil down, if I understand it correctly, to: 'given a bunch of intelligent creatures with some shared psychological perceptions of the world and some tendency towards collaboration, it is pretty likely they'll end up arriving at a certain set of shared norms that optimize towards their well-being as a group (and in most cases, as individuals)'. That makes the 'state of moral norms that a lot of the civilizations eventually converge on' something useful for ends x, y, z, but not 'true' and 'independent of human or alien minds'.

Ok but jtbc that characterization of "affronted" is not the hypothesis I was offering (I don't want to say it wasn't a part of the downvoting, but I'd guess a minority).

I would personally kind of like it if people actively explored angles on things more. But man, there are so many things to read on AI these days that I do kind of understand when people haven't spent time considering things I regard as critical path (maybe I should complain more!), and I honestly find it hard to fault people too much for using "did it seem wrong near the start in a way that makes it harder to think" as a heuristic for how deeply to engage with material.

I'm curious whether you're closer to angry that someone might read your opening paragraph as saying "you should discard the concept of warning shots" or angry that they might disagree-vote if they read it that way (or something else).

9
Holly Elmore ⏸️ 🔸
No, I'm angry that people feel affronted by me pointing out that normal warning shot discourse entailed hoping for a disaster without feeling much need to make sure that would be helpful. They should be glad that they have a chance to catch themselves, but instead they silently downvote.

Just feels like so much of the vibe of this forum is people expecting to be catered to, like their support is some prize, rather than people wanting to find out for themselves how to help the world. A lot of EAs have felt comfortable dismissing PauseAI bc it's not their vibe or they didn't feel like the case was made in the right way or they think their friends won't support it, and it drives me crazy bc aren't they curious??? Don't they want to think about how to address AI danger from every angle?

Not sure quite what to say here. I think your post was valuable and that's why I upvoted it. You were expressing confusion about why anyone would disagree, and I was venturing a guess.

I don't think gentleness = ego service (it's an absence of violence, not a positive thing). But also I don't think you owe people gentleness. However, I do think that when you're not gentle (especially ontologically gentle) you make it harder for people to hear you. Not because of emotional responses getting in the way (though I'm sure that happens sometimes), but literally b... (read more)

2
Holly Elmore ⏸️ 🔸
I was curious about guesses as to why this happens to me lately (a lot of upfront disagree votes and karma hovering around zero until the views are high enough) but getting that answer is still pretty hard for me to hear without being angry.

By "ontologically ungentle" I mean (roughly) it feels like you're trying to reach into my mind and tell me that my words/concepts are wrong. As opposed to writing which just tells me that my beliefs are wrong (which might still be epistemically ungentle), or language which just provides evidence without making claims that could be controversial (gentle in this sense, kind of NVC-style).

I do feel a bit of this ungentleness in that opening paragraph towards my own ontology, and I think it put me more on edge reading the rest of the post. But as I said, I didn't disagree-vote; I was just trying to guess why others might have.

-5
Holly Elmore ⏸️ 🔸

Right ... so actually I think you're just doing pretty well at this in the latter part of the article.

But at the start you say things like:

There’s this fantasy of easy, free support for the AI Safety position coming from what’s commonly called a “warning shot”. The idea is that AI will cause smaller disasters before it causes a really big one, and that when people see this they will realize we’ve been right all along and easily do what we suggest.

What this paragraph seems to do is to push the error-in-beliefs that you're complaining about down into the ver... (read more)

3
Holly Elmore ⏸️ 🔸
Did you feel treated ungently for your warning shots take? Or is this just on behalf of people who might? Also, can you tell me what you mean by "ontologically ungentle"? It sounds worryingly close to a demand that the writer think all the readers are good. I do want to confront people with the fact they've been lazily hoping for violence, if that's in fact what they've been doing.

Honestly, maybe you should try telling me? Like, just write a paragraph or two on what you think is valuable about the concept / where you would think it's appropriate to be applying it?

(Not trying to be clever! I started trying to think about what I would write here and mostly ended up thinking "hmm I bet this is stuff Holly would think is obvious", and to the extent that I may believe you're missing something, it might be easiest to triangulate by hearing your summary of what the key points in favour are.)

2
Holly Elmore ⏸️ 🔸
I thought I was giving the strong version. I have never heard an account of a warning shot theory of change that wasn't "AI will cause a small-scale disaster and then the political will to do something will materialize". I think the strong version would be my version: educating people first so they can understand small-scale disasters that may occur for what they are. I have never seen or heard this advocated in AI Safety circles before. And I described how impactful ChatGPT was on me, which imo was a warning shot gone right in my case.

I upvoted and didn't disagree-vote the original post (and generally agree with you on a bunch of the object level here!); however, I do feel some urge-towards-expressing-disagreement, which is something like: 

  • Less disagreeing with claims; more disagreeing with frames?
  • Like: I feel the discomfort/disagreement less when you're talking about what will happen, and more when you're talking about how people think about warning shots
  • Your post feels something like ... intellectually ungenerous? It's not trying to look for the strongest version of the warning s
... (read more)
2
Holly Elmore ⏸️ 🔸
What is the “strong” version of warning shots thinking?

OK I see the model there.

I guess it's not clear to me if that should hold if I think that most experiment compute will be ~training, and most cognitive labour compute will be ~inference?

However, over time maybe more experiment compute will be ~inference, as it shifts more to being about producing data rather than testing architectures? That could push back towards this being a reasonable assumption. (Definitely don't feel like I have a clear picture of the dynamics here, though.)

hmm, I think I would expect different experience curves for the efficiency of running experiments vs producing cognitive labour (with generally smaller efficiency boosts over time for running experiments). Is there any reason to expect them to behave similarly?

(Though I think I agree with the qualitative point that you could get a software-only intelligence explosion even if you can't do this with human-only research input, which was maybe your main point.)

6
Tom_Davidson
Agree that I wouldn't particularly expect the efficiency curves to be the same. But if phi > 0 for both types of efficiency, then I think this argument will still go through. To put it in math, there would be two types of AI software technology, one for experimental efficiency and one for cognitive labour efficiency: A_exp and A_cog. The equations are then:

dA_exp = A_exp^phi_exp F(A_exp K_res, A_cog K_inf)
dA_cog = A_cog^phi_cog F(A_exp K_res, A_cog K_inf)

And then I think you'll find that, even with sigma < 1, it explodes when phi_exp > 0 and phi_cog > 0.
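(A minimal numerical sketch, not from the original exchange: it Euler-integrates the two equations above, taking F to be a CES aggregator with sigma < 1 and using made-up parameter values, just to illustrate the qualitative claim that the system blows up when phi_exp > 0 and phi_cog > 0 but only grows exponentially when both are 0.)

```python
# Illustrative simulation of the two-equation model above.
# All parameter values are assumptions chosen for demonstration only.

def ces(x, y, sigma=0.5, share=0.5):
    """CES aggregator; sigma < 1 makes the two inputs gross complements."""
    rho = 1.0 - 1.0 / sigma  # from sigma = 1 / (1 - rho)
    return (share * x**rho + (1.0 - share) * y**rho) ** (1.0 / rho)

def simulate(phi_exp, phi_cog, K_res=1.0, K_inf=1.0, dt=1e-4, t_max=20.0, cap=1e12):
    """Euler-integrate dA_exp and dA_cog; return the blow-up time, or None if none."""
    A_exp, A_cog, t = 1.0, 1.0, 0.0
    while t < t_max:
        F = ces(A_exp * K_res, A_cog * K_inf)
        A_exp += dt * A_exp**phi_exp * F
        A_cog += dt * A_cog**phi_cog * F
        t += dt
        if max(A_exp, A_cog) > cap:
            return t  # numerical stand-in for finite-time explosion
    return None

print(simulate(phi_exp=0.3, phi_cog=0.3))  # blows up in finite time
print(simulate(phi_exp=0.0, phi_cog=0.0))  # merely exponential: no blow-up by t_max
```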
4
Thomas Kwa🔹
If your algorithms get more efficient over time at both small and large scales, and experiments test incremental improvements to architecture or data, then they should get cheaper to run proportionally to algorithmic efficiency of cognitive labor. I think this is better as a first approximation than assuming they're constant, and might hold in practice especially when you can target small-scale algorithmic improvements.

Edit: I think that Neel's comment is basically just a better version of the stuff I was trying to say. (On the object level I'm a little more sympathetic than him to ways in which Mechanize might be good, although I don't really buy the story to that end that I've seen you present.)

Wanting to note that on my impressions, and setting aside who is correct on the object-level question of whether Mechanize's work is good for the world:

  • My best read of the situation is that Matthew has acted very reasonably (according to his beliefs), and that Holly has let hers
... (read more)

I feel like voicing that I centrally expect AI to continue to have bigger real-world impacts, but not get very weird until the 2030s. I think worlds where things go faster than that are a serious enough possibility to take seriously, but I think that the apparent zeitgeist suggests timelines which are a bit more aggressive than I think is justified.

The reason I want to say something is that I sort of suspect there are a bunch of people in a similar epistemic position -- where it doesn't seem like a priority to properly explore what % to put on craziness this decade; nor to get into big arguments about whether the zeitgeist is slightly off -- but for whom your comment might feel like something of a trap.

I guess I'm fairly sympathetic to this. It makes me think that voluntary safety policies should ideally include some meta-commentary about how companies view the purpose and value-add of the safety policy, and meta-policies about how updates to the safety policy will be made -- in particular, that it might be good to specify a period for public comments before a change is implemented. (Even a short period could be some value add.)

I appreciate the investigation here.

I'm not sure whether I agree that "quietly lowering the bar at the last minute so you can meet requirements isn't how safety policies are supposed to work". (Not sure I disagree; but going to try to articulate a case against).

I think in a world where you understand the risks well ahead of time of course this isn't how safety policies should work. In a world where you don't understand the risks well ahead of time, you can get more clarity as the key moments approach, and this could lead you to rationally judge that a lowe... (read more)

9
Ryan Greenblatt
From my perspective, a large part of the point of safety policies is that people can comment on the policies in advance and provide some pressure toward better policies. If policies are changed at the last minute, then the world may not have time to understand the change and respond before it is too late. So, I think it's good to create an expectation/norm that you shouldn't substantially weaken a policy right as it is being applied. That's not to say that a reasonable company shouldn't do this some of the time, just that I think it should by default be considered somewhat bad, particularly if there isn't a satisfactory explanation given. In this case, I find the object level justification for the change somewhat dubious (at least for the AI R&D trigger) and there is also no explanation of why this change was made at the last minute.

I appreciated you expressing this.

Riffing out loud ... I feel that there are different dynamics going on here (not necessarily in your case; more in general):

  1. The tensions where people don't act with as much integrity as is signalled
    • This is not a new issue for EA (it arises structurally despite a lot of good intentions, because of the encouragement to be strategic), and I think it just needs active cultural resistance
      • In terms of writing, I like Holden's and Toby's pushes on this; my own attempts here and here
      • But for this to go well, I think it's not enough
... (read more)

I wanted to share some insights from my reflection on my mistakes around attraction/power dynamics — especially something about the shape of the blindspots I had. My hope is that this might help to avert cases of other people causing harm in similar ways.

I don’t know for sure how helpful this will be; and I’m not making a bid for people to read it (I understand if people prefer not to hear more from me on this); but for those who want to look, I’ve put a couple of pages of material here.

These are in the same category because:

  • I'm talking about game-changing improvements to our capabilities (mostly via more cognitive labour; not requiring superintelligence)
  • These are the capacities that we need to help everyone to recognize the situation we're in and come together to do something about it (and they are partial substitutes: the better everyone's epistemics are, the less need for a big lift on coordination which has to cover people seeing the world very differently) 

I'm not actually making a claim about alignment difficulty -- beyond that... (read more)

I agree there are some possible attitudes that society could have towards AI development which could put us in a much safer position.

I think that the degree of consensus you'd need for the position that you're outlining here is practically infeasible, absent some big shift in the basic dynamics. I think that the possible shifts which might get you there are roughly:

  1. Scientific ~consensus -- people look to scientists for thought leadership on this stuff. Plausibly you could have a scientist-driven moratorium (this still feels like a stretch, but less than ju
... (read more)

Much better epistemics and/or coordination -- out of reach now, but potentially obtainable with stronger tech.

Why are these the same category and why are you writing coordination off as impossible? It's not. We have literally done global nonproliferation treaties before.

This bizarre notion got embedded early in EA that technological feats are possible and solving coordination problems is impossible. It's actually the opposite -- alignment is not tractable and coordination is.

I agree that there could be an effect that keeps people from speaking out about AI danger. But:

  • I think that such political incentives can occur whenever anyone is dealing with external power-structures, and in practice my impression is that these are a bigger deal for people who want jobs in AI policy compared to people engaged with frontier AI companies
  • This argument has most force in arguing that some EAs should keep professional and social distance from frontier AI companies, not that everyone should
  • Working at a frontier AI company (or having worked at o
... (read more)

Probably our crux is that I think the way society sees AI development morally is what matters here to navigate the straits, and the science is not going to be able to do the job in time. I care about developing a field of technical AI Safety but not if it comes at the expense of moral clarity that continuing to train bigger and bigger models is not okay before we know it will be safe. I would much rather rally the public to that message than try to get in the weak safety paper discourse game (which tbc I consider toothless and assume is not guiding Google’s strategy).

I downvoted this (but have upvoted some of your comments).

I think this advice is at minimum overstated, and likely wrong and harmful (at least if taken literally). And it's presented with rhetorical force, so that it seems to mostly be trying to push people's views towards a position that is (IMO) harmful, rather than mostly providing them with information to help them come to their own conclusions.

TBC:

  • I think you probably have things to add here, and in particular feel quite curious what's led you to the view that people here inevitably get corrupted (whi
... (read more)

Which applications to focus on: I agree that epistemic tools and coordination-enabling tools will eventually have markets and so will get built at some point absent intervention. But this doesn't feel like a very strong argument -- the whole point is that we may care about accelerating applications even if it's not by a long period. And I don't think that these will obviously be among the most profitable applications people could make (especially if you can start specializing to the most high-leverage epistemic and coordination tools).

Also, we could make a similar argument that "automated safety" research won't get dropped, since it's so obviously in the interests of whoever's winning the race. 

UI and complementary technologies: I'm sort of confused about your claim about comparative advantage. Are you saying that there aren't people in this community whose comparative advantage might be designing UI? That would seem surprising.

More broadly, though:

  • I'm not sure how much "we can just outsource this" really cuts against the core of our argument (how to get something done is a question of tactics, and it could still be a strategic priority even if we just wanted to spend a lot of money on it)
  • I guess I feel, though, that you're saying this won't be a
... (read more)
4
OscarD🔸
Yes, I suppose I am trying to divide tasks/projects up into two buckets based on whether they require high context and value-alignment and strategic thinking and EA-ness. And I think my claim was/is that UI design is comparatively easy to outsource to someone without much of the relevant context and values. And therefore the comparative advantage of the higher-context people is to do things that are harder to outsource to lower-context people. But I know ~nothing about UI design, maybe being higher context is actually super useful.

Compute allocation: mostly I think that "get people to care more" does count as the type of thing we were talking about. But I think that it's not just caring about safety, but also being aware ahead-of-time of the role that automated research may have to play in this, and when it may be appropriate to hit the gas and allocate a lot of compute to particular areas.

Training data: I agree that the stuff you're pointing to seems worthwhile. But I feel like you've latched onto a particular type of training data, and you're missing important categories, e.g.:

  • Epistemics stuff -- there are lots of super smart people earnestly trying to figure out very hard questions, and I think that if you could access their thinking, there would be a lot there which would compare favourably to a lot of the data that would be collected from people in this community. It wouldn't be so targeted in terms of the questions it addressed (e.g. "
... (read more)
2
OscarD🔸
Yeah I think I agree with all this; I suppose since 'we' have the AI policy/strategy training data anyway that seems relatively low effort and high value to do, but yes if we could somehow get access to the private notes of a bunch of international negotiators that also seems very valuable! Perhaps actually asking top forecasters to record their working and meetings to use as training data later would be valuable, and I assume many people already do this by default (tagging @NunoSempere). Although of course having better forecasting AIs seems more dual-use than some of the other AI tools.

It seems like "what can we actually do to make the future better (if we have a future)?" is a question that keeps on coming up for people in the debate week.

I've thought about some things related to this, and thought it might be worth pulling some of those threads together (with apologies for leaving it kind of abstract). Roughly speaking, I think that:

... (read more)

Ughh ... baking judgements about what's morally valuable into the question somehow doesn't seem ideal. Like I think it's an OK way to go for moral ~realists, but among anti-realists you might have people persistently disagreeing about what counts as extinction.

Also like: what if you have a world which is like the one you describe as an extinction scenario, but there's a small amount of moral value in some subcomponent of that AI system. Does that mean it no longer counts as an extinction scenario?

I'd kind of propose instead using the typology Will proposed... (read more)
