All of Owen Cotton-Barratt's Comments + Replies

Yep, I guess I'm into people trying to figure out what they think and which arguments seem convincing, and I think that it's good to highlight sources of perspectives that people might find helpful-according-to-their-own-judgement for that. I do think I have found Drexler's writing on AI singularly helpful for my inside-view judgements.

That said: absolutely seems good for you to offer counterarguments! Not trying to dismiss that (but I did want to explain why the counterargument wasn't landing for me).

On Dichotomy:

  • Because you've picked a particularly strong form of Maxipok to argue against, you're pushed into choosing a particularly strong form of Dichotomy that would be necessary to support it
  • But I think that this strong form of Dichotomy is relatively implausible to start with
    • And I would guess that Bostrom at the time of writing the article would not have supported it; certainly the passages you quote feel to me like they're supporting something weaker
  • Here's a weaker form of Dichotomy that I feel much more intuitive sympathy for:
    • Most things that coul
... (read more)

Looking at the full article:

  • OK I think I much more strongly object to the frame in this forum post than in the research article -- in particular, the research article is clear that it's substituting in a precisification you call Maxipok for the original principle
  • But I'm not sure what to make of this substitution! Even when I would have described myself as generally bought into Maxipok, I'm not sure if I would have been willing to sign up to this "precisification", which it seems to me is much stronger
    • In particular, your version is a claim about the existen
... (read more)

(Having just read the forum summary so far) I think there's a bunch of good exploration of arguments here, but I'm a bit uncomfortable with the framing. You talk about "if Maxipok is false", but this seems to me like a type error. Maxipok, as I understand it, is a heuristic: it's never going to give the right answer 100% of the time, and the right lens for evaluating it is how often it gives good answers, especially compared to other heuristics the relevant actors might reasonably have adopted.

Quoting from the Bostrom article you link:

At best, maxipok is a

... (read more)
6
Owen Cotton-Barratt
On Dichotomy:

  • Because you've picked a particularly strong form of Maxipok to argue against, you're pushed into choosing a particularly strong form of Dichotomy that would be necessary to support it
  • But I think that this strong form of Dichotomy is relatively implausible to start with
    • And I would guess that Bostrom at the time of writing the article would not have supported it; certainly the passages you quote feel to me like they're supporting something weaker
  • Here's a weaker form of Dichotomy that I feel much more intuitive sympathy for:
    • Most things that could be "locked in" such that they have predictable long-term effects on the total value of our future civilization, and move us away from the best outcomes, actually constrain us to worlds which are <10% as good as the worlds without any such lock-in (and would therefore count as existential catastrophes in their own right)
    • The word "most" is doing work there, and I definitely don't think it's absolute (e.g. as you point out, the idea of dividing the universe up 50/50 between a civilization that will do good things with it and one that won't); but it could plausibly still be enough to guide a lot of our actions
6
Owen Cotton-Barratt
Looking at the full article:

  • OK I think I much more strongly object to the frame in this forum post than in the research article -- in particular, the research article is clear that it's substituting in a precisification you call Maxipok for the original principle
  • But I'm not sure what to make of this substitution! Even when I would have described myself as generally bought into Maxipok, I'm not sure if I would have been willing to sign up to this "precisification", which it seems to me is much stronger
    • In particular, your version is a claim about the existence of actions which are (close to) the best in various ways; whereas in order to discard Maxipok I would have wanted not just an existence proof, but practical guidelines for finding better things
    • You do provide some suggestions for finding better things (which is great), but you don't directly argue that trying to pursue those would be better in expectation than trying to follow Maxipok (or argue about in which cases it would be better)
  • This makes me feel that there's a bit of a motte-and-bailey: you've set up a particularly strong precisification of Maxipok (that it's not clear to me e.g. Bostrom would have believed at the time of writing the paper you are critiquing); then you argue somewhat compellingly against it; then you conclude that it would be better if people did {a thing you like but haven't really argued for} instead

I think Eric has been strong at making reasoned arguments about the shape of possible future technologies, and at helping people to look at things for themselves. I wouldn't have thought of him (even before looking at this link[1]) as particularly good at making quantitative estimates about timelines; which in any case is something he doesn't seem to do much of.

Ultimately I am not suggesting that you defer to Drexler. I am suggesting that you may find reading his material a good time investment for spurring your own thoughts. This is something you can t... (read more)

5
titotal
I guess this is kind of my issue, right? He's been quite strong at putting forth arguments about the shape of the future that were highly persuasive and yet turned out to be badly wrong.[1] I'm concerned that this does not seem to have affected his epistemic authority in these sorts of circles.

You may not be "deferring" to Drexler, but you are singling out his views as singularly important (you have not made similar posts about anybody else[2]). There are hundreds of people discussing AI at the moment, a lot of them with a lot more expertise, and a lot of whom have not been badly wrong about the shape of the future.

Anyway, I'm not trying to discount your arguments either; I'm sure you have found valuable stuff in it. But if this post is making a case for reading Drexler despite him being difficult, I'm allowed to make the counterargument.

1. ^ In answer to your footnote: If more than one of those things occurs in the next thirty years, I will eat a hat.
2. ^ If this is the first in a series, feel free to discount this.

I'm not sure. I think there are versions of things here which are definitely not convergence (straightforward acausal trade between people who understand their own values is of this type), but I have some feeling like there might be extra reasons for convergence from people observing the host, and having that fact feed into their own reflective process. 

(Indeed, I'm not totally sure there's a clean line between convergence and trade.)

I think there's something interesting to this argument, although I think it may be relying on a frame where AI systems are natural agents, in particular at this step:

a strategically and philosophically competent AI should seemingly have its own moral uncertainty and pursue its own "option value maximization" rather than blindly serve human interests/values/intent

It's not clear to me why the key functions couldn't be more separated, or whether the conflict you're pointing to persists across such separation. For instance, we might have a mix of:

  • Systems which co
... (read more)

My belief is that the Open Philanthropy Project, EA generally, and Oxford EA particularly, had bad AI timelines and bad ASI ruin conditional probabilities; and that these invalidly arrived-at beliefs were in control of funding, and were explicitly publicly promoted at the expense of saner beliefs.

There is a surprising amount of normative judgment in here for a fact check. Are you looking just for disagreements with the claim that people held roughly the beliefs you later outline (I think you overstate things but are directionally correct in describing how beliefs differe... (read more)

My view of the tragedy of OpenPhil is indeed that they were very earnest people trying to figure out what was legit, but ended up believing stuff like "biologically anchored estimates of AI timelines" that were facially absurd and wrong and ultimately self-serving, because the problem "end up with beliefs about AI timelines that aren't influenced by what plays well with our funders and friends" was hard and frankly out of their league and OpenPhil did not know that it was a hard problem or treat it with what I would consider seriousness.

If you'd like to vi... (read more)

I think my impression is that the strategic upshots of this are directionally correct, but maybe not a huge deal? I'm not sure if you agree with that.

Sorry, I didn't mean mislabelled in terms of having the labels the wrong way around. I meant that the points you describe aren't necessarily the ends of the spectrum -- for instance, worse than just losing all alignment knowledge is losing all the alignment knowledge while keeping all of the knowledge about how to build highly effective AI.

At least that's what I had in mind at the time of writing my comment. I'm now wondering if it would actually be better to keep the capabilities knowledge, because it makes it easier to do meaningful alignment work as you do the rerun. It's plausible that this is actually more important than the more explicitly "alignment" knowledge. (Assuming that compute will be the bottleneck.)

You're discussing catastrophes that are big enough to set the world back by at least 100 years. But I'm wondering if a smaller threshold might be appropriate. Setting the world back by even 10 years could be enough to mean re-running a lot of the time of perils; and we might think that catastrophes of that magnitude are more likely. (This is my current view.)

With the smaller setbacks you probably have to get more granular in terms of asking "in precisely which ways is this setting us back?", rather than just analysing it in the abstract. But that can just be faced.

2
OscarD🔸
Yes, I think the '100 years' criterion isn't quite what we want. E.g. if there is a catastrophic setback more than 100 years after we build an aligned ASI, then we don't need to rerun the alignment problem. (In practice, perhaps 100 years should be ample time to build good global governance and reduce catastrophic setback risk to near 0, but conceptually we want to clarify this.) And I agree with Owen that shorter setbacks also seem important. In fact, in a simple binary model we could just define a catastrophic setback to be one that takes you from a society that has built aligned ASI to one where all aligned ASIs are destroyed. I.e. the key thing is not how many years back you go, but whether you regress back beneath the critical 'crunch time' period.

Why do you think alignment gets solved before reasonably good global governance? It feels to me pretty up in the air which target we should be aiming to hit first. (Hitting either would help us with the other. I do think that we likely want to get important use out of AI systems before we establish good global governance; but that we might want to then do the governance thing to establish enough slack to take the potentially harder parts of the alignment challenge slowly.)

On section 4, where you ask about retaining alignment knowledge:

  • It feels kind of like you're mislabelling the ends of the spectrum?
  • My guess is that rather than think about "how much alignment knowledge is lost?", you should be asking about the differential between how much AI knowledge is lost and how much alignment knowledge is lost
  • I'm not sure that's quite right either, but it feels a little bit closer?
2
William_MacAskill
Okay, looking at the spectrum again, it still seems to me like I've labelled them correctly? Maybe I'm missing something. It's optimistic if we can retain knowledge of how to align AGI, because then we can just use that knowledge later and we don't face the same magnitude of risk from misaligned AI.

For much of the article, you talk about post-AGI catastrophe. But when you first introduce the idea in section 2.1, you say:

the period from now until we reach robust existential security (say, stable aligned superintelligence plus reasonably good global governance)

It seems to me like this is a much higher bar than reaching AGI -- and one for which the arguments that we could still be exposed to subsequent catastrophes seem much weaker. Did you mean to just say AGI here?

3
William_MacAskill
Thanks, that's a good catch. Really, in the simple model the relevant point in time for the first run should be when the alignment challenge has been solved, even for superintelligence. But that's before "reasonably good global governance". Of course, there's an issue that this is trying to model alignment as a binary thing for simplicity, even though really if a catastrophe came when half of the alignment challenge had been solved, that would still be a really big deal for similar reasons to the paper. One additional comment is that this sort of "concepts moving around" issue is one of the things that I've found most annoying from AI, and where it happens quite a lot. You need to try and uproot these issues from the text, and this was a case of me missing it.

Yeah roughly the thought is "assuming concentrated power, it matters what the key powerful actors will do" (the liberal democracy comment was an aside saying that I think we should be conditioning on concentrated power).

And then for making educated guesses about what the key powerful actors will do, it seems especially important to me what their attitudes will be at a meta-level: how they prefer to work out what to do, etc. 

I might have thought that some of the most important factors would be things like: 

  • How likely is leadership to pursue intelligence enhancement, given technological opportunity?
  • How likely is leadership to pursue wisdom enhancement, given technological opportunity? 

(Roughly because: either power is broadly distributed, in which case your comments about liberal democracy don't seem to have so much bite; or it's not, in which case it's really the values of leadership that matter.) But I'm not sure you really touch on these. Interested if you have thoughts.

2
OscarD🔸
Not sure I follow properly - why would liberal democracy not matter? I think whether biological humans are themselves enhanced in various ways matters less than whether they are getting superhuman (and perhaps super-wise) advice. Though possibly wisdom is different and you need the principal to themselves be wise, rather than just getting wise advice.

Thanks AJ!

My impression is that although your essay frames this as a deep disagreement, in fact you're reacting to something that we're not saying. I basically agree with the heart of the content here -- that there are serious failure modes to be scared of if attempting to orient to the long term, and that something like loop-preservation is (along with the various more prosaic welfare goods we discussed) essential for the health of even a strict longtermist society.

However, I think that what we wrote may have been compatible with the view that you have such a negative reaction to, and at minimum I wish that we'd spent some more words exploring this kind of dynamic. So I appreciate your response.

-2
AJ van Hoek
Thanks for the generous response. You write that we "may have been compatible" and I'm "reacting to something you're not saying." Here's my concern: I've come to recognize that reality operates as a dynamic network—nodes (people, institutions) whose capacity is constituted by the relationships among them. This isn't just a modeling choice; it's how cities function, how pandemics spread, how states maintain capacity. You don't work from this explicit recognition.

This creates an asymmetry. Once you see reality as a network, your Section 5 framework becomes incompatible with mine—not just incomplete, but incoherent. You explicitly frame the state as separate from people, optimizing for longtermist goals while managing preferences as constraints. But from the network perspective, this separation doesn't exist—the state's capacity just IS those relationships. You can't optimize one while managing the other.

Let me try to say this more directly: I've come to understand my own intelligence as existing not AT my neurons, but BETWEEN them—as a pattern of activation across connections. I am the edge, not the node. And I see society the same way: capacity isn't located IN institutions, it emerges FROM relationships. From this perspective, your Section 5 (state separate from people) isn't a simplification—it's treating edges as if they were nodes, which fundamentally misunderstands what state capacity is.

That's the asymmetry: your explicit framing (state separate from people) is incompatible with how I now understand reality. But if you haven't recognized the network structure, you'd just see my essay as "adding important considerations" rather than revealing a foundational incompatibility. Does this help clarify where I'm coming from?

That makes sense! 

(I'm curious how much you've invested in giving them detailed prompts about what information to assess in applying particular tags, or even more structured workflows, vs just taking smart models and seeing if they can one-shot it; but I don't really need to know any of this.)

If you want independent criteria-based judgements, it might realistically be a good option to have the judgements made by an LLM -- with the benefit of having the classification instantly (as a bonus you could publish the prompt used, so the judgements would be easier for people to audit).
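(Not a workflow anyone in this thread has built -- just a minimal sketch of what an auditable LLM tagging step could look like, assuming the OpenAI Python client; the model name and the TAG_PROMPT rubric are placeholders, not an actual Forum rubric.)

```python
# Hypothetical sketch: criteria-based tagging with a publishable prompt.
# TAG_PROMPT is illustrative only; any real deployment would need accuracy
# evaluation first (current accuracy is the concern Will raises below).
from openai import OpenAI

TAG_PROMPT = (
    "You are tagging EA Forum posts. Apply a tag only if the post's main "
    "subject matches that tag. Reply with a comma-separated list of tags "
    "drawn from: {tags}. Reply with 'none' if no tag applies."
)

def suggest_tags(post_text: str, tags: list[str], model: str = "gpt-4o-mini") -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": TAG_PROMPT.format(tags=", ".join(tags))},
            {"role": "user", "content": post_text[:8000]},  # truncate very long posts
        ],
        temperature=0,  # deterministic-ish judgements are easier to audit
    )
    return response.choices[0].message.content

# Publishing TAG_PROMPT and the tag list alongside the classifier is what would
# make the judgements auditable, which is the upside suggested above.
```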

4
Will Aldred
Fyi, the Forum team has experimented with LLMs for tagging posts (and for automating some other tasks, like reviewing new users), but so far none have been accurate enough to rely on. Nonetheless, I appreciate your comment, since we weren’t really tracking the transparency/auditing upside of using LLMs.

Ok thanks, I think it's fair to call me on this (I realise the question of what Thiel actually thinks is not super interesting to me, compared to "does this critique contain inspiration for things to be aware of that I wasn't previously really tracking"; but I get that most people probably aren't orienting similarly, and I was kind of assuming that they were when I suggested this was why it was getting sympathy).

I do think though that there's a more nuanced point here than "trying too hard to do good can result in harm". It's more like "over-claiming about ho... (read more)

I think that the theology is largely a distraction from the reason this is attracting sympathy, which I'd guess to be more like: 

  • If you have some ideas which are pretty good, or even very good, but they present as though they're the answer needed for everything, and they're not, that could be quite destructive (and potentially very-net-bad, even while the ideas were originally obviously-good)
  • This is at least a plausible failure mode for EA, and correspondingly worth some attention/wariness
  • This kind of concern hasn't gotten much airtime before (and is perhaps easier to express and understand as a serious possibility with some of the language-that-I-interpret-metaphorically); 
7
David T
Feels like the argument you've constructed is a better one than the one Thiel is actually making, which seems to be a very standard "evil actors often claim to be working for the greater good" argument with a libertarian gloss. Thiel doesn't think redistribution is an obviously good idea that might backfire if it's treated as too important; he actively loathes it.

I think the idea that trying too hard to do good things can end up doing harm is absolutely a failure mode worth considering, but it has far more value in the context of specific examples. It seems like quite a common theme in AGI discourse (it follows from standard assumptions like AGI being near and potentially either incredibly beneficial or destructive, research or public awareness either potentially solving the problem or starting a race, etc.), and the optimiser's curse is a huge concern for EA cause prioritization overindexing on particular data points. Maybe that deserves (even) more discussion.

But I don't think a guy who doubts we're on the verge of an AI singularity and couldn't care less whether EAs encourage people to make the wrong tradeoffs between malaria nets, education and shrimp welfare adds much to that debate, particularly not with a throwaway reference to EA in a list of philosophies popular with the other side of the political spectrum that he thinks are basically the sort of thing the Antichrist would say. I mean, he is also committed to the somewhat less insane-sounding "growth is good even if it comes with risks" argument, but you can probably find more sympathetic and coherent and less interest-conflicted proponents of that view.

is that you feel that moral statements are not as evidently subjective as say, 'Vanilla ice-cream is the best flavor' but not as objective as, say 'An electron has a negative charge', as living in some space of in-betweeness with respect to those two extremes

I think that's roughly right. I think that they are unlikely to be more objective than "blue is a more natural concept than grue", but that there's a good chance that they're about the same as that (and my gut take is that that's pretty far towards the electron end of the spectrum; but perhaps I'm conf... (read more)

See my response to Manuel -- I don't think this is "proving moral realism", but I do think it would be pointing at something deeper and closer-to-objective than "happen to have the same opinions".

I'm not sure what exactly "true" means here.

Here are some senses in which it would make morality feel "more objective" rather than "more subjective":

  • I can have the experience of having a view, and then hearing an argument, and updating. My stance towards my previous view then feels more like "oh, I was mistaken" (like if I'd made a mathematical error) rather than "oh, my view changed" (like getting myself to like the taste of avocado when I didn't use to).
  • There can exist "moral experts", whom we would want to consult on matters of morality. Broadly, we sh
... (read more)
1
Manuel Del Río Rodríguez 🔹
Terminology can be a bugger in these discussions. I think we are accepting, as per BB's own definition at the start of the thread, that Moral Realism would basically reduce to accepting a stance-independent view that moral truths exist. As for truth, I would mean it in the way it gets used when studying other, stance-independent objects, i.e., electrons exist and their existence is independent of human minds and/or of humans having ever existed, and saying 'electrons exist' is true because of their correspondence to objects of an external, human-independent reality.

What I take from your examples (correct me if I am wrong or if I misrepresent you) is that you feel that moral statements are not as evidently subjective as say, 'Vanilla ice-cream is the best flavor' but not as objective as, say 'An electron has a negative charge', as living in some space of in-betweeness with respect to those two extremes. I'd still call this anti-realism, as you're just switching from a maximally subjective stance (an individual's particular culinary tastes) to a more general, but still stance-dependent one (what a group of experts and/or human and some alien minds might possibly agree upon). I'd say again, an electron doesn't care for what a human or any other creature thinks about its electric charge.

As for each of the bullet points, what I'd say is:

1. I can see why you'd feel the change from a previous view can be seen as a mistake rather than a preference change (when I first started thinking about morality I felt very strongly inclined to the strongest moral realism, and I now feel that pov was wrong), but this doesn't imply moral realism as much as that it feels as if moral principles and beliefs have objective truth status, even if they were actually a reorganization of stance-dependent beliefs.
2. I, on the contrary, don't feel like there could be 'moral experts' - at most, people who seem to live up to their moral beliefs, whatever the knowledge and reasons for having

Locally, I think that often there will be some cluster of less controversial common values like "caring about the flourishing of society" which can be used to derive something like locally-objective conclusions about moral questions (like whether X is wrong).

Globally, an operationalization of morality being objective might be something like "among civilizations of evolved beings in the multiverse, there's a decently big attractor state of moral norms that a lot of the civilizations eventually converge on".

6
Neel Nanda
Less controversial is a very long way from objective - why do you think that "caring about the flourishing of society" is objectively ethical?

Re the idea of an attractor, idk, history has sure had a lot of popular beliefs I find abhorrent. How do we know there even is convergence at all rather than cycles? And why does being convergent imply objective? If you told me that the supermajority of civilization concluded that torturing criminals was morally good, that would not make me think it was ethical.

My overall take is that objective is just an incredibly strong word for which you need incredibly strong justifications, and your justifications don't seem close; they seem more about "this is a Schelling point" or "this is a reasonable default that we can build a coalition around"
3
Robi Rahman🔸
No, that wouldn't prove moral realism at all. That would merely show that you and a bunch of aliens happen to have the same opinions.
1
Manuel Del Río Rodríguez 🔹
I don't think I have much to object to that, but I do think that doesn't look at all like 'stance independent' if we're using that as the criterion for ethical realism. What you're saying seems to boil down, if I understand it correctly, to: 'given a bunch of intelligent creatures with some shared psychological perceptions of the world and some tendency towards collaboration, it is pretty likely they'll end up arriving at a certain set of shared norms that optimize towards their well-being as a group (and in most cases, as individuals)'. That makes the 'state of moral norms that a lot of the civilizations eventually converge on' something useful for ends x, y, z, but not 'true' and 'independent of human or alien minds'.

Ok but jtbc that characterization of "affronted" is not the hypothesis I was offering (I don't want to say it wasn't a part of the downvoting, but I'd guess a minority).

I would personally kind of like it if people actively explored angles on things more. But man, there are so many things to read on AI these days that I do kind of understand when people haven't spent time considering things I regard as critical path (maybe I should complain more!), and I honestly find it hard to fault people too much for using "did it seem wrong near the start in a way that makes it harder to think" as a heuristic for how deeply to engage with material.

I'm curious whether you're closer to angry that someone might read your opening paragraph as saying "you should discard the concept of warning shots" or angry that they might disagree-vote if they read it that way (or something else).

9
Holly Elmore ⏸️ 🔸
No, I'm angry that people feel affronted by me pointing out that normal warning shot discourse entailed hoping for a disaster without feeling much need to make sure that would be helpful. They should be glad that they have a chance to catch themselves, but instead they silently downvote.

Just feels like so much of the vibe of this forum is people expecting to be catered to, like their support is some prize, rather than people wanting to find out for themselves how to help the world. A lot of EAs have felt comfortable dismissing PauseAI bc it's not their vibe or they didn't feel like the case was made in the right way or they think their friends won't support it, and it drives me crazy bc aren't they curious??? Don't they want to think about how to address AI danger from every angle?

Not sure quite what to say here. I think your post was valuable and that's why I upvoted it. You were expressing confusion about why anyone would disagree, and I was venturing a guess.

I don't think gentleness = ego service (it's an absence of violence, not a positive thing). But also I don't think you owe people gentleness. However, I do think that when you're not gentle (especially ontologically gentle) you make it harder for people to hear you. Not because of emotional responses getting in the way (though I'm sure that happens sometimes), but literally b... (read more)

2
Holly Elmore ⏸️ 🔸
I was curious about guesses as to why this happens to me lately (a lot of upfront disagree votes and karma hovering around zero until the views are high enough) but getting that answer is still pretty hard for me to hear without being angry.

By "ontologically ungentle" I mean (roughly) it feels like you're trying to reach into my mind and tell me that my words/concepts are wrong. As opposed to writing which just tells me that my beliefs are wrong (which might still be epistemically ungentle), or language which just provides evidence without making claims that could be controversial (gentle in this sense, kind of NVC-style).

I do feel a bit of this ungentleness in that opening paragraph towards my own ontology, and I think it put me more on edge reading the rest of the post. But as I said, I didn't disagree-vote; I was just trying to guess why others might have.

-5
Holly Elmore ⏸️ 🔸

Right ... so actually I think you're just doing pretty well at this in the latter part of the article.

But at the start you say things like:

There’s this fantasy of easy, free support for the AI Safety position coming from what’s commonly called a “warning shot”. The idea is that AI will cause smaller disasters before it causes a really big one, and that when people see this they will realize we’ve been right all along and easily do what we suggest.

What this paragraph seems to do is to push the error-in-beliefs that you're complaining about down into the ver... (read more)

3
Holly Elmore ⏸️ 🔸
Did you feel treated ungently for your warning shots take? Or is this just on behalf of people who might? Also, can you tell me what you mean by "ontologically ungentle"? It sounds worryingly close to a demand that the writer think all the readers are good. I do want to confront people with the fact they've been lazily hoping for violence, if that's in fact what they've been doing.

Honestly, maybe you should try telling me? Like, just write a paragraph or two on what you think is valuable about the concept / where you would think it's appropriate to be applying it?

(Not trying to be clever! I started trying to think about what I would write here and mostly ended up thinking "hmm I bet this is stuff Holly would think is obvious", and to the extent that I may believe you're missing something, it might be easiest to triangulate by hearing your summary of what the key points in favour are.)

2
Holly Elmore ⏸️ 🔸
I thought I was giving the strong version. I have never heard an account of a warning shot theory of change that wasn't "AI will cause a small-scale disaster and then the political will to do something will materialize". I think the strong version would be my version: educating people first so they can understand small-scale disasters that may occur for what they are. I have never seen or heard this advocated in AI Safety circles before. And I described how impactful ChatGPT was on me, which imo was a warning shot gone right in my case.

I upvoted and didn't disagree-vote the original post (and generally agree with you on a bunch of the object level here!); however, I do feel some urge-towards-expressing-disagreement, which is something like: 

  • Less disagreeing with claims; more disagreeing with frames?
  • Like: I feel the discomfort/disagreement less when you're talking about what will happen, and more when you're talking about how people think about warning shots
  • Your post feels something like ... intellectually ungenerous? It's not trying to look for the strongest version of the warning s
... (read more)
2
Holly Elmore ⏸️ 🔸
What is the “strong” version of warning shots thinking?

OK I see the model there.

I guess it's not clear to me if that should hold if I think that most experiment compute will be ~training, and most cognitive labour compute will be ~inference?

However, over time maybe more experiment compute will be ~inference, as it shifts more to being about producing data rather than testing architectures? That could push back towards this being a reasonable assumption. (Definitely don't feel like I have a clear picture of the dynamics here, though.)

hmm, I think I would expect different experience curves for the efficiency of running experiments vs producing cognitive labour (with generally smaller efficiency boosts over time for running experiments). Is there any reason to expect them to behave similarly?

(Though I think I agree with the qualitative point that you could get a software-only intelligence explosion even if you can't do this with human-only research input, which was maybe your main point.)

6
Tom_Davidson
Agree that I wouldn't particularly expect the efficiency curves to be the same. But if phi > 0 for both types of efficiency, then I think this argument will still go through. To put it in math, there would be two types of AI software technology, one for experimental efficiency and one for cognitive labour efficiency: A_exp and A_cog. The equations are then:

dA_exp = A_exp^phi_exp F(A_exp K_res, A_cog K_inf)
dA_cog = A_cog^phi_cog F(A_exp K_res, A_cog K_inf)

And then I think you'll find that, even with sigma < 1, it explodes when phi_exp > 0 and phi_cog > 0.
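(A minimal numerical sketch, not from the original exchange: it Euler-integrates the two equations above, taking F to be a CES aggregator with sigma < 1 and using made-up parameter values, just to illustrate the qualitative claim that the system blows up when phi_exp > 0 and phi_cog > 0 but only grows exponentially when both are 0.)

```python
# Illustrative simulation of the two-equation model above.
# All parameter values are assumptions chosen for demonstration only.

def ces(x, y, sigma=0.5, share=0.5):
    """CES aggregator; sigma < 1 makes the two inputs gross complements."""
    rho = 1.0 - 1.0 / sigma  # from sigma = 1 / (1 - rho)
    return (share * x**rho + (1.0 - share) * y**rho) ** (1.0 / rho)

def simulate(phi_exp, phi_cog, K_res=1.0, K_inf=1.0, dt=1e-4, t_max=20.0, cap=1e12):
    """Euler-integrate dA_exp and dA_cog; return the blow-up time, or None if none."""
    A_exp, A_cog, t = 1.0, 1.0, 0.0
    while t < t_max:
        F = ces(A_exp * K_res, A_cog * K_inf)
        A_exp += dt * A_exp**phi_exp * F
        A_cog += dt * A_cog**phi_cog * F
        t += dt
        if max(A_exp, A_cog) > cap:
            return t  # numerical stand-in for finite-time explosion
    return None

print(simulate(phi_exp=0.3, phi_cog=0.3))  # blows up in finite time
print(simulate(phi_exp=0.0, phi_cog=0.0))  # merely exponential: no blow-up by t_max
```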
4
Thomas Kwa🔹
If your algorithms get more efficient over time at both small and large scales, and experiments test incremental improvements to architecture or data, then they should get cheaper to run proportionally to algorithmic efficiency of cognitive labor. I think this is better as a first approximation than assuming they're constant, and might hold in practice especially when you can target small-scale algorithmic improvements.

Edit: I think that Neel's comment is basically just a better version of the stuff I was trying to say. (On the object level I'm a little more sympathetic than him to ways in which Mechanize might be good, although I don't really buy the story to that end that I've seen you present.)

Wanting to note that on my impressions, and setting aside who is correct on the object-level question of whether Mechanize's work is good for the world:

  • My best read of the situation is that Matthew has acted very reasonably (according to his beliefs), and that Holly has let hers
... (read more)

I feel like voicing that I centrally expect AI to continue to have bigger real-world impacts, but not get very weird until the 2030s. I think worlds where things go faster than that are a serious enough possibility to take seriously, but I think that the apparent zeitgeist suggests timelines which are a bit more aggressive than I think is justified.

The reason I want to say something is that I sort of suspect there are a bunch of people in a similar epistemic position -- where it doesn't seem like a priority to properly explore what % to put on craziness this decade; nor to get into big arguments about whether the zeitgeist is slightly off -- but for whom your comment might feel like something of a trap.

I guess I'm fairly sympathetic to this. It makes me think that voluntary safety policies should ideally include some meta-commentary about how companies view the purpose and value-add of the safety policy, and meta-policies about how updates to the safety policy will be made -- in particular, that it might be good to specify a period for public comments before a change is implemented. (Even a short period could be some value add.)

I appreciate the investigation here.

I'm not sure whether I agree that "quietly lowering the bar at the last minute so you can meet requirements isn't how safety policies are supposed to work". (Not sure I disagree; but going to try to articulate a case against).

I think in a world where you understand the risks well ahead of time of course this isn't how safety policies should work. In a world where you don't understand the risks well ahead of time, you can get more clarity as the key moments approach, and this could lead you to rationally judge that a lowe... (read more)

9
Ryan Greenblatt
From my perspective, a large part of the point of safety policies is that people can comment on the policies in advance and provide some pressure toward better policies. If policies are changed at the last minute, then the world may not have time to understand the change and respond before it is too late. So, I think it's good to create an expectation/norm that you shouldn't substantially weaken a policy right as it is being applied. That's not to say that a reasonable company shouldn't do this some of the time, just that I think it should by default be considered somewhat bad, particularly if there isn't a satisfactory explanation given. In this case, I find the object level justification for the change somewhat dubious (at least for the AI R&D trigger) and there is also no explanation of why this change was made at the last minute.

I appreciated you expressing this.

Riffing out loud ... I feel that there are different dynamics going on here (not necessarily in your case; more in general):

  1. The tensions where people don't act with as much integrity as is signalled
    • This is not a new issue for EA (it arises structurally despite a lot of good intentions, because of the encouragement to be strategic), and I think it just needs active cultural resistance
      • In terms of writing, I like Holden's and Toby's pushes on this; my own attempts here and here
      • But for this to go well, I think it's not enough
... (read more)

I wanted to share some insights from my reflection on my mistakes around attraction/power dynamics — especially something about the shape of the blindspots I had. My hope is that this might help to avert cases of other people causing harm in similar ways.

I don’t know for sure how helpful this will be; and I’m not making a bid for people to read it (I understand if people prefer not to hear more from me on this); but for those who want to look, I’ve put a couple of pages of material here.

These are in the same category because:

  • I'm talking about game-changing improvements to our capabilities (mostly via more cognitive labour; not requiring superintelligence)
  • These are the capacities that we need to help everyone to recognize the situation we're in and come together to do something about it (and they are partial substitutes: the better everyone's epistemics are, the less need for a big lift on coordination which has to cover people seeing the world very differently) 

I'm not actually making a claim about alignment difficulty -- beyond that... (read more)

I agree there are some possible attitudes that society could have towards AI development which could put us in a much safer position.

I think that the degree of consensus you'd need for the position that you're outlining here is practically infeasible, absent some big shift in the basic dynamics. I think that the possible shifts which might get you there are roughly:

  1. Scientific ~consensus -- people look to scientists for thought leadership on this stuff. Plausibly you could have a scientist-driven moratorium (this still feels like a stretch, but less than ju
... (read more)

Much better epistemics and/or coordination -- out of reach now, but potentially obtainable with stronger tech.

Why are these the same category and why are you writing coordination off as impossible? It's not. We have literally done global nonproliferation treaties before.

This bizarre notion got embedded early in EA that technological feats are possible and solving coordination problems is impossible. It's actually the opposite -- alignment is not tractable and coordination is.

I agree that there could be an effect that keeps people from speaking out about AI danger. But:

  • I think that such political incentives can occur whenever anyone is dealing with external power-structures, and in practice my impression is that these are a bigger deal for people who want jobs in AI policy compared to people engaged with frontier AI companies
  • This argument has most force in arguing that some EAs should keep professional and social distance from frontier AI companies, not that everyone should
  • Working at a frontier AI company (or having worked at o
... (read more)

Probably our crux is that I think the way society sees AI development morally is what matters here to navigate the straits, and the science is not going to be able to do the job in time. I care about developing a field of technical AI Safety but not if it comes at the expense of moral clarity that continuing to train bigger and bigger models is not okay before we know it will be safe. I would much rather rally the public to that message than try to get in the weak safety paper discourse game (which tbc I consider toothless and assume is not guiding Google’s strategy).

I downvoted this (but have upvoted some of your comments).

I think this advice is at minimum overstated, and likely wrong and harmful (at least if taken literally). And it's presented with rhetorical force, so that it seems to mostly be trying to push people's views towards a position that is (IMO) harmful, rather than mostly providing them with information to help them come to their own conclusions.

TBC:

  • I think you probably have things to add here, and in particular feel quite curious what's led you to the view that people here inevitably get corrupted (whi
... (read more)

Which applications to focus on: I agree that epistemic tools and coordination-enabling tools will eventually have markets and so will get built at some point absent intervention. But this doesn't feel like a very strong argument -- the whole point is that we may care about accelerating applications even if it's not by a long period. And I don't think that these will obviously be among the most profitable applications people could make (especially if you can start specializing to the most high-leverage epistemic and coordination tools).

Also, we could make a similar argument that "automated safety" research won't get dropped, since it's so obviously in the interests of whoever's winning the race. 

UI and complementary technologies: I'm sort of confused about your claim about comparative advantage. Are you saying that there aren't people in this community whose comparative advantage might be designing UI? That would seem surprising.

More broadly, though:

  • I'm not sure how much "we can just outsource this" really cuts against the core of our argument (how to get something done is a question of tactics, and it could still be a strategic priority even if we just wanted to spend a lot of money on it)
  • I guess I feel, though, that you're saying this won't be a
... (read more)
4
OscarD🔸
Yes, I suppose I am trying to divide tasks/projects up into two buckets based on whether they require high context and value-alignment and strategic thinking and EA-ness. And I think my claim was/is that UI design is comparatively easy to outsource to someone without much of the relevant context and values. And therefore the comparative advantage of the higher-context people is to do things that are harder to outsource to lower-context people. But I know ~nothing about UI design, maybe being higher context is actually super useful.

Compute allocation: mostly I think that "get people to care more" does count as the type of thing we were talking about. But I think that it's not just caring about safety, but also being aware ahead-of-time of the role that automated research may have to play in this, and when it may be appropriate to hit the gas and allocate a lot of compute to particular areas.

Training data: I agree that the stuff you're pointing to seems worthwhile. But I feel like you've latched onto a particular type of training data, and you're missing important categories, e.g.:

  • Epistemics stuff -- there are lots of super smart people earnestly trying to figure out very hard questions, and I think that if you could access their thinking, there would be a lot there which would compare favourably to a lot of the data that would be collected from people in this community. It wouldn't be so targeted in terms of the questions it addressed (e.g. "
... (read more)
2
OscarD🔸
Yeah I think I agree with all this; I suppose since 'we' have the AI policy/strategy training data anyway that seems relatively low effort and high value to do, but yes if we could somehow get access to the private notes of a bunch of international negotiators that also seems very valuable! Perhaps actually asking top forecasters to record their working and meetings to use as training data later would be valuable, and I assume many people already do this by default (tagging @NunoSempere). Although of course having better forecasting AIs seems more dual-use than some of the other AI tools.

It seems like "what can we actually do to make the future better (if we have a future)?" is a question that keeps on coming up for people in the debate week.

I've thought about some things related to this, and thought it might be worth pulling some of those threads together (with apologies for leaving it kind of abstract). Roughly speaking, I think that:

... (read more)

Ughh ... baking judgements about what's morally valuable into the question somehow doesn't seem ideal. Like I think it's an OK way to go for moral ~realists, but among anti-realists you might have people persistently disagreeing about what counts as extinction.

Also like: what if you have a world which is like the one you describe as an extinction scenario, but there's a small amount of moral value in some subcomponent of that AI system. Does that mean it no longer counts as an extinction scenario?

I'd kind of propose instead using the typology Will proposed... (read more)
