9 comments, sorted by Click to highlight new comments since: Today at 12:25 AM
New Comment

Copying over part of a conversation from Facebook (where we've both been criticizing / expressing strong skepticism of QRI recently):

[...]

Holly Elmore: [...] I mean, it's true that I would like QRI to stop identifying with EA because it is unscientific[.]

[...]

Holly Elmore: @Andres Gomez Emilsson  I said QRI is unscientific because of the unrigorous reasoning and poorly motivated privileged hypotheses you’re pursuing. Your communication of your work is atrocious and it doesn’t seem subject to much review. You can do whatever kind of investigations you want but if you want me to call it science or say it should benefit from the EA label you need to have higher standards.

Rob Bensinger: Oh, I do disagree with a thing Holly said, and want to push back on it: "it's true that I would like QRI to stop identifying with EA because it is unscientific".

(Apologies for making it sound like I endorsed everything [Holly said], I forgot that part!)

I'm not sure how to articulate my objection to this Holly-statement, exactly. Something like: the important thing is figuring out what's true, and figuring out what policies that implies for object-level optimizing the world. I think it's unhealthy for EAs to think much, if at all, about what's "identified with EA" or "associated with EA", or what might reputationally damage EA.

If anything, I suspect having a few more crankish things associated with EA would probably be net-positive, because it would cause EAs to despair more of saving their reputation, which might make them give up on that whole 'defend the brand at all costs!' project and start doing more object-level good stuff again. 😛

Another way of putting it is that EA should be a very [hits-based] operation. We should constantly be having crazy projects get spun up and then absolutely demolished by EA Forum posts. This is a sign of the system working, not a sign of rot or error. QRI itself may not be useful, but the existence of a whole bunch of things that are like QRI and are associated with EA and then get demolished is exactly what we should be seeing if EA is functioning properly as a very generative community that tries lots of things and then criticizes/evaluates/prunes freely, in its famously blunt Buck-Shlegeris-like manner. The fact that I'm not seeing more of this worries me quite a lot, actually!

Another way of putting it is that there are two possible versions of EA we could shoot for:

Version 1: filters heavily for protecting its reputation from any association with weird, long-shot, or suspicious orgs. This successfully protects the reputation, but also loses most of EA's impact, because most of EA's impact is in producing very new ideas, longshot projects, unpopular angles of attacks on problems, etc. Not all of those will be winners, but blocking that generation process means we can't win.

Version 2: EA maintains high epistemic standards, but is perfectly happy to be 'associated' with all sorts of weird crazy ideas -- which it then vigorously critiques, has friendly harsh conversations with, etc. The EA Forum is laden with nonsense getting trashed. We have standards, both in what we'll allow on the forum, and in the content of our criticisms; but we take extra effort to make the standards tolerant of weird stuff, and we acknowledge that if we filter too hard for 'seems false to me' then we'll exclude some things that are true-but-revolutionary, which we don't want.

A thing I might endorse is a sort of pipeline, where some EA forums specialize in different parts of the pipeline and QRI might not make it past the earliest or second-earliest stage in that pipeline? And going further down the pipeline is about persuading people that your ideas at least make enough sense to debate/investigate further, not about being reputationally safe for EA.

Holly Elmore: If QRI didn’t claim to be EA, I would just leave them alone to have their theory that I think is bad science. That’s why I mention it. My concern is not reputation but wasting time and resources and weakening epistemic norms. I myself am in a grey literature in EA, and it’s a space we’re figuring out. It should be one of rigor.

Specifically, I see a lot of people looking at QRI, not getting or knowing what they’re talking about immediately, and leaving thinking “okay, interesting.” I want it to be much harder for someone to put a lot of confusing stuff on the forum or their website and have people update in favor of it because they like the aesthetic or consider them part of the community.

@Rob Bensinger  If QRI presented their work more humbly then I would take it for earlier pipeline stuff

Rob Bensinger: "If QRI presented their work more humbly then I would take it for earlier pipeline stuff"

I don't endorse this either, at all -- this heuristic would have stomped early SingInst 😛

Holly Elmore: @Rob Bensinger  Those are pretty different things. QRI is trying to find the truth, not effect change.

Rob Bensinger: Cranks often think their thing is extremely important, are very vocal and insistent about this, and are often too socially inept to properly genuflect to the status hierarchy and sprinkle in the right level of 'but maybe I'm wrong' to look properly humble and self-aware to third-party skeptics.

People with actual importantly novel insights and ideas, risky successful projects, radically new frameworks and models, etc. also often think their thing is extremely important, are also often very vocal and insistent about this, and are also often too socially inept to properly genuflect and caveat and 'look like a normal moderate calm person'. Especially the 'young genius' types.

We should have standards, but I think the standards should overwhelmingly be about the actual object-level claims, not about 'does this person seem too confident', 'does this person sound properly detached and calm like our mental image of a respectable lab-coat-wearing scientist', etc.

Holly Elmore: @Rob Bensinger  I think you are wrong about how much this is a status thing, and I'm irritated that a lot of people are viewing it that way (including at least @Andres and maybe others at QRI). On the forum, someone commented that they liked QRI because they wanted to "keep EA weird." All of those things are very beside the point of whether their claims are intelligible and true. Affirmative action for "cranks" is the last thing we need in any kind of truth-seeking movement.

@Rob Bensinger  I mentioned humility bc I think it's important to make the strength of your claims clear and not motte-bailey the way MIke and Andres have on very strong claims. It's about accuracy, not status. EDIT: And if QRI presented its claims as more exploratory, earlier pipeline stuff, then I would think that was more appropriate and not object as much.

Ronny Fernandez: @Holly Elmore  So you agree that we should have very high standards, and those standards should be only about object level stuff?

Holly Elmore: @Ronny Fernandez  The height of the standards depends on how formal the context, but yeah only the object level stuff matters

@Ronny Fernandez  Separately, I'm really annoyed that people who apparently haven't read QRI's posts think that I must be judging them for not being academics. Maybe you have to have a background in neuroscience for it to be obvious but the quality is clearly low at the object-level. I even think if you just have a background in rhetoric or rationality you can see how many of the arguments and the ways evidence is brought together don't follow.

Rob Bensinger: "Those [SingInst vs. QRI] are pretty different things. QRI is trying to find the truth, not effect change."

Early SingInst was a mix of research org, outreach org, attempt-to-raise-the-sanity-waterline, etc. It was also trying to find the truth; and like QRI, it had controversial beliefs; and in both cases there ought to be important practical implications if the beliefs are true.

I'd say that for both orgs the main important feature of the org isn't the outreach (or lack thereof); it's the org's claims to have discovered rare truths about the world, which aren't widely understood or accepted (like 'AGI won't be nice unless you do a bunch of hard work to align it').

From my perspective, the difference is that QRI is (AFAICT) wrong about everything and has terrible methodology, whereas MIRI's methodology was great and got them far ahead of the curve on a huge list of AGI-related topics.

I think we agree about these three things (let me know if I'm wrong):

1. EA should be a breeding ground for SingInst-ish orgs.

2. EA should push back super hard against QRI-ish orgs, harshly criticizing their claims. Possibly even banning their stuff from the EA Forum at some point, if it seems to be sucking up lots of attention on pointless crackpottery? (I don't actually know whether the case against QRI's stuff is as airtight as all that, or whether it's worth setting the forum norms at that particular threshold; but I could imagine learning more that would make me think this.)

3. EA should have very high intellectual standards in general, and should freely criticize ideas that seem implausible, and criticize arguments that seem (inductively) invalid or weak.

I'd also guess we agree that EA shouldn't obsess about its reputation. Where I'd guess we might disagree is (i) that I'm probably more wary of reputation-management stuff, and want to steer an even wider berth around that topic; (ii) I might be less satisfied than you with the current rate of 'EA spawning weird SingInst-like projects'; and (iii) I might be less confident than you that QRI at the start of the pipeline was obviously un-SingInst-like.

Does that disagree with your model anywhere?

Holly Elmore: @Rob Bensinger  No, I think that captures it nicely

Rob Bensinger: "Affirmative action for 'cranks' is the last thing we need in any kind of truth-seeking movement."

The thing I'm advocating isn't 'affirmative action for cranks' -- it's 'ignore weirdness, social awkwardness, etc. in college applications more than we naturally would, in order to put even more focus on object-level evaluation'.

I think also you misunderstood me as saying 'your objections to QRI are all about status', or even 'we shouldn't ban QRI from the EA Forum for intellectual shoddiness'. I doubt the former, and I don't really have a view about the latter. My response was narrowly aimed at phrasings like "if QRI presented their work more humbly" and "I would like QRI to stop identifying with EA", where even if you didn't mean those things in status-y ways, I still wanted to guard against anyone interpreting those statements that way and coming away thinking that *that* is an EA norm.

It sounds like I misunderstood the extent to which you endorsed 'focus on reputation stuff in this context' -- sorry about that. 🙂 In that case, interpret what I said as 'picking on your word choice and then going off on a tangent', rather than successfully engaging with what you actually believe.

Rolf Degen, summarizing part of Barbara Finlay's "The neuroscience of vision and pain":

Humans may have evolved to experience far greater pain, malaise and suffering than the rest of the animal kingdom, due to their intense sociality giving them a reasonable chance of receiving help.

From the paper:

Several years ago, we proposed the idea that pain, and sickness behaviour had become systematically increased in humans compared with our primate relatives, because human intense sociality allowed that we could ask for help and have a reasonable chance of receiving it. We called this hypothesis ‘the pain of altruism’ [68]. This idea derives from, but is a substantive extension of Wall’s account of the placebo response [43]. Starting from human childbirth as an example (but applying the idea to all kinds of trauma and illness), we hypothesized that labour pains are more painful in humans so that we might get help, an ‘obligatory midwifery’ which most other primates avoid and which improves survival in human childbirth substantially ([67]; see also [69]). Additionally, labour pains do not arise from tissue damage, but rather predict possible tissue damage and a considerable chance of death. Pain and the duration of recovery after trauma are extended, because humans may expect to be provisioned and protected during such periods. The vigour and duration of immune responses after infection, with attendant malaise, are also increased. Noisy expression of pain and malaise, coupled with an unusual responsivity to such requests, was thought to be an adaptation.
We noted that similar effects might have been established in domesticated animals and pets, and addressed issues of ‘honest signalling’ that this kind of petition for help raised. No implication that no other primate ever supplied or asked for help from any other was intended, nor any claim that animals do not feel pain. Rather, animals would experience pain to the degree it was functional, to escape trauma and minimize movement after trauma, insofar as possible.

Finlay's original article on the topic: "The pain of altruism".

[Epistemic status: Thinking out loud]

If the evolutionary logic here is right, I'd naively also expect non-human animals to suffer more to the extent they're (a) more social, and (b) better at communicating specific, achievable needs and desires.

There are reasons the logic might not generalize, though. Humans have fine-grained language that lets us express very complicated propositions about our internal states. That puts a lot of pressure on individual humans to have a totally ironclad, consistent "story" they can express to others. I'd expect there to be a lot more evolutionary pressure to actually experience suffering, since a human will be better at spotting holes in the narratives of a human who fakes it (compared to, e.g., a bonobo trying to detect whether another bonobo is really in that much pain).

It seems like there should be an arms race across many social species to give increasingly costly signals of distress, up until the costs outweigh the amount of help they can hope to get. But if you don't have the language to actually express concrete propositions like "Bob took care of me the last time I got sick, six months ago, and he can attest that I had a hard time walking that time too", then those costly signals might be mostly or entirely things like "shriek louder in response to percept X", rather than things like "internally represent a hard-to-endure pain-state so I can more convincingly stick to a verbal narrative going forward about how hard-to-endure this was".

A thing I wrote on social media a few months ago, in response to someone asking if an AI warning shot might happen:

[... I]t's a realistic possibility, but I'd guess it won't happen before AI destroys the world, and if it happens I'm guessing the reaction will be stupid/panicky enough to just make the situation worse.

(It's also possible it happens before AI destroys the world, but six weeks before rather than six years before, when it's too late to make any difference.)

A lot of EAs feel confident we'll get a "warning shot" like this, and/or are mostly predicating their AI strategy around "warning-shot-ish things will happen and suddenly everyone will get serious and do much more sane things". Which doesn't sound like, eg, how the world reacted to COVID or 9/11, though it sounds a bit like how the world (eventually) reacted to nukes and maybe to the recent Ukraine invasion?

Someone then asked why I thought a warning shot might make things worse, and I said:

It might not buy time, or might buy orders of magnitude less time than matters; and/or some combination of:

- the places that are likely to have the strictest regulations are (maybe) the most safety-conscious parts of the world. So you may end up slowing down the safety-conscious researchers much more than the reckless ones.

- more generally, it's surprising and neat that the frontrunner (DM) is currently one of the least allergic to thinking about AI risk. I don't think it's anywhere near sufficient, but if we reroll the dice we should by default expect a worse front-runner.

- regulations and/or safety research are misdirected, because people have bad models now and are therefore likely to have bad models when the warning shot happens, and warning shots don't instantly fix bad underlying models.

The problem is complicated, and steering in the right direction requires that people spend time (often years) setting multiple parameters to the right values in a world-model. Warning shots might at best fix a single parameter, 'level of fear', not transmit the whole model. And even if people afterwards start thinking more seriously and thereby end up with better models down the road, their snap reaction to the warning shot may lock in sticky bad regulations, policies, norms, culture, etc., because they don't already have the right models before the warning shot happens.

- people tend to make worse decisions (if it's a complicated issue like this, not just 'run from tiger') when they're panicking and scared and feeling super rushed. As AGI draws visibly close / more people get scared (if either of those things ever happen), I expect more person-hours spent on the problem, but I also expect more rationalization, rushed and motivated reasoning, friendships and alliances breaking under the emotional strain, uncreative and on-rails thinking, unstrategic flailing, race dynamics, etc.

- if regulations or public backlash do happen, these are likely to sour a lot of ML researchers on the whole idea of AI safety and/or sour them on xrisk/EA ideas/people. Politicians or the public suddenly caring or getting involved, can easily cause a counter-backlash that makes AI alignment progress even more slowly than it would have by default.

- software is not very regulatable, software we don't understand well enough to define is even less regulatable, whiteboard ideas are less regulatable still, you can probably run an AGI on a not-expensive laptop eventually, etc.

So regulation is mostly relevant as a way to try to slow everything down indiscriminately, rather than as a way to specifically target AGI; and it would be hard to make it have a large effect on that front, even if this would have a net positive effect.

- a warning shot could convince everyone that AI is super powerful and important and we need to invest way more in it.

- (insert random change to the world I haven't thought of, because events like these often have big random hard-to-predict effects)

Any given big random change will tend to be bad on average, because the end-state we want requires setting multiple parameters to pretty specific values and any randomizing effect will be more likely to break a parameter we already have in approximately the right place, than to coincidentally set a parameter to exactly the right value.

There are far more ways to set the world to the wrong state than the right one, so adding entropy will usually make things worse.

We may still need to make some high-variance choices like this, if we think we're just so fucked that we need to reroll the dice and hope to have something good happen by coincidence. But this is very different from expecting the reaction to a warning shot to be a good one. (And even in a best-case scenario we'll need to set almost all of the parameters via steering rather than via rerolling; rerolling can maybe get us one or even two values close-to-correct if we're crazy lucky, but the other eight values will still need to be locked in by optimization, because relying on ten independent one-in-ten coincidences to happen is obviously silly.)

- oh, [redacted]'s comments remind me of a special case of 'worse actors replace the current ones': AI is banned or nationalized and the UK or US government builds it instead. To my eye, this seems a lot likelier to go poorly than the status quo.

There are plenty of scenarios that I think make the world go a lot better, but I don't think warning shots are one of them.

(One might help somewhat, if it happened; it's mostly just hard to say, and we'll need other major changes to happen first. Those other major changes are more the thing I'd suggest focusing on.)

How are you operationalizing "warning shot" and how low do you think the chances are that a warning shot operationalized that way happens and the world is not destroyed within e.g. the following three years?

I'm thinking of a 'warning shot' roughly as 'an event where AI is widely perceived to have caused a very large amount of destruction'. Maybe loosely operationalized as 'an event about as sudden as 9/11, and at least one-tenth as shocking, tragic, and deadly as 9/11'.

I don't have stable or reliable probabilities here, and I expect that other MIRI people would give totally different numbers. But my  current ass numbers are something like:

  • 12% chance of a warning shot happening at some point.
  • 6% chance of a warning shot happening more than 6 years before AGI destroys and/or saves the world.
  • 10% chance of a warning shot happening 6 years or less before AGI destroys and/or saves the world.

My current unconditional odds on humanity surviving are very low (with most of my optimism coming from the fact that the future is just inherently hard to predict). Stating some other ass numbers:

  • Suppose that things go super well for alignment and timelines aren't that short, such that we achieve AGI in 2050 and have a 10% chance of existential success. In that world, if we held as much as possible constant except that AGI comes in 2060 instead of 2050, then I'm guessing that would double our success odds to 20%.
  • If we invented AGI in 2050 and somehow impossibly had fifteen years to work with AGI systems before anyone would be able to destroy the world, and we knew as much, then I'd imagine our success odds maybe rising from 10% to 55%.
  • The default I expect instead is that the first AGI developer will have more than three months, and less than five years, before someone else destroys the world with AGI. (And on the mainline, I expect them to have ~zero chance of aligning their AGI, and I expect everyone else to have ~zero chance as well.)

If a warning shot had a large positive impact on our success probability, I'd expect it to look something like:

'20 or 30 or 40 years before we'd naturally reach AGI, a huge narrow-AI disaster unrelated to AGI risk occurs. This disaster is purely accidental (not terrorism or whatever). Its effect is mainly just to cause it to be in the Overton window that a wider variety of serious technical people can talk about scary AI outcomes at all, and maybe it slows timelines by five years or something. Also, somehow none of this causes discourse to become even dumber; e.g., people don't start dismissing AGI risk because "the real risk is narrow AI systems like the one we just saw", and there isn't a big ML backlash to regulatory/safety efforts, and so on.'

I don't expect anything at all like that to happen, not least because I suspect we may not have 20+ years left before AGI. But that's a scenario where I could (maybe, optimistically) imagine real, modest improvements.

If I imagine that absent any warning shots, AGI is coming in 2050, and there's a 1% chance of things going well in 2050, then:

  • If we add a warning shot in 2023, then I'd predict something like: 85% chance it has no major effect, 12% chance if makes the situation a lot worse, 3% chance it makes the situation a lot better. (I.e., an 80% chance that if it has a major effect, it makes things worse.)

This still strikes me as worth thinking about some, in part because these probabilities are super unreliable. But mostly I think EAs should set aside the idea of warning shots and think more about things we might be able to cause to happen, and things that have more effects like 'shift the culture of ML specifically' and/or 'transmit lots of bits of information to technical people', rather than 'make the world as a whole panic more'.

I'm much more excited by scenarios like: 'a new podcast comes out that has top-tier-excellent discussion of AI alignment stuff, it becomes super popular among ML researchers, and the culture, norms, and expectations of ML thereby shift such that water-cooler conversations about AGI catastrophe are more serious, substantive, informed, candid, and frequent'.

It's rare for a big positive cultural shift like that to happen; but it does happen sometimes, and it can result in very fast changes to the Overton window. And since it's a podcast containing many hours of content, there's the potential to seed subsequent conversations with a lot of high-quality background thoughts.

To my eye, that seems more like the kind of change that might shift us from a current trajectory of "~definitely going to kill ourselves" to a new trajectory of "viable chance of an existential win".

Whereas warning shots feel more unpredictable to me, and if they're unhelpful, I expect the helpfulness to at best look like "we were almost on track to win, and then the warning shot nudged us just enough to secure a win".

Very interesting analysis—thank you.

I thought this was great because it has concrete and specific details, times and probabilities. I think it’s really hard to write these out.

Well, it's less effort insofar as these are very low-confidence, unstable ass numbers. I wouldn't want to depend on a plan that assumes there will be no warning shots, or a plan that assumes there will be some.