Late 2021 MIRI Conversations



How should Effective Altruists think about Leftist Ethics?

I'm not entirely sure why PR-risk needs to be excluded from cost effectiveness analysis (it's just another downside), though I'm not opposed in practice to doing this.

PR risk is a lot weirder and more complicated than a lot of people take it to be. Breaking it off into a separate discussion, or a separate bucket, seems wise to me in a lot of cases.

Christiano, Cotra, and Yudkowsky on AI progress

Transcript error (will fix soon) -- the line that reads


I expect it to go away before the end of days

but with there having been a big architectural innovation, not Stack More Layers


I expect it to go away before the end of days

but with there having been a big architectural innovation, not Stack More Layers


if you name 5 possible architectural innovations I can call them small or large

should be


I expect it to go away before the end of days

but with there having been a big architectural innovation, not Stack More Layers



whereas I expect layer stacking + maybe changing loss (since logprob is too noisy) is sufficient


if you name 5 possible architectural innovations I can call them small or large

Christiano, Cotra, and Yudkowsky on AI progress

but there must be places where you'd strongly disagree w the superforecaster

since you disagree with them eventually, e.g. >2/3 doom by 2030

On LW, Eliezer notes that this isn't an actual prediction he's made, and is based on a misunderstanding of something he wrote once.

(Will update the post with an inline comment to note this.)

Ngo and Yudkowsky on AI capability gains

tl;dr: Eliezer and Richard disagree about how hard alignment is, so they try to resolve that disagreement by talking about various things that might underlie the disagreement.

Ngo and Yudkowsky on AI capability gains

My comically oversimplified summary of the above conversation, which is not endorsed by Richard or Eliezer (and skips over a large number of topics and claims, doesn't try to stay close to the text, etc.):

R: I'm skeptical of your claim that capable-enough-to-save-the-world AI systems will (as a strong default) be means-end reasoners that approximate expected utility (EU) maximizers. (And therefore of your claim that, like EU maximizers, they'll by strong default think about the long-term consequences of their actions and try to consistently "steer" the future in some direction -- properties that would be worrisome if they held, because they imply convergent instrumental goals like killing humans.)

In particular, I worry that you may be putting too much confidence in abstractions like expected utility, in the same way that you were too confident in recursive self-improvement (RSI) and missed that AI (e.g., GPT-3) could get pretty capable without it. The real world is messy, and abstractions like this often fail in surprising ways; so we should be correspondingly less confident that powerful future AGI systems will conform to the particular abstraction you're pointing at ("expected utility").

E: RSI still strikes me as just as good an abstraction as ever. It's true that I was surprised by how fast ML could advance without RSI, but RSI is properly a claim about what happens when AI gets sufficiently capable, not a claim 'there are no other ways to rapidly increase in capability'.

I see my error as 'giving too much attention to interesting complex ways things can go poorly, and neglecting the simple, banal ways things can go wrong earlier'. If I'm messing up, it's plausible that I haven't fully fixed that bias and am messing up in a similar way to that. But that doesn't make me think EU is a worse abstraction for its domain of applicability, or make me more optimistic about AI alignment.

R: If EU is a deep fundamental theory, then it should make some novel, verifiable predictions that other theories don't make.

E: EU makes plenty of mundane predictions about, e.g., how humans reason (via weighing futures according to probabilities, etc.), and how humans will tend to behave tomorrow (usually picking up $50 bills when they see them on the ground, etc.).

R: Those seem too obvious -- we already expected those things, so given things like hindsight bias, it's hard to know how much of an advantage those successful predictions should give EU over rival models, if any. I expect something more surprising and impressive, if EU really is a useful enough framework to let us make confident predictions about capable-enough-to-save-the-world AI systems.

E: Those sorts of prediction successes about everyday human behavior strike me as easily good enough, given that I'm not claiming the level of confidence of, e.g., a law of physics. I think you're being unreasonably skeptical here because, like a lot of EAs, you're overly skeptical about useful predictive abstractions, and overly credulous about modest-epistemology norms.

(topic change)

E: In general, I don't expect governments to prepare, coordinate, or exhibit any competence around AGI.

R: I feel more optimistic because before AGI, I think we might see (e.g.) a decade of non-dangerous AI radically transforming and enriching the world.

E: I don't expect that to happen at all, because (a) I don't expect the technology to go that way, and (b) I expect bureaucratic/regulatory obstacles to mostly prevent AI progress from hugely changing the world, until AGI saves or destroys the world.

RobBensinger's Shortform

Copying over part of a conversation from Facebook (where we've both been criticizing / expressing strong skepticism of QRI recently):


Holly Elmore: [...] I mean, it's true that I would like QRI to stop identifying with EA because it is unscientific[.]


Holly Elmore: @Andres Gomez Emilsson  I said QRI is unscientific because of the unrigorous reasoning and poorly motivated privileged hypotheses you’re pursuing. Your communication of your work is atrocious and it doesn’t seem subject to much review. You can do whatever kind of investigations you want but if you want me to call it science or say it should benefit from the EA label you need to have higher standards.

Rob Bensinger: Oh, I do disagree with a thing Holly said, and want to push back on it: "it's true that I would like QRI to stop identifying with EA because it is unscientific".

(Apologies for making it sound like I endorsed everything [Holly said], I forgot that part!)

I'm not sure how to articulate my objection to this Holly-statement, exactly. Something like: the important thing is figuring out what's true, and figuring out what policies that implies for object-level optimizing the world. I think it's unhealthy for EAs to think much, if at all, about what's "identified with EA" or "associated with EA", or what might reputationally damage EA.

If anything, I suspect having a few more crankish things associated with EA would probably be net-positive, because it would cause EAs to despair more of saving their reputation, which might make them give up on that whole 'defend the brand at all costs!' project and start doing more object-level good stuff again. 😛

Another way of putting it is that EA should be a very [hits-based] operation. We should constantly be having crazy projects get spun up and then absolutely demolished by EA Forum posts. This is a sign of the system working, not a sign of rot or error. QRI itself may not be useful, but the existence of a whole bunch of things that are like QRI and are associated with EA and then get demolished is exactly what we should be seeing if EA is functioning properly as a very generative community that tries lots of things and then criticizes/evaluates/prunes freely, in its famously blunt Buck-Shlegeris-like manner. The fact that I'm not seeing more of this worries me quite a lot, actually!

Another way of putting it is that there are two possible versions of EA we could shoot for:

Version 1: filters heavily for protecting its reputation from any association with weird, long-shot, or suspicious orgs. This successfully protects the reputation, but also loses most of EA's impact, because most of EA's impact is in producing very new ideas, longshot projects, unpopular angles of attack on problems, etc. Not all of those will be winners, but blocking that generation process means we can't win.

Version 2: EA maintains high epistemic standards, but is perfectly happy to be 'associated' with all sorts of weird crazy ideas -- which it then vigorously critiques, has friendly harsh conversations with, etc. The EA Forum is laden with nonsense getting trashed. We have standards, both in what we'll allow on the forum, and in the content of our criticisms; but we take extra effort to make the standards tolerant of weird stuff, and we acknowledge that if we filter too hard for 'seems false to me' then we'll exclude some things that are true-but-revolutionary, which we don't want.

A thing I might endorse is a sort of pipeline, where some EA forums specialize in different parts of the pipeline and QRI might not make it past the earliest or second-earliest stage in that pipeline? And going further down the pipeline is about persuading people that your ideas at least make enough sense to debate/investigate further, not about being reputationally safe for EA.

Holly Elmore: If QRI didn’t claim to be EA, I would just leave them alone to have their theory that I think is bad science. That’s why I mention it. My concern is not reputation but wasting time and resources and weakening epistemic norms. I myself am in a grey literature in EA, and it’s a space we’re figuring out. It should be one of rigor.

Specifically, I see a lot of people looking at QRI, not getting or knowing what they’re talking about immediately, and leaving thinking “okay, interesting.” I want it to be much harder for someone to put a lot of confusing stuff on the forum or their website and have people update in favor of it because they like the aesthetic or consider them part of the community.

@Rob Bensinger  If QRI presented their work more humbly then I would take it for earlier pipeline stuff

Rob Bensinger: "If QRI presented their work more humbly then I would take it for earlier pipeline stuff"

I don't endorse this either, at all -- this heuristic would have stomped early SingInst 😛

Holly Elmore: @Rob Bensinger  Those are pretty different things. QRI is trying to find the truth, not effect change.

Rob Bensinger: Cranks often think their thing is extremely important, are very vocal and insistent about this, and are often too socially inept to properly genuflect to the status hierarchy and sprinkle in the right level of 'but maybe I'm wrong' to look properly humble and self-aware to third-party skeptics.

People with actual importantly novel insights and ideas, risky successful projects, radically new frameworks and models, etc. also often think their thing is extremely important, are also often very vocal and insistent about this, and are also often too socially inept to properly genuflect and caveat and 'look like a normal moderate calm person'. Especially the 'young genius' types.

We should have standards, but I think the standards should overwhelmingly be about the actual object-level claims, not about 'does this person seem too confident', 'does this person sound properly detached and calm like our mental image of a respectable lab-coat-wearing scientist', etc.

Holly Elmore: @Rob Bensinger  I think you are wrong about how much this is a status thing, and I'm irritated that a lot of people are viewing it that way (including at least @Andres and maybe others at QRI). On the forum, someone commented that they liked QRI because they wanted to "keep EA weird." All of those things are very beside the point of whether their claims are intelligible and true. Affirmative action for "cranks" is the last thing we need in any kind of truth-seeking movement.

@Rob Bensinger I mentioned humility bc I think it's important to make the strength of your claims clear and not motte-bailey the way Mike and Andres have on very strong claims. It's about accuracy, not status. EDIT: And if QRI presented its claims as more exploratory, earlier pipeline stuff, then I would think that was more appropriate and not object as much.

Ronny Fernandez: @Holly Elmore  So you agree that we should have very high standards, and those standards should be only about object level stuff?

Holly Elmore: @Ronny Fernandez  The height of the standards depends on how formal the context, but yeah only the object level stuff matters

@Ronny Fernandez  Separately, I'm really annoyed that people who apparently haven't read QRI's posts think that I must be judging them for not being academics. Maybe you have to have a background in neuroscience for it to be obvious but the quality is clearly low at the object-level. I even think if you just have a background in rhetoric or rationality you can see how many of the arguments and the ways evidence is brought together don't follow.

Rob Bensinger: "Those [SingInst vs. QRI] are pretty different things. QRI is trying to find the truth, not effect change."

Early SingInst was a mix of research org, outreach org, attempt-to-raise-the-sanity-waterline, etc. It was also trying to find the truth; and like QRI, it had controversial beliefs; and in both cases there ought to be important practical implications if the beliefs are true.

I'd say that for both orgs the main important feature of the org isn't the outreach (or lack thereof); it's the org's claims to have discovered rare truths about the world, which aren't widely understood or accepted (like 'AGI won't be nice unless you do a bunch of hard work to align it').

From my perspective, the difference is that QRI is (AFAICT) wrong about everything and has terrible methodology, whereas MIRI's methodology was great and got them far ahead of the curve on a huge list of AGI-related topics.

I think we agree about these three things (let me know if I'm wrong):

1. EA should be a breeding ground for SingInst-ish orgs.

2. EA should push back super hard against QRI-ish orgs, harshly criticizing their claims. Possibly even banning their stuff from the EA Forum at some point, if it seems to be sucking up lots of attention on pointless crackpottery? (I don't actually know whether the case against QRI's stuff is as airtight as all that, or whether it's worth setting the forum norms at that particular threshold; but I could imagine learning more that would make me think this.)

3. EA should have very high intellectual standards in general, and should freely criticize ideas that seem implausible, and criticize arguments that seem (inductively) invalid or weak.

I'd also guess we agree that EA shouldn't obsess about its reputation. Where I'd guess we might disagree is (i) that I'm probably more wary of reputation-management stuff, and want to steer an even wider berth around that topic; (ii) I might be less satisfied than you with the current rate of 'EA spawning weird SingInst-like projects'; and (iii) I might be less confident than you that QRI at the start of the pipeline was obviously un-SingInst-like.

Does that disagree with your model anywhere?

Holly Elmore: @Rob Bensinger  No, I think that captures it nicely

Rob Bensinger: "Affirmative action for 'cranks' is the last thing we need in any kind of truth-seeking movement."

The thing I'm advocating isn't 'affirmative action for cranks' -- it's 'ignore weirdness, social awkwardness, etc. in college applications more than we naturally would, in order to put even more focus on object-level evaluation'.

I think also you misunderstood me as saying 'your objections to QRI are all about status', or even 'we shouldn't ban QRI from the EA Forum for intellectual shoddiness'. I doubt the former, and I don't really have a view about the latter. My response was narrowly aimed at phrasings like "if QRI presented their work more humbly" and "I would like QRI to stop identifying with EA", where even if you didn't mean those things in status-y ways, I still wanted to guard against anyone interpreting those statements that way and coming away thinking that *that* is an EA norm.

It sounds like I misunderstood the extent to which you endorsed 'focus on reputation stuff in this context' -- sorry about that. 🙂 In that case, interpret what I said as 'picking on your word choice and then going off on a tangent', rather than successfully engaging with what you actually believe.

Ngo and Yudkowsky on alignment difficulty

I think it's an Eliezer-neologism that was meant to highlight an analogy between moral consequentialism ('you should base actions on their consequences') and the kind of reasoning he's talking about ('you do base actions on their consequences').

First reference I see to it is in Protein Reinforcement and DNA Consequentialism (2007, emphasis added):

There's a long chain of causality whereby a male squirrel, eating a nut today, produces more offspring months later:  Chewing and swallowing food, to digesting food, to burning some calories today and turning others into fat, to burning the fat through the winter, to surviving the winter, to mating with a female, to the sperm fertilizing an egg inside the female, to the female giving birth to an offspring that shares 50% of the squirrel's genes.

With the sole exception of humans, no protein brain can imagine chains of causality that long, that abstract, and crossing that many domains.  With one exception, no protein brain is even capable of drawing the consequential link from chewing and swallowing to inclusive reproductive fitness.


Why not learn to like food based on reproductive success, so that you'll stop liking the taste of candy if it stops leading to reproductive success?  Why don't birds wait and see which wing-flapping policies result in more eggs, not just more stability?

Because it takes too long.  Reinforcement learning still requires you to wait for the detected consequences before you learn.

Now, if a protein brain could imagine the consequences, accurately, it wouldn't need a reinforcement sensor that waited for them to actually happen.

Put a food reward in a transparent box.  Put the corresponding key, which looks unique and uniquely corresponds to that box, in another transparent box.  Put the key to that box in another box.  Do this with five boxes.  Mix in another sequence of five boxes that doesn't lead to a food reward.  Then offer a choice of two keys, one which starts the sequence of five boxes leading to food, one which starts the sequence leading nowhere.

Chimpanzees can learn to do this.  (Dohl 1970.)  So consequentialist reasoning, backward chaining from goal to action, is not strictly limited to Homo sapiens.

But as far as I know, no non-primate species can pull that trick.  And working with a few transparent boxes is nothing compared to the kind of high-falutin' cross-domain reasoning you would need to causally link food to inclusive fitness.  (Never mind linking reciprocal altruism to inclusive fitness).  Reinforcement learning seems to evolve a lot more easily.

Followed up in Thou Art Godshatter:

Now it's clear, as was discussed yesterday, that it's hard to build a powerful enough consequentialist. Natural selection sort-of reasons consequentially, but only by depending on the actual consequences. Human evolutionary theorists have to do really high-falutin' abstract reasoning in order to imagine the links between adaptations and reproductive success.

Ngo and Yudkowsky on alignment difficulty

'Means-ends reasoning', 'selecting policies on the basis of their predicted consequences', etc. Discussed more in consequentialist cognition, and not to be confused with the consequentialist theory of moral value.
