Software Engineer @ Lightcone Infrastructure
Working (6-15 years of experience)
289 karma · Joined May 2022


LessWrong dev & admin as of July 5th, 2022.


Topic Contributions

No, sorry, I meant that at the time the feature was released (a few months ago), it didn't have any karma requirement.

Clarifying a couple of points:

  • Crossposting used to be totally unrestricted; it now requires a user to have 100 karma on both LW and the EA Forum (regardless of which side they're crossposting from) to use the feature
  • While historically most development was driven by the LW team, in the last year or so the EA Forum team has hired more engineers and is now larger than the LW team by headcount (and very likely by lines-of-code shipped, or whatever other metric you want to use to analyze "how much stuff did they do").

I don't think that the top-level comment is particularly responsive to the post, except insofar as it might have taken the title as a call to action (and then ignored the rest of the post).  It's also quite vague.  But I agree that a ban seems like an unusually harsh response, absent additional context which supports the "provocation" interpretation.

Yes, I agree that there's a non-trivial divide in attitude.  I don't think the difference in discussion is surprising, given the similar pattern observed in the responses to FTX.  From a quick search and look at the tag, there were on the order of 10 top-level posts on the subject on LW, while there are 151 posts under the FTX collapse tag on the EA Forum, plus possibly more untagged.

There's definitely no censorship of the topic on LessWrong.  Obviously I don't know for sure why discussion is sparse, but my guess is that people mostly (and, in my opinion, correctly) don't think it's a particularly interesting or fruitful topic to discuss on LessWrong, or that the degree to which it's an interesting subject is significantly outweighed by mindkilling effects.

Edit: with respect to the rest of the comment, I disagree that rationalists are especially interested in object-level discussion of the subjects, but probably are much more likely to disapprove of the idea that discussion of the subject should be verboten. 

I think the framing where Bostrom's apology is a subject which has to be deliberately ignored is mistaken.  Your prior for whether something sees active discussion on LessWrong is that it doesn't, because most things don't, unless there's a specific reason you'd expect it to be of interest to the users there.  I admit I haven't seen a compelling argument for there being a teachable moment here, except the obvious "don't do something like that in the first place", and perhaps "have a few people read over your apology with a critical eye before posting it" (assuming that didn't in fact happen).  I'm sure you could find a way to tie those in to the practice of rationality, but it's a bit of a stretch.

It's possible I've flipped the sign on what you're saying, but if I haven't, I'm pretty sure most EAs are not moral realists, so I don't know where you got the impression that it's an underlying assumption of any serious EA efforts.

If I did flip the sign, then I don't think it's true that moral realism is "too unquestioned".  At this point it might be more fair to say that too much time & ink has been spilled on what's frankly a pretty trivial question that only sees as much engagement as it does because people get caught up in arguing about definitions of words (and, of course, because some other people are deeply confused).

If your headline claim is that someone has a "fairly poor track record of correctness", then I think "using a representative set of examples" to make your case is the bare-minimum necessary for that to be taken seriously, not an isolated demand for rigor.

My guess is that he meant the sequences convey the kind of more foundational epistemology which helps people derive better models on subjects like AI Alignment by themselves, though all of the sequences in The Machine in the Ghost and Mere Goodness have direct object-level relevance.


Excepting Ngo's AGI safety from first principles, I don't especially like most of those resources as introductions exactly because they offer readers very little opportunity to test or build on their beliefs.   Also, I think most of them are substantially wrong.  (Concrete Problems in AI Safety seems fine, but is also skipping a lot of steps.  I haven't read Unsolved Problems in ML Safety.)

“I don’t currently have much sympathy for someone who’s highly confident that AI takeover would or would not happen (that is, for anyone who thinks the odds of AI takeover … are under 10% or over 90%).”


I find this difficult to square with the fact that:

  • Absent highly specific victory conditions, the default (P = 1 - ε) outcome is takeover.
  • Of the three possibilities you list, interpretability seems like the only one that's actually seen any traction, but:
    • there hasn't actually been very much progress beyond toy problems
    • it's not clear why we should expect it to generalize to future paradigms
    • we have no idea how to use any "interpretation" to actually get to a better endpoint
    • interpretability, by itself, is insufficient to avoid takeover, since you lose as soon as any other player in the game messes up even once
  • The other potential hopes you enumerate require people in the world to attempt to make a specific thing happen.  For most of them, not only is practically nobody working on making any of those specific things happen, but many people are actively working in the opposite direction.  In particular, with respect to the "Limited AI" hope, the leading AI labs are pushing quite hard on generality, rather than on narrow functionality.  This has obviously paid off in terms of capability gains over "narrow" approaches.  Being able to imagine a world where something else is happening does not tell us how to get to that world.


I can imagine having an "all things considered" estimate (integrating model uncertainty, other people's predictions, etc.) of under 90% for failure.  But I don't understand writing off the epistemic position of someone who has an "inside view" estimate of >90% failure, especially given the enormous variation in the probability distributions people have over timelines (which I agree are an important, though not overwhelming, factor when estimating the chances of failure).  Indeed, an "extreme" inside-view estimate conditional on short timelines seems much less strange to me than a "moderate" one.  (The only way a "moderate" estimate makes sense to me is if it's downstream of predicting the odds of success for a specific research agenda, such as in John Wentworth's The Plan - 2022 Update, and I'm not even sure one could justifiably give a specific research agenda 50% odds of success nearly a decade out as the person who came up with it, let alone anyone looking in from the outside.)

There's obviously substantial disagreement here, but consider the most recent salient example (and arguably the entire surrounding context of OpenAI as an institution).
