

To help with the talent pipeline, GovAI currently runs twice-a-year three-month fellowships. We've also started offering one-year Research Scholar positions. We're also now experimenting with a new policy program. Supporting the AI governance talent pipeline is one of our key priorities as an organization.

That being said, we're very very far from filling the community's needs in this regard. We're currently getting far more strong applications than we have open slots. (I believe our acceptance rate for the Summer Fellowship is something like 5% and will probably keep getting lower. We now need to reject people who actually seem really promising.) We'd like to scale our programs up more, but even then there will still be an enormous unmet need. I would definitely welcome more programs in this space!

I really appreciate the donation to GovAI!

According to staff I've talked to, MIRI is not heavily funding constrained, but they believe they could use more money. I suspect GovAI is in a similar place but I have not inquired.

For reference, for anyone thinking of donating to GovAI: I would currently describe us as “funding constrained” - I do currently expect financial constraints to prevent us from making program improvements/expansions and hires we’d like to make over the next couple years. (We actually haven’t yet locked down enough funding to maintain our current level of operation for the next couple years, although I think that will probably come together soon.)

We’ll be putting out a somewhat off-season annual report soon, probably in the next couple weeks, that gives a bit of detail on our current resources and what we would use additional funding for. I’m also happy to share more detailed information upon request, if anyone might be interested in donating and wants to reach out to me at ben.garfinkel@governance.ai.

Thanks for the thoughtful comment!

So it's not enough to be "no less democratic than other charity orgs". I believe we should strive to be much more democratic than that average - which seems to me like a minority view here.

I do think that this position - "EA foundations aren't unusually undemocratic, but they should still be a lot more democratic than they are" - is totally worthy of discussion. I think you're also right to note that other people in the community tend to be skeptical of this position; I'm actually skeptical of it, myself, but I would be interested in reading more arguments in favor of it.

(My comment was mostly pushing back against the suggestion that the EA community is distinctly non-democratic.)

I'm assuming you're right about the amount of democracy in other non-profits, but the situation in my country is actually different. All non-profits have members who can call an assembly and have final say on any decision or policy of the non-profit.

I've never heard of this - that sounds like a really interesting institutional structure! Can I ask what country you're in, or if there's anything to read on how this works in practice?

Every time the issue of taxes comes up, it's a very popular opinion that people should avoid as much taxes as possible to redirect the money to what they personally deem effective. This is usually accompanied by insinuations that democratically elected governments are useless or harmful.

The first part of this does seem like a pretty common opinion to me - fair to point that out!

On the second: I don't think "democratic governments are useless or harmful" is a popular opinion, if the comparison point is either to non-democratic governments or no government. I do think "government programs are often really inefficient or poorly targeted" and "governments often fail to address really important issues" are both common opinions, on the other hand, but I don't really interpret these as being about democracy per se.[1]

One thing that's also complicated, here, is that the intended beneficiaries of EA foundations' giving tend to lack voting power in the foundations' host countries: animals, the poor in other countries, and future generations. So trying to redirect resources to these groups, rather than the beneficiaries preferred by one's national government, can also be framed as a response to the fact that (e.g.) the US government is insufficiently democratic: the US government doesn't have any formal mechanisms for representing the interests of most of the groups that have a stake in its decisions. Even given this justification, I think it probably would still be a stretch to describe the community tendency here as overall "democratic" in nature. Nonetheless, I think it does at least make the situation a little harder to characterize.

  1. At least speaking parochially, I also think of these as relatively mainstream opinions in the US rather than opinions that feel distinctly EA. Something I wonder about, sometimes, is whether cross-country differences are underrated as a source of disagreement within and about the EA community. Your comment about how non-profits work in your country was also thought-provoking in this regard! ↩︎


To be clear, though, I also don't think people should feel like they need to write out comments explaining their strong downvotes. I think the time cost is too high for it to be a default expectation, particularly since it can lead to getting involved in a fraught back-and-forth and take additional time and energy that way. I don't use strong downvotes all that often, but, when I do use them, it's rare that I'll also write up an explanatory comment.

(Insofar as I disagree with forum voting norms, my main disagreement is that I'd like to see people have somewhat higher bars for strong downvoting comments that aren't obviously substanceless or norm-violating. I think there's an asymmetry between upvotes and downvotes, since downvotes often feel aggressive or censorious to the downvoted person and the people who agree with them. For that reason, I think that having a higher bar for downvotes than for upvotes helps to keep discussions from turning sour and helps avoid alienating people more than necessary.)

I do think it's reasonable to feel frustrated by your experience commenting on this post. I think you should have been engaged more respectfully, with more of an assumption of good faith, and that a number of your comments shouldn't have been so heavily downvoted. I do also agree with some of the concerns you've raised in your comments and think it was useful for you to raise them.[1]

At the same time, I do think this comment isn't conducive to good conversation, and the content mostly strikes me as off-base.

  • The EA community doesn't have its roots in management consultancy. Off the top of my head, I can't think of anyone who's sometimes considered a founding figure (e.g. Singer, Parfit, Ord, MacAskill, Yudkowsky, Karnofsky, Hassenfeld) who was a management consultant. Although the community does have some people who were or are management consultants, they don't seem overrepresented in any interesting way.

  • At least on the two most obvious interpretations, I don't think the EA community rejects democracy to any unusual degree. If you mean "people involved in EA reject democracy as a political system," then I think I've literally never heard anyone express pro-autocracy views. If you mean "organizations in the EA space reject directly democratic approaches to decision-making," then that is largely true, but I don't think it's in any way a distinctive feature of the community. I think that almost no philanthropic foundations, anywhere, decide where to give money using anything like a popular vote; I think the same is generally true of advocacy and analysis organizations. I'd actually guess that EA organizations are somewhat more democratic-leaning than comparable organizations in other communities; for example, FTX's regranting program is both pretty unusual and arguably a bit "more democratic" than other approaches to giving away money. (If you mean something else by "rejection of democracy," then I apologize for the incorrect interpretations!)

  • Lastly, I don't think the EA community has an unusually heavy preference for the exploit end of the explore-exploit trade-off; I think the opposite is true. I can't think of any comparable community that devotes a larger amount of energy to the question "What should we try to do?", relative to actually trying to do things. I think this is actually something that turns off a lot of entrepreneurial and policy-minded people who enter the community, who want to try to accomplish concrete things and then get discouraged by what they perceive as a culture of constant second-guessing and bias against action.[2]

  1. For example, although I'm on balance in favor of the current strong upvote system, I agree it also has important downsides. And although I'm pretty bearish on the value of standard academic peer-review processes, I do think it's really useful for especially influential reports to be published alongside public reviews from subject matter experts. For example, when it publishes long reports, OpenPhil sometimes also publishes open reviews from subject matter experts; I think it would be great to see more of that, even though it's costly. ↩︎

  2. On the other hand, even though I don't like the term, I do think it's fair to say there's an unusually large "STEMlord-ism" undercurrent to the culture. People often do have much more positive impressions of STEM disciplines (+econ and the more technical parts of analytic philosophy), relative to non-STEM disciplines. I think this attitude isn't necessarily wrong, but I do think you're correct to perceive that it's there. ↩︎

I generally think it'd be good to have a higher evidential bar for making these kinds of accusations on the forum. Partly, I think the downside of making an off-base sock-puppeting accusation (unfair reputation damage, distraction from object-level discussion, additional feeling of adversarialism) just tends to be larger than the upside of making a correct one.

Fwiw, in this case, I do trust that A.C. Skraeling isn't Zoe. One point on this: Since she has a track record of being willing to go on record with comparatively blunter criticisms, using her own name, I think it would be a confusing choice to create a new pseudonym to post that initial comment.

I really appreciate the time people have taken to engage with this post (and actually hope the attention cost hasn’t been too significant). I decided to write some post-discussion reflections on what I think this post got right and wrong.

The reflections became unreasonably long - and almost certainly should be edited down - but I’m posting them here in a hopefully skim-friendly format. They cover what I see as some mistakes with the post, first, and then cover some views I stand by.

Things I would do differently in a second version of the post:

1. I would either drop the overall claim about how much people should defer to Yudkowsky — or defend it more explicitly

At the start of the post, I highlight the two obvious reasons to give Yudkowsky's risk estimates a lot of weight: (a) he's probably thought more about the topic than anyone else and (b) he developed many of the initial AI risk arguments. I acknowledge that many people, justifiably, treat these as important factors when (explicitly or implicitly) deciding how much to defer to Yudkowsky.

Then the post gives some evidence that, at each stage of his career, Yudkowsky has made a dramatic, seemingly overconfident prediction about technological timelines and risks - and at least hasn’t obviously internalised lessons from these apparent mistakes.

The post expresses my view that these two considerations at least counterbalance each other - so that, overall, Yudkowsky's risk estimates shouldn't be given more weight than (e.g.) those of other established alignment researchers or the typical person on the OpenPhil worldview investigation team.

But I don't do a lot in the post to actually explore how we should weigh these factors up. In that sense: I think it’d be fair to regard the post’s central thesis as importantly under-supported by the arguments contained in the post.

I should have either done more to explicitly defend my view or simply framed the post as "some evidence about the reliability of Yudkowsky's risk estimates."

2. I would be clearer about how and why I generated these examples

In hindsight, this is a significant oversight on my part. The process by which I generated these examples is definitely relevant for judging how representative they are - and, therefore, how much to update on them. But I don’t say anything about this in the post. My motives (or at least conscious motives) are also part of the story that I only discuss in pretty high-level terms, but seem like they might be relevant for forming judgments.

For context, then, here was the process:

A few years ago, I tried to get a clearer sense of the intellectual history of the AI risk and existential risk communities. For that reason, I read a bunch of old white papers, blog posts, and mailing list discussions.

These gave me the impression that Yudkowsky’s track record (and - to some extent - the track record of the surrounding community) was worse than I’d realised. From reading old material, I basically formed something like this impression: “At each stage of Yudkowsky’s professional life, his work seems to have been guided by some dramatic and confident belief about technological trajectories and risks. The older beliefs have turned out to be wrong. And the ones that haven’t yet resolved at least seem to have been pretty overconfident in hindsight.”

I kept encountering the idea that Yudkowsky has an exceptionally good track record or that he has an unparalleled ability to think well about AI (he’s also expressed this view himself) - and I kept thinking, basically, that this seemed wrong. I wrote up some initial notes on this discrepancy at some point, but didn’t do anything with them.

I eventually decided to write something public after the “Death with Dignity” post, since the view it expresses (that we’re all virtually certain to die soon) both seems wrong to me and very damaging if it’s actually widely adopted in the community. I also felt like the “Death with Dignity” post was getting more play than it should, simply because people have a strong tendency to give Yudkowsky’s views weight. I can’t imagine a similar post written by someone else having nearly as large of an impact. Notably, since that post didn’t really have substantial arguments in it (although the later one did), I think the fact it had an impact is seemingly a testament to the power of deference; I think it’d be hard to look at the reaction to that post and argue that it’s only Yudkowsky’s arguments (rather than his public beliefs in-and-of-themselves) that have a major impact on the community.

People are obviously pretty aware of Yudkowsky’s positive contributions, but my impression is that (especially) new community members tended not to be aware of negative aspects of his track record. So I wanted to write a post drawing attention to the negative aspects.

I was initially going to have the piece explicitly express the impression I’d formed, which was something like: “At each stage of Yudkowsky’s professional life, his work has been guided by some dramatic and seemingly overconfident belief about technological trajectories and risks.” The examples in the post were meant to map onto the main ‘animating predictions’ about technology he had at each stage of his career. I picked out the examples that immediately came to mind.

Then I realised I wasn’t at all sure I could defend the claim that these were his main ‘animating predictions’ - the category was obviously extremely vague, and the main examples that came to mind were extremely plausibly a biased sample. I thought there was a good chance that if I reflected more, then I’d also want to include various examples that were more positive.

I didn’t want to spend the time doing a thorough accounting exercise, though, so I decided to drop any claim that the examples were representative and just describe them as “cherry-picked” — and add in lots of caveats emphasising that they’re cherry-picked.

(At least, these were my conscious thought processes and motivations as I remember them. I’m sure other factors played a role!)

3. I’d tweak my discussion of take-off speeds

I’d make it clearer that my main claim is: it would have been unreasonable to assign a very high credence to fast take-offs back in (e.g.) the early- or mid-2000s, since the arguments for fast take-offs had significant gaps. For example, there were lots of possible countervailing arguments for slow take-offs that pro-fast-take-off authors simply hadn’t addressed yet - as evidenced, partly, by the later publication of slow-take-off arguments leading a number of people to become significantly more sympathetic to slow take-offs. (I’m not claiming that there’s currently a consensus against fast-take-off views.)

4. I’d add further caveats to the “coherence arguments” case - or simply leave it out

Rohin’s and Oli’s comments under the post have made me aware that there’s a more positive way to interpret Yudkowsky’s use of coherence arguments. I’m not sure if that interpretation is correct, or if it would actually totally undermine the example, but this is at minimum something I hadn’t reflected on. I think it’s totally possible that further reflection would lead me to simply remove the example.

Positions I stand by:

On the flipside, here’s a set of points I still stand by:

1. If a lot of people in the community believe AI is probably going to kill everyone soon, then (if they’re wrong) this can have really important negative effects

In terms of prioritisation: My prediction is that if you were to ask different funders, career advisors, and people making career decisions (e.g. deciding whether to go into AI policy or bio policy) how much they value having a good estimate of AI risk, they’ll very often answer that they value it a great deal. I do think that over-estimating the level of risk could lead to concretely worse decisions.

In terms of community health: I think that believing you’re probably going to die soon is probably bad for a large portion of people. Reputationally: Being perceived as believing that everyone is probably going to die soon (particularly if this is actually an excessive level of worry) also seems damaging.

I think we should also take seriously the tail-risk that at least one person with doomy views (even if they’re not directly connected to the existential risk community) will take dramatic and badly harmful actions on the basis of their views.

2. Directly and indirectly, deference to Yudkowsky has a significant influence on a lot of people’s views

As above: One piece of evidence for this is that Yudkowsky’s “Death with Dignity” post triggered a big reaction, even though it didn’t contain any significant new arguments. I think his beliefs (above and beyond his arguments) clearly do have an impact.

Another reason to believe deference is a factor: I think it’s both natural and rational for people, particularly people new to an area, to defer to people with more expertise in that area.[1] Yudkowsky is one of the most obvious people to defer to, as one of the two people most responsible for developing and popularising AI risk arguments and as someone who has (likely) spent more time thinking about the subject than anyone else.

Beyond that: A lot of people also clearly have a huge amount of respect for Yudkowsky in general, sometimes more than they have for any other public intellectual. I think it’s natural (and sensible) for people’s views to be influenced by the views of the people they respect. In general, I think, unless you have tremendous self-control, this will tend to happen sub-consciously even if you don’t consciously choose to defer to the people you respect.

Also, people sometimes just do talk about Yudkowsky’s track record or reputation as a contributing factor to their views.

3. The track records of influential intellectuals (including Yudkowsky) should be publicly discussed.

A person’s track-record provides evidence about how reliable their predictions are. If people are considering how much to defer to some intellectual, then they should want to know what their track record (at least within the relevant domain) looks like.

The main questions that matter are: What has the intellectual gotten wrong and right? Beyond whether they were wrong or right, about a given case, does it also seem like their predictions were justified? If they’ve made certain kinds of mistakes in the past, do we now have reason to think they won’t repeat those kinds of mistakes?

4. Yudkowsky’s track record suggests a substantial bias toward dramatic and overconfident predictions.

One counter - which I definitely think it’s worth reflecting on - is that it might be possible to generate a similarly bias-suggesting list of examples like this for any other public intellectual or member of the existential risk community.

I’ll focus on one specific comment, suggesting that Yudkowsky’s incorrect predictions about nanotechnology are in the same reference class as ‘writing a typically dumb high school essay.’ The counter goes something like this: Yes, it was possible to find this example from Yudkowsky’s past - but that’s not importantly different than being able to turn up anyone else’s dumb high school essay about (e.g.) nuclear power.

Ultimately, I don’t buy the comparison. I think it’s really out-of-distribution for someone in their late teens and early twenties to pro-actively form the view that an emerging technology is likely to kill everyone within a decade, found an organization and devote years of their professional life to address the risk, and talk about how they’re the only person alive who can stop it.

That just seems very different from writing a dumb high school essay. Much more than a standard dumb high school essay, I think this aspect of Yudkowsky’s track record really does suggest a bias toward dramatic and overconfident predictions. This prediction is also really strikingly analogous to the prediction Yudkowsky is making right now - its relevance is clearly higher than the relevance of (e.g.) a random poorly thought-out view in a high school essay.

(Yudkowsky's early writing and work is also impressive, in certain ways, insofar as it suggests a much higher level of originality of thought and agency than the typical young person has. But the fact that this example is impressive doesn’t undercut, I think, the claim that it’s also highly suggestive of a bias toward highly confident and dramatic predictions.)

5. Being one of the first people to identify, develop, or take seriously some idea doesn’t necessarily mean that your predictions about the idea will be unusually reliable

By analogy:

  • I don’t think we can assume that the first person to take the covid lab leak theory seriously (when others were dismissive) is currently the most reliable predictor of whether the theory is true.

  • I don’t think we can assume that the first person to develop the many worlds theory of quantum mechanics (when others were dismissive) would currently be the best person to predict whether the theory is true, if they were still alive.

There are, certainly, reasons to give pioneers in a domain special weight when weighing expert opinion in that domain.[2] But these reasons aren’t absolute.

There are even reasons that point in the opposite direction: we might worry that the pioneer has an attachment to their theory, so will be biased toward believing it is true and as important as possible. We might also worry that the pioneering-ness of their beliefs is evidence that these beliefs front-ran the evidence and arguments (since one way to be early is to simply be excessively confident). We also have less evidence of their open-mindedness than we do for the people who later on moved toward the pioneer’s views - since moving toward the pioneer’s views, when you were initially dismissive, is at least a bit of evidence for open-mindedness and humility.[3]

Overall, I do think we should tend to defer more to pioneers (all else being equal). But this tendency can definitely be overruled by other evidence and considerations.

6. The causal effects that people have had on the world don’t (in themselves) have implications for how much we should defer to them

At least in expectation, so far, Eliezer Yudkowsky has probably had a very positive impact on the world. There is a plausible case to be made that misaligned AI poses a substantial existential risk - and Yudkowsky’s work has probably, on net, massively increased the number of people thinking about it and taking it seriously. He’s also written essays that have exposed huge numbers of people to other important ideas and helped them to think more clearly. It makes sense for people to applaud all of this.

Still, I don’t think his positive causal effect on the world gives people much additional reason to be deferential to him.

Here’s a dumb thought experiment: Suppose that Yudkowsky wrote all of the same things, but never published them. But suppose, also, that a freak magnetic storm ended up implanting all of the same ideas in his would-be-readers’ brains. Would this absence of a causal effect count against deferring to Yudkowsky? I don’t think so. The only thing that ultimately matters, I think, is his track record of beliefs - and the evidence we currently have about how accurate or justified those beliefs were.

I’m not sure anyone disagrees with the above point, but I did notice there seemed to be a decent amount of discussion in the comments about Yudkowsky’s impact - and I’m not sure I think this issue will ultimately be relevant.[4]

  1. For example: If I had ten hours to form a view about the viability of some application of nanotechnology, I definitely wouldn’t want to ignore the beliefs of people who have already thought about the question. Trying to learn the relevant chemistry and engineering background wouldn’t be a good use of my time. ↩︎

  2. One really basic reason is that they’ve simply had more time to think about certain subjects than anyone else. ↩︎

  3. Here’s a concrete case: Holden Karnofsky eventually moved toward taking AI risk seriously, after publicly being fairly dismissive of it, and then wrote up a document analysing why he was initially dismissive and drawing lessons from the experience. It seems like we could count that as positive evidence about his future judgment. ↩︎

  4. Even though I’ve just said I’m not sure this question is relevant, I do also want to say a little bit about Yudkowsky’s impact. I personally think he's probably had a very significant impact. Nonetheless, I also think the impact can be overstated. For example, I think, it’s been suggested that the effective altruism community might not be very familiar with concepts like Bayesianism or the importance of overcoming bias if it weren’t for Yudkowsky’s writing. I don’t really find that particular suggestion plausible.

    Here’s one data point I can offer from my own life: Through a mixture of college classes and other reading, I’m pretty confident I had already encountered the heuristics and biases literature, Bayes’ theorem, Bayesian epistemology, the ethos of working to overcome bias, arguments for the many worlds interpretation, the expected utility framework, population ethics, and a number of other ‘rationalist-associated’ ideas before I engaged with the effective altruism or rationalist communities. For example, my college had classes in probability theory, Bayesian epistemology, and the philosophy of quantum mechanics, and I’d read at least parts of books like Thinking Fast and Slow, the Signal and the Noise, the Logic of Science, and various books associated with the “skeptic community.” (Admittedly, I think it would have been harder to learn some of these things if I’d gone to college a bit earlier or had a different major. I also probably "got lucky" in various ways with the classes I took and books I picked up.) See also Carl Shulman making a similar point and John Halstead also briefly commenting on the way in which he personally encountered some of the relevant ideas. ↩︎

A general reflection: I wonder if one at least minor contributing factor to disagreement, around whether this post is worthwhile, is different understandings about who the relevant audience is.

I mostly have in mind people who have read and engaged a little bit with AI risk debates, but not yet in a very deep way, and would overall be disinclined to form strong independent views on the basis of (e.g.) simply reading Yudkowsky's and Christiano's most recent posts. I think the info I've included in this post could be pretty relevant to these people, since in practice they're often going to rely a lot -- consciously or unconsciously; directly or indirectly -- on cues about how much weight to give different prominent figures' views. I also think that the majority of members of the existential risk community are in this reference class.

I think the info in this post isn't nearly as relevant to people who've consumed and reflected on the relevant debates very deeply. The more you've engaged with and reflected on an issue, the less you should be inclined to defer -- and therefore the less relevant track records become.

(The limited target audience might be something I don't do a good enough job communicating in the post.)

The part of this post which seems most wild to me is the leap from "mixed track record" to

In particular, I think, they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk.

For any reasonable interpretation of this sentence, it's transparently false. Yudkowsky has proven to be one of the best few thinkers in the world on a very difficult topic. Insofar as there are others who you couldn't write a similar "mixed track record" post about, it's almost entirely because they don't have a track record of making any big claims, in large part because they weren't able to generate the relevant early insights themselves. Breaking ground in novel domains is very, very different from forecasting the weather or events next year; a mixed track record is the price of entry.

I disagree that the sentence is false for the interpretation I have in mind.

I think it's really important to separate out the question "Is Yudkowsky an unusually innovative thinker?" and the question "Is Yudkowsky someone whose credences you should give an unusual amount of weight to?"

I read your comment as arguing for the former, which I don't disagree with. But that doesn't mean that people should currently weigh his risk estimates more highly than they weigh the estimates of other researchers currently in the space (like you).

I also think that there's a good case to be made that Yudkowsky tends to be overconfident, and this should be taken into account when deferring; but when it comes to making big-picture forecasts, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than the specific credences we should place on them, since the space of ideas is so large.

But we do also need to try to have well-calibrated credences, of course. For the reason given in the post, it's important to know whether the risk of everyone dying soon is 5% or 99%. It's not enough just to determine whether we should take AI risk seriously.

We're also now past the point, as a community, where "Should AI risk be taken seriously?" is that much of a live question. The main epistemic question that matters is what probability we assign to it - and I think this post is relevant to that.

(More generally, rather than reading this post, I recommend people read this one by Paul Christiano, which outlines specific agreements and disagreements.)

I definitely recommend people read the post Paul just wrote! I think it's overall more useful than this one.

But I don't think there's an either-or here. People - particularly non-experts in a domain - do and should form their views through a mixture of engaging with arguments and deferring to others. So both arguments and track records should be discussed.

The EA community has ended up strongly moving in Yudkowsky's direction over the last decade, and that seems like much more compelling evidence than anything listed in this post.

I discuss this in response to another comment, here, but I'm not convinced of that point.

If someone visibly learns from forecasting mistakes they make, that should clearly update us positively on them not repeating the same mistakes.

I suppose one of my main questions is whether he has visibly learned from the mistakes, in this case.

For example, I wasn't able to find a post or comment to the effect of "When I was younger, I spent years of my life motivated by the belief that near-term extinction from nanotech was looming. I turned out to be wrong. Here's what I learned from that experience and how I've applied it to my forecasts of near-term existential risk from AI." Or a post or comment acknowledging his previous over-optimistic AI timelines and what he learned from them, when formulating his current seemingly short AI timelines.

(I genuinely could be missing these, since he has so much public writing.)
