[[THIRD EDIT: Thanks so much for all of the questions and comments! There are still a few more I'd like to respond to, so I may circle back to them a bit later, but, due to time constraints, I'm otherwise finished up for now. Any further comments or replies to anything I've written would also still be appreciated!]]


I'm Ben Garfinkel, a researcher at the Future of Humanity Institute. I've worked on a mixture of topics in AI governance and in the somewhat nebulous area FHI calls "macrostrategy", including: the long-termist case for prioritizing work on AI, plausible near-term security issues associated with AI, surveillance and privacy issues, the balance between offense and defense, and the obvious impossibility of building machines that are larger than humans.

80,000 Hours recently released a long interview I recorded with Howie Lempel, about a year ago, where we walked through various long-termist arguments for prioritizing work on AI safety and AI governance relative to other cause areas. The longest and probably most interesting stretch explains why I no longer find the central argument in Superintelligence, and in related writing, very compelling. At the same time, I do continue to regard AI safety and AI governance as high-priority research areas.

(These two slide decks, which were linked in the show notes, give more condensed versions of my views: "Potential Existential Risks from Artificial Intelligence" and "Unpacking Classic Arguments for AI Risk." This piece of draft writing instead gives a less condensed version of my views on classic "fast takeoff" arguments.)

Although I'm most interested in questions related to AI risk and cause prioritization, feel free to ask me anything. I'm likely to eventually answer most questions that people post this week, on an as-yet-unspecified schedule. You should also feel free just to use this post as a place to talk about the podcast episode: there was a thread a few days ago suggesting this might be useful.


140 comments

Have you considered doing a joint standup comedy show with Nick Bostrom?

Yes, but they're typically invite-only.

I want to push back against this, from one of your slides:

If we’ve failed to notice important issues with classic arguments until recently, we should also worry about our ability to assess new arguments

I feel like the LW community did notice many important issues with the classic arguments. Personally, I was/am pessimistic about AI risk, but thought my reasons were not fully or mostly captured by those arguments, and I saw various issues/caveats with them that I talked about on LW. I'm going to just cite my own posts/comments because they're the easiest to find, but I'm sure there were lots of criticisms from others too. 1 2 3 4

Of course I'm glad that you thought about and critiqued those arguments in a more systematic and prominent way, but it seems wrong to say or imply that nobody noticed their issues until now.

Hi Wei, I didn't mean to imply that no one had noticed any issues until now. I talk about this a bit more in the podcast, where I mention people like Robin Hanson and Katja Grace as examples of people who wrote good critiques more than a decade ago, and I believe I mention you as someone who's had a different take on AI risk. Over the past 2-3 years, it seems like a lot of people in the community (myself included) have become more skeptical of the classic arguments. I think this has at least partly been the result of new criticisms or improved formulations of old criticisms surfacing. For example, Paul's 2018 post [https://sideways-view.com/2018/02/24/takeoff-speeds/] arguing against a "fast takeoff" seems to have been pretty influential in shifting views within the community. But I don't think there's any clear reason this post couldn't have been written in the mid-2000s.

Do you think there were any deficits in epistemic modesty in the way the EA community prioritised AI risk, or do you think it was more that no-one sat down and examined the object-level arguments properly? Alternatively, do you think that there was too much epistemic modesty in the sense that everyone just deferred to everyone else on AI risk?

I feel that something went wrong, epistemically, but I'm not entirely sure what it was.

My memory is that, a few years ago, there was a strong feeling within the longtermist portion of the EA community that reducing AI risk was far-and-away the most urgent problem. I remember there being a feeling that the risk was very high, that short timelines were more likely than not, and that the emergence of AGI would likely be a sudden event. I remember it being an open question, for example, whether it made sense to encourage people to get ML PhDs, since, by the time they graduated, it might be too late. There was also, in my memory, a sense that all existing criticisms of the classic AI risk arguments were weak. It seemed plausible that the longtermist EA community would pretty much just become an AI-focused community. Strangely, I'm a bit fuzzy on what my own views were, but I think they were at most only a bit out-of-step.

This might be an exaggerated memory. The community is also, obviously, large enough for my experience to be significantly non-representative. (I'd be interested in whether the above description resonates with anyone else.) But, in any case, I am pretty confident that th

... (read more)
I'd be interested in whether the above description resonates with anyone else.

FWIW, it mostly doesn't resonate with me. (Of course, my experience is no more representative than yours.) Just like you, I'd be curious to hear from more people.

I think what matches my impression most is that:

  • There has been a fair amount of arguably dysfunctional epistemic deference (more at the very end of this comment); and
  • Concerns about AI risk have become more diverse. (Though I think even this has been a mix of some people such as Allan Dafoe raising genuinely new concerns and people such as Paul Christiano explaining the concerns which for all I know they've always had more publicly.)

On the other points, my impression is that if there were consistent and significant changes in views they must have happened mostly among people I rarely interact with personally, or more than 3 years ago.

  • One shift in views that has had major real-world consequences is Holden Karnofsky, and by extension Open Phil, taking AI risk more seriously. He posted about this in September 2016, so presumably he changed his mind over the months prior to that.
  • I started to engage more deeply with public discussio
... (read more)
Rohin Shah:
My experience matches Ben's more than yours. All of the people you named didn't have an ML background. Adam and I have CS backgrounds (before we joined CHAI, I was a PhD student in programming languages, while Adam worked in distributed systems iirc). Ben is in international relations. If you were counting Paul, he did a CS theory PhD. I suspect all of us chose the "ML track" because we disagreed with MIRI's approach and thought that the "ML track" would be more impactful. (I make a point out of this because I sometimes hear "well if you started out liking math then you join MIRI and if you started out liking ML you join CHAI / OpenAI / DeepMind and that explains the disagreement" and I think that's not true.) I've heard this (might be a Bay Area vs. Europe thing).
Thanks, this seems like an important point, and I'll edit my comment accordingly. I think I had been aware of at least Paul's and your backgrounds, but made a mistake by not thinking of this and not distinguishing between your prior backgrounds and what you're doing now. (Nitpick: While Ben is doing an international relations PhD now, I think his undergraduate degree was in physics and philosophy.) I still have the impression that there is a larger influx of people with ML backgrounds, but my above comment overstates that effect, and in particular it seems clearly false to suggest that Adam / Paul / you preferring ML-based approaches has a primarily sociological explanation (which my comment at least implicitly does). (Ironically, I have long been skeptical of the value of MIRI's agent foundations research, and more optimistic about the value of ML-based approaches to AI safety and Paul's IDA agenda in particular - though I'm not particularly qualified to make such assessments, certainly less so than e.g. Adam and you - and my background is in pure maths rather than ML. That maybe could have tipped me off ...)
This Robin Hanson quote [https://www.overcomingbias.com/2019/05/expand-vs-fight-social-justice-fertility-bioconservatism-and-ai-risk.html] is perhaps also evidence for a shift in views on AI risk, somewhat contra my above comment, though neutral on the "people changed their minds vs. new people have different views" and "when exactly did it happen?" questions. (I expect many people worried about AI risk think that Hanson, in the above quote and elsewhere, misunderstands current concerns. But perceiving some change seems easier than correctly describing the target of the change, so arguably the quote is evidence for change even if you think it misunderstands current concerns.)

I think that instead of talking about potential failures in the way the EA community prioritized AI risk, it might be better to talk about something more concrete, e.g.

  • The views of the average EA
  • How much money was given to AI
  • How many EAs shifted their careers to be AI-focused as opposed to something else that deserved more EA attention

I think if we think there were mistakes in the concrete actions people have taken, e.g. mistaken funding decisions or mistaken career changes (I’m not sure that there were), we should look at the process that led to those decisions, and address that process directly.

Targeting ‘the views of the average EA’ seems pretty hard. I do think it might be important, because it has downstream effects on things like recruitment, external perception, funding, etc. But then I think we need to have a story for how we affect the views of the average EA (as Ben mentions). My guess is that we don’t have a story like that, and that’s a big part of ‘what went wrong’-- the movement is growing in a chaotic way that no individual is responsible for, and that can lead to collectively bad epistemics.

‘Encouraging EAs to ... (read more)

Jack Malde:
I second this. I think Halstead's question is an excellent one and finding an answer to it is hugely important. Understanding what went wrong epistemically (or indeed if anything did in fact go wrong epistemically) could massively help us going forward. I wonder how we get the ball rolling on this...?

Which of the EA-related views you hold are least popular within the EA community?

I'm not sure how unpopular these actually are, but a few at least semi-uncommon views would be:

  • I'm pretty sympathetic to non-naturalism, in the context of both normativity and consciousness

  • Controlling for tractability, I think it's probably more important to improve the future (conditional on humanity not going extinct) than to avoid human extinction. (The gap between a mediocre future or bad future and the best possible future is probably vast.)

  • I don't actually know what my credence is here, since I haven't thought much about the issue, but I'm probably more concerned about growth slowing down and technological progress stagnating than the typical person in the community

What resources would you recommend on ethical non-naturalism? Seems like a plausible idea I don’t know much about.
Michael Huemer's "Ethical Intuitionism" and David Enoch's "Taking Morality Seriously" are both good; Enoch's book is, I think, better, but Huemer's book is a more quick and engaging read. Part Six of Parfit's "On What Matters" is also good. I don't exactly think that non-naturalism is "plausible," since I think there are very strong epistemological objections to it. (Since our brain states are determined entirely by natural properties of the world, why would our intuitions about non-natural properties track reality?) It's more that I think the alternative positions are self-undermining or have implications that are unacceptable in other ways.
Parfit isn't quite a non-naturalist (or rather, he's a very unconventional kind of non-naturalist, not a Platonist) - he's a 'quietist' [https://www.google.com/search?client=firefox-b-d&q=parfit+quietism]. Essentially, it's the view that there are normative facts, they aren't natural facts, but we don't feel the need to say what category they fall into metaphysically, or we hold that such a question is meaningless. I think a variant of that, where we say 'we don't currently have a clear idea what they are, just some hints that they exist because of normative convergence, and the internal contradictions of other views' [https://forum.effectivealtruism.org/posts/C2GpA894CfLcTXL2L/moral-anti-realism-sequence-3-against-irreducible?commentId=u4rvohGXhsWCQxW9n], is plausible.

What are the key issues or causes that longtermists should invest in, in your view? And how much should we invest in them, relatively speaking? What issues are we currently under-investing in?

Have you had any responses from Bostrom or Yudkowsky to your critiques?

Would you rather be one or two dogs?

I'm sorry, but I consider that a very personal question.

Hi Ben. I just read the transcript of your 80,000 Hours interview and am curious how you'd respond to the following:

Analogy to agriculture, industry

You say that it would be hard for a single person (or group?) acting far before the agricultural revolution or industrial revolution to impact how those things turned out, so we should be skeptical that we can have much effect now on how an AI revolution turns out.

Do you agree that the goodness of this analogy is roughly proportional to how slow our AI takeoff is? For instance if the first AGI ever created becomes more powerful than the rest of the world, then it seems that anyone who influenced the properties of this AGI would have a huge impact on the future.


You argue that if we transition more smoothly from super powerful narrow AIs that slowly expand in generality to AGI, we'll be less caught off guard / better prepared.

It seems that even in a relatively slow takeoff, you wouldn't need that big of a discontinuity to result in a singleton AI scenario. If the first AGI that's significantly more generally intelligent than a human is created in a world where lots of powerful narrow AIs exist, wouldn't... (read more)

Hi Elliot, thanks for all the questions and comments! I'll answer this one in stages.

On your first question: I agree with this. To take the fairly extreme case of the Neolithic Revolution, I think that there are at least a few reasons why groups at the time would have had trouble steering the future. One key reason is that the world was highly "anarchic [https://en.wikipedia.org/wiki/Anarchy_(international_relations)]," in the international relations sense of the term: there were many different political communities, with divergent interests and a limited ability to either coerce one another or form credible commitments. One result of anarchy is that, if the adoption of some technology or cultural/institutional practice would give some group an edge, then it's almost bound to be adopted by some group at some point: other groups will need to either lose influence or adopt the technology/innovation to avoid subjugation. This explains why the emergence and gradual spread of agricultural civilization was close to inevitable, even though (there's some evidence) people often preferred the hunter-gatherer way of life. There was an element of technological or economic determinism that put the course of history outside of any individual group's control (at least to a significant degree).

Another issue, in the context of the Neolithic Revolution, is that norms, institutions, etc., tend to shift over time, even if there aren't very strong selection pressures. This was even more true before the advent of writing. So we do have a few examples of religious or philosophical traditions that have stuck around, at least in mutated forms, for a couple thousand years; but this is unlikely in any individual case, and would have been even more unlikely 10,000 years ago. At least so far, we also don't have examples of more formal political institutions (e.g. constitutions) that have largely stuck around for more than a few thousand years either.

There are a couple reasons why AI cou
Thanks to Ben for doing this AMA, and to Elliot for this interesting set of questions! Just wanted to mention two links that readers might find interesting in this context: firstly, Tomasik's Will Future Civilization Eventually Achieve Goal Preservation? [https://reducing-suffering.org/will-future-civilization-eventually-achieve-goal-preservation/], and secondly, Bostrom's What is a Singleton? [https://www.nickbostrom.com/fut/singleton.html]
I think there are a couple different bits to my thinking here, which I sort of smush together in the interview.

The first bit is that, when developing an individual AI system, its goals and capabilities/intelligence tend to take shape together. This is helpful, since it increases the odds that we'll notice issues with the system's emerging goals before they result in truly destructive behavior. Even if someone didn't expect a purely dust-minimizing house-cleaning robot to be a bad idea, for example, they'll quickly realize their mistake as they train the system. The mistake will be clear well before the point when the simulated robot learns how to take over the world; it will probably be clear even before the point when the robot learns how to operate door knobs.

The second bit is that there are many contexts in which pretty much any possible hand-coded reward function will either quickly reveal itself as inappropriate or be obviously inappropriate before the training process even begins. This means that sane people won’t proceed in developing and deploying things like house-cleaning robots or city planners until they’ve worked out alignment techniques to some degree; they’ll need to wait until we’ve moved beyond “hand-coding” preferences, toward processes that more heavily involve ML systems learning what behaviors users or developers prefer.

It’s still conceivable that, even given these considerations, people will still accidentally develop AI systems that commit omnicide (or cause similarly grave harms). But the likelihood at least goes down. First of all, it needs to be the case that (a): training processes that use apparently promising alignment techniques will still converge on omnicidal systems. Second, it needs to be the case that (b): people won’t notice that these training processes have serious issues until they’ve actually made omnicidal AI systems. I’m skeptical of both (a) and (b). My intuition, regarding (a), is that some method that involves lear
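(A small illustrative aside on the "reward modeling" idea mentioned above: one common version is to fit a reward function to pairwise human preferences using a Bradley-Terry model, rather than hand-coding the reward. The sketch below is a toy, self-contained version of that idea; the two trajectory features, the simulated "human" preferences, and all hyperparameters are invented for illustration, and real systems use neural reward models and actual human raters.)

```python
# Toy sketch of reward modeling from pairwise preferences (Bradley-Terry model).
# Illustrative only: features, data, and hyperparameters are made up.
import math
import random

random.seed(0)

# Each trajectory is summarized by two hypothetical features:
# (dust_removed, mess_created). The "true" human preference rewards
# removing dust but penalizes creating mess, something a naive
# hand-coded reward ("maximize dust removed") would miss.
def true_score(traj):
    dust_removed, mess_created = traj
    return 1.0 * dust_removed - 2.0 * mess_created

# Synthetic preference data: pairs of trajectories, labeled 1.0 if the
# (simulated) human prefers the first of the pair, else 0.0.
trajectories = [(random.uniform(0, 1), random.uniform(0, 1)) for _ in range(200)]
pairs = []
for _ in range(500):
    a, b = random.sample(trajectories, 2)
    pairs.append((a, b, 1.0 if true_score(a) > true_score(b) else 0.0))

# Fit a linear reward model r(traj) = w . traj by maximizing the
# Bradley-Terry log-likelihood with full-batch gradient ascent.
w = [0.0, 0.0]
lr = 0.5
for _ in range(200):
    grad = [0.0, 0.0]
    for a, b, label in pairs:
        diff = [a[i] - b[i] for i in range(2)]
        # Modeled probability that the human prefers a over b.
        p = 1.0 / (1.0 + math.exp(-(w[0] * diff[0] + w[1] * diff[1])))
        for i in range(2):
            grad[i] += (label - p) * diff[i]
    for i in range(2):
        w[i] += lr * grad[i] / len(pairs)

# The learned dust weight should come out positive, the mess weight negative.
print("learned weights:", w)
```

The point of the toy is just that the reward function is recovered from comparisons between behaviors, rather than being written down by hand, so a badly misspecified objective tends to show up as disagreement with the preference data during training rather than only after deployment.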
I would say that, in a scenario with relatively "smooth" progress, there's not really a clean distinction between "narrow" AI systems and "general" AI systems; the line between "we have AGI" and "we don't have AGI" is either a bit blurry or a bit arbitrarily drawn. Even if the management/control of large collections of AI systems is eventually automated, I would also expect this process of automation to unfold over time rather than happening in a single go. In general, the smoother things are, the harder it is to tell a story where one group gets out way ahead of others. Although I'm unsure just how "unsmooth" things need to be for this outcome to be plausible.

I think that if there were multiple AGI or AGI-ish systems in the world, and most of them were badly misaligned (e.g. willing to cause human extinction for instrumental reasons), this would present an existential risk. I wouldn't count on them balancing each other out, in the same way that endangered gorilla populations shouldn't count on warring communities to balance each other out. I think the main benefits of smoothness have to do with risk awareness (e.g. by observing less catastrophic mishaps) and, especially, with opportunities for trial-and-error learning. At least when the concern is misalignment risk, I don't think of the decentralization of power as a really major benefit in its own right: the systems in this decentralized world still mostly need to be safe.

I think it's plausible that especially general systems would be especially useful for managing the development, deployment, and interaction of other AI systems. I'm not totally sure this is the case, though. For example, at least in principle, I can imagine an AI system that is good at managing the training of other AI systems -- e.g. deciding how much compute to devote to different ongoing training processes -- but otherwise can't do much else.

What would you recommend as the best introduction to concerns (or lack thereof) about risks from AI?

If you have time and multiple recommendations, I would be interested in a taxonomy. (E.g. this is the best blog post for non-technical readers, this is the best book-length introduction for CS undergrads.)

I agree with Aidan's suggestion that Human Compatible is probably the best introduction to risks from AI (for both non-technical readers and readers with CS backgrounds). It's generally accessible and engagingly written, it's up-to-date, and it covers a number of different risks. Relative to many other accounts, I think it also has the virtue of focusing less on any particular development scenario and expressing greater optimism about the feasibility of alignment. If someone's too pressed for time to read Human Compatible, the AI risk chapter in The Precipice would then be my next best bet. Another very readable option, mainly for non-CS people, would be the AI risk chapters in The AI Does Not Hate You: I think they may actually be the cleanest distillation of the "classic" AI risk argument.

For people with CS backgrounds, hoping for a more technical understanding of the problems safety/alignment researchers are trying to solve, I think that Concrete Problems in AI Safety, Scalable Agent Alignment Via Reward Modeling, and Rohin Shah's blog post sequence on "value learning" are especially good picks. Although none of these resources frames safety/alignment research as something that'

... (read more)
FWIW, here's an introduction to longtermism and AI risks I wrote for a friend. (My friend has some technical background, he had read Doing Good Better but not engaged further with EA, and I thought he'd be a good fit for AI Policy research but not technical research.)

  • Longtermism: Future people matter, and there might be lots of them, so the moral value of our actions is significantly determined by their effects on the long-term future. We should prioritize reducing "existential risks" like nuclear war, climate change, and pandemics that threaten to drive humanity to extinction, preventing the possibility of a long and beautiful future.
  • Quick intro to longtermism [https://80000hours.org/articles/future-generations/] and existential risks [https://80000hours.org/articles/extinction-risk/] from 80,000 Hours
  • Academic paper [https://globalprioritiesinstitute.org/wp-content/uploads/2019/Greaves_MacAskill_The_Case_for_Strong_Longtermism.pdf] arguing that future people matter morally, and we have tractable ways to help them, from the Doing Good Better philosopher
  • Best resource on this topic: The Precipice, a book [https://www.amazon.com/Precipice-Existential-Risk-Future-Humanity/dp/0316484911] explaining what risks could drive us to extinction and how we can combat them, released earlier this year by another Oxford philosophy professor
  • Artificial intelligence might transform human civilization within the next century, presenting incredible opportunities and serious potential problems
  • Elon Musk, Bill Gates, Stephen Hawking, and many leading AI researchers worry that extremely advanced AI poses an existential threat to humanity (Vox [https://www.vox.com/future-perfect/2018/11/2/18053418/elon-musk-artificial-intelligence-google-deepmind-openai])
  • Best resource on this topic: Human Compatible, a book [https://www.amazon.com/Human-Compatible-Artificial-In
Generally, I'd like to hear more about how different people introduce the ideas of EA, longtermism, and specific cause areas. There's no clear cut canon, and effectively personalizing an intro can be difficult, so I'd love to hear how others navigate it.

This seems like a promising topic for an EA Forum question. I would consider creating one and reposting your comment as an answer to it. A separate question is probably also a better place to collect answers than this thread, which is best reserved for questions addressed to Ben and for Ben's answers to those questions.

Good idea, thanks! I've posted a question here [https://forum.effectivealtruism.org/posts/98Z99CEcpx75xj63u/how-do-you-introduce-longtermism-and-ai-risk]. More broadly, should AMA threads be reserved for direct questions to the respondent and the respondent's answers? Or should they encourage broader discussion of those questions and ideas by everyone? I'd lean towards AMAs as a starting point for broader discussion, rather than direct Q&A. Good examples include the AMAs by Buck Shlegeris [https://forum.effectivealtruism.org/posts/tDk57GhrdK54TWzPY/i-m-buck-shlegeris-i-do-research-and-outreach-at-miri-ama] and Luke Muehlhauser [https://forum.effectivealtruism.org/posts/sxukPJiS5ZnWDX5E3/hi-i-m-luke-muehlhauser-ama-about-open-philanthropy-s-new]. But it does seem that most AMAs are more narrow, focusing on direct question and answer. [For example, this question isn't really directed towards Ben, but I'm asking anyways because the context and motivations are clearer here than they would be elsewhere, making productive discussion more likely. But I'm happy to stop distracting if there's consensus against this.]
Will Bradshaw:
I personally would lean towards the "most AMAs" approach of having most dialogue be with the AMA-respondent. It's not quite "questions after a talk", since question-askers have much more capacity to respond and have a conversation, but I feel like it's more in that direction than, say, a random EA social. Maybe something like the vibe of a post-talk mingling session? I think this is probably more important early in a comment tree than later. Directly trying to answer someone else's question seems odd/out-of-place to me, whereas chiming in 4 levels down seems less so. I think this mirrors how the "post-talk mingling" would work: if I was talking to a speaker at such an event, and I asked them a question, someone else answering before them would be odd/annoying – "sorry, I wasn't talking to you". Whereas someone else chiming in after a little back-and-forth would be much more natural. Of course, you can have multiple parallel comment threads here, which alters things quite a bit. But that's the kind of vibe that feels natural to me, and Pablo's comment above suggests I'm not alone in this.

What do you think is the probability of AI causing an existential catastrophe in the next century?

I currently give it something in the .1%-1% range.

For reference: My impression is that this is on the low end, relative to estimates that other people in the long-termist AI safety/governance community would give, but that it's not uniquely low. It's also, I think, more than high enough to justify a lot of work and concern.

I am curious whether you are, in general, more optimistic about x-risks [say, than Toby Ord]. What are your estimates of total and unforeseen anthropogenic risks in the next century?

Toby's estimate for "unaligned artificial intelligence" is the only one that I meaningfully disagree with. I would probably give lower numbers for the other anthropogenic risks as well, since it seems really hard to kill virtually everyone, and since the historical record suggests that permanent collapse is unlikely. (Complex civilizations were independently developed multiple times; major collapses, like the Bronze Age Collapse or fall of the Roman Empire, were reversed after a couple thousand years; it didn't take that long to go from the Neolithic Revolution to the Industrial Revolution; etc.) But I haven't thought enough about civilizational recovery or, for example, future biological weapons to feel firm in my higher level of optimism.
Thanks for sharing your probability estimate; I've now added it to my database of existential risk estimates [https://forum.effectivealtruism.org/posts/JQQAQrunyGGhzE23a/database-of-existential-risk-estimates]. Your estimate is the second lowest one I've come across, with the lower one being from someone (James Fodor [https://forum.effectivealtruism.org/posts/2sMR7n32FSvLCoJLQ/critical-review-of-the-precipice-a-reassessment-of-the-risks]) who I don't think is in the longtermist AI safety/governance community (though they're an EA and engage with longtermist thinking). But I'm only talking about the relatively small number of explicit, public estimates people have given, not all the estimates that relevant people would give, so I'd guess that your statement is accurate. (Also, to be clear, I don't mean to imply that we should be more skeptical of estimates that "stand out from the pack" than those that are closer to other estimates.)

I'm curious as to whether most of that .1-1% probability mass is on existential catastrophe via something like the classic Bostrom/Yudkowsky type scenario, vs something like what Christiano describes in What failure looks like [https://www.lesswrong.com/posts/HBxe6wdjxK239zajf/what-failure-looks-like], vs deliberate misuse of AI, vs something else? E.g., is it like you still see the classic scenarios as the biggest cause for concern here? Or is it like you now see those scenarios as extremely unlikely, yet have a residual sense that something as massive as AGI could cause massive bad consequences somehow?
You said in the podcast that the drop was 'an order of magnitude', so presumably your original estimate was 1-10%? I note that this is similar to Toby Ord's in The Precipice (~10%), so perhaps that should be a good rule of thumb: if you are convinced by the classic arguments, your estimate of existential catastrophe from AI should be around 10%; and if you are unconvinced by specific arguments, but still think AI is likely to become very powerful in the next century, then it should be around 1%?
Those numbers sound pretty reasonable to me, but, since they're roughly my own credences, it's probably unsurprising that I'm describing them as "pretty reasonable" :) On the other hand, depending on what counts as being "convinced" of the classic arguments, I think it's plausible they actually support a substantially higher probability. I certainly know that some people assign a significantly higher than 10% chance to an AI-based existential catastrophe this century. And I believe that Toby's estimate, for example, involved weighing up different possible views.

What have you changed your mind about recently?

Suppose there was an operational long-term investment fund a la Phil Trammel. Where would you donate?

I would strongly consider donating to the long-term investment fund. (But I haven't thought enough about this to be sure.)

From the podcast transcript:

I think something that sometimes people have in mind when they talk about the orthogonality of intelligence and goals is they have this picture of AI development where we’re creating systems that are, in some sense, smarter and smarter. And then there’s this separate project of trying to figure out what goals to give these AI systems. The way this works in, I think, in some of the classic presentations of risk is that there’s this deadline picture. That there will come a day where we have extremely intelligent systems. And if we can’t by that day figure out how to give them the right goals, then we might give them the wrong goals and a disaster might occur. So we have this exogenous deadline of the creep of AI capability progress, and that we need to solve this issue before that day arises. That’s something that I think I, for the most part, disagree with.

I continue to have a lot of uncertainty about how likely it is that AI development will look like "there’s this separate project of trying to figure out what goals to give these AI systems" vs a development process where capability and goals are necessarily connected. (I didn't find your arguments i

... (read more)
I think that the comment you make above is right. In the podcast, we only discuss this issue in a super cursory way.

Fortunately, I'm not too worried about this possibility. Partly, as background, I expect us to have moved beyond using hand-coded reward functions -- or, more generally, what Stuart Russell calls the "standard model" -- by the time we have the ability to create broadly superintelligent and highly agential/unbounded systems. There are really strong incentives to do this, since there are loads of useful applications that seemingly can't be developed using hand-coded reward functions. This is some of the sense in which, in my view, capabilities research and alignment research are mushed up. If progress is sufficiently gradual, I find it hard to imagine that the ability to create things like world-destroying paperclippers comes before (e.g.) the ability to make at least pretty good use of reward modeling techniques.

(To be clear, I recognize that loads of alignment researchers also think that there will be strong economic incentives for alignment research. I believe there's a paragraph in Russell's book arguing this. I think DM's "scalable agent alignment [https://arxiv.org/pdf/1811.07871.pdf]" paper also suggests that reward modeling is necessary to develop systems that can assist us in most "real world domains." Although I don't know how much optimism other people tend to take from this observation. I don't actually know, for example, whether or not Russell is less optimistic than me.)

If we do end up in a world where people know they can create broadly superintelligent and highly agential/unbounded AI systems, but we still haven't worked out alternatives to Russell's "standard model," then no sane person really has any incentive to create and deploy these kinds of systems. Training up a broadly superintelligent and highly agential system using something like a hand-coded reward function is likely to be an obviously bad idea; if it's not obviously bad, a pri
My guess would be that if you play with GPT-3, it can talk about human values (or AI alignment, for that matter) about as well as it can talk about anything else. In that sense, it seems like stronger capabilities for GPT-3 also potentially help solve the alignment problem. Edit: More discussion here: https://www.lesswrong.com/posts/BnDF5kejzQLqd5cjH/alignment-as-a-bottleneck-to-usefulness-of-gpt-3?commentId=vcPdcRPWJe2kFi4Wn [https://www.lesswrong.com/posts/BnDF5kejzQLqd5cjH/alignment-as-a-bottleneck-to-usefulness-of-gpt-3?commentId=vcPdcRPWJe2kFi4Wn]
I don't think I caught the point about GPT-3, although this might just be a matter of using concepts differently. In my mind: To whatever extent GPT-3 can be said to have a "goal," its goal is to produce text that it would be unsurprising to find on the internet. The training process both imbued it with this goal and made the system good at achieving it. There are other things we might want spin-offs of GPT-3 to do: for example, compose better-than-human novels. Doing this would involve shifting both what GPT-3 is "capable" of doing and what its "goal" is. (There's not really a clean practical or conceptual distinction between the two.) It would also probably require making progress on some sort of "alignment" technique, since we can't (e.g.) write down a hand-coded reward function that quantifies novel quality.
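The "reward modeling" idea mentioned above -- learning a reward function from human judgments rather than hand-coding it -- can be sketched very crudely. The toy below is entirely my own construction (not any lab's actual pipeline, and a far cry from quantifying novel quality): items have a single hidden "quality" feature, a simulated labeler gives pairwise preferences, and we fit a Bradley-Terry model so the learned reward ranks items the way the labeler does.

```python
import math
import random

random.seed(0)

# Toy "items" are 1-D quality scores; the labeler secretly prefers higher quality.
items = [random.uniform(0, 1) for _ in range(200)]

# Collect pairwise preference comparisons, the raw data for reward modeling.
comparisons = []
for _ in range(500):
    a, b = random.sample(items, 2)
    comparisons.append((a, b, a > b))  # True if the labeler preferred a

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Learned reward is r(x) = w * x. Fit w by gradient ascent on the
# Bradley-Terry log-likelihood: P(a preferred to b) = sigmoid(r(a) - r(b)).
w = 0.0
lr = 0.5
for _ in range(200):
    grad = 0.0
    for a, b, a_wins in comparisons:
        p_a = sigmoid(w * (a - b))
        label = 1.0 if a_wins else 0.0
        grad += (label - p_a) * (a - b)
    w += lr * grad / len(comparisons)

# A positive weight means the fitted reward ranks items like the labeler does,
# even though we never wrote down the "quality" function by hand.
assert w > 0
print(f"learned weight: {w:.2f}")
```

The point of the sketch is just that preference data can stand in for a reward function we don't know how to write down; scaling this to anything like "novel quality" is, of course, exactly the open research problem.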

Planned summary of the podcast episode for the Alignment Newsletter:

In this podcast, Ben Garfinkel goes through several reasons why he is skeptical of classic AI risk arguments (some previously discussed <@here@>(@How Sure are we about this AI Stuff?@)). The podcast has considerably more detail and nuance than this summary.
Ben thinks that historically, it has been hard to affect transformative technologies in a way that was foreseeably good for the long term: it's hard, e.g., to see what you could have done around the development of agriculture or industrialization that would have an impact on the world today. He thinks some potential avenues for long-term influence could be through addressing increased political instability or the possibility of lock-in, though he thinks that it’s unclear what we could do today to influence the outcome of a lock-in, especially if it’s far away.
In terms of alignment, Ben focuses on the standard set of arguments outlined in Nick Bostrom’s Superintelligence, because they are broadly influential and relatively fleshed out. Ben has several objections to these arguments:
- He thinks it isn't likely that there will be a
... (read more)
Thanks for the great summary! A few questions about it:

1. You call mesa-optimization "the best current case for AI risk". As Ben noted at the time of the interview, this argument hasn't yet really been fleshed out in detail. And as Rohin subsequently wrote in his opinion [https://www.lesswrong.com/posts/XWPJfgBymBbL3jdFd/an-58-mesa-optimization-what-it-is-and-why-we-should-care] of the mesa-optimization paper [https://arxiv.org/abs/1906.01820], "it is not yet clear whether mesa optimizers will actually arise in practice". Do you have thoughts on what exactly the "Argument for AI Risk from Mesa-Optimization" is, and/or a pointer to the places where, in your opinion, that argument has been made (aside from the original paper)?

2. I don't entirely understand the remark about the reference class of ‘new intelligent species’. What species are in that reference class? Many species which we regard as quite intelligent (orangutans, octopuses, New Caledonian crows) aren't risky. Probably, you mean a reference class like "new species as smart as humans" or "new 'generally intelligent' species". But then we have a very small reference class and it's hard to know how strong that prior should be. In any case, how were you thinking of this reference class argument?

3. 'The Boss Baby', starring Alec Baldwin, is available [https://www.amazon.com/Boss-Baby-Alec-Baldwin/dp/B079J6S81Q] for rental on Amazon Prime Video for $3.99. I suppose this is more of a comment than a question.
1. Oh man, I wish. :( I do think there are some people working on making a crisper case, and hopefully as machine learning systems get more powerful we might even see early demonstrations. I think the crispest statement of it I can make is "Similar to how humans are now optimizing for goals that are not just the genetic fitness evolution wants, other systems which contain optimizers may start optimizing for goals other than the ones specified by the outer optimizer." Another related concept that I've seen (but haven't followed up on) is what johnswentworth calls "Demons in Imperfect Search" [https://www.lesswrong.com/posts/7d2PsdHXrJnbofrvF/an-90-how-search-landscapes-can-contain-self-reinforcing], which basically advocates for the possibility of runaway inner processes in a variety of imperfect search spaces (not just ones that have inner optimizers). This arguably happened with metabolic reactions early in the development of life, greedy genes, managers in companies. Basically, I'm convinced that we don't know enough about how powerful search mechanisms work to be sure that we're going to end up somewhere we want. I should also say that I think these kinds of arguments feel like the best current cases for AI alignment risk. Even if AI systems end up perfectly aligned with human goals, I'm still quite worried about what the balance of power looks like in a world with lots of extremely powerful AIs running around [https://aiimpacts.org/agi-in-a-vulnerable-world/].

2. Yeah, here I should have said 'new species more intelligent than us'. I think I was thinking of two things here:
* Humans causing the extinction of less intelligent species
* Some folk intuition around intelligent aliens plausibly causing human extinction (I admit this isn't the best example...).
Mostly I meant here that since we don't actually have examples of existentially risky technology (yet), putting AI in the reference class of 'new technology' might make you think it's extremely implau

I have nothing to add to the discussion but wanted to say that this was my favourite episode, which given how big a fan I am of the podcast is a very high bar.

Thanks so much for letting me know! I'm really glad to hear :)

How entrenched do you think are old ideas about AI risk in the AI safety community? Do you think that it's possible to have a new paradigm quickly given relevant arguments?

I'd guess that, like most scientific endeavours, there are many social aspects that make people more biased toward their own old way of thinking. Research agendas and institutions are focused on some basic assumptions -- which, if changed, could be disruptive to the people involved or the organisation. However, there seems to be a lot of engagement with the underlying questions about the paths to superintelligence and the consequences thereof, and the research community today is also heavily involved with the rationality community -- both of these make me hopeful that more minds can be changed given appropriate argumentation.

I actually don't think they're very entrenched! I think that, today, most established AI researchers have fairly different visions of the risks from AI -- and of the problems that they need to solve -- than the primary vision discussed in Superintelligence and in classic Yudkowsky essays. When I've spoken to AI safety researchers about issues with the "classic" arguments, I've encountered relatively low levels of disagreement. Arguments that heavily emphasize mesa-optimization or arguments that are more in line with this post [https://www.lesswrong.com/posts/HBxe6wdjxK239zajf/what-failure-looks-like] seem to be more influential now. (The safety researchers I know aren't a random sample, though, so I'd be interested in whether this sounds off to anyone in the community.) I think that "classic" ways of thinking about AI risk are now more prominent outside the core AI safety community than they are within it. I think that they have an important impact on community beliefs about prioritization, on individual career decisions, etc., but I don't think they're heavily guiding most of the research that the safety community does today. (Unfortunately, I probably don't make this clear in the podcast.)

What is your theory of change for work on clarifying arguments for AI risk? 

Is the focus more on immediate impact on funding/research or on the next generation? Do you feel this is important more to direct work to the most important paths, or to understand how sure we are about all this AI stuff and grow the field or deprioritize it accordingly?

I think the work is mainly useful for EA organizations making cause prioritization decisions (how much attention should they devote to AI risk relative to other cause areas?) and young/early-stage people deciding between different career paths. The idea is mostly to help clarify and communicate the state of arguments, so that more fully informed and well-calibrated decisions can be made. A couple of other possible positive impacts:

* Developing and shifting to improved AI risk arguments -- and publicly acknowledging uncertainties/confusions -- may, at least in the long run, cause other people to take the EA community and existential-risk-oriented AI safety communities more seriously. As one particular point, I think that a lot of vocal critics (e.g. Pinker) are mostly responding to the classic arguments. If the classic arguments actually have significant issues, then it's good to acknowledge this; if other arguments (e.g. these [https://www.lesswrong.com/posts/HBxe6wdjxK239zajf/what-failure-looks-like]) are more compelling, then it's good to work them out more clearly and communicate them more widely. As another point, I think that sharing this kind of work might reduce perceptions that the EA community is more group-think-y/unreflective than it actually is. I know that people have sometimes pointed to my EAG talk from a couple years back, for example, in response to concerns that the EA community is too uncritical in its acceptance of AI risk arguments.

* I think that it's probably useful for the AI safety community to have a richer and more broadly shared understanding of different possible "AI risk threat models"; presumably, this would feed into research agendas and individual prioritization decisions to some extent. I think that work that analyzes newer AI risk arguments, especially, would be useful here. For example, it seems important to develop a better understanding of the role that "mesa-optimization" pla

You say that there hasn't been much literature arguing for Sudden Emergence (the claim that AI progress will look more like the brain-in-a-box scenario than the gradual-distributed-progress scenario). I am interested in writing some things on the topic myself, but currently think it isn't decision-relevant enough to be worth prioritizing. Can you say more about the decision-relevance of this debate?

Toy example: Suppose I write something that triples everyone's credence in Sudden Emergence. How does that change what people do, in a way that makes the world better (or worse, depending on whether Sudden Emergence is true or not!)

I would be really interested in you writing on that!

It's a bit hard to say what the specific impact would be, but beliefs about the magnitude of AI risk of course play at least an implicit role in lots of career/research-focus/donation decisions within the EA community; these beliefs also affect the extent to which broad EA orgs focus on AI risk relative to other cause areas. And I think that people's beliefs about the Sudden Emergence hypothesis at least should have a large impact on their level of doominess about AI risk; I regard it as one of the biggest cruxes. So I'd at least be hopeful that, if everyone's credences in Sudden Emergence changed by a factor of three, this had some sort of impact on the portion of EA attention devoted to AI risk. I think that credences in the Sudden Emergence hypothesis should also have an impact on the kinds of risks/scenarios that people within the AI governance and safety communities focus on.

I don't, though, have a much more concrete picture of the influence pathway.

OK, thanks. Not sure I can pull it off, that was just a toy example. Probably even my best arguments would have a smaller impact than a factor of three, at least when averaged across the whole community. I agree with your explanation of the ways this would improve things... I guess I'm just concerned about opportunity costs. Like, it seems to me that a tripling of credence in Sudden Emergence shouldn't change what people do by more than, say, 10%. When you factor in tractability, neglectedness, personal fit, doing things that are beneficial under both Sudden Emergence and non-Sudden Emergence, etc. a factor of 3 in the probability of sudden emergence probably won't change the bottom line for what 90% of people should be doing with their time. For example, I'm currently working on acausal trade stuff, and I think that if my credence in sudden emergence decreased by a factor of 3 I'd still keep doing what I'm doing. Meanwhile, I could be working on AI safety directly, or I could be working on acausal trade stuff (which I think could plausibly lead to a more than 10% improvement in EA effort allocation. Or at least, more plausibly than working on Sudden Emergence, it seems to me right now). I'm very uncertain about all this, of course.
Did you end up writing this post? (I looked through your LW posts since the timestamp of the parent comment but it doesn't seem like you did.) If not, I would be interested in seeing some sort of outline or short list of points even if you don't have time to write the full post.
Thanks for following up. Nope, I didn't write it, but comments like this one [https://www.lesswrong.com/posts/Lfk2FXBwrpoM6Jm7p/security-mindset-and-takeoff-speeds?commentId=9s8gKFCoAAjB9e4p9] and this one [https://forum.effectivealtruism.org/posts/7gxtXrMeqw78ZZeY9/ama-or-discuss-my-80k-podcast-episode-ben-garfinkel-fhi?commentId=gx4TWsxPYz4nfvPe9] are making me bump it up in priority! Maybe it's what I'll do next.

How confident are you in the brief arguments for rapid and general progress outlined in section 1.1 of GovAI's research agenda? Have the arguments been developed further?

What is your overall probability that we will, in this century, see progress in artificial intelligence that is at least as transformative as the industrial revolution?

What is your probability for the more modest claim that AI will be at least as transformative as, say, electricity or railroads?

What is your overall probability that we will, in this century, see progress in artificial intelligence that is at least as transformative as the industrial revolution?

I think this is a little tricky. The main way in which the Industrial Revolution was unusually transformative is that, over the course of the IR, there were apparently unusually large pivots in several important trendlines. Most notably, GDP-per-capita began to increase at a consistently much higher rate. In more concrete terms, though, the late nineteenth and early twentieth centuries probably included even greater technological transformations.

From David Weil's growth textbook (pg. 265-266):

Given these two observations—that growth during the Industrial Revolution was not particularly fast and that growth did not slow down when the Industrial Revolution ended—what was really so revolutionary about the period? There are two answers. First, the technologies introduced during the Industrial Revolution were indeed revolutionary, but their immediate impact on economic growth was small because they were initially confined to a few industries. More significantly, the Industrial Revolution was a beginning. Rapid techno

... (read more)
I agree that it's tricky, and am quite worried about how the framings we use may bias our views on the future of AI. I like the GDP/productivity growth perspective, but feel free to answer the same questions for your preferred operationalisation. Another possible framing: given a crystal ball showing the future, how likely is it that people would generally say that AI is the most important thing that happens this century?

Interesting. So you generally expect (well, with 50-75% probability) AI to become a significantly bigger deal, in terms of productivity growth, than it is now? I have not looked into this in detail, but my understanding is that the contribution of AI to productivity growth right now is very small (and less than electricity). If yes, what do you think causes this acceleration? It could simply be that AI is early-stage right now, akin to electricity in 1900 or earlier, and the large productivity gains arise when key innovations diffuse through society on a large scale. (However, many forms of AI are already widespread.) Or it could be that progress in AI itself accelerates, or perhaps linear progress in something like "general intelligence" translates to super-linear impact on productivity.
I mostly have in mind the idea that AI is "early-stage," as you say. The thought is that "general purpose technologies" (GPTs) like electricity, the steam engine, the computer, and (probably) AI tend to have very delayed effects. For example, there was really major progress in computing in the middle of the 20th century, and lots of really major inventions throughout the 70s and 80s, but computers didn't have a noticeable impact on productivity growth until the 90s. The first serious electric motors were developed in the mid-19th century, but electricity didn't have a big impact on productivity until the early 20th. There was also a big lag associated with steam power; it didn't really matter until the middle of the 19th century, even though the first steam engines were developed centuries earlier.

So if AI takes several decades to have a large economic impact, this would be consistent with analogous cases from history. It can take a long time for the technology to improve, for engineers to get trained up, for complementary inventions to be developed, for useful infrastructure to be built, for organizational structures to get redesigned around the technology, etc. I don't think it'd be very surprising if 80 years was enough for a lot of really major changes to happen, especially since the "time to impact" for GPTs seems to be shrinking over time. Then I'm also factoring in the additional possibility that there will be some unusually dramatic acceleration, which distinguishes AI from most earlier GPTs.
That seems plausible and is also consistent with Amara's law (the idea that the impact of technology is often overestimated in the short run and underestimated in the long run). I'm curious how likely you think it is that productivity growth will be significantly higher (i.e. levels at least comparable with electricity) for any reason, not just AI. I wouldn't give this much more than 50%, as there is also some evidence that stagnation is on the cards (see e.g. 1 [https://www.amazon.co.uk/Great-Stagnation-Low-Hanging-Eventually-eSpecial-ebook/dp/B004H0M8QS/ref=sr_1_1?dchild=1&keywords=cowen+great+stagnation&qid=1595238150&sr=8-1], 2 [https://www.amazon.co.uk/Rise-Fall-American-Growth-Princeton/dp/0691147728/ref=sr_1_1?crid=2719R19RONVAT&dchild=1&keywords=rise+and+fall+of+american+growth&qid=1595238138&sprefix=rise+and+fall+of+am%2Caps%2C163&sr=8-1]). But that would mean that you're confident that the cause of higher productivity growth, assuming that this happens, would be AI? (Rather than, say, synthetic biotechnology, or genetic engineering, or some other technological advance, or some social change resulting in more optimisation for productivity.) While AI is perhaps the most plausible single candidate, it's still quite unclear, so I'd maybe say it's 25-30% likely that AI in particular will cause significantly higher levels of productivity growth this century.

In the episode you say:

And so I do want to make it clear that insofar as I’ve expressed, let’s say, some degree of ambivalence about how much we ought to be prioritising AI safety and AI governance today, my sort of implicit reference point here is to things like pandemic preparedness, or nuclear war or climate change, just sort of the best bets that we have for having a long run social impact.

I was wondering what you think of the potential of broader attempts to influence the long-run future (e.g. promoting positive values, growing the EA movement) as opposed to the more targeted attempts to reduce x-risks that are most prominent in the EA movement.

In brief, I feel positively about these broader attempts! It seems like some of these broad efforts could be useful, instrumentally, for reducing a number of different risks (by building up the pool of available talent, building connections, etc.). The more unsure we are about which risks matter most, as well, the more valuable broad capacity-building efforts are. It's also possible that some shifts in values, institutions, or ideas could actually be long-lasting. (This is something that Will MacAskill, for example, is currently interested in.) If this is right, then I think it's at least conceivable that trying to positively influence future values/institutions/ideas is more important than reducing the risk of global catastrophes: the goodness of different possible futures might vary greatly.
Jack Malde (3y):
Thanks for your reply! I also feel positively about broader attempts and am glad that these are being taken more seriously by prominent EA thinkers.

In "Unpacking Classic Arguments for AI Risk", you defined The Process Orthogonality Thesis as: The process of imbuing a system with capabilities and the process of imbuing a system with goals are orthogonal.

Then, gave several examples of cases where this does not hold: thermostat, Deep Blue, OpenAI Five, the Human brain. Could you elaborate a bit on these examples? 

I am a bit confused about it. In Deep Blue, I think that most of the progress has been general computational advances, and the application of an evaluation system given later. The human bra

... (read more)
I think that my description of the thesis (and, actually, my own thinking on it) is a bit fuzzy. Nevertheless, here's roughly how I'm thinking about it:

First, let's say that an agent has the "goal" of doing X if it's sometimes useful to think of the system as "trying to do X." For example, it's sometimes useful to think of a person as "trying" to avoid pain, be well-liked, support their family, etc. It's sometimes useful to think of a chess program as "trying" to win games of chess.

Agents are developed through a series of changes. In the case of a "hand-coded" AI system, the changes would involve developers adding, editing, or removing lines of code. In the case of an RL agent, the changes would typically involve a learning algorithm updating the agent's policy. In the case of human evolution, the changes would involve genetic mutations.

If the "process orthogonality thesis" were true, then this would mean that we can draw a pretty clean line between "changes that affect an agent's capabilities" and "changes that affect an agent's goals." Instead, I want to say that it's really common for changes to affect both capabilities and goals. In practice, we can't draw a clean line between "capability genes" and "goal genes" or between "RL policy updates that change goals" and "RL policy updates that change capabilities." Both goals and capabilities tend to take shape together.

That being said, it is true that some changes do, intuitively, mostly just affect either capabilities or goals. I wouldn't be surprised, for example, if it's possible to introduce a minus sign somewhere into Deep Blue's code and transform it into a system that looks like it's trying to lose at chess; although the system will probably be less good at losing than it was at winning, it may still be pretty capable. So the processes of changing a system's capabilities and changing its goals can still come apart to some degree. It's also possible to do fundamental research and engineering wor
Thanks! This does clarify things for me, and I think that the definition of a "goal" is very helpful here. I do still have some uncertainty about the claim of process orthogonality, which I can now better articulate:

Let's define an "instrumental goal" as a goal X for which there is a goal Y such that whenever it is useful to think of the agent as "trying to do X" it is in fact also useful to think of it as "trying to do Y"; in this case we can think that X is instrumental to Y. Instrumental goals can be generated at the development phase or by the agent itself (implicitly or explicitly).

I think that the (non-process) orthogonality thesis does not hold with respect to instrumental goals. A better selection of instrumental goals will enable better capabilities, and with greater capabilities comes greater planning capacity. Therefore, the process orthogonality thesis does not hold for instrumental goals either. This means that instrumental goals are usually not the goals of interest when trying to discern between the process and non-process orthogonality theses, and we should focus on terminal goals (those which aren't instrumental).

In the case of an RL agent or Deep Blue, I can only see one terminal goal: maximize defined score, or win at chess. These won't really change together with capabilities. I thought a bit about humans, but I feel that this is much more complicated and needs more nuanced definitions of goals. (Is avoiding suffering a terminal goal? It seems that way, but who is doing the thinking in which it is useful to think of one thing or another as a goal? Perhaps the goal is to reduce specific neuronal activity, for which avoiding suffering is merely instrumental?)
I'm actually not very optimistic about a more complex or formal definition of goals. In my mind, the concept of a "goal" is often useful, but it's sort of an intrinsically fuzzy or fundamentally pragmatic concept. I also think that, in practice, the distinction between an "intrinsic" and "instrumental" goal is pretty fuzzy in the same way (although I think your definition is a good one). Ultimately, agents exhibit behaviors. It's often useful to try to summarize these behaviors in terms of what sorts of things the agent is fundamentally "trying" to do and in terms of the "capabilities" that the agent brings to bear. But I think this is just sort of a loose way of speaking. I don't really think, for example, that there are principled/definitive answers to the questions "What are all of my cat's goals?", "Which of my cat's goals are intrinsic?", or "What's my cat's utility function?" Even if we want to move beyond behavioral definitions of goals, to ones that focus on cognitive processes, I think these sorts of questions will probably still remain pretty fuzzy. (I think that this way of thinking -- in which evolutionary or engineering selection processes ultimately act on "behaviors," which can only somewhat informally or imprecisely be described in terms of "capabilities" and "goals" -- also probably has an influence on my relative optimism about AI alignment.)
I was thinking it over, and I think that I was implicitly assuming that process orthogonality follows from orthogonality in some form, or something like that. The Deep Blue question still holds, I think. The human brain should be thought of as designed by evolution. What I wrote is strictly (non-process) orthogonality. An example could be given that the cognitive breakthrough might have been the enlargement of the neocortex, while civilization was responsible for the values. I guess the point is that there are examples of non-orthogonality? (Say, the evaluation function of Deep Blue being critical for its success.)
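The "minus sign in Deep Blue" point above can be made concrete with a toy (this is my own construction, nothing like Deep Blue's actual code): in a search-based agent, flipping the sign on the evaluation of outcomes changes what the agent is "trying" to do while leaving the search machinery, its "capabilities," untouched. The game here is a simple subtraction game: players alternately take 1 to 3 stones, and whoever takes the last stone wins.

```python
def minimax(pile, sign):
    """Return (value, best_move) for the player to move.

    sign=+1 evaluates winning as good; sign=-1 flips the terminal
    evaluation, so the identical search machinery now plays to lose.
    """
    if pile == 0:
        # The previous player took the last stone, so the player to move
        # has "lost" the real game; sign decides whether that's bad or good.
        return -sign, None
    best_val, best_move = None, None
    for take in (1, 2, 3):
        if take <= pile:
            val, _ = minimax(pile - take, sign)
            val = -val  # opponent's value, negated for our perspective
            if best_val is None or val > best_val:
                best_val, best_move = val, take
    return best_val, best_move

# Perfect play in the normal game leaves the opponent a multiple of 4;
# the sign-flipped agent instead steers toward taking the last stone last.
_, winning_move = minimax(10, +1)   # takes 2, leaving 8
_, losing_move = minimax(10, -1)    # takes 1, leaving 9
print(winning_move, losing_move)
```

Both agents search equally well; only the one-line evaluation differs. (In a real engine the play-to-lose version would likely be somewhat worse, as noted above, since heuristics are tuned for winning, but the conceptual point is the same.)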

Do you still think that Robin Hanson's critique of Christiano's scenario is worth exploring in more detail?

I do think there's still more thinking to be done here, but, since I recorded the episode, Alexis Carlier and Tom Davidson have actually done some good work in response to Hanson's critique. I was pretty persuaded of their conclusion:

There are similarities between the AI alignment and principal-agent problems, suggesting that PAL could teach us about AI risk. However, the situations economists have studied are very different to those discussed by proponents of AI risk, meaning that findings from PAL don’t transfer easily to this context. There are a few main issues. The principal-agent setup is only a part of AI risk scenarios, making agency rents too narrow a metric. PAL models rarely consider agents more intelligent than their principals and the models are very brittle. And the lack of insight from PAL unawareness models severely restricts their usefulness for understanding the accident risk scenario.

Nevertheless, extensions to PAL might still be useful. Agency rents are what might allow AI agents to accumulate wealth and influence, and agency models are the best way we have to learn about the size of these rents. These findings should inform a wide range of future scenarios, perhaps barring extreme ones like Bostrom/Yudkowsky.

On a scale from 1 to 10 what would you rate The Boss Baby? :)

I actually haven't seen The Boss Baby. A few years back, this ad was on seemingly all of the buses in Oxford for a really long time. Something about it made a lasting impression on me. Maybe it was the smug look on the boss baby's face.

Reviewing it purely on priors, though, I'll give it a 3.5 :)

What priorities for TAI strategy does your skepticism towards classical work dictate? Some have argued that we have greater leverage over scenarios with discrete/discontinuous deployment.

From a long-termist perspective, I think that -- the more gradual AI progress is -- the more important concerns about "bad attractor states" and "instability" become relative to concerns about AI safety/alignment failures. (See slides [https://docs.google.com/presentation/d/1qHIi7Swd8LNwyyvoQUcvlRQsjrWAuIA_QY2E0wM0IfQ/edit?usp=sharing]). I think it is probably true, though, that AI safety/alignment risk is more tractable than these other risks. To some extent, the solution to safety risk is for enough researchers to put their heads down and work really hard on technical problems; there's probably some amount of research effort that would be enough, even if this quantity is very large. In contrast, the only way to avoid certain risks associated with "bad attractor states" might be to establish stable international institutions that are far stronger than any that have come before; there might be structural barriers, here, that no amount of research effort or insight would be enough to overcome. I think it's at least plausible that the most useful thing for AI safety and governance researchers to do is ultimately to focus on brain-in-a-box-ish AI risk scenarios, even if they're not very likely relative to other scenarios. (This would still entail some amount of work that's useful for multiple scenarios; there would also be instrumental reasons, related to skill-building and reputation-building, to work on present-day challenges.) But I have some not-fully-worked-out discomfort with this possibility. One thing that I do feel comfortable saying is that more effort should go into assessing the tractability of different influence pathways, the likelihood of different kinds of risks beyond the classic version of AI risk, etc.

What writings have influenced your thinking the most?

What are the arguments that speeding up economic growth has a positive long run impact?

Partly, I had in mind a version of the astronomical waste [https://nickbostrom.com/astronomical/waste.html] argument: if you think that we should basically ignore the possibility of preventing extinction or premature stagnation (e.g. for Pascal's mugging [https://en.wikipedia.org/wiki/Pascal%27s_mugging] reasons), and you're optimistic about where the growth process is bringing us, then maybe we should just try to develop an awesome technologically advanced civilization as quickly as possible so that more people can ultimately live in it. IIRC, Tyler Cowen argues for something at least sort of in this ballpark in Stubborn Attachments. I think you'd need pretty specific assumptions to make this sort of argument work, though. Jumping the growth process forward can also reduce some existential risks. The risk of humanity getting wiped out by a natural disaster, like an asteroid impact, probably gets lower the more technologically sophisticated we become; so, for example, kickstarting the Industrial Revolution earlier would have meant a shorter "time of peril" for natural risks. Leo Aschenbrenner's paper "Economic Growth and Existential Risk [https://leopoldaschenbrenner.github.io/xriskandgrowth/ExistentialRiskAndGrowth050.pdf]" considers a more complicated version of this argument in the context of anthropogenic risks, which takes into account the fact that growth can also contribute to these risks.

What do you think is the most important role people without technical/quantitative educational backgrounds can play in AI safety/governance?

I don't have a single top pick; I think this will generally depend on a person's particular interests, skills, and "career capital." I do just want to say, though, that I don't think it's at all necessary to have a strong technical background to do useful AI governance work. For example, if I remember correctly, most of the research topics discussed in the "AI Politics" and "AI Ideal Governance" sections of Allan Dafoe's research agenda [https://www.fhi.ox.ac.uk/wp-content/uploads/GovAI-Agenda.pdf] don't require a significant technical background. A substantial portion of people doing AI policy/governance/ethics research today also have a primarily social science or humanities background. Just as one example that's salient to me, because I was a co-author on it, I don't think anything in this long report [https://www.fhi.ox.ac.uk/wp-content/uploads/Windfall-Clause-Report.pdf] on distributing the benefits of AI required substantial technical knowledge or skills. (That being said, I do think it's really important for pretty much anyone in the AI governance space to understand at least the core concepts of machine learning. For example, it's important to know things like the difference between "supervised" and "unsupervised" learning, the idea of stochastic gradient descent, the idea of an "adversarial example," and so on. Fortunately, I think this is pretty do-able even without a STEM background; it's mostly the concepts, rather than the math, that are important. Certain kinds of research or policy work certainly do require more in-depth knowledge, but a lot of useful work doesn't.)

Hi Ben - this episode really gave me a lot to think about! Of the 'three classic arguments' for AI X-risk you identify, I argued in a previous post that the 'discontinuity premise' comes from taking a high-level argument (one that should only be used to establish that sufficiently capable AI will produce very fast progress) too literally, and assuming the 'fast progress' has to happen suddenly and in a specific AI.

Your discussion of the other two arguments led me to conclude that the same sort of mistake is at work in all of them, as I e... (read more)

Hi Sammy, Thanks for the links -- both very interesting! (I actually hadn't read your post before.) I've tended to think of the intuitive core as something like: "If we create AI systems that are, broadly, more powerful than we are, and their goals diverge from ours, this would be bad -- because we couldn't stop them from doing things we don't want. And it might be hard to ensure, as we're developing increasingly sophisticated AI systems, that there aren't actually subtle but extremely important divergences in some of these systems' goals." At least in my mind, both the classic arguments and the arguments in "What Failure Looks Like" share this common core. Mostly, the challenge is to explain why it would be hard to ensure that there wouldn't be subtle-but-extremely-important divergences; there are different possible ways of doing this. For example: Although an expectation of discontinuous (or at least very fast) progress is a key part of the classic arguments, I don't consider it part of the intuitive core; the "What Failure Looks Like" picture doesn't necessarily rely on it. I'm not sure if there's actually a good way to take the core intuition and turn it into a more rigorous/detailed/compelling argument that really works. But I do feel that there's something to the intuition; I'll probably still feel like there's something to the intuition, even if I end up feeling like the newer arguments have major issues too. [[Edit: An alternative intuitive core, which I sort of gesture at in the interview, would simply be: "AI safety and alignment issues exist today. In the future, we'll have crazy powerful AI systems with crazy important responsibilities. At least the potential badness of safety and alignment failures should scale up with these systems' power and responsibility. Maybe it'll actually be very hard to ensure that we avoid the worst-case failures."]]
Hi Ben, Thanks for the reply! I think the intuitive core that I was arguing for [https://www.lesswrong.com/posts/T5awG3XQKJtprABsy/an-108-why-we-should-scrutinize-arguments-for-ai-risk?commentId=xozRnMoNqvx7eWKMx] is more-or-less just a more detailed version of what you say here: The key difference is that I don't think the orthogonality thesis, instrumental convergence, or progress being eventually fast are wrong - you just need extra assumptions in addition to them to get to the expectation that AI will cause a catastrophe. My point in this comment [https://www.lesswrong.com/posts/T5awG3XQKJtprABsy/an-108-why-we-should-scrutinize-arguments-for-ai-risk?commentId=xozRnMoNqvx7eWKMx] (and follow up [https://www.lesswrong.com/posts/T5awG3XQKJtprABsy/an-108-why-we-should-scrutinize-arguments-for-ai-risk?commentId=qHqHpHTKsgqna3QbS]) was that the Orthogonality Thesis, Instrumental Convergence and eventual fast progress are essential for any argument about AI risk, even if you also need other assumptions in there - you need to know the OT will apply to your method of developing AI, and you need more specific reasons to think the particular goals of your system look like those that lead to instrumental convergence. If you approached the classic arguments with that framing, then perhaps it begins to look less like a matter of them being mistaken and more like a case of having a vague philosophical picture that then got filled in with more detailed considerations - that's how I see the development over the last 10 years. The only mistake was in mistaking the vague initial picture for the whole argument - and that was a mistake, but it's not the same kind of mistake as just having completely false assumptions. You might compare it to the early development of a new scientific field. Perhaps seeing it that way might lead you to have a different view about how much to update against trusting complicated conceptual arguments about AI risk! This is how Stuart Russell likes to talk about the
Quick belated follow-up: I just wanted to clarify that I also don't think that the orthogonality thesis or instrumental convergence thesis are incorrect, as they're traditionally formulated. I just think they're not nearly sufficient to establish a high level of risk, even though, historically, many presentations of AI risk seemed to treat them as nearly sufficient. Insofar as there's a mistake here, the mistake concerns the way conclusions have been drawn from these theses; I don't think the mistake is in the theses themselves. (I may not stress this enough in the interview/slides.) On the other hand, progress/growth eventually becoming much faster might be wrong (this is an open question [https://www.nber.org/papers/w23928] in economics). The 'classic arguments' also don't just predict that growth/progress will become much faster. In the FOOM debate, for example, both Yudkowsky and Hanson start from the position that growth will become much faster; their disagreement is about how sudden, extreme, and localized the increase will be. If growth is actually unlikely to increase in a sudden, extreme, and localized fashion, then this would be a case of the classic arguments containing a "mistaken" (not just insufficient) premise.

Wow, I am quite surprised it took a year to produce. @80K, does it always take so long?

There's often a few months between recording and release and we've had a handful of episodes that took a frustratingly long time to get out the door, but never a year.

The time between the first recording and release for this one was actually 9 months. The main reason was Howie and Ben wanted to go back and re-record a number of parts they didn't think they got right the first time around, and it took them a while to both be free and in the same place so they could do that.

A few episodes were also pushed back so we could get out COVID-19 interviews during the peak of the epidemic.

Wait, what's your probability that we're past the peak (in terms of, eg, daily worldwide deaths)?
I think you know what I mean — the initial peak in the UK, the country where we are located, in late March/April.
Sorry if I sounded mean! I genuinely didn't know what you meant! I live in the US and I assumed that most of 80k's audience will be more concerned about worldwide numbers or their home country's, than that of 80k's "base." (I also didn't consider the possibility that there are other reasons than audience interest for you to be prioritizing certain podcasts, like logistics.) I really appreciate a lot of your interviews on covid-19, btw. Definitely didn't intend my original comment in a mean way!
Poll time:
FWIW, I didn’t really think about what Rob meant when I read his first comment, but when I read Linch’s question, I thought “Eh, Rob probably meant something like ‘the point at which interest, confusion, and urgency seemed especially high, as people were now realising this was huge but hadn’t yet formed clear views on what to do about it’.” So Linch’s question felt somewhat off topic or unnecessary, but also not like it had an obvious answer (and apparently my guess about the answer was wrong). (But I can also see why Linch saw the question as important, and didn’t think Linch’s question sounded snarky or whatever.)
As Michael says, common sense would indicate I must have been referring to the initial peak, or the peak in interest/panic/policy response, or the peak in the UK/Europe, or peak where our readers are located, or — this being a brief comment on an unrelated topic — just speaking loosely and not putting much thought into my wording. FWIW it looks like globally the rate of new cases hasn't peaked yet. I don't expect the UK or Europe will return to a situation as bad as the one they went through in late March and early April. Unfortunately the US and Latin America are already doing worse than it was then.
As Michael says, common sense would indicate

This sounds like a status move. I asked a sincere question and maybe I didn't think too carefully when I asked it, but there's no need to rub it in.

FWIW it looks like globally the rate of new cases hasn't peaked yet. I don't expect the UK or Europe will return to a situation as bad as the one they went through in late March and early April. Unfortunately the US and Latin America are already doing worse than it was then.

Neither the US nor Latin America could plausibly be said to have peaked then.

Thanks, I appreciate the clarification! :)

Upvote this comment if Robert referring to "peak of the epidemic" as the initial peak in the UK was not a hypothesis that occurred to you.
Upvote this comment if you originally thought that Robert was referring to "peak of the epidemic" as the initial peak in the UK.

Hi Ben,

You suggested in the podcast that it's not clear how to map some of the classic arguments—and especially their manifestation in thought experiments like the paper clip maximizer—to contemporary machine learning methods. I'd like to push back on that view.

Deep reinforcement learning is a popular contemporary ML approach for training agents that act in simulated and real-world environments. In deep RL, an agent is trained to maximize its reward (more precisely, the sum of discounted rewards over time steps), which perfectly fits the "agent" abstractio

... (read more)
Hi Ofer, Thanks for the comment! I actually do think that the instrumental convergence thesis, specifically, can be mapped over fine, since it's a fairly abstract principle. For example, this recent paper [https://arxiv.org/abs/1912.01683] formalizes the thesis within a standard reinforcement learning framework. I just think that the thesis at most weakly suggests existential doom, unless we add in some other substantive theses. I have some short comments on the paper, explaining my thoughts, here [https://www.lesswrong.com/posts/3nDR23ksSQJ98WNDm/developmental-stages-of-gpts?commentId=xpbuCXCokjjDvyTps]. Beyond the instrumental convergence thesis, though, I do think that some bits of the classic arguments are awkward to fit onto concrete and plausible ML-based development scenarios: for example, the focus on recursive self-improvement, and the use of thought experiments in which natural language commands, when interpreted literally and single-mindedly, lead to unforeseen bad behaviors. I think that Reframing Superintelligence [https://www.fhi.ox.ac.uk/wp-content/uploads/Reframing_Superintelligence_FHI-TR-2019-1.1-1.pdf] does a good job of pointing out some of the tensions between classic ways of thinking and talking about AI risk and current/plausible ML engineering practices. This may not be what you have in mind, but: I would be surprised if the FB newsfeed selection algorithm became existentially damaging (e.g. omnicidal), even in the limit of tremendous amounts of training data and compute. I don't know how the algorithm actually works, but as a simplification: let's imagine that it produces an ordered list of posts to show a user, from the set of recent posts by their friends, and that it's trained using something like the length of the user's FB browsing session as the reward. I think that, if you kept training it, nothing too weird would happen. It might produce some unintended social harms (like addiction, polarization, etc.), but the system wouldn't, in any
Thanks for the thoughtful reply! Would you say that the treacherous turn argument can also be mapped over to contemporary ML methods (similarly to the instrumental convergence thesis) due to it being a fairly abstract principle? Also, why is "recursive self-improvement" awkward to fit onto concrete and plausible ML-based development scenarios? (If we ignore the incorrect usage of the word "recursive" here; the concept should have been called "iterative self-improvement"). Consider the work that has been done on neural architecture search [https://en.wikipedia.org/wiki/Neural_architecture_search] via reinforcement learning (this [https://arxiv.org/abs/1611.01578] 2016 paper on that topic currently has 1,775 citations on Google Scholar, including 560 citations from 2020). It doesn't seem extremely unlikely that such a technique will be used, at some point in the future, in some iterative self-improvement setup, in a way that may cause an existential catastrophe. Regarding the example with the agent that creates the feed of each FB user: I agree that the specified time horizon (and discount factor [http://incompleteideas.net/book/first/ebook/node30.html]) is important, and that a shorter time horizon seems safer. But note that FB is incentivized to specify a long time horizon. For example, suppose the feed-creation-agent shows a user a horrible post by some troll, which causes the user to spend many hours in a heated back-and-forth with said troll. Consequently, the user decides FB sucks and ends up getting off FB for many months. If the specified time horizon is sufficiently short (or the discount factor is sufficiently small), then from the perspective of the training process the agent did well when it showed the user that post, and the agent's policy network will be updated in a way that makes such decisions more likely. FB doesn't want that. FB's actual discount factor for users' engagement time may be very close to 1 (i.e. a user spending an hour on FB today
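The discount-factor point in the exchange above can be made concrete with a toy calculation. (All numbers here are hypothetical, chosen only to illustrate the mechanism; this is not a model of any real recommender system.)

```python
def discounted_return(rewards, gamma):
    """Discounted return G = sum over t of gamma^t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Hypothetical daily engagement (hours on site) over ~6 months.
# Policy A: provocative post -> a short engagement spike, then the user leaves.
spike = [3.0, 2.0] + [0.0] * 180
# Policy B: ordinary post -> steady moderate engagement.
steady = [1.0] * 182

for gamma in (0.5, 0.999):
    a = discounted_return(spike, gamma)
    b = discounted_return(steady, gamma)
    print(gamma, round(a, 1), round(b, 1),
          "spike preferred" if a > b else "steady preferred")
```

With a heavily discounted objective the short spike looks better, while a discount factor near 1 lets the steady policy's long tail dominate -- matching the intuition above that a long effective time horizon changes which behaviour the training process reinforces.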

You discuss at one point in the podcast the claim that as AI systems take on larger and larger real world problems, the challenge of defining the reward function will become more and more important. For example for cleaning, the simple number-of-dust-particles objective is inadequate because we care about many other things e.g. keeping the house tidy and many side constraints e.g. avoiding damaging household objects. This isn't quite an argument for AI alignment solving itself, but it is an argument that the attention and resources poured into AI alignment

... (read more)

Sorry if this isn’t as polished as I’d hoped. Still a lot to read and think about, but posting as I won’t have time now to elaborate further before the weekend. Thanks for doing the AMA!

It seems like a crux that you have identified is how “sudden emergence” happens. How would a recursive self-improvement feedback loop start? Increasing optimisation capacity is a convergent instrumental goal. But how exactly is that goal reached? To give the most pertinent example - what would the nuts and bolts of it be for it happening i... (read more)

Here [https://bmk.sh/2020/08/17/Building-AGI-Using-Language-Models] is an argument for how GPT-X might lead to proto-AGI in a more concrete, human-aided, way: 

Thoughts on modifications/improvements to The Windfall Clause?

What do you think about hardware-based forecasts for human-substitute AI?

I don't currently give them very much weight. It seems unlikely to me that hardware progress -- or, at least, practically achievable hardware progress -- will turn out to be sufficient for automating away all the tasks people can perform. If both hardware progress and research effort instead play similarly fundamental roles, then focusing on only a single factor (hardware) can only give us pretty limited predictive power. Also, to a lesser extent: Even if it is true that compute growth is the fundamental driver of AI progress, I'm somewhat skeptical that we could predict the necessary/sufficient amount of compute very well.

Great interview, thanks for some really thought-provoking ideas. For the brain in the box section, it seemed like you were saying that we'd expect future worlds to have fairly uniform distributions of capabilities of AI systems, and so we'd learn from other similar cases. How uniform do you think the spread of capabilities of AI systems is now, and how wide do you think the gaps have to be in the future for the 'brain in a box' scenario to be possible?

Have you become more uncertain/optimistic about the arguments in favour of the importance of other x-risks as a result of scrutinising AI risk?

I don't think it's had a significant impact on my views about the absolute likelihood or tractability of other existential risks. I'd be interested if you think it should have, though!
Oh, I meant pessimistic. A reason for a weak update might be something similar to the Gell-Mann amnesia effect. After putting effort into the classical arguments, you noticed some important flaws. The fact that they had not been articulated before suggests that collective EA epistemology is weaker than expected. Because of that, one might become less certain about the quality of arguments in other EA domains.
I'd say nearly everyone's ability to determine an argument's strength is very weak. On the Forum, invalid meta-arguments* are pretty common, such as "people make logic mistakes so you might have too", rather than actually identifying the weaknesses in an argument. There's also a lot of pseudo-superforecasting, like "I have 80% confidence in this", without any evidence backing up those credences. This seems to me like people are imitating sound arguments without actually understanding how they work. Effective altruists have centred around some ideas that are correct (longtermism, moral uncertainty, etc.), but outside of that, I'd say we're just as wrong as anyone else. *Some meta-arguments are valid, like discussions on logical grounding of particular methodologies, e.g. "Falsification works because of the law of contraposition, which follows from the definition of logical implication".
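As an aside, the footnoted claim about contraposition can be checked formally. A minimal sketch in Lean, covering just the direction falsification relies on (a hypothesis that predicts Q is refuted when Q is observed to fail):

```lean
-- If P implies Q, then from ¬Q we can derive ¬P.
theorem contrapose (P Q : Prop) (h : P → Q) (nq : ¬Q) : ¬P :=
  fun p => nq (h p)
```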
There's also a lot of pseudo-superforecasting, like "I have 80% confidence in this", without any evidence backing up those credences.

From a bayesian perspective there is no particular reason why you have to provide more evidence if you provide credences, and in general I think there is a lot of value in people providing credences even if they don't provide additional evidence, if only to avoid problems of ambiguous language.

I'm not sure I know what you mean by this. I'd agree that you're definitely not obligated to provide more evidence, and that your credence does fully capture how likely you think it is that X will happen. But it seems to me that the evidence that informed your credence can also be very useful information for people, both in relation to how much they should update their own credences (as they may have info you lack regarding how relevant and valid those pieces of evidence are), and in relation to how - and how much - you might update your views (e.g., if they find out you just thought for 5 seconds and went with your gut, vs spending a year building expertise and explicit models). It also seems like sharing that evidence could help them with things like building their general models of the world or of how to make estimates. (This isn't an argument against giving explicit probabilities that aren't based on much or that aren't accompanied by explanations of what they're based on. I'm generally, though tentatively, in favour of that. It just seems like also explaining what the probabilities are based on is often quite useful.) (By the way, Beard et al. discuss related matters [https://forum.effectivealtruism.org/posts/JQQAQrunyGGhzE23a/database-of-existential-risk-estimates?commentId=StEcGdhk58BYRkxuS#comments] in the context of existential risk estimates, using the term "evidential reasoning".)
This is in contrast to a frequentist perspective, or maybe something close to a "common-sense" perspective, which tends to bucket knowledge into separate categories that aren't easily interchangeable. Many people make a mental separation between "thinking something is true" and "thinking something is X% likely, where X is high", with one falling into the category of lived experience, and the other falling into the category of "scientific or probabilistic assessment". The first one doesn't require any externalizable evidence and is a fact about the mind, the second is part of a collaborative scientific process that has at its core repeatable experiments, or at least recurring frequencies (i.e. see the frequentist discussion of it being meaningless to assign probabilities to one-time events). Under some of these other non-bayesian interpretations of probability theory, an assignment of probabilities is not valid if you don't associate it with either an experimental setup, or some recurring frequency. So under those interpretations you do have an additional obligation to provide evidence and context to your probability estimates, since otherwise they don't really form even a locally valid statement.
Thanks for that answer. So just to check, you essentially just meant that it's ok to provide credences without saying your evidence - i.e., you're not obligated to provide evidence when you provide credences? Not that there's no added value to providing your evidence alongside your credences? If so, I definitely agree. (And it's not that your original statement seemed to clearly say something different, just that I wasn't sure that that's all it was meant to mean.)
Yep, that's what I was implying.
Sure there is: By communicating, we're trying to update one another's credences. You're not going to be very successful in doing so if you provide a credence without supporting evidence. The evidence someone provides is far more important than someone's credence (unless you know the person is highly calibrated and precise). If you have a credence that you keep to yourself, then yes, there's no need for supporting evidence. Ambiguous statements are bad, 100%, but so are clear, baseless statements. As you say, people can legitimately have credences about anything. It's how people should think. But if you're going to post your credence, provide some evidence so that you can update other people's credences too.
Ambiguous statements are bad, 100%, but so are clear, baseless statements.

You seem to have switched from the claim that EAs often report their credences without articulating the evidence on which those credences rest, to the claim that EAs often lack evidence for the credences they report. The former claim is undoubtedly true, but it doesn't necessarily describe a problematic phenomenon. (See Greg Lewis's recent post; I'm not sure if you disagree.) The latter claim would be very worrying if true, but I don't see reason to believe that it is. Sure, EAs sometimes lack good reasons for the views they espouse, but this is a general phenomenon unrelated to the practice of reporting credences explicitly.

Habryka seems to be talking about people who have evidence and are just not stating it, so we might be talking past one another. I said in my first comment "There's also a lot of pseudo-superforecasting ... without any evidence backing up those credences." I didn't say "without stating any evidence backing up those credences." This is not a guess on my part. I've seen comments where they say explicitly that the credence they're giving is a first impression, and not something well thought out. It's fine for them to have a credence, but why should anyone care what your credence is if it's just a first impression? I completely agree with him. Imprecision should be stated and significant figures are a dumb way to do it. But if someone said "I haven't thought about this at all, but I'm pretty sure it's true", is that really all that much worse than providing your uninformed prior and saying you haven't really thought about it?
I agree that EAs put superforecasters and superforecasting techniques on a pedestal, more than is warranted. Yes, I think it's a lot worse. Consider the two statements: And The two statements are pretty similar in verbalized terms (and each falls under loose interpretations of what "pretty sure" means in common language), but ought to have drastically different implications for behavior! I basically think EA and associated communities would be better off to have more precise credences, and be accountable for them. Otherwise, it's difficult to know if you were "really" wrong, even after checking hundreds of claims!
Yes you're right. But I'm making a distinction between people's own credences and their ability to update the credences of other people. As far as changing the opinion of the reader, when someone says "I haven't thought much about it", it should be an indicator to not update your own credence by very much at all. I fully agree. My problem is that this is not the current state of affairs for the majority of Forum users, in which case, I have no reason to update my credences because an uncalibrated random person says they're 90% confident without providing any reasoning that justifies their position. All I'm asking for is for people to provide a good argument along with their credence. I think that they should be emulated. But superforecasters have reasoning to justify their credences. They break problems down into components that they're more confident in estimating. This is good practice. Providing a credence without any supporting argument, is not.
I'm curious if you agree or disagree with this claim: With a specific operationalization like:
It's almost irrelevant: people should still provide the supporting argument for their credence, otherwise evidence can get "double counted" (and there are "flow-on" effects, where the first person who updates another person's credence has a significant effect on the overall credence of the population). For example, say I have arguments A and B supporting my 90% credence on something, and you have arguments A, B and C supporting your 80% credence on something, and neither of us posts our reasoning; we just post our credences. It's a mistake for you to then say "I'll update my credence a few percent because FCCC might have other evidence." For this reason, providing supporting arguments is a net benefit, irrespective of EA's accuracy of forecasts.
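The double-counting worry above can be put in simple Bayesian terms. Here's a toy sketch: the 50% prior, the 3:1 likelihood ratios, and the assumption that each distinct argument counts as one independent piece of evidence are all illustrative, not claims about any real case.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Toy setup: prior of 50%; arguments A, B, C each worth an independent
# 3:1 likelihood ratio in favour of the hypothesis.
prior = 0.5
lr = math.log(3)  # log-likelihood-ratio of one argument

# Correct pooling: count each distinct argument once (A, B, C).
correct = sigmoid(logit(prior) + 3 * lr)

# Naive pooling: person 1's credence rests on {A, B}, person 2's on
# {A, B, C}; treating both reported credences as independent evidence
# counts A and B twice (5 likelihood ratios instead of 3).
naive = sigmoid(logit(prior) + 5 * lr)

print(round(correct, 3), round(naive, 3))
```

Pooling the two reported credences as if they were independent applies the shared arguments A and B twice, pushing the naive posterior above what the three distinct arguments actually support -- which is why knowing the arguments behind a credence, not just the number, matters.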
I don't find your arguments persuasive for why people should give reasoning in addition to credences. I think posting reasoning is on the margin of net value, and I wish more people did it, but I also acknowledge that people's time is expensive so I understand why they choose not to. You list reasons why giving reasoning is beneficial, but not reasons for why it's sufficient to justify the cost. My question probing predictive ability of EAs earlier was an attempt to set right what I consider to be an inaccuracy in the internal impressions EAs have about the ability of superforecasters. In particular, it's not obvious to me that we should trust the judgments of superforecasters substantially more than we trust the judgments of other EAs.
My view is that giving explicit, quantitative credences plus stating the supporting evidence is typically better than giving explicit, quantitative credences without stating the supporting evidence (at least if we ignore time costs, information hazards [https://www.lesswrong.com/posts/R7szBR5H487XutfKy/what-are-information-hazards], etc.), which is in turn typically better than giving qualitative probability statements (e.g., "pretty sure") without stating the supporting evidence, and often better than just saying nothing. Does this match your view? In other words, are you essentially just arguing that "providing supporting arguments is a net benefit"? I ask because I had the impression that you were arguing that it's bad for people to give explicit, quantitative credences if they aren't also giving their supporting evidence (and that it'd be better for them, in such cases, to either use qualitative statements or just say nothing). Upon re-reading the thread, I got the sense that others may have gotten that impression too, but I also don't see where you explicitly make that argument.
Basically, yeah. But I do think it's a mistake to update your credence based on someone else's credence without knowing their argument and without knowing whether they're calibrated. We typically don't know the latter, so I don't know why people are giving credences without supporting arguments. It's fine to have a credence without evidence, but why are people publicising such credences?
I'd agree with a modified version of your claim, along the following lines: "You should update more based on someone's credence if you have more reason to believe their credence will track the truth, e.g. by knowing they've got good evidence (even if you haven't actually seen the evidence) or knowing they're well-calibrated. There'll be some cases where you have so little reason to believe their credence will track the truth that, for practical purposes, it's essentially not worth updating." But your claim at least sounds like it's instead that some people are calibrated while others aren't (a binary distinction), and that when people aren't calibrated, you really shouldn't update based on their credences at all (at least if you haven't seen their arguments). I think calibration increases in a quantitative, continuous way, rather than switching from off to on. So I think the more calibrated someone is, the more we should update on their credence. Does that sound right to you?
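One way to picture the continuous version is a simple linear pooling rule, where the weight you give someone's credence scales with how calibrated you believe they are. This is only an illustrative sketch with made-up weights, not a claim about the correct updating rule:

```python
def pooled_credence(mine, theirs, calibration_weight):
    """Shift my credence toward someone else's in proportion to
    their estimated calibration. A weight of 0 means ignoring them
    entirely; a weight of 1 means adopting their credence outright."""
    assert 0.0 <= calibration_weight <= 1.0
    return mine + calibration_weight * (theirs - mine)

pooled_credence(0.5, 0.9, 0.0)  # uncalibrated stranger: stays at 0.5
pooled_credence(0.5, 0.9, 0.3)  # somewhat calibrated: 0.62
pooled_credence(0.5, 0.9, 0.8)  # well-calibrated forecaster: 0.82
```

The point is just that the weight varies smoothly: there's no threshold below which another person's credence carries exactly zero information.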
I mean, very frequently it's useful to just know what someone's credence is. That's often an order of magnitude cheaper to provide, and often is itself quite a bit of evidence. This is like saying that all statements of opinions or expressions of feelings are bad, unless they are accompanied with evidence, which seems like it would massively worsen communication.
I agree, but only if they're a reliable forecaster. A superforecaster's credence can shift my credence significantly. It's possible that their credence is based on a lot of information, each piece of which shifts their own credence by 1%. In that case, it's not practical for them to provide all the evidence, and you are right. But most people are poor forecasters (and sometimes they explicitly state they have no supporting evidence other than their intuition), so I see no reason to update my credence just because someone I don't know is confident. If the credence of a random person has any value to my own credence, it's very low.

That would depend on the question. Sometimes we're interested in feelings for their own sake. That's perfectly legitimate, because the actual evidence we want is the data about their feelings. But if someone gives their feelings about whether there are infinitely many primes, it doesn't update my credences at all.

I think opinions without any supporting argument worsen discourse. Imagine a group of people thoughtfully discussing evidence; then someone comes in, states their feelings without any evidence, and leaves. That shouldn't be taken seriously, and increasing the proportion of such people only makes it worse. Bayesians should want higher-quality evidence. Isn't self-reported data unreliable? And that's when the person was there when the event happened. So what is the reference class for people providing opinions without having evidence? It's almost certainly even more unreliable. If someone has an argument for their credence, they should usually give that argument; if they don't have an argument, I'm not sure what they're adding to the conversation.

I'm not saying we need to provide peer-reviewed articles. I just want to see some line of reasoning demonstrating why you came to the conclusion you did, so that everyone can examine your assumptions and inferences. If we have different credences and the set of things I've consi
Yes, but unreliability doesn't mean you should instead use vague words rather than explicit credences. It's a fine critique to say that people make too many arguments without giving evidence (something I also disagree with, but that isn't the subject of this thread), but you are concretely making the point that it's additionally bad for them to give explicit credences! The credences only help, compared to the vague and ambiguous terms people would use instead.
I'm not sure how you concluded that's what I said. Here's what I actually said: I thought I was fairly clear about my position. Credences have internal value (you should generate your own credence). Superforecasters' credences have external value (their credences should update yours). Uncalibrated random people's credences don't have much external value (they shouldn't shift your credence much). And an argument for your credence should always be given. I never said vague words are valuable; in fact, I think the opposite.

This is an empirical question. Again, what is the reference class for people providing opinions without having evidence? We could look at all of the unsupported credences on the Forum and see how accurate they turned out to be. My guess is that they're of very little value, for all the reasons I gave in previous comments.

I demonstrated a situation where a credence without evidence is harmful. The only way we can avoid such a situation is either by providing a supporting argument for our credences, or by not updating our credences in light of other people's unsupported credences.
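That kind of empirical check could be run with something as simple as a Brier score over resolved claims. The forecasts below are hypothetical placeholders, not real Forum data:

```python
def brier_score(forecasts):
    """Mean squared error between stated credences and binary outcomes
    (0 or 1). Lower is better; always guessing 0.5 scores 0.25."""
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)

# Hypothetical (credence, outcome) pairs for claims that later resolved.
unsupported = [(0.9, 0), (0.8, 1), (0.95, 0)]
brier_score(unsupported)  # about 0.584: worse than guessing 0.5 (0.25)
```

If the unsupported credences posted on the Forum systematically scored worse than a 0.5 baseline, that would support the claim that they carry little external value; if they scored well, it would undercut it.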
Here are two claims I'd very much agree with:

* It's often best to focus on object-level arguments rather than meta-level arguments, especially arguments alleging bias.
* One reason for that is that meta-level arguments will often apply to a similar extent to a huge number of claims/people. E.g., a huge number of claims might be influenced substantially by confirmation bias.
* (Here are two [http://web.archive.org/web/20200212212236/https://slatestarcodex.com/2019/07/17/caution-on-bias-arguments/] relevant posts [https://www.lesswrong.com/posts/o28fkhcZsBhhgfGjx/status-regulation-and-anxious-underconfidence].)

Is that what you meant? But you say invalid meta-arguments, and then give the example "people make logic mistakes so you might have too". That example seems perfectly valid, just often not very useful. And I'd also say that that example meta-argument could sometimes be useful. In particular, if someone seems extremely confident about something based on a particular chain of logical steps, it can be useful to remind them that there have been people in similar situations in the past who've been wrong (though also some who've been right). They're often wrong for reasons "outside their model", so this person not seeing any reason they'd be wrong doesn't provide extremely strong evidence that they're not. It would be invalid to say, based on that alone, "You're probably wrong", but saying they're plausibly wrong seems both true and potentially useful.

(Also, isn't your comment primarily meta-arguments of a somewhat similar nature to "people make logic mistakes so you might have too"? I guess your comment is intended to be a bit closer to a specific reference-class-forecast type of argument?)

Describing that as pseudo-superforecasting feels unnecessarily pejorative. I think such people are just forecasting / providing estimates. They may indeed be inspired by Tetlock's work or other work with superforecasters, but that doesn't mean they'r
My definition of an invalid argument includes "arguments that don't reliably differentiate between good and bad arguments". "1+1=2" is also a correct statement, but that doesn't make it a valid response to any given argument; arguments need to be relevant. I dunno, I could be using "invalid" incorrectly here.

Yes, if someone believed that having a logical argument is a guarantee, and they've never had one of their logical arguments turn out to have a surprising flaw, it would be valid to point that out. That's fair. But (as you seem to agree) the best way to do this is to actually point to the flaw in the specific argument they've made. And since most people who are proficient with logic already know that logical arguments can be unsound, it's not useful to reiterate that point to them. It is, but as I said, "Some meta-arguments are valid". (I can describe how I delineate between valid and invalid meta-arguments if you wish.)

Ah sorry, I didn't mean to offend. If they were superforecasters, their credence alone would update mine. But they're probably not, so I don't understand why they give their credence without a supporting argument.

The set of things I give 100% credence is very, very small (i.e. claims that are true even if I'm a brain in a vat). I could say "There's probably a table in front of me", which is technically more correct than saying that there definitely is, but it doesn't seem valuable to qualify every statement like that.

Why am I confident in moral uncertainty? People do update their morality over time, which means that either they were wrong at some point (i.e. there is demonstrably moral uncertainty), or the definition of "correct" changes and nobody is ever wrong. I think "nobody is ever wrong" is highly unlikely, especially because you can point to logical contradictions in people's moral beliefs (not just unintuitive conclusions). At that point, it's not worth mentioning the uncertainty I have.

Yeah, I'm too focused on the errors. I'll concede your
Oh, when you said "Effective altruists have centred around some ideas that are correct (longtermism, moral uncertainty, etc.)", I assumed (perhaps mistakenly) that by "moral uncertainty" you meant something vaguely like the idea that "We should take moral uncertainty seriously, and think carefully about how best to handle it, rather than necessarily just going with whatever moral theory currently seems best to us." So not just the idea that we can't be certain about morality (which I'd be happy to say is just "correct"), but also the idea that that fact should change our behaviour in substantial ways. I think both of those ideas are surprisingly rare outside of EA, but the latter is rarer, and perhaps more distinctive to EA (though not unique to it, as some non-EA philosophers have done relevant work in that area). On my "inside view", the idea that we should "take moral uncertainty seriously" also seems extremely hard to contest. But I move a little away from such confidence, and probably wouldn't simply call it "correct", because most non-EAs don't seem to explicitly endorse anything clearly like that idea. (Though maybe they endorse somewhat similar ideas in practice, even just via ideas like "agree to disagree".)

What are your thoughts on AI policy careers in government? I'm somewhat skeptical, for two main reasons:

1) It's not clear that governments will become leading actors in AI development; by default, I expect this not to happen. Unlike with nuclear weapons, governments don't need to become experts in the technology to field AI-based weapons; they can just purchase them from contractors. Beyond military power, competition between nations is mostly economic. Insofar as AI is an input to this, governments have an incentive to invest in domestic AI ...

In brief, I do actually feel pretty positively. Even if governments aren't doing a lot of important AI research "in house," and private actors continue to be the primary funders of AI R&D, we should expect governments to become much more active if really serious threats to security start to emerge. National governments are unlikely to be passive, for example, if safety/alignment failures become increasingly damaging -- or, especially, if existentially bad safety/alignment failures ever become clearly plausible. If any important institutions, design decisions, etc., regarding AI get "locked in," then I also expect governments to be heavily involved in shaping these institutions, making these decisions, and so on. And states are, of course, the most important actors for many concerns having to do with political instability caused by AI. Finally, there are also certain potential solutions to risks -- like creating binding safety regulations, forging international agreements, or plowing absolutely enormous amounts of money into research projects -- that can't be implemented by private actors alone. Basically, in most scenarios where AI governance work turns out to be really useful from a long-termist perspective -- because there are existential safety/alignment risks, because AI causes major instability, or because there are opportunities to "lock in" key features of the world -- I expect governments to really matter.