The issue is that both sides of the debate lack gears-level arguments. The ones you give in this post (like "all the doom flows through the tiniest crack in our defence") are more like vague intuitions; equally, on the other side, there are vague intuitions like "AGIs will be helping us on a lot of tasks" and "collusion is hard" and "people will get more scared over time" and so on.
Last time there was an explicitly hostile media campaign against EA the reaction was not to do anything, and the result is that Émile P. Torres has a large media presence,[1] launched the term TESCREAL to some success, and EA-critical thoughts became a lot more public and harsh in certain left-ish academic circles.
You say this as if there were ways to respond which would have prevented this. I'm not sure these exist, and in general I think "ignore it" is a really really solid heuristic in an era where conflict drives clicks.
I think responding in a way that is calm, boring, and factual will help. It's not going to get Émile to publicly recant anything. The goal is just for people who find Émile's stuff to see that there's another side to the story. They aren't going to publicly say "yo Émile I think there might be another side to the story". But fewer of them will signal boost their writings on the theory that "EAs have nothing to say in their own defense, therefore they are guilty". Also, I think people often interpret silence as a contemptuous response, and that can be enraging in itself.
@Linch, see the article I linked above, which identifies a bunch of specific bottlenecks where lobbying and/or targeted funding could have been really useful. I didn't know about these when I wrote my comment above, but I claim prediction points for having a high-level heuristic that led to the right conclusion anyway.
The article I linked above has changed my mind back again. Apparently the RTS,S vaccine has been in clinical trials since 1997. So the failure here wasn't just an abstract lack of belief in technology: the technology literally already existed the whole time that the EA movement (or anyone who's been in this space for less than two decades) has been thinking about it.
An article on why we didn't get a vaccine sooner: https://worksinprogress.co/issue/why-we-didnt-get-a-malaria-vaccine-sooner
This seems like significant evidence for the tractability of speeding things up. E.g. a single (unjustified) decision by the WHO in 2015 delayed the vaccine by almost a decade, four years of which were spent in fundraising. It seems very plausible that even 2015 EA could have sped things up by multiple years in expectation either lobbying against the original decision, or funding the follow-up trial.
This is a good point. The two other examples which seem salient to me:
Ah, I see. I think the two arguments I'd give here:
Hmm, your comment doesn't really resonate with me. I don't think it's really about being monomaniacal. I think the (in hindsight) correct thought process here would be something like:
"Over the next 20 or 50 years, it's very likely that the biggest lever in the space of malaria will be some kind of technological breakthrough. Therefore we should prioritize investigating the hypothesis that there's some way of speeding up this biggest lever."
I don't think you need this "move heaven and earth" philosophy to do that reasoning; I don't think you need to focus o...
Makes sense, though I think that global development was enough of a focus of early EA that this type of reasoning should have been done anyway.
I’m more sympathetic about it not being done after, say, 2017.
I think this has been thought about a few times since EA started.
In 2015 Max Dalton wrote about medical research and said the following:
"GiveWell note that most funders of medical research more generally have large budgets, and claim that ‘It’s reasonable to ask how much value a new funder – even a relatively large one – can add in this context’. Whilst the field of tropical disease research is, as I argued above, more neglected, there are still a number of large foundations, and funding for several diseases is on the scale of hundreds of millions of dol...
A different BOTEC: at 500k deaths per year and $5,000 per death prevented by bednets, we'd have to get a year of vaccine speedup for $2.5 billion to match bednets.
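(A minimal sketch of that arithmetic, using only the two figures quoted above:)

```python
# Rough BOTEC from the figures above: ~500k malaria deaths per year,
# ~$5,000 per death averted by bednets.
deaths_per_year = 500_000
cost_per_death_averted_bednets = 5_000  # USD, rough figure

# Averting one year's worth of deaths via a one-year vaccine speedup
# matches bednets only if it costs no more than this:
break_even_cost = deaths_per_year * cost_per_death_averted_bednets
print(f"${break_even_cost:,}")  # $2,500,000,000
```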
I agree that $2.5 billion to speed up development of vaccines by a year is tricky. But I expect that $2.5 billion, or $250 million, or perhaps even $25 million to speed up deployment of vaccines by a year is pretty plausible. I don’t know the details but apparently a vaccine was approved in 2021 that will only be rolled out widely in a few months, and another vaccine will be delayed until mid-2024: h...
That's very useful info, ty. Though I don't think it substantively changes my conclusion because:
It currently seems likely to me that we're going to look back on the EA promotion of bednets as a major distraction from focusing on scientific and technological work against malaria, such as malaria vaccines and gene drives.
I don't know very much about the details of either. But it seems important to highlight how even very thoughtful people trying very hard to address a serious problem still almost always dramatically underrate the scale of technological progress.
I feel somewhat mournful about our failure on this front; and concerned about whether the sa...
I understand the sentiment, but there's a lot here I disagree with. I'll focus mainly on one point.
In the case of global health, I disagree that "thoughtful people trying very hard to address a serious problem still almost always dramatically underrate the scale of technological progress."
This doesn't fit with the history of malaria and other infectious diseases, where the opposite has happened: optimism about technological progress has often exceeded reality.
About 60 years ago humanity was positive about eradicating malaria with technological progress. W...
I think I'd be more convinced if you backed your claim up with some numbers, even loose ones. Maybe I'm missing something, but imo there just aren't enough zeros for this to be a massive fuckup.
Fairly simple BOTEC:
Do you think that if GiveWell hadn't recommended bednets / effective altruists hadn't endorsed bednets, it would have led to more investment in vaccine development/gene drives etc.? That doesn't seem intuitive to me.
To me GiveWell fit a particular demand, which was for charitable donations that would have reliably high marginal impact. Or maybe to be more precise, for charitable donations recommended by an entity that made a good faith effort without obvious mistakes to find the highest reliable marginal impact donation. Scientific research does not have that structure since the outcomes are unpredictable.
I don't think it makes sense to think of EA as a monolith which both promoted bednets and is enthusiastic about engaging with the kind of reasoning you're advocating here. My oversimplified model of the situation is more like:
More precisely, the cascade is:
- Probability of us developing TAGI, assuming no derailments
- Probability of us being derailed, conditional on otherwise being on track to develop TAGI without derailment
Got it. As mentioned I disagree with your 0.7 war derailment. Upon further thought I don't necessarily disagree with your 0.7 "regulation derailment", but I think that in most cases where I'm talking to people about AI risk, I'd want to factor this out (because I typically want to make claims like "here's what happens if we don't do something about it"). ...
If events 1-5 constitute TAGI, and events 6-10 are conditional on AGI, and TAGI is very different from AGI, then you can't straightforwardly get an overall estimate by multiplying them together. E.g. as I discuss above, 0.3 seems like a reasonable estimate of P(derailment from wars) if the chip supply remains concentrated in Taiwan, but doesn't seem reasonable if the supply of chips is on track to be "massively scaled up".
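(To make the conditioning point concrete, here's a toy sketch; all numbers are hypothetical and chosen only to show the structure, not to estimate anything.)

```python
# Toy illustration of why one unconditional "derailment" factor can mislead:
# the war-derailment probability depends on which "on track to TAGI" world obtains.
p_chips_still_concentrated = 0.3   # hypothetical: chip supply remains concentrated in Taiwan
p_derail_if_concentrated = 0.3     # derailment estimate that assumes concentration
p_derail_if_scaled_up = 0.05       # hypothetical: much lower once supply is massively scaled up

# Naive: apply the concentrated-supply estimate unconditionally.
p_no_derail_naive = 1 - p_derail_if_concentrated  # 0.70

# Conditioned: average over the two worlds.
p_no_derail_conditioned = (
    p_chips_still_concentrated * (1 - p_derail_if_concentrated)
    + (1 - p_chips_still_concentrated) * (1 - p_derail_if_scaled_up)
)  # 0.3*0.7 + 0.7*0.95 = 0.875

print(p_no_derail_naive, p_no_derail_conditioned)
```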
Unless we know that alignment is going to be easier, pushing forward on capabilities without an outsized alignment benefit seems needlessly risky.
I am not disputing this :) I am just disputing the factual claim that we know which is easier.
I'd say "alignment is harder than capabilities" seems almost certainly true
Are you making the claim that we're almost certainly not in a world where alignment is easy? (E.g. only requires something like Debate/IA and maybe some rudimentary interpretability techniques.) I don't see how you could know that.
Yepp, that seems right. I do think this is a risk, but I also think it's often overplayed in EA spaces. E.g. I've recently heard a bunch of people talking about the capability infohazards that might arise from interpretability research. To me, it seems pretty unlikely that this concern should prevent people from doing or sharing interpretability research.
What's the disagreement here? One part of it is just that some people are much more pessimistic about alignment research than I am. But it's not actually clear that this by itself should make a difference,...
I put little weight on this analysis because it seems like a central example of the multiple stage fallacy. But it does seem worth trying to identify clear examples of the authors not accounting properly for conditionals. So here are three concrete criticisms (though note that these are based on skimming rather than close-reading the PDF):
Indeed, my comment was regarding the 99.999 percent of people (including myself) who are not AI researchers. I completely agree that researchers should be working on the latest models and paying for ChatGPT-4, but that wasn't my point.
I'd extend this not just to include AI researchers, but people who are involved in AI safety more generally. But on the question of the wider population, we agree.
...The environmentalists I know who don't fly don't use it to virtue signal at all; they are doing it to help the world a little and to show integrity with their lives
"show integrity with their lifestyles" is a nicer way of saying "virtue signalling",
I would describe it more as a spectrum. On the more pure "virtue signaling" end, you might choose one relatively unimportant thing like signing a petition, then blast it all over the internet while not doing other, more important actions that would help the cause.
Whereas on the other end of the spectrum, "showing integrity with lifestyle" to me means something like making a range of lifestyle choices which might make only a small difference to your cause, while making you feel like y...
Obviously if individual people want to use or not use a given product, that's their business. I'm calling it out not as a criticism of individuals, but in the context of setting the broader AI safety culture, for two broad reasons:
There are 2 concurrent research programs, and if one program (capability) completes before the other one (alignment), we all die, but the capability program is an easier technical problem than the alignment program. Do you disagree with that framing?
Yepp, I disagree on a bunch of counts.
a) I dislike the phrase "we all die": nobody has justifiable confidence high enough to make that claim. Even if ASI is misaligned enough to seize power, there's a pretty wide range of options for the future of humans, including some really good ones (just like there's a pret...
Yepp, I agree that I am doing an intuition pump to convey my point. I think this is a reasonable approach to take because I actually think there's much more disagreement on vibes and culture than there is on substance (I too would like AI development to go more slowly). E.g. AI safety researchers paying for ChatGPT obviously brings in a negligible amount of money for OpenAI, and so when people think about that stuff the actual cognitive process is more like "what will my purchase signal and how will it influence norms?" But that's precisely the sort of thi...
I don't think this is a coincidence—in general I think it's much easier for people to do great research and actually figure stuff out when they're viscerally interested in the problems they're tackling, and excited about the process of doing that work.
Like, all else equal, work being fun and invigorating is obviously a good thing? I'm open to people arguing that the benefits of creating a depressing environment are greater (even if just in the form of vignettes like I did above), e.g. because it spurs people to do better policy work. But falling into unsustainable depressing environments which cause harmful side effects seems like a common trap, so I'm pretty cautious about it.
I'd like to constructively push back on this: The research and open-source communities outside AI safety that I'm embedded in are arguably just as hands-on, if not more so, since their attitude towards deployment is usually more ... unrestricted.
I think we agree: I'm describing a possible future for AI safety, not making the claim that it's anything like this now.
I was a climate activist organising FridaysForFuture (FFF) protests, and I don't recall this was ever the prevailing perception/attitude.
Not sure what you mean by this, but in some AI safety spaces ML...
(COI note: I work at OpenAI. These are my personal views, though.)
My quick take on the "AI pause debate", framed in terms of two scenarios for how the AI safety community might evolve over the coming years:
One exchange that makes me feel particularly worried about Scenario 2 is this one here, which focuses on the concern that there's:
No rigorous basis for that the use of mechanistic interpretability would "open up possibilities" to long-term safety. And plenty of possibilities for corporate marketers – to chime in on mechint's hypothetical big breakthroughs. In practice, we may help AI labs again – accidentally – to safety-wash their AI products.
I would like to point to this as a central example of the type of thing I'm worried about in scenario 2: the...
history is full of cases where people dramatically underestimated the growth of scientific knowledge, and its ability to solve big problems.
There are 2 concurrent research programs, and if one program (capability) completes before the other one (alignment), we all die, but the capability program is an easier technical problem than the alignment program. Do you disagree with that framing? If not, then how does "research might proceed faster than we expect" give you hope rather than dread?
Also, I'm guessing you would oppose a worldwide ban starting today ...
"hesitate to pay for ChatGPT because it feels like they're contributing to the problem"
Yep that's me right now and I would hardly call myself a Luddite (maybe I am tho?)
Can you explain why you frame this as an obviously bad thing to do? Refusing to help fund the most cutting edge AI company, which has been credited by multiple people with spurring on the AI race and attracting billions of dollars to AI capabilities seems not-unreasonable at the very least, even if that approach does happen to be wrong.
Sure there are decent arguments against not paying for ...
This kind of reads as saying that 1 would be good because it's fun (it's also kind of your job, right?) and 2 would be bad because it's depressing.
I think it would be helpful for you to mention and highlight your conflict-of-interest here.
I remember becoming much more positive about ads after starting work at Google. After I left, I slowly became more cynical about them again, and now I'm back down to ~2018 levels.
EDIT: I don't think this comment should get more than say 10-20 karma. I think it was a quick suggestion/correction that Richard ended up following, not too insightful or useful.
I appreciate you drawing attention to the downside risks of public advocacy, and I broadly agree that they exist, but I also think the (admittedly) exaggerated framings here are doing a lot of work (basically just intuition pumping, for better or worse). The argument would be just as strong in the opposite direction if we swap the valence and optimism/pessimism of the passages: what if, in scenario one, the AI safety community continues making incremental progress on specific topics in interpretability and scalable oversight but achieves too little too slo...
"They are both unsafe now for the things they can be used for and releasing model weights in the future will be more unsafe because of things the model could do."
I think using "unsafe" in a very broad way like this is misleading overall and generally makes the AI safety community look like miscalibrated alarmists. I do not want to end up in a position where, in 5 or 10 years' time, policy proposals aimed at reducing existential risk come with 5 or 10 years' worth of baggage in the form of previous claims about model harms that have turned out to be false. I...
The policy you are suggesting is far further away from "open source" than this is. It is totally reasonable for Meta to claim that doing something closer to open source has some proportion of the benefits of full open source.
Suppose Meta were claiming that their models were curing cancer. It probably is the case that their work is more likely to cure cancer than if they took Holly's preferred policy, but nonetheless it feels legitimate to object to them generating goodwill by claiming to cure cancer.
Protests are by nature adversarial and high-variance actions prone to creating backlash, so I think that if you're going to be organizing them, you need to be careful to actually convey the right message (and in particular, way more careful than you need to be in non-adversarial environments—e.g. if news media pick up on this, they're likely going to twist your words). I don't think this post is very careful on that axis. In particular, two things I think are important to change:
"Meta’s frontier AI models are fundamentally unsafe."
I disagree; the current m...
It's not obvious to me that message precision is more important for public activism than in other contexts. I think it might be less important, in fact. Here's why:
My guess is that the distinction between "X company's frontier AI models are unsafe" vs. "X company's policy on frontier models is unsafe" isn't actually registered by the vast majority of the public (many such cases!). Instead, both messages basically amount to a mental model that is something like "X company's AI work = bad". And that's really all the nuance that you need to create public press...
Great post! Not much to add, it all seems to make sense. I'd consider adding a more direct summary of the key takeaways at the top for easier consumption, though.
Impressed by the post; I'd like to donate! Is there a way to do so that avoids card fees? And if so, at what donation size do you prefer that people start using it?
One question: I am curious to hear anyone's perspective on the following "conflict":
The former is more important for influencing labs, the latter is more important for doing alignment research.
And yet, as I say, I believe both of these are necessary.
FWIW when I talk about the "specific skill", I'm not talking about having legible experience doing this, I'm talking about actually just being able to do it. In general I think it's less important to optimize for having credibility, and more important to optimize for the skills needed. Same for ML skill—l...
My guess is that this post is implicitly aimed at Bay Area EAs, and that roughly every perk at Trajan House/other Oxford locations is acceptable by these standards.
Perhaps worth clarifying this explicitly, if true—it would be unfortunate if the people who were already most scrupulous about perks were the ones who updated most from this post.
I think there’s a sort of “LessWrong decision theory black hole” that makes people a bit crazy in ways that are obvious from the outside, and this comment thread isn’t the place to adjudicate all that.
From my perspective it's the opposite: epistemic modesty is an incredibly strong skeptical argument (a type of argument that often gets people very confused), extreme forms of which have been popular in EA despite leading to conclusions which conflict strongly with common sense (like "in most cases, one should pay scarcely any attention to what you find the m...
I don't follow. I get that acting on low-probability scenarios can let you get in on neglected opportunities, but you don't want to actually get the probabilities wrong, right?
I reject the idea that all-things-considered probabilities are "right" and inside-view probabilities are "wrong", because you should very rarely be using all-things-considered probabilities when making decisions, for reasons of simple arithmetic (as per my example). Tell me what you want to use the probability for and I'll tell you what type of probability you should be using.
You mig...
On a separate note: I currently don't think that epistemic deference as a concept makes sense, because defying a consensus has two effects that are often roughly the same size: it means you're more likely to be wrong, and it means you're creating more value if right.
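(As a minimal illustration of what "roughly the same size" would mean in expected-value terms; the numbers are purely made up.)

```python
# Conforming: high chance of being right, little counterfactual value added.
# Defying:    low chance of being right, large counterfactual value if right.
p_right_conform, value_if_right_conform = 0.6, 10
p_right_defy, value_if_right_defy = 0.1, 60

ev_conform = p_right_conform * value_if_right_conform  # 6.0
ev_defy = p_right_defy * value_if_right_defy           # 6.0 -- equal under these stipulated numbers
```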
I don't fully follow this explanation, but if it's true that defying a consensus has two effects that are the same size, doesn't that suggest you can choose any consensus-defying action because the EV is the same regardless, since the likelihood of you being wrong is ~cancelled out by the expec...
The probability of success in some project may be correlated with value conditional on success in many domains, not just ones involving deference, and we typically don’t think that gets in the way of using probabilities in the usual way, no? If you’re wondering whether some corner of something sticking out of the ground is a box of treasure or a huge boulder, maybe you think that the probability you can excavate it is higher if it’s the box of treasure, and that there’s only any value to doing so if it is. The expected value of trying to excavate is P(trea...
If it informs you that EA beliefs on some question have been unusual from the get-go, it makes sense to update the other way, toward the distribution of beliefs among people not involved in the EA community.
I'm a bit confused by this. Suppose that EA has a good track record on an issue where its beliefs have been unusual from the get-go. For example, I think that by temperament EAs tend to be more open to sci-fi possibilities than others, even before having thought much about them; and that over the last decade or so we've increasingly seen sci-fi possibil...
Only when people face starvation, illness, disasters, or warfare can they learn who they can really trust.
Isn't this approximately equivalent to the claim that trust becomes much more risky/costly under conditions of scarcity?
only under conditions of local abundance do we see a lot of top-down hierarchical coercion
Yeah, this is an interesting point. I think my story here is that we need to talk about abundance at different levels. E.g. at the highest level (will my country/civilization survive?) you should often be in scarcity mindset, because losing one w...
FYI I prefer "AI governance" over "AI strategy" because I think the latter pushes people towards trying to just sit down and think through arbitrarily abstract questions, which is very hard (especially for junior people). Better to zoom in more, as I discuss in this post.
I can notice that Open Philanthropy's funding comes from one person
One person may well have multiple different parts, or subscribe to multiple different worldviews!
asking oneself how much one values outcomes in different cause areas relative to each other, and then pursuing a measure of aggregate value with more or less vigor
I think your alternative implicitly assumes that, as a single person, you can just "decide" how much you value different outcomes. Whereas in fact I think of worldview diversification as actually a pretty good approximation of the process I'd go through internally if I were asked this question.
I agree that this, and your other comment below, both describe unappealing features of the current setup. I'm just pointing out that in fact there are unappealing outcomes all over the place, and that just because the equilibrium we've landed on has some unappealing properties doesn't mean that it's the wrong equilibrium. Specifically, the more you move towards pure maximization, the more you run into these problems; and as Holden points out, I don't think you can get out of them just by saying "let's maximize correctly".
(You might say: why not a middle gr...
One reason to see "dangling" relative values as principled: utility functions are equivalent (i.e. produce the same preferences over actions) up to a positive affine transformation. Hence why we often use voting systems to make decisions in cases where people's preferences clash, rather than trying to extract a metric of utility which can be compared across people.
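(In symbols, this is just the textbook statement of that equivalence, nothing specific to this discussion:)

```latex
U'(x) = aU(x) + b,\quad a > 0
\;\Longrightarrow\;
\mathbb{E}[U'(A)] > \mathbb{E}[U'(B)] \iff \mathbb{E}[U(A)] > \mathbb{E}[U(B)]
```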
The Pareto improvements aren't about worldview diversification, though. You can see this because you have exactly the same problem under a single worldview, if you keep the amount of funding constant per year. You can solve this by letting each worldview donate to, or steal from, its own budget in other years.
I do think trade between worldviews is good in addition to that, to avoid the costs of lending/borrowing; the issue is that you need to be very careful when you're relying on the worldviews themselves to tell you how much weight to put on them. So for...
I don't think this is actually a problem, for roughly the reasons described here. I.e. worldview diversification can be seen as a way of aggregating the preferences of multiple agents—but this shouldn't necessarily look like maximizing any given utility function.
I expect a bunch of more rationalist-type people disagree with this claim, FWIW. But I also think that they heavily overestimate the value of the types of conceptual research I'm talking about here.
If you apply a security mindset (Murphy’s Law) to the problem of AI alignment, it should quickly become apparent that it is very difficult.
FYI I disagree with this. I think that the difficulty of alignment is a complicated and open question, not something that is quickly apparent. In particular, security mindset is about beating adversaries, and it's plausible that we train AIs in ways that mostly avoid them treating us as adversaries.
I can see a worldview in which prioritizing raising awareness is more valuable, but I don't see the case for believing "that we have concrete proposals". Or at least, I haven't seen any; could you link them, or explain what you mean by a concrete proposal?
My guess is that you're underestimating how concrete a proposal needs to be before you can actually muster political will behind it. For example, you don't just need "let's force labs to pass evals", you actually need to have solid descriptions of the evals you want them to pass.
I also think that recent e...
Clarification: I think we're bottlenecked by both, and I'd love to see the proposals become more concrete.
Nonetheless, I think proposals like "Get a federal agency to regulate frontier AI labs like the FDA/FAA" or even "push for an international treaty that regulates AI in the way that the IAEA regulates atomic energy" are "concrete enough" to start building political will behind them. Other (more specific) examples include export controls, compute monitoring, licensing for frontier AI models, and some others on Luke's list.
I don't think any of t...
These are what I mean by the vague intuitions.
Nobody has come anywhere near doing this satisfactorily. The most obvious explanation is that they can't.