The issue is that both sides of the debate lack gears-level arguments. The ones you give in this post (like "all the doom flows through the tiniest crack in our defence") are more like vague intuitions; equally, on the other side, there are vague intuitions like "AGIs will be helping us on a lot of tasks" and "collusion is hard" and "people will get more scared over time" and so on.
Last time there was an explicitly hostile media campaign against EA the reaction was not to do anything, and the result is that Émile P. Torres has a large media presence,[1] launched the term TESCREAL to some success, and EA-critical thoughts became a lot more public and harsh in certain left-ish academic circles.
You say this as if there were ways to respond which would have prevented this. I'm not sure these exist, and in general I think "ignore it" is a really really solid heuristic in an era where conflict drives clicks.
I think responding in a way that is calm, boring, and factual will help. It's not going to get Émile to publicly recant anything. The goal is just for people who find Émile's stuff to see that there's another side to the story. They aren't going to publicly say "yo Émile I think there might be another side to the story". But fewer of them will signal boost their writings on the theory that "EAs have nothing to say in their own defense, therefore they are guilty". Also, I think people often interpret silence as a contemptuous response, and that can be enraging in itself.
@Linch, see the article I linked above, which identifies a bunch of specific bottlenecks where lobbying and/or targeted funding could have been really useful. I didn't know about these when I wrote my comment above, but I claim prediction points for having a high-level heuristic that led to the right conclusion anyway.
The article I linked above has changed my mind back again. Apparently the RTS,S vaccine has been in clinical trials since 1997. So the failure here wasn't just an abstract lack of belief in technology: the technology literally already existed the whole time that the EA movement (or anyone who's been in this space for less than two decades) has been thinking about it.
An article on why we didn't get a vaccine sooner: https://worksinprogress.co/issue/why-we-didnt-get-a-malaria-vaccine-sooner
This seems like significant evidence for the tractability of speeding things up. E.g. a single (unjustified) decision by the WHO in 2015 delayed the vaccine by almost a decade, four years of which were spent in fundraising. It seems very plausible that even 2015 EA could have sped things up by multiple years in expectation either lobbying against the original decision, or funding the follow-up trial.
This is a good point. The two other examples which seem salient to me:
Ah, I see. I think the two arguments I'd give here:
Hmm, your comment doesn't really resonate with me. I don't think it's really about being monomaniacal. I think the (in hindsight) correct thought process here would be something like:
"Over the next 20 or 50 years, it's very likely that the biggest lever in the space of malaria will be some kind of technological breakthrough. Therefore we should prioritize investigating the hypothesis that there's some way of speeding up this biggest lever."
I don't think you need this "move heaven and earth" philosophy to do that reasoning; I don't think you need to focus o...
Makes sense, though I think that global development was enough of a focus of early EA that this type of reasoning should have been done anyway.
I’m more sympathetic about it not being done after, say, 2017.
I think this has been thought about a few times since EA started.
In 2015 Max Dalton wrote about medical research and said the following:
"GiveWell note that most funders of medical research more generally have large budgets, and claim that ‘It’s reasonable to ask how much value a new funder – even a relatively large one – can add in this context’. Whilst the field of tropical disease research is, as I argued above, more neglected, there are still a number of large foundations, and funding for several diseases is on the scale of hundreds of millions of dol...
A different BOTEC: at 500k deaths per year and $5,000 per death prevented by bednets, we'd have to get a year of vaccine speedup for $2.5 billion to match bednets.
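(A minimal sketch of that arithmetic, using only the two figures quoted above:)

```python
# Rough BOTEC from the figures above: ~500k malaria deaths per year,
# ~$5,000 per death averted by bednets.
deaths_per_year = 500_000
cost_per_death_averted_bednets = 5_000  # USD, rough figure

# Averting one year's worth of deaths via a one-year vaccine speedup
# matches bednets only if it costs no more than this:
break_even_cost = deaths_per_year * cost_per_death_averted_bednets
print(f"${break_even_cost:,}")  # $2,500,000,000
```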
I agree that $2.5 billion to speed up development of vaccines by a year is tricky. But I expect that $2.5 billion, or $250 million, or perhaps even $25 million to speed up deployment of vaccines by a year is pretty plausible. I don’t know the details but apparently a vaccine was approved in 2021 that will only be rolled out widely in a few months, and another vaccine will be delayed until mid-2024: h...
That's very useful info, ty. Though I don't think it substantively changes my conclusion because:
It currently seems likely to me that we're going to look back on the EA promotion of bednets as a major distraction from focusing on scientific and technological work against malaria, such as malaria vaccines and gene drives.
I don't know very much about the details of either. But it seems important to highlight how even very thoughtful people trying very hard to address a serious problem still almost always dramatically underrate the scale of technological progress.
I feel somewhat mournful about our failure on this front; and concerned about whether the sa...
I understand the sentiment, but there's a lot here I disagree with. I'll focus mainly on one point.
In the case of global health, I disagree that "thoughtful people trying very hard to address a serious problem still almost always dramatically underrate the scale of technological progress."
This doesn't fit with the history of malaria and other infectious diseases, where the opposite has happened: optimism about technological progress has often exceeded reality.
About 60 years ago humanity was positive about eradicating malaria with technological progress. W...
I think I'd be more convinced if you backed your claim up with some numbers, even loose ones. Maybe I'm missing something, but imo there just aren't enough zeros for this to be a massive fuckup.
Fairly simple BOTEC:
Do you think that if GiveWell hadn't recommended bednets / effective altruists hadn't endorsed bednets, it would have led to more investment in vaccine development/gene drives etc.? That doesn't seem intuitive to me.
To me GiveWell fit a particular demand, which was for charitable donations that would have reliably high marginal impact. Or maybe to be more precise, for charitable donations recommended by an entity that made a good faith effort without obvious mistakes to find the highest reliable marginal impact donation. Scientific research does not have that structure since the outcomes are unpredictable.
I don't think it makes sense to think of EA as a monolith which both promoted bednets and is enthusiastic about engaging with the kind of reasoning you're advocating here. My oversimplified model of the situation is more like:
More precisely, the cascade is:
- Probability of us developing TAGI, assuming no derailments
- Probability of us being derailed, conditional on otherwise being on track to develop TAGI without derailment
Got it. As mentioned I disagree with your 0.7 war derailment. Upon further thought I don't necessarily disagree with your 0.7 "regulation derailment", but I think that in most cases where I'm talking to people about AI risk, I'd want to factor this out (because I typically want to make claims like "here's what happens if we don't do something about it"). ...
If events 1-5 constitute TAGI, and events 6-10 are conditional on AGI, and TAGI is very different from AGI, then you can't straightforwardly get an overall estimate by multiplying them together. E.g. as I discuss above, 0.3 seems like a reasonable estimate of P(derailment from wars) if the chip supply remains concentrated in Taiwan, but doesn't seem reasonable if the supply of chips is on track to be "massively scaled up".
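(To make the conditioning point concrete, here's a toy sketch; all numbers are hypothetical and chosen only to show the structure, not to estimate anything.)

```python
# Toy illustration of why one unconditional "derailment" factor can mislead:
# the war-derailment probability depends on which "on track to TAGI" world obtains.
p_chips_still_concentrated = 0.3   # hypothetical: chip supply remains concentrated in Taiwan
p_derail_if_concentrated = 0.3     # derailment estimate that assumes concentration
p_derail_if_scaled_up = 0.05       # hypothetical: much lower once supply is massively scaled up

# Naive: apply the concentrated-supply estimate unconditionally.
p_no_derail_naive = 1 - p_derail_if_concentrated  # 0.70

# Conditioned: average over the two worlds.
p_no_derail_conditioned = (
    p_chips_still_concentrated * (1 - p_derail_if_concentrated)
    + (1 - p_chips_still_concentrated) * (1 - p_derail_if_scaled_up)
)  # 0.3*0.7 + 0.7*0.95 = 0.875

print(p_no_derail_naive, p_no_derail_conditioned)
```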
Unless we know that alignment is going to be easier, pushing forward on capabilities without an outsized alignment benefit seems needlessly risky.
I am not disputing this :) I am just disputing the factual claim that we know which is easier.
I'd say "alignment is harder than capabilities" seems almost certainly true
Are you making the claim that we're almost certainly not in a world where alignment is easy? (E.g. only requires something like Debate/IA and maybe some rudimentary interpretability techniques.) I don't see how you could know that.
Yepp, that seems right. I do think this is a risk, but I also think it's often overplayed in EA spaces. E.g. I've recently heard a bunch of people talking about the capability infohazards that might arise from interpretability research. To me, it seems pretty unlikely that this concern should prevent people from doing or sharing interpretability research.
What's the disagreement here? One part of it is just that some people are much more pessimistic about alignment research than I am. But it's not actually clear that this by itself should make a difference,...
I put little weight on this analysis because it seems like a central example of the multiple stage fallacy. But it does seem worth trying to identify clear examples of the authors not accounting properly for conditionals. So here are three concrete criticisms (though note that these are based on skimming rather than close-reading the PDF):
Indeed, my comment was regarding the 99.999 percent of people (including myself) who are not AI researchers. I completely agree that researchers should be working on the latest models and paying for ChatGPT-4, but that wasn't my point.
I'd extend this not just to include AI researchers, but people who are involved in AI safety more generally. But on the question of the wider population, we agree.
...The environmentalists I know who don't fly don't use it to virtue signal at all; they are doing it to help the world a little and to show integrity with their lives
"show integrity with their lifestyles" is a nicer way of saying "virtue signalling",
I would describe it more as a spectrum. On the more pure "virtue signaling" end, you might choose one relatively unimportant thing like signing a petition, then blast it all over the internet while not doing other, more important actions that would help the cause.
Whereas on the other end of the spectrum, "showing integrity with lifestyle" to me means something like making a range of lifestyle choices which might make only a small difference to your cause, while making you feel like y...
Obviously if individual people want to use or not use a given product, that's their business. I'm calling it out not as a criticism of individuals, but in the context of setting the broader AI safety culture, for two broad reasons:
There are 2 concurrent research programs, and if one program (capability) completes before the other one (alignment), we all die, but the capability program is an easier technical problem than the alignment program. Do you disagree with that framing?
Yepp, I disagree on a bunch of counts.
a) I dislike the phrase "we all die": nobody has justifiable confidence high enough to make that claim. Even if ASI is misaligned enough to seize power, there's a pretty wide range of options for the future of humans, including some really good ones (just like there's a pret...
Yepp, I agree that I am doing an intuition pump to convey my point. I think this is a reasonable approach to take because I actually think there's much more disagreement on vibes and culture than there is on substance (I too would like AI development to go more slowly). E.g. AI safety researchers paying for ChatGPT obviously brings in a negligible amount of money for OpenAI, and so when people think about that stuff the actual cognitive process is more like "what will my purchase signal and how will it influence norms?" But that's precisely the sort of thi...
I don't think this is a coincidence—in general I think it's much easier for people to do great research and actually figure stuff out when they're viscerally interested in the problems they're tackling, and excited about the process of doing that work.
Like, all else equal, work being fun and invigorating is obviously a good thing? I'm open to people arguing that the benefits of creating a depressing environment are greater (even if just in the form of vignettes like I did above), e.g. because it spurs people to do better policy work. But falling into unsustainable depressing environments which cause harmful side effects seems like a common trap, so I'm pretty cautious about it.
I'd like to constructively push back on this: The research and open-source communities outside AI safety that I'm embedded in are arguably just as hands-on, if not more so, since their attitude towards deployment is usually more ... unrestricted.
I think we agree: I'm describing a possible future for AI safety, not making the claim that it's anything like this now.
I was a climate activist organising FridaysForFuture (FFF) protests, and I don't recall this was ever the prevailing perception/attitude.
Not sure what you mean by this, but in some AI safety spaces ML...
(COI note: I work at OpenAI. These are my personal views, though.)
My quick take on the "AI pause debate", framed in terms of two scenarios for how the AI safety community might evolve over the coming years:
One exchange that makes me feel particularly worried about Scenario 2 is this one here, which focuses on the concern that there's:
No rigorous basis for that the use of mechanistic interpretability would "open up possibilities" to long-term safety. And plenty of possibilities for corporate marketers – to chime in on mechint's hypothetical big breakthroughs. In practice, we may help AI labs again – accidentally – to safety-wash their AI products.
I would like to point to this as a central example of the type of thing I'm worried about in scenario 2: the...
history is full of cases where people dramatically underestimated the growth of scientific knowledge, and its ability to solve big problems.
There are 2 concurrent research programs, and if one program (capability) completes before the other one (alignment), we all die, but the capability program is an easier technical problem than the alignment program. Do you disagree with that framing? If not, then how does "research might proceed faster than we expect" give you hope rather than dread?
Also, I'm guessing you would oppose a worldwide ban starting today ...
"hesitate to pay for ChatGPT because it feels like they're contributing to the problem"
Yep that's me right now and I would hardly call myself a Luddite (maybe I am tho?)
Can you explain why you frame this as an obviously bad thing to do? Refusing to help fund the most cutting edge AI company, which has been credited by multiple people with spurring on the AI race and attracting billions of dollars to AI capabilities seems not-unreasonable at the very least, even if that approach does happen to be wrong.
Sure there are decent arguments against not paying for ...
This kind of reads as saying that 1 would be good because it's fun (it's also kind of your job, right?) and 2 would be bad because it's depressing.
I think it would be helpful for you to mention and highlight your conflict-of-interest here.
I remember becoming much more positive about ads after starting work at Google. After I left, I slowly became more cynical about them again, and now I'm back down to ~2018 levels.
EDIT: I don't think this comment should get more than say 10-20 karma. I think it was a quick suggestion/correction that Richard ended up following, not too insightful or useful.
I appreciate you drawing attention to the downside risks of public advocacy, and I broadly agree that they exist, but I also think the (admittedly) exaggerated framings here are doing a lot of work (basically just intuition pumping, for better or worse). The argument would be just as strong in the opposite direction if we swap the valence and optimism/pessimism of the passages: what if, in scenario one, the AI safety community continues making incremental progress on specific topics in interpretability and scalable oversight but achieves too little too slo...
"They are both unsafe now for the things they can be used for and releasing model weights in the future will be more unsafe because of things the model could do."
I think using "unsafe" in a very broad way like this is misleading overall and generally makes the AI safety community look like miscalibrated alarmists. I do not want to end up in a position where, in 5 or 10 years' time, policy proposals aimed at reducing existential risk come with 5 or 10 years' worth of baggage in the form of previous claims about model harms that have turned out to be false. I...
The policy you are suggesting is far further away from "open source" than this is. It is totally reasonable for Meta to claim that doing something closer to open source has some proportion of the benefits of full open source.
Suppose Meta were claiming that their models were curing cancer. It probably is the case that their work is more likely to cure cancer than if they took Holly's preferred policy, but nonetheless it feels legitimate to object to them generating goodwill by claiming to cure cancer.
Protests are by nature adversarial and high-variance actions prone to creating backlash, so I think that if you're going to be organizing them, you need to be careful to actually convey the right message (and in particular, way more careful than you need to be in non-adversarial environments—e.g. if news media pick up on this, they're likely going to twist your words). I don't think this post is very careful on that axis. In particular, two things I think are important to change:
"Meta’s frontier AI models are fundamentally unsafe."
I disagree; the current m...
It's not obvious to me that message precision is more important for public activism than in other contexts. I think it might be less important, in fact. Here's why:
My guess is that the distinction between "X company's frontier AI models are unsafe" vs. "X company's policy on frontier models is unsafe" isn't actually registered by the vast majority of the public (many such cases!). Instead, both messages basically amount to a mental model that is something like "X company's AI work = bad". And that's really all the nuance that you need to create public press...
Great post! Not much to add, it all seems to make sense. I'd consider adding a more direct summary of the key takeaways at the top for easier consumption, though.
Impressed by the post; I'd like to donate! Is there a way to do so that avoids card fees? And if so, at what donation size do you prefer that people start using it?
One question: I am curious to hear anyone's perspective on the following "conflict":
The former is more important for influencing labs, the latter is more important for doing alignment research.
And yet, as I say, I believe both of these are necessary.
FWIW when I talk about the "specific skill", I'm not talking about having legible experience doing this, I'm talking about actually just being able to do it. In general I think it's less important to optimize for having credibility, and more important to optimize for the skills needed. Same for ML skill—l...
My guess is that this post is implicitly aimed at Bay Area EAs, and that roughly every perk at Trajan House/other Oxford locations is acceptable by these standards.
Perhaps worth clarifying this explicitly, if true—it would be unfortunate if the people who were already most scrupulous about perks were the ones who updated most from this post.
I think there’s a sort of “LessWrong decision theory black hole” that makes people a bit crazy in ways that are obvious from the outside, and this comment thread isn’t the place to adjudicate all that.
From my perspective it's the opposite: epistemic modesty is an incredibly strong skeptical argument (a type of argument that often gets people very confused), extreme forms of which have been popular in EA despite leading to conclusions which conflict strongly with common sense (like "in most cases, one should pay scarcely any attention to what you find the m...
I don't follow. I get that acting on low-probability scenarios can let you get in on neglected opportunities, but you don't want to actually get the probabilities wrong, right?
I reject the idea that all-things-considered probabilities are "right" and inside-view probabilities are "wrong", because you should very rarely be using all-things-considered probabilities when making decisions, for reasons of simple arithmetic (as per my example). Tell me what you want to use the probability for and I'll tell you what type of probability you should be using.
You mig...
On a separate note: I currently don't think that epistemic deference as a concept makes sense, because defying a consensus has two effects that are often roughly the same size: it means you're more likely to be wrong, and it means you're creating more value if right.
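(As a minimal illustration of what "roughly the same size" would mean in expected-value terms; the numbers are purely made up.)

```python
# Conforming: high chance of being right, little counterfactual value added.
# Defying:    low chance of being right, large counterfactual value if right.
p_right_conform, value_if_right_conform = 0.6, 10
p_right_defy, value_if_right_defy = 0.1, 60

ev_conform = p_right_conform * value_if_right_conform  # 6.0
ev_defy = p_right_defy * value_if_right_defy           # 6.0 -- equal under these stipulated numbers
```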
I don't fully follow this explanation, but if it's true that defying a consensus has two effects that are the same size, doesn't that suggest you can choose any consensus-defying action because the EV is the same regardless, since the likelihood of you being wrong is ~cancelled out by the expec...
The probability of success in some project may be correlated with value conditional on success in many domains, not just ones involving deference, and we typically don’t think that gets in the way of using probabilities in the usual way, no? If you’re wondering whether some corner of something sticking out of the ground is a box of treasure or a huge boulder, maybe you think that the probability you can excavate it is higher if it’s the box of treasure, and that there’s only any value to doing so if it is. The expected value of trying to excavate is P(trea...
If it informs you that EA beliefs on some question have been unusual from the get-go, it makes sense to update the other way, toward the distribution of beliefs among people not involved in the EA community.
I'm a bit confused by this. Suppose that EA has a good track record on an issue where its beliefs have been unusual from the get-go. For example, I think that by temperament EAs tend to be more open to sci-fi possibilities than others, even before having thought much about them; and that over the last decade or so we've increasingly seen sci-fi possibil...
Only when people face starvation, illness, disasters, or warfare can they learn who they can really trust.
Isn't this approximately equivalent to the claim that trust becomes much more risky/costly under conditions of scarcity?
only under conditions of local abundance do we see a lot of top-down hierarchical coercion
Yeah, this is an interesting point. I think my story here is that we need to talk about abundance at different levels. E.g. at the highest level (will my country/civilization survive?) you should often be in scarcity mindset, because losing one w...
FYI I prefer "AI governance" over "AI strategy" because I think the latter pushes people towards trying to just sit down and think through arbitrarily abstract questions, which is very hard (especially for junior people). Better to zoom in more, as I discuss in this post.
I can notice that Open Philanthropy's funding comes from one person
One person may well have multiple different parts, or subscribe to multiple different worldviews!
asking oneself how much one values outcomes in different cause areas relative to each other, and then pursuing a measure of aggregate value with more or less vigor
I think your alternative implicitly assumes that, as a single person, you can just "decide" how much you value different outcomes. Whereas in fact I think of worldview diversification as actually a pretty good approximation of the process I'd go through internally if I were asked this question.
I agree that this, and your other comment below, both describe unappealing features of the current setup. I'm just pointing out that in fact there are unappealing outcomes all over the place, and that just because the equilibrium we've landed on has some unappealing properties doesn't mean that it's the wrong equilibrium. Specifically, the more you move towards pure maximization, the more you run into these problems; and as Holden points out, I don't think you can get out of them just by saying "let's maximize correctly".
(You might say: why not a middle gr...
One reason to see "dangling" relative values as principled: utility functions are equivalent (i.e. produce the same preferences over actions) up to a positive affine transformation. Hence why we often use voting systems to make decisions in cases where people's preferences clash, rather than trying to extract a metric of utility which can be compared across people.
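(In symbols, this is just the textbook statement of that equivalence, nothing specific to this discussion:)

```latex
U'(x) = aU(x) + b,\quad a > 0
\;\Longrightarrow\;
\mathbb{E}[U'(A)] > \mathbb{E}[U'(B)] \iff \mathbb{E}[U(A)] > \mathbb{E}[U(B)]
```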
The Pareto improvements aren't about worldview diversification, though. You can see this because you have exactly the same problem under a single worldview, if you keep the amount of funding constant per year. You can solve this by letting each worldview donate to, or steal from, its own budget in other years.
I do think trade between worldviews is good in addition to that, to avoid the costs of lending/borrowing; the issue is that you need to be very careful when you're relying on the worldviews themselves to tell you how much weight to put on them. So for...
I don't think this is actually a problem, for roughly the reasons described here. I.e. worldview diversification can be seen as a way of aggregating the preferences of multiple agents—but this shouldn't necessarily look like maximizing any given utility function.
I expect a bunch of more rationalist-type people disagree with this claim, FWIW. But I also think that they heavily overestimate the value of the types of conceptual research I'm talking about here.
If you apply a security mindset (Murphy’s Law) to the problem of AI alignment, it should quickly become apparent that it is very difficult.
FYI I disagree with this. I think that the difficulty of alignment is a complicated and open question, not something that is quickly apparent. In particular, security mindset is about beating adversaries, and it's plausible that we train AIs in ways that mostly avoid them treating us as adversaries.
I can see a worldview in which prioritizing raising awareness is more valuable, but I don't see the case for believing "that we have concrete proposals". Or at least, I haven't seen any; could you link them, or explain what you mean by a concrete proposal?
My guess is that you're underestimating how concrete a proposal needs to be before you can actually muster political will behind it. For example, you don't just need "let's force labs to pass evals", you actually need to have solid descriptions of the evals you want them to pass.
I also think that recent e...
Clarification: I think we're bottlenecked by both, and I'd love to see the proposals become more concrete.
Nonetheless, I think proposals like "Get a federal agency to regulate frontier AI labs like the FDA/FAA" or even "push for an international treaty that regulates AI in the way that the IAEA regulates atomic energy" are "concrete enough" to start building political will behind them. Other (more specific) examples include export controls, compute monitoring, licensing for frontier AI models, and some others on Luke's list.
I don't think any of t...
These are what I mean by the vague intuitions.
Nobody has come anywhere near doing this satisfactorily. The most obvious explanation is that they can't.