All of rohinmshah's Comments + Replies

[Discussion] Best intuition pumps for AI safety

It seems like most of the work is being done here:

If you think that AI won’t be smarter than humans but agree that we cannot perfectly control AI in the same way that we cannot perfectly control humans

If I were adopting my skeptic-hat, I don't think I would buy that assumption. (Or like, sure, we can't perfectly control AI, but your argument assumes that we are at least as unable to control AI as we are unable to control humans, which I wouldn't buy.) AI systems are programs; programs are (kind of) determined entirely by their source code, which we p... (read more)

mariushobbhahn (reply): So what would your pitch for skeptics look like? Just ask which assumptions they don't buy, rebut, and iterate?
General vs specific arguments for the longtermist importance of shaping AI development

I agree with that, and that's what I meant by this statement above:

Note that general arguments can motivate you to learn more about the problem to develop more specific arguments, which you can then solve.

Join EA Global: London as a virtual attendee

Thanks, I'd be glad to see these fixed! I don't remember where exactly (10) happened, unfortunately.

Join EA Global: London as a virtual attendee

(Not sure where to provide feedback on the conference, writing here because it mentions SwapCard, and the stuff I say here probably affects virtual attendees too, even though I'm an in-person attendee)

EDIT: Other people reporting many of these, including possibly a huge vulnerability where you can read anyone's "private" messages if they are part of an organization

I really dislike SwapCard. It has some ridiculously stupid and annoying things, some of which really should have been fixed after the literal first time anyone ever used it for a conference:

  • As fa
... (read more)
GodoCdF (reply): Hi Rohin, thanks for your feedback. Even if it mentions many bad points and personal opinions, it's highly valuable for improving our product :) Let me reply to each of them.

  1. Inability to cancel a meeting is actually a critical bug that will be fixed in our weekly release tomorrow.
  2. Slot time and duration are defined by the event organiser, as most of the time they prefer to decide these themselves. But we plan to add an option the organiser can enable to "let participants define their own time slot".
  3. You can have only one upcoming meeting with the same participant; once it's past, you can book a new one.
  4. Actually, only the event organiser can edit a meeting location. But that's good feedback that we have added to our backlog.
  5. Having duplicate conversations is not ideal, I agree; we are currently working on a better solution for this.
  6. Good feedback; we will disable the ability to exit the modal by clicking outside if there is a message already written. Will be done this month.
  7. You can manage your availability per slot, so as not to receive meeting requests on specific slots. We do not yet have something automatic linked to session attendance; it's in our backlog, along with the "Calendar view".
  8. Events happening at the same time are grouped below a sticky time header. The Calendar view can improve readability in some specific cases (wide screens only, and not too many overlapping sessions).
  9. The connection feature allows people to get in touch, have a private conversation, keep personal notes and scoring on each contact, exchange contact details, and export them.
  10. This is weird; if you can provide a video, that would be awesome, so we can reproduce and fix it.
  11. Related to number 7.

Thanks all of you for your feedback and support. We work hard to make your event experience the best possible; please continue sending us your thoug

FWIW, I found the Swapcard app to be a net improvement to my EAG experience. I found it easier to schedule meetings than my default approach of Google Sheets + Calendly links + emails. I wonder if part of it is that people seem more responsive on the app than via email?

Not trying to detract from Rohin's experience. Just piping up in case it's helpful. I also ran into a number of the issues that Rohin had, but just sighed and worked around them.

Disclaimer: I work for 80,000 Hours, which is fiscally sponsored by CEA, which runs EA Global.

Aligning Recommender Systems as Cause Area

I think there are a number of concrete changes, like optimizing for the user's deliberative retrospective judgment, developing natural language interfaces, or exposing recommender system internals for researchers to study, which are likely to be hugely positive across most worlds, including ones where there's no "problem" attributable to recommender systems per se.

Some illustrative hypotheticals of how these could go poorly:

  • To optimize for deliberative retrospective judgment, you collect thousands of examples of such judgments, the most that is financially f
... (read more)
General vs specific arguments for the longtermist importance of shaping AI development

it seems worth separating motivation ("why should I care?") and action ("if I do care, what should I do?")

Imagine Alice, an existing AI safety researcher, having such a conversation with Bob, who doesn't currently care about AI safety:

Alice: AGI is decently likely to be built in the next century, and if it is it will have a huge impact on the world, so it's really important to deal with it now.

Bob: Huh, okay. It does seem like it's pretty important to make sure that AGI doesn't discriminate against people of color. And we better make sure that AGI isn't us... (read more)

SamClarke (reply): (Apologies for my very slow reply.) I agree with this. If people become convinced to work on AI stuff by specific argument X, then they should definitely go and try to fix X, not something else (e.g. what other people tell them needs doing in AI safety/governance). I think when I said I wanted a more general argument to be the "default", I meant something very general that doesn't clearly imply any particular intervention - like the one in the most important century series, or the "AI is a big deal" argument (I especially like Max Daniel's version of this). Then, it's very important to think clearly about what will actually go wrong, and how to actually fix that. But I think it's fine to do this once you're already convinced that you should work on AI by some general argument. I'd be really curious if you still disagree with this?
Samuel Shadrach (reply): Thank you for this! Also found more resources on the LessWrong post.
General vs specific arguments for the longtermist importance of shaping AI development

Finally, I personally think that the strongest case that we can currently make for the longtermist importance of shaping AI development is fairly general - something along the lines of the most important century series - and yet this doesn't seem to be the "default" argument (i.e. the one presented in key EA content/fellowships/etc. when discussing AI).

I agree that the general argument is the strongest one, in the sense that it is most likely to be correct / robust.

The problem with general arguments is that they tell you very little about how to solve the ... (read more)

SamClarke (reply): Agreed! I think this is true for some kinds of content/fellowships/etc, but not all. For those targeted at people who aren't already convinced that AI safety/governance should be prioritised (which is probably the majority), it seems more important to present them with the strongest arguments for caring about AI safety/governance in the first place. This suggests presenting more general arguments. Then, I agree that you want to get people to help solve the problem, which requires talking about specific failure modes. But I think that doing this prematurely can lead people to dismiss the case for shaping AI development for bad reasons. Another way of saying this: for AI-related EA content/fellowships/etc, it seems worth separating motivation ("why should I care?") and action ("if I do care, what should I do?"). This would get you the best of both worlds: people are presented with the strongest arguments, allowing them to make an informed decision about how much AI stuff should be prioritised, and then also the chance to start to explore specific ways to solve the problem. I think this maybe applies to longtermism in general. We don't yet have that many great ideas of what to do if longtermism is true, and I think that people sometimes (incorrectly) dismiss longtermism for this reason.
Is it crunch time yet? If so, who can help?

I do not think it is crunch time. I think people in the reference class you're describing should go with some "normal" plan, such as getting into the best AI PhD program you can, learning how to do AI research, and then working on AI safety.

(There are a number of reasons you might do something different. Maybe you think academia is terrible and PhDs don't teach you anything, and so instead you immediately start to work independently on AI safety. That all seems fine. I'm just saying that you shouldn't make a change like this because of a supposed "... (read more)

I do think it is crunch time probably, but I agree with what Rohin said here about what you should do for now (and about my minority status). Skilling up (not just in technical specialist stuff, also in your understanding of the problem we face, the literature, etc.) is what you should be doing. For what I think should be done by the community as a whole, see this comment.

Seeking social science students / collaborators interested in AI existential risks

Planned summary for the Alignment Newsletter:

This post presents a list of research questions around existential risk from AI that can be tackled by social scientists. The author is looking for collaborators to expand the list and tackle some of the questions on it, and is aiming to provide some mentorship for people getting involved.

The motivated reasoning critique of effective altruism

It’s so easy to collapse into the arms of “if there’s even a small chance X will make a very good future more likely …” As with consequentialism, I totally buy the logic of this! The issue is that it’s incredibly easy to hide motivated reasoning in this framework. Figuring out what’s best to do is really hard, and this line of thinking conveniently ends the inquiry (for people who want that).

I have seen something like this happen, so I'm not claiming it doesn't, but it feels pretty confusing to me. The logic pretty clearly doesn't hold up. Even if you acce... (read more)

Yeah, I'm surprised by this as well. Both classical utilitarianism (in the extreme version, "everything that is not morally obligatory is forbidden") and longtermism just seem to have many fewer degrees of freedom than other commonly espoused ethical systems, so it would naively be surprising if these worldviews could justify a broader range of actions than close alternatives.

The motivated reasoning critique of effective altruism

Yeah, I agree that would also count (and as you might expect I also agree that it seems quite hard to do).

Basically with (b) I want to get at "the model does something above and beyond what we already had with verbal arguments"; if it substantially affects the beliefs of people most familiar with the field that seems like it meets that criterion.

The motivated reasoning critique of effective altruism

The obvious response here is that I don't think longtermist questions are more amenable to explicit quantitative modeling than global poverty, but I'm even more suspicious of other methodologies here.

Yeah, I'm just way, way more suspicious of quantitative modeling relative to other methodologies for most longtermist questions.

I think we might just be arguing about different things here?

Makes sense, I'm happy to ignore those sorts of methods for the purposes of this discussion.

Medicine is less amenable to empirical testing than physics, but that doesn't mea

... (read more)
The motivated reasoning critique of effective altruism

Replied to Linch -- TL;DR: I agree this is true compared to global poverty or animal welfare, and I would defend this as simply the correct way to respond to actual differences in the questions asked in longtermism vs. those asked in global poverty or animal welfare.

You could move me by building an explicit quantitative model for a popular question of interest in longtermism that (a) didn't previously have models (so e.g. patient philanthropy or AI racing doesn't count), (b) has an upshot that we didn't previously know via verbal arguments, (c) doesn't involve subjective personal guesses or averages thereof for important parameters, and (d) I couldn't immediately tear a ton of holes in that would call the upshot into question.

MichaelStJules (reply): I feel that (b) identifying a new upshot shouldn't be necessary; I think it should be enough to build a model with reasonably well-grounded parameters (or well-grounded ranges for them) in a way that substantially affects the beliefs of those most familiar with or working in the area (and maybe enough to change minds about what to work on, within AI, towards AI, or away from AI). E.g., more explicitly weighing the risks of accelerating AI through (some forms of) technical research against actually making it safer, better-grounded weights on catastrophe from AI, or a better-grounded model of the marginal impact of work. Maybe this isn't a realistic goal with currently available information.
The motivated reasoning critique of effective altruism

My guess is that longtermist EAs (like almost all humans) have never been that close to purely quantitative models guiding decisions

I agree with the literal meaning of that, because it is generally a terrible idea to just do what a purely quantitative model tells you (and I'll note that even GiveWell isn't doing this). But imagining the spirit of what you meant, I suspect I disagree.

I don't think you should collapse it into the single dimension of "how much do you use quantitative models in your decisions". It also matters how amenable the decisions are t... (read more)

Linch (reply): Thanks so much for the response! Upvoted. (I'm exaggerating my views here to highlight the differences; I think my all-things-considered opinion on these positions is much closer to yours than the rest of the comment will make it sound.) I think my strongest disagreement with your comment is the framing here. If we peel away the sarcasm, I think the implicit framing is that:

  1. If X is less amenable than Y to method A of obtaining truth, and X is equally or more amenable to methods B, C, and D relative to Y, we should use method A less to obtain truth in X (relative to Y), and methods B, C, and D more.
  2. X is less amenable than Y to method A of obtaining truth.
  3. Thus, we should use method A less in X than in Y.

Unless I'm missing something, I think this is logically invalid. The obvious response here is that I don't think longtermist questions are more amenable to explicit quantitative modeling than global poverty, but I'm even more suspicious of other methodologies here. Medicine is less amenable to empirical testing than physics, but that doesn't mean that clinical intuition is a better source of truth for the outcomes of drugs than RCTs. (But medicine is relatively much less amenable to theorems than physics, so it's correct to use fewer proofs in medicine than physics.) More minor gripes: I think I'm willing to bite the bullet and say that GiveWell (or at least my impression of them from a few years back) should be more rigorous in their modeling. E.g., it's weird to use the median staff member's views as a proxy for truth, weird to have so few well-specified forecasts, and so forth. I think we might just be arguing about different things here? Like to me, these seem more like verbal arguments of questionable ver
MichaelStJules (reply): Hmm, I guess I hadn't read that post in full detail (or I did and forgot about the details), even though I was aware of it. I think the argument there that mortality will roughly match some time after the transition is pretty solid (based on two datasets and expert opinion). I think there was still a question of whether or not the "short-term" increase in mortality outweighs the reduction in behavioural deprivation, especially since it wasn't clear how long the transition period would be. This is a weaker claim than my original one, though, so I'll retract my original claim. FWIW, although this is a completely different claim, bone fracture is only discussed in that post as a potential cause of increased mortality in cage-free systems, but not as a source of additional pain regardless of mortality that could mean cage-free is worse and would remain worse. The post was primarily focused on mortality and behavioural deprivation/opportunities. Fractures have since been weighted explicitly.
The motivated reasoning critique of effective altruism

Overall great post, and I broadly agree with the thesis. (I'm not sure the evidence you present is all that strong though, since it too is subject to a lot of selection bias.) One nitpick:

Most of the posts’ comments were critical, but they didn’t positively argue against EV calculations being bad for longtermism. Instead they completely disputed that EV calculations were used in longtermism at all!

I think you're (unintentionally) running a motte-and-bailey here.

Motte: Longtermists don't think you should build explicit quantitative models, take their best g... (read more)

Linch (reply): Oh, I absolutely agree. I generally think the more theoretical sections of my post are stronger than the empirical sections. I think the correct update from my post is something like "there is strong evidence of nonzero motivated reasoning in effective altruism, and some probability that motivated reasoning + selection-bias-mediated issues are common in our community", but not enough evidence to say more than that. I think a principled follow-up work (maybe by CEA's new epistemics project manager?) would look like combing through all (or a statistically representative sample of) impact assessments and/or arguments made in EA, and trying to catalogue them for motivated reasoning and other biases. I think this is complicated. It's certainly possible I'm fighting against strawmen! But I will just say what I think/believe right now, and others are free to correct me. I think among committed longtermists, there is a spectrum of trust in explicit modeling, going from my stereotype of weeatquince (2020)'s views to maybe 50% (30%?) of the converse of what you call the "motte" (maybe Michael Dickens (2016) is closest?). My guess is that longtermist EAs (like almost all humans) have never been that close to purely quantitative models guiding decisions, and we've moved closer in the last 5 years to reference classes of fields like the ones that weeatquince's post pulls from. I also think I agree with MichaelStJules' point about the amount of explicit modeling that actually happens relative to effort given to other considerations. "Real" values are determined not by what you talk about, but by what tradeoffs you actually make.
MichaelStJules (reply): I'm not defending what you think is a bailey, but as a practical matter, I would say that until recently (with Open Phil publishing a few models for AI), longtermists have not been using numbers or models much, or when they do, some of the most important parameters are extremely subjective personal guesses or averages of people's guesses, not based on reference classes, and risks of backfire were not included.
How to succeed as an early-stage researcher: the “lean startup” approach

In that example, Alice has ~5 min of time to give feedback to Bob; in Toby's case the senior researchers are (in aggregate) spending at least multiple hours providing feedback (where "Bob spent 15 min talking to Alice and seeing what she got excited about" counts as 15 min of feedback from Alice). That's the major difference.

I guess one way you could interpret Toby's advice is to simply get a project idea from a senior person, and then go work on it yourself without feedback from that senior person -- I would disagree with that particular advice. I think it's important to have iterative / continual feedback from senior people.

How to succeed as an early-stage researcher: the “lean startup” approach

I agree substituting the question would be bad, and sometimes there aren't any relevant experts in which case you shouldn't defer to people. (Though even then I'd consider doing research in an unrelated area for a couple of years, and then coming back to work on the question of interest.)

I admit I don't really understand how people manage to have a "driving question" overwritten -- I can't really imagine that happening to me and I am confused about how it happens to other people.

(I think sometimes it is justified, e.g. you realize that your question was co... (read more)

How to succeed as an early-stage researcher: the “lean startup” approach

so it's e.g. the mesa-optimizers paper or multiple LW posts by John Wentworth. As far as I can tell, none of these seems to be following the proposed 'formula for successful early-career research'.

I think the mesa optimizers paper fits the formula pretty well? My understanding is that the junior authors on that paper interacted a lot with researchers at MIRI (and elsewhere) while writing that paper.

I don't know John Wentworth's history. I think it's plausible that if I did, I wouldn't have thought of him as a junior researcher (even before seei... (read more)

How to succeed as an early-stage researcher: the “lean startup” approach

My impression from talking to friends working in ML is that usually faculty have ideas that they'd be excited to see their senior grad students work on, senior grad students have research ideas that they'd love for junior grad students to implement, and so forth.

I think this is true if the senior person can supervise the junior person doing the implementation (which is time-expensive). I have lots of project ideas that I expect I could supervise. I have ~no project ideas where I expect I could spend an hour talking to someone, have them go off for... (read more)

How to succeed as an early-stage researcher: the “lean startup” approach

I'm considering three types of advice:

  1. "Always defer to experts"
  2. "Defer to experts for ~3 years, then trust your intuitions"
  3. "Always trust your intuitions"

When you said

But to steelman (steel-alien?) his view a little, I worry that EA is overinvested in outside-view/forecasting types (like myself?), rather than people with strong and true convictions / extremely high-quality initial research taste, which (quality-weighted) may be making up the majority of revolutionary progress.

And if we tell the future Geoffrey Hintons (and Eliezer Yudkowskys) of the

... (read more)
Jan_Kulveit (reply): I would guess the 'typical young researcher fallacy' also applies to Hinton - my impression is he is basically advising his past self, similarly to Toby. As a consequence, the advice is likely sensible for people-much-like-past-Hinton, but not good general advice for everyone. In ~3 years most people are able to re-train their intuitions a lot (which is part of the point!). This seems particularly dangerous in cases where expertise in the thing you are actually interested in does not exist, but expertise in something somewhat close does - instead of following your curiosity, you 'substitute the question' with a different question, for which a PhD program exists, or senior researchers exist, or established directions exist. If your initial taste/questions were better than the expert's, you run the risk of overwriting your taste with something less interesting/impactful. Anecdotal illustrative story: arguably, a large part of what are now the foundations of quantum information theory / quantum computing could have been discovered much sooner, together with taking more sensible interpretations of quantum mechanics than the Copenhagen interpretation seriously. My guess is that what was happening during multiple decades (!) was that many early-career researchers were curious what's going on, dissatisfied with the answers, interested in thinking about the topic more... but they were given advice along the lines of 'this is not a good topic for PhDs or even undergrads; don't trust your intuition; problems here are a distasteful mix of physics and philosophy; shut up and calculate, that's how real progress happens'... and they followed it; they acquired a taste telling them that solving difficult scattering amplitude integrals using advanced calculus techniques is tasty, and thinking about deep things formulated using the tools of high-school algebra is for fools.
(Also, if you did run a survey in year 4 of their PhDs, a large fraction of quantum physicists would probably have endorsed the learned update
How to succeed as an early-stage researcher: the “lean startup” approach

What % do you think this is true for, quality-weighted? 

Weighted by quality after graduating? Still > 50%, probably > 80%, but it's really just a lot harder to tell (I don't have enough data). I'd guess that the best people still had "bad ideas" when they were starting out.

(I think a lot of what makes a junior researcher's idea "bad" is that the researcher doesn't know about existing work, or has misinterpreted the goal of the field, or lacks intuitions gained from hands-on experience, etc. It is really hard to compensate for a lack of knowledg... (read more)

Lukas_Finnveden (reply): I'm confused about your FAQ's advice here. Some quotes from the longer example: In the example, Bob "wants to get into the field", so this seems like an example of how junior people shouldn't defer to experts when picking research projects. (Speculative differences: maybe you think there's a huge difference between Alice giving a recommendation about an area vs a specific research project? Or maybe you think that working on impact regularization is the best Bob can do if he can't find a senior researcher to supervise him, but if Alice could supervise his work on robustness he should go with robustness? If so, maybe it's worth clarifying that in the FAQ.) Edit: TBC, I interpret Toby Shevlane as saying ~you should probably work on whatever senior people find interesting; while Jan Kulveit says that "some young researchers actually have great ideas, should work on them, and avoid generally updating on the research taste of most of the 'senior researchers'". The quoted FAQ example is consistent with going against Jan's strong claim, but I'm not sure it's consistent with agreeing with Toby's initial advice, and I interpret you as agreeing with that advice when writing e.g. "Defer to experts for ~3 years, then trust your intuitions".
Linch (reply): (As an aside, I read your FAQ and enjoyed it, so thanks for the share!)

Thanks for the link to your FAQ, I'm excited to read it further now!

Re: the rest of your comment, I think you're reading more into my comment than I said or meant. I do not think researchers should generally be deferential; I think they should have strong beliefs that may in fact go against expert consensus. I just don't think this is the right attitude while you are junior.

To be clear, I think Geoffrey Hinton's advice was targeted at very junior people. In context, the interview was conducted for Andrew Ng's online deep learning course, which for many peo... (read more)

How to succeed as an early-stage researcher: the “lean startup” approach

I'm not going to go into much detail here, but I disagree with all of these caveats. I think this would be a worse post if it included the first and third caveats (less sure about the second).

First caveat: I think > 95% of incoming PhD students in AI at Berkeley have bad ideas (in the way this post uses the phrase). I predict that if you did a survey of people who have finished their PhD in AI at Berkeley, over 80% of them would think their initial ideas were significantly worse than their later ideas. (Note also that AI @ Berkeley is a very selective p... (read more)

Let's start with the third caveat: maybe the real crux is what we think are the best outputs; what I consider some of the best outputs by young researchers of AI alignment is easier to point at via examples - so it's e.g. the mesa-optimizers paper or multiple LW posts by John Wentworth. As far as I can tell, none of these seems to be following the proposed 'formula for successful early-career research'.

My impression is that PhD students in AI at Berkeley need to optimise, and actually do optimise a lot, for success in an established field (ML/AI),... (read more)

I think > 95% of incoming PhD students in AI at Berkeley have bad ideas (in the way this post uses the phrase).[...](Note also that AI @ Berkeley is a very selective program.)

What % do you think this is true for, quality-weighted? 

I remember an interview with Geoffrey Hinton where (paraphrased) Hinton was basically like "just trust your intuitions man. Either your intuitions are good or they're bad. If they are good you should mostly trust your intuitions regardless of what other people say, and if they're bad, well, you aren't going to be a good r... (read more)

rohinmshah's Shortform

I'm not objecting to providing the information (I think that is good), I'm objecting to calling it a "conflict of interest".

I'd be much more keen on something like this (source):

For transparency, note that the reports for the latter three rows are all Open Philanthropy analyses, and I am co-CEO of Open Philanthropy.

rohinmshah's Shortform

I sometimes see people arguing for people to work in area A, and declaring a conflict of interest that they are personally working on area A.

If they already were working in area A for unrelated reasons, and then they produced these arguments, it seems reasonable to be worried about motivated reasoning.

On the other hand, if because of these arguments they switched to working in area A, this is in some sense a signal of sincerity ("I'm putting my career where my mouth is").

I don't like the norm of declaring your career as a "conflict of interest", because it... (read more)

Ramiro (reply): I share your feeling towards it... but I also often say that one's "skin in the game" (your latter example) is someone else's "conflict of interest". I don't think that the listener/reader is usually in a good position to distinguish between your first and your second example; that's enough to justify the practice of disclosing this as a potential "conflict of interest". In addition, by knowing you already work for cause X, I might consider whether your case is affected by some kind of cognitive bias.
Phil Torres' article: "The Dangerous Ideas of 'Longtermism' and 'Existential Risk'"

He asserts that "numerous people have come forward, both publicly and privately, over the past few years with stories of being intimidated, silenced, or 'canceled.'"  This doesn't match my experience.

I also have not had this experience, though that doesn't mean it didn't happen, and I'd want to take this seriously if it did happen.

However, Phil Torres has demonstrated that he isn't above bending the truth in service of his goals, so I'm inclined not to believe him. See previous discussion here. Example from the new article:

It’s not difficult to see ho

... (read more)

Many thanks for this, Rohin. Indeed, your understanding is correct. Here is my own screenshot of my private announcement on this matter.

This is far from the first time that Phil Torres references my work in a way that is set up to give the misleading impression that I share his anti-longtermism view. He and I had extensive communication about this in 2020, but he showed no sympathy for my complaints. 

What is the role of public discussion for hits-based Open Philanthropy causes?

This is my best attempt at summarizing a reasonable outsider's view of the current state of affairs. Before publication, I had this sanity checked (though not necessarily endorsed) by an EA researcher with more context. Apologies in advance if it misrepresents the actual state of affairs, but that's precisely the thing I'm trying to clarify for myself and others.

I just want to note that I think this question is great and does not misrepresent the actual state of affairs.

I do think there's hope for some quantitative estimates even in the speculative cases; ... (read more)

Denkenberger (4mo, 4 points): Ajeya explains it in her 80k interview and the result is: "this estimate is roughly $200 trillion per world saved, in expectation. So, it’s actually like billions of dollars for some small fraction of the world saved, and dividing that out gets you to $200 trillion per world saved. This is quite good in the scheme of things, because it’s like less than two years’ worth of gross world product. It’s like everyone in the world working together on this one problem for like 18 months, to save the world."
Aligning Recommender Systems as Cause Area

Unfortunately I don't really have the time to do this well, and I think it would be a pretty bad post if I wrote the version that would be ~2 hours of effort or less.

The next Alignment Newsletter will include two articles on recommender systems that mostly disagree with the "recommender systems are driving polarization" position; you might be interested in those. (In fact, I did this shallow dive because I wanted to make sure I wasn't neglecting arguments pointing in the opposite direction.)

EDIT: To be clear, I'd be excited for someone else to develop this... (read more)

Aligning Recommender Systems as Cause Area

The result is software that is extremely addictive, with a host of hard-to-measure side effects on users and society including harm to relationships, reduced cognitive capacity, and political radicalization.

As far as I can tell, this is all the evidence given in this post that there is in fact a problem. Two of the four links are news articles, which I ignore on the principle that news articles are roughly uncorrelated with the truth. (On radicalization I've seen specific arguments arguing against the claim.) One seems to be a paper studying what users bel... (read more)

IvanVendrov (1mo, 3 points): Thanks for pointing out that the evidence for specific problems with recommender systems is quite weak and speculative; I've come around to this view in the last year, and in retrospect I should have labelled my uncertainty here better and featured it less prominently in the article since it's not really a crux of the cause prioritization analysis, as you noticed. Will update the post with this in mind. This is closer to a crux. I think there are a number of concrete changes like optimizing for the user's deliberative retrospective judgment, developing natural language interfaces or exposing recommender systems internals for researchers to study, which are likely to be hugely positive across most worlds including ones where there's no "problem" attributable to recommender systems per se. Positive both in direct effects and in flow-through effects in learning what kinds of human-AI interaction protocols lead to good outcomes. From your Alignment Forum comment, this seems like the real crux. I'm not sure how exactly you define "deliberately and intentionally" but recommenders trained with RL (a small, but increasing fraction) are definitely capable of generating and executing complex novel sequences of actions towards an objective. Moreover they are deployed in a dynamic world and so encounter new situations habitually (unlike the toy environments more commonly used for AI Alignment research).
Pablo (4mo, 6 points): Have you considered developing these comments into a proper EA Forum post?
Case studies of self-governance to reduce technology risk

Planned summary for the Alignment Newsletter:

Should we expect AI companies to reduce risk through self-governance? This post investigates six historical cases, of which the two most successful were the Asilomar conference on recombinant DNA, and the actions of Leo Szilard and other physicists in 1939 (around the development of the atomic bomb). It is hard to make any confident conclusions, but the author identifies the following five factors that make self-governance more likely:

1. The risks are salient.
2. If self-governance doesn’t happen, then the govern

... (read more)
Can money buy happiness? A review of new data

Nice find, thanks!

(For others: note that the linked blog post also considers things like "maybe they just uploaded the wrong data" to be a plausible explanation.)

Getting started independently in AI Safety

 you can attempt a deep RL project, realise you are hopelessly out of your depth, then you know you'd better go through Spinning Up in Deep RL before you can continue. 

Tbc, I do generally like the idea of just in time learning. But:

  • You may not realize when you are hopelessly out of your depth ("doesn't everyone say that ML is an art where you just turn knobs until things work?" or "how was I supposed to know that the algorithm was going to silently clip my rewards, making all of my reward shaping useless?")
  • You may not know what you don't know. In
... (read more)
NicholasKross (4mo, 3 points): I feel like I'm on both sides of this, so I'll take the course and then immediately jump into whatever seems interesting in PyTorch.
Getting started independently in AI Safety

I think too many people feel held back from doing a project like thing on their own.

Absolutely. Also, too many people don't feel held back enough (e.g. maybe it really would have been beneficial to, say, go through Spinning Up in Deep RL before attempting a deep RL project). How do you tell which group you're in? 

(This comment inspired by Reversing Advice)

rory_greig (5mo, 7 points): This is a good point, although I suppose you could still think of this in the framing of "just in time learning", i.e. you can attempt a deep RL project, realise you are hopelessly out of your depth, then you know you'd better go through Spinning Up in Deep RL before you can continue. Although the risk is that it may be demoralising to start something which is too far outside of your comfort zone.
JJ Hepburn (5mo, 4 points): Yep, always tricky here. I was actually just reading Reversing Advice just before posting this but wasn't sure how I should manage this. Advice is like medication. It should come with similar rules, regulations, restrictions and warnings. Some advice is over the counter and can be used by almost everyone. Advice should be used in moderation; do not take more than the recommended dose. Prescription medicine is illegal to advertise for (in Australia) because it is not useful for everyone and should only be recommended by a health care professional. Some advice does not mix well with other advice, and care should be taken when mixing advice. Do not take advice that has been recommended to someone else, as it may not apply to you. A particular problem may have several different pieces of advice that are helpful for it, but each does not work for everyone, so you may need to try a few before you find the one that works for you. Having said that, I think I would default to aiming for the higher thing when you are not sure. If you aim high you may fall short, and if you aim low you can still only fall short. So if you're on the margin, start with a deep RL project. You might quickly find that it's hard to do and fall back to doing Spinning Up. If symptoms persist, please consult your health care professional.
Can money buy happiness? A review of new data

If we change the y-axis to display a linear relationship, this tells a different story. In fact, we see a plateauing of the relationship between income and experience wellbeing, just as found in Kahneman and Deaton (2010), but just at a later point — about $200,000 per year.

Uhh... that shouldn't happen from just re-plotting the same data. In fact, how is it that in the original graph, there is an increase from $400,000 to $620,000, but in the new linear axis graph, there is a decrease?
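To make the point concrete, here is a minimal sketch with made-up numbers (the incomes and wellbeing values below are hypothetical, not the paper's data): switching the income axis from logarithmic to linear only re-spaces the points horizontally; the wellbeing value at each income is untouched, so a rise between two incomes cannot become a fall.

```python
import math

# Hypothetical data for illustration only -- not the study's actual values.
incomes = [15_000, 30_000, 60_000, 120_000, 240_000, 400_000, 620_000]
wellbeing = [-0.2, -0.1, 0.0, 0.1, 0.2, 0.25, 0.30]

# A "log plot" and a "linear plot" of the same data differ only in the
# x coordinate used for each point; the y values are identical.
log_points = [(math.log10(x), y) for x, y in zip(incomes, wellbeing)]
lin_points = [(x, y) for x, y in zip(incomes, wellbeing)]

def rises(points, i, j):
    """Whether wellbeing increases between the i-th and j-th income."""
    return points[j][1] > points[i][1]

# The direction of change between $400k and $620k is the same on both axes.
assert rises(log_points, -2, -1) == rises(lin_points, -2, -1)
```

So if the replotted graph shows a decrease where the original showed an increase, the underlying numbers must differ, not just the axis scale.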

A doubling of income is associated with about a 1-point increase on a 0–

... (read more)
AppliedDivinityStudies (5mo, 7 points): Rohin, I thought this was super weird too. Did a bit more digging and found this blog post. The author (who is an academic) agrees this is a bit weird, and notes "small-n noisiness at high incomes". So overall, I see the result as plausible but not super robust. Though note that in alignment with Kahneman/Deaton, Life Satisfaction does continue to increase even as Experienced Wellbeing dips.
julianhazell (5mo, 2 points): One thing I would like to add is that I think it is plausible that the results might not be even close to the same if Killingsworth's study contained responses from folks living in low-income countries. For example, I wouldn't be surprised if money actually has a much stronger effect on happiness for people earning $500 ~ per year, as things like medicine, food, shelter, sanitation, etc probably bring significantly more happiness than the kinds of things bought by people that earn $400,000+ per year. Also, even if it does make a small difference (which I find hard to believe at such a low income), you can double, triple, quadruple, etc the income of 100 people earning $500 per year for a lower cost than doubling the income of one person earning $200,000. Since we can't directly prove this from Killingsworth's study — which the blogpost was primarily about — the assumption was that the results would be the same for low-income earners.
MichaelPlant (5mo, 7 points): So, there was a discrepancy between the data provided for the paper and the graph in the paper itself. The graph plotted above used the data provided. I'm not sure what else to say without contacting the journal itself. I don't follow this. The claim is that money makes less of a difference than one might expect, not that it makes no difference. Obviously, there are reasons for and against working at, say, Goldman Sachs besides the salary. It does follow that, if your receiving money makes less of a difference than you would expect, then you giving it to other people, and them receiving it, will also make a smaller-than-anticipated difference. But, of course, you could do something else with your money that could be more effective than giving it away as cash - bednets, deworming, therapy, etc.
Ben Garfinkel's Shortform

I agree with this general point. I'm not sure if you think this is an interesting point to notice that's useful for building a world-model, and/or a reason to be skeptical of technical alignment work. I'd agree with the former but disagree with the latter.

Ben Garfinkel (5mo, 2 points): Mostly the former! I think the point may have implications for how much we should prioritize alignment research, relative to other kinds of work, but this depends on what the previous version of someone's world model was. For example, if someone has assumed that solving the 'alignment problem' is close to sufficient to ensure that humanity has "control" of its future, then absorbing this point (if it's correct) might cause them to update downward on the expected impact of technical alignment research. Research focused on coordination-related issues (e.g. cooperative AI stuff) might increase in value, at least in relative terms.
High Impact Careers in Formal Verification: Artificial Intelligence

Planned summary for the Alignment Newsletter:

This post considers the applicability of formal verification techniques to AI alignment. Now in order to “verify” a property, you need a specification of that property against which to verify. The author considers three possibilities:

1. **Formally specifiable safety:** we can write down a specification for safe AI, _and_ we’ll be able to find a computational description or implementation

2. **Informally specifiable safety:** we can write down a specification for safe AI mathematically or philosophically, but we w

... (read more)
Progress studies vs. longtermist EA: some differences

I just think there's a much greater chance that we look back on it and realize, too late, that we were focused on entirely the wrong things.

If you mean like 10x greater chance, I think that's plausible (though larger than I would say). If you mean 1000x greater chance, that doesn't seem defensible.

In both fields you basically ~can't experiment with the actual thing you care about (you can't just build a superintelligent AI and check whether it is aligned; you mostly can't run an intervention on the entire world  and check whether world GDP went up). Y... (read more)

Help me find the crux between EA/XR and Progress Studies

I've been perceiving a lot of EA/XR folks to be in (3) but maybe you're saying they're more in (2)?


Maybe it turns out that most folks in each community are between (1) and (2) toward the other. That is, we're just disagreeing on relative priority and neglectedness.

That's what I would say.

I can't see it as literally the only thing worth spending any marginal resources on (which is where some XR folks have landed).

If you have opportunity A where you get a benefit of 200 per $ invested, and opportunity B where you get a benefit of 50 per $ invested, you w... (read more)
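The comparison being gestured at can be sketched with toy numbers (all values hypothetical): a marginal dollar goes wherever the marginal benefit is highest, which is a claim about prioritization, not a claim that the other opportunity is worthless.

```python
# Toy marginal cost-effectiveness comparison (illustrative numbers only).
def best_opportunity(benefit_per_dollar):
    """Return the opportunity with the highest marginal benefit per dollar."""
    return max(benefit_per_dollar, key=benefit_per_dollar.get)

# With A at 200 units of benefit per dollar and B at 50, every marginal
# dollar goes to A -- even though B is still clearly worth something.
assert best_opportunity({"A": 200, "B": 50}) == "A"

# If A's marginal returns later diminish below B's (say to 40 per dollar),
# the marginal dollar switches to B.
assert best_opportunity({"A": 40, "B": 50}) == "B"
```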

Help me find the crux between EA/XR and Progress Studies

I kinda sorta answered Q2 above (I don't really have anything to add to it).

Q3: I'm not too clear on this myself. I'm just an object-level AI alignment researcher :P

Q4: I broadly agree this is a problem, though I think this:

Before PS and EA/XR even resolve our debate, the car might be run off the road—either as an accident caused by fighting groups, or on purpose.

seems pretty unlikely to me, where I'm interpreting it as "civilization stops making any progress and regresses to the lower quality of life from the past, and this is a permanent effect". 

I ... (read more)

Help me find the crux between EA/XR and Progress Studies

If XR weighs so strongly (1e15 future lives!) that you are, in practice, willing to accept any cost (no matter how large) in order to reduce it by any expected amount (no matter how small), then you are at risk of a Pascal's Mugging.

Sure. I think most longtermists wouldn't endorse this (though a small minority probably would).

But when the proposal becomes: “we should not actually study progress or try to accelerate it”, I get lost.

I don't think this is negative, I think there are better opportunities to affect the future (along the lines of Ben's comment).... (read more)

OK, so maybe there are a few potential attitudes towards progress studies:

  1. It's definitely good and we should put resources to it
  2. Eh, it's fine but not really important and I'm not interested in it
  3. It is actively harming the world by increasing x-risk, and we should stop it

I've been perceiving a lot of EA/XR folks to be in (3) but maybe you're saying they're more in (2)?

Flipping it around, PS folks could have a similar (1) positive / (2) neutral / (3) negative attitude towards XR efforts. My view is not settled, but right now I'm somewhere between (1) and (2)... (read more)

Progress studies vs. longtermist EA: some differences

But EA/XR folks don't seem to be primarily advocating for specific safety measures. Instead, what I hear (or think I'm hearing) is a kind of generalized fear of progress. Again, that's where I get lost. I think that (1) progress is too obviously valuable and (2) our ability to actually predict and control future risks is too low.

I think there's a fear of progress in specific areas (e.g. AGI and certain kinds of bio) but not a general one? At least I'm in favor of progress generally and against progress in some specific areas where we have good object-level... (read more)

jasoncrawford (6mo, 4 points): That's interesting, because I think it's much more obvious that we could successfully, say, accelerate GDP growth by 1-2 points per year, than it is that we could successfully, say, stop an AI catastrophe. The former is something we have tons of experience with: there's history, data, economic theory… and we can experiment and iterate. The latter is something almost completely in the future, where we don't get any chances to get it wrong and course-correct. (Again, this is not to say that I'm opposed to AI safety work: I basically think it's a good thing, or at least it can be if pursued intelligently. I just think there's a much greater chance that we look back on it and realize, too late, that we were focused on entirely the wrong things.)
Help me find the crux between EA/XR and Progress Studies

If you're willing to accept GCR in order to slightly reduce XR, then OK—but it feels to me that you've fallen for a Pascal's Mugging.

Eliezer has specifically said that he doesn't accept Pascal's Mugging arguments in the x-risk context

I wouldn't agree that this is a Pascal's Mugging. In fact, in a comment on the post you quote, Eliezer says:

If an asteroid were genuinely en route, large enough to wipe out humanity, possibly stoppable, and nobody was doing anything about this 10% probability, I would still be working on FAI but I would be screaming pretty lou

... (read more)
jasoncrawford (6mo, 4 points): As to whether my four questions are cruxy or not, that's not the point! I wasn't claiming they are all cruxes. I just meant that I'm trying to understand the crux, and these are questions I have. So, I would appreciate answers to any/all of them, in order to help my understanding. Thanks!
jasoncrawford (6mo, 3 points): I'm not making a claim about how effective our efforts can be. I'm asking a more abstract, methodological question about how we weigh costs and benefits. If XR weighs so strongly (1e15 future lives!) that you are, in practice, willing to accept any cost (no matter how large) in order to reduce it by any expected amount (no matter how small), then you are at risk of a Pascal's Mugging. If not, then great—we agree that we can and should weigh costs and benefits. Then it just comes down to our estimates of those things. And so then I just want to know, OK, what's the plan? Maybe the best way to find the crux here is to dive into the specifics of what PS and EA/XR each propose to do going forward. E.g.:

  • We should invest resources in AI safety? OK, I'm good with that. (I'm a little unclear on what we can actually do there that will help at this early stage, but that's because I haven't studied it in depth, and at this point I'm at least willing to believe that there are valuable programs there. So, thumbs up.)
  • We should raise our level of biosafety at labs around the world? Yes, absolutely. I'm in. Let's do it.
  • We should accelerate moral/social progress? Sure, we absolutely need that—how would we actually do it? See question 3 above.

But when the proposal becomes: “we should not actually study progress or try to accelerate it”, I get lost. Failing to maintain and accelerate progress, in my mind, is a global catastrophic risk, if not an existential one. And it's unclear to me whether this would even increase or decrease XR, let alone the amount—in any case I think there are very wide error bars on that estimate. But maybe that's not actually the proposal from any serious EA/XR folks? I am still unclear on this.
Final Report of the National Security Commission on Artificial Intelligence (NSCAI, 2021)

A lot of longtermists do pay attention to this sort of stuff, they just tend not to post on the EA Forum / LessWrong. I personally heard about the report from many different people after it was published, and also from a couple of people even before it was published (when there was a chance to provide input on it).

In general I expect that for any sufficiently large object-level thing, the discourse on the EA Forum will lag pretty far behind the discourse of people actively working on that thing (whether that discourse is public or not).  I read the EA... (read more)

Yeah, that makes sense. 

It still seems to me like this is a sufficiently important and interesting report that it'd be better if there was a little more mention of it on the Forum, for the sake of "the general longtermist public", since (a) the Forum seems arguably the main, central hub for EA discourse in general, and (b) there is a bunch of other AI governance type stuff here, so having that without things like this report could give a distorted picture. 

But it also doesn't seem like a horrible or shocking error has been committed. And it does make sense that these things would be first, and mostly, discussed in more specialised sub-communities and venues.

Draft report on existential risk from power-seeking AI

If AGI doom were likely, what additional evidence would we expect to see?

  1. Humans are pursuing convergent instrumental subgoals much more. (Related question: will AGIs want to take over the world?)
    1. A lot more anti-aging research is going on.
    2. Children's inheritances are ~always conditional on the child following some sort of rule imposed by the parent, intended to further the parent's goals after their death.
    3. Holidays and vacations are rare; when they are taken it is explicitly a form of rejuvenation before getting back to earning tons of money.
    4. Humans look like
... (read more)
Draft report on existential risk from power-seeking AI

I think that at least 80% of the AI safety researchers at MIRI, FHI, CHAI, OpenAI, and DeepMind would currently assign a >10% probability to this claim: "The research community will fail to solve one or more technical AI safety problems, and as a consequence there will be a permanent and drastic reduction in the amount of value in our future."

If you're still making this claim now, want to bet on it? (We'd first have to operationalize who counts as an "AI safety researcher".)

I also think it wasn't true in Sep 2017, but I'm less confident about that, and it's not as easy to bet on.

(Am e-mailing with Rohin, will report back e.g. if we check this with a survey.)
