It seems like most of the work is being done here:
If you think that AI won’t be smarter than humans but agree that we cannot perfectly control AI in the same way that we cannot perfectly control humans
If I were adopting my skeptic-hat, I don't think I would buy that assumption. (Or like, sure, we can't perfectly control AI, but your argument assumes that we are at least as unable to control AI as we are unable to control humans, which I wouldn't buy.) AI systems are programs; programs are (kind of) determined entirely by their source code, which we p... (read more)
I agree with that, and that's what I meant by this statement above:
Note that general arguments can motivate you to learn more about the problem to develop more specific arguments, which you can then solve.
Thanks, I'd be glad to see these fixed! I don't remember where exactly (10) happened unfortunately.
(Not sure where to provide feedback on the conference, writing here because it mentions SwapCard, and the stuff I say here probably affects virtual attendees too, even though I'm an in-person attendee)
EDIT: Other people reporting many of these, including possibly a huge vulnerability where you can read anyone's "private" messages if they are part of an organization
I really dislike SwapCard. It has some ridiculously stupid and annoying things, some of which really should have been fixed after the literal first time anyone ever used it for a conference:
FWIW, I found the Swapcard app to be a net improvement to my EAG experience. I found it easier to schedule meetings than my default approach of Google Sheets + Calendly links + emails. I wonder if part of it is that people seem more responsive on the app than via email? Not trying to detract from Rohin's experience. Just piping up in case it's helpful. I also ran into a number of the issues that Rohin had, but just sighed and worked around them. Disclaimer: I work for 80,000 Hours, which is fiscally sponsored by CEA, which runs EA Global.
I think there are a number of concrete changes, like optimizing for the user’s deliberative retrospective judgment, developing natural language interfaces, or exposing recommender system internals for researchers to study, which are likely to be hugely positive across most worlds, including ones where there's no "problem" attributable to recommender systems per se.
Some illustrative hypotheticals of how these could go poorly:
it seems worth separating motivation ("why should I care?") and action ("if I do care, what should I do?")
Imagine Alice, an existing AI safety researcher, having such a conversation with Bob, who doesn't currently care about AI safety:
Alice: AGI is decently likely to be built in the next century, and if it is it will have a huge impact on the world, so it's really important to deal with it now.
Bob: Huh, okay. It does seem like it's pretty important to make sure that AGI doesn't discriminate against people of color. And we better make sure that AGI isn't us... (read more)
Related: Regulatory Markets for AI Safety
Another resource: https://ai-alignment.com/sympathizing-with-ai-e11a4bf5ef6e
Finally, I personally think that the strongest case that we can currently make for the longtermist importance of shaping AI development is fairly general - something along the lines of the most important century series - and yet this doesn't seem to be the "default" argument (i.e. the one presented in key EA content/fellowships/etc. when discussing AI).
I agree that the general argument is the strongest one, in the sense that it is most likely to be correct / robust.
The problem with general arguments is that they tell you very little about how to solve the ... (read more)
I do not think it is crunch time. I think people in the reference class you're describing should go with some "normal" plan such as getting into the best AI PhD program you can get into, learning how to do AI research, and then working on AI safety.
(There are a number of reasons you might do something different. Maybe you think academia is terrible and PhDs don't teach you anything, and so instead you immediately start to work independently on AI safety. That all seems fine. I'm just saying that you shouldn't make a change like this because of a supposed "... (read more)
I do think it is crunch time probably, but I agree with what Rohin said here about what you should do for now (and about my minority status). Skilling up (not just in technical specialist stuff, also in your understanding of the problem we face, the literature, etc.) is what you should be doing. For what I think should be done by the community as a whole, see this comment.
Planned summary for the Alignment Newsletter:
This post presents a list of research questions around existential risk from AI that can be tackled by social scientists. The author is looking for collaborators to expand the list and tackle some of the questions on it, and is aiming to provide some mentorship for people getting involved.
It’s so easy to collapse into the arms of “if there’s even a small chance X will make a very good future more likely …” As with consequentialism, I totally buy the logic of this! The issue is that it’s incredibly easy to hide motivated reasoning in this framework. Figuring out what’s best to do is really hard, and this line of thinking conveniently ends the inquiry (for people who want that).
I have seen something like this happen, so I'm not claiming it doesn't, but it feels pretty confusing to me. The logic pretty clearly doesn't hold up. Even if you acce... (read more)
Yeah I'm surprised by this as well. Both classical utilitarianism (in the extreme version, "everything that is not morally obligatory is forbidden") and longtermism just seem to have many fewer degrees of freedom than other commonly espoused ethical systems, so it would naively be surprising if these worldviews could justify a broader range of actions than close alternatives.
Yeah, I agree that would also count (and as you might expect I also agree that it seems quite hard to do).
Basically with (b) I want to get at "the model does something above and beyond what we already had with verbal arguments"; if it substantially affects the beliefs of people most familiar with the field that seems like it meets that criterion.
The obvious response here is that I don't think longtermist questions are more amenable to explicit quantitative modeling than global poverty, but I'm even more suspicious of other methodologies here.
Yeah, I'm just way, way more suspicious of quantitative modeling relative to other methodologies for most longtermist questions.
I think we might just be arguing about different things here?
Makes sense, I'm happy to ignore those sorts of methods for the purposes of this discussion.
Medicine is less amenable to empirical testing than physics, but that doesn't mea
Replied to Linch -- TL;DR: I agree this is true compared to global poverty or animal welfare, and I would defend this as simply the correct way to respond to actual differences in the questions asked in longtermism vs. those asked in global poverty or animal welfare.
You could move me by building an explicit quantitative model for a popular question of interest in longtermism that (a) didn't previously have models (so e.g. patient philanthropy or AI racing doesn't count), (b) has an upshot that we didn't previously know via verbal arguments, (c) doesn't involve subjective personal guesses or averages thereof for important parameters, and (d) I couldn't immediately tear a ton of holes in that would call the upshot into question.
My guess is that longtermist EAs (like almost all humans) have never been that close to purely quantitative models guiding decisions
I agree with the literal meaning of that, because it is generally a terrible idea to just do what a purely quantitative model tells you (and I'll note that even GiveWell isn't doing this). But imagining the spirit of what you meant, I suspect I disagree.
I don't think you should collapse it into the single dimension of "how much do you use quantitative models in your decisions". It also matters how amenable the decisions are t... (read more)
Overall great post, and I broadly agree with the thesis. (I'm not sure the evidence you present is all that strong though, since it too is subject to a lot of selection bias.) One nitpick:
Most of the posts’ comments were critical, but they didn’t positively argue against EV calculations being bad for longtermism. Instead they completely disputed that EV calculations were used in longtermism at all!
I think you're (unintentionally) running a motte-and-bailey here.
Motte: Longtermists don't think you should build explicit quantitative models, take their best g... (read more)
In that example, Alice has ~5 min of time to give feedback to Bob; in Toby's case the senior researchers are (in aggregate) spending at least multiple hours providing feedback (where "Bob spent 15 min talking to Alice and seeing what she got excited about" counts as 15 min of feedback from Alice). That's the major difference.
I guess one way you could interpret Toby's advice is to simply get a project idea from a senior person, and then go work on it yourself without feedback from that senior person -- I would disagree with that particular advice. I think it's important to have iterative / continual feedback from senior people.
I agree substituting the question would be bad, and sometimes there aren't any relevant experts in which case you shouldn't defer to people. (Though even then I'd consider doing research in an unrelated area for a couple of years, and then coming back to work on the question of interest.)
I admit I don't really understand how people manage to have a "driving question" overwritten -- I can't really imagine that happening to me and I am confused about how it happens to other people.
(I think sometimes it is justified, e.g. you realize that your question was co... (read more)
so it's e.g. the mesa-optimizers paper or multiple LW posts by John Wentworth. As far as I can tell, none of these seems to be following the proposed 'formula for successful early-career research'.
I think the mesa optimizers paper fits the formula pretty well? My understanding is that the junior authors on that paper interacted a lot with researchers at MIRI (and elsewhere) while writing that paper.
I don't know John Wentworth's history. I think it's plausible that if I did, I wouldn't have thought of him as a junior researcher (even before seei... (read more)
My impression from talking to friends working in ML is that usually faculty have ideas that they'd be excited to see their senior grad students work on, senior grad students have research ideas that they'd love for junior grad students to implement, and so forth.
I think this is true if the senior person can supervise the junior person doing the implementation (which is time-expensive). I have lots of project ideas that I expect I could supervise. I have ~no project ideas where I expect I could spend an hour talking to someone, have them go off for... (read more)
I'm considering three types of advice:
When you said
But to steelman (steel-alien?) his view a little, I worry that EA is overinvested in outside-view/forecasting types (like myself?), rather than people with strong and true convictions/extremely high-quality initial research taste, which (quality-weighted) may be making up the majority of revolutionary progress. And if we tell the future Geoffrey Hintons (and Eliezer Yudkowskys) of the
What % do you think this is true for, quality-weighted?
Weighted by quality after graduating? Still > 50%, probably > 80%, but it's really just a lot harder to tell (I don't have enough data). I'd guess that the best people still had "bad ideas" when they were starting out.
(I think a lot of what makes a junior researcher's idea "bad" is that the researcher doesn't know about existing work, or has misinterpreted the goal of the field, or lacks intuitions gained from hands-on experience, etc. It is really hard to compensate for a lack of knowledg... (read more)
Thanks for the link to your FAQ, I'm excited to read it further now!
Re: the rest of your comment, I think you're reading more into my comment than I said or meant. I do not think researchers should generally be deferential; I think they should have strong beliefs that may in fact go against expert consensus. I just don't think this is the right attitude while you are junior.
To be clear, I think Geoffrey Hinton's advice was targeted at very junior people. In context, the interview was conducted for Andrew Ng's online deep learning course, which for many peo... (read more)
I'm not going to go into much detail here, but I disagree with all of these caveats. I think this would be a worse post if it included the first and third caveats (less sure about the second).
First caveat: I think > 95% of incoming PhD students in AI at Berkeley have bad ideas (in the way this post uses the phrase). I predict that if you did a survey of people who have finished their PhD in AI at Berkeley, over 80% of them would think their initial ideas were significantly worse than their later ideas. (Note also that AI @ Berkeley is a very selective p... (read more)
Let's start with the third caveat: maybe the real crux is what we think are the best outputs; what I consider some of the best outputs by young researchers of AI alignment is easier to point at via examples - so it's e.g. the mesa-optimizers paper or multiple LW posts by John Wentworth. As far as I can tell, none of these seems to be following the proposed 'formula for successful early-career research'. My impression is PhD students in AI in Berkeley need to optimise, and actually optimise a lot for success in an established field (ML/AI),... (read more)
I think > 95% of incoming PhD students in AI at Berkeley have bad ideas (in the way this post uses the phrase). [...] (Note also that AI @ Berkeley is a very selective program.)
What % do you think this is true for, quality-weighted? I remember an interview with Geoffrey Hinton where (paraphrased) Hinton was basically like "just trust your intuitions man. Either your intuitions are good or they're bad. If they are good you should mostly trust your intuitions regardless of what other people say, and if they're bad, well, you aren't going to be a good r... (read more)
I'm not objecting to providing the information (I think that is good), I'm objecting to calling it a "conflict of interest".
I'd be much more keen on something like this (source):
For transparency, note that the reports for the latter three rows are all Open Philanthropy analyses, and I am co-CEO of Open Philanthropy.
I sometimes see people arguing for people to work in area A, and declaring a conflict of interest that they are personally working on area A.
If they already were working in area A for unrelated reasons, and then they produced these arguments, it seems reasonable to be worried about motivated reasoning.
On the other hand, if because of these arguments they switched to working in area A, this is in some sense a signal of sincerity ("I'm putting my career where my mouth is").
I don't like the norm of declaring your career as a "conflict of interest", because it... (read more)
He asserts that "numerous people have come forward, both publicly and privately, over the past few years with stories of being intimidated, silenced, or 'canceled.'" This doesn't match my experience.
I also have not had this experience, though that doesn't mean it didn't happen, and I'd want to take this seriously if it did happen.
However, Phil Torres has demonstrated that he isn't above bending the truth in service of his goals, so I'm inclined not to believe him. See previous discussion here. Example from the new article:
It’s not difficult to see ho
Many thanks for this, Rohin. Indeed, your understanding is correct. Here is my own screenshot of my private announcement on this matter.
This is far from the first time that Phil Torres references my work in a way that is set up to give the misleading impression that I share his anti-longtermism view. He and I had extensive communication about this in 2020, but he showed no sympathy for my complaints.
This is my best attempt at summarizing a reasonable outsider's view of the current state of affairs. Before publication, I had this sanity checked (though not necessarily endorsed) by an EA researcher with more context. Apologies in advance if it misrepresents the actual state of affairs, but that's precisely the thing I'm trying to clarify for myself and others.
I just want to note that I think this question is great and does not misrepresent the actual state of affairs.
I do think there's hope for some quantitative estimates even in the speculative cases; ... (read more)
Unfortunately I don't really have the time to do this well, and I think it would be a pretty bad post if I wrote the version that would be ~2 hours of effort or less.
The next Alignment Newsletter will include two articles on recommender systems that mostly disagree with the "recommender systems are driving polarization" position; you might be interested in those. (In fact, I did this shallow dive because I wanted to make sure I wasn't neglecting arguments pointing in the opposite direction.)
EDIT: To be clear, I'd be excited for someone else to develop this... (read more)
The result is software that is extremely addictive, with a host of hard-to-measure side effects on users and society including harm to relationships, reduced cognitive capacity, and political radicalization.
As far as I can tell, this is all the evidence given in this post that there is in fact a problem. Two of the four links are news articles, which I ignore on the principle that news articles are roughly uncorrelated with the truth. (On radicalization I've seen specific arguments arguing against the claim.) One seems to be a paper studying what users bel... (read more)
Should we expect AI companies to reduce risk through self-governance? This post investigates six historical cases, of which the two most successful were the Asilomar conference on recombinant DNA, and the actions of Leo Szilard and other physicists in 1939 (around the development of the atomic bomb). It is hard to make any confident conclusions, but the author identifies the following five factors that make self-governance more likely:
1. The risks are salient.
2. If self-governance doesn’t happen, then the govern
Nice find, thanks!
(For others: note that the linked blog post also considers things like "maybe they just uploaded the wrong data" to be a plausible explanation.)
(See response to rory_greig above)
you can attempt a deep RL project, realise you are hopelessly out of your depth, then you know you'd better go through Spinning Up in Deep RL before you can continue.
Tbc, I do generally like the idea of just in time learning. But:
I think too many people feel held back from doing a project like thing on their own.
Absolutely. Also, too many people don't feel held back enough (e.g. maybe it really would have been beneficial to, say, go through Spinning Up in Deep RL before attempting a deep RL project). How do you tell which group you're in?
(This comment inspired by Reversing Advice)
If we change the y-axis to display a linear relationship, this tells a different story. In fact, we see a plateauing of the relationship between income and experience wellbeing, just as found in Kahneman and Deaton (2010), but just at a later point — about $200,000 per year.
Uhh... that shouldn't happen from just re-plotting the same data. In fact, how is it that in the original graph, there is an increase from $400,000 to $620,000, but in the new linear axis graph, there is a decrease?
A doubling of income is associated with about a 1-point increase on a 0–
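To make the re-plotting point concrete, here is a minimal matplotlib sketch (the income and well-being numbers are made up for illustration, not taken from the study): plotting identical data with a log-scale versus a linear-scale income axis changes the spacing of the points, but it cannot turn an increase between two income levels into a decrease.

```python
import matplotlib.pyplot as plt

# Hypothetical data points, purely for illustration.
incomes = [25_000, 50_000, 100_000, 200_000, 400_000, 620_000]
wellbeing = [6.0, 6.3, 6.6, 6.9, 7.1, 7.2]

fig, (ax_log, ax_lin) = plt.subplots(1, 2, figsize=(10, 4))
for ax, scale in [(ax_log, "log"), (ax_lin, "linear")]:
    ax.plot(incomes, wellbeing, marker="o")
    ax.set_xscale(scale)  # only the axis scale differs between the two panels
    ax.set_xlabel(f"household income ({scale} axis)")
    ax.set_ylabel("experienced well-being (0-10)")

# The y-value at $620k is higher than at $400k in both panels; changing the
# axis scale cannot flip that ordering, which is the point made above.
plt.tight_layout()
plt.show()
```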
I agree with this general point. I'm not sure if you think this is an interesting point to notice that's useful for building a world-model, and/or a reason to be skeptical of technical alignment work. I'd agree with the former but disagree with the latter.
This post considers the applicability of formal verification techniques to AI alignment. Now in order to “verify” a property, you need a specification of that property against which to verify. The author considers three possibilities:
1. **Formally specifiable safety:** we can write down a specification for safe AI, _and_ we’ll be able to find a computational description or implementation
2. **Informally specifiable safety:** we can write down a specification for safe AI mathematically or philosophically, but we w
I just think there's a much greater chance that we look back on it and realize, too late, that we were focused on entirely the wrong things.
If you mean like 10x greater chance, I think that's plausible (though larger than I would say). If you mean 1000x greater chance, that doesn't seem defensible.
In both fields you basically ~can't experiment with the actual thing you care about (you can't just build a superintelligent AI and check whether it is aligned; you mostly can't run an intervention on the entire world and check whether world GDP went up). Y... (read more)
I've been perceiving a lot of EA/XR folks to be in (3) but maybe you're saying they're more in (2)?
Maybe it turns out that most folks in each community are between (1) and (2) toward the other. That is, we're just disagreeing on relative priority and neglectedness.
That's what I would say.
I can't see it as literally the only thing worth spending any marginal resources on (which is where some XR folks have landed).
If you have opportunity A where you get a benefit of 200 per $ invested, and opportunity B where you get a benefit of 50 per $ invested, you w... (read more)
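A toy sketch of the marginal-allocation arithmetic in that comment (the 200-per-dollar and 50-per-dollar figures come from the comment; the assumption of constant per-dollar benefits is mine, for illustration): if each dollar to A does more good than each dollar to B, the benefit-maximizing allocation of a marginal budget is the corner solution where everything goes to A.

```python
# Constant per-dollar benefits, as in the comment's example (made-up framing).
def total_benefit(dollars_to_A, budget=100):
    dollars_to_B = budget - dollars_to_A
    return 200 * dollars_to_A + 50 * dollars_to_B

best = max(range(0, 101), key=total_benefit)
print(best, total_benefit(best))  # -> 100 20000: every marginal dollar goes to A
```

(With diminishing returns the split could eventually change, but under this framing that is the logic behind spending all marginal resources on the single best opportunity.)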
I kinda sorta answered Q2 above (I don't really have anything to add to it).
Q3: I'm not too clear on this myself. I'm just an object-level AI alignment researcher :P
Q4: I broadly agree this is a problem, though I think this:
Before PS and EA/XR even resolve our debate, the car might be run off the road—either as an accident caused by fighting groups, or on purpose.
seems pretty unlikely to me, where I'm interpreting it as "civilization stops making any progress and regresses to the lower quality of life from the past, and this is a permanent effect".
I ... (read more)
If XR weighs so strongly (1e15 future lives!) that you are, in practice, willing to accept any cost (no matter how large) in order to reduce it by any expected amount (no matter how small), then you are at risk of a Pascal's Mugging.
Sure. I think most longtermists wouldn't endorse this (though a small minority probably would).
But when the proposal becomes: “we should not actually study progress or try to accelerate it”, I get lost.
I don't think this is negative, I think there are better opportunities to affect the future (along the lines of Ben's comment).... (read more)
OK, so maybe there are a few potential attitudes towards progress studies:
Flipping it around, PS folks could have a similar (1) positive / (2) neutral / (3) negative attitude towards XR efforts. My view is not settled, but right now I'm somewhere between (1) and (2)... (read more)
But EA/XR folks don't seem to be primarily advocating for specific safety measures. Instead, what I hear (or think I'm hearing) is a kind of generalized fear of progress. Again, that's where I get lost. I think that (1) progress is too obviously valuable and (2) our ability to actually predict and control future risks is too low.
I think there's a fear of progress in specific areas (e.g. AGI and certain kinds of bio) but not a general one? At least I'm in favor of progress generally and against progress in some specific areas where we have good object-level... (read more)
If you're willing to accept GCR in order to slightly reduce XR, then OK—but it feels to me that you've fallen for a Pascal's Mugging.
Eliezer has specifically said that he doesn't accept Pascal's Mugging arguments in the x-risk context
I wouldn't agree that this is a Pascal's Mugging. In fact, in a comment on the post you quote, Eliezer says:
If an asteroid were genuinely en route, large enough to wipe out humanity, possibly stoppable, and nobody was doing anything about this 10% probability, I would still be working on FAI but I would be screaming pretty lou
Results are in this post.
A lot of longtermists do pay attention to this sort of stuff, they just tend not to post on the EA Forum / LessWrong. I personally heard about the report from many different people after it was published, and also from a couple of people even before it was published (when there was a chance to provide input on it).
In general I expect that for any sufficiently large object-level thing, the discourse on the EA Forum will lag pretty far behind the discourse of people actively working on that thing (whether that discourse is public or not). I read the EA... (read more)
Yeah, that makes sense.
It still seems to me like this is a sufficiently important and interesting report that it'd be better if there was a little more mention of it on the Forum, for the sake of "the general longtermist public", since (a) the Forum seems arguably the main, central hub for EA discourse in general, and (b) there is a bunch of other AI governance type stuff here, so having that without things like this report could give a distorted picture.
But it also doesn't seem like a horrible or shocking error has been committed. And it does make sense that these things would be first, and mostly, discussed in more specialised sub-communities and venues.
If AGI doom were likely, what additional evidence would we expect to see?
I think that at least 80% of the AI safety researchers at MIRI, FHI, CHAI, OpenAI, and DeepMind would currently assign a >10% probability to this claim: "The research community will fail to solve one or more technical AI safety problems, and as a consequence there will be a permanent and drastic reduction in the amount of value in our future."
If you're still making this claim now, want to bet on it? (We'd first have to operationalize who counts as an "AI safety researcher".)
I also think it wasn't true in Sep 2017, but I'm less confident about that, and it's not as easy to bet on.
(Am e-mailing with Rohin, will report back e.g. if we check this with a survey.)