I appreciate you writing this, it seems like a good and important post. I'm not sure how compelling I find it, however. Some scattered thoughts:
Because current outsourcing is mostly of data labeling, I think one of the issues you express in the post is very unlikely:
My general worry is that in the future, the Global South will become the training ground for more harmful AI projects that would be prohibited within the Global North. Is this something that I and other people should be concerned about?
Maybe there's an argument about how:
This line of argument suggests that slow takeoff is inherently harder to steer. Because pretty much any version of slow takeoff means that the world will change a ton before we get strongly superhuman AI.
I'm not sure I agree that the argument suggests that. I'm also not sure slow takeoff is harder to steer than other forms of takeoff — they all seem hard to steer. I think I messed up the phrasing because I wasn't thinking about it the right way. Here's another shot:
Widespread AI deployment is pretty wild. If timelines are short, we might get attempts at AI...
I think these don’t bite nearly as hard for conditional pauses, since they occur in the future when progress will be slower
Your footnote is about compute scaling, so presumably you think that's a major factor for AI progress, and why future progress will be slower. The main consideration pointing the other direction (imo) is automated researchers speeding things up a lot. I guess you think we don't get huge speedups here until after the conditional pause triggers are hit (in terms of when various capabilities emerge)? If we do have the capabilities for automated researchers, and a pause locks these up, that's still pretty massive (capability) overhang territory.
While I’m very uncertain, on balance I think it provides more serial time to do alignment research. As model capabilities improve and we get more legible evidence of AI risk, the will to pause should increase, and so the expected length of a pause should also increase [footnote explaining that the mechanism here is that the dangers of GPT-5 galvanize more support than GPT-4]
I appreciate flagging the uncertainty; this argument doesn't seem right to me.
One factor affecting the length of a pause would be the (opportunity cost from pause) / (risk of cata...
Sorry, I agree my previous comment was a bit intense. I think I wouldn't get triggered if you instead asked "I wonder if a crux is that we disagree on the likelihood of existential catastrophe from AGI. I think it's very likely (>50%), what do you think?"
P(doom) is not why I disagree with you. It feels a little like if I'm arguing with an environmentalist about recycling and they go "wow do you even care about the environment?" Sure, that could be a crux, but in this case it isn't and the question is asked in a way that is trying to force me to ag...
I don't think you read my comment:
I don't think extra time pre-transformative-AI is particularly valuable except its impact on existential risk
I also think it's bad how you (and a bunch of other people on the internet) ask this p(doom) question in a way that (in my read of things) is trying to force somebody into a corner of agreeing with you. It doesn't feel like good faith so much as bullying people into agreeing with you. But that's just my read of things without much thought. At a gut level I expect we die, my from-the-arguments / inside view is something like 60%, and my "all things considered" view is more like 40% doom.
Yep, seems reasonable, I don't really have any clue here. One consideration is that this AI is probably way better than all the human scientists and can design particularly high-value experiments, also biological simulations will likely be much better in the future. Maybe the bio-security community gets a bunch of useful stuff done by then which makes the AI's job even harder.
there will be governance mechanisms put in place after a failure
Yep, seems reasonably likely, and we sure don't know how to do this now.
I'm not sure where I'm assuming we can't pause dangerous AI "development long enough to build aligned AI that would be more capable of ensuring safety"? This is a large part of what I mean by the underlying end-game plan in this post (which I didn't state super explicitly, sorry), e.g. the centralization point
centralization is good because it gives this project more time for safety work and securing the world
I'm curious why you don't include intellectually aggressive culture in the summary? It seems like this was a notable part of a few of the case studies. Did the others just not mention this, or is there information indicating they didn't have this culture? I'm curious how widespread this feature is. e.g.,
The intellectual atmosphere seems to have been fairly aggressive. For instance, it was common (and accepted) that some researchers would shout “bullshit” and lecture the speaker on why they were wrong.
we need capabilities to increase so that we can stay up to date with alignment research
I think one of the better write-ups about this perspective is Anthropic's Core Views on AI Safety.
From its main text, under the heading The Role of Frontier Models in Empirical Safety, a couple relevant arguments are:
Thanks Aaron, that's a good article, I appreciate it. It still wasn't clear to me that they were making an argument that increasing capabilities could be net positive; more that safety people should be working with whatever the current most powerful model is.
"But we also cannot let excessive caution make it so that the most safety-conscious research efforts only ever engage with systems that are far behind the frontier."
This makes sense to me, the best safety researchers should have full access to the current most advanced models, preferably in my eyes before ...
Not responding to your main question:
Second, in a theoretical situation where capabilities research globally stopped overnight, isn't this just free extra time for the human race, where we aren't moving toward doom? That feels pretty valuable and high-EV in and of itself.
I'm interpreting this as saying that buying humanity more time, in and of itself, is good.
I don't think extra time pre-transformative-AI is particularly valuable except its impact on existential risk. Two reasons for why I think this:
I'm glad you wrote this post. Mostly before reading this post, I wrote a draft for what I want my personal conflict of interest policy to be, especially with regard to personal and professional relationships. Changing community norms can be hard, but changing my norms might be as easy as leaving a persuasive comment! I'm open to feedback and suggestions here for anybody interested.
I think Ryan is probably overall right that it would be better to fund people for longer at a time. One counter-consideration that hasn't been mentioned yet: longer contracts implicitly and explicitly push people to keep doing something (which may be sub-optimal) because they drive up switching costs.
If you have to apply for funding once a year no matter what you're working on, the "switching costs" of doing the same thing you've been doing are similar to the cost of switching (of course they aren't in general, but with regard to funding they might ...
How is the super-alignment team going to interface with the rest of the AI alignment community, and specifically what kind of work from others would be helpful to them (e.g., evaluations they would want to exist in 2 years, specific problems in interpretability that seem important to solve early, curricula for AIs to learn about the alignment problem while avoiding content we may not want them reading)?
To provide more context on my thinking that leads to this question: I'm pretty worried that OpenAI is making themselves a single point of failure in e...
I am not aware of modeling here, but I have thought about this a bit. Besides what you mention, some other ways I think this story may not pan out (very speculative):
A few weeks ago I did a quick calculation for the amount of digital suffering I expect in the short term, which probably gets at your question about these sizes, for the short term. tldr of my thinking on the topic:
Thanks for your response. I'll just respond to a couple things.
Re Constitutional AI: I agree normatively that it seems bad to hand over judging AI debates to AIs[1]. I also think this will happen. To quote from the original AI Safety via Debate paper,
...Human time is expensive: We may lack enough human time to judge every debate, which we can address by training ML models to predict human reward as in Christiano et al. [2017]. Most debates can be judged by the reward predictor rather than by the humans themselves. Critically, the reward predictors
The article doesn't seem to have a comment section, so I'm putting some thoughts here.
Hey Aaron, thanks for your thorough comment. While we still disagree (explained a bit below), I'm also quite glad to read your comment :)
Re scaling current methods: The hundreds of billions figure we quoted does require more context not in our piece; SemiAnalysis explains in a bit more detail how they get to that number (e.g. assuming training in 3 months instead of 2 years). We don't want to haggle over the exact scale before it becomes infeasible, though; even if we get another 2 OOMs in, we wanted to emphasize with our argument that 'the current method route' ...
I'm not Buck, but I can venture some thoughts as somebody who thinks it's reasonably likely we don't have much time.
Given that "I'm skeptical that humans will go extinct in the near future" and that you prioritize preventing suffering over creating happiness, it seems reasonable for you to condition your plan on humanity surviving the creation of AGI. You might then back-chain from possible futures you want to steer toward or away from. For instance, if AGI enables space colonization, it sure would be terrible if we just had planets covered in factor...
I agree that persuasion frames are often a bad way to think about community building.
I also agree that community members should feel valuable, much in the way that I want everybody in the world to feel valued/loved.
I probably disagree about the implications, as they are affected by some other factors. One intuition that helps me is to think about the donors who donate toward community building efforts. I expect that these donors are mostly people who care about preventing kids from dying of malaria, and many donors also donate lots of money towards chariti...
Sorry about the name mistake. Thanks for the reply. I'm somewhat pessimistic about us two making progress on our disagreements here because it seems to me like we're very confused about basic concepts related to what we're talking about. But I will think about this and maybe give a more thorough answer later.
Edit: corrected name, some typos and word clarity fixed
Overall I found this post hard to read and I spent far too long trying to understand it. I suspect the author is about as confused about key concepts as I am. David, thanks for writing this, I am glad to see writing on this topic and I think some of your points are gesturing in a useful and important direction. Below are some tentative thoughts about the arguments. For each core argument I first try to summarize your claim and then respond, hopefully this makes it clearer where we actually disagree vs....
FWIW I often vote on posts at the top without scrolling because I listened to the post via the Nonlinear podcast library or read it on a platform that wasn't logged in. Not all that important of a consideration, but worth being aware of.
Here are my notes which might not be easier to understand, but they are shorter and capture the key ideas:
This evidence doesn't update me very much.
I would prefer an EA Forum without your critical writing on it, because I think your critical writing has similar problems to this post...
I interpret this quote to be saying, "this style of criticism — which seems to lack a ToC and especially fails to engage with the cruxes its critics have, which feels much closer to shouting into the void than making progress on existing disagreements — is bad for the forum discourse by my lights. And it's fine for me to dissuade people from writing content which hurts disc...
I expect a project like this is not worth the cost. I imagine doing this well would require dozens of hours of interviews with people who are more senior in the EA movement, and I think many of those people’s time is often quite valuable.
Regarding the pros you mention:
I’m not convinced that building more EA ethos/identity around shared history is a good thing. I expect this would make it even harder to pivot to new things or to treat EA as a question; it also wouldn’t be unifying for many folks (e.g. those who have been thinking about AI safety for a dec
The short answer to your question is "yes, if major changes happen in the world fairly quickly, then career advice which does not take such changes into account will be flawed"
I would also point to the example of the advice "most of the impact in your career is likely to come later on in your life, like age 36+" (paraphrase from here and other places). I happen to believe there's a decent chance we have TAI/AGI by the time I'm 36 (maybe I'm >50% on this), which would make the advice less likely to be true.
Other things to consider: if timelines are...
I like this comment and think it answers the question at the right level of analysis.
To try and summarize it back: EA’s big assumption is that you should purchase utilons, rather than fuzzies, with charity. This is very different from how many people think about the world and their relationship to charity. To claim that somebody’s way of “doing good” is not as good as they think is often interpreted by them as an attack on their character and identity, thus met with emotional defensiveness and counterattack.
EA ideas aim to change how people act and think (and for some core parts of their identity); such pressure is by default met with resistance.
There is some non-prose discussion of arguments around AI safety. Might be worth checking out: https://www.lesswrong.com/posts/brFGvPqo8sKpb9mZf/the-basics-of-agi-policy-flowchart Some of the stuff linked here: https://www.lesswrong.com/posts/4az2cFrJp3ya4y6Wx/resources-for-ai-alignment-cartography Including: https://www.lesswrong.com/posts/mJ5oNYnkYrd4sD5uE/clarifying-some-key-hypotheses-in-ai-alignment
I have only skimmed this, but it seems quite good and I want more things like it on the forum. Positive feedback!
My phrasing below is more blunt and rude than I endorse, sorry. I’m writing quickly on my phone. I strong downvoted this post after reading the first 25% of it. Here are some reasons:
“Bayesianism purports that if we find enough confirming evidence we can at some point believe to have found “truth”.” Seems like a mischaracterization, given that sufficient new evidence should be able to change a Bayesian’s mind (tho I don’t know much about the topic).
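To make the point concrete, here is a minimal Bayes-update sketch (the numbers are my own, purely illustrative): a Bayesian who starts out fairly confident can be moved substantially by a single piece of disconfirming evidence, which is why "Bayesians can never escape their priors" reads as a mischaracterization to me.

```python
# Minimal Bayes update: disconfirming evidence lowers a Bayesian's credence.
# All numbers are illustrative assumptions, not from the post under discussion.
prior = 0.9                 # initial credence in hypothesis H
p_e_given_h = 0.1           # evidence E is unlikely if H is true
p_e_given_not_h = 0.9       # E is likely if H is false

posterior = (p_e_given_h * prior) / (
    p_e_given_h * prior + p_e_given_not_h * (1 - prior)
)
print(posterior)  # 0.5: one observation cuts a 90% credence to 50%
```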
“We cannot guess what knowledge people will create into the future” This is literally false, we can guess at ...
I liked this post and would like to see more of people thinking for themselves about cause prioritization and doing BOTECs.
Some scattered thoughts below, also in the spirit of draft amnesty.
I had a little trouble understanding your calculations/logic, so I'm going to write them out in sentence form: GiveWell's current giving recommendations correspond to spending about $0.50 to save an additional person an additional year of life. A 10% chance of extinction from misaligned AI means that postponing misaligned AI by a year gets us 10%*current populatio...
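As far as the visible numbers go, the arithmetic can be sketched like this (the ~8 billion world population is my assumption, not a figure from the original comment):

```python
# Back-of-the-envelope: value of postponing misaligned AI by one year,
# in GiveWell-equivalent dollars.
# Assumptions (mine, for illustration): world population ~8e9;
# $0.50 per additional person-year (the figure quoted above);
# 10% chance of extinction from misaligned AI.

population = 8e9
p_extinction = 0.10
dollars_per_person_year = 0.50

expected_person_years = p_extinction * population
givewell_equivalent_dollars = expected_person_years * dollars_per_person_year

print(f"Expected person-years per year of delay: {expected_person_years:.2e}")
print(f"GiveWell-equivalent value of that delay: ${givewell_equivalent_dollars:.2e}")
```

On these assumptions, a year of delay is worth roughly 8e8 expected person-years, or about $400M of GiveWell-style spending, which I take to be the comparison the calculation is driving at.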
I am a bit confused by the key question / claim. It seems to be some variant of "Powerful AI may allow the development of technology which could be used to destroy the world. While the AI Alignment problem is about getting advanced AIs to do what their human operator wants, this could still lead to an existential catastrophe if we live in such a vulnerable world where unilateral actors can deploy destructive technology. Thus actual safety looks like not just having Aligned AGI, but also ensuring that the world doesn't get destroyed by bad or careless or un...
Personally I didn't put much weight on this sentence because the more-important-to-me evidence is many EAs being on the political left (which feels sufficient for the claim that EA is not a generally conservative set of ideas, as is sometimes claimed). See the 2019 EA Survey in which 72% of respondents identified as Left or Center Left.
“There are also strong selection effects on retreat attendees vs. intro fellows”
I wonder what these selection effects are. I imagine you get a higher proportion of people who think they are very excited about EA. But also, many of the wicked smart, high achieving people I know are quite busy and don’t think they have time for a retreat like this, so I wonder if you’re somewhat selecting against these people?
Similarly, people who are very thoughtful about opportunity costs and how they spend their time might feel like a commitment like this is too big given that they don’t know much about EA yet and don’t know how much they agree/want to be involved.
Thanks for making this. I expect that after you make edits based on comments and such this will be the most up to date and accurate public look at this question (the current size of the field). I look forward to linking people to it!
I disagree with a couple specific points as well as the overall thrust of this post. Thank you for writing it!
A maximizing viewpoint can say that we need to be cautious lest we do something wonderful but not maximally so. But in practice, embracing a pragmatic viewpoint, saving money while searching for the maximum seems bad.
I think I strongly disagree with this because opportunities for impact appear heavy tailed. Funding 2 interventions that are in the 90th percentile is likely less good than funding 1 intervention in the 99th percentile. Given this stat...
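A toy numeric version of the heavy-tail point, under my (illustrative) assumption that cost-effectiveness is lognormally distributed:

```python
# Toy illustration: under a heavy-tailed (lognormal) distribution of
# cost-effectiveness, one 99th-percentile intervention can beat two
# 90th-percentile interventions. The lognormal assumption and sigma
# value are mine, purely for illustration.
import math
from statistics import NormalDist

sigma = 2.0  # assumed spread; heavier tails as sigma grows

def lognormal_quantile(p, sigma=sigma):
    # Quantile of lognormal(mu=0, sigma): exp(sigma * Phi^{-1}(p))
    return math.exp(sigma * NormalDist().inv_cdf(p))

p90 = lognormal_quantile(0.90)  # ~13 (relative cost-effectiveness units)
p99 = lognormal_quantile(0.99)  # ~105

print(2 * p90, p99)  # two 90th-percentile grants < one 99th-percentile grant
```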
You write:
Another possible reason to argue for a zero-discount rate is that the intrinsic value of humanity increases at a rate greater than the long-run catastrophe rate[19]. This is wrong for (at least) 2 reasons.
Your footnote cites The Precipice. To quote from its Appendix E:
...by many measures the value of humanity has increased substantially over the centuries. This progress has been very uneven over short periods, but remarkably robust over the long run. We live long lives filled with cultural and material riches that would have seemed l
Welcome to the forum! I am glad that you posted this! And also I disagree with much of it. Carl Shulman already responded explaining why he thinks the extinction rate approaches zero fairly soon, reasoning I agree with.
...Under a stable future population, where people produce (on average) only enough offspring to replace themselves, a person’s expected number of descendants is equal to the expected length of human existence, divided by the average lifespan. I estimate this figure is 93[22].
To be consistent, when comparing lives saved in pr
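Unpacking the quoted estimate as arithmetic (the 80-year average lifespan is my illustrative assumption; the comment doesn't state one):

```python
# Expected descendants = expected remaining duration of humanity / average lifespan.
# Rearranged, the quoted figure of 93 implies an expected duration of
# humanity. The 80-year lifespan is an illustrative assumption of mine.
expected_descendants = 93
avg_lifespan_years = 80

implied_duration_years = expected_descendants * avg_lifespan_years
print(implied_duration_years)  # 7440 years of expected future human existence
```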
I read this post around the beginning of March this year (~6 months ago). I think reading this post was probably net-negative for my life plans. Here are some thoughts about why I think reading this post was bad for me, or at least not very good. I have not re-read the post since then, so maybe some of my ideas are dumb for obvious reasons.
I think the broad emphasis on general skill and capacity building often comes at the expense of directly pursuing your goals. In many ways, the post is like “Skill up in an aptitude because in the future this might...
This is great and I’m glad you wrote it. For what it’s worth, the evidence from global health does not appear to me strong enough to justify high credence (>90%) in the claim “some ways of doing good are much better than others” (maybe operationalized as "the top 1% of charities are >50x more cost-effective than the median", but I made up these numbers).
The DCP2 (2006) data (cited by Ord, 2013) gives the distribution of the cost-effectiveness of global health interventions. This is not the distribution of the cost-effectiveness of possible dona...
The edit is key here. I would consider running an AI-safety arguments competition in order to do better outreach to graduate-and-above level researchers to be a form of movement building and one for which crunch time could be in the last 5 years before AGI (although probably earlier is better for norm changes).
One value add from compiling good arguments is that if there is a period of panic following advanced capabilities (some form of firealarm), then it will be really helpful to have existing and high quality arguments and resources on hand to help...
I’m a bit confused by this post. I’m going to summarize the main idea back, and I would appreciate it if you could correct me where I’m misinterpreting.
Human psychology is flawed in such a way that we consistently estimate the probability of existential risk from each cause to be ~10% by default. In reality, the probability of existential risk from particular causes is generally less than 10% [this feels like an implicit assumption], so finding more information about the risks causes us to decrease our worry about those risks. We can get more information a...
A solution that doesn’t actually work but might be slightly useful: slow the lemons by making EA-related funding things less appealing than the alternative.
One specific way to do this is to pay less than industry pays for similar positions: altruistic pay cut. Lightcone, the org Habryka runs, does this: “Our current salary policy is to pay rates competitive with industry salary minus 30%.” At a full-time employment level, this seems like one way to dissuade people who are interested in money, at least assuming they are qualified and hard working enough to ...
Good question. Short answer: despite being an April Fools post, that post seems to encapsulate much of what Yudkowsky actually believes – so the social context is that the post is joking in its tone and content but not so much the attitude of the author; sorry I can't link to anything to further substantiate this. I believe Yudkowsky's general policy is to not put numbers on his estimates.
Better answer: Here is a somewhat up-to-date database about predictions about existential risk chances from some folks in the community. You'll notice these are far below...
#17 in the spreadsheet is "How much do charities differ in impact?"
I would love to see an actual distribution of charity cost-effectiveness. As far as I know, that doesn't exist. Most folks rely on Ord (2013) which is the distribution of health interventions, but it says nothing about where charities actually do work.
Elaborating on point 1 and the "misinformation is only a small part of why the system is broken" idea:
The current system could be broken in many ways but at some equilibrium of sorts. Upsetting this equilibrium could have substantial effects because, for instance, people's built immune response to current misinformation is not as well trained as their built immune response to traditionally biased media.
Additionally, intervening on misinformation could be far more tractable than other methods of improving things. I don't have a solid grasp of wh...