Strong upvoted but I figured I should comment as well. I agree with Ryan that the effect on chip supply and AI timelines is one of the most important dynamics, perhaps the most important. It's a bit unclear which direction it points, but I think it probably swamps everything else in its magnitude, and I was sad to see that this post doesn't discuss it.
I don't have the time right now to find exactly which comparison I am thinking of, but I believe my thought process was basically "the rate of new people getting AI PhDs is relatively slow"; this is of course only one measure for the number of researchers. Maybe I used data similar to that here: https://www.lesswrong.com/s/FaEBwhhe3otzYKGQt/p/AtfQFj8umeyBBkkxa
Alternatively, AI academics might be becoming more sociable – i.e. citing their friends' papers more, and collaborating more on papers. I don’t find either of the explanations particularly convincing.
FWIW, I find this somewhat convincing. I think the collaborating-on-papers part could be downstream of higher expectations for the number of papers produced. My sense is that grad students are expected to write more papers now than they used to. One way to accomplish this is to collaborate more.
I expect if you compared data on the tota...
Language models have been growing more capable even faster. But with them there is something very special about the human range of abilities, because that is the level of all the text they are trained on.
This sounds like a hypothesis that makes predictions we can go check. Did you have any particular evidence in mind? This and this come to mind, but there is plenty of other relevant stuff, and many experiments that could be quickly done for specific domains/settings.
Note that you say "something very special" whereas my comment is actually about ...
My understanding of your main claim: If AGI is not a magic problem-solving oracle and is instead limited by needing to be unhobbled and integrated with complex infrastructure, it will be relatively safe for model weights to be available to foreign adversaries. Or at least key national security decision makers will believe that's the case.
Please correct me if I'm wrong. My thoughts on the above:
Where is this relative safety coming from? Is it from expecting that adversaries aren't going to be able to figure out how to do unhobbling or steal the necess...
Are you all interested in making content or doing altruism-focused work about AI or AI Safety?
I'll toss out that a lot of folks in the Effective Altruism-adjacent sphere are involved in efforts to make future AI systems safe and beneficial for humanity. If you all are interested in producing content or making a difference around artificial intelligence or AI Safety, there are plenty of people who would be happy to help you, e.g., better understand the key ideas, figure out how to convey them, understand funding gaps in the ecosystem, etc. I, for one, would be ha...
Poor people are generally bad at managing their own affairs and need external guidance
That seems like a particularly cynical way of describing this argument. Another description might be: Individuals are on average fine at identifying ways to improve their lives, and if you think life improvements are heavy-tailed, this implies that individuals will perform much less well than experts who aim to find the positive-tail interventions.
Here's a similar situation: A high school student is given 2 hours with no distractions and told they should study for a ...
Thanks for writing this. I agree that this makes me nervous. Various thoughts:
I think I’ve slowly come to believe something like, ‘sufficiently smart people can convince themselves that arbitrary morally bad things are actually good’. See e.g. the gymnastics meme, but there’s also something deeper here, like ‘many of the evil people throughout history believed that what they were doing was actually good’. I think the response to this should be deep humility and moral risk aversion. Having a big-brain argument that sounds good to you about why what you're...
The paper that introduces the test is probably what you're looking for. Based on a skim, it seems to me that it spends a lot of words laying out the conceptual background that would make this test valuable. Obviously it's heavily selected for making the overall argument that the test is good.
Elaborating on point 1 and the "misinformation is only a small part of why the system is broken" idea:
The current system could be broken in many ways but still sit at some equilibrium of sorts. Upsetting this equilibrium could have substantial effects because, for instance, people's built-up immune response to current misinformation is not as well trained as their immune response to traditionally biased media.
Additionally, intervening on misinformation could be far more tractable than other methods of improving things. I don't have a solid grasp of wh...
I appreciate you writing this, it seems like a good and important post. I'm not sure how compelling I find it, however. Some scattered thoughts:
Thanks for this. A few points:
Because the current outsourcing is of data labeling, I think one of the issues you express in the post is very unlikely:
My general worry is that in future, the global south shall become the training ground for more harmful AI projects that would be prohibited within the Global North. Is this something that I and other people should be concerned about?
Maybe there's an argument about how:
This line of argument suggests that slow takeoff is inherently harder to steer. Because pretty much any version of slow takeoff means that the world will change a ton before we get strongly superhuman AI.
I'm not sure I agree that the argument suggests that. I'm also not sure slow takeoff is harder to steer than other forms of takeoff — they all seem hard to steer. I think I messed up the phrasing because I wasn't thinking about it the right way. Here's another shot:
Widespread AI deployment is pretty wild. If timelines are short, we might get attempts at AI...
I think these don’t bite nearly as hard for conditional pauses, since they occur in the future when progress will be slower
Your footnote is about compute scaling, so presumably you think that's a major factor for AI progress, and why future progress will be slower. The main consideration pointing the other direction (imo) is automated researchers speeding things up a lot. I guess you think we don't get huge speedups here until after the conditional pause triggers are hit (in terms of when various capabilities emerge)? If we do have the capabilities for automated researchers, and a pause locks these up, that's still pretty massive (capability) overhang territory.
While I’m very uncertain, on balance I think it provides more serial time to do alignment research. As model capabilities improve and we get more legible evidence of AI risk, the will to pause should increase, and so the expected length of a pause should also increase [footnote explaining that the mechanism here is that the dangers of GPT-5 galvanize more support than GPT-4]
I appreciate flagging the uncertainty; this argument doesn't seem right to me.
One factor affecting the length of a pause would be the (opportunity cost from pause) / (risk of cata...
Sorry, I agree my previous comment was a bit intense. I think I wouldn't get triggered if you instead asked "I wonder if a crux is that we disagree on the likelihood of existential catastrophe from AGI. I think it's very likely (>50%), what do you think?"
P(doom) is not why I disagree with you. It feels a little like if I'm arguing with an environmentalist about recycling and they go "wow do you even care about the environment?" Sure, that could be a crux, but in this case it isn't and the question is asked in a way that is trying to force me to ag...
I don't think you read my comment:
I don't think extra time pre-transformative-AI is particularly valuable except for its impact on existential risk
I also think it's bad how you (and a bunch of other people on the internet) ask this p(doom) question in a way that (in my read of things) is trying to force somebody into a corner of agreeing with you. It doesn't feel like good faith so much as bullying people into agreeing with you. But that's just my read of things without much thought. At a gut level I expect we die, my from-the-arguments / inside view is something like 60%, and my "all things considered" view is more like 40% doom.
Yep, seems reasonable, I don't really have any clue here. One consideration is that this AI is probably way better than all the human scientists and can design particularly high-value experiments; also, biological simulations will likely be much better in the future. Maybe the bio-security community gets a bunch of useful stuff done by then, which makes the AI's job even harder.
there will be governance mechanisms put in place after a failure
Yep, seems reasonably likely, and we sure don't know how to do this now.
I'm not sure where I'm assuming we can't pause dangerous AI "development long enough to build aligned AI that would be more capable of ensuring safety"? This is a large part of what I mean by the underlying end-game plan in this post (which I didn't state super explicitly, sorry), e.g. the centralization point:
centralization is good because it gives this project more time for safety work and securing the world
I'm curious why you don't include intellectually aggressive culture in the summary. It seems like this was a notable part of a few of the case studies. Did the others just not mention it, or is there information indicating they didn't have this culture? I'm curious how widespread this feature is, e.g.:
The intellectual atmosphere seems to have been fairly aggressive. For instance, it was common (and accepted) that some researchers would shout “bullshit” and lecture the speaker on why they were wrong.
we need capabilities to increase so that we can stay up to date with alignment research
I think one of the better write-ups about this perspective is Anthropic's Core Views on AI Safety.
From its main text, under the heading "The Role of Frontier Models in Empirical Safety", a couple of relevant arguments are:
Thanks Aaron, that's a good article, I appreciate it. It still wasn't clear to me that they were making an argument that increasing capabilities could be net positive; it read more like safety people should be working with whatever is the current most powerful model.
"But we also cannot let excessive caution make it so that the most safety-conscious research efforts only ever engage with systems that are far behind the frontier."
This makes sense to me; the best safety researchers should have full access to the current most advanced models, preferably in my eyes before ...
Not responding to your main question:
Second in a theoretical situation where capabilities research globally stopped overnight, isn't this just free-extra-time for the human race where we aren't moving towards doom? That feels pretty valuable and high EV in and of itself.
I'm interpreting this as saying that buying humanity more time, in and of itself, is good.
I don't think extra time pre-transformative-AI is particularly valuable except for its impact on existential risk. Two reasons why I think this:
I'm glad you wrote this post. Mostly before reading this post, I wrote a draft for what I want my personal conflict of interest policy to be, especially with regard to personal and professional relationships. Changing community norms can be hard, but changing my norms might be as easy as leaving a persuasive comment! I'm open to feedback and suggestions here for anybody interested.
I think Ryan is probably overall right that it would be better to fund people for longer at a time. One counter-consideration that hasn't been mentioned yet: longer contracts implicitly and explicitly push people to keep doing something — that may be sub-optimal — because they drive up switching costs.
If you have to apply for funding once a year no matter what you're working on, the "switching costs" of doing the same thing you've been doing are similar to the cost of switching (of course they aren't in general, but with regard to funding they might ...
How is the super-alignment team going to interface with the rest of the AI alignment community, and specifically what kind of work from others would be helpful to them (e.g., evaluations they would want to exist in 2 years, specific problems in interpretability that seem important to solve early, curricula for AIs to learn about the alignment problem while avoiding content we may not want them reading)?
To provide more context on my thinking that leads to this question: I'm pretty worried that OpenAI is making themselves a single point of failure in e...
I am not aware of modeling here, but I have thought about this a bit. Besides what you mention, some other ways I think this story may not pan out (very speculative):
A few weeks ago I did a quick calculation for the amount of digital suffering I expect in the short term, which probably gets at your question about these sizes, for the short term. tldr of my thinking on the topic:
Thanks for your response. I'll just respond to a couple things.
Re Constitutional AI: I agree normatively that it seems bad to hand over judging AI debates to AIs[1]. I also think this will happen. To quote from the original AI Safety via Debate paper,
...Human time is expensive: We may lack enough human time to judge every debate, which we can address by training ML models to predict human reward as in Christiano et al. [2017]. Most debates can be judged by the reward predictor rather than by the humans themselves. Critically, the reward predictors
The article doesn't seem to have a comment section so I'm putting some thoughts here.
Hey Aaron, thanks for your thorough comment. While we still disagree (explained a bit below), I'm also quite glad to read your comment :)
Re scaling current methods: The hundreds of billions figure we quoted does require more context not in our piece; SemiAnalysis explains in a bit more detail how they get to that number (e.g. assuming training in 3 months instead of 2 years). We don't want to haggle over the exact scale before it becomes infeasible, though; even if we get another 2 OOM in, we wanted to emphasize with our argument that 'the current method route' ...
I'm not Buck, but I can venture some thoughts as somebody who thinks it's reasonably likely we don't have much time.
Given that "I'm skeptical that humans will go extinct in the near future" and that you prioritize preventing suffering over creating happiness, it seems reasonable for you to condition your plan on humanity surviving the creation of AGI. You might then back-chain from possible futures you want to steer toward or away from. For instance, if AGI enables space colonization, it sure would be terrible if we just had planets covered in factor...
I agree that persuasion frames are often a bad way to think about community building.
I also agree that community members should feel valuable, much in the way that I want everybody in the world to feel valued/loved.
I probably disagree about the implications, as they are affected by some other factors. One intuition that helps me is to think about the donors who donate toward community building efforts. I expect that these donors are mostly people who care about preventing kids from dying of malaria, and many donors also donate lots of money towards chariti...
Sorry about the name mistake. Thanks for the reply. I'm somewhat pessimistic about us two making progress on our disagreements here because it seems to me like we're very confused about basic concepts related to what we're talking about. But I will think about this and maybe give a more thorough answer later.
Edit: corrected name, some typos and word clarity fixed
Overall I found this post hard to read and spent far too long trying to understand it. I suspect the author is about as confused about key concepts as I am. David, thanks for writing this; I am glad to see writing on this topic, and I think some of your points are gesturing in a useful and important direction. Below are some tentative thoughts about the arguments. For each core argument I first try to summarize your claim and then respond; hopefully this makes it clearer where we actually disagree vs....
FWIW I often vote on posts at the top without scrolling because I listened to the post via the Nonlinear podcast library or read it on a platform that wasn't logged in. Not all that important of a consideration, but worth being aware of.
Here are my notes, which might not be easier to understand, but are shorter and capture the key ideas:
This evidence doesn't update me very much.
I would prefer an EA Forum without your critical writing on it, because I think your critical writing has similar problems to this post...
I interpret this quote to be saying, "this style of criticism — which seems to lack a ToC and especially fails to engage with the cruxes its critics have, which feels much closer to shouting into the void than making progress on existing disagreements — is bad for the forum discourse by my lights. And it's fine for me to dissuade people from writing content which hurts disc...
I expect a project like this is not worth the cost. I imagine doing this well would require dozens of hours of interviews with people who are more senior in the EA movement, and I think many of those people’s time is often quite valuable.
Regarding the pros you mention:
I’m not convinced that building more EA ethos/identity based around shared history is a good thing. I expect this would make it even harder to pivot to new things or treat EA as a question; it also wouldn’t be unifying for many folks (e.g. those who have been thinking about AI safety for a dec
The short answer to your question is "yes, if major changes happen in the world fairly quickly, then career advice which does not take such changes into account will be flawed"
I would also point to the example of the advice "most of the impact in your career is likely to come later on in your life, like age 36+" (paraphrase from here and other places). I happen to believe there's a decent chance we have TAI/AGI by the time I'm 36 (maybe I'm >50% on this), which would make the advice less likely to be true.
Other things to consider: if timelines are...
I like this comment and think it answers the question at the right level of analysis.
To try and summarize it back: EA’s big assumption is that you should purchase utilons, rather than fuzzies, with charity. This is very different from how many people think about the world and their relationship to charity. To claim that somebody’s way of “doing good” is not as good as they think is often interpreted by them as an attack on their character and identity, and is thus met with emotional defensiveness and counterattack.
EA ideas aim to change how people act and think (and for some, core parts of their identity); such pressure is by default met with resistance.
There is some non-prose discussion of arguments around AI safety that might be worth checking out: https://www.lesswrong.com/posts/brFGvPqo8sKpb9mZf/the-basics-of-agi-policy-flowchart
Some of the stuff linked here: https://www.lesswrong.com/posts/4az2cFrJp3ya4y6Wx/resources-for-ai-alignment-cartography
Including: https://www.lesswrong.com/posts/mJ5oNYnkYrd4sD5uE/clarifying-some-key-hypotheses-in-ai-alignment
My phrasing below is more blunt and rude than I endorse, sorry. I’m writing quickly on my phone. I strong downvoted this post after reading the first 25% of it. Here are some reasons:
“Bayesianism purports that if we find enough confirming evidence we can at some point believe to have found “truth”.” Seems like a mischaracterization, given that sufficient new evidence should be able to change a Bayesian’s mind (tho I don’t know much about the topic).
“We cannot guess what knowledge people will create into the future.” This is literally false; we can guess at ...
I liked this post and would like to see more of people thinking for themselves about cause prioritization and doing BOTECs.
Some scattered thoughts below, also in the spirit of draft amnesty.
I had a little trouble understanding your calculations/logic, so I'm going to write them out in sentence form: GiveWell's current giving recommendations correspond to spending about $0.50 to save an additional person an additional year of life. A 10% chance of extinction from misaligned AI means that postponing misaligned AI by a year gets us 10%*current populatio...
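To make the arithmetic concrete, here's a minimal sketch of that BOTEC in code. The 10% and $0.50/life-year figures are my reading of the post's numbers, and the ~8 billion population is my own rounding, so treat this as illustrative rather than as the post's exact calculation:

```python
# Rough BOTEC sketch (numbers are my reading of the post, not my own estimates).

current_population = 8e9        # ~8 billion people alive today (rounded, my assumption)
p_doom_from_ai = 0.10           # assumed 10% chance of extinction from misaligned AI
cost_per_life_year = 0.50       # ~$0.50 per additional life-year, per the GiveWell figure quoted above

# Postponing misaligned AI by one year buys, in expectation,
# one extra life-year for 10% of the current population.
expected_life_years = p_doom_from_ai * current_population

# Value those life-years at the GiveWell-implied price.
implied_value_of_one_year_delay = expected_life_years * cost_per_life_year

print(f"Expected life-years from a one-year delay: {expected_life_years:,.0f}")
print(f"Implied value at ${cost_per_life_year:.2f}/life-year: ${implied_value_of_one_year_delay:,.0f}")
```

With these inputs, a one-year delay is worth roughly 800 million expected life-years, or about $400 million at the quoted price.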
I am a bit confused by the key question / claim. It seems to be some variant of "Powerful AI may allow the development of technology which could be used to destroy the world. While the AI Alignment problem is about getting advanced AIs to do what their human operator wants, this could still lead to an existential catastrophe if we live in such a vulnerable world where unilateral actors can deploy destructive technology. Thus actual safety looks like not just having Aligned AGI, but also ensuring that the world doesn't get destroyed by bad or careless or un...
Personally I didn't put much weight on this sentence because the more-important-to-me evidence is many EAs being on the political left (which feels sufficient for the claim that EA is not a generally conservative set of ideas, as is sometimes claimed). See the 2019 EA Survey in which 72% of respondents identified as Left or Center Left.
“There are also strong selection effects on retreat attendees vs. intro fellows”
I wonder what these selection effects are. I imagine you get a higher proportion of people who think they are very excited about EA. But also, many of the wicked smart, high achieving people I know are quite busy and don’t think they have time for a retreat like this, so I wonder if you’re somewhat selecting against these people?
Similarly, people who are very thoughtful about opportunity costs and how they spend their time might feel like a commitment like this is too big given that they don’t know much about EA yet and don’t know how much they agree/want to be involved.
Seems plausibly fine to me. If you think about a fellowship as a form of "career transition funding + mentorship", it makes sense that this will take ~3 months (one fellowship) for some people, ~6 months (two fellowships) for others, and some either won't transition at all or will transition later.