CS student at the University of Southern California. Previously worked for three years as a data scientist at a fintech startup. Before that, four months on a work trial at AI Impacts. Currently working with Professor Lionel Levine and Cornell's LAISR on language model safety research.
This advice totally applies here: https://forum.effectivealtruism.org/posts/KFMMRyk6sTFReaWjs/you-don-t-have-to-respond-to-every-comment
Good luck with your projects, hope you’re feeling better soon.
The model's performance is really impressive, and I'm glad you're interested in large language models. But I share some of Gavin's concerns, and I think it would be a great use of your time to write up a full theory of impact for this project. You could share it, get some feedback, and think about how to make the project as impactful as possible while reducing risks of harm.
One popular argument for short-term risk from advanced AI is the risk from AI persuasion. Beth Barnes has a great writeup, as does Daniel Kokotajlo. The most succinct case I can make is that the internet is already full of bots, they spread all kinds of harmful misinformation, they reduce trust and increase divisiveness, and we shouldn't be playing around with more advanced bots without seriously considering the possible consequences.
I don't think anybody would argue that this project is literally an existential threat to humanity, but that shouldn't be the bar. Just as much as you need the technical skills of LLM training and the creativity and drive to pursue your ideas, you need to be able to faithfully and diligently evaluate the impact of your projects. I haven't thought about it nearly enough to say the final word on the project's impact, but before you keep publishing results, I would suggest spending some time thinking and writing about your impact.
I understood it as the combination of the 100x Multiplier discussed by Will MacAskill in Doing Good Better (referring to the idea that cash is 100x more valuable for somebody in extreme poverty than for someone in the global top 1%), and GiveWell's current bar for funding set at 8x GiveDirectly. This would mean that Open Philanthropy targets donation opportunities that are at least 800x (or more like 1000x on average) more impactful than giving that money to a rich person.
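To sanity-check the arithmetic, here's a purely illustrative sketch using the 100x and 8x figures above (the ~10x average for grants that clear the bar is my own assumption about the intended reading, not a figure from Open Philanthropy):

```python
# Purely illustrative arithmetic for the combined multiplier described above.
multiplier_poverty_vs_rich = 100     # Doing Good Better: cash is ~100x more valuable in extreme poverty
givewell_bar_vs_givedirectly = 8     # GiveWell's current funding bar, in multiples of GiveDirectly
average_grant_vs_givedirectly = 10   # assumed: the average funded opportunity somewhat exceeds the 8x bar

floor = multiplier_poverty_vs_rich * givewell_bar_vs_givedirectly        # 800
typical = multiplier_poverty_vs_rich * average_grant_vs_givedirectly     # 1000
print(f"At least {floor}x, more like {typical}x on average, vs. giving the money to a rich person")
```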
Here are two quotes you might disagree with. If true, they would make us slightly more skeptical of x-risk from AI, though they don't counter the entire argument.
Richards argues that a lack of generality will make recursive self-improvement more difficult:
I’m less concerned about the singularity because if I have an AI system that’s really good at coding, I’m not convinced that it’s going to be good at other things. And so it’s not the case that if it produces a new AI system, that’s even better at coding, that that new system is now going to be better at other things. And that you get this runaway train of the singularity.
Instead, what I can imagine is that you have an AI that’s really good at writing code, it generates other AI that might be good at other things. And if it generates another AI that’s really good at code, that new one is just going to be that: an AI that’s good at writing code. And maybe we can… So to some extent, we can keep getting better and better and better at producing AI systems with the help of AI systems. But a runaway train of a singularity is not something that concerns me...
The problem with that argument is that the claim is that the smarter version of itself is going to be just smarter across the board. Right? And so that’s where I get off the train. I’m like, “No, no, no, no. It’s going to be better at say programming or better at protein folding or better at causal reasoning. That doesn’t mean it’s going to be better at everything.”
He also argues that a lack of generality will make deception more difficult:
One of the other key things for the singularity argument that I don’t buy, is that you would have an AI that then also knows how to avoid people’s potential control over it. Right? Because again, I think you’d have to create an AI that specializes in that. Or alternatively, if you’ve got the master AI that programs other AIs, it would somehow also have to have some knowledge of how to manipulate people and avoid their powers over it. Again, if it’s really good at programming, I don’t think it’s going to be able to be particularly good at manipulating people.
These arguments at least indicate that generality is a risk factor for AI x-risk. Forecasting whether superintelligent systems will be general or narrow seems more difficult but not impossible. Language models have already shown strong potential for both writing code and persuasion, which is a strike in favor of generality. Ditto for Gato's success across multiple domains (EDIT: Or is it? See below). More outside-view arguments about the benefits or costs of using one model for many different tasks seem mixed and don't sway my opinion much. Curious to hear other considerations.
Very glad to see this interview and the broader series. Engaging with more ML researchers seems like a good way to popularize AI safety and learn something in the process.
Also, low-quality research or poor discussion can make it less likely that important decision-makers will take AI safety seriously.
Nice! I really like this analysis, particularly the opportunity to see how many present-day lives would be saved in expectation. I mostly agree with it, but I have two small disagreements:
First, I’d say that there are already more than 100 people working directly on AI safety, which makes 100 an unreasonably low estimate for the number of people who will work on it over the next 20 years. This would include most of the staff of Anthropic, Redwood, MIRI, Cohere, and CHAI; many people at OpenAI, Deepmind, CSET, and FHI; and various individuals at Berkeley, NYU, Cornell, Harvard, MIT, and elsewhere. There’s also tons of funding and field-building going on right now which should increase future contributions. This is a perennial question that deserves a more detailed analysis than this comment, but here are some sources that might be useful:
Ben Todd would guess it’s about 100 people, so maybe my estimate was wrong: https://twitter.com/ben_j_todd/status/1489985966714544134?s=21&t=Swy2p2vMZmUSi3HaGDFFAQ
Second, I strongly believe that most of the impact in AI safety will come from a handful of the most impactful individuals. Moreover, I think it’s reasonable to make guesses about where you’ll fall in that distribution. For example, somebody with a history of published research who can get into a top PhD program has a much higher expected impact than somebody who doesn’t have strong career capital to leverage for AI safety. The question of whether you could become one of the most successful people in your field might be the most important component of personal fit and could plausibly dominate considerations of scale and neglectedness in an impact analysis.
For more analysis of the heavy-tailed nature of academic success, see: https://forum.effectivealtruism.org/posts/PFxmd5bf7nqGNLYCg/a-bird-s-eye-view-of-the-ml-field-pragmatic-ai-safety-2
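To illustrate the heavy-tail point, here's a toy simulation (entirely my own, with made-up parameters; nothing here comes from the linked post). It just shows how a lognormal impact distribution concentrates most of the total in a small fraction of people:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assume, purely for illustration, that individual researcher impact is
# lognormally distributed; sigma = 2.5 is an arbitrary heavy-tail parameter.
impacts = rng.lognormal(mean=0.0, sigma=2.5, size=10_000)

impacts_sorted = np.sort(impacts)[::-1]
top_1_percent_share = impacts_sorted[:100].sum() / impacts.sum()
print(f"Share of total impact from the top 1% of researchers: {top_1_percent_share:.0%}")
# With these parameters, the top 1% typically accounts for roughly half or more of the total.
```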
But great post, thanks for sharing!
This (pop science) article provides two interesting critiques of the analogy between the human brain and neural nets.
I'm not sure of the direct implication for timelines here. You might be able to argue that these disanalogies mean that neural nets will require less compute than the brain. But it's an interesting point of disanalogy, useful for correcting any misconception that neural networks are "just like the brain".
Nearly impossible to answer. This report by OpenPhil gives it a hell of an effort, but could still be wrong by orders of magnitude. Most fundamentally, the amount of compute necessary for AGI might not be related to the amount of compute used by the human brain, because we don’t know how our algorithmic efficiency compares to the brain’s.
Yes, that's how I understood it as well. If you spend the same amount on inference as you did on training, then you get a hell of a lot of inference.
I would expect he'd also argue that, because companies are willing to spend tons of money on training, we should expect them to be willing to spend lots on inference as well.
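To make "a hell of a lot of inference" concrete, here's a back-of-the-envelope sketch (my own numbers and approximations, not from the interview): using the common rules of thumb that training a dense transformer costs roughly 6·N·D FLOPs and a forward pass costs roughly 2·N FLOPs per token, an inference budget equal to the training budget buys about 3·D generated tokens.

```python
# Back-of-the-envelope sketch; N and D are assumed GPT-3-scale figures, chosen only for illustration.
N = 175e9   # parameters
D = 300e9   # training tokens

training_flops = 6 * N * D       # ~6*N*D rule of thumb for training compute
flops_per_token = 2 * N          # ~2*N rule of thumb for one forward pass

inference_tokens = training_flops / flops_per_token   # simplifies to 3 * D
print(f"Training compute: {training_flops:.1e} FLOPs")
print(f"Tokens generated with an equal inference budget: {inference_tokens:.1e} (~3x the training set)")
```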
This is a great set of guidelines for integrity. Hopefully more grantmakers and other key individuals will take this point of view.
I’d still be interested in hearing how the existing level of COIs affects your judgement of EA epistemics. I think your motivated reasoning critique of EA is the strongest argument that current EA priorities do not accurately represent the most impactful causes available. I still think EA is the best bet available for maximizing my expected impact, but I have baseline uncertainty that many EA beliefs might be incorrect because they’re the result of imperfect processes with plenty of biases and failure modes. It’s a very hard topic to discuss, but I think it’s worth exploring (a) how to limit our epistemic risks and (b) how to discount our reasoning in light of those risks.