One idea I've had, which I'll admit isn't strictly related to AI or biosecurity but does seem like it could scale well with larger numbers of researchers, is to have interns flesh out a visualized reasoning model (perhaps similar to an "epistemic map") of existential risk/recovery scenarios, inspired by Luisa Rodriguez's work on this topic (see her post on the forum and her appearance on the 80K podcast).
Such a project could lay out different scenarios with their input variables (e.g., "only Q people survive, distributed across L locations", "ash/dust particles cover X% of the world for Y years and reduce light/photosynthesis by Z%"); diagram how different variables or world states may interact in important ways; gather and incorporate research on details such as "what is the population size-viability likelihood curve (given concerns about genetic diversity in conditions that may feature higher rates of infant/maternal mortality)?"; and theorize about various dynamics (e.g., how serious security dilemmas will be for groups with pre-apocalyptic firearms but without pre-apocalyptic law enforcement and military structures). In addition to creating such maps (which ideally could be published and then explored at a user's direction), the interns could produce reports on noteworthy "extinction vignettes" or "extinction variable-combinations" (e.g., which key conditions seem likely to cause slow extinction or non-recovery in scenarios that do not feature near-immediate widespread extinction), as well as on key uncertainties (e.g., climate models).
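A minimal sketch of one way such a map could be represented, assuming it is stored as a directed graph of variables. The variable names are hypothetical labels for the examples above, the edges are illustrative guesses rather than research findings, and `recovery_likelihood` is an assumed summary node that the comment does not name.

```python
from collections import defaultdict

# Illustrative "A influences B" links; names and links are assumptions.
EDGES = [
    ("survivor_count", "population_viability"),
    ("infant_maternal_mortality", "population_viability"),
    ("ash_coverage_pct", "light_reduction_pct"),
    ("firearms_without_law_enforcement", "security_dilemma_severity"),
    # Assumed summary node, not named in the original comment.
    ("population_viability", "recovery_likelihood"),
    ("light_reduction_pct", "recovery_likelihood"),
    ("security_dilemma_severity", "recovery_likelihood"),
]


def build_graph(edges):
    """Map each variable to the set of variables it directly influences."""
    graph = defaultdict(set)
    for source, target in edges:
        graph[source].add(target)
    return graph


def downstream(graph, start):
    """Return every variable reachable from `start` by following influences."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


graph = build_graph(EDGES)
print(sorted(downstream(graph, "survivor_count")))
```

A structure like this would let a reader pick any input variable and see which downstream states it feeds into, which is roughly the "explored at a user's direction" use case.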
I'm curious whether other people find this idea potentially interesting or valuable.
(I briefly searched for more on this a few months ago but didn't see anything similar, and Luisa Rodriguez didn't mention any such projects when I emailed her, but if there already is something like this please let me know!)
I'm running Redwood Research's interpretability research.
I've considered running an "interpretability mine": we get 50 interns, put them through a three-week training course on transformers and our interpretability tools, and then put them to work on building mechanistic explanations of parts of some model like GPT-2 for the rest of their internship.
My usual joke is "GPT-2 has 12 attention heads per layer and 48 layers. If we had 50 interns and gave them each a different attention head every day, we'd have an intern-day of analysis of each attention head in about 12 days."
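The arithmetic behind the joke can be checked with a short sketch. The head and layer counts are taken from the joke as stated rather than from any particular GPT-2 checkpoint, and the function name `days_to_cover` is a hypothetical helper, not part of any tooling mentioned here.

```python
import math


def days_to_cover(heads_per_layer: int, layers: int, interns: int) -> int:
    """Days needed for every head to get one intern-day of analysis,
    assuming each intern takes a different head each day."""
    total_heads = heads_per_layer * layers
    # A partially filled final day still counts as a full day.
    return math.ceil(total_heads / interns)


total_heads = 12 * 48                 # 576 heads under the joke's figures
days = days_to_cover(12, 48, 50)      # 576 / 50 = 11.52, rounded up
print(total_heads, days)              # prints: 576 12
```

With these figures, 11 days cover 550 of the 576 heads and the remaining 26 are finished on day 12.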
This is bottlenecked on various things:
I think there's a 30% chance that in July, we'll wish that we had 50 interns to do something like this. Unfortunately, this is too low a probability for it to make sense for us to organize the internship.
Now that it's after July, did you ever end up wishing you had 50 interns to do something like this?
I am glad we did not have 50 interns in July. But I’m 75% confident that we’ll run a giant event like this with at least 25 participants by the end of January. I’ll publish something about this in maybe a month.
Cool!