ojorgensen

Pursuing a graduate degree (e.g. Master's)
36Joined Jan 2021

Bio

Incoming AI MSc student at Imperial College London. I helped run the EA student group at St Andrews during my undergrad.

Comments
5

I’d be interested to know whether people think this carries over to other EA Forum-adjacent spaces too, like EA Twitter and LessWrong. My impression is that content might be slightly worse there too, but maybe not to the same extent as on the Forum. We might expect some of these mechanisms to translate to these spaces, but not all of them, which would be useful for working out which of these factors actually matter.

How to upskill in AI Safety after AGISF, and how to help others upskill

For people who want to start working on AI Safety, it feels like the AGISF program is widely accepted as the start of the pipeline. It also feels like there are several kinds of work we want to see within AI Safety: conceptual work, engineering work, empirical work, etc.

However, I think there’s an upskilling gap that isn’t currently clearly signposted by the community: how to go from AGISF to being able to contribute! The main programmes I’m aware of for taking someone who knows a bit about alignment to being able to contribute to AI Safety are SERI MATS (trying to produce conceptual researchers), MLAB (trying to produce engineers), and internships at orgs like Conjecture, CHAI, etc.

Without having done these, my impression is that they seem like the best options for upskilling past AGISF, but these are bottlenecked pretty hard by mentor time. So, I think there are plenty of people who could be doing useful work with sufficient upskilling, but who will have to do upskilling more independently. For these people, useful resources like How to pursue a career in technical AI Alignment by Charlie Rogers-Smith exist, but I think some clear signposting for people post-AGISF, pre-job would be useful.

So, after AGISF, what should an excited safety person be doing to upskill? Here are a few ideas:

Engineering: 

Conceptual work:

How to use these resources?

I think for almost all of these resources, the order of how good I would expect them to be is: 

In person with mentorship > Online with mentorship > In person with no mentorship > Online with no mentorship >> solo.

A caveat is that I’m not super sure about how high to rate mentorship compared with simply working through these resources with other enthusiastic people. Mentorship seems less important for some areas of engineering work than it does for conceptual work (like getting research taste), but I’m not sure about this.

Implications for AI Safety field builders

I think the gap between someone doing AGISF and doing impactful work has two implications for field builders in AI safety: 

Firstly, I think there would be a lot of value in creating online versions of programmes which essentially go through some of these resources, similarly to how AGISF currently works. I think the biggest bottleneck for these would be mentor time, but I think lots of these could be used successfully without mentors, if there are others who are excited to work through the resources who also have some background in the area.

Secondly, I think that field builders working in local groups with enough members and resources (Oxford, Berkeley, London) could try to run versions of these programmes in person. If online versions of these programmes exist, this becomes even easier: essentially all it requires is scheduling when these groups need to meet!

Conclusion

If you have done AGISF and want to start upskilling, hopefully some of these resources will be useful!

If you want to help others upskill, I think running programmes centred around some of these resources would be a good idea!

Thanks for reading the post, Catherine! I like this list a lot, and I agree that trying to answer ‘sub-AGI evidence of alignment doesn’t tell us about AGI alignment’ is the key here.

I think that trying to evaluate research agendas might still be important given this. We may struggle to verify the most general version of the claim above, but maybe we can make progress if we restrict ourselves to analysing the kinds of evidence that are generated by specific research agendas. Hence, if we try to assess the claim in the context of specific research agendas (like "to what extent does interpretability give us evidence of alignment in AGI systems?"), the question might become more tractable, although this is offset by having to answer more questions!

Thanks for reading the post, Oscar! Going to reply to both of your comments here! I haven't thought a lot about when one should start "steering" in their career, but I think starting with an approach focussed on rowing makes a lot of sense.

Addressing the idea that steering is less important if we can just fund all possible research agendas, I don't think this necessarily holds. It seems that we are talent-constrained at least to an extent, so every researcher focussed on a hopeless or implausible research agenda is one who isn't working on a plausible one. Thus, even with lots of funding, steering is still important.

Could someone explain the “e/acc” in some of these? I haven’t seen it before.