This is a special post for quick takes by ojorgensen. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

How to upskill in AI Safety after AGISF, and how to help others upskill

For people who want to start working on AI Safety, the AGISF program seems to be widely accepted as the start of the pipeline. There are also several kinds of work we want to see within AI Safety: conceptual work, engineering work, empirical work, etc.

However, I think there’s an upskilling gap that isn’t currently clearly signposted by the community: how to go from AGISF to being able to contribute! The main programmes I’m aware of for taking someone who knows a bit about alignment to being able to contribute to AI Safety are SERI MATS (trying to produce conceptual researchers), MLAB (trying to produce engineers), and internships at orgs like Conjecture, CHAI etc.

I haven't done any of these myself, but my impression is that they are the best options for upskilling past AGISF. However, they are bottlenecked pretty hard by mentor time. So, I think there are plenty of people who could be doing useful work with sufficient upskilling, but who will have to upskill more independently. For these people, useful resources like "How to pursue a career in technical AI Alignment" by Charlie Rogers-Smith exist, but I think some clear signposting for people post-AGISF, pre-job would be useful.

So, after AGISF, what should an excited safety person be doing to upskill? Here are a few ideas:
 

Engineering: 

Conceptual work:

How to use these resources?

I think for almost all of these resources, I would expect the ways of working through them to rank as follows, from best to worst:

In person with mentorship > Online with mentorship > In person with no mentorship > Online with no mentorship >> solo. A caveat is that I’m not sure how highly to rate mentorship compared with simply working through these resources with other enthusiastic people. Mentorship seems less important for some areas of engineering work than it does for conceptual work (where it helps with things like developing research taste), but I’m not sure about this.

 

Implications for AI Safety field builders

I think the gap between someone doing AGISF and doing impactful work has two implications for field builders in AI safety: 

Firstly, I think there would be a lot of value in creating online programmes which work through some of these resources, similar to how AGISF currently works. I think the biggest bottleneck for these would be mentor time, but I think lots of these resources could be used successfully without mentors, as long as there are others with some background in the area who are excited to work through them together.

Secondly, I think that field builders working in local groups with enough members and resources (Oxford, Berkeley, London) could try to run versions of these programmes in person. If online versions of these programmes exist, this becomes even easier: essentially all it requires is scheduling when these groups need to meet!

 

Conclusion

If you have done AGISF and want to start upskilling, hopefully some of these resources will be useful!

If you want to help others upskill, I think running programmes centred around some of these resources would be a good idea!


 
