AI safety
Studying and reducing the existential risks posed by advanced artificial intelligence

Quick takes

21 · 2d
Heads up! Next week Toby Ord will be answering questions about his scaling series (everything under "artificial intelligence" on this page: tobyord.com/writings) on the EA Forum.  I'm posting the discussion thread (along with all the posts in the series and a special page to navigate them) on Monday, but I figured I'd share this now, in case anyone wants to get ahead on the reading over the weekend.
34 · 16d · 2
Dwarkesh (of the famed podcast) recently posted a call for new guest scouts. Given how influential his podcast is likely to be in shaping discourse around transformative AI (among other important things), this seems worth flagging and applying for (at least for students or early-career researchers in bio, AI, history, econ, math, or physics who have a few extra hours a week). The role is remote, pays ~$100/hour, and expects ~5–10 hours/week. He's looking for people who are deeply plugged into a field (e.g. grad students, postdocs, or practitioners) and have excellent taste. Beyond scouting guests, the role also involves helping assemble curricula so he can rapidly get up to speed before interviews. More details are in the blog post; link to apply (due Jan 23 at 11:59pm PST).
9 · 10d · 4
Are there any signs of governments beginning to do serious planning for the need for Universal Basic Income (UBI) or a negative income tax? It feels like there's a real lack of urgency/rigour in policy engagement within government circles. The concept has obviously had its high-level advocates à la Altman, but it still feels incredibly distant as any form of reality.

Meanwhile the impact is being seen in job markets right now: in the UK, graduate job openings have plummeted in the last 12 months. People I know with elite academic backgrounds are having a hard enough time finding jobs, let alone the vast majority of people who went to average universities. This is happening today, before there's any consensus that AGI has arrived and before widely recognised mass displacement in mid-career job markets. The impact is happening now, but preparation for major policy intervention under current fiscal conditions seems really far off.

If governments do view the risk of major employment-market disruption as a realistic possibility (which I believe in many cases they do), are they planning interventions behind the scenes? Or do they view the problem as too big to address until it arrives, favouring rapid response over careful planning in the way the COVID emergency fiscal interventions emerged?

Would be really interested to hear of any good examples of serious thinking/preparation on how some form of UBI could be planned for (logistically and fiscally) over the near-term 5-year horizon.
1 · 21h
I'm researching how the safety frameworks of frontier labs (Anthropic's RSP, OpenAI's Preparedness Framework, DeepMind's FSF) have changed between versions. Before I finish the analysis, I'm collecting predictions to compare with the actual findings later. It's 5 quick questions: Questions. Disclaimer: please take this with a grain of salt; the questions were drafted quickly with AI help, and I'm treating this as a casual experiment rather than rigorous research. Thanks if you have a moment.
43 · 3mo · 2
Scrappy note on the AI safety landscape. Very incomplete, but probably a good way to get oriented to (a) some of the orgs in the space, and (b) how the space is carved up more generally.

(A) Technical

(i) A lot of the safety work happens inside the scaling-based AGI companies (OpenAI, GDM, Anthropic, and possibly Meta, xAI, Mistral, and some Chinese players). Some of it is directly useful, some of it is indirectly useful (e.g. negative results, datasets, open-source models, position pieces), and some is not useful and/or a distraction. It's worth developing good assessment mechanisms/instincts about these.

(ii) A lot of safety work happens in collaboration with the AGI companies, but by individuals/organisations with some amount of independence and/or different incentives. Some examples: METR, Redwood, UK AISI, Epoch, Apollo. It's worth understanding what they're doing with the AGI cos and what their theories of change are.

(iii) Orgs that don't seem to work directly with AGI cos but are deeply technically engaged with frontier models and their relationship to catastrophic risk: places like Palisade, FAR AI, CAIS. These orgs maintain even more independence, and are able to do/say things which maybe the previous tier might not be able to. A recent cool example was CAIS finding that models don't do well on remote-work tasks (completing only 2.5% of them), in contrast to OpenAI's GDPval findings, which suggest models have an almost 50% win rate against industry professionals on a suite of "economically valuable, real-world tasks".

(iv) Orgs that are pursuing other* technical AI safety bets, different from the AGI cos: FAR AI, ARC, Timaeus, Simplex AI, AE Studio, LawZero, many independents, and some academics at e.g. CHAI/Berkeley, MIT, Stanford, MILA, Vector Institute, Oxford, Cambridge, UCL, and elsewhere. It's worth understanding why they want to make these bets, including whether it's their comparative advantage, an alignment with their incentives/grants, or whether they
44 · 3mo · 5
Not sure who needs to hear this, but Hank Green has published two very good videos about AI safety this week: an interview with Nate Soares and a SciShow explainer on AI safety and superintelligence. Incidentally, he appears to have also come up with the ITN framework from first principles (h/t @Mjreard). Hopefully this is auspicious for things to come?
3 · 6d
We would welcome guest posts for the Windfall Trust's AI Economics Brief here! If you're interested in summarizing your latest research for a broader audience of decision-makers, or want to share a thoughtful take, this is an opportunity to do so!
68 · 8mo · 3
A week ago, Anthropic quietly weakened their ASL-3 security requirements. Yesterday, they announced ASL-3 protections.

I appreciate the mitigations, but quietly lowering the bar at the last minute so you can meet requirements isn't how safety policies are supposed to work.

(This was originally a tweet thread (https://x.com/RyanPGreenblatt/status/1925992236648464774) which I've converted into a quick take. I also posted it on LessWrong.)

What is the change and how does it affect security?

9 days ago, Anthropic changed their RSP so that ASL-3 no longer requires being robust to employees trying to steal model weights if the employee has any access to "systems that process model weights". Anthropic claims this change is minor (and calls insiders with this access "sophisticated insiders"). But, I'm not so sure it's a small change: we don't know what fraction of employees could get this access and "systems that process model weights" isn't explained.

Naively, I'd guess that access to "systems that process model weights" includes employees being able to operate on the model weights in any way other than through a trusted API (a restricted API that we're very confident is secure). If that's right, it could be a high fraction! So, this might be a large reduction in the required level of security.

If this does actually apply to a large fraction of technical employees, then I'm also somewhat skeptical that Anthropic can actually be "highly protected" from (e.g.) organized cybercrime groups without meeting the original bar: hacking an insider and using their access is typical! Also, one of the easiest ways for security-aware employees to evaluate security is to think about how easily they could steal the weights. So, if you don't aim to be robust to employees, it might be much harder for employees to evaluate the level of security and then complain about not meeting requirements[1].

Anthropic's justification and why I disagree

Anthropic justified the change by