AI safety
Studying and reducing the existential risks posed by advanced artificial intelligence

Quick takes

27
13d
7
Invitation for bets
I’m willing to bet that Anthropic’s revenue growth over the next year will be slower than its revenue growth over the last 3 years. I proposed a specific bet here. Anyone who wants can offer to take the other side of that bet. Or you can make a counteroffer.
I’m also willing to make a longer-term bet that the AI industry is in a bubble. I proposed a specific bet for that, too, here. Feel free to offer to take the other side of that bet or make a counteroffer.
I’d also be open to other bets. It seems pointless to bet about whether AGI or transformative AI will be deployed within the next 5-10 years, yet, for the heck of it, I would agree to a bet against that, too. (I’ll make bets for small, nominal amounts of money to be donated to the winner’s charity of choice, since the practical and legal problems with betting are too large otherwise.)
I’d also bet against the deployment of 100,000+ SAE Level 5 fully autonomous vehicles in North America within the next 3 years, if anyone has a strong opinion on that. I’d make a similar bet against the deployment of autonomous humanoid robots in North American households, although we’d have to come up with some specific resolution criteria.
Similarly, I’d bet against any significant level of near-term labour automation by LLMs or generative AI. Or against LLMs becoming capable of performing all sorts of specific tasks well.
On any of these topics, I’m also open to invitations for a public dialogue. (More on that topic here.)
50
1mo
2
More EA undergrads should do political volunteering. It's impactful AND fun. Choose an election that's impactful (e.g. AI safety candidate) and neglected (e.g. primaries in always-blue/red places), couch-crash the weekend there, and volunteer with the campaign. I say this after doing 15 hours of street canvassing myself. I was surprised by how anecdotally impactful and fun it was. If you like people-watching, talking to strangers, and/or joining passionate projects for a weekend, I think you'll also love this. I wish I had thought of this earlier. Literature on the impact (Claude-generated): Kalla & Broockman's meta-analysis of 49 field experiments finds zero average persuasive effect in general elections, but effects do show up when voters lack a partisan cue (i.e. primaries and ballot measures). Mann & Haenschen (2024) find mobilization effects (e.g. canvassing) are 33-76% larger in low-attention races than in high-attention ones. Your marginal volunteer hour goes much further in a primary.
11
13d
SMBC by Zach Weinersmith is doing a great job of conveying AI Safety memes more widely. Relevant comics: https://www.smbc-comics.com/comic/speech https://www.smbc-comics.com/comic/safe https://www.smbc-comics.com/comic/ai-17 https://www.smbc-comics.com/comic/ai-15 I would love to see his take on an illustrated AI Safety book, like 'Open Borders' meets 'If anyone builds it, everyone dies'.
38
3mo
In two days (March 21st, 12-4pm), about 140 of us (event link) will be marching on Anthropic, OpenAI and xAI in SF asking the CEOs to make statements on whether they would stop developing new frontier models if every other major lab in the world credibly does the same. This comes after Anthropic removed its commitment to pause development from its RSP. We'll be starting at 500 Howard St, San Francisco (Anthropic's Office, full schedule and more info here). This is shaping up to be the biggest US AI Safety protest to date, with a coalition including Nate Soares (MIRI), David Krueger (Evitable), Will Fithian (Berkeley Professor) and folks representing PauseAI, QuitGPT, Humans First.
13
1mo
I'd like to have conversations with people who work in or are knowledgeable about energy and security, whether that's with respect to energy grids, nuclear power plants, solar panels, etc. I'm exploring a startup idea to harden the world's critical infrastructure against powerful AI. (I am also building a system to make formal verification more deployable at scale so that it may reduce loss of control and misuse scenarios.) I've given workshops on using AIs for productivity/research to various research organizations like MATS. I'm happy to offer a bit of my time to share my expertise on that if that would make the meeting more interesting for you (or any other topics you'd like to hear my perspective on). Context about me: I'm Jacques. I started working on technical AI safety research in January 2022. Before that, I had been engaging with AI ethics in a more personal capacity, worked as a data scientist at the Canada Energy Regulator, and earned a BSc/master's in Physics. I'm currently based in Montreal. Please schedule a meeting if interested (or DM if you know someone I should talk to): https://calendly.com/jacquesthibodeau/45-minute-meeting
34
5mo
1
Dwarkesh (of the famed podcast) recently posted a call for new guest scouts. Given how influential his podcast is likely to be in shaping discourse around transformative AI (among other important things), this seems worth flagging and applying for (at least, for students or early career researchers in bio, AI, history, econ, math, or physics who have a few extra hours a week). The role is remote, pays ~$100/hour, and expects ~5–10 hours/week. He’s looking for people who are deeply plugged into a field (e.g. grad students, postdocs, or practitioners) with high taste. Beyond scouting guests, the role also involves helping assemble curricula so he can rapidly get up to speed before interviews. More details are in the blog post; link to apply (due Jan 23 at 11:59pm PST).
45
8mo
5
Not sure who needs to hear this, but Hank Green has published two very good videos about AI safety this week: an interview with Nate Soares and a SciShow explainer on AI safety and superintelligence. Incidentally, he appears to have also come up with the ITN framework from first principles (h/t @Mjreard). Hopefully this is auspicious for things to come?
68
1y
3
A week ago, Anthropic quietly weakened their ASL-3 security requirements. Yesterday, they announced ASL-3 protections. I appreciate the mitigations, but quietly lowering the bar at the last minute so you can meet requirements isn't how safety policies are supposed to work.
(This was originally a tweet thread (https://x.com/RyanPGreenblatt/status/1925992236648464774) which I've converted into a quick take. I also posted it on LessWrong.)
What is the change and how does it affect security?
9 days ago, Anthropic changed their RSP so that ASL-3 no longer requires being robust to employees trying to steal model weights if the employee has any access to "systems that process model weights". Anthropic claims this change is minor (and calls insiders with this access "sophisticated insiders"). But, I'm not so sure it's a small change: we don't know what fraction of employees could get this access and "systems that process model weights" isn't explained.
Naively, I'd guess that access to "systems that process model weights" includes employees being able to operate on the model weights in any way other than through a trusted API (a restricted API that we're very confident is secure). If that's right, it could be a high fraction! So, this might be a large reduction in the required level of security.
If this does actually apply to a large fraction of technical employees, then I'm also somewhat skeptical that Anthropic can actually be "highly protected" from (e.g.) organized cybercrime groups without meeting the original bar: hacking an insider and using their access is typical!
Also, one of the easiest ways for security-aware employees to evaluate security is to think about how easily they could steal the weights. So, if you don't aim to be robust to employees, it might be much harder for employees to evaluate the level of security and then complain about not meeting requirements[1].
Anthropic's justification and why I disagree
Anthropic justified the change by