AI safety
Studying and reducing the existential risks posed by advanced artificial intelligence

Quick takes

18
8h
Both Sam and Dario saying that they now believe they know how to build AGI seems like an underrated development to me. To my knowledge, they only started saying this recently. I suspect they are overconfident, but it still seems like a more significant indicator than many people are tracking.
6
1d
Are you, or someone you know: 1) great at building (software) companies, 2) someone who cares deeply about AI safety, and 3) open to talking about an opportunity to work together on something? If so, please DM me with your background. If someone comes to mind, please also DM. I am thinking about ways to build companies that can fund AI safety work.
23
14d
One challenge in AI safety field-building is that otherwise-useful resources – like lists of courses or funders or local groups – generally become outdated over time. We’ve tried to solve this by collecting a bunch of resources together at AISafety.com and dedicating considerable bandwidth to keeping them updated.

Until recently, this maintenance has been largely ad hoc, making additions and changes as we learned of them. To ensure nothing slips through the cracks, we’ve now added a schedule for doing thorough sweeps through the entire database for each resource. Below is our current plan:

* Courses
  * Every 3 months: general sweep
* Communities
  * Every 3 months: general sweep
  * (maybe) Every 6 months: request update from organisers
* Projects
  * Every 3 months: general sweep
  * Every 6 months: request update from owners of active projects
* Jobs
  * [This is a filtered subset of 80k’s database and updates automatically]
* Events & training
  * Twice weekly: check for new events and programs
  * Every 2 weeks: add any dates previously unannounced and check for changes to application deadlines
* Funders
  * Every 2 weeks: check for changes to “applications open/closed” status
  * Every 3 months: general sweep
* Landscape map
  * Every 1 month: check no links are broken
  * Every 3 months: general sweep
* Donation guide
  * Every 3 months: check no links are broken
  * Every 6 months: review entire guide
* Speak to an Advisor
  * Every 3 months: general sweep

We’re also continuing to make immediate updates whenever we become aware of them. In other words, this is just the minimum you can expect for regular maintenance. If you spot a correction or want to add something new, please get in touch via the form on the relevant resource page. Our goal is to keep AISafety.com’s resources as accurate and up to date as possible.
45
2mo
16
Some of my thoughts on funding.

It's giving season and I want to finally get around to publishing some of my thoughts and experiences around funding. I haven't written anything yet because I feel like I am mostly just revisiting painful experiences and will end up writing some angry rant. I have ideas for how things could be better, so hopefully this can lead to positive change, not just more complaining. All my experiences are in AI safety.

On timing: Certainty is more important than speed. The total decision time is less important than the overdue time. Expecting a decision in 30 days and getting it in 35 days is worse than expecting the decision in 90 days and getting it in 85 days.

Grantmakers providing statistics about timing expectations makes things worse. If the mean or median response time is N days and it is now day N+5, is it appropriate for me to send a follow-up email to check on the status? Technically it's not late yet. It could come tomorrow, or in N more days. Imagine if the Uber app showed you the global mean wait time for the last 12 months and there was no map to track your driver's arrival. "It doesn't have to reduce the waiting time, it just has to reduce the uncertainty" - Rory Sutherland

My conversations about expectations and experiences with people in Berkeley are at times very different from those with people outside of Berkeley.

After I posted my announcement about shutting down AISS and my comment on the LTFF update, several people reached out to me about their experiences. Some people I already knew well, some I had met, and others I didn't know before. Some of them had received funding a couple of times, but their negative experiences led them to not reapply and to walk away from their work or the ecosystem entirely. At least one mentioned having a draft post about their experience that they did not feel comfortable publishing. There was definitely a point for me where I had already given up but just not realised it. I had already run out of funding
52
2mo
2
I'd love to see an 'Animal Welfare vs. AI Safety/Governance Debate Week' happening on the Forum. The AI risk cause has grown massively in importance in recent years, and has become a priority career choice for many in the community. At the same time, the Animal Welfare vs. Global Health Debate Week demonstrated just how important and neglected the cause of animal welfare remains. I know several people (including myself) who are uncertain or torn about whether to pursue careers focused on reducing animal suffering or on mitigating existential risks from AI. It would help to have rich discussions comparing both causes' current priorities and bottlenecks, and a debate week would hopefully surface some useful crucial considerations.
3
2d
I'm interested in chatting with any civil servants, ideally in the UK, who are keen on improving decision-making in their teams/area, potentially through forecasting techniques and similar methods. If you'd be interested in chatting, please DM me!
17
1mo
1
Anthropic's Twitter account was hacked. It's "just" a social media account, but it raises some concerns. Update: the post has just been deleted. They post updates on their status page: https://status.anthropic.com/
12
1mo
Isn't mechinterp basically setting out to build tools for AI self-improvement?

One of the things people are most worried about is AIs recursively improving themselves. (Whether all people who claim this kind of thing as a red line will actually treat it as a red line is a separate question for another post.) It seems to me like mechanistic interpretability is basically a really promising avenue for that.

Trivial example: Claude decides that the most important thing is being the Golden Gate Bridge. Claude reads up on Anthropic's work, gets access to the relevant tools, and does brain surgery on itself to turn into Golden Gate Bridge Claude.

More meaningfully, it seems like any ability to understand in a fine-grained way what's going on in a big model could be co-opted by an AI to "learn" in some way. In general, I think the case that seems most likely soonest is:

* Learn in-context (e.g. results of experiments, feedback from users, things like we've recently observed in scheming papers...)
* Translate this into appropriate adjustments to weights (identified using mechinterp research)
* Execute those adjustments

Maybe I'm late to this party and everyone was already conceptualising mechinterp as a very dual-use technology, but I'm here now. Honestly, maybe it leans more towards "offense" (i.e. catastrophic misalignment) than defense! It will almost inevitably require automation to be useful, so we're ceding it to machines out of the gate. I'd expect tomorrow's models to be better placed than humans to make sense of and use mechinterp techniques, partly just because of sheer compute, but also maybe (and now I'm speculating on stuff I understand even less) because the nature of their cognition is more suited to what's involved.