AI safety
Studying and reducing the existential risks posed by advanced artificial intelligence

Quick takes

4 · 3d
alignment is a conversation between developers and the broader field. all domains are conversations between decision-makers and everyone else: “here are important considerations you might not have been taking into account. here is a normative prescription for you.” “thanks, i had been considering that to 𝜀 extent. i will {implement it because x / not implement it because y / implement z instead}.” these are the two roles i perceive. how does one train oneself to be the best at either? sometimes, conversations at EAG center around ‘how to get a job’, whereas i feel they ought to center around ‘how to make oneself significantly better than the second-best candidate’.
9 · 8d · 6
Is the recent partial lifting of US chip export controls on China (see e.g. here: https://thezvi.substack.com/p/selling-h200s-to-china-is-unwise) good or bad for humanity? I’ve seen many takes from people whose judgment I respect arguing that it is very bad, but their arguments, imho, just don’t make sense. What am I missing? For transparency, I am neither Chinese nor American, nor am I a paid agent of either country. I am not at all confident in this take, but imho someone should make it.

I see two possible scenarios: A) you are not sure how close humanity is to developing superintelligence in the Yudkowskian sense. This is what I believe, and what many smart opponents of the Trump administration’s move to ease chip controls believe. Or B) you are pretty sure that humanity is not going to develop superintelligence any time soon, let’s say in the next century. I admit that the case against the lifting of chip controls is stronger under B), though I am ultimately inclined to reject it in both scenarios.

Why is the easing of chip controls, imho, a good idea if the timeline to superintelligence might be short? If superintelligence is around the corner, here is what should be done: an immediate international pause of AI development until we figure out how to proceed. Competitive pressures and the resulting prisoner’s dilemmas have been identified as the factor that might push us toward NOT pausing even when it would be widely recognized that the likely outcome of continuing is dire. There are various relevant forms of competition, but plausibly the most important is that between the US and China. In order to reduce competitive dynamics and thus prepare the ground for a cooperative pause, it is important to build trust between the parties and beware of steps that are hostile, especially in domains touching AI. Controls make sense only if you are very confident that superintelligence developed in the US, or perhaps in liberal democracy more generally, is going to turn out well for humanity.
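A minimal sketch of the prisoner's-dilemma structure this take appeals to, with entirely invented payoff numbers; the players, actions, and payoffs are illustrative assumptions, not anything from the post:

```python
# Illustrative only: hypothetical payoffs for a two-player "pause vs. race" game,
# loosely echoing the US-China dynamic described above. The numbers are invented
# to exhibit the prisoner's-dilemma structure, not estimated from anything real.

# Payoffs as (row player, column player); higher is better.
payoffs = {
    ("pause", "pause"): (3, 3),   # cooperative pause: good for both
    ("pause", "race"):  (0, 4),   # the side that races alone gains an edge
    ("race",  "pause"): (4, 0),
    ("race",  "race"):  (1, 1),   # mutual race: worse for both than a joint pause
}

def best_response(opponent_action: str) -> str:
    """Row player's payoff-maximising action against a fixed opponent action."""
    return max(("pause", "race"), key=lambda a: payoffs[(a, opponent_action)][0])

# Racing is the best response whatever the other side does, even though
# (pause, pause) beats (race, race) for both players.
print(best_response("pause"))  # -> race
print(best_response("race"))   # -> race
```

The point is purely structural: under payoffs like these each side races regardless of what the other does, which is why the post argues that trust-building, rather than unilaterally hostile moves, is what makes a cooperative pause reachable.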
27 · 12d · 2
The AI Eval Singularity is Near

* AI capabilities seem to be doubling every 4-7 months
* Humanity's ability to measure capabilities is growing much more slowly
* This implies an "eval singularity": a point at which capabilities grow faster than our ability to measure them
* It seems like the singularity is ~here in cybersecurity, CBRN, and AI R&D (supporting quotes below)
* It's possible that this is temporary, but the people involved seem pretty worried

Appendix - quotes on eval saturation

Opus 4.6
* "For AI R&D capabilities, we found that Claude Opus 4.6 has saturated most of our automated evaluations, meaning they no longer provide useful evidence for ruling out ASL-4 level autonomy. We report them for completeness, and we will likely discontinue them going forward. Our determination rests primarily on an internal survey of Anthropic staff, in which 0 of 16 participants believed the model could be made into a drop-in replacement for an entry-level researcher with scaffolding and tooling improvements within three months."
* "For ASL-4 evaluations [of CBRN], our automated benchmarks are now largely saturated and no longer provide meaningful signal for rule-out (though as stated above, this is not indicative of harm; it simply means we can no longer rule out certain capabilities that may be prerequisites to a model having ASL-4 capabilities)."
* It also saturated ~100% of the cyber evaluations

Codex-5.3
* "We are treating this model as High [for cybersecurity], even though we cannot be certain that it actually has these capabilities, because it meets the requirements of each of our canary thresholds and we therefore cannot rule out the possibility that it is in fact Cyber High."
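To make the headline numbers concrete, here is a rough sketch of what a 4-7 month doubling time implies for annual capability growth, assuming a simple constant-doubling exponential model; that model, and the code itself, are illustrative assumptions of this sketch, not something taken from the post or the quoted system cards:

```python
# Rough arithmetic for the claim above, assuming capabilities grow exponentially
# with a fixed doubling time (an assumption of this sketch).

def annual_growth_factor(doubling_time_months: float) -> float:
    """Capability multiplier per year implied by a fixed doubling time in months."""
    return 2 ** (12 / doubling_time_months)

for months in (4, 7):
    print(f"doubling every {months} months -> ~{annual_growth_factor(months):.1f}x per year")

# doubling every 4 months -> ~8.0x per year
# doubling every 7 months -> ~3.3x per year
# If evaluation development improves much more slowly than this, the gap between
# what models can do and what evals can measure widens each year.
```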
4 · 19d
I'm researching how safety frameworks of frontier labs (Anthropic RSP, OpenAI Preparedness Framework, DeepMind FSF) have changed between versions. Before I finish the analysis, I'm collecting predictions to compare with actual findings later. 5 quick questions. Questions Disclaimer: please take it with a grain of salt, questions drafted quickly with AI help, treating this as a casual experiment, not rigorous research. Thanks if you have a moment
3 · 24d
We would welcome guest posts for the Windfall Trust's AI Economics Brief here! If you're interested in summarizing your latest research for a broader audience of decision-makers, or want to share a thoughtful take, this is an opportunity to do so!
-4 · 25d
[Adapted from this comment.]

Two pieces of evidence commonly cited for near-term AGI are AI 2027 and the METR time horizons graph. AI 2027 is open to multiple independent criticisms, one of which is its use of the METR time horizons graph to forecast near-term AGI or AI capabilities more generally. Using the METR graph to forecast near-term AGI or AI capabilities more generally is not supported by the data and methodology used to make the graph.

Two strong criticisms that apply specifically to the AI 2027 forecast are:

* It depends crucially on the subjective intuitions or guesses of the authors. If you don't personally share the authors' intuitions, or don't personally trust that the authors' intuitions are likely correct, then there is no particular reason to take AI 2027's conclusions seriously.
* Credible critics claim that the headline results of the AI 2027 timelines model are largely baked in by the authors' modelling decisions, irrespective of what data the model uses. That means, to a large extent, AI 2027's conclusions are not actually determined by the data they use. We already saw with the previous bullet point that the conclusions of AI 2027 are largely a restatement of the authors' personal and contestable beliefs. This is another way in which AI 2027's conclusions are, effectively, a restatement of the pre-existing beliefs or assumptions that the authors chose to embed in their timelines model.

AI 2027 is largely based on extrapolating the METR time horizons graph. The following criticisms of the METR time horizons graph therefore extend to AI 2027:

* Some of the serious problems and limitations of the METR time horizons graph are sometimes (but not always) clearly disclosed by METR employees. Note the wide difference between the caveated description of what the graph says and the interpretation of the graph as a strong indicator of rapid, exponential improvement in general AI capabilities.
* Gary Marcus, a cognitive scientist and AI researcher [...]
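As an illustration of the "results are baked in by modelling decisions" criticism, here is a toy sensitivity check showing how a projected date for reaching a long task horizon swings with two forecaster-supplied parameters. Every number in it (the starting horizon, target horizon, doubling time, and the superexponential "shrink" factor) is a hypothetical placeholder, not a figure from METR or AI 2027:

```python
from datetime import date, timedelta

# Toy sensitivity check (all numbers hypothetical): how much a projected date for
# reaching a long task horizon moves with two forecaster-supplied choices -- the
# assumed doubling time, and whether doubling times themselves shrink over time
# (a superexponential assumption).

def months_to_target(current_hours, target_hours, doubling_months, shrink=1.0):
    """Months until the time horizon reaches target_hours, doubling every
    doubling_months, with each successive doubling taking shrink times as long."""
    months, step, horizon = 0.0, doubling_months, current_hours
    while horizon < target_hours:
        horizon *= 2
        months += step
        step *= shrink
    return months

start = date(2026, 1, 1)                   # placeholder "today"
current_hours, target_hours = 4.0, 167.0   # ~half a working day now, ~one working month as target

for doubling, shrink in [(7, 1.0), (4, 1.0), (4, 0.9)]:
    months = months_to_target(current_hours, target_hours, doubling, shrink)
    eta = start + timedelta(days=30.4 * months)
    print(f"doubling={doubling}mo, shrink={shrink}: target horizon reached ~{eta.year}-{eta.month:02d}")
```

Under these made-up inputs the projected date ranges across several years purely as a function of which curve the forecaster assumes, which is the sense in which the headline result is driven by the modelling choices rather than by the underlying data points.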
1 · 1mo
Technical Alignment Research Accelerator (TARA) applications close today! Last chance to apply to join the 14-week program (remotely taught, run in person, and based on the ARENA curriculum) designed to accelerate APAC talent towards meaningful technical AI safety research. TARA is built so you can learn around full-time work or study, attending meetings in your home city on Saturdays and doing independent study throughout the week. Finish the program with a project to add to your portfolio, key technical AI safety skills, and connections across APAC. See this post for more information and apply through our website here.
9 · 1mo · 4
Are there any signs of governments beginning to do serious planning for the need for Universal Basic Income (UBI) or a negative income tax? It feels like there's a real lack of urgency/rigour in policy engagement within government circles. The concept has obviously had its high-level advocates à la Altman, but it still feels incredibly distant as any form of reality.

Meanwhile the impact is being seen in job markets right now: in the UK, graduate job openings have plummeted in the last 12 months. People I know with elite academic backgrounds are having a hard enough time finding jobs, let alone the vast majority of people who went to average universities. This is happening today, before there is any consensus that AGI has arrived and before mass displacement in mid-career job markets is widely recognised. The impact is happening now, but preparation for major policy intervention under current fiscal scenarios seems really far off.

If governments do view the risk of major employment market disruption as a realistic possibility (which I believe in many cases they do), are they planning for interventions behind the scenes? Or do they view the problem as too big to address until it arrives, preferring rapid response over careful planning in the way the COVID emergency fiscal interventions emerged?

Would be really interested to hear of any good examples of serious thinking/preparation on how some form of UBI could be planned for (logistically and fiscally) in the near-term 5-year horizon.
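For readers unfamiliar with the negative income tax mentioned above, here is a stylised sketch of how such a schedule works; the guarantee and taper values are invented purely for illustration and are not a policy proposal or anything from the post:

```python
# Hypothetical illustration of a negative income tax (NIT) schedule of the kind
# mentioned above. The guarantee and taper rate are made-up parameters.

GUARANTEE = 12_000   # annual payment to someone with zero earnings (made up)
TAPER = 0.5          # benefit withdrawn at 50p per GBP earned (made up)

def nit_payment(earnings: float) -> float:
    """Top-up paid on top of earnings; tapers to zero at GUARANTEE / TAPER."""
    return max(0.0, GUARANTEE - TAPER * earnings)

for earnings in (0, 10_000, 20_000, 30_000):
    top_up = nit_payment(earnings)
    print(f"earnings {earnings:>6}: top-up {top_up:>7.0f}, total {earnings + top_up:>7.0f}")
```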