AI safety
Studying and reducing the existential risks posed by advanced artificial intelligence

Quick takes

13 · 5d
Here's some quick takes on what you can do if you want to contribute to AI safety or governance (they may generalise, but no guarantees). Paraphrased from a longer talk I gave, transcript here.

* First, there's still tons of alpha left in having good takes.
  * (Matt Reardon originally said this to me and I was like, "what, no way", but now I think he was right and this is still true – thanks Matt!)
  * You might be surprised, because there are many people doing AI safety and governance work, but I think there's still plenty of demand for good takes, and you can distinguish yourself professionally by being a reliable source of them.
* But how do you have good takes?
  * I think the thing you do to form good takes, oversimplifying only slightly, is you read Learning by Writing and you go "yes, that's how I should orient to the reading and writing that I do," and then you do that a bunch of times with your reading and writing on AI safety and governance work, and then you share your writing somewhere and have lots of conversations with people about it and change your mind and learn more, and that's how you have good takes.
* What to read?
  * Start with the basics (e.g. BlueDot's courses, other reading lists), then work from there on what's interesting x important.
* Write in public
  * Usually, if you haven't got evidence of your takes being excellent, it's not that useful to just generally voice your takes. I think having takes and backing them up with some evidence, or saying things like "I read this thing, here's my summary, here's what I think" is useful. But it's kind of hard to get readers to care if you're just like "I'm some guy, here are my takes."
* Some especially useful kinds of writing
  * In order to get people to care about your takes, you could do useful kinds of writing first, like:
    * Explaining important concepts
      * E.g., evals awareness, non-LLM architectures (should I care? why?), AI control, best arguments for/against sho…
9 · 7d · 3
EA Connect 2025: Personal Takeaways

Background

I'm Ondřej Kubů, a postdoctoral researcher in mathematical physics at ICMAT Madrid, working on integrable Hamiltonian systems. I've engaged with EA ideas since around 2020—initially through reading and podcasts, then ACX meetups, and from 2023 more regularly with Prague EA (now EA Madrid after moving here). I took the GWWC 10% pledge during the event.

My EA focus is longtermist, primarily AI risk. My mathematical background has led me to take seriously arguments that alignment of superintelligent AI may face fundamental verification problems, and that current trajectories pose serious catastrophic risk. This shapes my donations toward governance and advocacy rather than technical alignment. I'm not ready to pivot careers at this stage—I'm contributing through donations while continuing in mathematics. I attended EA Connect during a job search, so sessions on career strategy and donation prioritization were particularly relevant.

On donation strategy

Joseph Savoie's talk Twice as Good introduced the POWERS framework for improving donation impact: Price Tag (know the cost per outcome), Options (compare alternatives), Who (choose the right evaluator), Evaluate (use concrete benchmarks), Reduce (minimize burden on NGOs), Substance (focus on how charities work, not presentation). The framework is useful but clearly aimed at large donors—"compare 10+ alternatives" and "hire someone to evaluate" aren't realistic for someone donating 10% of a postdoc salary.

The "Price Tag" slide was striking: what $1 million buys across cause areas—200 lives saved via malaria nets, 3 million farmed animals helped through advocacy, 6.1 gigatons of CO₂ mitigation potential through agrifood reform. But the X-Risk/AI line only specified inputs ("fund 3-4 research projects"), not outcomes. This reflects the illegibility problem I asked about in office hours: how do you evaluate AI governance donations? Savoie acknowledged he doesn't donate much…
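To make the "Price Tag" step concrete, a small back-of-the-envelope Python sketch (illustrative only, not from the talk or the post) that turns the quoted $1 million figures into implied cost per outcome:

```python
# Implied cost per outcome from the "Price Tag" figures quoted above
# (back-of-the-envelope arithmetic only; figures are as reported, not verified).
budget = 1_000_000  # dollars

outcomes = {
    "lives saved (malaria nets)": 200,
    "farmed animals helped (advocacy)": 3_000_000,
}

for outcome, count in outcomes.items():
    print(f"{outcome}: ~${budget / count:,.2f} per unit")
# -> roughly $5,000 per life saved and $0.33 per animal helped.
# The X-Risk/AI line resists this treatment because only inputs
# ("fund 3-4 research projects") were specified, not outcomes.
```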
0 · 17d · 8
Slight update to the odds I've been giving to the creation of artificial general intelligence (AGI) before the end of 2032. I've been anchoring the numerical odds of this to the odds of a third-party candidate like Jill Stein or Gary Johnson winning a U.S. presidential election. That's something I think is significantly more probable than AGI by the end of 2032. Previously, I'd been using 0.1% or 1 in 1,000 as the odds for this, but I was aware that these odds were probably rounded. I took a bit of time to refine this.

I found that in 2016, FiveThirtyEight put the odds of Evan McMullin — who was running as an independent, not for a third party, but close enough — becoming president at 1 in 5,000 or 0.02%. Even these odds are quasi-arbitrary, since McMullin only became president in simulations where neither of the two major-party candidates won a majority of Electoral College votes. In such scenarios, Nate Silver arbitrarily put the odds at 10% that the House would vote to appoint McMullin as the president.

So, for now, it is more accurate for me to say: the probability of the creation of AGI before the end of 2032 is significantly less than 1 in 5,000 or 0.02%. I can also expand the window of time from the end of 2032 to the end of 2034. That's a small enough expansion that it doesn't affect the probability much. Extending the window to the end of 2034 covers the latest dates that have appeared on Metaculus since the big dip in its timeline that happened in the month following the launch of GPT-4. By the end of 2034, I still put the odds of AGI significantly below 1 in 5,000 or 0.02%. My confidence level is over 95%. [Edited Nov. 28, 2025 at 3:06pm Eastern. See comments below.]

I will continue to try to find other events to anchor my probability to. It's difficult to find good examples. An imperfect point of comparison is an individual's annual risk of being struck by lightning, which is 1 in 1.22 million. Over 9 years, the risk is about 1 in 135,000. Since the cre…
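As a sanity check on the lightning comparison, a minimal Python sketch (illustrative only, not from the original post) that computes the cumulative risk from the quoted 1-in-1.22-million annual figure, assuming independent years:

```python
# Cumulative probability of at least one lightning strike over 9 years,
# assuming independent years at an annual risk of 1 in 1.22 million
# (the figure quoted in the post).
annual_risk = 1 / 1_220_000
years = 9

cumulative = 1 - (1 - annual_risk) ** years
print(f"9-year risk: {cumulative:.3e} (about 1 in {1 / cumulative:,.0f})")
# -> roughly 7.4e-06, i.e. about 1 in 135,000, matching the post's figure.

# For comparison, the 1-in-5,000 McMullin anchor:
print(f"1 in 5,000 = {1 / 5000:.2%}")
```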
3 · 18d
Self-driving cars are not close to being solved. Don't take my word for it. Listen to Andrej Karpathy, the lead AI researcher responsible for the development of Tesla's Full Self-Driving software from 2017 to 2022. (Karpathy also did two stints as a researcher at OpenAI, taught a deep learning course at Stanford, and coined the term "vibe coding".)

From Karpathy's October 17, 2025 interview with Dwarkesh Patel:

Karpathy elaborated later in the interview:

I hope the implication for discussions around AGI timelines is clear.
4 · 24d · 2
Your help requested: I'm seeking second opinions on whether my contention in Edit #4 at the bottom of this post is correct or incorrect. See the edit at the bottom of the post for full details.

Brief info:

* My contention is about the Forecasting Research Institute's recent LEAP survey.
* One of the headline results from the survey is about the probabilities the respondents assign to each of three scenarios.
* However, the question uses an indirect framing — an intersubjective resolution or metaprediction framing.
* The specific phrasing of the question is quite important.
* My contention is, if respondents took the question literally, as written, they did not actually report their probabilities for each scenario, and there is no way to derive their probabilities from what they did report.
* Therefore, the headline result that states the respondents' probabilities for the three scenarios is not actually true.

If my contention is right, then it means the results of the report are being misreported in a quite significant way. If my contention is wrong, then I must make a mea culpa and apologize to the Forecasting Research Institute for my error. So, your help requested. Am I right or wrong?

(Note: the post discusses multiple topics, but here I'm specifically asking for opinions on the intersubjective resolution/metaprediction concern raised in Edit #4.)
0 · 1mo · 2
Yann LeCun (a Turing Award-winning pioneer of deep learning) leaving Meta AI — and probably, I would surmise, being nudged out by Mark Zuckerberg (or another senior Meta executive) — is a microcosm of everything wrong with AI research today.

LeCun is the rare researcher working on fundamental new ideas to push AI forward at the paradigm level. Zuckerberg et al. seem to be abandoning that kind of work to focus on a mad dash to AGI via LLMs, on the view that enough scaling and enough incremental engineering and R&D will push current LLMs all the way to AGI, or at least to very powerful, very economically transformative AI.

I predict that in five years or so, this will be seen in retrospect (by many people, if not by everyone) as an incredibly wasteful mistake by Zuckerberg, and also by other executives at other companies (and the investors in them) making similar decisions. The amount of capital being spent on LLMs is eye-watering and could fund a lot of fundamental research, some of which could have turned up ideas that would actually lead to useful, economically and socially beneficial technology.
3 · 1mo · 3
I felt very discouraged when I heard that there were over 1,300 applications for the GovAI winter fellowship. But now I'm frankly appalled to hear that there were over 7,500 applications for the 2025 FIG Winter Fellowship. Should we officially declare that AI governance is oversaturated, and not recommend this career path except for the ultra-talented?
0 · 1mo · 3
A universally provably safe artificial general intelligence is not possible, and the reasoning begins with the halting problem. In 1936, Alan Turing proved that no algorithm can determine, for every possible program and input, whether that program will eventually stop running or run forever. The importance of the halting problem is that it shows there are limits on what can be predicted about the future behavior of general computational systems.

The next key result is Rice's theorem, which states that any non-trivial question about what a program will eventually do is undecidable if the program is powerful enough to represent arbitrary computation. This includes questions such as whether a program will ever produce a certain output, ever enter a certain state, or always avoid a specific class of behaviors.

A highly capable artificial intelligence system, particularly one with general reasoning ability, falls into this category. Such a system is computationally expressive enough to learn new strategies, modify its own internal structure, and operate in environments that cannot be fully anticipated. Asking whether it will always behave safely is mathematically equivalent to asking whether a general program will always avoid certain behaviors. Rice's theorem shows that there is no universal method to answer such questions correctly in all cases.

Quantum computing does not change this conclusion. Although quantum computation can accelerate certain classes of algorithms, it does not convert undecidable problems into decidable ones. The halting problem and Rice's theorem apply to quantum computers just as they apply to classical computers.

Provable safety is possible only when artificial intelligence systems are restricted. If the system cannot self-modify, if its environment is fully defined, or if its components are formally verified, then proofs can be constructed. These proofs apply only within the specific boundaries that can be modeled and checked.
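To make the shape of the argument concrete, a minimal Python sketch of the standard reduction from the halting problem (illustrative only, not from the quick take; `always_safe`, `halts`, and `do_unsafe_action` are hypothetical names, and the assumed verifier deliberately cannot be implemented):

```python
# Sketch of the reduction: assume, for contradiction, that `always_safe`
# were a total, always-correct decider of "this program never performs
# an unsafe action". It would then solve the halting problem.

def do_unsafe_action() -> None:
    """Stand-in for whatever behavior the safety property forbids."""
    pass

def always_safe(program, argument) -> bool:
    """Hypothetical universal safety verifier. No such total, always-correct
    decider can exist, so this placeholder just raises."""
    raise NotImplementedError("assumed for contradiction only")

def halts(program, argument) -> bool:
    """If `always_safe` existed, halting would become decidable."""
    def gadget(_ignored) -> None:
        program(argument)    # runs forever if program(argument) never halts
        do_unsafe_action()   # reached (hence "unsafe") only if it halts
    # `gadget` is unsafe exactly when program(argument) halts, so a
    # universal safety decider would double as a halting decider,
    # contradicting Turing's 1936 result.
    return not always_safe(gadget, None)
```

The same construction drives Rice's theorem: any non-trivial behavioral property can play the role of "performs an unsafe action", which is why restricting the system or its environment is what makes safety proofs tractable.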