
Summary: Patching all exploitable vulnerabilities in the open-source software that forms the backbone of the internet would be hard on maintainers, less effective than one might think, and expensive (Fermi estimate included; 5%/50%/95% cost ~$31 mio./~$1.9 bio./~$168 bio.). It's unclear who'll be willing to pay for that.

Preventative measures discussed for averting an AI takeover attempt include hardening the software infrastructure of the world against attacks. The plan is to use lab-internal (specialized?) software-engineering AI systems to submit patches that fix all findable security vulnerabilities in open-source software (think a vastly expanded and automated version of Project Zero), likely in partnership with companies developing internet-critical software (the likes of Cisco & Huawei).

I think that plan is net-positive. I also think it has some pretty glaring open problems (in ascending order of exigency): (1) maintainer overload and response times, (2) hybrid hardware/software vulnerabilities, and (3) cost as a public good (also known as "who's gonna pay for it?").

Maintainer Overload

If transformative AI is developed soon, most open-source projects (especially old ones relevant to internet infrastructure) will still be maintained by humans with human response times. That will significantly increase the time it takes for security patches to be reviewed and merged into existing codebases, especially if attackers are simultaneously submitting subtle exploits generated or co-developed with AI systems six to nine months behind the leading capabilities, forcing maintainers to stay especially vigilant.
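To make the overload concrete, here's a toy M/M/1 queueing sketch. This is purely illustrative: the patch and review rates below are made-up assumptions, not data about any real project.

```python
# Toy M/M/1 queue model of maintainer overload -- illustrative only;
# the arrival and review rates below are made-up assumptions.
def mean_patch_latency_days(patches_per_week: float, reviews_per_week: float) -> float:
    """Mean time a patch spends queued + under review (M/M/1: W = 1/(mu - lambda))."""
    lam = patches_per_week / 7.0  # arrival rate, patches per day
    mu = reviews_per_week / 7.0   # review rate, patches per day
    if lam >= mu:
        raise ValueError("queue diverges: patches arrive faster than they're reviewed")
    return 1.0 / (mu - lam)

# A maintainer reviewing 5 patches/week handles a trickle of 2/week fine...
print(mean_patch_latency_days(2, 5))    # ~2.3 days
# ...but an AI pipeline submitting 4.9/week makes the same queue explode.
print(mean_patch_latency_days(4.9, 5))  # ~70 days
```

The point is the nonlinearity: patch latency doesn't grow proportionally with submission volume, it blows up as the review queue saturates.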

Hybrid and Hardware Vulnerabilities

My impression is that vulnerabilities are moving from software-only issues towards very low-level microcode or software/hardware hybrid vulnerabilities (e.g. Hertzbleed, Spectre, Meltdown, Rowhammer, Microarchitectural Data Sampling, …), for which software fixes, if they exist at all, carry pretty bad performance penalties. GPU-level vulnerabilities get less attention, but they absolutely exist, e.g. LeftoverLocals and JellyFish. My best guess is that cutting-edge GPUs are much less secure than CPUs, since they've received less scrutiny from researchers and their documentation is less easily accessible. (They probably carry less cruft from bad design choices made early in computing history.) Hence: software-only vulnerabilities are easy to fix, software/hardware hybrids are more painful to fix, and pure hardware vulnerabilities escape quick fixes entirely (in the extreme demanding a recall, like the Pentium FDIV bug). And don't get me started on the vulnerabilities lurking in human psychology, which are basically impossible to patch on short time-scales…

Who Pays?

Finding vulnerabilities in all the relevant security infrastructure of the internet and fixing them might be expensive. 1 mio. input tokens for Gemini 2.0 Flash cost $0.15 ($0.60 for output tokens), but a model able to find & fix security vulnerabilities is going to be more expensive. An AI-generated, me-adjusted Squiggle model estimates that it'd cost ~$1.9 bio. (median estimate) to fix most vulnerabilities in open-source software (90% confidence interval: ~$31 mio. to ~$168 bio.; the mean estimated cost is… gulp… ~$140 bio.).

(I think the analysis under-estimates the cost because it doesn't consider setting up the project, paying human supervisors and reviewers, the cost of testing infrastructure & compute, or finding the complicated vulnerabilities that arise from interactions between different programs…)
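For readers who want to poke at the numbers, here is a minimal Monte Carlo sketch of this kind of Fermi estimate in Python. To be clear, this is not the Squiggle model referenced above: the decomposition and every 90% interval below are placeholder assumptions of mine, so the outputs won't reproduce the post's exact percentiles.

```python
# Minimal Monte Carlo Fermi sketch -- NOT the author's Squiggle model;
# every 90% interval below is a placeholder assumption.
import numpy as np

rng = np.random.default_rng(0)
N = 200_000  # Monte Carlo samples

def lognormal_90ci(low: float, high: float) -> np.ndarray:
    """Draw from a lognormal parameterized by its 90% confidence interval."""
    mu = (np.log(low) + np.log(high)) / 2
    sigma = (np.log(high) - np.log(low)) / (2 * 1.645)
    return rng.lognormal(mu, sigma, N)

relevant_loc   = lognormal_90ci(1e8, 1e10)  # lines of security-critical OSS code
vulns_per_kloc = lognormal_90ci(0.1, 2.0)   # findable vulnerabilities per 1,000 lines
usd_per_vuln   = lognormal_90ci(2e2, 2e4)   # inference cost to find, patch, and
                                            # validate one vulnerability (many retries)

total_usd = (relevant_loc / 1_000) * vulns_per_kloc * usd_per_vuln

for q in (5, 50, 95):
    print(f"{q:>2}th percentile: ${np.percentile(total_usd, q):,.0f}")
print(f"mean: ${np.mean(total_usd):,.0f}")
```

Because the factors are lognormal, their product is heavy-tailed, which is why the mean lands so far above the median in estimates like this.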

It was notable when Google paid $600k for open-source fuzzing, so north of ~$1.9 bio. is going to be… hefty. The discussion of this plan has been pretty far-mode, along the lines of "surely somebody is going to do that when it's so easy", with far fewer remarks about the expense and who'll carry the burden. For comparison, the 6-year budget for Horizon Europe (which funds, as a tiny part of its portfolio, open-source projects like PeerTube and the Eclipse Foundation) is 93.5 bio. €, and the EU Next Generation Internet programme has spent 250 mio. € (2018–2020) + 62 mio. € (2021–2022) + 27 mio. € (2023–2025) ≈ 339 mio. € on funding open-source software.

Another consideration is that this project would need to be finished quickly (potentially in less than a year) as open-weights models catch up and frontier models become more dangerous. So humanity will not be able to wait for frontier models to become cheaper to bring the cost down: as soon as automated vulnerability finding becomes cheap enough, both attackers and defenders will be in a race to exploit the vulnerabilities it turns up.

So, a proposal: Whenever someone claims that LLMs will d/acc us out of AI takeover by fixing our infrastructure, they will also have to specify who will pay the costs of setting up this project and running it.

Comments (3)



Not draft amnesty but I'll take it. Yell at me below to get my justification for the variable-values in the Fermi estimate.

While this is a good argument against it happening by default through governance (if people are saying that), securing longtermist funding to work with the free software community on this (thus overcoming two of the three hurdles) still seems worth looking into as a potentially very cost-effective way to reduce AI risk, particularly combined with differential technological development of AI defensive vs. offensive capacities.

That's maybe a more productive way of looking at it! Makes me glad I estimated more than I claimed.

I think governments are probably the best candidate for funding this, or AI companies in cooperation with governments. And it's an intervention which has limited downside and is easy to scale up/down, with the most important software being evaluated first.
