AI safety
Studying and reducing the existential risks posed by advanced artificial intelligence

Quick takes

37 · 6d · 5
I'm not sure how to word this properly, and I'm uncertain about the best approach to this issue, but I feel it's important to get this take out there.

Yesterday, Mechanize was announced, a startup focused on developing virtual work environments, benchmarks, and training data to fully automate the economy. The founders include Matthew Barnett, Tamay Besiroglu, and Ege Erdil, who are leaving (or have left) Epoch AI to start this company.

I'm very concerned we might be witnessing another situation like Anthropic, where people with EA connections start a company that ultimately increases AI capabilities rather than safeguarding humanity's future. But this time, we have a real opportunity for impact before it's too late. I believe this project could potentially accelerate capabilities, increasing the odds of an existential catastrophe.

I've already reached out to the founders on X, but perhaps there are people more qualified than me who could speak with them about these concerns. In my tweets to them, I expressed worry about how this project could speed up AI development timelines, asked for a detailed write-up explaining why they believe this approach is net positive and low risk, and suggested an open debate on the EA Forum. While their vision of abundance sounds appealing, rushing toward it might increase the chance we never reach it due to misaligned systems.

I personally don't have a lot of energy or capacity to work on this right now, nor do I think I have the required expertise, so I hope that others will pick up the slack. It's important we approach this constructively and avoid attacking the three founders personally. The goal should be productive dialogue, not confrontation.

Does anyone have thoughts on how to productively engage with the Mechanize team? Or am I overreacting to what might actually be a beneficial project?
31 · 18d · 16
The current US administration is attempting an authoritarian takeover. This takes years and might not be successful. My Manifold question puts an attempt to seize power if they lose legitimate elections at 30% (n=37). I put it much higher.[1]

Not only is this concerning by itself, it also incentivizes them to achieve a decisive strategic advantage over pro-democracy factions via superintelligence. As a consequence, they may be willing to rush and cut corners on safety.

Crucially, this relies on them believing superintelligence can be achieved before a transfer of power. I don't know how far the belief in superintelligence has spread into the administration. I don't think Trump is 'AGI-pilled' yet, but maybe JD Vance is? He made an accelerationist speech. Making them more AGI-pilled and advocating for nationalization (as Aschenbrenner did last year) could be very dangerous.

1. ^ So far, my pessimism about US democracy has put me at #2 on the Manifold topic, with a big lead over other traders. I'm not a Superforecaster, though.
80 · 3mo · 1
I recently created a simple workflow to allow people to write to the Attorneys General of California and Delaware to share thoughts and encourage scrutiny of the upcoming OpenAI nonprofit conversion attempt: Write a letter to the CA and DE Attorneys General

I think this might be a high-leverage opportunity for outreach. Both AG offices have already begun investigations, and AGs are elected officials who are primarily tasked with protecting the public interest, so they should care what the public thinks and prioritizes. Unlike e.g. congresspeople, I don't think AGs often receive grassroots outreach (I found ~0 examples of this in the past), and an influx of polite and thoughtful letters may have some influence — especially from CA and DE residents, although I think anyone impacted by their decision should feel comfortable contacting them.

Personally I don't expect the conversion to be blocked, but I do think the value and nature of the eventual deal might be significantly influenced by the degree of scrutiny on the transaction.

Please consider writing a short letter — even a few sentences is fine. Our partner handles the actual delivery, so all you need to do is submit the form. If you want to write one on your own and can't find contact info, feel free to DM me.
64 · 3mo · 5
Notes on some of my AI-related confusions[1]

It's hard for me to get a sense of things like "how quickly are we moving towards the kind of AI that I'm really worried about?" I think this stems partly from (1) a conflation of different types of "crazy powerful AI", and (2) the way that benchmarks and other measures of "AI progress" decouple from actual progress towards the relevant things. Trying to represent these things graphically helps me orient/think.

First, it seems useful to distinguish the breadth or generality of state-of-the-art AI models from how capable they are on some relevant capabilities. Once I separate these out, I can plot roughly where some definitions of "crazy powerful AI" apparently lie on these axes. (I think there are too many definitions of "AGI" at this point. Many people would make that area much narrower, but possibly in different ways.)

Visualizing things this way also makes it easier for me[2] to ask: Where do various threat models kick in? Where do we get "transformative" effects? (Where does "TAI" lie?)

Another question that I keep thinking about is something like: "what are key narrow (sets of) capabilities such that the risks from models grow ~linearly as they improve on those capabilities?" Or maybe: "what is the narrowest set of capabilities for which we capture basically all the relevant info by turning the axes above into something like 'average ability on that set' and 'coverage of those abilities', and then plotting how risk changes as we move the frontier?" The most plausible sets of abilities like this might be something like:

* Everything necessary for AI R&D[3]
* Long-horizon planning and technical skills?

If I try the former, how does risk from different AI systems change? And we could try drawing some curves that represent our guesses about how the risk changes as we make progress on a narrow set of AI capabilities on the x-axis. This is very hard; I worry that companies focus on benchmarks in ways that
61 · 3mo · 4
Holden Karnofsky has joined Anthropic (LinkedIn profile). I haven't been able to find more information.
10 · 13d
We put out a proposal for automating AI safety research on Manifund, and we got our first $10k. I figured I'd share this here in case you or someone you know would like to fund our work! Thanks!

Coordinal Research: Accelerating the research of safely deploying AI systems.

Project summary

What are this project's goals? How will you achieve them?

Coordinal Research (formerly Lyra Research, merging with Vectis AI) wants to accelerate the research of safe and aligned AI systems. We're complementing existing research in these directions through two key approaches:

1. Developing tools that accelerate the rate at which human researchers can make progress on alignment.
2. Building automated research systems that can assist in alignment work today.

Automation and agents are here, and are being used to accelerate AI capabilities. AI safety research is lagging behind in adopting these technologies, and many technical safety agendas would benefit from having their research output accelerated. Models are capable enough today to replace software engineers, conduct literature reviews, and generate novel research ideas. Given that the fundamental nature of the workforce is likely to change drastically in the coming years, and that these technologies are being used to increase automation of capabilities research, Coordinal wants to close this gap between capabilities and alignment sooner rather than later, before it grows wider.

With the right scaffolding, frontier models can be used to accelerate AI safety research agendas. There now exist myriad academic papers and projects, as well as for-profit, capabilities-focused or business-driven startups building out agentic and autonomous systems. We want to ensure adoption of these tools for capabilities research does not significantly outpace adoption for safety work.

Support for this project will directly fund 1) building out and iterating on our existing functional MVPs that address these goals, and 2) bootstrapping our nonpr
31 · 2mo
If you can get a better score than our human subjects did on any of METR's RE-Bench evals, send it to me and we will fly you out for an onsite interview.

Caveats:
1. You're employable (we can sponsor visas from most but not all countries).
2. Use the same hardware.
3. Honor system that you didn't take more time than our human subjects (8 hours). If you took more, still send it to me and we probably will still be interested in talking.

(Crossposted from Twitter.)
10 · 15d
Microsoft continue to pull back on their data centre plans, in a trend that's been going on for the past few months, since before the tariff crash (Archive). Frankly, the economics of this seem complex (the article mentions it's cheaper to build data centres slowly, if you can), so I'm not super sure how to interpret this, beyond that this probably rules out the most aggressive timelines. I'm thinking about it like this:

* Sam Altman and other AI leaders are talking about AGI 2027, at which point every dollar spent on compute yields more than a dollar of revenue, with essentially no limits
* Their models are requiring exponentially more compute for training (e.g. Grok 3, GPT-5) and inference (e.g. o3), but producing… idk, models that don't seem to be exponentially better?
* Regardless of the breakdown in the relationship between Microsoft and OpenAI, OpenAI can't lie about their short- and medium-term compute projections, because Microsoft have to fulfil that demand
* Even in the long term, Microsoft are on Stargate, so still have to be privy to OpenAI's projections even if they're not exclusively fulfilling them
* Until a few days ago, Microsoft's investors were spectacularly rewarding them for going all in on AI, so there's little investor pressure to be cautious

So if Microsoft, who should know the trajectory of AI compute better than anyone, are ruling out the most aggressive scaling scenarios, what do/did they know that contradicts AGI by 2027?