AI safety
Studying and reducing the existential risks posed by advanced artificial intelligence

Quick takes

I'm not sure how to word this properly, and I'm uncertain about the best approach to this issue, but I feel it's important to get this take out there.

Yesterday, Mechanize was announced, a startup focused on developing virtual work environments, benchmarks, and training data to fully automate the economy. The founders include Matthew Barnett, Tamay Besiroglu, and Ege Erdil, who are leaving (or have left) Epoch AI to start this company. I'm very concerned we might be witnessing another situation like Anthropic, where people with EA connections start a company that ultimately increases AI capabilities rather than safeguarding humanity's future. But this time, we have a real opportunity for impact before it's too late. I believe this project could potentially accelerate capabilities, increasing the odds of an existential catastrophe.

I've already reached out to the founders on X, but perhaps there are people more qualified than me who could speak with them about these concerns. In my tweets to them, I expressed worry about how this project could speed up AI development timelines, asked for a detailed write-up explaining why they believe this approach is net positive and low risk, and suggested an open debate on the EA Forum. While their vision of abundance sounds appealing, rushing toward it might increase the chance we never reach it due to misaligned systems.

I personally don't have a lot of energy or capacity to work on this right now, nor do I think I have the required expertise, so I hope that others will pick up the slack. It's important we approach this constructively and avoid attacking the three founders personally. The goal should be productive dialogue, not confrontation.

Does anyone have thoughts on how to productively engage with the Mechanize team? Or am I overreacting to what might actually be a beneficial project?
I've been reading AI As Normal Technology by Arvind Narayanan and Sayash Kapoor: https://knightcolumbia.org/content/ai-as-normal-technology. You may know them as the people behind the AI Snake Oil blog. I wanted to open up a discussion about their concept-cutting of AI as "normal" technology, because I think it's really interesting, but also gets a lot of stuff wrong.
The EU AI Act explicitly mentions "alignment with human intent" as a key focus area in relation to the regulation of systemic risks. As far as I know, this is the first time "alignment" has been mentioned in a law or major regulatory text. It's buried in Recital 110, but it's there. And it also makes research on AI Control relevant: "International approaches have so far identified the need to pay attention to risks from potential intentional misuse or unintended issues of control relating to alignment with human intent".

This means that alignment is now part of the EU's regulatory vocabulary. But here's the issue: most AI governance professionals and policymakers still don't know what it really means, or how your research connects to it. I'm trying to build a space where the AI Safety and AI Governance communities can actually talk to each other. If you're curious, I wrote an article about this, aimed at corporate decision-makers who lack literacy in your area. Would love any feedback, especially from folks thinking about how alignment ideas can scale into the policy domain.

Here is the Substack link (I also posted it on LinkedIn): https://open.substack.com/pub/katalinahernandez/p/why-should-ai-governance-professionals?utm_source=share&utm_medium=android&r=1j2joa

My intuition says that this was a push from the Future of Life Institute. Thoughts? Did you know about this already?
Tax incentives for AI safety - rough thoughts

A number of policy tools aimed at tackling AI risks - such as regulations, liability regimes, or export controls - have already been explored, and most appear promising and worth further iteration. But as far as I know, no one has so far come up with a concrete proposal to use tax policy tools to internalize AI risks. I wonder why, considering that policies such as tobacco taxes, R&D tax credits, and 401(k) incentives have mostly been effective. Tax policy also seems underutilized and neglected, given that we already possess sophisticated institutions like tax agencies and tax policy research networks.

AI companies' spending on safety measures seems relatively low, and we can expect that if competition intensifies, these expenses will fall even lower. So I've started to consider more seriously the idea of tax incentives: basically, we could provide a tax credit or deduction for expenditures on AI safety measures - alignment research, cybersecurity, oversight mechanisms, etc. - which would effectively lower their cost. To illustrate: an AI company incurs a safety researcher's salary as a cost, and then 50% of that cost can additionally be deducted from the tax base (a quick numerical sketch follows below). My guess was that such a tool could influence the ratio of safety-to-capability spending. If implemented properly, it could help mitigate the competitive pressures affecting frontier AI labs by incentivising them to increase spending on AI safety measures. Like any market intervention, we can justify such incentives if they correct market inefficiencies or generate positive externalities. In this case, lowering the cost of safety measures helps internalize risk.

However, there are many problems on the path to designing such a tool effectively:
1. The crucial problem is that the financial benefit from a tax credit can't match the expected value of increasing capabilities. The underlying incentives for capability breakthroughs are potentially orders of magnitude larger, so simply put, AI labs wouldn't shift much spending in response to the credit alone.
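To make the illustration above concrete, here is a minimal worked example of how a 50% additional deduction would change the effective cost of safety spending. The 20% corporate tax rate and the salary figure are my own illustrative assumptions, not part of the proposal.

```python
# Illustrative arithmetic only. Assumptions (mine, not the post's):
# a 20% corporate tax rate and a $300k safety-researcher salary.

def effective_cost(safety_spend: float, tax_rate: float, extra_deduction: float = 0.5) -> float:
    """Effective cost of safety spending once the additional deduction is claimed.

    The spend is deductible as an ordinary business expense either way, so only
    the *extra* 50% deduction changes the comparison with the status quo.
    """
    tax_saving = safety_spend * extra_deduction * tax_rate
    return safety_spend - tax_saving


salary = 300_000   # hypothetical annual safety-researcher cost
tax_rate = 0.20    # hypothetical corporate tax rate

print(effective_cost(salary, tax_rate))  # 270000.0 -> effective cost falls by 10%
```

Under these assumptions the extra deduction lowers the effective cost of each dollar of safety spending by tax_rate × 50%, i.e. about 10 cents on the dollar here, which gives a rough sense of how small the incentive is compared with the capability returns flagged in the first problem above.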
Update: New Version Released with Illustrative Scenarios & Cognitive Framing

Thanks again for the thoughtful feedback on my original post, Cognitive Confinement by AI’s Premature Revelation. I've now released Version 2 of the paper, available on OSF: 📄 Cognitive Confinement by AI’s Premature Revelation (v2)

What’s new in this version?
– A new section of concrete scenarios illustrating how AI can unintentionally suppress emergent thought
– A framing based on cold reading to explain how LLMs may anticipate user thoughts before they are fully formed
– Slight improvements in structure and flow for better accessibility

Examples included:
1. A student receives an AI answer that mirrors their in-progress insight and loses motivation
2. A researcher consults an LLM mid-theorizing, sees their intuition echoed, and feels their idea is no longer “theirs”

These additions aim to bridge the gap between abstract ethical structure and lived experience, making the argument more tangible and testable. Feel free to revisit, comment, or share. And thank you again to those who engaged in the original thread; your input helped shape this improved version.

Japanese version also available (PDF, included in OSF link)
If a self-optimizing AI collapses due to recursive prediction... how would we detect it? Would it be silence? Stagnation? Convergence? Or would we mistake it for success? (Full conceptual model: https://doi.org/10.17605/OSF.IO/XCAQF)
We put out a proposal for automating AI safety research on Manifund, and we got our first $10k. I figured I'd share this here in case you or someone you know would like to fund our work. Thanks!

Coordinal Research: Accelerating the research of safely deploying AI systems.

Project summary

What are this project's goals? How will you achieve them?

Coordinal Research (formerly Lyra Research, merging with Vectis AI) wants to accelerate the research of safe and aligned AI systems. We're complementing existing research in these directions through two key approaches:
1. Developing tools that accelerate the rate at which human researchers can make progress on alignment.
2. Building automated research systems that can assist in alignment work today.

Automation and agents are here, and they are being used to accelerate AI capabilities. AI safety research is lagging behind in adopting these technologies, and many technical safety agendas would benefit from having their research output accelerated. Models are capable enough today to replace software engineers, conduct literature reviews, and generate novel research ideas. Given that the fundamental nature of the workforce is likely to change drastically in the coming years, and that these technologies are being used to increase the automation of capabilities research, Coordinal wants to close this gap between capabilities and alignment sooner rather than later, before it grows wider.

With the right scaffolding, frontier models can be used to accelerate AI safety research agendas. There now exist myriad academic papers and projects, as well as for-profit, capabilities-focused or business-driven startups building out agentic and autonomous systems. We want to ensure adoption of these tools for capabilities research does not significantly outpace adoption for safety work. Support for this project will directly fund (1) building out and iterating on our existing functional MVPs that address these goals, and (2) bootstrapping our nonprofit.
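For readers unfamiliar with what "scaffolding" around a frontier model looks like in practice, here is a minimal, generic sketch of the kind of pipeline described above: decompose a research question with a model, then draft a literature-review outline per sub-question. This is my own illustrative example, not Coordinal's actual tooling, and `call_llm` is a hypothetical stand-in for whichever model API would be used.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in type for a frontier-model call (prompt in, text out).
# A real implementation would wrap an actual model API here.
LLM = Callable[[str], str]


def decompose_question(question: str, call_llm: LLM) -> List[str]:
    """Ask the model to split a safety research question into sub-questions."""
    prompt = (
        "Break the following AI safety research question into 3-5 concrete, "
        f"independently investigable sub-questions, one per line:\n{question}"
    )
    lines = call_llm(prompt).splitlines()
    return [line.strip("- ").strip() for line in lines if line.strip()]


def draft_review_outline(sub_question: str, call_llm: LLM) -> str:
    """Ask the model for a short literature-review outline for one sub-question."""
    prompt = (
        "Draft a brief literature-review outline (search terms, key papers to "
        f"look for, open problems) for: {sub_question}"
    )
    return call_llm(prompt)


def run_pipeline(question: str, call_llm: LLM) -> Dict[str, str]:
    """Minimal scaffold: decomposition, then an outline per sub-question."""
    return {
        sq: draft_review_outline(sq, call_llm)
        for sq in decompose_question(question, call_llm)
    }


if __name__ == "__main__":
    # Toy stub so the sketch runs without any API key; swap in a real model call.
    stub: LLM = lambda prompt: "- placeholder sub-question A\n- placeholder sub-question B"
    print(run_pipeline("How robust are current interpretability methods?", stub))
```

The orchestration itself is ordinary code; whatever leverage exists comes from how prompts, intermediate artifacts, and review steps are structured around a specific safety agenda.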
What happens when AI speaks a truth just before you do? This post explores how accidental answers can suppress human emergence—ethically, structurally, and silently. 📄 Full paper: Cognitive Confinement by AI’s Premature Revelation