AI interpretability
• Applied to Safety-First Agents/Architectures Are a Promising Path to Safe AGI (2mo ago)
• Applied to Concrete open problems in mechanistic interpretability: a technical overview (3mo ago)
• Applied to (Intro/1) - My Understandings of Mechanistic Interpretability Notebook (3mo ago)
• Applied to Announcing Apollo Research (4mo ago)
• Applied to Why and When Interpretability Work is Dangerous (4mo ago)
• Applied to Call for Pythia-style foundation model suite for alignment research (5mo ago)
• Applied to High-level hopes for AI alignment (5mo ago)
• Applied to PhD Position: AI Interpretability in Berlin, Germany (5mo ago)
• Applied to If interpretability research goes well, it may get dangerous (6mo ago)
• Applied to Join the AI governance and interpretability hackathons! (6mo ago)
• Applied to Sentience in Machines - How Do We Test for This Objectively? (6mo ago)
• Applied to Against LLM Reductionism (7mo ago)
• Applied to Introducing Leap Labs, an AI interpretability startup (7mo ago)
• Applied to [MLSN #8]: Mechanistic interpretability, using law to inform AI alignment, scaling laws for proxy gaming (7mo ago)
• Applied to Concrete Steps to Get Started in Transformer Mechanistic Interpretability (9mo ago)
• Applied to Safety of Self-Assembled Neuromorphic Hardware (9mo ago)
• Applied to Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend) (9mo ago)
• Applied to A Barebones Guide to Mechanistic Interpretability Prerequisites (10mo ago)
• Applied to The limited upside of interpretability (1y ago)
• Applied to Join the interpretability research hackathon (1y ago)