Effective Altruism Forum
AI interpretability
• Applied to A Selection of Randomly Selected SAE Features (25d ago)
• Applied to AI alignment as a translation problem (3mo ago)
• Applied to ML4Good UK - Applications Open (4mo ago)
• Applied to Assessment of AI safety agendas: think about the downside risk (4mo ago)
• Applied to Public Call for Interest in Mathematical Alignment (5mo ago)
• Applied to AI Alignment Research Engineer Accelerator (ARENA): call for applicants (6mo ago)
• Applied to Announcing Timaeus (6mo ago)
• Applied to Don't Dismiss Simple Alignment Approaches (6mo ago)
• Applied to Safety-First Agents/Architectures Are a Promising Path to Safe AGI (9mo ago)
• Applied to Concrete open problems in mechanistic interpretability: a technical overview (10mo ago)
• Applied to (Intro/1) - My Understandings of Mechanistic Interpretability Notebook (10mo ago)
• Applied to Announcing Apollo Research (11mo ago)
• Applied to Why and When Interpretability Work is Dangerous (11mo ago)
• Applied to Call for Pythia-style foundation model suite for alignment research (1y ago)
• Applied to High-level hopes for AI alignment (1y ago)
• Applied to PhD Position: AI Interpretability in Berlin, Germany (1y ago)
• Applied to If interpretability research goes well, it may get dangerous (1y ago)
• Applied to Join the AI governance and interpretability hackathons! (1y ago)
• Applied to Sentience in Machines - How Do We Test for This Objectively? (1y ago)