Effective Altruism Forum
AI interpretability
• Applied to Solving adversarial attacks in computer vision as a baby version of general AI alignment (1mo ago)
• Applied to AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0 (3mo ago)
• Applied to Rational Animations' intro to mechanistic interpretability (4mo ago)
• Applied to ML4Good Brasil - Applications Open (5mo ago)
• Applied to A Selection of Randomly Selected SAE Features (6mo ago)
• Applied to AI alignment as a translation problem (8mo ago)
• Applied to ML4Good UK - Applications Open (9mo ago)
• Applied to Assessment of AI safety agendas: think about the downside risk (10mo ago)
• Applied to Public Call for Interest in Mathematical Alignment (10mo ago)
• Applied to AI Alignment Research Engineer Accelerator (ARENA): call for applicants (11mo ago)
• Applied to Announcing Timaeus (1y ago)
• Applied to Don't Dismiss Simple Alignment Approaches (1y ago)
• Applied to Safety-First Agents/Architectures Are a Promising Path to Safe AGI (1y ago)
• Applied to Concrete open problems in mechanistic interpretability: a technical overview (1y ago)
• Applied to Announcing Apollo Research (1y ago)
• Applied to Why and When Interpretability Work is Dangerous (1y ago)
• Applied to Call for Pythia-style foundation model suite for alignment research (1y ago)
• Applied to High-level hopes for AI alignment (1y ago)