This is a linkpost for https://www.lesswrong.com/posts/Q44QjdtKtSoqRKgRe/introducing-leap-labs-an-ai-interpretability-startup
We are thrilled to introduce Leap Labs, an AI interpretability startup. We’re building a universal interpretability engine.
We design robust interpretability methods with a model-agnostic mindset. In concert, these methods form our end-to-end interpretability engine. This engine takes in a model, or ideally a model and its training dataset (or some representative portion thereof), and returns human-parseable explanations of what the model ‘knows’.
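To make the "model in, explanations out" idea concrete, here is a purely illustrative sketch of what a model-agnostic interface could look like. The function name, perturbation-based attribution method, and toy model below are our own assumptions for illustration, not Leap Labs' actual engine or API:

```python
def explain(model, dataset, eps=1e-4):
    """Hypothetical sketch: estimate per-feature sensitivity of `model`
    by finite differences, averaged over `dataset`. Model-agnostic in the
    sense that `model` is any callable from a list of floats to a float."""
    n_features = len(dataset[0])
    sensitivities = [0.0] * n_features
    for x in dataset:
        base = model(x)
        for i in range(n_features):
            perturbed = list(x)
            perturbed[i] += eps  # nudge one feature, observe the output change
            sensitivities[i] += abs(model(perturbed) - base) / eps
    # Average over the dataset to get a dataset-level explanation
    return [s / len(dataset) for s in sensitivities]

# Toy model that depends only on its first feature
model = lambda x: 3.0 * x[0]
dataset = [[1.0, 5.0], [2.0, -1.0]]
print(explain(model, dataset))  # feature 0 dominates
```

A real engine would of course go far beyond per-feature sensitivities, but the shape of the contract is the point: any model plus (some portion of) its data in, a human-parseable summary out.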
- Reproducible and generalisable approaches win. Interpretability algorithms should produce consistent outputs regardless of any random initialisation. Future-proof methods make minimal assumptions about model architectures and data types. We’re building interpretability for next year’s models.
- Relatedly, heuristics aren’t enough. Hyperparameters should always be theoretically motivated. It’s not enough that some method or configuration works well in practice. (Or, even worse, that it’s tweaked to get a result that looks sensible to humans.) We find out why.
- We must grow interpretability and AI safety in the real world. Leap is a for-profit company incorporated in the US. The plan is to scale quickly, and to hire and upskill researchers and engineers – to make progress, we need more meaningful jobs for AI alignment researchers nearly as much as we need the researchers themselves.
- Slow potentially dangerous broad-domain systems. Public red-teaming is a means of change. Robust interpretability methods make discovering failure modes easier. We demonstrate the fragility of powerful and opaque systems, and push for caution.
- Speed potentially transformative narrow-domain systems. AI for scientific progress is an important side quest. Interpretability is the backbone of knowledge discovery with deep learning, and has huge potential to advance basic science by making legible the complex patterns that machine learning models identify in huge datasets.
- Regulation is coming – let’s use it. We predict that governments and companies will begin to regulate and audit powerful models more explicitly, at the very least from a bias-prevention viewpoint. We want to make sure that these regulations actually make models safer, and that audits are grounded in (our) state-of-the-art interpretability work.
- Interpretability as standard. Robust interpretability, failure mode identification and knowledge discovery should be a default part of all AI development. Ultimately, we will put a safety-focussed interpretability system in the pipeline of every leading AI lab.
We are currently seeking funding/investment. Contact us here.