I'm the CTO of Redwood Research, a nonprofit focused on applied alignment research. Read more about us here: https://www.redwoodresearch.org/

I'm also a fund manager on the EA Infrastructure Fund.

Topic Contributions


EA and the current funding situation

I massively disagree re the business class point. In particular, many people (e.g. me) can sleep in business class seats that let you lie flat, when they would have not slept and been quite sad and unproductive.

not worth the 2x or 3x ticket price

As a general point, the ratio between prices is irrelevant to the purchasing choice if you're only buying something once--you only care about the difference in price and the difference in value.

The case for becoming a black-box investigator of language models

I think that knowing a bit about ML is probably somewhat helpful for this but not very important.

A tale of 2.75 orthogonality theses

What do you mean by “uniform prior” here?

Longtermist EA needs more Phase 2 work

FWIW I think that compared to Chris Olah's old interpretability work, Redwood's adversarial training work feels more like phase 2 work, and our current interpretability work is similarly phase 2.

Are AGI labs building up important intangibles?

One problem with this estimate is that you don’t end up learning how long the authors spent on the project, or how important their contributions were. My sense is that contributors to industry publications often spent relatively little time on the project compared to academic contributors.

Are AGI labs building up important intangibles?
Answer by BuckApr 10, 20224

Anthropic took less than a year to set up large model training infrastructure from scratch but with the benefit of experience. This indicates that infrastructure isn’t currently extremely hard to replicate.

EleutherAI has succeeded at training some fairly large models (the biggest has like 20B params, compared to 580B in PaLM) while basically just being talented amateurs (and also not really having money). These models introduced a simple but novel tweak to the transformer architecture that PaLM used (parallel attention and MLP layers). This suggests that experience also isn’t totally crucial.

I think that the importance of ML experience for success is kind of low compared to other domains of software engineering.

My guess is that entrenched labs will have bigger advantages as time goes on and as ML gets more complicated.

Are there any AI Safety labs that will hire self-taught ML engineers?

As I understand it, DeepMind doesn’t hire people without PhDs as research scientists, and places more restrictions on what research engineers can do than other places.

"Long-Termism" vs. "Existential Risk"

I think that the longtermist EA community mostly acts as if we're close to the hinge of history, because most influential longtermists disagree with Will on this. If Will's take was more influential, I think we'd do quite different things than we're currently doing.

Are there any AI Safety labs that will hire self-taught ML engineers?
Answer by BuckApr 06, 20226

I'm not sure what you mean by "AI safety labs", but Redwood Research, Anthropic, and the OpenAI safety team have all hired self-taught ML engineers. DeepMind has a reputation for being more focused on credentials. Other AI labs don't do as much research that's clearly focused on AI takeover risk.

How might a herd of interns help with AI or biosecurity research tasks/questions?
Answer by BuckMar 21, 202214

I'm running Redwood Research's interpretability research.

I've considered running an "interpretability mine"--we get 50 interns, put them through a three week training course on transformers and our interpretability tools, and then put them to work on building mechanistic explanations of parts of some model like GPT-2 for the rest of their internship.

My usual joke is "GPT-2 has 12 attention heads per layer and 48 layers. If we had 50 interns and gave them each a different attention head every day, we'd have an intern-day of analysis of each attention head in 11 days."

This is bottlenecked on various things:

  • having a good operationalization of what it means to interpret an attention head, and having some way to do quality analysis of explanations produced by the interns. This could also be phrased as "having more of a paradigm for interpretability work".
  • having organizational structures that would make this work
  • building various interpretability tools to make it so that it's relatively easy to do this work if you're a smart CS/math undergrad who has done our three week course

I think there's a 30% chance that in July, we'll wish that we had 50 interns to do something like this. Unfortunately this is too low a probability for it to make sense for us to organize the internship.

Load More