This is an anonymous account.
Hi Larks, thanks for the pushback here. We agree that this is hard to judge. Some of this was about the general atmosphere of the place, which is unfortunately a bit fuzzy.
People said they feel pressure to conform to / defer to these people, for example in lunchtime conversations. People have also said they can't act as freely or as loosely as they would like in Constellation. So it's maybe something like feeling you have to behave in a certain way, in line with what you perceive the funders and senior leadership want, in order to fit in.
Although this may be present in other offices, we think this pressure is more pronounced at Constellation than at other coworking spaces like the Open Phil offices or Lightcone, where we think there is more of an ability to say and do what you want.
We know this probably isn't as satisfying as it could be, but appreciate you taking the time to point this out and we will edit the post to acknowledge this.
Regarding 3) (publishing relative to productivity): we are not entirely sure what you mean, but we can try to clarify our point a little more.
We think it's plausible that Redwood's total volume of publicly available output is appropriate relative to the quantity of high-quality research they have produced. We have heard from some Redwood staff that there are important insights that have not been made publicly available outside of Redwood, but to some extent this is true of all labs, and it's difficult for us to judge without further information whether these insights would be worth staff time to write up.
The main area where we are confident in suggesting Redwood change is making their output more legible to the broader ML research community. Many of their research projects, including what Redwood considers their most notable project to date -- causal scrubbing -- are only available as Alignment Forum blog posts. We believe there is significant value in writing them up more rigorously in a standard academic format and releasing them as arXiv preprints. We would also suggest Redwood more frequently submit their results to peer-reviewed venues, as the feedback from peer review can be valuable for honing the communication of results, though we acknowledge it is possible to effectively disseminate findings without this: e.g. many of OpenAI's and Anthropic's highest-profile results were never published in a peer-reviewed venue.
Releasing arXiv preprints would have two benefits. First, it would make the work significantly more likely to be noticed, read, and cited by the broader ML community, which in turn makes it more likely that others build upon the work and point out deficiencies in it. Second, the more structured nature of an academic paper forces a more detailed exposition, making it easier for readers to judge, reproduce, and build upon. If, for example, we compare Neel's original grokking blog post to the grokking paper, it is clear the paper is significantly more detailed and rigorous. This level of rigor may not be worth the time for every project, but we would at least expect it for an organization's flagship projects.
(written in first person because one post author wrote it)
As Nuno notes, I can't see how else to spend $20M to get more good interp work (naively; I'm not claiming no such ways exist)
I think this is the area we disagree on the most. Examples of other ideas:
1. Generously fund the academics who you do think are doing good work (as far as I can tell, two of them -- Christopher Potts and Martin Wattenberg -- get no funding from OP, and David Bau gets an order of magnitude less). This is probably more on OP than Redwood, but Redwood could also explore funding academics and working on projects in collaboration with them.
2. Poach experienced researchers who are executing well on interpretability but working on what (by Redwood's lights) are less important problems, and redirect them to more important problems. Not everyone would want to be "redirected", but there's a decent fraction of people who would love to work on more ambitious problems but are currently not incentivized to do so, and a broader range of people are open to working on a wide range of problems so long as they are interesting. I would expect these individuals to cost a comparable amount to what Redwood currently pays (somewhat less if poaching from academia, somewhat more if poaching from industry) but be able to execute more quickly as well as spread valuable expertise around the organization.
3. Make one-year seed grants of around $100k to 20 early-career researchers (PhD students, independent researchers) to work on interpretability, nudging them towards a list of problems viewed as important by Redwood. Provide low-touch mentorship (e.g. a once-a-month call). Scale up the grants and/or hire people from the projects that did well after the one-year trial. (See the rough cost sketch below.)
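As a back-of-envelope on the cost of idea 3 (our own arithmetic; the overhead figure is an assumption, not something from the post):

$$20 \times \$100\text{k} = \$2\text{M per year in direct grants}$$

Even adding, say, $0.5M/year for mentorship and administration, a one-year trial would run roughly $2.5M, on the order of a tenth of the ~$20M discussed above.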
I wouldn't confidently claim that any of these approaches would necessarily best Redwood, but there's a large space of possibilities that could be explored and largely has not been. Notably, the ideas above differ from Redwood's high-level strategy to date by: (a) making bets on a broad portfolio of agendas; (b) starting small and evaluating projects before scaling; (c) bringing in external expertise and talent.
I also broadly think that publishing and engaging with the broader ML community is less obviously good for interpretability; as noted, I just don't think most work is very relevant. I think it's a bet worth making (and am excited about interp in the wild and my grokking work getting into ICLR!), but it's definitely not obviously worth the effort; e.g. I think it's probably the right call that Anthropic doesn't try to publish their work. Putting preprints on arXiv seems pretty cheap, and I'm pro that, but I think seriously aiming for academic publications is a lot of work (more than 10-20% of a project IMO), and I feel pretty good about Redwood only trying for this when they have employees who are particularly excited about it.
I think I largely agree that the percentage of interpretability papers relevant to large-scale alignment is disappointingly low. However, the denominator is very large, so I still expect the majority of TAIS-relevant interpretability work to happen outside TAIS organizations. Given this, I'd argue there's considerable value in communicating to this subset of the ML research community. Perhaps a peer-reviewed publication is not the best way to do this: I'd be happy to see Redwood staff e.g. giving talks at a select subset of academic labs, but to the best of my knowledge this hasn't happened.
I agree that getting from the stage of "scrappy preprint / blog post that your close collaborators can understand" to "peer-reviewed publication" can be 10-20% of a project's time. However, in my experience the clarity of the write-up and the rigor of the results often increase considerably in that 10-20%. There are some parts of the publication process that are complete wastes of time (reformatting from single to double column, running an experiment whose results you already know but that reviewer 2 really wants to see), but in my experience these have been a minority of the work -- no more than 5% of the overall project time. I'm curious whether you view this as significantly more costly than I do, or the improvements to the project from peer review as less significant.
Meta note: We believe this response is the 80/20 in terms of quality vs time investment. We think it’s likely we could improve the comment with more work, but wanted to share our views earlier rather than later.
We think one thing we didn't spell out very explicitly in this post was the distinction between 1) how effectively we believe Redwood spent their resources and 2) whether we think OP should have funded them (and at what amount). As this post is focused on Redwood, I'll focus more on 1) and comment briefly on 2) -- but note that we plan to expand on this further in a follow-up post. We will add a paragraph which disambiguates between these two points more clearly.
Argument 1): We think Redwood could have produced at least the same quality and quantity of research with fewer resources (~$4-8 million over 2 years)
The key reasons we think 1) are:
Argument 2): OP should have spent less on Redwood, and 2a) there were other comparable funding opportunities
The key reasons we think 2) are:
My personal judgment is that Buck is a stronger researcher than most people with ML PhDs. He is weaker at empirical ML than this baseline, but very strong conceptually in ways that translate well to machine learning. I do think Buck will do best in a setting where he's either paired with a good empirical ML researcher or gains more experience there himself (he's already gotten a lot better in the past year). But overall I view Buck as on par with a research scientist at a top ML university.
Thank you for this comment; some of the contributors to this post have updated their views of Buck as a researcher as a result.
Some quick thoughts from writing the critique post (from the perspective of the main contributor/writer, who does not have a TAIS background):