446 karma · Joined Feb 2023


This is an anonymous account.


Omega's Shortform
· 2mo ago · 1m read


Critiques of Prominent AI Safety Labs



Some quick thoughts from writing the critique post (from the perspective of the main contributor / writer w/o a TAIS background)

  • If you're a non-subject-matter expert (SME) who can write, but you know other SMEs who have good, thoughtful critiques, I think it's worth sitting down with them and helping them write those critiques up. SMEs often lack the time and energy to write a critique. Not being an SME gave me a bit of an outsider's perspective: I pushed back more on pieces that weren't obvious to non-technical people, which I think made some of the technical critiques more specific. 
  • Overall, we are all really happy with the response this post has gotten, the quality of critiques / comments, and the impact it seems to be making in relevant circles. I would be happy to give feedback on others' critiques, if they share similar goals (improving information asymmetry, genuinely truth seeking). 
  • Writing anonymously has made this post better quality because I feel less ego / attachment to the critiques we made, and I can be more in truth-seeking mode rather than worrying about protecting my status / reputation. On the flipside, we put a lot of effort into this post and I feel sad that this won't be recognized, because I'm proud of this work.  
  • Things we will change in future posts (keen to get feedback on this!) 
    • We will have a section which states our bottom-line opinions very explicitly and clearly (e.g. org X should receive less funding, we don't recommend people work at org Y) and then cites which reasons we think support each critique. A handful of comments raised points that we had thought about, but that weren't made clear on the page. I feel a little hesitant to state the bottom-line view because I worry people will think we are being overly negative, but I think if we can communicate our uncertainties and caveat them, it could be okay. 
    • There were several contributors to this post. Partly due to time constraints and not wanting to delay publishing or be bottlenecked on a contributor getting back to me, I didn't scrutinize some contributions as thoroughly as I should have prior to publishing. I will aim to avoid that in future posts. 
    • I will be sharing all future drafts with 5-10 other SME reviewers (both people we think would agree & disagree with us) prior to publication, because I think the comments on this post improved it substantially. 
    • (minor) I would add a little more context on the flavor of feedback we are aiming to get from the org we are critiquing

Hi Larks, thanks for the pushback here. We agree that this is hard to judge. Unfortunately, some of this was about the general atmosphere of the place, which is a bit fuzzy.

People said they feel pressure to conform to / defer to these people, for example in lunchtime conversations. People have also said they can't act as freely or loosely as they would like in Constellation. So it's maybe something like feeling you have to behave in a certain way, in line with what you perceive the funders and senior leadership want, in order to fit in.

Although this may be present in other offices, we think this pressure is more pronounced at Constellation than at other coworking spaces like the Open Phil offices or Lightcone, where we think there is more of an ability to say and do what you want.

We know this probably isn't as satisfying as it could be, but appreciate you taking the time to point this out and we will edit the post to acknowledge this.


Yep that's right. This is probably an underestimate, but we would need to spend some time figuring it out. We've spent at least 10 hours replying to cc


Hi Joseph, that quote is meant to be facetious. The scientist who originally said it was trying to convey the opposite to his students - that researching before experimenting can save them time.


Regarding 3) "Publishing is relative to productivity": we are not entirely sure what you mean, but can try to clarify our point a little more.

We think it's plausible that Redwood's total volume of publicly available output is appropriate relative to the quantity of high-quality research they have produced. We have heard from some Redwood staff that there are important insights that have not been made publicly available outside of Redwood, but to some extent this is true of all labs, and it's difficult for us to judge without further information whether these insights would be worth staff time to write up.

The main area we are confident in suggesting Redwood change is making their output more legible to the broader ML research community. Many of their research projects, including what Redwood considers their most notable project to date -- causal scrubbing -- are only available as Alignment Forum blog posts. We believe there is significant value in writing them up more rigorously and following a standard academic format, and releasing them as arXiv preprints. We would also suggest Redwood more frequently submit their results to peer-reviewed venues, as the feedback from peer review can be valuable for honing the communication of results, but acknowledge that it is possible to effectively disseminate findings without this: e.g. many of OpenAI and Anthropic's highest-profile results were never published in a peer-reviewed venue.

Releasing arXiv preprints would have two benefits. First, it would make the work significantly more likely to be noticed, read and cited by the broader ML community. This makes it more likely that others build upon the work and point out deficiencies in it. Second, the more structured nature of an academic paper forces a more detailed exposition, making it easier for readers to judge, reproduce and build upon. If, for example, we compare Neel's original grokking blog post to the grokking paper, it is clear the paper is significantly more detailed and rigorous. This level of rigor may not be worth the time for every project, but we would at least expect it for an organization's flagship projects.


(written in first person because one post author wrote it) 

As Nuno notes, I can't see how else to spend $20M to get more good interp work (naively; I'm not claiming no such ways exist).

I think this is the area we disagree on the most. Examples of other ideas:

  1. Generously fund the academics who you do think are doing good work (as far as I can tell, two of them -- Christopher Potts and Martin Wattenberg -- get no funding from OP, and David Bau gets an order of magnitude less). This is probably more on OP than Redwood, but Redwood could also explore funding academics and working on projects in collaboration with them.

  2. Poach experienced researchers who are executing well on interpretability but working on what (by Redwood's lights) are less important problems, and redirect them to more important problems. Not everyone would want to be "redirected", but there's a decent fraction of people who would love to work on more ambitious problems but are currently not incentivized to do so, and a broader range of people are open to working on a wide range of problems so long as they are interesting. I would expect these individuals to cost a comparable amount to what Redwood currently pays (somewhat less if poaching from academia, somewhat more if poaching from industry) but be able to execute more quickly as well as spread valuable expertise around the organization.

  3. Make one-year seed grants of around $100k to 20 early-career researchers (PhD students, independent researchers) to work on interpretability, nudging them towards a list of problems viewed important by Redwood. Provide low-touch mentorship (e.g. once a month call). Scale up the grants and/or hire people from the projects that did well after the one-year trial.

I wouldn't confidently claim that any of these approaches would necessarily best Redwood, but there's a large space of possibilities that could be explored and largely has not been. Notably, the ideas above differ from Redwood's high-level strategy to date by: (a) making bets on a broad portfolio of agendas; (b) starting small and evaluating projects before scaling; (c) bringing in external expertise and talent.

I also broadly think that publishing and engaging with the broader ML community is less obviously good for interpretability; as noted, I just don't think most work is very relevant. I think it's a bet worth making (and am excited about interp in the wild and my grokking work getting into ICLR!), but it's definitely not obviously worth the effort; e.g. I think it's probably the right call that Anthropic doesn't try to publish their work. Putting preprints on arXiv seems pretty cheap, and I'm pro that, but I think seriously aiming for academic publications is a lot of work (more than 10-20% of a project IMO), and I feel pretty good about Redwood only trying for this when they have employees who are particularly excited about it.

I think I largely agree that the percentage of interpretability papers that are relevant to large-scale alignment is disappointingly low. However, the denominator is very large, so I still expect the majority of TAIS-relevant interpretability work to happen outside TAIS organizations. Given this, I'd argue there's considerable value in communicating to this subset of the ML research community. Perhaps a peer-reviewed publication is not the best way to do this: I'd be happy to see Redwood staff e.g. giving talks at a select subset of academic labs, but to the best of our knowledge this hasn't happened.

I agree that getting from the stage of "scrappy preprint / blog post that your close collaborators can understand" to "peer-reviewed publication" can be 10-20% of a project's time. However, in my experience the clarity of the write-up and rigor of the results often increase considerably in that 10-20%. There are some parts of the publication process that are complete wastes of time (reformatting from single to double column, running an experiment that you already know the results of but that reviewer 2 really wants to see), but in my experience these have been a minority of the work -- no more than 5% of the overall project time. I'm curious if you view this as being significantly more costly than I do, or the improvements to the project from peer-review as being less significant.


Meta note: We believe this response is the 80/20 in terms of quality vs time investment. We think it’s likely we could improve the comment with more work, but wanted to share our views earlier rather than later. 

We think one thing we didn’t spell out very explicitly in this post was the distinction between 1) how effectively we believed Redwood spent their resources and 2) whether we think OP should have funded them (and at what amount). As this post is focused on Redwood, I’ll focus more on 1) and comment briefly on 2) - but note that we plan to expand on this further in a follow-up post. We will add a paragraph which disambiguates between these two points more clearly. 

Argument 1): We think Redwood could produce at least the same quality and quantity of research, with fewer resources (~$4-8 million over 2 years)

The key reasons we think 1) are:

  • If they had more senior ML staff or advisors, they could have avoided some mistakes on their agenda that we see as avoidable. This wouldn’t necessarily come at a large monetary cost given their overall budget (around $200-300K for 1 FTE).
  • We estimate as much as 25-30% of their spending went towards scaling up projects (e.g. REMIX) before they had a clear research agenda they were confident in. To be fair to Redwood, this premature scaling was more defensible prior to the FTX collapse when the general belief was that there was a "funding overhang". Nate in his comment also mentions that scaling was raised by both Holden and Ajeya (at OP), and now sees this as an error on their part. 

Argument 2): OP should have spent less on Redwood, 2a) and there were other comparable funding opportunities 

The key reasons we think 2) are: 

  • There are other TAIS labs (academic and not) that we believe could absorb and spend considerably more funding than they currently receive. Example non-profits include CAIS and FAR AI and underfunded safety-interested academic groups include David Krueger and Dylan Hadfield-Menell's groups. Opportunities are more limited if focusing specifically on interpretability, but there are still a number of promising options. For example, Neel Nanda mentioned three academics he considers do good interpretability work: OP has funded one of them (David Bau) but as far as we know not the other two (of course, they may not have room for more funding, or OP may have investigated and decided not to fund them for other reasons).

    A key reason OP may not think some of these labs are worth funding on the margin is that they are substantially more bullish on certain safety research agendas than others. We have some concerns about how the OP LT team decides which agendas to support, but will explore this further in our Constellation post, so won’t comment in more depth at this point. As one of the main funders of TAIS work, in a field which is very speculative and new, we think OP should be more open to a broad range of research agendas than they are.
  • We think that small, young organizations without a track record beyond founder reputation should in general be given smaller grants and build up a track record before trying to scale. We think it’s plausible that several of the issues we pointed out could have been mitigated by this funding structure.

My personal judgment is that Buck is a stronger researcher than most people with ML PhDs. He is weaker at empirical ML than this baseline, but very strong conceptually in ways that translate well to machine learning. I do think Buck will do best in a setting where he's either paired with a good empirical ML researcher or gains more experience there himself (he's already gotten a lot better in the past year). But overall I view Buck as on par with a research scientist at a top ML university.


Thank you for this comment; some of the contributors to this post have updated their views of Buck as a researcher as a result. 
