All of Nate Thomas's Comments + Replies

Re your point about "building an institution" and step 3: We think the majority of our expected value comes from futures in which we produce more research value per dollar than in the past.

(Also, just wanted to note again that $20M isn't the right number to use here, since around a third of that funding is for running Constellation, as mentioned in the post.)

NunoSempere (1y):
Cheers
Omega (1y):
Thanks for mentioning the $20M point, Nate - I've edited the post to make this a little clearer and would suggest people use $14M as the number instead.

Thanks for the comment Dan. I agree that the adversarially mined examples literature is the right reference class, of which the two that you mention (Meta’s Dynabench and ANLI) were the main examples (maybe the only examples? I forget) while we were working on this project. 

I’ll note that Meta’s Dynabench sentiment model (the only model of theirs that I interacted with) seemed substantially less robust than Redwood’s classifier (e.g. I was able to defeat it manually in about 10 minutes of messing around, whereas I needed the tools we made to defeat the Redwood model).
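(To make the reference class concrete: an adversarial-mining loop looks roughly like the sketch below. This is a minimal illustration assuming a hypothetical classifier interface - the function and parameter names are illustrative, not Redwood's or Dynabench's actual code.)

```python
# Minimal sketch of an adversarial-mining loop (hypothetical interface,
# not Redwood's or Dynabench's actual code).

def mine_adversarial_examples(classifier, candidate_inputs, true_labels,
                              threshold=0.5):
    """Collect inputs the classifier gets wrong, for use as new training data.

    classifier(x) is assumed to return the model's estimated probability
    that x belongs to the positive class.
    """
    mined = []
    for x, y in zip(candidate_inputs, true_labels):
        predicted = classifier(x) >= threshold  # model's predicted label
        if predicted != y:                      # model is wrong on this input
            mined.append((x, y))
    return mined

# In Dynabench-style setups, candidate_inputs come from humans actively
# trying to fool the current model; the mined failures are added to the
# next round's training set, the model is retrained, and the loop repeats.
```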

Dan H (1y):
I think the adversarial mining thing was hot in 2019. IIRC, HellaSwag and others did it; I'd venture maybe 100 papers did it before RR, but I still think it was underexplored at the time and I'm happy RR investigated it.

Thanks to the authors for taking the time to think about how to improve our organization and the field of AI takeover prevention as a whole. I share a lot of the concerns mentioned in this post, and I’ve been spending a lot of my attention trying to improve some of them (though I also have important disagreements with parts of the post).

Here’s some information that perhaps supports some of the points made in the post and adds texture, since it seems hard to properly critique a small organization without a lot of context and inside information. (This is ada...

I know nothing about this organisation, and very little about this field, but this is an impressively humble and open response from a leader of an org in the face of a very critical article. No comment on content, but I appreciate the approach, @Nate Thomas.

Phib (1y):

Ditto pseudonym. I recognize from another comment that there is an upcoming Constellation post from the original poster, with a more effortful response forthcoming there, but even though Redwood received this piece in advance, I'm still somewhat surprised that the following were not responded to:

  • Lack of Senior ML Research Staff
  • Lack of Communication with the ML Community
  • Conflicts of interest with funders

I guess people are busy and this is not a priority; it seems like people are mostly thinking about Underwhelming Research Output (and Nate himself seems to say as much here).

Hi Nate, can you comment a bit more about this section?

We’ve heard multiple cases of people being fired after something negative happens in their life (personal, conflict at work, etc) that causes them to be temporarily less productive at work. While Redwood management have made some efforts to offer support to staff (e.g. offering unpaid leave on some occasions), we believe it may not have been done consistently, and are aware of cases where termination happened with little warning.

I feel like this would be among the more negative updates I would make abo...

(I’ll use this comment to also discuss some aspects of some other questions that have been asked.)

I think there are currently something like three categories of bottlenecks on alignment research:

  1. Having many tractable projects to work on that we expect will help (this may be limited by theoretical understanding / lack of end-to-end alignment solution)
  2. Institutional structures that make it easy to coordinate to work on alignment
  3. People who will attack the problem if they’re given some good institutional framework 

Regarding 1 (“tractable projects / theoret...