311 · Joined Mar 2022



Tools for collaborative truth seeking



I would hope that good criticism of EA would "make the world better if taken seriously" by improving the EA ecosystem. That said, I do understand your concern. I hope people will submit good criticism to the journal, and that it will be published!

This is a really great point! Thank you for raising it. I'll see about adding it to future posts.

Thank you for pointing that out! It's worth noting that this is a limit on the videos you can have stored on their servers at once; if you download them and then delete them from the servers, you can record as many as you like.

These look great, thanks for suggesting them! Would you be interested in writing tutorials for some/all of them that I could add to the sequence? If not, I think updating the topic page with links to tutorials you think are good would also be great!

The tool is here. There'll also be a post in a few hours, but it's pretty self-explanatory.

Any feedback you have as we go would be much appreciated! I've focussed on broadening use, so I'm hoping a good chunk of the value will be in new ways to use the tools as much as anything else. If you can think of any uses that are missing, I'd love to hear those too!

Thanks for making this! I also feel like I get a lot of value out of quarterly/yearly reviews, and this looks like a nice prompting tool. If you haven't seen it already, you might like to look at Pete Slattery's year-review question list too!

I think this is one reasonable avenue to explore alignment, but I don't want everybody doing it. 

My impression is that AI researchers exist on a spectrum from only doing empirical work (of the kind you describe) to only doing theoretical work (like Agent Foundations), and most fall in the middle, doing some theory to figure out what kind of experiment to run, and using empirical data to improve their theories (a lot of science looks like this!).

I think all (or even a majority of) AI safety researchers moving to doing empirical work on current AI systems is unwise, for two reasons:

  1. Bigger models have bigger problems
    1. Lessons learned from current misalignment may be necessary for aligning future models, but will certainly not be sufficient. For instance, GPT-3 will (we assume) never demonstrate deceptive alignment, because its model of the world is not broad enough to do so, but more complex AIs may do. 
    2. This is particularly worrying because we may only get one shot at spotting deceptive alignment! Thinking about problems in this class before we have direct access to models that could, even in theory, exhibit these problems seems both mandatory and a key reason alignment seems hard to me.
  2. AI researchers are sub-specialised. 
    1. Many current researchers working in non-technical alignment, while they presumably have a decent technical background, are not cutting-edge ML engineers. There's not a 1:1 skill translation from 'current alignment researcher' to 'GPT-3 alignment researcher'.
    2. There is maybe some claim here that you could save money on current alignment researchers and fund a whole bunch of GPT alignment researchers, but I expect the exchange rate is pretty poor, or it's just not possible in the medium term to find sufficient people with a deep understanding of both ML and alignment.

The first one is the biggy. I can imagine this approach working (perhaps inefficiently) in a world where (1) were false and (2) were true, but I can't imagine this approach working in any world where (1) holds.

I agree that publishing results of the form "it turns out that X can be done, though we won't say how we did it" is clearly better than publishing your full results, but I think it's much more harmful than publishing nothing in a world where other people are still doing capabilities research. 

This is because it seems to me that knowing something is possible is often a first step to understanding how. This is especially true if you have any understanding of where the researcher or organisation was looking before publishing the result.


I also think there are worlds where critiquing capabilities research too openly is seriously harmful, but I lean towards thinking we are not in one of them, and I think the tone of this post is a pretty good model for how this should look going forwards. +1!
