Visiting Researcher @ MIT Media Lab, Nucleic Acid Observatory
449 karma · Joined Mar 2020 · Pursuing a doctoral degree (e.g. PhD) · Basel, Switzerland


Physician and visiting researcher at Kevin Esvelt's research group Sculpting Evolution. Thinking about ways to safeguard the world from biological risks. Apart from that, I have an assortment of more random EA interests, e.g., how to create better universities, how to make Germany a greater force for technological and social progress, or how to increase high-skilled immigration to the US.


What's the lore behind that update? This was before I followed EA community stuff.

Thanks for writing this up. I was skeptical about Scott's strong take but didn't take the time to check the links he provided as proof.

That's a good pointer, thanks! I'll drop the reference to Diggans and Leproust for now.


Thanks for the write-up. Just adding a note that this distinction has practical implications for the design of the hazardous-sequence databases that gene synthesis screening systems require.

With gene synthesis screening, companies want to stop bad actors from getting access to the physical DNA or RNA of potential pandemic pathogens. Now, let's say researchers find the sequence of a novel pathogen that would likely spark a pandemic if released. Most would want this sequence to be added to synthesis screening databases. But some also want this database to be public. The information hazards involved in making such information publicly available could be large, especially if there is attached discussion of how exactly these sequences are dangerous.

I skimmed it, and it looks good to me. Thanks for the work! A separate post on this would be cool.

I set a reminder! Also, let me know if you do end up updating it.

Is there an updated version of this? E.g., GDP numbers have changed.

Flagging that I approve of this post; I do believe that the relevant biosecurity actors within EA are thinking about this (though I'd love a more public write-up of this topic). Get in touch if you are thinking about this!

I'm excited that more people are looking into this area!

Flagging that I only read the intro and the conclusion, which might mean I missed something. 

High-skilled immigration

From my current understanding, high-skilled immigration reform seems promising not so much because of the effects on the migrants (though they are positive) but mostly due to the effect on the destination country's GDP and technological progress. The latter has sizeable positive spillover effects (that also accrue to poorer countries).

Advocacy for high-skilled immigration is less controversial and thus easier, which could make interventions in this area more valuable when compared to general immigration reform.

Then again, for the reasons above, more individuals are likely already working on improved high-skilled immigration. 


Also, have you chatted with Johannes Haushofer? He knows EA and recently started Malengo, which wants to facilitate educational migration from low-income countries. I'd assume he has thought about these topics a bunch.

Comment by Paul Christiano on LessWrong:

"RLHF and Fine-Tuning have not worked well so far. Models are often unhelpful, untruthful, inconsistent, in many ways that had been theorized in the past. We also witness goal misspecification, misalignment, etc. Worse than this, as models become more powerful, we expect more egregious instances of misalignment, as more optimization will push for more and more extreme edge cases and pseudo-adversarial examples."

These three links are:

  • The first is Mysteries of mode collapse, which claims that RLHF (as well as OpenAI's supervised fine-tuning on highly-rated responses) decreases entropy. This doesn't seem particularly related to any of the claims in this paragraph, and I haven't seen it explained why this is a bad thing. I asked on the post but did not get a response.
  • The second is Discovering language model behaviors with model-written evaluations and shows that Anthropic's models trained with RLHF have systematically different personalities than the pre-trained model. I'm not exactly sure what claims you are citing, but I think you are making some really wild leaps.
  • The third is Compendium of problems with RLHF, which primarily links to the previous 2 failures and then discusses theoretical limitations.

I think these are bad citations for the claim that methods are "not working well" or that current evidence points towards trouble.

The current problems you list---"unhelpful, untruthful, and inconsistent"---don't seem like good examples to illustrate your point. These are mostly caused by models failing to correctly predict which responses a human would rate highly. That happens because models have limited capabilities and is rapidly improving as models get smarter. These are not the problems that most people in the community are worried about, and I think it's misleading to say this is what was "theorized" in the past.

I think RLHF is obviously inadequate for aligning really powerful models, both because you cannot effectively constrain a deceptively aligned model and because human evaluators will eventually not be able to understand the consequences of proposed actions. And I think it is very plausible that large language models will pose serious catastrophic risks from misalignment before they are transformative (it seems very hard to tell). But I feel like this post isn't engaging with the substance of those concerns or sensitive to the actual state of evidence about how severe the problem looks like it will be or how well existing mitigations might work.
