AI Safety Researcher @ Independent Researcher
Working (6-15 years of experience)
584 · Joined Aug 2017


I work primarily on AI Alignment. My main direction at the moment is to accelerate alignment work via language models and interpretability.


The pay difference between working in industry and doing a PhD was a big factor in my decision to avoid getting a PhD a few years ago.

These days it still plays a role, though as an independent researcher I’d like to connect with more academics so that I can get better at doing research with more rigour and publish more papers. Avoiding the PhD has made this hard, and I’ve had to take a lot more initiative to develop the skills that PhD students typically develop. That said, being able to selectively learn the skills that are actually useful for solving alignment is worth the tradeoff.

EDIT: Oh, and the lower level of prestige/credibility I have (from not doing a PhD) may get in the way of some of my plans, so I’m trying to be creative about how to gain that prestige without having to do a PhD.

When I say "true," I simply mean that these things will inevitably be possible for some future AI system. People have so many different definitions of AGI that they could call GPT-3 some form of weak AGI and therefore incapable of doing the things I described. I don't particularly care about "true" or "fake" AGI definitions; I just want to point out that the things I described are inevitable, and we are really not that far (already) from the scenario I described above, whether you call this future system AGI or pre-AGI.

Situational awareness is simply a useful thing for a model to learn, so it will learn it. It is much better at modelling the world and carrying out tasks if it knows it is an AI and what it is able to do as an AI.

Current models can already write basic programs on their own and can in fact write entire AI architectures with minimal human input.

A "true" AGI will have situational awareness: it knows its weights were created with the help of code, eventually knows its training setup (and how to improve it), and also knows how to rewrite its own code. These models can already write code quite well; it's only a matter of time before you can ask a language model to create a variety of architectures and training runs based on what it thinks will lead to a better model (all before "true AGI," IMO). It may just take it a bit longer to understand what each of its individual weights does, so it will have to rely on coming up with ideas using only its access to every paper/post in existence, plus a bunch of GPUs to run experiments on itself. Oh, and it has the ability to do interpretability to inspect itself far more precisely than any human can.

Since I expect some people to be a bit confused, after reading this post, about what exactly the bad thing was that happened, I think it would be great if the community health team could write a post explaining and pointing out exactly what was bad here and in other similar instances.

I think there is value in being crystal clear about which of the things that happened were bad, because I expect people will take away different things from this post.

I honestly didn’t know how to talk about it either, but I wanted to point at the general vibes I was getting. While I’m still confused about what exactly the issue is, contrary to my initial comment I no longer think polyamory within the community is a problem. Not because of Arepo’s comment specifically, but because there are healthy ways to do polyamory, just as with other forms of relationships. It’s something I thought was true before writing the comment, but I was a bit confused about the whole mixing of careers and “free love” with everyone in the community.

Maybe only talking about “free love” mixed with power dynamics and whatever else would have been better. I don’t really know. Maybe I shouldn’t have said anything as someone confused about all this but still wanting to help. I felt it was the kind of thing that a lot of people were thinking but not saying out loud.

That said, I think Sonia’s video cleared up some things for me. It points to the large amount of “hacker houses,” networking, sex, and money in the Bay Area. She also points out that polyamory is not the problem. However, she says that while those things shape the structure of the problem, power dynamics end up being the main root issue. It sounds to me like she is pointing out that people will sometimes try to start polyamorous relationships with others by abusing power dynamics (even though this is not inherent to most polyamorous relationships at all). Are power dynamics the whole story? I don’t know.

Note that a lot of people seemed to agree with my initial comment. I’m not sure what to make of that.

People have some strong opinions about things like polyamory, but I figured I’d still voice my concern as someone who has been in EA since 2015 but has mostly only interacted with the community online (aside from two months in the Bay and two in London):

I have nothing against polyamory, but polyamory within the community gives me bad vibes. And the mixing of work and fun seems to go much further than I think it should. It feels like there’s an aspect of “free love” and I am a little concerned about doing cuddle puddles with career colleagues. I feel like all these dynamics lead to weird behaviour people do not want to acknowledge.

I repeat, I am not against polyamory, but I personally do not expect some of this bad behaviour would happen as much in a monogamous setting, since I expect there would be less sliding into sexual actions.

I’ve avoided saying this because I did not want to criticize people for being polyamorous, and I expected a lot of people would disagree with me and that it wouldn’t lead to anything. But I do think the “free love” nature of polyamory with career colleagues opens the door for things we might not want.

Whatever it is (poly within the community might not be part of the issue at all!), I feel like there needs to be a conversation about work and play (that people seem to be avoiding).

Consider using Conjecture’s new Verbalize (https://lemm.ai/home) STT tool for transcriptions! They’ll be adding some LLM features on top of it, and I expect it to have some cool features coming out this year.

I’ve also been pushing (for a while) for more people within EA to start thinking of ways to apply LLMs to our work. After ChatGPT, some people started saying similar things, so I’m glad people are starting to see the opportunity.

Are there any actual rigorous calculations on this? It's hard for me to believe someone making $2M/year and donating $1M/year (to AI Safety or top GW charities) would have less counterfactual impact than someone working at CEA.

Edit: Let's say you are donating $1M/year to AI Safety; that might be about enough to cover the salaries of about 9 independent alignment researchers. Those 9 researchers might not yet be comparable to top-level researchers who would get funding regardless, so it would probably end up as additional funding for getting more young people into the field (and giving them at least a year's worth of funding). And I guess there are some other potentially valuable things, like becoming a public figure. In this case, you'd have to estimate that the value you bring to CEA is worth more than that.
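The back-of-the-envelope math above can be sketched out explicitly. The per-researcher cost is my assumption (the comment only gives the "about 9" figure), so treat the salary number as a placeholder, not a claim about actual grant sizes:

```python
# Rough sketch of the estimate above: how many independent alignment
# researchers could $1M/year fund? The all-in cost per researcher is an
# assumed figure chosen to be consistent with the "about 9" in the comment.
donation_per_year = 1_000_000       # $1M/year donated
cost_per_researcher = 110_000       # assumed all-in yearly cost (salary + overhead)

researchers_funded = donation_per_year // cost_per_researcher
print(researchers_funded)  # → 9
```

Of course, the real comparison would also need an estimate of the counterfactual value of a marginal funded researcher versus a marginal CEA hire, which is the hard part.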

Thanks for doing this!

In terms of feedback: the most annoying thing so far is that as soon as you click on any grant, going 'back' to the previous page puts you back at a completely fresh search. You can't click to open grants in new tabs either.

I want to say that I appreciate posts like this by parents in the community. I'm an alignment researcher and given how fast things are moving, I do worry that I'm under-weighting the amount of impact I could lose in the next 10 years if I have kids. I feel like 'short timelines' make my decision harder even though I'm convinced I want kids in 5 or so years from now.

Some considerations I've been having lately:

  • Should I move far away from my parents, which would make it harder to depend on someone for childcare on the weekends and evenings? Will we be close to my future wife's parents?
  • Should I be putting in some time to make additional income I can eventually use to make my life easier in 5 years? Maybe it's easier for me to do so now before AGI crunch time?
  • The all-encompassing nature of AGI makes things like the share of household work a potential issue for a couple of years. I feel bad for thinking that I may have to ask my future wife if I can reduce housework in those couple of years of crunch time (let's say 2 years max). It feels selfish... Ultimately, this will just be a decision my future wife and I will have to make. I do want to do at least 50% of the housework outside of the crunch time.
    • It feels particularly bizarre in the context of some wild AGI thing we aren't even confident about how it will go. But if someone is the CEO of a startup, it feels more reasonable for their partner to take on additional housework if things get intense for a while. Or maybe a better example: a pandemic is starting and one of the parents is head of some bio-risk org; I would find it odd if they tried to keep the household dynamics the same throughout the crucial period for limiting the pandemic's impact.
    • Overall I'm trying to be a good future husband and stuff like this weighs on me and I don't want to make the decision in some terrible and naive way like "my career is more important than yours." :/