I recently spent some time reflecting on my career and my life, for a few reasons:
- It was my 29th birthday, an occasion which felt like a particularly natural time to think through what I wanted to accomplish over the course of the next year 🙂.
- It seems like AI progress is heating up.
- It felt like a good time to reflect on how Redwood has been going, because we’ve been having conversations with funders about getting more funding.
I wanted to have a better answer to these questions:
- What’s the default trajectory that I should plan for my career to follow? And what does this imply for what I should be doing right now?
- How much urgency should I feel in my life?
- How hard should I work?
- How much should I be trying to do the most valuable-seeming thing, vs engaging in more playful exploration and learning?
In summary:
- For the purposes of planning my life, I'm going to act as if there are four years before AGI development progresses enough that I should substantially change what I'm doing with my time, and then there are three years after that before AI has transformed the world unrecognizably.
- I'm going to treat this phase of my career with the urgency of a college freshman looking at their undergrad degree: every month is 2% of their degree, which is a nontrivial fraction, but they should also feel like they have a substantial amount of space to grow and explore.
The AI midgame
I want to split the AI timeline into the following categories.
- The early game, during which interest in AI is not mainstream. I think this ended within the last year 😢
- The midgame, during which interest in AI is mainstream but before AGI is imminent. During the midgame:
  - The AI companies are building AIs that they don’t expect will be transformative.
  - The alignment work we do is largely practice for alignment work later, rather than an attempt to build AIs that we can get useful cognitive labor from without them staging coups.
  - For the purpose of planning my life, I’m going to imagine this as lasting four more years. This is shorter than my median estimate of how long this phase will actually last.
- The endgame, during which AI companies conceive of themselves as actively building models that will imminently be transformative, and that pose existential takeover risk.
  - During the endgame, I think that we shouldn’t count on having time to develop fundamentally new alignment insights or techniques (except maybe if AIs do most of the work? I don't think we should count on this); we should be planning to mostly just execute on alignment techniques that involve ingredients that seem immediately applicable.
  - For the purpose of planning my life, I’m going to imagine this as lasting three years. This is about as long as I expect this phase to actually take.
I think this division matters because several aspects of my current work seem like they’re optimized for the midgame, and I should plausibly do something very different in the endgame. Features of my current life that should plausibly change in the endgame:
- I'm doing blue-sky alignment research into novel alignment techniques; during the endgame, it might be too late to do this.
- I'm working at an independent alignment org and not interacting with labs that much. During the endgame, I probably either want to be working at a lab or doing something else that involves interacting with labs a lot. (I feel pretty uncertain about whether Redwood should dissolve during the AI endgame.)
- I spend a lot of my time constructing alignment cases that I think are analogous to difficulties that we expect to face later. During the endgame, you probably have access to the strategy “observe/construct alignment cases that are obviously scary in the models you have”, which seems like it partially obsoletes this workflow.
- I'm doing research that is practice rather than an actual attempt at aligning models or safely extracting cognitive labor from them. Some of the work I expect to want takeover-concerned people to do during the endgame is probably very practical/empirical. But I expect us to also want to do some difficult-to-empirically-ground work to answer questions like “How could this particular model be scarily misaligned? How might our alignment strategy have failed such that this particular model will try to kill us?”
One core question here is: How is my impact distributed between work I do in the midgame vs the endgame? (As in, how much of my career value do I expect to lose if I suddenly die at the end of the midgame?)
- Midgame impact:
  - The main mechanism here is that I think I (as part of Redwood) have a shot at developing alignment techniques (or other knowledge) that make a serious difference to the alignment plan, that can then be adopted by AI labs with no further actions from me/Redwood.
  - Secondarily, I think we have a shot at developing an alignment research methodology and/or organizational structure that allows Redwood and maybe the broader alignment community to do much more good work during the midgame.
  - Other midgame impacts I can have:
    - Helping other people gain skills such that they have a really big impact in the endgame.
    - Setting up an organization which lasts into the endgame and is really productive at doing something crucial.
- Endgame impact:
  - I have been assuming that most of my impact was in the midgame, but after thinking about it more over the last few weeks I’m now less sold on that.
  - Mechanisms by which Redwood-style backchained alignment researchers might have a lot of impact in the endgame:
    - Being an expert on some particular thing which is extremely useful when trying to align AIs or safely extract cognitive labor from them. When I’m thinking about this, I’m mostly trying to imagine the head of the alignment department at an AI lab being ecstatically delighted to be able to hire me, and saying “I am so glad we are able to work with Buck; it’s so incredibly convenient that he has been obsessively focused on this topic for the last four years and he’s already understood a bunch of the ins and outs; if it wasn’t for him we’d be yoloing stuff way more.”
      - This could be a class of techniques, e.g. interpretability or mechanistic anomaly detection (MAD).
      - This could be a certain aspect of the alignment situation, from a more technique-agnostic perspective. For example, it seems pretty plausible to me that when we’re using AI to make lots of autonomous decisions, we’ll have some limited budget to oversee these decisions such that we can afford to oversee like 0.1% of them, and we’d really like to be smart about which 0.1% to review so that we maximize the probability of catching bad actions. There are lots of different approaches you might use here; e.g. you probably want to try some kind of anomaly detection, and you probably also want to concentrate oversight on examples where the AI action was higher stakes. If I/some Redwood people obsessively focused on the version of this problem that we expect to happen with TAI, I can imagine us being incredible hires due to having so much relevant experience and cached thinking.
      - As another example, if I decided to just focus on anti-adversarial training, it’s easy to imagine being way more familiar with that problem than anyone at an AI lab.
    - Having a bunch of practice at thinking about AI alignment in principle, which might be really useful for answering difficult-to-empirically-resolve questions about the AIs being trained.
    - Being well-prepared to use AI cognitive labor to do something useful, by knowing a lot about some research topic that we end up wanting to put lots of AI labor into. Maybe you could call this “preparing to be a research lead for a research group made up of AIs”. Or “preparing to be good at consuming AI research labor”.
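To make the oversight-budget idea above a little more concrete, here is a minimal Python sketch of one way to pick which 0.1% of actions get reviewed. Everything in it is hypothetical: `Action`, `select_for_review`, and the anomaly and stakes scores are illustrative names, and ranking by the product of anomaly score and stakes is just one simple assumption about how to combine the two signals, not an approach described in this post.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class Action:
    """A single autonomous AI decision (hypothetical record type)."""
    action_id: int
    anomaly_score: float  # assumed output of some anomaly detector; higher = more unusual
    stakes: float         # assumed estimate of how much damage this action could do


def select_for_review(actions: list[Action], budget_fraction: float = 0.001) -> list[Action]:
    """Return the actions to send to expensive (e.g. human) review.

    Assumption: priority = anomaly_score * stakes, so an action has to be
    both unusual and consequential to rank near the top.
    """
    if not actions:
        return []
    # Review at least one action when any exist; otherwise round the budget down.
    k = max(1, math.floor(budget_fraction * len(actions)))
    # Keep the k highest-priority actions without sorting the whole list.
    return heapq.nlargest(k, actions, key=lambda a: a.anomaly_score * a.stakes)
```

With 10,000 actions and the default 0.1% budget, this sends 10 of them to review. Using a product means an action only gets prioritized if it scores highly on both signals; a weighted sum would be an equally reasonable starting point.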
Pacing: a freshman year
I think I want to treat my next year with the pacing of a freshman year in a US undergrad degree, for someone who wants to go into startups and thinks there’s some chance that they’ll want to graduate college early. I think that people going into their freshman year should be thinking a little bit about what they want to do after college. They should understand things that they need to do during college in order to be set up well for their post-college activities (e.g. they probably want to do some research as an undergrad, and they probably need to eventually learn various math). But meeting those requirements probably isn’t going to be where most of their attention goes.
Similarly, I think that I should be thinking a bit about my AI endgame plans, and make sure that I’m not failing to do fairly cheap things that will set me up for a much better position in the endgame. But I should mostly be focusing on succeeding during the midgame (at some combination of doing valuable research and at becoming an expert in topics that will be extremely valuable during the endgame).
When you’re a freshman, you probably shouldn’t feel like you’re sprinting all the time. You should probably believe that skilling up can pay off over the course of your degree. Every month is about 2% of your degree.
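For concreteness, the arithmetic behind "about 2%", assuming a standard four-year (48-month) degree, which matches the four-year midgame I'm planning around:

$$
\frac{1\ \text{month}}{4\ \text{years} \times 12\ \text{months/year}} = \frac{1}{48} \approx 2.1\%
$$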
I think that this is how I want to feel. In a certain sense, four years is a really long time. I spent a reasonable amount of the last year feeling kind of exhausted and wrecked and rushed, and my guess is that this was net bad for my productivity. I think I should feel like there is real urgency, but also real amounts of space to learn and grow and play.
I went back and forth a lot on how I wanted to set up this metaphor; in particular, I was pretty tempted to suggest that I should think of this as a sophomore year rather than a freshman year. I think that freshmen should usually mostly ignore questions about career planning, whereas I think I should e.g. spend at least some time talking to labs about the possibility of them working with me/Redwood in various ways. I ended up choosing freshman rather than sophomore because I think that planning around three remaining years of midgame (a sophomore's horizon) is less reasonable than planning around four (a freshman's).
And so, my plan is something like:
- Put a bit of work into setting up my AI endgame plans.
  - E.g. talk to some people who are at labs and make sure they don’t think that my vague aspirations here are insane. I’m interested in more suggestions along these lines.
  - I think that if I feel like I’ve deliberated about this at least once, I’ll find it easier to pursue my short-term plans wholeheartedly.
- Mostly (like with 70% of my effort), push hard on succeeding at my midgame plans.
- Spend about 20% of my effort on learning things that don’t have immediate benefits.
  - For example, I’ve spent some time over the last few weeks learning about generative modeling, and I plan to continue studying this. I have a few motivations here:
    - Firstly, I think it’s pretty healthy for me to know more about how ML progress tends to happen, and I feel much more excited about this subfield of ML than most subfields of ML. I feel intuitively really impressed by and admiring of the researchers in this field, and it seems healthy for me to have a research field with researchers who I look up to and who I wholeheartedly believe I can learn a lot from.
    - Secondly, I have a crazy take that the kind of reasoning that is done in generative modeling has a bunch of things in common with the kind of reasoning that is valuable when developing algorithms for AI alignment.
Thanks. Regarding the conversations from 2019, I think we are in a different world now (post GPT-4 + AutoGPT/plugins). [Paul Christiano] "Perhaps there's no problem at all" - saying this really doesn't help! I want to know why might that be the case! "concerted effort by longtermists could reduce it" - seems less likely now given shorter timelines. "finding out that the problem is impossible can help; it makes it more likely that we can all coordinate to not build dangerous AI systems" - this could be a way out, but again, little time. We need a Pause first to have time to firmly establish impossibility. However, "coordinate to not build dangerous AI systems" is not part of p(non-doom|AGI) [I'm interested in why people think there won't be doom, given we get AGI]. So far, Paul's section does basically nothing to update me on p(doom|AGI).
[Rohin Shah] "A likely crux is that I think that the ML community will actually solve the problems, as opposed to applying a bandaid fix that doesn't scale." - yes, this is a crux for me. How do the fixes scale, with 0 failure modes in the limit of superintelligence? You mention interpretability as a basis for scalable AI-assisted alignment above this, but progress in interpretability remains far behind the scaling of the models, so it doesn't hold much hope imo. "I'm also less worried about race dynamics increasing accident risk"; "the Nash equilibrium is for all agents to be cautious" - I think this has been blown out of the water with the rush to connect GPT-4 to the internet and spread it far and wide as quickly as possible. As I said, we're in a different world now. "If I condition on discontinuous takeoff... I... get a lot more worried about AI risk" - this also seems cruxy (and I guess we've discussed it a bit above). What do you think the likelihood is of a model trained with 100x more compute (affordable by Microsoft or Google) being able to do AI Research Engineering as well as the median AI Research Engineer? To me it seems pretty high (given scaling so far). Imagine a million of them then working for a million years of subjective time, within, say, the year 2025: a fast take-off seems pretty likely. If 100x GPT-4 compute isn't enough, what about 1000x (affordable by a major state)? "most of my optimism comes from the more outside view type considerations: that we'll get warning signs that the ML community won't ignore" - well, I think we are getting warning signs now, and, whilst not ignoring them, the ML community is not taking them anywhere near seriously enough! We need to Pause. Now. "and that the AI risk arguments are not watertight." - sure, but that doesn't mean we're fine by default! (Imo alignment needs to be watertight for us to say that by default we're fine.) At least in your (Rohin's) conversation from 2019, there are cruxes. I'm coming down on the side of doom on them, though, in our current world of 2023.
[Robin Hanson] "The current AI boom looks similar to previous AI booms, which didn't amount to much in the past." - GPT-4 is good evidence against this. "intelligence is actually a bunch of not-very-general tools that together let us do many things" - multimodal models are good evidence against this. Foundation transformer models seem to be highly general. "human uniqueness...it's our ability to process culture (communicating via language, learning from others, etc)." - again, GPT-4 can basically do this. "principal-agent problems tend to be bounded" - this seems a priori unlikely to apply with superhuman AI, and you (Rohin) yourself say you disagree with this (and people are complaining they can't find the literature Robin claims backs this up). "Effort is much more effective and useful once the problem becomes clear, or once you are working with a concrete design; we have neither of these right now" - what about now? Maybe after the release of Google DeepMind's next big multimodal model this will be clear. I don't find Robin's reasons for optimism convincing (and I'll also note that I find his vision of the future - Age of Em - horrifying, so his default "we'll be fine" is actually also a nightmare.) [Rohin's opinion] "once AI capabilities on these factors [ability to process culture] reach approximately human level, we will "suddenly" start to see AIs beating humans on many tasks, resulting in a "lumpy" increase on the metric of "number of tasks on which AI is superhuman"" - would you agree that this is happening with GPT-4?
[Adam Gleave] "as we get closer to AGI we'll have many more powerful AI techniques that we can leverage for safety" - again this seems to suffer from the problem of grounding them in having a reliable AI in the first place (as Eliezer says, "getting the AI to do your Alignment homework" isn't a good strategy). "expect that AI researchers will eventually solve safety problems; they don't right now because it seems premature to work on those problems" - certainly not premature now. But are we anywhere near on track to solving them in time? "would be more worried if there were more arms race dynamics, or more empirical evidence or solid theoretical arguments in support of speculative concerns like inner optimizers." - well, we've got both now. "10-20% likely that AGI comes only from small variations of current techniques" - seems much higher to me now with GPT-4 and multimodal models on the way. "would see this as more likely if we hit additional milestones by investing more compute and data" - well, we have. Overall Adam's 2019 conversation has done nothing to allay my 2023 doom concerns. I'm guessing that based on what is said, Adam himself has probably updated in the direction of doom.
Reading Paul's more detailed disagreements with Eliezer from last year doesn't really update me on doom either, given that he agrees with more than enough of Eliezer's lethalities (i.e. plenty enough to make the case for high p(doom|AGI)). The same applies to the DeepMind alignment team's response.
I think I can easily just reverse this (i.e. it does depend on whether you frame the question as "do we die?" or "do we live?", and you are doing the latter here). Although to be fair, I'd use "possible", rather than "plausible": all the "we'll be fine" arguments I know of seem to me like they establish possibility, not near-certainty.
Overall, none of this has helped in reducing my p(doom|AGI); it's not even really touching the sides, so to speak. Do you (or anyone else) have anything better? Note that I have also asked this question here.