Oliver Sourbut

Technical staff (Autonomous Systems) @ UK AI Safety Institute (AISI)
299 karmaJoined Working (6-15 years)Pursuing a doctoral degree (e.g. PhD)London, UK



  • Autonomous Systems @ UK AI Safety Institute (AISI)
  • DPhil AI Safety @ Oxford (Hertford college, CS dept, AIMS CDT)
  • Former senior data scientist and software engineer + SERI MATS

I'm particularly interested in sustainable collaboration and the long-term future of value. I'd love to contribute to a safer and more prosperous future with AI! Always interested in discussions about axiology, x-risks, s-risks.

I enjoy meeting new perspectives and growing my understanding of the world and the people in it. I also love to read - let me know your suggestions! In no particular order, here are some I've enjoyed recently

  • Ord - The Precipice
  • Pearl - The Book of Why
  • Bostrom - Superintelligence
  • McCall Smith - The No. 1 Ladies' Detective Agency (and series)
  • Melville - Moby-Dick
  • Abelson & Sussman - Structure and Interpretation of Computer Programs
  • Stross - Accelerando
  • Graeme - The Rosie Project (and trilogy)

Cooperative gaming is a relatively recent but fruitful interest for me. Here are some of my favourites

  • Hanabi (can't recommend enough; try it out!)
  • Pandemic (ironic at time of writing...)
  • Dungeons and Dragons (I DM a bit and it keeps me on my creative toes)
  • Overcooked (my partner and I enjoy the foody themes and frantic realtime coordination playing this)

People who've got to know me only recently are sometimes surprised to learn that I'm a pretty handy trumpeter and hornist.


A little followup:

I took part in the inaugural SERI MATS programme in 2021-2022 (where incidentally I interacted with Richard), and started an AI Safety PhD at Oxford in 22.

I'm now working for the AI Safety Institute (UK Gov) since Jan 2024 as a hybrid technical expert, utilising my engineering and DS background, alongside AI/ML research and threat modelling. Likely to continue such work, there or elsewhere. Unsure if I'll finish my PhD in the end, as a result, but I don't regret it: I produced a little research, met some great collaborators, and had fun while learning as a consequence!

Between the original thread and my leaving for PhD, I'd say I grew my engineering, DS, and project management skills a little, though diminishing, while also doing a lot of AIS prep. My total income also went up while I remained FT employed. This was due for a slowdown as a consequence of stock movements and vesting, but regardless I definitely forwent a lot of money thanks to becoming a student again (and then a researcher rather than a high-paid engineer)! As far as I can tell this is the main price I paid, in terms of both personal situation and impact, and perhaps I should have made the move sooner (though having money in the bank is very freeing and enables indirect impact).

FWIW I work at the AI Safety Institute UK and we're considering a range of both misuse and misalignment threats, and there are a lot of smart folks on board taking things pretty seriously. I admit I... don't fully understand how we ended up in this situation and it feels contingent and precious, as does the tentative international consensus on the value of cooperation on safety (e.g. the Bletchley declaration). Some people in government are quite good, actually!

Sure, take it or leave it! I think for the field-building benefits it can look more obviously like an externality (though I-the-fundraiser would in fact be pleased and not indifferent, presumably!), but the epistemic benefits could easily accrue mainly to me-the-fundraiser (of course they could also benefit other parties).

How much of this is lost by compressing to something like: virtue ethics is an effective consequentialist heuristic?

I've been bought into that idea for a long time. As Shaq says, 'Excellence is not a singular act, but a habit. You are what you repeatedly do.'

We can also make analogies to martial arts, music, sports, and other practice/drills, and to aspects of reinforcement learning (artificial and natural).

Simple, clear, thought-provoking model. Thanks!

I also faintly recall hearing something similar in this vicinity: apparently some volunteering groups get zero (or less!?) value from many/most volunteers, but engaged volunteers dominate donations, so it's worthwhile bringing in volunteers and training them! (citation very much needed)

Nitpick: are these 'externalities'? I'd have said, 'side effects'. An externality is a third-party impact from some interaction between two parties. The effects you're describing don't seem to be distinguished by being third-party per se (I can imagine glossing them as such but it's not central or necessary to the model).

Yeah. I also sometimes use 'extinction-level' if I expect my interlocutor not to already have a clear notion of 'existential'.

Point of information: at least half the funding comes from Schmidt futures (not OpenAI), though OpenAI are publicising and administrating it.

Another high(er?) priority for governments:

  • start building multilateral consensus and preparations on what to do if/when
    • AI developers go rogue
    • AI leaked to/stolen by rogue operators
    • AI goes rogue

I think this is a good and useful post in many ways, in particular laying out a partial taxonomy of differing pause proposals and gesturing at their grounding and assumptions. What follows is a mildly heated response I had a few days ago, whose heatedness I don't necessarily endorse but whose content seems important to me.

Sadly this letter is full of thoughtless remarks about China and the US/West. Scott, you should know better. Words have power. I recently wrote an admonishment to CAIS for something similar.

The biggest disadvantage of pausing for a long time is that it gives bad actors (eg China) a chance to catch up.

There are literal misanthropic 'effective accelerationists' in San Francisco, some of whose stated purpose is to train/develop AI which can surpass and replace humanity. There's Facebook/Meta, whose leaders and executives have been publicly pooh-poohing discussion of AI-related risks as pseudoscience for years, and whose actual motto is 'move fast and break things'. There's OpenAI, which with great trumpeting announces its 'Superalignment' strategy without apparently pausing to think, 'But what if we can't align AGI in 5 years?'. We don't need to invoke bogeyman 'China' to make this sort of point. Note also that the CCP (along with EU and UK gov) has so far been more active in AI restraint and regulation than, say, the US government, or orgs like Facebook/Meta.

Suppose the West is right on the verge of creating dangerous AI, and China is two years away. It seems like the right length of pause is 1.9999 years, so that we get the benefit of maximum extra alignment research and social prep time, but the West still beats China.

Now, this was in the context of paraphrases of others' positions on a pause in AI development, so it's at least slightly mention-flavoured (as opposed to use). But as far as I can tell, the precise framing here has been introduced in Scott's retelling.

Whoever introduced this formulation, this is bonkers in at least two ways. First, who is 'the West' and who is 'China'? This hypothetical frames us as hivemind creatures in a two-player strategy game with a single lever. Reality is a lot more porous than that, in ways which matter (strategically and in terms of outcomes). I shouldn't have to point this out, so this is a little bewildering to read. Let me reiterate: governments are not currently pursuing advanced AI development, only companies. The companies are somewhat international, mainly headquartered in the US and UK but also to some extent China and EU, and the governments have thus far been unwitting passengers with respect to the outcomes. Of course, these things can change.

Second, actually think about the hypothetical where 'we'[1] are 'on the verge of creating dangerous AI'. For sufficient 'dangerous', the only winning option for humanity is to take the steps we can to prevent, or at least delay[2], that thing coming into being. This includes advocacy, diplomacy, 'aggressive diplomacy' and so on. I put forward that the right length of pause then is 'at least as long as it takes to make the thing not dangerous'. You don't win by capturing the dubious accolade of nominally belonging to the bloc which directly destroys everything! To be clear, I think Scott and I agree that 'dangerous AI' here is shorthand for, 'AI that could defeat/destroy/disempower all humans in something comparable to an extinction event'. We already have weak AI which is dangerous to lesser levels. Of course, if 'dangerous' is more qualified, then we can talk about the tradeoffs of risking destroying everything vs 'us' winning a supposed race with 'them'.

I'm increasingly running with the hypothesis that many anglophones are mind-killed on the inevitability of contemporary great power conflict in a way which I think wasn't the case even, say, 5 years ago. Maybe this is how thinking people felt in the run up to WWI, I don't know.

I wonder if a crux here is some kind of general factor of trustingness toward companies vs toward governments - I think extremising this factor would change the way I talk and think about such matters. I notice that a lot of American libertarians seem to have a warm glow around 'company/enterprise' that they don't have around 'government/regulation'.

[ In my post about this I outline some other possible cruxes and I'd love to hear takes on these ]

Separately, I've got increasingly close to the frontier of AI research and AI safety research, and the challenge of ensuring these systems are safe remains very daunting. I think some policy/people-minded discussions are missing this rather crucial observation. If you expect it to be easy (and expect others to expect that) to control AGI, I can see more why people would frame things around power struggles and racing. For this reason, I consider it worthwhile repeating: we don't know how to ensure these systems will be safe, and there are some good reasons to expect that they won't be by default.

I repeat that the post as a whole is doing a service and I'm excited to see more contributions to the conversation around pause and differential development and so on.

  1. Who, me? You? No! Some development team at DeepMind or OpenAI, presumably, or one of the current small gaggle of other contenders, or a yet-to-be-founded lab. ↩︎

  2. If it comes to it, extinction an hour later is better than an hour sooner. ↩︎

I think that the best work on AI alignment happens at the AGI labs

Based on your other discussion e.g. about public pressure on labs, it seems like this might be a (minor?) loadbearing belief?

I appreciate that you qualify this further in a footnote

This is a controversial view, but I’d guess it’s a majority opinion amongst AI alignment researchers.

I just wanted to call out that I weakly hold the opposite position, and also opposite best guess on majority opinion (based on safety researchers I know). Naturally there are sampling effects!

This is a marginal sentiment, and I certainly wouldn't trade all lab researchers for non-lab researchers or vice versa. Diversification of research settings seems quite precious, and the dialogue is important to preserve.

I also question

Reasons include: access to the best alignment talent,

because a lot of us are very reluctant to join AGI labs, for obvious reasons! I know folks inside and outside of AGI labs, and it seems to me that the most talented are among the outsiders (but this also definitely could be an artefact of sample sizes).

Load more