Oliver Sourbut

Technical staff (Autonomous Systems) @ UK AI Safety Institute (AISI)

312 karmaJoined Sep 2020Working (6-15 years)Pursuing a doctoral degree (e.g. PhD)London, UK

www.oliversourbut.net

Message

Interests:

AI safetyAI alignmentAI forecastingBiosecurityAnimal welfareWild animal welfareArtificial sentience

Bio

Participation
4

Autonomous Systems @ UK AI Safety Institute (AISI)
DPhil AI Safety @ Oxford (Hertford college, CS dept, AIMS CDT)
Former senior data scientist and software engineer + SERI MATS

I'm particularly interested in sustainable collaboration and the long-term future of value. I'd love to contribute to a safer and more prosperous future with AI! Always interested in discussions about axiology, x-risks, s-risks.

I enjoy meeting new perspectives and growing my understanding of the world and the people in it. I also love to read - let me know your suggestions! In no particular order, here are some I've enjoyed recently

Ord - The Precipice
Pearl - The Book of Why
Bostrom - Superintelligence
McCall Smith - The No. 1 Ladies' Detective Agency (and series)
Melville - Moby-Dick
Abelson & Sussman - Structure and Interpretation of Computer Programs
Stross - Accelerando
Graeme - The Rosie Project (and trilogy)

Cooperative gaming is a relatively recent but fruitful interest for me. Here are some of my favourites

Hanabi (can't recommend enough; try it out!)
Pandemic (ironic at time of writing...)
Dungeons and Dragons (I DM a bit and it keeps me on my creative toes)
Overcooked (my partner and I enjoy the foody themes and frantic realtime coordination playing this)

People who've got to know me only recently are sometimes surprised to learn that I'm a pretty handy trumpeter and hornist.

Posts
4

Sorted by New

Cooperation and Alignment in Delegation Games: You Need Both!

Oliver Sourbut

· 8mo ago

Careless talk on US-China AI competition? (and criticism of CAIS coverage)

Oliver Sourbut

· 2y ago

Un-unpluggability - can't we just unplug it?

Oliver Sourbut

· 2y ago

-3

Two tongue-in-cheek EA anthems

Oliver Sourbut

· 3y ago · 1m read

Comments
58

Decomposing Agency — capabilities without desires

Oliver Sourbut9mo5

I like this decomposition!

I think 'Situational Awareness' can quite sensibly be further divided up into 'Observation' and 'Understanding'.

The classic control loop of 'observe', 'understand', 'decide', 'act'^[1], is consistent with this discussion, where 'observe'+'understand' here are combined as 'situational awareness', and you're pulling out 'goals' and 'planning capacity' as separable aspects of 'decide'.

Are there some difficulties with factoring?

Certain kinds of situational awareness are more or less fit for certain goals. And further, the important 'really agenty' thing of making plans to improve situational awareness does mean that 'situational awareness' is quite coupled to 'goals' and to 'implementation capacity' for many advanced systems. Doesn't mean those parts need to reside in the same subsystem, but it does mean we should expect arbitrary mix and match to work less well than co-adapted components - hard to say how much less (I think this is borne out by observations of bureaucracies and some AI applications to date).

Terminology varies a lot; this is RL-ish terminology. Classic analogues might be 'feedback', 'process model'/'inference', 'control algorithm', 'actuate'/'affect'... ↩︎

Careers Questions Open Thread

Oliver Sourbut1y6

A little followup:

I took part in the inaugural SERI MATS programme in 2021-2022 (where incidentally I interacted with Richard), and started an AI Safety PhD at Oxford in 22.

I'm now working for the AI Safety Institute (UK Gov) since Jan 2024 as a hybrid technical expert, utilising my engineering and DS background, alongside AI/ML research and threat modelling. Likely to continue such work, there or elsewhere. Unsure if I'll finish my PhD in the end, as a result, but I don't regret it: I produced a little research, met some great collaborators, and had fun while learning as a consequence!

Between the original thread and my leaving for PhD, I'd say I grew my engineering, DS, and project management skills a little, though diminishing, while also doing a lot of AIS prep. My total income also went up while I remained FT employed. This was due for a slowdown as a consequence of stock movements and vesting, but regardless I definitely forwent a lot of money thanks to becoming a student again (and then a researcher rather than a high-paid engineer)! As far as I can tell this is the main price I paid, in terms of both personal situation and impact, and perhaps I should have made the move sooner (though having money in the bank is very freeing and enables indirect impact).

AI Regulation is Unsafe

Oliver Sourbut1y8

FWIW I work at the AI Safety Institute UK and we're considering a range of both misuse and misalignment threats, and there are a lot of smart folks on board taking things pretty seriously. I admit I... don't fully understand how we ended up in this situation and it feels contingent and precious, as does the tentative international consensus on the value of cooperation on safety (e.g. the Bletchley declaration). Some people in government are quite good, actually!

Public Fundraising has Positive Externalities

Oliver Sourbut1y1

Sure, take it or leave it! I think for the field-building benefits it can look more obviously like an externality (though I-the-fundraiser would in fact be pleased and not indifferent, presumably!), but the epistemic benefits could easily accrue mainly to me-the-fundraiser (of course they could also benefit other parties).

Altruism sharpens altruism

Oliver Sourbut1y12

How much of this is lost by compressing to something like: virtue ethics is an effective consequentialist heuristic?

I've been bought into that idea for a long time. As Shaq says, 'Excellence is not a singular act, but a habit. You are what you repeatedly do.'

We can also make analogies to martial arts, music, sports, and other practice/drills, and to aspects of reinforcement learning (artificial and natural).

Public Fundraising has Positive Externalities

Oliver Sourbut1y5

Simple, clear, thought-provoking model. Thanks!

I also faintly recall hearing something similar in this vicinity: apparently some volunteering groups get zero (or less!?) value from many/most volunteers, but engaged volunteers dominate donations, so it's worthwhile bringing in volunteers and training them! (citation very much needed)

Nitpick: are these 'externalities'? I'd have said, 'side effects'. An externality is a third-party impact from some interaction between two parties. The effects you're describing don't seem to be distinguished by being third-party per se (I can imagine glossing them as such but it's not central or necessary to the model).

Attention on AI X-Risk Likely Hasn't Distracted from Current Harms from AI

Oliver Sourbut1y5

Yeah. I also sometimes use 'extinction-level' if I expect my interlocutor not to already have a clear notion of 'existential'.

OpenAI's Superalignment team has opened Fast Grants

Oliver Sourbut1y3

Point of information: at least half the funding comes from Schmidt futures (not OpenAI), though OpenAI are publicising and administrating it.

Thoughts on the AI Safety Summit company policy requests and responses

Oliver Sourbut1y1

Another high(er?) priority for governments:

start building multilateral consensus and preparations on what to do if/when
- AI developers go rogue
- AI leaked to/stolen by rogue operators
- AI goes rogue

Pause For Thought: The AI Pause Debate

Oliver Sourbut2y29

I think this is a good and useful post in many ways, in particular laying out a partial taxonomy of differing pause proposals and gesturing at their grounding and assumptions. What follows is a mildly heated response I had a few days ago, whose heatedness I don't necessarily endorse but whose content seems important to me.

Sadly this letter is full of thoughtless remarks about China and the US/West. Scott, you should know better. Words have power. I recently wrote an admonishment to CAIS for something similar.

The biggest disadvantage of pausing for a long time is that it gives bad actors (eg China) a chance to catch up.

There are literal misanthropic 'effective accelerationists' in San Francisco, some of whose stated purpose is to train/develop AI which can surpass and replace humanity. There's Facebook/Meta, whose leaders and executives have been publicly pooh-poohing discussion of AI-related risks as pseudoscience for years, and whose actual motto is 'move fast and break things'. There's OpenAI, which with great trumpeting announces its 'Superalignment' strategy without apparently pausing to think, 'But what if we can't align AGI in 5 years?'. We don't need to invoke bogeyman 'China' to make this sort of point. Note also that the CCP (along with EU and UK gov) has so far been more active in AI restraint and regulation than, say, the US government, or orgs like Facebook/Meta.

Suppose the West is right on the verge of creating dangerous AI, and China is two years away. It seems like the right length of pause is 1.9999 years, so that we get the benefit of maximum extra alignment research and social prep time, but the West still beats China.

Now, this was in the context of paraphrases of others' positions on a pause in AI development, so it's at least slightly mention-flavoured (as opposed to use). But as far as I can tell, the precise framing here has been introduced in Scott's retelling.

Whoever introduced this formulation, this is bonkers in at least two ways. First, who is 'the West' and who is 'China'? This hypothetical frames us as hivemind creatures in a two-player strategy game with a single lever. Reality is a lot more porous than that, in ways which matter (strategically and in terms of outcomes). I shouldn't have to point this out, so this is a little bewildering to read. Let me reiterate: governments are not currently pursuing advanced AI development, only companies. The companies are somewhat international, mainly headquartered in the US and UK but also to some extent China and EU, and the governments have thus far been unwitting passengers with respect to the outcomes. Of course, these things can change.

Second, actually think about the hypothetical where 'we'^[1] are 'on the verge of creating dangerous AI'. For sufficient 'dangerous', the only winning option for humanity is to take the steps we can to prevent, or at least delay^[2], that thing coming into being. This includes advocacy, diplomacy, 'aggressive diplomacy' and so on. I put forward that the right length of pause then is 'at least as long as it takes to make the thing not dangerous'. You don't win by capturing the dubious accolade of nominally belonging to the bloc which directly destroys everything! To be clear, I think Scott and I agree that 'dangerous AI' here is shorthand for, 'AI that could defeat/destroy/disempower all humans in something comparable to an extinction event'. We already have weak AI which is dangerous to lesser levels. Of course, if 'dangerous' is more qualified, then we can talk about the tradeoffs of risking destroying everything vs 'us' winning a supposed race with 'them'.

I'm increasingly running with the hypothesis that many anglophones are mind-killed on the inevitability of contemporary great power conflict in a way which I think wasn't the case even, say, 5 years ago. Maybe this is how thinking people felt in the run up to WWI, I don't know.

I wonder if a crux here is some kind of general factor of trustingness toward companies vs toward governments - I think extremising this factor would change the way I talk and think about such matters. I notice that a lot of American libertarians seem to have a warm glow around 'company/enterprise' that they don't have around 'government/regulation'.

[ In my post about this I outline some other possible cruxes and I'd love to hear takes on these ]

Separately, I've got increasingly close to the frontier of AI research and AI safety research, and the challenge of ensuring these systems are safe remains very daunting. I think some policy/people-minded discussions are missing this rather crucial observation. If you expect it to be easy (and expect others to expect that) to control AGI, I can see more why people would frame things around power struggles and racing. For this reason, I consider it worthwhile repeating: we don't know how to ensure these systems will be safe, and there are some good reasons to expect that they won't be by default.

I repeat that the post as a whole is doing a service and I'm excited to see more contributions to the conversation around pause and differential development and so on.

Who, me? You? No! Some development team at DeepMind or OpenAI, presumably, or one of the current small gaggle of other contenders, or a yet-to-be-founded lab. ↩︎
If it comes to it, extinction an hour later is better than an hour sooner. ↩︎

Oliver Sourbut

Bio

Participation4

Posts 4

Comments58

Participation
4

Posts
4

Comments
58