Oliver Sourbut

Technical staff (Autonomous Systems) @ UK AI Safety Institute (AISI)
318 karma · Joined · Working (6-15 years) · Pursuing a doctoral degree (e.g. PhD) · London, UK
www.oliversourbut.net

Bio


  • Autonomous Systems @ UK AI Safety Institute (AISI)
  • DPhil AI Safety @ Oxford (Hertford college, CS dept, AIMS CDT)
  • Former senior data scientist and software engineer + SERI MATS

I'm particularly interested in sustainable collaboration and the long-term future of value. I'd love to contribute to a safer and more prosperous future with AI! Always interested in discussions about axiology, x-risks, s-risks.

I enjoy meeting new perspectives and growing my understanding of the world and the people in it. I also love to read - let me know your suggestions! In no particular order, here are some I've enjoyed recently:

  • Ord - The Precipice
  • Pearl - The Book of Why
  • Bostrom - Superintelligence
  • McCall Smith - The No. 1 Ladies' Detective Agency (and series)
  • Melville - Moby-Dick
  • Abelson & Sussman - Structure and Interpretation of Computer Programs
  • Stross - Accelerando
  • Simsion - The Rosie Project (and trilogy)

Cooperative gaming is a relatively recent but fruitful interest for me. Here are some of my favourites:

  • Hanabi (can't recommend enough; try it out!)
  • Pandemic (ironic at time of writing...)
  • Dungeons and Dragons (I DM a bit and it keeps me on my creative toes)
  • Overcooked (my partner and I enjoy the foody themes and frantic real-time coordination playing this)

People who've got to know me only recently are sometimes surprised to learn that I'm a pretty handy trumpeter and hornist.

Comments

(cross-posted on LW)

Love this!

As presaged in our verbal discussion, my top conceptual complement would be to emphasise exploration/experimentation as central to the knowledge-production loop - the cycle of 'developing good taste to plan better experiments to improve taste (and planning model)' is critical (indispensable?) to 'produce new knowledge which is very helpful by the standards of human civilization' (on any kind of meaningful timescale).

This is because just flailing, or even just 'doing stuff', gets you some novelty of observations. But directedly seeking informative circumstances at the boundaries of the known - which includes making novel, unpredictable events happen, as well as getting equipped with richer means to observe and record them, and perhaps preparing to deliberatively extract insight - turns out to mine vastly more insight per resource (time, materials, etc.). Hence science, but also hence individual human and animal playfulness, curiosity, adversarial exercises and drills (self-play-ish), and whatnot.
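
To gesture at the magnitude, here's a toy multi-armed-bandit comparison (entirely my own illustrative setup, not from the discussion): 'flailing' pulls arms uniformly at random, while 'directed exploration' uses UCB1's uncertainty bonus to seek out arms whose value is still unknown, and reliably extracts much more value from the same budget of pulls.

```python
import math
import random

def run(policy, means, steps=2000, seed=0):
    """Play a Gaussian bandit for `steps` pulls under the given policy."""
    rng = random.Random(seed)
    counts = [0] * len(means)
    totals = [0.0] * len(means)
    reward = 0.0
    for t in range(1, steps + 1):
        arm = policy(t, counts, totals, rng)
        r = rng.gauss(means[arm], 1.0)   # noisy payoff from the chosen arm
        counts[arm] += 1
        totals[arm] += r
        reward += r
    return reward

def flailing(t, counts, totals, rng):
    # undirected: pick any arm uniformly at random
    return rng.randrange(len(counts))

def ucb1(t, counts, totals, rng):
    # directed: prefer arms whose value is still uncertain (exploration bonus)
    for arm, n in enumerate(counts):
        if n == 0:
            return arm                   # try everything at least once
    return max(range(len(counts)),
               key=lambda a: totals[a] / counts[a]
                             + math.sqrt(2 * math.log(t) / counts[a]))

means = [0.1, 0.2, 0.9, 0.3]             # one clearly better arm to discover
print("flailing:", round(run(flailing, means), 1))
print("directed:", round(run(ucb1, means), 1))
```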

Said another way, maybe I'd characterise 'the way that fluid intelligence and crystallised intelligence synergise in the knowledge production loop' as 'directed exploration/experimentation'?

Having said that, I don't necessarily think these capacities need to reside 'in the same mind', just as contemporary human orgs get more of this done and more effectively than individuals. But the pieces do need to be fit to each other (like, a physicist with great physics taste can't usually very well complement a bio lab without first becoming a person with great bio taste).

I like this decomposition!

I think 'Situational Awareness' can quite sensibly be further divided up into 'Observation' and 'Understanding'.

The classic control loop of 'observe', 'understand', 'decide', 'act'[1] is consistent with this discussion: 'observe' + 'understand' here are combined as 'situational awareness', and you're pulling out 'goals' and 'planning capacity' as separable aspects of 'decide'.
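
For concreteness, here's a minimal toy sketch of that factoring (the environment, names, and scoring are all my own invention, purely illustrative - not anyone's proposed architecture):

```python
from dataclasses import dataclass, field

class GridWorld:
    """Toy 1-D environment: the agent sits at position x."""
    def __init__(self):
        self.x = 0
    def emit_percept(self):
        return {"x": self.x}
    def apply(self, action):
        self.x += {"left": -1, "right": 1, "stay": 0}[action]

@dataclass
class Agent:
    world_model: dict = field(default_factory=dict)

    def observe(self, env):                    # 'observe' (feedback)
        return env.emit_percept()
    def understand(self, percept):             # 'understand' (inference)
        self.world_model.update(percept)
    def decide(self, goal_score, propose):     # 'decide', with goals and planning
        options = propose(self.world_model)    # pulled apart: the planner proposes,
        return max(options, key=goal_score)    # the goal function evaluates
    def act(self, env, action):                # 'act' (actuate)
        env.apply(action)

def propose(model):
    """Planning capacity: enumerate actions with predicted outcomes."""
    x = model["x"]
    return [("left", x - 1), ("right", x + 1), ("stay", x)]

def goal_score(option):
    """Goal: be at position 5 (scores a predicted outcome)."""
    _, predicted_x = option
    return -abs(predicted_x - 5)

env, agent = GridWorld(), Agent()
for _ in range(8):
    agent.understand(agent.observe(env))           # situational awareness
    action, _ = agent.decide(goal_score, propose)  # goals + planning
    agent.act(env, action)
print(env.x)  # reaches 5 and stays
```

The point of the sketch is that the goal function and the planner are separable objects that only meet inside 'decide' - which is what makes the mix-and-match question below meaningful.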

Are there some difficulties with factoring?

Certain kinds of situational awareness are more or less fit for certain goals. Further, the important, 'really agenty' move of making plans to improve situational awareness means that 'situational awareness' is quite coupled to 'goals' and to 'implementation capacity' for many advanced systems. That doesn't mean those parts need to reside in the same subsystem, but it does mean we should expect arbitrary mix-and-match to work less well than co-adapted components - how much less is hard to say (I think this is borne out by observations of bureaucracies and some AI applications to date).


    1. Terminology varies a lot; this is RL-ish terminology. Classic analogues might be 'feedback', 'process model'/'inference', 'control algorithm', 'actuate'/'affect'... ↩︎

A little followup:

I took part in the inaugural SERI MATS programme in 2021-2022 (where, incidentally, I interacted with Richard), and started an AI safety PhD at Oxford in 2022.

I've been working for the AI Safety Institute (UK Gov) since Jan 2024 as a hybrid technical expert, drawing on my engineering and DS background alongside AI/ML research and threat modelling. I'm likely to continue such work, there or elsewhere. As a result I'm unsure whether I'll finish my PhD, but I don't regret it: I produced a little research, met some great collaborators, and had fun while learning!

Between the original thread and my leaving for the PhD, I'd say I grew my engineering, DS, and project-management skills a little (though with diminishing returns), while also doing a lot of AIS prep. My total income also went up while I remained FT employed. That growth was due to slow as stock movements and vesting played out, but regardless, I definitely forwent a lot of money by becoming a student again (and then a researcher rather than a highly-paid engineer)! As far as I can tell this is the main price I paid, in terms of both personal situation and impact, and perhaps I should have made the move sooner (though having money in the bank is very freeing and enables indirect impact).

FWIW I work at the AI Safety Institute UK and we're considering a range of both misuse and misalignment threats, and there are a lot of smart folks on board taking things pretty seriously. I admit I... don't fully understand how we ended up in this situation and it feels contingent and precious, as does the tentative international consensus on the value of cooperation on safety (e.g. the Bletchley declaration). Some people in government are quite good, actually!

Sure, take it or leave it! I think for the field-building benefits it can look more obviously like an externality (though I-the-fundraiser would in fact be pleased and not indifferent, presumably!), but the epistemic benefits could easily accrue mainly to me-the-fundraiser (of course they could also benefit other parties).

How much of this is lost by compressing to something like: virtue ethics is an effective consequentialist heuristic?

I've bought into that idea for a long time. As Shaq says, 'Excellence is not a singular act, but a habit. You are what you repeatedly do.'

We can also make analogies to martial arts, music, sports, and other practice/drills, and to aspects of reinforcement learning (artificial and natural).

Simple, clear, thought-provoking model. Thanks!

I also faintly recall hearing something similar in this vicinity: apparently some volunteering groups get zero (or less!?) value from many/most volunteers, but engaged volunteers dominate donations, so it's worthwhile bringing in volunteers and training them! (citation very much needed)

Nitpick: are these 'externalities'? I'd have said, 'side effects'. An externality is a third-party impact from some interaction between two parties. The effects you're describing don't seem to be distinguished by being third-party per se (I can imagine glossing them as such but it's not central or necessary to the model).

Yeah. I also sometimes use 'extinction-level' if I expect my interlocutor not to already have a clear notion of 'existential'.

Point of information: at least half the funding comes from Schmidt Futures (not OpenAI), though OpenAI are publicising and administering it.

Another high(er?) priority for governments:

  • start building multilateral consensus and preparations on what to do if/when
    • AI developers go rogue
    • AI leaked to/stolen by rogue operators
    • AI goes rogue