Oliver Sourbut

PhD student (AI) @ University of Oxford
Pursuing a doctoral degree (e.g. PhD)
Working (6-15 years of experience)



Call me Oliver or Oly - I don't mind which.

I'm particularly interested in sustainable collaboration and the long-term future of value. I'd love to contribute to a safer and more prosperous future with AI! Always interested in discussions about axiology, x-risks, s-risks.

I'm currently (2022) just embarking on a PhD in AI in Oxford, and also spend time in (or in easy reach of) London. Until recently I was working as a senior data scientist and software engineer, and I've been doing occasional AI alignment research with SERI.

I enjoy meeting new perspectives and growing my understanding of the world and the people in it. I also love to read - let me know your suggestions! In no particular order, here are some I've enjoyed recently:

  • Ord - The Precipice
  • Pearl - The Book of Why
  • Bostrom - Superintelligence
  • McCall Smith - The No. 1 Ladies' Detective Agency (and series)
  • Melville - Moby-Dick
  • Abelson & Sussman - Structure and Interpretation of Computer Programs
  • Stross - Accelerando
  • Simsion - The Rosie Project (and trilogy)

Cooperative gaming is a relatively recent but fruitful interest for me. Here are some of my favourites:

  • Hanabi (can't recommend enough; try it out!)
  • Pandemic (ironic at time of writing...)
  • Dungeons and Dragons (I DM a bit and it keeps me on my creative toes)
  • Overcooked (my partner and I enjoy the foody themes and frantic realtime coordination playing this)

People who've got to know me only recently are sometimes surprised to learn that I'm a pretty handy trumpeter and hornist.




I've given a little thought to this hidden qualia hypothesis but it remains very confusing for me.

To what extent should we expect to be able to tractably and knowably affect such hidden qualia?

This is beautiful and important Tyler, thank you for sharing.

I've seen a few people burn out (and come close myself), and I have made a habit of gently making and reinforcing this sort of point socially (far less eloquently) myself, in various contexts.

I have a lot of thoughts about this subject.

One thing I always embrace is silliness and (often self-deprecating) humour, which are useful antidotes to stress for a lot of people. Incidentally, your tweet thread rendition of the Egyptian spell includes

I am light heading for light. Even in the dark, a fire bums in the distance.

(emphasis mine) which I enjoyed. A case of bad keming reified?

A few friends and acquaintances have recently been working on something they're calling Shard Theory, which considers the various parts of a human's motivation system and their interactions. They're interested for other reasons, but I was reminded here. See also Kaj Sotala's Multiagent Models of Mind which is more explicitly about how to be a human.

As a firm descriptive (but undecidedly prescriptive) transhumanist, I think your piece here also touches on something we will likely one day (maybe soon?) have to grapple with, which is the fundamental relationship between (moral) agency and moral patienthood. As it happens, modern humans are quite conclusively both, by most lights, but it doesn't look like this is a law of nature. Indeed there are likely many deserving moral patients today who are not much by way of agents. And we may bring into being agents which are not especially moral-patienty. (Further, something sufficiently agenty might reduce us humans to the status of 'not much by way of agents'.)

Seconded/thirded on Human Compatible being near that frontier. I did find its ending 'overly optimistic' in the sense of framing it like 'but lo, there is a solution!' while other similar resources like Superintelligence and especially The Alignment Problem seem more nuanced, presenting uncertain proposals for paths forward not as oven-ready but as preliminary and speculative.

I think it's a staircase? Maybe like climbing upwards to more good stuff. Plus some cool circles to make it logo-ish.

I'm intrigued by this thread. I don't have an informed opinion on the particular aesthetic or choice of quiz questions, but I note some superficial similarities to Coursera, Khan Academy, and TED-Ed, which are aimed mainly at professional-age adults, students of all ages, and youth/students (without excluding adults) respectively.

Fun/cute/cartoon aesthetics do seem to abound these days in all sorts of places, not just for kids.

My uninformed opinion is that I don't see why it should put off teenagers (talented or otherwise) in particular, but I weakly agree that if something is explicitly pitched at teenagers, that might be offputting!

It looks like I got at least one downvote on this comment. Should I be providing tips of this kind in a different way?

I've considered a possible pithy framing of the Life Despite Suffering question as a grim orthogonality thesis (though I'm not sure how useful it is):

We sometimes point to the substantial majority's revealed preference for staying alive as evidence of a 'life worth living'. But perhaps 'staying-aliveness' and 'moral patient value' can vary more independently than that claim assumes. This is the grim orthogonality thesis.

An existence proof for the 'high staying-aliveness x low moral patient value' quadrant is the complex of torturer+torturee, which quite clearly can reveal a preference for staying alive, while quite plausibly being net negative value.

Can we rescue the correlation of revealed 'staying-aliveness' preference with 'life worth livingness'?

We can maybe reason about value from the origin of moral patients we see, without having a physical theory of value. All the patients we see at present are presumably products of natural selection. Let's also assume for now that patienthood comes from consciousness.

Two obvious but countervailing observations:

  • to the extent that conscious content is upstream of behaviour but downstream of genetic content, natural selection will operate on conscious content to produce behaviour which is fitness-correlated
    • if positive conscious content produces attractive behaviour (and vice versa), we might anticipate that an organism 'doing well' according to suitable fitness-correlates would be experiencing positive conscious content
    • this seems maybe true of humans?
  • to the extent that behaviour is downstream of non-conscious control processes, natural selection will operate on non-conscious control processes to produce behaviour which is fitness-correlated
    • we cannot rule out experiences 'not worth living' which nevertheless produce net revealed staying-aliveness preference, if the behaviour is sufficiently under non-conscious control, or if the selection for behaviour downstream of negative conscious experience is weak
      • weak selection is especially likely in novel out-of-distribution situations
    • in general, organisms which reveal preferences for not staying alive will never be ceteris paribus fitter (though there are special cases of course)

For non-naturally-selected moral patients, I think even the above bets are basically off.

I'm shocked and somewhat concerned that your empirical finding is that so few people have encountered or thought about this crucial consideration.

My experience is different, with maybe 70% of the AI x-risk researchers I've discussed this with being somewhat au fait with the notion that we might not know the sign of future value conditional on survival. But I agree that it seems people (myself included) have a tendency to slide off this consideration or hope to defer its resolution to future generations, and my sample size is quite small (a dozen maybe) and quite correlated.

For what it's worth, I recall this question being explicitly posed in at least a few of the EA in-depth fellowship curricula I've consumed or commented on, though I don't recall specifics and when I checked EA Cambridge's most recent curriculum I couldn't find it.

Typo hint:

"10<sup>38</sup>" hasn't rendered how you hoped. You can use <dollar>10^{38}<dollar> which renders as

Got it, I think you're quite right on one reading. I should have been clearer about what I meant, which is something like

  • there is a defensible reading of that claim which maps to some negative utilitarian claim (without necessarily being a central example)
  • furthermore I expect many issuers of such sentiments are motivated by basically pretheoretic negative utilitarian insight

E.g. imagine a minor steelification (which loses the aesthetic and rhetorical strength) like "nobody's positive wellbeing (implicitly stemming from their freedom) can/should be celebrated until everyone has freedom (implicitly necessary to escape negative wellbeing)" which is consistent with some kind of lexical negative utilitarianism.

You're right that if we insist that 'freedom' be interpreted identically in both places (parsimonious, granted, though I think the symmetry is better explained by aesthetic/rhetorical concerns) another reading explicitly neglects the marginal benefit of lifting merely some people out of illiberty. Which is only consistent with utilitarianism if we use an unusual aggregation theory (i.e. minimising) - though I have also seen this discussed under negative utilitarianism.

Anecdata: as someone whose (past) political background and involvement (waning!) is definitely some kind of lefty, and who, if it weren't for various x- and s-risks, would plausibly consider some form (my form, naturally!) of lefty politics to be highly important (if not highly tractable), my reading of that claim at least goes something like the first one. I might not be representative in that respect.

I have no doubt that many people expressing that kind of sentiment would still celebrate marginal 'releases', while considering it wrong to celebrate further the fruits of such freedom, ignoring others' lack of freedom.
