
𝙰 π™Ώπ™΄πšπ™΅π™Ύπšπ™Όπ™°π™½π™²π™΄ π™°πšπšƒ/πšπ™΄πš‚π™΄π™°πšπ™²π™· π™Ώπšπ™Ύπ™Ήπ™΄π™²πšƒ π™±πšˆ π™²π™·πšπ™Έπš‚ 𝙻𝙴𝙾𝙽𝙢

π™½π™Ύπšƒ πšƒπ™Ύ 𝙱𝙴 πšƒπ™°π™Ίπ™΄π™½ πš‚π™΄πšπ™Έπ™Ύπš„πš‚π™»πšˆ. π™΅π™Ύπš πšπ™΄π™°π™»

❦

a story: raw human wisdom is vastly insufficient for the task before us...

of navigating an entire series of highly uncertain and deeply contested decisions

where a single mistake could prove ruinous

with a greatly compressed timeline

❦
skip to the "real content"

Literary Reflection - Vanity of vanities! All is vanity! - April 1st Bonus DLC

✧ so this is my final output or maybe not? it's pretty much impossible to know either way.

β–  what I'm hearing is that your ERA output is half-alive and half-dead. let me guessβ€”you missed draft amnesty week?

✧ ngl this is all just an absurdly high-effort advertisement for my reading list.

β–  it's fun to joke and all, but shouldn't you be taking this more seriously? the stakes are high; the challenge immense; the funding cha-ching; um, i mean, life-changing.

✧ (dramatic voice) is being "serious" the only way to be serious? is not the greatest 'seriousness' often the surest sign that authenticity is nowhere to be found? if i err, do i not err by taking this matter too seriously? do not our greatest battles demand that we bring everything, everything to bear? (pregnant pause)

β–  (eyes lighting up in recognition) my God, not...

✧ MY GOD NOTβ€”INDEED! is not the greatness of this deed TOO GREAT FOR US? must we ourselves not become Dogeβ€”that is to say meme lords (or Prince Harry)β€”SIMPLY TO APPEAR WORTHY OF IT?

β–  (anguished) Jesus, chris! THIS?... AGAIN?...

✧ YES AND AGAIN AND AGAIN AND AGAIN AND AGAIN (and maybe some more) ... there has never been A GREATER DEED; and whoever is born after usβ€”for the sake of this deed he will belong to A HIGHER HISTORY THAN ALL HISTORY HITHERTO. here the madman fell silent.

β–  i suspect the madman will not remain silent.

✧ i suspect your suspicion is well-suspected. i tell you solemnly, until the day the madman is dead and buried, it shall be exactly as thou hast foretold.

β–  "and buried?"β€”Jesus! (shakes head) ought not death suffice? or shall your clearly tortured soul rise from the dead and haunt me as some kind of unholy ghost? the Blessed Prophet Eliezer hath said "shut up and do the impossible!", but i say unto you: "SHUT UP (it's possible!)". and I pray thee, if thou sparest not my ears, at least spare my SOL.

✧ YOUR SOUL! your SOUL! (still serious, but soft, dreamy almost ethereal voice) "what is love? what is creation? what is longing? what is Q*?", so asketh the last man and blinketh. HE BLINKETH!

β–  ENOUGH, thou hast worn down my patience! HOLD THY TONGUE, I beseech thee, LEST I BLINKETH THEE OFF THIS EARTH!!!

✧ THOU WOULDST BLINK ME FROM THE EARTH? I tell you solemnly: thou blinkest constantly, but of blinking thou knowest naught. (deep voice, to the point of absurdity) β€œthere they laugh: they understand me not; I AM NOT THE MOUTH FOR THESE EARS. must one first BATTER their ears, that they may learn TO HEAR WITH THEIR EYES? (his voice becomes dreamy again) their eyes, their eyes... and what blessed sights do these eyes see... "twinkle, twinkle, little star, (pausing for dramatic effect, but trying way too hard) how i wonder what you are?"β€”and verily, i say unto you: never hath man penned words more profound...

β–  (initially flabbergasted, then his expression shifts. SILENCE. COLD, STONY SILENCE. THE ABSOLUTE COLD, STONY SILENCE OF ONE WHO WOULD WILLINGLY BLINKETH SOMEONE INTO SPACE)

✧ (the rebuke gives the madman pause to reflect, but only for a moment) i understand perfectly. what thou art saying is... that thou desirest my talk to proceed forthwith! the sooner i commence, the sooner thou shalt be free of my proclamations and the sooner i shall be free of your ill humour. on my honour, i shall delay this no further...

❦

Prelude

"URGENT: GET COLLECTIVELY WISER"[1]

Yoshua Bengio, Turing Award Winner and AI "Godfather", On the Wisdom Race

Before the prospect of an intelligence explosion, we humans are like small children playing with a bomb. Such is the mismatch between the power of our plaything and the immaturity of our conduct. β€” Nick Bostrom, Superintelligence
❦

Did you and the other scientists not stop to consider the implications of what you were creating? β€” Special Counsel

When you see something that is technically sweet, you go ahead and do it and you argue about what to do about it only after you have had your technical success. That is the way it was with the atomic bomb[2] β€” Oppenheimer

❦
There are moments in the history of science, where you have a group of scientists look at their creation and just say, you know: β€˜What have we done?... Maybe it's great, maybe it's bad, but what have we done?’ β€” Sam Altman
❦

We stand at a crucial moment in the history of our species. Fueled by technological progress, our power has grown so great that for the first time in humanity’s long history, we have the capacity to destroy ourselvesβ€”severing our entire future and everything we could become.

Yet humanity’s wisdom has grown only falteringly, if at all, and lags dangerously behind. Humanity lacks the maturity, coordination and foresight necessary to avoid making mistakes from which we could never recover. As the gap between our power and our wisdom grows, our future is subject to an ever-increasing level of risk. This situation is unsustainable. So over the next few centuries[20], humanity will be tested: it will either act decisively to protect itself and its long-term potential, or, in all likelihood, this will be lost forever β€” Toby Ord, The Precipice


Before I begin

Some projects consist of crafting a plan, then implementing it; others are more emergent. This project is of the latter kind.

Throughout this project, I have struggled to articulate exactly what it is that I have been trying to achieveβ€”even to myself, at times. I must have cycled through a dozen different framings: as seeding a conversation, as a distillation of ideas already floating in the air, as an effort to construct new narratives for a new age, as deconstruction, as a manifesto, as an exploration, as an attempt to synthesise two traditions, as both artifact and record of a decade spent struggling with these problems...

tell me more about what exactly this is

First and foremost, I view this as an exploration. The other aspects should be understood in this light. It is less an attempt to conclusively and immediately pronounce the truthβ€”What is it? How do we find it? How would we know if we had succeeded?β€”than to sketch out one possible interpretation. Socrates said "I know that I know nothing" and I see it as one of the wisest statements ever made.

It is also an act of narrative construction. I've been strongly persuaded by Yuval Harari's perspective in Sapiens that narrativesβ€”rather than being mere foolishness or entertainmentβ€”are, first, often useful fictions and, second, a key enabler of large-scale coordination in modern civilization. Our high-level narratives almost never completely capture the truth: they leave some things out, smooth over others. However, as finite beings, we seem to have no choice but to make use of some kind of simplifying narrative.

At the opposite end, it serves as a deconstruction. I'll quote Wittgenstein here: "My propositions serve as elucidations in the following way: anyone who understands me eventually recognizes them as nonsensical, when he has used themβ€”as stepsβ€”to climb up beyond them. (He must, so to speak, throw away the ladder after he has climbed up it.)". My preferred interpretation is that he views the Tractatus as a tool for transformation rather than something to be taken literally. I suspect that this is a healthy way to relate to these kinds of large-scale, macro-narratives. "The map is not the territory" and "The tao that can be named is not the tao" express a similar spirit.

how does this fit into the broader intellectual landscape?

This talk can also be seen as an attempt to unify[3] the intellectual threads of AI safety and post-rationalism[4][5]/integral altruism. I honestly went back and forth several times about whether unifying them constitutes a worthwhile project or, worse, whether it would be net-negative. After all, post-modernism[6] essentially blew up several academic fields which have produced very little work of value since then. And it feels a bit strange to give a talk where you're worried that the effects could be quite negative if, by some unexpected turn of events, the talk ended up having too much influence. However, after having spent a number of months thinking about this, I eventually came around to the position that a surviving world is almost certainly one where we're able to integrate these kinds of ideas without an excessive amount of negative side-effects[7]. Whilst there are no guarantees, that seems like a bet worth making.

Whilst the presentation is novel, I don't want to claim any great originality for most of the ideas here. As much as I admire original thought, sometimes the moment demands something else. I see today as being about taking the wisps of vague beliefs floating in the air, capturing them, and then reshaping them into a new, more potent form.

parts of this feel like a manifesto

There's some truth in that. This may make some readers nervousβ€”but the optimal number of manifestos is non-zero.

That said, insofar as this is a manifesto, it really isn't a manifesto in a traditional sense. Instead, it's something more fluid, something more exploratory[8]. The question "What would I write if I believed in this with complete and utter conviction?" is often a fruitful path for learning what it would be like to fully inhabit a perspective.

why are parts intentionally provocative?

In my experience, often the best way to deeply understand a perspective is to engage with someone who is "going hard on it". I think there's something to be said for letting people add their own 'salt and pepper', by which I mean skepticism, rather than always optimizing for the "lowest common denominator".

isn't narrative construction exceptionally dangerous?

If narratives will play a significant role in shaping societyβ€”whether we will or noβ€”then refusing to engage in constructing any narratives abandons the field to malicious or naive actors. Additionally, my intuition is that avoiding consciously adopting a narrative/frame simply means unconsciously adopting one instead[9].

In light of the above, I am cautiously in favour of engaging in this activity to at least some degree[10]. However, this is not an excuse for trying to manipulate people. It's important to be open about what you're doing. You should be transparent about how you're emphasising some things and intentionally leaving other things out. People will have varying opinions on the wisdom of your choices, but at least they'll have the information they need to critique your narrative, reshape it, or construct an alternative.

what do you mean by wisdom in this context?

For the purposes of this talk, I'm thinking about wisdom in the following way:
β€’ I'm primarily thinking about wisdom in terms of the ability to help us make decisions that assist us in steering the world in a positive direction.
β€’ I don't want to define wisdom in such a way that I assume wisdom involves particular moral commitments or ontological assumptions. Doing so would unnecessarily narrow the audience for this talk.
β€’ I'm not aiming to clearly distinguish between AI for forecasting, AI for epistemics and AI for wisdom. These different directions are close enough that it isn't really worth treating them separately during a broad introduction like this.

My deepest gratitude to my ERA research manager Peter Gebauer and mentor Professor David Manley. I was not the easiest person to mentor, nor is this necessarily the ideal kind of output they would most have liked me to produce, and so I especially appreciate the patience they demonstrated. Without a doubt, this talk would have ended up much poorer without their assistance and advice. A fuller set of acknowledgements appears at the end.
If you want to build a ship, don’t drum up the men to gather wood, divide the work and give orders. Instead, teach them to yearn for the vast and endless ocean. - Unknown origins
The feature presentation will begin shortly...
5, 4, 3, 2, 1...

β–ŒI want to lay out a scene

Imagine this. You’re in a car hurtling down a twisting mountain road[11], shrouded in a thick, shifting fog. The fog hides a whole minefield of obstaclesβ€”boulders in the road, fallen trees, sudden drops, and other cars swerving into your lane. And if all that weren’t enough, your brakes are, shall we say, just a little bit β€œiffy”.

This metaphor represents what I call the πŸ…‚πŸ…„πŸ…… 𝚝 πš› πš’ 𝚊 𝚍, which encapsulates how I see the strategic situation when it comes to AI.

πŸ…‚ πš™ 𝚎 𝚎 𝚍
πŸ…„ πš— 𝚌 𝚎 πš› 𝚝 𝚊 πš’ πš— 𝚝 𝚒
πŸ…… 𝚞 πš• πš— 𝚎 πš› 𝚊 πš‹ πš’ πš• πš’ 𝚝 𝚒

Predicting the future is almost certainly a risky endeavour. Nonetheless, I would like to encourage you to try anywayβ€”I believe it would be a grave mistake to allow ourselves to become defeatist here. In particular, I want you to ask yourself what the above framing, if true, would imply for our future. If you're anything like me, then I expect you'll come away with the feeling that the odds of catastrophe are far too high. However, if we choose correctly, then maybe (just maybe!), we can pull through this.

❧

presenting the πŸ…‚πŸ…„πŸ…… 𝚝 πš› πš’ 𝚊 𝚍

☞ provocation: catastrophe is the default, not the exception

πŸ…‚ πš™ 𝚎 𝚎 𝚍  β€” 😱⏳: Β AI is developing at an astounding rate[12], saturating benchmarks faster than we can construct new ones. As it begins to feed back into its own development[13], the possibility arises that in the near futureβ€”perhaps the very near futureβ€”we may look back retrospectively on the last year and view the rate of progress as slow and quaint.

The term "insanely fast" is typically used figuratively, but here it is literal. Even if we knew with utter certainty that the alignment problemΒ were easy, I still don't see how we could race as fast as humanly possibleβ€”barely an exaggeration[14]β€”towards superintelligence[15]Β without disaster being almost certain[16]. I like to think, perhaps naively, that we’ll manage to coordinate on a pause or slow down at some pointβ€”but even if everyone wanted it individually, competitive dynamics could still torpedo it.

πŸ…„ πš— 𝚌 𝚎 πš› 𝚝 𝚊 πš’ πš— 𝚝 𝚒[17]Β β€” πŸŒ…πŸ’₯:Β  We don’t even agree on the basics. Is AGI two yearsΒ away or twenty? Do we accelerate to win the arms race, or pause to get governance right? Is the search for a principled solution to alignment naive, or is it necessary? With so much uncertainty, even well-intentioned moves risk steering us directly into the very dangers we’re trying to avoid[18]. We can't wait for conclusive evidenceβ€”it may simply come too late[19]. Worse: the future may very well turn out to be even murkier than even the present.

πŸ…… 𝚞 πš• πš— 𝚎 πš› 𝚊 πš‹ πš’ πš• πš’ 𝚝 𝚒[20]Β β€” 🌊🚣: Β AI is the ultimate general-purpose technology[21]β€”incredible upside matched by equally incredible downside. There’s so many ways this could go horribly wrong: catastrophic malfunctions, the proliferation of bioweapons, war machines devoid of any human mercy, or perhaps even something that hasn’t yet been dreamed. Here's an unsettling observation: every time it seems as though we might have found all the wayds things could go wrong, someone always, always comes up with a new one [A, B, C, D, E, F, G, H, I …]. And, of course, these threats aren’t just isolated; they can combine and amplify[22].

niche aside: this can be interpreted as a polycrisis framing of AI risk[23]
"But that's not a 'twisty mountain road' or a 'thick fog'"β€”Yeah, but it fits the mood perfectly. Sorry, not sorry.
❦
The way I imagine it is that there is an avalanche, like there is an avalanche of AGI development, imagine it, this huge unstoppable force β€” Ilya Sutskever
❦
Science bestowed immense new powers on man and at the same time created conditions which were largely beyond his comprehension and still more beyond his control. While he nursed the illusion of growing mastery and exulted in his new trappings, he became the sport and presently the victim of tides and currents, of whirlpools and tornadoes amid which he was far more helpless than he had been for a long time. β€” Winston Churchill, March 31st, 1949
❦
The world will become ever more confusing... More hostile, more impenetrable, both intentionally and unintentionally... We will see geopolitical events unfold that are so obscured by fake narrative that it becomes impossible to discern truth from fiction... Twisting the justice system to levels of perverse complexity utterly impenetrable to any human. New technologies being invented that people can barely understand their function or where they even came from... A whirlwind confusing world of such speed and dark complexity that the unaugmented human without AI assistance is a sitting duck to AI powered manipulation and exploitation... β€” Connor Leahy, This House Believes Artificial Intelligence Is An Existential Threat, Cambridge Union Debate
❦
β€œI say to you againe, doe not call up any that you can not put downe; by the Which I meane, Any that can in Turne call up somewhat against you, whereby your Powerfullest Devices may not be of use. Ask the Lesser, lest the Greater shall not wish to Answer, and shall commande more than you.” β€” Lovecraft
❧

the πŸ…‚πŸ…„πŸ…… 𝚝 πš› πš’ 𝚊 𝚍 presents a formidable challenge

☞ blackpill: there’s no rule that says we'll make it[24]

Even two factors would be hard enough, but three? That is another matter entirely. Without speed, we could slowly chip away at uncertainty and tackle problems one by one. Without uncertainty, we’d know what resources would be required and perhaps we could muster the will, even if the cost initially sounded unthinkably high. Without vulnerability (or rather, so many vulnerabilities), we’d at least have the advantage of focus. But having to navigate them all simultaneously…

That would seem to demand truly exceptional judgmentβ€”knowing where to direct our attention, what needs to be done, and where we can move fast without accidentally cutting a critical corner. It's important to be clear: the task that stands before us is not merely difficult; it is one of overwhelming complexity and uncertainty. Maybe we give it our all, but our all is like one man trying to roll back the tide, and our only reward is to completely, utterly and humiliatingly fail. They say "life isn't fair", and the challenges we face may not be either.

❦
β€œI wish it need not have happened in my time," said Frodo.
"So do I," said Gandalf, "and so do all who live to see such times. But that is not for them to decide. All we have to decide is what to do with the time that is given us.” - J.R.R Tolkein, The Lord of the Rings

silver bullets: an impossible dream?

☞ provocation: hoping we'll find a silver bullet in time is more cope than an actual plan.

We might hope that beneath this apparent complexity lies a hidden simplicityβ€”a silver bullet[25] (that is, a comprehensive solution) waiting to be found:

  • Perhaps we will discover a mathematically provable alignment technique, persuade the leading lab to implement it, and then we can sit back and wait as the resulting AI fully emerges to begin the process of optimizing the world.
  • Perhaps we’ll devise a perfect privacy-preserving on-chip mechanism that makes dangerous training runs impossible, set a speed limit on capability increases, and convince all the major powers to place their faith in this plan.
  • Or perhaps we can convince governments mutually assured AI malfunction isn't insane, avoid defensive sabotage inadvertently escalating into a nuclear war and defend against threats from less capable AI by investing deeply in societal resilience.

We cannot entirely dismiss these kinds of possibilities. But to pin all our hope on this... feels far too reckless to me. Unfortunately, while many people believe they may have identified a comprehensive solution, the only consensus is that everyone else is wrong[26]. How could one possibly feel confident under such circumstances[27]?

❦
A striking theme from the history of such achievements is that there is rarely if ever a silver bullet for risk. β€” Jason Crawford, No Silver Bullet[28]
❧

can we just stumble through?

☞ provocation: the delusion was always this: that consequences would queue politely, ordered conveniently from minor disruptions to catastrophic threats, each awaiting its proper turn...

If we abandon the search for a silver bullet, then perhaps the most seductive alternative becomes this: stumbling through, the way humans always have. It's a process that lacks elegance but not precedent.

Build something, watch it break, make it better. Steam engines exploded before we invented pressure relief valves. Cities burned before fire codes. Workers died before safety regulations. Each failure taught a lesson; each disaster better prepared us for the future. This gradual accumulation of knowledge, hard-won and often purchased with blood, has carried civilization forward for millennia.

However, this familiar narrative rests on a crucial assumption: that we'll make our mistakes while AI is still relatively weak, learning and adapting as we go. And that by the time AI becomes truly powerfulβ€”powerful enough that missteps become catastrophicβ€”we'll have already gained the necessary experience and developed appropriate safeguards. Now, the confidence some people have in this plan doesn't spring from nowhere. After all, this is how we've always done it: our first buildings weren't skyscrapers, our first boats weren't aircraft carriers and our first airplanes weren't jumbo jets. We scaled up gradually, incorporating lessons at each stage.

Unfortunately, AI refuses to follow this script. Capabilities advance at a pace that leaves little time to absorb one breakthrough before the next arrives. Worse, this progress is dangerously jagged[29]. This is what scares me: we're on the verge of developing AI capable enough to help malicious actors design bioweapons and fully automate large-scale cyberattacks, yet we're still stuck working through the consequences of previous advances: corporate governance, deepfake scammers, fundamental questions about copyright. When technology moves faster than we can adapt, "stumbling through" ceases to be a strategy and simply becomes cope.

❦
In a hotel room in Santa Clara, Calif., five members of the AI company Anthropic huddled around a laptop, working urgently. It was February 2025, and they had been at a conference nearby when they received disturbing news: results of a controlled trial had indicated that a soon-to-be-released version of Claude, Anthropic’s AI system, could help terrorists make biological weapons... After hours of work, they still weren’t sure whether the new product was safe. β€” Time Magazine
❦
A normal person assisted by AI will soon be able to build bioweapons... Imagine if an average person in the street could make a nuclear bomb. β€” Geoffrey Hinton
❧

not a masterplan, but a tapestry of threads of partial progress

☞ the hope: individually, each thread may be limited and yetβ€”stitched togetherβ€”they may be enough.

So if silver bullets remain elusive and stumbling through appears unviable, what options remain?

Perhaps we should start with an observation. Stumbling through hit upon one truth: our most likely future is neither clean nor singular, especially given our accelerated timeline. Instead, we should expect to make incremental progress along multiple fronts simultaneously: better-but-not-foolproof safety techniques, helpful-but-not-perfect governance tools, and revealing-but-not-complete insights into model psychology. This is not an embrace of the chaos of uncoordinated efforts, but an endorsement of a large-scale effort to skillfully and cooperatively[30] weave a messy yet functional tapestry from a multitude of threads of partial progress[31]. Rather than relying on any individual thread, we stake our future on the whole.

This act of combinationβ€”of deciding how to weigh trade-offs, allocate limited resources, and sequence interventionsβ€”is not at all simple. It is a complex, ill-defined problem of the highest order. How much risk do we accept from a partially interpretable model in exchange for its insights into the alignment problem? How can we ensure enough oversight to prevent catastrophic misuse but not so much that we fall prey to government abuse or even totalitarianism[32]? How do we evaluate the downstream effects of accelerating a particular line of research and deliberately slowing another?

Answering these questions requires balancing a host of considerations that transcend any fixed set of rules. In other words, it demands wisdom.

❦
The fundamental test is how wisely we will guide this transformation – how we minimize the risks and maximize the potential for good β€” AntΓ³nio Guterres, Secretary-General of the United Nations
❦
Never has humanity had such power over itself, yet nothing ensures that it will be used wisely, particularly when we consider how it is currently being used…There is a tendency to believe that every increase in power means β€œan increase of β€˜progress’ itself”, an advance in β€œsecurity, usefulness, welfare and vigour; …an assimilation of new values into the stream of culture”, as if reality, goodness and truth automatically flow from technological and economic power as such. β€” Pope Francis, Laudato si'
❦
Yeah, so um... I don't really know, but wisdom feels like it may possibly be kind of important. I guess β€” me

the limits of human wisdom

☞ provocation: here’s a deeply uncomfortable truth: human cognition simply isn't built for this

But where might we find such wisdom? Honestly, this requires us to consider that evolution may have left us ill-equipped to handle this[33]. Our minds were shaped to handle local, immediate, tangible threatsβ€”the predator in the grass, the coming storm. We’re not wired[34] to intuitively grasp exponential curves, to predict cascading feedback loops, or to navigate global-scale decisions under normal circumstances, let alone under radical uncertainty and intense time pressure.

Our cognitive toolkit is riddled with biases that were once adaptive but are now liabilities: normalcy, which makes us underestimate novel threats; tribalism[35], which cripples, absolutely cripples, our ability to coordinate on shared, long-term survival; and optimismβ€”pathological optimismβ€”which convinces us that we can race through a minefield and somehow miraculously emerge unscathed.

❦
For the wisdom of this world is foolishness with God. For it is written, He taketh the wise in their own craftiness. β€” King James Bible
❦

reflection: tribalism

Of the diverse and numerous cognitive distortions that humans possess, tribalism is by far the most threatening. Humans are generally committed to overcoming their cognitive biases, but tribalism is an exception through which any of the other biases can make their return. It cynically wields any cognitive flaw or weakness it can get its hands on to make you feel justified in believing what is socially convenient. If you can't win an argument, it'll cause you to muddy the waters, blow up the debate or turn it into a game of who can slander or defame whom the hardest, often whilst being in denial about what you're doing[36]. Intelligence provides extremely limited protection and can even make you more vulnerable. Unfortunately, it is often disturbingly easy for a determined sub-group to disrupt the formation of any social consensus against their interests. That is, far too often, the heckler's veto wins out.
❦
Ihor Kendiukhov's The Lethal Reality Hypothesis provides the more sophisticated version of the argument in this sectionβ€”that is, the version I wish I could write. He offers a wide and sweeping analysis that ranges over game theory, observer-effects, ecological niches, entropy, feedback loops and minimum viable intelligence.
❧

the 𝚠 πš’ 𝚜 𝚍 𝚘 πš– – 𝚌 𝚊 πš™ 𝚊 πš‹ πš’ πš• πš’ 𝚝 𝚒 𝚐 𝚊 πš™

☞ provocation: it increasingly feels like we’re merely dancing around the central problem: that human wisdom is deeply limited, especially within the timescales that matter.

The complete mismatch between the complexity of our new reality and the limits of our minds creates what’s been called the wisdom-capability gap: a growing chasm between the judgment we need to face what lies ahead and the judgment we collectively possess. We are demanding more and more from ourselves, even as the conditions for wise decision-making have deteriorated[37].

Yes, we can and must push human wisdom furtherβ€”and the supporting infrastructure as well: new institutions, better epistemic tools, improved decision frameworks. But such gains tend to be incremental and hard-won. Yet the very technologies demanding this wisdom are advancing exponentially, not incrementallyβ€”creating a pace gap we may be unable to close by developing human wisdom alone.

This leads us to a somewhat paradoxical conclusion: if our own wisdom is the key bottleneck, then perhaps one of the most crucial tasks ahead of us will be learning to leverage the very technology creating this crisis in order to help us navigate it.

That is, perhaps we ought to build wise AI advisors[38].

❦
aside: relevant extracts

By default, the direction the world goes in will be a result of the choices people make, and these choices will be informed by the best thinking available to them. People systematically make better, wiser choices when they understand more about issues, and when they are advised by deep and wise thinking.

Advanced AI will reshape the world, and create many new situations with potentially high-stakes decisions for people to make. To what degree people will understand these situations well enough to make wise choices remains to be seen.

To some extent this will depend on how much good human thinking is devoted to these questions; but at some point it will probably depend crucially on how advanced, reliable, and widespread the automation of high-quality thinking about novel situations is.

We believe that this area could be a crucial target for differential technological development, but is at present poorly understood and receives little attention.

β€” Owen Cotton-Barratt, 2024 Essay Competition on the Automation of Wisdom and Philosophy Announcement Post

❦

We could call the (following) set of such capabilities "artificial wisdom" rather than β€œartificial intelligence”:

  • AI for forecasting and strategic foresight, to help human decision-makers to know what’s coming, including what capabilities would soon come on-line from continued AI progress.
  • AI for policy analysis and advice, to provide a much better understanding of what policy responses are available, and what the effects of those policy options would be.
  • AI for ethical deliberation, to help reason through the ethical quandaries that such new developments might pose (for example, around digital sentience), and/or quickly aggregate the preferences of a wider swathe of the electorate or humanity as a whole than is normally possible.
  • AI that assists in making trades or agreements, to identify positive-sum trades or treaties that could be agreed-upon, whether that’s between labs, between labs and governments, or between governments.
  • AI for rapid education and tuition, to help decision-makers and society at large get up to speed on the latest technological developments and geopolitical changes in what would be an extremely fast-changing world.

    "Helpful” here refers in particular to helpfulness for governments, companies, and broader society to respond to risks posed by rapid AI tech progress.

β€” Will MacAskill, on Encouraging the Most Helpful AI Capabilities[39]
❦

"If the alternative were halting all AI progress, building wise AI would introduce added risks. But compared to the status quoβ€”advancing capabilities at a breakneck pace without wise metacognitionβ€”the attempt to make machines intellectually humble, context-adaptable, and adept at balancing viewpoints seems clearly preferable...

(Further) wise metacognition can lead to a virtuous cycle in AI, just as it does in humans. We may not know precisely what form wise AI will takeβ€”but it must surely be preferable to folly."

β€” Samuel Johnson, Amir-Hossein Karimi, Yoshua Bengio, Igor Grossmann, et al., Imagining and building wise machines: The centrality of AI metacognition

❦

"The AI tools/epistemics space might provide a route to a sociotechnical victory... Basically nobody actually wants the world to end, so if we do that to ourselves, it will be because somewhere along the way we weren’t good enough at navigating collective action problems, institutional steering, and general epistemics.

... I think these points are widely appreciated, but most people don’t seem to have really grappled with the implications β€” most centrally, that we should plausibly be aiming for a massive increase in collective reasoning and coordination as a core x-risk reduction strategy, potentially as an even higher priority than technical alignment."

β€” Β Raymond Douglas, β€˜AI for societal uplift’ as a path to victory

❦

aside: is artificial wisdom even possible?

This is a key crux. No matter how useful such advisors would be, attempting to create them is only worthwhile if it is actually feasible. I believe that it is. While I'll leave most of my thoughts for a future post, I'll share a few words here.

On the meta-level, I suspect that because none of the major labs have made a major effort[40]Β to increase the wisdom of their models (as opposed to having vague aspirations), there is a bunch of low-hanging fruit lying around. In the hands of a skilled operator with a healthy dose of skepticism, AI can already dispense useful advice for all kinds of matters. There is no obvious reason why we should be anywhere near the cap. I'd suggest that we could quickly make significant progress with even modest interventions such as architecting multi-agent systems, tweaking the model spec and hiring HCI experts to reshape the user interface.

On a more concrete level, I believe that (amplified) imitation learning is a promising baseline technique[41]. Imitation learning is often dismissed as weak, but base imitation learning agents can be significantly enhanced by various techniques: debate, trees of agents, RAG applied to all the information on the internet, and more. I won't deny that the choice of who to imitate poses some troubling questions; nonetheless, I see making these kinds of choices as unavoidable given that objectivity can only take us so far. The downsides are softened by our ability to at least partially mitigate them: training models to imitate a diverse range of thinkers and allowing users to determine which perspectives they want represented.

If we choose thinkers who are alive and willing to collaborate with the project, then we can supplement their writing by collecting additional data to fill in any major gaps. We'd also want some kind of out-of-distribution detection to reduce the chance of these agents going off the rails.
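To make the shape of this proposal concrete, here's a deliberately toy sketch of a panel of imitated thinkers with a crude out-of-distribution check. Everything here is my own illustration, not a real system: the advisor names, the stub "models", and the topic-set OOD test are all placeholder assumptions standing in for trained imitation models and proper OOD detection.

```python
# Toy sketch (illustrative assumptions only): a user-selected panel of
# imitated "thinkers", where each advisor only answers queries that fall
# inside a rough proxy for its training distribution.

from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Advisor:
    name: str                      # which thinker this model imitates
    known_topics: Set[str]         # crude proxy for its training distribution
    advise: Callable[[str], str]   # stand-in for the imitation model


def in_distribution(advisor: Advisor, topic: str) -> bool:
    # Toy OOD check: only answer on topics the advisor was trained on.
    return topic in advisor.known_topics


def panel_advice(advisors: List[Advisor], topic: str, query: str) -> Dict[str, str]:
    """Collect answers only from advisors for whom the query is in-distribution."""
    return {
        a.name: a.advise(query)
        for a in advisors
        if in_distribution(a, topic)
    }


# Stub "models" standing in for trained imitators of diverse thinkers.
advisors = [
    Advisor("economist", {"policy", "trade"}, lambda q: "model incentives first"),
    Advisor("ethicist", {"policy", "sentience"}, lambda q: "weigh all stakeholders"),
]

answers = panel_advice(advisors, topic="trade", query="Should we sign this treaty?")
```

In this toy run only the economist answers, since "trade" is outside the ethicist's assumed distribution; a real system would replace the topic-set check with learned OOD detection and the lambdas with actual imitation models.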

Maybe I haven't convinced you that progress here is feasible, but even then, I claim it would be foolish for humanity to give up on training AI to be wise without at least making a concerted effort; without at least seeing what progress can be made.

in conclusion

☞ where we stand: we need to make a decision. one that will determine our future.

The Β πŸ…‚πŸ…„πŸ…… Triad Β leaves us speeding down a precarious, foggy highway littered with hazards. We didn’t evolve to handle these kinds of threats; stumbling through looks more like stumbling into disaster, and the dream of a silver bullet is slowly fading away. However, perhaps there is still hope, but only if we decide to get serious; only if we decide that we actually want to win.

Winning here means accelerating trustworthy AI advisors, not just raw capabilities. It means finding the rare person who is able to assist in this quest and giving them the resources they need to make a difference. It means staring directly at the future, being fully cognizant of the overwhelming magnitude of the challenge ahead, but deciding to fight for our future regardless.

Staring down from the bluff, we are here...
Time to wake, it's not a moment left to lose. Night is falling, fate is tightening the noose...
So when our leader's failing, we decide to stop the trailing, we are making final turn to right!
We're choosing life! We're choosing life! We're choosing life! We're choosing life! We're choosing life! We're choosing life! We're choosing li-i-i-i-ife!
❦
The years in front of us will be impossibly hard, asking more of us than we think we can give. But in my time as a researcher, leader, and citizen, I have seen enough courage and nobility to believe that we can winβ€”that when put in the darkest circumstances, humanity has a way of gathering, seemingly at the last minute, the strength and wisdom needed to prevail Β β€” Dario Amodei, The Adolescence of Technology

Vanity of vanities! All is vanity! β€” Part 2 β€” tribute or things that totally happened

✧ (rapturous applause, reverent awe, a girl has tears in her eyes. afterwards a guy comes up to him and asks who he is to be able to speak with Suchet authority)

β–  "a girl with tears in her eyes?"β€”come on now, do you take me for a fool? you just made that up!

✧ did i? did i?

β–  YES! yes you did! just like the time you told me that your "elucidation" of the Heideggerian concept of angst triggered a kenshō awakening experience in your friend. i have it on solid authority that you haven't read even twenty pages of Heidegger, and i'd be surprised if you understood even two. (tone shifts) in fact, i heard that you botched your presentation so badly that you were shouted down and booed off the stage.

✧ many fine presenters presenting many fine presentations have been booed off stage... i meanβ€”i speak in generalities. i, of course, am not one of them!

β–  If you say so. I suppose you're also going to deny that NICK BOSTROM was so horrified by how badly you butchered his work that he stormed out after five minutes!

✧ (shifting uneasily) yeah, so about that... if there's one thing i know with absolute certainty, it's that the particular unspeakable, God-forsaken presentation you're referring to... never happened πŸ˜‰. your friend is clearly deeply disturbed. may God have mercy on his soul πŸ™.

❦

I'd like to share a few more words

Let's bring this back down to earth[42]. I have only presented one possible interpretation of the future; there are others.

It hath been said that truth is stranger than fiction. I consider this profound[43]. Fiction has to feel plausible, while truth is not bound by any of our narratives or preconceptions.

Β So it is worth naming a few of the main ways the story I have told here could be wrong.

  • First, I said that there's unlikely to be a silver bullet, yet many people seem to believe in the existence of one (they just all appear to believe in different ones). While I'm familiar with a wide range of alignment proposals, it's simply impossible to read or address everything. Therefore, despite my skepticism, a truly compelling proposal may already existβ€”or, if not, could yet be developed.
  • Second, there's a possibility that timelines aren't just short but extremely short. There may simply be no time for humanity to develop its wisdom. Focusing on wise AI advisors likely allows us to move faster, but surely there's some limit there. Shorter timelines mean less time to leverage any such wisdom to steer civilization in a positive direction. This deeply concerns me, but we need plans that operate on a variety of timelines. Maybe it's cope, but I don't feel a strong urge to try shifting the needle on ultra-short-timeline scenarios, as the outcome may already be determined regardless of what we do.
  • Third, many AI capabilities are dual-use technologies. Just because a capability sounds great at first doesn't mean that a deeper analysis wouldn't overturn this impression.

Alternate ending: stop, just stop

the  𝚠 πš’ 𝚜 𝚍 𝚘 πš– – 𝚌 𝚊 πš™ 𝚊 πš‹ πš’ πš• πš’ 𝚝 𝚒 Β  𝚐 𝚊 πš™

The complete mismatch between the complexity of our new reality and the limits of our minds creates what’s been called the wisdom-capability gap: a growing chasm between the judgment we need to face what lies ahead and the judgment we collectively possess. We are demanding more and more from ourselves, even as the conditions for wise decision-making have deteriorated.

Yes, we can and must push wisdom furtherβ€”and the supporting infrastructure as well: new institutions, better epistemic tools, improved decision frameworks. But gains in human wisdom tend to be incremental and hard-won; and our ability to train wise AI advisors is fundamentally limited by how wise we are ourselves[44]. It seems increasingly likely that we’re merely dancing around the central problem: that the wisdom we will have access toβ€”human or AIβ€”is far too limited, especially within the timescales that matter.

This leads us to a rather unfortunate conclusion: if we can't fix the wisdom bottleneck, THEN WE NEED TO JUST FUCKING[45]Β STOP. That is, perhaps we ought not kill ourselves[46].

❧

in conclusion

The Β πŸ…‚πŸ…„πŸ…… Triad Β leaves us speeding down a precarious, foggy highway littered with hazards. We didn’t evolve to handle these kinds of threats; stumbling through looks more like stumbling into disaster, and the dream of a silver bullet is slowly fading away. However, perhaps there is still hope, but only if we decide to get serious; only if we decide that we actually want to win.

Winning here means stopping the suicide race, not just slowing capabilities. It means finding the common person who is willing to aid in this quest and giving them the opportunities they need. It means staring directly at the future, being fully cognizant of the overwhelming magnitude of the challenge ahead, but deciding to fight for our future regardless.

❧

(Feel free to share your own ending in the comments if you feel so drawn.)

Β We have time for a few questions

There are many topics that I wasn't able to cover during the course of this talk. These include:

  • More details on how I conceive of wisdom
  • Why use of the term 'wisdom' isn't just a pointless relabelling
  • Possible negative externalities of wisdom
  • How I see an ITN[47]Β analysis playing out

Feel free to ask about these questions, or anything else that you want to know 😊.

You should also feel free to message me if this is an area that you might be interested in working on.

I'll leave you with some additional resources

I think it'd be perfectly valid to consider my talk as an absurdly high-effort advertisement for my reading list:

motivation

β€˜AI for societal uplift’ as a path to victory by Raymond Douglas β€”Β  LW Post : Examines the conditions in which a "societal uplift" - epistemics + coordination + institutional steering - might or might not lead to positive outcomes.

N Stories of Impact for Wise AI Advisors β€”Β  Draft πŸ—οΈ : Different stories about how wise AI advisors could be useful for having a positive impact on the world.

International AI projects should promote differential AI development by Will MacAskill β€”Β  Substack : Argues that these projects should differentially favour capabilities related to "artificial wisdom", such as forecasting, ethical deliberation and negotiation.

artificial wisdom

Imagining and building wise machines: The centrality of AI metacognition by Johnson, Karimi, Bengio, et al. β€”Β  Β Paper ,Β  Summary : This paper argues that wisdom involves two kinds of strategies (task-level strategies & metacognitive strategies). Since current AI is pretty good at the former, they argue that we should pursue the latter as a path to increasing AI wisdom.

Finding the Wisdom to Build Safe AI by Gordon Seidoh Worley β€”Β  LW post : Seidoh talks about his own journey toward becoming wiser through Zen and outlines a plan for building wise AI. In particular, he argues that it will be hard to produce wise AI without having a wise person to evaluate it.

Design Sketches: Angels-on-the-Shoulder by Owen Cotton-Barratt et al. β€”Β  Article : Sketches some products that might help people make more decisions that they'd endorse.

Designing Artificial Wisdom: The Wise Workflow Research Organisation by Jordan Arel β€”Β  EA forum postΒ  (πŸ† won a runner-up prize in the AI Impacts competition): Jordan proposes mapping the workflows within an organisation that is researching a topic like AI safety or existential risk. AI could be used to automate or augment parts of their work, with this proportion increasing over time. The hope is that this would eventually allow us to fully bootstrap an artificially wise system.

Should we just be building more datasets? by Gabriel Recchia β€”Β  SubstackΒ  (πŸ† won 4th prize in the AI Impacts Competition): Argues that an underrated way of increasing the wisdom of AI systems would be building more datasets (whilst also acknowledging the risks).

Tentatively Against Making AIs 'wise' by Oscar Delany β€”Β  EA forum postΒ  (πŸ† won a runner-up prize in the AI impacts competition): This article argues that insofar as wisdom is conceived of as being more intuitive than carefully reasoned, pursuing AI wisdom would be a mistake as we need AI reasoning to be transparent. I've included this because it seems valuable to have at least one critical article.

Wisdom & AI Β  -Β  Community : "a network of Buddhist teachers, AI professionals and leadership experts"

neighbouring areas of research

What's Important In "AI for Epistemics"? by Lukas Finnveden β€”Β  Forethought : AI for Epistemics is a subtly different but overlapping area. It is close enough that this article is worth reading. It provides an overview of why you might want to work on this, heuristics for good interventions and concrete projects.

Using AI to enhance societal decision making β€”Β  80,000 Hours Career Profile : Discusses the reasons why someone might want to work on this, possible counter-arguments and ways of getting involved.

AI for AI Safety by Joe Carlsmith β€”Β  LW post : Provides a strategic analysis of why AI for AI safety is important whether it's for making direct safety progress, evaluating risks, restraining capabilities or improving "backdrop capacity". Great diagrams.

AI Tools for Existential Security by Lizka Vaintrob and Owen Cotton-Barratt β€”Β  Forethought : Discusses how applications of AI can be used to reduce existential risks and suggests strategic implications.

Not Superintelligence: Supercoordination β€”Β  Forum postΒ [48]: This article suggests that software-mediated supercoordination could be beneficial for steering the world in positive directions, but also identifies the possibility of this ending up as a "horrorshow".

human wisdom

Stanford Encyclopedia of Philosophy Article on Wisdom by Sharon Ryan -Β  SEP article : SEP articles tend to be excellent, but also long and complicated. In contrast, this article maintains the excellence while being short and accessible.

Thirty Years of Psychological Wisdom Research: What We Know About the Correlates of an Ancient Concept by Dong, Weststrate and Fournier -Β  Paper : Provides an excellent overview of how different groups within psychology view wisdom.

The Quest for Artificial Wisdom by Sevilla -Β  Paper : This article outlines how wisdom is viewed in the Contemplative Sciences discipline. It has some discussion of how to apply this to AI, but much of this discussion seems outdated in light of the deep learning paradigm.

applications to governance

Wise AI support for government decision-making by Ashwin -Β  SubstackΒ  (πŸ† Β Prize winning entry in theΒ  AI Impacts Automation of Wisdom and Philosophy Competition ): This article convinced me that it isn't too early to start trying to engage the government on wise AI. In particular, Ashwin considers the example of automating the Delphi process. He argues that even though you might begin by automating parts of the process, over time you could expand beyond this, for example, by helping the organisers figure out what questions they should be asking.

some of my own work

Potentially Useful Projects in Wise AI Β β€”Β  EA forum post : An attempt to list projects which I would expect to be positive EV.

πŸ† My third prize-winning entry in theΒ  AI Impacts Automation of Wisdom and Philosophy CompetitionΒ  (split into two parts):

β€’Β  Some Preliminary Notes on the Promise of a Wisdom Explosion : Defines a wisdom explosion as a recursive self-improvement feedback loop that enhances wisdom rather than intelligence, in contrast with the more traditional intelligence explosion. Argues that wisdom tech is safer from a differential technology perspective.

β€’Β  An Overview of "Obvious" Approaches to Training Wise AI Advisors : Compares four different high-level approaches to training wise AI: direct training, imitation learning, attempting to understand what wisdom is at a deep principled level, the scattergun approach. One of the competition judges wrote: "I can imagine this being a handy resource to look at when thinking about how to train wisdom, both as a starting point, a refresher, and to double-check that one hasn’t forgotten anything important".

🎁 Bonus slides

πŸ—‘οΈ rejected titlesΒ  β€” worlds that could have been

𝚝 πš‘ 𝚎 Β πŸ…‚πŸ…„πŸ……  𝚍 πš’ 𝚊 πš› πš’ 𝚎 𝚜

Mental Organisms of Mesa-lignment

Can Chris Survive the ERA Fellowship?

Dr Love-StarVed: Or How I Longed to Worrying and Build Da Bomb

1) What

πŸ“Š additional graphics and tablesΒ  β€” a picture is worth a thousand words

Β View on Life ItselfΒ 
TheΒ πŸ…‚πŸ…„πŸ……Β Triad
πŸ…‚peed - in absolute terms and relative to the speed of governance
πŸ…„ncertainty -Β regarding the situation and strategyΒ 
πŸ……ulnerability -Β many catastrophic threats that are hard or costly to defend against
Β View on MetaculusΒ 
Β View Goodheart Lab's forecast aggregatorΒ 
Β A Definition of AGI based on Cattell-Horn-Carroll theoryΒ 
Β View on Dewi Erwan's GithubΒ 
Β View at International Scientific Report on the Safety of Advanced AIΒ 
Β View on OpenAI BlogΒ 
Β View on METRΒ 
Β Epoch Capabilities IndexΒ 

Please note that benchmarks suffer from a variety of limitations [arbitrary y-axes, streetlight effect in task selection[49], Goodharting, etc.]

πŸ‘ Β full acknowledgementsΒ  β€” credit where credit is due

Getting to this point has been a long and sometimes rough journey. Thanks to all the people who assisted me with making it here, only a small fraction of whom are listed below.Β 

Most of this work was completed whilst I was an ERA Technical Governance fellow. I would like to thank the ERA Fellowship for its financial and intellectual support.

More specifically, thank you to my mentor Professor David Manley for always pushing me to think harder about things that I may have missed and my research manager Peter Gebauer for consistent support.

Also, thank you to Christopher Clay, with whom I was collaborating on another post, for helping me work through some of the ideas that made it into this post and for helping me coin the name "πŸ…‚πŸ…„πŸ…… Triad".

Thank you to Will MacAskill for suggesting that I simplify my original name for the πŸ…‚πŸ…„πŸ…… Triad.

The production of this article involved significant amounts of AI assistance; however, for the sections that made the heaviest use of AI, I've gone over them more times than you would believe to ensure there are no subtle distortions.

The seeds of some of these ideas were initially developed whilst I was leading an AI Safety Camp project on Wise AI Advisors. Thank you to AISC for your financial support and to Richard Kroon, Matt Hampton and Christopher Cooper for your insightful discussions.

Thanks to Justis who provided feedback via the EA Forum feedback mechanism.

Thanks to Rupert McCallum who let me stay at his place whilst I was preparing for a talk on this research.

❦

reflection: attractor states are truly a thing of beauty. but one must beware: for whilst we yearn for hymns to draw out the best in humanity, their true nature is often that of a siren call luring us to our doom. few can approach them. most are forced to flee or are consumed.

One cannot be truly great without the ability to soar up into the clouds; but the longer and higher the flight, the greater the risk. One must also be able to plant one's feet back on the ground, lest one perish. Perhaps most people should never attempt to take off; perhaps most such attempts are vanity or hubris.

Truth is rarely what we'd wish it to be. It tends to be imperfect, messy and inconvenient. It is something that we must wrestle with. I don't think I've ever found a perfect truth, and likely I never will. β€” Chris

  1. ^

    "Collective and individual wisdom has increased... but not fast enough to catch up with the rise in power of the tools we are building"

  2. ^

    In 2026, the more likely answer is that they were too busy meming to introspect.

  3. ^

    This is not the first attempt at such a unification - see Connor Leahy's Nostros for one such example.

  4. ^

    Post-rationalism seems to defy definitionβ€”everyone seems to want to conceive of it slightly differentlyβ€”so I've included three separate links: 1, 2, 3 (I wrote the middle one with my alt-account).

  5. ^

    Even though I have described this talk as inspired by post-rationalism, I actually think that post-rationalism is more continuous with rationalism than it appears at first glance. After all, Eliezer didn't just deliver arguments, but told persuasive stories (not to mention the "nameless virtue of rationality").

    Provocation: The seeds of post-rationalism were always within rationalism. At its best, post-rationalism simply takes these seeds to their logical conclusions.

    ❦

    Claude (explaining how Hegel's Dialectic differs from its "thesis-antithesis-synthesis" popularisation): "The movement is more organic: a concept, through its own internal logic, reveals its inadequacy and passes over into what initially appeared as its opposite, and both are then aufgehoben (sublated)β€”cancelled, preserved, and elevated in a richer concept."

  6. ^

    Well, post-structuralism if you're nitpicky.

  7. ^

    See Eliezer Yudkowsky's Every Cause Wants to Be A Cult.

  8. ^

    "If only we could write that manifestoβ€”the one that controls the fate of the world. We will write words that tell it as it is and create things as they should be. With our words, we'll nail the truth into other minds, both descriptively and normatively, forever. One truth, for all minds, for all time; the telos of the mind.

    ... No mind has accomplished this, yet people keep writing manifestos as if something substantial will change. It's time to try something else, and give impermanence a chance.

    .... We need not do away with manifestos either. We can simply shift our intention, seeing them as a process, not an outcome. A manifesto can be living" β€” Peter Limberg, Living Manifestos

  9. ^

    "Practical men, who believe themselves to be quite exempt from any intellectual influence, are usually the slaves of some defunct economist" β€” John Maynard Keynes

    ❦

    β€œAs a human being, you have no choice about the fact that you need a philosophy. Your only choice is whether you define your philosophy by a conscious, rational, disciplined process of thought and scrupulously logical deliberation - or let your subconscious accumulate a junk heap of unwarranted conclusions, false generalizations, undefined contradictions, undigested slogans, unidentified wishes, doubts and fears, thrown together by chance, but integrated by your subconscious into a kind of mongrel philosophy and fused into a single, solid weight: self-doubt, like a ball and chain in the place where your mind's wings should have grown.” β€” Ayn Rand

  10. ^

    AI 2027 convincingly demonstrated the power of narratives as a co-ordination mechanismβ€”and also some of the pitfalls.

    ❦

    The push to make AI safety more scientific/academic has brought profound benefits, but I suspect we've also lost something.

    Eliezer's posts created the field of AI safety, but if you try to do similar work today, I expect you'll find it extremely hard to land funding, even if your work was high quality.

  11. ^

    β€œWe’re driving down a cliff road. A mistake will kill you. Now we’re driving at 75 instead of 25.” β€” Dave Orr, Anthropic’s Head of Safeguards to Time Magazine

    "Hey, I said imagine!"

  12. ^

    "But GPT5" β€” Was an update, but OpenAI is also being unfairly maligned here. GPT5 felt disappointing precisely because they had already shared so much progress in the meantime. Garrison Lovely explains what a proper comparison actually looks like:

    "On a composite of leading benchmarks, GPT-4 scored 25 out of 100. GPT-5 gets a 69. In fact, at least 136 newer models now outperform GPT-4 by this measure... GPT-4 Turbo (from late 2023) solved just 2.8 percent of problems. GPT-5 solved 65 percent. On a set of difficult math problems, GPT-4o (from May 2024) scored just 9.3 percent, while OpenAI reports GPT-5 Pro gets nearly 97 percent without tools, and 100 percent with them."

    Zvi writes: "Theory implied by many: 5.0 and 5.1 were really 4.2 and 4.3, and 5.2 is the real 5.0, and this failure to mark version numbers adequately convinced half of Washington that scaling is over and AGI is far and now we're selling H200s to China."

    Dean Ball takes a really strong stance here: "'GPT-5 shows that AI is hitting a wall' is a case study in mass fantasy that deserves careful retrospective analysis by sociolinguists, semioticians, anthropologists, etc etc. These sorts of takes are attractor states for those still in the early stages of grieving/coping, about which I have tweeted before."

  13. ^

    "We have set internal goals of having an automated AI research intern by September of 2026 running on hundreds of thousands of GPUs, and a true automated AI researcher by March of 2028. We may totally fail at this goal, but given the extraordinary potential impacts we think it is in the public interest to be transparent about this." Β β€” Β Sam Altman

    ❦

    "There was broad consensus (at the conference The Curve) that the pace of progress in AI models will continue to accelerate, though lots of debate about how quickly. A senior lab official said that 90% of the code in one of the frontier labs is now written by AI models (!!), and that they’re already replacing some entry-level technical jobs with automated agents." Β β€” Β Eli Pariser

  14. ^

    "Researchers were asked to do more comprehensive safety testing than initially planned, but given only nine days to do it. Executives wanted to debut 4o ahead of Google’s annual developer conference and take attention from their bigger rival.

    The safety staffers worked 20 hour days, and didn’t have time to double check their work. The initial results, based on incomplete data, indicated GPT-4o was safe enough to deploy.Β 

    But after the model launched, people familiar with the project said a subsequent analysis found the model exceeded OpenAI’s internal standards for persuasionβ€”defined as the ability to create content that can convince people to change their beliefs and engage in potentially dangerous or illegal behavior.

    The team flagged the problem to senior executives and worked on a fix. But some employees were frustrated by the process, saying that if the company had taken more time for safety testing, they could have addressed the problem before it got to users." - Wall Street Journal

  15. ^

    Whilst Mark Zuckerberg has been widely mocked for redefining superintelligence to mean smart glasses, when they say superintelligence, most of the other lab leaders actually mean superintelligence:

    β€’ Sam Altman: β€œAlthough it will happen incrementally, astounding triumphs – fixing the climate, establishing a space colony, and the discovery of all of physics – will eventually become commonplace.”
    β€’ Elon Musk: β€œWith artificial intelligence, we are summoning the demon. You know all those stories where there’s the guy with the pentagram and the holy water and he’s like... yeah, he’s sure he can control the demon, [but] it doesn’t work out”
    β€’ Demis Hassabis: β€œIt should be an era of maximum human flourishing, where we travel to the stars and colonize the galaxy”
    β€’ Mustafa Suleyman: β€œIf AGI is often seen as the point at which an AI can match human performance at all tasks, then superintelligence is when it can go far beyond that performance.”
    β€’ Dario Amodei: β€œIn terms of pure intelligence, it is smarter than a Nobel Prize winner across most relevant fields – biology, programming, math, engineering, writing, etc. This means it can prove unsolved mathematical theorems, write extremely good novels, write difficult codebases from scratch, etc… The resources used to train the model can be repurposed to run millions of instances of it"

  16. ^

    In The Case for Low-Competence ASI Failure Scenarios, Ihor Kendiukhov argues that the discussion of these kinds of scenarios is neglected because the kinds of scenarios that people construct are often designed to persuade people that things could go wrong even assuming a moderate or high degree of competence.

  17. ^

    Professor Manley notes that if I claim too much uncertainty then this would undermine the point about how fast it is movingβ€”it would seem as though I would be contradicting myself. However, even if we should have a healthy skepticism about our ability to predict AI timelines and if our probability estimate should have a long-tail, it still seems as though the chance of AI moving faster than our civilization can handle is too high.

  18. ^

    Eliezer Yudkowsky: β€œNothing else Elon Musk has done can possibly make up for how hard the β€œOpenAI” launch trashed humanity’s chances of survival; previously there was a nascent spirit of cooperation, which Elon completely blew up to try to make it all be about *who*, which monkey, got the poison banana, and by spreading and advocating the frame that everybody needed their own β€œdemon” (Musk’s old term) in their house, and anybody who talked about reducing proliferation of demons must be a bad anti-openness person who wanted to keep all the demons for themselves.

    Nobody involved with OpenAI’s launch can reasonably have been said to have done anything else of relative importance in their lives. The net impact of their lives is their contribution to the huge negative impact of OpenAI’s launch, plus a rounding error.”

    ❦

    Sam Altman: β€œeliezer has IMO done more to accelerate AGI than anyone else.

    certainly he got many of us interested in AGI, helped deepmind get funded at a time when AGI was extremely outside the overton window, was critical in the decision to start openai, etc.”

  19. ^

    The "Collingridge dilemma" asserts: "When change is easy, the need for it cannot be foreseen; when the need for change is apparent, change has become expensive, difficult, and time-consuming".

    Or perhaps even impossible...

    ❦

    See also: Pitfalls of Evidence-based AI Policy

  20. ^

    Professor Manley notes that even if there are many different threats, the total probability could still be low if the individual probabilities are all low. Unfortunately, I'm far more pessimistic. Many of these threats appear unsettlingly likely.

  21. ^

    I am using this term in the everyday sense, rather than adopting Allan Dafoe's definition. AI satisfies Dafoe's definition, but that definition also undersells it. His Intelligence Technology frame is likely more relevant.

  22. ^

    “An unrecoverable catastrophe would probably occur during some period of heightened vulnerability—a conflict between states, a natural disaster, a serious cyberattack, etc.—since that would be the first moment that recovery is impossible and would create local shocks that could precipitate catastrophe. The catastrophe might look like a rapidly cascading series of automation failures: A few automated systems go off the rails in response to some local shock. As those systems go off the rails, the local shock is compounded into a larger disturbance; more and more automated systems move further from their training distribution and start failing. Realistically this would probably be compounded by widespread human failures in response to fear and breakdown of existing incentive systems—many things start breaking as you move off distribution, not just ML.” — What failure looks like, Paul Christiano

  23. ^

    I see talk of "the polycrisis/metacrisis" (singular, universal) as mistaken, just as it doesn't make sense to talk about "the boss" outside of a particular context. Instead there are polycrises/metacrises and the various risks from advanced AI can be considered one such example.

  24. ^

    This is the title of a Rob Miles video.

    ❦

    "Make it" isn't just referring to avoiding extinction, but avoiding global catastrophic risks as well.

    "This is a related concept, there's no rule which says that the challenges we're faced with are challenges that we are capable of meeting. Think about something like an asteroid strike if a big enough asteroid hits earth, we're pretty much done for... If an asteroid were headed for earth a few hundred years ago, that would pretty much just be it. Just game over." β€” Rob Miles

  25. ^

    This is really two separate claims:

    • No silver bullet for achieving good outcomes for the world from advanced AI technologies
    • No silver bullet for achieving alignment or control
  26. ^

    One exception is that a few folks have recently converged on provably safe AI as a goal. I'm sure they'll manage to make this work in more limited contexts, but I'm very skeptical of this succeeding more broadly. Happy to discuss in the comments.

  27. ^

    My argument in this section is kind of lame.

    "Wait, you really just said that out loud?" β€” yep and maybe the world would be better if more people did the same.

  28. ^

    Jason Crawford seems to have independently converged on the same "no silver bullet" argument and wording. Jason concedes that "it’s easy to fall victim to hope and cope, and to lull ourselves into a false sense of security based on half-measures that were ‘the best we could do’", but then says, "I find the all-or-nothing thinking about AI safety counterproductive".

    My approach, described later in this talk, draws from both strands of thought: I see both the all-or-nothing approach and the plain accumulative approach as naive.

  29. ^

    Short intro: Ethan Mollick's Substack.
    Longer discussion: Helen Toner's talk.

  30. ^

    The time pressure limits the degree of coordination, but these efforts only have to be coordinated enough; they would have significant elements of decentralisation as well. A stronger claim: too much coordination could even be negative.

  31. ^

    "So defense-in-depth?" — no 🥺, or at least not the naive version, which is more often than not just cope. You can't just throw a line of ten kids with toy swords in front of a tank and declare that the tank is incredibly unlikely to get past all ten layers.

    Similarly, "moar layers" can be a way to avoid thinking about the relative importance of various layers or the role that they should be playing.
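
    The "naive version" fails because it implicitly multiplies per-layer failure probabilities as if the layers were independent. A toy calculation (my illustration, with made-up numbers) shows how correlated failure breaks that arithmetic:

    ```python
    # Naive defense-in-depth: assume each of 10 layers independently
    # stops a threat with probability 0.5.
    p_stop = 0.5
    n_layers = 10
    p_breach_naive = (1 - p_stop) ** n_layers  # 0.5**10, about 0.001

    # Correlated failure: any single layer stops 90% of threats, but the
    # other 10% are "tanks" that defeat every layer (all the toy swords
    # fail against the same thing), so extra layers add nothing.
    p_tank = 0.1
    p_breach_correlated = p_tank

    print(f"naive (independent layers): {p_breach_naive:.4f}")      # 0.0010
    print(f"correlated (tanks):         {p_breach_correlated:.4f}") # 0.1000
    ```

    The point isn't that layering never helps, just that its value depends on how uncorrelated the layers' failure modes actually are.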

  32. ^

    See the discussion in Nick Bostrom’s The Vulnerable World Hypothesis.

  33. ^

    Professor Manley notes a) that humans can do many things we didn't evolve to do (like building AI!) and b) that I don't need this assumption, since I could instead simply observe empirically that we seem to be bad at these kinds of decisions.

    A more nuanced description of my position is:
    a) Evolution provides a useful prior about what we will or won't be good at.
    b) This prior can be overridden in some circumstances. Primarily this is either because we have good feedback loops or because we engaged in a long and painful process of trial and error to make up for bad feedback loops. Unfortunately, we lack good feedback loops and the speed of progress renders the latter a much less viable plan.

    But honestly, if you want to go deeper, it's probably better to just read Ihor Kendiukhov's The Lethal Reality Hypothesis.

  34. ^

    The claim here is not that our hardware renders this impossible, just that this goes against our nature. And even if some individuals are capable of learning to handle these, this is much harder on a societal scale where we face severe averaging effects.

  35. ^

    "To an outsider hearing the terms “AI safety,” “AI ethics,” “AI alignment,” they all sound like kind of synonyms, right? It turns out, and this was one of the things I had to learn going into this, that AI ethics and AI alignment are two communities that despise each other. It’s like the People’s Front of Judea versus the Judean People’s Front from Monty Python." — My AI Safety Lecture for UT Effective Altruism, Scott Aaronson

    My intuition is that the AI ethics side feels more negatively about the AI safety side than vice versa.

  36. ^

    I believe that the rationalist and effective altruist communities should be paying more attention to these issues. My top recommendation for better understanding these dynamics is to read Suspended Reason's Discursive Games, Discursive Warfare.

  37. ^

    Polarisation, geopolitical tensions, loss of trust in the media and experts in general (deservedly or undeservedly).

  38. ^

    See the appendix for an argument for why this might be viable.

  39. ^

    The order of his words has been slightly shuffled.

  40. ^

    One data point: I don't believe that any of the labs currently has a wisdom team.

  41. ^

    I'm confident that researchers will put forward even better proposals. Honestly, the primary value of my imitation learning proposal may simply be how it challenges the learned helplessness that far too many people have about trying to train AI to be wise.

  42. ^

    Deleuze and Guattari call this deterritorialisation. Deleuze and Guattari are Wrong™.

  43. ^

    Then again, I also consider Twinkle, Twinkle, Little Star profound.

  44. ^

    Even if we could train AI sages, it's unclear that we would appreciate their advice compared to the more seductive option of more sycophantic AI.

  45. ^

    "Sir, this is an EA forum you can't swear here"

    “I say unto you: one must still have chaos in oneself to be able to give birth to a dancing star”

    "For mercy's sake!"

    "ONE MUST STILL HAVE CHAOS WITHIN ONESELF TO GIVE BIRTH TO A DANCING STAR"

  46. ^

    Being serious: given the difficulty of stopping an arms race, a pause could be either one of the wisest or one of the most foolish things we could do.

  47. ^

    Importance, tractability, neglectedness.

  48. ^

    This does not constitute an endorsement of Sofiechan or any content on that website. Unfortunately, I don't know of an alternate resource that I could link to instead, but I intend to replace this resource as soon as one becomes available.

  49. ^

    "We evaluate AI on tasks that are easy to evaluate – passing the bar exam, rather than practicing law. A task is easy to evaluate if it can be neatly encapsulated (doesn't require a lot of outside context) and has clear right and wrong answers. These also happen to be precisely the easiest tasks to train an AI on"

  50. ^

    I currently lean towards it being insane.

  51. ^

    "Deployment decisions increasingly rely on human judgment as benchmarks saturate" - Evaluations Are Struggling To Keep Pace

  52. ^

    In If Anyone Builds It Everyone Dies, Eliezer and Nate distinguish between easy calls and hard calls. Eliezer clarifies the distinction further on Twitter.

  53. ^

    "Centuries" feels incredibly optimistic.

  54. ^

    Perhaps they missed an obvious counter-argument, or adopted a bizarre framing, and countless philosophy students then ended up suffering as a result.

