I'm a researcher at Forethought; before that, I ran the non-engineering side of the EA Forum (this platform), ran the EA Newsletter, and worked on some other content-related tasks at CEA. [More about the Forum/CEA Online job.]
Selected posts
Background
I finished my undergraduate studies with a double major in mathematics and comparative literature in 2021. I was a research fellow at Rethink Priorities in the summer of 2021 and was then hired by the Events Team at CEA. I later switched to the Online Team. In the past, I've also done some (math) research and worked at Canada/USA Mathcamp.
I've found the following abstract frame/set of heuristics useful for thinking about how we can try to affect (or predict) the long-term future:
“How do we want to spend our precision/reach points? And can we spend them more wisely?”
[Meta: This is a rough, abstract, and pretty rambly note with assorted links; I’m just trying to pull some stuff out and synthesize it in a way I can more easily reference later (hoping to train habits along these lines). I don't think the ideas here are novel, and honestly I'm not sure who'd find this useful/interesting. (I might also keep editing it as I go.)]
An underlying POV here is that (a) scope and (b) precision are in tension. (Alts: (a) “ambition / breadth / reach / ...” — vs — (b) “predictability / fidelity / robustness / ...”.) You can aim at something specific and nearby [high precision, limited reach] or at something larger, farther away, and fuzzier [low precision, broad reach]. And if you care about the kind of effect you’re having (you want to make X happen, rather than just seeking influence ~for influence’s sake), this matters a bunch.
Importantly, I think there are “architectural” features of the world/reality[1] that can ease this tension somewhat if they're used properly; if you channel your effort through them, you can transmit an intervention without it dissipating (or getting warped) as much as it otherwise would. Any channels like this will still be leaky (and they’re limited), but this sort of “structure” seems like the main thing to look for if you’re hoping to think about or improve the long-term future.
(See a related sketch diagram here. I also often picture something like: “what levers could reach across a paradigm shift?” (or: what features are invariant in relevant ways?))
Some examples / thinking this through a bit:
And so the point is that some projects find significantly “smarter” paths through this <reach (things have a small effect) vs precision (things don't have the effect you want)> space, piggybacking on features of reality that are more stable and predictably causally linked. I.e. it helps to orient to ~causal chokepoints that are close enough to predict/act on (we know stuff about them, we can use them as operational targets -- they’re on the right horizon/within reach), but causally upstream of enough important stuff for improvements to propagate down and make a big (positive!) difference.
I wrote most of this when we were working on the “First type of transformative AI?” post (here's the Forum version); I’d found it very natural to translate some of that into the above frame.
Something like:
-> People often zero in on something like the "boss battle" AI challenge (e.g. ~ASI), but — even assuming that's the main threat — aiming directly at dealing with later-stage AI transformations often seems like an inefficient way of spending your precision/reach points (relative to channeling your effort through shaping earlier AI impacts).
I.e. (barring something like the "silent-IE" trajectory here) —
If AI will have transformed a bunch of other stuff by the time the issue shows up, the world you're preparing for will be radically different than the one you're used to (and harder to predict).
- If you try to do something specific/robust/..., there's a greater chance that your work will be irrelevant (or it'll only help with the small slice of things/scenarios you can predict more specifically); your scope is very narrow.
- If you do something much more ambitious, you really need to hope that the intervention is actually helpful instead of getting warped/diverted along the way (moves like trying to start a movement, which could end up harmful or pushing for things that aren't that useful, or e.g. trying to lock in a particular power structure or set of norms before knowing more about what's going on).
(This is basically just the usual pattern. You can maybe think of it as having "how much AI has changed the world" on the X-axis instead of "time into the future".)
Meanwhile, earlier-in-the-queue AI impacts might provide us with really good “channels”/levers (for affecting the long-term):
- They're "within reach" — we know more about how AI is changing things now, have more access/ability to change things (plus this is getting less attention)
- In some cases I think shaping how this plays out could be a reasonably good way to faithfully “transmit” the effects of our effort
- E.g. because we have reason to believe that they’re causally connected to later shifts in particular ways (e.g. the good kind of positive feedback loop on epistemics/coordination; if this change goes better, we’ll be nontrivially better set up for dealing with AI takeover threats / power concentration issues, ...)
- (For epistemics/coordination effects I also think they'd help quite a bit with other potential challenges -- it's a bit like "general capacity for sense" in my mind. But I expect this "causal connection" step is really tricky more generally and maybe voids some of this frame? )
- I think we can point to some that matter a lot; orienting to them doesn't limit the scope too much (even if they’re not “the big one”);
- (Even if you only consider the effects on AI-disempowerment threats, I think shaping how e.g. AI gets deployed in information ecosystems could matter a lot.)
To be clear: “how earlier AI impacts go” definitely isn't the only kind of channel/causal chokepoint we can use to try to improve “how AI goes” or to help with "boss battles" etc. E.g.
Still, I continue to think that people focus too much on something like the "silent-IE" path here, and too little on “in which ways could AI massively change things early — before disempowerment-threatening-ASI — and can we improve how that goes?”
(At the time I also wrote the following (still true): Having written that, I find myself wanting to look closer at the “architecture” of potential early transformations; what are important early transformations that give us some predictability channel? Etc.)
Anyway, I like this as a “find a route to impact/predictability that makes use of more robust/persistent features” prompt. In my mind it’s also quite related to how I would prefer people orient to speculative BOTECs; find “features” related to the question you care about that are more stable (including inputs that are more grounded and a way to put them together that you trust). (Related section of my “ITN 201” post. See also this note on using simpler models.)
I think similar dynamics apply in a bunch of other places, too.[5]
== Some only-very-marginally-relevant images ==
An old sketch illustrating a sort of related idea (trying to see further-out "beacons" is useful, but it's also useful to have a "nearer" target that you can actually visualize, even if it's flawed, which was a major motivation for the design sketches work):
Another rough diagram (from over a year ago, now, I think):
And just for fun I'll add a couple other sketches in this footnote (people told me around when I made them that they were largely incomprehensible, IIRC).[6]
People often have quite a bit of choice in how to navigate the ~precision-breadth tradeoff, though, I think. See e.g. The fidelity model of spreading ideas
It's probably more accurate to say "...sometimes unusually stable social structures arise", actually.
My model is that they're often not really designed by anyone.
Similarly I think “leaders” of broad social structures (including e.g. revolutions) are often "selected in" or "leading the parade" rather than doing something like directing. (A check with Claude suggests that original main leaders/planners of successful revolutions usually did not hold power for at least 5 years [5-7/29 cases, apparently], and ~rarely achieved their stated political goals.)
Semi-related: (social) system dynamics are quite counterintuitive; see “Places to Intervene in a System.”
I also remember appreciating Acemoglu’s Institutions, Technology and Prosperity
Related shortform by my brother (bold mine):
[...] This doesn't mean we need to give up, or only work on unambitious, practical applications. But it does mean that we have to admit that things can be useful to work on in expectation before we have a "complete story for how they save the world".
Note that what is being advocated here is not an "anything goes" mentality. I certainly think that AI safety research can be too abstract, too removed from any realistic application in any world. But there is a large spectrum of possibilities between "fully plan how you will solve a complex logic game before trying anything" and "make random jerky moves because they 'feel right'".
I'm writing this in response to Adam Jones' article on AI safety content. I like a lot of the suggestions. But I think the section on alignment plans suffers from the "axe" fallacy that I claim is somewhat endemic here. Here's the relevant quote:
> For the last few weeks, I’ve been working on trying to find plans for AI safety. They should cover the whole problem, including the major hurdles after intent alignment. Unfortunately, this has not gone well - my rough conclusion is that there aren’t any very clear and well publicised plans (or even very plausible stories) for making this go well. (More context on some of this work can be found in BlueDot Impact’s AI safety strategist job posting). (emphasis mine).
I strongly disagree with this being a good thing to do!
We're not going to have a good, end-to-end plan about how to save the world from AGI. Even now, with ever more impressive and scary AIs becoming commonplace, we have very little idea about what AGI will look like, what kinds of misalignment it will have, where the hard bits of checking it for intent and value alignment will be. Trying to make extensive end-to-end plans can be useful, but can also lead to a strong streetlight effect: we'll be overcommitting to current understanding, current frames of thought (in an alignment community that is growing and integrating new ideas at an exponential rate whose timescale is measured in months, not years).
Don't get me wrong. I think it's valuable to try to plan things where our current understanding is likely to at least partially persist: how AI will interface with government, general questions of scaling and rough models of future development. But we should also understand that our map has lots of blanks, especially when we get down to thinking about what we will understand in the future. [...]
A few other links I'd dumped in a doc with this note:
John Wentworth’s writing on gears-level models (e.g. “...are capital investments”).
Eliezer:
- The Outside View's Domain ("...does not inspire in me any confidence that the Outside View can be applied across processes with greatly different internal causal structures, like life-and-death versus sleeping-and-waking. ..." ... "when you deal with attempted analogies across structurally different processes, perhaps unique or poorly understood, then things which are similar in some surface respects are often different in other respects. And the sign of this domain is that when people try to reason by similarity, it is not at all clear what is similar to what, or which surface resemblances they should focus upon as opposed to others.")
- Underconstrained Abstractions ("...The further away you get from highly regular things like atoms, and the closer you get to surface phenomena that are the final products of many moving parts, the more history underconstrains the abstractions that you use. This is part of what makes futurism difficult. ")
From "First type of TAI?" again, a whacky schematic:
Whacky illustration:
This is a really useful post, thanks for writing it! (This kind of thing is precisely why I'm so interested in AI tools / work on AI for epistemics[1])
...
That said, I think strong findings like this are often largely due to differences in how things are measured or other incidental/background features of the dataset considered / the methods used to analyze the data. I haven't personally checked anything, but to give you a sense of what I mean, here are a couple explanations that might be worth considering:
I'll say that I tend to default to mistake theory, not conflict theory, and describing the issue in words like "fake" seems to assume the latter. Under the former lens, you might want to consider hypotheses like (d): maybe the world is very strange/different around April 1, such that it's easier for people to be confused and accidentally say untrue things.
(Still, we should probably always consider whether (e) the Forum is being inundated with lies in a planned attack of some kind.)
I'm also worried about an "epistemics" transformation going poorly, and agree that how it goes isn't just a question of getting the right ~"application shape" — something like differential access/adoption[1] matters here, too.
@Owen Cotton-Barratt, @Oliver Sourbut, @rosehadshar and I have been thinking a bit about these kinds of questions, but not as much as I'd like (there's just not enough time). So I'd love to see more serious work on things like "what might it look like for our society to end up with much better/worse epistemic infrastructure (and how might we get there)?" and "how can we make sure AI doesn't end up massively harming our collective ability to make sense of the world & coordinate (or empower bad actors in various ways, etc.)?"
This comment thread on an older post touched on some related topics, IIRC
I didn't end up writing a reflection in the comments as I'd meant to when I posted this, but I did end up making two small paintings inspired by Benjamin Lay & his work. I've now shared them here.
I think of today (February 8) as "Benjamin Lay Day", for what it's worth. (Funny timing :) .)
Another one I'd personally add might be November 4 for Joseph Rotblat. And just in case you haven't seen / just for reference, there are some related resources on the Forum, e.g. here https://forum.effectivealtruism.org/topics/events-on-the-ea-forum, and here https://forum.effectivealtruism.org/posts/QFfWmPPEKXrh6gZa3/the-ea-holiday-calendar .
In fact I think the Forum team may also still maintain a list/calendar of possible days to celebrate somewhere. ( @Dane Valerie might know?)
Benjamin Lay — "Quaker Comet", early (radical) abolitionist, general "moral weirdo" — died on this day 267 years ago.
I shared a post about him a little while back, and still think of February 8 as "Benjamin Lay Day".
...
Around the same time I also made two paintings inspired by his life/work, which I figured I'd share now. One is an icon-style-inspired image based on a portrait of him[1]:
The second is based on a print depicting the floor plan of an infamous slave ship (Brooks). The print was used by abolitionists (mainly(?) the Society for Effecting the Abolition of the Slave Trade) to help communicate the horror of the trade.
I found it useful to paint it (and appreciate having it around today). But I imagine that not everyone will want to see it, so I'll skip a few lines here in case you expanded this quick take and decide you want to scroll past/collapse it instead.
.
.
.
When thinking about the impacts of AI, I’ve found it useful to distinguish between different reasons for why automation in some area might be slow. In brief:
I’m posting this mainly because I’ve wanted to link to this a few times now when discussing questions like "how should we update on the shape of AI diffusion based on...?". Not sure how helpful it will be on its own!
In a bit more detail:
(1) Raw performance issues
There’s a task that I want an AI system to do. An AI system might be able to do it in the future, but the ones we have today just can’t do it.
For instance:
A subclass here might be performance issues that are downstream of “interface mismatch”.[1] Cases where AI might be good enough at some fundamental task that we’re thinking of (e.g. summarizing content, or similar), but where the systems that surround that task, or the interface through which we’re running the thing — which are trivial for humans — are a very poor fit for existing AI, and AI systems struggle to get around that.[2] (E.g. if the core concept is presented via a diagram, or requires computer use stuff.) In some other cases, we might separately consider whether the AI system has the right affordances at all.
This is what we often think about when we think about the AI tech tree / AI capabilities. But the other factors are often important, too:
(2) Verification & trust bottlenecks
The AI system might be able to do the task, but I can't easily check its work and prefer to just do it myself or rely on someone I trust. Or I can't be confident the AI won't fail spectacularly in some rare but important edge cases (in ways a human almost certainly wouldn't).[3]
For instance, maybe I want someone to pull out the most surprising and important bits from some data, and I can’t trust the judgement of an AI system the way I can trust someone I’ve worked with. Or I don’t want to use a chatbot in customer-facing stuff in case someone finds a way to make it go haywire.
A subclass here is when one can’t trust AI providers (and open-source/on-device models aren’t good enough) for some use case. Accountability sinks also play a role on the “trust” front. Using AI for some task might complicate the question of who bears responsibility when things go wrong, which might matter if accountability is load-bearing for that system. In this case we might go for assigning a human overseer.[4]
(3) Intrinsic premiums for “the human factor”[5]
The task *requires* human involvement, or I intrinsically prefer it for some reason.
E.g. AI therapists are less effective for me because knowing that the person on the other end is a real person actually helps get me to do my exercises. Or: I might pay more to see a performance by Yo-Yo Ma than by a robot because that’s my true preference; I get less value from a robot performance.
A subclass here is cases where the value I’m getting is testimony — e.g. if I want to gather user data or understand someone’s internal experience — that only humans can provide (even if AI can superficially simulate it).
(4) Adoption lag & institutional inertia
AI systems can do this, people would generally prefer that they did, there aren’t legal or similar “hard barriers” to AI use, etc., but in practice this is still being done by humans.
E.g. maybe AI-powered medical research isn’t happening because the people/institutions who could be setting this kind of automation up just haven’t gotten to it yet.
Adoption lags might be caused by stuff like: sheer laziness; coordination costs; lack of awareness/expertise; attention scarcity (the humans currently involved in this process don’t have enough slack) or similar; active (but potentially hidden) incumbent resistance; or maybe bureaucratic dysfunction (no one has the right affordances).
(My piece on the adoption gap between the US government and ~industry is largely about this.)
(5) Motivated/active protectionism towards humans
AI systems can do this, there’s no intrinsic need for human involvement, and no real capability/trust bottlenecks are getting in the way. But we’ll deliberately continue relying on human labor for ~political reasons — not just because we’re moving slowly.
E.g. maybe we’ve hard-coded a requirement that a human lawyer or teacher or driver (etc.) is involved. A particularly salient subclass/set of examples here is when a group of humans has successfully lobbied a government to require human labor (where it might be blatantly obvious that it’s not needed). In other cases, the law (or anatomy of an institution) might incidentally require humans in the loop via some other requirement.
This is a low-res breakdown that might be missing stuff. And the lines between these categories can be very fuzzy. For instance, verification difficulties (2) can provide justification for protectionism (5) or further slow down adoption (4).
But I still think it’s useful to pay attention to what’s really at play, and worry we too often think exclusively in terms of raw performance issues (1) with a bit of diffusion lag (4).
Note OTOH that sometimes the AI capabilities are already there, but bad UI or lack of some complementary tech is still making those capabilities unusable.
I’m listing this as a “raw performance issue” because the AI systems/tools will probably keep improving to better deal with such clashes. But I also expect the surrounding interfaces/systems to change as people try to get more value from available AI capabilities. (E.g. stuff like robots.txt.)
Sometimes the edge cases are the whole point. See also: Heuristics That Almost Always Work - by Scott Alexander
Although I guess then we should be careful about alert fatigue/false security issues. IIRC this episode discusses related stuff: Machine learning meets malware, with Caleb Fenton (search for “alert fatigue” / “hypnosis”)
(There's also the opposite; cases where we'd actively prefer humans not be involved. For instance, all else equal, I might want to keep some information private — not want to share it with another person — even if I'd happily share with an AI system.)
Yeah, I guess I don't want to say that it'd be better if the team had people who are (already) strongly attached to various specific perspectives (like the "AI as a normal technology" worldview -- maybe especially that one?[1]). And I agree that having shared foundations is useful / constantly relitigating foundational issues would be frustrating. I also really do think the points I listed under "who I think would be a good fit" — willingness to try on and ditch conceptual models, high openness without losing track of taste, & flexibility — matter, and probably clash somewhat with central examples of "person attached to a specific perspective."
= rambly comment, written quickly, sorry! =
But in my opinion we should not just all (always) be going off of some central AI-safety-style worldviews. And I think that some of the divergence I would like to see more of could go pretty deep - e.g. possibly somewhere in the grey area between what you listed as "basic prerequisites" and "particular topics like AI timelines...". (As one example, I think accepting terminology or the way people in this space normally talk about stuff like "alignment" or "an AI" might basically bake in a bunch of assumptions that I would like Forethought's work to not always rely on.)
One way to get closer to that might be to just defer less or more carefully, maybe. And another is to have a team that includes people who better understand rarer-in-this-space perspectives, which diverge earlier on (or people who are by default inclined to thinking about this stuff in ways that are different from others' defaults), as this could help us start noticing assumptions we didn't even realize we were making, translate between frames, etc.
So maybe my view is that I'd like (1) there to be more ~independent worldview formation/exploration going on, and (2) the (soft) deferral that is happening (because some deferral feels basically inevitable) to be less overlapping.
(I expect we don't really disagree, but still hope this helps to clarify things. And also, people at Forethought might still disagree with me.)
In particular:
If this perspective involves a strong belief that AI will not change the world much, then IMO that's just one of the (few?) things that are ~fully out of scope for Forethought. I.e. my guess is that projects with that as a foundational assumption wouldn't really make much sense to do here. (Although IMO even if, say, I believed that this conclusion was likely right, I might nevertheless be a good fit for Forethought if I were willing to view my work as a bet on the worlds in which AI is transformative.)
But I don't really remember what the "AI as normal..." position is, and could imagine that it's somewhat different — e.g. more in the direction of "automation is the wrong frame for understanding the most likely scenarios" / something like this. In that case my take would be that someone exploring this at Forethought could make sense (haven't thought about this one much), and generally being willing to consider this perspective at least seems good, but I'd still be less excited about people who'd come with the explicit goal of pursuing that worldview & no intention of updating or whatever.
--
(Obviously if the "AI will not be a big deal" view is correct, I'd want us to be able to come to that conclusion -- and change Forethought's mission or something. So I wouldn't e.g. avoid interacting with this view or its proponents, and agree that e.g. inviting people with this POV as visitors could be great.)
Quick sketch of what I mean (and again I think others at Forethought may disagree with me):
I also want to caveat that:
(And thanks for the nice meta note!)
I've been struggling to articulate this well, but I've recently been feeling like, for instance, proposals on making deals with "early [potential] schemers" implicitly(?) rely on a bunch of assumptions about the anatomy of AI entities we'd get at relevant stages.
More generally I've been feeling pretty iffy about using game-theoretic reasoning about "AIs" (as in "they'll be incentivized to..." or similar) because I sort of expect it to fail in ways that are somewhat similar to what one gets if one tries to do this with states or large bureaucracies or something -- iirc the fourth paper here discussed this kind of thing, although in general there's a lot of content on this. Similar stuff on e.g. reasoning about the "goals" etc. of AI entities at different points in time without clarifying a bunch of background assumptions (related, iirc).
Also relevant (sort of[1]) — this cool post from Chris Olah & Adam Jermyn (from a while back): Reflections on Qualitative Research
I’m not sure if it’d be remotely compelling to people who have a very different perspective overall (I'm already fairly sympathetic; see e.g. my "y-axis" post).
Still, pulling out a few parts I appreciated:
Related: What's So Bad About Ad-Hoc Mathematical Definitions? (discussion here IIRC)
Sort of related: “Research as a Stochastic Decision Process” — my mental motto version of the post is something like “try to be greedy about the rate at which you’re gaining/producing information/clarity”
Also from Chris Olah (and Shan Carter), and IMO great: Distillation and research debt
(TBH part of the reason I'm posting this here and right now is that I made a hacky commitment to post a few things tonight and this is the lowest-friction option; I'd already posted a version of this in Slack and this was vaguely relevant...)