I'm a researcher at Forethought; before that, I ran the non-engineering side of the EA Forum (this platform), ran the EA Newsletter, and worked on some other content-related tasks at CEA. [More about the Forum/CEA Online job.]
Selected posts
Background
I finished my undergraduate studies with a double major in mathematics and comparative literature in 2021. I was a research fellow at Rethink Priorities in the summer of 2021 and was then hired by the Events Team at CEA. I later switched to the Online Team. In the past, I've also done some (math) research and worked at Canada/USA Mathcamp.
Yeah, I guess I don't want to say that it'd be better if the team had people who are (already) strongly attached to various specific perspectives (like the "AI as a normal technology" worldview — maybe especially that one?[1]). And I agree that having shared foundations is useful / constantly relitigating foundational issues would be frustrating. I also really do think the points I listed under "who I think would be a good fit" — willingness to try on and ditch conceptual models, high openness without losing track of taste, & flexibility — matter, and probably clash somewhat with central examples of "person attached to a specific perspective."
= rambly comment, written quickly, sorry! =
But in my opinion we should not just all (always) be going off of some central AI-safety-style worldviews. And I think that some of the divergence I would like to see more of could go pretty deep - e.g. possibly somewhere in the grey area between what you listed as "basic prerequisites" and "particular topics like AI timelines...". (As one example, I think accepting terminology or the way people in this space normally talk about stuff like "alignment" or "an AI" might basically bake in a bunch of assumptions that I would like Forethought's work to not always rely on.)
One way to get closer to that might be to just defer less or more carefully, maybe. And another is to have a team that includes people who better understand rarer-in-this-space perspectives, which diverge earlier on (or people who are by default inclined to thinking about this stuff in ways that are different from others' defaults), as this could help us start noticing assumptions we didn't even realize we were making, translate between frames, etc.
So maybe my view is that I'd like (1) more ~independent worldview formation/exploration to be going on, and (2) the (soft) deferral that does happen (because some deferral feels basically inevitable) to be less overlapping.
(I expect we don't really disagree, but still hope this helps to clarify things. And also, people at Forethought might still disagree with me.)
In particular:
If this perspective involves a strong belief that AI will not change the world much, then IMO that's just one of the (few?) things that are ~fully out of scope for Forethought. I.e. my guess is that projects with that as a foundational assumption wouldn't really make much sense to do here. (Although IMO even if, say, I believed that this conclusion was likely right, I might nevertheless be a good fit for Forethought if I were willing to view my work as a bet on the worlds in which AI is transformative.)
But I don't really remember what the "AI as a normal technology" position is, and could imagine that it's somewhat different — e.g. more in the direction of "automation is the wrong frame for understanding the most likely scenarios" / something like this. In that case my take would be that someone exploring this at Forethought could make sense (I haven't thought about this one much), and generally being willing to consider this perspective at least seems good, but I'd still be less excited about people who'd come with the explicit goal of pursuing that worldview & no intention of updating or whatever.
--
(Obviously if the "AI will not be a big deal" view is correct, I'd want us to be able to come to that conclusion -- and change Forethought's mission or something. So I wouldn't e.g. avoid interacting with this view or its proponents, and agree that e.g. inviting people with this POV as visitors could be great.)
Quick sketch of what I mean (and again I think others at Forethought may disagree with me):
I also want to caveat that:
(And thanks for the nice meta note!)
I've been struggling to articulate this well, but I've recently been feeling like, for instance, proposals on making deals with "early [potential] schemers" implicitly(?) rely on a bunch of assumptions about the anatomy of AI entities we'd get at relevant stages.
More generally I've been feeling pretty iffy about using game-theoretic reasoning about "AIs" (as in "they'll be incentivized to..." or similar) because I sort of expect it to fail in ways that are somewhat similar to what one gets if one tries to do this with states or large bureaucracies or something -- iirc the fourth paper here discussed this kind of thing, although in general there's a lot of content on this. Similar concerns apply to e.g. reasoning about the "goals" etc. of AI entities at different points in time without clarifying a bunch of background assumptions (related, iirc).
Ah, @Gregory Lewis🔸 says some of the above better. Quoting his comment:
- [...]
- So ~everything is ultimately an S-curve. Yet although 'this trend will start capping out somewhere' is a very safe bet, 'calling the inflection point' before you've passed it is known to be extremely hard. Sigmoid curves in their early days are essentially indistinguishable from exponential ones, and the extra parameter which ~guarantees they can better (over?)fit the points on the graph than a simple exponential give very unstable estimates of the putative ceiling the trend will 'cap out' at. (cf. 1, 2.)
- Many important things turn on (e.g.) 'scaling is hitting the wall ~now' vs. 'scaling will hit the wall roughly at the point of the first Dyson sphere data center'. As the universe is a small place on a log scale, this range is easily spanned by different analysis choices on how you project forward.
- Without strong priors on 'inflecting soon' vs. 'inflecting late', forecasts tend to be volatile: is this small blip above or below trend really a blip, or a sign we're entering a faster/slow regime?
- [...]
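To make the "early sigmoids look exponential" point concrete, here's a tiny toy sketch (mine, not Gregory's; all numbers made up): fit both an exponential and a logistic to the early phase of a process that is secretly a logistic, and the two fit about equally well, while the logistic's estimated ceiling comes out badly off and/or hugely uncertain.

```python
# Toy illustration (all numbers made up): early-phase data from a logistic curve
# is fit about as well by a pure exponential, and the logistic fit's estimate of
# the eventual ceiling is unstable / poorly constrained.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

def logistic(t, ceiling, rate, midpoint):
    return ceiling / (1 + np.exp(-rate * (t - midpoint)))

def exponential(t, a, rate):
    return a * np.exp(rate * t)

# "True" process: a logistic that eventually caps out at 1000.
t_full = np.linspace(0, 60, 200)
y_true = logistic(t_full, ceiling=1000, rate=0.2, midpoint=40)

# ...but we only observe the early phase, with a little noise.
mask = t_full < 20
t_obs, y_obs = t_full[mask], y_true[mask] * rng.normal(1.0, 0.02, mask.sum())

exp_fit, _ = curve_fit(exponential, t_obs, y_obs, p0=[1.0, 0.1])
log_fit, log_cov = curve_fit(logistic, t_obs, y_obs, p0=[50.0, 0.2, 20.0], maxfev=20000)

def rmse(pred):
    return np.sqrt(np.mean((pred - y_obs) ** 2))

print("exponential fit RMSE:", rmse(exponential(t_obs, *exp_fit)))
print("logistic fit RMSE:   ", rmse(logistic(t_obs, *log_fit)))
print("true ceiling: 1000 | fitted ceiling:", log_fit[0], "+/-", np.sqrt(log_cov[0, 0]))
```

(This is just the quoted point restated numerically, not extra evidence for it.)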
I tried to clarify things a bit in this reply to titotal: https://forum.effectivealtruism.org/posts/iJSYZJJrLMigJsBeK/lizka-s-shortform?commentId=uewYatQz4dxJPXPiv
In particular, I'm not trying to make a strong claim about exponentials specifically, or that things will line up perfectly, etc.
(Fwiw, though, it does seem possible that if we zoom out, recent/near-term population growth slow-downs might be functionally a ~blip if humanity or something like it leaves the Earth. Although at some point you'd still hit physical limits.)
Oh, apologies: I'm not actually trying to claim that things will be *exactly* exponential. We should expect some amount of ~variation in progress/growth (these are rough models, we shouldn't be too confident about how things will go, etc.), and what's actually going on is (probably a lot) more complicated than a simple/neat progression of new s-curves, etc.
The thing I'm trying to say is more like:
(Apologies if what I'd written earlier was unclear about what I believe — I'm not sure if we still notably disagree given the clarification?)
A different way to think about this might be something like:
Something like this seems to help explain why views like "the curve we're observing will (basically) just continue" have seemed surprisingly successful, even when the people holding those "curve go up" views justified their conclusions via apparently incorrect reasoning about the specific drivers of progress. (And so IMO people should place non-trivial weight on stuff like "rough, somewhat naive-seeming extrapolation of the general trends we're observing[2]."[3])
[See also a classic post on the general topic, and some related discussion here, IIRC: https://www.alignmentforum.org/posts/aNAFrGbzXddQBMDqh/moore-s-law-ai-and-the-pace-of-progress ]
Caveat: I'd add "...on a big range/ the scale we care about"; at some point, ~any progress would start hitting ~physical limits. But if that point is after the curve reshapes ~everything we care about, then I'm basically ignoring that consideration for now.
Obviously there are caveats. E.g.:
- the metrics we use for such observations can lead us astray in some situations (in particular they might not ~linearly relate to "the true thing we care about")
- we often have limited data, we shouldn't be confident that we're predicting/measuring the right thing, things can in fact change over time and we should also not forget that, etc.
(I think there were nice notes on this here, although I've only skimmed and didn't re-read https://arxiv.org/pdf/2205.15011 )
Also, sometimes we do know what
Replying quickly, speaking only for myself[1]:
I.e. I'm not speaking for the Online/Mod teams here, and didn't run this comment by anyone.
(I vaguely remember making and linking a public version of this doc somewhere at some point, but couldn't quickly find that.)
In fact it looks like I can no longer add or remove the Community tag from posts. I'm still in the Slack; a few people sometimes flag questions about edge cases there.
I sometimes see people say stuff like:
Those forecasts were misguided. If they ended up with good answers, that's accidental; the trends they extrapolated from have hit limits... (Skeptics get Bayes points.)
But IMO it's not a fluke that the "that curve is going up, who knows why" POV has done well.
A sketch of what I think happens:
There’s a general dynamic here that goes something like:
And then in *some* sense the bottlenecks crowd turns out to be right (the specific driver/paradigm peters out, there’s literally no more space for more transistors, companies run low on easily accessible/high-quality training data, etc.)…
…but then a "surprise new thing" pops up and fills the gap, such that the “true” thing we cared about (whether or not it’s what we were originally measuring) *does* actually continue roughly as people had (apparently naively) predicted it would
(and it turned out that the curve consists of a stack of s-curves..)
We can go too far with this kind of reasoning; some “true things we care about” (e.g. spread of a disease) *are* in fact s-curves, bounded, etc., and only locally look like ~exponentials. (So: no, we shouldn't expect the baby to weigh trillions of pounds by age 10...)
But I think the more granular, gears-oriented view — which considers how long specific drivers of the progress we're seeing could continue, etc. — often underrates the extent to which *other forces* can (and often do) jump in when earlier drivers lose momentum.
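Here's a tiny numerical sketch of the "stack of s-curves" dynamic above (my own toy model; the parameters are made up purely for illustration): each individual "driver" saturates, but because a bigger one kicks in around the time the previous one peters out, the summed total tracks a smooth exponential over a wide range.

```python
# Toy model (made-up parameters): a stack of staggered, individually-saturating
# s-curves can sum to something that looks like a clean exponential over a wide range.
import numpy as np

def logistic(t, ceiling, rate, midpoint):
    return ceiling / (1 + np.exp(-rate * (t - midpoint)))

t = np.linspace(0, 50, 501)

# Each successive "driver" (paradigm, input, bottleneck-bypass, ...) caps out,
# but the next one is ~3x bigger and arrives ~8 time units later.
total = sum(logistic(t, ceiling=10 * 3**k, rate=0.8, midpoint=8 * k) for k in range(7))

# Compare log(total) to its best-fit straight line, i.e. to a pure exponential.
log_total = np.log(total)
slope, intercept = np.polyfit(t, log_total, 1)
residual = log_total - (slope * t + intercept)

print(f"total growth over the window (log units): {log_total[-1] - log_total[0]:.2f}")
print(f"max deviation from the best-fit exponential (log units): {np.abs(residual).max():.2f}")
```

The individual s-curves are easy to "catch" petering out; the aggregate is the thing that keeps looking like "the curve just goes up".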
"The Bypass Principle: How AI flows around obstacles" from Eric Drexler is a very related (and IMO good) post. Quote (bold mine):
While shallow assessments focus on visible obstacles — the difficulties of matching human capabilities, of overcoming regulatory barriers, and of restructuring organizations — AI-enabled developments will often find paths that bypass rather than overcome apparent barriers. Existing obstacles are concrete and obvious in a way that alternatives are not. Skewed judgment follows.
(This stuff isn't new; many people have pointed out these kinds of dynamics. But I feel like I'm still seeing them a fair bit — and this came up recently — so I wanted to write this note.)
Yeah, this sort of thing is partly why I tend to feel better about BOTECs like (writing very quickly, tbc!):
What could we actually accomplish if we (e.g.) doubled (the total stock/flow of) investment in ~technical AIS work (specifically the stuff focused on catastrophic risks, in this general worldview)? (You could broaden if you wanted to, obviously.)
Well, let's see:
- That might look like:
- adding maybe ~400(??) FTEs similar (in ~aggregate) to the folks working here now, distributed roughly in proportion to current efforts / profiles — plus the funding/AIS-specific infrastructure (e.g. institutional homes) needed to accommodate them
- E.g. across intent alignment stuff, interpretability, evals, AI control, ~safeguarded AI, AI-for-AIS, etc., across non-profit/private/govt (but in fact aimed at loss of control stuff).
- How good would this be?
- Maybe (per year of doubling) we'd then get something like a similar-ish value from this as we do from a year of the current space's work (or something like 2x less, if we want to eyeball diminishing returns)
- Then maybe we can look at what this space has accomplished in the past year and see how much we'd pay for that / how valuable that seems...
- (What other ~costs might we be missing here?)
You might also decide that you have much better intuitions for how much we'd accomplish (and how valuable that'd be) on a different scale (e.g. adding one project like Redwood/Goodfire/Safeguarded AI/..., i.e. more like 30 FTEs than 400 — although you'd probably want to account for considerations like "for each 'successful' project we'd likely need to invest in a bunch of attempts/ surrounding infrastructure..."), or intuitions about what amount of investment is required to get to some particular desired outcome...
Or if you took the more ITN-style approach, you could try to approach the BOTEC via something like (1) how much investment has there been so far in this broad ~POV / portfolio, (2 (option a)) how much value/progress has this portfolio made + something like "how much has been made in the second half?" (to get a sense of how much we're facing diminishing returns at the moment — fwiw without thinking too much about it I think "not super diminishing returns at the mo"), or (2 (option b)) what fraction of the overall "AI safety problem" is "this-sort-of-safety-work-affectable" (i.e. something like "if we scaled up this kind of work — and only this kind of work — to an insane degree, how much of the problem would be fixed?") + how big/important the problem is overall... Etc. (Again, for all of this my main question is often "what are the sources of signal or taste / heuristics / etc. that you're happier basing your estimates on?")
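If it's useful, here's the skeleton of the first BOTEC above as a few lines of code. Every number is either one of the rough figures from the sketch above or a pure placeholder I made up (e.g. the per-FTE cost), so treat this as a template for the structure of the estimate, not as an actual estimate.

```python
# Skeleton of the "double technical AIS investment" BOTEC above.
# All numbers are placeholders (rough figures from the comment, or made up).

added_ftes = 400                       # rough doubling of current effort (from the sketch above)
value_per_current_field_year = 1.0     # unit: "one year of the current field's output"
diminishing_returns_discount = 0.5     # the eyeballed "2x less" from above
cost_per_fte_per_year = 500_000        # USD incl. infrastructure/institutional homes (made up)

annual_value = value_per_current_field_year * diminishing_returns_discount
annual_cost = added_ftes * cost_per_fte_per_year

print(f"Value per year of doubling: ~{annual_value:.2f} 'current-field-years' of output")
print(f"Cost per year of doubling:  ~${annual_cost / 1e6:.0f}M")
# Remaining step (the hard part): look at what the field actually accomplished in
# the past year and ask how much you'd pay for that, i.e. price a 'current-field-year'.
```

The point of writing it out like this is mostly to make the load-bearing inputs (the diminishing-returns discount, the "what's a current-field-year worth?" question) explicit and easy to argue about.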
Thank you! I used Procreate for these (on an iPad).[1]
(I also love Excalidraw for quick diagrams, have used & liked Whimsical before, and now also semi-grudgingly appreciate Canva.)
Relatedly, I wrote a quick summary of the post in a Twitter thread a few days ago and added two extra sketches there. Posting here too in case anyone finds them useful:
(And a meme generator for the memes.)
When thinking about the impacts of AI, I’ve found it useful to distinguish between different reasons why automation in some area might be slow. In brief:
I’m posting this mainly because I’ve wanted to link to this a few times now when discussing questions like "how should we update on the shape of AI diffusion based on...?". Not sure how helpful it will be on its own!
In a bit more detail:
(1) Raw performance issues
There’s a task that I want an AI system to do. An AI system might be able to do it in the future, but the ones we have today just can’t do it.
For instance:
A subclass here might be performance issues that are downstream of “interface mismatch”.[1] Cases where AI might be good enough at some fundamental task that we’re thinking of (e.g. summarizing content, or similar), but where the systems that surround that task or the interface through which we’re running the thing — which are trivial for humans — are a very poor fit for existing AI, and AI systems struggle to get around that.[2] (E.g. if the core concept is presented via a diagram, or requires computer-use stuff.) In some other cases, we might separately consider whether the AI system has the right affordances at all.
This is what we often think about when we think about the AI tech tree / AI capabilities. But the other factors below are often important too:
(2) Verification & trust bottlenecks
The AI system might be able to do the task, but I can't easily check its work and prefer to just do it myself or rely on someone I trust. Or I can't be confident the AI won't fail spectacularly in some rare but important edge cases (in ways a human almost certainly wouldn't).[3]
For instance, maybe I want someone to pull out the most surprising and important bits from some data, and I can’t trust the judgement of an AI system the way I can trust someone I’ve worked with. Or I don’t want to use a chatbot in customer-facing stuff in case someone finds a way to make it go haywire.
A subclass here is when one can’t trust AI providers (and open-source/on-device models aren’t good enough) for some use case. Accountability sinks also play a role on the “trust” front. Using AI for some task might complicate the question of who bears responsibility when things go wrong, which might matter if accountability is load-bearing for that system. In this case we might go for assigning a human overseer.[4]
(3) Intrinsic premiums for “the human factor”[5]
The task *requires* human involvement, or I intrinsically prefer it for some reason.
E.g. an AI therapist might be less effective for me because knowing that there’s a real person on the other end is part of what actually gets me to do my exercises. Or: I might pay more to see a performance by Yo-Yo Ma than by a robot because that’s my true preference; I get less value from a robot performance.
A subclass here is cases where the value I’m getting is testimony — e.g. if I want to gather user data or understand someone’s internal experience — that only humans can provide (even if AI can superficially simulate it).
(4) Adoption lag & institutional inertia
AI systems can do the task, people would generally prefer to use them, there aren’t legal or similar “hard barriers” to AI use, etc., but in practice the task is still being done by humans.
E.g. maybe AI-powered medical research isn’t happening because the people/institutions who could be setting this kind of automation up just haven’t gotten to it yet.
Adoption lags might be caused by stuff like: sheer laziness, coordination costs, lack of awareness/expertise, attention scarcity (the humans currently involved in this process don’t have enough slack) or similar, active (but potentially hidden) incumbent resistance, or maybe bureaucratic dysfunction (no one has the right affordances).
(My piece on the adoption gap between the US government and ~industry is largely about this.)
(5) Motivated/active protectionism towards humans
AI systems can do this, there’s no intrinsic need for human involvement, and there are no real capability/trust bottlenecks getting in the way. But we’ll deliberately continue relying on human labor for ~political reasons — not just because we’re moving slowly.
E.g. maybe we’ve hard-coded a requirement that a human lawyer or teacher or driver (etc.) is involved. A particularly salient subclass/set of examples here is when a group of humans has successfully lobbied a government to require human labor (where it might be blatantly obvious that it’s not needed). In other cases, the law (or anatomy of an institution) might incidentally require humans in the loop via some other requirement.
This is a low-res breakdown that might be missing stuff. And the lines between these categories can be very fuzzy. For instance, verification difficulties (2) can provide justification for protectionism (5) or further slow down adoption (4).
But I still think it’s useful to pay attention to what’s really at play, and worry we too often think exclusively in terms of raw performance issues (1) with a bit of diffusion lag (4).
Note OTOH that sometimes the AI capabilities are already there, but bad UI or lack of some complementary tech is still making those capabilities unusable.
I’m listing this as a “raw performance issue” because the AI systems/tools will probably keep improving to better deal with such clashes. But I also expect the surrounding interfaces/systems to change as people try to get more value from available AI capabilities. (E.g. stuff like robots.txt.)
Sometimes the edge cases are the whole point. See also: Heuristics That Almost Always Work - by Scott Alexander
Although I guess then we should be careful about alert fatigue/false security issues. IIRC this episode discusses related stuff: Machine learning meets malware, with Caleb Fenton (search for “alert fatigue” / “hypnosis”)
(There's also the opposite: cases where we'd actively prefer humans not be involved. For instance, all else equal, I might want to keep some information private — not want to share it with another person — even if I'd happily share it with an AI system.)