Linch

27619 karma · Joined · Working (6-15 years) · openasteroidimpact.org

Comments (2892)

I wrote a short intro to stealth (the radar evasion kind). I was irritated by how bad existing online introductions are, so I wrote my own!

I'm not going to pretend it has direct EA implications. But one thing I've updated towards in the last few years is how surprisingly limited and inefficient the information environment is: obvious concepts known to humanity for decades or centuries don't have clear explanations online, obvious and very important trends have very few people drawing attention to them, you can just write the best book review of a popular book that's been around for decades, etc.

I suppose one obvious explanation here is that most people who can write stuff like this have more important and/or interesting things to do with their time. Which is true, but also kind of sad.

presupposes that EAs are wrong, or at least, merely luckily right

Right, to be clear I'm far from certain that the stereotypical "EA view" is right here. 

I guess really I was saying that "conditional on a sociological explanation being appropriate, I don't think it's as LW-driven as Yarrow thinks", although LW is undoubtedly important.

Sure, that makes a lot of sense! I was mostly just using your comment to riff on a related concept.

I think reality is often complicated and confusing, and it's hard to separate out contingent vs. inevitable stories for why people believe what they believe. But I think the correct view is that EAs' beliefs about AGI probability and risk (within an order of magnitude or so) are mostly not contingent (as of 2025), even if they turn out to be ultimately wrong.

The Google ads example was the best one I could think of to illustrate this. I'm far from certain that Google's decision to use ads was actually the best source of long-term revenue (never mind being morally good lol). But given how the internet developed, it seems implausible that Google's reliance on ads was counterfactually due to their specific acquisitions.

Similarly, even if EAs ignored AI before for some reason, and never interacted with LW or Bostrom, it's implausible that, as of 2025, people who are concerned with ambitious, large-scale altruistic impact (and have other epistemic, cultural, and maybe demographic properties characteristic of the movement) would not think of AI as a big deal. AI is just a big thing in the world that's growing fast. Anybody capable of reading graphs can see that.

That said, specific micro-level beliefs (and maybe macro ones) within EA and AI risk might be different without influence from either LW or the Oxford crowd. For example, there might be a stronger accelerationist arm. Alternatively, people might be more queasy about the closeness to the major AI companies, and there might be a stronger and better-funded contingent of folks interested in public messaging on pausing or stopping AI. And in general, if the movement hadn't "woken up" to AI concerns at all pre-ChatGPT, I think we'd be in a more confused spot.

eh, I think the main reason EAs believe AGI stuff is reasonably likely is because this opinion is correct, given the best available evidence[1]

Having a genealogical explanation here is sort of answering the question on the wrong meta-level, like giving a historical explanation for "why do evolutionists believe in genes" or telling a touching story about somebody's pet pig for "why do EAs care more about farmed animal welfare than tree welfare." 

Or upon hearing "why does Google use ads instead of subscriptions?" answering with the history of their DoubleClick acquisition. That history is real, but it's downstream of the actual explanation: the economics of internet search heavily favor ad-supported models regardless of the specific path any company took. The genealogy is epiphenomenal.

The historical explanations are thus mildly interesting, but they conflate different levels of "why."

EDIT: man, I'm worried my comment will be read as a soldier-mindset thing that only makes sense if you presume "AGI likely soon" is already correct, which wouldn't improve the conversation. Please only upvote it if a version of you that's neutral on the object-level question would also upvote this comment.

  1. ^

    Which is a different claim from whether it's ultimately correct. Reality is hard.

  • Near-term AGI is highly unlikely, much less than a 0.05% chance in the next decade.

Is this something you're willing to bet on? 
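For concreteness, here's a rough sketch of the betting odds that number implies, if taken at face value (the stake sizes are hypothetical, and this ignores inflation, counterparty risk, and the awkwardness of collecting if AGI does arrive):

```python
# Rough sketch: fair odds implied by a stated probability.
# The 0.05% figure comes from the quoted claim; stake sizes are hypothetical.

p_agi_next_decade = 0.0005  # "much less than a 0.05% chance"

# Fair odds against the event are (1 - p) : p.
odds_against = (1 - p_agi_next_decade) / p_agi_next_decade
print(f"Fair odds against near-term AGI: {odds_against:.0f} : 1")  # ~1999 : 1

# Someone who truly believes p <= 0.0005 should accept far worse odds than
# 1999:1 and still expect to profit. For example, at 10:1:
believer_stake = 100    # hypothetical stake from the person expecting AGI
skeptic_stake = 1000    # hypothetical stake from the skeptic
skeptic_ev = (1 - p_agi_next_decade) * believer_stake - p_agi_next_decade * skeptic_stake
print(f"Skeptic's expected value at 10:1: ${skeptic_ev:.2f}")  # ~$99.45
```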

crossposted from https://inchpin.substack.com/p/legible-ai-safety-problems-that-dont

Epistemic status: Think there’s something real here but drafted quickly and imprecisely

I really appreciated reading Legible vs. Illegible AI Safety Problems by Wei Dai. I enjoyed it as an impressively sharp crystallization of an important idea:

  1. Some AI safety problems are “legible” (obvious/understandable to leaders/policymakers) and some are “illegible” (obscure/hard to understand)
  2. Legible problems are likely to block deployment because leaders won’t deploy until they’re solved
  3. Leaders WILL still deploy models with illegible AI safety problems, since they won’t understand the problems’ full import.
  4. Therefore, working on legible problems has low or even negative value: if unsolved legible problems block deployment, solving them will just speed up deployment and thus shorten AI timelines.
    1. Wei Dai didn’t give a direct example, but the iconic example that comes to mind for me is Reinforcement Learning from Human Feedback (RLHF): implementing RLHF for early ChatGPT, Claude, and GPT-4 likely was central to making chatbots viable and viral.
    2. The raw capabilities were interesting but the human attunement was necessary for practical and economic use cases.

I mostly agree with this take. I think it’s interesting and important. However (and I suspect Wei Dai will agree), it’s also somewhat incomplete. In particular, the article presumes that “legible problems” and “problems that gate deployment” are essentially the same set, or at least that the correlation is positive enough that the differences are barely worth mentioning. I don’t think this is true.

For example, consider AI psychosis and AI suicides. These are obviously highly legible problems that are very easy to understand (though not necessarily to quantify or solve). Yet they keep happening, and AI companies (or at least the less responsible ones) seem happy to continue deploying models without solving them.

Now of course AI psychosis is less important than extinction or takeover risk. But this doesn’t change the point: problems as legible as AI psychosis is today (or was in Nov 2024) won’t necessarily gate deployment, at least with actors at similar levels of responsibility as the existing AI company leaders.

Instead, it might be better to modify the argument to say we should primarily focus on solving (or making legible) problems that are not likely to actually gate deployment by default, and leave the problems that already gate deployment to others (Trust and Safety teams, government legislators, etc.). This sounds basically right to me.
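To make the distinction concrete, here’s a toy sketch treating legibility and “gates deployment by default” as separate axes (the placements are my own reading of the examples above and are obviously debatable):

```python
# Toy sketch: treating "legible to leaders" and "gates deployment by default"
# as separate axes, using examples from the discussion above.
# Placements are my own reading and are debatable.

problems = [
    # (problem, legible?, gates deployment by default?)
    ("chatbot usability pre-RLHF",      True,  True),   # solving it mostly sped up deployment
    ("AI psychosis / AI suicides",      True,  False),  # legible, yet models ship anyway
    ("deceptive alignment / takeover",  False, False),  # illegible and not gating
]

for name, legible, gates in problems:
    if gates:
        verdict = "blocks deployment; others are already incentivized to solve it"
    elif legible:
        verdict = "legibility alone isn't enough; needs outside pressure"
    else:
        verdict = "candidate for safety work: solve it and/or make it legible"
    print(f"{name}: {verdict}")
```

On this framing, the modified argument is mostly about the third kind of problem: ones that neither gate deployment by default nor are currently legible.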

But this raises another question: Legible to whom? And gating deployment by whom?

Wei Dai’s argument implicitly adopts a Mistake Theory framing, where AI company leadership doesn’t understand the illegible (to them) issues that could lead to our doom. On the one hand, this is surely true: e/accs aside, AI company leaders presumably don’t want themselves and their children to die, so in some sense, if they truly understood certain illegible issues that could lead to AI takeover and/or human extinction, those issues would probably block deployment.

In another sense, I’m not so sure the framing is right. Consider the following syllogism:

  1. If I believe that the risk is real, my company may have to shut down, or incur other large costs and possibly lose the AI race to Anthropic/OpenAI/DeepMind/China.
  2. I do not wish to incur large costs.
  3. Therefore, by modus tollens, the risk is not real.

This is a silly syllogism at face value, yet I believe it’s a common pattern of (mostly unconscious) thought among many people at AI labs. Related idea by Upton Sinclair: “It is difficult to get a man to understand something, when his salary depends on his not understanding it.”

This suggests at least two complications for the epistemics-only framing of “work on making illegible problems more legible”:

  1. Solving a problem can go a long way towards making it more legible
    1. Many people have talked about how, in the course of solving a problem, you may make it more legible. Alternatively, reframing a problem can make it more solvable (cf. also Grothendieck’s rising sea).
    2. But if you take an incentives-first, motivated-cognition framing, as I’ve implied, you may also believe that solving a problem, and thus reducing the alignment tax, may magically and mysteriously make AI company leaders suddenly understand the importance of your problems, now that they’re cheaper to solve.
  2. If motivations, and not technical difficulty, drive much of the illegibility, this suggests that sometimes we should focus our explainer efforts on people who are further away from the situation, and thus less biased
    1. Concretely, the current “AI safety-standard” pipeline for increasing legibility looks like: first try to convince “very technical Constellation-cluster” people -> then convince AI company safety teams -> then convince AI company non-safety technical people -> then convince AI company leaders -> then maybe try to convince policymakers to turn informal agreements and policies into law
    2. But if I’m right about motivations, we should instead aim to convince unbiased people (or people biased in our direction) first, like tech journalists, faith leaders, politicians, and members of the general public.
      1. This is epistemically riskier in some ways, because you’re talking to less knowledgeable (and in some ways less intelligent) people, but it also has significant benefits: maintaining independence, and having less funky incentives and biases.
    3. To repeat: under my model, illegibility is often driven by incentives, culture, and motivated reasoning, not by a lack of technical or conceptual skill.

Note that I’m assuming that sometimes what you want is for AI companies to “see the light” and manage themselves (which is most of the “inside game” path forward of #1). However, most of the time the way we get actual progress on AI not killing us all (especially for #2) is via legal and other forms of state hard power. In a democracy, this entails a combination of convincing policymakers, civil society, and the general public, which requires not just technical agreement but also increasing the issue’s salience.

Of course, there can be real issues with over-regulation or misdiagnosis of “illegible issues.” As someone responded to me on Twitter, “If a problem has no problem statement then… there isn’t a problem.” While the strong version of that is clearly false, there’s a weaker version that’s probably correct: problems you or I view as “illegible” are more likely, in objective terms, to not be real problems at all. I don’t have a clear solution for this other than a) thinking harder, and b) hoping that trying to increase legibility will also reveal the holes in the reasoning behind “fake problems.” Ultimately, reality is difficult and there aren’t cheap workarounds.

Conclusion

Concretely, compared to before reading and thinking about Wei Dai’s article, I tentatively update a bit towards

a) wanting to work on more illegible problems,

b) thinking that AI safety should prioritize more explanation-type work, or work that is closer to analytic philosophy’s “conceptual sharpening,” and

c) suspecting that when ideas mysteriously seem to bounce off of AI company employees and AI lab leaders, this may not be due to true philosophical or technical confusion, but rather to obvious bias.

In some cases, we should think about, and experiment with, framing problems that are illegible (to AI company leaders or other ML-heavy people) for other audiences, rather than assuming people are too dumb to understand our existing arguments without dumbing them down.

Let me know if you have other thoughts on it here!

Some of the negative comments here gesture at the problem you're referring to, but less precisely than you did.

I wrote a quick draft on reasons you might want to skip pre-deployment Phase 3 drug trials (and instead do an experimental rollout with post-deployment trials, with the option of recall) for vaccines for diseases with high mortality burden, or for novel pandemics. https://inchpin.substack.com/p/skip-phase-3

It's written in a pretty rushed way, but I know this idea has been bouncing around for a while and I haven't seen a clearer writeup elsewhere, so I hope it can start a conversation!
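For a rough sense of the shape of the argument, here's a toy back-of-envelope version (every number below is made up purely for illustration; the draft itself goes through the actual considerations):

```python
# Toy back-of-envelope for the tradeoff in the linked draft: preventable deaths
# during a pre-deployment Phase 3 delay vs. expected harm from an early rollout
# with post-deployment monitoring and the option of recall.
# Every number below is made up purely for illustration.

monthly_deaths = 10_000        # hypothetical: disease mortality burden per month
phase3_delay_months = 9        # hypothetical: extra delay from a pre-deployment trial
vaccine_efficacy = 0.7         # hypothetical

deaths_while_waiting = monthly_deaths * phase3_delay_months * vaccine_efficacy

p_serious_problem = 0.05       # hypothetical: chance the vaccine has a serious flaw
harm_before_recall = 20_000    # hypothetical: harm accrued before a recall triggers
expected_early_harm = p_serious_problem * harm_before_recall

print(f"Expected preventable deaths while waiting: {deaths_while_waiting:,.0f}")
print(f"Expected harm from early rollout:          {expected_early_harm:,.0f}")
```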

Are the abundance ideas actually new to EA folks? They feel like rehashes of arguments we had ~a decade ago, often presented in less technical language and ignoring the major cruxes.

Not saying they're bad ideas, just not new.

This post had 169 views on the EA Forum, 3K on Substack, 17K on Reddit, and 31K on Twitter.

Link appears to be broken.
