A lot of EAs are reporting that some things seem like early signs of character or judgment flaws in SBF — an argument that seems wrong, an action that seems unjustified, etc. — now that they can reexamine those data points with the benefit of hindsight.
But the mental motions involved in "revisit the past and do a mental search for warning signs confirming that a Bad Person is bad" are pretty different from the mental motions involved in noticing and responding to problems before the person seems Bad at all.
"Noticing red flags" often isn't what it feels like from the inside to properly notice, respond to, and propagate warning signs that someone you respect is fucking up in a surprising way.
Things usually feel like "red flags" after you're suspicious, rather than before.
You're hopefully learning some real-world patterns via this "reinterpret old data points in a new light" process. But you aren't necessarily training the relevant skills and habits by doing this.
From my perspective, the whole idea that the relevant skillset is specifically about spotting Bad Actors is itself sort of confused. Like, EAs might indeed have too low a prior on bad actors existing, but also, the idea that the world is sharply divided into Fully Good Actors and Fully Bad Actors is part of what protected SBF in the first place!
It kept us from doing mundane epistemic accounting before he seemed Bad. If you're discouraged from just raising a minor local Criticism or Objection for its own sake — if you need some larger thesis or agenda or axe to grind, before it's OK to say "hey wait, I don't get X" — then it will be a lot harder to update incrementally and spot problems early.
(And, incidentally, a lot harder to trust your information sources! EA will inevitably make slower intellectual progress insofar as we don't trust each other to just say what's on our mind like an ordinary group of acquaintances working on a project together, and instead have to try to correct for various agendas or strategies we think the other party might be implementing.)
(Even if nobody's lying, we have to worry about filtered evidence, where people are willing to say X if they believe X but unwilling to say not-X if they believe not-X.)
Suppose that I say "the mental motions needed to spot SBF's issues early are mostly the same as the mental motions needed to notice when Eliezer's saying something that doesn't seem to make sense, casually update at least a little against Eliezer's judgment in this domain, and naively blurt out 'wait, that doesn't currently make sense to me, what about objection X?'"
(Or if you don't have much respect for Eliezer, pick someone you do have respect for — Holden Karnofsky, or Paul Graham, or Peter Singer, or whoever.)
I imagine some people's reaction to that being: "But wait! Are you saying that Eliezer/Holden/whoever is a bad actor?? That seems totally wrong, what about evidence A B C X Y Z..."
Which seems to me to be missing the point:
1. The processes required to catch bad actors reliably are often (though not always) similar to the processes required to correct innocent errors by good actors.
You do need to also have "bad actor" in your hypothesis space, or you'll be fooled forever even as you keep noting weird data points. (More concretely, since "bad actor" is vague verbiage: you need to have probability mass on people being liars, promise-breakers, Machiavellian manipulators, etc.)
But in practice, I think most of the problem lies in people not noticing or sharing the data points in the first place. Certainly in SBF's case, I (and I think most EAs) had never even heard of any of the red flags about SBF, as opposed to us hearing a ton of flags and trying to explain them away.
So something went wrong in the processes "notice when something is off", "blurt out when you notice something is off", and "propagate interesting blurtings so others can hear about them", more so than in the process "realize that someone might be a bad actor if a long list of publicly discussed things already seem off about them".
(Though I assume some EAs — ones with more insider knowledge about SBF than me — made the latter mistake too.)
2. If a community only activates its "blurt out objections when you think you see an issue" reflex when it thinks it might be in the presence of bad actors, then (a) it will be way harder for the community to notice when a bad actor is present, and (b) a ton of other dysfunctions become way likelier in the community.
I think (b) is where most of the action is.
EA has a big problem, I claim — relative to its goals and relative to what's possible, not necessarily relative to the average intellectual community — with...
- excessive deference;
- passivity and "taking marching orders" (rather than taking initiative);
- not asking questions or raising objections;
- learned helplessness;
- lack of social incentive to blurt things out when you're worried you might be wrong;
- lack of social incentive to build up your own inside-view model (especially one that disagrees with all the popular views among elite EAs);
- general lack of error-correction and propagation-of-information-about-errors;
- excessive focus on helping EA's image ("protecting the brand"), over simple inquiry into obvious questions that interest or confuse you.
I think EA leadership is unusually correct, and I think it legit can be hard for new EAs to come up with arguments that haven't already been extensively considered at some point in the past, somewhere on the public Internet or in unpublished Google Docs or wherever. So I think it's easy to understand why a lot of EAs are wary of looking stupid by blurting out their naive first-pass objections to things.
But I think that not blurting those things out turns out to have really serious costs at the community level. (Even in cases where a myopic Causal Decision Theorist would say it's individually rational.)
First, because it means that the EA with a nagging objection never learns why their objection is right or wrong, and therefore permanently has a hole in their model of reality.
And second, because a lot of how EA ended up unusually correct in the first place was people autistically blurting out objections to "obvious-seeming" claims.
If we keep the cached conclusions of that process but ditch the methods that got us here, we're likely to drift away from truth over time (or at least fail to advance the frontier of knowledge nearly as much as we could).
EA is not "finished". We have not solved the problem of "figure out a plan that saves the world", such that the main obstacle is Implementing Existing Ideas. The main obstacle continues to be Figuring Things Out.
EAs should note and propagate criticisms and objections to their Favorite Ideas and Favorite People just because they're curious about what the answer is.
(And aren't hindered by Modest Epistemology or Worry About Looking Dumb or Worry About Making EA Look Bad, so they're free to blurt without first doing a complicated calculus about whether it's Okay to say the first thought that popped into their head.)
They shouldn't need to suspect that their Favorite Idea is secretly false/bad, or that their Favorite Person is secretly evil/corrupt, in order to notice an anomaly and go "huh, what's that about?" and naively raise the issue (including raising it in public).
Most Bayesian updating is incremental; and when a single piece of evidence is obviously decisive, it's less likely that EAs will be the only ones who notice it, so it matters less whether we spot the thing first. The ambiguous, hard-to-resolve cases that require unusual heuristics, experience, or domain knowledge are most of where we can hope to improve the world.
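(To gesture at what "incremental" means here with a toy sketch — the numbers are made up, and it assumes the observations are conditionally independent given the hypothesis: in odds form, each piece of evidence just multiplies your odds by its likelihood ratio,

$$\frac{P(H \mid E_1, \dots, E_n)}{P(\neg H \mid E_1, \dots, E_n)} = \frac{P(H)}{P(\neg H)} \cdot \prod_{i=1}^{n} \frac{P(E_i \mid H)}{P(E_i \mid \neg H)}.$$

So three independently-noticed oddities, each only twice as likely under "something's wrong here" as under "everything's fine", move a 10% prior (odds 1:9) to odds 8:9, about 47% — no single decisive smoking gun required. The same formula also shows what a deference cascade does: if one underlying observation reaches you twice via two people you defer to, you multiply by its likelihood ratio twice, i.e. you double-count it.)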
If EAs want to outperform, they need to be good at the micro-level updates, and at building up good intuitions about areas via many, many repeated instances of poking at small things and seeing how reality shakes out.
I think we need to fix that process in EA — practice it more at the individual level, and find ways to better incentivize it at the group level.
Not just when there's a big Generate A Far-Mode Criticism Of EA contest, or a clear Bad Guy to criticize, but when you just see an Eliezer-comment or Rob-comment or Toby-comment that doesn't quite make sense to you and you blurt out that tiny note of dissonance, even if you fully expect that there's a perfectly good response you just aren't thinking of.
(Or no good response, but it's OK because Eliezer Yudkowsky and Rob Bensinger and Toby Ord are not perfect angelic beings and people make mistakes.)
I do think that EA leadership isn't dumb, and has thought a lot about the Big Questions, such that you'll often be able to beat the larger intellectual market and guess at important truths if you try exercises like "attempt to come up with a good reason why Carl Shulman / Holden Karnofsky / etc. might be doing X, even though X isn't what I'd do at a glance".
But I don't think this exercise should be required in order to blurt out a first-order objection. Noticing when something seems false is a lot easier than doing that and also generating a plausible hypothesis about another human's brain. And if you do come up with a plausible-sounding hypothesis, well, blurting out your first-order objection is a great way to test whether your hypothesis is correct!
Context: This post popped into my head because I was having a conversation with Peter Hartree about whether a specific argument by Peter Thiel made sense. And I was claiming that at a glance, a specific Thiel-argument seemed locally invalid to me, in the sense of Local Validity as a Key to Sanity and Civilization.
And Peter's response was that Thiel has a good enough track record that we should be very reluctant to assume he's wrong about something like this, and should put in an effort to steel-man him and figure out what alternative, more-valid things he could have meant.
And I'm OK with trying that out as an intellectual exercise. (Though I've said before that steelmanning can encourage fuzzy thinking and misunderstandings, and we should usually prioritize fleshmanning / passing people's Ideological Turing Test, rather than just trying to make up arguments that seem more plausible ex nihilo.)
But I felt an urge to say "this EA thing where we steel-man and defer to impressive people we respect, rather than just blurting out when a thing doesn't make sense to us until we hear a specific counter-argument, is part of our Bigger Problem". I think this problem keeps cropping up in EA across a bunch of domains — not just "why didn't the EA Forum host a good early discussion of at least one SBF red or yellow flag?", but "why do EAs keep getting into deference cascades that lead them to double- and triple-count evidence for propositions?", and "why are EAs herding together and regurgitating others' claims on AI alignment topics rather than going off in dozens of strange directions to test various weird inside-view ideas and convictions and build their own understanding?".
(Do people not realize that humanity doesn't know shit about alignment yet? I feel like people keep going into alignment research and being surprised by this fact.)
It all feels like one thing to me — this idea that it's fine for me to blurt out an objection to a local thing Thiel said, even though I respect him as a thinker and am a big fan of some of his ideas. Because blurting out objections is just the standard way for EAs to respond to anything that seems off to them.
I then want to be open that I may be wrong about Thiel on this specific point, and I want to listen to counter-arguments. But I don't think it would be good (for my epistemics or for the group's epistemics) to go through any special mental gymnastics or justification-ritual in order to blurt out my first-order objection in the first place.
I wanted to say all that in the Peter Thiel conversation. But then I worried that I wouldn't be able to communicate my point because people would think that I'm darkly hinting at Peter Thiel being a bad actor. (Because they're conflating the problem "EAs aren't putting a high enough prior on people being bad actors" with this other problem, and not realizing that organizing their mental universe around "bad actors vs. good actors" can make it harder to spot early signs of bad actors, and also harder to do a lot of other things EA ought to try to do.)
So I wrote this post. :)