
Summary

I have become more truthseeking and epistemically modest in recent months and feel I have to re-evaluate my 'EA-flavored' beliefs, including:

  1. My particular takes about particular cause areas (chiefly alignment). Often, these feel immodest and/or copied from specific high-status people.
  2. Trust in the “EA viewpoint” on empirical issues (e.g., on AI risk). People tend to believe in stories about things that are too big for them to understand, and I don't know whether the EA viewpoint is just another such plausible story.
  3. Are these large empirical questions too hard for us to make reasonable guesses about? Are we deluding ourselves in thinking we are better than most other ideologies that have been mostly wrong throughout history?
  4. Can I assume 'EA-flavored' takes on moral philosophy, such as utilitarianism-flavored stuff, or should I be more 'morally centrist'?
  5. Can I, as a smart and truthseeking person, do better than just deferring on, say, “Might AI lead to extinction?”, even though there are smarter & more epistemically virtuous people I could defer to?
  6. Should I hold very moderate views on everything?
  7. Can EA, as a “smart and truthseeking” movement, assume its opinions are more accurate than other expert groups’?

 

 

Note

I originally wrote this as a private doc, but I thought maybe it's valuable to publish. I've only minimally edited it.

Also, I now think the epistemological concerns listed below aren't super clearly carved and have a lot of overlap. The list was never meant to be a perfect carving, just to motion at the shape of my overall concerns, but even so, I'd write it differently if I was writing it today.

 

Motivation

For some time now, I’ve wanted nothing more than to finish university and just work on EA projects I love. I’m about to finish my third year of university and could do just that. A likely thing I would work on is alignment field-building, e.g., helping to run the SERI MATS program again. (In this doc, I will use alignment field-building as the representative of all the community building/operations-y projects I’d like to work on, for simplicity.)

However, in recent months, I have become more careful about how I form opinions. I am more truthseeking and more epistemically modest (but also more hopeful that I can do more than blind deferral in complex domains). I now no longer endorse the epistemics (used here broadly as “ways of forming beliefs”) that led me to alignment field-building in the first place. For example, I think this in part looked like “chasing cool, weird ideas that feel right to me” and “believing whatever high-status EAs believe”.

I am now deeply unsure about many assumptions underpinning the plan to do alignment field-building. I think I need to take some months to re-evaluate these assumptions.

 

In particular, here are the questions I feel I need to re-evaluate:

 

1. What should my particular takes about particular cause areas (chiefly alignment) and about community building be?

My current takes often feel immodest and/or copied from specific high-status people. For example, my takes on which alignment agendas are good are entirely copied from a specific Berkeley bubble. My takes on the size of the “community building multiplier” are largely based on quite immodest personal calculations, disregarding that many “experts” think the multiplier is lower.

I don’t know what the right amount of immodesty and copying from high-status people is, but I’d like to at least try to get closer.

 

2. Is the “EA viewpoint” on empirical issues (e.g., on AI risk) correct (because we are so smart)?

Up until recently I just assumed (a part of) EA is right about large empirical questions like “How effectively-altruistic is ‘Systemic Change’?”, “How high are x-risks?” and “Is AI an x-risk?”. (“Empirical” as opposed to “moral”.) At first, this was maybe a naïve kind of tribalistic support; later, it was because of the “superior epistemics” of EAs. The poster version of this is “Just believe whatever Open Phil says”.

Here’s my concern: In general, people adopt stories they like on big questions, e.g., the capitalism-is-cancer-and-we-need-to-overhaul-the-system story or the AI-will-change-everything-tech-utopia story. People don’t seek out all the cruxy information and form credences to actually get closer to the truth. I used to be fine just backing “a plausible story of how things are”, as I suspect many EAs are. But now I want to back the correct story of how things are.

I’m wondering if the EA/Open Phil worldview is just a plausible story. This story probably contains a lot of truthseeking and truth on lower-level questions, such as “How effective is deworming?”. But on high-level questions such as “How big a deal is AGI?”, maybe it is close to impossible to not just believe in a story, and instead do the hard truthseeking thing. Maybe that would be holding EA/Open Phil to an impossible standard. I simply don’t know currently whether EA/Open Phil epistemics are better than that, and therefore I should not defer to them unreservedly.

I am even more worried about this in the context of bubbles of EAs in the Bay Area. I’ve perceived the desire of people there to buy into big/exciting stories to be quite strong.

Maybe the EA/Open Phil story is therefore roughly as likely to be true as the stories of other smart communities, e.g. other ML experts. This seems especially damning for parts of the story where (a part of) EA is taking a minority view. A modest person would assume that EA is wrong in such cases.

 

I haven’t really heard many convincing stories that run counter to EA, but I also haven’t really tried. Sure, I have heard counter-arguments repeated by EAs, but I’ve never sought out disagreeing communities and heard them out on their own terms. An additional problem is that others probably don’t spend as much time refuting EA as EAs spend backing it with arguments.

For illustration, I recently read an article criticizing the way EA often deals with low-probability risks. The author claims their critical view is common in the relevant academic fields. I wouldn’t even be surprised if this was the case. This makes my blind assumption that x-risk is a core moral priority of our time seem unjustified. I haven’t even considered plausible alternative views! (I don’t remember the name of the article sadly.)


3. Are humans in general equipped to answer such huge empirical questions?

Maybe questions such as “How high are x-risks?” are just so hard that even our best guesses are maybe 5% likely to be right. We delude ourselves into thinking we understand things and build solid theories, but really we are just like all the other big ideologies that have come and gone in history. We’re like communists who believe they have found the ultimate form of society, or like hippies who believe they have figured out that the answer to everything is love. (Or like 1930s eugenicists who believe they should “improve” humanity.)

Here are a bunch of related concerning considerations:

  • Maybe there is something very big and real about the term “groupthink” that we don’t understand.
  • Maybe our epistemics don’t weigh that heavily in all of this; maybe they increase our chances of being correct from 5% to 6%.
  • Maybe the main ingredient to finding answers to the big questions throughout history has just been guessing a lot, not being smart and truthseeking.
  • Maybe there’s a huge illusion in EA of “someone else has probably worked out these big assumptions we are making”. This goes all the way up to the person at Open Phil thinking “Holden has probably worked these out” but actually no one has.
  • Maybe it’s really hard for people to notice when they are not smart enough to have accurate views on something.
  • Maybe the base rate of correct big ideas is just very low.

 

I don’t understand any of these dynamics well and I don’t know if EA could be falling prey to them currently. Since these seem plausibly big, they seem worth investigating.

Again, I am even more worried about this in the context of EA bubbles in the Bay Area.
 

4. “EA-flavored” moral philosophy

I’ve been assuming a lot of “EA-flavored” takes on moral philosophy. This includes utilitarianism-flavored stuff, de-emphasizing rules/duties/justice/other moral goods, and totalist population ethics. Some of them are minority views, including among very smart subject experts. I am considering whether I should be more “morally centrist”. Depending on my answer to this question, this might imply anything from spending a bit more time with my family to changing my work focus to something “robustly good” like clean tech.

 

Interlude - information value

By this point, if not earlier, I’m expecting a reaction like “But how on earth are you going to make progress on questions like fundamental moral philosophy?”.

First, note that I do not need to make progress on the fundamental philosophical/scientific issues as long as I can make progress on my epistemic strategies about them. E.g., I don’t need to decisively prove utilitarianism is correct if I can just decide that my epistemic strategy should be a bit more immodest, and therefore believe in utilitarianism somewhat immodestly.

Practically, looking at my list of assumptions to re-evaluate, I feel like I could easily change my mind on each of them in a matter of weeks or months. The main problem is that I haven’t had the time to even look at them superficially and read one article or talk to one person about each of them. I think the information value of some deliberation is quite high and justifies investing some time. And better to do so sooner rather than later.

An objection to this may be: “Sure, your beliefs might change. But how can you expect your beliefs post-deliberation to be more accurate than pre-deliberation? Philosophers have been split over these questions for millennia, and that doesn’t change no matter what you deliberate.”

I would respond that this is precisely the claim of the concept called “epistemics”: that some ways of forming beliefs produce more accuracy than others. E.g., modest epistemics might produce more accurate beliefs than immodest epistemics. So if I have some credence that epistemics are powerful enough to do this in a domain like moral philosophy, then I’m justified to think my accuracy might increase post-deliberation. (And I do have some credence in that.)

Also, I’m expecting a gut reaction like “I’m skeptical that someone who just wants to have an impact in alignment field-building should end up having to do philosophy instead/first.” I don’t have much to say in response except that my reasoning seems to straightforwardly imply this. I would still be interested in whether many people have this gut reaction, so please feel free to leave a comment if you do!

 

Back to questions I need to re-evaluate

From the four assumptions listed above, it’s probably evident that this is going in a very “meta” epistemic direction. How much can I trust EA in forming my beliefs? How much can I trust myself, and in which situations?

Here are the more “meta” epistemic questions to re-evaluate:


5. Can I, as a smart and truthseeking person, do better than just deferring on complex empirical/moral questions?

For example, can I do better than just deferring to the “largest and smartest” expert group on “Might AI lead to extinction?” (which seems to be EA). Can I instead look at the arguments and epistemics of EAs versus, say, opposing academics and reach a better conclusion? (Better in the sense of “more likely to be correct”.) If so, how much and how should I do that in the details?

Just in case you are thinking “clearly you can do better”: Consider the case of a smarter, more knowledgeable person with better epistemics than me. I know such a person, and they’ve even spent a lot more time thinking about “Might AI lead to extinction?” than me. They are probably also better than me at doing the whole weighing up different people’s views thing. From this angle, it seems unlikely that I can do better than just deferring to them. (To their view ‘all things considered’, not their view ‘by their own lights’.)

Just in case you are thinking “clearly you can’t do better”: This seems to contradict the way essentially everyone behaves in practice. I know no one who only ever defers to the “largest and smartest” expert group on everything, and doesn’t presume to look at arguments or at least the epistemics of different expert groups.

 

6. Should I be more normal?

If I tend to say I can’t do better than just deferring on complex empirical/moral questions, should I hold very moderate views on everything? Should I be a third deontologist, third virtue ethicist, third consequentialist, so to speak? Should I believe climate change is the biggest issue of our time? Should I stop drinking meal shakes? (I’m being mostly serious.)

(This is similar to point 4.)

 

7. Can EA, as a “smart and truthseeking” movement, assume its opinions are more accurate than other expert groups’?

We seem to often hope this is the case. E.g., we hope we are right about AI being an existential risk based on how smart and truthseeking we are. (In another sense, of course, we hope we are wrong.)

 

More on information value

I want to reiterate here that, even though these questions seem daunting, I think I could learn something that changes my mind in a lasting way within weeks or months. For example, I could imagine finding out that almost no one supports epistemic modesty in its strongest form and becoming more immodest as a result. Or I could imagine finding out that influential EAs haven’t thought about modesty much and becoming more cautious about “EA beliefs” as a result. I think it therefore makes sense to think about this stuff, and do so now rather than later.

 

I am grateful to Isaac Dunn and Pedro Oliboni for helpful feedback on earlier versions of this post.


 

Comments

I definitely don't have the answers, but want to acknowledge that a significant degree of deference to someone or something is simply unavoidable in an increasingly complex world. I feel there's sometimes this undercurrent (not in your post specifically, just in general) that if you're a really smart person, you should be able to have well-thought-out, not-too-deferential views on everything of importance. It's just not possible; no matter how smart someone is, they lack the ability to warp the fabric of time to slow it down enough to reach that end. That's not meant to discourage anyone from seeking to improve their thinking about the world -- it's to reassure anyone who feels bad about the fact that they have to defer so much.

Each of us has to decide which questions are most important for us to dig deeply into, and which we should rely mainly on deference to answer. This is often affected by the importance of the question, but there are other variables -- like how much time we'd have to invest to get a more reliable answer, and the extent to which a change from the answer we are assuming would change our actions.

I know that doesn't help decide who or what to defer to . . . .

Yeah, I don't have non-deference-based arguments for really basic and important things like:

  • whether stars exist
  • how the money system works
  • gravity

And it was only in the last few years that I considered inside view arguments for why the Earth isn't flat. 

I think you absolutely should take these questions to heart and not feel compelled to follow the EA consensus on any of them. Especially with alignment, it’s hard to do independent thinking without feeling like a fool, but I say we should all be braver and volunteer to be the fool sometimes to make sure we aren’t in the Emperor’s New Clothes.

I like this post.

"4. Can I assume 'EA-flavored' takes on moral philosophy, such as utilitarianism-flavored stuff, or should I be more 'morally centrist'?"

I think being more "morally centrist" should mean caring about what others care about in proportion to how much they care about it. It seems self-centered to be partial to the human view on this. The notion of arriving at your moral view by averaging over other people's moral views strikes me as relying on the wrong reference class.

Secondly, what do you think moral views have been optimised for in the first place? Do you doubt the social signalling paradigm? You might reasonably realise that your sensors are very noisy, but this seems like a bad reason to throw them out and replace them with something you know wasn't optimised for what you care about. If you wish to a priori judge the plausibility that some moral view is truly altruistic, you could reason about what it likely evolved for.

"I now no longer endorse the epistemics ... that led me to alignment field-building in the first place."

I get this feeling. But I think the reasons for believing that EA is a fruitful library of tools, and for believing that "AI alignment" (broadly speaking) is one of the most important topics, are obvious enough that even relatively weak epistemologies can detect the signal. My epistemology has grown a lot since I learned that 1+1=2, yet I don't feel an urgent need to revisit the question. And if I did feel that need, I'd be suspicious it came from a social desire or a private need to either look or be more modest, rather than from impartially reflecting on my options.

"3. Are we deluding ourselves in thinking we are better than most other ideologies that have been mostly wrong throughout history?"

I feel like this is the wrong question. I could think my worldview was the best in the world, or the worst in the world, and it wouldn't necessarily change my overarching policy. The policy in either case is just to improve my worldview, no matter what it is. I could be crazy or insane, but I'll try my best either way.

Really great post, and I'm waiting to see what others would think of it. My personal answer to most questions is that EA isn't as smart as we want to think it is, and we should indeed "be more normal".

One note is that I'd like to challenge the assumption that EA is the “largest and smartest” expert group on “Might AI lead to extinction?”. I don't think this is true? This question involves a ton of different disciplines and many big guesses, and people in EA and Rationality who work on it aren't relatively better at them (and certainly not at all of them at once) than others. EAs might have deliberated more on this question, but the motives for that make it a biased sample.

My impression is that others have thought so much less about AI x-risk than EAs and rationalists, and for generally bad reasons, that EAs/rats are the "largest and smartest" expert group basically 'by default'. Unfortunately with all the biases that come with that. I could be misunderstanding the situation tho.

Yeah, there's almost certainly some self-selection bias there. If someone thinks that talk of AI x-risk is merely bad science fiction, they will either choose not to become an EA or choose to go into a different cause area (and are unlikely to spend significant time thinking any more about AI x-risk or discussing their heterodox view).

For example, people in crypto have thought so much more about crypto than people like me . . . but I would not defer to the viewpoints of people in crypto about crypto. I would want to defer to a group of smart, ethical people who I had bribed so heavily that they were all willing to think deeply about crypto whether they thought it was snake oil or more powerful than AGI. People who chose to go into crypto without my massive bribery are much more likely to be pro-crypto than an unbiased sample of people would be.

I think this is true, and I only discovered in the last two months how attached a lot of EA/rat AI Safety people are to going ahead with creating superintelligence (even though they think the chances of extinction are high) because they want to reach the Singularity (ever, or in their lifetime). I'm not particularly transhumanist and this shocked me, since averting extinction and s-risk is obviously the overwhelming goal in my mind (not to mention the main thing these Singularitarians would talk about to others). It made me wonder if we could have sought regulatory solutions earlier and we didn't because everyone was so focused on alignment or bust…

We've thought about it a lot, but that doesn't mean we got anything worthwhile? It's like saying that literal doom prophets are the best group to defer to about when the world would end, because they've spent the most time thinking about it.

I think maybe about 1% of publicly available EA thought about AI isn't just science fiction. Maybe less. I'm much more worried about catastrophic AI risk than 'normal people' are, but I don't think we've made convincing arguments about how those will happen, why, and how to tackle them.

I'd like to challenge the assumption that EA is the “largest and smartest” expert group on “Might AI lead to extinction?”. I don't think this is true?

You seem to imply that there is another expert group which discusses the question of extinction from AI deeply (and you consider the possibility that the other group is in some sense "better" at answering the question).

Who are these people?

I'm not necessarily implying that. EA is not an expert group on AI. There are some experts among us (many of whom work at big AI labs, doing valuable research), but most people here discussing it aren't experts. Furthermore, discussing a question 'deeply' does not guarantee that your answer is more accurate (especially if there's more than one 'deep' way to discuss it).

I would defer to AI experts, or to the world at large, more than I would to just EA alone. But either of those groups carries uncertainty and internal disagreement - and indeed, the best conclusion might just be that the answer is currently uncertain. And that we therefore need (as many experts outside EA have now come to support) to engage many more people and institutions in a collaborative effort to mitigate the possible danger.

For example, can I do better than just deferring to the “largest and smartest” expert group on “Might AI lead to extinction?” (which seems to be EA). Can I instead look at the arguments and epistemics of EAs versus, say, opposing academics and reach a better conclusion? (Better in the sense of “more likely to be correct”.) If so, how much and how should I do that in the details?

 

Deference is a major topic in EA. I am currently working on a research project simulating various models of deference.

So far, my findings indicate that deference is a double-edged sword:

  • You will tend to have more accurate beliefs if you defer to the wisdom of the crowd (or perhaps to a subject-matter expert - I haven't specifically modeled this yet).
  • However, remember that others are also likely to defer to you. If they fail to track the difference between your all-things-considered, deferent best guess and the independent observations and evidence you bring to the table, this can inhibit the community's ability to converge on the truth.
  • If the community is extremely deferent and if there is about as much uncertainty about what the community's collective judgment actually is as there is about the object-level question at hand, then it tentatively appears that it's better even for individual accuracy to be non-deferent. It may be that there are even greater gains to be made just by being less deferent than the group.
  • Many of these problems can be resolved if the community has a way of aggregating people's independent (non-deferent) judgments, and only then deferring to that aggregate judgment when making decisions. It seems to me progress can be made in this direction, though I'm skeptical we can come very close to this ideal.

So if your goal is to improve the community's collective accuracy, it tentatively seems best to focus on articulating your own independent perspective. It is also good to seek this out from others, asking them to not defer and to give their own personal, private perspective.

But when it comes time to make your own decision, then you will want to defer to a large, even extreme extent to the community's aggregate judgments.

Again, I haven't included experts (or non-truth-oriented activists) in my model. I am also basing my model on specific assumptions about uncertainty, so there is plenty of generalization from a relatively narrow result going on here.
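
To make the "double-edged sword" concrete, here is one way a toy version of such a simulation could look. This is not the actual model from the project described above; every name and number in it (the 0.9 deference weight, the noise levels, and so on) is an invented illustration of the qualitative point.

```python
import numpy as np

rng = np.random.default_rng(0)

TRUE_VALUE = 0.0    # the object-level quantity everyone is trying to estimate
N_AGENTS = 100
SIGNAL_NOISE = 1.0  # std of each agent's private evidence
N_TRIALS = 2000


def mean_individual_error(consensus_noise, weight_on_consensus):
    """Average absolute error of an agent's belief when they mix a (possibly
    noisy) perception of the crowd's average with their own private signal."""
    errors = []
    for _ in range(N_TRIALS):
        signals = TRUE_VALUE + rng.normal(0.0, SIGNAL_NOISE, N_AGENTS)
        crowd_mean = signals.mean()  # aggregate of everyone's independent judgments
        # Each agent only sees the crowd's judgment through some noise.
        perceived = crowd_mean + rng.normal(0.0, consensus_noise, N_AGENTS)
        beliefs = (weight_on_consensus * perceived
                   + (1.0 - weight_on_consensus) * signals)
        errors.append(np.abs(beliefs - TRUE_VALUE).mean())
    return float(np.mean(errors))


# No deference: each agent just uses their own signal.
print("own signal only:        ", round(mean_individual_error(0.0, 0.0), 2))
# Heavy deference when the crowd's judgment is easy to read.
print("defer, clear consensus: ", round(mean_individual_error(0.1, 0.9), 2))
# Heavy deference when "what the community thinks" is itself very uncertain.
print("defer, murky consensus: ", round(mean_individual_error(1.5, 0.9), 2))
```

With these made-up numbers, deferring to an easily-read crowd average beats relying only on your own signal, but heavy deference stops paying off (and starts hurting) once your read on the consensus is noisier than your own evidence; and the raw mean of the independent signals, with an error on the order of 1/sqrt(100) = 0.1, beats either strategy, which is the intuition behind aggregating independent judgments before anyone defers.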

The idea of deferring to common wisdom while continuing to formulate your own model reminds me of EY's post on Lawful Uncertainty. The focus was an experiment from the 1960s where subjects guessed card colors from a deck that was 70% blue cards. People kept trying to guess red based on their own predictions even though the optimal strategy was to always pick blue. EY's insight which this reminded me of was:

Even if subjects think they’ve come up with a hypothesis, they don’t have to actually bet on that prediction in order to test their hypothesis. They can say, “Now if this hypothesis is correct, the next card will be red”—and then just bet on blue. They can pick blue each time, accumulating as many nickels as they can, while mentally noting their private guesses for any patterns they thought they spotted. If their predictions come out right, then they can switch to the newly discovered sequence.
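
To put numbers on why always picking blue dominates: assuming guesses are made independently of the deck order (a simplification, not a reconstruction of the original 1960s setup), always betting blue is right about 70% of the time, while "probability matching" (guessing blue 70% of the time and red 30%) is right only about 0.7*0.7 + 0.3*0.3 = 58% of the time. A minimal sketch:

```python
import random

random.seed(0)
P_BLUE = 0.7
N_CARDS = 100_000

cards = ["blue" if random.random() < P_BLUE else "red" for _ in range(N_CARDS)]

# Strategy 1: always bet on the majority colour (blue).
always_blue_accuracy = sum(card == "blue" for card in cards) / N_CARDS

# Strategy 2: "probability matching" (guess blue 70% of the time, red 30%).
guesses = ["blue" if random.random() < P_BLUE else "red" for _ in range(N_CARDS)]
matching_accuracy = sum(g == c for g, c in zip(guesses, cards)) / N_CARDS

print(f"always blue:          {always_blue_accuracy:.2f}")  # roughly 0.70
print(f"probability matching: {matching_accuracy:.2f}")     # roughly 0.58
```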

Your link didn't get pasted properly. Here it is: Lawful Uncertainty.

Maybe there’s a huge illusion in EA of “someone else has probably worked out these big assumptions we are making”. This goes all the way up to the person at Open Phil thinking “Holden has probably worked these out” but actually no one has.

I just wanted to highlight this in particular; I have heard people at Open Phil say things along the lines of "... but we could be completely wrong about this!" about large strategic questions. A few examples related to my work:

  • Is it net positive to have a dedicated community of EAs working on reducing GCBRs, or would it be better for people to be more fully integrated into the broader biosecurity field?
  • If we want to have this community, should we try to increase its size? How quickly?
  • Is it good to emphasize concerns about dual-use and information hazards when people are getting started in biosecurity, or does that end up stymieing them (or worse, inspiring them to produce more harmful ideas)?

These are big questions, and I have spent dozens (though not hundreds) of hours thinking about them... which has led to me feeling like I have "working hypotheses" in response to each. A working hypothesis is not a robust, confident answer based on well-worked-out assumptions. I could be wrong, but I suspect this is also true in many other areas of community building and cause prioritisation, even "all the way up".

I think a common pitfall from being part of groups which appear to have better epistemics than a lot of others (i.e. EA, LW) is that being part of these groups implicitly gives a feeling of being able to let your [epistemic] guard down (e.g. to defer). 

I've noticed this in myself recently; identifying (whether consciously or not) as being more intelligent/rational than average Joe is actually a surefire way for me to end up not thinking as clearly as I would have otherwise. (This is obvious in retrospect, but I think it's pretty important to keep in mind.)

I agree with a lot of what you said, and have had similar concerns.  I appreciate you writing this and making it public!