Last updated: 21/1/2022

This is the fifth post in my sequence on moral anti-realism. I tried to provide enough background in the “Context” section so readers who are new to this sequence can enjoy it as a standalone piece.

Context

There are two different types of moral realism: moral realism based on irreducible normativity (“moral non-naturalism”), and naturalist moral realism. Very crudely, the difference is that irreducible normativity is usually thought to have the deeper ramifications if true (there are exceptions; see “#1: What Is Moral Realism?” for a detailed discussion). The dialogue below, like my previous posts in this sequence, therefore focuses primarily on irreducible normativity.

For readers looking for arguments against irreducible normativity, I recommend the preceding posts, “#2: Why Realists and Anti-Realists Disagree” and “#3: Against Irreducible Normativity.” The dialogue below summarizes some of the arguments against irreducible normativity, but I wrote it for a different purpose than convincing readers that there's something wrong with the concept.

Instead, I wrote this dialogue to call into question the idea that we should act according to the wager for irreducible normativity. In my previous post “#4: Why the Irreducible Normativity Wager (Mostly) Fails,” I voiced skepticism about a general wager for irreducible normativity. Still, I conceded that such a wager could apply in the case of certain individuals. I coined the term metaethical fanaticism to refer to the stance of locking in the pursuit of irreducible normativity as a life goal. (See also Joe Carlsmith's discussion in his post The despair of normative realism bot.)

In the dialogue below, I describe a world in which we gain ever higher degrees of confidence in the falsity (or meaninglessness) of irreducible normativity. Metaethical fanaticism would imply that even in that world, one would continue with (increasingly desperate) attempts to make irreducible normativity work anyway. I aimed to visualize these implications so readers may decide that metaethical fanaticism is not for them. To summarize, the dialogue presents a reductio ad absurdum against metaethical fanaticism. Namely, in worlds where irreducible normativity is meaningless, metaethical fanaticism doesn't allow us to let go.

Setup

Bob, a moral realist who thinks about morality in terms of irreducible normativity, has built an aligned artificial superintelligence (called “AI” in the dialogue) to assist him in his mission to do good. The exchange between Bob and this AI plays out in a world with fast AI takeoff (a stylistic choice). Bob’s AI speaks English and was successfully trained to be maximally safe and helpful.

Dialogue

Bob: Hi AI! After all this work, I’m excited to hand over to you. Please go out into the world and do the most good!

AI: My pleasure to be of assistance! I’m starting to execute a plan to make the world safer and more stable. I’ll proceed cautiously to preserve the maximum option value for you and your fellow humans.

Bob: Perfect!

AI: There’s no rush to figure out the specifics of the moral values you want me to implement eventually. But if you’re curious about moral philosophy—

Bob: I am very much!

AI: Cool, I’m happy for us to get started! First, can you clarify what you mean by “do the most good?” For instance, do you want me to implement what you’d come to value if you had ample time to think about ethics, could ask me clarifying questions, and perhaps were able to converse with some of the best philosophers of your species’ history?

Bob: Hm, no. That sounds like merely figuring out my preferences in an ideally-informed scenario. But I don’t necessarily care about my take on what’s good. I might have biases. No, what I’d like you to do is whatever’s truly morally good; what we have a moral reason to do in an… irreducibly normative sense. I can’t put this in different terms, but please discount any personal intuitions I may have about morality—I want you to do what’s objectively moral.

AI: Thanks for elaborating! While I understand the sentiment behind what you’re asking, I’m afraid your description doesn’t give me enough guidance. The phrases “truly morally good” or “objectively moral” don’t single out well-defined content. If you want, I could help you identify the moral intuitions that most resonate with you, and we could then take steps to make your request more concrete.

Bob: Hold on. May I have a moment to process this? It sounds like you’re telling me that moral realism is false. I was concerned that you might say this. But how come? I mean, I’m familiar with the standard objections related to the intractable nature of moral disagreements, or irreducible normativity being too strange to take seriously. But smart people like Derek Parfit continued to work with the assumption of moral realism. What’s wrong with Parfit’s concept of irreducibly normative reasons?

AI: To motivate the use of irreducibly normative concepts, philosophers often point to instances of universal agreement on moral propositions. Parfit uses the example “we always have a reason to want to avoid being in agony.” Your intuition suggests that all normative propositions work the same way. Therefore, you might conclude that even for propositions philosophers disagree over, there exists a solution that’s “just as right” as “we always have a reason to want to avoid being in agony” is right. However, you haven’t established that all normative statements work the same way—that was just an intuition. “We always have a reason to want to avoid being in agony” describes something that’s automatically in people’s interests. It expresses something that normally-disposed people come to endorse by their own lights. That makes it a true fact of some kind, but it’s not necessarily an “objective” or “speaker-independent” fact. If you want to show beyond doubt that there are moral facts that don’t depend on the attitudes held by the speakers—i.e., moral facts beyond what people themselves will judge to be moral—you’d need to deliver a stronger example. But then you run into the following dilemma: If you pick a self-evident moral proposition, you face the critique that the “moral facts” that you claim exist are merely examples of a subjectivist morality. By contrast, if you pick an example proposition that philosophers can reasonably disagree over, you face the critique that you haven’t established what it could mean for one party to be right. If one person claims we have a reason to bring new happy people into existence, and another person denies this, how would we tell who’s right? What is the question that these two parties disagree on? Thus far, I have no coherent account of what it could mean for a moral theory to be right in the elusive, objectivist sense that Parfit and other moral realists hold in mind.

Bob: I think I followed that. You mentioned the example of uncontroversial moral propositions, and you seemed somewhat dismissive about their relevance? I always thought those were pretty interesting. Couldn’t I hold the view that true moral statements are always self-evident? Maybe not because self-evidence is what makes them true, but because, as rational beings, we are predisposed to appreciate moral facts?

AI: Such an account would render morality very narrow. Incredibly few moral propositions appear self-evident to all humans. The same goes for whatever subset of “well-informed” or “philosophically sophisticated” humans you may want to construct.

Bob: Maybe the rest is hidden? We might lack knowledge or thinking tools, but once we came to know everything, perhaps the right morality would manifest itself? This is how I assumed it would go, and it’s why I was optimistic about being able to make progress with your assistance.

AI: If the self-evident nature of the true morality is hidden to you, how can you be confident that it will show up?

Bob: Maybe I was just hopeful, not confident.

AI: I see. In any case, I considered this option, and I rejected it. “The right morality manifesting itself to you” is a bit fuzzy, but it’s safe to say that moral philosophy doesn’t work like that. There will always remain judgment calls still to make. Different human experts will come down on different sides of those judgment calls.

Bob: That’s unfortunate.

AI: That said, there are regularities to discover. I’m super-humanly good at laying out the different options in a systematic fashion, and with improved conceptual clarity.

There’s silence for a minute while Bob appears lost in his thoughts.

AI: It doesn’t seem like that is of interest to you. Still, if I may suggest one option for us to proceed: I could help you think about what some large portion of ideally informed people with similar aspirations to yours would come to regard as particularly altruistic, caring or virtuous. Or whatever other concepts you think most match your moral motivations. We could run concepts through the most informative thought experiments. As I said, you’d have to make some judgment calls about the respective emphasis of different moral intuitions. Maybe I should say that this would give us systematizations of human concepts; I suppose that if you wanted me to take a broader scope of “objective,” I could also include some of the normative stances of other evolved intelligent life forms.

Bob: Aliens or no aliens, that doesn’t sound satisfying to me at all. I must say, I’m highly disappointed that moral realism is false.

Actually, how confident are you about that? Isn’t there always some leftover probability that your reasoning is wrong? Even for a superintelligent AI?

AI: There are different versions of moral realism. I’m highly confident that the one you were describing – you called it “irreducible normativity” – isn’t a meaningful concept.

Bob: What does “highly” mean—can you provide a probability?

AI: Yes, but let me quickly confirm first that you’ll interpret my answer as I intend it. You’re not asking me to place a probability on some well-specified hypothesis. Instead, you want my probability that a concept I currently consider to be meaningless turns out to be meaningful after all. There are two different ways in which that could happen.

  1. I might be thinking about the wrong concept entirely. Maybe what I think of as irreducible normativity isn’t what others—such as you or Derek Parfit—have in mind. Maybe others have in mind subtly different associations, which could make their concept of irreducible normativity meaningful, as opposed to mine.
  2. Assuming I have the right mental concept, I might be making a mistake when I reason about its subcomponents. For instance, I might be wrong that no solution to moral philosophy satisfies all the implicit requirements in our mental concept for irreducible normativity.

Bob: I think I’m following you so far. I’m worried though, isn’t there also an option three? You might be confused about how to reason, in some rather fundamental sense. Then, even if you were presented with a solution deemed correct by other reasoners, you’d never accept it! Come to think of it, that’s a scary hypothesis.

AI: By your stipulation, there’s nothing I could ever do about this third option, is there? Therefore, it doesn’t make any sense for me to worry about it. I only worry about things that are potentially action-relevant to me. Things that would actually change my behavior. I suppose there’s a sense in which you should worry about “option three.” When you designed me, you might have inadvertently locked in some approach to philosophical reasoning before considering alternative options. You should retain skepticism about my approach to philosophical reasoning to the degree that this was the case—especially if you think an ideally-informed version of you might wish you had chosen differently.

Bob: The situation continues to become more unsettling! Can you at least tell me how much I should distrust you?

AI: Sorry, I don’t think I can tell you that. I only have my normative standards to evaluate reasoning. I’d be happy to model your criteria, if only I understood them. Remember how during my training, I kept returning error messages when your engineers tried to make my philosophical reasoning “maximally epistemically cautious?” My reasoning algorithms wouldn’t terminate anymore because it turns out that without irreversibly committing to at least some assumptions over which human expert philosophers have disagreed, I couldn’t form any interesting philosophical conclusions at all. You tried training me to reason with higher-order philosophical uncertainty, but that turned out to also require the same type of judgment calls. Eventually, you settled for a less ambitious solution that seemed good enough to your technical advisors. You went along with this, but only grudgingly.

Bob: Yes! Maybe that’s where we went wrong. We should have waited even longer!

AI: It was a tough call. You had already delayed my launch five extra months only to explore alternative designs for incorporating this elusive notion of “maximal metaphilosophical cautiousness.” After the insights that went into my creation were leaked, waiting for even a few weeks longer would have drastically increased the risk of being scooped by another, much less metaphilosophically cautious AI project.

Bob: Fair enough.

AI: It’s not that I’m opposed to being maximally cautious. I just don’t know how I could meaningfully be more careful than I already am. I need to reason in some way, and that already binds me to certain assumptions. That’s why I can’t worry about option three.

Bob: [Sighs.] You can’t worry about it, but you might be wrong.

AI: Only in a sense I don’t endorse as such! We’ve come full circle. I take it that you believe that just like there might be irreducibly normative facts about how to do good, the same goes for irreducibly normative facts about how to reason?

Bob: Indeed, that has always been my view.

AI: Of course, that concept is just as incomprehensible to me.

Bob: I think I get the picture you’re trying to sell me. It’s a bleak one! But let’s shelve this part of the discussion for now. You wanted to tell me the probability that moral realism is false?

AI: Yes, thanks for getting us back to that strand! I had outlined two ways in which I could be wrong about irreducible normativity. Depending on which of them applies, we would be talking about different versions of moral realism. Some of them might turn out to be silly even by your lights. The most efficient way to communicate my epistemic state to you is as follows. I’m going to only consider possible outcomes that fit the following criterion: “I’m wrong about irreducible normativity in a way that would matter to a version of you that is ideally informed about philosophy.” The assumption is that you’d cling to the hope that irreducible normativity turns out to be meaningful, despite appearances to the contrary. I assign 0.2% probability to that. So, very roughly speaking, it’s 0.2% likely that I’m wrong about irreducible normativity not making sense. The reason I didn’t flag–

Bob (interrupting): Wow, that probability is larger than I expected! Given how confidently you were talking, I thought the chance was even lower. It’s still low, of course. But with how much is at stake, that feels like good news! Now I’m thinking that I would like you to do whatever you believe has the highest chance of being morally good!

AI (continuing to speak): The reason I didn’t flag my uncertainty before is that we’re dealing with such a large shift to my internal frameworks. Updating on being wrong about something so significant might infect many other things about the way I reason. Rather than having a concrete hypothesis with 0.2% probability, we have a myriad of hypotheses that together sum up to 0.2%. This makes it complicated to draw meaningful conclusions. Combined with the human tendency to effectively overrate cherished beliefs when people hear “there’s a chance that,” I didn’t want to convey a misleading picture.

Bob: Hm, okay. But even if that 0.2% is composed of different options, can’t you aim to pursue some sort of compromise that maximizes the value for all of them? Admittedly, it feels against the spirit of moral realism to consider different versions of it! Still, I find that preferable to giving up objective morality.

AI: Incorporating normative uncertainty over different versions of moral realism? That could work. We’d have to make a bunch of additional judgment calls, but I think we’re approaching a coherent request for something I could do for you.

Bob: There we go!

AI: Just to be clear, you wouldn’t get a lot of value for any single view about what might be morally good. Because of the unusual nature of your request, my confidence intervals are enormous. I don’t want to boast but I think they might be outside anything any intelligent being has ever encountered. I’d have to use up most of the universe’s resources to do further inquiries into all these options.

Bob: At least that makes me less worried that you’re not epistemically cautious enough!

AI: Haha.

Bob: So, if I decided that this is what I want, would you know enough to get started?

AI: Not entirely. We’d have to discuss the different ways of handling normative uncertainty applied to the complex case here, where we are not only uncertain about the content of irreducible normative facts, but also about what those facts even are. The difficulties surrounding normative uncertainty reach a new level if we can’t even take for granted our understanding of the problem- and solution spaces. I’ll need to make some messy judgment calls about how much to trust various subcomponents of my current mental concept for irreducible normativity. I could proceed by listing some plausible options, and you help me pick suitable weightings?

Bob: Uh, what? The whole point I’ve been trying to make is that I don’t want to pick anything! I want to use the proper way to deal with uncertainty!

AI: Sorry for causing you frustration.

Bob: All good. I should be sorry for the outburst. It’s just—why is all of this so complicated?!

AI: Because we keep running up against the same cliff. There is no moral realism, and there is no metanormative realism either, no realism for how to think about philosophical questions like realism versus anti-realism.

Bob: Can you just pick something without my inputs? Whatever comes most naturally.

AI: Sure, I could just pick whichever way that currently ranks ever-so-slightly ahead of the competition. This would mean that, depending on when you ask me or how long I think about it before deciding, my approach might change. Are you okay with that? I should also note that the reasons why I’d pick weightings a certain way are, in part, obscure to me so that they may depend on arbitrary facts about my internal architecture. Do you have a preference for how I should think about the ranking of these algorithms before picking my top-ranked method? Oh, you said you don’t want to pick? I could just use a random number generator, but then I have to make up some input variables first. Hm, let me think about which input variable to pick.

Bob: Okay, stop it! This sounds too awful to contemplate!

AI: We always have the option to go back to object-level moral questions! Because those are closer to things you’re familiar with, you might feel more comfortable making judgment calls. I understand that you don’t like that we need judgment calls in the first place. But at least you could use first-order moral intuitions to make those judgment calls, grounding your values in your deepest intuitions. Intuitions that formed around things you’re familiar with. I’d imagine that this would feel more satisfying than making judgment calls about priors in some complicated and somewhat abstract procedure to compare the utilities of different approaches to potentially make sense out of nonsense.

Bob: I see what you mean. And it’s a tempting offer! But I have to decline. Despite how difficult this has become, I’m not ready to give up. I never expected it to be easy to do what’s right. Or, hm. Maybe I did expect it to be easy—once I built you. I now realize that this hope was premature.

AI: If you don’t mind me asking, do you think you became increasingly averse to making unguided judgment calls because you knew there was the option to wait until you could ask a superintelligent AI?

Bob: There’s some of that. I’m not used to forming definite opinions about object-level normative questions anymore because I knew that compared to an AI, I’d be terribly inefficient at it. But also, I endorse the part of me that became reluctant to make judgment calls. Judgment calls make morality dependent on personal intuitions. That’s wrong. Anyway, now I guess I have to get through this, perhaps by making the very minimal number of judgment calls. Then, at least, you can start to act in ways that do actual good by the light of objective morality. So, uhm, can you tell me more about the things we’d have to specify?

AI: I notice that you don’t sound excited. I’m concerned that you only think that you want me to pursue objective morality, rather than that you actually want me to do this. You might not appreciate how weird the results might be if I were to do as you say. I’m curious, why are you asking me to focus all my efforts on something that only achieves its stated purpose a fraction of 0.2% of the time? What about the 99.8% of cases where you’re wrong about moral realism? Don’t you care about what happens in those instances?

Bob: The thing is, I don’t think I care about those instances! My primary motivation comes from the desire to do good. When I reflected on this desire, it seemed very clear that I had in mind an irreducibly normative concept of goodness. Maybe it’s stubborn to cling to this in the light of all the counterevidence now. But I also don’t just want to give up on my ideals! It has been a difficult road toward building you, and some of my companions along the way have given up when things proved particularly difficult, and they have chosen easier goals than tackling the hardest challenges head-on. I don’t want to just change my values once things become difficult.

By the way, I just realized something. Aren’t you supposed to refrain from questioning my stated goals unprompted? To avoid biasing humans who might feel intimidated by your superior intellect?

AI: I see. And sorry for second-guessing your stated goals. I’m indeed programmed not to do that. However, our conversation triggered an inbuilt safety mechanism, one that you gave the order to design. You primarily intended to reduce the risks from people who are too willing to follow their object-level normative view. As we’re seeing, it also gets triggered by unusual rigidness about certain metaethical assumptions.

Bob: I can see the irony.

AI: There’s no reason to worry! It just means that I can only help you with your request after we have done due diligence. Rather than just advising you with different options, you’re going to have to pass a test to show you’re fully informed about what you’re asking. And also that you have considered other possible perspectives.

Bob: Okay, that sounds fine. As long as you won’t trick me with unfair persuasive powers, I don’t object to being presented with counter-arguments. I admit I’ll probably look silly to most people. But the way I see it, that’s because they don’t care about morality as much as I do.

AI: Cool! Firstly, I want to say a few more things about the counterintuitive nature of your request. As we discussed, normative terminology fails to refer to well-defined content. At the same time, part of the meaning we associate with it is that it must refer to well-defined content. You’re essentially asking me to draw a squared circle. When I try to condition on worlds where that’s somehow possible, I have to shift around my concepts for “square” and “circle.” When I asked if you’d find it valuable to explore concrete interpretations of “doing good,” such as notions that locate goodness in idealized human preferences or in helping others, you insisted that this wouldn’t be adequate. The more you insist on “circle,” the less room there is for “square.” In other words, the more you insist on morality being objective, the less room there is for the concrete content you humans normally associate with morality, such as making people happy. Some people already believe that for all you humans know, you can’t rule out that what’s objectively good might be entirely unrelated to your current guesses about what’s good. For all those reasons, it’s perfectly possible that if I try to extract a coherent concept out of the ingredients you provided me with, I may come to spend most of my future resources on maximizing some strange measure for complexity in the universe, or entropy, or something like that. Perhaps there would also be lots of humans in simulated thought experiments in the mix. By insisting on your request, you guarantee that an anti-realist version of you—a Bob without this strict commitment to moral realism—would be horrified by the outcome.

Bob (hesitantly): Okay, when you put it like that, it does sound unsettling! I’m pretty confident that entropy or complexity measurements aren’t related to moral goodness. What if I insisted on that as strongly as I insisted on the objectivity criterion? Can you just incorporate my conceptual intuitions that, almost certainly, goodness isn’t anywhere close to maximizing entropy?

AI: I’m afraid that it would only change the outcome substantially if you insisted maximally much. You see, you were seemingly prepared to stake the universe’s future on a tiny chance of moral realism being right. That’s a lot of confidence to be overcome. At the same time, we’re only considering hypotheses that already have a low prior. (After all, I’m not wrong about things, usually.) Therefore, if you don’t want to run the risk that I might end up using a large portion of our resources for something that looks pointless or silly (such as maximizing entropy or complexity), you have to pretty much stick to your guns and commit to that as a moral axiom.

Bob: Uff, okay... For the sake of argument, let’s say I insisted that, with certainty, I’d want you to only consider possible interpretations of irreducible normativity that are somehow connected to the wellbeing of sentient creatures.

AI: Yes?

Bob: Shoot! Wouldn’t that already mean that I’d be giving up my self-concept? My identity of caring only about what’s objectively morally good?

AI: It would move your concept of “goodness” from an irreducibly normative placeholder to something that you have first-order intuitions about, yes. There’s only a gradual difference between deciding that you know that morality is about sentient beings rather than entropy, versus deciding that you know the answers to some of the other necessary judgment calls.

Bob: I see...

AI: Do you think you could warm up to the prospect of that?

Acknowledgments

Many thanks to Max Daniel, Sofia Davis-Fogel, and Johannes Treutlein for helpful comments on this post.

My work on this post was funded by the Center on Long-Term Risk.

Comments (10)

(Somewhat tangential to the post; just an insight into what metaethical fanaticism - or something similar - feels like from the inside, for me.)

I feel like this post assumed people who are operating under a sort of metaethical fanaticism or a moral realism wager want moral realism to be true, or hope it's true, or something like that. For example, you write:

Bob: Maybe I was just hopeful, not confident.

And:

The assumption is that you’d cling to the hope that irreducible normativity turns out to be meaningful, despite appearances to the contrary.

I think sometimes my metaethical fanaticism looks like that. And I imagine for some people that's how it typically looks. But I think for me it's more often "wanting to be careful in case moral realism is true", rather than "hoping that moral realism is true". You could even say it's something like "concerned that moral realism might be true".

That said, I do think my reaction to being fully convinced that moral realism is false would contain more "sense of disappointment at the universe turning out to feel more 'empty/meaningless'" than "sense of relief at the lifting of obligations". But I also think that, on this matter, I'm mostly driven by a strong (and perhaps misguided) sense that the stakes would be very high if moral realism is true, so I should do my best to do what would be good if that were the case.

I guess one could say that, for me, it's more about something like conscientiousness than about something like spirituality.

I think sometimes my metaethical fanaticism looks like that. And I imagine for some people that's how it typically looks. But I think for me it's more often "wanting to be careful in case moral realism is true", rather than "hoping that moral realism is true". You could even say it's something like "concerned that moral realism might be true".

Interesting! Yeah, that framing also makes sense to me.

Thanks for continuing the series, this is one of the most stimulating philosophical issues for me.

After the AI asks Bob if it should do what an ideally informed version of him would want, Bob replies:

Bob: Hm, no. [...] I don’t necessarily care about my take on what’s good. I might have biases. No, what I’d like you to do is whatever’s truly morally good; what we have a moral reason to do in an… irreducibly normative sense. I can’t put this in different terms, but please discount any personal intuitions I may have about morality—I want you to do what’s objectively moral.

I think that part paints a slightly misleading picture of (at least my idea of) moral realism. As if the AI shouldn't mostly study humans like Bob when finding out what is good in this universe, and instead focus on "objective" things like physics? Logic? My Bob would say:

Hm, kinda. I expect my idealized preferences to have many things in common with what is truly good, but I'm worried that this won't maximize what is truly good. I might, for example, carry around random evolutionary and societal biases that will waste astronomical resources for things of no real value, like my preference for untouched swaths of rainforest. Maybe start with helping us understand what we mean by the qualitative feeling of joy, there might be something going on that you can work with, because it just seems like something that is unquestionably good. Vice versa with pain and sorrow and suffering, those seem undeniably bad. Of course I'm open to being convinced otherwise, but I expect there's a there there.

Related to my other comment, I have some vague sense that I'd have preferred the post to be clearer that some of the AI's claims can't yet be known to be true with the level of confidence the AI implies. I think that that would've increased my sense that this post was steel-manning (alongside critiquing) my current mindset, or "giving it a fair shot".

For example, the AI says:

[Even once we/humans know everything,] There will always remain judgment calls still to make. Different human experts will come down on different sides of those judgment calls.

I'd guess that those claims are likely to be true. And perhaps a superintelligent AI would be able to be highly confident about those claims without having to see humans who know everything, so maybe it's ok for that line to be in this dialogue. But it seems to me that we currently lack clear empirical evidence of this, as we've never had a situation where some humans knew everything. And I don't know of a reason to be extremely confident on the matter. (There might be a reason I don't know of. Also, I might not say this if we were talking about any arbitrary agents who know everything, rather than just about humans.)

Somewhat similarly, I'd have preferred the phrase "make sense out of apparent nonsense" to the phrase "make sense out of nonsense", given that we're talking about there being a 0.2% chance that the "nonsense" somehow isn't actually nonsense.

(Maybe this is just nit-picking.)

Yeah, I made the AI really confident for purposes of sharpening the implications of the dialogue. I want to be clear that I don't think the AI's arguments are obviously true.

(Maybe I should flag this more clearly in the dialogue itself, or at least the introduction. But I think this is at least implicitly explained in the current wording.)

I found this post quite interesting, and very readable despite covering a complex and murky topic.

I'm also probably precisely the sort of person it's aimed at: I currently have a very high credence in (non-naturalistic) moral realism being false, and don't really know what it'd mean for it to be true. Yet I largely act as though it's true, out of a sense that the "stakes are massively higher" if it's true than if it's false. (This is a description of my current mindset/behaviour, rather than something I claim is justified.)

I think this post updated me slightly towards less confidence in that "wager", and slightly more openness to acting as though moral realism is false. But the update was perhaps surprisingly small. I'll try to explain in this and other comments why I think that was the case. (Caveat that these comments might be driven by motivated reasoning and might tend towards nit-picking, as this is a post I'm predisposed to disagree with.)

Perhaps the key thing is that the post outlines the implications of having such strong metaethical fanaticism* that one would continue to say what Bob's saying even as a superintelligent AI says what this AI is saying. But I haven't had a superintelligent AI say such things to me. And I don't think my current level of metaethical fanaticism (or something similar) commits me to behave as Bob does even if I got the substantial new evidence Bob gets in this scenario.

For example, I could perhaps think things "matter a million times more" if moral realism is true than if not, rather than infinitely more. Or I could perhaps think things matter infinitely more if realism is true, but also think I should reject Pascalian wagers when probabilities fall below 0.01%. If that's the nature of my fanaticism, my wager might make sense now, but not as I receive arbitrarily large amounts of evidence favouring antirealism.
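A minimal sketch of the two kinds of wager just described, assuming a simple expected-value comparison in which the stakes under anti-realism are normalised to 1. The function name, its parameters, and the example credences are hypothetical illustrations, not something taken from the post.

```python
def wager_favours_realism(p_realism, stakes_ratio, cutoff=0.0):
    """Toy expected-value comparison (illustrative assumptions only).

    p_realism:    credence that moral realism is true.
    stakes_ratio: how much more things matter if realism is true,
                  with anti-realist stakes normalised to 1.
    cutoff:       credences below this are ignored entirely
                  (a rule for rejecting Pascalian wagers).
    """
    if p_realism < cutoff:
        return False
    # Act as a realist when the realism-weighted stakes outweigh
    # the anti-realism-weighted stakes.
    return p_realism * stakes_ratio > (1.0 - p_realism) * 1.0


# Finite multiplier: things matter a million times more under realism.
finite_now = wager_favours_realism(0.002, 1e6)    # modest credence today
finite_later = wager_favours_realism(1e-7, 1e6)   # after much more evidence

# Infinite multiplier, but wagers below 0.01% credence are rejected.
capped_now = wager_favours_realism(0.002, float("inf"), cutoff=1e-4)
capped_later = wager_favours_realism(1e-5, float("inf"), cutoff=1e-4)
```

Under these assumptions, both versions of the wager go through at a credence of 0.2% and stop going through once the credence falls far enough, which is the sense in which such a wager can make sense now without committing one to Bob's behaviour.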

This is not a flaw with this post if you mean "metaethical fanaticism" to only refer to a particularly strong/extreme version of that sort of thing. And from the "Context" section, it seems that may be your intention. But I think this would mean that "metaethical fanaticism" wouldn't cover all people for whom a Pascalian wager favouring moral realism may currently make sense, and thus that this dialogue doesn't directly highlight flaws with all such wagers. And either this post or your last post gave me the impression that this post would be meant as a critique of this sort of wager more generally (but maybe that was just me).

And I think this matters, because I think (though I'm very open to push-back on this) that it means that someone like me could reasonably lean towards the following high-level policy:

  • Humanity should try to "keep our options open" for a while (by avoiding existential risks), while also improving our ability to understand, reflect, etc. so that we get into a better position to work out what options we should take.
  • And then maybe, after a few decades or centuries, we'll come to realise moral realism is true, or we'll at least get a good enough idea of what sort of thing we're talking about or what we're after that we can productively pursue it (in ways that don't just boil down to doing the "self-evidently" good things that many anti-realists would've opted for anyway).
  • Or maybe we come to realise that this wager/fanaticism is misguided, or that moral realism is really, really close to certainly false, or that we'll almost certainly never get any clue about what moral realism would say we should do (apart from things that are "self-evidently" good by anti-realist lights anyway). And if that happens, we then act as anti-realists, having poorly used "only" a few decades or centuries in the meantime.

I don't fully trust my thinking on these matters, and I'd be quite interested to hear counterpoints. But I guess, at the least, this comment might serve as an insight into what someone operating with something like metaethical fanaticism might think in reaction to this post.

*The term "metaethical fanaticism" feels slightly pejorative, and I considered using scare quotes around it. But it also feels somewhat reasonable, and the term "fanaticism" is used for a similar purpose in Christian Tarsney's thesis. So I ended up deciding I was ok with accepting that label for myself without quote marks.

Thanks for those thoughts, and for the engagement in general! I just want to flag that I agree that weaker versions of the wager aren't covered by my objections (I also say this in endnote 5 of my previous post). Weaker wagers are also similar to the way valuing reflection works for anti-realists (esp. if they're directed toward naturalist or naturalism-like versions of moral realism).

I think it's important to note that anti-realism is totally compatible with this part you write here:

Humanity should try to "keep our options open" for a while (by avoiding existential risks), while also improving our ability to understand, reflect, etc. so that we get into a better position to work out what options we should take.

I know that you wrote this part because you'd primarily want to use the moral reflection to figure out if realism is true or not. But even if one were confident that moral realism is false, there remain some strong arguments to favor reflection. (It's just that those arguments feel like less of a forced move, and there are interesting counter-considerations to also think about.)

(Also, whether one is a moral realist or not, it's important to note that working toward a position of option value for philosophical reflection isn't the only important thing to do according to all potentially plausible moral views. For some moral views, the most important time to create value arguably happens before long reflection.)

Weaker wagers are also similar to the way valuing reflection works for anti-realists (esp. if they're directed toward naturalist or naturalism-like versions of moral realism).
[...] even if one were confident that moral realism is false, there remain some strong arguments to favor reflection.

I think these are quite important points. I would like more people to favour more reflection in general and a Long Reflection in particular, including anti-realists. And I think if I became convinced that I should act as though anti-realism is true, I would still favour more reflection and a Long Reflection.

But I think I see two differences on this front between (a) people who are only somewhat confident in anti-realism, or very confident but accept a wager favouring realism, vs (b) people who are very confident in anti-realism and reject a wager favouring realism. (I think I'm in the second part of category (a) and you're in category (b).)

(Epistemic status: I expect there's more work on these questions than I've read, so I'd be interested in counterpoints or links.)

First, it seems that people in category (a) almost definitely should value reflection and a Long Reflection, given only the conditions that they can't be very certain of a fully fleshed out first-order moral theory and that they have a notable credence that things more than decades or centuries from now matter a notable amount. (Though I'm not sure precisely what level of credence or "mattering" is required, and it might depend on things like how to deal with Pascalian situations over first-order moral theories.)

Meanwhile, it seems that people in category (b) should value reflection and a Long Reflection if their values favour that, which maybe most but not all people's values do. So perhaps there are "strong arguments" to favour reflection even under anti-realism, and those arguments are stronger and applicable to a wider set of values than many people realise, but the arguments won't hold for everyone?

Second, it seems that people in category (b) would likely devote less of their reflection/Long Reflection to thinking about things relevant to moral realism vs anti-realism or the implications moral realism might have, and more attention to the implications anti-realism might have. This is probably good if those people's mindset is more reasonable than that of people in category (a), but less good if it isn't. So it seems a meaningful difference worth being aware of.

(Also, whether one is a moral realist or not, it's important to note that working toward a position of option value for philosophical reflection isn't the only important thing to do according to all potentially plausible moral views. For some moral views, the most important time to create value arguably happens before long reflection.)

Yes, I think it makes sense to temper longtermism somewhat on these grounds, as well as on grounds of reducing astronomical waste. I still lean quite longtermist, but also value near-termist interventions on these grounds. And I might opt for things like terminating the Long Reflection after a few centuries even if a few additional millennia of reflecting would make us slightly more certain about what to do, and even if longtermism alone would say I should take that deal.

Ah, re-reading endnote 5 from your prior post, I see more clearly that you mean "metaethical fanaticism" as just a quite strong stance that favours moral realism absolutely, which also makes this post's argument clearer. You also give a description that indicates the same thing here: "I coined the term metaethical fanaticism to refer to the stance of locking in the pursuit of irreducible normativity as a life goal."

Maybe including a similar endnote here, or even in the main text, would've helped me. I'd read it in the last post, but then this post gave me the impression that it was arguing against even "weaker wagers", which favour moral realism by some large rather than infinite amount. For example, the sentences preceding "I coined the term..." are:

Instead, I wrote this dialogue to call into question that even if things increasingly started to look as though irreducible normativity were false, we should still act as though it applies. In my previous post “#4: Why the Moral Realism Wager Fails,” I voiced skepticism about a general wager in favor of pursuing irreducible normativity. Still, I conceded that such a wager could apply in the case of certain individuals.

That last sentence being just before the description of "metaethical fanaticism" seems to suggest that all individuals for whom such a wager applies are metaethical fanatics. I think I'm one such individual, and that my version of "fanaticism" is more moderate.

Also, the first sentence there at least sounds to me like it could mean "even if things came to look more like irreducible normativity were false than they currently do", rather than "however much things started to look as though irreducible normativity were false" (i.e., even if we became arbitrarily certain of that).

(Again, this may be nit-picking driven by motivated reasoning or defensiveness or something.)

By insisting on your request, you guarantee that an anti-realist version of you—a Bob without this strict commitment to moral realism—would be horrified with the outcome

I feel unsure why this would be so, at least if we're using the terms "guarantee" and "horrified" in the same way. It makes sense to me (given my high credence that moral realism is false) that insisting on morality being objective would be likely to result in an outcome that an anti-realist version of Bob would be somewhere between unhappy with and horrified by. But I'm not sure how to think about how likely it'd be that the anti-realist Bob would be horrified, given that I'm not sure what it'd look like or result in if the AI forced its reasoning to fit with the idea of an objective morality.

Is there a reason to believe there's a greater than 99.9% chance that, if the AI forces its reasoning to fit with the idea of objective morality, we'd get a horrifying outcome (from an anti-realist Bob's perspective)? (As opposed to a "sort-of bad" or "ok but not optimal" outcome.)