Integrity for consequentialists

Paul_Christiano

9 min read · Nov 14, 2016

Comments 18

Paul_Christiano

I still broadly endorse this post. Here are some ways my views have changed over the last 6 years:

At the time I wrote the OP, I considered consequentialist evaluation the only rubric for judging principles like this, and the only reason we needed anything else was because of the intractability of consequentialist reasoning or moral uncertainty. I’m now more sympathetic to other moral intuitions and norms, and think my previous attempts to shoehorn them into a consequentialist justification involve some motivated cognition and philosophical error.
That said, I’m now more sympathetic to evidential cooperation in large worlds and a bit less confused about decision theory. So overall I’m more convinced by a range of consequentialist arguments for common-sense moral judgments, including the principle expressed in this post. I don't think this is the most important form of justification, but it does slightly strengthen those intuitions and plays a role when trying to clarify them in weird cases (e.g. when considering our obligations towards AI systems rather than humans).
I’m more hesitant about retaliation than when I wrote the OP, and am mostly unwilling to “do malicious things that have no direct good consequences for me” except in cases where people have opted in to retaliation for bad behavior (e.g. by agreeing to a contract or putting down a deposit).
Although I still endorse this post and think that some relevant arguments have become stronger, I’m more sensitive to a bunch of ways it’s complicated and incomplete. Overall I have less conviction about everything in this space. I do still try to just behave with integrity in a straightforward way, and do think that this is an unusually robust ethical conclusion despite acknowledging more uncertainty about it.

interstice

What arguments/evidence caused you to be more hesitant about retaliation?

Vasco Grilo🔸

Nice post!

In case someone is wondering:

Updateless Decision Theory (UDT) is a decision theory meant to deal with a fundamental problem in the existing decision theories: the need to treat the agent as a part of the world in which it makes its decisions.

purplepeople

This is an excellent post. I've been struggling myself to understand to what extent deontological values and the inherent irrationality of humans need to be factored into consequentialist decision making. I've become more and more convinced that values and social norms matter much more than I had previously thought.

n99

If you ask me a question that I don't want to answer, and me saying "I don't think I should answer that" would itself reveal information that I don't want to reveal, then I will probably lie.

We could decline to answer some questions that aren't too revealing so that people won't know which is which. The cost of hiding some innocuous things seems much lower than the benefit of being trusted.

TruePath

I think this post is confused on a number of levels.

First, as far as ideal behavior is concerned, integrity isn't a relevant concept. The ideal utilitarian agent will simply always behave in the manner that optimizes expected future utility, factoring in the effect that breaking one's word or other actions will have on the perceptions (and thus future actions) of other people.

Now the post rightly notes that as limited human agents we aren't truly able to engage in this kind of analysis. Both because of our computational limitations and our inability to perfectly deceive, it is beneficial to adopt heuristics about not lying, stabbing people in the back, etc. (which we may judge to be worth abandoning in exceptional situations).

However, the post gives us no reason to believe its particular interpretation of integrity, "being straightforward," is the best such heuristic. It merely asserts the author's belief that this somehow works out to be the best.

This brings us to the second major point. Even though the post acknowledges that the very reason for considering integrity is that "I find the ideal of integrity very viscerally compelling, significantly moreso than other abstract beliefs or principles that I often act on," the post proceeds as if it were considering what kind of integrity-like notion would be appropriate to design into (or socially construct in) some alternative society of purely rational agents.

Obviously, the way we should act depends hugely on the way in which others will interpret our actions and respond to them. In the actual world WE WILL BE TRUSTED TO THE EXTENT WE RESPECT THE STANDARD SOCIETAL NOTIONS OF INTEGRITY AND TRUST. It doesn't matter whether some alternative notion of integrity might have been better to have: if we don't show integrity in the traditional manner, we will be punished.

In particular, "being straightforward" will often needlessly imperil people's estimation of our integrity. For example, consider the usual kinds of assurances we give to friends and family that we "will be there for them no matter what" and that "we wouldn't ever abandon them." In truth pretty much everyone, if presented with sufficient data showing their friend or family member to be a horrific serial killer with every intention of continuing to torture and kill people, would turn them in even in the face of protestations of innocence. Does that mean that instead of saying "I'll be there for you whatever happens" we should say "I'll be there for you as long as the balance of probability doesn't suggest that supporting you will cost more than 5 QALYs" (quality adjusted life years)?

No, because being straightforward in that sense causes most people to judge us as weird and abnormal and thereby trust us less. Even though everyone understands at some level that these kinds of assurances are only true ceteris paribus, actually being straightforward about that fact is unusual enough that it causes other people to suspect that they don't understand our emotions/motivations, and thus to give us less trust.

In short: yes, the obvious point that we should adopt some kind of heuristic of keeping our word and otherwise modeling integrity is true. However, the suggestion that this nice simple heuristic is somehow the best one is completely unjustified.

Paul_Christiano

I apologize in advance if I'm a bit snarky.

The ideal utilitarian agent will simply always behave in the manner that optimizes expected future utility factoring in the effect that breaking one's word or other actions will have on the perceptions (and thus future actions) of other people

This view is not broadly accepted amongst the EA community. At the very least, this view is self-defeating in the following sense: such an "ideal utilitarian" should not try to convince other people to be an ideal utilitarian, and should attempt to become a non-ideal utilitarian ASAP (see e.g. Parfit's hitchhiker for the standard counterexample, though obviously there are more realistic cases).

However, the post gives us no reason to believe its particular interpretation of integrity, "being straightforward," is the best such heuristic. It merely asserts the author's belief that this somehow works out to be the best.

I argued for my conclusion. You may not buy the arguments, and indeed they aren't totally tight, but calling it "mere assertion" seems silly.

the very reason for considering integrity is that, "I find the ideal of integrity very viscerally compelling, significantly moreso than other abstract beliefs or principles that I often act on."

This is neither true, nor what I said.

WE WILL BE TRUSTED TO THE EXTENT WE RESPECT THE STANDARD SOCIETAL NOTIONS OF INTEGRITY AND TRUST

This is what it looks like when something is asserted without argument.

I do agree roughly with this sentiment, but only if it is interpreted sufficiently broadly that it is consistent with my post.

Does that mean that instead of saying "I'll be there for you whatever happens" we should say "I'll be there for you as long as the balance of probability doesn't suggest that supporting you will cost more than 5 QALYs" (quality adjusted life years)?

I tried to spell out pretty explicitly what I recommend in the post, right at the beginning ("when I imagine picking an action, I pretend that picking it causes everyone to know that I am the kind of person who picks that option"), and it clearly doesn't recommend anything like this.

You seem to use "being straightforward" in a different way than I do. Saying "I'll be there for you whatever happens" is straightforward if you actually mean the thing that people will understand you as meaning.

TruePath

Re your first point: yup, they won't try to recruit others to that belief, but so what? That's already a bullet any utilitarian has to bite, thanks to examples like the aliens who will torture the world if anyone believes utilitarianism is true or tries to act as if it is. There is absolutely nothing self-defeating here.

Indeed, if we define utilitarianism as simply the belief that one's preference relation on possible worlds is dictated by the total utility in them, then it follows by definition that the best acts an agent can take are just the ones which maximize utility. So maybe the better way to phrase this is: why care what an agent who pledges to utilitarianism in some way and wants to recruit others might need to do or how they should act? That's a distraction from the simple question of what in fact maximizes utility. If that means convincing everyone not to be utilitarians, then so be it.

And yes, re the rest of your points, I guess I just don't see why it matters what would be good to do if other agents respond in some way you argue would be reasonable. Indeed, what makes consequentialism consequentialism is that you aren't acting based on what would happen if you imagine interacting with idealized agents, as a Kantianesque theory might consider, but on what actually happens when you actually act.

I agree the caps were aggressive and I apologize for that, and I agree I'm not trying to produce evidence that, in fact, how people respond to supposed signals of integrity tends to match what they see as evidence that you follow the standard norms. That's just something people need to consult their own experience about, asking themselves whether, in their experience, that tends to be true. Ultimately I think that it's just not true that a priori analysis of what should make people see you as trustworthy (or have any other social reaction) is a good guide to what they will do.

But I guess that is just going to return to point 1 and our different conceptions of what utilitarianism requires.

Robert_Wiblin

"WE WILL BE TRUSTED TO THE EXTENT WE RESPECT THE STANDARD SOCIETAL NOTIONS OF INTEGRITY AND TRUST"

I think there is a lot to this, but I feel it can be subsumed into Paul's rule of thumb:

You should follow a standard societal notion of what is decent behaviour (unless you say ahead of time that you won't in this case) if you want people to have always thought that you are the kind of person who does that.

Because following standard social rules that everyone assumes to exist is an important part of being able to coordinate with others without very high communication and agreement overheads, you want to at least meet that standard (including following some norms you might have reservations about). Of course this doesn't preclude you meeting a higher standard if having a reputation for going above and beyond would be useful to you (as Paul argues it often is for most of us).

[anonymous]

Can you show an example of how this set of rules helps you to "recover the ability to engage in normal levels of being a jerk when it's actually a good idea"?

Paul_Christiano

Suppose I am considering saying something mean about someone in a context where they won't hear me, and I would be unwilling to say the same thing to their face. I have a hard time with this in general. But there are cases where it is OK according to this heuristic (when they'd be fine knowing that I would say that kind of thing about them under those conditions), and I think those are the cases that I endorse-on-reflection.

GlobalGuru

What an interesting area of debate. I want to add my views, with a language and experience from beyond EA, and welcome any kind of response.

As a leadership coach, I hold that my principal aim in coaching is to deepen my clients' ability to act from the greatest personal integrity, on the basis that the greater their integrity, the better the consequences of their actions are likely to be for themselves, for others, and for the whole world, along the EA principles of Do Good Better and Be Kind. People can sense integrity, and it is closely bound up with trustworthiness. Yes, it can also be misjudged by others.

As I read these posts I found myself looking at a headline in our local paper which said "My mum says to do everything with love". Simple. In my experience, greater integrity leads to greater understanding of love. As the Dalai Lama puts it: loving-kindness. My coaching method of increasing integrity is to challenge (to resolve) internal conflicts of belief, as this develops greater integration of the personality in the whole body-mind and spirit, rather than just continuing a rationalising argument in the head alone. Solve the war inside yourself, don't create it outside yourself.

I don't see the rationalisation of actions such as lying, deceit, revenge, shaming or punishment, even in part, as having any place whatsoever in a person who wants to demonstrate integrity, love, Doing Good Better or Being Kind. These are unintegrated actions, coming from limiting personal beliefs, which just produce an escalating lowering of standards of behaviour, increasing the harms and making the world a worse place for everyone. The problem is that the world seems to be adopting such principles, including non-compliance with the law.

The first post seems to be written on the premise that the purpose of integrity is to enhance one's own reputation, rather than the better consequentialist purpose of Doing Good Better. But reputation is entirely subjective within the eye of the other beholder, whereas integrity is entirely subjective within the control of the subject, even though the ultimate consequences might be beyond control.

The problem with concepts like acting out of revenge, or indeed offence, is that they are usually based upon subjective and irrational emotions, devoid of care about the important details, such as ascertaining the true facts, compliance with the rule of law, and the fact that perceptions differ. In any action, whether intended to be good or bad, we cannot predict how others will perceive and respond to our actions. "Others" may just project their own lack of integrity onto us as the "subject", to make us into "perpetrator" to fit their "victim" belief in themselves. The smart response to perceived harm is to see the crisis as an opportunity for active resolution for greater good, not as a self-appointed victim who responds with greater self-justified harm.

Unfortunately, this is not how our current major national leaders act, because they appear not to have done the necessary personal introspection to see the value of acting out of integrity and loving-kindness. Their knee-jerk reaction is to use greater military power to try to destroy their unintegrated sense of "evil" that they project onto others.

Our system for selecting our leaders, at every level, in every institution, is outdated and entirely corrupt (for evidence, read The Dictator's Handbook). Given that it is the most powerful ones who create (and continue) the greatest man-made global existential risks (whilst supposedly being responsible for "Peace & Security" on our planet), personally, I would love to see EA rationalise about how best to change this horrible system which produces the most dangerously corrupt leaders, desperately lacking in integrity. And then ACT to create anew. Develop the DAOs?

kbog

The first half of your essay (your method of only deceiving when it would still make sense to deceive if people knew you were such a deceiver) looks entirely disjoint from the second half. In what way do the graphs, the reasons for being honest, etc., support this particular mindset that you have chosen? They just give complicated consequentialist reasons for being honest, which seems to be what you were trying to avoid in the first place.

I don't think the graph makes anything clearer. Are we assuming that you're holding the benefits of deceit fixed? Because that changes a lot of things. We can't decide whether or not deceit is a good idea without having the expected value of deceit.

Why are you marking the typical thought experiment as having a very low cost of discovery? I would think that many typical thought experiments could have a very high cost of discovery - they could reference serious transgressions where large amounts of money, national secrets, lives, etc are at stake and where you might be seen as very immoral for not being honest despite the greater good of your actions. So the cost of discovery would be high yet the probability of discovery would be zero in such a thought experiment. On the other hand, there could be plenty of instances in our lives where we are likely to be discovered yet the cost of discovery is low. For instance, Wikipedia canvassing, or something along those lines.

So I don't see what this line is doing in the two-dimensional space of possibilities. Why do you assume that all instances of deceit take place along this line?

Maybe you're saying that if you hold almost everything constant, then people's reaction to somebody else's deceit depends on how likely they were to be discovered? But it's not clear that it's a large factor. For one thing, people's emotional attitudes to something like this are complex dispositions, not clear functions, and we're contradictory and flawed reasoners. For another, I can't even tell if we do care about someone's expectation of being discovered in our judgements upon those who have committed deceit. Yes, technically it makes sense to deter people more from concealable behavior, but only on a utilitarian principle of punishment does that make sense - which is far from a close approximation of people's emotional response to deceit. It's not a factor in retributive accounts of punishment nor does it play into accounts of moral blameworthiness as far as I know.

I don't see how you even arrived at the shape of the line. You draw it as upward-sloping, but in your bullet-points you give reasons to believe that it would be downward-sloping. You seem to think that these bullets make it more hyperbolic than linear but I don't see how you arrived at that conclusion from the bullet points, which quite clearly imply that the line would just slope downward rather than upward. You assume that the bullet points modulate the interior of the line but not the end points, which is just weird to me.

Also, let me clarify how a thought experiment works. It's not supposed to provide a guide to effective behavior in iterated games or anything like that. A thought experiment works as a philosophical investigation of an underlying principle. The philosophical investigation will leave us with a general principle about ethical value. Then we'll look at empirical information in order to pursue the goal. Usually, however, people don't use thought experiments to argue that consequentialists should lie. The argument for being deceitful would just be that it's what consequentialism demands, so if consequentialism is true, then we ought to lie (sometimes). It doesn't take a special argument from thought experiments to establish that. So let's say we agree that we should do whatever maximizes the best consequences. We'll conduct an empirical investigation of when and how lying maximizes consequences. To a large extent, it will depend on the expected benefit of lying. And it seems unlikely to me that you will be able to find a universal rule for summarizing the right way to behave.

Paul_Christiano

You draw it as upward-sloping, but in your bullet-points you give reasons to believe that it would be downward-sloping.

The y axis is the cost of being a jerk, which is (presumably) higher if people are more likely to notice. In particular, it's not the cost of being perceived as a jerk, which (I argue) should be downward sloping.

(It seems like your other confusions about the graphs come from the same miscommunication, sorry about that.)

Also, let me clarify how a thought experiment works.

This is a post about how I think people ought to act in plausible situations. Thought experiments can cast light on that question to the extent they bear relevant similarities to plausible situations. The relationship between thought experiments and plausible situations becomes relevant if we are trying to make inferences about what we should do in plausible situations.

I agree that there are other philosophical questions that this post does not speak to.

And it seems unlikely to me that you will be able to find a universal rule for summarizing the right way to behave.

I agree that we won't be able to find universal rules. I tried to give a few arguments for why the correct behavior is less sensitive to context than you might expect, such that a simple approximation can be more robust than you would think. (I don't seem to have successfully communicated to you, which is OK. If these aspects of the post are also confusing to others then I may revise them in an attempt to clarify.)

kbog

The y axis is the net cost of being a jerk, which is (presumably) higher if people are more likely to notice.

Okay, well the problem here is that it assumes that people have transparent knowledge about what the probability of being discovered is. In reality we can't infer well at all how likely someone thought it was for them to get caught. I think we often see rule breakers as irrational people who just assume that they won't get caught. So I take issue with the approach of taking the amount of disapproval you will get from being a jerk and whittling it down to such a narrow function based on a few ad hoc principles.

I'd suggest a more basic view of psychology and sociology. Trust is hard to build, and once someone violates trust, the shadow of doing so stays with them for a long time. If you do one shady thing once and then apologize and make amends for it, then you can be forgiven (e.g. GiveWell), but if you do shady things repeatedly while also apologizing repeatedly, then you're hosed (e.g. Glebgate). So you get two strikes, essentially. Therefore, definitely don't break trust; but then again, if you have the reputation for it anyway, then it's not as big a deal to keep it up.

But whichever way you explain it, you're still just doing the consequentialist calculus. And you still have to think about things in individual situations which are unusual. Moreover, you've still done nothing to actually support the proposed rule in the first half of the post.

This is a post about how I think people ought to act in plausible situations. Thought experiments can cast light on that question to the extent they bear relevant similarities to plausible situations. The relationship between thought experiments and plausible situations becomes relevant if we are trying to make inferences about what we should do in plausible situations.

Ok, but you're not actually answering the philosophical issue, and people don't seem to think by way of thought experiment in their applied ethical reasoning so it's a bit of an odd way of discussing it. You could just as easily ignore the idea of the thought experiment and simply say "here's what the consequences of honesty and deceit are."

MaxDalton

This seems to be an interesting approach to this question. However, for a top level post in this forum, I would like to see more of an attempt to link this directly to effective altruism, which, as many have noted, is not simply consequentialism. There is no mention of 'effective altruism', 'charity', 'career', 'poverty', 'animal' or 'existential risk' (of course effective altruism is broader than these things, but I think this is indicative).

(Writing in a personal capacity)

Chris Leong

Effective altruism is strongly linked with consequentialism, so much so that I don't think a more explicit link is required.

Benjamin_Todd

I found Paul's post useful, but I think it would have been good to point out that EA is not a type of consequentialism, since that's a misconception I think we should try to stamp out.
