
This was originally a post I wrote for LessWrong, but it felt more relevant than usual to EA interests, so I figured I would crosspost it here. Note that this post tries to mostly talk about how to have sane thoughts yourself, and not about how to get the people around you to have sane thoughts. See the discussion on LessWrong for some clarifications.

Epistemic Status: Pointing at early stage concepts, but with high confidence that something real is here. Hopefully not the final version of this post.

When I started studying rationality and philosophy, I had the perspective that people who were in positions of power and influence should primarily focus on how to make good decisions in general and that we should generally give power to people who have demonstrated a good track record of general rationality. I also thought of power as this mostly unconstrained resource, similar to having money in your bank account, and that we should make sure to primarily allocate power to the people who are good at thinking and making decisions.

That picture has changed a lot over the years. While I think there is still a lot of value in the idea of "philosopher kings", I've made a variety of updates that significantly changed my relationship to allocating power in this way:

  • I have come to believe that people's ability to come to correct opinions about important questions is in large part a result of whether their social and monetary incentives reward them when they have accurate models in a specific domain. This means a person can have extremely good opinions in one domain of reality, because they are subject to good incentives, while having highly inaccurate models in a large variety of other domains in which their incentives are not well optimized.
  • People's rationality is much more defined by their ability to maneuver themselves into environments in which their external incentives align with their goals, than by their ability to have correct opinions while being subject to incentives they don't endorse. This is a tractable intervention and so the best people will be able to have vastly more accurate beliefs than the average person, but it means that "having accurate beliefs in one domain" doesn't straightforwardly generalize to "will have accurate beliefs in other domains".

    One is strongly predictive of the other, and that’s in part due to general thinking skills and broad cognitive ability. But another major piece of the puzzle is the person's ability to build and seek out environments with good incentive structures.
  • Everyone is highly irrational in their beliefs about at least some aspects of reality, and positions of power in particular tend to create strong incentives that are not well aligned with the truth. This means that highly competent people in positions of power often have less accurate beliefs than competent people who are not in positions of power.
  • The design of systems that hold people who have power and influence accountable in a way that aligns their interests with both forming accurate beliefs and the interests of humanity at large is a really important problem, and is a major determinant of the overall quality of the decision-making ability of a community. General rationality training helps, but for collective decision making the creation of accountability systems, the tracking of outcome metrics and the design of incentives is at least as big of a factor as the degree to which the individual members of the community are able to come to accurate beliefs on their own.

A lot of these updates have also shaped my thinking while working at CEA, LessWrong and the LTF-Fund over the past 4 years. I've been in various positions of power, and have interacted with many people who had lots of power over the EA and Rationality communities, and I've become a lot more convinced that there is a lot of low-hanging fruit and important experimentation to be done to ensure better levels of accountability and incentive-design for the institutions that guide our community.

I also generally have broadly libertarian intuitions, and a lot of my ideas about how to build functional organizations are based on a more start-up-like approach that is favored here in Silicon Valley. Initially these intuitions seemed in conflict with the intuitions favoring more emphasis on accountability structures, with broken legal systems, ad-hoc legislation, dysfunctional boards and dysfunctional institutions all coming to mind immediately as accountability systems run wild. I've since reconciled my thoughts on these topics a good bit.


Somewhat surprisingly, "integrity" has not been much discussed as a concept handle on LessWrong. But I've found it to be a pretty valuable virtue to meditate and reflect on.

I think of integrity as a more advanced form of honesty – when I say “integrity” I mean something like “acting in accordance with your stated beliefs.” Where honesty is the commitment to not speak direct falsehoods, integrity is the commitment to speak truths that actually ring true to yourself, not ones that are just abstractly defensible to other people. It is also a commitment to act on the truths that you do believe, and to communicate to others what your true beliefs are.

Integrity can be a double-edged sword. While it is good to judge people by the standards they expressed, it is also a surefire way to make people overly hesitant to update. If you get punished every time you change your mind because your new actions are now incongruent with the principles you explained to others before you changed your mind, then you are likely to stick with your principles for far longer than you would otherwise, even when evidence against your position is mounting.

The great benefit that I experienced from thinking of integrity as a virtue is that it encourages me to build accurate models of my own mind and motivations. I can only act in line with ethical principles that are actually related to the real motivators of my actions. If I pretend to hold ethical principles that do not correspond to my motivators, then sooner or later my actions will diverge from my principles. I've come to think of a key part of integrity as being the art of making accurate predictions about my own actions and communicating those as clearly as possible.

There are two natural ways to ensure that your stated principles are in line with your actions. You either adjust your stated principles until they match up with your actions, or you adjust your behavior to be in line with your stated principles. Both of those can backfire, and both of those can have significant positive effects.

Who Should You Be Accountable To?

In the context of incentive design, I find thinking about integrity valuable because it feels to me like the natural complement to accountability. The purpose of accountability is to ensure that you do what you say you are going to do, and integrity is the corresponding virtue of holding up well under high levels of accountability.

Highlighting accountability as a variable also highlights one of the biggest error modes of accountability and integrity – choosing too broad of an audience to hold yourself accountable to.

There is a tradeoff between the size of the group that you are being held accountable by, and the complexity of the ethical principles you can act under. Too large of an audience, and you will be held accountable by the lowest common denominator of your values, which will rarely align well with what you actually think is moral (if you've done any kind of real reflection on moral principles).

Too small or too memetically close of an audience, and you risk not having enough people paying attention to what you do to actually help you notice inconsistencies between your stated beliefs and your actions. And the smaller the group that is holding you accountable, the smaller your inner circle of trust, which reduces the total amount of resources that can be coordinated under your shared principles.

I think a major mistake that even many well-intentioned organizations make is to try to be held accountable by some vague conception of "the public". As they make public statements, someone in the public will misunderstand them, causing a spiral of less communication, resulting in more misunderstandings, resulting in even less communication, culminating in an organization that is completely opaque about any of its actions and intentions, with the only communication being filtered by a PR department that has little interest in the observers acquiring any beliefs that resemble reality.

I think a generally better setup is to choose a much smaller group of people that you trust to evaluate your actions very closely, and ideally do so in a way that is itself transparent to a broader audience. Common versions of this are auditors, as well as nonprofit boards that try to ensure the integrity of an organization.

This is all part of a broader reflection on trying to create good incentives for myself and the LessWrong team. I will try to follow this up with a post that more concretely summarizes my thoughts on how all of this applies to LessWrong.

In summary:

  • One lens to view integrity through is as an advanced form of honesty – “acting in accordance with your stated beliefs.”
    • To improve integrity, you can bring your actions in line with your stated beliefs, bring your stated beliefs in line with your actions, or rework both at the same time. These options all have failure modes, but also potential benefits.
  • People with power sometimes have incentives that systematically warp their ability to form accurate beliefs, and (correspondingly) to act with integrity.
  • An important tool for maintaining integrity (in general, and in particular as you gain power) is to carefully think about what social environment and incentive structures you want for yourself.
  • Choose carefully who, and how many people, you are accountable to:
    • Too many people, and you are limited in the complexity of the beliefs and actions that you can justify.
    • Too few people, too similar to you, and you won’t have enough opportunities for people to notice and point out what you’re doing wrong. You may also not end up with a strong enough coalition aligned with your principles to accomplish your goals.

[This post was originally posted on my shortform feed]





One lens to view integrity through is as an advanced form of honesty – “acting in accordance with your stated beliefs.”

How do you think this definition of integrity interacts with the appeal-to-consequences concern that's being discussed on LW these days? (1, 2)

I haven't thought about this rigorously, but it seems like this definition could be entirely compatible with doing a lot of appeal-to-consequences reasoning (which seems to miss some important part of what we're gesturing at when we talk about integrity).

Hmm... now I'm worried that I'm not parsing you correctly.

Are you intending something closer to (1) or (2)?

(1) "stated beliefs" just means beliefs about what is true about physical reality. You're saying that acting with integrity means doing what accords with what one thinks is true, regardless of the consequences. (Sorta like fiat justitia ruat caelum, incompatible with appeal to consequences.)

(2) "stated beliefs" means beliefs about what is true about physical reality and social reality. You're saying that acting with integrity means doing what seems best / will result in the best outcomes given one's current understanding of social reality. (Compatible with appeal to consequences.)
