Rationality Policies

Elliot Temple

This is a linkpost for https://criticalfallibilism.com/rationality-policies/

The rule of law is one of our most important political inventions. Written rules help address problems with biased, corrupt or otherwise untrustworthy people in power. Who shouldn’t be trusted with arbitrary power? Everyone. We’re all fallible. We all have biases. We all make mistakes. If you’re in a position of authority over other people, you should not trust yourself. You should want written rules to constrain and guide your actions. And you should transparently share information about what you’re doing, so others can check for mistakes and hold you accountable.

The basic concept of the rule of law should be used for personal rationality too. Don’t trust yourself to be unbiased or rational. Expect to fail sometimes and plan for that; design policies that will reduce the harm of your failures. Write down rules for yourself, follow them, and provide transparency.

Even that won’t always work. This stuff is hard. Government officials abuse power often. The rule of law doesn’t solve all the problems. It just makes things better. Rationality policies combined with trying your best to be rational will give you a better chance of success than trying without policies. Policies will make it harder to fail in some ways, and push you in some of the right directions.

The perspective that you know you’re trustworthy is a mistake. The rule of law isn’t just for leaders who think “I know I’m trustworthy, but other people don’t know it, so I have to go out of my way for their peace of mind.” It’s not primarily about asymmetric information. Instead, as counter-intuitive as you may find it, your perspective should be: “I’m fallible. I’m making many mistakes that I’m blind to. I have biases that I’m not aware of. This applies to ideas I’m highly confident about.” You should do many things to counter that problem. You should expend major effort, in multiple ways, trying to deal with that difficult situation – that you may be wrong about issues where you feel strongly confident that you’re right. Rationality policies are one tool that can help, if used well. You have to know what you’re doing though; not all rationality policies are beneficial; bad policies can use effort but make things worse.

Examples of Policies

These examples are intended to give people some ideas of what rationality policies can be like.

Whenever you spend more than 15 minutes reading something (that’s accessible to the general public), you must either upvote it or reply pointing out a mistake (if those features are available). Every time you do this must be documented on a public list you maintain. (Even if an upvote or reply feature isn’t available, you can still put it on your own list, where you specify an upvote or include the text of your reply.)
For every 5 things you upvote, you must write a reply to at least one of them.
When you spend more than 8 minutes reading something, you have to write down one reason it’s worth your time. For every 25 things, you must choose 5 other things you didn’t read and write a reason why not. For every 100 things, you must write an analysis of how you’re prioritizing your time and whether you should make any changes for the next 100 things.
Debate anyone who asks.
Debate anyone who asks who meets some conditions (see the “Example Debate Conditions” section below).
Visit one new online forum per year (and look around, read some stuff, and write at least 10 posts). Optionally require that it disagree with you in some way, or be from a subculture you’re unfamiliar with, or some other condition to help make sure you’re actually expanding your horizons.
Before making an important decision, spend at least 5 minutes (use a timer) doing written brainstorming of ways it could go wrong. Then review your list and try to think of solutions that will prevent those bad outcomes from happening. If you can’t think of at least 3 dangers plus solutions, abort the decision (unless it’s extremely urgent or you’ve been stuck for a long time, in which case a risky decision may be better than nothing).
When you find a sentence confusing, make a grammar tree diagram to figure it out. Limit: one per day. More is optional.
Make at least one paragraph tree per month.
Freewrite at least twice a week.
Freewrite at least 10,000 words per month.
Meditate at least four times per week.
Read at least two books every month.
Have at least two serious, long debates per year (minimum 5,000 words written by you and also by someone else).
Keep a list of ideas you encounter that you dislike. Every month, pick one at random (using a computer or dice, not an arbitrary choice) and write an essay (at least 800 words) criticizing it, and put the essay on your blog. Remove ideas from the list after writing about them or when they’ve been on the list for a year without being chosen. Keep the list in public with dates for when ideas are added.
Allow the public to share criticism with you and vote on it. Every two weeks, you respond to the highest voted criticism and reset the voting. The same criticism may be voted to the top again later, in which case you must give a different response to it.
Let people submit questions. Every week, answer one random question, one question of your choice, and one most-upvoted question. Remove questions from the queue when answered. Reset upvotes every month.
When you feel confused, write down a note. Spend at least five minutes trying to understand it better, either now or later. At least once every two months, purposefully find something which is confusing and engage with it in order to have a better sense of what confusion feels like. (Danger: You might learn to recognize big confusion from the confusing stuff you seek out, which could actually bias you against noticing subtle confusion because it’s different than your explicit model of confusion.)
Every time you and your friend John both read the same book, compete to point out the least confusing part that is confusing. You go back and forth pointing stuff out until no one can think of something that is less confusing but still confusing. The goal is to get better at noticing subtle confusions, ambiguities, etc.
Every day, write down one prediction involving at least one number. You should be able to check whether you were right about half of them the same day you write them (and actually do that). Predictions can involve anything you don’t know (“If I look up X, the answer will be…”), not just future events.
Before judging that something is bad, write out an error in words (merely disagreeing with something you think you know does not count as an error).
Before reaching a conclusion about a field, review all well known, popular schools of thought.
Give direct answers to questions. E.g. say “yes” or “no”. Elaborating after the direct answer is fine.
When confronted with complex ideas on an internet forum, ask if they’re original. If they’re not, ask for a link or cite to the best written version that the person knows of. Read it at least enough to write down one error. (This is often socially problematic and frequently gets negative responses and no citation or link, so I do this a lot less than I’d like to. I don’t know how to formulate good conditions for when to do it. I just try it sporadically.)
When someone makes a factual claim that ought to be backed up by published research, and which you disagree with, ask for a link or cite to one research article (not a meta study or field review article) that they had already read before the conversation started. State that you will find an error in it or concede. If they provide exactly one appropriate article in response (which is uncommon in my experience), then find an error or concede. (I like doing this, and if you haven’t done it a dozen times I highly recommend it. But you need a lot of skill. Maybe trying to do it and failing could inspire you to work on those skills. Also I don’t know a great way to formulate specific trigger conditions. Also I don’t always want to do it with people who don’t actually care if they’re wrong and who will just stop talking without learning anything. However, it can be a good way to end conversations while having the optionality that, if I’ve misjudged them, a good outcome can happen. Bad people tend to opt out of discussions fast when I do things like expect them to have a citation they’ve already read which I want to read, analyze and discuss. Good people would either want to do that or acknowledge that their claim isn’t based on research and then talk about how they reached it.)
Don’t read or reply to forum posts, social media posts or news articles until after doing other more important reading and writing that day, unless you make a conscious, intentional decision and write down a reason in a personal freewrite. (If you’re too busy to freewrite anything that day, you don’t need to go on internet forums, social media or news sites either.) (This is a policy that’s harder to provide public transparency about but still fairly easy to avoid lying to yourself about as long as your rule is strictly no exceptions. Basically any policy is easy to lie to yourself about if you allow exceptions instead of following it exactly and literally.)
If it seems like someone might want to debate you, but they don’t invoke your debate policy (and you have one), link it to them.
Track daily metrics (e.g. words written, words edited, hours slept, book reading). Review metrics monthly and look for problems (and make plans for improvement if you identify problems). Some people will improve tracked metrics even without any further policies or incentives.
After writing a reply to someone in a discussion, before sending it, stop and consider what your goal is in the discussion, and your goal with this specific message, and whether you think this message will actually work well to accomplish your goals.
Before replying to someone, consider what achievable goal you may accomplish with the reply, then keep that goal in mind while writing the reply.
Look for opportunities to use debate trees. (This is vague and doesn’t work well for accountability. But it can have value as a reminder to yourself or to empower others. Someone who saw this policy might want you to use a debate tree with them and request it.)
When someone baits you, wait at least an hour before replying. Also consider if it’s maybe ignorable.
If you get triggered or stressed by a conversation, take a break until tomorrow.
If you’re frustrated, tilted or upset by a conversation, wait until you’re calm before responding.
Meditate if you’re frustrated, tilted or upset due to a conversation.
If you’re frustrated, tilted or upset by a conversation, do a post mortem analysis of how and why you got frustrated, tilted or upset, what mistakes you made to let that happen, how it could have been avoided, whether there were any early warning signs, and whether any of your previous messages (before you recognized your emotional problem) had any frustration, tilt or upsetness in them (and whether any of your messages mistreated someone).
Think about your conversational goals when a conversation upsets you. Are you upset due to failing at some goals? Are the goals realistic? Do you have good plans, that are worth trying, to achieve those goals? If you’re following good plans, and that’s productive, then why be upset?
When you get frustrated, tilted or upset, analyze ways your social status may have been threatened.
When you upset others, analyze ways you may have threatened their social status.
At least once a month, think about your goals before every single thing you say in a conversation.
Have at least one conversation per month where you attempt to be thoughtful about every word you write, including in terms of knowing clearly what you mean and why you included that word, and doing grammar analysis.
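
As an illustration of one policy from the list above (the monthly essay on a randomly chosen disliked idea), here is a minimal Python sketch. The function names `prune_expired` and `pick_idea`, the tuple layout, and the 365-day reading of "a year" are assumptions made for this example, not part of the policy as written. The point is that the computer makes the choice, so you can't quietly pick the idea that's easiest to criticize.

```python
import random
from datetime import date, timedelta


def prune_expired(ideas, today, max_age_days=365):
    """Drop ideas that have been on the list for a year without being chosen.

    `ideas` is a list of (idea_text, date_added) tuples (hypothetical layout).
    """
    cutoff = today - timedelta(days=max_age_days)
    return [(idea, added) for idea, added in ideas if added > cutoff]


def pick_idea(ideas, rng=random):
    """Pick one idea uniformly at random instead of by personal preference."""
    idea, _added = rng.choice(ideas)
    return idea


# Example list; the entries and dates are made up for illustration.
ideas = [
    ("Idea A", date(2023, 1, 1)),
    ("Idea B", date(2024, 3, 1)),
    ("Idea C", date(2024, 5, 15)),
]
current = prune_expired(ideas, today=date(2024, 6, 1))
essay_topic = pick_idea(current)
```

After writing the essay, you would remove the chosen idea from the list by hand and keep the dated list public, as the policy says.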

Transparency

Most of these policies can and should have some sort of transparency mechanism added on to them. I only specified that occasionally. Basically you post the policy itself publicly, take responsibility for following it, and then also share documentation so people can see whether you’re following it. Also, you should have some kind of policy related to debate or at least listening to feedback, or else the transparency might not do much good. It helps if people can criticize your policies or actions where others can see. If you have an email newsletter with 10,000 readers, and they can email complaints to you, but no one else can see any of the complaints, that doesn’t work very well for transparency even if you share a bunch of documentation of what you’re doing. People need some reasonable way to correct you or at least get visibility for what they say, like a public forum. And the available ways to correct you need to look fairly appealing, reliable, effective, etc., from their perspective – a lot of people won’t want to waste their time correcting you if they doubt you’ll listen and you have no written policy guarantees about listening to criticism.

Policy Design Pattern

A general pattern for policies is: trigger condition + action to take + measurable metric. The policy tells you when to do it and what to do, plus it offers some measurements related to both parts. And you should document your policy and actions publicly, and allow public comments, for transparency. So there are four main design elements I’m using for these policies. I bet it’s possible to make a useful policy with one of these design elements missing. I also bet there are other important, reusable design elements besides these four.

Measuring something helps with objectivity. You can use something pretty objective, which requires little judgment, for the starting conditions and for specifying what is doing enough of a task. When you can specify numbers or amounts, there’s less judgment involved. “Every 5 books you read, write at least 5000 words of notes” uses something measurable for both the starting condition and the action to be taken.
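
To make the pattern concrete, here is a rough Python sketch of the book-notes example, with a measurable trigger, a measurable action, and a check that needs no judgment. The class name `NotesPolicy` and its method names are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class NotesPolicy:
    """Hypothetical encoding of 'Every 5 books you read, write at least 5000 words of notes'."""

    books_per_trigger: int = 5     # trigger condition: every N books read
    words_per_trigger: int = 5000  # action: minimum words of notes owed per trigger

    def words_owed(self, books_read: int) -> int:
        """Total words of notes the policy requires so far."""
        return (books_read // self.books_per_trigger) * self.words_per_trigger

    def is_satisfied(self, books_read: int, words_written: int) -> bool:
        """Metric check: have enough notes been written for the books read?"""
        return words_written >= self.words_owed(books_read)


policy = NotesPolicy()
owed = policy.words_owed(12)            # 12 books is 2 full triggers, so 10000 words
ok = policy.is_satisfied(12, 9500)      # False: 500 words short
```

The check only says whether the notes exist in the required amount. Whether the notes are any good still depends on good faith effort.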

“Every 5 books you read…” could be written in a more airtight way to avoid loopholes, but if you’re trying to find loopholes you’re in big trouble anyway. For example, someone could stop reading books with 1 page left so they don’t count as completed books in order to avoid having read 5 books. That’s an example of trying to avoid doing the policy by exploiting a technicality in bad faith. There’s also a lot of room for problems that happen in good faith. Some people don’t finish the majority of books they start. Maybe those people should count books if they get more than 20% of the way through or if they finish the first chapter.

It can be OK to include less objective statements in policies. For example, a policy may say “When making an important decision…” Which decisions are important? You have to judge that. There’s no easy answer. However, some decisions would appear pretty obviously important to most people, e.g. who to hire, fire or marry. If you have transparency, people could question your judgment when you decide to discount as unimportant some decisions that they care about.

Often, transparency is only partial, e.g. focused on intellectual activities or decisions that affect people who aren’t your close friends or family. Decisions affecting employees might only be transparent to a subset of people at the company. Transparency about personal issues like who to marry would be unusual. I wouldn’t want to dismiss it out of hand as necessarily a bad idea, but it’s not what I’m recommending people do. Even when you aren’t transparent about something, you can often still follow your policy. If you have to write 10 paragraphs about important decisions, and you provide no transparency about your marriage, you could still see for yourself whether you wrote the 10 paragraphs or not (you could also report publicly that you did it without including the actual text). It’s reasonably easy to avoid fooling yourself about simple things like whether or not you wrote 10 paragraphs, even without transparency.

Fooling Yourself

When dealing with actions like thinking about something, analyzing something, considering if you might be wrong, etc., it’s hard to write rules that evaluate whether someone did a good job. So some good faith effort is needed. But we can write rules that check if we did it at all. E.g. you could require that you write at least 2 paragraphs, 1 sentence, 300 words, or 5 brainstormed bullet points. Or you could require thinking about it for 5 minutes by the clock, to put it in terms Eliezer Yudkowsky recommended in Rationality: From AI to Zombies (bold added):

Page 218:

Which leads into another good question to ask yourself straight out: Did I spend five minutes with my eyes closed, brainstorming wild and creative options, trying to think of a better alternative? It has to be five minutes by the clock, because otherwise you blink—close your eyes and open them again—and say, “Why, yes, I searched for alternatives, but there weren’t any.” Blinking makes a good black hole down which to dump your duties. An actual, physical clock is recommended.

Page 322:

The moral is that the decision to terminate a search procedure (temporarily or permanently) is, like the search procedure itself, subject to bias and hidden motives. You should suspect motivated stopping when you close off search, after coming to a comfortable conclusion, and yet there’s a lot of fast cheap evidence you haven’t gathered yet—there are websites you could visit, there are counter-counter arguments you could consider, or you haven’t closed your eyes for five minutes by the clock trying to think of a better option. You should suspect motivated continuation when some evidence is leaning in a way you don’t like, but you decide that more evidence is needed—expensive evidence that you know you can’t gather anytime soon, as opposed to something you’re going to look up on Google in thirty minutes—before you’ll have to do anything uncomfortable.

Page 1634:

When AI folk say to me, “Friendly AI is impossible,” I’m pretty sure they haven’t even tried for the sake of trying. But if they did know the technique of “Try for five minutes before giving up,” and they dutifully agreed to try for five minutes by the clock, then they still wouldn’t come up with anything. They would not go forth with true intent to solve the problem, only intent to have tried to solve it, to make themselves defensible.

(By the way, I thought that Yudkowsky mentioned five minutes by the clock more than three times in the book. Based on searching now, I was wrong. I over-emphasized it in my memory because I thought it was important and useful. That led me to incorrectly estimate how many times it featured in the book, which has consequences like potentially overestimating how similar my thinking is to Yudkowsky’s.)

(I could have written down a prediction for how many times it was mentioned in the book before I searched, and maybe I should have. I think writing down that prediction in advance is the kind of thing Yudkowsky would like. The basic point of writing the prediction down, like the point of using an actual clock for five minutes, is that you shouldn’t trust yourself. Writing down a prediction also makes you be specific. I did not have in mind any particular number of times I thought it was in the book. It’s hard to know what I would have guessed in advance, but possibly 6.)

(In this case, I managed to avoid the primary danger that writing the prediction down is meant to combat – the danger of not realizing I was wrong – but presumably in some other cases I don’t avoid that danger, so I should write down predictions more often as a general policy. A policy I do have, more consistently than writing predictions, is emphasizing my errors. Like I’m writing this aside, and I think it’s important enough to include in the article, because I care about mistakes and I routinely try to bring them up and draw attention to them. Similarly, I often do written post mortem analysis of mistakes. Whereas most people try to downplay mistakes, change the subject, not admit to having been mistaken, reduce the attention mistakes get, etc.)

Measurable metrics don’t prevent bad faith. You can easily time yourself for five minutes and think of nothing. You can wait and run out the clock without trying to think of anything if you choose to do that. You can also often find ways to game metrics. But metrics make it harder to be biased, especially if you’re making a good faith effort. Also, I think those Yudkowsky passages, and many others, show that he sees the value in following explicit, objective policies, and using some measurable aspects to limit fooling yourself, even if he didn’t explain it in the same way that I’m approaching it. Although I’m more influenced by other sources, and have had some of these ideas for a long time, I do think Yudkowsky’s five minutes by the clock idea, and some of his other ideas, helped influence my ideas about rationality policies a little bit.

I think people who act in intentional bad faith are basically lost causes, although the harm they do (in roles like king, president, judge, policeman, prison guard, lawmaker, accountant or CEO) can be limited if they have to publicly appear to follow written policies.

My concern is primarily with people who are making some effort to be reasonable, but who sometimes fool themselves, rationalize things, are blind to their own biases, etc. Those people who want to be rational can be helped a lot with policies because they will at least partially follow the spirit of the policies in addition to the letter.

It’s very important to always follow your policies exactly as written with no exceptions whatsoever and also to make a good faith effort to follow the intent or spirit of the policy. You need both literal rule-following and also to want and like the purpose of the policy. If either is missing, it won’t work well.

If you don’t want to follow a particular policy exactly as written, don’t post it as your public policy, and don’t tell people you’ll follow it. Don’t give and break your word. Rewrite it or consider some other policy with softer rules.

If you have a policy and run into a problem, change it later. E.g. follow it this time, then wait a week to get some distance, then think it over, then (if you think it’s best) change it for the future. Suspend it during the pause week if necessary so it can’t be used again. If you violate policies when they’re inconvenient, then you’re breaking your word and defeating the purpose of having policies.

Your policies should be fairly stable and infrequently changed. They should be written carefully and thoughtfully to enable this. Keep in mind that the public may see a policy and plan ahead. If you guarantee to debate under certain conditions, someone might spend months researching a debate topic and meeting the conditions before asking you for a debate. Your policies have to be stable on a multi-year timeframe or other people who work on long timeframes and make large efforts (in other words, some of the best people) will find you unreliable.

If you want to have policies but you’re unsure, label them as tentative policies that you might change. Say it’s a beta test. If you put disclaimers on them that they’re just experimental ideas that you might not follow, then it’s fine to change your mind frequently. As long as you don’t pretend your policies are reliable, and fool yourself or others, then it’s OK if they aren’t reliable. You can also write down a candidate policy privately then pay attention to what following it would be like and try following it as long as that works OK for you. That’s kind of like playing the stock market with fantasy money before using real money.

It’s a good idea to start with tentative policies and beta test them. In general, you should only give your word that you’ll follow policies if you’ve tested them out for a significant time period first and found that they’ve worked OK for you.

Example Debate Conditions

Here are some examples of conditions you might require before accepting a debate with someone, rather than accepting debates with absolutely anyone (which might be too many debates or debates which are too low quality).

- 10k Twitter followers or other social media metric (problematic but you could do a lot worse)
- 10 blog posts or 10,000 words written
- Provides real name
- States that they believe they have something important to say to you that you don’t know
- States a clear thesis that they disagree with you about
- Speaks English
- Will debate in public
- Agrees not to misquote
- Agrees to only stop the debate under certain conditions, not to just randomly go silent at any moment

Example Debate Stopping Conditions

In general, any debate can stop due to unanimous consent. Taking that option away would be a special case (and would probably involve a short time limit). The stopping rules below are meant as alternatives for when unanimous consent is not achieved and someone wants to stop. More than one of these could be allowed. Also, each condition means you stop if anyone actually wants to stop. They let one person unilaterally stop the debate, but they don’t require you to stop.

- Advance notice 3 messages before stopping
- Length 5 impasse chain
- Write a final summary message explaining your perspective on the debate
- You believe, in your own opinion, that you either won or lost the debate (and say which and why)
- You’ve written 10,000 words in the debate
- You’ve replied 20 times in the debate
- The debate has lasted over a month (even though you’ve been responsive)
- Debate tree reaches at least 42 nodes
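
The numeric stopping conditions above are the easiest ones to check without judgment. As a rough sketch, assuming a policy that allows all four of them, the check could look like this in Python. The names `DebateState` and `may_stop_unilaterally` are hypothetical, and "over a month" is approximated as more than 30 days.

```python
from dataclasses import dataclass


@dataclass
class DebateState:
    words_written: int  # words you have written in the debate
    replies: int        # number of replies you have posted
    days_elapsed: int   # days since the debate started
    tree_nodes: int     # nodes in the debate tree


def may_stop_unilaterally(state: DebateState) -> bool:
    """True if at least one measurable stopping condition is met.

    Meeting a condition permits stopping; it never requires it.
    """
    return any([
        state.words_written >= 10_000,
        state.replies >= 20,
        state.days_elapsed > 30,  # "over a month", approximated as 30 days (assumption)
        state.tree_nodes >= 42,
    ])


allowed = may_stop_unilaterally(DebateState(10_000, 3, 5, 10))  # True: word count reached
```

Conditions like giving advance notice or writing a final summary are actions rather than thresholds, so they are left out of this sketch.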

Example: Reading Other Perspectives

Policy: Read one article from a rival tribe every week. For transparency, after you read an article, post a link or cite, with the date, to a public list you maintain.

The goal is to engage with other perspectives and ideas. You could add extra detail to the policy, e.g.:

You can’t use the same tribe twice in one month.

This would help with the concern that you find a couple rival tribes that you like and ignore other ones that challenge your perspective more. Part of the goal is to expose yourself to a wide variety of ideas.

These policies aren’t foolproof. You could find some loophole or follow them in bad faith. There are two things that could keep you honest. First, your integrity. Second, other people could offer criticism since you’re providing transparency.

Don’t expect this to work well unless you actually like the policy. If you do this policy because you feel like you “should”, or because other people in your social group do it, then there’s a significant risk that it won’t help you (but it also might help; there’s still a chance it works well). Similarly, if you read the articles begrudgingly, as a painful duty, it’s not going to expand your mind well. But people occasionally start doing something with a terrible attitude and then get drawn in, so that’s not hopeless either.

Warnings

It’s problematic to use generic rationality policies off a list. If you’re going to try to be rational and use policies to help you, you need to be actively involved in that process and have personalized, customized, individualized ideas about what will work for you. You need policies that you understand well, see the point of, care about, and will actually do. Writing your own policies generally makes those things more likely to be true. If you had a personal friend, tutor or mentor who helped you write policies, that would also work much better than generic policies because they’d be designed to fit your life.

Generic policies can serve as examples and inspiration. You can create similar policies for yourself. You can reuse elements of generic policies. You can also look at example policies to find patterns and better understand what a policy can be like. But this topic is under-explored, so don’t assume my examples cover all the types of policies that could be standard, common or useful; I don’t think they do.
