
Introduction added for EA: Decisions shouldn’t be made by adding weighted factors to get a single, overall degree of goodness for each option. Factors in different dimensions, such as the price and deliciousness of an apple, can’t be combined into one dimension (a single score) in the general case. There are fundamental mathematical and philosophical problems with combining dimensions. The same issue applies to evaluating ideas based on multiple factors, including updating credences based on multiple pieces of evidence. An alternative epistemology focused on decisive criticism and error correction doesn’t have these problems. The main purpose of this article is to look at decision making (or idea evaluation) from a mathematical perspective and to show that some standard approaches don’t work mathematically.

We make decisions based on many factors. This article is about how to make good decisions. It uses arithmetic to help analyze decision making.

For example, consider buying a car. You’ll consider price, brand, color, gas/hybrid/electric, crash test ratings, rear backup camera, sunroof, truck/sedan/SUV, new/used, buy/lease, financing offered, acceleration, handling, etc. Each of those is a factor in the decision. You could make a chart (spreadsheet) with each car (along the left) and each factor (along the top), and then fill in the information for each car (in the middle). Imagine you do research and gather all the data for the spreadsheet (many people don’t bother and just keep some of the information in their head, but it’s still useful to know how to do organized decision making even if you skip steps). Then how do you actually decide which car to get?

It’s pretty easy to decide which car is best based on any one factor. E.g. you can rank cars by price, or you can rank them by how much you like the color. But it’s harder to look at multiple factors at the same time and decide which car is best.

What people often look for is an overall evaluation for each car. E.g. how good is this car on a scale of 0 to 100? Then you can rank cars based on the one factor of how good it is overall, and then pick the car that is best overall. People also use non-numerical overall evaluations like “pretty good”, “great”, and “terrible”.

To get an overall evaluation, you have to combine many or all of the factors. The overall evaluation should somehow take into account whether the car has air conditioning, whether it’s an automatic, the price, etc. Having air conditioning and having a lower price should both raise the overall evaluation. A typical approach is to score each factor, multiply that by a weighting for how important the factor is, and to add up the results.
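As a point of reference, the weighted-sum approach just described can be sketched in a few lines of Python. The car names, scores and weights below are made up purely for illustration and do not come from any real data. The rest of the article questions whether the numbers this produces are well founded.

```python
# Hypothetical per-factor scores (0-10) for two made-up cars.
scores = {
    "car_1": {"price": 7, "air_conditioning": 10, "handling": 4},
    "car_2": {"price": 4, "air_conditioning": 0, "handling": 9},
}
# Hypothetical importance weights chosen by the decision maker.
weights = {"price": 0.5, "air_conditioning": 0.2, "handling": 0.3}


def weighted_score(factor_scores, weights):
    """Multiply each factor score by its weight and add up the results."""
    return sum(weights[name] * value for name, value in factor_scores.items())


overall = {car: weighted_score(s, weights) for car, s in scores.items()}
ranking = sorted(overall, key=overall.get, reverse=True)
print(overall)
print(ranking)
```

Changing the weights can change the ranking, which is why the choice of weights (and what they mean) matters so much.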

Although people usually do this using intuition, common sense and estimation, it’s useful to look at how it works mathematically. Math provides a simplified model of how decision making works. It’s not perfect but it’s easier to analyze.

There are, perhaps surprisingly, major difficulties making the math work, at all, to get an overall evaluation based on multiple factors. It’s a hard math problem to solve. I’ll explain why it’s hard, why some approaches don’t work, and my original solution. My solution was developed first as a philosophy theory but it can be translated into working math, while many rival philosophy theories translate to math that doesn’t work.


This chart shows an example multi-factor decision about choosing which pet to get.

[Chart: one row per pet; columns: Price, Cuteness, Messiness, Size, Lifespan]

The pets on the left are options we could choose, and the criteria along the top are factors we can use to evaluate options. A factor is a trait, characteristic or quality. A factor is also a sub-goal: a part of your overall goal. In this case, the overall goal is to get a good pet.

People make multi-factor decisions frequently. Sometimes they use pro/con lists (each pro or con is a factor). Sometimes they use weighted score systems. Sometimes they just think about it or choose intuitively.

For most of the article, I’ll use a simple, abstract example chart. Our options are A, B, C, which could be pets or anything else. And the factors we’ll consider are x, y, z, which could be price, cuteness and size, or anything else. For simplicity, we’ll assume that no other factors matter.

    x   y   z
A   3   8   5
B   9   1   2
C   4   6   7

Which option is best? We can’t know from this chart. It depends on things like whether more or less of a factor is better, what units each factor is in, and how important each factor is.

For simplicity, assume more is better for every factor.

We can easily rank the options based on any one factor. E.g., for x, B is best (9 points), C second best (4 points), and A worst (3 points).

We can also determine the best option if an option is the best for every factor. That option is strictly better. But in many cases, including our example, no option is strictly best. If the only options were A and B, and the only factors were y and z, then A would be strictly best.
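The two easy cases, ranking on one factor and spotting an option that is best on every factor, can be sketched as follows. This is a minimal illustration using the example chart and the "more is better" assumption; the function names `rank_by` and `strictly_best` are invented for this sketch.

```python
chart = {
    "A": {"x": 3, "y": 8, "z": 5},
    "B": {"x": 9, "y": 1, "z": 2},
    "C": {"x": 4, "y": 6, "z": 7},
}


def rank_by(chart, factor):
    """Rank options from best to worst on one factor (more is better)."""
    return sorted(chart, key=lambda option: chart[option][factor], reverse=True)


def strictly_best(chart, factors):
    """Return the option that beats every other option on every factor, or None."""
    for option in chart:
        others = [o for o in chart if o != option]
        if all(chart[option][f] > chart[o][f] for f in factors for o in others):
            return option
    return None


print(rank_by(chart, "x"))                     # ['B', 'C', 'A']
print(strictly_best(chart, ["x", "y", "z"]))   # None
only_a_and_b = {name: chart[name] for name in ("A", "B")}
print(strictly_best(only_a_and_b, ["y", "z"]))  # A
```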

How can we mathematically combine multiple factors to get an overall evaluation that lets us rank the options to decide which is best?

Dimensions and Arithmetic


Factors can often be combined when they’re in the same dimension. A dimension is a type, kind or category of factor, such as length, weight, time, cuteness or color. When two factors are the same kind of thing, you can combine them with addition. For example, 3 miles can be combined with 5 miles for a total of 8 miles. (It’s more problematic with some types of factors, like color. How do you add red plus red?)

You can often tell two factors are in the same dimension when they use the same units, e.g. they’re both in miles. Different units can also be in the same dimension when they’re the same kind of thing. For example, meters and feet are in the same dimension (length). But they can’t be added together, in one step, because they’re different units.

To add different units, we first convert them into the same units, and then we add. Conversions within a dimension work because the units all represent the same underlying thing, like length. 1 meter and 3.3 feet are the same amount of length (rounded to a reasonable precision). If our factors are 4 meters and 3.3 feet, they add to 5 meters (or 16 feet). This isn’t a controversial matter of opinion that people debate; it’s facts, definitions and arithmetic.

Looking at our example chart again, suppose x, y and z were the amount of money we’ll make, in dollars, from different parts of a project. Then we can add the factors together to combine them and get an overall evaluation of each option. C is the best option because it offers the highest total ($17).

Now suppose x were in dollars but y and z were in yen, and that 100 yen are worth a dollar. They’re all in the same dimension – money – so we can convert and then add. The best option is B ($9.03 total). Notice that the best option has changed.
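The two money scenarios can be checked with a short sketch. It assumes the example chart and the exchange rate stated above (100 yen per dollar); exact fractions are used so that totals like 9.03 are not blurred by floating point rounding. The helper names are invented for this illustration.

```python
from fractions import Fraction

chart = {
    "A": {"x": 3, "y": 8, "z": 5},
    "B": {"x": 9, "y": 1, "z": 2},
    "C": {"x": 4, "y": 6, "z": 7},
}


def totals_in_dollars(chart, dollars_per_unit):
    """Convert every factor to dollars, then add within each option."""
    return {
        name: sum(Fraction(value) * dollars_per_unit[factor]
                  for factor, value in option.items())
        for name, option in chart.items()
    }


# Scenario 1: x, y and z are all already in dollars.
all_dollars = {"x": Fraction(1), "y": Fraction(1), "z": Fraction(1)}
# Scenario 2: x is in dollars, y and z are in yen at 100 yen per dollar.
y_and_z_in_yen = {"x": Fraction(1), "y": Fraction(1, 100), "z": Fraction(1, 100)}

for rates in (all_dollars, y_and_z_in_yen):
    totals = totals_in_dollars(chart, rates)
    best = max(totals, key=totals.get)
    print(best, float(totals[best]))
```

The same chart gives a different winner under each set of units, which is the point of the example.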

Similarly, grams can be converted to pounds or kilograms because they’re in the same dimension (weight). But grams cannot be converted to dollars, inches or cuteness, because those are different dimensions, different types of things. (There are sometimes conversions in specific contexts, e.g. a particular store might offer a price in dollars per gram of gold you sell them.)

When factors are in the same dimension, we often combine them and view them as one factor. E.g., when considering walking to a store, we look at the total distance instead of treating the length of each block as a separate factor.

Factors with the same units shouldn’t always be added. The width, depth and height of a desk shouldn’t be added to get a total number of inches. But the width of the left half and width of the right half of a desk can be added to get the total width of the desk.

Continuing our analysis, we’ll suppose that x, y and z are all in different dimensions. This is a problem people often face. How do you evaluate multiple factors in different dimensions?

Adding Factors

First we’ll try combining factors by addition. For A, can we add 3 + 8 + 5 to get an overall score of 16? No, because that ignores the dimensions. That’s like adding acres, hours and grams while ignoring the units:

$$3\ \text{acres} + 8\ \text{hours} + 5\ \text{grams}$$

This can’t be simplified further. They can’t be combined into one term because they’re different things. Similarly, adding our example factors for option A correctly looks like this:

$$3x + 8y + 5z$$

From a mathematical perspective, this cannot be simplified further and the terms can’t be combined.

The result is the same with units or variables. That makes sense because variables can be units, e.g. x could be 1 mile. Terms can only be added when every part is identical other than the number.
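One way to see the rule is to represent each term as a number plus a unit and only merge terms whose units match. This is a minimal sketch; `add_terms` is a name invented here, and units are plain strings.

```python
from collections import defaultdict


def add_terms(terms):
    """Add (coefficient, unit) terms, merging only terms whose units match."""
    totals = defaultdict(int)
    for coefficient, unit in terms:
        totals[unit] += coefficient
    return dict(totals)


# Same unit: the terms combine into one.
print(add_terms([(3, "miles"), (5, "miles")]))
# Different units or variables: three terms go in, three terms come out.
print(add_terms([(3, "acres"), (8, "hours"), (5, "grams")]))
print(add_terms([(3, "x"), (8, "y"), (5, "z")]))
```

The number of entries in the result is the number of terms left after simplifying.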

The way fractions work is related. You can only additively combine fractions if the denominators are the same. If they’re different, you have to convert them to the same denominator by multiplying by a conversion factor, which equals one, before you can add them into one term. That’s like multiplying by a unit conversion factor, which equals one, before additively combining terms with different units in the same dimension (like meters and miles).

Conversion factors also work with variables. Say we want to combine $3x + 8y$ into one term. We’ll need an equation that relates x and y, such as $x = 3y$. That lets us substitute 3y for x (because they’re equal) in $3x + 8y$ to get $3(3y) + 8y = 9y + 8y = 17y$. In other words, given that one x is worth three y, you can combine the factors into one total factor, 17y. Equivalently to using substitution to convert, we can use a conversion factor: $\frac{3y}{x} = 1$ (which I got by taking $x = 3y$ and dividing both sides by x). (Note: The conversion factor is just the fraction, not the whole equation, but I want to emphasize that all conversion factors are equal to one.) We can multiply the 3x by this conversion factor to change it from x units to y units, and we’ll get 9y just like when we used the substitution method. (It’s always permissible to multiply anything by 1 because that’s the multiplicative identity and won’t change the value. If you like, you could first multiply $3x \cdot 1$ and then substitute in the conversion factor for the 1.)

Conversion factors for units are also related to knowing that two amounts, in different units, are equal. E.g. $60\ \text{minutes} = 1\ \text{hour}$ or equivalently $60m = 1h$ (whether you use one letter long variable names, or longer names, doesn’t change anything mathematically). That lets you derive the conversion factors $\frac{1h}{60m}$ and $\frac{60m}{1h}$ (the first converts minutes to hours, the second converts hours to minutes).

An option with multiple factors in different units is the same kind of thing as a linear equation with multiple variables. (Factors could also have non-linear value, but we won’t address that complication.) E.g. our option A is $3x + 8y + 5z$. The different variables are like different units. To solve a system of linear equations you need, in short, one equation per variable. To reduce it to a single term you need one fewer equation. Since we can rank factors in one dimension without difficulty, reducing to a single unit/term/variable is adequate. E.g. we can compare 5x to 7x without needing to reduce them to constants/numbers (given that we know whether x is good or bad, or in other words a positive or negative factor). So e.g. if there are 7 factors or variables that we’re using to evaluate options, then we need 6 equations or conversion factors.
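The "one fewer equation than variables" point can be illustrated with a small sketch. It assumes the example relation that one x is worth three y; `reduce_to_unit` and its argument names are invented for this illustration.

```python
from fractions import Fraction


def reduce_to_unit(option, target, worth_in_target):
    """Convert every term to `target` units and add them into one term.

    `worth_in_target[v]` says how many `target` units one unit of `v` is worth.
    A variable with no entry has no known conversion, so reduction fails.
    """
    total = Fraction(0)
    for variable, coefficient in option.items():
        if variable == target:
            total += coefficient
        elif variable in worth_in_target:
            total += coefficient * worth_in_target[variable]
        else:
            raise ValueError(f"no conversion from {variable} to {target}")
    return total


# Two variables, one conversion (x = 3y): 3x + 8y reduces to 17y.
print(reduce_to_unit({"x": 3, "y": 8}, "y", {"x": Fraction(3)}))

# Three variables, still only one conversion: z cannot be converted.
try:
    reduce_to_unit({"x": 3, "y": 8, "z": 5}, "y", {"x": Fraction(3)})
except ValueError as error:
    print(error)
```

With n variables, the dictionary needs n - 1 entries before the reduction succeeds.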

We’ll discuss this more later, but the short summary is: since general purpose conversion factors are only available for units in the same dimension, addition doesn’t solve the problem of multi-factor decision making. You can only convert between factors in different dimensions in special cases, e.g. converting between grams and liters of water at a particular temperature and pressure. Those special cases are inadequate for our goal, which is a general purpose decision making method.

Multiplying Factors

Since we can’t combine factors in different dimensions by addition, let’s try multiplying.

$$3x \cdot 8y \cdot 5z = 120xyz$$

This worked. Mathematically, you can multiply those three terms to get a single term, 120xyz.

The result is 120 in the units xyz. Those are multi-dimensional units, e.g. acre-hour-grams. Those are logically valid units, but probably not useful. What do they mean? What is an acre-hour-gram? What should you do with it? Those units don’t correspond to your goal (like getting a good pet or car) or to any known, useful concept in reality. Unfortunately, most multiplications of dimensions result in units that aren’t useful.

Multi-dimensional units mean that the dimensions are multiplied (and/or divided) together. This is familiar and useful in some cases. E.g. miles per hour is a multi-dimensional unit involving miles (a length dimension) divided by hours (a time dimension). Volume is a multi-dimensional unit involving three different length dimensions which are perpendicular (often called width, depth and height), which are all multiplied together. These useful multi-dimensional units are special cases and exceptions, but we’re looking for a generic decision making method, not a method that works in just a few cases.

If you multiply twenty factors, you’ll get twenty-dimensional units. The problem is usually even worse than in my example with three factors.

One problem with multi-dimensional units is that doing a unit conversion in any one dimension changes the number you get (e.g. it’ll be 60 times larger if you convert from hours to minutes). The number depends on multiple choices of unit instead of just one. And choosing non-linear units in one dimension (like pH or the Richter scale, which are logarithmic) can change which option has the highest number. That reordering problem from switching units doesn’t happen when working with single-dimensional units (given some reasonable mathematical constraints on what is a valid unit definition).
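A small sketch can show both effects. The two options and their factor values are made up for illustration: a linear unit change (such as hours to minutes) rescales every product but keeps the order, while switching one factor to a logarithmic unit can reverse it.

```python
import math

# Two made-up options, each scored on two factors from different dimensions.
options = {"P": (10, 100), "Q": (100, 20)}


def ranking(options, convert_first=lambda value: value):
    """Rank options by the product of their two factors, largest first.

    `convert_first` re-expresses the first factor in different units.
    """
    product = {name: convert_first(a) * b for name, (a, b) in options.items()}
    return sorted(product, key=product.get, reverse=True)


print(ranking(options))                              # original units
print(ranking(options, lambda value: value * 60))    # linear change of units
print(ranking(options, math.log10))                  # logarithmic units
```

In the first two cases Q comes first; with the logarithmic unit, P comes first.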

Although there are major difficulties, multiplying can combine different units, variables or denominators (and more) into one term. In some limited sense, it does work, unlike addition.

What we’re dealing with here are fundamental aspects of how arithmetic works. Multiplying works to get one term because a term is a group of things that are multiplied together (I explain terms more later). But terms can only be added when they have exactly the same types of things being multiplied together, because addition means combining different amounts of the same thing/type/units. (Put another way, addition is a quantitative change that only works when there are no qualitative differences. Multiplication works regardless of qualitative differences and can change both quantity and quality.)

When combining a large number of fractions by multiplying, the denominator can become large and inconvenient. That’s somewhat similar to the inconvenient result when multiplying many independent variables or units from different dimensions. However, as long as there are no units or variables involved, fractions are all from the same dimension (numbers), so combining fractions with multiplication does always work OK. Regardless of how unintuitive a large denominator is, you can deal with it, e.g. by converting to a more intuitive decimal number. You can also combine fractions by adding since they’re from the same dimension (numbers), so this isn’t an advantage of multiplication over addition.

Would dividing factors work better? Division is a type of multiplication, just like subtraction is a type of addition. Division means multiplying by the multiplicative inverse, like subtraction is adding the additive inverse. Division won’t help us do better than multiplication. This also explains why I didn’t consider subtraction above.

Now What?

We don’t have a good solution yet. Combining factors from multiple dimensions is a hard problem.

Using more advanced math, like exponents, calculus, or trigonometry, won’t help. I don’t know of any math operator that will solve our difficulty. We could combine factors with set operators like union, but then we’d need a way to rank sets, so that wouldn’t help. I also considered vector math and reached a similar conclusion, but that’s harder to understand and doesn’t help much. I think addition and multiplication are actually the best operators to analyze.

My solution involves a way to set up the factors so that multiplication will always get a useful result. Multiplication is a more promising lead than addition because you actually can multiply factors from different dimensions, whereas you cannot add them at all. Multiplication gets multi-dimensional units, which have problematic usefulness, while addition isn’t able to combine terms with different, non-convertible units. Interestingly, most attempts to solve this problem have focused on addition not multiplication. (In short, they try to convert between dimensions to enable addition. That’s usually done by converting all factors into a generic goodness dimension.) I think one reason for the focus on addition is because people haven’t laid out the problem in the way this article does.

Before discussing my solution, I’ll explain the problem and math more, and relate some common approaches to arithmetic.

Multi-Dimensional Units

Some multi-dimensional units are useful, such as miles per hour, DALYs per dollar, dollar-days that an order is late, square feet or cubic meters. These correspond to meaningful concepts that we can think and reason about, and which have some meaning in reality. But the vast majority of multi-dimensional units, which are allowed by logic, don’t correspond to concepts that make sense to us.

Most useful combinations of dimensions use only two factors. Random combinations with many factors are very unlikely to be useful. And being able to make a decision that takes into account many factors is our goal.

There are intuitive, useful examples with four factors, such as volume/time (e.g. how quickly you can fill a pool with water) or volume/dollar (e.g. how expensive it is to remove rock to build a tunnel). These follow a standard pattern, which is to take a meaningful concept and then divide it by the cost. That tells you how much of something you get for spending some amount of resources (which are most typically measured in time or money). Because volume is three-dimensional, and dividing by cost adds one dimension, the final result is four-dimensional. Volume is the only easy, common, standard example of a three dimensional unit, without division, that has a clear, useful meaning. That’s because we live in a three-dimensional reality (three spatial dimensions). Useful two-dimensional units are more common, including area. Some experts, like physicists, deal with more complicated multi-dimensional units sometimes. But they only use specific multi-dimensional units that have some connection with physical reality that makes them useful. They don’t have a good way to use arbitrary multi-dimensional units.

Averaging Factors

Dimensions can’t be added together at all, and they can only be usefully multiplied in special cases. What about combining factors by finding the average (mean)?

To get an average, you sum quantities and then divide by the total number of quantities summed. This involves addition, so it runs into the same problems with addition that we’ve discussed. In math terms, the average for A is:

$$\frac{3x + 8y + 5z}{3}$$

This can’t be simplified further. All we can do is a bit of division:

$$x + \frac{8}{3}y + \frac{5}{3}z$$

That still leaves the factors separate. So averaging won’t help. It could only be useful as a secondary technique if addition worked.

Finding the median or mode (different types of averages) wouldn’t work well either. Both of those would get a result in a single dimension rather than combining factors from different dimensions. And you’d need to be able to rank all the factors from different dimensions, in a single ranking, in order to find the median.

Rank Ordering and Terms

The method of comparing options that we’ve been considering is ranking them. E.g. if our options have values of 4, 3 and 8, we can rank them: 8 is the best, then 4, then 3. This is also called ordering or sorting. Ranking requires knowing which option is best but sorting doesn’t.

Standard requirements for ranking options:

  1. Each value is only one term. E.g. we can rank an option with the value 4x but not with a multi-term value such as $4x + 2y$.
  2. They are like terms. We can rank 4x and 2x, but we can’t rank 4x and 2y.
  3. We know whether the variable part is positive or negative. You might assume 4x should be ranked higher than 3x, but if x is negative then 3x is actually larger.

We can sort with just (1) and (2). But for making a decision we’ll want to rank them with the best one first, so we’ll need (3) too.

We can sometimes rank things without meeting all the requirements, but we usually can’t. If we want a general purpose method, then they’re required.
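The three requirements can be made concrete with a sketch of a ranking function that refuses to rank unlike terms. Each value is modeled as a single (coefficient, variable part) pair; `rank_terms` and its parameter names are invented for this illustration.

```python
def rank_terms(terms, variable_is_positive):
    """Rank single-term values, best (largest) first.

    Requirement 1: each value is one (coefficient, variable_part) pair.
    Requirement 2: all variable parts must match (like terms).
    Requirement 3: the caller states whether the variable part is positive.
    """
    variable_parts = {variable for _, variable in terms}
    if len(variable_parts) != 1:
        raise ValueError("cannot rank unlike terms: " + ", ".join(sorted(variable_parts)))
    # If the variable part is negative, a larger coefficient means a smaller value.
    return sorted(terms, key=lambda term: term[0], reverse=variable_is_positive)


print(rank_terms([(4, "x"), (3, "x"), (8, "x")], variable_is_positive=True))
print(rank_terms([(4, "x"), (3, "x")], variable_is_positive=False))
try:
    rank_terms([(4, "x"), (2, "y")], variable_is_positive=True)
except ValueError as error:
    print(error)
```

Without the sign information the function could still sort the terms, but it could not say which end of the list is the best option.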

Next I’ll explain some of the terminology and what the ranking rules mean.

A term is basically a single thing in math. A term has two parts, a number (the “coefficient” or “numerical part”) and everything else (the “variable” or “literal” part). For example, in the term 4x, the coefficient is 4 and the variable part is x. A term must have a coefficient but doesn’t have to have a variable. 7 is a valid term. If you see a variable with no coefficient, like y, then the coefficient is 1. 1y and y are the same thing, so people usually don’t write the 1.

The two parts of a term are being multiplied together. For example, if the term is 2x, and we know that x is 3, then the value of the term is $2 \cdot 3 = 6$.

The variable part of a term can be somewhat complicated. For example, $7x^2yz$ is one term. However, $7x^2 + yz$ is two terms.

Terms are like if their variable parts are identical. Like terms can have any coefficient. x, 5x and 22x are like terms. The terms 3xyz and 17xyz are also like. But 5, 5y and 5x are unlike terms.

Like terms which are added together can be combined into one term. That’s a standard part of simplifying. E.g. we simplify 3x + 2x to 5x. Adding like terms is done by adding the coefficients while leaving the variable part alone.

Like terms can be added that way because they are different amounts of the same thing. The coefficient is the amount and the variable part is the thing. This works with everyday things. 3 pounds of bacon is a term that can be added with 2 pounds of bacon to get 5 pounds of bacon. The result is a single term because they are like terms – the “pounds of bacon” part is the variable part and it’s the same for each term. You could also add apples to apples or slices of bacon to slices of bacon.

However, you can’t additively combine 3 pounds of bacon with 2 apples, nor with 2 pounds of apples, to get one term, because they are not like terms (the variable parts are different). You’re stuck with e.g. $3\ \text{pounds of bacon} + 2\ \text{apples}$, or $3\ \text{pounds of bacon} + 2\ \text{pounds of apples}$, which is two separate terms that can’t be combined or simplified.

If you want to combine apples and pounds of bacon into one term, so that you can rank options, you need to convert them into one dimension. For example, you can convert pounds of bacon to calories, and you can also convert apples to calories. That lets you get a single number of calories for each option involving bacon and apples, and then you can rank them, with the most calories being at either the top or the bottom of the rankings depending on what you want. (Actually what people often want is a medium amount of calories – not too many nor too few. That is a complication that makes decision making based on ranking harder.)

This shows how unit conversion matters (you couldn’t combine them without doing a conversion first) and how your goals matter. Instead of ranking by calories, you could rank by carbohydrates, fat, protein or a nutrient like vitamin C. Which ranking(s) matter to you, and are suitable for making your decision, depends on your goals. However, often you care about multiple things, e.g. both calories and carbohydrates. No single ranking will take into account your goals about both calories and carbohydrates unless you find a way to convert calories and carbohydrates into the same dimension.

When we see multiple terms added together, we should simplify first before counting how many terms we have. Alternatively, we could count unlike terms.

The important thing, for our purposes, is whether there are two or more unlike terms (or two or more terms after simplifying). If there are, that prevents rank ordering.

We can rank options that have different amounts of bacon. And we can rank options that have different numbers of apples. But we can’t straightforwardly rank options that provide both apples and bacon.

Mathematically, we can rank values such as $2x$, $5x$ and $3x$ because those are like terms. We rank them by which number is larger (and reverse the order if x is negative). However, we cannot rank values such as $2x$, $5y$ and $3z$ (we can’t even rank any two of those).

We also can’t rank values such as $5x + y$, $4x + 4y$ and $x + 5y$ because each of those has two unlike terms. Also, any one of those could be largest depending on what x and y are. (We can rank $x + y$, $2x + 3y$ and $4x + 5y$ if we know that x and y are both positive. That’s an example of strict superiority. It doesn’t work as a general purpose approach.)

To rank values involving two or more terms, we need conversion factors (or, equivalently, a system of equations). The basic idea is that you can only rank bacon plus apples, together, if you know how much bacon is worth one apple. You need a formula that tells you how to convert bacon into an equivalent amount of apples, which lets you reduce the foods to one term. But there’s often no conversion factor, e.g. cuteness and price are different things and there isn’t a specific amount of cuteness that’s worth a dollar. I discuss conversion factors more later.

Like terms are quantitatively different (the coefficient is the quantity, amount or degree) while unlike terms are qualitatively different (the variable part is the dimensions, types, kinds, qualities or categories). That means you can only rank order values which are in the same dimension(s).

This is another way to see what I explained earlier about addition, multiplication and dimensions. Unlike terms – terms with different dimensions – can’t be combined into one term (the same dimensions) by addition, so they can’t be ranked and compared. However, unlike terms can be multiplied to get a single term with a new, more complicated set of multiple dimensions (the variable part), like how $3x \cdot 8y \cdot 5z = 120xyz$.

Adding and Multiplying Like and Unlike Terms

2x + 3x is like saying 2 apples + 3 apples, so you can just add it and get 5 apples

2x + 3y is like saying 2 apples + 3 bananas, so you can’t combine those terms. You can’t simplify it into one term unless you convert. You can make it 2 fruit + 3 fruit = 5 fruit. They can be combined in that case because you converted so that the variable part is the same. You can’t combine them, with addition, as long as the variable part is different.

You can also see this with adding fractions. Consider $\frac{2}{7} + \frac{3}{7}$. You can define x to be the unit fraction, $\frac{1}{7}$, and then rewrite it as $2x + 3x$. Seen another way, it’s $2 \cdot \frac{1}{7} + 3 \cdot \frac{1}{7}$. But if the fractions had different denominators then you wouldn’t be able to additively combine them. If it was $\frac{2}{7} + \frac{3}{2}$, you could define variables so it’s $2x + 3y$, which doesn’t combine into one term. To combine, you’d first convert them to have the same denominator. $\frac{2}{7} = \frac{4}{14}$ and $\frac{3}{2} = \frac{21}{14}$ so you can add them as 14ths. You can let $z = \frac{1}{14}$ and then it’s $4z + 21z$ which combines to $25z$ or $\frac{25}{14}$.

Now we’ll look at adding and multiplying like and unlike terms by breaking the math down. When multiplying, you break it down with multiplication (factors that are multiplied together), and when adding you break it down with addition (units that are added together).

Multiplying like terms: $2x \cdot 3x$ breaks down to $2 \cdot x \cdot 3 \cdot x$ which we can rearrange to $2 \cdot 3 \cdot x \cdot x$ which is $6x^2$.

Multiplying unlike terms: $2x \cdot 3y$ breaks down to $2 \cdot x \cdot 3 \cdot y$ which we can rearrange to $2 \cdot 3 \cdot x \cdot y$ which is $6xy$.

Adding like terms: $2x + 3x$ breaks down to $(x + x) + (x + x + x)$. I used parentheses to show that the $2x$ broke down to $x + x$ and the $3x$ broke down to $x + x + x$. We don’t need to rearrange and can remove the parentheses to get $x + x + x + x + x$ and then add them all up to get $5x$.

Adding unlike terms: $2x + 3y$ breaks down to $(x + x) + (y + y + y)$. Rearranging won’t help and adding them up gets us the same $2x + 3y$ that we started with. Adding unlike terms is the one scenario where we end up with multiple terms instead of being able to combine stuff into one term.
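The four cases can be reproduced with a small sketch in which a term is a coefficient plus a count of each variable in its variable part. This is only an illustration; the representation and the names `multiply` and `add` are choices made here, with variable parts written as strings like "x" or "xy".

```python
from collections import Counter


def multiply(term_a, term_b):
    """Multiply two terms; the result is always a single term."""
    (coef_a, vars_a), (coef_b, vars_b) = term_a, term_b
    # Multiply the numbers and merge the variable parts (x times x becomes x^2).
    return coef_a * coef_b, Counter(vars_a) + Counter(vars_b)


def add(term_a, term_b):
    """Add two terms; they merge into one term only if they are like terms."""
    (coef_a, vars_a), (coef_b, vars_b) = term_a, term_b
    if Counter(vars_a) == Counter(vars_b):
        return [(coef_a + coef_b, Counter(vars_a))]
    return [(coef_a, Counter(vars_a)), (coef_b, Counter(vars_b))]


two_x, three_x, three_y = (2, "x"), (3, "x"), (3, "y")
print(multiply(two_x, three_x))   # one term: coefficient 6, x twice
print(multiply(two_x, three_y))   # one term: coefficient 6, x once and y once
print(add(two_x, three_x))        # one term: coefficient 5, x once
print(add(two_x, three_y))        # still two terms
```

Only the last case returns a list with more than one term.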

Dimensions and Decision Making

Pro/Con Lists

People define abstract goodness points and try to convert factors into those points so that they can add the factors up. People often do this without knowing that it’s what they’re doing. They may not know the math, or use numbers, but they still try to use an equivalent thinking process.

A good mental model of what people often do is a pro/con list. Even when people don’t write things out, many decision making processes are similar to using a pro/con list. People try to consider the plusses and minuses of different options and evaluate which is best overall.

How do pro/con lists work? You list pros, list cons, and try to calculate or estimate how good the pros are minus how bad the cons are. But pro/con lists contain many factors from many different dimensions. So this is an attempt to combine factors from different dimensions by addition.

Here’s an example article explaining a standard approach that I disagree with: How to Create an Effective Weighted Pro-Con List. A similar approach is decisional balance sheets. There are also more complex approaches that add weighted factors, e.g. analytic hierarchy process.

How are pros added together despite being in different dimensions? One tries to judge how good they are. In other words, one converts each factor into another dimension (goodness). Then, once the factors are all in the same dimension, they’re added together to get a sum total of goodness. The same is done with cons. They’re converted to badness, which is just the negative numbers in the goodness dimension, not a separate dimension. Then the total badness is added with the total goodness to get an overall amount of goodness (because badness is a negative number, adding it is equivalent to subtracting its absolute value).

Many decision making processes are comparable to pro/con lists. People, in some form (most often without numbers), try to add up the pros of each option and subtract the cons. Any kind of weighted factor analysis is also similar to using pro/con lists.

This approach doesn’t work well because there’s no good way to convert from each dimension into a goodness dimension. The conversion factors are made up somewhat arbitrarily. This is basically an attempt to convert between dimensions using a two-step process, which isn’t better.

Conversions work in both directions (if you can convert miles to feet, you can also convert feet to miles). Suppose you can convert both friendliness and cuteness to goodness. That means you can convert from cuteness to goodness and then convert that from goodness to friendliness. This implies a conversion from cuteness to friendliness.

In other words, suppose you come up with two conversions such as these (f is friendliness, c is cuteness, and g is goodness):

$$1f = 2g \qquad 1c = 6g$$

Mathematically, that implies (using substitution):

$$1c = 3f$$

If you can’t accurately compare friendliness and cuteness directly, you also can’t convert them both to the same type of goodness.
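The implied conversion can be computed mechanically. In this sketch the goodness rates are made-up numbers, and `implied_rate` is a name invented for the illustration; the point is only that any pair of conversions into goodness fixes a direct exchange rate between the two factors.

```python
from fractions import Fraction

# Made-up conversion rates: goodness points per unit of each factor.
goodness_per_unit = {"friendliness": Fraction(2), "cuteness": Fraction(6)}


def implied_rate(source, target, goodness_per_unit):
    """How many units of `target` one unit of `source` is worth, via goodness."""
    return goodness_per_unit[source] / goodness_per_unit[target]


# One unit of cuteness is implied to be worth this many units of friendliness.
print(implied_rate("cuteness", "friendliness", goodness_per_unit))
```

If the direct rate looks arbitrary or indefensible, the two goodness conversions that produced it deserve the same suspicion.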

Below, I’ll discuss problems with converting to goodness in more detail.

Goldratt’s Pro/Con Lists

Although the standard use of pro/con lists involves adding factors, they can potentially be used in other ways. There’s nothing wrong with brainstorming pros and cons. It’s combining them by addition that’s problematic.

In It’s Not Luck (ch. 21), Eli Goldratt discussed using a pro/con list to decide whether or not to buy a boat. He proposed looking at every con and trying to come up with a solution for it, and buying a boat only if every problem was solved (cons are problems). That’s a decision making method which doesn’t add factors. (It’s actually combining factors with multiplication, as I’ll explain later.)
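As a rough sketch of this decision rule: buy only if every con has a solution. The cons listed below are hypothetical placeholders, not the ones from the book. The second function shows one possible way to read the remark about multiplication, treating each con as 1 (solved) or 0 (unsolved), so that a single unsolved con makes the whole product zero.

```python
import math

# Hypothetical cons for the boat purchase, each marked solved or not.
cons_solved = {
    "storage cost": True,
    "maintenance time": True,
    "rarely used": False,
}


def should_buy(cons_solved):
    """Buy only if every con has a solution."""
    return all(cons_solved.values())


def should_buy_by_multiplying(cons_solved):
    """The same rule written as a product of 1 (solved) and 0 (unsolved)."""
    return math.prod(int(solved) for solved in cons_solved.values()) == 1


print(should_buy(cons_solved), should_buy_by_multiplying(cons_solved))
```

Unlike a weighted sum, no amount of strength elsewhere can outweigh one unsolved con under this rule.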

Goldratt also has other proposals, in the same chapter, like using if-then logic to causally connect an action, like buying the boat, to each con. How will taking the action cause each con? Map that out instead of relying on intuition to connect cons to the action. A solution has to address a con’s causes. A solution changes something so that cause-and-effect (reality, nature, logic) no longer causes the con.

Converting Chess Dimensions

In order to add factors, people try to create unit conversions between dimensions. For example, if we say that 1 mile is worth 20 grams, then we’ll be able to add length to weight. Like multiplying dimensions, this works OK in special cases. But it doesn’t work most of the time. There are few contexts in which 1 mile is worth 20 grams. We usually can’t make up a good conversion factor between dimensions. And no conversion factor between dimensions works in all contexts. And we should expect the conversion factor to always be approximate. The narrower the context, the better chance we have to come up with a decent conversion.

An especially narrow context is board games. They have clearly defined options (the actions that the rules allow the players to take) and clearly defined objectives (how you win).

Chess has two players, 64 squares, and a single starting position with 32 pieces on the board. Chess lacks the complexity of regular life, which makes conversions between dimensions work better.

The dimensions in chess games include, for each player, the number of pawns, number of queens, king safety, piece mobility, center control, number of passed pawns, location of passed pawns, and it being your turn. Those are all different factors that can be taken into account when evaluating which side is winning.

The only decisions in chess games are which moves to play, and the moves are done one at a time with the two players taking turns. In any position, there is a list of legal moves a player can play, and each move can be evaluated based on how it affects factors like pawn count, king safety, center control, etc.

Typical chess software evaluates chess positions by converting between various dimensions, adding up all the factors, and getting an overall score for how good a position is for white compared to black (subtract black’s score from white’s to figure out which side is winning, which is what actually matters).

The standard units of goodness are the number of pawns. The value of having a pawn is defined as one point. Everything else is converted into those points. For example, having a rook is worth 5 points, a queen 9.75 points, and a knight 3.25 points. For mobility, we can count how many different squares a piece can move to (how many legal moves can we currently make with that piece?), and multiply that by a factor based on which piece it is, and then add that many points. E.g. every square a bishop can currently move to is worth 0.03 points, so if a bishop can move to 10 squares that’s worth 0.3 points.
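As a rough illustration of this kind of evaluation, here is a minimal Python sketch. The piece values and the bishop mobility weight come from the paragraph above; the function names, the example piece counts, and the choice to give other pieces no mobility bonus are assumptions, and real chess engines use many more terms.

```python
# Point values taken from the text. Bishops are left out of the material
# count because the text gives no value for them.
PIECE_VALUE = {"pawn": 1.0, "knight": 3.25, "rook": 5.0, "queen": 9.75}

# Points per reachable square. Only the bishop figure (0.03) comes from the
# text; giving every other piece a weight of zero is an assumption.
MOBILITY_WEIGHT = {"bishop": 0.03}


def material_score(pieces: dict[str, int]) -> float:
    """Sum the point value of every listed piece."""
    return sum(PIECE_VALUE[name] * count for name, count in pieces.items())


def mobility_score(reachable: dict[str, int]) -> float:
    """Convert reachable squares per piece type into points."""
    return sum(MOBILITY_WEIGHT.get(name, 0.0) * squares
               for name, squares in reachable.items())


def side_score(pieces: dict[str, int], reachable: dict[str, int]) -> float:
    """Add the converted factors into one number for one side."""
    return material_score(pieces) + mobility_score(reachable)


# Hypothetical position: white is a pawn up with a more active bishop.
white = side_score({"pawn": 8, "rook": 2, "queen": 1}, {"bishop": 10})
black = side_score({"pawn": 7, "rook": 2, "queen": 1}, {"bishop": 4})
print(round(white - black, 2))  # 1.18, positive means white is ahead
```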

This doesn’t work perfectly for all chess positions, and humans can evaluate some chess positions better than software does. But it works reasonably well in most positions that actually come up in chess games. It’s good enough to play chess well when it’s combined with also looking ahead at many possible moves and counter-moves with the speed and accuracy of a modern computer. Overall, this type of chess software is better than the top human players. Even without looking many moves ahead, this kind of chess software is better than most human players.

How often and how well does life approximate chess? How often are situations so simple? Rarely. Especially for difficult decisions that involve many factors.

In life, it’s generally much harder to add factors from different dimensions to get a reasonable, approximate answer. That’s why we’ve been able to create software that’s good at deciding on chess moves based on many factors, but we don’t yet have software to decide what groceries to get (let alone who to marry). In many ways, choosing groceries is actually way more complicated than chess and the factors involved are harder to convert between.

Dollars per Hour

Let’s look at another example of converting between dimensions. In a sufficiently narrow context, you can approximately convert between time and money, since (hypothetically) you work for an hourly wage and work a flexible number of hours that you choose.

But this is only approximate. You can only work 24 hours in a day at most, so that limits the rate of converting time to money. And once you retire, you can’t convert money in the bank into more free time. You can’t reduce your work hours below zero.

Realistically, you could only work around 15 hours per day at most. And you’d be miserable if you kept that up for very long. It’s hard to give an exact number for how much you could work. That makes it hard to make the unit conversion formula accurate. (The proper formula, rather than just saying $20/hr, would also specify any limits on how much can be converted, and specify any changes to the conversion rate as more is converted, e.g. it’d take into account overtime pay.)
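Here is a minimal sketch of what a more careful conversion formula might look like. The $20/hr rate and the roughly 15-hour ceiling come from the text; the overtime threshold and the 1.5x multiplier are assumptions chosen for illustration.

```python
def earnings(hours_per_day: float,
             base_rate: float = 20.0,            # the $20/hr figure from the text
             overtime_after: float = 8.0,        # assumed overtime threshold
             overtime_multiplier: float = 1.5,   # assumed overtime premium
             max_hours: float = 15.0) -> float:  # rough ceiling from the text
    """Dollars earned in one day under a capped, piecewise conversion rate."""
    if hours_per_day < 0:
        raise ValueError("work hours cannot go below zero")
    hours = min(hours_per_day, max_hours)        # nothing converts past the cap
    regular = min(hours, overtime_after)
    overtime = max(hours - overtime_after, 0.0)
    return regular * base_rate + overtime * base_rate * overtime_multiplier


print(earnings(8))    # 160.0
print(earnings(10))   # 220.0
print(earnings(24))   # 370.0, same as 15 hours because of the cap
```

Even this version leaves out tiredness, injury risk and everything else that changes when hours change, so it is still only an approximation.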

If you work long hours, you may get tired, bored or lonely – or just sad about missing your favorite TV show. You could even get a repetitive stress injury or be gradually poisoned by imperfectly safe working conditions. Working longer hours has downsides other than the amount of time it takes. So the conversion between time and money is only approximate, since it ignores other relevant factors. When you try to convert between time and money, in either direction, other things happen too – the effect is not purely to change your time and money.

This kind of analysis works less well in most real situations. E.g., most jobs actually have pretty limited flexibility for how many hours you work. What some people do is make money from something on the side, and then if it does well enough they quit their job entirely. They don’t gradually reduce the hours they work as their side project replaces that income.

Converting Dimensions to Goodness

Converting dimensions to goodness (or any similar dimension like awesomeness or desirability) is problematic because there’s no good way to figure out what the conversion factors should be. There are no universally correct factors. It’s not like converting inches to feet, where the objectively correct factor is 12 inches per foot. You have to look at the context and then use approximate factors that try to take into account the context well. There are no straightforward or mathematical methods that tell you how to take into account context.

In life, often large parts of your life are relevant context that is connected to a decision in some way. Chess software works well because there is no other context to take into account from outside the chess game. For human chess players, sometimes they actually do take into account outside context. For example, they may be bribed to lose on purpose in a game that people have placed bets on. Or they may lose on purpose while playing against their child. Or they may leave a game in the middle, and forfeit, due to a family emergency. However, ignoring outside context usually works as a good approximation for humans while playing chess.

How much is 5 points of goodness? Or 200 points? There are no fixed points in this made-up dimension to compare against. There’s no way to get your bearings. People invent scales like 0 is neutral, -100 is the worst, and 100 is the best. But if something is pretty good, is that a 60 or a 70, or something else? There’s no clear way to find out. And it’s hard to be consistent for each factor you convert because “pretty good” doesn’t mean the identical thing when you say those two vague words about different factors that are good in different ways. “Pretty good” at making money and “pretty good” at not beating your wife shouldn’t score the same amount of goodness points! If a used car seems pretty good for price and pretty good for safety, does that actually mean the same thing for both factors? Should both factors be worth the same number of points? Doubtful.

We can measure inches. We know what those units mean. But we can’t measure goodness points.

We can’t measure cuteness of a pet either, but an individual can rank it. Which pet do you think is the cutest, second cutest, etc., in your opinion? And you can assign the pets numeric cuteness scores which are compatible with your rankings (highest ranked pet has the highest score, second highest ranked pet has the second highest score, etc.). You can also make cuteness scores approximate your opinion of how different the pets are. E.g. if the best two pets are similarly cute, and the third pet is much less cute, you might score them 85, 80 and 40. Those scores are fine as long as you know they aren’t real measurements like inches. The cuteness scores just mean two things: the rank ordering you assigned the pets and a rough approximation of how close or far apart the pets are for cuteness in your opinion.
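A small sketch of what those scores encode follows. The scores 85, 80 and 40 are from the example above; the pet names and the helper function are hypothetical.

```python
# Hypothetical pets, ranked from cutest to least cute in one person's opinion.
ranking = ["cat", "dog", "turtle"]

# Scores from the example in the text: two close pets and one far behind.
cuteness = {"cat": 85, "dog": 80, "turtle": 40}


def consistent_with_ranking(scores: dict[str, int], order: list[str]) -> bool:
    """True if sorting by score (highest first) reproduces the ranking."""
    return sorted(scores, key=scores.get, reverse=True) == order


print(consistent_with_ranking(cuteness, ranking))  # True

# Gaps between neighbors in the ranking: a rough sense of closeness only.
gaps = [cuteness[a] - cuteness[b] for a, b in zip(ranking, ranking[1:])]
print(gaps)  # [5, 40]
```

The ordering and the rough gaps are all the information the scores carry; nothing here makes a cuteness point comparable to a dollar or an inch.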

We can look at goodness of one factor in a similar way to looking at cuteness. We can convert from cuteness to goodness and get meaningful rankings or approximate numeric scores. Call that cuteness-goodness. And we can also convert from price to goodness and get meaningful rankings and more exact numeric scores. Call that price-goodness. But cuteness-goodness and price-goodness are not really the same thing and adding them still doesn’t actually work. Converting everything to the same goodness dimension is harder.

What do people do about that? They multiply by weighting factors that get results they think are reasonable. But that isn’t actually a way of making decisions. How do they know what’s reasonable? They must be using their intuition, common sense or something else other than weighted factor summing. So the weighted factor summing method doesn’t work as a self-contained solution to decision making. It relies on pre-existing opinions reached some other way. People often make the math (or non-numerical estimate) come out to fit what they already think (without realizing they’re doing it). For example, college rankings often start with the pre-existing idea that Harvard is good and then give high weightings to whatever factors Harvard is good at so that Harvard-like schools come out on top, which seems like a reasonable conclusion to people who already believed that Harvard is one of the best schools.
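To see how much the weights drive the answer, consider this minimal sketch. The cars, their 0 to 100 scores, and both sets of weights are made-up assumptions.

```python
# Hypothetical 0-100 scores for two cars on two factors (made up).
scores = {
    "car_a": {"price": 90, "safety": 50},
    "car_b": {"price": 50, "safety": 80},
}


def weighted_sum(option: dict[str, float], weights: dict[str, float]) -> float:
    """Add up each factor's score times its weight."""
    return sum(weights[factor] * value for factor, value in option.items())


def winner(weights: dict[str, float]) -> str:
    """Return the option with the highest weighted sum."""
    return max(scores, key=lambda name: weighted_sum(scores[name], weights))


print(winner({"price": 0.7, "safety": 0.3}))  # car_a
print(winner({"price": 0.3, "safety": 0.7}))  # car_b
```

The same data produces opposite winners, so whoever picks the weights is effectively picking the conclusion.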

Any method involving arbitrary choices (like what unit conversions or weights to make up) runs into a major problem: You have no good way to make an arbitrary choice unless you have pre-existing knowledge of what a good answer is.

There are other difficulties with converting to the goodness dimension. I’ll point out several more.

How do you convert binary factors? Not everything is best represented as an amount. E.g. a plan might be legal or illegal, which is problematic to convert into goodness points and then factor into your overall evaluation. (One approach is to rule out any option with a dealbreaker factor, then score options by the goodness of the other factors and use that as a tiebreaker among the options with no dealbreakers. That is a two-part decision making process which makes goodness the less important part, so in a significant sense it’s actually giving up on using goodness scores and moving away from that method.)
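The two-part process in the parenthetical could be sketched like this. The plans and their goodness scores are hypothetical; only the idea of filtering on a dealbreaker first and scoring second comes from the text.

```python
# Hypothetical plans; "legal" is the dealbreaker, "goodness" a made-up score.
plans = [
    {"name": "plan_a", "legal": False, "goodness": 95},
    {"name": "plan_b", "legal": True, "goodness": 70},
    {"name": "plan_c", "legal": True, "goodness": 80},
]

# Part 1: rule out any option with a dealbreaker.
viable = [plan for plan in plans if plan["legal"]]

# Part 2: use the goodness score only as a tiebreaker among what is left.
ranked = sorted(viable, key=lambda plan: plan["goodness"], reverse=True)
print([plan["name"] for plan in ranked])  # ['plan_c', 'plan_b']
```

The highest-scoring plan never reaches the scoring step, which shows the goodness score is doing the less important part of the work.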

How do you convert category-based factors, such as species? A pet could be a turtle, dog or rabbit. How do you convert those into points when there isn’t even any clear ordering for them? (You could rank or score each species by how much you like it. But then you’re converting a different factor – how much you like the species – rather than the species itself. And you’d still run into all the problems discussed for cuteness above.)

How do you check if your conversion to goodness is correct? Because you’re making up something that’s partly arbitrary, there’s no good way to check your work. There’s no answer key or objective right answer that you can compare against. This makes it hard to correct errors when using goodness conversions.

How do you take into account contextual factors when converting dimensions to goodness? How do you know what parts of context matter and how much? How do you know what changes to your situation mean you need to reconsider? There’s no clear answer. People try to muddle through this, usually using lots of intuition, common sense and traditional knowledge (we’ve at least learned a little from the many mistakes of past generations). It’s really hard to do this well in an explicit, rational or mathematical way.

These are major problems, so let’s look at alternative approaches.

Alternative Approaches

A different way to make decisions is to choose based on one factor. If we focus on a single factor, instead of trying to consider many factors from many dimensions, then evaluating options is much easier. With this method, we don’t combine factors. This avoids the difficulty of converting between dimensions to add factors. It also avoids the difficulty of multiplying factors and getting multi-dimensional units that aren’t useful. The downside is that we don’t take into account multiple factors. This doesn’t achieve our goal of making a decision based on many factors.

Another common method is cost-benefit analysis. This works by looking at two factors, cost and benefit. Each of those factors uses only one dimension. E.g. benefit is measured only as the number of people helped and cost is measured only as dollars. A single benefit factor can work OK, as a reasonable approximation, if you’re focusing on getting one type of benefit. Viewing all costs in dollars can work OK, as a reasonable approximation, because so many things are available for sale with prices in dollars. Sometimes viewing cost as time works too (you can add up the time every step in a project will take you, and divide the benefit by that much time).

After determining the cost and benefit, the factors are combined by division, which is a type of multiplication. The resulting units are two-dimensional, and tell you how much benefit you get per unit of the cost. (That’s similar to miles per hour, which is a two-dimensional unit that tells you how many miles you can travel per hour spent driving.) Cost/benefit analysis works well because multiplication is a valid way to combine dimensions and it creates two-dimensional units that are actually useful.

For example, for one plan, your two factors could be “helps 300 people” and “costs 100 dollars”. Then you’d divide them to get $300 / 100 = 3$ people helped per dollar. Another plan might help 500 people but cost 200 dollars, so it helps $500 / 200 = 2.5$ people per dollar, which is worse.
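The same arithmetic as a short Python sketch, using the two plans from the example (the plan labels and function name are hypothetical):

```python
def benefit_per_cost(people_helped: float, dollars: float) -> float:
    """People helped per dollar spent."""
    return people_helped / dollars


# (people helped, cost in dollars) for the two plans in the example.
plans = {"plan_1": (300, 100), "plan_2": (500, 200)}

ratios = {name: benefit_per_cost(*figures) for name, figures in plans.items()}
print(ratios)  # {'plan_1': 3.0, 'plan_2': 2.5}

best = max(ratios, key=ratios.get)
print(best)  # plan_1
```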

These alternative approaches, and other similar ones, only work well sometimes. They are limited by only taking into account a few factors. They don’t give a general purpose solution, that works in any situation, for looking at many factors from many dimensions.

Critical Fallibilism’s Solution

Binary Factor Multiplication

I propose a different approach. There’s a way to combine unlimited factors from different dimensions without doing unit conversions. It uses multiplication not addition.

The typical problem with multiplication is getting a result with multi-dimensional units that don’t mean anything useful, e.g. the five-dimensional units gram-second-meter-dollar-Celsius. That result isn’t logically wrong, but there’s no good way to interpret the result and relate it to your life. This problem gets worse with more factors.

But we can multiply binary factors and always get a meaningful, useful result. This allows using many factors and combining them into one result. It can also be combined with some other methods. You can look at a single factor, or a cost/benefit ratio, and then multiply that with an unlimited number of binary factors to get a useful result. The problems with multiplying dimensions actually come from multiplying non-binary factors, not any factors.
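Here is a minimal sketch of combining one non-binary number with binary factors, where each binary factor is 1 for pass and 0 for fail. The function name is hypothetical, and the 3.0 is the people-helped-per-dollar ratio from the cost-benefit example.

```python
from math import prod


def gated_score(ratio: float, binary_factors: list[int]) -> float:
    """Multiply a single non-binary number by any number of 0/1 factors."""
    return ratio * prod(binary_factors)


# The ratio survives only if every binary factor passes.
print(gated_score(3.0, [1, 1, 1]))  # 3.0
print(gated_score(3.0, [1, 0, 1]))  # 0.0
```

The result keeps the ratio's units when everything passes and collapses to zero otherwise, so there is no arbitrary number to interpret.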

Binary factors mean factors with only two possible values. Mathematically, the values are 1 and 0. Conceptually, they’re pass and fail.

Binary factors typically ask if something is good enough to meet some criterion; e.g., is this pet within my budget? Is this pet small enough for my kid to hold? Is this pet safe enough for my family? Is this pet cute in the opinions of all of my children? Binary factors can also consider if something fits in a category or not; e.g., is it black?

You can combine binary factors with logical operators to get new binary factors. A typical combiner is “and”; e.g., is it cheap enough and also cute enough? You can also combine binary factors with “or”; e.g., is it black or orange?

Multiplying binary factors is a way to check if any of them fail or not. If they all pass, the result of the multiplication is “pass” (1). If one or more fails, the result is “fail” (0). You can see this yourself by multiplying different combinations of 1s and 0s together and observing the results. $1 \times 1 \times 1 = 1$ but $1 \times 1 \times 0 = 0$. No number of 1s can make up for, or cancel out, a single 0. Like all the math in this article, this is uncontroversial.

Multiplying binary factors is equivalent to using the logical “and” operator on them. Thinking of it in terms of “and” can be helpful. Multiplying binary factors combines them by saying you want factor A to pass, and factor B to pass, and factor C to pass, and so on. The result is whether the whole combination passes or not, just like the logical “and” operator only outputs “true” if every input is true.
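The equivalence is easy to check exhaustively for a small number of factors. This sketch uses three factors, an arbitrary choice:

```python
from itertools import product
from math import prod

# Check every combination of three binary factors (0 = fail, 1 = pass).
for factors in product([0, 1], repeat=3):
    multiplied = prod(factors)
    anded = int(all(factors))
    assert multiplied == anded  # multiplication and "and" always agree
    print(factors, "->", multiplied)
```

Only the combination (1, 1, 1) prints 1; the other seven all print 0.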

The result of multiplying binary factors has multi-dimensional units. But the number in front of the units will always be 0 or 1, pass or fail. The problem of arbitrary numbers doesn’t come up. And the result tells us something meaningful and useful: did the factors pass in every dimension or not?

In this case, the multi-dimensional units are useful and appropriate. They’re basically a list telling us which factors passed. And they fit our goal because we come up with binary factors by relating non-binary factors to our goals (what is good enough for success).

So far, this solution works mathematically, and deals with many factors, which is an improvement over previous attempts. But it has a major limitation. It’s limited to binary dimensions, not any dimensions. What if we want to take into account many non-binary factors like cuteness, size, price or time?

So there’s a second part to the solution which makes it more useful. We need a method of converting from non-binary factors to binary factors. This is similar to the problem of converting between dimensions. In fact, technically, it does involve converting between dimensions. However, it’s much easier than converting between any two dimensions (like cuteness and weight). It’s basically a simplified special case of converting between dimensions.

Converting to Binary Factors

For any non-binary dimension, there are many related binary dimensions. They’re technically different dimensions, but they’re similar to the non-binary dimension. Converting to them isn’t like converting between arbitrary, separate dimensions.

A reason converting to a related binary dimension is easier is that it’s a one-way conversion rather than a symmetric conversion that works in either direction. It’s simplifying. It tries to convert to something simpler not something equivalent. The conversion to a related binary dimension loses information, rather than attempting to retain full information. You can’t convert back from “pass” or “fail” to the original value.

Converting from a non-binary dimension to a binary dimension is also easier because it doesn’t attempt to convert multiple factors into the same dimension. Each non-binary factor gets its own related binary dimension. There’s still no way to convert between any of the original factors. They remain separate. The goal of general conversion was to take fundamentally different things and treat them as the same. The goal of converting from non-binary to binary is to take something and find a simplified version of the same thing.

Here are some example binary dimensions related to the price dimension. “Is the price above $5?”, “Is the price above $6?”, “Is the price below $5?”, “Does the price fit my budget?”, “Am I satisfied by the price?” For each of these, you can actually convert from a non-binary price number, like $20, to an accurate answer in the binary dimension. It’s not like converting between price and cuteness where you have to make stuff up. There are objectively correct answers to these questions.
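These conversions are simple enough to write down directly. In the sketch below, the $5 and $6 thresholds and the $20 price come from the paragraph above; the $25 budget and all names are assumptions for illustration.

```python
BUDGET = 25.0  # hypothetical budget in dollars (an assumption)

# Each question maps a price to a yes/no answer.
price_questions = {
    "above $5": lambda price: price > 5,
    "above $6": lambda price: price > 6,
    "below $5": lambda price: price < 5,
    "fits my budget": lambda price: price <= BUDGET,
}


def to_binary(price: float) -> dict[str, int]:
    """Map one price to a 1 (yes) or 0 (no) answer for each question."""
    return {question: int(check(price))
            for question, check in price_questions.items()}


print(to_binary(20.0))
# {'above $5': 1, 'above $6': 1, 'below $5': 0, 'fits my budget': 1}

# Different prices can give identical answers, so the conversion is one-way.
print(to_binary(20.0) == to_binary(21.0))  # True
```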

You can convert from any dimension to many related binary dimensions by asking yes-or-no questions about the original dimension. The most important question is “Is it good enough?” and variations on that theme. Does this factor meet my needs and goals, or not? Am I satisfied, or not? Will the value of this factor cause failure at my goal, or not? Do I have a decisive criticism of this option, which is related to this factor, or not?

So decision making basically works like this: Take your goal(s) and consider what is necessary for success rather than failure. Then evaluate factors in a binary, pass/fail way based on whether they will result in failure or they’re good enough. Then multiply them together to get an overall, combined pass/fail result.
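Put together, the process might look like the following sketch. The pets, their attributes, and the thresholds are all made up; the criteria loosely follow the pet examples used earlier in this article.

```python
from math import prod

# Hypothetical pets and their attributes (all values are made up).
pets = {
    "turtle": {"price": 40, "weight_kg": 1.0, "kids_find_cute": False},
    "rabbit": {"price": 60, "weight_kg": 2.0, "kids_find_cute": True},
    "dog": {"price": 300, "weight_kg": 25.0, "kids_find_cute": True},
}

# Each sub-goal says what is good enough for success (assumed thresholds).
sub_goals = {
    "within budget": lambda pet: pet["price"] <= 100,
    "small enough to hold": lambda pet: pet["weight_kg"] <= 5.0,
    "cute to all the kids": lambda pet: pet["kids_find_cute"],
}


def evaluate(pet: dict) -> int:
    """Multiply the pass (1) or fail (0) result of every sub-goal."""
    return prod(int(check(pet)) for check in sub_goals.values())


results = {name: evaluate(pet) for name, pet in pets.items()}
print(results)  # {'turtle': 0, 'rabbit': 1, 'dog': 0}
```

No factor was converted into any other factor's units; each was only compared against its own threshold.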

Paradigm Shift

This method may sound weird or undesirable. It’s different than how people are used to thinking about multi-factor decision making. I ask you to give it a chance on the basis that the math works. That makes it worth consideration. There are no known general purpose alternatives with working math! The approach of adding different dimensions together works as an approximation in narrow contexts, but is fundamentally broken as a general method of thinking. It’s in widespread use for e.g. Consumer Reports product evaluations, college rankings, and car rankings, but it actually makes a poor approximation in all of those cases. Malcolm Gladwell wrote a nice article that explains concretely why college and car rankings don’t work well.

Besides math, my approach makes sense in terms of Critical Rationalism and Theory of Constraints. There is also some overlap with ideas in self-help and psychology literature, e.g. maximizers and satisficers.

Maximizers try to make a perfect decision. They want to optimize all the factors to get the best answer they can. This leads to perfectionism and other problems. Trying to add up all the factors into an overall evaluation is a maximizer style strategy. The addition approach says that if one factor changes a little bit, the overall evaluation should also change at least a little bit. In other words, every detail matters. Whenever you get a little bit of new information, you should update your conclusion to a new, better, more perfect conclusion. This attitude can be seen with Bayesians who think it’s rational and optimal to update on any new piece of evidence, rather than striving to design robust systems so that small differences don’t matter.

Satisficers (based on the words “satisfy” and “suffice”) try to come up with criteria for what is good enough to satisfy them. They come up with achievable goals and they will pick any solution which meets their goals. This is similar to the binary factors approach which focuses on success, as against failure, rather than perfect maximizing. In this approach, decisions have resilience. The data could change a little and it probably wouldn’t change the conclusion. Small changes to “pass” factors usually leave the conclusion as “pass” (and small changes to “fail” usually mean it still fails). This leads to optimizing important things and paying attention to big differences, but avoids perfectionist over-optimization of every minor detail.

When data (and arguments or reasoning) is mapped to pass or fail conclusions, there is a many-to-one mapping, which makes conclusions resilient: the data (and arguments or reasoning) can change and still reach the same conclusion. By contrast, with a one-to-one mapping of data (and arguments or reasoning) to conclusions, any change in the data requires a new conclusion. Attempting one-to-one mappings is typical of maximization approaches, including adding up many factors to get an overall conclusion (if any one factor goes up, the overall evaluation should also go up).
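The difference between the two mappings can be seen in a small sketch. The weights, breakpoints, and data points are assumptions chosen for illustration.

```python
def weighted_score(price: float, cuteness: float) -> float:
    """Weighted-sum evaluation; the equal weights are arbitrary assumptions."""
    return 0.5 * (100 - price) + 0.5 * cuteness


def passes(price: float, cuteness: float) -> int:
    """Binary evaluation against assumed breakpoints."""
    return int(price <= 80) * int(cuteness >= 60)


before = (50.0, 70.0)
after = (52.0, 70.0)  # the price rises slightly

# The weighted sum moves with every small change in the data.
print(weighted_score(*before), weighted_score(*after))  # 60.0 59.0

# The pass/fail conclusion stays the same.
print(passes(*before), passes(*after))  # 1 1
```

The conclusion from `passes` only changes when a factor crosses a breakpoint, which is what makes it resilient to small changes.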

There are even self-help books that talk about binary goals. How to Be an Imperfectionist by Stephen Guise says:

The secret to consistent success that compounds over time is to combine small goals with the binary mindset: one push-up a day = (binary) 1 = success. The binary mindset reframes what success and perfection are to you, and the small goal makes the target so easy it’s nearly impossible to resist.

The book has a section explaining what binary and analog are and how to use a “binary mindset”. In short, it’s impossible to do analog tasks perfectly, so perfectionists struggle. A binary mindset allows perfection (1 or pass is a perfect score), so perfectionists can be satisfied, happy and done. The book even talks about converting goals from binary to analog or vice versa. Understanding ideas like these can help make Critical Fallibilism more intuitive.


When we evaluate options, we consider many factors. Why? Because we have one or more goals and we want to choose an option which does well at our goals. We think the factors are relevant to our goals.

Factors should be sub-goals. Doing well on a factor is one part of your overall goal(s). If a factor wasn’t a sub-goal, then why would you want to take it into account? It wouldn’t matter to you.

Using binary factors means using binary sub-goals. We can view goals in terms of success or failure instead of amounts of success or amounts of goodness. With clear (sub-)goals that unambiguously define what constitutes success (pass) or not (fail), we can then evaluate options in a pass/fail way. Does the option pass every sub-goal? In other words, evaluate each option for whether it passes or fails at each factor (sub-goal), then combine the factors by multiplying to get an overall pass or fail result which tells you whether using the option will succeed at all your (sub-)goals or not.

An overall goal is a combination of (binary) sub-goals. You multiply them together to get the overall goal (also called a big-picture, final or combination goal). The overall goal is in multi-dimensional units involving every dimension you want to succeed in. An option, to meet that goal, should also be in multi-dimensional units – the same units. So to succeed at an eight-dimensional goal, we need an option involving eight factors, in eight dimensions, that we can multiply together to get an eight-dimensional result. It actually makes sense that the dimensions of an option we’re considering should match the dimensions of the goal we’re trying to achieve.

Philosophical Meaning of Binary Approach

If an idea passes at every factor or dimension, that means, in short, that there are no known errors with the option. We don’t have a decisive criticism of the option. It’s non-refuted rather than refuted. As best we know, it’s a viable solution to our problem(s). It survived criticism and still seems adequate to achieve our goal(s). We think this option will accomplish our purpose. If we use this option, we can expect success rather than failure. (These are high level statements to give you a preview of what Critical Fallibilism says, and what kinds of concepts it deals with. You may also notice connections with Critical Rationalism.)

Instead of assuming more goodness is better, we should learn to think in terms of how much is good enough and where the breakpoints are. Trying to get more goodness for every factor results in wasting time on local optima instead of only optimizing bottlenecks. In any stable, viable system, most factors have excess capacity and we don’t need more of them. (This paragraph briefly indicated how Theory of Constraints ideas are used in Critical Fallibilism.)

History of Binary Approach

So far, this article hasn’t fully explained the binary decision making method. If you think it’s incomplete, you’re right. There are more questions to answer about how and why it works. Those will be covered in future articles (and you can find some answers in my previous articles like Yes or No Philosophy Summary). I think it’ll be useful if I provide some background information before concluding.

I first developed this method by trying to understand and improve on Karl Popper’s philosophy, Critical Rationalism (CR). CR proposes an evolutionary epistemology instead of a justificationist one, and focuses on fallibility, criticism and error correction. It advocates using exclusively negative arguments, and criticizes positive arguments. It rejects induction and says we learn by critical discussion.

I wanted to understand more precisely what criticisms and errors were, and how to organize a critical discussion to be highly effective. CR says you can evaluate ideas based on many factors by looking only at criticism (not positive, supporting or justifying arguments) and then judging how well ideas stand up to criticism. I thought that was too vague.

My insight was that a criticism gives a reason why an idea is an error (or else it isn’t a valid criticism). I rejected partial, halfway criticism. A criticism should contradict the idea it’s criticizing. If it’s possible to accept both the criticism and the idea being criticized, then that’s vague and ineffective thinking. This is how I came to favor a binary approach. I only looked at it mathematically later.

I also wanted a way to reach clear, confident conclusions. Some fallibilists hesitate and hedge endlessly, but I didn’t want that. (I don’t think that would satisfy Popper either.)

I also connected my focus on decisive criticism with the way error correction works. Analog systems are fundamentally limited in their compatibility with error correction. Our computers are digital – and in particular binary – because it better enables error correction.

After I had developed this idea non-mathematically for years, as an improvement on CR, I found Eli Goldratt’s Theory of Constraints (TOC). TOC explains many thinking tools, some of which help with the binary approach to epistemology. TOC’s ideas about constraints and excess capacity were particularly helpful. Another connection is TOC’s way of using a pro/con list by ignoring the pros and trying to make every con a pass rather than fail. I mentioned above that that uses multiplication. Hopefully you can see now that it’s equivalent to evaluating every con as a 1 or 0 and then multiplying them together to get an overall score of 1 (go ahead and buy the boat) or 0 (do not buy the boat).

I looked at some self-help and psychology literature recently. It didn’t inspire my ideas, but it’s nice to find more lines of thinking that independently converge on some of the same conclusions, like the ideas about maximizers, satisficers, and using the binary mindset to help with perfectionism.

A major reason for developing this approach is that it applies to rational debate. When people try to add up strong and weak arguments and evidence, they are adding linearly weighted factors. Each argument or piece of evidence is a factor. Evidence and arguments are judged for strength, which means assigning them weights even if no numbers are used. Criticisms have negative weights. However, arguments are in different dimensions so they can’t be combined by adding. While studying philosophy, I rejected this approach to debate in favor of evaluating ideas as refuted or non-refuted and evaluating arguments as decisive or indecisive. Translating from epistemology to math provided another way to understand and explain the issue.


The purpose of this article has been to look at methods for evaluating complex options and to connect them to math. It has focused on how combining many factors from different dimensions is hard. Addition and multiplication are standard combining operations but both run into major difficulties. They don’t work as general purpose solutions. That motivates looking for alternatives. Experts have been aware of some difficulties but haven’t been able to find any good alternatives. The binary factor multiplication approach is a novel and promising alternative.






Totally disagree with your approach and your critique, but upvoted because I want to encourage more of this sort of work, and because people can benefit from going "huh?" and being confused about the questions you're raising.

Note: I only skimmed your post, might have missed something obvious.

Intuitively, Binary Factor Multiplication seems like a good way to robustly get to mediocrity -- like select for projects that have no red flags, but also nothing amazing. Whereas in many real-life consequentialist decisions you want to either do a (weighted) sum of factors, or multiply factors, or some combination of the two. My intuition is that this will more consistently lead people to do things that will roughly maximize their expected impact, instead of doing projects that are robustly good on a bunch of metrics but not actually great on any.

Making decisions by adding weighted factors involves non-arbitrarily converting between qualitatively different, incommensurable dimensions. That is impossible in the general case. It’s like adding seconds with grams, which requires a conversion ratio, like 3s:2g. I made that ratio up arbitrarily but no other numbers would be better.
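A rough illustration of why the arbitrary ratio matters, as a sketch only: the two options and their values are invented, and the second ratio is just another made-up number. The only thing taken from the text is the 3s:2g ratio, read here as 2/3 of a gram per second.

```python
def weighted_sum(option, grams_per_second):
    """Convert seconds into gram-equivalents using the ratio, then add the grams."""
    seconds, grams = option
    return seconds * grams_per_second + grams

# Hypothetical options as (seconds, grams); the values are made up.
option_a = (10, 2)
option_b = (4, 9)

def best(grams_per_second):
    """Return the label of the option with the higher combined score."""
    a = weighted_sum(option_a, grams_per_second)
    b = weighted_sum(option_b, grams_per_second)
    return "A" if a > b else "B"

best_at_3s_to_2g = best(2 / 3)  # the 3s:2g ratio from the text
best_at_1s_to_3g = best(3.0)    # another equally arbitrary ratio

print(best_at_3s_to_2g, best_at_1s_to_3g)
```

The winner changes with the ratio, so the ranking is only as meaningful as the conversion ratio itself.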

Decision making systems should be compared first by whether they work at all. Other issues, like how conveniently they avoid red flags or mediocrity, are secondary.

I think the discussion priorities should be, first, is there a flaw in the impossibility argument? Second, if the impossibility argument is accepted as plausible, we could discuss what’s actually going on when people appear to do something impossible.

You're absolutely right that you can't just combine different dimensions and magically get out a third qualitatively different dimension.

The way I would do it is to roughly assign utility to each point in the joint distribution. This may not be as simple as assigning individual utilities and adding the distributions together, because maybe what I care about is the average, or maybe two of the dimensions combine in a multiplicative manner for a subset of their range. So before I intuitively plot the multivariate distribution, I make some assumptions about how their utilities combine.

Often, it's not too complicated. Let's say I have a utility for the apple and a utility for money. The utility of me buying an apple is the utility of the apple minus the utility of the money it costs. Cost and apple-goodness don't have any complicated relationship in utility for me, so the subtraction works fine for most cases.

Other times, it's something like multiplicative. Like, uhh... The utilities over quantities of each ingredient required to make blueberry muffins, vs the utility over quantities of blueberry muffins itself. 

(sorry for example)

Has anyone written down the thing you're proposing in detail? I haven't seen it in MCDA or Bayesian literature before and a quick Google Scholar search didn't turn anything useful up. Does it have a name or some standard terms/keywords that I should search? Is there any particular thing you'd recommend reading?

Would you estimate what percentage of the EA community agrees with you and knows how to do this well?

Here's an attempt to restate what you said in terms that are closer to how I think. Do you understand and agree with this?

Convert every dimension being evaluated into a utility dimension. The article uses the term "goodness" instead of "utility" but they're the same concept.

When we only care about utility, dimensions are not relevantly qualitatively different. Each contains some amount of utility. Anything else contained in each dimension, which may be qualitatively different, is not relevant and doesn’t need to be converted. Information-losing conversions are OK as long as no relevant information is lost. Only information related to utility is relevant.

(So converting between qualitatively different dimensions is impossible in the general case, like the article says. But this is a big workaround which we can use whenever we’re dealing with utility, e.g. for ~all decision making.)

When the dimensions are approximately independent, it’s pretty easy because we can evaluate utility for one dimension at a time, then use addition.

When the dimensions aren't independent, then it may be complicated and hard.

Sometimes we should use multiplication instead of addition.
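To make the addition and multiplication cases in this restatement concrete, here is a small sketch. It assumes every dimension has already been converted to a utility number, which is the step under discussion; the function names, scales and values are hypothetical.

```python
def additive_utility(apple_goodness, price):
    """Independent dimensions: each one contributes separately, so they add."""
    return apple_goodness - price  # price counts as negative utility

def multiplicative_utility(flour_fraction, berry_fraction):
    """Interacting dimensions: each ingredient is a 0..1 fraction of what the
    recipe needs, and a zero in either one gives zero overall."""
    return flour_fraction * berry_fraction

# Made-up utility numbers for the apple example.
apple_purchase = additive_utility(apple_goodness=5, price=2)

# Made-up ingredient fractions for the muffin example.
no_berries = multiplicative_utility(flour_fraction=1.0, berry_fraction=0.0)
half_of_each = multiplicative_utility(flour_fraction=0.5, berry_fraction=0.5)

print(apple_purchase, no_berries, half_of_each)
```

In the additive case a low score on one dimension can be offset by a high score on another; in the multiplicative case it cannot.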

I may be misinterpreting something, but I think what Emrik has described is basically how generic multi-attribute utility instruments (MAUIs) are used by health economists in the calculation of QALYs.

For example, as described in this excellent overview, the EQ-5D questionnaire asks about 5 different dimensions of health, which are then valued in combination: 

  1. mobility (ability to walk about)
  2. self-care (ability to wash and dress yourself)
  3. usual activities (ability to work, study, do housework, engage in leisure activities, etc.)
  4. pain/discomfort
  5. anxiety/depression

Each level is scored 1 (no problems), 2 (moderate problems), or 3 (extreme problems). These scores are combined into a five-digit health state profile, e.g., 21232 means some problems walking about, no problems with self-care, some problems performing usual activities, extreme pain or discomfort, and moderate anxiety or depression. However, this number has no mathematical properties: 31111 is not necessarily better than 11112, as problems in one dimension may have a greater impact on quality of life than problems in another. Obtaining the weights for each health state, then, requires a valuation exercise.

Valuation methods

There are many ways of generating a value set (set of weights or utilities) for the health states described by a health utility instrument. (For reviews, see e.g., Brazier, Ratcliffe, et al., 2017 or Green, Brazier, & Deverill, 2000; they are also discussed further in Part 2.) The following five are the most common:

  • Time tradeoff: Respondents directly trade off duration and quality of life, by stating how much time in perfect health is equivalent to a fixed period in the target health state. For example, if they are indifferent between living 10 years with moderate pain or 8 years in perfect health, the weight for moderate pain (state 11121 in the EQ-5D-3L) is 0.8.
  • Standard gamble: Respondents trade off quality of life and risk of death, by choosing between a fixed period (e.g., 10 years) in the target health state and a “gamble” with two possible outcomes: the same period in perfect health, or immediate death. If they would be indifferent between the options when the gamble has a 20% probability of death, the weight is 0.8.
  • Discrete choice experiments: Respondents choose the “best” health state out of two (or sometimes three) options. Drawing on random utility theory, the location of the utilities on an interval scale is determined by the frequency each is chosen, e.g., if 55% of respondents say the first person is healthier than the second (and 45% the reverse), they are close together, whereas if the split is 80:20 they are far apart. This ordinal data then has to be anchored to 0 and 1; some ways of doing so are presented in Part 2. Less common ordinal methods include:
    • Ranking: Placing several health states in order of preference.
    • Best-worst scaling: Choosing the best and worst out of a selection of options.
  • Visual analog scale: Respondents mark the point on a thermometer-like scale, usually running from 0 (e.g., “the worst health you can imagine”) to 100 (e.g., “the best health you can imagine”), that they feel best represents the target health state. If they are also asked to place “dead” on the scale, a QALY value can be easily calculated. For example, with a score of 90/100 and a dead point of 20/100, the weight is (90-20)/(100-20) = 70/80 = 0.875.
  • Person tradeoff (previously called equivalence studies): Respondents trade off health (and/or life) across populations. For example, if they think an intervention that moves 500 people from the target state to perfect health for one year is as valuable as extending the life of 100 perfectly healthy people for a year, the QALY weight is 1 – (100/500) = 0.8.[13]
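The arithmetic in the quoted examples can be checked with a short sketch. The function and parameter names are hypothetical; the formulas and numbers are the ones given in the list above. Discrete choice experiments are left out because they need a statistical model rather than a one-line formula.

```python
def time_tradeoff(years_full_health, years_in_state):
    """Weight = equivalent years in perfect health / years in the target state."""
    return years_full_health / years_in_state

def standard_gamble(prob_death_at_indifference):
    """Weight = 1 - probability of death at which the respondent is indifferent."""
    return 1 - prob_death_at_indifference

def visual_analog(state_score, dead_score, best_score=100):
    """Rescale the state's score so that 'dead' maps to 0 and best health to 1."""
    return (state_score - dead_score) / (best_score - dead_score)

def person_tradeoff(people_life_extended, people_cured):
    """Weight = 1 - (people whose life is extended / people moved to full health)."""
    return 1 - people_life_extended / people_cured

tto = time_tradeoff(8, 10)      # 8 healthy years judged equal to 10 years in the state
sg = standard_gamble(0.20)      # indifferent at a 20% risk of death
vas = visual_analog(90, 20)     # state at 90/100, dead at 20/100
pto = person_tradeoff(100, 500)

print(tto, sg, vas, pto)
```

Each function only turns a respondent's stated answer into a weight; the judgment that produces the answer happens before any of this arithmetic.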

Thanks. I have seen similar valuation methods elsewhere which might interest you. 1000minds' Multi-Criteria Decision Analysis (MCDA/MCDM) article has a list of methods, with summaries, like Direct rating, Points allocation, SMART, SMARTER, AHP, etc.

So, when you have 31111, each number is in a separate dimension and there’s no problem so far. Then the valuation method handles the hard part. Each valuation method you quote (and many MCDM ones) has a common property: it relies on the intuition or judgment of a decision maker. The decision maker is asked to make comparisons involving multiple dimensions. But that doesn’t explain how to do it; it relies on people somehow doing it by unspecified methods and then stating answers. Does that make sense and do you see the problem? Do you think you know how decision makers can come up with the answers needed by these valuation methods?

Put another way, I read the valuation methods as attempts to make pre-existing knowledge more explicit and quantified. They assume a decision maker already knows some answers about how to value different dimensions against each other, rather than telling him how to do it. But I’m interested in how to get the knowledge in the first place.

Yes, that's an excellent reformulation of what I meant!

I think this roughly corresponds to how people do it intuitively in practice, but I doubt most people would be able to describe it in as much detail (or even be aware) if asked. But at least among people who read LessWrong, it's normal to talk about "assigning utility". The percentage among people who self-describe as EA who also think like this goes up the longer they've been in EA, and is maybe like 80% among people who attended an EAG in 2022. (Pure speculation; I haven't even been to an EAG.)

I don't know anywhere it's written like this, but it probably exists somewhere on LW and probably in academic literature as well. On second thought, I remembered Arbital probably has something good on it. Here.

"Yes, that's an excellent reformulation of what I meant!"

Great. Let’s try two things next.

First, do you think my solution could work? Do you think it’s merely inferior or is there something fully broken about it? I ask because there are few substantially different epistemologies that could work at all, so I think every one is valuable and should be investigated a lot. Maybe that point will make sense to you.

Second, I want to clarify how assigning utility works.

One model is: The utility function comes first. People can look at different outcomes and judge how much utility they have. People, in some sense, know what they want. Then, when making decisions, a major goal is trying to figure out which decisions will lead to higher utility outcomes. For complicated decisions, it’s not obvious how to get a good outcome, so you can e.g. break it down and evaluate factors separately. Simplifying, you might look at a correlation where many high utility outcomes have high scores for factor X (and you might also be able to explain why X is good), so then you’d be more inclined to make a decision that you think will get a lot of X, and in factor summing approaches you’d assign X a high weighting.

A different model is: Start with factors and then calculate utility. Utility is a downstream consequence of factors. Instead of knowing what has high utility by looking at it and judging how much you like it or something along those lines, you have to figure out what has high utility. You do that by figuring out e.g. what factors are good and then concluding that outcomes with a lot of those factors probably have high utility.

Does one of these models fit your thinking well?

I don't have time to engage very well, but I would say the first model you describe fits me better. I don't look at the world to figure out my terminal utilities (well, I do, but the world's information is used as a tool to figure out the parts of my brain which determine what I terminally want). Instead, there's something in my brain that determines how I tend to assign utility to outcomes, and then I can reason about the likely paths to those outcomes and make decisions. The paths that lead to outcomes I assign terminal utility to will have instrumental utility.

I haven't investigated this nearly as deeply as I would like, but supposedly there are some ways of using Aristotelian logic (although I don't know which kind) to derive probability theory and expected utility theory from more basic postulates. I would also look at whether any epistemology I'm considering is liable to be Dutch-booked, because I don't want to be Dutch-booked.

"I ask because there are few substantially different epistemologies that could work at all, so I think every one is valuable and should be investigated a lot."

Agreed. Though it depends on what your strengths are and what role you wish to play in the research-and-doing community. I think it's fine that a lot of people defer to others on the logical foundations of probability and utility, but I still think some of us should be investigating it and calling "foul!" if they've discovered something that needs a revolution. This could be especially usefwl for a relative "outsider"[1] to try. I doubt it'll succeed, but the expected utility of trying seems high. :p

  1. ^ I'm not making a strict delineation here, nor a value-judgment. I just mean that if someone is motivated to try to upend some popular EA/rationalist epistemology, then it might be easier to do that for someone who hasn't been deeply steeped in that paradigm already.

"I don't have time to engage very well"


"Agreed. Though it depends on what your strengths are and what role you wish to play in the research-and-doing community. I think it's fine that a lot of people defer to others on the logical foundations of probability and utility, but I still think some of us should be investigating it"

I agree. People can specialize in what works for them. Division of labor is reasonable.

That’s fine as long as there are some people working on the foundational research stuff and some of them are open to serious debate. I think EA has people doing that sort of research but I’m concerned that none of them are open to debate. So if they’re mistaken, there’s no good way for anyone who knows to fix it (and conversely, any would-be critic who is mistaken has no good way to receive criticism from EA and fix their own mistake).

To be fair, I don’t know of any Popperian experts who are very open to debate, either, besides me. I consider lack of willingness to debate a very large, widespread problem in the world.

I think working on that problem – poor openness to debate – might do more good than everything EA is currently doing. Better debates would e.g. improve science and could make a big difference to the replication crisis.

Another way better openness to debate would do good is: currently EA has a lot of high-effort, thoughtful arguments on topics like animal welfare, AI alignment, clean water, deworming, etc. Meanwhile, there are a bunch of charities, with a ton of money, which do significantly less effective (or even counter-productive) things and won’t listen, give counter arguments, or debate. Currently, EA tries to guide people to donate to better charities. It’d potentially be significantly higher leverage (conditional on ~being right) to debate the flawed charities and win, so then they change to use their money in better ways. I think many EAs would be very interested in participating in those debates; the thing blocking progress here is poor societal norms about debate and error correction. I think if EA’s own norms were much better on those topics, then it’d be in a better position to call out the problem, lead by example, and push for change in ways that many observers find rational and persuasive.

When I make a decision, I care more about how good the outcome of the decision is than how mathematically consistent my process for making the decision is. My decision making, in practice, is fuzzy, time constrained, and rarely formalized.

Do you think that the alternatives you discuss in this post are more likely to lead to quicker, better answers? Or is this post more just calling out the deep mathematical foundations of typical decision making processes, even if they’re fine to use in practice?

Disclaimer: didn’t read much of the post.

"Do you think that the alternatives you discuss in this post are more likely to lead to quicker, better answers?"


"Or is this post more just calling out the deep mathematical foundations of typical decision making processes, even if they’re fine to use in practice?"

I think the approaches I criticize do not work practically, at all, except by basically embedding a different, functional process within them (which people often do without realizing it, and which has major downsides).
