Invisible impact loss (and why we can be too error-averse)

Lizka

TL;DR: It’s hard to really feel sad about impact that doesn’t happen, but it’s easy to feel sad about mistakes in existing work.

This is bad because we sometimes have a choice between making a good-but-small thing or a big-but-slightly-worse thing, and the mismatch in our emotional reactions to the downsides of the two options turns into a cognitive bias that harms our decision-making. This can make us too error-averse and err too much on the side of just not doing things at all.

For instance, if I’m choosing between making five imperfect cookies or one perfect cookie, I might notice the imperfections more than the four missing cookies, whereas it’s often better to just make more imperfect cookies.

In some cases, we should be quite wary of mistakes. I outline where I think we’re too error-averse and where error-aversion is appropriate below.

Summary of my concrete suggestions:

To counteract fear of criticism, foster a culture of celebrating successes and exciting work.
To notice invisible impact loss, have the phrase “invisible impact loss” rattling inside your head, and try to quantify decisions that are prone to such errors.
Fight perfectionism; lean into agile project management, start half-assing, and try to determine what your real goals are.

This highly technical diagram shows 5 slightly lumpy jammie dodgers (a type of British cookie) contrasted with a single less-lumpy jammie dodger. If I were baking cookies, I might naturally try to spend more effort trying to make a really nice cookie and end up with the result on the right. After all, it’s easy to see the flaws in the cookies on the left, while the missing cookies on the right aren’t there — they’re invisible. But it’s probably better to just make more cookies, even if they’re lumpier, so that I can e.g. hand them out at the office. I’d be a bit embarrassed, but ultimately there’d be more value from the larger set of lumpier jammie dodgers.

Preamble: What sparked this thought

When I was on the Events Team at CEA, we had a conversation about whether to approximately double EA Global capacity at the last minute (a few weeks before the actual event) in order to admit more people who met the admissions bar. (This was the most recent conference for which capacity was a problem.)

I remember starting off skeptical about the idea. After all,

There would be more logistics issues; lunch lines would be long, etc.
We’d have to move to two venues, which is awkward. People would have to walk from one venue to another to get to a different meeting or session.
Doubling would put a lot of strain on the team, which would mean we’d be worse at responding to people, communicating things like the schedule or important COVID information, catching awesome ways to improve the conference…
And in general, the experience of the average attendee would probably be worse.

But someone else on the team made me realize a big thing I wasn’t fully tracking:

About 500 more people who met the admissions bar would get to experience the conference. However much impact the default conference would produce — sparked collaborations, connections for years to come, inspirations for new projects, positive career changes — there would be about double that.

Although the whole conversation was sparked by the idea that we had way more people we wanted to admit than slots for attendees, I wasn’t internalizing the benefit we would miss out on if we didn’t double. I couldn’t viscerally feel the loss of impact from a smaller but near-perfect conference the way I could feel my aversion to the various issues I could imagine with the bigger conference. The impact loss was invisible to me.

I now actively try to notice invisible impact loss. I’ve noticed discussions on the Forum that miss this, and worry that it’s stunting work that could be extremely valuable, so I decided to write this post.^[1]

Note: EA Global conferences have changed since then. You can see some discussion of this here. I should also note that the conference didn’t end up facing many of the issues I worried about while we were deciding whether to double.

What causes this phenomenon? How can we push back? (Suggestions)

Here are some factors I think are at play, and ideas for what we can do about them. (Please add to this in the comments!) See the summary at the top.

1. Fear of criticism^[2]

We’re much, much more likely to be criticized for doing something imperfectly than for not doing something.^[3] (In part because it’s unclear who to criticize for not doing something.) This makes people more averse to doing anything, as they’re worried about the potential for criticism.

What can we do about fear of criticism?

It’s great that we’re a community that is critical, but I think we could celebrate successes and exciting work more than we currently do. This could give people an incentive to do stuff, as they might expect to be appreciated for that work. I also think self-celebration (bragging) should be normalized more.^[4]

Concrete examples: more threads like this one, more generic comments of appreciation.
Relatedly, we can be more generous in our criticism (relevant resource, and a post on supportive scepticism). I think people in effective altruism already are more generous than is usual on the internet, but I’d guess that we can up our standards here.

(And we can individually reframe our attitude towards criticism. We can actively train ourselves to be receptive to criticism, to take criticism and process it productively. The standard advice is to view every criticism as an opportunity to grow in the object-level (improve that type of work), but I also think we can grow on a meta level — view it as an opportunity to build our criticism-response muscles.)

2. Invisible impact loss is hard to notice (it’s “invisible”!), but noticing flaws is easy

My suggestions:

When making decisions like whether to double a conference, just try hard to notice the invisible impact loss. Have the phrase “invisible impact loss” in the back of your mind.
Fermi estimates can be a really good grounding: just try to quantify the impact of the possibilities for every significant decision. Then you can try to account for the cost of errors and imperfections but also notice the missing value from invisible impact loss.
1. Note: you don’t need a stats degree for this!
2. Two examples:
  1. For the conference, we could have done something really rough: most of the value is in the attendees’ experiences, and most of the cost is their time. So let’s just consider the net benefit from the conference to/via every attendee (without considering e.g. the cost of the additional venue). Then, as long as the net value per attendee stays positive and doesn’t fall to half or less of what it would have been, doubling is worth it.
  2. In the (unrealistic) jammie dodgers case, suppose that most of what I care about is how happy people are eating the jammie dodgers. If a lumpy jammie dodger gives people 10 units of happiness while a less-lumpy jammie dodger gives people 12 units of happiness, then I’m comparing a total of 50 units of happiness (5 x 10) to 12 units of happiness (1 x 12).
3. Resources for Fermi estimates: intro to Fermi estimation, notes from my workshop on Fermi estimation, and the relevant Forum Wiki entry.
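
The two rough estimates above can be written out as a short calculation. This is only a sketch: the cookie counts and the 10 and 12 happiness units come from the example, while the function names and the normalised per-attendee value of 1.0 are assumptions made for illustration.

```python
def total_value(count, value_per_item):
    """Fermi-style total: number of items times the rough value of each."""
    return count * value_per_item


def doubling_is_worth_it(value_default, value_doubled):
    """Compare per-attendee net value with and without doubling attendance.

    Doubling wins when the per-attendee value stays positive and stays
    above half of what it would have been at the default size.
    """
    return value_doubled > 0 and 2 * value_doubled > value_default


# Jammie dodgers: 5 lumpy cookies at 10 units vs 1 less-lumpy cookie at 12 units.
lumpy_total = total_value(5, 10)
smooth_total = total_value(1, 12)
print(lumpy_total, smooth_total)

# Conference: per-attendee value at the default size is normalised to 1.0
# (an assumed placeholder, not a measured number).
small_drop = doubling_is_worth_it(1.0, 0.7)  # per-attendee value falls by 30%
large_drop = doubling_is_worth_it(1.0, 0.4)  # per-attendee value falls by 60%
print(small_drop, large_drop)
```

The point of writing it down is not precision. It just puts the missing cookies (or attendees) on the same page as the visible flaws.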

3. Perfectionism

If you’re anything like me, the idea of kicking off something with known flaws (or something that you know has flaws that you haven’t identified yet) is aversive.

A few things help me:

A ship-fast-and-iterate mentality that I’m developing mostly by working with people who are better than I am at this. The rough idea is that you should deliver something useful fast, even if it has flaws, which allows you to get feedback and identify and fix more flaws or otherwise improve the product. I think this is closely related to “agile” approaches (e.g. to software development) and “lean” principles, although I don’t know much about the theory.
Half-assing it with everything you’ve got and the rest of the Replacing Guilt series.
Deadlines and accountability mechanisms. The best way for me to make sure that I actually post something that I’ve written is to create a deadline for it.

When this reasoning doesn’t quite apply — reasons to be error-averse (and some suggestions)

You can find a related discussion here, and a talk on accidental negative impacts here.

Here are some cases when, given a decision between “a good project” and “an OK project that’s bigger or more ambitious,” it might be better to err on the side of the better or more polished (and less ambitious) project (do less, but better):

When the downside risks are high
1. Or the other costs of a low-quality version of the project are high
When you might prevent someone else from doing a better job (or when the unilateralist’s curse applies)
When this type of project often has long tails of success (that depend on quality)
When a key goal of the project is your own development and training

In general, I think that a broad way to mitigate this kind of risk is to get external feedback and seriously listen to it.

Note that sometimes more than one of these applies. Let’s run through these cases one by one.

(1) The downside risks are high

Example: Say you’re baking cookies, and one of your ingredients is poisonous if not prepared correctly.^[5] Then a more efficient and less cautious method of preparation might not be appropriate — even if it doubles your output. The downside risks here are high! You might kill someone. Better make one safe cookie than 10 potentially lethal cookies.
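
A rough way to see why the calculation flips when downside risks are high is to add an expected-harm term to the cookie estimate. This is a minimal sketch: the 10 and 12 happiness units are borrowed from the jammie dodgers example, while the harm probability and the size of the harm are made-up numbers chosen only to illustrate the shape of the argument.

```python
def expected_value(cookies, happiness_per_cookie, p_harm_per_cookie, harm_cost):
    """Expected total value when each cookie has some chance of causing harm."""
    return cookies * (happiness_per_cookie - p_harm_per_cookie * harm_cost)


HARM_COST = 1_000_000  # hypothetical cost of poisoning someone, in happiness units

# Careful method: one safe cookie, assumed zero chance of harm.
careful = expected_value(1, 12, 0.0, HARM_COST)

# Fast method: ten cookies, each with an assumed 0.1% chance of harm.
fast = expected_value(10, 10, 0.001, HARM_COST)

print(careful, fast)
```

With a large enough harm, even a small per-cookie risk swamps the extra output, which is the situation where error-aversion is appropriate.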

Downside risks are sometimes high in EA projects.^[6] Some reasons the downside risks of a project might be high:

There are information hazards in this area
The area is inherently dangerous; there are other serious risks from this kind of work, like poisoning someone
- In EA, this might apply if your project is physically dangerous, if you’re interacting with vulnerable people, like minors, or for some other reason.
You might significantly harm the reputation of something that relies on its reputation to succeed
- E.g. if you’re establishing a new kind of publication, and you publish a bunch of articles with basic grammar mistakes, the mistakes themselves may not matter much, but people might write the publication off as amateurish and unprofessional.
The project might suck up a bunch of valuable resources
- E.g. you know that your project will be appealing to people, you’re not sure it’s that valuable, and it’ll be costly either in time or money.
- A note: I’m not sure we should spend too much time worrying about this; I think we have OK systems for not dumping valuable resources into low-expected-impact projects, or at least generally shouldn’t expect them to outcompete high-expected-impact projects.
This is the first project of its kind, and you might lock in a bad norm or pattern (this is related to (2), preventing someone else from doing a better job)

Mitigation: check whether you’re operating in a risky area, or if any of the above applies. Also, get feedback from others (although act carefully around information hazards). If you’re not sure why no one has done a thing yet, take the possibility of risks or a unilateralist’s curse seriously.

(2) You might prevent someone else from doing a better job

Example: Say you’ll be talking to 10 important state officials in a variety of offices. You’re an expert on biosecurity and could focus on carefully explaining risks from pandemics to the two relevant officials, or you could prepare 10 pitches to the 10 officials about various issues relevant from an EA perspective, but you might do it poorly; you just don’t understand global health or AI safety very well. You might want to focus on the biosecurity pitches in case you say something incorrect about global health or AI that makes the relevant officials dismiss the concern as that of an amateur.

Another example: You’re starting the first EA group in a given area, and you don’t have much time (or expertise) to do it well. Someone else might start one soon, but won’t do it if you’ve already started it, and the group you start might be worse than their version.

Mitigation: In this case, it might be worth spending more time understanding the counterfactual; if you don’t do (a bigger version of) the project, will someone else? It might be better to coordinate.

(3) This type of project often has long tails of success (that depend on quality)

Example: You’re working on a book. Book success is long-tailed; most books don’t do very well, and some become wildly popular. [Disclaimer: this is something I’m pretty sure I remember reading about in a source I thought was trustworthy, but don’t have a quick source for.] You could write two ~OK books or one great book.^[7] You might want to just write the great book, as it’s likely to get more than double the readers!

Mitigation: seriously check that you have something that might be wildly successful,^[8] and if you notice that you do, go ahead with that (rather than spending energy on volume of projects).

(4) When a key goal of the project is your own development or training

Example: Say you’re working on one of your first research papers. Your basic goal is to get it published, and you think you can do that by half-assing. However, you also want to test your fit for research and learn great research practices. You won’t be able to properly learn and test your fit if you half-ass. Then it might make sense to go all-in and really focus on trying to make something excellent, even if you think the impact from this one paper is not that big.

Mitigation: Notice if you expect that most of the value of a given project is via your own development. In those cases, it might make sense to focus on making something excellent.

Closing thoughts

The summary of this post is at the top. Very briefly: notice invisible impact loss. Please argue with me and add your own examples in the comments!

Related links (most of which were mentioned)

Why we should err in both directions
- And related: Terminate deliberation based on resilience, not certainty
Half-assing it with everything you've got
Celebrations and gratitude thread
Accidental harm
EA and the current funding situation
[Edit] Some related content (will try to add as I see more):
- "Pandemic Ethics and Status Quo Risk" (Richard Y Chappell)

Thanks to everyone who gave me feedback on drafts of this post!

^[1]
I’m not the first person to talk about this, by a long shot. For instance, in a recent post, William MacAskill writes:
It seems to me to be more likely that we’ll fail by not being ambitious enough; by failing to take advantage of the situation we’re in, and simply not being able to use the resources we have for good ends.
It’s hard to internalise, intuitively, the loss from failing to do good things; the loss of value if, say, EA continued at its current giving levels, even though it ought to have scaled up more. For global health and development, the loss is clear and visceral: every year, people suffer and lives are lost. It’s harder to imagine for those concerned by existential risks. But one way to make the situation more vivid is to imagine you were in an “end of the world” movie with a clear and visible threat, like the incoming asteroid in Don’t Look Up. How would you act? For sure, you’d worry about doing the wrong thing. But the risk of failure by being unresponsive and simply not doing enough would probably weigh on you even harder.
^[2]
MacAskill again:
… there are asymmetric costs to trying to do big things versus being cautious. Compare: (i) How many times can you think of an organisation being criticised for not being effective enough? and (ii) How many times can you think of someone being criticised for not-founding an organisation that should have existed? (Or, suppose I hadn’t given a talk on earning to give at MIT in 2012, would anyone be berating me?) In general, you get public criticism for doing things and making mistakes, not for failing to do anything at all.
^[3]
Related: Action/inaction distinction (“doing vs allowing harm”)
^[4]
“Bragging” feels very unnatural to me, but I do think it can be useful for the reasons I list.
^[5]
I don’t really know why you’d make cookies with this, but let’s go with the example anyway. Here’s an example of such an ingredient — although again, I’m not really sure why you’d make cookies with it.
^[6]
You can find more discussion about this here.
^[7]
Although note that even with books, you might want to test some ideas first and see if you can hit on product-market fit. A source I’ve been told is useful on this topic is Write Useful Books.
^[8]
Related: How to choose an aptitude by Holden Karnofsky


Comments (15)

Kat Woods · 2y · 20

I couldn't agree more with this post. I've been referring to it in my circles as the "risks of inaction" and "leaving impact on the table", if any of those terms resonate more with people.

Will MacAskill also mentioned in a post once the "bureaucrat's curse", which I love. It's the inverse of the unilateralist's curse, where if just one person doesn't like the idea, it gets killed.

I see this everywhere, especially in longtermism. The fear of accidentally making things worse (which is a warranted fear!) overshadows the fear of accidentally moving too slowly.

If you're on a bus hurtling towards a cliff, instinctively acting in a panic can make things worse, but moving too slowly or not at all also leads to high downsides.

Lizka · 2y · 14

Thank you!

I also really like the phrase bureaucrat's curse. Here's the relevant passage (in this post):

As well as the unilateralist’s curse (where the most optimistic decision-maker determines what happens), there’s a risk of falling into what we could call the bureaucrat’s curse,^[10] where everyone has a veto over the actions of others; in such a situation, if everyone follows their own best-guesses, then the most pessimistic decision-maker determines what happens. I’ve certainly seen something closer to the bureaucrat’s curse in play: if you’re getting feedback on your plans, and one person voices strong objections, it feels irresponsible to go ahead anyway, even in cases where you should. At its worst, I’ve seen the idea of unilateralism taken as a reason against competition within the EA ecosystem, as if all EA organisations should be monopolies.

(In a comment, Linch points out that this is a special case of the unilateralist's curse.) I also really like the suggestions below the cited passage — on what we need to do or keep doing to manage risks properly:

Stay in constant communication about our plans with others, inside and outside of the EA community, who have similar aims to do the most good they can
Remember that, in the standard solution to the unilateralist’s dilemma, it’s the median view that’s the right (rather than the most optimistic or most pessimistic view)
Are highly willing to course-correct in response to feedback

(In writing, I think there's something somewhat related to the bureaucrat's curse, which is writing-by-committee, or what Stephen Clare called "death by feedback".)

Habryka [Deactivated] · 2y · 16

While I agree with a lot in this post, I do want to push back on this reasoning:

About 500 more people who met the admissions bar would get to experience the conference. However much impact the default conference would produce — sparked collaborations, connections for years to come, inspirations for new projects, positive career changes — there would be about double that

I think an estimate of "double that" is pretty wrong. I think the first 500 people who would be admitted would of course be selected for getting value out of the conference, and I expect the value that different people gain to be heavy-tailed. It is hard to predict who exactly will get value out of a conference, but it wouldn't surprise me if you get to a state where you capture 90% of the value by admitting the right 500 people.

On the other hand, I think a conference might produce value in the square of the number of participants, since people can self-sort, and meeting more people is more valuable than meeting fewer people.

I think in one line of reasoning you get something like "a conference twice the size would be maybe 10-20% more valuable" and in the other line of reasoning you get "a conference twice the size could be 4x as valuable", but I don't have any line of reasoning I endorse that outputs the 2x number.

Lizka · 2y · 4

Thanks for the pushback. I agree that a linear model will be importantly wrong, although if you approximate the impact from the conference using the number of connections people report and assume that stays roughly the same, it doesn't seem wild as a first pass. (Please let me know if you disagree!)

[Half-formed thoughts below.]

On the other hand, I think 10-20% more valuable seems very off to me, especially in this case, given we were not "lowering the bar" for the second group of attendees. Setting this case aside, I can imagine a world in which someone is very confident in their ability to admit the people who will benefit the most from a conference (and the people who would be most useful for them to meet with), and in this world, you might be able to get 90% of the value with 50% of the size — but I don't really think we're in this world (especially in terms of identifying people who will benefit most from the event).

I'm not really sure how well people self-sort at conferences, which was a big uncertainty for me when I was thinking about these things more. I do think people will often identify (often with help) some of the people with whom it would be most useful to meet. If people are good at self-sorting (e.g. searching through swapcard and finding the most promising 10-15 meetings), and if those most-useful meetings over the whole conference aren't somehow concentrated on meetings with a small number of nodes, then admitting double the people will likely lead to more than double the impact.^[1] If people are not good at self-sorting, though, it seems more likely that we'd get closer to straightforward doubling, I think. (I'm fairly confident that people are better than random, though.)

^[1]
It does seem possible that there are some "nodes" in the network — at a very bad first pass, you could imagine that everyone's most valuable meetings are with the speakers. The speakers each meet with lots of people (say, they have lots of time and don't get tired) and would be at the conference in any world (doubling or not). Then the addition of 500 extra people doesn't significantly improve the set of possible meetings for the 500 first attendees, although 500 extra people get to meet with the speakers (which is nearly all that matters in this model).
I'm really unsure about the extent to which the "nodes" thing is true (and if it's true I don't really think that "speakers" are the right group), but there's something here that seems like it could be right given what we hear. There's also the added nuance that some nodes are probably in the second group of 500, and also that the size and capacity for meetings of the "nodes" group would matter.

Habryka [Deactivated] · 2y · 2

Hmm, I do really think there is a very wrong intuition here. I think by default in most situations the return to doubling a specific resource should be modeled as logarithmic (i.e. the first doubling is as valuable as the second doubling). I think in this model, it is very rare that doubling a thing along any specific dimension produces twice the value. I think the value of marginally more people in EA should likely also be modeled as having logarithmic returns (or I might argue worse than logarithmic returns, but I think logarithmic is the right prior).

I think you will get estimates wrong by many orders of magnitude if you do reasoning of the type "if I just double this resource that will double the whole value of the event", unless you have a strong argument for network effects.
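
To make the competing models in this thread concrete, here is a small illustrative comparison of how much value each one assigns to doubling a 500-person conference. It is a sketch under loud assumptions: the three value functions (proportional to n, to n squared, and to log n with n counted in people) are stand-ins for the linear, network-effect, and logarithmic intuitions discussed above, not anyone's endorsed model, and the logarithmic ratio in particular depends on the arbitrary choice of units.

```python
import math


def doubling_ratio(value_fn, n):
    """How many times more valuable an event of size 2n is than one of size n."""
    return value_fn(2 * n) / value_fn(n)


# Hypothetical value functions standing in for the three intuitions.
models = {
    "linear": lambda n: n,
    "quadratic": lambda n: n ** 2,
    "logarithmic": lambda n: math.log(n),
}

ratios = {name: doubling_ratio(fn, 500) for name, fn in models.items()}
for name, ratio in ratios.items():
    print(f"{name}: doubling multiplies value by about {ratio:.2f}")

# Under logarithmic returns, each doubling adds the same absolute amount.
first_doubling_gain = math.log(1000) - math.log(500)
second_doubling_gain = math.log(2000) - math.log(1000)
```

Which of these is closest to reality is exactly what the thread is debating; the code only shows how far apart the answers are.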

Linch · 2y · 4

I wonder how much of your intuition comes from thinking that marginal (ex ante) impact of marginal EAG attendees is much lower than the existing average, vs normal logarithmic prior considerations vs how much of it comes from diseases of scale (e.g. higher population making things harder to coordinate, pressure towards conformity).

The first consideration is especially interesting to isolate, since:

I think the value of marginally more people in EA should likely also be modeled as having logarithmic returns (or I might argue worse than logarithmic returns, but I think logarithmic is the right prior)

If you think doubling the quality-adjusted people in EA overall has logarithmic returns, you still get ~linear effects from doubling the output of one event or outreach project, since differentiable functions are locally linear.
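
The local-linearity point can be checked numerically. This is a minimal sketch with hypothetical numbers: a community of 10,000 quality-adjusted people and a single event that adds 50 are assumptions picked only to show the effect, and the logarithmic value function is the prior under discussion rather than an established model.

```python
import math


def log_value(people):
    """Assumed logarithmic value of a community with this many people."""
    return math.log(people)


community = 10_000   # hypothetical community size
event_effect = 50    # hypothetical number of people one event adds

gain_single = log_value(community + event_effect) - log_value(community)
gain_double = log_value(community + 2 * event_effect) - log_value(community)

# Close to 2: doubling a small project roughly doubles its marginal value,
# even though the overall returns to community size are logarithmic.
ratio = gain_double / gain_single
print(ratio)
```

The approximation gets worse as the project becomes a larger fraction of the whole, which is where the logarithmic curve starts to bend.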

MichaelStJules · 2y · 2

I would say the square in the number of participants is too extreme, since the average attendee probably wouldn’t meet many more people than otherwise, except for those who wouldn't have gotten to attend at all.

(EDIT, nvm this bit; I don’t know how to strike it out via mobile: Plus, because people are going to meet based on interests, if you were thinking about the number of possible meetings, I think it would be better to think about it like multiple cliques doubling in size than a single large clique doubling in size, or something more complicated.)

The first 500 (except those hiring?) probably wouldn't get much more out of it, since it at most only slightly adds to who they might have met in terms of counterfactual value, and they might even get less, since they need to compete with the new 500 over meetings with the first 500. Then the next 500 at least get to meet each other, but also the first 500, and especially the (I assume) roughly fixed number of organizations that are hiring.

The first 500 is also plausibly made up of many people who are already largely in contact with one another because they work at EA-related orgs, either the same org, or orgs working in the same area who review each other's work, strategize together or collaborate.

Around 2x seems plausible to me, but my best guess is less than 2x.

Habryka [Deactivated] · 2y · 8

Sorry, I think you could arrive at 2x for Bayesian reasons (like weighing multiple models), but I just wanted to push back on the model that an event with twice as many attendees should be straightforwardly modeled as twice as valuable.

MichaelStJules · 2y · 2

I agree that it's not straightforward that a linear model is approximately correct. I do think a linear model could still be approximately correct for straightforward linear reasons, like the value being roughly proportional to the number of one-on-ones, though, and not just because you weighed multiple models together and it happened to come out to about 2x.

Kat Woods · 2y · 9

Potential way to feel better about fear of EA criticism:

Fear of criticism is indeed a huge source of this problem. One thing I've found recently that's improved my ability to weather EA criticism online is to think of the comment section not as a comment section, but as the debate section.

Then it's not that the post is being criticised, but that it's being debated, which is something I enjoy and appreciate.

Maybe this framing will help others, but open for debate on it :)

Emrik · 2y · 4

Furthermore, the quality distribution of jammie dodgers is arguably fat-tailed.^[1] If by many examples you've trained your intuition about what "good cookies" look like, you're most likely still sampling near the median part of the distribution. The very best might be very different. What you naively perceive as "lumpy"--a trait you rarely see in "good cookies" so you instead grab another one--might in fact be part of the unusual character that takes it into the very best category.^[2] After all, you should expect the extreme outliers to be different in some unusual way compared to the merely good outliers you've trained your intuitions on. I always eat the ones that don't fit in.

^[1]
Although more realistically the distribution has several peaks due to recipe variation and baker idiosyncrasies.
^[2]
Sensitivity over specificity for non-poisonous cookie-distributions! Not only because, as you say, flaws are easier to notice than hitherto-unknowable outlier winning-traits, but also because flaws are less consequential in lower-bounded distributions.

Nathan Young · 2y · 3

With this in mind, can I edit the wiki to try and make it a much better resource? I think it's basically dead and you don't lose much by risking this, but there is a lot of upside if I'm right.

I know I pester about this, but I think that this is the topic of the post. You have the chance to either permit or deny a change of norms with low downside and high upside. I suggest you take the risk.

Denis · 2y · 1

This is a great and very insightful post (and some good comments too). Definitely time well spent reading all of this.

One tragic example of this that we see every day is homelessness. I presume I'm not the only person living in a city where there are people sleeping outside even during the winter, while the city council is spending millions and many years renovating existing buildings to make them meet exacting standards so that they can eventually house these people.

And it feels like we need someone to go in there and say "listen, living in a substandard apartment, even one that doesn't meet the requirements for fire-safety or accessibility is still far, far safer and better than living on the street when it's below freezing and there are criminals and addicts likely to attack you." But I imagine that if I'm a city bureaucrat, the reward-system facing me is that if I house 10 people and they are all happy, I get a good score, but if I house 100 and one of them suffers an accident due to a flaw in the building, I will likely get fired - an evaluation that fails to factor in what would happen to the other 90 people if they spent the winter sleeping on the street.

At the same time, they can also go too far in the other direction (where impact loss doesn't apply), for example providing hostels which offer no safety or privacy, so that many people actually choose to stay on the street.

I don't want to over-simplify, I'm not an expert on this area. Nor do I want to criticise the many people who I'm sure are doing everything they can to help, including many public servants. I just think it's one of many areas where Lizka's reasoning and approach would be very valuable, and it feels like it's not happening.

Brad West🔸 · 2y · 1

I think this reasoning applies to the initial funding for new, neglected areas. It often appears that grantmakers are evaluating a project in absolute terms as to whether they think it is likely to succeed.

But exploration and discovery costs are often a pittance compared to potential impact of new ideas, and when the potential for exploitation of promising interventions is incorporated, we should definitely be more risk-seeking with the resources we deploy as a community.

The greatest EV fund distributor may very well be one with many duds and perhaps we should be wary of incentivizing funds that have a bunch of good outcomes. You hear on 80k and on many other sources that we should be risk-neutral re altruistic projects, but this neutrality depends on institutions that will enable new ideas.

Vasco Grilo🔸 · 2y · 1

Thanks for writing this!

Related to minimising downside, I like 80,000 Hours' article Ways people trying to do good accidentally make things worse, and how to avoid them.