
Or: "Underrated/overrated" discourse is itself overrated.

BLUF: "X is overrated", "Y is neglected", "Z is a weaker argument than people think", are all species of second-order evaluations: we are not directly offering an assessment of X, Y, or Z, but do so indirectly by suggesting another assessment, offered by someone else, needs correcting up or down.

I recommend everyone cut this habit down ~90% in aggregate for topics they deem important, replacing the great majority of second-order evaluations with first-order evaluations. Rather than saying whether you think X is over/under rated (etc.) just try and say how good you think X is.

The perils of second-order evaluation

Suppose I say "I think forecasting is underrated". Presumably I mean something like:

  1. I think forecasting should be rated this highly (e.g. 8/10 or whatever)
  2. I think others rate forecasting lower than this (e.g. 5/10 on average or whatever)
  3. So I think others are not rating forecasting highly enough.

Yet whether "Forecasting is underrated" is true or not depends on more than just "how good is forecasting?" It is confounded by questions of which 'others' I have in mind, and what their views actually are. E.g.:

  • Maybe you disagree with me - you think forecasting is overrated - but it turns out we basically agree on how good forecasting is. Our apparent disagreement arises because you happen to hang out in more pro-forecasting environments than I do.
  • Or maybe we hang out in similar circles, but we disagree in how to assess the prevailing vibes. We basically agree on how good forecasting is, but differ on what our mutual friends tend to really think about it.
  • (Obviously, you could also get specious agreement of the two-wrongs-make-a-right variety: you agree with me that forecasting is underrated despite having a much lower opinion of it than I do, because you assess third parties as having an even lower opinion still.)

These are confounders as they confuse the issue we (usually) care about: how good or bad forecasting is, not the inaccuracy of others nor in which direction they err re. how good they think forecasting is.  
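A toy sketch of these confounders, assuming each speaker can be reduced to two made-up numbers (their own rating and their guess at what 'others' think). The names `Speaker`, `own_rating`, and `perceived_others` are hypothetical and purely illustrative:

```python
from dataclasses import dataclass


@dataclass
class Speaker:
    name: str
    own_rating: float        # first-order view: how good I think forecasting is (0-10)
    perceived_others: float  # my guess at how 'others' rate it (0-10)

    def second_order_verdict(self) -> str:
        """Return the over/underrated claim this speaker would make."""
        if self.own_rating > self.perceived_others:
            return "underrated"
        if self.own_rating < self.perceived_others:
            return "overrated"
        return "rated about right"


# Alice and Bob agree on forecasting itself (8/10) but move in different circles.
alice = Speaker("Alice", own_rating=8, perceived_others=5)
bob = Speaker("Bob", own_rating=8, perceived_others=9)
# Carol thinks much less of forecasting, but believes others think less still.
carol = Speaker("Carol", own_rating=4, perceived_others=2)

for s in (alice, bob, carol):
    print(f"{s.name}: rates it {s.own_rating}/10, says it is {s.second_order_verdict()}")
```

Alice and Bob give opposite verdicts despite identical first-order ratings, while Alice and Carol give the same verdict despite very different ones. Comparing `own_rating` directly would have surfaced the real agreement and disagreement.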

One can cut through this murk by just assessing the substantive issue directly. I offer my take on how good forecasting is: if folks agree with me, it seems people generally weren't over- or under-rating forecasting after all. If folks disagree, we can figure out - in the course of figuring out how good forecasting is - whether one of us is over/under rating it versus the balance of reason, not versus some poorly scribed subset of prevailing opinion. No phantom third parties to the conversation are needed for - or helpful to - this exercise.

In praise of (kind-of) objectivity, precision, and concreteness

This is easier said than done. In the forecasting illustration above, I stipulated 'marks out of ten' as an assessment of the 'true value'. This is still vague: if I say forecasting is '8/10', that could mean a wide variety of things - including basically agreeing with you despite you giving a different number to me. What makes something 8/10 versus 7/10 here?

It is still a step in the right direction. Although my '8/10' might be essentially the same as your '7/10', there is probably some substantive difference between 8/10 and 5/10, or 4/10 and 6/10. It is still better than second-order evaluation, which adds another source of vagueness: although saying for myself forecasting is X/10 is tricky, it is still harder to do this exercise on someone else's (or everyone else's) behalf.

And we need not stop there. Rather than some singular measure like 'marks out of 10' for 'forecasting' as a whole, maybe we have some specific evaluation or recommendation in mind. Perhaps: "Most members of the EA community should have a Metaculus or Good Judgement account they forecast on regularly", or "Forecasting interventions are the best opportunities in the improving institutional decision-making cause area", or "Forecasting should pay well enough that skilled practitioners can realistically 'go pro', vs. it remaining universally an amateur sport". Or whatever else.

We thus approach substantive propositions (or proposals), and can avoid a mire of a purely verbal disagreement - or vaguely adversarial vibing.

Caveats

(Tl;dr: I'm right.)

Sometimes things aren't that ambiguous

The risk I highlight of 'Alice thinks X is overrated, Bob thinks it is underrated - but they basically agree on X, and only disagree on what other people think about it' can sometimes be remote. One example is if someone has taken the trouble to clearly and precisely spell out where they stand themselves. Just saying "I'd take the over/under on what they think" could be poor epistemic sportsmanship (all too easy to criticise something specific whilst sheltering in generalities yourself), and could do with being more precise (how much over? etc.) but at least there is an actual difference, and you can be reliably placed in a region on the number line.

Another example is where you are really sure you are an outlier vs. ~ everyone else: you rate something so highly or lowly that ~ everyone else - whoever they are - is under/overrating it by your lights. This will typically be reserved for one's hottest, most extreme, and iconoclastic takes. In principle, this should be rare. In practice, it can be the prelude to verbal clickbait: "looking after your kids is overrated" better be elaborated with something at least as spicy as Caplan's views on parenting, rather than some milquetoast climbdown along the lines of 'parents should take care of themselves too' or whatever.

Even here, trying to say how much can be clearer if your view really is 'a hell of a lot'. "Buffy the Vampire Slayer is criminally underrated" could merely mean I place it a cut above other ~naughties TV serials. Yet if I really think things like, "Season 5 of Buffy alone places it on the highest summits of artistic achievement, and the work as a whole makes a similar contribution to television as Beethoven's Grosse Fuge does to classical music" I should say so, such that listeners are clear which ballpark I am in, and how far I am departing from common sense.

Updates and pricing in

Overrated/underrated can have a different goal than offering an overall assessment. It could instead be a means of introducing a new argument for or against X. E.g. perhaps what I could mean by 'forecasting is underrated' is something like "I have found a new consideration in favour of forecasting, so folks - who are not aware of it yet - need to update upwards from wherever they were beforehand."

This is better, but still not great. (E.g.) "X is underrated because R" at least gives a locus for discussion (R? ¬R?), but second-order considerations can still confound. Although R may be novel to the speaker, others may at least be dimly aware of it, or some R* nearby to it, so perhaps they have already somewhat 'priced in' R for the all things considered assessment. "I think the strength of R pro/con X is under/overestimated by others" has the familiar problems outlined above.

Saying how much - the now familiar remedy - remains effective. (E.g.) "I think R drops the value of X by 5%/50%/99%" or whatever clearly signals the strength of consideration you are assigning to R, and sidesteps issues of trying to assess whether someone else (in the conversation or not) is aware of, or is appropriately incorporating, R into their deliberations.
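As a rough illustration of why the magnitude matters, here is a minimal sketch assuming (purely for the example) that X starts at a baseline value of 100 arbitrary units and that R acts as a simple multiplicative discount. The function name `value_after` and the baseline figure are hypothetical:

```python
def value_after(baseline: float, fractional_drop: float) -> float:
    """Value of X once consideration R removes `fractional_drop` of it."""
    if not 0.0 <= fractional_drop <= 1.0:
        raise ValueError("fractional_drop must be between 0 and 1")
    return baseline * (1.0 - fractional_drop)


BASELINE = 100.0  # arbitrary units, assumed only for illustration

for drop in (0.05, 0.50, 0.99):
    remaining = value_after(BASELINE, drop)
    print(f"R drops X by {drop:.0%}: {remaining:.1f} units remain")
```

All three speakers could truthfully say "X is overrated because R", yet they leave X at roughly 95, 50, and 1 units respectively: three very different positions hiding behind the same directional claim.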

Cadenza

As before, this greater precision is not a free lunch: it takes both more space on the page to write and more time in the brain to think through. Also as before, there are times when this extra effort is a waste. If I assert "Taylor Swift is overrated" to my sister, and she asserts "Bach is overrated [sic][1]" in turn, neither does the subject matter warrant - nor is the conversational purpose well-served by - a careful pseudo-quantitative quasi-objective disquisition into the musical merit of each. Low-res 'Less/more than someone thinks' remarks are also fine for a bunch of other circumstances. Usually unimportant ones.

Yet also as before, sometimes there is a real matter which really matters, sometimes we want our words to amount to substantial work not idle talk, and sometimes we at least aspire to be serious people striving to say something serious about something serious. For such Xs, disagreement is rarely about whether a given issue is relevant to X, or whether its direction is 'pro' or 'con' X, but rather about its magnitude: how much it counts 'pro' or 'con' X, and so where the overall balance of reason lies re. X all things considered, where the things to be considered are all various degrees of 'kinda, but...', which need to be weighed together.[2]

In these cases that count, something like counting needs to be attempted in natural language, despite its inadequacy for the task. Yet although (e.g.) "8/10", "maybe this cuts 20% off the overall value of X" (etc.) remain imperfect, more/less statements versus some usually vague comparator are even worse. Simply put: underrated/overrated is a peregrination, not a prolegomenon, for the project of proper precisification.[3]

Reality is concrete; its machinations, exact. When it is important to talk about it, our words should try to be the same.

  1. [sic]

  2. Cf. my previously expressed (and still maintained) allergy towards 'crux' 'cruxy', etc.

  3. Peccavi

Comments (10)

It's somewhat striking that you frame your top-level advice as a comparative:

I recommend everyone cut this habit down ~90% in aggregate for topics they deem important, replacing the great majority of second-order evaluations with first-order evaluations.

People surely differ in their current behaviour, and need different adjustments. So why not simply specify what you think the optimal ratio of first- to second-order evaluations is?

My take: not infrequently, as here, comparatives are more precise than first-order evaluations.

You're calling attention to a dimension that people may not have thought about much, and certainly don't have established metrics for. If you said "people should be 9/10 on the use of first-order evaluations and 3/10 on the use of second-order evaluations", you don't know how people will interpret that. It's well within the realm of possibility that some readers will nod along and say "yes that's how I do things already", even when you would assess their actions quite differently.

By using a comparative, you get the benefit of a common reference point -- how much things are already being done. People will have a sense of this even if they don't know how to measure it. You get to specify that people should cut it down by 90%, which is concrete and can surface disagreements.

I do happen to think you're quite wrong in suggesting cutting it down by ~90%, although I agree with the directional nudge vs current practice. I guess that at the moment second-order comparisons comprise the large majority of communication, and it would be better if they comprised a slightly smaller majority -- perhaps tripling the amount of use first-order evaluations get.

Hi Owen,

My interpretation is that Gregory is arguing for greater precision in comparative statements, rather than arguing against comparisons in general.

I feel that often saying X is overrated/underrated is a lazy way for people (including me sometimes) to increase/decrease X's status without making the effort to state concretely their position on X (which opens them up to more criticism and might require introspection and more careful reasoning rather than purely evaluating vibes).

As an example, could you give X/10 ratings to the idea of relative and absolute ratings?

I am glad somebody wrote this post. I often have the inclination to write posts like these, but I feel like advice like this is sometimes good and sometimes bad and it would be disingenuous for me to stake out a claim in any direction. Nonetheless, I think it’s a good mental exercise to explicitly state the downsides of comparative claims and the upsides of absolute claims, and then people in the comments will (and have) assuredly explain the opposite.

Interesting take. I don't like it. 

Perhaps because I like saying overrated/underrated.

But also because overrated/underrated is a quick way to provide information. "Forecasting is underrated by the population at large" is much easier to think of than "forecasting is probably rated 4/10 by the population at large and should be rated 6/10"

Over/underrated requires about 3 mental queries, "Is it better or worse than my ingroup thinks" "Is it better or worse than my ingroup thinks?" "Am I gonna have to be clear about what I mean?"

Scoring the current and desired status of something requires about 20 queries "Is 4 fair?" "Is 5 fair" "What axis am I rating on?" "Popularity?" "If I score it a 4 will people think I'm crazy?"...

Like in some sense you're right that % forecasts are more useful than "More likely/less likely" and sizes are better than "bigger/smaller" but when dealing with intangibles like status I think it's pretty costly to calculate some status number, so I do the cheaper thing.

 

Also would you prefer people used over/underrated less or would you prefer the people who use over/underrated spoke less? Because I would guess that some chunk of those 50ish karma are from people who don't like the vibe rather than some epistemic thing. And if that's the case, I think we should have a different discussion.

I guess I think that might come from a frustration around jargon or rationalists in general. And I'm pretty happy to try and broaden my answer from over/underrated - just as I would if someone asked me how big a star was and I said "bigger than an elephant". But it's worth noting it's a bandwidth thing and often used because giving exact sizes in status is hard. Perhaps we should have numbers and words for it, but we don't.

I agree that "underrated/overrated" or similar directional commentary is often a better way to convey information. Not least because the directional comment sometimes is information (e.g. there's a source of systematic error which biases the results) whereas an attempt to estimate a magnitude of the adjustment necessary is just a guess. And using vague verbal qualifiers (x is very large, the error is minimal) instead of a made-up figure much more accurately conveys that something is opinion or methodological critique rather than new data.

Using an actual figure where it exists is obviously good epistemics, but use of guesstimates risks anchoring truth-seekers to your guesses. Setting the expectation that anyone who participates must supply numbers is worse, as it sets a high bar to commentary (really I should be able to say a field is "neglected" without specifying how much funding it deserves and how it should be spent!) and can be used to insulate from criticism. "If you think I've inflated my outlying estimate you should tell me exactly how much you think each figure should be so I can attack your lack of evidence instead" seems like a more problematic rhetorical technique than understating just how extreme your enthusiasm for something is in order to help reach consensus.

Also would you prefer people used over/underrated less or would you prefer the people who use over/underrated spoke less? Because I would guess that some chunk of those 50ish karma are from people who don't like the vibe rather than some epistemic thing. And if that's the case, I think we should have a different discussion.

I guess I think that might come from a frustration around jargon or rationalists in general

As an outsider (other outside perspectives exist!) I'd say there's probably more frustration with rationalists/EAs often appearing to like the vibe of artificially precise numerical claims about things which are weakly evidenced or completely subjective...

Reality is concrete but the artistic merit of Buffy or moral weight for livestock isn't (even if it is an occasionally useful concept for modelling/ranking priorities), and I'm not sure "people should rate forecasting at 8/10" actually conveys any information at all. The illusion of precision is overrated ;-)

(I am pretty unsure I understood this correctly, so this comment might be a mistake, posting anyway as it might be clarifying for others as well if so)

It seems to me that there are two dimensions here:

(a) whether or not a statement is comparative
(b) whether or not a statement is confounded by an unobservable

Comparative statements can be confounded when the comparison standard is not made explicit, which seems to be your main critique. If I understand you correctly, you see the main response in non-comparative first-order evaluations.

But shouldn't, in many cases, the solution to that be better explicated and more precise comparative statements (e.g. "I think forecasting is X times better than commonly assumed, where my assumption of commonly assumed is based on Y") rather than a non-comparative first-order evaluation of how good forecasting is in objective standards?

It seems to me that a big advantage of comparative statements is that (i) usually decisions require comparative statements and, if those are not available, non-comparative estimates will then often be compared (introducing confounding in terms of whether different estimates were made with roughly comparable methods and standards) and also that (ii) many situations only allow for comparative statements and allow for more robustness on comparative grounds rather than trying to get to accurate first-order evaluations.

E.g. it seems to me that almost all credible knowledge in longtermism comes from comparative statements where there are vast uncertainties on the absolute first-order goodness of many things, but -- relatively speaking -- much more certainty on the relative priority and, luckily, that is also what matters most when making decisions. E.g. it seems pretty impossible to estimate the absolute goodness of reducing existential risk from source X and source Y, but we can say relatively meaningful things about the priority of working on X or Y. Would getting to more precise comparisons on the level of comparative statements also be part of your suggested project here?

 

I think overrated-underrated is useful because it's trying to say whether we should be doing more or less of X on the margin. Often it's much more useful to know whether something is good on the current margin rather than on average. 
