tl;dr: Dan Luu has a detailed post tracking past predictions, in which he argues that, contra Karnofsky, Arb, etc., the track record of futurists is overall quite bad. Relevantly to this audience, he further argues that this is evidence against the validity of current longtermist efforts in long-range predictions.

(I have not finished reading the post). 

______

I've been reading a lot of predictions from people who are looking to understand what problems humanity will face 10-50 years out (and sometimes longer) in order to work in areas that will be instrumental for the future, and I've been wondering how accurate these predictions of the future are. With timeframes that far out, only a tiny fraction of people making those kinds of predictions today have a track record, so if we want to evaluate which predictions are plausible, we need to look at something other than track record.

The idea behind the approach of this post was to look at predictions from an independently chosen set of predictors (Wikipedia's list of well-known futurists1) whose predictions are old enough to evaluate in order to understand which prediction techniques worked and which ones didn't work, allowing us to then (mostly in a future post) evaluate the plausibility of predictions that use similar methodologies.

Unfortunately, every single predictor from the independently chosen set had a poor record and, on spot checking some predictions from other futurists, it appears that futurists often have a fairly poor track record of predictions. So, in order to contrast techniques that worked with techniques that didn't, I sourced predictors that have a decent track record from my memory, a non-independent source which introduces quite a few potential biases.

Something that gives me more confidence than I'd otherwise have is that I avoided reading independent evaluations of prediction methodologies until after I did the evaluations for this post and wrote 98% of the post. On reading other people's evaluations, I found that I generally agreed with Tetlock's "Superforecasting" on what worked and what didn't work, despite using a wildly different data set.

In particular, people who were into "big ideas" who use a few big hammers on every prediction combined with a cocktail party idea level of understanding of the particular subject to explain why a prediction about the subject would fall to the big hammer generally fared poorly, whether or not their favored big ideas were correct. Some examples of "big ideas" would be "environmental doomsday is coming and hyperconservation will pervade everything", "economic growth will create near-infinite wealth (soon)", "Moore's law is supremely important", "quantum mechanics is supremely important", etc. Another common trait of poor predictors is lack of anything resembling serious evaluation of past predictive errors, making improving their intuition or methods impossible (unless they do so in secret). Instead, poor predictors often pick a few predictions that were accurate or at least vaguely sounded similar to an accurate prediction and use those to sell their next generation of predictions to others.

By contrast, people who had (relatively) accurate predictions had a deep understanding of the problem and also tended to have a record of learning lessons from past predictive errors. Due to the differences in the data sets between this post and Tetlock's work, the details are quite different here. The predictors that I found to be relatively accurate had deep domain knowledge and, implicitly, had access to a huge amount of information that they filtered effectively in order to make good predictions. Tetlock was studying people who made predictions about a wide variety of areas that were, in general, outside of their areas of expertise, so what Tetlock found was that people really dug into the data and deeply understood the limitations of the data, which allowed them to make relatively accurate predictions. But, although the details of how people operated are different, at a high-level, the approach of really digging into specific knowledge was the same.

Because this post is so long, it will open with a very short summary of each predictor, followed by a moderately long summary of each. Then we'll have a summary of what techniques and styles worked and what didn't work, with the full details of the prediction grading and comparisons to other evaluations of predictors in the appendix.

  • Ray Kurzweil: 7% accuracy
    • Relies on: exponential or super exponential progress that is happening must continue; predicting the future based on past trends continuing; optimistic "rounding up" of facts and interpretations of data; panacea thinking about technologies and computers; cocktail party ideas on topics being predicted
  • Jacque Fresco: predictions mostly too far into the future to judge, but seems very low for judgeable predictions
    • Relies on: panacea thinking about human nature, the scientific method, and computers; certainty that human values match Fresco's values
  • Buckminster Fuller: too few predictions to rate, but seems very low for judgeable predictions
    • Relies on: cocktail party ideas on topics being predicted to an extent that's extreme even for a futurist
  • Michio Kaku: 3% accuracy
    • Relies on: panacea thinking about "quantum", computers, and biotech; exponential progress of those
  • John Naisbitt: predictions too vague to score; mixed results in terms of big-picture accuracy, probably better than any futurist here other than Dixon, but this is not comparable to the percentages given for other predictors
    • Relies on: trend prediction based on analysis of newspapers
  • Gerard K. O'Neill: predictions mostly too far into the future to judge, but seems very low for judgeable predictions
    • Relies on: doing the opposite of what other futurists had done incorrectly, could be described as "trying to buy low and sell high" based on looking at prices that had gone up a lot recently; optimistic "rounding up" of facts and interpretations of data in areas O'Neill views as underrated; cocktail party ideas on topics being predicted
  • Patrick Dixon: 10% accuracy; also much better at "big picture" predictions than any other futurist here (but not in the same league as non-futurist predictors such as Yegge, Gates, etc.)
    • Relies on: extrapolating existing trends (but with much less optimistic "rounding up" than almost any other futurist here); exponential progress; stark divide between "second millennial thinking" and "third millennial thinking"
  • Alvin Toffler: predictions mostly too vague to score; of non-vague predictions, Toffler had an incredible knack for naming a trend as very important and likely to continue right when it was about to stop
    • Relies on: exponential progress that is happening must continue; a medley of cocktail party ideas inspired by speculation about what exponential progress will bring
  • Steve Yegge: 50% accuracy; general vision of the future generally quite accurate
    • Relies on: deep domain knowledge, font of information flowing into Amazon and Google; looking at what's trending
  • Bryan Caplan: 100% accuracy
    • Relies on: taking the "other side" of bad bets/predictions people make and mostly relying on making very conservative predictions
  • Bill Gates / Nathan Myhrvold / old MS leadership: timeframe of predictions too vague to score, but uncanny accuracy on a vision of the future as well as the relative importance of various technologies
    • Relies on: deep domain knowledge, discussions between many people with deep domain knowledge, font of information flowing into Microsoft

[...]


7 comments

EDIT: This comment accumulated a lot of disagreement karma. If anyone would like to offer their reasons for disagreement, I might learn something. I wonder if the disagreements are with my choices of examples, or the substance of my prediction model, or something else.

Do you think a futurist's job is to:

  1. track trends and extrapolate them
  2. paint a positive picture of the future
  3. create a future that suits their interests 

Longtermists devote some of their attention to a positive vision of the future, not a prediction of how things are likely to go.  I question the vision's likelihood and appeal.

A prediction like "environmental doomsday is coming and hyperconservation will pervade everything" assumes that rationality, ethics, and mutual accommodation will determine policy responses and public sentiment. The global scene of resource management looks quite different.

  1. If the goal were to paint an accurate vision of the future, catastrophe would be a good choice, assuming current approaches continue to decide the future.
  2. If the goal were to offer a positive vision, hyperconservation could be part of it, because that vision includes reliance on positive values. 
  3. If the goal were to create a future to suit the futurist, the  futurist would offer predictions that include their investments (for example, in fusion power or desalination or CCS) and distort the implications. At least, that's the typical scenario. Futurists with money tied to their predictions have an interest in creating self-serving predictions.

When you write

Instead, poor predictors often pick a few predictions that were accurate or at least vaguely sounded similar to an accurate prediction and use those to sell their next generation of predictions to others.

I wonder about judging predictions by motive. If the prediction really is for sale, then I should get my money's worth. If I were to buy a prediction from someone, I would throw out their subjective probability estimates and ask for their ontology and the information that they matched to it.

What you do with this sort of investigation is explore scenarios and gain useful knowledge. You develop predictive information as you revise your ontology, the web of beliefs about entities and their relationships that you match to the world to understand what is going on and what is going to happen. A domain expert or a good predictor offers ontology information, or real world information, that you can add to your own.

A good predictor has good research and critical thinking skills. They use those skills to decide plausibility of ontology elements and credibility of sources. They gather information and become a bit of a domain expert. In addition, they develop means to judge relevant information, so that they can make a prediction with a relatively sparse ontology that they curate.  

In my red-team submission, I said that decisions about relevance are based on beliefs. How you form your beliefs, constrain them, and add or remove them, is what you do to curate your ontology. Given a small set of the right information a good predictor has a specific ontology that lets them identify entities in the real world and match them to ontology relationships and thereby predict an outcome implied by their ontology. 

One implication of my model summarized here is that a predictor in a domain is only as good as:

  •  their ontology for that domain
  • the information to which they have access
  • the question they answer with their prediction

If the ontology describes entities and relationships relevant to the question, then you can get an answer quickly. Some of your futurists might have lacked in one or more of those areas, including the questions that they attempted to answer.

A rephrase of a question can get a good answer from a reliable predictor when the original question returned a prediction that mismatched the eventual outcome. 

It might be helpful to look at:

  1. qualifying questions
    A less qualified question is something like, "If Greenland were to melt away someday, would that cause civilizational collapse?", and given its generality, the answer is "No."
    A more qualified question is something like: "If Greenland were to melt entirely within 15 years starting in 2032, would that cause civilizational collapse?" and the answer is "Yes." (I believe)
  2. answering with certain alternatives 
    This sort of answer limits the possibilities. So a question like: "Will acidification of the ocean reduce plankton populations?" has an answer "Acidification of the ocean will reduce plankton populations or alter their species composition."
  3. answering with plausible possibilities 
    This sort of answer offers new possibilities. So a question like:  "Could Greenland melt down earlier than anticipated?" has an answer "The meandering jet stream could park weather systems over Greenland that alternate heat waves with heavy rains [not snow] lasting for weeks on end." and other answers as well.
  4. answering with statistical data
    This sort of answer is not the typical bayesian answer, but I think you understand it well, probably better than I do. So a question like, "Is medication X effective for treating disease Y?" has an answer based on clinical trial data "Medication X removed all symptoms of disease Y in 85% of patients treated with X."
  5. answering with "unknown"
    This sort of answer looks the same in every case, "The answer is unknown.", but tells you something specific that a Bayesian probability would not. An answer based on Bayesian probabilities offers a blind guess and gives false confidence. "The answer is unknown." tells you that you do not have any information with which to answer the question or are asking the wrong question.
    So a question like "Are there significant impacts of micro-plastics and pollutants on fish populations over the next 10 years?" answered with "The answer is unknown." tells you that you need more information from somewhere. If you only have your own ontology, you can follow up with a different question, perhaps by asking a better-qualified question, or asking for alternatives, or for some plausible possibilities, or for some statistical data. For example, "Are there no impacts of micro-plastics and pollutants on fish populations?" has the answer "No.", and "What do micro-plastics and pollutants do to fish exposed to them?" offers more information that could lead to a question that gets you somewhere in making predictions.

There are a few other things worth considering about futurist predictions (for example, do any ever qualify their answer with "...or something else will happen.") when they are asked about what will happen.

Anyway, best of luck with your analysis of these futurists and their failures, Linch.


To be clear, this is not written by me but by Dan Luu. Sorry if my post was unclear!

OK, I got it, no problem.  

(I haven't read the post.)

(I don't have a stance on how good past (futurists') predictions have been.)

I  think we should update on how to think about serious, careful analyses like Bioanchors—or on other reasons-for-belief about the future, like scaling laws—by only a trivial amount based on the track record of past predictions. Past predictions being pretty terrible seems to me to be consistent with me being able to discern whether a prediction is reasonable, at least when I (seem to) have lots of relevant knowledge/context. If others think we should update substantially based on past futurists, I'd be excited to learn why.

The "Appendix: other evaluations" section in the blogpost discusses Dan Luu's evaluations compared to Arb's evaluations etc. and why he thinks EA/LTist work is closer to that of past futurists than to e.g. superforecasters. I was originally planning to quote it but a) it's very long and b) I couldn't quickly come up with a good summary.

I think Luu's article is good, and worth reading for people involved in thinking about these issues. A point raised in the appendix in the linked post (not here):

More generally, the whole methodology is backwards — if you have deep knowledge of a topic, then it can be valuable to put a number down to convey the certainty of your knowledge to other people, and if you don't have deep knowledge but are trying to understand an area, then it can be valuable to state your uncertainties so that you know when you're just guessing. But here, we have a fairly confidently stated estimate (nostalgebraist notes that Karnofsky says "Bio Anchors estimates a >10% chance of transformative AI by 2036, a ~50% chance by 2055, and an ~80% chance by 2100.") that's based off of a model that's nonsense that relies on a variable that's picked out of thin air. Naming a high probability after the fact and then naming a lower number and saying that's conservative when it's based on this kind of modeling is just window dressing. 

I think the article Luu is discussing didn't have a very credible approach to partitioning uncertainty over the possibilities in question, which might be what Luu is talking about here.

However, if Luu really means it when he says that the point of writing down probabilities when you don't have deep topic knowledge is "so you know when you're just guessing", then I disagree. My view is (I think) somewhat like ET Jaynes: maximising entropy really does help you partition your uncertainty. Furthermore, if the outcome of entropy maximisation is not credible it may be because you failed to include some things you already knew.

For example, Tetlock compares political forecasts to "chance", but "chance" is really just another name for the "maximise entropy unconditionally" strategy. Another way we could state his finding is that, over 10+ year timescales, this strategy is hard to beat.
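To make that equivalence concrete (a toy sketch of my own, not from Luu's post): with no constraints at all, the distribution that maximises Shannon entropy over a set of outcomes is exactly the uniform one, i.e. the "chance" baseline.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

# Candidate forecasts over three mutually exclusive outcomes.
uniform   = [1/3, 1/3, 1/3]   # "chance": maximise entropy unconditionally
confident = [0.8, 0.1, 0.1]   # claims information beyond the outcome count
certain   = [1.0, 0.0, 0.0]   # claims complete information

# The uniform forecast has strictly higher entropy than anything that
# concentrates probability mass, and its entropy equals log(3) exactly.
assert entropy(uniform) > entropy(confident) > entropy(certain)
assert abs(entropy(uniform) - math.log(3)) < 1e-12
```

So beating "chance" means beating the unconditional maximum-entropy forecast, which is only possible if you actually have information to condition on.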

I've seen examples where maximising entropy can lead to high probabilities of crazy-seeming things. However,  I wonder how often this outcome results from failing to account for some prior knowledge that we do have but haven't necessarily articulated yet.
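One way to see how unarticulated prior knowledge changes the answer (my own sketch, using Jaynes's classic dice example rather than anything from the post): adding a constraint that encodes what you already know moves the maximum-entropy distribution away from uniform, sometimes a long way.

```python
import math

def maxent_dice(target_mean, lo=-5.0, hi=5.0):
    """Maximum-entropy distribution over die faces 1..6 subject to a
    fixed mean (Jaynes's dice example). The solution has the form
    p_i proportional to exp(lam * i); find lam by bisection on the mean."""
    faces = range(1, 7)

    def mean_for(lam):
        w = [math.exp(lam * i) for i in faces]
        return sum(i * wi for i, wi in zip(faces, w)) / sum(w)

    for _ in range(200):              # mean_for is increasing in lam
        mid = (lo + hi) / 2
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    w = [math.exp(lam * i) for i in faces]
    z = sum(w)
    return [wi / z for wi in w]

# With no extra knowledge (mean 3.5), maxent recovers the uniform die.
# Told only that the die averages 4.5, the maxent answer tilts sharply
# toward the high faces -- the constraint *is* the prior knowledge.
p = maxent_dice(4.5)
assert p[5] > p[0]
```

The point of the sketch: a "crazy-seeming" maxent result often just means the constraint set was too sparse, and adding what you already knew reshapes the distribution.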

I have often been fascinated watching young children expressing great confidence, even though, from my adult point of view, they have no basis for confidence other than their internal sense (i.e. they don't understand the domain about which they are speaking in any adult way).

It is also my experience and belief that adults carry this same unwarranted sense of confidence in their opinions. Just to pick one example, 73% of Americans (including 80% of American men) believe they are better than average drivers.

Our culture selects for confidence, especially in men. This leads to overconfidence, especially among successful men. Successful people have often made at least one successful prediction (which may have led to their success), which may have simply been luck, but which reinforces their self-confidence.

I therefore strongly agree that longtermist predictions carry huge uncertainty despite expressions of confidence by those promoting them. I argue that in evaluating effective action, we should lower our expected value of any intervention based on how far in the future we are predicting, with a discount rate of 3.87% annually [/joke].