TL;DR: Someone should probably write a grant to produce a spreadsheet/dataset of
past instances where people claimed a new technology would lead to societal
catastrophe, with variables such as “multiple people working on the tech
believed it was dangerous.”
Slightly longer TL;DR: Some AI risk skeptics are mocking people who believe AI
could threaten humanity’s existence, saying that many people in the past
predicted doom from some new tech. There is seemingly no dataset which lists and
evaluates such past instances of “tech doomers.” It seems somewhat ridiculous*
to me that nobody has grant-funded a researcher to put together a dataset with
variables such as “multiple people working on the technology thought it could be
very bad for society.”
*Low confidence: could totally change my mind
———
I have asked multiple people in the AI safety space whether they were aware of any
kind of "dataset for past predictions of doom (from new technology)." There have
been some articles and arguments floating around recently such as "Tech Panics,
Generative AI, and the Need for Regulatory Caution
[https://datainnovation.org/2023/05/tech-panics-generative-ai-and-regulatory-caution/]",
in which skeptics say we shouldn't worry about AI x-risk because there are many
past cases where people in society made overblown claims that some new
technology (e.g., bicycles, electricity) would be disastrous for society.
While I think it's right to consider the "outside view" on these kinds of
things, I think that most of these claims 1) ignore examples of where there were
legitimate reasons to fear the technology (e.g., nuclear weapons, maybe
synthetic biology?), and 2) imply the current worries about AI are about as
baseless as claims like "electricity will destroy society," whereas I would
argue that the claim "AI x-risk is >1%" stands up quite well against most
current scrutiny.
(These claims also ignore the anthropic argument/survivorship bias: if past doom
predictions had ever been right, we wouldn't be here to evaluate them.)
A complaint about using average Brier scores
Comparing average Brier scores between people only makes sense if they have made
predictions on exactly the same questions, because making predictions on more
certain questions (such as "will there be a 9.0 earthquake in the next year?")
will tend to give you a much better Brier score than making predictions on more
uncertain questions (such as "will this coin come up heads?"). This is one of
those things that lots of people know, but everyone (including me) keeps using
average Brier scores anyway because they're a nice simple number to look at.
To explain [https://xkcd.com/1053/]:
The Brier score for a binary prediction is the squared difference between the
predicted probability and the actual outcome, (O − p)². For a given question,
predicting the true probability gives you the lowest possible expected Brier
score (which is what you want). But this minimum expected score varies depending
on the true probability of the event happening.
For the coin flip the true probability is 0.5, so a perfect prediction gives an
expected Brier score of 0.25 (= 0.5 × (1 − 0.5)² + 0.5 × (0 − 0.5)²). For the
earthquake question maybe the correct probability is 0.1, so the best expected
Brier score you can get is 0.09 (= 0.1 × (1 − 0.1)² + 0.9 × (0 − 0.1)²), and it's
only if you are really badly wrong (you think p > 0.5) that your expected score
can get higher than the best score you can get for the coin flip.
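To see this concretely, here's a quick sketch (mine, not the original post's) that
computes the expected Brier score of a forecast given an assumed true probability;
the numbers reproduce the examples above.

```python
def expected_brier(forecast: float, true_p: float) -> float:
    """Expected Brier score of a forecast when the event's true probability is true_p.

    With probability true_p the event resolves YES (outcome 1), costing (1 - forecast)^2;
    otherwise it resolves NO (outcome 0), costing (0 - forecast)^2.
    """
    return true_p * (1 - forecast) ** 2 + (1 - true_p) * forecast ** 2

# Perfectly calibrated forecasts:
print(expected_brier(0.5, 0.5))  # ≈ 0.25 -- the coin flip
print(expected_brier(0.1, 0.1))  # ≈ 0.09 -- the earthquake question

# On the earthquake question you only do worse than the coin-flip optimum (0.25)
# once your forecast is badly wrong, i.e. above 0.5:
print(expected_brier(0.5, 0.1))  # ≈ 0.25 -- break-even with the coin-flip optimum
print(expected_brier(0.6, 0.1))  # ≈ 0.34 -- now worse than the coin-flip optimum
```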
So if forecasters have a choice of questions to make predictions on, someone who
mainly goes for things that are pretty certain will end up with a (much!) better
average Brier score than someone who predicts things that are genuinely more
50/50. This also acts as a disincentive for predicting more uncertain things,
which seems bad.
--------------------------------------------------------------------------------
We've just added Fatebook
[https://forum.effectivealtruism.org/posts/ZDo6XjmivLKGKycdw/fatebook-for-slack-track-your-forecasts-right-where-your]
(which
THE NEXT TECHNOLOGICAL REVOLUTION COULD COME THIS CENTURY AND COULD LAST LESS
THAN A DECADE
This is a quickly written note that I don't expect to have time to polish.
SUMMARY
This note aims to bound reasonable priors on the date and duration of the next
technological revolution, based primarily on the timings of (i) the rise of Homo
sapiens; (ii) the Neolithic Revolution; (iii) the Industrial Revolution. In
particular, the aim is to determine how sceptical our prior should be that the
next technological revolution will take place this century and will occur very
quickly.
The main finding is that the historical track record is consistent with the next
technological revolution taking place this century and taking just a few years.
This is important because it partially undermines the claims
[https://forum.effectivealtruism.org/posts/XXLf6FmWujkxna3E6/are-we-living-at-the-most-influential-time-in-history-1]
that (i) the “most important century”
[https://www.cold-takes.com/most-important-century/] hypothesis is
overwhelmingly unlikely and (ii) the burden of evidence required to believe
otherwise is very high. It also suggests that the historical track record
doesn’t rule out a fast take-off.
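As a rough illustration of the kind of arithmetic involved (this is my own
back-of-the-envelope sketch with round-number dates, not the note's actual
calculation), one can look at how much each gap between transitions shrank
relative to the previous one and naively extrapolate:

```python
# Back-of-the-envelope sketch (not the note's actual method). Dates are rough,
# round-number assumptions, measured in years before the present.

rise_of_homo_sapiens = 300_000   # assumed
neolithic_revolution = 12_000    # assumed
industrial_revolution = 250      # assumed

gap1 = rise_of_homo_sapiens - neolithic_revolution   # ~288,000 years
gap2 = neolithic_revolution - industrial_revolution  # ~11,750 years
shrink_factor = gap1 / gap2                          # each gap ~25x shorter than the last

# If the next gap shrinks by a similar factor, the next revolution lands roughly
# this many years after the Industrial Revolution:
next_gap = gap2 / shrink_factor
print(round(shrink_factor, 1), round(next_gap))      # 24.5 479

# Applying a similar shrinkage to durations (millennia for the Neolithic, roughly
# a century for the Industrial) would point to the next revolution lasting years
# rather than centuries.
```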
I expect this note not to be particularly surprising to those familiar with
existing work on the burden of proof
[https://www.cold-takes.com/forecasting-transformative-ai-whats-the-burden-of-proof/]
for the most important century hypothesis. I thought this would be a fun little
exercise though, and it ended up pointing in a similar direction.
Caveats:
* This is based on very little data, so we should put much more weight on other
evidence than this prior
* I don’t think this weakens the argument that it doesn't take much evidence to
  believe a technological revolution this century is likely
* But these priors probably aren’t actually useful for forecasting – they
should be washed out by other evidence
* My calculations use the
Metaculus is excited to announce the winners of the inaugural Keep Virginia Safe
Tournament [https://www.metaculus.com/tournament/vdh/]! This first-of-its-kind
collaboration with the University of Virginia (UVA) Biocomplexity Institute
[https://biocomplexity.virginia.edu/] and the Virginia Department of Health
[https://www.vdh.virginia.gov/] (VDH) delivered forecasting and modeling
resources to public health professionals and public policy experts as they
navigated critical decisions on COVID-19.
Congratulations to the top 3 prize winners!
1. Sergio [https://www.metaculus.com/accounts/profile/115725/]
2. 2e10e122 [https://www.metaculus.com/accounts/profile/103600/]
3. mattvdm [https://www.metaculus.com/accounts/profile/105906/]
Thank you to the forecasting community! Your predictions were integrated into VDH
planning sessions and were shared with local health department staff, statewide
epidemiologists, and even with the Virginia Governor’s office.
For more details on the tournament outcomes, visit the project summary
[https://www.metaculus.com/notebooks/11162/the-keep-virginia-safe-tournament-202122-project-summary/].
Our successful partnership with UVA and VDH continues through the Keep Virginia
Safe II Tournament
[https://www.metaculus.com/tournament/keep-virginia-safe-ii/], where Metaculus
forecasts continue to provide valuable information
[https://www.metaculus.com/questions/14918/keep-virginia-safe-ii-rd2--forecast-impacts/].
Join to help protect Virginians and compete for $20,000 in prizes.
Find more information about the Keep Virginia Safe Tournament, including the
complete leaderboard, here [https://www.metaculus.com/tournament/vdh/].
Hi all!
Nice to see that there is now a sub-forum dedicated to Forecasting, this seems
like a good place to ask what might be a silly question.
I am doing some work on integrating forecasting with government decision making.
There are several roadblocks to this, but one of them is generating good
questions (see the Rigor-Relevance trade-off
[https://goodjudgment.com/question_clusters/], among other things).
One way to avoid this might be to simply ask questions about the targets the
government has already set for itself, a lot of these are formulated in a
SMART [1] way and are thus pretty forecastable. Forecasts on whether the
government will reach its target also seem like they will be immediately
actionable for decision makers. This seemed like a decent strategy to me, but I
don't think I have seen it mentioned very often. So my question is simple: is
there some sort of major problem here that I am overlooking?
The one major problem I could think of is that there might be an incentive for a
sort of circular reasoning: if forecasters in aggregate think that the government
is not on track to achieve a certain target, the government might announce new
policy to remedy the situation. Smart forecasters might see this coming and start
their initial forecast higher.
I think you can balance this by having forecasters forecast on intermediate
targets as well. For example, most countries have international obligations to
reduce their CO2 emissions by X% by 2030; instead of just forecasting the 2030
target, you could forecast all the intermediate years as well.
1. ^
SMART stands for: Specific, Measurable, Assignable, Realistic, Time-related
- See https://en.wikipedia.org/wiki/SMART_criteria
[https://en.wikipedia.org/wiki/SMART_criteria]
On January 6, 2022, at 4pm GMT, I am going to host a gather town meetup
[https://app.gather.town/app/aPVfK3G76UukgiHx/lesswrong-campus] to go through
Scott Alexander's Prediction Competition
[https://astralcodexten.substack.com/p/2023-prediction-contest] in Blind Mode,
which means you spend a maximum of 5 minutes on each question.
Because of that, and also possibly because the rules require it (I'm still finding
out), we likely won't collaborate (though if the rules allow it, maybe we will!).
But if you've been wanting to enter and haven't yet made time, come along, and
we'll set some pomodoros and have a good time!
Event link here:
https://forum.effectivealtruism.org/events/wENgADx63Cs86b6A2/enter-scott-alexander-s-prediction-competition
[https://forum.effectivealtruism.org/events/wENgADx63Cs86b6A2/enter-scott-alexander-s-prediction-competition]
For a long time I found working out the fair terms of a bet between two people
who disagree surprisingly nonintuitive, so I made a spreadsheet that does it,
which then expanded into some other things.
* Spreadsheet
[https://docs.google.com/spreadsheets/d/1GqdutJuvVDUJVIPLvcwe5a2fuFcgAC2jITV1LlxrjKM/edit#gid=0]
here
[https://docs.google.com/spreadsheets/d/1GqdutJuvVDUJVIPLvcwe5a2fuFcgAC2jITV1LlxrjKM/edit#gid=0],
which has four tabs based on different views on how best to pick the fair
price to bet at when you and someone else disagree. (I didn't make the fourth
tab at all; it was added by someone (Luke Sabor) who was passionate about
the standard deviation method!)
* People have different beliefs / intuitions about what's fair!
* An alternative to the mean probability would be to use the product of the
  odds ratios: if one person thinks .9 and the other .99, the "fair bet" will
  have an implied probability of more than .945.
* The problem with using the geometric mean of the probabilities can be
  highlighted if player 1 estimates 0.99 and player 2 estimates 0.01: the fair
  price comes out around 0.0995, so player 2 would contribute ~90% of the pot
  for an EV of 0.09, while player 1 contributes ~10% for an EV of 0.89 (see the
  sketch after this list). I don't like that bet. In this case, mean prob and
  Z-score mean both agree at 50% contributions and equal EVs.
* "The tradeoff here is that using Mean Prob gives equal expected values
(see underlined bit), but I don't feel it accurately reflects "put your
money where your mouth is". If you're 100 times more confident than the
other player, you should be willing to put up 100 times more money. In
the Mean prob case, me being 100 times more confident only leads me to
put up 20 times the amount of money, even though expected values are more
equal."
* Then I ended up making an explainer video
  [https://www.youtube.com/watch?v=KOQ7OugP-Kc] because I was excited about it.
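Here is a minimal sketch of the arithmetic above (my own reconstruction, not the
spreadsheet itself). It assumes a simple settlement convention: the players agree
on a price q, the YES-bettor stakes q and the NO-bettor stakes 1 - q of a unit
pot, and the winner takes the pot; the function names are mine.

```python
from math import sqrt

def fair_price(p1: float, p2: float, method: str) -> float:
    """Aggregate two subjective probabilities into a single betting price."""
    if method == "mean_prob":
        return (p1 + p2) / 2
    if method == "geo_mean_prob":
        return sqrt(p1 * p2)
    raise ValueError(method)

def bet_summary(p1: float, p2: float, method: str) -> dict:
    q = fair_price(p1, p2, method)
    # Expected value by each player's own beliefs:
    ev1 = p1 * (1 - q) - (1 - p1) * q    # player 1 backs YES, stakes q
    ev2 = (1 - p2) * q - p2 * (1 - q)    # player 2 backs NO, stakes 1 - q
    return {"price": round(q, 3),
            "stakes": (round(q, 2), round(1 - q, 2)),
            "EVs": (round(ev1, 2), round(ev2, 2))}

print(bet_summary(0.99, 0.01, "geo_mean_prob"))
# -> {'price': 0.099, 'stakes': (0.1, 0.9), 'EVs': (0.89, 0.09)}
print(bet_summary(0.99, 0.01, "mean_prob"))
# -> {'price': 0.5, 'stakes': (0.5, 0.5), 'EVs': (0.49, 0.49)}
```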
Other spreadsheets I've seen in the space:
* Brier score bet
I've heard a variety of takes on this, ranging from "people/decision-makers just
don't use forecasting/prediction markets when they should," to "the main issue
is that it's hard to come up with (and operationalize) useful questions," to
"forecasting methods (including aggregation, etc.) and platforms are just subpar
right now; improving them is the main priority." I'd be interested in what
people think.
Of course, there could also be a meta-take like "this is not the right question"
— I'd be interested in discussing that, too.