Forecasting & Estimation


I'm interested in bringing forecasting techniques to my company, as Danny Hernandez described in HBR. If anyone has experience doing this, how did it play out, and what were the key obstacles?

Epoch is a research group forecasting the development of transformative Artificial Intelligence. We try to understand how progress in AI happens and what economic impacts we might see from advanced AI.

We want to enable better governance during...

Summary

This document seeks to outline why I feel uneasy about high existential risk estimates from AGI (e.g., 80% doom by 2070). When I try to verbalize this, I keep coming back to considerations like

  • selection effects at the level of which
...
sphor (4d):
I'm sorry, I'm not sure I understood correctly. Are you saying you agree there are selection effects, but you object to how you think Nuno and I are modeling MIRI and the processes generating MIRI-style models on AGI? 
Mauricio (4d):
Fair! Sorry for the slow reply, I missed the comment notification earlier. I could have been clearer in what I was trying to point at with my comment. I didn't mean to fault you for not meeting an (unmade) challenge to list all your assumptions--I agree that would be unreasonable. Instead, I meant to suggest an object-level point: that the argument you mentioned seems pretty reliant on a controversial discontinuity assumption--enough that the argument alone (along with other, largely uncontroversial assumptions) doesn't make it "quite easy to reach extremely dire forecasts about AGI." (Though I was thinking more about 90%+ forecasts.) (That assumption--i.e. the main claims in the 3rd paragraph of your response--seems much more controversial/non-obvious among people in AI safety than the other assumptions you mention, as evidenced by researchers criticizing it [https://www.lesswrong.com/posts/CoZhXrhpQxpy9xw9y/where-i-agree-and-disagree-with-eliezer#Disagreements] and researchers doing prosaic AI safety work.)

The following report presents Metaculus community forecasts and analyses on key measures of human progress developed by Our World in Data for the Forecasting Our World in Data Tournament. Download the full report here.

Executive Summary

In 2022, Metaculus...

Matt Goodman (15h):
What would that add? I think it would layer speculation on top of what is already speculation, and I'd think only the passing of time could give feedback on whether the predictions turn out to be true. I guess it could give more information if you sought out different people for the meta-predictions than those who made the original predictions. But then I'm not sure why you wouldn't just have these new people answer the original prediction questions directly.

Summary

  1. Using a dataset of 124 machine learning (ML) systems published between 2009 and 2022,[1] I estimate that the cost of compute in US dollars for the final training run of ML systems has grown by 0.49 orders of magnitude (OOM) per
...
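
As a rough illustration of the growth figure in the summary above (the excerpt cuts off before the unit, so the period length below is left abstract rather than claimed from the post): growth of 0.49 OOM per period multiplies costs by 10^0.49 ≈ 3.1x each period.

```python
# Toy extrapolation of final-training-run cost at 0.49 OOM per period.
# The excerpt truncates before the unit, so "period" is a placeholder here.
GROWTH_OOM_PER_PERIOD = 0.49
MULTIPLIER = 10 ** GROWTH_OOM_PER_PERIOD  # ~3.1x per period

def extrapolated_cost(base_cost_usd: float, periods: float) -> float:
    """Project training cost forward by `periods` periods."""
    return base_cost_usd * MULTIPLIER ** periods

# e.g. a hypothetical $10M training run, five periods later:
print(f"${extrapolated_cost(10e6, 5):,.0f}")  # ~$2.8 billion
```
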
Ben Cottier (2d):
Also, I'd like to be clear about what it means to "keep up". I expect those lower-resourced types of actors won't keep up in the sense that they won't be the first to advance state-of-the-art on the most important AI capabilities. But the cost of a given ML system falls over time and that is a big driver of how AI capabilities diffuse. 
Ben Cottier (2d):
Thanks Haydn! I just want to add caution on taking the extrapolations too seriously. The linear extrapolation is not my all-things-considered view of what is going to happen, and the shaded region is just the uncertainty in the linear regression trendline rather than my subjective uncertainty in the estimates. I agree with you inasmuch as I expect the initial costs of state-of-the-art models to get well out of reach by 2030 for actors other than big tech (if we include labs with massive investment, like OpenAI) and states. I still have significant uncertainty about this, though. Plausibly, the biggest players in AI won't be willing to spend $100M just on the computation for a final training run as soon as 2030. We still don't have a great understanding of what hardware and software progress will be like in the future (though Epoch has [https://epochai.org/blog/predicting-gpu-performance] worked [https://epochai.org/blog/revisiting-algorithmic-progress] on this). Maybe efficiency improves faster than expected, and/or there just won't be worthwhile gains from spending so much in order to compete.
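
For readers who want to see what "uncertainty in the linear regression trendline" means concretely, here is a minimal sketch (with made-up data, not Epoch's) of fitting log10 cost against year and extrapolating the trendline with a rough two-standard-error band:

```python
import numpy as np
from scipy import stats

# Hypothetical (made-up) data: publication year, log10 training cost in USD.
years = np.array([2012, 2014, 2016, 2018, 2020, 2022], dtype=float)
log_cost = np.array([4.5, 5.2, 6.1, 6.6, 7.4, 7.9])

# Center the years so the slope and intercept estimates are uncorrelated.
x = years - years.mean()
res = stats.linregress(x, log_cost)

x_2030 = 2030 - years.mean()
pred = res.intercept + res.slope * x_2030
# Two-standard-error band on the fitted trendline at 2030. This captures
# regression uncertainty only -- not subjective uncertainty about the
# future, which is Ben Cottier's caveat above.
band = 2 * np.sqrt(res.intercept_stderr**2 + (res.stderr * x_2030) ** 2)
print(f"2030 trendline: 1e{pred:.1f} USD, "
      f"band 1e{pred - band:.1f} to 1e{pred + band:.1f}")
```
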

As part of my work for Open Philanthropy I’ve written a draft report on AI takeoff speeds, the question of how quickly AI capabilities might improve as we approach and surpass human-level AI. Will human-level AI be...

jacobpfau (11d):
My deeply concerning impression is that OpenPhil (and the average funder) has timelines 2-3x longer than the median safety researcher. Daniel [https://forum.effectivealtruism.org/posts/3vDarp6adLPBTux5g/what-a-compute-centric-framework-says-about-ai-takeoff?commentId=ZPmPeZeMmTfEa8LEZ] has his AGI training requirements set to 3e29, and I believe the 15th-85th percentiles among safety researchers would span 1e31 +/- 2 OOMs. On that view, Tom's default values are off in the tails. My suspicion is that funders write off this discrepancy, if noticed, as inside-view bias, i.e. thinking safety researchers self-select for scaling optimism. My admittedly very crude mental model of an OpenPhil funder makes two further mistakes in this vein: (1) mistakenly taking the Cotra report's biological anchors weighting as a justified default setting of parameters rather than an arbitrary choice which should be updated given recent evidence; (2) far overweighting the semi-informative priors report [https://www.openphilanthropy.org/research/report-on-semi-informative-priors/] despite semi-informative priors abjectly failing to predict Turing-test-level AI progress. Semi-informative priors apply to large-scale engineering efforts, which for the AI domain has meant AGI and the Turing test. Insofar as funders admit that the engineering challenges involved in passing the Turing test have been solved, they should discard semi-informative priors as failing to be predictive of AI progress. To be clear, I see my empirical claim about disagreement between the funding and safety communities as most important, independently of my diagnosis of this disagreement. If this empirical claim is true, OpenPhil should investigate cruxes separating them from safety researchers, and at least allocate some [https://www.lesswrong.com/s/4hmf7rdfuXDJkxhfg/p/DvRJhGzxP4oY6Hhmu] of their budget to the hypothesis that the safety community is correct.
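
To make the comment's numbers concrete: a distribution that is normal in log10 space with its 15th-85th percentile range at 1e31 +/- 2 OOMs has a log10 standard deviation of about 2/1.04 ≈ 1.9, and one can ask where a 3e29 requirement falls under it. A toy calculation (an illustration of the stated spread, not anyone's actual model):

```python
import math
from scipy import stats

# Toy reading of the stated spread: 15th-85th percentiles of AGI training
# compute at 1e31 +/- 2 OOMs, modeled as normal in log10(FLOP) space.
median_log10 = 31.0
sigma = 2.0 / stats.norm.ppf(0.85)   # ~1.93 OOMs

# Where does a 3e29 requirement (Daniel's setting, per the comment) fall?
p = stats.norm.cdf((math.log10(3e29) - median_log10) / sigma)
print(f"P(requirement <= 3e29) ~= {p:.2f}")  # roughly 0.2
```
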
  1. Publicly place a large bet you'll do X at time T
  2. Through a sock puppet/friend place a smaller bet on you doing ¬X at time T+1
  3. Allow the first bet to shift the odds against the second bet
  4. Do X
...
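
The excerpt doesn't specify a market mechanism, but step 3 of the scheme above is easy to illustrate with a logarithmic market scoring rule (LMSR), a common automated market maker; the numbers below are purely illustrative:

```python
import math

B = 100.0  # LMSR liquidity parameter (illustrative)

def price_yes(q_yes: float, q_no: float) -> float:
    """Current price (roughly, probability) of the YES outcome."""
    e_yes, e_no = math.exp(q_yes / B), math.exp(q_no / B)
    return e_yes / (e_yes + e_no)

def cost(q_yes: float, q_no: float) -> float:
    """LMSR cost function; a trade pays the change in this value."""
    return B * math.log(math.exp(q_yes / B) + math.exp(q_no / B))

q_yes = q_no = 0.0
print(f"Initial P(X) = {price_yes(q_yes, q_no):.2f}")               # 0.50

# Step 1: the large public bet buys 150 YES shares, pushing P(X) up.
q_yes += 150
print(f"After the public bet, P(X) = {price_yes(q_yes, q_no):.2f}")  # ~0.82

# Step 3: the odds have shifted, so the sock puppet's NO shares are cheap.
no_cost = cost(q_yes, q_no + 50) - cost(q_yes, q_no)
print(f"Sock puppet pays {no_cost:.1f} for 50 NO shares "
      f"(~{no_cost / 50:.2f} each, vs. a 0.50 starting price)")
```
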
Bob Jacobs (16h):
You can easily manipulate the markets without getting caught, which makes a person's promise not to manipulate the market impossible to check and therefore impossible to trust. And it's not just markets about yourself that are easy to manipulate; it's markets about everything you can change. So if there is a market on whether the bin in my street will be tipped over, that market isn't about me, but it's trivially easy to manipulate. As humanity becomes more powerful, the things we can change become larger and larger, and the things we can use prediction markets for become smaller and smaller. The calibrating solution wouldn't work because the premises of perfect information and logical omniscience only hold in economists' imaginations. See my other comment for a concrete example.

by Trevor Chow, Basil Halperin, and J. Zachary Mazlish

 

In this post, we point out that short AI timelines would cause real interest rates to be high, and would do so under expectations of either unaligned or aligned AI. However, 30- to...
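
The mechanism in this excerpt can be sketched with the standard Ramsey rule r = δ + η·g (a simplification of the consumption Euler equation; the parameter values below are made up, not the authors'):

```python
# Toy Ramsey-rule sketch: r = delta + eta * g. If markets expected
# transformative AI soon, expected consumption growth g would be much
# higher (aligned AI) or effective discounting would rise (unaligned AI),
# and either way real rates should be high. Numbers are illustrative.
delta = 0.01   # pure time preference
eta = 1.0      # inverse elasticity of intertemporal substitution

for scenario, g in [("normal growth", 0.02), ("TAI expected soon", 0.10)]:
    r = delta + eta * g
    print(f"{scenario}: g = {g:.0%} -> real rate r = {r:.0%}")
```
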

trammell (10d):
By the way, someone wrote this Google doc in 2019 on "Stock Market prediction of transformative technology [https://docs.google.com/document/d/1k92xc46h8P4XlD-WPKj-Hcn7wboTT_fVuVXFpSvKhNo/edit#heading=h.7hd2d0wh7pl7]". I haven't taken a look at it in years, and neither has the author, so understandably enough, they're asking to remain nameless to avoid possible embarrassment. But hopefully it's at least somewhat relevant, in case anyone's interested.
basil.halperin (10d):
(Nice, thanks for sharing)
trammell (10d):
Thanks for adding comments to it!

Recently I had a conversation with Eli Lifland about the AI Alignment landscape. Eli has been a forecaster at Samotsvety and has been investigating that landscape.

I’ve known Eli for the last 8 months or so, and...

machinaut (3d):
I liked the transcript, much easier to skim and skip around than a video. At different points in my life/career I would have liked the video more -- having both seems like an accessibility win. I liked that this is organized -- the table of contents is a great way to explore and jump around. I wish there was some summary of key points or takeaways. It seems like that could have gone with, or in place of, the sections overview at the top. It seems like a bunch of care/preparation went into having good questions, so I think here I'd have a lot of trust in the interviewer's brief. Also, I think it would be more skimmable if there were clearer typographic indications of which speaker each paragraph belongs to. Not exactly sure how to do that on the AF site -- having each speaker highlighted in a slightly different color, or indentation, or something like that. Right now my assumption is that there isn't a great way to do that on this site.
Ozzie Gooen (3d):
Just fyi -- in this case, we spent some time at the beginning making a very rough outline of what would be good to talk about. Much of this is stuff Eli put forward. I've also known Eli for a while, so I had a lot of context going in.
emre kaplan (3d):
I'm not a relevant stakeholder as I don't currently work on AI alignment, but just in case it's useful, here are my two cents. I skipped the video and merely read the transcript. If there weren't a transcript, I wouldn't have bothered watching the video. I strongly prefer transcripts to audio files in general, as I can skim transcripts and search for keywords.

We summarize and compare several models and forecasts predicting when transformative AI will be developed.

Highlights

  • The review includes quantitative models, covering both outside-view and inside-view approaches, as well as judgment-based forecasts by (teams of) experts.
  • While we do not necessarily endorse
...
Daniel_Eth (4d):
Yeah, my point is that it's (basically) disjunctive.
Ozzie Gooen (4d):
Yea, I assume the full version is impossible. But maybe there are at least some simpler statements that can be inferred? Like, "<10% of transformative AI by 2030." I'd be really curious to get a better read on what market specialists around this area (maybe select hedge fund teams around tech disruption?) would think.
Jaime Sevilla (4d):
I don't think it's impossible -- you could start from Halperin et al.'s basic setup [1] and plug in some numbers for p(doom), the long-run growth rate, etc., and get a market opinion. I would also be interested in seeing the analysis of hedge fund experts and others. In our cursory lit review we didn't come across any that was readily quantifiable (would love to learn if there is one!). [1] https://forum.effectivealtruism.org/posts/8c7LycgtkypkgYjZx/agi-and-the-emh-markets-are-not-expecting-aligned-or
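
One minimal version of the exercise Jaime describes: treat the observed long real rate as a probability-weighted mix of a "normal world" rate and a much higher "TAI soon" rate, then solve for the implied probability. All numbers below are illustrative assumptions, not estimates from the post:

```python
# Toy inversion: back out a market-implied probability of TAI from the
# long-run real rate. Illustrative numbers only.
r_observed = 0.015   # observed long-run real rate
r_normal = 0.01      # rate if no TAI over the horizon
r_tai = 0.10         # rate if TAI is expected (high growth / discounting)

p_tai = (r_observed - r_normal) / (r_tai - r_normal)
print(f"Implied market P(TAI over the horizon) ~= {p_tai:.1%}")  # ~5.6%
```
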

I'd like to share my experience in both running and playing the game Wits & Wagers at EA retreats in New Zealand. We have played the game at every yearly retreat for the past five years, with...

JohnW (3d):
That looks really cool, thanks for sharing! Do you think it would work well in a large group setting? It seems like a good halfway house between standard Wits & Wagers and a forecasting tournament.
WilliamKiely (2d):
I only play-tested it once (in person, with three people sharing one laptop plus one phone for editing the spreadsheet), and the most annoying aspect of my implementation was having to record one's forecasts in a spreadsheet from a phone. If everyone had a laptop or their own device, it'd be easier. But I made the spreadsheet to handle games (or teams?) of up to 8 people, so I think it could work well for that.
Catherine Low (3d):
I've loved being a participant in several of JohnW's Wits and Wagers events, and have run it a couple of times myself since. It is my favourite evening retreat activity. Better than karaoke IMO!

TLDR

  • An increase in the number of forecasters seems to lead to an improvement in the Metaculus community prediction. I believe this effect is real, but due to confounding effects, the analysis presented here might overestimate the improvement
...
David Rhys Bernard (4d):
Can you explain more why the bootstrapping approach doesn't give a causal effect (or something pretty close to one) here? The aggregate approach is clearly confounded since questions with more answers are likely easier. But once you condition on the question and directly control the number of forecasters via bootstrapping different sample sizes, it doesn't seem like there are any potential unobserved confounders remaining (other than the time issue Nikos mentioned). I don't see what a natural experiment or RCT would provide above the bootstrapping approach.
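
For concreteness, here is a minimal sketch of the within-question bootstrap being discussed (with made-up forecasts, not Metaculus data): resample k forecasters from one question, aggregate with the median, and score against the resolved outcome.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_brier(forecasts: np.ndarray, outcome: int,
                    k: int, n_boot: int = 2000) -> float:
    """Mean Brier score of the median of k resampled forecasters."""
    scores = []
    for _ in range(n_boot):
        sample = rng.choice(forecasts, size=k, replace=True)
        scores.append((np.median(sample) - outcome) ** 2)
    return float(np.mean(scores))

# One hypothetical question: 40 forecasters, resolved YES (outcome = 1).
forecasts = rng.beta(4, 2, size=40)   # made-up individual probabilities
for k in [1, 5, 10, 20, 40]:
    print(f"k = {k:2d}: Brier = {bootstrap_brier(forecasts, 1, k):.4f}")
```
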
Kenan Schaefkofer (3d):
Predictors on Metaculus seeing the prior prediction history seems like a plausible confounder to me, but I'm not sure it would change the result.
Michael_Wiebe (3d):
Yes, this is the main difference compared to forecasters being randomly assigned to a question.

See also: List of Tools for Collaborative Truth Seeking

Squiggle is a special-purpose programming language for generating probability distributions, estimating variables over time, and similar tasks, with reasonable transparency. It was developed by the Quantified Uncertainty Research...
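
The excerpt doesn't include any Squiggle code; as a rough analogy, here is the kind of uncertainty-propagating estimate Squiggle is built for, sketched as plain-Python Monte Carlo (all distributions and numbers invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Monte Carlo analogue of a Squiggle-style estimate: push uncertain
# inputs through a simple value model. Everything here is illustrative.
hours_saved_per_user = rng.lognormal(mean=np.log(20), sigma=0.5, size=N)
n_users = rng.uniform(100, 1000, size=N)
value_per_hour = rng.normal(30, 10, size=N)

total_value = hours_saved_per_user * n_users * value_per_hour
lo, mid, hi = np.percentile(total_value, [5, 50, 95])
print(f"median ~ ${mid:,.0f}; 90% interval ~ ${lo:,.0f} to ${hi:,.0f}")
```
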

Peter Wildeford (4d):
There's also a version of Squiggle in Python if you're more keen to use Python! https://github.com/rethinkpriorities/squigglepy
Leo Gao (5d):
How much work would it be to make something like this as a python library, and how much would that reduce its usefulness? I think this is really cool and have been looking for something like this, but I am multiple times more likely to use something if it's a python library as opposed to a brand new language, and I assume others think similarly.
Tier 1 Longtermist (5d):
https://github.com/rethinkpriorities/squigglepy

Previously: Samotsvety's AI risk forecasts.

Our colleagues at Epoch recently asked us to update our AI timelines estimate for their upcoming literature review on TAI timelines. We met on 2023-01-21 to discuss our predictions about when advanced AI systems...

aogara (4d):
Fair enough. I think people conceive of AGI too monolithically, and don't sufficiently distinguish between the risk profiles of different trajectories. The difference between economic impact and x-risk is the most important, but I think it's also worth forecasting domain-specific capabilities (natural language, robotics, computer vision, etc). Gesturing towards "the concept we all agree exists but can't define" is totally fair, but I think the concept you're gesturing towards breaks down in important ways. 

The purpose of this article (cross-posted from fp21.org) is to share our model for forecasting implementation and gain community input on the substance of the piece. We’d be excited about any feedback.

Background on us: fp21 is a...

Nathan Young (5d):
I worked for the UK Civil Service, and it was hard to push forecasting because:
  • Making markets is hard
  • Getting people to care about the numbers, if they appear, is hard
I think that the social problem of prediction market buy-in is a bit harder than people generally think. Michael Story writes about it well here: https://mwstory.substack.com/p/why-i-generally-dont-recommend-internal
JWS (5d):
Thanks for posting this, I strongly upvoted it for these reasons:
  1. It's concise, with a very high information-to-padding ratio -- higher than I sometimes find on the forum[1]
  2. IIDM is something I'm interested in, and seeing a high-quality post in this area is very welcome. I think it adds value to this area and brings attention to it on the Forum
  3. The structure of 'explanation - strength - weakness' was very clear; in general I thought that the whole post was well structured, and this made it very easy to follow
As for the content of the post itself, I agree with most of it and I look forward to reading the references and links! My only comment for consideration would be that all 4 models seem to have an underlying weakness: they are actually unlikely to be fully integrated into policy decision-making circles, and that acts as a bottleneck on applying any form of forecasting in the policy realm. So my questions in that area would be:
  1. Have there been any historical examples where forecasts were explicitly integrated into decision making, either in the public or private sectors? [My assumption is very little of both, and what there has been is likely a lot more on the private than the public side]
  2. What are the empirical barriers to forecasting being adopted in public policy? Are there case studies of this being attempted and shut down, and in these cases what were the key points leading to a rejection of these models?
  3. Are there any particular constituencies/polities where we might expect forecasting to have more of a foothold -- where engagement by the EA IIDM community might lead to actual implementation?
And finally, I just want to end by saying again that I thought it was a very good post :)
  1. ^ Especially on my own posts....
Nathan Young (5d):
Cool models. Thanks for writing this.

What is a subforum?

Subforums are spaces for discussion, questions, and more detailed posts about particular topics. Full posts in this space may also appear on the Frontpage, and posts from other parts of the EA Forum may appear here if relevant tags are applied. Discussions in this space will never appear elsewhere.

Welcome!

This is a dedicated space for discussions about forecasting and estimation, which we think are important tools for doing good in the world. Subforums are a beta feature and are still evolving!

Here’s how you can get involved: