
Summary

I present the design and results of an experiment eliciting relative values from six different researchers for the nine large AI safety grants Open Philanthropy made in 2018. 

The specific elicitation procedures I used might be usable for rapid evaluation setups, for going from zero to some evaluation, or for identifying disagreements. For weighty decisions, I would recommend more time-intensive approaches, like explicitly modelling the pathways to impact.

Background and motivation

This experiment follows up on past work around relative values (1, 2, 3) and more generally on work to better estimate values. The aim of this research direction is to explore a possibly scalable way of producing estimates and evaluations. If successful, this would bring utilitarianism and/or longtermism closer to producing practical guidance around more topics, which has been a recurring thread in my work in the last few years. 

Methodology

My methodology was as follows:

  1. I selected a group of participants whose judgment I consider to be good.
  2. I selected a number of grants which I thought would be suitable for testing purposes.
  3. Participants familiarized themselves with the grants and with what exactly they ought to be estimating.
  4. Participants made their own initial estimates using two different methods:
    1. Method 1: Using a utility function extractor app.
    2. Method 2: Making a “hierarchical tree” of estimates.
  5. For each participant, I aggregated and/or showed their two estimates side by side, and asked them to make a best guess estimate.
  6. I took their best guess estimates and held a discussion going through each grant, asking participants to discuss their viewpoints wherever they disagreed.
  7. After holding the discussion, I asked participants to make new estimates.

Overall, the participants took about two to three hours each to complete this process, roughly divided as follows:

  1. 10 to 30 mins to familiarize themselves with the estimation target and to re-familiarize themselves with the grants
  2. 20 to 40 mins to do the two initial estimates
  3. 5 to 30 mins to give their first best guess estimate after seeing the result of the two different methods
  4. 1h to hold a discussion
  5. 5 to 30 mins to give their resulting best guess estimate 

The rest of this section goes through these steps individually.

Selection of participants

I selected participants by asking friends or colleagues whose judgment I trust, and who had some expertise or knowledge of AI safety. In particular, I selected participants who would be somewhat familiar with Open Philanthropy grants, because otherwise the time required for research would have been too onerous.

The participants were Gavin Leech, Misha Yagudin, Ozzie Gooen, Jaime Sevilla, Daniel Filan and another participant who prefers to remain anonymous. Note that one participant didn’t participate in all the rounds, which is why some summaries contain only five datapoints. 

Selection of grants

The grants I selected were:

These are all the grants that Open Philanthropy made to reduce AI risk in 2018 above a threshold of $10k, according to their database. The year these grants were made is long enough ago that we have some information about their success.

I shared a briefing with the participants summarizing the nine Open Philanthropy grants above, with the idea that it might speed the process along. 

In hindsight, this was suboptimal, and might have led to some anchoring bias. Some participants complained that the summaries had some subjective component. These participants said they used the source links but did not pay that much attention to these opinions.

On the other hand, other participants said they found the subjective estimates useful. And because the briefing was written in good faith, I am personally not particularly worried about it. Even if there are anchoring issues, we may not necessarily care about it if we think that the output is accurate, in the same way that we may not care about forecasters anchoring on the base rate.

If I were redoing this experiment, I would probably limit myself even more to expressing only factual claims and finding sources. A better scheme might have been to share a writeup with a minimal subjective component, then strongly encourage participants to make their own judgments before looking at a separate writeup with more subjective summaries, which they could optionally use to adjust their estimates.

Estimation target

I asked participants to estimate “the probability distribution of the relative ex-post counterfactual values of Open Philanthropy’s grants”. To unpack that:

  • estimates are distributions: inputs are distributions, using Guesstimate-like syntax, like “1 to 10”, which represents a lognormal distribution with its 90% confidence interval ranging from 1 to 10 (see the sketch after this list).
  • estimates are relative: we don’t necessarily have an absolute comparison point, like percentage points of reduction in x-risk. This means that estimates were expressed in the form “grant A is x to y times more valuable than grant B”.
  • estimates are ex-post (after the fact), because estimating ex-ante expected values of something that has already happened is a) more complicated, and b) prone to hindsight bias.
  • estimates are of the counterfactual value, because estimating the Shapley value is a headache. And if we want to arrive at cost-effectiveness, we can just divide by the grant cost, which is known.
  • estimates are about the value of the grants, as opposed to the value of the projects, because some of the projects could have gotten funding elsewhere. So the value of the grants might be small, might lie in OpenPhil acquiring influence, or might have more to do with seeding a field than with the projects themselves. 
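For concreteness, here is a minimal sketch (with made-up numbers) of what a single estimate looks like in this format, using the same Squiggle syntax as the rest of the post:

relative_value_A_vs_B = 2 to 10 // “grant A is 2 to 10 times as valuable as grant B”
relative_value_A_vs_B // a lognormal whose 90% confidence interval runs from 2 to 10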

More detailed instructions to participants can be seen here. In elicitation setups such as this, I think that specifying the exact subject of discussion is valuable, so that participants are talking about the same thing.

Still, there were some things I wasn’t explicit about:

  • Participants were not intended to consider the counterfactual cost of capital. So for example, a neutral grant that didn’t have further effects on the world should have been rated as having a value of 0. However, I wasn’t particularly explicit about this, so it’s possible that participants were thinking something else.
  • I don’t remember being clear about whether participants should have estimated relative values or relative expected values. Looking at the intervals below, they are pretty narrow, which might be explained by participants thinking about expected value instead.

Elicitation method #1: Utility function extractor application

The first method was a “utility function extractor”, the app for which can be found here. The idea is to make possibly inconsistent pairwise comparisons between pairs of grants and then extract a utility function from them. Prior work and explanations can be found here and here.

An example of the results for one user looks like this:

I first processed each participant’s utility function extractor results into a table like this one:

and then processed it into proper distributional aggregates using this package. One difficulty I ran into was that I hadn’t considered that some of the estimates could be negative, which was a problem because I was using the geometric mean as an aggregation method. This wreaked havoc on the distributional aggregates, particularly when the estimates for one particular element were sometimes positive and sometimes negative.
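As a minimal illustration of the problem (with made-up numbers, not the actual aggregation code): the geometric mean is only defined for positive values, so it breaks as soon as estimates dip below zero, whereas, e.g., a mixture does not.

est1 = 1 to 10 // one participant's estimate for a grant
est2 = 5 to 50 // another participant's estimate for the same grant
geo_mean_of_means = sqrt(mean(est1) * mean(est2)) // fine while both means are positive
est3 = -10 to 2 // a mostly negative estimate, which Squiggle interprets as a normal
// sqrt(mean(est1) * mean(est3)) would take the square root of a negative number,
// so the geometric mean is no longer well defined; a mixture still works:
mx(est1, est3)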

Elicitation method #2: Hierarchical tree estimates

The second method involved creating a hierarchical tree of estimates, using this Observable document. The idea here is to express relationships between the grants using a “hierarchical model”, where grants belonging to a category are compared to a reference grant, and reference grants are then compared to a greater reference element (“one year of Paul Christiano's work”).

The interface I asked participants to use looked as follows:

A participant mentioned that this part was painful to fill out. Using a visualization scheme which the participants didn’t have access to at the time, participant results can be displayed as follows:

In this case, the top-most element is “percentage reduction in x-risk”. I asked some participants for their best guess for this number, and the one displayed gave 0.03% per year of Paul Christiano’s work.
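To illustrate the structure (the comparison numbers below are hypothetical, not any participant's actual inputs), the tree chains relative comparisons until they bottom out in the top-level unit:

xrisk_reduction_per_christiano_year = 0.0003 // 0.03%, the guess mentioned above
reference_grant_vs_christiano_year = 0.1 to 1 // hypothetical: a category's reference grant vs. one year of Paul Christiano's work
grant_vs_reference_grant = 0.5 to 5 // hypothetical: a grant in that category vs. its reference grant
xrisk_reduction_of_grant = xrisk_reduction_per_christiano_year * reference_grant_vs_christiano_year * grant_vs_reference_grant
xrisk_reduction_of_grant * 100 // in percentage points of existential risk reduction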

Elicitation method #3: Individual aggregate estimates

After presenting participants with their estimates from the two different methods, I asked the participants to give their pointwise first guesses after reflection. Their answers, normalized to add up to 100, can be summarized as follows:

Researcher #6 only reported his estimates using one method (the utility function extractor), and then participated in the discussion round, which is why he isn’t shown in this table.

So for example, researcher #4 is saying that the first grant, to research on the Global Politics of AI at the University of Oxford (GovAI), was the most valuable grant. In particular, the estimate is saying that it has 71% of the total value of the grants. The estimate is also saying that the grant to GovAI is 71/21.2 = 3.3 times as valuable as the next most valuable grant, to Michael Cohen and Dmitri Krasheninnikov.

Elicitation method #4: Discussion and new individual estimates

After holding a discussion round for an hour, participants’ estimates shifted to the following[1]:

To elicit these estimates, I asked participants to divide approximately 100 units of value between the different grants. Some participants found this elicitation method more convenient and less painful than the previous pairwise comparisons. 

Observations and reflections

Initial estimates from the same researcher using two different methods did not tend to overlap

Consider two estimates, expressed as 90% confidence intervals:

  • 10 to 100
  • 500 to 1000

These estimates do not overlap. That is, the highest end of the first estimate is smaller than the lower end of the second estimate.
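As a quick sketch, the overlap check used below is just an interval comparison (the function name is mine, for illustration):

overlaps(low1, high1, low2, high2) = low1 <= high2 && low2 <= high1
overlaps(10, 100, 500, 1000) // false: the two 90% confidence intervals above do not overlap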

When analyzing the results, I was very surprised to see that in many cases, estimates made by the same participant about the same grant using the first two methods—the utility function extractor and hierarchical tree—did not overlap:

In the table above, for example, the first light red “FALSE” square under “Researcher 1” and to the side of “Oxford University…” indicates that the initial 90% confidence intervals produced by researcher 1 for that grant do not overlap.

Estimates between participants after holding a discussion round were mostly in agreement

The final estimates made by the participants after discussion were fairly concordant[2]:

For instance, if we look at the first row, the 90% confidence intervals[3] of the normalized estimates are 0.1 to 1000, 48 to 90, -16 to 54, 41 to 124, 23 to 233, and 20 to 180. These all overlap! If we visualize these 90% confidence intervals as lognormals or loguniforms, they would look as follows[4]
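A minimal Squiggle sketch of that visualization, treating each interval as the 90% confidence interval of a lognormal (or a normal where the interval crosses zero) and mixing the six estimates with equal weights; the linked model may differ in its details:

r1 = 0.1 to 1000
r2 = 48 to 90
r3 = -16 to 54 // crosses zero, so interpreted as a normal rather than a lognormal
r4 = 41 to 124
r5 = 23 to 233
r6 = 20 to 180
mx(r1, r2, r3, r4, r5, r6) // equal-weight mixture of the six researchers' estimates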

Discussion of the shape of the results

Many researchers assigned most of the expected impact to one grant, similar to a power law or an 80/20 Pareto distribution, though a bit flatter. There was a tail of grants widely perceived to be close to worthless. There was also disagreement about the extent to which grants could have negative value.

The estimates generally seem to me to have been too narrow. In many cases they span merely an order of magnitude. This can maybe be partially explained by some ambiguity about whether participants were estimating relative expected values or the actual values.

Thoughts on accuracy

The fact that the estimates end up clustering together could be a result of:

  • Participants rationally coming to agree as a result of acquiring the same knowledge.
  • Social pressure, group-think, human biases, or other effects. Not all of these might be negative: for example, if the group correctly identifies the most knowledgeable person about each grant and then defers to them, this could make the estimates better.

Overall I think that convergence is a weak but positive signal of accuracy. For example, per Aumann’s agreement theorem, participants shouldn’t expect to “agree to disagree”, so to the extent that irrational disagreement is not happening, convergence is good.

One way to find out whether this aggregate is converging to something like the truth would be to have a separate group, or a separate person known to have good judgment, make their own estimates independently, and then compare them with these estimates. This would require an additional time investment.

What was the role of Squiggle?

I used Squiggle in the utility function extractor and in the hierarchical method, interpreting distributions using Squiggle syntax. I then also used it for aggregating the estimates, both to aggregate the many estimates made by one participant, and to arrive at an aggregate of all participants’ estimates.

Thoughts on scaling up this type of estimation

I’m estimating that the experiment took 20 to 40 hours:

hours_per_participant = 2 to 5
participants = 5 to 6
participant_hours = hours_per_participant * participants
organizer_hours = (2 to 4) + (2) + (0.3 to 2) + (4 to 15) + (0.2 to 0.5) // preparation + hosting + nagging + writeup + paying
participant_hours + organizer_hours

So for 9 grants, this is 2.6 to 4.9 hours per grant. Perhaps continued investment could bring this down to one hour per grant. I also think that time might scale roughly linearly with the number of grants, because grants can be divided into buckets, and then we can apply the relative value method to each bucket. Then we can compare buckets at a small additional cost—e.g., by comparing the best grants from each bucket.
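A rough sketch of that linear-scaling argument, with illustrative numbers of my own choosing:

grants = 300
bucket_size = 9 // the same size as this experiment
buckets = grants / bucket_size
hours_per_grant = 2.6 to 4.9 // from the estimate above
within_bucket_hours = grants * hours_per_grant
cross_bucket_hours = buckets * hours_per_grant // e.g., comparing the best grant from each bucket
within_bucket_hours + cross_bucket_hours // the cross-bucket comparisons add only a small overhead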

I’m not actually sure how many grants the EA ecosystem has, but I’m guessing something like 300 to 1000 grants per year[5]. Given this, it would take half to two FTEs (full-time equivalents) to evaluate all grants, which is lower than I expected:

hours_per_participant = 2 to 5
participants = 5 to 6
participant_hours = hours_per_participant * participants
organizer_hours = (2 to 4) + (2) + (0.3 to 2) + (4 to 15) + (0.2 to 0.5) // preparation + hosting + nagging + writeup + paying

hours_per_grant = (participant_hours + organizer_hours) /  9
grants_per_year = 300 to 1000
hours_per_person_per_year = (30 to 50) * 52
ftes_to_evaluate_all_grants = grants_per_year * hours_per_grant / hours_per_person_per_year

ftes_to_evaluate_all_grants

~1 FTE per year seems low enough to be doable. However, note that this effort would be spread amongst many people, which would bring extra attention, context-switching, and coordination costs.

Relative estimates as an elicitation method vs as an output format

There is a difference between relative estimates as an elicitation method (as presented here) and relative estimates as an output format (where we have the relative values of projects, and transformations between these and reference units, like QALYs, fractions of the future, etc.). It’s possible that relative values as an output format remain promising even if relative values as a (rapid) elicitation method turn out to be less so. 

Relative estimates of value seem a bit more resilient to shifts in what we care about

One advantage of relative values as a format might be that they are more resilient to shifts in what we care about (sometimes called “ontological crisis”). Thanks to Ozzie Gooen for this point. For instance, raw estimates of value may change as we switch from DALYs, to QALYs, to fractions of the future, to other units, or as we realize that the future is larger or smaller than we thought. But relative values would perhaps remain more stable.
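As a toy illustration (all numbers and exchange rates below are hypothetical): if we store relative values, a change of unit only changes the exchange rate of the reference element, not the pairwise comparisons themselves.

relative_value_A_vs_B = 2 to 10 // grant A vs. grant B; unchanged by any change of unit
value_of_B_in_qalys = 100 to 1000 // hypothetical exchange rate under one unit
value_of_B_in_basis_points_of_xrisk = 0.1 to 1 // hypothetical exchange rate under another unit
value_of_A_in_qalys = relative_value_A_vs_B * value_of_B_in_qalys
value_of_A_in_basis_points_of_xrisk = relative_value_A_vs_B * value_of_B_in_basis_points_of_xrisk
value_of_A_in_qalys // switching units only changes the exchange-rate lines, not relative_value_A_vs_B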

Thoughts on alternative value estimation methods

The main alternative to relative values that I’m considering is estimates made directly in a unit of interest, such as percentage or basis points of existential risk reduction, or QALYs saved. In particular, I’m thinking of setups which decompose impact into various steps and then estimate the value or probability of each step.

A concrete example

For instance, per AI Governance: Opportunity and Theory of Impact, the pathway to impact for the GovAI center would be something like this:

Higher quality image here.

Giving some very quick numbers to this, say:

  • a 12% chance of AGI being built before 2030, 
  • a 30% chance of it being built in Britain by then if so, 
  • a 90% chance of it being built by DeepMind if so, 
  • an initial 50% chance of it going well if so,
  • GovAI efforts shift the probability of it going well from 50% to 55%. 

Punching those numbers into a calculator, a rough estimate is that GovAI reduces existential risk by around 0.081%, or 8.1 basis points.
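One way to reproduce that headline number (my reconstruction; it assumes the initial 50% chance of things going well is also included as a multiplicative factor):

p_agi_before_2030 = 0.12
p_built_in_britain = 0.30
p_built_by_deepmind = 0.90
p_goes_well_initially = 0.50
shift_in_p_goes_well = 0.05 // from 50% to 55%
risk_reduction = p_agi_before_2030 * p_built_in_britain * p_built_by_deepmind * p_goes_well_initially * shift_in_p_goes_well
risk_reduction * 100 // ≈ 0.081 percentage points, i.e., roughly 8.1 basis points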

The key number here is the 5% improvement (from 50% to 55%). I’m getting this estimate mostly because I think that Allan Dafoe being the “Head of Long-term Strategy and Governance” at DeepMind seems like a promising signal. It nicely corresponds to the “having people in places to implement safety strategies” part of GovAI’s pathway to impact. But that estimation strategy is very crude, and I could imagine a better estimate ranging from <0.5% to more than 5%.

To avoid the class of problems around using point estimates rather than distributions that Dissolving the Fermi Paradox points out, we can rewrite these point estimates as distributions:

t(d) = truncateLeft(truncateRight(d, 1), 0)
agi_before_2030 = t(0.01 to 0.3) // should really be using beta distributions, though
agi_in_britain_if_agi_before_2030 = t(0.1 to 0.5)
agi_by_deepmind_if_agi_in_britain = t(0.8 to 1)
increased_survival_probability = t(0.001 to 0.1) // changed my mind while putting a distributional estimate
value_of_govai = t(agi_before_2030 * agi_in_britain_if_agi_before_2030 * agi_by_deepmind_if_agi_in_britain * increased_survival_probability)
value_of_govai_in_percentage_points = value_of_govai * 100
value_of_govai_in_percentage_points

This produces an estimate of 0.52% of the future, or 52 basis points, which is around 6x higher than our initial estimate of 8.1 basis points. But we shouldn’t be particularly surprised to see these estimates vary by ~1 order of magnitude.

We could make a more granular estimate by thinking about how many people would be involved in that decision, how many would have been influenced by GovAI, etc. 

In any case, in this post, Linch estimates that we should be prepared to pay $100M to $1B for a 0.01% reduction in existential risk, or $7.2B to $72B for the existential risk reduction of 0.72% that I quickly estimated GovAI to produce. Because GovAI’s budget is much lower, it seems like an outstanding opportunity, conditional on that estimate being correct.
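Checking that multiplication (in millions of dollars, and taking the 0.72% figure quoted in the paragraph above):

payment_per_basis_point_in_millions = 100 to 1000 // Linch: $100M to $1B per 0.01% of existential risk reduction
basis_points_of_reduction = 72 // 0.72% = 72 basis points
payment_per_basis_point_in_millions * basis_points_of_reduction // 90% CI of roughly 7,200 to 72,000, i.e., $7.2B to $72B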

How does that example differ from the relative estimates method?

In this case, both the relative values method and the explicit pathway to impact method end up concluding that GovAI is an outstanding opportunity, but the explicit estimate method seems much more legible, because its moving parts are explicit and thus can more easily be scrutinized and challenged. 

Note that GovAI has a very clearly written explanation of its theory of impact, which other interventions may not have. And producing a clear theory of impact, of the sort which could be used for estimation, might be too time-consuming for any given small grant. But I am optimistic that we could have templates which we could then reuse.

Future work

Future work directions might involve:

  • Finding more convenient and scalable ways to produce these kinds of estimates
  • Finding better ways to visualize, present and interrogate these estimates
  • Checking whether these estimates align with expert intuition
  • Applying these estimation methods to regimes where there was previously very little estimation being done
  • Further experimenting with more in-depth and high-quality estimation methods than the one used here
  • Using relative estimates as a way to identify disagreements

I still think relative values are meaningful for creating units, such as “quality-adjusted sentient life year”. But otherwise, I’m most excited about purely relative estimates as a better method for aiding relatively low-level decisions, and estimates based on the pathway to impact as a more expensive estimation option for more important decisions.

One reason for this view is that I have become more convinced that direct estimates of variables of interest (like basis points of existential risk reduction) can be meaningfully estimated, although at some expense. Previously, I thought that producing endline estimates might end up being too expensive.

It’s possible that relative value estimates could also be used for other use cases, such as to create evaluations of grants in cases where there previously were none, or to align the intuitions of senior and junior grantmakers. But I don’t consider this particularly likely, maybe because people who could be doing this kind of thing would have more valuable projects to implement.

Acknowledgements

Thanks to Gavin Leech, Misha Yagudin, Ozzie Gooen, Jaime Sevilla, Daniel Filan, and the anonymous participant for participating in this experiment. Thanks to them and to Eli Lifland for their comments and suggestions throughout and afterwards. Thanks to Hauke Hillebrandt, Ozzie Gooen and Nick Beckstead for encouragement around this research direction.

This post is a project by the Quantified Uncertainty Research Institute (QURI). The language used to express probability distributions used throughout the post is Squiggle, which is being developed by QURI.

Appendix: More details

You can find more detailed estimates in this Google Sheet. For each participant, their sheet shows:

  • The results for each method
  • The results for an aggregate of both methods
  • The best guess of the participant after seeing the results for each method and an aggregate
  • The best guess of the participant after discussing with other participants

You can also find more detailed aggregates in this Google Sheet, which include the individual distributions and the medians in the table in the last section.

Note that there are various methodological inelegancies:

  • Researcher #2 did not participate in the discussion, and only read the notes
  • Researcher #6 only used the utility function extractor method
  • Various researchers at times gave idiosyncratic estimate types, like 80% confidence intervals, or medians instead of distributions.

In part because the initial estimates were not congruent, I procrastinated in hosting the discussion session, which was held around a month after the initial experiment, if I recall correctly. If I were redoing the experiment, I would hold the different parts of this experiment closer together.

  1. ^

    Note that in the first case, I am displaying the means, and in the other, the medians. This is because a) means of very wide distributions are fairly counterintuitive, and on various occasions I don't think that participants thought much about this, and b) due to a methodological accident, participants provided means in the first case and medians in the second.

    Note also that medians are a pretty terrible aggregation method.

  2. ^

    Note that the distributions aren't necessarily lognormally distributed, hence why the medians may look off. See this spreadsheet for details.

  3. ^

    80% for researcher #5, because of idiosyncratic reasons.

  4. ^

    Squiggle model here.

  5. ^

    Open Philanthropy grants for 2021: 216, Long-term future fund grants for 2021: 46, FTX Future Fund public grants and regrants: 113 so far, so an expected ~170 by the end of the year. In total this is 375 grants so far, and I'd wager the number will grow year by year.

Comments (16)

(Comments re-worded from those on a draft)

Overall I like the direction this post pushes in.

I shared a briefing with the participants summarizing the nine Open Philanthropy grants above, with the idea that it might speed the process along. 

In hindsight, this was suboptimal, and might have led to some anchoring bias. Some participants complained that the summaries had some subjective component. These participants said they used the source links but did not pay that much attention to these opinions.

On the other hand, other participants said they found the subjective estimates useful. And because the briefing was written in good faith, I am personally not particularly worried about it. Even if there are anchoring issues, we may not necessarily care about it if we think that the output is accurate, in the same way that we may not care about forecasters anchoring on the base rate. [emphasis mine]

If I were redoing this experiment, I would probably limit myself even more to expressing only factual claims and finding sources. A better scheme might have been to share a writeup with a minimal subjective component, then strongly encourage participants to make their own judgments before looking at a separate writeup with more subjective summaries, which they could optionally use to adjust their estimates.

I disagree with the opinions expressed in the bolded paragraph. I wouldn't want forecasters to anchor on a specific base rate I gave them! I'd want them to find their own. Of course you think that the forecasters are anchoring on something accurate since the opinions they're anchoring on are your own! This isn't reassuring to me at all.

Thoughts on scaling up this type of estimation [section header]

I'm more excited about in-depth evaluation of agendas/organizations as a whole than trying to scale up shallow estimations to all grants.

Giving some very quick numbers to this, say:

  • a 12% chance of AGI being built before 2030, 
  • a 30% chance of it being built in Britain by then if so, 
  • a 90% chance of it being built by DeepMind if so, 
  • an initial 50% chance of it going well if so,
  • GovAI efforts shift the probability of it going well from 50% to 55%. 

Punching those numbers into a calculator, a rough estimate is that GovAI reduces existential risk by around 0.081%, or 8.1 basis points.

This BOTEC feels too optimistic about GovAI's impact to me, and I trust it even less than most BOTECs because it's not directly modeling the channel through which I (and I believe GovAI) think GovAI will have the most impact, which is field-building.

Thanks Eli. I think I most disagree with you on the BOTEC point. Copying a paragraph from the text:

The key number here is the 5% improvement (from 50% to 55%). I’m getting this estimate mostly because I think that Allan Dafoe being the “Head of Long-term Strategy and Governance” at DeepMind seems like a promising signal. It nicely corresponds to the “having people in places to implement safety strategies” part of GovAI’s pathway to impact. But that estimation strategy is very crude, and I could imagine a better estimate ranging from <0.5% to more than 5%.

So I think that the handwavy estimate is still meaningful. 

I think it would've been better to just elicit point estimates of the grants' expected value, rather than distributions. Using distributions adds complexity, for not much benefit, and it's somewhat unclear what the distributions even represent.

Added complexity: for researchers giving their elicitations, for the data analysis, and for readers trying to interpret the results. This can make the process slower, lead to errors, and lead to different people interpreting things differently, e.g., around including both positive & negative numbers in the distributions.

Not much benefit: at least, when I read this report I mostly looked at the point estimates, except for the section showing that researchers' confidence intervals for the two elicitation methods didn't overlap.

Unclear what the distribution represents: The distribution is basically a probability distribution over a probability (p(x-risk)), and it's not obvious which uncertainties should be represented in the distribution and which are part of p(x-risk). e.g., If someone thinks that there's an 80% chance that a research direction is misguided & useless and a 20% chance that it's meaningful & relevant, should they just multiply their distribution by 0.2 (relative to research that is definitely in a meaningful & relevant direction), or should this give a more spread-out distribution with most of the probability mass near zero, or something in between?

Unclear what the distribution represents: The distribution is basically a probability distribution over a probability (p(x-risk)), and it's not obvious which uncertainties should be represented in the distribution and which are part of p(x-risk). e.g., If someone thinks that there's an 80% chance that a research direction is misguided & useless and a 20% chance that it's meaningful & relevant, should they just multiply their distribution by 0.2 (relative to research that is definitely in a meaningful & relevant direction), or should this give a more spread-out distribution with most of the probability mass near zero, or something in between?

Yeah, you can use a mixture distribution if you are thinking about the distribution of impact, like so, or you can take the mean of that mixture if you want to estimate the expected value, like so. Depends on what you are after.

My intuitions point the other way with regards to point estimates vs distributions. Distributions seem like the correct format here: they could allow for value of information calculations and sensitivity analyses, highlight disagreements which people wouldn't notice with point estimates, and combine better. The bottom line could also change when using distributional estimates, e.g., as in here.

That said, they do have a learning curve and I agree with you that they add additional complexity/upfront cost.

Agreed that there are some contexts where there's more value in getting distributions, like with the Fermi paradox.

Or, before the grants are given out, you could ask people to give an ex ante distribution for "what will be your ex post point estimate of the value of this grant?" That feeds directly into VOI calculations, and it is clearly defined what the distribution represents. But note that it requires focusing on point estimates ex post.

> Or, before the grants are given out, you could ask people to give an ex ante distribution for "what will be your ex post point estimate of the value of this grant?" That feeds directly into VOI calculations, and it is clearly defined what the distribution represents. But note that it requires focusing on point estimates ex post.

Aha, but you can also do this when the final answer is also a distribution. In particular, you can look at the KL-divergence between the initial distribution and the answer, and this is also a proper scoring rule.

More generally, I think there is a difference between what would have been best for this analysis (you might be right that point estimates would have been better) and what EA/longtermism should be aiming to have, which I think is more uncertain estimates in the shape of distributions.

Some thoughts on the greater project:  

- The greater prospect of “let’s have collaborative estimates of the impacts of key longtermist projects” is something I strongly want to see, but I think it’s also *really* difficult to do well.

- This experiment went through a few early strategies. I think the results are clearly mediocre (in that estimates were all over the place, and were wildly inconsistent), but could be a good place to build much better work. 

- I see this very much as an MVP, so I’d expect it to have severe limitations. I generally prefer processes of “build a bunch of MVPs, test them out, and see what fails” to ones of “spend a whole lot of time getting it right the first time.”

- The fact that estimates were inconsistent suggests that elicitation is very difficult to do well, but also that there’s a great deal of improvement to be done. So, future work is probably less tractable than expected, but more important.

- I’m still very bullish on relative evaluations, but think that they will require a lot of clever innovations to do well. 

- I think that longer-term, it would be promising to have people submit relative evaluations as long Squiggle (or similar) files. I’m unsure how these can best be displayed or organized for specific discussions.

Some thoughts on the estimations:  

- I think this is the first time most/any of us have really had to estimate the relative value of these kinds of longtermist projects. There’s been very little literature on this before. I think the numbers are correspondingly questionable (including my own).

- Utility elicitation for comparing one item to another that could be negative, in particular, was really poor. I tried some naive Squiggle calculations that clearly weren’t very accurate. I’m not sure what tool would be best here, maybe there’s some custom drawing-with-mouse tool that could work, or people could figure out better quantitative function representations.

- It’s very hard to evaluate these sorts of projects without much more data. Ideally, there would be a lot of data gathering. For example, if a program funds 10 people to do work, we’d ideally have a good table of all of their outputs, and comments from people in the area about how good these outputs were. A lot of evaluation work can reduce to “effective systems to gather objective and subjective information from diverse sets of sources.”

- I believe people estimated “how valuable do you think this is” instead of, “how valuable do you think a council would think this is?” The latter should be much more uncertain, and possibly much more important to readers (if done well).

- From what I remember, I think my main disagreement with other evaluators is that some had much narrower ranges than I thought were reasonable. I guess that some of this is part of a learning process.

A slightly edited section of my comment on the earlier draft:

I lean skeptical about "relative pair-wise comparisons" after participating: I think people were surprised by their aggregate estimates (e.g., I was very surprised!); I think later convergence was due to common sense and mostly came from people moving points between interventions and not from pair-wise anything;

I think this might be because I am unconfident about eliciting distributions with Squiggle, as I don't have good intuition about how a few log-normals with 80% probability between xx and yy would compare to each other after aggregation (probably this is common, see 2a). After I did my point estimates + my CI via Squiggle for everything altogether, I think they didn't match each other that well. Maybe that's because lognormal is right-skewed and fairly heavy-tailed?

Thanks Misha

In the table with post-discussion distributions, how is the lower bound of the aggregate distribution for the Open Phil AI Fellowship -73, when the lowest lower bound for an individual researcher is -2.4? Also in that row, Researcher 3's distribution is given as "250 to 320", which doesn't include their median (35) and is too large for a scale that's normalized to 100.

Hey, thanks

Also in that row, Researcher 3's distribution is given as "250 to 320", which doesn't include their median (35) and is too large for a scale that's normalized to 100.

Should have been -250, updated.

This also explains the -73.

Sorry, but the Squiggle link doesn't work in footnote 4. I'd like to replicate it in a different colour just to look at the graph properly.

This produces an estimate of 0.52% of the future, or 52 basis points, which is around 6x higher than our initial estimate of 8.1 basis points.

Isn't that chart mean = 0.052% = 5.2 basis points ≈ the earlier pointwise estimate, not 0.52%, or am I misreading?

That seems correct, though now I am doubting myself.