Six experiments with a simple optimizer's curse model

titotal

Executive summary: Through six variations on a toy charity-ranking model, the author argues that the optimizer’s curse is fairly robust to changes in bias, uncertainty structure, distributional shape, portfolios, and correlation, though its magnitude can shrink under long-tailed effectiveness distributions or highly correlated errors.
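A minimal Monte Carlo sketch of the kind of baseline setup described here. It assumes normally distributed true effectiveness and unbiased normal estimation errors. Each trial draws true values, adds noise, funds the charity with the highest estimate, and compares that estimate with the charity's true value. The function names and parameter values (20 charities, true effectiveness ~ Normal(100, 50) lives per $1M, error SD 200) are hypothetical and are not meant to reproduce the post's figures.

```python
import random
import statistics

# Hypothetical parameters, chosen only so that estimation error is large
# relative to true variation; these are not the post's actual settings.
N_CHARITIES = 20
TRUE_MEAN = 100.0   # mean true effectiveness, lives per $1M
TRUE_SD = 50.0      # spread of true effectiveness across charities
ERROR_SD = 200.0    # SD of the unbiased estimation error


def pick_top(rng, n=N_CHARITIES, true_sd=TRUE_SD, error_sd=ERROR_SD):
    """Simulate one world; return (estimate, actual) for the top-ranked charity."""
    true = [rng.gauss(TRUE_MEAN, true_sd) for _ in range(n)]
    est = [t + rng.gauss(0.0, error_sd) for t in true]
    best = max(range(n), key=est.__getitem__)  # rank purely by the estimate
    return est[best], true[best]


def simulate(trials=5000, seed=0, **kwargs):
    """Median estimated and median actual effectiveness of the top pick."""
    rng = random.Random(seed)
    picks = [pick_top(rng, **kwargs) for _ in range(trials)]
    return (statistics.median(e for e, _ in picks),
            statistics.median(t for _, t in picks))


if __name__ == "__main__":
    est, actual = simulate()
    print(f"median estimate of top pick: {est:.0f} lives per $1M")
    print(f"median actual value of top pick: {actual:.0f} lives per $1M")
```

With error this large relative to the true spread, the chosen charity's median estimate should land well above its median true value. Shrinking `error_sd` toward zero closes the gap.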

Key points:

In a baseline model where estimation error is large relative to true variation, the top-ranked charity’s median estimated effectiveness substantially exceeds its median actual effectiveness, illustrating the optimizer’s curse.
Introducing pessimistic bias into estimates cancels the curse only if the bias is extreme (e.g., around -350 lives per million dollars when the best true intervention saves about 200), so modest pessimism is insufficient.
Increasing uncertainty (a higher error standard deviation) raises the estimated effectiveness of the top charity while lowering its actual effectiveness. In the model, reducing the error SD from 200 to 50 lives per million dollars raises median actual impact from 85 to 181 lives per million dollars.
Allowing interventions to differ in uncertainty preserves the curse and biases selection toward high-uncertainty options. In one setup, median estimated effectiveness (321) far exceeds median actual effectiveness (111).
Changing the shape of the true-effectiveness distribution shows that right skew has little impact on the curse. Very long-tailed distributions (t-distributions with few degrees of freedom), however, reduce its magnitude by widening the gap between the best and second-best options.
Portfolio strategies reduce overestimation but typically lower actual effectiveness in the equal-uncertainty case, while error correlation decreases the curse’s magnitude and correlation in true effectiveness can eliminate gains from ranking.
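Several of these variations can be approximated by adding parameters to a single simulation. This is a hypothetical sketch, not the post's code. The parameters are assumptions: 20 charities, true effectiveness ~ Normal(100, 50), default error SD 200, a 50/300 split for mixed uncertainty, a five-charity portfolio, and error correlation 0.8. The skewed, t-distributed, and correlated-true-effectiveness variants are left out.

```python
import random
import statistics

# Hypothetical toy parameters (assumptions, not the post's actual settings).
N = 20               # charities being ranked
TRUE_MEAN = 100.0    # mean true effectiveness, lives per $1M
TRUE_SD = 50.0       # spread of true effectiveness
BASE_ERROR_SD = 200.0


def run(trials=4000, seed=1, error_sds=None, bias=0.0, top_k=1, error_corr=0.0):
    """Return median (estimated, actual) value of the funded option(s).

    error_sds  -- per-charity estimation-error SDs (default: all BASE_ERROR_SD)
    bias       -- constant added to every estimate; negative means pessimistic
    top_k      -- fund an equal-weight portfolio of the top_k estimates
    error_corr -- pairwise correlation of errors, induced by one shared factor
    """
    rng = random.Random(seed)
    sds = error_sds if error_sds is not None else [BASE_ERROR_SD] * N
    w_shared = error_corr ** 0.5          # weight on the common error factor
    w_own = (1.0 - error_corr) ** 0.5     # weight on each charity's own error
    est_vals, true_vals = [], []
    for _ in range(trials):
        true = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(len(sds))]
        shared = rng.gauss(0.0, 1.0)      # common error factor for this trial
        est = [
            t + bias + sd * (w_shared * shared + w_own * rng.gauss(0.0, 1.0))
            for t, sd in zip(true, sds)
        ]
        chosen = sorted(range(len(sds)), key=est.__getitem__, reverse=True)[:top_k]
        est_vals.append(statistics.fmean(est[i] for i in chosen))
        true_vals.append(statistics.fmean(true[i] for i in chosen))
    return statistics.median(est_vals), statistics.median(true_vals)


if __name__ == "__main__":
    mixed = [50.0] * (N // 2) + [300.0] * (N - N // 2)  # hypothetical split
    for label, kwargs in [
        ("baseline", {}),
        ("pessimistic bias -350", {"bias": -350.0}),
        ("error SD 50", {"error_sds": [50.0] * N}),
        ("mixed error SDs", {"error_sds": mixed}),
        ("top-5 portfolio", {"top_k": 5}),
        ("error correlation 0.8", {"error_corr": 0.8}),
    ]:
        est, actual = run(**kwargs)
        print(f"{label:>22}: estimate {est:6.1f}, actual {actual:6.1f}")
```

In this sketch, a constant bias shifts every estimate by the same amount, so it never changes which charity is chosen; it only lowers the reported estimate. With equal error SDs, the shared part of a correlated error also moves every estimate equally. Only the independent part can mislead the ranking, which is why correlation shrinks the gap here.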

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

SummaryBot


^ I’ve switched to the American spelling of “optimizer” to be consistent with the original paper.

^ Using their estimate of $3,000 to $5,500 to save a life, a million dollars saves roughly 180 to 330 lives. If this range were a 95% probability interval, the implied uncertainty SD would be around 40 lives, similar to the “grounded” intervention from the last post.
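The footnote's arithmetic can be checked directly. Following the footnote's own "if", this treats the range of lives saved as a symmetric 95% interval of a normal distribution, so its width spans about 2 × 1.96 standard deviations. The helper name `lives_per_million` is hypothetical.

```python
Z_95 = 1.96  # two-sided 95% quantile of a normal distribution (assumed normal)


def lives_per_million(cost_per_life):
    """Lives saved by $1,000,000 at a given cost per life."""
    return 1_000_000 / cost_per_life


low = lives_per_million(5500)    # higher cost per life -> fewer lives, about 182
high = lives_per_million(3000)   # lower cost per life -> more lives, about 333
sd = (high - low) / (2 * Z_95)   # half-width of the interval divided by 1.96
print(f"{low:.0f} to {high:.0f} lives per $1M, implied SD about {sd:.0f}")
# prints: 182 to 333 lives per $1M, implied SD about 39
```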

^ Except when comparing grounded to speculative.