PhD student at Aberdeen University studying Bayesian reasoning
Interested in practical exercises and theoretical considerations related to causal inference, forecasting and prioritization.
On getting research collaborators
(adapted from a private conversation)
The 80/20 advice I would give is: be proactive in reaching out to other people and suggesting that you work together for an evening on a small project, like writing a post. Afterwards, you can both decide whether you are excited enough to work together on something bigger, like a paper.
For more in-depth advice, here are some ways I've started collaborations in the past:
Thank you! I am quite honoured. And congratulations to the other winners!
I absolutely love the work you have done, thank you so much!
He is listed on the website.
> OpenAI is governed by the board of OpenAI Nonprofit, which consists of OpenAI LP employees Greg Brockman (Chairman & CTO), Ilya Sutskever (Chief Scientist), and Sam Altman (CEO), and non-employees Adam D’Angelo, Holden Karnofsky, Reid Hoffman, Shivon Zilis, Tasha McCauley, and Will Hurd.
It might not be up to date, though.
Note that Eliezer Yudkowsky's argument in the opening link is that OpenAI's damage was done by fragmenting the AI Safety community at its launch.
This damage is done - and I am not sure it bears much relation to what OpenAI is trying to do going forward.
(I am not sure I agree with Eliezer on this one, but I lack details to tell if OpenAI's launch really was net negative)
I found some relevant discussion about extremizing in the EA Forum, in footnote 5 of this post.
> The aggregation algorithm was elitist, meaning that it weighted more heavily forecasters with good track-records who had updated their forecasts more often. In these slides, Tetlock describes the elitism differently: He says it gives weight to higher-IQ, more open-minded forecasters. The extremizing step pushes the aggregated judgment closer to 1 or 0, to make it more confident. The degree to which they extremize depends on how diverse and sophisticated the pool of forecasters is. The academic papers on this topic can be found here and here. Whether extremizing is a good idea is controversial; according to one expert I interviewed, more recent data suggests that the successes of the extremizing algorithm during the forecasting tournament were a fluke. After all, a priori one would expect extremizing to lead to small improvements in accuracy most of the time, but big losses in accuracy some of the time.
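For concreteness, the extremizing step itself is simple to sketch: the usual recipe raises the aggregated odds to a power $d > 1$ (with $d$ chosen based on the forecaster pool, which I don't attempt here). A minimal sketch in Python:

```python
def extremize(p, d=1.5):
    """Push an aggregated probability p towards 0 or 1 by raising its odds to the power d.

    d = 1 leaves the forecast unchanged; d > 1 makes it more confident.
    """
    odds = (p / (1 - p)) ** d
    return odds / (1 + odds)

# Example: an aggregate of 0.7 becomes ~0.78 after extremizing with d = 1.5.
print(extremize(0.7))
```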
The post in general is quite good, and I recommend it.
I liked this one a lot!
It was very easy to read and pulled me in. I felt compelled by the protagonist's inner turmoil and how he makes his decision. The writing was clear and it flowed very well. This is something I will send to some friends to introduce them to effective altruism.
The only part I didn't like was the ending. I like the intention of linking to GiveWell's page but it pulled me totally out of the fantasy. Also the friend felt a bit 2D. But these are minor quibbles.
Thank you for writing this!
> Thanks for this post - I think this was a very useful conversation to have started (at least for my own work!), even if I'm less confident than you in some of these conclusions.
Thank you for your kind words! To dispel any impression of confidence: this represents my best guesses. I am also quite confused.
> I've heard other people give good-sounding arguments for other conclusions.
I'd be really curious if you can dig these up!
> You later imply that you think [the geo mean of probs outperforming the geo mean of odds] is at least partly because of a specific bias among Metaculus forecasts. But I'm not sure if you think it's fully because of that or whether that's the right explanation.
I am confident that the geometric mean of probs outperformed the geo mean of odds because of this bias. If you change the coding of all binary questions so that True becomes False and vice versa, then you are going to get worse performance than the geo mean of odds.
This is because the geometric mean of probabilities does not treat predictions and their complements consistently. As a basic example, suppose that we have $p_1 = 0.01$ and $p_2 = 0.3$. Then $\sqrt{p_1 p_2} + \sqrt{(1-p_1)(1-p_2)} \approx 0.89 < 1$.
So in this sense the geometric mean of probabilities is not a consistent probability: it doesn't map the complements of the probabilities to the complement of the geometric mean, as we would expect (the geometric mean of odds, the mean of probabilities and the median all satisfy this basic property).
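To make the contrast concrete, here is a quick sketch checking this property on the toy example above (a minimal illustration, not tied to any real data):

```python
import numpy as np

def geo_mean_probs(p):
    """Geometric mean of the probabilities."""
    return np.prod(p) ** (1 / len(p))

def geo_mean_odds(p):
    """Geometric mean of the odds, mapped back to a probability."""
    odds = np.prod(p / (1 - p)) ** (1 / len(p))
    return odds / (1 + odds)

p = np.array([0.01, 0.3])  # the toy example above

# Geo mean of probs: aggregating the events and their complements gives
# two numbers that do not sum to 1.
print(geo_mean_probs(p) + geo_mean_probs(1 - p))  # ~0.89

# Geo mean of odds: the two aggregates are exact complements.
print(geo_mean_odds(p) + geo_mean_odds(1 - p))    # 1.0
```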
So I would recommend viewing the geometric mean of probabilities as a hack to adjust the geometric mean of odds down. This is also why I think better adjustments likely exist, since this isn't a particularly well motivated adjustment. It does however seem to slightly improve Metaculus predictions, so I included it in the flowchart.
To drive this point home, here is what we would get if we aggregated the predictions in the last 860 resolved Metaculus binary questions by mapping each prediction to its complement, taking the geo mean of probs, and taking the complement of the result:
As you can see, this change (which would not affect the other aggregates) significantly weakens the geo mean of probs.
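In case it helps, this is the flipping procedure I mean, sketched for a single question (the forecasts below are made up; the actual numbers come from running this over the Metaculus forecasts for each of the 860 questions):

```python
import numpy as np

def geo_mean_probs(p):
    return np.prod(p) ** (1 / len(p))

forecasts = np.array([0.2, 0.35, 0.5])  # made-up forecasts for one binary question

# Original coding of the question.
original = geo_mean_probs(forecasts)

# Flipped coding: relabel True as False, aggregate the complements,
# and take the complement of the result.
flipped = 1 - geo_mean_probs(1 - forecasts)

# A consistent aggregation method would give the same number both ways;
# the geometric mean of probabilities does not.
print(original, flipped)
```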
I get what you are saying, and I also harbor doubts about whether extremization is just pure hindsight bias or if there is something else to it.
Overall I still think it's probably justified, in cases like Metaculus, to extremize based on the extremization factor that would have optimized the last 100 resolved questions, and I would expect the extremized geo mean with such a factor to outperform the unextremized geo mean over the next 100 binary questions to resolve (if pressed to put a number on it, maybe ~70% confidence without thinking too much).
My reasoning here is something like:
So overall I am not super convinced, and a big part of my argument is an appeal to authority.
Also, extremization by 1.5 seems to work when looking at the last 330 questions as well.
I'd be curious about your thoughts here. Do you think that a 1.5-extremized geo mean will outperform the unextremized geo mean over the next 100 questions? What if we choose a fine-tuned extremization factor that would have optimized the last 100?
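For concreteness, the procedure I have in mind for picking the factor looks roughly like this sketch (the data is a random placeholder; in practice the aggregated forecasts and resolutions of the last 100 resolved Metaculus questions would go in its place):

```python
import numpy as np

def extremize(p, d):
    """Raise the odds of p to the power d and map back to a probability."""
    odds = (p / (1 - p)) ** d
    return odds / (1 + odds)

def mean_log_score(p, outcomes):
    """Average log score of probabilities p against binary outcomes (higher is better)."""
    return np.mean(np.where(outcomes == 1, np.log(p), np.log(1 - p)))

# Placeholder data standing in for the last 100 resolved binary questions.
rng = np.random.default_rng(0)
aggregates = rng.uniform(0.05, 0.95, size=100)  # unextremized aggregate per question
outcomes = rng.binomial(1, aggregates)          # question resolutions

# Pick the extremization factor that would have maximized the score on past
# questions, then apply that factor to future aggregates.
candidate_factors = np.linspace(1.0, 3.0, 41)
scores = [mean_log_score(extremize(aggregates, d), outcomes) for d in candidate_factors]
best_d = candidate_factors[np.argmax(scores)]
print(best_d)
```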
Hmm good question.
For a quick foray into this, we can see what would happen if we use as our estimate the mean of the maximum likelihood beta distribution implied by the sample of forecasts $p_1, \ldots, p_N$.
The log-likelihood to maximize is then
$$\ln \mathcal{L}(\alpha, \beta) = (\alpha - 1)\sum_{i=1}^N \ln p_i + (\beta - 1)\sum_{i=1}^N \ln(1 - p_i) - N \ln B(\alpha, \beta)$$
The Wikipedia article on the beta distribution discusses this maximization problem in depth, pointing out that although no closed form exists, if $\alpha$ and $\beta$ can be assumed to be not too small the maximum likelihood estimates can be approximated as $\hat{\alpha} \approx \frac{1}{2} + \frac{\hat{G}_X}{2(1 - \hat{G}_X - \hat{G}_{1-X})}$ and $\hat{\beta} \approx \frac{1}{2} + \frac{\hat{G}_{1-X}}{2(1 - \hat{G}_X - \hat{G}_{1-X})}$, where $\hat{G}_X = \prod_i p_i^{1/N}$ and $\hat{G}_{1-X} = \prod_i (1 - p_i)^{1/N}$.
The mean of a beta with these maximum likelihood parameters is $\frac{\hat{\alpha}}{\hat{\alpha} + \hat{\beta}} = \frac{1 - \hat{G}_{1-X}}{(1 - \hat{G}_X) + (1 - \hat{G}_{1-X})}$.
By comparison, the geometric mean of odds estimate is $\frac{\hat{G}_X}{\hat{G}_X + \hat{G}_{1-X}}$.
Here are two examples of how the two methods compare when aggregating five forecasts:
I originally did this to convince myself that the two aggregates were different. And they seem to be! The beta mean seems to be close to the arithmetic mean in this example. Let's see what happens when we extremize one of the predictions:
We have made $p_3$ one hundred times smaller. The geometric mean of odds is suitably affected. The maximum likelihood beta mean stays close to the arithmetic mean, largely unperturbed.
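To make the comparison easy to poke at, here is a self-contained sketch with made-up forecasts (not the ones from my examples above):

```python
import numpy as np

def aggregate(p):
    """Return (geo mean of odds, beta max likelihood mean, arithmetic mean) of forecasts p."""
    g_x = np.prod(p) ** (1 / len(p))        # geometric mean of the probabilities
    g_1mx = np.prod(1 - p) ** (1 / len(p))  # geometric mean of the complements
    geo_odds = g_x / (g_x + g_1mx)
    beta_mean = (1 - g_1mx) / ((1 - g_x) + (1 - g_1mx))
    return geo_odds, beta_mean, np.mean(p)

forecasts = np.array([0.1, 0.2, 0.4, 0.6, 0.85])  # five made-up forecasts
perturbed = forecasts.copy()
perturbed[2] /= 100                               # make p3 one hundred times smaller

print(aggregate(forecasts))  # the three aggregates before the perturbation
print(aggregate(perturbed))  # the geo mean of odds moves much more than the beta mean
```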
This makes me a bit less excited about this method, but I would be excited about people poking around with this method and related ones!