PhD student at Aberdeen University studying Bayesian reasoning

Interested in practical exercises and theoretical considerations related to causal inference, forecasting and prioritization.


Forecasting quantum computing


Jsevillamol's Shortform

On getting research collaborators

(adapted from a private conversation)

The 80/20 advice I would give is: be proactive in reaching out to other people and suggest working together for an evening on a small project, like writing a post. Afterwards you can both decide if you are excited enough to work together on something bigger, like a paper.

For more in-depth advice, here are some ways I've started collaborations in the past:

  • Deconfusion sessions
    I often invite other researchers for short sessions of 1-2 hours to focus on a topic, with the goal of coding together a barebones prototype or a sketch of a paper.

    For example, I engaged in conversation with Pablo Moreno about Quantum Computing and AI Alignment. We found we disagreed, so I invited him to spend one hour discussing the topic in more depth. During the conversation we wrote down the key points of disagreement, and we resolved to expand them into an article.
  • Advertise through intermediate outputs
    I found it useful for many reasons to split big research projects into post-size bits. One of those reasons is to let other people know what I am working on, and that I am interested in collaborating.

    For example, for the project on studying macroscopic trends in Machine Learning, we resolved to first write a short article about parameter counts. I then advertised the post asking for potential collaborators to reach out.
  • Interview people on their interests
    Asking people what motivates them and what they want to work on can segue into an opportunity to say "actually, I am also interested in X, do you want to work together on it?". I think this requires some finesse, but it is a skill that can be practiced.

    For example, I had an in-depth conversation with Laura González about her interests and what kinds of things she wanted to work on. It came up that she was interested in game design, so I prodded her on whether she would be interested in helping me refine a board game prototype I had previously shown her. This started our collaboration.
  • Join communities of practice
    I found it quite useful to participate in small communities of people working towards similar goals. 

    For example, my supervisor helped me join a Slack group for people working on AI Explainability. I reached out to the people for one-on-one conversations, and suggested working together to a few. Miruna Clinciu accepted - and now we are building a small research project.

EA Forum Prize: Winners for May-July 2021

Thank you! I am quite honoured. And congratulations to the other winners!

New Data Visualisations of the EA Forum

I absolutely love the work you have done, thank you so much!

Why aren't you freaking out about OpenAI? At what point would you start?

He is listed on the website:

> OpenAI is governed by the board of OpenAI Nonprofit, which consists of OpenAI LP employees Greg Brockman (Chairman & CTO), Ilya Sutskever (Chief Scientist), and Sam Altman (CEO), and non-employees Adam D’Angelo, Holden Karnofsky, Reid Hoffman, Shivon Zilis, Tasha McCauley, and Will Hurd.

It might not be up to date though

Why aren't you freaking out about OpenAI? At what point would you start?

Note that Eliezer Yudkowsky's argument in the opening link is that OpenAI's damage was done by fragmenting the AI Safety community on its launch.

This damage is done - and I am not sure it bears much relation to what OpenAI is trying to do going forward.

(I am not sure I agree with Eliezer on this one, but I lack details to tell if OpenAI's launch really was net negative)

My current best guess on how to aggregate forecasts

I found some relevant discussion in the EA Forum about extremizing in footnote 5 of this post.

> The aggregation algorithm was elitist, meaning that it weighted more heavily forecasters with good track-records who had updated their forecasts more often. In these slides, Tetlock describes the elitism differently: He says it gives weight to higher-IQ, more open-minded forecasters. The extremizing step pushes the aggregated judgment closer to 1 or 0, to make it more confident. The degree to which they extremize depends on how diverse and sophisticated the pool of forecasters is. The academic papers on this topic can be found here and here. Whether extremizing is a good idea is controversial; according to one expert I interviewed, more recent data suggests that the successes of the extremizing algorithm during the forecasting tournament were a fluke. After all, a priori one would expect extremizing to lead to small improvements in accuracy most of the time, but big losses in accuracy some of the time.

The post in general is quite good, and I recommend it.

[Creative Writing Contest] [Fiction] The Fey Deal

I liked this one a lot!

It was very easy to read and pulled me in. I felt compelled by the protagonist's inner turmoil, and how he makes his decision. The writing was clear and it flowed very well. This is something I will send to some friends to introduce them to effective altruism.

The only part I didn't like was the ending. I like the intention of linking to GiveWell's page but it pulled me totally out of the fantasy. Also the friend felt a bit 2D. But these are minor quibbles. 

Thank you for writing this!

My current best guess on how to aggregate forecasts

> Thanks for this post - I think this was a very useful conversation to have started (at least for my own work!), even if I'm less confident than you in some of these conclusions

Thank you for your kind words! To dispel any impression of confidence, this represents my best guesses. I am also quite confused.

> I've heard other people give good-sounding arguments for other conclusions

I'd be really curious if you can dig these up!

> You later imply that you think [the geo mean of probs outperforming the geo mean of odds] is at least partly because of a specific bias among Metaculus forecasts. But I'm not sure if you think it's fully because of that or whether that's the right explanation

I am confident that the geometric mean of probs outperformed the geo mean of odds because of this bias. If you change the coding of all binary questions so that True becomes False and vice versa, then you are going to get worse performance than the geo mean of odds.

This is because the geometric mean of probabilities does not map predictions and their complements consistently. As a basic example, suppose that we have two forecasts $p_1 \neq p_2$. Then $\sqrt{p_1 p_2} + \sqrt{(1-p_1)(1-p_2)} < 1$.

So in this sense the geometric mean of probabilities is not a consistent probability - it doesn't map the complements of the probabilities to the complement of the geometric mean, as we would expect (the geometric mean of odds, the mean of probabilities and the median all satisfy this basic property).
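To make the complement property concrete, here is a minimal Python sketch. The two forecasts are hypothetical values picked for illustration, and the function names are my own. It checks whether an aggregate of the forecasts and the same aggregate of their complements add up to 1.

```python
import math

def geo_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def geo_mean_probs(probs):
    """Aggregate by taking the geometric mean of the probabilities directly."""
    return geo_mean(probs)

def geo_mean_odds(probs):
    """Aggregate by taking the geometric mean of the odds, then map back to a probability."""
    pooled_odds = geo_mean([p / (1 - p) for p in probs])
    return pooled_odds / (1 + pooled_odds)

# Hypothetical forecasts, not taken from any real question.
forecasts = [0.1, 0.2]
complements = [1 - p for p in forecasts]

# Geometric mean of probabilities: the two aggregates sum to about 0.99, not 1.
print(geo_mean_probs(forecasts) + geo_mean_probs(complements))

# Geometric mean of odds: the two aggregates sum to 1 (up to floating point error).
print(geo_mean_odds(forecasts) + geo_mean_odds(complements))
```

The shortfall from 1 grows as the forecasts disagree more, and vanishes when all forecasts are identical.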

So I would recommend viewing the geometric mean of probabilities as a hack to adjust the geometric mean of odds down. This is also why I think better adjustments likely exist, since this isn't a particularly well motivated adjustment. It does however seem to slightly improve Metaculus predictions, so I included it in the flowchart.

To drill this point even more, here is what we would get if we aggregated the predictions in the last 860 resolved Metaculus binary questions by mapping each prediction to its complement, taking the geo mean of probs and taking the complement again:

The complement of the geometric mean of complement probabilities is called comp_geo_mean

As you can see, this change (that would not affect the other aggregates) significantly weakens the geo mean of probs.
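For readers who want to reproduce this kind of comparison, a minimal sketch of the comp_geo_mean aggregate follows. The forecasts are hypothetical and the helper names are assumptions of mine; this is not the code used for the Metaculus analysis.

```python
import math

def geo_mean_probs(probs):
    """Geometric mean of the probabilities."""
    return math.exp(sum(math.log(p) for p in probs) / len(probs))

def comp_geo_mean(probs):
    """Complement of the geometric mean of the complement probabilities."""
    return 1 - geo_mean_probs([1 - p for p in probs])

def geo_mean_odds(probs):
    """Geometric mean of odds, mapped back to a probability."""
    mean_log_odds = sum(math.log(p / (1 - p)) for p in probs) / len(probs)
    return 1 / (1 + math.exp(-mean_log_odds))

# Hypothetical forecasts for a single binary question.
forecasts = [0.1, 0.2, 0.6]

print("geo_mean_probs:", geo_mean_probs(forecasts))
print("geo_mean_odds: ", geo_mean_odds(forecasts))
print("comp_geo_mean: ", comp_geo_mean(forecasts))
```

When the forecasts disagree, geo_mean_probs lands below the geometric mean of odds and comp_geo_mean lands above it, so recoding True as False turns a downward adjustment into an upward one.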

When pooling forecasts, use the geometric mean of odds

I get what you are saying, and I also harbor doubts about whether extremization is just pure hindsight bias or if there is something else to it.

Overall I still think it's probably justified in cases like Metaculus to extremize based on the extremization factor that would optimize the last 100 resolved questions, and I would expect the extremized geo mean with such a factor to outperform the unextremized geo mean in the next 100 binary questions to resolve (if pressed to put a number on it maybe ~70% confidence without thinking too much).

My reasoning here is something like:

  • There seems to be a long tradition of extremizing in the academic literature (see the reference in the post above). Though on the other hand empirical studies have been sparse, and e.g. Satopaa et al. are cheating by choosing the extremization factor with the benefit of hindsight.
  • In this case I didn't try too hard to find an extremization factor that would work, just two attempts. I didn't need to mine for a factor that would work. But obviously we cannot generalize from just one example.
  • Extremizing has an intuitive meaning as accounting for the different pieces of information across experts, which gives it weight (pun not intended). On the other hand, every extra parameter in the aggregation is a chance to shoot ourselves in the foot.
  • Intuitively it seems like the overall confidence of a community should be roughly continuous over time? So the level of underconfidence in recent questions should be a good indicator of its confidence for the next few questions.

So overall I am not super convinced, and a big part of my argument is an appeal to authority. 

Also, it seems to be the case that extremization by 1.5 also works when looking at the last 330 questions.
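As a rough sketch of what this procedure could look like in code: I am assuming here that extremizing means raising the pooled odds to the power of the extremization factor, and that the factor is chosen by maximizing the total log score over past resolved questions. The question history below is made up for illustration, and the function names are hypothetical.

```python
import math

def extremized_geo_mean_odds(probs, factor=1.0):
    """Geometric mean of odds, with the pooled odds raised to the power `factor`.

    factor=1.0 gives the plain geometric mean of odds; factor > 1 pushes the
    aggregate towards 0 or 1.
    """
    mean_log_odds = sum(math.log(p / (1 - p)) for p in probs) / len(probs)
    return 1 / (1 + math.exp(-factor * mean_log_odds))

def log_score(prob, outcome):
    """Log score of a probability for a binary outcome (higher is better)."""
    return math.log(prob if outcome else 1 - prob)

def best_factor(history, candidates):
    """Pick the candidate factor with the best total log score on past questions.

    `history` is a list of (forecasts, outcome) pairs.
    """
    def total_score(factor):
        return sum(
            log_score(extremized_geo_mean_odds(forecasts, factor), outcome)
            for forecasts, outcome in history
        )
    return max(candidates, key=total_score)

# Made-up history of resolved questions: (individual forecasts, outcome).
history = [
    ([0.6, 0.7], True),
    ([0.3, 0.4], False),
    ([0.8, 0.7], True),
]

print(extremized_geo_mean_odds([0.6, 0.7, 0.8], factor=1.5))
print(best_factor(history, candidates=[1.0, 1.5, 2.0]))
```

In this toy history the crowd is always on the right side, so the largest candidate factor wins. On real data a single confident miss can dominate the score, which is why the factor fitted on past questions may not carry over to future ones.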


I'd be curious about your thoughts here. Do you think that a 1.5-extremized geo mean will outperform the unextremized geo mean in the next 100 questions? What if we choose a finetuned extremization factor that would optimize the last 100?

My current best guess on how to aggregate forecasts

Hmm good question.

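For a quick foray into this, we can see what would happen if we use as our estimate the mean of the maximum likelihood beta distribution implied by the sample of forecasts $p_1, \dots, p_N$.

The log-likelihood to maximize is then

$$\log \mathcal{L}(\alpha, \beta) = (\alpha - 1) \sum_{i=1}^N \log p_i + (\beta - 1) \sum_{i=1}^N \log (1 - p_i) - N \log \mathrm{B}(\alpha, \beta)$$

The Wikipedia article on the Beta distribution discusses this maximization problem in depth, pointing out that although no closed form exists, if $\alpha$ and $\beta$ can be assumed to be not too small, the max likelihood estimate can be approximated as $\hat\alpha \approx \frac{1}{2} + \frac{\hat G_p}{2(1 - \hat G_p - \hat G_{1-p})}$ and $\hat\beta \approx \frac{1}{2} + \frac{\hat G_{1-p}}{2(1 - \hat G_p - \hat G_{1-p})}$, where $\hat G_p = \prod_{i=1}^N p_i^{1/N}$ and $\hat G_{1-p} = \prod_{i=1}^N (1 - p_i)^{1/N}$.

The mean of a beta with these max likelihood parameters is $\frac{\hat\alpha}{\hat\alpha + \hat\beta} = \frac{1 - \hat G_{1-p}}{(1 - \hat G_p) + (1 - \hat G_{1-p})}$.

By comparison, the geometric mean of odds estimate is:

$$\frac{\hat G_p}{\hat G_p + \hat G_{1-p}}$$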

Here are two examples of how the two methods compare when aggregating five forecasts:


I originally did this to convince myself that the two aggregates were different. And they seem to be! The method seems to be close to the arithmetic mean in this example. Let's see what happens when we extremize one of the predictions:


We have made p3 one hundred times smaller. The geometric mean is suitably affected. The maximum likelihood beta mean stays close to the arithmetic mean, unperturbed.

This makes me a bit less excited about this method, but I would be excited about people poking around with this method and related ones!
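For anyone who wants to poke around, here is a minimal sketch of the comparison. It uses the approximate maximum likelihood estimates from the Wikipedia article on the Beta distribution (alpha ≈ 1/2 + G_p / (2(1 - G_p - G_q)), and symmetrically for beta, where G_p and G_q are the geometric means of the forecasts and of their complements). The five forecasts are hypothetical stand-ins, not the numbers from my original examples, and the function names are my own.

```python
import math

def geo_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def beta_mle_mean(probs):
    """Mean of a beta distribution with approximate max likelihood parameters.

    The approximation is only meant for alpha and beta that are not too small,
    and it is undefined when all forecasts are identical (division by zero).
    """
    g_p = geo_mean(probs)                    # geometric mean of forecasts
    g_q = geo_mean([1 - p for p in probs])   # geometric mean of complements
    denom = 2 * (1 - g_p - g_q)
    alpha = 0.5 + g_p / denom
    beta = 0.5 + g_q / denom
    return alpha / (alpha + beta)

def geo_mean_odds(probs):
    """Geometric mean of odds, mapped back to a probability."""
    g_p = geo_mean(probs)
    g_q = geo_mean([1 - p for p in probs])
    return g_p / (g_p + g_q)

def arithmetic_mean(probs):
    return sum(probs) / len(probs)

# Hypothetical forecasts; the second set makes the third forecast 100x smaller.
baseline = [0.1, 0.2, 0.3, 0.4, 0.5]
perturbed = [0.1, 0.2, 0.003, 0.4, 0.5]

for name, probs in [("baseline", baseline), ("perturbed", perturbed)]:
    print(name,
          "arithmetic:", round(arithmetic_mean(probs), 4),
          "beta MLE mean:", round(beta_mle_mean(probs), 4),
          "geo mean of odds:", round(geo_mean_odds(probs), 4))
```

Note that in the perturbed set the approximate alpha drops below 1, so the approximation is outside the regime where it is meant to be used and the output should be read as indicative only.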
