Jonathan Nankivell



I'm Jonathan Nankivell, an undergraduate in my last year studying Mathematics. My interests are in ML and collaborative epistemics.

I had to discover EA twice before it stuck. My first random walk was 'psychology -> big five framework -> principal component analysis -> pol.is -> radical exchange -> EA' and my second was 'effect of social media -> should I read the news? -> Ezra Klein on the 80,000 Hours podcast -> EA'.


Good comment and good points.

I guess the aim of my post was two-fold:

  1. In all the discussion of the explore-exploit trade-off, I've never heard anyone describe it as a frontier that you can be on or off. The explore-exploit frontier is hopefully a useful framework to add to this dialogue.
  2. The literature on clinical trial design is imo full of great ideas never tried. This is partly due to actual difficulties and partly due to a general lack of awareness about the benefits they offer. I think we need good writing for a generalist audience on this topic and this is my attempt.

You're definitely right that the caveat is a large one. Adaptive designs are not appropriate everywhere, which is why this post raises points for discussion and doesn't provide a fixed prescription.

To respond to your specific points.

the paper also points out various other practical issues with adaptive designs in section 3

Section three discusses whether adaptive designs lead to

  1. a substantial chance of allocating more patients to an inferior treatment
  2. reducing statistical power
  3. making statistical inference more challenging
  4. making robust inference difficult if there is potential for time trends
  5. making the trial more challenging to implement in practice.

My understanding of the authors' position is that it depends on the trial design. Drop-the-Loser, for example, would perform very well on issues 1 through 4. Other methods, less so. I only omit 5 because CROs are currently ill-equipped to run these studies. There's no fundamental reason for this and, if demand increased, this obstacle would shrink. In the meantime, this unfortunately does raise the burden on the investigating team.

if what standard care entails changes due to changing guidelines

This is not an objection I've heard before. I presume the effect of this would be equivalent to the presence of a time trend. Hence some designs would perform well (DTL, DBCD, etc) and others wouldn't (TS, FLGI, etc).

Adaptive designs generally require knowledge of outcomes to inform randomisation of future enrolees.

This is often true, although generalised methods built to address this can be found. See here for example.

In summary: While I think that these difficulties can often be overcome, they should not be ignored. Teams should go in eyes open, aware that they may have to do more themselves than is typical. Read, discuss, make a plan, implement it. Know each option's drawbacks. Also know their advantages.

Hope that makes sense.

DTL (Drop-the-Loser) is an urn-based method where treatment assignment is determined by a simulated urn. By removing balls when the treatment fails and by adding balls uniformly so that no type runs out, we can balance the trial allocation in a sensible way.

The actual algorithm, for a two treatment study:

Consider an urn containing three types of balls. Balls of types 1 and 2 represent treatments. Balls of type 0 are termed immigration balls. When a subject arrives, one ball is drawn at random. If a treatment ball of type i (i = 1 or 2) is selected, the i-th treatment is given to the subject and the response is observed. If it is a failure, the ball is not replaced. If the treatment is a success, the ball is replaced and consequently, the urn composition remains unchanged. If an immigration ball (type 0) is selected, no subject is treated, and the ball is returned to the urn together with two additional treatment balls, one of each treatment type. This procedure is repeated until a treatment ball is drawn and the subject treated accordingly. The function of the immigration ball is to avoid the extinction of a type of treatment ball.


Extending DTL to multi-treatment settings is as simple as adding additional ball types.
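
For anyone who prefers code, here is a rough Python sketch of the urn described above. It is only an illustration under my own assumptions: the names (`drop_the_loser`, `success_probs`, `initial_balls`) are hypothetical, there is a single immigration ball, each treatment starts with the same number of balls, and every response is observed immediately (no delayed outcomes).

```python
import random


def draw(urn, rng):
    """Draw one ball uniformly at random; returns its type (0 = immigration)."""
    r = rng.randrange(sum(urn))
    for ball_type, count in enumerate(urn):
        if r < count:
            return ball_type
        r -= count


def drop_the_loser(success_probs, n_subjects, initial_balls=1, seed=0):
    """Simulate a drop-the-loser urn and return subjects assigned per treatment.

    success_probs[i] is the (assumed) true success probability of treatment i + 1.
    """
    rng = random.Random(seed)
    k = len(success_probs)
    # urn[0] counts immigration balls, urn[i] counts balls of treatment i.
    urn = [1] + [initial_balls] * k
    assigned = [0] * k
    for _ in range(n_subjects):
        while True:
            ball = draw(urn, rng)
            if ball == 0:
                # Immigration ball: it stays in the urn and one ball of
                # every treatment type is added. No subject is treated yet.
                for i in range(1, k + 1):
                    urn[i] += 1
                continue
            assigned[ball - 1] += 1
            success = rng.random() < success_probs[ball - 1]
            if not success:
                urn[ball] -= 1  # failure: the ball is not replaced
            break
    return assigned


if __name__ == "__main__":
    print(drop_the_loser([0.7, 0.4], n_subjects=500))
```

Passing a longer `success_probs` list gives the multi-treatment version, since each extra entry is just one more ball type.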

Simulation studies show that DTL performs very well as a way to maximise statistical power. I've read that this is because it (1) approaches the correct ratio asymptotically and (2) has lower variance than other proposed methods, although I don't have an intuitive understanding of why this is.

I've had a look around and this paper has a nice summary of the method (and proposes how it should handle delayed responses).

Regarding the plausibility of biological X-risks:

Is anyone aware of any instance of a disease killing the entirety of a town?

Thanks for the pointers. I've reread this now - it is a good read and interesting throughout!

Intriguing though health economics is, I think I am more focused on the actual treatments. I want to understand, when faced with a patient, the methods that can be used to pick the treatment. I am interested in how effective each method would be for the patient and how well they would scale to an entire population.

It seems strange that these questions have been omitted in a book comparing healthcare systems.

I have been considering what I call the fundamental question of healthcare: 'what healthcare system gives the best treatment to the most patients?' This question, simple though it is, seems to lack any established answer.

The importance of this question seems self-evident. Is anyone aware of any research or proposals that seek to address it?

Update: I emailed Alex Tabarrok to get his thoughts on this. He originally proposed using dominant assurance contracts to solve public good problems, and he has experience testing it empirically.

He makes the following points about my suggestion:

  • The first step is the most important. Without clarity of what the public good will be and who is expected to pay for it, the DAC won't work
  • You should probably focus on libraries as the potential source of funding. They are the ones who pay subscription fees, they are the ones who would benefit from this
  • DACs are a novel form of social technology. It might be best to try to deliver smaller public goods first, allowing people to get more familiar, before trying to buy a journal

He also suggested other ways to solve the same problem:

  • Have you considered starting a new journal? This should be cheaper. There would also be coordination questions to solve to make it prestigious, but this one might be easier
  • Have you considered 'flipping' a journal? Could you take the editors, reviewers and community that supports an existing journal, and persuade them to start a similar but open access journal? (The Fair Open Access Alliance seem to have had success facilitating this. Perhaps we should support them?)

My current (and weakly held) position is that flipping editorial boards to create new open access journals is the best way to improve publishing standards. Small steps towards a much better world. Would it be possible for the Future Fund to entice 80% of the big journals to do this? The top journal in every field? Maybe.

Research Coordination Projects

Research that can help us improve

At the root of many problems that are being discussed are coordination problems. People are in prisoners' dilemmas, and keep defecting. This is the case in the suggestion to buy a scientific journal: if the universities coordinated they could buy the journal, remove fees, improve editorial policies, and they would be in a far better situation. Since they don't coordinate, they have to pay to access their own research.

Research into this type of coordination problem has revealed two general strategies for overcoming the prisoners' dilemma type effects: quadratic funding and dominant assurance contracts.

I propose a research project to investigate opportunities to use these techniques, which, if appropriate, would get bankrolled by the future fund.

We could, of course, simply get the future fund to pay for this. There is, however, an alternative that might be worth thinking about.

This seems like the kind of thing that dominant assurance contracts are designed to solve. We could run a Kickstarter, and use the future fund to pay the early backers if we fail to reach the target amount. This should incentivise all those who want the journals bought to chip in.

Here is one way we could do this:

  1. Use a system like pol.is to identify points of consensus between universities. This should be about the rules going forward if we buy the journal. For example, do they all want pre-registration? What should the copyright situation be? How should peer-review work? How should the journal be run? etc
  2. Whatever the consensus is, commit to implementing it if the buyout is successful
  3. Start crowdsourcing the funds needed. To maximise the chance of success, this should be done using a DAC (dominant assurance contract). This works like any other crowdfunding mechanism (GoFundMe, Kickstarter, etc), except we have a pool of money that is used to pay the early backers if we fail to meet the goal. If the standard donation size we're asking the unis for is £X, and having the publisher bought is worth at least £X to the uni, then the dominant strategy for the uni is to chip in.
  4. If we raise the money, great! We can do what we committed to doing. We're happy, the unis are happy, the shareholders of the publisher are happy. If we fail to raise the money, we pay all the early backers, and move on to other things.
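
The settlement rule in steps 3 and 4 can be sketched in a few lines of Python. This is a toy illustration, not a real contract: `settle_dac`, `backer_payoff` and the flat `bonus` paid to every backer on failure are my own hypothetical choices, and pledges are assumed to be refunded in full if the target is missed.

```python
def settle_dac(pledges, target, bonus):
    """Settle a dominant assurance contract.

    pledges maps each backer to the amount pledged. Returns (funded, transfers),
    where transfers maps each backer to their net payment: negative means the
    backer pays in, positive means the backer is paid from the bonus pool.
    """
    total = sum(pledges.values())
    if total >= target:
        # Target met: every pledge is collected and the purchase goes ahead.
        return True, {backer: -amount for backer, amount in pledges.items()}
    # Target missed: pledges are refunded and each backer receives the bonus.
    return False, {backer: bonus for backer in pledges}


def backer_payoff(value, pledge, funded, bonus):
    """Payoff to a backer who pledged, given how much they value the good."""
    return value - pledge if funded else bonus


if __name__ == "__main__":
    pledges = {"uni_a": 100, "uni_b": 100, "uni_c": 100}
    print(settle_dac(pledges, target=300, bonus=5))
    print(settle_dac(pledges, target=500, bonus=5))
```

Under the assumption in step 3 (the journal is worth at least £X to a uni pledging £X), `backer_payoff` is never negative: a pledger either gets the journal at a price they were happy with, or gets paid for having tried.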

Credence Weighted Citation Metrics

Epistemic Institutions

Citation metrics (total citations, h-index, g-index, etc.) are intended to estimate a researcher's contribution to a field. However, if false claims get cited more than true claims (Serra-Garcia and Gneezy 2021), these citation metrics are clearly not fit for purpose.

I suggest modifying these citation metrics by weighting each paper by the probability that it will replicate. If each paper $i$ has $c_i$ citations and probability of replicating $p_i$, we can modify each formula as follows: instead of measuring total citations $\sum_i c_i$, we consider credence weighted total citations $\sum_i p_i c_i$. Instead of using the h-index, where we pick 'the largest number $h$ such that $h$ articles have $c_i \geq h$', we could use the credence weighted h-index, where we pick the largest number $h$ such that $h$ articles have $p_i c_i \geq h$. We can use this idea to modify citation metrics that evaluate researchers (as above), journals (Impact factor and CiteScore) and universities (rankings).
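
As a minimal sketch of how this could be computed, here is some Python that takes each paper's citation count and replication probability as two parallel lists. The function names are hypothetical, and the probabilities are simply assumed to be given.

```python
def h_index(scores):
    """Largest h such that at least h papers have a score of at least h."""
    ranked = sorted(scores, reverse=True)
    h = 0
    for rank, score in enumerate(ranked, start=1):
        if score >= rank:
            h = rank
        else:
            break
    return h


def credence_weighted_total(citations, probs):
    """Total citations, with each paper weighted by its replication probability."""
    return sum(p * c for c, p in zip(citations, probs))


def credence_weighted_h_index(citations, probs):
    """h-index computed on probability-weighted citation counts."""
    return h_index([p * c for c, p in zip(citations, probs)])


if __name__ == "__main__":
    citations = [10, 8, 5, 4, 3]
    probs = [0.5, 0.5, 0.5, 0.5, 0.5]
    print(h_index(citations))                           # 4
    print(credence_weighted_total(citations, probs))    # 15.0
    print(credence_weighted_h_index(citations, probs))  # 2
```

In this toy example a researcher whose papers each have a 50% chance of replicating sees their h-index drop from 4 to 2.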

We can use prediction markets to elicit these probabilities, where the questions are resolved using a combination of large scale replication studies and surrogate scoring. DARPA SCORE is a proof of concept that this can be done on a large scale.

Prioritising credence weighted citation metrics over citation metrics would improve the incentives researchers have. No longer will they have to compete with people who write 70 flimsy papers a year that no one actually thinks will replicate; now researchers who are right will be rewarded.

Self-Improving Healthcare

Biorisk and Recovery from Catastrophe, Epistemic Institutions, Economic Growth

Our healthcare systems aren't perfect. One underdiscussed part of this is that we learn almost nothing from the vast majority of treatment that happens. I'd love to see systems that learn from the day-to-day process of treating patients, systems that use automatic feedback loops and crowd wisdom to detect and correct mistakes, and that identify, test and incorporate new treatments. It should be possible to do this. Below is my suggestion.

I suggest we allocate treatments to patients in a specific way: the probability that we allocate a treatment to a patient should match the probability that that treatment is the best treatment for that patient. This will create an RCT of similar patients, which we can use to update the probabilities that we use for allocation. Then repeat. This will maximise the number of patients given the best treatment in the medium to long-term. It does this by detecting and correcting mistakes, and by cautiously testing novel treatments and then, if warranted, rolling them out to the wider population.
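
One standard way to get this kind of probability matching is Thompson sampling: draw a plausible success rate for each treatment from its posterior and give the patient whichever treatment drew highest. Below is a minimal Python sketch under strong simplifying assumptions of my own: outcomes are binary, all patients are treated as similar (no covariates), each treatment starts from a uniform Beta(1, 1) prior, and the names `allocate` and `run_trial` are hypothetical.

```python
import random


def allocate(posteriors, rng):
    """Pick a treatment with probability equal to its chance of being best.

    posteriors is a list of [alpha, beta] Beta parameters, one per treatment.
    """
    draws = [rng.betavariate(a, b) for a, b in posteriors]
    return max(range(len(draws)), key=draws.__getitem__)


def run_trial(true_success, n_patients, seed=0):
    """Simulate allocate -> observe -> update; returns patients per treatment."""
    rng = random.Random(seed)
    posteriors = [[1, 1] for _ in true_success]  # uniform prior (an assumption)
    counts = [0] * len(true_success)
    for _ in range(n_patients):
        t = allocate(posteriors, rng)
        counts[t] += 1
        if rng.random() < true_success[t]:
            posteriors[t][0] += 1  # success: update alpha
        else:
            posteriors[t][1] += 1  # failure: update beta
    return counts


if __name__ == "__main__":
    print(run_trial([0.8, 0.3], n_patients=1000))
```

Early on the allocation is close to an even split; as outcomes accumulate, most patients end up on the treatment that looks best, while the others still get tried occasionally.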

This idea is still in its early stages. More detailed thoughts (such as where the probabilities come from) can be found here. If you have any thoughts or feedback, please get in touch.
