Abstract
Estimating the long-term impacts of actions is important in many areas, but the key difficulty is that long-term outcomes are only observed after a long delay. One alternative approach is to measure the effect on an intermediate outcome, or statistical surrogate, and then use it to estimate the long-term effect. Athey et al. (2019) generalise the surrogacy method to work with multiple surrogates rather than just one, increasing its credibility in social science contexts. I empirically test the multiple-surrogates approach to long-term effect estimation in real-world conditions using long-run RCTs from development economics. In the context of conditional cash transfers for education in Colombia, I find that the method works well for predicting treatment effects over a 5-year horizon but poorly over 10 years, because a smaller set of variables is available when attempting to predict effects further into the future. More generally, the method's performance is sensitive to whether appropriate surrogates are observed.
Introduction
Many fields seek to estimate the long-term effects of a treatment or policy. In medicine, one may want to estimate the effect of a surgery on life expectancy; in economics, the effect of a conditional cash transfer during childhood on adult income. One way to measure these effects would be to run a randomised controlled trial (RCT) and then wait to observe the long-run outcomes. However, the results would typically arrive too late to inform today's policy decision.
One approach developed in medicine to deal with this problem is to study the effects on an intermediate or surrogate outcome. One can then combine the effect of the treatment on the surrogate with the relationship between the surrogate and the long-term outcome to estimate the effect of the treatment on the long-term outcome. For example, one could measure the effect of a surgery on tumour size and the relationship between tumour size and mortality, and use these to calculate the effect of the surgery on life expectancy. Combining results in this way requires an assumption often known as the Prentice criterion, namely that the treatment and the long-term outcome are independent conditional on the surrogate (Prentice, 1989). In the previous example, tumour size could serve as a surrogate for life expectancy if life expectancy is independent of the surgery conditional on tumour size.
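As a worked illustration of why the Prentice criterion allows results to be combined in this way, consider the derivation below. The notation is introduced here for illustration only: $W \in \{0,1\}$ is a binary treatment, $S$ the surrogate, $Y$ the long-term outcome and $h(s) = \mathbb{E}[Y \mid S = s]$. The last line additionally assumes, purely for illustration, that $h$ is linear, in which case the long-term effect is the product of the effect on the surrogate and the surrogate-outcome slope.

```latex
\begin{aligned}
\tau_Y &\equiv \mathbb{E}[Y \mid W = 1] - \mathbb{E}[Y \mid W = 0], \\
\mathbb{E}[Y \mid W = w]
  &= \mathbb{E}\big[\,\mathbb{E}[Y \mid S, W = w] \,\big|\, W = w\big]
     && \text{(law of iterated expectations)} \\
  &= \mathbb{E}\big[\,h(S) \,\big|\, W = w\big]
     && \text{(Prentice criterion: } Y \perp W \mid S\text{)} \\
\Rightarrow\ \tau_Y &= \mathbb{E}[h(S) \mid W = 1] - \mathbb{E}[h(S) \mid W = 0]. \\
\text{If } h(s) = \alpha + \beta s:\quad
\tau_Y &= \beta \big(\mathbb{E}[S \mid W = 1] - \mathbb{E}[S \mid W = 0]\big) = \beta\, \tau_S.
\end{aligned}
```

In the tumour example, $\tau_S$ would be the effect of surgery on tumour size and $\beta$ the change in life expectancy associated with a unit change in tumour size.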
Surrogates for long-run effect estimation are often used, both formally and informally, in medicine; however, their use in economics is minimal, despite significant interest in the long-run effects of a variety of programmes and policies (Bouguen et al., 2018). This is likely because the surrogacy assumption is hard to justify in a social science context, where it can be violated in multiple ways. Freedman et al. (1992) show that conditional independence requires the surrogate to mediate the full effect of the treatment on the long-term outcome; if it does not, the surrogate is not valid. Others have shown that even under full mediation, unobserved confounding between the surrogate and the long-term outcome also invalidates the surrogacy assumption (VanderWeele, 2015).
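The two failure modes above can be made concrete with a small simulation. The Python sketch below uses a hypothetical linear data-generating process, assumed for illustration and not taken from the cited papers, in which a randomised treatment W affects a single surrogate S, which in turn affects the long-term outcome Y. The parameter `direct_effect` adds an effect of W on Y that bypasses S (violating full mediation), and `confounding` adds an unobserved variable U that affects both S and Y. The surrogate estimate multiplies the effect of W on S by the slope of Y on S, estimated within the control arm.

```python
import numpy as np


def simulate(direct_effect, confounding, n=1_000_000, seed=0):
    """Return (experimental estimate, surrogate estimate) of the effect of W on Y.

    Hypothetical linear data-generating process, assumed for illustration:
        S = W + confounding * U + e_S
        Y = 2 * S + direct_effect * W + confounding * U + e_Y
    where U is an unobserved confounder of S and Y.
    """
    rng = np.random.default_rng(seed)
    w = rng.integers(0, 2, size=n)          # randomised binary treatment
    u = rng.standard_normal(n)              # unobserved confounder
    s = w + confounding * u + rng.standard_normal(n)
    y = 2 * s + direct_effect * w + confounding * u + rng.standard_normal(n)

    treated, control = w == 1, w == 0
    # Experimental benchmark: difference in mean long-term outcomes.
    experimental = y[treated].mean() - y[control].mean()

    # Surrogate estimate: effect of W on S times the slope of Y on S,
    # with the slope estimated in the control arm (where W does not vary).
    tau_s = s[treated].mean() - s[control].mean()
    slope = np.polyfit(s[control], y[control], deg=1)[0]
    return experimental, slope * tau_s


for label, d, c in [("valid", 0.0, 0.0),
                    ("direct effect", 1.0, 0.0),
                    ("confounded", 0.0, 1.0)]:
    exp_est, sur_est = simulate(direct_effect=d, confounding=c)
    print(f"{label:>13}: experimental {exp_est:.2f}, surrogate {sur_est:.2f}")
```

By construction, the population values are as follows: with neither violation, both estimates equal 2; with a direct effect of 1, the experimental effect is 3 but the surrogate estimate stays at 2 because it misses the direct path; with confounding, the experimental effect is 2 but the control-arm slope of Y on S rises from 2 to 2.5, so the surrogate estimate overstates the effect. Simulated values differ slightly from these because of sampling noise.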
To address these issues, Athey et al. (2019) develop surrogacy methods that rely on many surrogate variables instead of just one. The idea is that even though any individual variable may not be a valid surrogate, collectively they are more likely to satisfy the surrogacy assumption. They combine many short-term outcomes into a “surrogate index”, the expected value of the long-term outcome conditional on the short-term outcomes. They show that, under the assumption that the long-term outcome is independent of treatment conditional on the surrogate index, the average treatment effect on the surrogate index equals the average treatment effect on the long-term outcome. Based on this result, they develop several estimators of long-term effects for settings in which the long-term outcome is not observed. I test these surrogacy estimators with real-world data from long-run RCTs.
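A minimal sketch of the surrogate index idea follows, in Python with NumPy. It assumes a toy linear setting that is not taken from Athey et al. (2019) or from this paper: an observational sample in which two surrogates and the long-term outcome are observed but there is no treatment, and an experimental sample in which treatment and surrogates are observed but the long-term outcome is not. The index is the fitted value from a linear regression of Y on the chosen surrogates in the observational sample, and the estimated effect is the difference in mean index between treated and control units. Linear regression is only one possible way of estimating the conditional expectation, and the helper names (`fit_linear`, `predict_linear`, `surrogate_index_ate`) are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """OLS coefficients (intercept first) of y on the columns of X."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Fitted values from fit_linear coefficients."""
    return coef[0] + X @ coef[1:]


def surrogate_index_ate(surrogate_cols, n=500_000, seed=1):
    """Surrogate index estimate of the treatment effect in a toy linear model.

    surrogate_cols: list of surrogate columns (0 and/or 1) used to build the index.
    """
    rng = np.random.default_rng(seed)

    # Observational sample: surrogates and long-term outcome observed, no treatment.
    s_obs = rng.standard_normal((n, 2))
    y_obs = s_obs[:, 0] + 2 * s_obs[:, 1] + rng.standard_normal(n)

    # Experimental sample: treatment and surrogates observed; Y is not used.
    w = rng.integers(0, 2, size=n)
    s_exp = np.column_stack([w + rng.standard_normal(n),
                             0.5 * w + rng.standard_normal(n)])

    # Surrogate index: estimate of E[Y | S] fitted in the observational sample,
    # then evaluated for every unit in the experimental sample.
    coef = fit_linear(s_obs[:, surrogate_cols], y_obs)
    index = predict_linear(coef, s_exp[:, surrogate_cols])

    # Average treatment effect on the surrogate index.
    return index[w == 1].mean() - index[w == 0].mean()


print(surrogate_index_ate([0, 1]))   # index built from both surrogates
print(surrogate_index_ate([0]))      # index built from the first surrogate only
```

In this setup the true effect on Y is 1 × 1 + 2 × 0.5 = 2, because treatment raises the first surrogate by 1 and the second by 0.5. Using both surrogates recovers roughly 2, whereas using only the first gives roughly 1, since the effect running through the second surrogate is not captured by the index. This mirrors the motivation for combining many short-term outcomes.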
RCTs began to gain popularity in development economics in the late 1990s (Banerjee et al., 2016). Recently, researchers have started to use the exogenous variation generated by these early experiments to study the effects of programmes such as conditional cash transfers on long-term outcomes, such as high school graduation rates and adult income twenty years later. Bouguen et al. (2018) summarise the results of 14 different long-run development RCTs. These experiments provide a laboratory in which to assess the performance of surrogacy estimators for long-term outcomes against the unbiased benchmark of the experimental estimate.
The main strategy in this paper is to analyse an experimental dataset in two ways. First, I obtain an unbiased estimate of the standard experimental average treatment effect by regressing long-term outcomes on treatment status. Then, I manipulate the data (for example, by pretending we do not observe the long-term outcomes in one treatment arm) and reanalyse it using the surrogacy approach. The closer the surrogacy estimate is to the unbiased experimental estimate, the better the surrogacy approach performs; the further apart the two estimates are, the poorer its performance.
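The validation strategy can be sketched as a single function. This is a hypothetical Python illustration, not the code used in this paper: it takes a treatment vector `w`, a matrix of short-term surrogates `s` and long-term outcomes `y` from one RCT, and computes the experimental difference in means (equivalent to the coefficient from regressing the outcome on a treatment dummy). It then masks the treated arm's long-term outcomes, uses the control arm as the observational sample to fit a linear surrogate index, and reports the gap between the two estimates. Using the control arm as the observational sample, and a linear specification for the index, are assumptions; other sample splits are possible.

```python
import numpy as np


def validate_surrogacy(w, s, y):
    """Compare the experimental ATE with a surrogate-index estimate on one RCT.

    w: (n,) binary treatment; s: (n,) or (n, k) surrogates; y: (n,) long-term outcome.
    The control arm plays the role of the observational sample, and the treated
    arm's long-term outcomes are masked as if they had not yet been observed.
    Returns (experimental estimate, surrogate estimate, gap).
    """
    w = np.asarray(w)
    s = np.asarray(s, dtype=float).reshape(len(w), -1)
    y = np.asarray(y, dtype=float)
    treated, control = w == 1, w == 0

    # Step 1: unbiased experimental benchmark using the full data.
    experimental = y[treated].mean() - y[control].mean()

    # Step 2: ignore y for the treated arm; fit E[Y | S] on the control arm only.
    design_c = np.column_stack([np.ones(control.sum()), s[control]])
    coef, *_ = np.linalg.lstsq(design_c, y[control], rcond=None)

    # Predict the treated arm's long-term outcomes from their surrogates.
    design_t = np.column_stack([np.ones(treated.sum()), s[treated]])
    surrogate = (design_t @ coef).mean() - y[control].mean()

    # Step 3: the gap measures how far the surrogate estimate is from the benchmark.
    return experimental, surrogate, surrogate - experimental
```

On synthetic data where the surrogates fully mediate the treatment effect, the gap should be close to zero; adding a direct effect of `w` on `y` that bypasses `s` makes the gap approximately equal to minus that direct effect.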
I use data from two RCTs, both from Barrera-Osorio et al. (2019), who study the effect of conditional cash transfers on medium- and long-term educational outcomes. I test many different implementations of the surrogacy approach, varying which sample is used as the observational dataset. In these RCTs, I find that both surrogacy approaches work well when the full set of surrogates is used and the long-term effect is 4-5 years in the future. However, the performance of the method is very sensitive to the set of surrogates used: if key surrogates are missing, for instance because we are trying to predict effects further into the future, the method performs poorly.
Athey et al. (2019) show that the surrogacy method works well for estimating the long-term (9-year) effects of a job-training programme on employment. My results show that it is more difficult to predict the long-term impacts of human capital interventions from their short-term impacts. Generating more evidence of this kind is key to understanding when we can reliably estimate long-term effects, which are critically important in many domains.
The paper proceeds as follows. Section 2 summarises the econometric theory from Athey et al. (2016, 2019). Section 3 describes the data from Barrera-Osorio et al. (2019) in more detail. Section 4 presents the results from my different implementations of the surrogacy approach, along with robustness checks, and Section 5 concludes.