I love this analysis, and I think it highlights how important model choice can be. Both constant and decaying treatment effects seem plausible to me. Instead of choosing only one of the two models (constant effect or decaying effect) and estimating as though this were the truth, a middle option is Bayesian model averaging:
Along with the prior over the parameters in the models, you also have a prior over the models themselves (eg 50/50 constant vs decaying effect). The data will support some models more than others and so you get a posterior distribution over the models based on the data (say 33/67 constant vs decaying effect). The posterior probability that each model is true is the weight you give each model when you estimate the treatment effect you're interested in. It's a formal way of incorporating model uncertainty into the estimate, which would allow others to adjust the analysis based on their priors on the correct model (presumably GiveWell would start with a larger prior on constant effects, you would start with a larger prior on decaying effects).
One (small) reason one might start with a larger prior on the constant effects model is to favor simplicity. In Bayesian model averaging when researchers don't assign equal priors to all models, I think the second most common decision is to penalize the model with more parameters in favor of the simpler model, which would mean having a larger prior on the constant effects model in this case. I share your prior that decaying effects of economic programs in adults is the norm; I think it's less clear for early childhood interventions that have positive effects in adulthood. This paper has a review of some qualifying interventions (including deworming) - I'd be interested to see if others have multiple long-run waves and whether those also show evidence of decaying effects.
Thanks for this point, I didn't think clearly about how the models are nested. I think that means the BMA I describe is the same as having one model with a decay parameter (as you say) but instead of a continuous prior on the decay parameter the prior is a mixture model with a point mass at zero. I know this is one bayesian method for penalizing model complexity, similar to lasso or ridge regression.
So I now realize that what I proposed could just be seen as putting an explicit penalty on the extra parameter needed in the decay model, where the penalty is the size of the point mass. The motivation for that would be to avoid overfitting, which isn't how I thought of it originally.