Nov 22, 2017
I am currently evaluating multiple interventions aimed at mental illness. To compare these to each other and to interventions in other areas, I need estimates of the severity of the problem and of the impact of each intervention. Several standard systems for evaluating health interventions exist, each of which has strengths and weaknesses. How accurate and useful are these systems for mental illness?
Mental illness has a death toll (primarily from suicide and overdoses) that can be compared to deaths from physical ailments. Death has the advantage of being a binary state, subject to very little measurement error and to few differences in definition across cultures. However, it is an imperfect proxy for the suffering inflicted by mental illness. Depending on culture, one country may have a higher depression rate but a lower suicide rate than another. A country with better medical services may have a worse drug problem but fewer deaths from overdoses. The recorded cause of death is also subject to manipulation. Mortality is a particularly poor measure of anxiety, since anxiety is almost never the immediate cause of death (although it may shorten lifespan).
Disability adjusted life years (DALYs) are an attempt to use a single number to express the health of a population. The calculation method can vary from study to study; for purposes of this post I will be referring only to the methods used in the Global Burden of Disease 2010 (hereafter GBD 2010) study.
Aggregated DALYs for a population are calculated by multiplying [disability prevalence] x [disability weight] x [years until remission or death]. Some surveys (but not all) include a further discount for age, assuming that a year lived as a 70-year-old is less valuable than a year lived as a 25-year-old. This is known as age-weighting. Disability weight is calculated by asking individuals to compare two scenarios and choose which person seems “healthier.” GBD 2010 surveyed approximately 14,000 individuals from five countries (Bangladesh, Indonesia, Peru, the United Republic of Tanzania and the United States of America) and also offered a web-based survey, which was eventually taken by approximately 16,000 people. Previous versions of the GBD exclusively used the evaluations of health care practitioners.
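To make the multiplication concrete, here is a minimal Python sketch of the three-factor product as described in this post. It is a simplification, not the full GBD 2010 methodology, and it leaves out age-weighting. The function name `aggregate_dalys` and every input value are hypothetical, chosen only for illustration. The check that disability weights fall between 0 and 1 is an assumption of this sketch.

```python
def aggregate_dalys(prevalence: float, disability_weight: float, years: float) -> float:
    """Return prevalence x disability weight x years, as described in the text.

    prevalence: number of people with the condition (hypothetical input)
    disability_weight: assumed here to lie between 0 and 1
    years: average years until remission or death
    """
    if not 0.0 <= disability_weight <= 1.0:
        raise ValueError("disability weight is assumed to lie between 0 and 1")
    if prevalence < 0 or years < 0:
        raise ValueError("prevalence and years must be non-negative")
    return prevalence * disability_weight * years


# Illustrative numbers only; these are NOT GBD 2010 figures.
example_total = aggregate_dalys(prevalence=1000, disability_weight=0.2, years=5)
print(f"Aggregated DALYs: {example_total:.0f}")
```

Because the result is a plain product, doubling any one factor doubles the total. That is part of why the choice of disability weight matters so much for conditions like depression.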
Because they are only a measure of health, DALYs are not a good measure of suffering. For example, a loved one dying is an obvious cause of suffering via grief, but has no impact on the DALY metric of the survivors. DALYs also deliberately exclude the availability of mitigations: vision impairment has the same DALY cost regardless of the availability of corrective lenses (Voight & King, 2010). These choices make DALYs highly legible and comparable, at the cost of excluding many things one might care about. “Healthy” is also a highly ambiguous term, which many cultures consider to refer only to physical health. This suggests that if one cares about suffering, or includes mental health in one's definition of health, DALYs are likely to severely underrate the impact of mental illness.
QALYs are explicitly designed to evaluate quality of life, not just health. Instead of choosing which of two individuals is healthier, survey participants may choose which situation they would rather live in (e.g., five years of blindness or four years of deafness), what risk of death they would accept in order to cure an ailment (e.g. 10% risk of death for surgery to restore function to your leg), or “how bad does this sound to you on a scale of 1-100?”
QALYs are noticeably better than DALYs for measuring the impact of mental illness, in that everyone agrees mental illnesses lower quality of life. However, there is still concern that they underestimate the impact because people are bad at imagining themselves in different situations, and bad at imagining mental illness in particular. Dolan (2008) argues that any rating based on trade-offs is inherently weak, because humans are so bad at remembering the past and anticipating the future. He favors using ratings of subjective well-being from people currently suffering from a condition. Brazier et al. (2008) cite data that the general public rates mental health issues as less important than physical health issues, and as less important than people who suffer from mental illness rate them, which if true would lead to an underestimate of the cost of mental illness. Meanwhile, De Wit, Busschbach, and De Charro (2000) argue that people underestimate their ability to adapt to situations, and thus all QALY estimates are overestimates. Michael Plant argues that this adaptation effect applies only to physical ailments, and that it leads people to underestimate the severity of mental illness relative to physical illness.
The cost-effectiveness estimates for malaria nets are based solely on the averted physical suffering. In order to truly compare malaria QALYs with depression QALYs, we must take into consideration the mental health toll of malaria. This turns out to be a very complicated question that can’t be answered without getting into moral ontology, which is beyond the scope of this document.
For a very, very crude idea of the effect of bednets on suffering, see this guesstimate model, which lets you estimate the mental illness cost of malaria from mourning and mental-health related side effects. Ultimately the DALY/$ from these effects (guesstimated at between 10^-5 and 10^-3) is insignificant next to the DALY/$ gain from deaths averted (on the order of 10^-1).
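As a rough check on "insignificant," the short sketch below divides the two order-of-magnitude figures quoted above. The variable names are hypothetical, and the inputs are only the rounded powers of ten from this post, not outputs of the model itself.

```python
# Order-of-magnitude DALY/$ figures quoted in the text (rounded powers of ten).
mental_health_dalys_per_dollar = (1e-5, 1e-3)  # low and high guesstimates
mortality_dalys_per_dollar = 1e-1              # deaths averted


def share_of_mortality_effect(mental: float, mortality: float) -> float:
    """Return the mental-health DALY/$ as a fraction of the mortality DALY/$."""
    return mental / mortality


low_share, high_share = (
    share_of_mortality_effect(m, mortality_dalys_per_dollar)
    for m in mental_health_dalys_per_dollar
)
print(f"Mental-health effect is {low_share:.2%} to {high_share:.2%} of the mortality effect")
```

On these figures the mental-health component comes to somewhere between a hundredth of a percent and one percent of the mortality component. Including it would not change how bednets rank.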
A second issue is that using productivity loss as a metric will bias interventions towards people with higher potential incomes, which is the opposite of most people’s instincts.
None of these measurements met my goals of being easy to measure and capturing the entire impact of mental illness. This is not surprising, since even the impacts of physical ailments are hard to measure. The only clear conclusion is that QALYs are better than DALYs for any purpose I can think of. Of the options available, death and financial cost are the most objective, easiest to measure, and easiest to compare to other ailments, but lose a lot of data around suffering. QALYs capture that data, but are still of questionable suitability for comparing to other ailments.