This is a cross-post from my Substack (original post here).
Note: At the Effective Altruism Global: San Francisco conference in 2017, Prof. Will MacAskill implored the audience to “keep EA weird”. As the EA movement grows, it’s important to keep EA’s original spirit of exploration alive. To help do that, I’m planning to write several articles on potential new — and weird — cause areas.
Effective altruists want to figure out how to do the most good per dollar spent. “Good” is often cashed out in terms of deaths prevented or quality-adjusted life years (QALYs) saved. QALYs attempt to adjust life-years for different states of health. However, the very method of QALY calculation, which typically involves surveys asking about trade-offs, might have some blind spots. For instance, if a disease state is rare, then most likely the requisite survey data for it has never been collected. The surveys themselves may have blind spots too. Consider these points from Andrés Gómez Emilsson (emphasis mine):
“Someone described the experience of having a kidney stone as ‘indistinguishable from being stabbed with a white-hot-glowing knife that's twisted into your insides non-stop for hours’. It’s likely that the reason why we do not hear about this is because (1) trauma often leads to suppressed memories, (2) people don’t like sharing their most vulnerable moments, and (3) memory is state-dependent (you cannot easily recall the pain of kidney stones .. you’ve lost a tether/handle/trigger for it, as it is an alien state-space on a wholly different scale of intensity than everyday life).”
As Daniel Kahneman describes in his book Thinking, Fast and Slow, the remembering self is different from the experiencing self. People have trouble describing and conceptualizing extreme events, either positive or negative. People also don’t like thinking about extreme negative events generally, whether they experienced them or others did. I personally sometimes notice my brain flinching away when thinking about kidney stone pain, even though I haven’t experienced it myself.
In the first part of this post I’ll go over the evidence for extreme pain events. Then, I’ll focus on kidney stones. The main reason for focusing on kidney stone pain is that over the past two years I’ve worked, off and on, on automated deep-learning-based software for detecting and measuring kidney stones in CT scans (see my paper in Medical Physics). So I have some expertise on the subject. Currently I am working with Prof. Avinash Kambadakone, a radiologist at Massachusetts General Hospital who is an expert on stone disease.
Background - suffering-focused ethics
“In my opinion human suffering makes a direct moral appeal, namely, the appeal for help, while there is no similar call to increase the happiness of a man who is doing well anyway.”
“Instead of the greatest happiness for the greatest number, one should demand, more modestly, the least amount of avoidable suffering for all.”— Karl Popper, The Open Society and Its Enemies (1945)
The idea that we should focus on eliminating suffering over increasing pleasure is intuitive to many people. See this recent Twitter poll from Robin Hanson:
So, I don’t think I need to spend much time here convincing people that reducing suffering should take precedence over increasing happiness. Note that what I have in mind here is what is called “weakly-negative utilitarianism”, which is quite different from pure negative utilitarianism, which focuses only on eliminating suffering. Readers interested in diving further into these topics should check out Lukas Gloor’s essay “The Case for Suffering-Focused Ethics”.
Background - long-tailed distributions of pleasure and pain
“Finding scalable treatments for migraines, kidney stones, childbirth, cluster headaches, CRPS, and fibromyalgia may be extremely high-impact.” — Andrés Gómez Emilsson
A typical pain scale used by doctors. The patient is asked to name a number or point on the scale.
Studying pain is notoriously difficult, and comparing pain qualia (experiences) is even more so. Most commonly, a doctor may ask a patient to rate their pain on a ten-point scale like the one shown above. But is the difference between level four and level five the same as the difference between level nine and level ten? Furthermore, what determines the bounds of the scale? We all have a clear idea of what 0/10 pain is from experience, but what is 10/10 pain? Clearly, “worst pain possible” requires some imagination, and different people will have different conceptions of what that means. Anecdotal evidence suggests people often report experiencing “radically new” levels of pain beyond what they formerly thought was possible. So ratings on the 10-point scale will vary a lot across people, since people have different conceptions of what 10/10 pain is. There are probably other sources of variance too; I am not aware of this variance being studied directly (i.e., with electric shocks).
In his 2019 EA Forum post “Logarithmic Scales of Pleasure and Pain: Rating, Ranking, and Comparing Peak Experiences Suggest the Existence of Long Tails for Bliss and Suffering”, Andrés Gómez Emilsson argues that there exist tail experiences in both pain and pleasure. Extreme pain events, although rare, could play a rather large role in the utilitarian calculation effective altruists attempt to carry out. The existence of tail experiences also makes binning pain experiences into 10 buckets, with 10 being the worst, a bit wonky. If tail experiences are 10x or 100x more painful than common forms of pain, then most pain should be concentrated between 0 and 1 on the 10-point scale, but that’s not how the scale is used.
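To make the binning problem concrete, here is a toy Python calculation. All intensities are hypothetical, in arbitrary units: a barely noticeable pain is 0.1, an everyday pain is 1, and a tail experience is assumed to be 100x an everyday pain. It compares a 0-10 rating that is linear in intensity with one that is linear in log-intensity, both anchored so that 10 is the tail experience.

```python
import math

# Hypothetical intensities in arbitrary units (assumptions, not measurements).
BARELY_NOTICEABLE = 0.1
EVERYDAY_PAIN = 1.0
TAIL_PAIN = 100.0  # assumed to be 100x an everyday pain


def linear_rating(intensity: float) -> float:
    """0-10 rating proportional to intensity, with 10 = TAIL_PAIN."""
    return 10.0 * intensity / TAIL_PAIN


def log_rating(intensity: float) -> float:
    """0-10 rating proportional to log-intensity: 0 = barely noticeable, 10 = TAIL_PAIN."""
    span = math.log(TAIL_PAIN / BARELY_NOTICEABLE)
    return 10.0 * math.log(intensity / BARELY_NOTICEABLE) / span


for name, value in [("everyday", EVERYDAY_PAIN), ("tail", TAIL_PAIN)]:
    print(f"{name:>8}: linear {linear_rating(value):.2f}, log {log_rating(value):.2f}")
```

Under these assumptions an everyday pain rates about 0.1 on the linear scale but about 3.3 on the log scale. That is one way to read why people spread common pains across the whole 0-10 range: in practice the scale may behave more like a log scale than a linear one.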
The most convincing data to support the idea of tail experiences in pain come from preliminary survey data that Andrés presents. The dataset is very small, but a pretty clear finding is that when people think about their worst experience, many describe it as significantly worse than the second worst. Here’s the data:
This clearly looks like a heavy-tailed distribution. It would be very instructive to do more surveys like this, in particular with more people who have experienced kidney stones, cluster headaches, and other ailments. This seems like very low-hanging fruit for EA.
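The “worst is much worse than the second worst” pattern is what a heavy-tailed distribution predicts. Here is a quick simulation sketch that assumes, hypothetically, that each person’s ten most memorable pain experiences are independent draws. The draws come either from a Pareto distribution (heavy-tailed, shape 1) or from a normal distribution (thin-tailed, mean 10, standard deviation 1). The sketch reports the median ratio of the largest to the second-largest draw.

```python
import random
import statistics


def median_top_ratio(sampler, n_per_person=10, trials=2000, seed=0):
    """Median over simulated people of (largest / second-largest) of n draws."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        draws = sorted(sampler(rng) for _ in range(n_per_person))
        ratios.append(draws[-1] / draws[-2])
    return statistics.median(ratios)


# Heavy tail: Pareto with shape 1 (values >= 1). Thin tail: normal(10, 1).
heavy = median_top_ratio(lambda rng: rng.paretovariate(1.0))
light = median_top_ratio(lambda rng: rng.gauss(10.0, 1.0))
print(f"heavy-tailed: {heavy:.2f}, thin-tailed: {light:.2f}")
```

With the heavy-tailed draws the worst experience is typically about twice as bad as the runner-up. With the thin-tailed draws the two are within a few percent of each other. The distributions and parameters here are illustrative choices, not fits to Andrés’ data.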
The three medical issues that came up highest for pain in Andrés’ survey were kidney stones, cluster headaches, and childbirth. Among the respondents, 9/93 (9.7%) mentioned kidney stones. This is roughly in line with the lifetime risk of getting a symptomatic kidney stone, and it suggests that for most people unfortunate enough to have a kidney stone, it will be one of the most painful experiences of their life. Cluster headaches have a 100x lower prevalence than kidney stones (~1/1000 instead of ~1/10), but among sufferers, cluster headache pain occurs much more frequently on average than kidney stone pain, possibly 100x more so. So when integrating pain over time, it’s possible that cluster headaches are a bigger problem.
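The “integrating pain over time” comparison is simple Fermi arithmetic. Here is a minimal sketch that uses the rough prevalences from the paragraph above and treats per-episode intensity as equal (a simplifying assumption, via the `intensity` parameter). It varies how many times more frequent cluster headache episodes are than kidney stone episodes.

```python
# Rough prevalences from the text; episode counts and intensity are relative, arbitrary units.
KIDNEY_STONE_PREVALENCE = 1 / 10
CLUSTER_HEADACHE_PREVALENCE = 1 / 1000


def population_burden(prevalence, episodes_per_sufferer, intensity=1.0):
    """Pain integrated over a population: prevalence x episodes x intensity."""
    return prevalence * episodes_per_sufferer * intensity


stone = population_burden(KIDNEY_STONE_PREVALENCE, episodes_per_sufferer=1)
for freq_multiplier in (10, 100, 1000):
    cluster = population_burden(CLUSTER_HEADACHE_PREVALENCE, episodes_per_sufferer=freq_multiplier)
    print(f"{freq_multiplier:>5}x more frequent: cluster/stone burden = {cluster / stone:.2f}")
```

At exactly 100x the two conditions come out even. The claim that cluster headaches may be the bigger problem therefore hinges on the frequency ratio exceeding 100x, or on differences in per-episode intensity or duration.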
Here’s some other evidence that Andrés presents for the existence of tail experiences:
- The firing activity of groups of neurons follows a lognormal distribution, a type of distribution that contains a long tail of rare extreme events. The intensity of a sensory stimulus is communicated via the firing rate of the afferent nerves. It seems parsimonious to assume that the brain also encodes stimulus intensity internally via firing rate (although there are other plausible scenarios, such as having different neurons or groups of neurons represent different levels of stimulus intensity, i.e., population coding).
- The Schmidt pain index (for insect stings), the Scoville scale (for hot peppers), and the KIP scale (for cluster headaches) have all been described by their creators and users as being exponential in nature.
- Personal reports from drug users. People who take psychedelics are often surprised when taking a higher dose or a different substance dramatically increases the intensity of the experience beyond what they were expecting.
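To get a feel for how long the lognormal tail in the first point is, here is a small sketch. It computes quantiles of a lognormal distribution relative to its median. The parameters (mu = 0, sigma = 1 on the log scale) are a hypothetical choice for illustration; measured firing-rate distributions will have their own spread.

```python
import math
from statistics import NormalDist


def lognormal_quantile(p, mu=0.0, sigma=1.0):
    """Quantile of a lognormal: exp of the corresponding normal quantile."""
    return math.exp(NormalDist(mu, sigma).inv_cdf(p))


median = lognormal_quantile(0.5)
for p in (0.9, 0.99, 0.999):
    print(f"{p:.3f} quantile / median = {lognormal_quantile(p) / median:.1f}")
```

With sigma = 1, the 1-in-1,000 event is roughly 22 times the median. A larger sigma stretches the tail further.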
The burden of stone disease is rising
The latest data indicates that people in the United States have a lifetime risk of 15-20% of developing a urinary tract stone. These numbers have been increasing over the years, so much so that prevalence studies from more than a decade ago are out of date. Many factors might explain these increases. Some notable factors that increase risk are salt consumption, dehydration, the consumption of animal products, and high-protein diets. Kidney stones also have a high rate of recurrence: over half of stone formers will have another stone within five years.
A 2005 study estimated that the total direct and indirect costs of kidney stone disease were approximately $4.5 billion in 2000 in the United States. The same authors estimate that a 75% effective intervention costing less than $300 per patient per year would be cost-effective at reducing health care expenditures. They note that a shift away from expensive medications towards low-cost treatment modalities such as increased water intake and lemon juice could increase cost-effectiveness.
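One way to read the $300 threshold is as a break-even condition. Here is a minimal sketch that assumes, as my own simplification rather than necessarily the model the authors used, that an intervention pays for itself when its annual cost per patient is at most its efficacy times the stone-related costs the patient would otherwise incur that year.

```python
def break_even_baseline_cost(intervention_cost, efficacy):
    """Annual per-patient baseline cost at which averted costs equal the intervention's cost."""
    return intervention_cost / efficacy


def is_cost_saving(intervention_cost, efficacy, baseline_cost):
    """True if expected costs averted (efficacy x baseline) cover the intervention."""
    return efficacy * baseline_cost >= intervention_cost


print(break_even_baseline_cost(300, 0.75))
```

Under this simplification, a $300 intervention with 75% efficacy breaks even when stone formers would otherwise incur about $400 per year in stone-related costs. Cheaper modalities such as water and lemon juice lower the left-hand side and widen the margin.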
Technologies for screening exist
Currently, people usually don’t learn they have kidney stones until they experience pain, typically flank pain on their side. This leads them to consult a doctor or go to the emergency room, where diagnostic scanning is done to confirm the presence of kidney stones and decide on the best means of treatment. Kidney stones are the 9th most common cause of emergency room visits. From a suffering-reduction standpoint, we need a way to screen patients before the pain flares up. Another point is that the potential for painful obstruction and similar problems is proportional to the maximum diameter of the stone. It may even be that the pain of passing a stone goes up exponentially with stone diameter. If we think of stones as spherical (a crude approximation, but illustrative here), then the surface area of a stone scales as the diameter squared. Shrinking the diameter by a given amount therefore means removing a shell of material whose volume also scales roughly as the diameter squared. Reducing the diameter is easier when stones are small, so early treatment should yield more bang for the buck.
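The spherical-stone argument can be checked with a few lines of geometry. This sketch computes how much material must be removed to shrink a stone’s diameter by 1 mm, starting from a hypothetical 3 mm stone and a hypothetical 8 mm stone.

```python
import math


def sphere_volume(d):
    """Volume of a sphere of diameter d."""
    return math.pi * d ** 3 / 6


def material_to_shrink(d, delta):
    """Volume to remove to shrink a spherical stone from diameter d to d - delta."""
    return sphere_volume(d) - sphere_volume(d - delta)


small = material_to_shrink(3.0, 1.0)  # 3 mm -> 2 mm
large = material_to_shrink(8.0, 1.0)  # 8 mm -> 7 mm
print(f"{small:.1f} mm^3 vs {large:.1f} mm^3 (ratio {large / small:.1f})")
```

Taking 1 mm off the 8 mm stone requires removing almost nine times as much material as taking 1 mm off the 3 mm stone. For thin shells the removed volume is approximately (π/2)·d²·Δd, which is where the diameter-squared scaling comes from.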
Several imaging techniques exist to screen for kidney stones - kidney-ureter-bladder (“KUB”) X-ray radiographs, ultrasound, MRI, and CT (colloquially called “CAT scanning”). A nice summary of the pros and cons of these different techniques is shown in this table:
The technique with the highest sensitivity is CT followed by MRI, which doesn’t involve any radiation dose but is roughly three times as expensive. Only CT is useful for detecting small stones (< 4 mm) and uric acid stones (which comprise about 15% of stones and are transparent on radiographs).
For screening purposes, CT and ultrasound are the only two viable options. The main issue with CT is radiation dose, which increases cancer risk to a non-trivial degree. However, low-dose screening protocols have been developed and validated (I won’t bother my readers with the technical details here, although I personally find them fascinating). The bottom line is that screening protocols now exist where the radiation exposure is on par with a KUB radiograph (about 1.5 mSv). Adoption of these protocols has been slow, however. The effect of low-dose radiation on cancer risk is somewhat controversial, but 1.5 mSv given to people around age 50 would increase their lifetime cancer risk by about 1 in 10,000. In other words, you would get about one new cancer for every 10,000 scans performed. This is pretty non-trivial, and further Fermi estimates are needed here to understand benefits vs. risks.
How many kidney stones might we detect early via CT scanning? Potential kidney donors typically undergo a CT scan to screen for contraindications. Studies show that about 7.5% of potential donors have previously undiscovered kidney stones. Some transplant centers allow donors with kidney stones to donate; others do not. Many people get CT colonographies around age 60. When a CT colonography is performed, the radiologist is responsible not just for analyzing the colon but also for doing due diligence on the rest of the scan. In my work at NIH, I looked at 5,381 CT colonography scans that had reports available. Among those, 755 (about 14%) had keywords suggesting that a kidney stone was found. These scans were all from ~10 years ago, so current numbers might be a bit higher.
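As a first step toward the Fermi estimate called for above, here is a crude sketch. It combines the ~1 in 10,000 excess cancer risk per 1.5 mSv scan with the two detection rates quoted in this section, giving the number of stones found per radiation-induced cancer if those populations were scanned purely for screening.

```python
# Figures quoted in the text; everything else is a deliberate simplification.
EXCESS_CANCER_RISK_PER_SCAN = 1 / 10_000  # ~1.5 mSv at age ~50
PREVALENCE = {
    "potential kidney donors": 0.075,
    "CT colonography patients (NIH sample)": 755 / 5381,
}

scans_per_induced_cancer = 1 / EXCESS_CANCER_RISK_PER_SCAN
for population, prevalence in PREVALENCE.items():
    detections = prevalence * scans_per_induced_cancer
    print(f"{population}: ~{detections:.0f} stones found per extra cancer")
```

This gives roughly 750 to 1,400 stones found per extra cancer. It ignores how many detected stones would ever become symptomatic and how much suffering early treatment actually prevents, and it does not apply to incidental findings on scans done for other reasons, which add no extra dose at all.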
Unfortunately, my impression is that radiologists may downplay tiny stones in their reports due to concerns about ‘incidentalomas’ from CT scanning consuming too much of limited healthcare money and resources. Another point is that doctors often don’t recommend treatment for tiny stones because they assume “they will likely pass on their own”. From what I have been told, passing even tiny stones can be excruciatingly painful. To doctors, tiny stones are “not clinically significant”. Yet from a suffering-reduction perspective, if we can treat tiny stones and dissolve them in place, the patient can avoid the pain of passing the stone.
How do stones form?
Two things are required for stone formation - supersaturation of some dissolved substance and a nucleation site. For such a pervasive condition, it’s remarkable how recent most of the science is on it. Stone nucleation dynamics, genetics (stone disease is 4-50% heritable) and the role of bacteria (discussed below) all remain poorly understood. Calcium salt deposits on the tip of the renal papilla are believed to serve as nucleation sites for calcium-based stone formation. These deposits, called Randall's plaques, result in slightly higher X-ray attenuation on CT, so theoretically they can be detected using CT techniques.
However, this is currently very challenging. A 2013 study that looked into this concluded that current CT technology cannot detect them. I suspect that if we used high-resolution cardiac CT protocols to image a single kidney instead of the heart, detecting Randall’s plaques may be quite feasible. This could be a future area for research.
Another interesting finding is that Oxalobacter formigenes, a bacterium discovered in 1985, appears to play an important role in breaking down oxalate. There is some preliminary work showing a correlation between the presence of this bacterium and decreased risk of calcium oxalate stones. Antibiotic use is also correlated with higher risk of kidney stone disease, and antibiotics may kill this bacterium. In the future, bacterial treatments may be possible, but the science here is still in its infancy; again, more research is needed!
So you have a tiny kidney stone... now what?
There are not many large, high-quality studies on interventions for preventing kidney stones or kidney stone growth. This points to a larger bias in medicine towards treating disease rather than preventing it in the first place. Still, it seems fairly well established that juices high in citric acid help prevent stone formation. Freshly squeezed lemon and lime juice have the highest concentration of citric acid, but people do not typically drink lemon and lime juice straight from the fruit; it is usually diluted first. A meta-review concluded that orange juice is best for preventing kidney stones, followed by lemon juice. On the flip side, vitamin C, also found in fruits, increases oxalate levels, which raises the chance of getting calcium oxalate stones, the most common type of stone. Other high-oxalate foods include spinach, almonds, and cocoa. Instead of sugary juices, it is much healthier to consume sugar-free alternatives or just take potassium citrate in tablet form, which is very cheap. Lowering salt consumption and raising water intake are two other relatively easy things to do. There are also prescription drugs like thiazide diuretics, but they don’t seem to be radically better than the simpler treatments already mentioned and may cause uncomfortable side effects. Uric acid stones, which comprise 10-15% of stones and can be distinguished somewhat well on CT scans, are known to be very easy to treat with drugs.
Finally, I don’t want to steal too much of his thunder here, but Andrés Gómez Emilsson has been doing some interesting work investigating the herb Phyllanthus niruri, known as Chanca Piedra (“stone breaker”). See this podcast clip:
Andrés says he plans to publish a report on this later this year.
The main take-away here is that more research is needed. Most research on kidney stone disease has focused on treatments, leading to sophisticated and well-validated treatment techniques such as percutaneous nephrolithotomy, using catheters to blast the stones with lasers, placement of stents to help kidney stones to pass, and shock wave lithotripsy. Much less research has gone into understanding stone formation and growth dynamics, risk factors, early detection methods, and preventative therapy.
Here are some promising areas for future research:
- Do more surveys on kidney stone pain and frequency. (Anecdotal reports suggest a small minority of people with kidney stones pass a large number of stones, with frequencies as high as one per week.)
- Development of better neuroimaging biomarkers for pain (fMRI, EEG, fNIRS), to move away from reliance on self-reports and bolster the case for tail pain states. (Side note: the Black Mirror episode “Black Museum” features a doctor who uses an EEG device to feel the pain of patients. Hopefully this won’t be necessary to compare pain states more rigorously.)
- Better scanning tech for screening (lower dose CT, dual-energy CT for stone type, lower-cost MRI, AI+ultrasound).
- More studies to validate and compare low-cost treatments for both prevention and removal of small stones (increased water intake, potassium citrate, Chanca Piedra), ideally RCTs.
- Cost-effectiveness calculation / Fermi estimation of screening + treatment.
I want to thank Andrés Gómez Emilsson for a very helpful chat and for reading an early draft of this post.
Appendix - Weber’s law
How exactly do we perceive pain? A 1947 study using a 500 watt heat lamp is somewhat instructive here. Using a lens, the researchers focused a beam of heat onto their subjects’ foreheads. They wanted to understand whether the perception of pain scaled linearly with the intensity of the stimulus. However, they couldn’t just ask the subjects directly how much pain they were experiencing, because there is no standard reference scale for pain. Instead, they used the method of “just noticeable differences”, which essentially involves increasing the stimulus until the subject reports a difference. They found that the smallest noticeable increase grew in proportion to the strength of the stimulus, so the subjects’ ability to distinguish two stimuli was inversely proportional to their strength, in line with an idea called Weber’s law. Interestingly, the researchers hit a limit when the intensity of the heat source was above about 500 millicalories/second/cm^2: subjects could not distinguish intensities higher than this level. It appears the researchers maxed out the firing rate of the pain receptors (nociceptors) in the skin.
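Weber’s law implies that the number of distinguishable steps between the pain threshold and the ceiling grows only with the logarithm of the intensity range. Here is a small sketch of that counting. The threshold (100) and Weber fraction (10%) are hypothetical values chosen for illustration; only the 500 mcal/s/cm² ceiling echoes the study above.

```python
import math


def count_jnds(threshold, ceiling, weber_fraction):
    """Count just-noticeable steps from threshold up to ceiling when each
    noticeable step raises intensity by a constant fraction (Weber's law)."""
    steps = 0
    intensity = threshold
    while intensity * (1 + weber_fraction) <= ceiling:
        intensity *= 1 + weber_fraction
        steps += 1
    return steps


def count_jnds_closed_form(threshold, ceiling, weber_fraction):
    """Same count via logarithms: floor(log(ceiling/threshold) / log(1 + k))."""
    return math.floor(math.log(ceiling / threshold) / math.log(1 + weber_fraction))


print(count_jnds(100, 500, 0.10), count_jnds_closed_form(100, 500, 0.10))
```

Both functions give 16 steps for a fivefold intensity range. Doubling the ceiling would add only about seven more.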
Some people think that Weber’s law goes against the idea that there can exist extreme pain events. The most straightforward interpretation of Weber’s law is that the brain applies something like a log() function to the input it’s receiving, and that it is this log() of the intensity that is rendered into our subjective experience (or qualia). Andrés argues against this by noting an issue with the just-noticeable-differences method: it may be that the intensity is rendered linearly, but that our ability to discriminate between two sensory experiences falls off as they get stronger, producing the same logarithmic pattern. This makes some intuitive sense: a very strong sensory experience may overwhelm us and make fine-grained discrimination harder. The example Andrés once gave me is that if you are experiencing two large, fiery, flaming suns in your head, it may be much harder to distinguish a 10% difference between them than between two tiny, softly glowing orbs.
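Andrés’ point can be made precise with a toy model. Model A renders experience as log(I) and notices any fixed step in experience. Model B renders experience linearly but only notices steps that are a fixed fraction of the current experience. This sketch, with a hypothetical Weber fraction of 10%, shows that both models predict exactly the same just-noticeable differences, so JND data alone cannot tell them apart.

```python
import math

WEBER_FRACTION = 0.1  # hypothetical


def jnd_log_model(intensity, resolution=math.log(1 + WEBER_FRACTION)):
    """Model A: experience = log(I); a fixed step `resolution` is noticeable.
    Solving log(I + dI) - log(I) = resolution gives dI = I * (exp(resolution) - 1)."""
    return intensity * (math.exp(resolution) - 1)


def jnd_linear_model(intensity, relative_resolution=WEBER_FRACTION):
    """Model B: experience = I; the noticeable step is a fixed fraction of it."""
    return intensity * relative_resolution


for intensity in (1.0, 10.0, 100.0):
    print(intensity, jnd_log_model(intensity), jnd_linear_model(intensity))
```

The two columns agree up to floating-point rounding at every intensity. That is why other methods, beyond just-noticeable differences, are needed to settle the question.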
However, I don’t find Andrés’ argument that convincing. It seems more parsimonious that the conscious rendering of a stimulus would be proportional to the firing rate of some neuron(s), and that the smallest difference we can distinguish between two firing rates should also be directly proportional to the rate (for instance, being able to distinguish differences of +/-1%).
Secondly, it’s pretty well accepted at this point that the brain does apply something like a log() function to some inputs, and there are methods beyond just-noticeable differences that back this up. This is most obvious for brightness. The sun is 398,110 times brighter than a full moon, but it doesn’t feel that much brighter. This is partially because our pupils dilate at night to let more light in, along with other adaptation factors, but the biggest reason is that our brain applies something like a log() function to the input.
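Astronomy’s magnitude scale, which grew out of naked-eye brightness classes, is itself logarithmic: a difference of Δm magnitudes corresponds to a brightness ratio of 10^(0.4·Δm). This short sketch converts the sun-to-full-moon ratio quoted above into orders of magnitude and into magnitudes.

```python
import math

SUN_TO_FULL_MOON_BRIGHTNESS = 398_110  # ratio quoted in the text

orders_of_magnitude = math.log10(SUN_TO_FULL_MOON_BRIGHTNESS)
magnitude_difference = 2.5 * orders_of_magnitude  # astronomical magnitude scale
print(f"{orders_of_magnitude:.2f} orders of magnitude, {magnitude_difference:.1f} magnitudes")
```

A brightness ratio of about 400,000 compresses to 5.6 orders of magnitude, or 14 magnitudes, which is a much closer match to how the difference feels.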
The bigger issue with Weber’s law is that it doesn’t always apply. In fact, for electric shock the situation is the reverse (see figure below)! Could kidney stones be like electric shocks? Could the pain of a stone grow exponentially with the diameter of the stone? We just don’t really know.
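One standard way to describe the electric shock case is Stevens’ power law, where perceived magnitude grows as a power of stimulus intensity, ψ = k·I^a. Stevens reported an exponent of roughly 3.5 for electric current through the fingers, compared with exponents well below 1 for brightness. This sketch compares how much the perceived magnitude grows when the stimulus doubles, under a logarithmic (Weber-Fechner) rule and under a power law with a = 3.5. The reference intensity and threshold are arbitrary assumptions.

```python
import math


def fechner(intensity, threshold=1.0):
    """Weber-Fechner: perceived magnitude grows with log(intensity / threshold)."""
    return math.log(intensity / threshold)


def stevens(intensity, exponent, k=1.0):
    """Stevens' power law: perceived magnitude = k * intensity ** exponent."""
    return k * intensity ** exponent


base = 10.0  # arbitrary reference intensity, above threshold
print("log rule:", fechner(2 * base) / fechner(base))
print("power law, a=3.5:", stevens(2 * base, 3.5) / stevens(base, 3.5))
```

Doubling the stimulus raises perceived magnitude by about 30% under the log rule but by a factor of about 11 under the power law. If stone pain behaved like the shock case, with some exponent above 1 on diameter, a modest increase in diameter could mean a disproportionately larger increase in pain. Whether it does remains the open question above.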
For a deep dive into medical imaging and kidney stone disease see Prof. Kambadakone’s 2010 paper in Radiographics, the leading pedagogical journal for radiology.
Hill, Alexander J., et al. “Incidence of Kidney Stones in the United States: The Continuous National Health and Nutrition Examination Survey.” Journal of Urology, vol. 207, no. 4, Apr. 2022, pp. 851–56. This study used 2007-2010 NHANES data and estimated that “19% of men and 9% of women will be diagnosed with a kidney stone by the age of 70”.
Saigal, Christopher S., et al. “Direct and Indirect Costs of Nephrolithiasis in an Employed Population: Opportunity for Disease Management?” Kidney International, vol. 68, no. 4, Oct. 2005, pp. 1808–14.
Kurtz, Michael P., and Brian H. Eisner. “Dietary Therapy for Patients with Hypocitraturic Nephrolithiasis.” Nature Reviews Urology, vol. 8, no. 3, Mar. 2011, pp. 146–52.
Regarding the determination of uric acid vs. non-uric acid stones via CT: most studies on accuracy here use stones inserted into “phantoms” that mimic the human body. Roughly speaking, a trained clinician using CT can distinguish uric acid stones from non-uric acid stones with an accuracy of around 40-60%. Distinguishing uric acid stones is important because even large uric acid stones can usually be treated with drugs (oral dissolution therapy) rather than requiring surgery or other invasive methods. Dual-energy CT (DECT) technology can distinguish the two types with near 100% accuracy. About 5-10% of hospitals now have so-called “premium” scanners with dual-energy capability, but DECT techniques have not yet spread much into clinical practice.
Planz, Virginia B., et al. “Ultra-Low-Dose Limited Renal CT for Volumetric Stone Surveillance: Advantages over Standard Unenhanced CT.” Abdominal Radiology, vol. 44, no. 1, Jan. 2019, pp. 227–33.
Kim, Irene K., et al. “Incidental Kidney Stones: A Single Center Experience with Kidney Donor Selection: Kidney Stones and Donor Selection.” Clinical Transplantation, vol. 26, no. 4, July 2012, pp. 558–63.
Elton, D. C., et al. “A Deep Learning System for Automated Kidney Stone Detection and Volumetric Segmentation on Non-Contrast CT Scans.” Medical Physics, vol. 49, no. 4, pp. 2545–54.
Uribarri, Jaime. “The First Kidney Stone.” Annals of Internal Medicine, vol. 111, no. 12, Dec. 1989, p. 1006.