I'll use my time on the soapbox to make an ancillary point: that EAs and especially rationalists are on the whole quite naive about genetics, and in particular are too enthusiastic about using genetic knowledge to facilitate biological enhancements in some way or other. To be clear, I'm actually more bullish in principle than many of my colleagues regarding more widespread use of, say, polygenic scores - for reference, I'm probably a bit less positive on the idea than, say, Shai Carmi. But rationalists and some EAs seem to think both that genetics is a much more important determinant of traits than it actually is, and that modifying genetics or applying genetic knowledge is much more tractable than it really is. Like, it's not reflexive Luddism that's preventing you from being able to improve your offspring's IQ by +10 over baseline, it's primarily the limits of our abilities at present (and even what's feasible in ten years) to link genetic changes to traits, or to accurately perform embryo selection and gene editing.

tl;dr I work in genetics, was initially very sceptical that there were any genetic variants that really did cause shorter sleep, did some research which made me slightly more sceptical still, and then found a recent study which I think serves as an additional very, very strong update that the variants mentioned in this post have no such effect.

Long version: I have worked for 5 years at a company dedicated to finding associations between genetic variants and disease. I have a bit-part technician's role in our primary workflow for discovering these associations, and I have done some work alongside our clinical sequencing group (a bunch of physician-scientists contracted by clinicians around the world to try to identify genetic causes for their patients' conditions), but I will freely admit that most of the work I do is deeply non-clinical stuff like analysing DNA from the skulls of 9th-century settlers of Iceland to find out where they were born. The upshot is that I'm claiming some expertise here, but not on the level of "I literally did my thesis on this" or anything.

I came across this cause area via a tweet and was initially deeply sceptical. It really smacked of the noughties-era candidate gene studies that turned out to have very high false positive rates, as Scott Alexander described in withering detail. And when I looked into it, it transpired that the DEC2 P384R mutation was indeed identified through a candidate gene approach back in 2009, during the last hurrah of the candidate gene era, just as genome-wide association studies (GWAS) were beginning to take over.

I'll try to briefly convey the fundamental scientific principle behind my scepticism. Most genetic variants that you find in human populations are extremely rare, but we also share DNA with our relatives, which means that there are a huge number of "mutations" which are quite common in any one family but rare outside that family. Even if you circle all the people in one family who share some trait, like short sleep duration, and look at what "rare" variants they share, depending on the frequency cut-off you use to define "rare", that is still potentially hundreds of thousands of variants across the genome. Even if you assume that the trait in question is largely caused genetically, how on Earth do you narrow this down to a single causative variant? Well, there are many populous categories of variation that you can fairly safely assume don't have any physiological impact, and if you're feeling brave you can even decide to just look at the variants in some small subset of genes that you bet are more likely to have a role in determining your trait, but even after those extra assumptions, you still have many candidate variants. At least, you should have many candidate variants if you have done things correctly - however, in the past, researchers often bee-lined to a small number of genes that they assumed ex ante were related to the trait and only looked for variants there, based on an even less complete understanding of biological pathways than we have now, and with smaller sample sizes, and less accurate sequencing methods.

To be clear, some early attempts to link traits to genetic variants were successful, especially when you were guided by existing knowledge of biochemistry or if previous attempts had already narrowed down the search region a lot. For instance, the gene whose variants are responsible for ~all cases of achondroplasia, a kind of dwarfism, was identified in 1994 after previous work had already narrowed the search down to a small genomic region shared by all affected members within and across 18 multigenerational affected families. But according to the second link, at least 5 other genes had previously been suggested as causing achondroplasia and failed to pass scrutiny.

This indicates the kind of analysis needed to perform a successful "linkage analysis" plus candidate gene study. I read the papers in which the short-sleep variants were presented and noted that they were all nominated based on very small numbers of alleged carriers.[1] 

I also wanted to see if there was any independent evidence in favour of the idea of short-sleep variants. I did about 30 mins of research into this. First, I looked for any associations between variants in DEC2 (aka BHLHE41) and any trait in the GWAS Catalog, which compiles results from many genetic association studies. These studies generally look at more common variants with generally small effects, but it's a fairly safe assumption that a gene which carries variants with large effects on a trait will also carry variants that have small effects on that trait. For instance, the gene with large-effect variants responsible for achondroplasia also has other variants associating with smaller changes in height (which may or may not be independent of each other). For DEC2, however, I saw no plausible sleep-related hits for any variant in or around this gene. And it was the same story for the other three genes you mentioned, ADRB1, NPSR1, and GRM1. Conversely, other genes that did seem to be fairly robustly associated with sleep duration across studies include PAX8 and MEIS1. Funnily enough, the latter gene was even more strongly associated with restless leg syndrome, which may well represent the mechanism by which variants in this gene affect sleep! I found the results of my GWAS Catalog research to be a reasonably strong update against the idea of these variants being actually associated with short sleep. I also looked at ClinVar, a crowdsourced database of potentially clinically important variants, which are generally much rarer, and found that DEC2/BHLHE41 variants had indeed been nominated as causing short sleep - possibly based on Fu's research? - but none seemed to have support from multiple submitters.

I was tempted to put a pin in this and wait until I got to work on Monday to check for associations myself between variants in these genes and sleep traits in the most cutting-edge datasets available. The cohorts I have access to are large enough that I would probably find quite a few carriers of the variants in question. However, I then came across this study published in September 2022, The impact of Mendelian sleep and circadian genetic variants in a population setting. I will let the paper speak for itself:

Rare variants in ten genes have been reported to cause Mendelian sleep conditions characterised by extreme sleep duration or timing. These include familial natural short sleep (ADRB1, DEC2/BHLHE41, GRM1 and NPSR1)... We aimed to determine the effects of these variants on sleep traits... using 191,929 individuals with data on sleep and whole-exome or genome-sequence data from 4 population-based studies: UK Biobank [British], FINRISK [Finnish], Health-2000-2001 [also Finnish], and the Multi-Ethnic Study of Atherosclerosis (MESA) [US].

We identified 149 unrelated individuals from the UK Biobank with self-reported measures of sleep duration and carrying a previously reported pathogenic variant for natural short sleep in ADRB1 (A817V, n = 69), DEC2/BHLHE41 (P384R, n = 10), or GRM1 (S458A, n = 67; A889T, n = 3). We found no evidence that these individuals have short sleep durations in the UK Biobank...

The null effect of the S458A variant in GRM1 was also observed in FINRISK/Health 2000–2011 where <5 S458A carriers were identified and had no statistically significant difference compared to non-carriers in average sleep duration...

We confirmed the lack of association between previously identified pathogenic variants and sleep duration using accelerometer estimates of sleep in a subset of 34,226 individuals from the UK Biobank...

They also perform a "burden" test in which they gather up all the people that seem to have any high-impact mutation in a particular gene and see if those carriers as a group are different to everyone else. Again, they find no evidence that big changes to the genes carrying alleged "short-sleep" variants actually associate with changes in sleep length. Note that their tests seem to have some specificity, in that they don't reject every one of the claims of association analysed. They also look at variants that were previously claimed to associate with a different sleep-related trait - going to bed earlier or later than other people - and do find some degree of support for a few of those genes/variants. 
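For readers unfamiliar with the idea, here is a minimal sketch of what a burden-style comparison looks like. It is not the paper's actual method (real analyses typically use regression with covariates such as age and sex), and the person IDs, variant names, and sleep values below are invented toy data purely for illustration. It collapses everyone carrying any listed high-impact variant in a gene into one "carrier" group and compares their mean sleep duration with everyone else's using a Welch t-statistic.

```python
from statistics import mean, variance


def collapse_to_gene_carriers(genotypes, high_impact_variants):
    """Return the set of people carrying at least one listed high-impact variant."""
    return {
        person
        for person, variants in genotypes.items()
        if variants & high_impact_variants
    }


def welch_t(group_a, group_b):
    """Welch t-statistic for the difference in means (group_a minus group_b)."""
    standard_error = (
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    ) ** 0.5
    return (mean(group_a) - mean(group_b)) / standard_error


# Toy data, entirely hypothetical: person -> variants carried in one gene.
genotypes = {
    "p1": {"varA"}, "p2": set(), "p3": {"varB"}, "p4": set(),
    "p5": set(), "p6": {"varA", "varC"}, "p7": set(), "p8": set(),
}
# Toy data, entirely hypothetical: person -> sleep duration in hours.
sleep_hours = {
    "p1": 7.0, "p2": 7.5, "p3": 8.0, "p4": 6.5,
    "p5": 7.0, "p6": 7.5, "p7": 8.0, "p8": 7.0,
}
# Hypothetical list of variants judged "high impact" for this gene.
high_impact = {"varA", "varB"}

carriers = collapse_to_gene_carriers(genotypes, high_impact)
carrier_sleep = [sleep_hours[p] for p in sorted(carriers)]
other_sleep = [sleep_hours[p] for p in sorted(set(sleep_hours) - carriers)]
t_stat = welch_t(carrier_sleep, other_sleep)
print(f"carriers={sorted(carriers)}, t={t_stat:.2f}")
```

The point of collapsing is that individually rare variants are pooled, so a gene that genuinely matters for sleep should show a shifted carrier group even when no single variant has enough carriers to test alone.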

To be clear, this paper really represents the test you'd want to do to settle basically once and for all whether these supposed short-sleep variants affect sleep. They usually find dozens of other carriers for these variants, and the large posited effect sizes of ~1 full standard deviation or more, which should be perfectly detectable, just don't appear. This study seems to be very strong evidence that the variants mentioned in the above post are not actually associated with shorter sleep. I suppose the negative evidence is a bit less strong for a few specific variants that were found only rarely in these comparison datasets, but the burden tests and my GWAS Catalog search still point towards those genes in toto not having a strong role. My steelman case for the opposing side would involve some technical arguments. For instance, maybe variants like DEC2 P384R actually tag some haplotype in the original families which also carries some other marker which really is causative, but again the burden tests and GWAS Catalog point against this. I also suspect that the burden test might be underpowered, but I can't evaluate this easily. I think the NPSR1 variant is the most likely to be genuinely associated with sleep. It's so rare that it wasn't found in the 2022 study and so couldn't be analysed, and the gene is reported to have a few weak associations with some behavioural traits in the GWAS Catalog, but I still think it's very likely to not replicate independently.
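To put a rough number on "perfectly detectable", here is a back-of-the-envelope power sketch for the single-variant tests (not the burden test, whose power I still can't evaluate). It rests on simplifying assumptions that are mine, not the paper's: sleep duration is treated as approximately normal, the non-carrier mean is treated as known exactly because the non-carrier group is enormous, carriers are assumed to have the same variance as everyone else, and the test is a plain two-sided z-test at alpha = 0.05. The carrier counts are the UK Biobank numbers quoted above.

```python
from statistics import NormalDist


def power_two_sided_z(effect_sd: float, n_carriers: int, alpha: float = 0.05) -> float:
    """Approximate power to detect a mean shift of `effect_sd` standard deviations
    in `n_carriers` people, compared against a reference mean assumed to be known."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected z-score of the carrier mean if the posited effect were real.
    shift = abs(effect_sd) * n_carriers ** 0.5
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Carrier counts as quoted from the UK Biobank analysis.
carrier_counts = {
    "ADRB1 A817V": 69,
    "DEC2 P384R": 10,
    "GRM1 S458A": 67,
    "GRM1 A889T": 3,
}

for variant, n in carrier_counts.items():
    # Posited effect size of ~1 full standard deviation.
    print(f"{variant}: n={n}, approx. power={power_two_sided_z(1.0, n):.2f}")
```

Under these assumptions, power is close to 1 for the variants with dozens of carriers and still high with 10 carriers, while a variant with only 3 carriers is much less well covered, which matches my caveat about the rarely observed variants.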

I will end quite forcefully and say that anybody who does a clear-eyed assessment of the current state of evidence should realise that it is very unlikely that any of the variants mentioned in the OP genuinely cause much shorter sleep in humans, and extremely unlikely that more than one of them does. Fu should no longer promote the idea unless they find some new, overwhelmingly strong supporting evidence.

  1. ^

    DEC2: 2 affected and 5 non-affected in one family.

    NPSR1: 2 affected and 1 non-affected in one family, additional knowledge that the candidate variant is very rare outside the family.

    ADRB1: 5 carriers and 3 non-affected in one family analysed by linkage analysis, 4 of those had part of their genome sequenced. An individual who "should" have carried the variant according to linkage analysis actually slept a normal amount.

    GRM1: 2 different variants in 2 different families, one with 3 affected and 4 non-affected, the other with 1 (arguably 2?) affected and 1 non-affected. 4 affected across the families had part of their genome sequenced.

    You are really not narrowing down the massive genomic search very much when you look at such small numbers of people. Some of these variants aren't even that rare in the general population - 1 in 1000 people or so being carriers. At frequencies and posited effect sizes this high, the finding would be pretty easy to replicate elsewhere, and as far as I can see it hasn't been.