Jeff Kaufman

Software Engineer @ Nucleic Acid Observatory
13729 karma · Joined Aug 2014 · Working (15+ years) · Somerville, MA, USA



Software engineer in Boston, parent, musician. Switched from earning to give to direct work in pandemic mitigation. Married to Julia Wise. Speaking for myself unless I say otherwise.

Full list of EA posts:


This seems like the kind of place GPT would make up things when the answer wasn't on the web, and I would basically ignore this.

If someone thinks this is worth paying attention to, let me know and I'll spot-check some of the rows?

The original version of this post had results from a simulation where the key results were off by a factor of 100. See the update at the top of the post for more.

Compensating the person sufficiently that they're willing to do the work (because, ex, they don't enjoy it, or it displaces other work they see as much more valuable).

I read this as coming from a culture of listing "happy/cheerful prices".

Expanded this into Retirement Accounts and Short Timelines.

(And both derive from a draft blog post I wrote in May but didn't end up publishing because it had a modeling component that was only half baked.  It would be great to be able to connect the probability that you'll need money in different scenarios to the penalties for early withdrawal but it's all kind of tricky.)

I know this is a tangent, but I think at least in the US putting money in tax-advantaged retirement accounts still usually makes sense. I'll take the Roth 401k case, since it's the easiest to argue for:

  • In worlds that somehow end up as a vague continuation of the status quo, you'll want to have money at retirement.

  • The money is less locked up than it sounds:

    • If you want to withdraw just the contributions (and so untaxed) you can roll a Roth 401k over to a Roth IRA, if your employer allows this.

    • Five years from when you open your account there are options for taking money out tax-free (including gains) even if you're not 59.5 yet. If you think you might want to live off your savings while you do something uncompensated you can take "substantially equal periodic payments", but there are also options for various kinds of hardship.

    • In an emergency you can take the money out now and owe taxes later.

  • The money is more protected than if you save it normally:

    • The first ~$1.5M in your retirement account is protected from bankruptcy.

    • Means testing generally ignores retirement accounts but does include conventional ones. College financial aid that uses the more thorough CSS PROFILE is a partial exception here: the college does still look at retirement account balances, but often ignores them and is less likely to ask for that money than if it were in a conventional account.

    • If you lose a lawsuit, your 401k (but not an IRA) is protected from judgement creditors.

    • In future cases where people are trying to come up with rules about what counts as money you have right now, they're much less likely to count retirement assets than regular ones, which is usually what you want.

if you use a high-traffic commuter train station or supermarket I would guess you get a fairly broad cross-section of the city

Definitely! Right after writing to you I started thinking about this, estimating costs, and talking to coworkers; sorry for not posting back! I do think something along these lines could work well.

These numbers are maybe optimistic, but not ridiculously so.

My main update since then is that if you do it at a transit station you probably need to compensate people, but also that a small amount of compensation doesn't sink this. Giving people $5 or a candy bar for a swab is possible, and if a team of two people at a busy transit station can get 50-200 swabs in an hour that's your biggest sample acquisition cost. I still think $1k is practical for the sequencing.

I'm trying to come up with examples of people doing something similar, which we'd want for presenting this to the IRB. Two examples so far:

Do you know of anything else that feels similar to this? People in public areas collecting biological samples from volunteers (perhaps lightly compensated).

Lots of great questions!

the SIREN 2.0 study, running this winter, will generate some more data to answer this question.

Thanks for pointing this out; I hadn't seen it and it's super relevant. I don't see what sample type they're using in the press release, but any kind of ongoing metagenomics to look at respiratory viruses is great!

how do you relate relative abundance to detection probability? I would have thought the total number of reads of the pathogen of interest also matters.

It depends on your detection method, but modeling it as needing some number of cumulative reads hitting the pathogen is a good first approximation.

If you think it would take N reads of the pathogen to flag it, then if you know RAi(1%) and the exponential growth rate you can make a back-of-the-envelope estimate of how much sequencing you'd need on an ongoing basis to flag it before X% of people had ever been sick. For example, if you need 100 reads to flag, it doubles weekly, and RAi(1%) is 1e-7, then to flag at a cumulative incidence of 1% (and current weekly incidence of 0.5%) you'd need 100/1e-7 = 1e9 reads a week.

(I chose 1% cumulative incidence and weekly doubling to make the mental math easier. At 1% CI half the people got sick this week and half in previous weeks, and the cumulative infection rate across all past sequencing should sum to 1%, so we can use RAi(1%) directly. Though I might have messed this up since I'm doing it in my head lying in bed.)
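As a rough check on that arithmetic, here's a minimal Python sketch. It assumes pathogen reads scale linearly with weekly incidence, that you sequence the same number of reads every week, and that RAi(1%) means relative abundance at 1% weekly incidence. The function name and parameters are made up for illustration.

```python
def weekly_reads_needed(reads_to_flag, ra_i_1pct, target_cumulative_incidence,
                        weeks_of_history=60):
    """Weekly sequencing depth needed to accumulate `reads_to_flag` pathogen
    reads by the time cumulative incidence reaches the target.

    Assumes weekly doubling, so weekly incidence halves each week into the past.
    """
    # Under weekly doubling, this week's incidence is half the cumulative total.
    weekly_incidence = target_cumulative_incidence / 2
    # Expected pathogen reads per sequenced read, summed over this and past weeks.
    cumulative_fraction = 0.0
    for _ in range(weeks_of_history):
        cumulative_fraction += ra_i_1pct * (weekly_incidence / 0.01)
        weekly_incidence /= 2
    return reads_to_flag / cumulative_fraction

# The example above: 100 reads to flag, RAi(1%) = 1e-7, flag at 1% cumulative incidence.
print(f"{weekly_reads_needed(100, 1e-7, 0.01):.2e}")
```

With these inputs the weekly incidences sum to (very nearly) 1%, which is why RAi(1%) can be used directly and the answer comes out at about 1e9 reads a week.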

if you tested the entire population you would have some reads on every pathogen even if the relative abundance of some pathogens is very low.

If you collected a large enough sample volume and sequenced deeply though, yes.

Relatedly, does the cost of the sequencing scale roughly linearly with the relative abundance required? That is, if your 40,000x figure is correct, would that imply swabbing is ~40,000x cheaper than wastewater?

It doesn't, for three reasons:

  • Sequencing in bulk is a lot cheaper per read. You might pay $13k for 10B read pairs, or $1k for 100M. But that's just ~10x.

  • Some components (lab time, kits) vary in proportion to the number of samples and don't go up much as your samples are bigger.

  • It's only your sequencing costs that vary with relative abundance, and while with wastewater I expect the cost of sequencing to dominate, that's not the case for any other sample type I can think of (maybe air?). If you're sampling from individuals the cost of getting the samples is likely quite high (we were recently quoted $80/person from a contractor, and while I think we can do better, if you want 1k people per pooled sample it's almost certainly more expensive than the sequencing charge).
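To put numbers on the bullets above, here's a small Python sketch that uses only the figures quoted in this comment ($13k for 10B read pairs, $1k for 100M, and the $80/person contractor quote). The variable names are illustrative.

```python
# Prices quoted above, in dollars.
bulk_cost, bulk_read_pairs = 13_000, 10e9
small_cost, small_read_pairs = 1_000, 100e6

bulk_per_million = bulk_cost / (bulk_read_pairs / 1e6)     # dollars per 1M read pairs
small_per_million = small_cost / (small_read_pairs / 1e6)  # dollars per 1M read pairs
bulk_discount = small_per_million / bulk_per_million       # the "just ~10x" factor

# Collection cost for a 1k-person pool at the quoted contractor rate.
people_per_pool = 1_000
collection_cost = 80 * people_per_pool

print(f"bulk: ${bulk_per_million:.2f}/M read pairs, small: ${small_per_million:.2f}/M")
print(f"bulk discount: {bulk_discount:.1f}x; collection: ${collection_cost:,}")
```

At the quoted rate, collecting a 1k-person pool costs far more than a $1k sequencing run, so the 40,000x difference in relative abundance doesn't carry over to total cost.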

Why does the number of swabs affect the relative abundance? If you double the number of swabs, I would expect that both the total reads and the number of SARS-CoV-2 reads double, hence holding the relative abundance constant.

Some people have vastly higher viral loads than others, and the relative abundance you see for a pool depends on whether you get some of these people. Your intuition would be correct for pools large enough that this variation was no longer relevant.
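A toy simulation shows the effect. This is only a sketch: the lognormal distribution of per-person viral load and the `sigma` value are my assumptions for illustration, not fitted to any data, and `pool_spread` is a made-up helper.

```python
import random
import statistics

def pool_spread(pool_size, trials=300, sigma=2.0, seed=0):
    """Coefficient of variation of the mean viral load across simulated pools.

    Per-person viral load is drawn from a lognormal distribution; both the
    distribution and `sigma` are illustrative assumptions.
    """
    rng = random.Random(seed)
    pool_means = [
        statistics.fmean(rng.lognormvariate(0.0, sigma) for _ in range(pool_size))
        for _ in range(trials)
    ]
    return statistics.pstdev(pool_means) / statistics.fmean(pool_means)

# Compare pool-to-pool variation for small and large pools.
for size in (10, 100, 1000):
    print(size, round(pool_spread(size), 2))
```

Under these assumptions small pools vary a lot depending on whether they happen to include a high-load person, and the variation shrinks as pools get larger.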

I don't see it being contained in hard-to-target populations.

Sorry, I was unclear! The easiest way to collect a pooled sample is to walk around some building and sample everyone. This gets you a big sample pretty cheaply, but it's not a great one if you want to understand the containing city because it's likely that many people in the building will get sick on a similar timeframe. The sample members are too correlated in their exposure.

In your UK example, I'm guessing you could sample some office buildings of 1k people and find 0 cases, and sample others and find 200 cases.

To avoid this you need broader sample collection, but that's logistically more difficult and so more expensive.

Airport arrivals would be great, though that's a difficult setting to work in.

I'm sceptical this is bridging a 40,000x gap (maybe 40,000x isn't the relevant benchmark here - see comments previously).

I'm also skeptical! I think sampling from individuals is extremely promising. It seems like you ought to be able to get down to more like $2/person in which case a pool of 1k costs you $2k in collection. Then add in $1k for sequencing and you're still well above wastewater. But my initial attempts to partner with people already doing sampling haven't turned up good leads.

actions are never taken just for the sake of participating in the movement, but they are always tied to the end goal of having an impact

This seems a bit aspirational to me. Participating in DEAM, attending a secular solstice, sharing vegan recipes, or just hanging out with an EA friend group seem primarily about participating in the movement. Yes, you can chain these back to impact, but I'd be surprised if in the movements they're comparing EA to you couldn't similarly construct such chains.

I haven't voted either way on your comments, but in general making several comments as a way to run a poll is not a good idea unless you're pretty confident others will find your poll interesting. And then if someone does/doesn't think the poll belongs here they upvote/downvote each question in the poll, magnifying the karma impact.

Then I expect this comment was downvoted because (a) it's complaining about downvotes and (b) doesn't engage with what people should do if they don't think your poll comments belong on the Forum.

Thanks! I'm most interested in viral load in the sense of the relative abundance you get with untargeted shotgun sequencing (since you need sequencing (or something similarly general) to detect novel threats and/or avoid having a trivially-bypassable detection system) but there's not much literature on this.
