[UPDATED June 30, 10:00 pm EDT to reflect substantial improvements to the statistical approach and corresponding changes to the results]

I spent some time this weekend looking into the impact of COVID-19 on the 2020 U.S. presidential election, and I figured I might share some preliminary analysis here. I used data from The COVID Tracking Project and polling compiled by FiveThirtyEight to assemble a time series of Biden's support head-to-head against Trump in 41 states (all those with adequate polling), along with corresponding COVID-19 data. I then implemented a collection of panel models in R evaluating the relationship between Biden's performance against Trump in state polling and the severity of the pandemic in each state. My data, code, and regression output are on GitHub, and I've included some interpretive commentary below.

Interpretation of Results

When appropriately controlling for state-level fixed effects, time fixed effects, and heteroscedasticity, total COVID-19 cases and deaths are not significantly associated with support for Biden, nor is the number of ongoing COVID-19 hospitalizations (see Models A, B, and C in the 6.30 R script). However, when controlling only for state-level fixed effects, greater daily increases in cases and in deaths are significantly associated with higher support for Biden (see Models 2 and 3 in Table I above). Breusch-Pagan tests indicate that we must also control for heteroscedasticity in those models, and when we do so, the results remain significant (see Models 2 and 3 in Table II above), though only at a 90 percent confidence level.
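To make "controlling for state-level fixed effects" concrete, here is a minimal sketch of the idea in Python rather than the R used for the actual models. It assumes a single regressor and uses made-up numbers; the function names and the toy panel are illustrative and do not come from the GitHub repository. Subtracting each state's own averages removes anything constant within a state, so the remaining slope reflects only variation inside states.

```python
from collections import defaultdict


def pooled_slope(rows):
    """OLS slope of y on x, ignoring which state each row belongs to."""
    xs = [x for _, x, _ in rows]
    ys = [y for _, _, y in rows]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def within_slope(rows):
    """Slope after subtracting each state's own means from x and y."""
    by_state = defaultdict(list)
    for state, x, y in rows:
        by_state[state].append((x, y))
    sxy = sxx = 0.0
    for obs in by_state.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in obs)
        sxx += sum((x - x_bar) ** 2 for x, _ in obs)
    return sxy / sxx


# Synthetic rows of (state, covid_measure, biden_support); not the post's data.
toy = [
    ("A", 1.0, 60.5), ("A", 2.0, 61.0), ("A", 3.0, 61.5),
    ("B", 11.0, 40.5), ("B", 12.0, 41.0), ("B", 13.0, 41.5),
]

print("pooled slope:", pooled_slope(toy))  # negative: driven by the gap between states
print("within slope:", within_slope(toy))  # 0.5: the slope inside each state
```

In this toy panel the pooled slope is negative only because state B has both a higher COVID measure and lower baseline support. Once each state is compared with itself, the slope is 0.5 in both.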

These results do come with a caveat. While Lagrange multiplier tests indicate that there is no need to control for time fixed effects in Table I Models 2 and 3, F-tests suggest the opposite conclusion. I lack the statistical acumen to know what to make of this disagreement, but it's worth noting because when you control for both time fixed effects and heteroscedasticity, the results cease to be statistically significant, even at a 90 percent confidence level.
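In symbols, the two specifications at issue can be sketched as follows. The notation is an assumption introduced for illustration and is not taken from the 6.30 R script: y_it is Biden's polled support in state i at time t, x_it is the COVID-19 measure, alpha_i is the state fixed effect, and gamma_t is the time fixed effect.

```latex
\begin{aligned}
\text{state fixed effects:}\quad & y_{it} = \alpha_i + \beta x_{it} + \varepsilon_{it} \\
\text{state and time fixed effects:}\quad & y_{it} = \alpha_i + \gamma_t + \beta x_{it} + \varepsilon_{it}
\end{aligned}
```

Both the Lagrange multiplier test and the F-test ask, in different ways, whether the gamma_t terms are needed. The caveat above is that the two tests give different answers, and the estimate of beta loses significance once the gamma_t terms are included.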

Interestingly, the state-level fixed effects identified by Table I Models 2 and 3 are strikingly powerful predictors of support for Biden everywhere except for Arkansas, Florida, Georgia, North Carolina, Nevada, Ohio, and Texas, all of which (except for Arkansas) are currently considered general election toss-ups by RealClearPolitics. This makes sense: in a society as riven by political polarization as ours, you wouldn't necessarily expect the impacts of the present pandemic to substantially shift political sympathies on the aggregate level in most states. The few exceptions where this seems more plausible would, of course, be swing states. In the case of Arkansas, the weak fixed effect identified by the models is likely attributable to the inadequacy of our data on the state.

A Hausman test indicates that when regressing support for Biden in a state on the number of COVID-19 tests performed there, a random effects model is more appropriate than a fixed effects model (because the state-level "fixed effects" identified are uncorrelated with the number of tests performed in each state). Implementing this regression and controlling for heteroscedasticity yields the result featured under Model 1 in Table II above: an association that is statistically significant at a 99 percent confidence level.
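For reference, the Hausman statistic compares the coefficient estimates from the two models. The symbols below are standard textbook notation rather than anything from the post: the hatted betas are the fixed effects and random effects estimates, and k is the number of coefficients being compared.

```latex
H = \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)^{\top}
    \left[\widehat{\operatorname{Var}}\!\left(\hat{\beta}_{FE}\right) - \widehat{\operatorname{Var}}\!\left(\hat{\beta}_{RE}\right)\right]^{-1}
    \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)
    \;\sim\; \chi^2_k \quad \text{under } H_0
```

Under the null hypothesis, the state effects are uncorrelated with the regressor, so both estimators are consistent and should roughly agree. A small H (a large p-value) therefore gives no reason to reject the random effects model, which is the situation described above.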

This is striking, both for the significance of the relationship and for how counterintuitive the result of the Hausman test is. One would assume that the fixed effects identified by a model like this one would basically reflect a state's preexisting, "fundamental" partisan bent, and my prior had been that the more liberal a state was, the more testing it was doing. If that were true, one would expect the Hausman test to favor a fixed effects model over a random effects model. However, it turns out that my prior wasn't quite right. A simple OLS regression of states' post-2016 Cook Partisan Voting Indexes on the amount of testing they had done as of June 26 (controlling for COVID-19 deaths as of that date and population) reveals no statistically significant relationship between leftist politics and tests performed (see Model 1 in Table III), and this result persists even when the controls for population and deaths are eliminated (see Model 2 in Table III).

This is odd. Hausman tests on Models A, B, and C in the 6.30 R script favor fixed effects models over random effects models, indicating that state-level fixed effects (i.e. each state's underlying politics) are correlated with COVID-19 cases, hospitalizations, and deaths, but those same fixed effects are not correlated with COVID-19 tests. Moreover, when applying appropriate controls (e.g. for heteroscedasticity, time fixed effects, etc.), we find that while cases, hospitalizations, and deaths are not associated with support for Biden, testing is associated with support for Biden (basically the opposite of what I would have expected, under the circumstances). We can run a Breusch-Pagan Lagrange multiplier test on Table I's Model 1 to confirm that a random effects model is appropriate (as opposed to an OLS regression), and it is. At that point, we are left with the question of what those random effects are that are associated with support for Biden but not with COVID-19 testing, as well as its corollary: Why aren't the fixed effects in Models A, B, and C associated with testing (given that they are associated with cases, hospitalizations, and deaths)? Without the answers to these questions, it's hard to know what to make of the robust association between total COVID-19 testing and support for Biden revealed by Table I's Model 1.

The puzzling nature of the Table I, Model 1 results might incline some to dismiss the regression as somehow erroneous. I think jumping to that conclusion would be ill-advised. Among other things that speak in its favor, the random effects identified by the model, however mysterious, are remarkably consistent with common intuitions about partisanship at the state level, even more so, in fact, than the fixed effects identified by Models 2 and 3 in Table I. Unlike those fixed effects models, Model 1's state-level random effects explain a considerable amount of Biden's support in Georgia and Texas. I consider this a virtue of the model because Georgia and Texas have not been considered swing states in any other recent U.S. presidential elections. They are typically quite safe for the Republican candidate. Furthermore, Model 1 identifies particularly weak random effects in a few swing states not picked up by Models 2 and 3 — notably, Wisconsin, Pennsylvania, and Arizona. Wisconsin and Pennsylvania are genuine swing states: They went blue in 2008 and 2012 before going red in 2016. Arizona has been more consistently red over the last 20 years, but the signs of changing tides there are clear and abundant. Most notably, the state elected Kyrsten Sinema, an openly bisexual, female Democrat, to the Senate in 2018 to fill the seat vacated by Jeff Flake, who is none of those things.

It's worth noting that there is an extent to which the above is really an oversimplification of "swinginess." As FiveThirtyEight explains, state-level opinion elasticity is not the same as being a swing state. How close a state's elections are is determined by the proportion of Democratic relative to Republican voters in the state (with states closer to 50/50 obviously being "swingier" in this sense), whereas the extent to which events out in the world lead to shifts in polling in a given state is determined largely by how many people in the state are uncommitted to a particular partisan camp. A state full of voters without strong partisan commitments might well have close elections, but it also might not. By the same token, a state might end up with close elections by being 50 percent composed of die-hard, party-line Republicans and 50 percent composed of die-hard, party-line Democrats, and we would not expect new developments in current events to particularly shift voter sentiment in such a state. As a result, FiveThirtyEight proposes the metric of state-level opinion elasticity (measured using the extent to which shifts in national polling correspond to shifts in state-level polling in each state) as an alternative concept of "swinginess" that is potentially more appropriate for analyses such as this one. This is important because FiveThirtyEight has found that a number of states where there are frequently close elections actually exhibit extremely low elasticity. The paradigm case of this is Georgia, which has an elasticity of 0.84 (meaning a one-point shift in national polling corresponds to a 0.84-point shift in Georgia polling).

On this basis, a better way of comparing the fixed effects identified by Table I models 2 and 3 with the random effects identified by Model 1 would be to compare the average elasticities (calculated by FiveThirtyEight) of the "swing states" identified by each model. In the case of Models 2 and 3, the average is an uninspiring 0.994, or weakly un-swing-y. In the case of Model 1, however, the average is 1.025: pretty swing-y. That's what happens when you swap out Georgia (0.84) for Arizona (1.07) and Wisconsin (1.06).

I have one outstanding technical question about Table I, Model 1. When controlling for heteroscedasticity in R's plm package, I understand that the "arellano" covariance estimator is generally preferred for data exhibiting both heteroscedasticity and serial correlation but is generally not preferred for random effects models (as opposed to fixed effects models). The "white" covariance estimators, on the other hand, are preferred for random effects models, though "white1" should not be used to control for heteroscedasticity in data that also exhibit serial correlation. A Breusch-Godfrey test indicates that Model 1 requires an estimator compatible with serial correlation, but it is of course a random effects model, not a fixed effects model. Would it be better to control for heteroscedasticity here with "white2" or with "arellano"? Ultimately, it doesn't matter much because both approaches yield a result that is statistically significant at a 95 percent confidence level or better, but only the "white2" estimator yields significance at a 99 percent confidence level.
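The practical difference between a White-style and an Arellano-style correction can be seen in a small sketch. This is Python on synthetic numbers, not plm's implementation. It assumes a one-regressor within regression, applies no small-sample corrections, and leaves "white2" out. The function name and toy panel are hypothetical. The White-style variance treats every observation's error as independent, while the clustered variance adds up the errors within each state first, so it also tolerates serial correlation inside a state.

```python
from collections import defaultdict


def within_fit_with_robust_variances(rows):
    """Return (slope, white_var, cluster_var) for a one-regressor within regression.

    rows: iterable of (state, x, y) tuples.
    white_var:   heteroscedasticity-robust sandwich, observations treated as independent.
    cluster_var: scores summed within each state before squaring (Arellano-style),
                 which also allows serial correlation inside a state.
    """
    by_state = defaultdict(list)
    for state, x, y in rows:
        by_state[state].append((x, y))

    demeaned = []  # one list of (x deviation, y deviation) per state
    for obs in by_state.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        demeaned.append([(x - x_bar, y - y_bar) for x, y in obs])

    sxx = sum(xd * xd for obs in demeaned for xd, _ in obs)
    sxy = sum(xd * yd for obs in demeaned for xd, yd in obs)
    slope = sxy / sxx

    white_meat = 0.0
    cluster_meat = 0.0
    for obs in demeaned:
        scores = [xd * (yd - slope * xd) for xd, yd in obs]  # regressor times residual
        white_meat += sum(s * s for s in scores)
        cluster_meat += sum(scores) ** 2
    return slope, white_meat / sxx ** 2, cluster_meat / sxx ** 2


# Synthetic rows of (state, x, y); residuals trend in opposite directions in each state.
toy = [
    ("A", 1.0, 9.0), ("A", 2.0, 11.0), ("A", 3.0, 13.0),
    ("B", 1.0, 21.0), ("B", 2.0, 21.0), ("B", 3.0, 21.0),
]
slope, white_var, cluster_var = within_fit_with_robust_variances(toy)
print(slope, white_var, cluster_var)  # 1.0 0.25 0.5
```

Because the residuals within each toy state are correlated over time, the clustered variance comes out at twice the White-style variance. A gap of that kind is what can move a result between significance thresholds when the estimator is swapped.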

This is really cool, maybe email it to 538 or Vox? I’ve had success contacting them to share ideas for articles before.

Thanks so much! I'm thrilled to hear you liked it. To be honest, my main reservation about doing anything non-anonymous with it is that I'm acutely aware of the difficulty of doing statistical analysis well and, more importantly, of being able to tell when you haven't done statistical analysis well. I worry that my intro-y, undergrad coursework in stats didn't give me the tools necessary to be able to pick up on the ways in which this might be wrong. That's part of why I thought posting it here as a shortform would be a good first step. In that spirit, if anyone sees anything here that looks wrong to them, please do let me know!

After taking a closer look at the actual stats, I agree this analysis seems really difficult to do well, and I don't put much weight on this particular set of tests. But your hypothesis is plausible and interesting, your data is strong, your regressions seem like the right general idea, and this seems proof of concept that this analysis could demonstrate a real effect. I'm also surprised that I can't find any statistical analysis of COVID and Biden support anywhere, even though it seems very doable and very interesting. If I were you and wanted to pursue this further, I would figure out the strongest case that there might be an effect to be found here, then bring it to some people who have the stats skills and public platform to find the effect and write about it.

Statistically, I think you have two interesting hypotheses, and I'm not sure how you should test them or what you should control for. (Background: I've done undergrad intro stats-type stuff.)

Hypothesis A (Models 1 and 2) is that more COVID is correlated with more Biden support.

Hypothesis B (Model 3) is that more Biden support is correlated with more tests, which then has unclear causal effects of COVID.

I say "more COVID" to be deliberately ambiguous because I'm not sure which tracking metric to use. Should we expect Biden support to be correlated with tests, cases, hospitalizations, or deaths? And for each metric, should it be cumulative over time, or change over a given time period? What would it mean to find different effects for different metrics? Also, they're all correlated with each other - does that bias your regression, or otherwise affect your results? I don't know.

I also don't know what controls to use. Controlling for state-level FEs seems smart, while controlling for date is interesting and potentially captures a different dynamic, but I have no idea how you should control for the correlated bundle of tests/cases/hospitalizations/deaths.

Without resolving these issues, I think the strongest evidence in favor of either hypothesis would be a bunch of different regressions that categorically test many different implementations of the overall hypothesis, with most of them seemingly supporting the hypothesis. I'm not sure what the right implementation is, I'd want someone with a strong statistics background to resolve these issues before really believing it, and this method can fail, but if most implementations you can imagine point in the same direction, that's at least a decent reason to investigate further.
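One way to keep such a battery organized is to enumerate the specifications before running anything. The sketch below is in Python and only builds the grid. The labels are hypothetical names for the choices discussed in this thread (which metric, cumulative versus daily change, and which fixed effects), and fitting each model is left to whatever tool is used for the regressions.

```python
from itertools import product

# Hypothetical labels for the choices raised above; none come from the repository.
metrics = ["tests", "cases", "hospitalizations", "deaths"]
forms = ["cumulative", "daily_change"]
effects = ["state", "state_and_time"]

# Every combination of metric, form, and fixed-effects choice.
specs = [
    {"metric": m, "form": f, "fixed_effects": e}
    for m, f, e in product(metrics, forms, effects)
]

for spec in specs:
    print(spec)
```

Writing the grid down first makes it harder to report only the specifications that happened to reach significance.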

If you actually want to convince someone to look into this (with or without you), maybe do that battery of regressions, then write up a very generalized takeaway along the lines of "The hypothesis is plausible, the data is here, and the regressions don't rule out the hypothesis. Do you want to look into whether or not there's an effect here?"

Who'd be interested in this analysis? Strong candidates might include academics, think tanks, data journalism news outlets, and bloggers. The stats seem very difficult, maybe such that the best fit is academics, but I don't know. News outlets and bloggers that aren't specifically data savvy probably aren't capable of doing this analysis justice. Without working with someone with a very strong stats background, I'd be cautious about writing this for a public audience.

Not sure if you're even interested in any of that, but FWIW I think they'd like your ideas and progress so far. If you'd like to talk about this more, I'm happy to chat, you can pick a time here. Cool analysis, kudos on thinking of an interesting topic, seriously following through with the analysis, and recognizing its limitations.

Thank you so much for putting so much thought into this and writing up all of that advice! Your uncertainties and hesitations about the stats itself are essentially the same as my own. Last night, I passed this around to a few people who know marginally more about stats than I do, and they suggested some further robustness checks that they thought would be appropriate. I spent a bunch of time today implementing those suggestions, identifying problems with my previous work, and re-doing that work differently. In the process, I think I significantly improved my understanding of the right (or at least good) way to approach this analysis. I did, however, end up with a quite different (and less straightforward) set of conclusions than I had yesterday. I've updated the GitHub repository to reflect the current state of the project, and I will likely update the shortform post in a few minutes, too. Now that I think the analysis is in much better shape (and, frankly, that you've encouraged me), I am more seriously entertaining the idea of trying to get in touch with someone who might be able to explore it further. I think it would be fun to chat about this, so I'll probably book a time on your Calendly soon. Thanks again for all your help!

Glad to hear it! Very good idea to talk with a bunch of stats people, your updated tests are definitely beyond my understanding. Looking forward to talking (or not), and let me know if I can help with anything

Thanks! I booked a slot on your Calendly -- looking forward to speaking Thursday (assuming that still works)!