I’ve been seeing a lot of confusing excess deaths charts floating around on Facebook and in the news media. The consistent story is that 2020 is seeing excess deaths over previous years due to COVID-19, so I decided to see if I could replicate this using CDC data. Fortunately, the CDC seems to be actively (?) counting COVID and COVID-like deaths for 2020 at this URL, and the CDC’s “Wonder” system allows one to pull data from previous years. My strategy was to take deaths from the two most recent years available in Wonder (2017 and 2018) and average them to get a baseline that we can compare 2020 deaths to. Of course, we are only just over halfway through 2020, so I have to account for that as well. (This is a little tricky: the 2020 data covers only about 5-6 months of COVID-19 deaths but an additional month or two of other deaths. To simplify, I just assume that we’re halfway through the year’s deaths.)
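The baseline arithmetic described above can be sketched in a few lines of Python. This is a minimal sketch, not necessarily the exact method used for this post: it assumes you already have full-year 2017 and 2018 death counts plus a year-to-date 2020 count for one demographic group. The function name `excess_ratio` and the sample counts are hypothetical, chosen to mirror the 100-versus-110 example in the conclusion.

```python
from statistics import mean


# Hypothetical helper; the name and the 0.5 default are illustrative only.
def excess_ratio(deaths_2020_ytd, deaths_2017, deaths_2018, fraction_of_year=0.5):
    """Ratio of year-to-date 2020 deaths to the pro-rated 2017-18 baseline.

    The baseline is the mean of the two full-year totals, scaled down by the
    fraction of 2020 that the year-to-date count is assumed to cover.
    """
    baseline_full_year = mean([deaths_2017, deaths_2018])
    baseline_ytd = baseline_full_year * fraction_of_year
    return deaths_2020_ytd / baseline_ytd


# Toy numbers: a group averaging 200 deaths a year in 2017-18 would expect
# 100 by mid-year, so 110 deaths so far in 2020 is 110% of baseline.
ratio = excess_ratio(110, 190, 210)
print(f"{ratio:.0%}")  # prints 110%
```

Averaging the two years before scaling keeps a single unusual year from dominating the baseline.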
Results
First, joining these data sets turned up some interesting insights. Below I show the state demographic groups sorted by projected excess deaths in 2020, and we see some surprising things.
What does the table reveal? First off, the demographic groups with the highest excess deaths in 2020 compared to the 2017-18 average are the older groups in DC and New Jersey. This makes sense given the large number of COVID-19 deaths per capita in these places. The data also shows clear gaps in the CDC numbers: New York, which should be near the top of the excess deaths list, isn’t there. Right now the CDC’s 2020 data seems to capture only about a third of New York’s deaths. This is a big liability of using CDC data…
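Building a table like this one means joining the 2017-18 Wonder totals to the 2020 counts on state and age group, then sorting by the ratio. The sketch below is a hypothetical illustration using only the standard library; the names `baseline_totals`, `deaths_2020_ytd`, and `excess_table` are made up, and the counts are toy placeholders, not real CDC figures.

```python
from statistics import mean

FRACTION_OF_YEAR = 0.5  # the post's simplifying "halfway through the year" assumption

# Toy placeholder counts, NOT real CDC numbers. Keys are (state, age group).
baseline_totals = {  # (2017 full-year deaths, 2018 full-year deaths)
    ("State A", "85+"): (1000, 1200),
    ("State B", "65-74"): (800, 800),
    ("State C", "25-34"): (100, 120),
    ("State E", "45-54"): (500, 500),  # no matching 2020 row below
}
deaths_2020_ytd = {
    ("State A", "85+"): 660,
    ("State B", "65-74"): 380,
    ("State C", "25-34"): 60,
}


def excess_table(baseline_totals, deaths_2020_ytd, fraction=FRACTION_OF_YEAR):
    """Join both sources on (state, age group); sort by 2020/baseline ratio."""
    rows = []
    for key, (d2017, d2018) in baseline_totals.items():
        if key not in deaths_2020_ytd:
            continue  # group absent from the 2020 data: skip rather than guess
        expected = mean([d2017, d2018]) * fraction
        rows.append((key, deaths_2020_ytd[key] / expected))
    # Highest ratio first, like sorting the table by excess deaths.
    return sorted(rows, key=lambda row: row[1], reverse=True)


for (state, age), ratio in excess_table(baseline_totals, deaths_2020_ytd):
    print(f"{state:8} {age:6} {ratio:.0%}")
```

Note that this join only catches groups that are missing entirely. A partial undercount like the New York one would pass through silently and simply show up lower in the sorted list.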
Also note the rows highlighted in yellow. These are all demographic groups in states where COVID-19 deaths have been small compared to the 2017-18 baseline, yet they still show high excess deaths. There are many possible reasons for this, but I’m suspicious because many, if not most, of these state/age groups are also at high risk of suicide. I wanted to check this by looking at 2020 suicide statistics, but apparently no one has this data yet; the most recent suicide statistics I could find are in the 2018 CDC data.
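One way to pick out rows like the yellow ones programmatically is to flag groups whose 2020 deaths are well above baseline while their COVID-19 deaths are a tiny share of that baseline. This is only a sketch: the post doesn’t say what cutoffs drove the highlighting, so `MIN_RATIO`, `MAX_COVID_SHARE`, the function `flag_unexplained`, and the toy rows are all hypothetical.

```python
# Hypothetical thresholds; the post does not state the highlighting criteria.
MIN_RATIO = 1.15        # "high" excess: 2020 deaths at least 115% of baseline
MAX_COVID_SHARE = 0.02  # "little" COVID-19 impact: under 2% of baseline deaths


def flag_unexplained(rows):
    """Return groups with high excess deaths but little COVID-19 impact.

    Each row is (group, deaths_2020_ytd, covid_2020_ytd, expected_ytd), where
    expected_ytd is the pro-rated 2017-18 baseline for the same period.
    """
    flagged = []
    for group, deaths, covid, expected in rows:
        ratio = deaths / expected
        covid_share = covid / expected
        if ratio >= MIN_RATIO and covid_share < MAX_COVID_SHARE:
            flagged.append((group, ratio, covid_share))
    return flagged


rows = [  # toy placeholder numbers, not real data
    ("State D, 25-34", 120, 1, 100.0),
    ("State A, 85+", 660, 90, 550.0),
    ("State B, 65-74", 380, 5, 400.0),
]
for group, ratio, share in flag_unexplained(rows):
    print(f"{group}: {ratio:.0%} of baseline, COVID-19 = {share:.1%} of baseline")
```

A flagged row only says the excess isn’t explained by recorded COVID-19 deaths; it can’t say what the cause is, which is why the suicide hypothesis still needs 2020 suicide data.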
Histogram of Excess Deaths
Next, I want to see what the distribution of excess deaths looks like across all demographic groups in all states. This gives an overall sense of how likely a group is to have excess deaths in 2020. I do this with a histogram; see the diagram below.
This histogram looks a lot like a Gaussian distribution with a mean of around 110% and a standard deviation of roughly 15%. For a Gaussian, about 68% of values fall within one standard deviation of the mean, so roughly 70% of the demographic groups in the country are projected to have 2020 deaths between 95% and 125% of the 2017-18 baseline. This indicates to me that yes, 2020 is a worse year for deaths. Based on the data in the table above, we can reasonably attribute this to COVID-19 in many regions. The data shows that for the older demographics in some states, COVID-19 is projected to exceed the 30% of total deaths that heart disease consistently accounts for.
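The histogram and the one-standard-deviation reasoning above can be reproduced with Python’s `statistics` module. This is a minimal sketch on a small list of toy ratios (not the post’s data); `summarize` and `bin_width` are hypothetical names, and the bins are 5 percentage points wide purely for illustration.

```python
from collections import Counter
from statistics import NormalDist, mean, stdev


def summarize(ratios_pct, bin_width=5):
    """Summarize 2020-vs-baseline ratios given in percent (110 means 110%)."""
    mu = mean(ratios_pct)
    sigma = stdev(ratios_pct)  # sample standard deviation
    # Share of groups whose ratio lies within one standard deviation of the mean.
    within = sum(mu - sigma <= r <= mu + sigma for r in ratios_pct) / len(ratios_pct)
    # Count ratios per bin, keyed by the bin's lower edge (e.g. 110 -> [110, 115)).
    bins = Counter(int(r // bin_width) * bin_width for r in ratios_pct)
    return mu, sigma, within, bins


# Toy placeholder ratios, NOT the post's data.
ratios_pct = [95, 100, 105, 110, 110, 115, 120, 125]
mu, sigma, within, bins = summarize(ratios_pct)
for edge in sorted(bins):
    print(f"{edge:>4}% {'#' * bins[edge]}")
print(f"mean {mu:.0f}%, sd {sigma:.0f}%, share within 1 sd: {within:.0%}")

# For an ideal Gaussian, about 68% of values lie within one standard deviation
# of the mean, which is consistent with the "roughly 70%" figure.
gaussian_share = NormalDist().cdf(1) - NormalDist().cdf(-1)
print(f"Gaussian share within 1 sd: {gaussian_share:.1%}")
```

With real data, the empirical `within` share is a quick check on how well the Gaussian description actually fits.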
Notes:
- To repeat: I have accounted for the fact that the 2020 data covers only about half a year of deaths.
- I averaged 2017 and 2018 deaths so that my baseline wouldn’t rest on a single year with unusually high deaths (2017 had a lot of flu deaths). The 2019 data isn’t available on the CDC site yet.
- Yes, the CDC data is spotty. The older data is normally pretty solid, but the CDC’s newer data always lags behind. The CDC calls it “provisional” death data to signal that it is still incomplete and shouldn’t be treated as being as reliable as the older data.
- Remember that this is a projection: I’m assuming deaths will continue at a similar rate for the rest of the year.
- COVID-19 deaths could very well accelerate or decelerate, so the excess deaths at the end of the year may look different from what I project right now.
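The projection in these notes is a simple linear extrapolation, and it is sensitive to how much of the year the data is assumed to cover. The sketch below is hypothetical (the function names and the toy counts are made up); it compares the post’s half-year assumption against a 7-month coverage window, since the opening paragraph notes that non-COVID deaths may extend a month or two further.

```python
from statistics import mean


def project_full_year(deaths_ytd, fraction_of_year):
    """Linear projection: assumes deaths continue at the year-to-date pace."""
    return deaths_ytd / fraction_of_year


def projected_ratio(deaths_ytd, deaths_2017, deaths_2018, fraction_of_year):
    """Projected full-year 2020 deaths as a fraction of the 2017-18 average."""
    baseline = mean([deaths_2017, deaths_2018])
    return project_full_year(deaths_ytd, fraction_of_year) / baseline


# Toy placeholder counts. Compare the projection if the year-to-date data
# covers 6 months versus 7 months of 2020.
for months in (6, 7):
    ratio = projected_ratio(110, 190, 210, months / 12)
    print(f"{months} months of data: {ratio:.0%} of baseline")
```

With these toy numbers, the same year-to-date count reads as 110% of baseline under the 6-month assumption but falls below baseline under the 7-month one, so the coverage assumption matters about as much as the counts themselves.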
Conclusion
- The data does give us reason to believe that 2020 has been an unusually deadly year, which is unsurprising given how much attention the news media gives to COVID-19. The mean ratio of projected 2020 deaths to the 2017-18 baseline is about 110%. This means that if a demographic group had 100 deaths in the first 6-7 months of a baseline year, groups have seen, on average, 110 deaths over the same period of 2020. This may seem small, but an additional 10% is pretty significant and adds up.
- In some regions, COVID-19 will be one of the top overall causes of death in 2020 for some demographic groups. In about 15% of the rows in my table (only a small portion of which is shown above), COVID-19 will account for more than 15% of total deaths. To put that in perspective, heart disease normally accounts for 30% of total deaths in the country and cancer for 25%. The next-highest cause of death across the board is accidents, at 8%, and flu and pneumonia normally account for around 2.5% of total deaths. Recall, too, that the CDC numbers seem low, so this percentage is likely to increase.
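The 15% calculation above, and the comparison with the usual causes of death, can be sketched as follows. The cause-of-death shares are the ones quoted in the post; the rows, the function names `share_above` and `rank_among_causes`, and the tie-handling rule are hypothetical choices made for illustration.

```python
# Toy placeholder rows: (group, projected COVID-19 deaths, projected total deaths).
rows = [
    ("Group 1", 30, 100),
    ("Group 2", 10, 100),
    ("Group 3", 2, 100),
    ("Group 4", 16, 100),
]

THRESHOLD = 0.15  # the 15% cutoff used in the post

# Typical shares of total deaths, as quoted in the post.
TYPICAL_SHARES = {
    "heart disease": 0.30,
    "cancer": 0.25,
    "accidents": 0.08,
    "flu and pneumonia": 0.025,
}


def share_above(rows, threshold=THRESHOLD):
    """Fraction of rows where COVID-19 exceeds `threshold` of total deaths."""
    above = [group for group, covid, total in rows if covid / total > threshold]
    return len(above) / len(rows), above


def rank_among_causes(covid_share, typical_shares=TYPICAL_SHARES):
    """1-based rank COVID-19 would take among the listed causes (ties go to COVID-19)."""
    return 1 + sum(share > covid_share for share in typical_shares.values())


fraction, groups = share_above(rows)
print(f"{fraction:.0%} of rows exceed {THRESHOLD:.0%}: {groups}")
for group, covid, total in rows:
    print(f"{group}: COVID-19 would rank #{rank_among_causes(covid / total)}")
```

Any row above the 15% cutoff would put COVID-19 third or higher among these causes, behind at most heart disease and cancer.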
Thanks, Tod. I’m not sure why the CDC shows NYC as a standalone data set, separate from the rest of NY state; they don’t do that for any other metropolitan area. I also don’t like the CDC’s county-focused data; it just adds to the confusion.
Hey Ray, this is the NYC public health data. CDC data is pretty poor in general unless it’s more than a year old.