Comparison of Over 65 and Under 65 Deaths in Arizona due to COVID along with the cumulative case counts. 12/1/20
Above is an interesting way to look at the two outbreaks we’ve had in Arizona and the cumulative number of cases (useful because it shows us the case trends).
Note that the deaths seem to be higher during the summer outbreak than during the current one considering the rate of case growth. During this current outbreak the deaths are so far staying under 50 per day, but back even in the earlier phases of the summer outbreak they were inching up to 100 per day.
Also, the deaths are just the raw number of deaths and aren’t normalized by the respective populations. What this means is that the red lines represent the total number of deaths over 65 years old (about 13% of the AZ population) and the blue represent everyone else.
Deaths during the current outbreak have a ratio of 2.95 deaths over 65 to 1 death under 65. During the summer outbreak the death ratio of over 65 to under 65 was 2.31. This is a pretty big difference and indicates to me that the virus might be getting less deadly for society as a whole. If I knew exactly how old the people dying were it would help (if they average 85 that’s much more informative than just knowing they’re over 65). This may indicate that the “Years of Life Lost” due to COVID is decreasing.
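Since the raw ratios above aren't normalized by population, here is a minimal sketch of how they could be converted to a per-person comparison. It assumes the roughly 13% over-65 share of the AZ population mentioned earlier; the function name is just for illustration.

```python
def per_capita_death_ratio(raw_ratio, share_over_65=0.13):
    """Convert a raw over-65 : under-65 death ratio into a per-person ratio.

    raw_ratio     -- over-65 deaths divided by under-65 deaths
    share_over_65 -- fraction of the population aged 65+ (assumed ~13% for AZ)
    """
    share_under_65 = 1.0 - share_over_65
    # Dividing each group's deaths by its population share turns the
    # raw count ratio into a ratio of per-person death rates.
    return raw_ratio * share_under_65 / share_over_65


current = per_capita_death_ratio(2.95)  # current outbreak
summer = per_capita_death_ratio(2.31)   # summer outbreak
print(f"current: {current:.1f}x, summer: {summer:.1f}x")
```

Under that 13% assumption, the per-person death rate over 65 works out to roughly 20 times the under-65 rate in the current outbreak, versus roughly 15 times during the summer.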
In the chart above, the state had lockdown restrictions in place until May 15, then most counties put mask requirements in place on June 9th. Early October is when most of the second set of restrictions on bars, gyms, and movie theaters were lifted. It doesn’t seem like any of these dates are correlated with anything the virus did. Seems like it has its own mind…
I decided to evaluate the new cases from the current outbreak differently. Previously I was interested in case growth as a percentage of previous cases. That is a useful metric because it flags anomalous case growth in a specific location. Presumably that info could be used by a public health organization to target localized outbreaks.
However, perhaps much more interesting would be Zip Codes where high case growth per 1000 residents is happening. The chart below shows this metric. Both the cumulative number of cases (blue) and the last month’s case growth (orange) are normalized by the population of the Zip Code.
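To make the difference between the two metrics concrete, here is a small sketch. The zip codes and counts are made up for illustration, and the function name is my own.

```python
def growth_metrics(cases_start, cases_end, population):
    """Return (percent growth, new cases per 1000 residents) for one zip code."""
    new_cases = cases_end - cases_start
    pct_growth = 100.0 * new_cases / cases_start  # relative to prior cases
    per_1000 = 1000.0 * new_cases / population    # relative to residents
    return pct_growth, per_1000


# Hypothetical zip codes: (cases a month ago, cases now, population)
zip_codes = {
    "small_rural": (5, 10, 5_000),
    "large_urban": (2_000, 2_400, 50_000),
}
for name, (start, end, pop) in zip_codes.items():
    pct, per_1000 = growth_metrics(start, end, pop)
    print(f"{name}: {pct:.0f}% growth, {per_1000:.1f} new cases per 1000")
```

The made-up rural zip code doubles its case count (100% growth) but adds only 1 case per 1000 residents, while the urban one grows 20% and adds 8 per 1000. The percentage metric flags anomalies; the per-1000 metric shows where residents are actually getting infected at a higher rate.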
What does this chart tell us?
On the left of the chart, we see the zip codes (see table below for better visibility into this portion of the chart) that have had the largest number of cases per 1000 residents cumulatively (since the beginning of COVID). A couple of these regions have seen very high growth in the last month. But as your eyes move rightward, you can see some regions that had experienced high COVID cases in the past that had lower numbers of outbreaks in the last month. And of course, other areas (the peaky orange lines) have experienced very high numbers of cases in the last month. It would be good to understand why some regions have had worse outcomes over the last month than other regions. We’ll evaluate some of this below looking at the table.
The general trend does seem consistent, though. Regions that experienced higher numbers of cases during the summer outbreak are in general experiencing higher numbers of cases during the current outbreak. I was hoping to see a different trend (that might have indicated immunity in some regions) but will keep watching for that trend to emerge.
Details of the Normalized Case Growth
The below table is sorted by the Cumulative Cases per 1000 in a Zip Code. The Growth-Norm column represents growth in Cases per 1000 over the last month. Note that some regions that have experienced high case growth up to this point didn’t have nearly as much Case Growth as other regions that had experienced similarly high cases in the past. These are circled in green. You can also see regions with larger than expected case growth circled in red. Are there any factors that might be correlated with these lower and higher amounts of growth?
The first thing that I rule out is Education and Median Age. These on the surface don’t seem to be related. Some regions with lower median age are right next to regions with 15 or so years higher in median age. The same applies for education. What I do see, however, is a trend with population density, where regions with higher population density seem to be seeing lower COVID case growth per 1000 people. This might make some sense if you think about how regions with high density generally have high populations, and therefore a larger denominator in the growth per 1000 person equation… However, this also means that the numerator (the change in case count over the last month) is disproportionately low, which I think is interesting. Why would there be fewer cases than expected in regions with higher density? Thoughts:
I wonder if this might be an indicator of the effectiveness of government interventions (mandatory masks, school restrictions, etc.)? Since all the data I’ve seen indicate that school restrictions aren’t resulting in large numbers of case reductions (regardless of whether they’re in school or not, every study seems to be showing that people under 15 don’t really transmit the virus), and most regions don’t have restaurant/gym/bar closures now, I’m assuming if it is anything it is the mandating (and compliance!) with the mask restrictions. Unless someone can chime in with a different idea…. Compliance is an interesting thought, because it seems like in a more populous area, there appears to be more social pressure to comply with COVID restrictions. Whereas, my observation is that in less dense areas, the social pressure is much less.
Also interesting to me is the resurgence of cases on the border. These regions were very quiet ever since the summer wave slowed down and generally went from having the highest case rates in the country down to the very lowest. But now we see Yuma and Santa Cruz counties experiencing case growth again. The South Mountain region of Phoenix (85042) is also experiencing another surge in cases. But looking at the highest number of new cases per 1000 over the last month, we see some interesting places. Page (up near Lake Powell), Cottonwood (near Prescott), and Douglas (on the border, but only lightly affected during the summer border rash of cases) all are near the top of the list with 85350 in Yuma.
Zip Codes Sorted by Density (orange) with last Month’s Case Growth in Blue.
The chart below looks complicated, but don’t let the looks deceive you. Here are Pima and Maricopa Counties’ cumulative Cases per 1000 people (and the trend lines) compared to the number of daily tests in the state (its trend line is the orange, dotted U-shaped line).
You can see the weird bump around 9/20 or so on the Pima County line. That is the first few days U of A tested out their homegrown antigen test on lots of students. The bump represents the excessive false positives in the test (they fixed it, I think). Remember, just because you test positive for something doesn’t mean you have it!
The U trend on the tests is really interesting to me. Even as the summer wave was accelerating (far left) we see the trend in tests decreasing. Then when the cases are largely flat we see the trend reverse and start increasing. This could be some sort of psychological effect or maybe the number of tests is some sort of a leading indicator of case rates? This seems like an informative chart, so I’ll post it every week or two.
Maricopa’s normalized case rate is around 7 cases per 1000 persons higher than Pima County. This has been sustained since mid-May. Not sure what it reflects, but it could be the greater adherence to government mandates (mask, distancing). Or it could have some demographic cause? It does seem that activity/going to work results in infections, because the normalized infection rates (per 1000 persons) are identical across the whole “working-aged” 20 to 64 age range. The 65+ population has just over 1/2 the rate per 1000 of the working-age group and the under 20 population has just over 1/3 of the rate per 1000 of the working age group (see second chart below).
Here’s the latest update on the current COVID-19 outbreak with backing data. I’ll show a number of different views of the data, including some extra focus on the Arizona data since I have my best data from my own state. Always happy to take requests from folks from other states.
Arizona COVID-19 case growth by Zip Code – 11/7 through 11/15
Note below that we’re still not seeing a whole lot of repeats in the top 30. I think that some geographic areas surge and then slow down. This week a new zip code from Pinal County is at the top with 20 percent growth, but since its numbers were already very small, it’s probably not as relevant as 85756 in Pima County, which is the first zip code that has shown up recently in the top 30 this fall that had significant cases in the summer. It is a relatively large zip code in population, as evidenced by the large orange bubble below. If you look closely, it does appear that the majority of the top zip codes have a relatively young median age. The main exception is 85614, which is a Green Valley zip code and has a median age of 68.
Top 30 zip codes by COVID-19 Case growth from 11/7 to 11/15
Table of top zip codes by COVID case growth – 11/7 to 11/15
Case Growth Across Arizona over the Last 2 months
Note that 85719 (home of the U of Arizona) stands out over the last 2 months. Fortunately, due to the low median age in this zip code, there appear to have been no deaths and few hospitalizations in this zip code for all these cases. It’s also fairly certain that many of the cases in this region were false positives due to this zip code being the first to use the U of A’s antigen tests, which have been demonstrated to have high false positives. 86001 in Flagstaff also has a major university, and thus, higher case growth over the last 2 months. Initially, during September, 85009 from SW Phoenix had high case growth, but it slowed down and this zip code is no longer in the top zip codes for case growth.
Table of top zip codes by case growth from 9/12 to 11/15
Arizona Cumulative Case and Death Curves
I like looking at the data this way because it becomes clear if growth rates are linear or if they are increasing non-linearly (the upward curve). Right now, we’re seeing nonlinear growth in Arizona cases, but probably more like linear growth of deaths. Deaths seem to be increasing at a rate of about 23 per day, which seems to be just slightly above the average since 9/11. Cases however are increasing non-linearly and the instantaneous slope today is around 2100 new cases per day. It’s hard to tell at this stage because things can change quickly, but it’s possible that the slope of this phase of the outbreak is lower than the slope from the start of the summer outbreak (around June 14th).
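Here is a rough sketch of one way to tell linear from nonlinear growth on a cumulative curve: compare the slope over an earlier week with the slope over the latest week. This uses a plain trailing least-squares slope as a stand-in for whatever fit is behind the chart, and the series are synthetic, not the Arizona data.

```python
def trailing_slope(cumulative, window=7):
    """Least-squares slope (units per day) over the last `window` days."""
    ys = cumulative[-window:]
    xs = range(len(ys))
    mean_x = sum(xs) / len(ys)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic series for illustration only.
linear = [23 * d for d in range(60)]                 # steady 23 per day
curved = [1000 * d + 20 * d * d for d in range(60)]  # slope keeps rising

# Linear growth: the slope is the same early and late.
print(trailing_slope(linear[:30]), trailing_slope(linear))
# Nonlinear growth: the late slope is clearly larger than the early one.
print(trailing_slope(curved[:30]), trailing_slope(curved))
```

If the two slopes match, the curve is growing linearly (like the deaths line); if the later slope is noticeably larger, growth is nonlinear (like the cases line).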
Arizona Hospitalization Status
I have been hearing word that the hospitals are heavily burdened by COVID cases again. This is likely very true and may be different in different localities. However, at a state level, there is still no need to be afraid. Here is the ICU Bed status for the State from the state Dept of Health Services dashboard. The increase of COVID patients has been nonlinear since October, but the numbers are still around 20% of the state’s ICU bed capacity. I expect that the grey bars will get squeezed by COVID before this outbreak is over (much like it did in early July).
US State Case Growth Rates
These tables allow you to see case growth rates as well as cumulative case and death numbers per 1000 people. Note that North and South Dakota are still right in the middle of the fight and their case acceleration rate is still quite high (Montana is catching up).
I think until this current outbreak slows that I’ll continue to do weekly data dumps for people who need to see the latest data in an unvarnished, non-manipulated form. Again, I’ll have better data for Arizona since I live in that state and have collected data from the state Dept. of Health Services for much of 2020. Of course, they don’t make it easy to collect the data in any form except the current day, so I have to go back every day and capture the latest. However, by doing so, I feel like I have insights that many don’t have. One of my reasons for showing the Arizona data in such detail is that I feel that the behavior of the virus is similar in all regions and perhaps the Arizona results can provide insight into COVID activity in other states.
Zip Code Data
I feel that the Zip Code case data (wish I had deaths/hospitalizations by zip code too, but that is not provided) is valuable for understanding how the outbreaks are trending. For instance, we continue to see the largest case growth for this current Arizona winter outbreak in areas that weren’t hit very hard by COVID during the spring or the summer. This raises a couple of questions… 1) Why is it just hitting these regions now? Some of them are places that people from Tucson or Phoenix travel for vacation. I would have expected the case growth to have occurred along with the big outbreak in Arizona over the summer. 2) Is this an indicator that we’re seeing the effect of immunity in the areas that were hit hard over the summer (Yuma, SW Phoenix, S. Tucson, Nogales)? See the latest charts below on the case growth in the last week across Arizona. Note in the table that there aren’t any obvious patterns in this wave (see my zip code correlation study from July here, which demonstrated a number of patterns in the summer outbreaks).
Top thirty zip codes by Case Growth from 10/31 to 11/7. Red bubbles are the areas of highest growth. Diameter of the bubbles represents population of the zip codes.
Top 20 Zip Codes by Case Growth – Table of Info
Deaths Per Day
I think a lot of people are fairly aware that deaths have decreased in count since the big COVID waves in the Northeast this spring. I was curious how the “Daily Death” count in Arizona compared between the over 65 and the under 65 age demographics through the big summer outbreak and now in the winter outbreak. The plots below that perform this comparison are stacked bar plots. You can see three things in each bar, the under 65 deaths that day (the height of the blue bar on the Y-axis), the over 65 deaths that day (the difference between the height of the red bar and the blue bar), and the total deaths (the height of the stacked blue-red bar for the day). Hopefully that’s clear enough. But it’s a pretty useful chart, especially for visualizing differences between 2 or 3 groups.
The first plot shows raw numbers of deaths (blue is under 65, red is over 65). Therefore you can see that on the highest day for deaths during July we saw somewhere over 50 deaths in the under 65 demographic, around 120 deaths in the over 65 group, and about 170 deaths total. This is a good way to view the data and it reveals that on most days, there are many more deaths in people over 65. However, this isn’t that informative of a visualization, because the blue bars represent 87% of the state’s population. Therefore the second graph shows the death data normalized by the population of the group. I’m representing it as deaths per 100,000 people in the age grouping so the numbers aren’t too small to be meaningful. Therefore, now you can see that on the same day that we saw the 170 deaths, on the chart with normalized data, this represented about 13 deaths per 100,000 persons over age 65 and just under 1 death per 100,000 persons under age 65. This is a good way to visualize the true impact across age groups. If I separated the age groups under 65 it would be evident that the deaths are far more rare under age 45.
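The normalization itself is simple enough to sketch. The peak-day counts are the approximate values read off the raw chart above; the state population figure is my own rough assumption and isn't taken from the charts.

```python
def deaths_per_100k(deaths, group_population):
    """Deaths in a group scaled to a rate per 100,000 people in that group."""
    return 100_000 * deaths / group_population


AZ_POPULATION = 7_300_000        # assumption: rough 2020 state population
over_65 = 0.13 * AZ_POPULATION   # about 13% of the state
under_65 = 0.87 * AZ_POPULATION  # the remaining 87%

# Approximate counts on the highest day for deaths in July.
rate_over_65 = deaths_per_100k(120, over_65)
rate_under_65 = deaths_per_100k(50, under_65)
print(round(rate_over_65, 1), round(rate_under_65, 2))
```

With that population assumption the rates come out near 12.6 and 0.79 per 100,000, which lines up with the "about 13" and "just under 1" read off the normalized chart.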
Cases
Now that I’ve demonstrated normalizing by population, here is how the cumulative case curve looks when normalized by the population of each age grouping. You can read this as saying that the 55-64, the 20-44, and the 45-54 groups have all cumulatively reached 45 cases per 1000 persons in their group. Note that this is just the cumulative count, not the number of cases currently active! What this shows us is that case growth for the three groups above has tracked almost identically since June. The interesting points to note, however, are that the over 65 case count when normalized by the over 65 population is much lower (even though their deaths are much higher) and the under 20 normalized case count is even lower. This tells us a few things. Cases are rarer in the 65+ population and even rarer in the under 20 population. The fact that 65+ deaths have been so much higher on a smaller number of cases shows that getting COVID is a much more deadly proposition for this age group.
Case rates normalized by age demographic population – Arizona – 11/7/2020
Hospitalization
A while back, the Arizona DHS improved their hospitalization status chart by adding the COVID cases in. Here’s an example of the ICU bed usage across the state. The other types of hospital bed usage charts look basically the same, but you can find them by following the link above. We see the peak from the summer hitting and the non-COVID ICU patients were squeezed out. Utilization never really went over 90% because of the hospitals’ ability to manage their beds. Then as the hospitalizations from COVID crashed in late July, new patients flooded into the ICU beds to keep the overall utilization around 80%. Now it’s creeping up again due to an uptick of COVID patients. I’m curious (and hopeful!) if the increase in COVID hospitalizations will be more gradual during this outbreak. It seems likely to me, but we’ll have to watch.
AZ ICU hospital bed usage by type (COVID vs. Other) – 11/7/2020
COVID-19 US State Table
The below is sorted by the “acceleration” of cases per day. Therefore, North Dakota is seeing an increase of 0.0388 cases per 1000 persons every single day. Therefore their case velocity (IROC_confirmed) of 0.9640 cases per 1000 persons per day will likely increase to around 1.0028 cases per 1000 per day tomorrow and 1.0416 the following day. This acceleration metric (dIROC_confirmed) is a useful indicator to determine when an outbreak is slowing in a state. When Arizona was number one on this list last summer, this metric is exactly where we first noticed the change. As you can see, the midwestern states are currently seeing the largest case growth, but right behind them are the Northeastern states. I’m hoping and praying that the daily Delta_Deaths metric in all of these regions remains lower than it tended to be during the spring.
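The velocity projection is just a straight-line extrapolation, which can be sketched as below. The function name is mine, and it assumes the acceleration stays constant over the days being projected, which won't hold for long.

```python
def project_velocity(velocity, acceleration, days_ahead):
    """Linear extrapolation: velocity grows by `acceleration` each day."""
    return velocity + acceleration * days_ahead


# North Dakota's values from the table: cases per 1000 persons per day
# (IROC_confirmed) and the daily change in that rate (dIROC_confirmed).
iroc_confirmed = 0.9640
diroc_confirmed = 0.0388

for day in (1, 2):
    projected = project_velocity(iroc_confirmed, diroc_confirmed, day)
    print(f"day +{day}: {projected:.4f} cases per 1000 per day")
```

When the acceleration value drops toward zero (or goes negative), the same arithmetic shows the velocity leveling off, which is why this column is a handy early signal that an outbreak is slowing.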
Since the trend is once again toward increasing case rates, I’ll just put out there a bunch of graphics and tables so you can see what’s happening.
Arizona Zip Code Case Counts
Since I live in Arizona and have followed it more closely than any other state, it might be interesting to see the trends in Arizona. My suspicion is that other states are seeing similar trends. Our first trend turns out to have been relatively low case rates and high deaths due to infections in communities at higher risk, such as our nursing homes and reservations. During this time period the whole state went into lockdown. Case rates were relatively low and constant throughout this lockdown period and a few weeks after. The second AZ trend was the summer outbreak, which I’m quite confident occurred in conjunction with a large outbreak at the same time in Mexico. During this time, we saw cases increase polynomially and mask rules were implemented across the board but did not slow down infections. If you look back in time on this site, you’ll see that the most heavily hit communities during this timeframe were the ones that had significant ties to Mexico (SW Phoenix, S. Tucson, Border Region). Now we’re in the third outbreak and what we’re seeing is the virus sweeping through the communities not touched during the first two outbreaks. Though case rates are increasing, death rates remain relatively low.
Top 30 zip codes by COVID case growth from 10/21 to 10/31. Color represents % growth and size of bubble represents the population of the zip code
Table of data relating to the Zip Code Chart
In the above two diagrams, note that none of the heavily-infected zip codes from the summer are present. These are generally areas that haven’t been hit hard yet. This makes me suspect that in this wave there is some element of immunity to COVID being expressed by the harder hit zipcodes from the summer. Perhaps this is a normal balancing we should expect.
Data on Each County from Arizona
The above table confirms the above. The top 5 counties all have had lower case rates to date (see their cases per 1000 numbers). You can also see how the death rates range from the high in Apache and Navajo Counties, both of which were hit hard back in the early days of the outbreak when deaths were higher, all the way down to tiny Greenlee County. Maricopa and Pima Counties, which have the large populations, are somewhere in the middle.
Other US States
State COVID Data sorted by Case Rate Acceleration (dIROC_confirmed) – 10/31/20
State COVID data sorted by Death Rate Acceleration (dIROC_deaths) – 10/31/20
The above two tables show the states that currently have the highest accelerations of their case and death rates. Acceleration means that the slope of cases (# cases per day) is getting larger or smaller over time. As you can see, North Dakota is in the unfortunate position of having the largest case acceleration (an increase of 0.038 cases per 1000 persons per day, every day) and the largest death acceleration (0.0035). Interestingly, though South Dakota has tracked right with North Dakota on cases, the death rates are much lower in South Dakota. This makes me suspect that in ND the virus got into a community somewhere that was highly susceptible but in SD it didn’t. Note that ND’s death acceleration is almost 3x the next highest (Iowa). Fortunately, in this latest outbreak, deaths continue to be rare.
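For anyone wondering how a velocity and an acceleration come out of a cumulative count, here is a bare-bones sketch using day-over-day differences. The numbers in the tables presumably come from a smoothed fit rather than raw differences, so treat this as an illustration of the idea only; the cumulative values are made up.

```python
def first_differences(series):
    """Day-over-day change of a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical cumulative cases per 1000 persons on four consecutive days.
cumulative = [10.0, 10.5, 11.25, 12.25]

velocity = first_differences(cumulative)    # cases per 1000 per day
acceleration = first_differences(velocity)  # change in that rate per day
print(velocity)      # [0.5, 0.75, 1.0]
print(acceleration)  # [0.25, 0.25]
```

A positive acceleration means the daily case rate is still climbing; a state drops down the sorted table as that second difference shrinks.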
Additionally, we can see the Northeastern states creeping back up to the top of the case and death rate lists. In other regions, it seems like when COVID comes back a 2nd or 3rd time the death rates are much smaller, so what’s happening in the Northeast is puzzling. I’d have to look closer at those states (i.e., by zip code) to figure out exactly what is happening.
World Data
I’ve shown the below diagram a few times throughout the COVID outbreak and interestingly, the trend continues that the virus is unusually inactive or unmeasured between about 10 degrees South and 30 degrees North. The below shows the cases and deaths per 1000 since the start of the outbreak by latitude. Other than the growth in 20 to 10 degrees South (Brazil) and 20-30 degrees North (India) not much has changed. Note that despite India’s large COVID numbers, the overall number of cases and deaths per 1000 people is still much lower than other regions.
Normalized number of cases and deaths per 1000 people by latitude.
And below shows current states for countries + US States sorted by normalized case growth rate (IROC_c_n). We see tiny Andorra at the top of the list, but a number of European countries are moving back up the list. Note that in Belgium (the country in Europe that had the highest death rate), the current death rate (IROC_d_n) is a good bit lower than Czechia, a country that was largely missed by the first round of COVID.
I was looking at COVID-19 data that was sorted by case count and noticed that the Dakotas and Wisconsin were at the top of the list and then looked a column over and realized that all those regions still had low deaths per 1000 people. It made me curious about how common it is to have a high-deaths region.
So I built a histogram of all the counties in the U.S. and binned them by their deaths per 1000 persons. Just as a reminder, the height of the bar represents the number of counties in the bin. For instance, the tall bar on the far left represents about 500 counties all of which have less than about 0.1 deaths per 1000 persons. The really short bars on the far right represent the one or two counties with over 3.0 deaths per 1000 persons (0.3%).
I put labels on the histogram to identify which bars well-known counties fall in (yes, it’s biased towards Arizona).
Yes, the NYC boroughs (Queens, Bronx, Manhattan) all are still at the top of the list, but their death rates have slowed significantly from the peak rates back in March/April.
Also, the red line represents the exponential function that fits the decay of the histogram. Therefore, the likelihood of a county having a large death rate follows an exponential decay. The formula would be COUNTIES = 432*e^(-2.5*x) + 4.3, where x is the deaths per 1000 persons for the bin and COUNTIES is the height of the bar.
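A stripped-down sketch of the binning and the fitted curve is below. It assumes the exponent in the fit is multiplied by x, the bin's deaths per 1000 persons; the county rates in the example are invented, and the helper names are mine.

```python
import math


def bin_counts(values, bin_width, n_bins):
    """Count how many values fall in each bin of width `bin_width`, starting at 0."""
    counts = [0] * n_bins
    for v in values:
        index = min(int(v / bin_width), n_bins - 1)  # clamp outliers into the last bin
        counts[index] += 1
    return counts


def fitted_count(x, a=432.0, k=2.5, c=4.3):
    """Exponential-decay curve; assumes x is deaths per 1000 persons."""
    return a * math.exp(-k * x) + c


# Hypothetical county death rates per 1000 persons (not real data).
rates = [0.02, 0.05, 0.08, 0.15, 0.31, 0.33, 0.95, 3.25]
counts = bin_counts(rates, bin_width=0.1, n_bins=32)
print(counts)
print(fitted_count(0.05), fitted_count(3.0))
```

With the real county data, `counts` would give the bar heights and `fitted_count` the red line. Under the assumption above, the curve predicts a few hundred counties in the lowest bin and only a handful out past 3 deaths per 1000.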
Back in August I did my first detailed Excess Deaths assessment (See Link) based on data from CDC’s “Wonder” database on deaths from 2017 and 2018 and comparing it to data from the CDC’s provisional COVID-19 death counts (Link). Using this data I was able to measure data by state and by 10 year age demographic. What I found was interesting. To summarize, there were significant excess deaths in the groups that wouldn’t be surprising to you (65+ years old, Northeastern states and DC). But the really interesting (and concerning!) thing I discovered was that there were significant excess deaths in younger demographics who had been lightly impacted by COVID-19.
Quick Explanation of Methodology
The CDC Wonder Database allows one to search for total deaths by all types. The data is very detailed but it isn’t recent. In general the newest data in Wonder is 2 years old. Knowing that 2017 was a “high death” year due to large numbers of flu deaths and that 2018 was a bit below average, I decided to take these two years and average the deaths as my baseline to compare to 2020 data. The data from Wonder can be aggregated across regions (I chose States) as well as by demographics (I chose age in 10 year groupings).
The 2020 provisional death data put out by the CDC can also be grouped in similar ways (states and 10 year groupings). Plus, in addition to providing COVID-19, Influenza, and Pneumonia deaths, it also provides total death numbers for these groupings. This allows for an easy comparison. It is unclear how CDC arrives at these numbers, but they don’t seem to be extremely laggy and they line up more or less with the numbers from Johns Hopkins. Here’s a picture of the website where you can pull the data. As you can see, the claim from the CDC is that the data is as of 10/14.
Since the year is still not over, I’m doing a very simple scaling assuming that the death rate will continue at the current rate for the rest of the year. This isn’t a solid assumption, but I don’t think it matters much. Since we’re in October, 10 months along, I used a scaling factor of 1.2. Back in August (when the data was lagging a bit) I used a scaling factor of just under 2, accounting for 7 months of data.
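Putting the baseline and the scaling together, the calculation can be sketched as follows. The counts are made up for a single state/age-group pair, and the function names are just for illustration.

```python
def baseline_deaths(deaths_2017, deaths_2018):
    """Average the two reference years to form the comparison baseline."""
    return (deaths_2017 + deaths_2018) / 2


def project_full_year(deaths_to_date, months_elapsed):
    """Scale a partial-year count to 12 months at the same monthly pace."""
    return deaths_to_date * 12 / months_elapsed


def percent_of_baseline(projected, baseline):
    """Express projected deaths as a percent of baseline (100 = no change)."""
    return 100 * projected / baseline


# Hypothetical counts for one state/age-group pair.
baseline = baseline_deaths(1_050, 950)           # 1000.0
projected = project_full_year(900, 10)           # 10 months of data, x1.2
print(percent_of_baseline(projected, baseline))  # 108.0
```

Anything above 100 counts as excess relative to the 2017-18 average. The simple scaling assumes the rest of the year looks like the year so far, which is the weak spot noted above.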
Excess death percentages in the over 65 age population have decreased quite a bit. There were numerous states where this population had over 150% excess deaths but in the current results, I only see two older age demographics in the top ten. Note that since we’re comparing 2020 COVID/Flu/Pneumonia deaths with overall 2017-18 averages for each age group, this accounts for cases where total death numbers for older demographics are much larger than death numbers for younger demographics.
Excess deaths for younger demographics, particularly 25-34 and 35-44, have remained the same. This implies to me that the rate of overall excess deaths for these groups has stayed consistent while the rate of excess deaths for the older generation has fallen significantly. This is not surprising to anyone who has watched the data because it’s clear that even while COVID cases rise and fall, COVID deaths have been falling everywhere (for lots of good reasons). BUT, whatever is killing the younger demographic at higher rates than normal years has yet to slow down.
Overall COVID/Flu/Pneumonia Deaths as a percent of 2017-18 averages has fallen since August. This also aligns with the sharp decrease in COVID deaths since July/August.
Washington DC seems to have excess deaths across all age demographics. Note that the 5-14 year group’s 250% excess death number is only like 5 excess deaths… I’m not sure I could make a good guess as to why DC’s numbers are so high. Maybe someone can weigh in on this?
You can see the data yourself in the table below (sorted by 2020 excess death percentage). Yellow indicates a state/demographic pair that has low COVID/flu/pneumonia impact (around 15% or less) but still has high excess deaths
Merged Table of CDC 2017-18 average numbers compared to CDC Provisional 2020 death numbers. 10/20/2020
I also showed an overall histogram of excess deaths in my last post. This histogram is a type of chart that measures “counts” of samples that fit into a specific bin. For instance, in this case, each sample is a state/demographic pair and the histogram is plotted over 80 bins that range from around 10% of 2017-18 deaths up to around 150% of 2017-18 deaths. So each bin represents roughly 2%. We can see in this histogram that the peak is where about 60 state/demo pairs fell into a bin that looks like around 90%. If you see this as the mean and the histogram as a rough bell curve (normal distribution) then you can see that using this method and based upon the CDC’s 2020 death projection numbers, the overall excess death distribution for 2020 has shifted to the left since August (when the peak value was in the bin that represented 110%; go back and look… don’t take my word for it!). This also makes sense knowing that the high death rates from April through June have slowed.
Combined age groups’ histogram of 2020 excess deaths – October 20, 2020
Since I was curious, I wrote code to plot the histograms for each age demographic to see how they related to each other. It’s a bit messy, but you can see in the legend which colors correspond to which demographic. Key takeaways from this visualization are that 1) 35-44 has been hardest hit, followed by 25-34, at least on an excess death percentage basis, 2) 65-74 seems to be slightly below the 100% which would represent the 2017-18 average, and 3) 5-14 and 15-24 have less excess death than 2017-18.
Overlapping Histograms for Each Age Demographic.
Highly Reported-on CDC Excess Death Pre-print (from 10/20) – take it with a grain of sand.
On October 20th a CDC scientist released a pre-print that the CDC published here. The assessment of the authors, based upon their simulation, is that there were 299K excess deaths in the US during 2020. Of course, this was immediately picked up by our fearless media. In many cases, they reported on the pre-print incorrectly because the statistics in the pre-print go a bit beyond that of a newspaper data scientist. Actually, the statistics in the pre-print are a bit muddy and don’t seem to line up in places, so I can’t blame the news journalist folks much. I might write a longer report on this paper if I get time, but I’m not confident in their simulation’s assumptions on a typical year-to-year death growth rate and they don’t account for deaths that didn’t occur because a sick person died of COVID first. And their overall numbers don’t match the ones that CDC publishes in the provisional 2020 death numbers either, so this is problematic. I took a stab at replicating their model based on a much simpler and more reasonable regression model than what they selected, and their 299,000 number (compared to the 2015-2019 average) appears to represent expected growth in deaths, not excess deaths (see chart after conclusions). We’ll have to wait for the actual paper to come out with all the details I guess. Of course, the Washington Post didn’t wait.
Conclusions
It is tough to make any solid projections based on ANY COVID-19 data. It is always possible that the CDC’s data is inaccurate (it usually is… these kinds of things are infamously hard to measure). And clearly 2020 is a unique year for deaths. It isn’t clear from the CDC’s data that COVID-19 has created significant excess deaths, however.
The really serious question is about the real excess deaths that haven’t slowed down in the younger demographics. This problem is not coming from deaths due to COVID but is likely related to the anxiety and stress created by COVID, by government actions that are aimed at reducing or eliminating COVID cases, by isolation, etc. Unfortunately there is a lot of evidence coming out that these governmental actions haven’t been exceptionally effective (a quick look at COVID case rates across various new government actions shows that they haven’t had very measurable impacts). The other takeaway is that deaths for ages younger than 15 have been well below the 2017-18 averages. The combination of being isolated from society (driving in cars less, less exposure to disease, etc.) and the lack of an effect on this group from COVID is likely the cause.
Backup: Tod’s “Simpler” 2020 excess death model
Deaths per 100K persons since 2012. Note this is normalized by population, but despite this, deaths have been increasing consistently for the last 10 years or so. The Red Dot is the regression-based projection of 2020 deaths. Note that the delta of about 60 deaths per 100K between 2020 and the 2015-2019 average will amount to around 295K “excess deaths”.
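To illustrate why a trend-based projection and a comparison against a multi-year average can disagree, here is a minimal sketch of a straight-line regression projection. It is not the exact model behind the chart. The function names are my own, and the series is a made-up placeholder (a rate rising by exactly 10 per year), NOT CDC data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project(xs, ys, target_x):
    """Extend the fitted line to target_x."""
    slope, intercept = fit_line(xs, ys)
    return slope * target_x + intercept


def gap_vs_baseline(years, rates, target_year, baseline_years):
    """Projected rate for target_year minus the mean rate over baseline_years."""
    projected = project(years, rates, target_year)
    baseline = [r for y, r in zip(years, rates) if y in baseline_years]
    return projected - sum(baseline) / len(baseline)


# Placeholder series (hypothetical, NOT CDC data): deaths per 100K
# rising by exactly 10 per year from 2012 through 2019.
years = list(range(2012, 2020))
rates = [800 + 10 * (y - 2012) for y in years]

# Gap between the trend-based 2020 projection and the 2015-2019 average.
gap = gap_vs_baseline(years, rates, 2020, range(2015, 2020))
print(f"projected 2020 minus 2015-2019 average: {gap:.1f} per 100K")
```

Even in this placeholder series, where nothing unusual happens in 2020, the projected value sits above the five-year average simply because the series has been trending upward. That is the distinction between expected growth and excess that the chart is meant to show.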
As temperatures fall in different parts of the US, we’re starting to see case growth acceleration resume in some of the hardest-hit regions from the spring.
US State COVID-19 Data Table sorted by Case Acceleration (dIROC_confirmed) – 10/7/2020
Below we can see the table sorted by the acceleration of the death rate. These are pretty much the only states that are seeing increases in the rate.
US State COVID-19 Data Table sorted by Death Rate Acceleration (dIROC_confirmed) – 10/7/2020
Since New York seems to be re-emerging here with above average increases in the Case Rate and Death Rate, here are its time series plots below, first Case Rate and then Death Rate. The Instantaneous Rate of Change for cases (IROC-Confirmed) is around 1000 new cases per day. For deaths the IROC is about 20 new deaths per day. Both of these values are growing. You can visibly see the Case rate increasing (the cumulative case line is curving upward), but the Death rate increase is still a bit too small to visualize well (although you can see the polynomial fit starting to show the upward curve).
New York state Cumulative Case curve plus 3rd order polynomial fit. 10/7/2020
New York state Cumulative Death curve plus curve fit. 10/7/2020
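For readers who want to see how a rate and an acceleration can be read off a cumulative curve, here is a rough sketch. It assumes the IROC is the slope of the fitted 3rd order polynomial at the latest day and the acceleration (dIROC) is its second derivative. The exact calculation behind the tables may differ, the helper names are my own, and the input series is synthetic rather than New York data.

```python
def solve_linear(a, b):
    """Solve the square system a * x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(days, values, degree=3):
    """Least-squares coefficients c[0] + c[1]*t + ... via the normal equations."""
    size = degree + 1
    a = [[sum(t ** (i + j) for t in days) for j in range(size)] for i in range(size)]
    b = [sum(v * t ** i for t, v in zip(days, values)) for i in range(size)]
    return solve_linear(a, b)


def iroc(coeffs, t):
    """First derivative of the fit at day t (new cases per day)."""
    return sum(k * c * t ** (k - 1) for k, c in enumerate(coeffs) if k >= 1)


def diroc(coeffs, t):
    """Second derivative of the fit at day t (change in daily cases, per day)."""
    return sum(k * (k - 1) * c * t ** (k - 2) for k, c in enumerate(coeffs) if k >= 2)


# Synthetic cumulative series (hypothetical, not real case counts).
days = list(range(10))
cumulative = [2 + 3 * t + 0.5 * t ** 2 + 0.1 * t ** 3 for t in days]

coeffs = fit_polynomial(days, cumulative)
print(f"IROC at day 9:  {iroc(coeffs, 9):.2f} per day")
print(f"dIROC at day 9: {diroc(coeffs, 9):.2f} per day per day")
```

A positive second derivative corresponds to the cumulative line curving upward, which is what the case plot shows visibly and the death plot shows only faintly. For long series it would be safer to count days from the start of the fitting window so the powers of t stay small.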
Top Twenty AZ Zip Codes by COVID-19 Case Growth, 9/11 to 9/18 – data source: AZDHS Dashboard
Table: All AZ Zip Codes with over 6% Case Growth between 9/11 and 9/18 – data source: AZDHS Dashboard
Evaluation of Case Growth Over the Last Week
I notice a handful of interesting things in the data this week.
University of Arizona cases APPEAR to have shot through the roof. Note that 85719 has most of the U of A population (see how big the zip code is?) and it appears to have a very large majority of the University-related COVID growth. The next highest zip codes in Pima County are residential zip codes from suburbs such as Oro Valley, Marana, and Green Valley, and their growth percentages are based on very small numbers of cases. These would all be very long commutes for an on-campus student. 85705 is the zip code just north of campus and it saw an 8% increase in cases, which is interesting, because I’m curious whether U of A cases will start spreading to adjacent zip codes. But the 100 cases that make up this 8% growth are far fewer than the number we see in 85719 over the last week. I’ll continue watching this zip code to determine if the University outbreak is spreading. There’s a good chance this is just measuring COVID-positive students who are living off-campus in large complexes.
The 85719 Case Growth captured by the state seems much too high compared with the numbers the University is releasing on their new dashboard. It’s not clear how numbers get from the University to the State, but I can’t see much consistency to date. More on this later in this post.
I notice that Case Growth in Flagstaff (Northern Arizona University) has increased. The raw number is ~100 new cases, but this is based on a small number of cases to date. Last week we didn’t have many new cases from this zip code.
I also notice that ASU’s main campus in Tempe doesn’t even factor in the top 20 any longer. I look at their numbers and see an increase of only about 20 cases. This combined with the official ASU reporting (here) makes very little sense. I’ll analyze this later in this post too.
The 85709 zip code in the southwest corner of Phoenix continues to see large case growth. This zip code has seen a lot of cases and was frequently one of the hottest COVID spots during the June-August phase in the outbreak where case growth was the largest. Back then, there was evidence that the outbreak in this zip code was correlated with the similarly large outbreak in Sonora, Mexico, but this may not be the case now. It doesn’t seem obvious that this growth has any correlation with the university cases either. I can’t see case demographics by zip code, but I do know that the age demographic under age 44 accounted for 64% of the Case Growth in all of Maricopa County. Since 85709 has a median age of 28, there’s a good chance that over 64% of the new cases in this Zip code are under 44. I still feel that this is interesting and ought to be evaluated.
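The week-over-week growth percentages discussed above can be computed along these lines. This is a minimal sketch with my own function names, and the counts use placeholder labels and made-up numbers rather than AZDHS data. The 6% threshold and the top-20 cutoff mirror the table captions.

```python
def case_growth(previous, current):
    """Return {zip_code: (new_cases, pct_growth)} for zips with prior cases."""
    growth = {}
    for zip_code, prior in previous.items():
        if prior <= 0 or zip_code not in current:
            continue  # a percentage is undefined without a prior count
        new_cases = current[zip_code] - prior
        growth[zip_code] = (new_cases, 100.0 * new_cases / prior)
    return growth


def top_growth(previous, current, min_pct=6.0, limit=20):
    """Zips above min_pct growth, largest percentage first."""
    rows = [
        (zip_code, new_cases, pct)
        for zip_code, (new_cases, pct) in case_growth(previous, current).items()
        if pct > min_pct
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)[:limit]


# Hypothetical cumulative counts one week apart (placeholder labels and values).
previous = {"zip_A": 1250, "zip_B": 40, "zip_C": 2000}
current = {"zip_A": 1350, "zip_B": 44, "zip_C": 2020}

for zip_code, new_cases, pct in top_growth(previous, current):
    print(f"{zip_code}: +{new_cases} cases ({pct:.1f}%)")
```

Note how the placeholder zip with only 4 new cases outranks the one with 100 new cases. That is the small-numbers effect mentioned for the suburban zip codes, and it is why I look at the raw counts alongside the percentages.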
The Challenges of Understanding Case Growth Accurately
The confusing nature of the latest data from the state is something worthwhile to discuss because I’ve noted news outlets (tucson.com is terrible about this for instance) grabbing the latest U of A numbers, interviewing one U of A professor, and then writing a very scary but highly inaccurate article. It’s even worse now since the numbers are smaller and therefore plagued much more by statistical variation. So here are some thoughts about our current state of counting cases to help you understand what might be really happening.
It is Difficult to Use Data that is Generated “by Accident” to Learn Big Things. In an application of data science within a field like epidemiology we often want to draw an inference from a selection of measured data that applies to a broad population. This is usually done by sampling a portion of the population that is representative of the overall population we want to understand. Just like conducting an election poll, this kind of representative sampling needs to be well-designed and well-measured. The collection of COVID-19 data has come about “by accident” and thus has nothing in common with a well-architected election poll. This means we can extract very little inference about specific aspects of this outbreak from the data samples that come into the state DHS dataset. Due to the nature of collecting data in an emergency (without any pre-formed strategy, of course) we get what we get, and if we’re lucky we can determine whether any natural experiments can be uncovered in the data. Just keep this in mind and it will help. 🙂
The University of Arizona Appears to be Relying too Much on an Inaccurate Form of Testing. The data sampling strategy at the U of A and apparently at the AZDHS has changed since school resumed on campus. U of A built their dashboard and this clarified some of their strategy but also revealed some real gaps. What does their strategy appear to be? Conduct low-cost Antigen tests that provide results in real time whenever there’s any evidence of a localized outbreak. This makes good sense given the apparent limitation of the Antigen tests (see point #x). Isolate the people with positive results and conduct more-accurate (but slowly scored) PCR tests on the symptomatic (or on football players with positive Antigen tests…). We know the ratio of Antigen tests to PCR tests (about 10 Antigen tests to every PCR test) and the share of tests conducted by Campus Health versus elsewhere (10% of tests are being done at Campus Health). This seems to indicate that 10% of the U of A positive COVID cases have symptoms deemed worthy of a visit to the nurse. The upside is that this seems to be a pretty solid approach. The downside seems to be that the positive Antigen tests (about 1/2 of which are likely to be false positives) are getting inconsistently sucked into the AZ DHS case data. The reason I struggle with this is that the quality of the Antigen results is highly variable and they are likely to be wrong. This also drives more Chicken Little journalism. In my mind the only valuable positivity numbers are coming from the PCR tests being conducted at the health clinic. These will isolate the positive cases with symptoms (but will likely miss the much larger number of students who get COVID without symptoms). Unfortunately, the state seems to be recording all the positive numbers, including the many false positives.
Yet Again, Arizona’s DHS Has Changed their Measurement Strategy in Mid-Stream. AZDHS has changed their collection strategy. The limitations I describe above, which make these Antigen tests less useful for serious data collection, had kept their results out of the AZDHS data up until this week. I noticed on their dashboard that they changed the name of a category from “PCR Tests” to “Diagnostic Tests”. This, combined with the large increase in tests at the same time, makes it clear to me that they’re now equating PCR and Antigen testing and pulling in the Antigen test results from the U of A and elsewhere. My experience is that it is NEVER good to change your data collection strategy in mid-experiment. Now all the new test data is contaminated and will be statistically different from the first few months of data collection. What they should have done is add a third category of testing. Then they could report on PCR Tests (the gold standard), Antigen Tests (less accurate but valuable for speed), and Serology Tests (for antibodies). The willingness of AZDHS to change measurement strategies in the middle of a health care crisis continues to surprise me (no, this is not the first time).
Arizona State’s COVID Stats are Not Very Transparent. ASU doesn’t seem to have a very solid collection strategy and their numbers make very little sense. Their case counts are surely not decreasing, but that’s what they seem to be advertising. They describe a decrease of around 120 cases in three days at their Tempe campus. This seems very strange considering that U of A is showing case growth of around 500 during that same time frame (this includes both PCR and Antigen test numbers). Clearly the two Universities are not measuring the same way.
The State DHS Numbers Don’t Seem Accurate for the Primary U of A zip code. The 85719 numbers from the AZDHS site, showing growth of about 1400 cases in the last week, seem out of line compared to the 890 cases (PCR+Antigen) the U of A reports. Only about 200 of those cases are based on PCR test results. This is further evidence that AZDHS has now started recording positive Antigen tests. This is another data measurement mistake. For the first X months of COVID all our results were based on PCR tests, which have very few false positives. Now we’re adding a low-quality source of data to the high-quality one and we can’t separate them. Most likely the 1400 number is erroneous, and I suspect that the confusion from changing the method the state uses to record cases may be partially to blame. I’d guess some accidental double counting is happening in this confusion.
COVID Antigen Testing False Positives make the Test Less Meaningful: I’m disparaging Antigen tests a bit here. These tests have been used for years in other diseases to identify key proteins that signify the presence of a viral infection. COVID-specific Antigen tests have recently been approved in emergency fashion by the FDA. In their interim guidance, the CDC says that Antigen tests have very low false positives, but the manufacturers indicate something different for their COVID Antigen tests. One of the main ones out there now is made by Abbott, which generally has some of the most accurate tests across the board. The Abbott press release from a month ago indicates a sensitivity of 97.1% and a specificity of 98.5% (that is, a false positive rate of 1.5%). Assuming this is a reasonable representation of other Antigen tests that have been approved, it will more than likely result in about 1/2 of the positive tests being false. Here’s how that works:
Confusion Matrix for the case where we conduct 1000 tests in a population where 2% is infected (very realistic numbers for COVID-19 at a given period of time) with a test that has 97.1% sensitivity and 98.5% specificity.
See the confusion matrix above for the case referenced. Right now 2% infection is a high estimate for just about any community we might sample (Arizona State is indicating that 0.4% of their student population is infected right now). If the true number is that low, the large majority of positive results are false. If you take a moment to digest the diagram, you’ll note that the false negatives are very low (the upper right quadrant), whereas the false positives are about 1/2 of the total positives (lower left quadrant). This is why, when a disease is rare (like COVID is, despite all the headlines), sensitivity is relatively meaningless while specificity is critical. The Abbott Antigen test’s specificity of 98.5% sounds great, but in a rare event, it really means that 1.5% of all the people who don’t have the disease (in our case 980 out of 1000) will show up as positive, which is about 15 people. When we only expect a small number of true positive results (in our case, 2% of 1000, or 20), the false positives drown out the signal from the true positives. About 1/2 of the people who are told they have COVID in this example actually do not. Hopefully this helps make my case that the state should NOT be including Antigen test results with PCR test results (which, since they use DNA/RNA testing to evaluate the presence of the virus, have very close to 100% specificity).
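The confusion matrix arithmetic can be reproduced in a few lines. This is only a sketch using the numbers quoted above (1000 tests, 97.1% sensitivity, 98.5% specificity, and either 2% or 0.4% infected). The function names are mine, and it works in expected counts, so the cells come out as fractions of a person.

```python
def confusion_matrix(n_tests, prevalence, sensitivity, specificity):
    """Expected counts for each cell when n_tests people are tested."""
    infected = n_tests * prevalence
    healthy = n_tests - infected
    true_pos = infected * sensitivity
    true_neg = healthy * specificity
    return {
        "TP": true_pos,
        "FN": infected - true_pos,   # infected people the test misses
        "FP": healthy - true_neg,    # healthy people flagged as positive
        "TN": true_neg,
    }


def false_share_of_positives(cm):
    """Fraction of all positive results that are false positives."""
    return cm["FP"] / (cm["TP"] + cm["FP"])


for prevalence in (0.02, 0.004):
    cm = confusion_matrix(1000, prevalence, sensitivity=0.971, specificity=0.985)
    share = false_share_of_positives(cm)
    print(
        f"prevalence {prevalence:.1%}: "
        f"TP={cm['TP']:.1f} FN={cm['FN']:.1f} FP={cm['FP']:.1f} TN={cm['TN']:.1f} "
        f"-> {share:.0%} of positives are false"
    )
```

At 2% prevalence this gives roughly 19 true positives against 15 false ones, and at 0.4% the false positives outnumber the true ones by close to four to one.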
Now if you target these Antigen tests in a more focused way, e.g., on a sorority where you believe the infection rate is much higher, then the test will be much more accurate at determining exactly who is infected. This is because there are fewer “well” people to inflate the false positive count. If the true positives are just twice the number of false positives, the test is now much more useful at evaluating who the sick people really are. BUT, if you deploy it broadly into your wider community the way the U of A is, with thousands of tests per day, the false positives will overwhelm the true positives.
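To put a rough number on how targeted the testing needs to be, the sketch below solves for the infection rate at which true positives are twice the false positives. It uses the same quoted sensitivity and specificity, the function names are my own, and the 10% figure in the loop is just an illustrative stand-in for a concentrated outbreak.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value: chance that a positive result is real."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


def prevalence_for_ratio(ratio, sensitivity, specificity):
    """Prevalence p at which true positives equal ratio * false positives.

    Solves sensitivity * p = ratio * (1 - specificity) * (1 - p) for p.
    """
    fpr = 1 - specificity
    return ratio * fpr / (sensitivity + ratio * fpr)


SENS, SPEC = 0.971, 0.985

threshold = prevalence_for_ratio(2, SENS, SPEC)
print(f"true positives reach 2x false positives at {threshold:.1%} prevalence")

# 0.4% and 2% come from the text; 10% is a hypothetical targeted group.
for p in (0.004, 0.02, 0.10):
    print(f"prevalence {p:.1%}: a positive is real {ppv(p, SENS, SPEC):.0%} of the time")
```

Under these assumptions the two-to-one point is reached at an infection rate of about 3%, so the targeted group does not need to be overwhelmingly sick for the test to become useful. Broad community testing at well under 1% is the opposite situation.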