The chart above is the one I’ve been thinking about putting together for quite a while now. It’s really busy, but it has a ton of information in it. Here’s how to read it.
- Normalizing Case Counts by Population: I’m comparing both Pima and Maricopa counties (the two largest in the state by far) on a cases per 1000 basis. Why do I do this? If I compare them on raw numbers of cases, it looks like Pima County is doing SO much better than Maricopa because they have 1/4 the cases. However, Pima also has 1/4 the population. This is one way the news media exaggerates stories, probably because it looks stark and dramatic when you don’t compare appropriately. When I do this the right way, you can see that Pima and Maricopa had the same exact slope (more or less) up until Memorial Day. This is the purple arrow. After Memorial Day, we see case growth accelerate in both counties but a good deal faster in Maricopa County. This is the source of much of the overall case growth in Arizona.
- Polynomial Trend Lines: The fat, light blue and pink lines are the trend lines for Maricopa and Pima respectively. These are both modeled with 3rd Order Polynomials, which means that the formula to create the trend line is something like Ax^3 + Bx^2 + Cx + D. The upward curve of these lines shows that the case growth is accelerating. Almost every state’s case growth right now can be modeled with this same kind of function. The trend line allows us to do simple predictions for the next few days on what the case growth might be. It is not a good predictor for much more than a few days out because the situation is too complex for that.
- Testing Numbers and Results: The yellow dashed line represents the numbers of tests on each day. I had to pull these numbers from the state’s online Dashboard manually because they won’t let us download it. So the data may be off by 5-10 tests per day. Note that the Test Numbers are represented on the secondary Y-axis (the one on the right). This can be confusing, but it allows me to provide more valuable visualizations. I also tried to capture the weekly percent positive for the tests. As you can see the percent of tests that are positive is growing. I’ll try to offer some possible explanations for that in my conclusion.
- Data Lags: Note that I extend the blue “Stay at Home” rectangle about 10 days past the 5/17 expiration date. I do this to demonstrate that most of the data we see every day has the potential for being as much as 10 days old. Data collection isn’t very clean and efficient when dealing with health-related issues. Any time you look at COVID-19 data, whether it is the CDC or the WHO or the AZ DHS, you need to remember that it’s likely reporting the state from a week or so beforehand. I’ve seen some embarrassing data analyses during this outbreak by professional media that did not account for the fact that recent data is likely to be underreported due to this lag. The testing numbers above are a good example. I have no reason to believe that AZ has slowed testing in the last week. We just don’t have the accurate numbers in yet.
- Events/Triggers: I’ve labeled various events and triggers on the chart. The stay at home order and its expiration are interesting, as is Memorial Day. Face Masks became mandatory in AZ on about 6/20. You can be sure that will be added as an important event as time goes on and more data comes in. My expectation is that we’ll see some sort of change in the trends in late June or early July (to account for the data lag but also the 14 day hospitalization cycle time).
- Hospitalization: I’m not including hospitalization stats in this chart, first because the chart is already too busy, and second because I have a hard time trusting the state’s and counties’ hospitalization data, which all seem to contradict one another. Suffice it to say that right now hospitals in the state are jam packed with COVID cases and there’s not much margin (at least in the traditional sense).
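The per-1000 normalization and the 3rd order polynomial trend line described in the list above can be sketched in a few lines of Python. This is only an illustration, not the code behind the chart: the function names, the made-up county numbers, and the use of NumPy's `polyfit` are all assumptions.

```python
import numpy as np

def cases_per_1000(cumulative_cases, population):
    """Scale raw cumulative case counts to cases per 1000 residents."""
    return np.asarray(cumulative_cases, dtype=float) / population * 1000.0

def cubic_trend(days, values, days_ahead=3):
    """Fit A*x^3 + B*x^2 + C*x + D and extend it a few days past the data."""
    days = np.asarray(days, dtype=float)
    coeffs = np.polyfit(days, values, deg=3)  # returns [A, B, C, D]
    future = np.arange(days[-1] + 1, days[-1] + 1 + days_ahead)
    return coeffs, np.polyval(coeffs, future)

# Made-up numbers, NOT real Arizona data: county B has a quarter of
# county A's cases but also a quarter of its population.
days = np.arange(10)
county_a_cases = 1000 + 50 * days + 2 * days**3
county_b_cases = county_a_cases / 4
pop_a, pop_b = 4_000_000, 1_000_000

rate_a = cases_per_1000(county_a_cases, pop_a)
rate_b = cases_per_1000(county_b_cases, pop_b)

# Short-range projection only; the fit is not meaningful far past the data.
coeffs, forecast = cubic_trend(days, rate_a, days_ahead=3)
```

On raw counts county B looks four times better, but `rate_a` and `rate_b` are identical once both are expressed per 1000 residents. A positive leading coefficient in `coeffs` corresponds to the upward curve of the trend line.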
Analysis of the Chart
- Comparison of Pima and Maricopa Cases per 1000: As mentioned above, it’s very interesting to me that the case growth up until Memorial Day in both counties is essentially linear and basically the same slope. Ending the stay at home order doesn’t seem to have dramatically impacted this case growth rate (even considering the data lag). Two events seem to have occurred simultaneously that may be causal for the dramatic case growth lately. First is Memorial Day. We see the exponential case growth start a few days after Memorial Day. It may well be that a number of people contracted the virus during Memorial Day activities (we’ll probably never know if the protests/riots contributed due to bad data on those events). Maybe there were super-spreader events during the holiday too that we haven’t identified. The second major event that certainly contributes is the doubling of testing that also started about this same time. The state was conducting an average of about 8K tests per day up until about June 4th when it doubled this to an average of about 16K tests per day. During the stay at home period there was an extreme bias in the tests toward sick patients because one could barely get tested unless they exhibited symptoms. Even then, only about 5% of tests were positive. The spreading that may have occurred around Memorial Day, combined with the doubling of testing, has resulted in not just doubling the number of cases, but exponential growth, because now the percent positive rates are approaching 20%.
- Why are the Tests’ positivity rates so high? This is interesting to think about but here are a few possible reasons. First: There is a lot more virus out there now since Memorial Day and people are catching it. One telling stat is what I have shown a few times (which still holds) that shows that the growth rate of infections in the 65+ community is still the same now as it was during the stay at home order. In short, this demographic is still travelling down the same purple arrow! All other groups are reflecting the exponential growth trend. It is likely that the 65+ community is being just as careful now as they were during stay at home orders (and maybe group homes have also become more careful) and they’re avoiding the bloom in the virus. Everyone else is exposed to a larger population of the virus. This is speculation, but it makes sense. Second: It may well be that there is emerging another kind of testing bias and now people who are more likely to be infected are more likely to get tested. For instance, since I can’t see WHERE the tests are being conducted, there’s a chance that a higher percentage of tests are coming from regions that are already having major outbreaks (border counties, native communities). This is possible, especially given that there appears to be clear indications that the virus is more prevalent in some areas than others. The only way to really prevent this bias is to do what some European countries did and randomize testing. Otherwise we have no real idea of what’s happening. Third point: I’m convinced that we’re not seeing issues with false positives on the PCR tests (but I still believe there are high false positives on the antibody tests that make them somewhat less informational right now).
- Why are the Rates different for Pima and Maricopa County? First, one thing we’re seeing is that the rates can be very different in different regions. Not just across the world or across US states, but even by AZ Zip codes. After about 3 weeks of tracking this I’m still seeing the less wealthy zip codes have the highest overall numbers of cases per 1000 people AND the highest growth rates over time. This is interesting to analyze because it makes one curious about why this is happening. There are a number of hypotheses for this. It’s possible that people who are overall less healthy (maybe they don’t have good health care) may be more likely to get infected and then need to seek medical care. However, it does appear that this isn’t a very solid hypothesis when one looks at the demographics where the largest number of cases by far is in the younger, healthier age groups. Culture is one hypothesis I hear for this, where the cultures in less wealthy regions have evolved to rely on others much more than the cultures in wealthy regions require. There are also ethnic cultures and traditions that may have some causality. Also, based on this evidence, some of the activities that are more commonly engaged in the wealthier zip codes (dining out, going to the gym, etc.) may actually be less causal of infections than we thought. From my observation, also, the culture of mask wearing in Arizona is stronger in the wealthier zip codes than in less wealthy or rural zip codes. It’s possible that this has an impact, but time will tell how significant of one. Regardless, there’s still much to learn about this.
- Case Severity: With this virus, just like with the flu, there is a very wide range of severity. Measuring cases is interesting from a numbers standpoint, but it is not a good representation for the severity of an outbreak. A very large majority of the new cases we’ve been measuring (and in some cases, stressing about) are asymptomatic (or low-symptomatic) cases that aren’t requiring medical attention. The better measure of severity is deaths, of course, but also hospital cycle times and capacity measures (because they’re leading indicators for deaths). The hospital measures are extremely hard for a number of reasons mentioned in an earlier article on this site… Hospitals and their staffs are clearly being stressed with the growth in severe cases (even though this growth is very small compared to the asymptomatic cases). Some of this is because this disease forces a 2-plus week cycle time on cases, something that appears to be extremely unusual for viral infections.
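The testing arithmetic in the Pima and Maricopa comparison above is worth working through once. The sketch below uses the rounded figures from that bullet (about 8K tests per day at about 5% positive, then about 16K tests per day at a rate approaching 20%). Treating 20% as already reached is an assumption, so the result is a rough upper bound, and the function name is made up for illustration.

```python
def expected_daily_cases(tests_per_day, percent_positive):
    """Positive results expected from a day's tests at a given percent positive."""
    return tests_per_day * percent_positive / 100.0

# Rounded figures from the text; 20% is the "approaching" value, used as a bound.
before = expected_daily_cases(8_000, 5)
after = expected_daily_cases(16_000, 20)
ratio = after / before

print(f"before: {before:.0f}/day, after: {after:.0f}/day, ratio: {ratio:.0f}x")
```

Doubling the tests alone would only double the daily case count. The rest of the increase comes from the percent positive roughly quadrupling, which is why the growth outpaces the growth in testing.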
The Effect of Wealth on Cases in a Region
Above is the latest comparison of COVID-19 cases per 1000 population compared to median income. Note that the lowest median income zip codes are on the left and the highest are on the right. The average number of cases per 1000 for the poorest 25 zip codes is 9. The average for the wealthiest 25 zip codes is 2.6. You can see the yellow trend line shows a decrease in cases from the left to the right (case counts are on a logarithmic scale on the right y-axis). The red line is the actual cases per 1K for the zip codes. Note that you may not see your zip code labeled on the chart (only about every 10th zip is labeled because otherwise the chart would extend around the room!).
Case growth follows this exact same trend. This means that the regions with the highest rate of change in their case counts (hot spots) tend to be on the left of this chart. This indicates that the overall trend of more cases in less wealthy areas is not changing.
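The poorest-25 versus wealthiest-25 comparison above amounts to sorting zip codes by median income and averaging the two ends. Here is a minimal sketch of that calculation. The `tail_averages` helper and the synthetic rows are hypothetical, and none of the numbers are the real Arizona zip code data.

```python
from statistics import mean

def tail_averages(zip_rows, n=25):
    """Average cases per 1000 for the n poorest and n wealthiest zip codes.

    zip_rows is a list of (median_income, cases_per_1000) tuples.
    """
    ordered = sorted(zip_rows, key=lambda row: row[0])  # poorest first
    poorest = mean(rate for _, rate in ordered[:n])
    wealthiest = mean(rate for _, rate in ordered[-n:])
    return poorest, wealthiest

# Synthetic example: 100 zip codes whose case rate falls as income rises.
rows = [(30_000 + 1_000 * i, 10 - 0.08 * i) for i in range(100)]
poorest_avg, wealthiest_avg = tail_averages(rows, n=25)
```

The same helper can be pointed at growth rates instead of cumulative rates to check whether the hot spots sit at the low-income end as well.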
The Effect of Population Density
One explanation for the “wealth effect” is population density. This makes sense in light of the now-ubiquitous 6 Feet of Separation. Many of the lower income areas with high outbreaks are in zip codes that are known to have large numbers of people living in dense environments (apartment complexes, for instance). However, some of the regions with the highest outbreaks are rural and agricultural regions that have very low population density.
Overall, however, the chart above does show that the cases per 1K tends to go up as population density increases. The trend line is fit with an exponential function that has a decent (but not ideal) fit. Most likely, density is one component of the problem, but is likely not one of the larger components.
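One common way to fit an exponential trend line like the one described above is ordinary least squares on the logarithm of the case rate. Whether the chart's trend line was produced this way is an assumption. The sketch below uses synthetic data and a hypothetical `fit_exponential` helper, and it reports R² on the log scale as a rough measure of how good the fit is.

```python
import numpy as np

def fit_exponential(density, cases_per_1k):
    """Fit cases = a * exp(b * density) by linear regression on log(cases)."""
    x = np.asarray(density, dtype=float)
    log_y = np.log(np.asarray(cases_per_1k, dtype=float))
    b, log_a = np.polyfit(x, log_y, deg=1)  # slope, intercept
    residuals = log_y - (log_a + b * x)
    r_squared = 1 - np.sum(residuals**2) / np.sum((log_y - log_y.mean())**2)
    return np.exp(log_a), b, r_squared

# Synthetic data, NOT the real zip code numbers: a weak upward trend plus noise.
rng = np.random.default_rng(0)
density = np.linspace(100, 10_000, 50)  # people per square mile
cases = 2.0 * np.exp(0.0001 * density) * rng.lognormal(0.0, 0.2, size=50)

a, b, r_squared = fit_exponential(density, cases)
```

A positive `b` means cases per 1K rise with density. An R² well below 1 is what a "decent (but not ideal)" fit looks like numerically: density explains part of the variation but leaves much of it unexplained.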
The Effect of Median Age
Another interesting characteristic of some zip codes that may be driving higher case counts is median age. This makes sense, especially since we already know that most of the cases in the current outbreak are asymptomatic cases among younger people. Therefore, this chart tells a very clear story. Outbreaks are much higher in regions with lower median ages.
Two questions/thoughts…
1. I would really like to see the death rates… since we aren’t hearing anything more about the rate of death due to COVID-19, I’m assuming that that rate has remained relatively constant (i.e. the rate of confirmed cases rising exponentially has little effect on the death rate).
2. I have heard from several health care professionals that due to government incentives, just about anyone showing symptoms similar to COVID-19 symptoms was being reported as having COVID-19… seems like this practice would cause a big skew in the perceived reality of positive cases. Have you heard any similar reports?
P.S. Tod, you’re using Jupyter Notebook, aren’t you?
Hi Jim. I tend to show tables from time to time with the Deaths per 1000 persons. Is that what you’re looking for? I’ve been trying not to contribute to COVID-overload by posting the tables each day, but maybe I ought to. In short, Death rates are way down. Case Growth rates are way down too from the surge in April/May. And the areas where surges are happening right now seem to be having a very large number of asymptomatic cases. In the west our hospitals are getting hit hard, but some component (large?) is actually due to surges of US citizens and legal residents coming to the US for COVID treatment. The case counts for the Mexican states below us are probably 5-10x higher than they’re reporting, just judging from the high number of deaths they have to a low number of reported cases.
I’ve heard the stories of overcounts, too, but I tend to give less weight to some people’s perception that this is the Gov’t manipulating numbers. If it’s true, it’s likely due to an economic cause… hospitals are getting hit hard in their cash flow. I expect any of this would be offset by the practice of undercounting that happens in nursing homes where (I’m told by a nursing home MD) they often don’t bother to give the COVID test because a person is so sick with whatever other thing ails them. In those cases they move straight to palliative care and never have a confirmed case. So maybe these things balance?
And yes, I’m a huge proponent of Jupyter Notebooks. Most of my standard visualizations are just straightforward Matplotlib. Everything is done in Python.
Oh, and I just figured out that you’re Jim Palmer. Good to hear from you, friend! 🙂
Hi Tod – a trigger event missing from Analysis (1) is the phased expiration of the stay at home / business closure orders. One expects a 2-3 week lag in case diagnosis due to the incubation period (in most cases up to 14 days) and the delay in getting a test and diagnosis. The increase in cases at Memorial Day aligns to this timing well, better than I thought it would, particularly since behavior (as you note) is more important than orders. A note on (5) – there is and has been no State-wide mask order. Those are implemented at the local level, when allowed by the State government. From that perspective your focus on Maricopa and Pima counties is spot on.
Hey Jim, you’re right. I keep thinking about how the phased expiration could inform us better as to what causal activities for COVID infections actually are… unfortunately, the data’s a bit unclear. The Zip Code data I’ve been using lately helps a bit, though, because certain activities are more likely in some zip codes than others. AND the mask rollout being inconsistent gives us a really interesting natural experiment. I’m curious if we’ll see increased rates in a large, non-mask county (Pinal?) compared to Pima and Maricopa. I did an analysis of two neighboring border counties, Webb County, TX, (masks mandatory 4/3/2020) and Hidalgo County, TX, (no masks until mid June) but can’t see any real evidence that Webb County’s ordinance had an effect. Quite a bummer.
All this makes me break out in a cold sweat like I just read a question on a statistics or economics exam that I don’t understand and will never be able to answer! But it’s very informative, thorough, and full of good analysis. The frustrating part to me is the ordinances that seem to be based on pure guesswork. It’s throwing darts at a board with a blindfold. The mask thing seems to be just to make people feel better and I DO NOT like being told what to do just to make people feel better. Slippery slope from a civil liberties standpoint if you ask me. Maybe a mask is a small thing but what is next? I don’t like how easily the herd falls in line with this stuff. Sheep always get slaughtered.
Alan, it’s a conspiracy between me and your former Economics instructor at Baylor to keep you in a perpetual state of “About to get a 13 on a test”! 🙂
The ordinances aren’t based on solid science like you’d hope. I would have thought the CDC would have been curious about the effectiveness of pandemic control measures, but it seems like they weren’t. There’s work in this space from the Asian CDCs, but not much that’s very mature. Of course it’s hard to design experiments like this at a large scale, but even exploiting natural experiments (like a sick passenger on an airplane and how they spread the virus) is kind of lacking.
Take care. Hopefully you’ll only get sheared, not slaughtered! 🙂 (you’re kind of skinny, so maybe you have a chance)