It has been interesting to see that the distribution of confirmed COVID cases in AZ has followed the Pareto Principle. This principle is also sometimes called the 80/20 rule, and it essentially describes a process where 80% of the effects come from 20% of the causes. There are lots and lots of examples of this in nature and society, such as:
- 20% of criminals commit 80% of crimes
- 80% of the world’s wealth is held by 20% of its population
- Microsoft found that 20% of software bugs resulted in 80% of the crashes
- 20% of drivers cause 80% of all traffic accidents
- 80% of pollution originates from 20% of all factories
- 20% of a company's products represent 80% of sales
- 20% of employees are responsible for 80% of the results
- 20% of students have grades 80% or higher
Why is this principle useful? Not every problem follows it, but a surprisingly large number do. Lots of business gurus have a strong intuition for problems that might be Pareto problems, because those give them an easy place to attack (the 20%) in order to realize lots of gains (the 80%).
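A quick way to check whether a set of counts behaves like this is to sort them, take the largest 20% of the items, and see what fraction of the total they hold. This is a minimal sketch that assumes a plain list of non-negative counts; the function name `top_share` and the sample numbers are hypothetical, picked only so the arithmetic is easy to follow.

```python
def top_share(values, fraction=0.2):
    """Return the share of the total contributed by the largest `fraction` of items."""
    if not values:
        raise ValueError("values must be non-empty")
    ranked = sorted(values, reverse=True)
    # Size of the "top" group; always keep at least one item.
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical counts for 10 items (e.g. crashes per bug); not real data.
counts = [400, 400, 50, 40, 30, 25, 20, 15, 12, 8]
print(f"Top 20% of items account for {top_share(counts):.0%} of the total")
```

A perfectly even distribution gives the opposite extreme: the top 20% of items hold exactly 20% of the total, so the further the result sits above 20%, the more "Pareto-like" the data is.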
How does this apply to the current summer Arizona COVID-19 outbreak?
The map below shows the 20% of zip codes in Arizona that account for 80% of the cases (by the way, I checked: the top 20% of zip codes ranked by cases per 1000 people also account for 80% of the cases). If I were running the state response, these are the zip codes I'd be focusing on. My first step would probably be to flood these areas with low- or zero-cost tests so I'd know exactly where the outbreaks are occurring in those regions and, hopefully, what the transmission vectors are. Perhaps that's actually what is happening. If so, it would of course reinforce the perception of the problem, because testing would then be non-uniform and concentrated in the problem areas. But in a world of limited testing resources, I suppose that is the least-bad option.
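The check in the parenthetical above (rank zip codes by raw cases, rank them again by cases per 1000, and sum the cases that fall in each top 20%) can be sketched like this. The `ZipStats` type, the helper name, and every number are hypothetical; they are not the Arizona data behind the map.

```python
from dataclasses import dataclass


@dataclass
class ZipStats:
    zip_code: str
    cases: int
    population: int

    @property
    def cases_per_1000(self) -> float:
        return 1000 * self.cases / self.population


def share_of_cases_in_top(zips, key, fraction=0.2):
    """Rank zips by `key`, keep the top `fraction`, and return their share of all cases."""
    ranked = sorted(zips, key=key, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    total_cases = sum(z.cases for z in zips)
    return sum(z.cases for z in ranked[:k]) / total_cases


# Hypothetical zip codes, for illustration only.
zips = [
    ZipStats("A", 3000, 60000),
    ZipStats("B", 2500, 55000),
    ZipStats("C", 300, 40000),
    ZipStats("D", 250, 50000),
    ZipStats("E", 200, 40000),
    ZipStats("F", 200, 20000),
    ZipStats("G", 150, 30000),
    ZipStats("H", 150, 30000),
    ZipStats("I", 100, 20000),
    ZipStats("J", 100, 500),  # small zip with a very high rate per capita
]

raw = share_of_cases_in_top(zips, key=lambda z: z.cases)
per_capita = share_of_cases_in_top(zips, key=lambda z: z.cases_per_1000)
print(f"Top 20% by raw cases:      {raw:.1%} of cases")
print(f"Top 20% by cases per 1000: {per_capita:.1%} of cases")
```

With these made-up numbers, the per-capita ranking pulls the tiny zip `J` into the top group, so the two rankings select different zip codes and capture different shares. Whether both land near 80% depends entirely on the real data.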
Analysis
First, you can see that the zip codes correspond with large population centers. This makes sense, since this chart evaluates raw numbers of cases. The light green and red colors show where the larger outbreaks in these population centers are. You can see that SW Phoenix, South Tucson, and the border regions (Yuma, Nogales, Douglas) are the most affected areas. We also know that most of the cases in Arizona are in people between 20 and 44 years old. One might suspect that the top 20% of zip codes by case count have an even higher share of 20-to-44-year-olds than the rest of the state. I wonder whether these regions will reach “herd immunity” (if that exists for this virus) earlier than the many zip codes with low case counts.
I have read multiple reports recently that the Arizona outbreak is primarily driven by a mutation of the coronavirus characterized by high transmission and low mortality. I have no idea whether this is really true, but if so, it would explain why the state has had so many cases with a comparatively low number of hospitalizations and deaths (of course these are still occurring, but they are only increasing on the order of 20-150 per day, whereas cases have been increasing on the order of 2000-4000 per day for a few weeks now).
This makes me curious whether this is the same mutation that is rampant in Northern Mexico right now. There is evidence (just see the map above) that areas with lots of essential travel to Mexico have very high numbers of COVID-19 cases. Sonora, Mexico, just over the border from Arizona, has very close ties with our state. It is reporting just over 10,000 cases (to Arizona's 116,000) and 987 deaths (to Arizona's 2151). Arizona's population is over 2.5 times larger than Sonora's, so if you scale these numbers by population you see Sonora reporting 3.5 cases and 0.34 deaths per 1000 people, while Arizona is reporting 15.9 cases and 0.29 deaths per 1000. If all things are equal (in particular, if a similar share of infections ends in death on both sides of the border), this indicates that Sonora is likely under-reporting its cases by about a factor of 5 (at least… I hold that Arizona's case numbers are at least 2x too low due to testing bias). This makes sense, as Arizona is testing like crazy and finding a lot of less-symptomatic cases, whereas I imagine Sonora may not be. The spread of the virus between the US and Mexico (probably in both directions) is also interesting, because it reflects the constant commerce and relationships between Mexico and Arizona and gives insight into the very complex route this virus travels.
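The population scaling and the rough factor-of-5 estimate can be reproduced with a few lines of arithmetic. The case and death totals are the ones quoted above. The two population figures are assumptions, backed out from the per-1000 rates in the text rather than taken from census tables, and "all things equal" is read here as "Sonora has the same deaths per reported case as Arizona", which is one plausible interpretation of the argument, not the only one.

```python
def per_1000(count: float, population: float) -> float:
    """Scale a raw count to a rate per 1,000 residents."""
    return 1000 * count / population


# Reported totals quoted in the post.
AZ_CASES, AZ_DEATHS = 116_000, 2_151
SONORA_CASES, SONORA_DEATHS = 10_000, 987

# ASSUMPTION: approximate populations backed out of the post's per-1000 figures,
# not official census values.
AZ_POP = 7_300_000
SONORA_POP = 2_870_000

for name, cases, deaths, pop in [
    ("Arizona", AZ_CASES, AZ_DEATHS, AZ_POP),
    ("Sonora", SONORA_CASES, SONORA_DEATHS, SONORA_POP),
]:
    print(f"{name}: {per_1000(cases, pop):.1f} cases, "
          f"{per_1000(deaths, pop):.2f} deaths per 1000")

# ASSUMPTION: "all things equal" means Sonora shares Arizona's deaths per reported case.
az_deaths_per_case = AZ_DEATHS / AZ_CASES
implied_sonora_cases = SONORA_DEATHS / az_deaths_per_case
underreporting_factor = implied_sonora_cases / SONORA_CASES
print(f"Implied Sonora cases: {implied_sonora_cases:,.0f} "
      f"(about {underreporting_factor:.1f}x the reported count)")
```

Under that reading the factor comes out a little above 5. It also fits the "at least": if Arizona's own cases are undercounted, its deaths-per-case ratio is overstated, and the implied Sonora total would be even larger.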
Todd,
Wouldn’t it be better to have a similar map based on per capita numbers rather than nominal ones? Santa Cruz, Apache, and Navajo counties have high per capita numbers but don’t make your map because of their low gross numbers. Then you could attack the zip codes with the highest percentage of penetration rather than just the populated zip codes.
Tim, I have done that… What I found is that when I plot by cases per capita, the top few zip codes are so much larger than the rest that they render my whole colormapping scheme ineffective… i.e., I have two zip codes that are bright red, one that is light blue, and the rest are all dark blue, regardless of their numbers. The border zip codes are the ones with the really high per capita numbers… one zip near Yuma is getting close to 100%, no kidding! Interestingly, I found that if I pick the highest 20% of the zip codes by per capita cases, they ALSO represent 80% of all the cases! Lots we could discuss there over steaks! 🙂 Also, I don’t have numbers by zip for most of Navajo and Apache counties due to tribal regulations. That’s why those two counties are a big blank…
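The colormap problem in the reply is easy to reproduce, assuming the map assigns colors on a linear scale from the minimum to the maximum value (an assumption; the post doesn't say how its colors were assigned). With a couple of extreme outliers, every ordinary zip code gets squeezed into a sliver at the bottom of the scale. The rates below are hypothetical.

```python
def minmax_normalize(values):
    """Map values linearly onto [0, 1], as a simple linear colormap would."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical cases-per-1000 values: two extreme outliers, one middling zip,
# and a cluster of ordinary zips.
rates = [950.0, 900.0, 400.0, 30.0, 25.0, 20.0, 15.0, 10.0, 5.0]
colors = minmax_normalize(rates)
for rate, position in zip(rates, colors):
    print(f"{rate:7.1f} per 1000 -> color position {position:.3f}")
```

Here the six ordinary zip codes all land below 0.03 on a 0-to-1 scale, so on most colormaps they would render as nearly the same color, much like the "two bright red, one light blue, the rest dark blue" result described above.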