Latest Arizona Case Stats by Zip Code – the Challenge of Measuring Cases

Top Twenty AZ Zip Codes by COVID-19 Case Growth, 9/11 to 9/18 – data source: AZDHS Dashboard
Table: All AZ Zip Codes with over 6% Case Growth between 9/11 and 9/18 – data source: AZDHS Dashboard

Evaluation of Case Growth Over the Last Week

I notice a handful of interesting things in the data this week.

  1. University of Arizona cases APPEAR to have shot through the roof. Note that 85719 has most of the U of A population (see how big the zip code is?) and it appears to have a very large majority of the University-related COVID growth. The next highest zip codes in Pima County are residential zip codes in suburbs such as Oro Valley, Marana, and Green Valley, and their growth percentages are based on very small numbers of cases. These would all be very long commutes for an on-campus student. 85705 is the zip code just north of campus and it saw an 8% increase in cases, which is interesting because I’m curious whether U of A cases will start spreading to adjacent zip codes. But the 100 cases that make up this 8% growth are far fewer than the number we see in 85719 over the last week. I will continue watching this zip code to determine if the University outbreak is spreading. There’s a good chance this is just measuring COVID-positive students who are living off-campus in large complexes.
  2. The 85719 Case Growth captured by the state seems much too high based off of the number the University is releasing from their new dashboard. It’s not clear how numbers get from the University to the State, but I can’t see much consistency to date. More on this later in this post.
  3. I notice that Case Growth in Flagstaff (Northern Arizona University) has increased. The raw number is ~100 new cases, but this is based on a small number of cases to date. Last week we didn’t have many new cases from this zip code.
  4. I also notice that ASU’s main campus in Tempe doesn’t even factor in the top 20 any longer. I look at their numbers and see an increase of only about 20 cases. This combined with the official ASU reporting (here) makes very little sense. I’ll analyze this later in this post too.
  5. The 85009 zip code in the southwest corner of Phoenix continues to see large case growth. This zip code has seen a lot of cases and was frequently one of the hottest COVID spots during the June-August phase of the outbreak, when case growth was the largest. Back then, there was evidence that the outbreak in this zip code was correlated with the similarly large outbreak in Sonora, Mexico, but this may not be the case now. It doesn’t seem obvious that this growth has any correlation with the university cases either. I can’t see case demographics by zip code, but I do know that people under age 44 accounted for 64% of the Case Growth in all of Maricopa County. Since 85009 has a median age of 28, there’s a good chance that over 64% of the new cases in this zip code are under 44. I still feel that this is interesting and ought to be evaluated.

The Challenges of Understanding Case Growth Accurately

The confusing nature of the latest data from the state is something worthwhile to discuss because I’ve noted news outlets (tucson.com is terrible about this for instance) grabbing the latest U of A numbers, interviewing one U of A professor, and then writing a very scary but highly inaccurate article. It’s even worse now since the numbers are smaller and therefore plagued much more by statistical variation. So here are some thoughts about our current state of counting cases to help you understand what might be really happening.

  1. It is Difficult to Use Data that is Generated “by Accident” to Learn Big Things. In an application of data science within a field like epidemiology we often want to draw an inference from a selection of measured data that applies to a broad population. This is usually done by sampling a portion of the population that is representative of the overall population we want to understand. Just like conducting an election poll, this kind of representative sampling needs to be well-designed and well-measured. The collection of COVID-19 data has come about “by accident” and thus has nothing in common with a well-architected election poll. This means we can draw very little inference about specific aspects of this outbreak from the data samples that come into the state DHS dataset. Due to the nature of collecting data in an emergency (without any pre-formed strategy, of course) we get what we get, and if we’re lucky we can find some natural experiments in the data. Just keep this in mind and it will help. 🙂
  2. The University of Arizona Appears to be Relying too Much on an Inaccurate Form of Testing. The data sampling strategy at the U of A and apparently at the AZDHS has changed since school resumed on campus. U of A built their dashboard and this clarified some of their strategy but also revealed some real gaps. What does their strategy appear to be? Conduct low-cost Antigen tests that provide results in real time whenever there’s any evidence of a localized outbreak. This makes good sense given the apparent limitation of the Antigen tests (see point #x). Isolate the people with positive results and conduct more-accurate (but slowly scored) PCR tests on the symptomatic (or on football players with positive Antigen tests…). We know the ratio of Antigen tests to PCR tests (about 10 Antigen tests to every PCR test) and the share of tests conducted by Campus Health versus elsewhere (10% of tests are being done at Campus Health). This seems to indicate that 10% of the U of A positive COVID cases have symptoms deemed worthy of a visit to the nurse. The upside is that this seems to be a pretty solid approach. The downside seems to be that the positive Antigen tests (about 1/2 of which are likely to be false positives) are getting inconsistently sucked into the AZDHS case data. The reason I struggle with this is that the quality of the Antigen results is highly variable and a positive result is likely to be wrong. This also drives more Chicken Little journalism. In my mind the only valuable positivity numbers are coming from the PCR tests being conducted at the health clinic. These will isolate the positive cases with symptoms (but will likely miss the much larger numbers of students who get COVID without symptoms). Unfortunately, the state seems to be recording all the positive numbers, including the many false positives.
  3. Yet Again, Arizona’s DHS Has Changed their Measurement Strategy in Mid-Stream. AZDHS has changed their collection strategy. The concerns I raised above about Antigen tests being less useful for serious data collection are presumably why results from these tests were kept out of the AZDHS data up until this week. I noticed on their dashboard that they changed the name of a category from “PCR Tests” to “Diagnostic Tests”. This, combined with the large increase in tests at the same time, makes it clear to me that they’re now equating PCR and Antigen testing and pulling in the Antigen test results from the U of A and elsewhere. My experience is that it is NEVER good to change your data collection strategy in mid-experiment. Now all the new test data is contaminated and will be statistically different from the first few months of data collection. What they should have done is add a third category of testing. Then they could report on PCR Tests (the gold standard), Antigen Tests (less accurate but valuable for speed), and Serology Tests (for antibodies). The willingness of AZDHS to change measurement strategies in the middle of a health care crisis continues to surprise me (no, this is not the first time).
  4. Arizona State’s COVID Stats are Not Very Transparent. ASU seems to not have a very solid collection strategy and their numbers make very little sense. Their numbers are surely not decreasing, but that’s what they seem to be advertising. They describe a decrease of around 120 cases in three days at their Tempe campus. This seems very strange considering that U of A is showing case growth of around 500 during that same time frame (this includes both PCR and Antigen test numbers). Clearly the two Universities are not measuring the same way.
  5. The State DHS Numbers Don’t Seem Accurate for the Primary U of A zip code. The 85719 number from the AZDHS site, showing growth of about 1400 cases in the last week, seems out of line compared to the 890 cases (PCR+Antigen) the U of A reports. Only about 200 of those cases are based on PCR test results. This is further evidence that AZDHS has now started recording positive Antigen tests. This is another data measurement mistake. For the first X months of COVID all our results were based on PCR tests, which have very few false positives. Now we’re adding a low-quality source of data to the high-quality one and we can’t separate them. Most likely the state’s number is erroneous, and I suspect that the confusion from changing the way the state records cases may be partially to blame. I’d guess some accidental double counting is happening in this confusion.
  6. COVID Antigen Testing False Positives make the Test Less Meaningful: I’m disparaging Antigen tests a bit here. These tests have been used for years in other diseases to identify key proteins that signify the presence of a viral infection. COVID-specific Antigen tests have been recently approved in emergency fashion by the FDA. In their interim guidance, the CDC says that Antigen tests have very low false positives, but the manufacturers indicate something different for their COVID Antigen tests. One of the main ones out there now is made by Abbott, who generally has some of the most accurate tests across the board. The Abbott press release from a month ago indicates a sensitivity of 97.1% and a specificity of 98.5% (specificity is the true negative rate, so the false positive rate is 100% minus the specificity, or 1.5%). Assuming this is a reasonable representation of other Antigen tests that have been approved, it will more-than-likely result in about 1/2 of the positive tests being false when the disease is rare. Here’s how that works:
Confusion Matrix for the case where we conduct 1000 tests in a population where 2% is infected (very realistic numbers for COVID-19 at a given period of time) with a test that has 97.1% sensitivity and 98.5% specificity.

See the confusion matrix above for the case referenced. Right now 2% infection is a high estimate for just about any community we might sample (Arizona State is indicating that 0.4% of their student population is infected right now). If the true number is lower, we get a case where most positive results are false (roughly four out of five at 0.4%). If you take a moment to digest the diagram, you’ll note that the false negatives are very low (the upper right quadrant) whereas the false positives are a bit under 1/2 of the total positives (lower left quadrant). This is why, when a disease is rare (like COVID is, despite all the headlines), sensitivity matters much less than specificity. The Abbott Antigen test’s specificity of 98.5% sounds great, but for a rare disease it really means that 1.5% of all the people who don’t have the disease (in our case 980 out of 1000, so about 15 people) will show up as positive. When we only expect a small number of true positive results (in our case, 2% of 1000, or 20) the false positives drown out the signal from the true positives. Nearly 1/2 of the people who are told they have COVID in this example actually do not. Hopefully this helps make my case that the state should NOT be including Antigen test results with PCR test results (PCR tests look directly for the virus’s RNA, so they have very close to 100% specificity).
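The arithmetic behind that confusion matrix can be sketched in a few lines of Python. This is only an illustration: the 97.1% sensitivity and 98.5% specificity are the Abbott figures quoted above, the 2% and 0.4% infection rates are the two scenarios discussed, and the function names are made up for this example.

```python
def confusion_counts(n_tests, prevalence, sensitivity, specificity):
    """Expected confusion-matrix counts when n_tests people are tested."""
    infected = n_tests * prevalence
    healthy = n_tests - infected
    true_pos = infected * sensitivity   # infected people correctly flagged
    false_neg = infected - true_pos     # infected people the test misses
    true_neg = healthy * specificity    # healthy people correctly cleared
    false_pos = healthy - true_neg      # healthy people wrongly flagged
    return {"TP": true_pos, "FN": false_neg, "FP": false_pos, "TN": true_neg}


def false_share_of_positives(counts):
    """Fraction of all positive results that are false positives."""
    return counts["FP"] / (counts["FP"] + counts["TP"])


# 1000 tests at the two infection rates mentioned in the text.
for prevalence in (0.02, 0.004):
    counts = confusion_counts(1000, prevalence, 0.971, 0.985)
    share = false_share_of_positives(counts)
    print(f"infection rate {prevalence:.1%}: "
          f"TP={counts['TP']:.1f} FP={counts['FP']:.1f} "
          f"FN={counts['FN']:.1f} TN={counts['TN']:.1f} "
          f"false share of positives={share:.0%}")
```

At a 2% infection rate this gives roughly 19 true positives against 15 false positives, and at 0.4% the false positives outnumber the true positives by about four to one.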

Now if you target these Antigen tests in a more focused way, e.g., on a sorority where you believe the infection rate is much higher, then the test will be much more accurate at determining exactly who is infected. This is because there are fewer “well” people to inflate the false positive count. If the true positives are even just twice the number of false positives, the test is much more useful for identifying who the sick people really are. BUT, if you deploy it broadly across your community the way the U of A is, with thousands of tests per day, the false positives will overwhelm the true positives.
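To see how much targeting helps, here is a rough sketch that sweeps the infection rate and reports what share of positive results are real (the positive predictive value). The test characteristics are the Abbott numbers from above; the 3%, 10%, and 30% infection rates are hypothetical values chosen only to illustrate a more concentrated outbreak, not measurements from any campus.

```python
def positive_predictive_value(prevalence, sensitivity=0.971, specificity=0.985):
    """Share of positive results that are true positives."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# 0.4% and 2% come from the discussion above; the rest are hypothetical.
for prevalence in (0.004, 0.02, 0.03, 0.10, 0.30):
    ppv = positive_predictive_value(prevalence)
    ratio = ppv / (1 - ppv)  # true positives per false positive
    print(f"infection rate {prevalence:5.1%}: "
          f"{ppv:.0%} of positives are real "
          f"({ratio:.1f} true per false positive)")
```

Under these assumptions the true positives reach about twice the false positives once roughly 3% of the tested group is infected, and the test looks far better at the higher hypothetical rates.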

COVID-19 Arizona Case Growth – 9/11/20

Arizona has seen its case growth numbers head towards zero for the last few weeks, but there may still be some value in exploring how the infection is affecting the state. Remember, tracking COVID cases is not useful in itself. Cases are a strong leading indicator, of course, of things we structurally care about as a state, such as hospital overtaxing and ultimately deaths. I believe this is the most productive mindset to have when approaching cases. Here is what the state’s cumulative Case curve looks like now.

AZ cumulative Cases, 9/11/2

We note that the Instantaneous Rate of Change (IROC) of the curve has now dropped to somewhere around 790. The trend is decreasing, however, as you can note about 4 days in a row where the rate appears to be approaching zero. We have three to four days of anomalous data from about 9/2 to 9/4, where the state appears to have been capturing University Antigen tests as confirmed cases. As the U of Arizona learned, at least, many of these Antigen positive results have turned out to be false positives when checked with a subsequent, more accurate PCR test. It appears from the data that 60-70 percent of the Antigen positive results are false positive. Since this realization, the state appears to only be counting the university cases if they’re confirmed with a PCR test. But not doing this for 3 days or so appears to have inflated our case numbers. Enough on that.
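For anyone who wants to reproduce this kind of rate from a cumulative curve, here is a minimal sketch. It assumes the rate of change is just the day-over-day difference in cumulative cases, smoothed with a short trailing average, which may not be exactly how the number above was computed. The cumulative counts in the example are made up for illustration and are NOT the AZDHS figures.

```python
def daily_new_cases(cumulative):
    """Day-over-day differences of a cumulative case series."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def trailing_mean(values, window=7):
    """Trailing moving average; early entries use however many days exist."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical cumulative counts, one per day (illustration only).
cumulative = [205000, 205900, 206700, 207400, 208000, 208500, 208900, 209200]
new_cases = daily_new_cases(cumulative)
for day, (raw, avg) in enumerate(zip(new_cases, trailing_mean(new_cases, window=3)), start=1):
    print(f"day {day}: new cases {raw}, 3-day average {avg:.0f}")
```

A smoothed series like this makes a few days of anomalous reporting (such as the 9/2 to 9/4 Antigen counts) easier to spot against the underlying trend.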

Zip Code Case Growth Update

Top Thirty Zip Codes by Increase in COVID-19 Cases from 9/5 to 9/11

This map doesn’t look much different from the previous week’s case increase map, except that there appear to be somewhat higher numbers in Flagstaff (home of Northern Arizona University) and Prescott (home of Embry-Riddle University). But by far, the top two zip codes in case growth over the last week continue to be the homes of the University of Arizona and Arizona State. This is true even though the numbers of cases reported have dropped a bit due to only recording the cases confirmed with PCR.

Table of Zip Codes

Top 12 Zip Codes by Case Growth, 9/5 to 9/11

The main thing to note here is that the top two are Tempe’s and Tucson’s University zip codes. Snowflake showing up as number three is a bit deceptive. They had 11 new cases this week, but they had only 128 cases to date before this week. The 11 might be from one significant spreading event, or it could just be random noise. The 85009 zip code in Southwest Phoenix has had a handful of case spikes since Memorial Day. The 200-ish new cases in that zip code could be significant, especially since the Mexico-related infections from a month or two ago seem to have slowed significantly.
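The Snowflake numbers make a handy worked example of why small bases are deceptive. The sketch below computes the weekly growth percentage and a very rough noise band for the count. The noise band assumes new cases behave like a Poisson count (so the standard deviation is the square root of the count), which is a simplifying assumption on my part and not something the AZDHS data tells us.

```python
import math


def percent_growth(new_cases, prior_total):
    """Weekly growth as a percentage of the cumulative total before the week."""
    return 100.0 * new_cases / prior_total


def rough_noise_band(new_cases, z=2.0):
    """Approximate +/- z standard deviation band, assuming Poisson-like counts."""
    spread = z * math.sqrt(new_cases)
    return max(0.0, new_cases - spread), new_cases + spread


# Snowflake: 11 new cases on a prior total of 128 (figures from the text).
growth = percent_growth(11, 128)
low, high = rough_noise_band(11)
print(f"growth: {growth:.1f}%")
print(f"a 'true' weekly count anywhere from about {low:.0f} to {high:.0f} "
      f"would be consistent with seeing 11")
```

With a count this small the plausible range spans roughly a factor of four, so a high growth ranking for a zip code like this one says very little by itself.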

Conclusion

Data indicates that COVID-19 might be in the process of burning itself out in Arizona. For now at least… It will be interesting to see if the University cases lead to increased hospitalization numbers in their demographic about a week from now (so far, there hasn’t been any change). With this Zip Code approach above, we can also track if the University cases are spreading to adjacent or other Zip Codes.

COVID-19 Arizona University Area Outbreaks

Below you’ll see the Arizona Zip Code map of Case growth in the last week. Color of the bubbles represents the % growth in cases over one week. Size of the bubble represents population size of the zip code. What do we see?

1. We see two zip codes with growth far greater than any others. 85719 (U of Arizona) and 85281 (ASU) come in at 38% and 23% growth in cases over the last week. The next highest zip code is in Buckeye and comes in at 7.3% growth.

2. Flagstaff comes in around 4.3% growth. Perhaps they party less at NAU, or maybe there are fewer cases at altitude?

3. The below map only shows the top 30 zip codes. Most of these are under 5% growth.

4. Right now I’m doing this to see if the university cases spread outside the university areas. My hypothesis is that they will remain contained and the infection will burn itself out in those zip codes. I’ll be watching this and publishing results about every week. I’m also watching the hospital stats closely to see if the university case growth will result in increases in hospitalization.

Arizona map of top 30 zip codes by case growth between 8/30 and 9/5/2020
Table showing top 18 zip codes by case growth between 8/30 and 9/5 and some info about each zip code

UPDATE

It turns out that some of the positive results from the Antigen tests have been false positives. The U of A admitted this, and in doing so made it clear that students with positive Antigen tests are going to the university health center to take PCR tests to confirm. Initially, the state was counting all of the positive Antigen tests as cases, but that seems to have stopped. Recall my earlier discussions about specificity and false positives. Any time a test has a specificity of around 97 or 98% and the disease is infecting only about 2-3% of the population you’re going to have about 1/2 false positives. See the university’s chart below. If my detective work is correct, all 109 Campus Health tests below were on people who had come up positive in previous days/weeks on the Antigen test. If true, then there’s about a 60% false positive rate (which makes sense based upon the possible specificity of the Antigen test and the rate of infection on campus). I will keep watching this, but it seems less concerning than before.
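As a sanity check on that 60% figure, we can run the false positive math backwards and ask what infection rate among the tested students would produce it. This is a sketch under stated assumptions: the 97.1% sensitivity is the Abbott number from my earlier post, the two specificity values bracket the range mentioned above, and `implied_prevalence` is just a name for the rearranged formula, not anything the university publishes.

```python
def implied_prevalence(false_share, sensitivity, specificity):
    """Infection rate at which `false_share` of positive results are false.

    Rearranged from false_share = FP / (FP + TP), with
    TP = p * sensitivity and FP = (1 - p) * (1 - specificity).
    """
    false_pos_rate = 1 - specificity
    numerator = (1 - false_share) * false_pos_rate
    return numerator / (false_share * sensitivity + numerator)


for specificity in (0.985, 0.97):
    p = implied_prevalence(0.60, sensitivity=0.971, specificity=specificity)
    print(f"specificity {specificity:.1%}: 60% false positives implies "
          f"about {p:.1%} of tested students infected")
```

Both results land in the 1 to 2% range, which is in line with the infection rates discussed for the campuses, so a 60% false positive share among Antigen positives is plausible rather than alarming.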

College is Back in Session. Did COVID Come Back With it?

Interesting data from the first week or so back at schools. University of Arizona had about 126 cases reported today while the entire rest of the county had around 30. Test positivity (a bad metric the way most government groups are trying to use it) had been about 2.5% at U of A since 7/31 until today, when it jumped to 8.2%. It’s hard to make much of a judgement from this as I don’t have any of the data between 7/31 and today, but the jump might be surprising. There also appears to be a large jump in tests compared to the average since 7/31, which might indicate there are more people feeling sick enough to get tested. I can’t tell much more because U of A’s data is kind of sparse and I can’t find numbers to indicate how many students are on campus right now. ASU, however, gives us a better wealth of data about their cases…

Here’s from the Arizona State University COVID page.

https://biodesign.asu.edu/research/clinical-testing/asu-covid-19-management-framework

Takeaways from this info are as follows:

  1. First off, we don’t have good numbers from ASU on how many tests were given to arrive at the numbers listed above.
  2. Cases for ASU faculty and staff appear low compared to their likely demographic in the rest of the state. The number listed is 0.2%, but it isn’t clear if that’s a cumulative count or an instantaneous count of active cases. Even if it is the count of Active Cases and these staff are in quarantine, that is still far lower than the instantaneous count of the students’ Active Cases (see below, it’s 3.4%). ASU employees most likely range in age from 20 to 64, which spans 3 demographic categories the state has been collecting case data on. All three of these are experiencing something on the order of 3.6% cumulative infection rates (or 36 out of 1000 as my chart below shows), so we would expect their current outbreak rates to be similar. There may be a data collection divergence between ASU collection and the State of Arizona collection (perhaps some faculty and staff tested positive over the summer at a CVS but didn’t tell their employer?). However, this is still a fairly big gap. Does it mean that university employees are less likely to get COVID than their same-age counterparts outside the university? What about their potential exposure to sick students? More to follow, but this does present some interesting questions.
  3. The 1.3% positivity across all 74,500 non-online students is probably a case where the denominator is artificially large. How many of these students left campus and have been sheltering at home? I don’t think this number is relevant.
  4. The more interesting indicator is the 336 positive students out of 9662 living on campus at Tempe. It’s unclear what the time period is that ASU has been collecting this, but the fact that they are apparently currently in isolation sounds like they are Active Cases, not cumulative cases. If true, this is a very high rate of Active infection for the Tempe campus (3.4%) and is about equal to the infection rate their age demographic has experienced cumulatively in the state since the start of the outbreak. This would be a big jump (it would account for 1/2 of the cases in the County on 9/2) and it appears that we can see it on the Maricopa County case chart.
  5. There are 32 cases out of 1195 in the ASU Downtown and ASU West campuses. Again, we’ll have to assume that these are active cases since they’re referenced as being “in isolation”. Therefore, 2.6% of the students at these two campuses are currently sick with COVID-19. Compare to 3.4% from Tempe and 0.2% in the faculty and staff.
  6. Finally, my favorite stat here is that there are 0 cases out of 771 students at the ASU Polytech campus. That of course is 0%. What do we take from this? Nerds are more careful? They wash their hands more? Or maybe there’s just not much partying going on at this campus (it has various government agencies sharing the campus along with other technical education programs).
As of 9/2/20, the cumulative count of cases by age group measured as a count out of 1000 people in that age group.
Cumulative total case count for Maricopa County. Note the measurable increase in slope on 9/2. Perhaps this is due to the cases on the ASU campus.
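The campus comparisons above are simple enough to check by hand, but here is a short sketch that puts them on the same "per 1000" scale as the age-group chart. The case and student counts are the ones quoted from ASU’s page in the list above; the dictionary labels and function name are only for this example, and I’m assuming (as above) that these are all active cases.

```python
def rate_per_1000(cases, population):
    """Cases per 1000 people in the group."""
    return 1000.0 * cases / population


# (positive students, students living on campus), as quoted above.
campuses = {
    "Tempe": (336, 9662),
    "Downtown + West": (32, 1195),
    "Polytechnic": (0, 771),
}

for name, (cases, students) in campuses.items():
    rate = rate_per_1000(cases, students)
    print(f"{name}: {rate:.1f} per 1000 ({rate / 10:.1f}%)")
```

The percentages this prints can differ by about a tenth of a point from the ones quoted in the list, depending on whether you round or truncate. Either way, Tempe comes out close to the roughly 36 per 1000 cumulative rate for the 20 to 64 age groups statewide.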