I saw an interesting article in the LA Times that indicated that the suburban counties of the LA area were seeing case growth higher than LA county. This seemed interesting, because the last time I looked it was no where near the case. I investigated, and thought it might be good to write about, because it’s another example of the lack of data science knowledge in the people that are influencing our opinions with their reporting.
The actual Data from California
Here’s the standard US State data table I use. This is the top 14 counties in California sorted by the instantaneous rate of change of their Confirmed Case curve. This is a measure that allows us to find the slope of a curve at any given day. I prefer this approach to averaging the last 7 days of cases, which is what a lot of the media are doing. This post will point out why the averaging approach can be misleading.
So What Do We See Here?
- First, it seems like Riverside and San Bernardino Counties both have a higher Case rate of change than LA county. You can see that by their ranking. But it does seem like LA County still has a higher rate of change than Orange county, but by just a little bit… We can also see that LA county has a pretty high cumulative number of cases per 1000 persons. This is measured since the outbreak started whereas the rate of change just measures the current trend. So we would assume that San Bernardino and Riverside counties, who both have lower cumulative numbers than LA county, were probably much lower a while back, but are now catching up.
- Cumulative deaths per 1000 are very low across California compared to back East. LA County has significantly more deaths per 1000 persons than Riverside, San Bernardino, or Orange County, but also significantly less than Chicago or New York City.
- Imperial County is still the most impacted county in California. This is a border region near Mexicali, Mexico, where there is a very large outbreak right now. I have read that Imperial County hospitals were quickly overwhelmed with US Citizens who live in Mexico but preferred to be treated for COVID in the US. Many of these people have been transferred to LA area hospitals. Imperial County has the largest number of infections and deaths per 1000 people in the state and also has the highest case rate of change.
Why Does the Article Indicate that COVID is spreading Faster in Orange County?
The article indicates that the Inland Empire counties and Orange County all rescinded their mask orders and reopened earlier than LA County. The conclusion is that their case growth since then has been due to this. I think this is a fascinating natural experiment, but am not sure if there’s enough information to make this conclusion. Here’s the chart that the LA Times uses to make their case:
This looks pretty compelling at first. As with any chart, you need to read the fine print. They’ve been averaging the number of cases over the last 14 days and LA county has held pretty much between a 14 day average of 300 and 400 since 7/1. You can see that the trends for Orange County and the Inland Empire Counties have both increased faster than LA County. But taking a 14 day average of cases might present some problems. If I had 1000 cases each day last week and 200 cases each day this week, my 14 day average will be (7000 + 1400) / 14 = 600. There’s nothing wrong with that number, but clearly it doesn’t tell the story of my region. The story is that I had really high cases last week and something miraculous happened and my cases went way down this week. You can’t tell that story when looking at a 14 day average. Here’s a picture of what has happened in LA County and Orange County that kind of goes against the LA Times narrative:
See what happened in the Orange County case curve? Starting about Memorial Day, they had explosive growth in Cases, which started flattening around 7/10. There’s a pretty impressive shift in case growth from that day until the present. About half of the 14 day window the LA Times uses to average is “explosive case growth” and the other half is the new rate that they’ve maintained for a week. Therefore, if they had chosen the more standard 7 day average window (or even better, my preferred approach of instantaneous rate of change) then their chart would have told a completely different story. Who knows why the LA Times chose 14 days? Perhaps it was to make a political point or possibly (my preference) they just didn’t have the experience in house with data science to question their numbers and understand how to present data.
Even though I make this point on the LA Times data science approach, the fact does remain that the suburbs do seem to be catching up to LA county, and this is concerning. Like in AZ, where I have done quite a bit of research into the details of this recent surge in cases, there is probably a very complex story that can’t be simplified into a headline.
Backup – Inland Empire Counties charts
Note that we don’t see the same flattening in Riverside and San Bernardino that we do in OC. Also, the Death curves are below. When normalized by population, LA County and OC are about equal on the Instantaneous Rate of Change for Deaths. Riverside County is the highest for these four counties and San Bernardino has the lowest death rate of change (by a factor of 2).