COVID-19 California Update

I saw an interesting article in the LA Times that indicated that the suburban counties of the LA area were seeing case growth higher than LA county. This seemed interesting, because the last time I looked it was no where near the case. I investigated, and thought it might be good to write about, because it’s another example of the lack of data science knowledge in the people that are influencing our opinions with their reporting.

The actual Data from California

Califormia COVID-19 Table – 7/19/20

Here’s the standard US State data table I use. This is the top 14 counties in California sorted by the instantaneous rate of change of their Confirmed Case curve. This is a measure that allows us to find the slope of a curve at any given day. I prefer this approach to averaging the last 7 days of cases, which is what a lot of the media are doing. This post will point out why the averaging approach can be misleading.

So What Do We See Here?

  1. First, it seems like Riverside and San Bernardino Counties both have a higher Case rate of change than LA county. You can see that by their ranking. But it does seem like LA County still has a higher rate of change than Orange county, but by just a little bit… We can also see that LA county has a pretty high cumulative number of cases per 1000 persons. This is measured since the outbreak started whereas the rate of change just measures the current trend. So we would assume that San Bernardino and Riverside counties, who both have lower cumulative numbers than LA county, were probably much lower a while back, but are now catching up.
  2. Cumulative deaths per 1000 are very low across California compared to back East. LA County has significantly more deaths per 1000 persons than Riverside, San Bernardino, or Orange County, but also significantly less than Chicago or New York City.
  3. Imperial County is still the most impacted county in California. This is a border region near Mexicali, Mexico, where there is a very large outbreak right now. I have read that Imperial County hospitals were quickly overwhelmed with US Citizens who live in Mexico but preferred to be treated for COVID in the US. Many of these people have been transferred to LA area hospitals. Imperial County has the largest number of infections and deaths per 1000 people in the state and also has the highest case rate of change.

Why Does the Article Indicate that COVID is spreading Faster in Orange County?

The article indicates that the Inland Empire counties and Orange County all rescinded their mask orders and reopened earlier than LA County. The conclusion is that their case growth since then has been due to this. I think this is a fascinating natural experiment, but am not sure if there’s enough information to make this conclusion. Here’s the chart that the LA Times uses to make their case:

From the LA TImes, July 17

This looks pretty compelling at first. As with any chart, you need to read the fine print. They’ve been averaging the number of cases over the last 14 days and LA county has held pretty much between a 14 day average of 300 and 400 since 7/1. You can see that the trends for Orange County and the Inland Empire Counties have both increased faster than LA County. But taking a 14 day average of cases might present some problems. If I had 1000 cases each day last week and 200 cases each day this week, my 14 day average will be (7000 + 1400) / 14 = 600. There’s nothing wrong with that number, but clearly it doesn’t tell the story of my region. The story is that I had really high cases last week and something miraculous happened and my cases went way down this week. You can’t tell that story when looking at a 14 day average. Here’s a picture of what has happened in LA County and Orange County that kind of goes against the LA Times narrative:

Orange County Confirmed Cases from 3/18 to 7/17/20
LA County Confirmed Cases from 3/14 to 7/17/20

See what happened in the Orange County case curve? Starting about Memorial Day, they had explosive growth in Cases, which started flattening around 7/10. There’s a pretty impressive shift in case growth from that day until the present. About half of the 14 day window the LA Times uses to average is “explosive case growth” and the other half is the new rate that they’ve maintained for a week. Therefore, if they had chosen the more standard 7 day average window (or even better, my preferred approach of instantaneous rate of change) then their chart would have told a completely different story. Who knows why the LA Times chose 14 days? Perhaps it was to make a political point or possibly (my preference) they just didn’t have the experience in house with data science to question their numbers and understand how to present data.

Conclusion

Even though I make this point on the LA Times data science approach, the fact does remain that the suburbs do seem to be catching up to LA county, and this is concerning. Like in AZ, where I have done quite a bit of research into the details of this recent surge in cases, there is probably a very complex story that can’t be simplified into a headline.

Backup – Inland Empire Counties charts

Note that we don’t see the same flattening in Riverside and San Bernardino that we do in OC. Also, the Death curves are below. When normalized by population, LA County and OC are about equal on the Instantaneous Rate of Change for Deaths. Riverside County is the highest for these four counties and San Bernardino has the lowest death rate of change (by a factor of 2).

Riverside Counties Cases from 3/24 to 7/17
San Bernardino County Cases from 3/26 to 7/17

4 Replies to “COVID-19 California Update”

  1. Your data analysis is very interesting and also compelling, but the weekly new case statistics not backed by contact tracing nor death/recovery data rate may not explain the temporal-spatial distribution of COVID-19 cases. I immigrated from S. Korea where the contact tracing and larger scale testing, wherever and whenever the cluster shapes up, revealed the large church gathering, dense call-center and UPS-type delivery/transportation, and bar/Buffett restaurants have been the curplit. It is painful to observe that science based policy and medical infrastructure are falling behind in US.

    1. Yes, Taehwan, clearly the data we’re working with is subpar and there is not a good strategy in place. I feel that the CDC messed up by not having a “shovel-ready” infection data strategy that they could roll out quickly. As I’ve observed, because they’re so slow, the Systems Engineering dept at JHU had to step up and take over the data role that the CDC should have owned. Lots of other issues, mostly regarding lack of systems thinking at the US level.

  2. Hi Todd,
    I love your articles and this one really relates given that I’m moving from Massachusetts to LA County this week. I do have a curious question about the graphics. Given the virus grows exponentially, would it be better to present the data in log space? Perhaps it makes communication easier use linear, but I’d think any trends would be more easily described in log.

    1. Hey, what are you going to do in LA? The virus actually doesn’t grow exponentially (as in y=e^x or y=n^x) but actually all the growth cases I’ve seen are y=x^2 or y=x^3. Small detail, but the media keeps getting polynomial and exponential confused.

      But good question… I’ve toyed with some diagrams in log space (good example would be deaths in NYC compared to deaths in LA) but what I’m trying to do here is be really clear so people don’t get further confused (the various media outlets are major contributors to that confusion). No one outside technical fields understands what a logarithmic axis means and when that LA data looks like it is getting close to NY data, people start to either get worried or start morality posturing. But you’re right. If I was just publishing this for you I’d do it in log space and show lines instead of power curves.

      Take care!

Leave a Reply

Your email address will not be published. Required fields are marked *