COVID-19 Topic: The Scarcity of Counties with High Cases per 1000 People

I have been watching the number of COVID-19 cases per 1000 people flatten off at around 15 to 20 in counties, regardless of whether those counties were actively managing the outbreak. This made me wonder whether there is a biological reason that outbreaks tend to hit limits. Collecting and visualizing the existing data should give some insight into whether this hypothesis has enough merit to evaluate more closely. Below is a quick analysis of what the data actually tells us about how common or rare counties with high normalized case counts are.

Methodology

First, I’ll explain what a histogram is. Whenever you have data that falls within a certain range, say 0 to 10, you can split that range into bins and count how many data points fall into each bin. The simplest way to bin the 0-10 range would be 0-1, 1-2, 2-3, and so on, which gives you 10 bins. The bins form the x-axis of the histogram, and the count of data points in each bin (for example, the number of values between 0 and 1) gives the bar height on the y-axis. For many processes, the histogram takes a shape that looks like a Gaussian (or bell-shaped) distribution, with low counts in the bins toward the edges and high counts around the mean (say, in the 4-5 or 5-6 bins). If you divide each count by the total number of data points, the histogram becomes an estimate of the probability distribution, which can tell us a lot about the process we’re measuring.
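The binning and normalization just described can be sketched in a few lines of plain Python. The function names and the ten sample values are hypothetical, chosen only to illustrate the idea. They are not COVID data.

```python
def histogram(values, lo, hi, width):
    """Count how many values fall into each [edge, edge + width) bin from lo to hi."""
    n_bins = int(round((hi - lo) / width))
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so a value exactly at hi lands in the last bin rather than past the end.
            idx = min(int((v - lo) // width), n_bins - 1)
            counts[idx] += 1
    return counts


def to_probabilities(counts):
    """Divide each count by the total so the bar heights sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical values in the 0-10 range (illustration only).
values = [0.5, 3.2, 4.1, 4.7, 5.0, 5.3, 5.9, 6.4, 8.8, 10.0]
counts = histogram(values, 0, 10, 1)
print(counts)  # [1, 0, 0, 1, 2, 3, 1, 0, 1, 1]
print(to_probabilities(counts))
```

The tallest bar falls in the 5-6 bin, near the middle of the range, which is the rough bell shape described above.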

Below you’ll see a histogram in which each bin spans 2 cases per 1000. The bins cover the range up to the highest county value, around 140 cases per 1000. As you can see, the highest counts cluster in the bins toward the left side of the chart. The resulting histogram (the gray bars) resembles a discrete Poisson distribution, and its shape can be modeled as an exponential decay (the red line). This is interesting because I’ve found that cumulative case growth over time is best modeled with a third-order polynomial, whereas an exponential decay eventually falls off much faster than any polynomial. I’m not sure yet what this indicates, but an exponential decay has the same functional form used to describe radioactive decay.
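Counties differ enormously in population, so the raw case counts are first normalized to cases per 1000 residents and then grouped into bins 2 units wide. Here is a minimal NumPy sketch, assuming you have per-county arrays of cases and population. The helper names and the five counties are hypothetical and are not the data behind the charts.

```python
import numpy as np


def cases_per_1000(cases, population):
    """Normalize raw county case counts to cases per 1,000 residents."""
    return np.asarray(cases, dtype=float) * 1000.0 / np.asarray(population, dtype=float)


def county_histogram(rates, bin_width=2.0):
    """Bin per-county rates into fixed-width bins covering 0 up to the largest rate."""
    top = np.ceil(rates.max() / bin_width) * bin_width
    edges = np.arange(0.0, top + bin_width, bin_width)
    counts, edges = np.histogram(rates, bins=edges)
    centers = (edges[:-1] + edges[1:]) / 2  # bin midpoints, useful for curve fitting
    return centers, counts


# Hypothetical counties (cases, population); not real data.
cases = [120, 450, 3000, 75, 900]
population = [40_000, 50_000, 100_000, 25_000, 30_000]
rates = cases_per_1000(cases, population)
centers, counts = county_histogram(rates)
print(rates)
print(counts)
```

NumPy’s last bin is closed on the right, so the county with the maximum rate is always counted.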

The formula for this exponential decay is $y = a e^{bx} + c$. Here a is the original amount (the height of the curve at x = 0, above the floor c), b is the rate of change (since this is a decay, b is negative), x is the position along the horizontal axis in cases per 1000, and c is a constant: the asymptote the curve levels off toward. The b parameter sets the steepness of the curve. The counts fall by a factor of e for every 1/|b| cases per 1000, so a larger magnitude of b means a steeper curve. That makes it interesting to see how b changes over time.
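Fitting $y = a e^{bx} + c$ to a histogram can be sketched with SciPy’s `curve_fit`, a standard least-squares fitter. This may not be the tool used to produce the charts. The starting guess and the synthetic counts below are assumptions for illustration, generated from known parameters so you can check that the fit recovers them.

```python
import numpy as np
from scipy.optimize import curve_fit


def exp_decay(x, a, b, c):
    """y = a * exp(b * x) + c; a negative b gives a decay toward the asymptote c."""
    return a * np.exp(b * x) + c


def fit_decay(centers, counts):
    """Least-squares fit of exp_decay to histogram bin centers and counts."""
    p0 = (counts[0], -0.1, 0.0)  # rough start: first bar height, mild decay, zero floor
    (a, b, c), _ = curve_fit(exp_decay, centers, counts, p0=p0)
    return a, b, c


# Synthetic histogram (not real county data): 2-wide bins with centers 1, 3, ..., 139.
centers = np.arange(1.0, 140.0, 2.0)
counts = exp_decay(centers, 500.0, -0.15, 2.0)
a, b, c = fit_decay(centers, counts)
print(f"a={a:.1f}, b={b:.3f}, c={c:.2f}")
```

With real counts, the bins are noisy and many far-right bins are zero, so the fitted c and the tail of the curve deserve more caution than b near the peak.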

You can see the values of a, b, and c in the upper right of the graph below, which is the most recent histogram. The counts decay steeply down toward the asymptote, so a county with more than 60 cases per 1000 is something of a black swan event.

Histogram of Number of Counties across Cases per 1000 – 8/4/20

Now let’s look at the histogram from about two weeks earlier, on 7/18. As you can see, the magnitude of b is a bit larger, which makes the curve a bit steeper.

Here’s the histogram from 7/4, one month earlier than the top chart.

And the histogram from 6/4.

And finally, the histogram from 5/4.


Conclusion

Overall, what I note in this data is that the share of counties with large numbers of cases per 1000 is increasing over time. The steepness of the exponential decay curve that fits these Poisson-like distributions (the magnitude of b) seems to halve every month. A quantity that halves at a fixed interval is itself undergoing exponential decay. Interesting…
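Written as an equation, treating the monthly halving as exact (the handful of fits only roughly suggests it): if the magnitude of b starts at $|b_0|$ and halves every T months, then after t months

$$|b(t)| = |b_0|\,2^{-t/T} = |b_0|\,e^{-(\ln 2 / T)\,t}, \qquad T \approx 1 \text{ month},$$

which is an exponential decay in time with rate $\ln 2 / T$. Conversely, given an earlier fit $b_1$ and a later fit $b_2$ taken $\Delta t$ months apart, the implied halving time is

$$T = \frac{\Delta t \, \ln 2}{\ln\left(|b_1| / |b_2|\right)}.$$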

However, there does appear to be some fundamental limiting factor tied to the total number of cases in the country. The exponential distribution has a light tail: its probabilities fall off exponentially and its variance is finite, so surprising “black swan” events in the tail of the distribution are rare. Counties with large numbers of normalized COVID-19 cases are rare, and their distribution is best fit with an exponential decay curve. This suggests that the system that generates COVID-19 cases in counties naturally limits the cases. That system includes natural and geographical features, societal control features, and cultural elements. At least, this is what the data has shown so far.
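The tail argument can be made concrete by comparing the probability of exceeding a value x under two distributions with the same mean. For an exponential distribution, $P(X > x) = e^{-x/\mu}$. The comparison case is a heavy-tailed Pareto (power-law) distribution. The mean of 10 cases per 1000 and the Pareto shape α = 2 are hypothetical choices for illustration. A Pareto distribution with α = 2 has infinite variance, which is the kind of tail that produces frequent black swans.

```python
import math


def exponential_tail(x, mean):
    """P(X > x) for an exponential distribution with the given mean."""
    return math.exp(-x / mean)


def pareto_tail(x, alpha, x_min):
    """P(X > x) for a Pareto distribution with shape alpha, valid for x >= x_min."""
    return (x_min / x) ** alpha


# Both distributions have mean 10 (hypothetical). A Pareto with alpha = 2 has
# mean alpha * x_min / (alpha - 1) = 2 * x_min, so x_min = 5.
for x in (20, 60, 140):
    print(x, exponential_tail(x, 10.0), pareto_tail(x, 2.0, 5.0))
```

Near the mean the two tails are comparable; at x = 20 the exponential tail is actually the larger one. By x = 140, though, the exponential tail is more than a thousand times smaller than the power-law tail.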

Update – 9/6

The peak of the histogram has shifted to the right as more and more counties have experienced COVID case growth. However, the exponential fit of the decay from the peak downward (the -b value) is still in the same ballpark as it was a month ago. What does this indicate? I’m not completely sure, but it seems that the fundamental nature of the ecosystem that generates “cases” (the world, the US, political systems, etc.) remains consistent. Outlier counties in normalized case count are still very rare.
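Once the peak has moved off the first bin, you can fit “from the peak downward” by locating the tallest bin and fitting only the bins at and to its right. This is a minimal sketch assuming SciPy’s `curve_fit`; `fit_from_peak` and the synthetic counts are hypothetical, not the author’s code or data. Here x is measured as the distance from the peak, so a becomes the peak height above c.

```python
import numpy as np
from scipy.optimize import curve_fit


def exp_decay(x, a, b, c):
    """y = a * exp(b * x) + c; a negative b gives a decay toward the asymptote c."""
    return a * np.exp(b * x) + c


def fit_from_peak(centers, counts):
    """Fit exp_decay only to the bins at and to the right of the tallest bin."""
    peak = int(np.argmax(counts))
    x = centers[peak:] - centers[peak]  # distance from the peak bin
    y = counts[peak:]
    (a, b, c), _ = curve_fit(exp_decay, x, y, p0=(y[0], -0.1, 0.0))
    return centers[peak], a, b, c


# Synthetic histogram (not real data): counts rise up to a peak at x = 11, then decay.
centers = np.arange(1.0, 140.0, 2.0)
counts = np.where(centers < 11, 30.0 * centers, 300.0 * np.exp(-0.12 * (centers - 11)) + 3.0)
peak_x, a, b, c = fit_from_peak(centers, counts)
print(peak_x, round(b, 3))
```

Shifting x leaves b unchanged, since $a e^{b(x - x_p)} = (a e^{-b x_p})\,e^{bx}$. That keeps the decay rate comparable with the earlier fits, made when the peak sat in the first bin.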
