COVID-19: Data Normalized by Population 3/23/2020

Most of the results I’ve shown over the last few days have been total counts of cases, deaths, etc., usually by country or state. Yesterday I decided that there’s probably enough data at this point to evaluate whether the counts should be normalized by each country’s population.

The reasons for doing this are probably obvious: we saw a large, rapid spike in cases in Hubei Province in China, and later a huge spike in Italy. At some point last week Italy passed China in case count, but was that really the most interesting measure? Since Italy has a much smaller population than China, the outbreak in Italy per 1000 people living there was much, much higher.
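
To make the per-1000 idea concrete, here is a minimal Python sketch of the normalization. The function name per_thousand and the two example countries are hypothetical, and the counts are round illustrative numbers, not the real case data behind the rankings.

```python
def per_thousand(count, population):
    """Return a raw count expressed per 1,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 1000.0 / population


# Hypothetical round numbers for illustration only (not real case data).
examples = {
    "Country A": {"cases": 80_000, "population": 1_400_000_000},
    "Country B": {"cases": 60_000, "population": 60_000_000},
}

# Normalize each country's case count by its population.
rates = {
    name: per_thousand(d["cases"], d["population"])
    for name, d in examples.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.3f} cases per 1000")
```

In this made-up example Country A has more total cases, but Country B has a far higher rate per 1000 residents, which is the kind of reversal the normalization is meant to expose.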

Here are the rankings (minimum population of 5 million) after being normalized for population.

Now more relevant comparisons can be made. Note that Italy and Switzerland have nearly the same number of Active Cases per 1000 population, but Italy has about 10x the number of deaths. Why is this? Looking across the blended statistics for each country that I’m using for the COVID severity correlations, I see only two things that are really different: 1) Switzerland has over 1 more hospital bed per 1000 people than Italy, and 2) Switzerland has a significantly lower number of people over 65 than Italy. Taken together, these two differences might explain the gap.

Here’s my updated correlation chart. The slope of Confirmed Cases is now computed on Confirmed Cases normalized by population.

Factor                              Correlation with Slope (normalized by pop.)
Over 65 Population – %               22.38%
Population Density                    8.70%
2019 Diabetes – % of pop.            -5.89%
HIV – % of pop.                      -7.64%
Diabetes growth in last 10 years     -8.02%
Population Growth Rate               -8.76%
2019 Population                      -9.42%
Tuberculosis – % of pop.            -12.21%
Area of Country – sq-ft             -13.11%
2020 Population                     -13.59%
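
For readers who want to reproduce a number like the ones in this table, here is a rough sketch of one way to correlate a factor with the slope. It assumes a Pearson correlation coefficient, which the post does not actually specify, and the per-country values are hypothetical placeholders rather than the data used for the chart.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values, one entry per country (illustration only).
over_65_pct = [10.0, 15.0, 20.0, 23.0]       # factor: % of population over 65
normalized_slope = [0.01, 0.02, 0.03, 0.05]  # slope of cases per 1000 per day

r = pearson(over_65_pct, normalized_slope)
print(f"Correlation with slope: {r:.2%}")
```

A value near +100% means the factor and the slope rise together across countries, a value near -100% means they move in opposite directions, and values near zero (like most of the table) indicate little linear relationship.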

How do I capture this slope? I fit a linear regression line to the data after a country hits 50 cases (arbitrary – I’ve seen others use 100). I then take the slope of the fitted line and project that as the rate of increase of confirmed cases for that country. Of course, most of these outbreaks are non-linear, but it’s simpler to fit a line than a curve, so I believe it makes sense to approach the problem this way. Regardless, this slope is a measure of how severe the outbreak is. Now that I’m using normalized data, I think it’s more relevant and better reflects the amount of pressure COVID-19 is putting on a specific country and its population.
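
The slope calculation described above can be sketched roughly as follows. This is a minimal illustration, not my exact code: it assumes the 50-case threshold applies to the raw cumulative count, that each later day is one data point, and that the series is normalized to cases per 1000 before fitting. The function name outbreak_slope and the sample series are hypothetical.

```python
def outbreak_slope(cumulative_cases, population, threshold=50):
    """Least-squares slope of cases per 1000 vs. day number.

    Only days from the first one at or above `threshold` raw cases
    are used. Returns None if fewer than two such days exist.
    """
    start = next(
        (i for i, c in enumerate(cumulative_cases) if c >= threshold), None
    )
    if start is None:
        return None
    # Normalize to cases per 1000 residents before fitting.
    ys = [c * 1000.0 / population for c in cumulative_cases[start:]]
    n = len(ys)
    if n < 2:
        return None
    xs = range(n)  # day 0 is the first day at or above the threshold
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical cumulative counts for a country of 1 million people.
slope = outbreak_slope([10, 30, 50, 150, 250, 350], population=1_000_000)
print(f"{slope:.3f} cases per 1000 per day")
```

Changing threshold to 100 would match the cutoff I’ve seen others use. Because the fit is a straight line, the slope understates growth late in an exponential outbreak and overstates it early on.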

Lots of discussion could be had on the correlated factors and why they’re positively or negatively correlated with this COVID-19 pressure. Feel free to weigh in.
