COVID-19: Data Normalized by Population 3/23/2020

Most of the results I’ve been showing over the last few days have been total counts of cases, deaths, etc., usually by country or state. Yesterday I decided that there’s probably enough data at this point to evaluate whether the counts should be normalized by each country’s population.

Reasons for doing this are probably obvious… we saw a large spike in cases quickly in Hubei Province in China and then later saw a huge spike in cases in Italy. At some point last week Italy passed China in case counts, but is that really the most interesting measure? Since Italy has a much smaller population than China, the outbreak in Italy per 1000 people living there is much, much higher.
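The per-1000 normalization can be sketched as follows. This is only an illustration: the country names and numbers are placeholders rather than real data, and the helper name per_thousand is hypothetical. The 5 million cutoff matches the minimum population used for the rankings in this post.

```python
MIN_POPULATION = 5_000_000  # minimum population used for the rankings

def per_thousand(count, population):
    """Return a raw count expressed per 1000 residents."""
    return 1000.0 * count / population

# Placeholder numbers for illustration only, not real data.
countries = {
    "Country A": {"cases": 60_000, "population": 60_000_000},
    "Country B": {"cases": 80_000, "population": 1_400_000_000},
    "Country C": {"cases": 500, "population": 400_000},  # below the cutoff
}

# Keep only countries above the cutoff, then rank by cases per 1000.
rates = {
    name: per_thousand(d["cases"], d["population"])
    for name, d in countries.items()
    if d["population"] >= MIN_POPULATION
}
ranked = sorted(rates, key=rates.get, reverse=True)
print(ranked)
```

Country B has more raw cases than Country A here, but Country A ranks first once the counts are divided by population.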

Here are the rankings (minimum population of 5 million) after being normalized for population.

Now comparisons can be made that are more relevant. Note that Italy and Switzerland have nearly the same number of Active Cases per 1000 population but Italy has about 10x the number of deaths. Why is this? I look across the blended statistics for each country that I’m using to do COVID severity correlations and I see only two things that are really different. 1) Switzerland has over 1 more hospital bed per 1000 people than Italy and 2) Switzerland has a significantly lower number of people over 65 than Italy. Both of these together might make sense.

Here’s my updated Correlation Chart – Now the Slope of the Confirmed Cases is done on Confirmed Cases normalized by population.

Factor                              Correlation with Slope (normalized by pop.)
femaleSmokingRate                   26.93%
Over 65 Population – %              22.38%
Mean_BMI_male                       17.00%
Population Density                  8.70%
totalSmokingRate                    6.40%
STD_BMI_female                      3.88%
IncomeGroup                         -1.06%
hospital_beds                       -1.56%
STD_BMI_male                        -1.96%
Mean_BMI_female                     -3.09%
2019 Diabetes – % of pop.           -5.89%
HIV – % of pop.                     -7.64%
Diabetes growth in last 10 years    -8.02%
Population Growth Rate              -8.76%
2019 Population                     -9.42%
Tuberculosis – % of pop.            -12.21%
maleSmokingRate                     -12.50%
Area of Country – sq-ft             -13.11%
2020 Population                     -13.59%

How do I capture this slope? I fit a linear regression line to the data after a country hits 50 cases (arbitrary – I’ve seen others use 100). I then take the slope of the fitted line and project that as the rate of increase of confirmed cases for that country. Of course, most of these outbreaks are non-linear, but it’s simpler to fit a line than a curve, so I believe it makes sense to approach the problem this way. Regardless, this slope is a measure of how severe the outbreak is. Now that I’m using normalized data, I think it’s more relevant and better reflects the amount of pressure COVID-19 is putting on a specific country and its population.
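Here is a minimal sketch of that slope calculation. It assumes the 50-case threshold is applied to the raw count and that the line is fit to cases per 1000 people against days since the threshold was reached. The function name severity_slope and the sample numbers are made up for illustration, and the fit is an ordinary least-squares line written out by hand so that no libraries are needed.

```python
def severity_slope(cumulative_cases, population, threshold=50):
    """Least-squares slope of cases per 1000 people vs. day number,
    starting on the first day the raw count reaches `threshold`.
    Returns None if there are fewer than two usable days."""
    start = next(
        (i for i, c in enumerate(cumulative_cases) if c >= threshold), None
    )
    if start is None:
        return None
    y = [1000.0 * c / population for c in cumulative_cases[start:]]
    n = len(y)
    if n < 2:
        return None
    x_mean = (n - 1) / 2.0          # mean of day numbers 0..n-1
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (yi - y_mean) for x, yi in enumerate(y))
    return sxy / sxx

# Placeholder series: the first two days fall below the 50-case threshold.
example_cases = [10, 30, 50, 150, 250, 350]
example_slope = severity_slope(example_cases, population=1_000_000)
print(example_slope)  # cases per 1000 people per day
```

In this toy series the count rises by 100 cases a day in a population of one million, so the slope comes out at about 0.1 cases per 1000 people per day.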

Lots of discussion could be had on the correlated factors and why they’re positively or negatively correlated with this COVID-19 pressure. Feel free to weigh in.

COVID-19: Cases and Deaths by Latitude

I saw a chart a while back that predicted the cases/deaths by latitude. They plotted it over the world map and had color bands describing COVID-19 potential risk. Something like that… But based on the amount of data we have, that’s a pretty wild, probably overfit, prediction.

Here’s what we DO know about cases by latitude. This chart shows total confirmed cases and deaths stacked on one bar (light blue+red) and current (yesterday) reported cases and deaths stacked on a second bar (green+dark orange).

Takeaway: Most of the cases have occurred between 30 and 60 degrees latitude. Around 40-45% of the world’s population does live in this belt, of course. But we have 92.5% of the cases in this range!

Any ideas why??

COVID-19 Update 3/21/20 – New Spikes and (small) Indicators of Recovery

I’m entering this late on 3/20 because most of the world’s data is already in. Some interesting things to discuss. First off, we saw a huge spike in Active cases in NY on 3/20. I can only hope that this spike is an anomaly and that we won’t see more like it in the US. You can see this in the bar chart showing change in the last 24 hours, but also in the time series chart immediately following. While we may have hopes that Washington’s and New Jersey’s outbreaks MIGHT be slowing, it is clear that NY continues to accelerate.

As for the rest of the world, you can see in the chart below that Italy continues to accelerate as well. They announced around 6K new cases yesterday and over 600 new deaths. France and Spain also announced ~200 new deaths. One bright spot as an American, though, is that New York’s death rate is extremely small compared to the number of cases. When normalized by the total number of cases, it is far below that of the large European countries and even smaller ones like Switzerland.

Finally, here’s something interesting to watch. I’m looking for any country at all that might be recovering from its initial outbreak. I’ll be watching Japan for the next week or so to see if their recovery cycle-time matches the ~20 days that we saw in the Hubei province data. That might become really interesting data. Note that around the 200 case mark, the horizontal distance between the Confirmed line and the Recovery line is now measurable (it looks like roughly 22 days). If the recoveries accelerate, then we may see a consistent 20-25 day cycle-time eventually.

COVID-19 Update – 3/19/2020

There’s not a lot new in today’s data, other than continued increases in confirmed cases and deaths (primarily in Europe). I’ll update the maps and one or two of the charts here. I’m also continuing to work on the correlation effort I posted about in a previous entry. That will improve as the data improves (i.e., more testing).

The world as the virus continues. Italy continues to have the most active cases (and deaths), while the rest of Europe and New York state are catching up
The United States map showing overall number of active cases and new active cases in the last 24 hrs

COVID-19 Severity Correlation – Early Results

I have blended smoking rates for both males and females by country as well as the 2020 population for the countries with the data. I’m curious about how correlated these features are with the severity “slope” of confirmed cases.

Results: See correlation plot below. The severity slope is most correlated with the country’s population, then with male smoking rates. Female smoking rate is slightly negatively correlated with severity, but that could be anomalous due to the granularity of the smoking and population data (I have it by country, not by state or county).

So, what does this tell us? Smoking rates appear to be weakly related to severity (particularly male smoking), but not nearly as much as the country’s overall population (and therefore local population density is probably even more correlated).

Correlation Visualization – Best way to read is to look down (or across) the “slope” column/row. The more yellow colors are highly correlated with the severity and the darker blue colors are less (or negatively) correlated. The zero color looks a whole lot like the color of “totalSmokingRate”.
Correlation Results Below
                        slope
slope                1.000000
pop2020              0.141106
maleSmokingRate      0.069965
totalSmokingRate     0.028203
femaleSmokingRate   -0.052708
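
For reference, here is a sketch of how one such coefficient can be computed. It assumes the figures are Pearson correlation coefficients, which this post does not state. The column values are placeholders and pearson is a hypothetical helper, written without libraries.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Placeholder per-country values, one entry per country (not real data).
slope = [0.10, 0.40, 0.25, 0.05]
features = {
    "pop2020": [5.0, 60.0, 47.0, 8.0],
    "maleSmokingRate": [20.0, 28.0, 31.0, 25.0],
}

# Correlate every feature column with the severity slope.
correlations = {name: pearson(col, slope) for name, col in features.items()}
for name, r in sorted(correlations.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {r: .6f}")
```

A coefficient near +1 or -1 means a strong linear relationship. Values as close to zero as the ones listed above indicate a weak one.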

COVID-19 World Update – 3/18/2020

In the data from the rest of the world, some of the same countries continue to increase their active cases at a high rate. Germany outpaced Italy for the first time. New York State (the first ‘US’ on the chart below) also continues to increase its case load.

COVID-19 United States Update – 3/19/2020

Quick update for new data. Yesterday saw New York state continue to soar in number of cases but Washington decrease. These are Active Cases, which are the difference between confirmed cases and recoveries, so that shows that our “WIP” (from the entry below) is decreasing in Washington. Other than this, for the US, things continue at about the same pace as the last few days.

State            Active Cases    Difference – 24 hrs
New York         2479            786
Washington       959             -61
California       738             58
Florida          307             97
New Jersey       264             1
Louisiana        253             61
Massachusetts    218             1
Georgia          196             51
Colorado         182             24
Texas            170             61
Illinois         161             3
Pennsylvania     152             40
Wisconsin        92              21
Ohio             86              19
Maryland         85              28
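
The two columns above can be derived as in this sketch. It follows the definition given in this entry (active = confirmed minus recoveries). The input series are placeholders chosen for illustration, and the function names are hypothetical.

```python
def active_cases(confirmed, recovered):
    """Active cases per day = cumulative confirmed - cumulative recovered."""
    return [c - r for c, r in zip(confirmed, recovered)]

def change_24h(series):
    """Difference between the latest daily value and the one before it."""
    return series[-1] - series[-2]

# Placeholder cumulative series for one state (not real data).
confirmed = [900, 1000, 1010]
recovered = [0, 20, 91]

active = active_cases(confirmed, recovered)  # [900, 980, 919]
delta = change_24h(active)                   # -61
print(active[-1], delta)
```

A negative difference means more people recovered in the last 24 hours than were newly confirmed.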

COVID-19 Cases v. Recoveries – 3/18/2020

I’ve been thinking about how to compare COVID-19 confirmed contraction rates with recovery rates… this might be key for pointing at societal success in addressing the epidemic. Not sure if the data is mature enough, but what I’m finding does indicate either some potential tampering with the data or more rapid societal recovery after the cases flatten off (do hospitals become more effective at that point?). Here are a few examples from the most affected countries.

These charts look like something we use in the manufacturing world called a cumulative flow diagram. The interesting thing about a cumulative flow diagram is that the horizontal distance between the two lines equals the cycle time. In this case, that refers to the time to clear your patient load. The vertical distance between the two lines, however, equals the “work in progress”, which in this case would be the number of patients who have the disease (active cases). This may reflect some percentage of the hospital beds in use (what percentage of active cases need hospitalization? Probably depends on the country).
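
Here is a rough sketch of reading both distances off a pair of cumulative series. The series are placeholders, and the helper names are hypothetical. Cycle time is measured here as the number of days between the confirmed line and the recovered line first reaching the same count.

```python
def work_in_progress(confirmed, recovered, day):
    """Vertical distance on a given day: active cases (WIP)."""
    return confirmed[day] - recovered[day]

def cycle_time(confirmed, recovered, level):
    """Horizontal distance at a given count: days between cumulative
    confirmed and cumulative recovered first reaching `level`.
    Returns None if either series has not reached it yet."""
    def first_day(series):
        return next((d for d, v in enumerate(series) if v >= level), None)

    confirmed_day = first_day(confirmed)
    recovered_day = first_day(recovered)
    if confirmed_day is None or recovered_day is None:
        return None
    return recovered_day - confirmed_day

# Placeholder cumulative counts, one value per day (not real data).
confirmed = [0, 100, 200, 300, 400, 400, 400]
recovered = [0, 0, 0, 100, 200, 300, 400]

wip_day3 = work_in_progress(confirmed, recovered, day=3)  # 300 - 100
ct_200 = cycle_time(confirmed, recovered, level=200)      # day 4 - day 2
print(wip_day3, ct_200)
```

When recoveries have not yet reached a given level, the function returns None, which corresponds to the "undetermined" cycle time case.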

The last chart compares Italy and Iran. This data pretty much makes the case that Iran is way understating their infections and/or overstating their recoveries. The chart tells us that right now, the cycle time for both nations is undetermined (we don’t have enough recoveries yet). But it also shows that the number of active cases in Italy is 2-3 times that of Iran. That’s just not credible, considering that their case loads started at the same time, Iran has 20 million more people than Italy, and Iran has only about 40% as many hospital beds per 1000 people as Italy. Italians do smoke 36% more cigarettes per year per capita than Iranians (but almost no women smoke in Iran, vs 19% of women in Italy).

Finally, from the China chart (if we can believe it), we can see that the cycle time to clear the patient load is consistently about 20 days. The Singapore chart might give us some concern, though, as it appears that they closed their WIP to just a handful of cases, then it opened back up significantly. Perhaps more data will help us understand this better.

COVID-19 Time Series Analysis – 3/18/2020

One of the things I’m seeing a lot of discussion around is “flattening the curve” and the effects of social isolation. I’m very curious as to why some countries/regions have seen very different growth patterns in confirmed COVID-19 cases. One red herring I’ve seen a lot is the comparison of the first few days/weeks of the Italy outbreak with the first few days/weeks of the US outbreak. The reason this isn’t an honest comparison, of course, is that with an exponential curve one cannot know up front what the value of the exponent is or whether that power curve pattern will continue.
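
To illustrate how sensitive these curves are to the growth rate, here is a small sketch with two hypothetical daily growth factors (1.25 and 1.35, chosen only for illustration and not estimated from any country's data).

```python
def project(initial_cases, daily_growth, days):
    """Cases after `days` of constant multiplicative daily growth."""
    return initial_cases * daily_growth ** days

# Two hypothetical outbreaks that both start at 100 cases.
week_ratio = project(100, 1.35, 7) / project(100, 1.25, 7)
month_ratio = project(100, 1.35, 30) / project(100, 1.25, 30)
print(round(week_ratio, 2), round(month_ratio, 2))
```

The two outbreaks differ by less than a factor of 2 after a week but by roughly a factor of 10 after 30 days, which is why matching the first week or two of two curves says little about where they end up.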

Here’s some time series analysis of the outbreaks in the top regions, first by the world then by US states.

It’s very interesting to see the differences. Assuming the data collection is accurate, it would appear that China and S. Korea have inverted their curves (looks more like a sigmoid curve now) — but for how long? Italy, Spain, and France still seem to be in their initial power curve phase, whereas Iran seems to be increasing linearly (yet another sign that their data is bad, most likely).

In the US chart I also included the Diamond Princess as a point of interest. Note how in a very small, contained sample the number of cases went flat quickly. This may point to the great value of the extreme social isolation that some countries have imposed (from reports I’ve read, Italy seems to have failed this step).

Due to the small size of the images embedded in this WordPress blog, I’m also starting to add PDF versions afterwards that can be downloaded in case you want to see higher resolution.

Time Series analysis for Top Outbreaks in the World. Data from JHU COVID-19 project.
Time Series analysis for US States AND the Diamond Princess cruise liner. Note how the Diamond Princess quickly went flat.

COVID-19 Updates – 3/18/2020

Most European countries are updating their data while we’re sleeping in Arizona. I have re-run analytics and added new ones. Things continue to get worse in Italy, Spain, and Iran. The total number of deaths in Italy will catch up to China’s today or tomorrow, and Spain is catching up to Iran. Numbers of new active cases in New York state skyrocketed yesterday. New York and Washington were essentially tied yesterday. Perhaps this reflects that Washington is a week or two ahead of New York and maybe its new active cases are slowing down.


World Map showing the areas with the highest numbers of active COVID-19 cases along with the growth in active cases in the last 24 hrs.
US Map showing the areas with the highest numbers of active COVID-19 cases (color – Red is worst) along with the growth in active cases in the last 24 hrs (bubble diameter)
Bar Chart that shows current Active cases and deaths in last 24 hrs.


Bar chart showing New active cases from the last 24 hours by state. New York and Washington were essentially tied yesterday…