COVID-19 Daily Update: 4/16/2020

Today I’ll share a few different views into how the outbreak is manifesting in different regions.

  1. Raw Numbers of Deaths: This is what gets the headlines, but 1000 deaths in the USA is much less severe than 500 deaths in a much smaller country. Regardless, it is a number we intrinsically understand, so we keep being bombarded by it. Normally I show deaths per 1000 population these days, but the following graph is just cumulative deaths across a number of countries. I put it up here to demonstrate what the trends are.
Cumulative deaths per country (US, China, and Iran excluded)

In the above, we see the rate of deaths per day decreasing in a good number of the hardest-hit regions. Spain and Italy’s death rates have been decreasing for about a week. Note that of the four most affected countries, though, two (France and the UK) have death rates that are steadily increasing. At one point, it looked like France would join Italy and Spain in decelerating its death rate, but in the last few days we’ve seen a new spike. The next grouping of countries (Belgium, Germany, Netherlands) has a much lower rate than the top four. These countries have seen similar numbers of cases to the top four, but have managed to keep the death count lower. The third grouping of countries (Brazil, Turkey, Switzerland, Sweden, Portugal) is a mix. Brazil has joined this group recently and is seeing growth in numbers. Turkey has been here for a while and has kept its death rate low, though it is steadily increasing. Switzerland has had even more success in keeping its death rate low while still managing an equivalent number of cases to its neighbors. Sweden has moved up into this group recently, with its famous “no-distancing” approach possibly being a contributor. As you can see, different countries are being affected differently by this outbreak, particularly in the number of deaths, and it will be interesting to evaluate what factors contributed once this has passed and the data improves.

2. Confirmed cases: I think we all know by this point that the confirmed cases metric is a bit inconsistent. However, I assume it’s showing us something of interest; we just need to figure out what it is. One thing I’m assuming from doing a little research is that in most countries, cases get confirmed through a similar process. First, a person gets COVID-19 symptoms, then they beg someone for a test, then they either get sent home to quarantine or they get sent into a hospital (Iceland’s the only country I’ve heard of that probably follows a different process since they’re systematically testing non-symptomatic people). In this process, there’s one common denominator: COVID-19 symptoms. So the confirmed case metric might be a proxy for the number of symptomatic people in a country. It’s probably not a good measure for hospitalized people in a country (unless that country is China and wants to keep its numbers low). In most cases, it’s hard to come by the percentage of confirmed cases that end up in the hospital, so we can’t even calculate that interesting metric. The table below shows the current state of the world, sorted by Confirmed Cases per 1000 people. You can see lots of interesting things in this table. It makes me ask a number of questions… Why are the outcomes so different for Portugal and Spain? Portugal’s numbers are very similar to Germany’s. And what can explain the differences in the numbers between Italy and Germany? Looking at Israel, they have some of the lowest death numbers in the world. I hear their armed forces are playing a part. How is this working? Why do Iran and Turkey have such different numbers? And so on…


3. We’re starting to see case growth in southern latitudes. The chart below only looks at how the rates of cases and deaths are growing, so it can change more quickly than the overall numbers of cases and deaths. These rates can tell us where the current hotspots are. I’ll be posting this chart periodically so we can watch how COVID-19 spreads (or fails to spread) across the world. Of interest here is the rate of case growth at the far left. This largely represents New Zealand and Australia and might be showing that conditions are starting to be more supportive of the virus in this region. The latitudes to the right continue to show the same kinds of growth. The actual data for this chart is below. Remembering that the graph below is showing rates of change, note that the deaths per 1000 people for latitudes 40-50 are still higher than in any other region (although it looks from the chart below like other latitude ranges might be growing faster).

rates of change for deaths and cases across latitudes.

COVID-19 Special Update – World Data Combined with US Data

Since JHU started separating COVID-19 data into world and US categories, I have mostly been showing the data separately. Now with the US cases emerging as the worst in the world, I’m showing them combined to give people a sense of what is happening.

World Sorted by Deaths per 1000 population

World+US COVID-19 Numbers combined

Above is the data sorted into the categories that I think are the most informative. These are the ones I’ve been showing for a while. What we see here when sorted by Deaths per 1000 population is that the worst-hit US States are at the top of the list. We also see some of the European countries that had previously topped this list moving their way down. Of course, as a death is kind of final, the only way to move down is for someone to pass you up. Note that Sweden, which is famously not really doing social distancing, is moving up the list with a fairly high rate of change in the deaths category.

World Data sorted by the Highest Death Rates.

World+US COVID-19 Data sorted by the Death Rate

The above is sorted by the slope of the Deaths per 1000 population curve (IROC_d_n), so it represents the areas where the death rate is currently the highest. Note that this number can change from day to day, so more than the Deaths per 1000 table at the top, this represents today’s status (vs. deaths that happened a week ago). In this table we can see that countries like Spain have slowing death rates. They still reported 300 Spanish deaths yesterday, but the rate is slowing. New York’s rate of change for deaths is about 4.5x greater than Spain’s right now. Since these deaths are normalized by population, this is a legitimate comparison. Also note that the change in the slope (dIROC_d_n) shows that New York and Belgium’s death rate is increasing. This means that their death rates are accelerating more than others. New Jersey is showing a much lower rate of acceleration despite having one of the largest rates. What this shows us is that the situations which create these relationships are very different across different localities.
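As a rough sketch of how a slope column like IROC_d_n and its change (dIROC_d_n) could be derived, the snippet below takes one-day differences of cumulative deaths per 1000 people. The function names, the one-day window, and the sample numbers are all assumptions for illustration; the tables above may well use a longer or smoothed window.

```python
def differences(series):
    """Day-over-day differences of a series (a simple slope estimate)."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def rate_and_acceleration(cumulative_deaths, population):
    """Return (latest slope, latest change in slope) of deaths per 1000 people.

    The slope plays the role of IROC_d_n and the change in slope the role of
    dIROC_d_n; the one-day differencing is an assumption, not the blog's method.
    """
    per_1000 = [deaths / population * 1000 for deaths in cumulative_deaths]
    rate = differences(per_1000)   # deaths per 1000 per day
    accel = differences(rate)      # change in that daily rate
    return rate[-1], accel[-1]


# Placeholder values for a hypothetical region, not real data.
cumulative = [100, 150, 220, 310]
population = 1_000_000

rate, accel = rate_and_acceleration(cumulative, population)
print(f"rate={rate:.3f} per 1000 per day, acceleration={accel:.3f}")
```

A positive second value means the death rate is still accelerating; a value near zero means deaths are growing roughly linearly.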



New Active Cases and Deaths that Occurred Yesterday worldwide.

COVID-19 Daily Update: 4/14/2020

Largest numbers of Active Cases and Deaths, Normalized by Population – Non-USA

Not much has changed in a few weeks in the chart above. Italy and Spain continue to have large numbers of active cases, but Switzerland has fewer than 1/10 of the total deaths of Spain and Italy (1/3 of the deaths of those two when normalized by population). The UK has a growing number of deaths, but when normalized, the UK numbers are in the Switzerland camp, not the Italy camp.

Iceland continues to recover. Their cumulative flow diagram below shows that they’re managing to maintain a consistent number of active cases with still extremely low numbers of hospitalizations and deaths.

Iceland: Cumulative Flow Diagram of Confirmed Cases, Recoveries, and Deaths

Russia is also experiencing rapid growth in cases. Lots of concern is being expressed by Vladimir Putin. This is very different from the articles from a few weeks ago, when Russia was apparently able to keep its growing case numbers under wraps. See the exponential case growth in the chart below.

COVID-19 Daily Update: 4/13/2020

Map representing numbers of COVID-19 cases (color) and Case Growth rates (diameter)
Map representing numbers of COVID-19 cases (color) and Death rates (diameter)

In the above two maps, you first can see where the cases are growing fastest and then second where the death rate is increasing fastest. Obviously, case growth is occurring across the US. This is unsurprising, especially since there is more testing happening now. What we don’t really know from our data is whether these cases are symptomatic, whether they’re hospitalized, etc. The second map shows us that the deaths continue to happen in the same cluster areas, NYC/Mass/NJ/Conn, DC, New Orleans, Detroit, Chicago, Denver, Las Vegas, and Seattle. The majority of these are occurring in the NYC cluster.

State Data Table for 4/12/20

State data from 4/12

Latitudes of Cases / Deaths for US States

Cases and Deaths per 1000 by Latitude Ranges in the US

Just like with the rest of the world, the US also seems to be following the Latitude effect. Most of the cases/deaths to date have occurred between 40 and 45 degrees North. I’m currently evaluating if the fastest case/death rate growth also follows this latitude trend or not.
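For readers who want to try the latitude grouping themselves, here is a minimal sketch of binning regions into 5-degree latitude bands and normalizing by population. The field names, the band width, and the three sample regions are placeholders invented for illustration; they are not the data behind the chart.

```python
from collections import defaultdict


def latitude_band(latitude, width=5):
    """Return the (lower, upper) bounds of the band containing a latitude."""
    lower = int(latitude // width) * width
    return (lower, lower + width)


def per_1000_by_band(regions, width=5):
    """Sum cases, deaths, and population per band, then normalize per 1000."""
    totals = defaultdict(lambda: {"cases": 0, "deaths": 0, "population": 0})
    for region in regions:
        band = latitude_band(region["lat"], width)
        for key in ("cases", "deaths", "population"):
            totals[band][key] += region[key]
    return {
        band: {
            "cases_per_1000": t["cases"] / t["population"] * 1000,
            "deaths_per_1000": t["deaths"] / t["population"] * 1000,
        }
        for band, t in sorted(totals.items())
    }


# Made-up placeholder regions, not real state data.
regions = [
    {"name": "Region A", "lat": 42.3, "cases": 2000, "deaths": 100, "population": 1_000_000},
    {"name": "Region B", "lat": 41.0, "cases": 1000, "deaths": 50, "population": 500_000},
    {"name": "Region C", "lat": 33.5, "cases": 300, "deaths": 6, "population": 3_000_000},
]

bands = per_1000_by_band(regions)
for band, stats in bands.items():
    print(band, stats)
```

Summing populations inside each band before dividing keeps a band with many people from looking worse simply because it has more residents.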

COVID-19 Update – 4/11/20 State Data plus Arizona Cases Flattening?



Arizona COVID-19 Case Rates 4/11/20

The above image shows the ‘S-shaped’ sigmoid curve that we’ve been hoping to see. It may indicate that the first phase of the outbreak is slowing. As you can see, the number of cases in Arizona is decelerating. Again, I hesitate to make any pronouncements, but I’ve been watching this trend for about a week. It could be that the state has been under-reporting data or some other factor could emerge. Or it might be that the infections have peaked for now and are slowing.

Across the US

Nothing much is changing for the heaviest-hit states. The slope of case rates and death rates continues to increase for these states. Vermont was showing signs of flattening out last week and has now dropped out of the top 10. http://todnewman.com/?p=416

Detailed US State data for 4/11/20


COVID-19 Special Report: Analyzing Claims of Inflated US COVID deaths – 4/10/2020

In the last few days I’ve been hearing that the death numbers in the US were not reliable due to inflation of the numbers in the hospital. On the surface, this seems like a real possibility. The allegation is that if someone dies of Pneumonia but also has tested positive for COVID-19, the death is classified as COVID-19, even if Pneumonia might be more accurate. Evidence for this points to the Pneumonia numbers for March being lower than normal for the month. So this is concerning, because we know that the numbers of infections that are recorded are not very consistent and likely are just a fraction of the people truly infected. This, of course, is largely due to limited testing and oversampling of the symptomatic. But the death data has told the best story of the severity of the outbreak in different countries. So I decided to evaluate this myself to see if this is a realistic claim.

Unfortunately, there aren’t really clean, accessible datasets classifying deaths by county in the US across the year that a person who is doing COVID-19 research at night for fun can realistically engage with. So I had to come up with something that was close. First, I have a dataset from the Census that shows estimated populations per year as well as the numbers of deaths (and live births, and a few other things). This is done by county, which is what I was looking for. I blended this dataset into my COVID-19 death data by county and then was close.

What is missing from this is the breakdown of the causes of these deaths. This isn’t easily found by county in one dataset, so I assumed that the leading causes of death each year are going to be fairly consistent across the US. There may be differences in some categories (shooting deaths, farm machinery accidents, etc.), but in general the top 3-4 should be consistent. I figured I’d start by evaluating the leading causes of death in the most highly COVID-impacted county in the world, New York County (NYC).

2016 New York County Deaths by Leading Cause (https://apps.health.ny.gov/public/tabvis/PHIG_Public/lcd/reports/#county)

From the above you can see that NY County had just under 10K deaths in 2016 (about 830 deaths per month). My Census dataset confirms this number, but shows that 2018 saw over 12K deaths. Not sure what the difference was. So I’m going with the 2016 numbers for my base number of deaths. Thinking about which of these causes might be comorbidities with COVID-19, I see CLRD (Chronic Lower Respiratory Disease) as a very likely candidate, and 2016 saw 304 deaths (just over 25 per month). Stroke seems unlikely, as the symptoms of that condition are very different from COVID-19. Heart disease is obviously the big hitter, with 2,902 deaths in 2016 (about 240 per month). Maybe heart disease deaths might be conflated with COVID-19, but it doesn’t seem very likely. Cancer at 2,526 deaths per year (210 per month) seems very unlikely to be classified as COVID-19… cancer patients often have their own hospitals, doctors, and wards, and generally are well understood as cancer for long periods of time. I’m planning to rule the cancer deaths completely out. Pneumonia is 7th on this list (not shown above) with 263 deaths in 2016 (about 22 per month). So let’s say that in 2016 there were 25+240+22 deaths per month that could have potentially been comorbidities with COVID-19 during the 2020 outbreak. This comes to 287 deaths per month, which is about 35% of the number of deaths per month in NY County. I will then assume that the exaggeration in the COVID-19 death count will be limited to 35% higher than the true count. We’ll evaluate the raw number of deaths in each county that 35% of the total would equal and then compare to the reported 2020 COVID-19 deaths. Make sense?
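The arithmetic in the paragraph above can be checked in a few lines. The annual and monthly figures come straight from the text; the variable names are only for illustration.

```python
# 2016 New York County deaths for the three candidate comorbidities (from the text).
annual_deaths_2016 = {"CLRD": 304, "heart disease": 2902, "pneumonia": 263}

# Rounded monthly figures as quoted in the text.
rounded_monthly = {"CLRD": 25, "heart disease": 240, "pneumonia": 22}

# All-cause deaths per month in the county (just under 10K per year).
all_cause_monthly = 830

# Unrounded monthly values, to show how close the rounding is.
for cause, annual in annual_deaths_2016.items():
    print(f"{cause}: {annual / 12:.1f} per month (text uses {rounded_monthly[cause]})")

candidate_monthly = sum(rounded_monthly.values())
share = candidate_monthly / all_cause_monthly

print(f"candidate comorbidity deaths per month: {candidate_monthly}")
print(f"share of all monthly deaths: {share:.1%}")
```

The share is the ceiling used in the rest of the analysis: the fraction of a normal month's deaths that could plausibly be relabeled as COVID-19.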

Most Conservative Estimate of Potential comorbidities of COVID-19. Assumptions: 1) COVID-19 total deaths happen over 2 months, so calculating 2 months of comorbidities 2) 100% of potential comorbidities are positive for COVID-19. 3) 100% of heart disease deaths would get classified as COVID-19.

Above we see the most conservative case comparing COVID-19 reported deaths by county with the potential comorbidity numbers. The assumptions for this uber-conservative chart are that 1) these reported COVID-19 deaths occurred over 2 full months, so we’ll calculate 2 months’ worth of comorbidities, 2) 100% of potential comorbidity deaths are positive for COVID-19 and are classified as COVID-19 deaths, and 3) 100% of heart disease deaths in these counties are classified as COVID-19. What do we see? In the hardest-hit county, there were 8x as many COVID-19 deaths reported in this timeframe as the sum of the potential comorbidities. So if the New York County numbers are getting inflated by questionable death accounting practices, it is only by about 700 deaths out of a total of 5820. By the way, this 5820 is about half the total deaths that New York County should expect for a normal year! See the other counties here where the COVID-19 deaths under these rigid assumptions are at least half the number of the sum of all the other potential comorbidities, including heart disease. What this is telling us is that COVID-19 has already replaced heart disease as the leading cause of death in these counties for the whole year of 2020.

Slightly Conservative Estimate of Potential comorbidities of COVID-19. Assumptions: 1) COVID-19 total deaths happen over 1.5 months, so calculating 1.5 months of comorbidities 2) 30% of potential comorbidities are positive for COVID-19. 3) 100% of heart disease deaths would get classified as a COVID-19 comorbidity if they are positive for COVID-19.

In the above table, I’ve eased the assumptions a bit to be just slightly conservative. First, I change the 2 months to 1.5, which is much more accurate for most counties. Second, and most importantly, I only assume that 30% of deaths from comorbidities during this timeframe are positive for COVID-19 (the actual number in New York state right now is still under 1% of the population, but we’ll assume that this number is 30x higher in the most susceptible populations). Now see the differences! New York County has the potential to only inflate their COVID-19 deaths by 143 on an overall number of 5820. This translates to an error of 2.4%. And most likely that’s high too.
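To put the two scenarios side by side, this short sketch computes what share of New York County's reported deaths could be misclassified in each case. It only uses the three figures quoted in the text (5820 reported, about 700 in the most conservative case, 143 in the slightly conservative case); the function name is hypothetical.

```python
def inflation_share(potential_misclassified, reported_covid_deaths):
    """Upper bound on the fraction of reported deaths that could be misclassified."""
    return potential_misclassified / reported_covid_deaths


REPORTED = 5820  # reported COVID-19 deaths for New York County, from the text

most_conservative = inflation_share(700, REPORTED)      # about 0.12
slightly_conservative = inflation_share(143, REPORTED)  # about 0.025
reported_to_potential = REPORTED / 700                  # a bit over 8

print(f"most conservative: {most_conservative:.1%}")
print(f"slightly conservative: {slightly_conservative:.1%}")
print(f"reported vs. potential comorbidities: {reported_to_potential:.1f}x")
```

Even the most conservative scenario leaves the large majority of reported deaths unexplained by relabeling.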

Conclusion: There is no concern over miscalculation of COVID-19 deaths due to inflation of the numbers by counting comorbidities. I presume this concept made its way to the mainstream media news due to a desire to tell a good story or share comforting stats. Or maybe it was just a political effort. Regardless, this is why I really dislike listening to the news discuss stats about this outbreak. Across the political spectrum, they’re failing to properly report numbers and statistics. Here are my suggestions if any professional news people are reading this:

  1. Focus more on the data and understanding what its limitations are and less on flashy graphics. I suspect this error may have something to do with the lack of seasoned data scientists and statisticians on the news team combined with a preponderance of less-experienced, recent grads with whiz-bang Tableau visualization skills.
  2. Stop reporting hard numbers of Cases and Deaths if you’re comparing regions. 200 deaths in California is going to be a much less severe situation than 100 deaths in Orleans Parish, Louisiana.
  3. The Death to Cases ratio is garbage. Everyone wants this number because we want to compare it to the flu. We’ve been getting the flu and measuring the number of cases for hundreds of years. We have a good statistical sample and can estimate the number of cases well. We have NO idea how many people have actually been infected by COVID-19 yet. The best numbers we have are from Iceland because they randomly sampled the whole population. Their death to cases ratio, by the way, is 0.4%. This is a bit higher than typical flu numbers, but don’t rejoice quite yet, because Iceland’s testing and quarantine strategy seems to be keeping their death numbers low.
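Suggestion 2 above is easy to demonstrate with a per-1000 normalization. The death counts are the hypothetical ones from that suggestion; the population figures are rounded approximations that I am assuming for illustration (roughly 39.5 million for California and roughly 390,000 for Orleans Parish).

```python
def deaths_per_1000(deaths, population):
    """Normalize a raw death count by population."""
    return deaths / population * 1000


# Approximate populations, assumed for illustration only.
CALIFORNIA_POPULATION = 39_500_000
ORLEANS_PARISH_POPULATION = 390_000

california = deaths_per_1000(200, CALIFORNIA_POPULATION)
orleans = deaths_per_1000(100, ORLEANS_PARISH_POPULATION)

print(f"California: {california:.4f} deaths per 1000")
print(f"Orleans Parish: {orleans:.4f} deaths per 1000")
print(f"Orleans Parish is about {orleans / california:.0f}x higher per capita")
```

Half the raw deaths works out to a far more severe per-capita situation, which is why the raw counts mislead when regions of very different size are compared.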

COVID-19 Update: 4/9/2020 Today’s US State data plus Cases and Deaths in US States by Latitude Bands

State COVID-19 Data from 4/8/2020

4/8 was another rough day for New York. It’s quite a different story between NYC and Los Angeles County, which has 0.176 cases per 1000 and 0.004 deaths per 1000. This is about 20x lower on cases than NYC and 50x lower on deaths. I’d really love to know why this huge difference exists. There aren’t a lot of clues in the news. I believe that NYC and California both issued shelter in place orders on the same day. Perhaps the virus was loose in NYC for weeks before any kind of action was taken by government, enabling a non-linear transmission effect to occur? If there’s any truth to the effects of latitude that I’ve been uncovering, that might come into play too. See chart below for US States’ Case and Death rates by latitude bands. The results in the US line up with the global results by latitude. Note that the US Population is nearly 2x as large in the 30-40 band, so the narrative can’t be that this effect is simply due to a large number of big cities in the range.

COVID-19 Special Update: Correlation Study Latest Numbers – 4/8/20

Correlation with Rate of Case Growth – 4/8/20

Mechanics of Building a Correlation Matrix: In case this explanation is interesting or informative to anyone puzzling over these results, to get the above correlation relationship between various features and the rate of growth of COVID-19 cases, I built a large dataset using data from Johns Hopkins (COVID-19 data), the WHO, the World Bank, and a handful of others. In this dataset, I have each country in the world captured as rows in the dataset. Each of the Features above (plus many more) is one of the columns that goes across all of the countries. This is the basic mechanics of putting together a large correlation matrix.

What does this tell us?: First off, the above table just simply lists selected features (‘Female Smoking Rate’, etc.) and their correlation using the Python Pandas correlation function. 1.0 is perfect correlation. As these are the correlations with the feature ‘Instantaneous Rate of Change’, you can see that the correlation of ‘inst_rate_of_change’ is 1.0. It is perfectly correlated with itself. I have eliminated many features with low correlation (meaning 0, not -1) just to make this more readable. This, of course, is because if correlation is close to zero, there’s likely little information about the target (Instantaneous Rate of Change of Confirmed Cases – i.e., today’s Case Growth Rate). However, if the number is between 0.2 and 0.8, I find from years of doing this that there’s enough dependence between the target and the feature to make the case that they are related in an interesting way. Statisticians like to say (probably too often), “Correlation does not Imply Causality” — which is true — but this does not mean that correlation is not valuable as the basis for hypothesis tests for causality. That’s what we’re trying to do here… find environmental factors that might be influencing the different Case Growth Rates across the world.
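The mechanics described above can be sketched without any libraries: one column per feature, one value per country, and a Pearson correlation of every column against the target column. This is a simplified stand-in, not the actual study code; pandas' `DataFrame.corr()` uses the same Pearson coefficient by default. The feature names other than `inst_rate_of_change`, the 0.2 cutoff as a filter, and all of the numbers are placeholders.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlations_with_target(table, target, min_abs=0.2):
    """Correlate every column with the target, dropping near-zero results."""
    kept = {}
    for feature, values in table.items():
        r = pearson(values, table[target])
        if abs(r) >= min_abs:
            kept[feature] = r
    return dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))


# Placeholder table: each list holds one value per (hypothetical) country.
table = {
    "inst_rate_of_change": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature_a": [2.0, 4.1, 5.9, 8.2, 9.8],   # rises with the target
    "feature_b": [5.0, 3.9, 3.1, 2.0, 1.2],   # falls as the target rises
    "feature_c": [1.0, -1.0, 0.0, -1.0, 1.0], # unrelated to the target
}

result = correlations_with_target(table, "inst_rate_of_change")
for feature, r in result.items():
    print(f"{feature}: {r:+.2f}")
```

The target correlates perfectly with itself, the unrelated column is filtered out, and strongly negative correlations are kept because they carry as much information as strongly positive ones.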

Is there Anything New Here? Yes, the correlations continue to change as the Case Growth Rates change across the world. By definition, I’m correlating these factors with the current day’s instantaneous slope, so the correlations should continue to change. What we’ve been seeing lately is that as the slopes continue to increase across the world, the Female Smoking Rate continues to increase in its correlation with the target. I think what this indicates is that the countries with the most severe slopes (Italy, New York, Spain) are probably being hit harder by women who smoke having a higher likelihood of contracting a measurable COVID-19 case. I use the word measurable intentionally here, because these rates are probably driven by countries that are only measuring cases where people have symptoms and require some sort of care. This makes this correlation probably more like a correlation with symptomatic case rates. A subtle point, maybe. One other factor that continues to increase is the negative correlation between case growth and rates of Tuberculosis in a country. This tells us that countries with lots of TB cases have slower COVID-19 case growth rates. This was mildly puzzling to me until 2 days ago when I learned of a study showing that a TB vaccine called BCG may have anti-COVID properties (I’m summarizing broadly. Here’s the link). So that’s pretty exciting to see… even this simplistic approach may have revealed something using Data Science that was not widely known.

Correlation with Rate of Deaths – 4/8/2020

Above is the correlation of the same factors as above with the Rate of Deaths from COVID-19. Note that some of the features that are highly correlated with the Rate of Contracting the Disease are less correlated with the Rate of Deaths from the Disease. This is probably not counter-intuitive. What might be counter-intuitive is that comorbidities like Diabetes rates in a country are negatively correlated with the COVID Death Rates. All I can decide is that it might take reframing the reference point. We’re aware that diabetes, high blood pressure, etc., are contributing strongly to the deaths of individuals who are infected with COVID-19. However, this study is about countries that have high rates of Diabetes, High Blood Pressure, or Air Pollution and the correlation of those factors with the Death Rate. Therefore, it is possible that a country with high rates of Diabetes, for instance, has fewer people who survive that disease long enough to be affected by COVID-19. Perhaps this is a sign that the advanced health care in some countries might be contributing to the numbers of deaths, largely because susceptible people are living longer in those countries? Or perhaps this is just measuring the fact that countries with high rates of diabetes or pollution have yet to be hit by COVID-19? Time will tell.

COVID-19 Update: 4/8/2020 State data plus Updates on Iceland

State COVID-19 Data from 4/7/20

I’m posting this table most every day now so people can see the changes in the numbers from day to day. Only four states had over 100 deaths yesterday and most states are seeing their case and death rates taper off a bit. European countries are also seeing case growth slow.

Iceland Update

Cumulative Flow Diagram for Iceland, 4/7/20 data

The chart above is similar to the ones I posted yesterday for Germany and others. The recovery data is being reported again and looks very reasonable. Looking at this as a cumulative flow diagram, we can see that Iceland is maintaining a 14-16 day cycle time for clearing new cases. This is obviously being largely driven by their scientific sampling techniques where they’re getting people who are sick tested and into quarantine right away. In nearly every case in Iceland, the recovery is occurring after the 14 days of isolation/quarantine. Looking at the stats below (from Iceland’s COVID-19 portal) this shows a 2.4% hospitalization rate with only 1/3 of those hospitalized needing to go into the ICU. Interestingly, over half of their cases are diagnosed while in quarantine. There were reports from a few days ago that I haven’t seen the raw data on that indicated that only 1% of those tested were coming back positive and that 50% of the confirmed infections were asymptomatic. This doesn’t quite make sense based on the data below, so more study may be needed. Still, what Iceland represents is a society that understands how COVID-19 is truly spreading and who is able to take steps more quickly than any other country to respond to an infection.

Summary: When thinking about what the real infection rates, hospitalization rates, and death rates might be, Iceland provides one of the only scientific answers. Keep in mind that Iceland’s numbers might not reflect the numbers from other countries completely due to the fact that they’re at a high latitude where most countries are reporting lower numbers. But the fact that they have over three times the number of active cases per 1000 people that the US currently has (4.4 per 1000 compared to 1.2 per 1000) while having nearly no deaths is very interesting.

COVID-19 Update: 4/7/2020 – State Data plus Visualizations of Recovery (Cumulative Flow)

State Data Table – 4/6/2020

I’ve been posting the above data pretty much each day because it tells a pretty solid story of what is happening in the US. Trends toward lower case rates and death rates continue from about a week ago when these trends started to become noticeable. Now even New Jersey is showing a slowdown in their death rate (note that their acceleration over the last 3 days is negative right now). Many of the non-NY area states are showing death rates that are essentially linear. Hoping this is a long-term trend and not just a temporary one.

Recovery Data

A while back I showed what a Cumulative Flow diagram looked like in the manufacturing world. This diagram can also be generated using Confirmed Case data and Recovery data. For a while JHU dropped their recovery data when (I think) they figured out that some countries were gaming that data to make their societies look better (Iran, China, Vietnam…). But since Europe is entering into the recovery portion of their initial outbreak, I thought I’d show a couple of European countries that are doing well (and whose data is trustworthy).

Cumulative Flow Diagram – Switzerland

Switzerland has been a curious case for a while as it is sandwiched in between countries with high death rates but managed to keep its own death rates down. Here we see the raw numbers for their confirmed cases, recoveries, and deaths in one diagram. Things to note:

  1. When viewed on the same scale as cases, deaths are very small. Keep this in mind should you be tempted to panic. The vast, vast majority of cases do NOT end in death.
  2. The horizontal distance between the orange confirmed cases line and the green recoveries line at any point is the cycle time for the outbreak. So go on the y-axis to the 5000 case mark and go over to the right on the diagram until you hit the orange line. You can see that the 5000th case happened on 3/20/2020. Now continue on the same horizontal line to the green line. The 5000th recovery happened on 4/3/2020. This shows that the cycle time to “clear” 5000 cases is 14 days. You may remember that early on we were seeing (from China and Singapore data) a cycle time of about 22 days. I wonder if we’re seeing better cycle times in Switzerland due to advanced medical care?
  3. The vertical distance between the lines is the number of active cases at any point. So do the same exercise that you did in #2 except this time go to 4/3/2020 on the x-axis. Trace up to the orange line. On that day you can see there had been 20K confirmed cases in Switzerland. Now go back down to the green line. There were 5K Recoveries that had occurred by 4/3. That means there were 15K active COVID-19 cases in Switzerland on 4/3. In the manufacturing world, we call that the “Work in Progress” or WIP. This is an indicator of how much work (in this case, active COVID-19 cases) is in the pipeline.
  4. I suspect we will see the cycle time shrink a bit over the duration of this outbreak as hospitals become more productive. I also expect that we’ll see the active cases at any time shrink down too, because the virus will run out of easy targets. Both of these are good things to watch for, because they’ll indicate that we humans have asserted a bit more control over the situation.
  5. Note that the slope of the confirmed case line is decreasing. This is happening across Europe. Maybe this is a sign that the worst portion of their outbreak is behind them.
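The two chart-reading exercises in the list above (horizontal distance for cycle time, vertical distance for active cases) can be written down as a small sketch. It uses only the points read off the Switzerland diagram in the text; the function names are hypothetical, and active cases are computed as confirmed minus recovered, the same way the list does (deaths could be subtracted as well).

```python
from datetime import date


def first_date_reaching(series, threshold):
    """First date on which a cumulative series reaches the threshold."""
    for day, count in series:
        if count >= threshold:
            return day
    return None


def cycle_time_days(confirmed, recovered, threshold):
    """Horizontal distance on the CFD: days from Nth case to Nth recovery."""
    start = first_date_reaching(confirmed, threshold)
    end = first_date_reaching(recovered, threshold)
    if start is None or end is None:
        return None
    return (end - start).days


def work_in_progress(confirmed_total, recovered_total):
    """Vertical distance on the CFD: active cases on a given day."""
    return confirmed_total - recovered_total


# Points read off the Switzerland diagram in the text.
confirmed = [(date(2020, 3, 20), 5000), (date(2020, 4, 3), 20000)]
recovered = [(date(2020, 4, 3), 5000)]

cycle = cycle_time_days(confirmed, recovered, 5000)
wip = work_in_progress(20000, 5000)

print(f"cycle time to clear 5000 cases: {cycle} days")
print(f"active cases on 4/3: {wip}")
```

With full daily series in place of these few points, the same two functions would let the cycle time and active case count be tracked over the course of the outbreak.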

Here’s one more CFD. This time for Germany. Note the same trends.