Why Revisit This?
First, because I’m measuring the correlation between these factors across the world and the current Death Rate, the results change every single day. In my previous post on this, we examined the relationships between these factors and the death rate at that time. When that was done, Italy, San Marino, and Spain had the highest death rates (number of deaths per day) of any countries. What we saw then was a much higher correlation between the death rate and Female Smoking (about 10 points higher than now). The correlation between the number of citizens over 65 and the death rate was also much higher then. This can be explained by looking at the countries that had the highest death rates at that time and realizing that they had very different demographics from the places leading the list now. For instance, the state of New York has about 16% of its population over age 65, whereas Italy has 23% over that age. Therefore, there was a stronger relationship at that time between the over-65 factor and the death rate. This is an example (a light one at least) of correlation that may not be causation. The fact that Italy had more people over 65 per capita did not necessarily result in those extra deaths (although it could have been causal), just as the fact that New York has fewer people over 65 than Italy doesn’t mean that its death rates are any smaller now. It’s just a less correlated factor now because the peak of the outbreak is in NY.
What do we learn now?
As I stated, Age over 65 is still correlated with the death rate, just not as strongly as before. The same thing applies to Female Smoking. A few weeks ago, countries with higher Female Smoking rates also had higher Death Rates. I postulated at the time that Male Smoking rates have much less variation between countries and were therefore less of a factor in the potential causality of extra deaths. The correlation between the number of nurses per 1000 people and the death rate has increased a bit, which still seems counter-intuitive. This may just be correlation without causation because the outbreak is currently peaking in countries with more nurses. If there is causation, I can’t imagine why it would be so. Male mean body mass index has remained more highly correlated than most factors and has stayed at about .15 for the last month. This may indicate that countries with higher BMIs for men are more likely to be experiencing COVID-19 deaths, in line with Obesity being a widely reported co-morbidity. This is also consistent with the numbers from around the world showing that a slightly to greatly higher percentage of COVID-19 deaths are among men. It might indicate that females with a high BMI are surviving but men with a high BMI are not (since female mean BMI is less correlated with death rates). Density remains correlated with deaths, as one might expect. The way the disease spreads suggests that areas of high population density might be more likely to see a higher death rate. We know NYC is a very dense area (see table below), and it stands to reason that this density is correlated with the high death rates there. New York City has 8x the density of the next highest county (Nassau County), and currently has more than 10x the total deaths and 14x the death rate.
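For readers who want to try this kind of check themselves, here is a minimal sketch of how a per-factor correlation with the death rate could be computed. It is not the exact code behind my numbers. The function names (`pearson`, `rank_factors`), the factor names, and every value in the sample data are assumptions made up for illustration; they are not real measurements.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

def rank_factors(factors, death_rate):
    """Return (name, r) pairs sorted from most positive to most negative r."""
    scores = [(name, pearson(values, death_rate)) for name, values in factors.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Made-up placeholder values for five hypothetical regions, NOT real data.
death_rate = [1.0, 2.0, 3.0, 4.0, 5.0]
factors = {
    "pct_over_65": [10.0, 12.0, 15.0, 18.0, 23.0],
    "mean_temperature": [30.0, 25.0, 22.0, 15.0, 10.0],
    "population_density": [5.0, 1.0, 4.0, 2.0, 6.0],
}

for name, r in rank_factors(factors, death_rate):
    print(f"{name:20s} r = {r:+.2f}")
```

Sorting by the coefficient puts the positively correlated factors at the top and the negatively correlated ones at the bottom. The coefficient only measures how closely two columns move together, so it says nothing on its own about which way causation runs, if it runs at all.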
Negatively Correlated Factors – What do We Learn from These?
What has stayed the same? Temperature remains negatively correlated with the death rate at about the same level. What this tells me is that either 1) the areas affected both now and a couple of weeks ago were coincidentally at similar relative temperatures or 2) temperature does have some sort of causal effect on the death rates. The second possibility seems to have been borne out in some recent studies. Also, the negative correlation with Tuberculosis deaths has remained constant over the last few weeks. This indicates that countries with higher deaths from TB have seen lower COVID-19 death rates. Again, this might be because countries with higher TB rates have had fewer COVID-19 deaths for other reasons (temperature, malaria, people who died of TB and so could not die of COVID-19, etc.). However, it is interesting that this has been one of the more negatively correlated factors for a while. This indicates to me that perhaps something related to a country’s susceptibility to TB is having a small causal effect on COVID-19 death rates. There are two studies underway to evaluate whether the BCG vaccine for TB offers some protection against COVID-19, but the WHO is cautioning that the evidence is still inconclusive. Diabetes rates are also negatively correlated with COVID-19 death rates. This is surprising, as Diabetes is a known co-morbidity. However, it may suggest that the areas with the highest death rates right now have lower Diabetes rates. Or perhaps the regions getting hit hardest now have histories of excellent health care for patients with diabetes.
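The "what has stayed the same" question comes down to comparing each factor’s correlation coefficient across two snapshots. Below is a rough sketch of that comparison. The helper `classify_change`, the 0.03 tolerance, and all of the coefficients in the example are hypothetical placeholders chosen for illustration, not the values from my analysis.

```python
def classify_change(previous, current, tolerance=0.03):
    """Describe how a correlation coefficient moved between two snapshots."""
    if previous * current < 0:
        return "sign flipped"
    # Compare magnitudes so the labels read the same for negative correlations.
    delta = abs(current) - abs(previous)
    if abs(delta) <= tolerance:
        return "stable"
    return "strengthened" if delta > 0 else "weakened"

# Placeholder coefficients for illustration only, NOT results from the analysis.
earlier = {"temperature": -0.20, "tb_deaths": -0.18, "female_smoking": 0.30}
latest = {"temperature": -0.21, "tb_deaths": -0.18, "female_smoking": 0.20}

for factor in earlier:
    label = classify_change(earlier[factor], latest[factor])
    print(f"{factor:15s} {earlier[factor]:+.2f} -> {latest[factor]:+.2f}  {label}")
```

A factor that stays stable while the list of hardest-hit places changes is a more interesting candidate for follow-up than one whose correlation rises and falls with whichever regions are peaking. The tolerance is a judgment call and should be set with the noise in the underlying data in mind.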
Again, there may not be much to learn from doing these correlations, but in general, it is good practice to evaluate the sensitivity between variables, and especially the sensitivity of target variables that we care about, such as death rates. This exercise shows a number of unsurprising correlations, some of which likely have some element of causality for COVID-19 deaths (but probably not a high degree of causality). It also reveals some surprising correlations that might present opportunities for further research and evaluation. This is sometimes how great breakthroughs are made, because surprising results can give us a better understanding of how likely our prior beliefs about a subject are to be correct. Sometimes (maybe even often) our priors are wrong and go unevaluated until we look at the data holistically. Doing so can help break us out of groupthink that is driven by emotional responses rather than data-driven ones.