Category: COVID-19

  • COVID-19 Special Update – Can Unsupervised Machine Learning Predict Outbreaks?

    Maybe that’s a provocative title, but one of the questions I’m exceptionally curious about is if measurable factors about a locality can be used to predict the locality’s response to a COVID-19 outbreak. I’ve attacked this through a correlation study using features measured by WHO and the World Bank (see LINK here). This project is another attempt to address this question.

    Background

    The Census has a feature online called QuickFacts. This is a really nice tool where you can pull a lot of information about localities in the US (cities, states, counties, etc.). This information covers broad areas of each locality and consists of elements like population, age/race demographics, housing, family/living arrangements, computer/internet access, education, health, economy, transportation, income, business info, and geography/density. As you can see, this amounts to a whole lot of data about specific localities. See image below. The downside of this tool is I haven’t yet found a way to automate the pulling of data, so I had to collect this data on a number of carefully selected counties by hand. My data collection strategy consisted of ensuring I captured data on counties with a wide range of COVID-19 impact as well as counties of different sizes and types. Once I captured a number of counties in the QuickFacts tool I then blended in my data for the Deaths per 1000 population statistic for that county.

    Technique

    Unsupervised Learning is a form of machine learning which allows one to find hidden structure in data when there isn’t a natural label present. I chose this approach to evaluate whether the Census QuickFact data could be used to build a predictive model for COVID-19 impact because it provides a more visual and explainable way of evaluating the predictive model. Also, I can demonstrate results well despite a small dataset. Both of these reasons should hopefully become more evident a few frames down. QuickFacts provides me 65 different data features for each locality, and this is way too much data to evaluate as one would with normal visualization-based analytics. In general, the human brain is wired for three dimensions of data (x, y, and z; also length, width, height). This is why 3D visualizations are easily consumed by humans. Add a few more dimensions of data, however, and it becomes very hard for our brains to see the patterns. To get around this problem and create a model that lends itself well to human visualization, the first step I take in my approach is running an algorithm called Principal Components Analysis. PCA is a technique that in a nutshell can take X features of data and provide the user with n uncorrelated features. In my case, X is 65 and I choose n to be 2, which will allow me to put the data into a 2D plot. This is a very clever trick that was invented by the great statistician Karl Pearson over 100 years ago. The downside is that when I do a plot where the X axis is Principal Component 1 and the Y axis is Principal Component 2, there’s no obvious mapping of the X-Y relationship in my mind because I have no idea what PC1 and PC2 represent other than orthogonal views of my 65 data features. 
    What you have to keep in mind, though, is that even though we can’t explain to our boss what this relationship really means, we DO know that the Principal Component space represents real information and variation on information from all of those 65 features. If you believe me that the location of a datapoint (a county in our case) in PC-space is important, then you can start to understand why this approach is useful. If you look in the diagram below, this is what plotting these 65 features crunched into 2 Principal Components looks like. To make it clearer which of the datapoints are most similar, I also run an algorithm called K-Means, which is a simple unsupervised learning clustering algorithm where I tell it that I believe there will be k clusters (I chose 6 for this example) and it fits the data to that number of clusters. The clusters are identified on the chart below by the large blue numbers. Note that the crude red and green enclosures and the “Heavily Affected” and “Lightly Affected” labels are done by hand after the plot is generated.
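
    The PCA-then-K-Means pipeline described above can be sketched in a few lines. This is only an illustration under stated assumptions: the post does not say which library produced the plot, so the sketch uses plain Python with power iteration standing in for a library PCA routine, and the function names (`standardize`, `pca_2d`, `kmeans`) and the six tiny "county" rows are made up for the example, not real QuickFacts data.

```python
import math
import random

def standardize(rows):
    """Rescale every column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; use 1.0 so it simply becomes all zeros.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def dominant_direction(cov, rng, iters=1000):
    """Power iteration: unit vector along the largest-variance direction."""
    d = len(cov)
    v = [rng.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left to explain
            break
        v = [x / norm for x in w]
    return v

def pca_2d(rows, seed=0):
    """Project rows onto their first two principal components."""
    z = standardize(rows)
    n, d = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    axes = []
    for _ in range(2):
        v = dominant_direction(cov, rng)
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        axes.append(v)
        # Deflate: remove the variance already explained by this axis.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    scores = [tuple(sum(a * x for a, x in zip(axis, r)) for axis in axes)
              for r in z]
    return scores, axes

def kmeans(points, k, seed=0, iters=100):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

# Made-up rows standing in for QuickFacts features (NOT real county data).
counties = [[1, 2, 1, 2, 1], [2, 1, 2, 1, 2], [1, 1, 2, 2, 1],
            [9, 8, 9, 8, 9], [8, 9, 8, 9, 8], [9, 9, 8, 8, 9]]
scores, axes = pca_2d(counties)   # one (PC1, PC2) pair per county
labels = kmeans(scores, k=2)      # the post used 6 clusters on its real data
```

    With real data, each row would hold the 65 QuickFacts features for one county, and the `scores` pairs are what would go on the 2D plot. In practice a library such as scikit-learn (`PCA` and `KMeans`) would normally replace the hand-rolled functions.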

    What the Unsupervised Learning Tells us

    When I run this algorithm and build this plot, I can see a clear boundary between the counties on the left of the diagram and the counties on the right. At this point, I won’t know what that means until I do a further evaluation, which I show below. I dump all my data including cluster ID’s into a table and then blend in the Deaths per 1000 population numbers for these counties.

    Once I sort the data by cluster and apply conditional formatting to the Deaths per 1000 column, I can see a crude trend emerge. In clusters 0, 1, and 4 I see more COVID-19 impact than in 2, 3, and 5. Noting this and returning to the PCA chart, you can see that the more heavily affected clusters are on the left side of the chart and the more lightly affected clusters are to the right.

    Of course there are exceptions and strangeness that I can’t readily explain here… Maricopa County is clustered with two other large cities (Chicago and Seattle), both of which were hard hit. But when I look at that cluster, it’s not exceptionally tight… there is some Principal Component “distance” between all three. I believe this distance is meaningful. Another strange cluster is number 4, which includes a number of lightly hit suburbs outside the Northeast and the worst-hit county in America, New York. This explains perhaps why it is on the same side of the chart with the more heavily-hit clusters, but I have no idea why they’re together. There’s a reason, but I can’t decipher it without a lot of digging (which I just don’t have time to indulge in). However, overall, this is an interesting trend.

    How this could be used

    IF I was able to collect significantly more data and I continued to see this trend where location on the PC graph had strong correlation with deaths, then I could run PCA on a number of counties that had very few COVID-19 cases and evaluate where they landed on the PC graph. If a county landed in the area occupied by a hard-hit cluster of counties, there’s an indicator that that county may have similar characteristics to those counties and might be at greater risk to COVID-19. Not a certainty, but even an indicator of risk might trigger extra precautions (and even save lives).

    Other Work I’ve done on This Idea

    I mentioned that my notion is that the PC distance between counties might also represent something real and have separate correlation with death rates. I did a quick experiment where I calculated the PC distance between each pair of counties using the Pythagorean theorem and then graphed the difference in Deaths per 1000 for two counties against the PC distance between those counties. The results are a bit noisy, but I’ll paste the overall results below for you to review. As you can see, there are three major outliers… NYC, which has been crazily hard-hit, and Arizona/LA, both of which have been lightly-hit. The coefficient of determination (R²) of 0.12 tells me that the trend line in the lower portion of the chart is not a good fit. My eyes tell me the same thing… Therefore, I can’t create a good model that relates the Death Rate to the Distance using all the data. I tried different things like removing the outliers and essentially, the trend line on the data in the lower left of this chart gets about as high as an R² of 0.45, which is interesting, but certainly not compelling.
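
    For readers who want to try the same experiment, here is a rough sketch of the distance-versus-deaths calculation. It assumes the "difference in Deaths per 1000" is taken as an absolute value and that the trend line is an ordinary least-squares fit; neither detail is spelled out above. The function names and the three sample points are invented for illustration.

```python
import math
from itertools import combinations

def pairwise_distance_vs_gap(pc_points, deaths_per_1000):
    """For every pair of counties: (PC distance, |difference in deaths|)."""
    pairs = []
    for i, j in combinations(range(len(pc_points)), 2):
        # Pythagorean (Euclidean) distance in the PC1/PC2 plane.
        distance = math.dist(pc_points[i], pc_points[j])
        gap = abs(deaths_per_1000[i] - deaths_per_1000[j])  # assumed absolute
        pairs.append((distance, gap))
    return pairs

def linear_fit_r2(pairs):
    """Least-squares line y = intercept + slope * x, plus its R squared."""
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Invented example: three counties placed on a line in PC space.
pc_points = [(0, 0), (3, 4), (6, 8)]
deaths = [0.1, 0.6, 1.1]
pairs = pairwise_distance_vs_gap(pc_points, deaths)
slope, intercept, r2 = linear_fit_r2(pairs)
```

    The toy data is perfectly linear on purpose, so it fits far better than the real county data did.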

    Stuff that Remains

    I’d like to collect more data and do so as the COVID-19 outbreak progresses. There MAY be a better relationship between the deaths and the PC distance, but we may not be able to see it until the disease progresses further. I might spend some calories looking into automating the pull of the Census QuickFacts data. It’s too time-consuming to do this manually to get the kind of data I think we need.

    Supervised learning. There are additional approaches using supervised learning we can try to map the QuickFacts features to the deaths per 1000 label. This could also be used to build a predictive model. I chose the Unsupervised approach first so I could demonstrate it with better visualizations, but I have much better algorithms at my disposal using supervised learning. This needs to wait for more data, unfortunately, so stay tuned.

  • COVID-19 Daily Update: 4/16/2020

    Today I’ll share a few different views into how the outbreak is manifesting in different regions.

    1. Raw Numbers of Deaths: This is what gets the headlines, but 1000 deaths in the USA is much less severe than 500 deaths in a much smaller country. Regardless, it is a number we intrinsically understand, so we keep being bombarded by it. Normally I show deaths per 1000 population these days, but the following graph is just cumulative deaths across a number of countries. I put it up here to demonstrate what the trends are.
    Cumulative deaths per country (US, China, and Iran excluded)

    In the above, we see the rate of deaths per day decreasing in a good number of the hardest hit regions. Spain and Italy’s death rates have been decreasing for about a week. Note that of the 4 most affected countries, though, two (France and England) have death rates that are steadily increasing. At one point, it looked like France would be joining Italy and Spain and start decelerating its death rate, but in the last few days, we’ve seen a new spike. The next grouping of countries (Belgium, Germany, Netherlands) has a much lower rate than the top four. These countries have seen similar numbers of cases to the top four, but have managed to keep the death count lower. The third grouping of countries (Brazil, Turkey, Switzerland, Sweden, Portugal) is a mix. Brazil has joined this group recently and is seeing growth in numbers. Turkey has been here for a while and has kept the death rate low, though it is steadily increasing. Switzerland has had even more success in keeping their death rate low while still managing an equivalent number of cases to their neighbors. Sweden has moved up into this group recently, with their famous “no-distancing” approach possibly being a contributor. As you can see, different countries are being affected differently by this outbreak, particularly in the number of deaths, and it will be interesting to evaluate what factors contributed after this has passed on and the data improves.

    2. Confirmed cases: I think we all know by this point that the confirmed cases metric is a bit inconsistent. However, I assume it’s showing us something of interest, we just need to figure out what it is. One thing I’m assuming from doing a little research is that in most countries, cases get confirmed through a similar process. First, a person gets COVID-19 symptoms, then they beg someone for a test, then they either get sent home to quarantine or they get sent into a hospital (Iceland’s the only country I’ve heard of that probably follows a different process since they’re systematically testing non-symptomatic people). In this process, there’s one common denominator, COVID-19 symptoms. So the confirmed case metric might be a proxy for the number of symptomatic people in a country. It’s probably not a good measure for hospitalized people in a country (unless that country is China and wants to keep its numbers low). In most cases, it’s hard to come by the percentage of confirmed cases that end up in the hospital, so we can’t even calculate that interesting metric. The table below shows the current state of the world, sorted by Confirmed Cases per 1000 people. You can see lots of interesting things in this table. It makes me ask a number of questions… Why are the outcomes so different for Portugal and Spain? Portugal’s numbers are very similar to Germany’s. And what can explain the differences in the numbers between Italy and Germany? Looking at Israel, they have some of the lowest death numbers in the world. I hear their armed forces are playing a part. How is this working? Why do Iran and Turkey have such different numbers? And so on…


    3. We’re starting to see case growth in South Latitudes. The chart below is only looking at how the rates of cases and deaths are growing, so they can change more quickly than overall numbers of cases and deaths. These rates can tell us where the current hotspots are. I’ll be posting this chart periodically so we can watch how COVID-19 spreads (or fails to spread) across the world. Of interest here is the rate of case growth at the far left. This largely represents New Zealand and Australia and might be showing that the conditions are starting to be more supportive of the virus in this region. The latitudes to the right continue to show the same kinds of growth. The actual data for this chart is below. Remembering that the graph below is showing rates of change, note that the deaths per 1000 people for latitudes 40-50 are still higher than any other region (although it looks like other latitude ranges might be growing faster from the chart below).

    rates of change for deaths and cases across latitudes.
  • COVID-19 Special Update – World Data Combined with US Data

    Since JHU started separating COVID-19 data into world and US categories, I have mostly been showing the data separately. Now with the US cases emerging as the worst in the world, I’m showing them combined to give people a sense of what is happening.

    World Sorted by Deaths per 1000 population

    World+US COVID-19 Numbers combined

    Above is the data sorted into the categories that I think are the most informative. These are the ones I’ve been showing for a while. What we see here when sorted by Deaths per 1000 population is that the worst-hit US States are at the top of the list. We also see some of the European countries which had previously topped this list moving their way down the list. Of course, as a death is kind of final, the only way to move down is for someone to pass you up. Note that Sweden, which famously is not really doing social distancing, is moving up the list with a fairly high rate of change in the deaths category.

    World Data sorted by the Highest Death Rates.

    World+US COVID-19 Data sorted by the Death Rate

    The above is sorted by the slope of the Deaths per 1000 population curve (IROC_d_n), so it represents the areas where the death rate is currently the highest. Note that this number can change from day to day, so more than the Deaths per 1000 table at the top, this represents today’s status (vs. deaths that happened a week ago). In this table we can see that countries like Spain have slowing death rates. They still reported 300 Spanish deaths yesterday, but the rate is slowing. New York’s rate of change for deaths is about 4.5x greater than Spain’s right now. Since these deaths are normalized by population, this is a legitimate comparison. Also note that the change in the slope (dIROC_d_n) shows that New York and Belgium’s death rate is increasing. This means that their death rates are accelerating more than others. New Jersey is showing a much lower rate of acceleration despite having one of the largest rates. What this shows us is that the situations which create these relationships are very different across different localities.
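
    To make the two columns concrete, here is a small sketch of how a slope and a change in slope could be computed from a cumulative death series. It uses simple day-over-day differences, which is an assumption; the exact calculation behind IROC_d_n and dIROC_d_n is not given above and may use smoothing or a fitted slope instead. The function names and both sample regions are hypothetical.

```python
def per_1000(cumulative_deaths, population):
    """Normalize a cumulative death series to deaths per 1000 people."""
    return [1000 * d / population for d in cumulative_deaths]

def slopes(series):
    """Day-over-day change: a simple finite-difference slope."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

def rate_and_acceleration(cumulative_deaths, population):
    """Latest slope (like IROC_d_n) and latest change in slope (like dIROC_d_n)."""
    curve = per_1000(cumulative_deaths, population)
    rate = slopes(curve)    # deaths per 1000 per day
    accel = slopes(rate)    # how quickly that daily rate is changing
    return rate[-1], accel[-1]

# Invented series: region A is accelerating, region B is slowing down.
rate_a, accel_a = rate_and_acceleration([100, 200, 400, 700], 1_000_000)
rate_b, accel_b = rate_and_acceleration([1000, 1500, 1900, 2200], 10_000_000)
```

    Region B has more raw deaths, yet its normalized rate is lower and its acceleration is negative, which is the kind of distinction the two columns are meant to expose.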



    New Active Cases and Deaths that Occurred Yesterday worldwide.
  • COVID-19 Daily Update: 4/14/2020

    Largest numbers of Active Cases and Deaths, Normalized by Population – Non-USA

    Not much has changed in a few weeks in the chart above. Italy and Spain continue to have large numbers of active cases, but Switzerland has fewer than 1/10 of the total deaths of Spain and Italy (1/3 of the deaths of those two when normalized by population). The UK has a growing number of deaths, but when normalized, the UK numbers are in the Switzerland camp, not the Italy camp.

    Iceland continues to recover. Their cumulative flow diagram below shows that they’re managing to maintain a consistent number of active cases with still extremely low numbers of hospitalizations and deaths.

    Iceland: Cumulative Flow Diagram of Confirmed Cases, Recoveries, and Deaths

    Russia is also experiencing rapid growth in cases. Lots of concern being expressed by Vladimir Putin. Very different from the articles from a few weeks ago when Russia was apparently able to keep their growing case numbers under wraps. See the exponential case growth in the chart below.

  • COVID-19 Daily Update: 4/13/2020

    Map representing numbers of COVID-19 cases (color) and Case Growth rates (diameter)
    Map representing numbers of COVID-19 cases (color) and Death rates (diameter)

    In the above two maps, you first can see where the cases are growing fastest and then second where the death rate is increasing fastest. Obviously, case growth is occurring across the US. This is unsurprising, especially since there is more testing happening now. What we don’t really know from our data is whether these cases are symptomatic, whether they’re hospitalized, etc. The second map shows us that the deaths continue to happen in the same cluster areas, NYC/Mass/NJ/Conn, DC, New Orleans, Detroit, Chicago, Denver, Las Vegas, and Seattle. The majority of these are occurring in the NYC cluster.

    State Data Table for 4/12/20

    State data from 4/12

    Latitudes of Cases / Deaths for US States

    Cases and Deaths per 1000 by Latitude Ranges in the US

    Just like with the rest of the world, the US also seems to be following the Latitude effect. Most of the cases/deaths to date have occurred between 40 and 45 degrees North. I’m currently evaluating if the fastest case/death rate growth also follows this latitude trend or not.
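
    The latitude grouping can be sketched as below. The band width of 5 degrees, the field names, and the three sample regions are all assumptions made for illustration (the charts in these posts use bands of different widths), and the numbers are invented rather than taken from the state table.

```python
from collections import defaultdict

def band_label(latitude, width=5):
    """Label such as '40-45' for the band containing a latitude in degrees North."""
    low = int(latitude // width) * width
    return f"{low}-{low + width}"

def rates_by_band(regions, width=5):
    """Pool cases, deaths, and population per band, then normalize per 1000."""
    totals = defaultdict(lambda: {"cases": 0, "deaths": 0, "population": 0})
    for region in regions:
        bucket = totals[band_label(region["latitude"], width)]
        for key in ("cases", "deaths", "population"):
            bucket[key] += region[key]
    return {
        band: {
            "cases_per_1000": 1000 * t["cases"] / t["population"],
            "deaths_per_1000": 1000 * t["deaths"] / t["population"],
        }
        for band, t in totals.items()
    }

# Invented regions, not real state data.
regions = [
    {"latitude": 42.1, "cases": 2000, "deaths": 100, "population": 1_000_000},
    {"latitude": 43.9, "cases": 1000, "deaths": 50, "population": 500_000},
    {"latitude": 33.5, "cases": 300, "deaths": 3, "population": 3_000_000},
]
bands = rates_by_band(regions)
```

    Pooling the population inside each band before dividing matters: it keeps a band with many people from looking worse simply because it contains more big cities.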

  • COVID-19 Update – 4/11/20 State Data plus Arizona Cases Flattening?



    Arizona COVID-19 Case Rates 4/11/20

    The above image shows the ‘S-shaped’ sigmoid curve that we’ve been hoping to see. It may indicate that the first phase of the outbreak is slowing. As you can see, the number of cases in Arizona is decelerating. Again, I hesitate to make any pronouncements, but I’ve been watching this trend for about a week. It could be that the state has been under-reporting data or some other factor could emerge. Or it might be that the infections have peaked for now and are slowing.

    Across the US

    Nothing much is changing for the heaviest-hit states. Slope of case rates and death rates continues to increase for these states. Vermont was showing signs of flattening out last week and has now dropped out of the top 10. https://todnewman.com/?p=416

    Detailed US State data for 4/11/20


  • COVID-19 Special Report: Analyzing Claims of Inflated US COVID deaths – 4/10/2020

    In the last few days I’ve been hearing that the death numbers in the US were not reliable due to inflation of the numbers in the hospital. On the surface, this seems like a real possibility. The allegation is that if someone dies of Pneumonia but also has tested positive for COVID-19, the death is classified as COVID-19, even if Pneumonia might be more accurate. Evidence for this points to the Pneumonia numbers for March being lower than normal for the month. So this is concerning, because we know that the numbers of infections that are recorded are not very consistent and likely are just a fraction of the people truly infected. This, of course, is largely due to limited testing and oversampling of the symptomatic. But the death data has told the best story of the severity of the outbreak in different countries. So I decided to evaluate this myself to see if this is a realistic claim.

    Unfortunately, there aren’t really clean, accessible datasets classifying deaths by county in the US across the year that a person who is doing COVID-19 research at night for fun can realistically engage with. So I had to come up with something that was close. First, I have a dataset from the Census that shows estimated populations per year as well as the numbers of deaths (and live births, and a few other things). This is done by county, which is what I was looking for. I blended this dataset into my COVID-19 death data by county and then was close.

    What is missing from this is the breakdown of the causes of these deaths. This isn’t easily found by county in one dataset, so I assumed that the leading causes of deaths each year are going to be fairly consistent across the US. There may be differences in some categories (shooting deaths, farm machinery accidents, etc.), but in general the top 3-4 should be consistent. I figured I’d start by evaluating the leading causes of death in the most highly COVID-impacted county in the world, New York County (NYC).

    2016 New York County Deaths by Leading Cause (https://apps.health.ny.gov/public/tabvis/PHIG_Public/lcd/reports/#county)

    From the above you can see that NY County had just under 10K deaths in 2016 (about 830 deaths per month). My Census dataset confirms this number, but shows that 2018 saw over 12K deaths. Not sure what the difference was. So I’m going with the 2016 numbers for my base number of deaths. Thinking about which of these causes might be comorbidities with COVID-19, I see CLRD (Chronic Lower Respiratory Disease) as a very likely candidate and 2016 saw 304 cases (just over 25 per month). Stroke seems unlikely, as the symptoms of that condition are very different from COVID-19. Heart disease is obviously the big hitter, with 2902 deaths in 2016 (about 240 per month). Maybe heart disease deaths might be conflated with COVID-19, but it doesn’t seem very likely. Cancer at 2,526 deaths per year (210 per month) seems very unlikely to be classified as COVID-19… cancer patients often have their own hospitals, doctors, and wards, and generally are well understood as cancer for long periods of time. I’m planning to rule the cancer deaths completely out. Pneumonia is 7th on this list (not shown above) with 263 deaths in 2016 (about 22 per month). So let’s say that in 2016 there were 25+240+22 deaths that could have potentially been comorbidities with COVID-19 during the 2020 outbreak. This translates to 287 cases, which is 35% of the number of deaths per month in NY County. I will then assume that the exaggeration in the COVID-19 death count will be limited to 35% higher than the true count. We’ll evaluate the raw number of deaths in each county that 35% of the total would equal and then compare to the reported 2020 COVID-19 deaths. Make sense?
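
    The arithmetic in the paragraph above is easy to check. The snippet below only re-does the division using the 2016 figures quoted in the text; the variable names are mine, and the 830-per-month base is the rounded figure from the text rather than an exact count.

```python
# 2016 New York County deaths by cause, as quoted in the text.
annual_2016 = {"CLRD": 304, "heart disease": 2902, "pneumonia": 263}

# Exact monthly averages (the text rounds these to 25, 240, and 22).
monthly = {cause: count / 12 for cause, count in annual_2016.items()}

# Rounded monthly total used in the text.
rounded_total = 25 + 240 + 22

# "About 830 deaths per month" from all causes, per the text.
all_causes_per_month = 830
share = rounded_total / all_causes_per_month

print({cause: round(value, 1) for cause, value in monthly.items()})
print(rounded_total, f"{share:.1%}")  # 287 34.6%
```

    So the 35% figure is the rounded share of a normal month's deaths that come from the three candidate causes.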

    Most Conservative Estimate of Potential comorbidities of COVID-19. Assumptions: 1) COVID-19 total deaths happen over 2 months, so calculating 2 months of comorbidities 2) 100% of potential comorbidities are positive for COVID-19. 3) 100% of heart disease deaths would get classified as COVID-19.

    Above we see the most conservative case comparing COVID-19 reported deaths by county with the potential comorbidity numbers. The assumptions for this uber-conservative chart are that 1) these reported COVID-19 deaths occurred over 2 full months. Because of this assumption, we’ll calculate 2 months worth of comorbidities, 2) 100% of potential comorbidity deaths are positive for COVID-19 and are classified as COVID-19 deaths, and 3) 100% of heart disease deaths in these counties are classified as COVID-19. What do we see? In the hardest hit county, there were 8x the number of COVID-19 deaths reported in this timeframe than the sum of the potential comorbidities. So if the New York County numbers are getting inflated by questionable death accounting practices, it is only by about 700 deaths out of a total of 5820. By the way, this 5820 is about half the total deaths that New York County should expect for a normal year! See the other counties here where the COVID-19 deaths under these rigid assumptions are at least half the number of the sum of all the other potential comorbidities, including heart disease. What this is telling us is that COVID-19 has already replaced heart disease as the leading cause of death in these counties for the whole year of 2020.

    Slightly Conservative Estimate of Potential comorbidities of COVID-19. Assumptions: 1) COVID-19 total deaths happen over 1.5 months, so calculating 1.5 months of comorbidities 2) 30% of potential comorbidities are positive for COVID-19. 3) 100% of heart disease deaths would get classified as a COVID-19 comorbidity if they are positive for COVID-19.

    In the above table, I’ve eased the assumptions a bit to be just slightly conservative. First, I change the 2 months to 1.5, which is much more accurate for most counties. Second, and most importantly, I only assume that 30% of deaths from comorbidities during this timeframe are positive for COVID-19 (the actual number in New York state right now is still under 1% of the population, but we’ll assume that this number is 30x higher in the most susceptible populations). Now see the differences! New York County has the potential to only inflate their COVID-19 deaths by 143 on an overall number of 5820. This translates to an error of 2.4%. And most likely that’s high too.
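
    The two scenarios share one formula, sketched below. This is a simplified illustration: the function name is hypothetical, and the input of 287 candidate deaths per month is the rounded 2016 figure from earlier in this post, so the outputs land near the 700 and 143 in the tables but are not expected to match them exactly, since the per-county inputs behind those tables are not listed here.

```python
def max_inflation(monthly_candidates, months, share_positive):
    """Upper bound on deaths that could have been relabelled as COVID-19."""
    return monthly_candidates * months * share_positive

reported = 5820  # reported COVID-19 deaths for New York County, per the text

scenarios = [
    ("most conservative", 2, 1.0),        # 2 months, 100% test positive
    ("slightly conservative", 1.5, 0.30), # 1.5 months, 30% test positive
]
results = {}
for name, months, share_positive in scenarios:
    bound = max_inflation(287, months, share_positive)
    results[name] = (bound, bound / reported)
    print(name, round(bound), f"{bound / reported:.1%}")
```

    Either way, the bound stays a small fraction of the reported total, which is the point of the comparison.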

    Conclusion: There is no concern over miscalculation of COVID-19 deaths due to inflation of the numbers by counting comorbidities. I presume this concept made its way to the mainstream media news due to a desire to tell a good story or share comforting stats. Or maybe it was just a political effort. Regardless, this is why I really dislike listening to the news discuss stats about this outbreak. Across the political spectrum, they’re failing to properly report numbers and statistics. Here are my suggestions if any professional news people are reading this:

    1. Focus more on the data and understanding what its limitations are and less on flashy graphics. I suspect this error may have something to do with the lack of seasoned data scientists and statisticians on the news team combined with a preponderance of less-experienced, recent grads with whiz-bang Tableau visualization skills.
    2. Stop reporting hard numbers of Cases and Deaths if you’re comparing regions. 200 deaths in California is going to be a much less severe situation than 100 deaths in Orleans Parish, Louisiana.
    3. The Death to Cases ratio is garbage. Everyone wants this number because we want to compare it to the flu. We’ve been getting the flu and measuring the number of cases for hundreds of years. We have a good statistical sample and can estimate the number of cases well. We have NO idea how many people have actually been infected by COVID-19 yet. The best numbers we have are from Iceland because they randomly sampled the whole population. Their death to cases ratio, by the way, is 0.4%. This is a bit higher than typical flu numbers, but don’t rejoice quite yet, because Iceland’s testing and quarantine strategy seems to be keeping their death numbers low.
  • COVID-19 Update: 4/9/2020 Today’s US State data plus Cases and Deaths in US States by Latitude Bands

    State COVID-19 Data from 4/8/2020

    4/8 was another rough day for New York. It’s quite a different story between NYC and Los Angeles County, which has 0.176 cases per 1000 and 0.004 deaths per 1000. This is about 20x lower on cases than NYC and 50x lower on deaths. I’d really love to know why this huge difference exists. There aren’t a lot of clues in the news. I believe that NYC and California both issued shelter in place orders on the same day. Perhaps the virus was loose in NYC for weeks before any kind of reaction was taken by government, enabling a non-linear transmission effect to occur? If there’s any truth to the effects of latitude that I’ve been uncovering, that might come into play too. See chart below for US States’ Case and Death rates by latitude bands. The results in the US line up with the global results by latitude. Note that the US Population is nearly 2x as large in the 30-40 band, so the narrative can’t be that this effect is simply due to a large number of big cities in the range.

  • COVID-19 Special Update: Correlation Study Latest Numbers – 4/8/20

    Correlation with Rate of Case Growth – 4/8/20

    Mechanics of Building a Correlation Matrix: In case this explanation is interesting or informative to anyone puzzling over these results, to get the above correlation relationship between various features and the rate of growth of COVID-19 cases, I built a large dataset using data from Johns Hopkins (COVID-19 data), the WHO, the World Bank, and a handful of others. In this dataset, I have each country in the world captured as rows in the dataset. Each of the Features above (plus many more) is one of the columns that goes across all of the countries. This is the basic mechanics of putting together a large correlation matrix.

    What does this tell us?: First off, the above table just simply lists selected features (‘Female Smoking Rate’, etc.) and their correlation using the Python Pandas correlation function. 1.0 is perfect correlation. As these are the correlations with the feature ‘Instantaneous Rate of Change’, you can see that the correlation of ‘inst_rate_of_change’ is 1.0. It is perfectly correlated with itself. I have eliminated many features with low correlation (meaning 0, not -1) just to make this more readable. This, of course, is because if correlation is close to zero, there’s likely little information about the target (Instantaneous Rate of Change of Confirmed Cases – i.e., today’s Case Growth Rate). However, if the number is between 0.2 and 0.8, I find from years of doing this that there’s enough dependence between the target and the feature to make the case that they are related in an interesting way. Statisticians like to say (probably too often), “Correlation does not Imply Causality” — which is true — but this does not mean that correlation is not valuable as the basis for hypothesis tests for causality. That’s what we’re trying to do here… find environmental factors that might be influencing the different Case Growth Rates across the world.
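
    As a rough illustration of the mechanics, the sketch below computes the same Pearson coefficient that the pandas `DataFrame.corr()` method returns by default, using plain Python so it runs without any libraries. The column names other than `inst_rate_of_change`, the four-row table, and the choice to apply the 0.2 to 0.8 band to the absolute value of the correlation are all assumptions made for the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlations_with_target(table, target):
    """Correlate every column (one value per country) with the target column."""
    return {name: pearson(col, table[target]) for name, col in table.items()}

def interesting(corrs, target, low=0.2, high=0.8):
    """Keep features whose |correlation| falls inside the band (assumed)."""
    return {name: r for name, r in corrs.items()
            if name != target and low <= abs(r) <= high}

# Invented table: each list holds one value per country (four countries).
table = {
    "inst_rate_of_change": [1, 2, 3, 4],
    "feature_up": [2, 4, 6, 8],
    "feature_down": [8, 6, 4, 2],
    "feature_mid": [2, 1, 4, 3],
    "feature_flat": [1, -1, -1, 1],
}
corrs = correlations_with_target(table, "inst_rate_of_change")
kept = interesting(corrs, "inst_rate_of_change")
```

    The target correlates perfectly with itself, just as in the table above, and only the made-up `feature_mid` column survives the band filter in this toy example.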

    Is there Anything New Here? Yes, the correlations continue to change as the Case Growth Rates change across the world. By definition, I’m correlating these factors with the current day’s instantaneous slope so the correlations should continue to change. What we’ve been seeing lately is that as the slopes continue to increase across the world the Female Smoking Rate continues to increase in its correlation with the target. I think what this indicates is that the countries with the most severe slopes (Italy, New York, Spain) are probably being hit harder by women who smoke having a higher likelihood of contracting a measurable COVID-19 case. I use the word measurable intentionally here, because these rates are probably driven by countries who are only measuring cases where people have symptoms and require some sort of care. This makes this correlation probably more like a correlation with symptomatic case rates. A subtle point, maybe. One other factor that continues to increase is the negative correlation between case growth and rates of Tuberculosis in a country. This tells us that countries with lots of TB cases have slower COVID-19 case growth rates. This was mildly puzzling to me until 2 days ago when I learned of a study showing that a TB vaccine called BCG may have anti-COVID properties (I’m summarizing broadly. Here’s the link). So that’s pretty exciting to see… even this simplistic approach may have revealed something using Data Science that was not widely known.

    Correlation with Rate of Deaths – 4/8/2020

    Above is the correlation of the same factors as above with the Rate of Deaths from COVID-19. Note that some of the features that are highly correlated with the Rate of Contracting the Disease are less correlated with the Rate of Deaths from the Disease. This is probably not counter-intuitive. What might be counter-intuitive is that comorbidities like Diabetes rates in a country are negatively correlated with the COVID Death Rates. All I can decide is that it might take reframing the reference point. We’re aware that diabetes, high blood pressure, etc., are contributing strongly to the deaths of individuals who are infected with COVID-19. However, this study is about countries who have high rates of Diabetes, High Blood Pressure, or Air Pollution and the correlation of those factors with the Death Rate. Therefore, it is possible that a country with high rates of Diabetes, for instance, has fewer people who survive that disease long enough to be affected by COVID-19. Perhaps this is a sign that the advanced health care in some countries might be contributing to the numbers of deaths, largely because susceptible people are living longer in those countries? Or perhaps this is just measuring the fact that countries with high rates of diabetes or pollution have yet to be hit by COVID-19? Time will tell.

  • COVID-19 Update: 4/8/2020 State data plus Updates on Iceland

    State COVID-19 Data from 4/7/20 data

    I’m posting this table most every day now so people can see the changes in the numbers from day to day. Only four states had over 100 deaths yesterday and most states are seeing their case and death rates taper off a bit. European countries are also seeing case growth slow.

    Iceland Update

    Cumulative Flow Diagram for Iceland, 4/7/20 data

    The chart above is similar to the ones I posted yesterday for Germany and others. The recovery data is being reported again and looks very reasonable. Looking at this as a cumulative flow diagram, we can see that Iceland is maintaining a 14-16 day cycle time for clearing new cases. This is obviously being largely driven by their scientific sampling techniques where they’re getting people who are sick tested and into quarantine right away. In nearly every case in Iceland, the recovery is occurring after the 14 days of isolation/quarantine. Looking at the stats below (from Iceland’s COVID-19 portal) this shows a 2.4% hospitalization rate with only 1/3 of those hospitalized needing to go into the ICU. Interestingly, over half of their cases are diagnosed while in quarantine. There were reports from a few days ago that I haven’t seen the raw data on that indicated that only 1% of those tested were coming back positive and that 50% of the confirmed infections were asymptomatic. This doesn’t quite make sense based on the data below, so more study may be needed. Still, what Iceland represents is a society that understands how COVID-19 is truly spreading and who is able to take steps more quickly than any other country to respond to an infection.
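
    For anyone unfamiliar with reading cycle time off a cumulative flow diagram, here is a minimal sketch. It assumes cases are closed in the order they were confirmed (first in, first out), which is how the horizontal gap between the two curves is normally interpreted; the function names and the 20-day series are invented, not Iceland's actual data.

```python
def active_cases(confirmed, recovered, deaths):
    """Vertical gap on the diagram: cases still open on each day."""
    return [c - r - d for c, r, d in zip(confirmed, recovered, deaths)]

def cycle_times(confirmed, recovered, deaths):
    """Horizontal gap: days between the confirmed curve and the closed curve
    (recoveries plus deaths) reaching the same cumulative level."""
    closed = [r + d for r, d in zip(recovered, deaths)]
    times = []
    for day, level in enumerate(closed):
        if level == 0:
            continue  # nothing has been closed yet
        # First day on which cumulative confirmed cases reached this level.
        entry_day = next(
            (i for i, c in enumerate(confirmed) if c >= level), day
        )
        times.append(day - entry_day)
    return times

# Invented series: 10 new cases a day, each recovering exactly 14 days later.
confirmed = [10 * (day + 1) for day in range(20)]
recovered = [0] * 14 + confirmed[:6]
deaths = [0] * 20

open_cases = active_cases(confirmed, recovered, deaths)
times = cycle_times(confirmed, recovered, deaths)
```

    On real data the gap wobbles from day to day, so a range such as 14-16 days is the kind of result to expect rather than a single number.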

    Summary: When thinking about what the real infection rates, hospitalization rates, and death rates might be, Iceland provides one of the only scientific answers. Keep in mind that Iceland’s numbers might not reflect the numbers from other countries completely due to the fact that they’re at a high latitude where most countries are reporting lower numbers. But the fact that they have over three times the number of active cases per 1000 people that the US currently has (4.4 per 1000 compared to 1.2 per 1000) while having nearly no deaths is very interesting.