Update: Soccer Analytics in Practice at the Youth Club Level

I took a bit of a break from this series to go off and capture data. My goal was to see if an xG and Luck-based approach to measurement would be useful at the youth club level. Here’s a quick report on my approach (it is reusable) and the results so far.

Approach

  1. I built a data input sheet that can be taken to soccer games and used without great knowledge of statistics and soccer.
  2. The data input sheet has instructions in the bottom right corner. It’s as simple as putting an ‘x’ on the page where you estimate a shot was taken by your team and an ‘o’ on the page where you estimate a shot was taken by the opposition. If the shot is “on goal” (I make this simple by saying it is on goal if a) it’s a score, b) the goalie touches it, or c) it hits the goal frame) I put a check-mark next to the x or o. If the shot results in a goal, I put a circle around the ‘x’ or the ‘o’. It’s about that easy. Sometimes I’ll put notes near the marks. I like to identify if the shot was the result of a penalty or a free kick (‘fk’). I also put the scorer’s name near goal marks.
  3. After the game, I add up shots on goal for each team and then multiply each shot on goal by the probability of a goal in the region where it was taken. You can see the legend shows a range of probabilities. I actually use these probabilities starting with the lowest (green)… [0.05, 0.1, 0.15, 0.3, 0.5, 0.7, 0.8]. My approach is NOT to include penalty kicks in this process because, in my opinion, PKs don’t really speak to what I’m trying to measure, which is expected goals and luck. You might say it demonstrates luck to even get a PK, and I’d agree, but that’s a different kind of luck in my opinion.
  4. The sum of the goal probabilities of a team’s shots on goal equals that team’s expected goals (xG). Luck is calculated as the actual score minus the xG number. See below for an example scored game.
Example xG score tracker. My team won this game 2-0.
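
The arithmetic in steps 3 and 4 can be sketched in a few lines of Python. This is only an illustration: the zone probabilities are the ones listed above, but the zone indexing, the function names, and the sample shots are assumptions made up for the example, not the actual game sheet.

```python
# Zone probabilities from the legend, lowest (green) first, as listed in the post.
ZONE_PROBABILITIES = [0.05, 0.1, 0.15, 0.3, 0.5, 0.7, 0.8]


def expected_goals(shots_on_goal_zones):
    """Sum the goal probability of every shot on goal.

    `shots_on_goal_zones` holds one zone index (0 = lowest-probability zone)
    per shot on goal. Penalty kicks are assumed to be left out already.
    """
    return sum(ZONE_PROBABILITIES[zone] for zone in shots_on_goal_zones)


def luck(actual_goals, xg):
    """Luck is the actual score minus expected goals."""
    return actual_goals - xg


# Hypothetical game: four shots on goal from zones 1, 3, 3 and 5; two goals scored.
example_zones = [1, 3, 3, 5]
example_xg = expected_goals(example_zones)
example_luck = luck(2, example_xg)
print(f"xG = {example_xg:.2f}, luck = {example_luck:+.2f}")
```

A positive luck value means the team scored more than its shot locations suggested; a negative value means it scored less.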

Season Results so Far

I’ve been able to easily collect these metrics so far this season. I believe it is easy enough to delegate to a student team manager (I call them my ‘statistician’) in the Fall season for the school team that I coach. Below are the results for our 2023 club season so far.

Table of Results 2023
Area chart of Results. Wins are to the Left, Losses to the Right.

Analysis

Here are a few things that are probably obvious.

  1. xG appears to be a strong predictor of a win. Note how the higher (light purple) xG for FC Tucson tends to be stronger in wins (left side of the plot) and lower on the right side (ties and losses).
  2. They say you make your own luck, but perhaps sometimes it’s just outside of your control (note my previous analysis of luck due to venues and officiating). Maybe just knowing that the luck might be tilted against your team is positive.
  3. Sometimes you make your own bad luck too… In the game on the plot where the FC Tucson team showed the most bad luck (Slammers FC), our team totally dominated the game in all aspects: Shots on Goal, Possession, and xG. But some of our bad luck was due to the fact that the Slammers’ best players were defenders and our shots were taken from further away.
  4. This is a big takeaway… THE SHOT CHARTS ARE REALLY VALUABLE! Though I don’t actually coach this club team, I have already been able to sit down with players and parents at their request and describe the flow of the game along with areas where our shot choices were driven by our inattention or even by the defensive schemes of the opposition.

Even More Analysis of Soccer Outcomes using the Luck Metric!

referee image from https://www.freepik.com/free-photos-vectors/referee

In the previous entry in this series (see link here) we studied if it was possible that the different styles of playing surface might actually be correlated with increased or decreased luck. In this series, we define luck as the number of actual goals in a game that either exceed or fall short of the expected goals metric (which relies on statistical measures of the likelihood of goals given certain measurable activities in a game). We look at luck for both home teams and away teams, because we all intrinsically understand that home teams tend to have better luck when playing at home. See the previous entry to read about what we found.

Today, we’ll do one more evaluation (probably not the last, but definitely one that piques my interest) to determine if an individual head referee’s presence in a game is correlated with greater or lesser luck. The reason we look at head referees only is that I have a data source which lists who the head referee is for a very large set of games. In theory, the head referee controls the flow of the game and contributes the most to uncertainty of the outcome. We’ll look at officiating for both the MLS and the Premier League to see 1) if certain refs affect the luck metric more often and 2) if the impact on luck is significant or not.

MLS 2023

The methodology here in general is to group both “home luck” and “away luck” by individual head referee. Then we take the mean value of luck and also the standard deviation (how much the luck values tended to vary from game to game that the individual officiated). These are plotted in a similar way to how we plotted the field surface plots. Keep in mind that we’re attempting to describe the entire distribution of games that an individual official was part of. Since we make the assumption that this distribution follows a Gaussian distribution (bell curve), we believe we can describe the impact across all the games with just the mean luck value and the standard deviation. Below we can see the results, where the square describes the mean and the lines describe the variation.

Home Team “Luck” distribution by Head Referee 2023
Away Team “Luck” distribution by Head Referee 2023
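
The grouping described above can be sketched with the Python standard library. This is a hedged illustration, not the code behind the charts: the function name, the referee labels, and the luck values are all made up.

```python
from collections import defaultdict
from statistics import mean, stdev


def luck_by_referee(games):
    """Group home luck by head referee and summarize each group.

    `games` is a list of (referee, home_luck) pairs. Returns
    {referee: (mean_luck, std_luck)}; std_luck is None when a referee has
    only one game, since a single value has no sample standard deviation.
    """
    grouped = defaultdict(list)
    for referee, home_luck in games:
        grouped[referee].append(home_luck)

    summary = {}
    for referee, values in grouped.items():
        spread = stdev(values) if len(values) > 1 else None
        summary[referee] = (mean(values), spread)
    return summary


# Made-up data for illustration only.
sample_games = [
    ("Ref A", 1.0), ("Ref A", 0.0), ("Ref A", -1.0),
    ("Ref B", 0.5), ("Ref B", 1.5),
    ("Ref C", -0.4),
]
referee_summary = luck_by_referee(sample_games)
for referee, (avg, spread) in sorted(referee_summary.items()):
    print(referee, avg, spread)
```

A referee with a single game gets a mean but no spread, which is why such a referee would show up on an error bar plot as a square with no vertical line.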

Analysis: I’ll analyze the 2023 MLS results and then leave the analysis of the 2022 MLS games and the Premier League games to the reader. What do I see?

  1. Remember we’re evaluating how each official impacts home and away luck (remember, luck describes actual goals in excess of the number of goals we statistically predict using the expected goals metric). We see very different mean values of luck for the individual referees, but it is hard to say that the extra “lucky” goals are caused by the participation of the referee. It takes much more work than this kind of statistical analysis to determine causality. It could be that the referee’s presence is actually correlated with some other factor that is the real cause of the luckiness or unluckiness experienced. That’s just statistician speech to make sure we don’t all grab pitchforks and torches!
  2. We do see that certain referees are more likely to be associated with higher or lower luck. Some of the referees’ results are close to the average (mean) of the entire distribution of referees. These are mostly the ones in the middle. This means that statistically, the luck experienced during the games is about the same when these “middle” refs officiate.
  3. However, there are officials off on the edge who do seem to have a statistically higher impact on the luck experienced in the games. The two in red (on the home luck chart) actually are two standard deviations away from the mean value of luck across all the refs. Under the bell curve assumption, about 95% of referees would fall inside that range, so these two are unusual. This suggests that the fact that these refs are way out on the edges is not due to chance alone, but may be due to something the refs are doing differently.
  4. Note that there are 4 or 5 refs in the Home Luck chart who have a mean impact in their games of close to one goal! You can see that the error bars for these refs vary, but at least one seems to almost always have a one goal impact on a game. One could say that these officials are more likely to give penalty kicks in the box (a very high probability of a score). That might be a good guess, because the expected goals metric that I use actually excludes penalty shots (because they’re random — and therefore “lucky” — events that cannot be predicted). But maybe this metric shows that with certain referees, penalty shots are more probable and are therefore less random.
  5. Another interesting thing to see is that the luck impact across all the officials is much lower for the away teams. This means that away teams are less likely to be impacted by the presence of individual officials. This is probably reasonable to assert, given the notion that officials in every sport probably have an unconscious bias for the home team (whose fans are screaming about any calls that go against their teams).
  6. We do see one official whose presence is well-correlated with “good luck” for the home team and “bad luck” for the away team. When I dug in to try to understand why this one official stands out, I discovered that they are very rarely the head official (often getting assigned to do video replays). I also noticed that officials who fall outside roughly two-thirds of the other officials (about one standard deviation from the mean) likewise rarely get to be head refs. Perhaps the MLS is paying attention to this (see this webpage for details on this).
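
The “two standard deviations” check from item 3 can be sketched as follows. This is an assumption-laden illustration: the function name, the threshold parameter, and the ten referee means are hypothetical and are not taken from the MLS data.

```python
from statistics import mean, stdev


def flag_outlier_referees(mean_luck_by_referee, threshold=2.0):
    """Return {referee: z_score} for referees whose mean luck sits at least
    `threshold` standard deviations from the average across all referees."""
    values = list(mean_luck_by_referee.values())
    center = mean(values)
    spread = stdev(values)
    flagged = {}
    for referee, value in mean_luck_by_referee.items():
        z_score = (value - center) / spread
        if abs(z_score) >= threshold:
            flagged[referee] = z_score
    return flagged


# Hypothetical per-referee mean home luck values, for illustration only.
sample_means = {
    "Ref A": 0.1, "Ref B": -0.1, "Ref C": 0.0, "Ref D": 0.2, "Ref E": -0.2,
    "Ref F": 0.05, "Ref G": -0.05, "Ref H": 0.1, "Ref I": -0.1, "Ref J": 1.2,
}
outliers = flag_outlier_referees(sample_means)
print(sorted(outliers))
```

With only a couple dozen referees, and some of them officiating very few games, a flag like this is a prompt to look closer rather than proof that the referee caused anything.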

Other Charts

MLS 2022

Home Team Luck distribution by head referee (MLS 2022)
Away Team Luck distribution by head referee (MLS 2022)

Note the one official that has no error bar? This is most likely because he was the head official only one time in 2022. It’s small data, but observe that it follows the exact opposite trend that we see this official following in 2023! Weird. We also see more dramatic shifts in mean values for the outlier refs in 2022 than we see in 2023.

Premier League 2023

Home Team Luck distribution by head referee (Premier League 2023)
Away Team Luck distribution by head referee (Premier League 2023)

Premier League 2022

Home Team Luck distribution by head referee (Premier League 2022)
Away Team Luck distribution by head referee (Premier League 2022)

Wrapup

So what do you see in the MLS 2022 and the Premier League charts? There are definitely some interesting trends and differences. Feel free to leave comments on what you see and we can dialogue about them!

Further Evaluations of Soccer Outcomes using the Luck Metric

In a previous article (link) I discussed how to create and evaluate a simple metric that describes the difference between the number of goals a team is expected to make (using the xG metric) and the actual number of goals they score. I’m calling this difference “luck” because it describes how much a team under- or over-performs the expectations made by the way they play a game. Soccer, perhaps more than other sports, is heavily influenced by these over- and under-performances.

I previously discussed how luck seems to be distributed across teams in MLS and the Premier League both when they are the home team and when they are away. We plotted their mean home luck and away luck against their other metrics that we’ve determined to be predictive, 1) the ratio of xG for a team to xG for the opponent (xG ratio) and 2) the amount the team pays in salary. We could see that teams that have favorable luck at home and/or away tend to perform better. Perhaps this is an example of how a team can “make their own luck”, meaning that perhaps in soccer not all luck is purely random chance. Most likely there are elements buried inside this luck metric that are based off of things we can’t easily measure. Stuff like good preparation, team chemistry, and the two things we’ll evaluate next in this series, the venue a team plays in, and the official overseeing the match. Today we’ll discuss venue.

The intersection of “luck” and venue came to my mind after a recent discussion with an MLS player about analytics. We were talking about the strange difference in how the xG ratio relates to performance in the MLS versus the Premier League (see this link to see this difference). He mentioned a number of elements about the MLS that could explain this difference:

  1. The different ways that MLS teams travel (bus, train, commercial air) vs. the ways that Premier League teams travel (more money = much nicer).
  2. The long distances that MLS teams travel and the widely-varying geographies and altitudes that British teams don’t have to face. Sometimes these distances, especially if it is to be a longer bus ride, influence a team’s willingness to “get a game over with”.
  3. The venue. I was not aware of this, but the player mentioned that there were still six teams in the MLS playing on artificial turf. Here’s a Wikipedia page providing the details of all MLS stadiums. Sure enough, there are actually seven fields using some kind of turf: ‘Lumen Field’, ‘Providence Park’, ‘BC Place Stadium’, ‘Gillette Stadium’, ‘Mercedes-Benz Stadium’, ‘BMO Field’, and ‘Bank of America Stadium’. When I did a simple grouping operation to evaluate the mean luck score for home and away teams on turf and then compared these numbers to games on grass, I saw a difference. Stay with me and I’ll describe it.

Breaking down Luck by Playing Surface

2023: Interestingly, in 2023, we see Home teams performing slightly better, and Away teams slightly worse, in terms of “luckiness” when playing on TURF! This is likely close to how the MLS player imagined the result would be. Home teams outperformed their expected goals by a bit more (0.229 on turf to 0.167 on grass) and away teams slightly underperformed their expectations (-0.058 on turf vs. -0.014 on grass). It makes sense that the “turf-based” home team is more familiar with their playing surface and therefore outperforms expectations more than a “grass-based” team does on its grass surface. Yes, this is confusing, but it appears that turf gives its teams a bigger advantage than grass gives its teams. My guess is that this is because there are more grass fields and they are very familiar to all teams. Away teams, however, always seem to underperform compared to home teams, and we see this underperformance to be more noticeable on turf. So in essence, in 2023, the data indicates that teams with turf had a measurable advantage at home greater than the advantage teams with grass saw.

2022: We don’t see these exact results in 2022, however, with Home Team luck being a tossup between turf and grass and Away teams still seeing poorer performances (-0.064 on turf vs. 0.161 on grass). Still, this shows a small advantage for the Turf-based teams.
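
The “simple grouping operation” behind these numbers can be sketched like this. The turf venue list is the one given earlier in this post; the function name, the record layout, the grass venue name, and the luck values are assumptions invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Turf venues as listed earlier in this post.
TURF_VENUES = {
    "Lumen Field", "Providence Park", "BC Place Stadium", "Gillette Stadium",
    "Mercedes-Benz Stadium", "BMO Field", "Bank of America Stadium",
}


def mean_luck_by_surface(games):
    """Average home and away luck separately for turf and grass venues.

    `games` is a list of dicts with 'venue', 'home_luck' and 'away_luck' keys.
    """
    buckets = defaultdict(lambda: {"home": [], "away": []})
    for game in games:
        surface = "turf" if game["venue"] in TURF_VENUES else "grass"
        buckets[surface]["home"].append(game["home_luck"])
        buckets[surface]["away"].append(game["away_luck"])
    return {
        surface: {side: mean(values) for side, values in sides.items()}
        for surface, sides in buckets.items()
    }


# Made-up games; "Hypothetical Grass Park" is not a real venue.
sample_games = [
    {"venue": "Lumen Field", "home_luck": 0.5, "away_luck": -0.3},
    {"venue": "BMO Field", "home_luck": 0.1, "away_luck": 0.1},
    {"venue": "Hypothetical Grass Park", "home_luck": 0.2, "away_luck": 0.0},
    {"venue": "Hypothetical Grass Park", "home_luck": 0.0, "away_luck": -0.2},
]
surface_summary = mean_luck_by_surface(sample_games)
print(surface_summary)
```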

Detailed Views of Luck for 2023 (season still incomplete)

Here are some error bar plots that will allow us to see some of this detail more clearly. NOTE that stadiums with turf fields have their labels on the plot in red. Other things to be aware of… the vertical lines represent the range of luck results (standard deviation) and the squares represent the mean luck values at each stadium. Nodes with no vertical bars tend to be stadiums where only one game was played, so there was no variation in luck to show. The results are sorted from greatest to least luck.

2023 error bar plot for Home Team Luck by Venue (note Turf playing fields in red)
2023 error bar plot for Away Team Luck by Venue (note Turf playing fields in red)

Detailed Views of Luck for 2022

2022 error bar plot for Home Team Luck by Venue (note Turf playing fields in red)
2022 error bar plot for Away Team Luck by Venue (note Turf playing fields in red)

What Do We See in these Plots??

  1. The “Luck Slope” for both home and away teams is steeper for 2023 than 2022. My guess is that this is due to the fact that the 2023 season is still being played. It will be interesting to see if the difference in luck between the top venues and the bottom ones flattens out as the season progresses.
  2. But even though the season isn’t complete, the data from 2023 is interesting. So far, we can see that for the Home Teams, the “red” venues (these have artificial turf surfaces) tend to be more towards the left of the chart. This is the “higher luck” side. Conversely, the same venues that are positive for the Home teams are on the right side of the Away Team chart, meaning that the turf fields are less lucky for away teams.
  3. If you do a study field-by-field, the “luckier” venues in 2022 are not the same ones seen in 2023. There could be lots of variables other than playing surface that could explain this. Take a look and see what you can uncover! For example, Lumen Field (home of the Seattle Sounders) is incredibly unlucky for the Sounders in 2023 (and is lucky for their opponents!) but in 2022 it was about middle of the road. Despite this unluckiness, the Sounders are 2nd in the MLS Western Conference right now! One observation I’d make is that the Sounders are one of a couple of teams whose home luck and away luck do not diverge much. For a good visualization of this, see the 2023 chart at this link.
  4. There are a whole lot of different analyses that could be done using this data. Feel free to discuss in the comments section of the blog! I probably haven’t thought yet about what you noticed!


Soccer Analytics: Home and Away “Luck”

Will this improbable shot succeed?

As I mentioned in my first post, the game of soccer, due to its many degrees of freedom in play, is very non-deterministic. What does this phrase mean? There’s a philosophical meaning for the word “deterministic” which essentially says that all events, including human action, are ultimately determined by causes understood to be external to the will. There’s also an engineering meaning to the word where a deterministic system is repeatable with very high precision because it is a function of the inputs and the initial conditions. For instance, anti-lock brake systems are designed to be deterministic. We don’t want any surprises there!

The opposite of a deterministic system would be a “stochastic” system, which has one or more aspects that could be considered randomly sampled and thus can be analyzed statistically but not precisely predicted. So a “non-deterministic” game like soccer can also be said to be “stochastic”, because there are many variables in the game which all have their own probability distributions. Whew! All of this so I can talk about luck!

Luck

Wikipedia’s definition of luck is a pretty good one, “Luck is the phenomenon and belief that defines the experience of improbable events, especially improbably positive or negative ones.” Over the last two blog articles about soccer analytics, I’ve described how sometimes unpredictable events result in scoring goals or failing to score goals. These events could be anything from officiating decisions, to a player being surprisingly out of position right when the opponent’s pass comes to him, to a gust of wind that causes a ball to just barely tick up off the crossbar. Since goals in soccer are a much rarer event than points (runs, 3 point shots, field goals, touchdowns, hockey goals) scored in other popular sports, when they are impacted by improbable “luck” it is much more noticeable. If a touchdown is scored after a missed pass interference call and the scoring team goes up 35-14, that is just 7 out of 35 points. If a soccer official calls a questionable foul in the box and the offended team scores their penalty kick (70% chance of scoring), that might win the game 1-0. The luck of having the official see the play as a foul essentially won the game for one team and lost it for another.

Measuring Luck in Soccer

Note that it is impossible to measure the factors that caused the official above to call the contact in the box as a foul (perhaps he ate too many burritos before the game? Maybe his attention was distracted by a low-flying seagull? Perhaps he just hates the color green?). What we hope to do is find a proxy for the measurement of luck that “mostly” captures events when teams are expected to score a certain number of goals but either fail to achieve that number or exceed that number. So in this case, actual goals scored minus the number of expected goals could be seen as outperformance of the expectations for whatever reason. I’ll just call that overperformance “luck”. The same idea works in the other direction: an opponent’s expected goals minus the opponent’s actual goals scored can be viewed as your team’s defensive luck. Averaging the offensive luck and defensive luck gives overall luck.
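
The three definitions above can be written out as a small Python sketch. The function names and the sample numbers are assumptions chosen for illustration; only the formulas come from the paragraph above.

```python
def offensive_luck(goals_for, xg_for):
    """Actual goals scored minus expected goals."""
    return goals_for - xg_for


def defensive_luck(goals_against, xg_against):
    """Opponent's expected goals minus the goals the opponent actually scored."""
    return xg_against - goals_against


def overall_luck(goals_for, xg_for, goals_against, xg_against):
    """Average of offensive and defensive luck for one game."""
    attack = offensive_luck(goals_for, xg_for)
    defense = defensive_luck(goals_against, xg_against)
    return (attack + defense) / 2


# Hypothetical game: won 2-1 with 1.4 xG for and 1.8 xG against.
game_luck = overall_luck(goals_for=2, xg_for=1.4, goals_against=1, xg_against=1.8)
print(round(game_luck, 2))
```

In this made-up game the team was lucky on both ends: it scored 0.6 goals more than expected and conceded 0.8 fewer than expected, for an overall luck of 0.7.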

Charts (of course)

In the charts below, I’m measuring the overall luck for teams when they are playing at home vs. when they are playing away. This luck is averaged across all games in the season. I’ve overlaid these two new lines (the yellow and the green) on top of the blue annual salary bars and the orange “no penalty expected Goals” ratio. These home and away luck lines augment the orange xG ratio by bringing in the disparity between xG and actual goals (which, as I’m suggesting, can be seen as luck).

MLS 2022 Season xG, Salary, Home Luck, Away Luck
English Premier League 2022 Season xG, Salary, Home Luck, Away Luck

Conclusion

So what new information do the two luck features add to these charts? We have already noticed that:

  1. The Premier League clearly has a different financial structure than MLS (more on this in a later article)
  2. Therefore, a team’s annual salary is more indicative of success in the Premier League than in the MLS.
  3. xG ratio is predictive of success in both leagues, but more so in the Premier League
  4. Total points during the season is also highly correlated with overall success.

Now we look at the two luck lines to see what they add. What do we see?

  1. Having either Home Luck or Away Luck being smaller than zero is bad for the team’s performance. This is pretty obvious when you think about it, because it shows that the team is failing to convert on opportunities that are expected, whether on offense or defense or both. Why are they failing? Probably for unmeasurable reasons (the team is not getting along, the refs hate the coach, no fans are showing up at home, the team is practicing too hard and is tired during the game, etc.). The teams above the half-way point in the standings all have either a Home or an Away luck average higher than zero. The very top teams tend to have both Home and Away Luck averages above zero.
  2. It seems that a big divergence in Home and Away Luck, especially when one is in negative territory, indicates poorer performance. Note the last 6 teams in the Premier League chart. They all have a fairly large gap. The very worst teams see this gap at Home, and the next worst teams (Southampton and Everton) see the worst luck Away. But all have a pretty large gap between the home and the away. We see similar things in the MLS, where the very worst team by points (DC United) has the worst Home Luck in the league. Orlando City has the next worst Home Luck, but they make up for it through having one of the very highest Away Luck numbers (might be interesting to look into this club).
  3. What do you see? Weigh in on this in the comments! I answer them all to the very best of my ability.

Soccer Analytics: MLS and Premier League Comparison

In the previous entry in this series we discussed the relationship between team performance (points in the standings) and a ratio of expected goals for to expected goals against. We also showed the impact of the team’s salary on their performance. Note that we did this all for the US MLS soccer league. Here’s what we saw from 2022:

MLS 2022 season results: Impact of npxG ratio and team salary on points

This shows a strong relationship between points (the teams on the left side of the chart were the highest ranked) and the xG Ratio. But there doesn’t appear to be any correlation between the team salaries and performance. This could mean a lot of different things, but the well-known relationship in the English Premier League between salary and performance seems to be absent in the MLS. So I wondered, what would this graph look like for the teams in the Premier League during 2022? Would we see the same trends or something different? So here goes:

Premier League 2021-2022 season results: Impact of npxG ratio and team salary on points

A few things are obvious from this comparison.

  1. The Premier League teams are paid WAY more than MLS teams. We knew that this was likely to be the case, but this is an order of magnitude higher! Perhaps Manchester United is reflecting Ronaldo’s salary in that big outlier!
  2. In the Premier League, it is clear that there is a strong direct correlation between team salary and performance. This is very unlike what we saw in the MLS. I can think of a few reasons… first, the MLS has a kind of salary cap that I have read prevents them from using salary as effectively as the European leagues. Second, the Premier League has relegation, where teams that end at the bottom of the league (sorry, Norwich City) get relegated to the second tier league while the top performers in the second league get pulled up. This is likely to have major effects on the salary. There are likely many more reasons.
  3. Note how smoothly the xG Ratio descends down the point scale compared to the MLS. In the MLS chart, we saw a general trend with some outliers, but it is very clear that the xG ratio correlates strongly with performance in the Premier League.

Why is this interesting?

Well, what we see here are two measures that are easy to collect which are nice proxies for team performance. In the Premier League, we know that increasing team salary tends to lead to improved performance. We also know in both leagues that increasing the number of expected goals by focusing on creating more quality shots (instead of concentrating on only perfect shots) and reducing your opponent’s number of quality shots leads to better performance. This is important, because of the chance involved in converting a shot (about 1 out of 10 shots are converted). Expected Goals gives teams a good measure to try to optimize.
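
To make the xG ratio and its relationship with points concrete, here is a minimal Python sketch. The team rows are invented, and the Pearson correlation is written out by hand only to keep the example self-contained; it is not the method used to build the charts above.

```python
from math import sqrt


def xg_ratio(xg_for, xg_against):
    """Ratio of expected goals for to expected goals against."""
    return xg_for / xg_against


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


# Hypothetical season totals: (team, xG for, xG against, points).
sample_teams = [
    ("Team A", 60.0, 40.0, 70),
    ("Team B", 50.0, 50.0, 52),
    ("Team C", 40.0, 55.0, 38),
    ("Team D", 35.0, 60.0, 30),
]
ratios = [xg_ratio(xg_for, xg_against) for _, xg_for, xg_against, _ in sample_teams]
points = [team_points for _, _, _, team_points in sample_teams]
sample_correlation = pearson(ratios, points)
print(round(sample_correlation, 3))
```

A value near 1 would match the “smooth descent” seen in the Premier League chart, while a noisier league would give a lower value.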

Book Review, “Anna Karenina” by Leo Tolstoy

Anna Karenina by Leo Tolstoy

My rating:
5 of 5 stars


Anna Karenina is widely considered to be one of the top novels of all time, and I certainly wouldn’t disagree. There are aspects of this book’s greatness, however, that keep bringing me back to it every few years. As I get older, I see more and more amazing insights into human nature in this book than I ever noticed before. Tolstoy tells us, without telling us, that we should serve others, go deeper, and travel further in, and we leave his novel wanting these things for ourselves.

Here we see two all-time great personalities with incredible depth, Anna Karenina and Kostya Levin, and eagerly follow their lives. Many other interesting characters live inside these pages, but in general they exist to shine more light upon the two major ones. Sadly, the reader isn’t aware for much of the novel that one character is on the ascent and the other is descending. Both are very sympathetic and engaging in very different ways.

The book abounds in themes that might have been unpopular at the time of writing. One major theme is that of the loosening of restrictions on the common class, the rural peasants who were enslaved serfs not too long ago in the memory of the characters. This change in the social fabric of Russia is seen in clear contrast to the often-frivolous, excessive lives of the urban wealthy elite. Another major theme is that of sanctification versus decline. Sometimes characters who early on appear to have a broad excess of humanity find themselves in a downward spiral just as other characters who struggle to understand themselves and others improve and begin demonstrating goodness and grace to others. As Kostya Levin, an impulsive and argumentative landowner, discovers late in the book, “if goodness has causes, it is not goodness; if it has effects, a reward, it is not goodness either. So goodness is outside the chain of cause and effect.” This realization is a major breakthrough for Levin, who is struggling mightily to discover his purpose and place.

Throughout the book, it is hard not to adore the character Anna Karenina herself. She reminds one of the classmate in school who was confident and well-liked and didn’t understand or care about why. Anna comes from a lesser background but has easily made a charming path into the acceptance of the nobility. Her ability to be very decisive during challenging times turns into a flaw, though, and her life — unnoticed by anyone — begins to unravel.

This is a long book with incredible amounts of detail. As a writer myself (mediocre at best in comparison to Leo Tolstoy), I found many admirable examples where Tolstoy fits a beautiful, surprising set or event into the story in ways that seem natural and obvious. The book will be challenging, and therefore valuable, to any who struggle with the attachment of too much value to material things. Tolstoy reminds the reader over and over that the elements of one’s life that constitute goodness owe no debt to wealth and possessions.




1/3/22: A View of Omicron a Couple of Weeks in

Here’s a bunch of views from the Arizona Dept of Health Services.

Cases per Day

Arizona cases per day, from AZDHS Data Dashboard, 1/3/22

“As you get further on and the infections become less severe, it is much more relevant to focus on the hospitalizations as opposed to the total number of cases,” Dr. Anthony Fauci

Hospitalization Stats (by Day)

Inpatient and ICU Bed status – COVID and non-COVID patients. From AZDHS. 1/3/22

Discharges are one of the best data points for showing positive trends in hospital capacity. Normally, discharges peak right before the hospital bed use peaks. There was a peak of discharges around 12/1 that signaled the bed use decrease you can see to the right of the chart above. I wonder if the second discharge peak we’re seeing now signals a larger bed use decrease?

COVID Hospital Discharges by Day, AZDHS, 1/3/22

Deaths

Deaths were already trending lower before Omicron arrived, but they might be trending much lower (need another week or two to know for sure).

AZ COVID Deaths by Day, AZDHS, 1/3/22

Other Visualizations

Here’s my standard Case Rate (color) and Acceleration (Diameter) chart. What do we see here? It does seem like the higher rates and accelerations are in the more dense parts of the country. Prior to Omicron’s arrival, the brighter colors were trending in the northern (colder) parts of the country. It appears like the case breakouts are trending more southern now. We can see big outbreaks in Miami, Denver, El Paso, and NYC.

Case Rates and Accelerations, 1/3/22
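
For readers wondering what “rate” and “acceleration” mean here, one plausible reading is the first and second differences of cumulative case counts. The sketch below assumes exactly that; the function names and the numbers are hypothetical, and the chart itself may be built with a different calculation.

```python
def daily_rate(cumulative_cases):
    """New cases per day: difference between consecutive cumulative counts."""
    return [later - earlier for earlier, later in zip(cumulative_cases, cumulative_cases[1:])]


def acceleration(cumulative_cases):
    """Day-over-day change in the daily rate (a second difference)."""
    rates = daily_rate(cumulative_cases)
    return [later - earlier for earlier, later in zip(rates, rates[1:])]


# Made-up cumulative counts for four consecutive days.
sample_cumulative = [100, 110, 130, 160]
sample_rates = daily_rate(sample_cumulative)
sample_acceleration = acceleration(sample_cumulative)
print(sample_rates, sample_acceleration)
```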

Data Tables

Note that a lot of states seem to not be reporting (Delta_Active is very unlikely to be zero right now). Case Rates (IROC_confirmed) are through the roof for most states. Deaths appear very low considering the case acceleration.

State Data Table, 1/3/22

Things that make you scratch your head

Here are two charts that I put together a while back when it became clear that the states with higher vaccination rates were doing much better than the ones with the lowest vaccination rates. Now we see opposite behavior during Omicron. I’m not really sure how to explain this. Weather differences?

Cases per 1000 per Day – States with Lowest Vaccination Rates 1/3/22
Cases per 1000 per Day – States with Highest Vaccination Rates 1/3/22

What do we see here? Pretty much all of these states (not New Mexico) are sharply accelerating in cases per 1000 right now. The states on the top are accelerating at a much lower rate. My guesses are weather and higher density, but those are just guesses. Other ideas??

Have COVID-19 Strains become Less Virulent?

Virulence: Virulence is a pathogen’s or microorganism’s ability to cause damage to a host. In most contexts, especially in animal systems, virulence refers to the degree of damage caused by a microbe to its host. The pathogenicity of an organism—its ability to cause disease—is determined by its virulence factors. (Wikipedia)

Here are some images from the Arizona Dept. of Health Services data dashboard that I think tell a story that could indicate decreased virulence of the Delta variant.

  1. COVID Cases by Day in Arizona – Entire Pandemic: In the image below we see the cases per day since around April of 2020. You can easily see three surges of cases. The first happened in the summer of 2020 and coincided with a huge, relatively uncontrolled outbreak in Northern Mexico. Many of the cases during this time occurred in border counties of Arizona. The second surge occurred in the winter of 2020, when the entire U.S. saw a spike of cases that correlated with the average daily low temperatures dropping below 40 degrees. The latest surge corresponded with the more-transmissible Delta variant and has seen two spikes. This surge has been less of a spike and more of a “slog,” where perhaps we are seeing the arrival of the Delta variant in the late summer merge with the more traditional cold-weather pattern for a virus as the night-time temperatures drop. Understandably, the lack of relief is wearing out health care workers and challenging hospitals. Note that the number of cases per day for the second spike of the Delta outbreak is roughly equivalent to the first summer outbreak.
COVID-19 Cases by Day (https://www.azdhs.gov/covid19/data/index.php#confirmed-by-day) – 12/21/21

  2. Hospitalization – Cases by Day: Below you can see hospitalization for the three major outbreaks. The winter outbreak hospitalization by day far exceeded the first summer outbreak. Likewise, the first summer outbreak’s hospitalization per day is just under double the peak of the Delta variant outbreak. The only problem with the Delta outbreak is that it is lingering: similar cases per day and less hospitalization per day, just over a longer time. This naturally creates problems for hospitals processing sick people through their system, because bottlenecks form and have to be navigated. Just like in a factory, bottlenecks are going to be less of a problem in a quick surge of production than they are in long, tiring runs of production where errors and inefficiencies compound.

  3. Deaths per Day: In the image below, we see similar patterns to hospitalization. If you look closely, you can see that the peaks of the deaths are a week or two behind the peaks of hospitalizations. Though cases during the Delta wave are roughly equal to the first summer wave, the deaths are around half.

COVID-19 Deaths by Date of Death (https://www.azdhs.gov/covid19/data/index.php#deaths) – 12/21/21
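The “week or two” lag between hospitalization peaks and death peaks can be estimated from daily counts instead of by eye. Here is a rough Python sketch under a few assumptions: the two series are made-up shapes (not AZDHS data), each is smoothed with a 7-day trailing average to tame reporting noise, and the lag is simply the distance between the two smoothed peaks. The function names are hypothetical.

```python
def moving_average(series, window=7):
    """Trailing moving average; early days average over what is available."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def peak_day(series):
    """Index of the largest value in a daily series (first one if tied)."""
    return max(range(len(series)), key=lambda i: series[i])


def peak_lag_days(hospitalizations, deaths, window=7):
    """Days between the smoothed hospitalization peak and the smoothed death peak."""
    hosp_peak = peak_day(moving_average(hospitalizations, window))
    death_peak = peak_day(moving_average(deaths, window))
    return death_peak - hosp_peak


# Made-up triangular waves: deaths have the same shape, shifted 10 days later.
hosp = [max(0, 20 - abs(day - 20)) for day in range(60)]
deaths = [max(0, 20 - abs(day - 30)) for day in range(60)]

lag = peak_lag_days(hosp, deaths)
print(lag)  # 10
```

Peak-to-peak distance is a crude measure. With real data that has multiple spikes or backlog dumps, you would want to restrict each series to a single wave before comparing.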

Thoughts

Does this data show that Delta variant is less virulent than the preceding variants?

Perhaps. It’s quite possible that during the first summer wave we did a worse job of measuring cases. COVID tests are pretty ubiquitous now in late 2021 and maybe we’re collecting a higher percentage of the cases. Conversely, it’s also possible that people have inferred or imagined that Delta is less of a risk to them and are not getting tested if they experience mild symptoms. Either of these could be true and both would impact the usefulness of the case number. Additionally, the new variable of COVID vaccinations that was introduced in early 2021 has certainly reduced the impact of the Delta variant. It would take some work to decipher whether the virulence of Delta to unvaccinated people was equal to or less than that of previous variants.

This is one of the challenges of measuring cases for the purpose of scientific analysis. It is very hard in a real-world study to control for the measurement variables across numerous regions and measurement authorities (governments, hospitals, universities). This is one of the reasons why we still don’t know much about this virus, despite having measured it for around a year and a half.

My Opinion: Oftentimes the concerns around measures will balance out when data is considered in very large batches (“big data”). My suspicion is that human nature is the constant across the measurement of all of these surges and we can take what is presented to us and assume that Delta is less virulent than the previous strains, either due to the virus itself or due to the boosts to our immune systems from either natural immunity or the COVID vaccines that most people have received.

Omicron and the future: We’ll continue evaluating the hospitalization and death metrics in the context of cases. My suspicion is that as Omicron arrives, it will dominate and gradually eliminate Delta and previous variants still lingering out there. If Omicron is less virulent, perhaps then we’ll see a leveling off of the cases to some background number and then we can say that COVID-19 has become endemic. If Omicron is not less virulent, then we’ll have a rough month or two ahead of us.

Welcome to the Era of Omicron

I took a bit of a pause on monitoring COVID during the Delta outbreak because, at some point, people seemed to be much less interested. However, I’m hearing folks with questions now that a new, more contagious variant has emerged. A recent pre-print paper (not peer reviewed yet, so might be revised in the future) shows that the Omicron variant multiplies 70x faster in airways but 10x slower in lungs. This could explain why the variant appears to be more contagious but less threatening than Delta. See here for a pretty good description of the findings.

Might Omicron be a Good Thing or a Bad Thing?

Some reports predict that the faster-spreading variant will create more risk for humans, especially since it seems to evade the defenses from vaccinations to some degree. Others are reminding us that most pandemics end with a very contagious but less threatening variant that out-competes all of the more deadly variants. This is how the Spanish Flu ended. Hopefully the latter possibility is true, but time will tell. There are already reports from South Africa that hospitalizations (or at least severe ones requiring oxygen) are significantly lower under Omicron than they were during a similar period of the Delta outbreak there.

Latest Data – Before the Wave from Omicron Hits

Here’s the latest data by state. I’ll include some recent state data tables later in the post for comparison’s sake. Note that the case rates have picked up a bit in cold states over last week’s data. Perhaps this is the effect of Omicron or perhaps it’s just due to cold weather. Some states (like Arizona) have fallen down the list in the last two weeks.

State Data Table, sorted by case rate. 12/16/21

Arizona County Comparisons

Here’s a view on the death rates and case rates across the top Arizona counties by population since about June of 2020. I found it pretty interesting for comparison’s sake. I see a couple of interesting things here:

  1. Pima County, Maricopa County, and Pinal County all show nearly identical rates throughout the pandemic. Why is this interesting? Pima County — at least to my eye — has taken much more stringent public health measures than the other two counties from day one. Pinal County in particular seems to have gone out of its way to take as few public health measures as possible. But their rates and numbers are very similar (although Pinal County has fewer deaths per 1000 persons than Pima or Maricopa). What does this mean? No one knows for sure, but there is a strong indicator here that the measures we humans think will keep a virus at bay may not be very effective in the real world (vs. the lab).
  2. Yuma County had the steepest surge during the summer of 2020, but the case and death rates have been very flat ever since. This could be due to a higher vaccination rate in this border county or might even be due to natural immunity. I have no idea.
Case Rates across top AZ Counties by Population – 12/17/21
Death Rates across top AZ counties by population – 12/17/21

Older State Data Tables for Comparison

Perhaps the below will be interesting to data nerds now or in the future.

State Data Table from 12/8/21

State Data Table – 12/8/21

State Data Table from 11/30/21

State Data Table – 11/30/21

State Data Table from 11/20/21

State Data Table – 11/20/21

Delta Surge Update – Demographics Focus 8/13/21

Hospitalization (Arizona)

One question that hasn’t been well addressed in the media (all political bents) is whether the COVID Delta surge was driving hospitalization and who, indeed, was being hospitalized. My thinking is that this is our prime metric of the danger of a COVID surge these days. Here’s a chart showing the Arizona hospitalization numbers by demographic. It’s a bit messy for a couple of reasons: 1) Arizona keeps “catching up” on hospitalization numbers by dumping large count backlogs into a single day. I suspect this is a hard metric to keep up with due to all the hospital systems in the state and their state of enthusiasm (?) about reporting data… 2) I stopped capturing the daily snapshot from AZDHS’ web site sometime in May when the data got really boring and moved to weekly (or so). This means my trends aren’t as granular as before, but they’re still accurate.

Arizona Hospitalization (beds used) Data by Age – AZDHS data, collected by T.N. – 8/13/21

What do we see above? Note that at the left of the chart, the hospitalization by age is fairly random and driven by low numbers and statistics. However, if you can ignore the glitch in the middle, the trend is pretty clear towards the right (the Delta Surge). Hospitalization numbers are clearly trending up (but are still not significantly higher than in May). What does this trend reveal? Surprisingly, the over65 age group is still getting hospitalized at much higher rates than their percentage of the population would indicate. There is no way to know if these are vaccinated people or not. That’s a big gap in the data. The over65 group is matched in numbers by the much-larger 20-44 age group and followed closely by the 45-54 and 55-64 groups. The under 20 age group remains the least hospitalized. This seems to go against some of the news reports that are indicating that the Delta variant is having more severe outcomes in the youngest cases. That doesn’t seem to be the case right now in Arizona at least.
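One way to put a number on “hospitalized at much higher rates than their percentage of the population would indicate” is to divide a group’s share of hospitalizations by its share of the population. Here is a small Python sketch of that idea. The function name and all of the counts are made up for illustration, so treat it as an assumption about how you might check this, not as the calculation behind the chart.

```python
def representation_ratio(group_hospitalized, total_hospitalized,
                         group_population, total_population):
    """Share of hospitalizations divided by share of population.

    A value above 1 means the group is over-represented in hospital beds
    relative to its size; below 1 means under-represented.
    """
    hosp_share = group_hospitalized / total_hospitalized
    pop_share = group_population / total_population
    return hosp_share / pop_share


# Made-up counts for illustration only, not AZDHS or census numbers.
ratio = representation_ratio(
    group_hospitalized=40, total_hospitalized=100,
    group_population=200, total_population=1000,
)
print(ratio)  # 2.0
```

In this made-up example the group fills 40% of the beds while being 20% of the population, so it is hospitalized at twice the rate its size alone would predict.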

Below I’m showing the hospitalization numbers for all age demographics. As you can see, the Delta surge (furthest right) has not been surging in the hospitals the same way the earlier two surges did. Keep your eye on this chart as things move forward.

AZ Hospitalization since 4/20 (https://www.azdhs.gov/covid19/data/index.php#hospitalization)

Cases – Pima County

In my county (Pima) the Delta surge has resulted in proportionately fewer cases than in the much-larger Maricopa County. My suspicion is that this is due to the notably higher vaccination rates in Pima County. But again, the big question is which demographics are getting infected during the current surge?

Pima County Cases by Age Demographic – 8/13/21

Again, ignoring the loss of granularity by my moving to weekly data capture, you can see the trending on cases from the lows of May until now. It’s no surprise that the 20-44 age group is leading the case counts. In general, across Arizona, this group is much less likely than older demographics to get vaccinated. Plus, there’s more of them. However, the most interesting part of this chart is that the under 20 group is the next highest increase in cases. This group is largely unvaccinated, but it’s not clear how many of them are between 12 and 20 and how many are under 12. This is an error in data collection “strategy” that’s been a problem throughout COVID. Perhaps no one expected at the start that the under 16 demographic (school age) would be so interesting for this pandemic. The rest of the demographics (more vaccination and older) are barely seeing any case rate uptick since May. So, again, fairly surprising that the youngest demographics are the primary ones getting the Delta variant of COVID. No doubt “breakthrough” cases are happening in vaccinated people, but perhaps they’re not symptomatic enough to get counted. Or maybe there are just very few of them (despite what the headlines would indicate).

I just show Pima County here, but statewide, the trend is similar. At the state level, the case rates in the older demographics are slightly higher than in Pima County and the younger demographic case rates are noticeably higher. This, again, is driven by the much higher rates and lower vaccination in huge Maricopa County.

Deaths

There isn’t much change to death rates during the Delta surge from the low period of May. Deaths are still very low, as you can see from the height of the stacked blue and red bars in the chart below. The only thing that *might* be interesting is that the ratio of deaths in the over65 demographic to deaths in every other demographic is much lower now. Sometimes we see this when deaths are low, but during the two previous surges, this ratio trended between 2.5 and 4. Right now it ranges around 2 or lower. This ratio is the green line in the chart below (and the red bars are “over65” deaths and blue bars are “under65” deaths). What might this mean? Again, I suspect it is the power of the vaccine to limit deaths in the over 65 community. I keep tracking this number and I hope that it doesn’t trend up again.
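For reference, here is a minimal Python sketch of the over65-to-everyone-else death ratio described above. The function name and the sample counts are made up for illustration. One assumption to flag: when there are no under-65 deaths in a period the ratio is undefined, so this sketch returns None rather than guessing.

```python
def over65_ratio(over65_deaths, under65_deaths):
    """Ratio of over-65 deaths to deaths in all other age groups.

    Returns None when there are no under-65 deaths, since the ratio
    is undefined in that case.
    """
    if under65_deaths == 0:
        return None
    return over65_deaths / under65_deaths


# Made-up (over65, under65) death counts per period, for illustration only.
periods = [(30, 10), (12, 6), (4, 0)]
ratios = [over65_ratio(over, under) for over, under in periods]
print(ratios)  # [3.0, 2.0, None]
```

When deaths are very low, a handful of cases either way can swing this ratio a lot, which is worth keeping in mind when reading the green line during quiet periods.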