New Blog “Tag”. Soccer Analytics.

Arizona Youth Soccer, credit Tod Newman

I’ve been thinking about soccer analytics for some time now. I coached a Middle School soccer team last season and decided to develop some simple measurements that might allow the team to see its improvement. I selected shots, shots on goal (good shots), and turnovers (losing the ball for more than 3 seconds). As it turns out, without a focused team manager, it is difficult to collect even these simple measures, however carefully they are defined. Middle School attention spans are not long, everyone!
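If I were logging these three measures from the sideline, a tiny tally like the one below would apply the definitions the same way every time. This is a minimal sketch: the `Event` record and its field names are hypothetical, and the only rule taken from above is the 3-second turnover threshold.

```python
from dataclasses import dataclass

TURNOVER_SECONDS = 3.0  # definition above: ball lost for more than 3 seconds


@dataclass
class Event:
    """Hypothetical sideline log entry."""
    kind: str                  # "shot" or "lost_ball"
    on_goal: bool = False      # shots only: was it a shot on goal (a "good shot")?
    seconds_lost: float = 0.0  # lost_ball only: time until we won the ball back


def tally(events):
    """Count shots, shots on goal, and turnovers for one game."""
    shots = sum(1 for e in events if e.kind == "shot")
    on_goal = sum(1 for e in events if e.kind == "shot" and e.on_goal)
    turnovers = sum(
        1 for e in events
        if e.kind == "lost_ball" and e.seconds_lost > TURNOVER_SECONDS
    )
    return {"shots": shots, "shots_on_goal": on_goal, "turnovers": turnovers}


game = [
    Event("shot", on_goal=True),
    Event("shot"),
    Event("lost_ball", seconds_lost=1.5),  # won back quickly: not a turnover
    Event("lost_ball", seconds_lost=8.0),  # counts as a turnover
    Event("shot", on_goal=True),
]
print(tally(game))  # {'shots': 3, 'shots_on_goal': 2, 'turnovers': 1}
```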

So in this light, I recently picked up a copy of Ryan O’Hanlon’s book “Net Gains” (link to Amazon) and was inspired to tune up my old COVID stats and visualizations (check out my COVID-19 tag if you really want to relive those times) for something much more interesting to me now. Since I haven’t seen much in the way of MLS analytics, I figured that might be a good place to start.

What Do We Know About Soccer Analytics?

First, soccer is a highly unstructured game that typically produces low scores. Think about baseball… the players are often in set positions, both defensively and offensively. The batter stands in the batter’s box, the pitcher is on the mound, and runners stay within the basepaths and stand on the bases. Defensive players tend to spend most of their time in the same spots. A baseball diamond is a huge space, yet players occupy only a small portion of it throughout the game. It is rare that an official makes a single call that flips the outcome of a game.

Soccer Analytics Are Hard Because Soccer Has Low Structure!

Now think about soccer. There are many variants of soccer formations. Some clubs have traditionally used a 4-4-2 or a 4-3-3, but there are creative variants of these formations that could be adopted for special situations. There are very few spots on the pitch that players are unlikely to occupy during a game. This makes soccer a very hard game to collect data on and analyze, and it has also led to a lack of “killer” metrics that are indicative of team success. Indeed, in the book “Soccernomics”, authors Simon Kuper and Stefan Szymanski find that in European leagues, the amount spent on players’ wages is the most highly correlated measure of team success that is known! And of course, with many games decided by one goal or ending in a tie, a single call from an official can reverse the outcome of a game. This is discouraging, to say the least, for anyone who wants to find other signals hiding under that noise. Billy Beane, the former GM of the Oakland Athletics baseball team, became famous for finding soft signals in the data that the high-spending teams hadn’t been paying attention to. These are hard to find in soccer.

One Early Metric I Like (and Think I Can Collect)

One metric that I’m interested in is Expected Goals (both For and Against). In my words, this measures how often a team makes good decisions that put it in a position to take a good shot. Most of the data indicates that in soccer, ten decent shots will, in aggregate, produce about one real goal, so a typical shot carries an Expected Goals value (xG) of about 0.1. Shots from better locations carry a higher value. Overall, it isn’t hard to add up the xG during a game. A team that generates 3 xG but scores only 1 goal could be thought to have fallen on the bad side of the luck that drives much of what happens in soccer. The xG for a team’s opponent can be calculated the same way. I use a statistic called npxG (non-penalty xG) that I find on the site FBref.com (link to site), because it takes penalty shots out of the mix (I’m not a big fan of penalty shots, which seem highly subjective to me, and therefore unpredictable). The ratio of npxG “for” your team to npxG “against” your team is then a very good single number for measuring how your team performed.
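To make the arithmetic concrete, here is a minimal sketch of summing shot xG into npxG and taking the for/against ratio. The per-shot xG values and penalty flags are made-up illustrations (not FBref data), and the function names are hypothetical.

```python
# Each shot is (xG value, was it a penalty?). Values are made up for illustration.
shots_for = [(0.10, False), (0.10, False), (0.35, False), (0.76, True), (0.05, False)]
shots_against = [(0.10, False), (0.08, False), (0.12, False)]


def npxg(shots):
    """Non-penalty xG: sum the xG of every shot that was not a penalty."""
    return sum(xg for xg, is_penalty in shots if not is_penalty)


def npxg_ratio(shots_for, shots_against):
    """npxG for divided by npxG against; above 1.0 means we out-chanced the opponent."""
    against = npxg(shots_against)
    if against == 0:
        return float("inf")  # opponent created nothing to divide by
    return npxg(shots_for) / against


print(f"npxG for: {npxg(shots_for):.2f}")                          # 0.60
print(f"npxG against: {npxg(shots_against):.2f}")                  # 0.30
print(f"npxG ratio: {npxg_ratio(shots_for, shots_against):.2f}")   # 2.00
```

Note how the 0.76 penalty is simply dropped: the team’s ratio reflects only the chances it created from open play and non-penalty set pieces.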

Early Analysis on MLS Soccer

I collected data and did some data engineering on it so that I could plot two things for each MLS club: first, the club’s annual salary (in millions of dollars), and second, its npxG ratio. The hypothesis is that when these are plotted for the teams in rank order by their season point totals, we may see some trends.
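The data engineering step mostly amounts to joining per-club numbers from separate sources and sorting by points. A minimal sketch, with hypothetical club names, made-up numbers, and a hypothetical `build_rows` helper:

```python
# Hypothetical inputs from three separate sources (made-up numbers, not real MLS data).
standings = {"Club B": 48, "Club A": 67, "Club C": 39}  # season points
npxg = {"Club A": (55.0, 35.0), "Club B": (45.0, 45.0), "Club C": (38.0, 50.0)}  # (for, against)
salary_musd = {"Club A": 14.1, "Club B": 9.8, "Club C": 16.5}  # annual salary, $ millions


def build_rows(standings, npxg, salary_musd):
    """Join the per-club sources into one row per club, ranked by points."""
    rows = []
    for club, points in standings.items():
        npxg_for, npxg_against = npxg[club]
        rows.append({
            "club": club,
            "points": points,
            "npxg_ratio": npxg_for / npxg_against,
            "salary_musd": salary_musd[club],
        })
    # Best team first, matching the rank order used in the charts.
    rows.sort(key=lambda row: row["points"], reverse=True)
    return rows


for rank, row in enumerate(build_rows(standings, npxg, salary_musd), start=1):
    print(f"{rank}. {row['club']}: {row['points']} pts, "
          f"npxG ratio {row['npxg_ratio']:.2f}, salary ${row['salary_musd']:.1f}M")
```

In practice, the fiddly part is usually matching club names across sources; a plain dictionary lookup like this one raises `KeyError` when a name is spelled differently, which is at least a loud failure rather than a silently missing team.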

2022 MLS Results

2022 MLS End of Season Results, comparing final points, npxG Ratio, and Team Annual Salary

This is a pretty satisfactory result and shows a trend that correlates a high npxG ratio with success. In fact, the top npxG ratio of all belongs to LAFC, who won the 2022 championship over Philadelphia (the second-highest ratio). The ratio does not decline steadily down the table, which reflects the impact of chance on the results of individual games. Note, however, that there is no trend at all between team salaries and final results. I have seen papers indicating that others haven’t found any trends with MLS salaries either, presumably due to the way MLS implements its salary cap.
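One way to put a number on “trend” versus “no trend” is a rank correlation, such as Spearman’s rho, between points and each metric. This is a swapped-in technique, not necessarily how the charts above were read, and the six-team league below is invented purely for illustration. A rho near +1 means the metric rises steadily with points; a rho near 0 means no monotonic relationship.

```python
def ranks(values):
    """Rank values 1..n (1 = smallest), giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(ranks(x), ranks(y))


# Invented six-team league, listed in order of final points.
points = [67, 60, 55, 50, 44, 39]
npxg_ratio = [1.6, 1.3, 1.4, 1.0, 0.9, 0.7]
salary_musd = [14.1, 9.8, 20.3, 11.0, 16.5, 12.2]

print(f"points vs npxG ratio: {spearman(points, npxg_ratio):+.2f}")   # +0.94
print(f"points vs salary:     {spearman(points, salary_musd):+.2f}")  # -0.09
```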

Can We Predict the 2023 MLS Championship Yet?

2023 MLS Current Status, comparing current points standings, npxG Ratio, and Team Annual Salary

As the chart shows, the trend is non-existent after 15 or so games of the 2023 MLS season. My suspicion is that it’s too early in the season for “luck” to have evened out to its normal level.
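A quick simulation illustrates why 15 games may be too few for luck to wash out. It assumes, purely for illustration, that a team takes 12 shots per game and that every shot scores with probability 0.1 (the rule of thumb above), then measures how far goals per game stray from xG per game across many simulated seasons of different lengths. The constants and function names are hypothetical.

```python
import random
import statistics

SHOTS_PER_GAME = 12  # assumption for illustration only
XG_PER_SHOT = 0.1    # rule of thumb from above: about 1 goal per 10 decent shots


def season_gap(games, rng):
    """Goals per game minus xG per game for one simulated team-season."""
    shots = games * SHOTS_PER_GAME
    goals = sum(1 for _ in range(shots) if rng.random() < XG_PER_SHOT)
    return goals / games - XG_PER_SHOT * SHOTS_PER_GAME


def luck_spread(games, seasons=2000, seed=1):
    """Standard deviation of that gap across many simulated seasons."""
    rng = random.Random(seed)
    gaps = [season_gap(games, rng) for _ in range(seasons)]
    return statistics.pstdev(gaps)


for games in (5, 15, 34):
    print(f"{games:>2} games: luck spread ~ {luck_spread(games):.2f} goals per game")
```

Under these assumptions the spread is binomial, roughly sqrt(1.08 / games) goals per game, so it shrinks only with the square root of the number of games: about 0.27 after 15 games versus about 0.18 after a 34-game MLS regular season. Real seasons add further randomness (shot volume, opponents, referee calls) that this sketch ignores.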

Plans for Future Analysis

I’m planning to evaluate more MLS seasons for this trend and to incorporate a number of other metrics that are interesting and available. Percent possession is one: I tried to estimate it for my Middle School soccer team, but an accurate % possession figure might correlate well with performance. I’ll roll these kinds of articles out periodically. Please weigh in if you have interest and/or expertise to contribute!
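For a sideline estimate of % possession, one simple approach is to log the clock time whenever a team gains the ball and total up how long each side held it. This is a hypothetical sketch: the log format and `possession_pct` are my own assumptions, dead-ball time is credited to whoever last had the ball, and some data providers estimate possession from each team’s share of passes instead.

```python
def possession_pct(changes, game_end):
    """Percent of game time each team held the ball.

    changes:  time-ordered (seconds, team) pairs, one per change of possession
    game_end: clock time in seconds when the game ended
    """
    held = {}
    # Each possession runs from its own timestamp until the next change (or the final whistle).
    ends = [t for t, _ in changes[1:]] + [game_end]
    for (start, team), end in zip(changes, ends):
        held[team] = held.get(team, 0.0) + (end - start)
    total = sum(held.values())
    return {team: 100.0 * seconds / total for team, seconds in held.items()}


# Hypothetical log from the first three minutes of a game.
log = [(0, "us"), (40, "them"), (70, "us"), (130, "them")]
for team, pct in possession_pct(log, game_end=180).items():
    print(f"{team}: {pct:.1f}%")  # us: 55.6%, then them: 44.4%
```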

4 Replies to “New Blog “Tag”. Soccer Analytics.”

  1. As a parent with 3 kids heavily active in competitive soccer, I am quite interested in your analysis. One thought I had was regarding xG ratio. I’d guess that it’s higher in youth soccer, primarily on the basis that kids are taught to place a ball well early on and their strikes improve over time. But… the quality, size and experience of the keeper doesn’t improve at the same rate. Wonder if it would be possible to prove (or disprove) that xG is higher for younger teams and thus teams with strong strikers or one-touch bangers tend to run a score up?

    1. Love the thought process, Blaine! I’m interested. I’m definitely going to have some way to capture xG next year on my Middle School team. I’m hoping the school’s math department will help me recruit a team manager/statistician whom I can teach.

  2. While this is a good attempt, Salary/xG (Expected Goals) is a fool’s gold measurement. The items below are directly tied to playing the game as it is supposed to be played, from the youngest U4 players to elite futbollers.
    1. Chemistry/personnel
    • While salary is a good starting place for metric collection, it is a false narrative on success. Second, restricting the data to MLS alone does not provide a true metric collection for this. If we analyze not only MLS but expand the data mining to all leagues and all divisions, we will see a range of trends that have no correlation. Below the “A” level of the pyramid, money infusion does show some great improvements. Money that goes not only into new players but across the whole width and breadth of the club has direct results in measured success: new facilities, support personnel, and training staff (including the coaching staff) provide improvement over short periods of time that results in net positive gains. We can see that at the elite levels, money and salary for talented players can have only minimal effects. The richest clubs in the world (Real Madrid, Barcelona, Manchester United, Liverpool, and Bayern Munich) don’t always produce league winners; the 2023 winners were Manchester City, Barcelona, and Bayern Munich. Because soccer is not an individual sport, and elite athletes differ by very small margins, the salary metric is invalidated. A factor that does show results, though not consistently, and that works like a mixture of trial and error, is team chemistry/personnel. A pure striker with no midfield conductors will have few or no chances, and a holding midfielder provides a preliminary line of defense in front of the back four. A midfield three or four must control tempo and chance creation; it is the hub of a team. A keeper working as the first attacker is an advantage, pushing the ten field players forward. The striker success rate in the attacking third directly correlates with victories. Finding the right combination in one third can get you “lucky”, controlling two thirds lets you dictate the game’s opportunities, and complete control of all three thirds makes the opposing team’s work virtually impossible.
    2. Possession
    • As detailed above, control of the thirds (tempo, defense, recovery, and chance creation) is the outweighing factor for success. Many cogs impact this, and these options are not easily realized. Personnel may not match the mold stated above, and the overall philosophy might not match the mindset of the players chosen. So, coupled with possession (the other team can’t score without the ball), the tactical mindset is important as well.
    3. Success in final 1/3
    • A pure striker is hard to find. One with a 3- or 4-to-1 success rate is what is desired for success. Again, having a striker or strikers who can “finish” is not the be-all and end-all. Possession is needed to allow the midfield/defense to feed the attackers opportunities to score. Movement without the ball, which gets little to no training among soccer players, is what makes event generation positive for the attackers, and quickly returning to attack mode, using suffocating defense to win back possession for the attacking team, is the start of positive play.

    1. Thanks so much, Coach Mo! I really appreciate your wisdom in this area. You point out a lot of the reasons why soccer analytics have been slow to mature. The game is complex!
