Soccer Analytics: Does Counterpressing Work?

counterpressing example

In soccer, legendary coaches have asserted that teams that regain the ball within five, six, or even eight seconds of losing it have a higher chance of keeping it and, indeed, of scoring. This is the foundation of what Jürgen Klopp called “Gegenpressing,” and it drove the rise of RB Leipzig in the Bundesliga, whose coach Ralf Rangnick stated that goals were most often scored within eight seconds of winning the ball from the opposition. This seems like an amazing statistic, but is it data-driven or merely legend?

In the 2021 paper “Data-driven detection of counterpressing in professional football” (link), the authors, Pascal Bauer and Gabriel Anzer, describe a method that uses supervised machine learning to detect counterpressing from synchronized positional (tracking) and event data. If automated detection were possible, they hoped to be able to better evaluate some of these counterpressing rules of thumb.

History of the Research into Pressing Tactics

Much of the data behind counterpressing strategies started with a man named Charles Reep. He was one of the first people to study the game for the purpose of collecting data that might reveal new insights. He captured piles and piles of data, much of it in handwritten notes, to better understand the game. There is much that can be said about Reep, but this post is too short to discuss his successes and missteps. On our question about transitions, however, in a paper he authored in 1968 he found that 30% of the time a team forced a turnover and gained possession, they were able to take a shot on goal, and that 25% of all goals came from possession regained in the attacking quarter. This data wasn’t used much outside Reep’s circle, but in 1999 A.J. Grant collected data from the 1998 World Cup and confirmed these numbers. The relationship between transition “recaptures” and goals was confirmed again in papers from 2014 and 2018. Other studies have found that teams rely on counterpressing more often when behind in the score than when ahead. To me, this indicates that some teams know the power of counterpressing but don’t structure their main strategy around it, much like the press in basketball. Studies have also found that teams that recover the ball more quickly after losing it tend to win more games. All of these findings seem intuitive, but it’s helpful to see that there are measurements and data behind them.

The paper concludes a few things that I find valuable.

  1. First off, researchers have been able to discern counterpressing strategies using machine learning. This is very important, because it reduces the labor required to classify significant events and approaches in soccer.
  2. Using these automated detection methods, the same researchers also found that counterpressing is more likely to be successful near the sidelines, and that numerical superiority near the ball at the moment of the turnover increases the chance of winning it back. Both of these, of course, make good sense to me.
  3. Within the German Bundesliga, teams follow very different transition strategies, and the machine learning model could detect these differences. Each of these approaches had different levels of success in turnover recovery and goal scoring.
  4. Successful teams (measured by their final ranking) tend to use counterpressing more efficiently, lending credibility to the coaches who use it as a major offensive counter-attacking strategy.
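To make the numerical-superiority finding concrete, here is a minimal sketch of one way to count it from a single tracking-data frame. It is not the feature definition from Bauer and Anzer's paper: the `Player` record, the 10-metre radius, and the sample coordinates are all hypothetical assumptions chosen for illustration.

```python
import math
from typing import NamedTuple


class Player(NamedTuple):
    x: float    # pitch coordinates in metres (hypothetical tracking frame)
    y: float
    team: str   # "A" or "B"


def numerical_superiority(players, ball_xy, pressing_team, radius=10.0):
    """Pressing-team players minus opponents within `radius` metres of the ball.

    The 10 m default radius is an arbitrary assumption, not a value from the paper.
    """
    bx, by = ball_xy
    near = [p for p in players if math.hypot(p.x - bx, p.y - by) <= radius]
    pressing = sum(1 for p in near if p.team == pressing_team)
    return pressing - (len(near) - pressing)


# Made-up snapshot at the moment team "A" loses the ball at (60, 5), near a sideline.
frame = [
    Player(58, 4, "A"), Player(63, 8, "A"), Player(55, 12, "A"),
    Player(61, 6, "B"), Player(70, 20, "B"), Player(40, 30, "A"),
]
print(numerical_superiority(frame, (60, 5), "A"))
```

A positive value means the team that just lost the ball has more players close to it than the opponent does, which is the situation the paper associates with a better chance of winning the ball back.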

Conclusion

Though there seems to be data tying a fast recovery during transition to a higher probability of scoring, I was unable to find any data that quantified the number of seconds after a turnover within which a transition was more likely to lead to a goal. Perhaps the number is somewhere in the mountains of tablets where Charles Reep recorded his data, or perhaps it’s just legend. But the data seems very clear that pursuing a counterpressing strategy with players who are highly fit and can fly all over the field (people like Tyler Adams??) gives teams a higher probability of scoring than teams that do not. Of course, if Lionel Messi plays for the non-counterpressing team, all bets are off.
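If timestamped event data were available, the “how many seconds” question could be approached roughly as sketched below. This is a hypothetical illustration: the record format, the 2-second buckets, and the sample values are invented, and a real study would need far more possessions (and careful definitions of “regain” and “goal from this possession”) before drawing any conclusion.

```python
from collections import defaultdict

BUCKET_S = 2  # width of each time bucket in seconds (arbitrary choice)


def goal_rate_by_regain_time(possessions, bucket_s=BUCKET_S):
    """Share of regained possessions that produced a goal, grouped by how many
    seconds after losing the ball the team won it back."""
    totals = defaultdict(int)
    goals = defaultdict(int)
    for seconds_to_regain, scored in possessions:
        bucket = int(seconds_to_regain // bucket_s) * bucket_s
        totals[bucket] += 1
        goals[bucket] += int(scored)
    return {b: goals[b] / totals[b] for b in sorted(totals)}


# Invented records: (seconds from losing the ball to winning it back, goal scored?)
sample = [(1.5, True), (3.0, False), (2.5, False), (4.2, True),
          (7.9, False), (9.5, False), (12.0, False), (3.8, False)]
for bucket, rate in goal_rate_by_regain_time(sample).items():
    print(f"regained in {bucket}-{bucket + BUCKET_S}s: {rate:.0%} led to a goal")
```

With enough real possessions, a clear drop in the goal rate after some bucket would be the kind of evidence the five-to-eight-second rule of thumb needs.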

Soccer Analytics: Home and Away “Luck”

Will this improbable shot succeed?

As I mentioned in my first post, the game of soccer, due to its many degrees of freedom in play, is very non-deterministic. What does this phrase mean? There’s a philosophical meaning of the word “deterministic,” which essentially says that all events, including human action, are ultimately determined by causes understood to be external to the will. There’s also an engineering meaning, where a deterministic system is repeatable with very high precision because its output is a function of its inputs and initial conditions. For instance, anti-lock brake systems are designed to be deterministic. We don’t want any surprises there!

The opposite of a deterministic system is a “stochastic” system, which has one or more aspects that can be considered randomly sampled and thus can be analyzed statistically but not precisely predicted. So a “non-deterministic” game like soccer can also be said to be “stochastic,” because there are many variables in the game, each with its own probability distribution. Whew! All of this so I can talk about luck!
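A tiny simulation shows what “stochastic” means in practice. Assuming each shot independently becomes a goal with probability equal to its xG (a simplifying assumption, with an invented shot list), the same inputs produce different scores from match to match, while the long-run average settles near the total xG.

```python
import random


def simulate_match(shot_xgs, rng):
    """Goals from one simulated match: each shot scores with probability equal to its xG."""
    return sum(1 for xg in shot_xgs if rng.random() < xg)


shots = [0.1] * 10 + [0.35, 0.05]   # hypothetical shot list, total xG = 1.4
rng = random.Random(42)             # fixed seed so the run is repeatable
results = [simulate_match(shots, rng) for _ in range(10_000)]
print("first five matches:", results[:5])
print("average goals:", sum(results) / len(results))  # should land close to 1.4
```

Identical inputs, different outcomes: individual matches cannot be predicted precisely, but their statistics can be analyzed.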

Luck

Wikipedia’s definition of luck is a pretty good one: “Luck is the phenomenon and belief that defines the experience of improbable events, especially improbably positive or negative ones.” Over the last two blog articles about soccer analytics, I’ve described how unpredictable events sometimes result in scoring goals or failing to score them. These events could be anything from officiating decisions, to a player being surprisingly out of position right when the opponent’s pass comes to him, to a gust of wind that causes a ball to just barely tick up off the crossbar. Since goals in soccer are much rarer than the points (runs, 3-point shots, field goals, touchdowns, hockey goals) scored in other popular sports, improbable “luck” is much more noticeable when it affects them. If a touchdown is scored after a missed pass interference call and the scoring team goes up 35-14, that is just 7 of 35 points. If a soccer official calls a questionable foul in the box and the fouled team scores the penalty kick (a 70% chance of scoring), that might win the game 1-0. The luck of having the official see the play as a foul essentially won the game for one team and lost it for the other.
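The arithmetic behind the touchdown-versus-penalty comparison is simple enough to write out. This small sketch just divides the points from the single luck-affected play by the winner's final score, using the two hypothetical games described above.

```python
def share_of_winning_score(points_from_event, winning_score):
    """Fraction of the winner's total produced by one luck-affected scoring play."""
    return points_from_event / winning_score


print(f"Football touchdown in a 35-14 game: {share_of_winning_score(7, 35):.0%}")  # 20%
print(f"Soccer penalty in a 1-0 game:       {share_of_winning_score(1, 1):.0%}")   # 100%
```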

Measuring Luck in Soccer

Note that it is impossible to measure the factors that caused the official above to call the contact in the box a foul (perhaps he ate too many burritos before the game? Maybe he was distracted by a low-flying seagull? Perhaps he just hates the color green?). What we can hope to do instead is find a proxy for luck that “mostly” captures games in which teams are expected to score a certain number of goals but either fall short of or exceed that number. In this case, actual goals scored minus expected goals can be seen as outperforming expectations for whatever reason. I’ll just call that overperformance “luck.” Conversely, an opponent’s expected goals minus the opponent’s actual goals can be viewed as your team’s defensive luck. The average of offensive luck and defensive luck is overall luck.
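The luck proxy described above can be written as a small function. This is a sketch of the definition in this post, assuming per-match goals and xG values are already in hand (for example, from a stats site); the function name and the sample match are hypothetical.

```python
def match_luck(goals_for, xg_for, goals_against, xg_against):
    """Overall luck for one match: the average of offensive and defensive luck."""
    offensive_luck = goals_for - xg_for            # positive: scored more than expected
    defensive_luck = xg_against - goals_against    # positive: conceded fewer than expected
    return (offensive_luck + defensive_luck) / 2


# Hypothetical match: won 2-1 while creating 1.3 xG and allowing 1.8 xG.
print(round(match_luck(goals_for=2, xg_for=1.3, goals_against=1, xg_against=1.8), 2))  # 0.75
```

Here the team was lucky on both ends: +0.7 on offense and +0.8 on defense.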

Charts (of course)

In the charts below, I’m measuring each team’s overall luck when playing at home vs. when playing away, averaged across all games in the season. I’ve overlaid these two new lines (yellow and green) on top of the blue annual salary bars and the orange non-penalty expected goals (npxG) ratio. The home and away luck lines augment the orange xG ratio by bringing in the disparity between xG and actual goals (which, as I’m suggesting, can be seen as luck).
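Splitting per-game luck by venue and averaging over a season might look like the sketch below. The tuple layout and the sample values are hypothetical; each luck value is assumed to be the per-game average of (goals minus xG) and (opponent xG minus opponent goals).

```python
from collections import defaultdict
from statistics import mean


def home_away_luck(games):
    """Average per-game luck for each (team, venue) pair, venue being "home" or "away"."""
    buckets = defaultdict(list)
    for team, venue, luck in games:
        buckets[(team, venue)].append(luck)
    return {key: mean(values) for key, values in buckets.items()}


# Hypothetical per-game luck values for two teams.
games = [("Team X", "home", 0.5), ("Team X", "home", -0.1),
         ("Team X", "away", -0.4), ("Team Y", "away", 0.3)]
for (team, venue), avg in sorted(home_away_luck(games).items()):
    print(f"{team} {venue}: {avg:+.2f}")
```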

MLS 2022 Season xG, Salary, Home Luck, Away Luck
English Premier League 2022 Season xG, Salary, Home Luck, Away Luck

Conclusion

So what new information do the two luck features add to these charts? We have already noticed that:

  1. The Premier League clearly has a different financial structure than MLS (more on this in a later article)
  2. Therefore, a team’s annual salary is more indicative of success in the Premier League than in the MLS.
  3. xG ratio is predictive of success in both leagues, but more so in the Premier League
  4. Total points during the season is also highly correlated with overall success.

Now we look at the two luck lines to see what they add. What do we see?

  1. Having either Home Luck or Away Luck below zero is bad for a team’s performance. This is pretty obvious when you think about it, because it shows the team falling short of expectations, whether by converting fewer chances than expected on offense, allowing more goals than expected on defense, or both. Why are they falling short? Probably for unmeasurable reasons (the team is not getting along, the refs hate the coach, no fans are showing up at home, the team is practicing too hard and is tired during games, etc.). The teams above the halfway point in the standings all have either a Home or an Away Luck average above zero. The very top teams tend to have both Home and Away Luck averages above zero.
  2. It seems that a big divergence between Home and Away Luck, especially when one of them is in negative territory, indicates poorer performance. Note the last 6 teams in the Premier League chart: they all have a fairly large gap. The very worst teams have their negative luck at home, and the next-worst teams (Southampton and Everton) have their worst luck away. But all of them have a pretty large gap between home and away. We see similar things in MLS, where the very worst team by points (DC United) has the worst Home Luck in the league. Orlando City has the next-worst Home Luck, but they make up for it with one of the very highest Away Luck numbers (this club might be interesting to look into).
  3. What do you see? Weigh in on this in the comments! I answer them all to the very best of my ability.
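The two observations above (a luck average below zero, and a large gap between home and away luck) can be flagged per team with a short sketch like this. The club names and numbers are placeholders, not values read from the charts.

```python
def luck_gap_report(team_luck):
    """For each team: the home/away luck gap, and whether either average is below zero.

    Rows are sorted so the largest gaps come first.
    """
    rows = []
    for team, (home, away) in team_luck.items():
        rows.append((team, round(abs(home - away), 2), min(home, away) < 0))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical season averages: (home luck, away luck).
season = {"Club A": (0.30, 0.25), "Club B": (-0.45, 0.20), "Club C": (0.10, -0.35)}
for team, gap, has_negative in luck_gap_report(season):
    print(f"{team}: gap={gap:.2f}, negative side={has_negative}")
```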

Soccer Analytics: MLS and Premier League Comparison

In the previous entry in this series, we discussed the relationship between team performance (points in the standings) and the ratio of expected goals for to expected goals against. We also showed the impact of team salary on performance. Note that we did all of this for Major League Soccer (MLS) in the US. Here’s what we saw for 2022:

MLS 2022 season results: Impact of npxG ratio and team salary on points

This shows a strong relationship between points (the teams on the left side of the chart were the highest ranked) and the xG ratio. But there doesn’t appear to be any correlation between team salaries and performance. This could mean a lot of different things, but the well-known relationship between salary and performance in the English Premier League seems to be absent in MLS. So I wondered: what would this graph look like for the Premier League teams in 2022? Would we see the same trends or something different? So here goes:

Premier League 2021-2022 season results: Impact of npxG ratio and team salary on points

A few things are obvious from this comparison.

  1. Premier League teams are paid WAY more than MLS teams. We knew this was likely to be the case, but this is an order of magnitude higher! Perhaps Manchester United’s big outlier reflects Ronaldo’s salary!
  2. In the Premier League, there is clearly a strong direct correlation between team salary and performance. This is very unlike what we saw in MLS. I can think of a few reasons. First, MLS has a kind of salary cap that, from what I have read, prevents teams from using salary as effectively as teams in the European leagues. Second, the Premier League has relegation: teams that finish at the bottom of the league (sorry, Norwich City) are relegated to the second-tier league, while the top performers in the second tier are promoted. This is likely to have major effects on salary. There are likely many more reasons.
  3. Note how smoothly the xG ratio descends down the points scale compared to MLS. In the MLS chart, we saw a general trend with some outliers, but in the Premier League it is very clear that the xG ratio correlates strongly with performance.
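One way to put numbers on “strong correlation” versus “no correlation” is Pearson's correlation coefficient. The sketch below computes it by hand; the five clubs' points, npxG ratios, and salaries are invented for illustration, not the MLS or Premier League data behind the charts.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Invented example values for five clubs, listed in final-table order.
points = [86, 74, 65, 50, 38]
npxg_ratio = [1.9, 1.5, 1.2, 0.9, 0.6]
salary_musd = [12.0, 20.5, 9.8, 15.1, 11.3]   # annual payroll, $ millions

print("npxG ratio vs points:", round(pearson(npxg_ratio, points), 2))
print("salary vs points:    ", round(pearson(salary_musd, points), 2))
```

Values near +1 mean the measure rises steadily with points; values near 0 mean little linear relationship. With only about 20 to 30 teams per league, these coefficients are themselves noisy.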

Why is this interesting?

Well, what we see here are two easy-to-collect measures that are nice proxies for team performance. In the Premier League, we know that increasing team salary tends to lead to improved performance. We also know that in both leagues, increasing the number of expected goals by creating more quality shots (instead of concentrating only on perfect shots) and reducing your opponent’s number of quality shots leads to better performance. This is important because of the chance involved in converting a shot (only about 1 in 10 shots is converted). Expected goals give teams a good measure to optimize.
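The “more quality shots instead of only perfect shots” idea can be checked with a little arithmetic. In this sketch the shot values are hypothetical, and the second function assumes shots convert independently of one another, which is a simplification.

```python
import math


def expected_goals(shot_xgs):
    """Sum of per-shot xG: the average number of goals these chances would produce."""
    return sum(shot_xgs)


def prob_at_least_one_goal(shot_xgs):
    """Chance of scoring at least once, assuming shots convert independently."""
    return 1 - math.prod(1 - xg for xg in shot_xgs)


many_decent = [0.1] * 10   # ten "decent" chances at roughly 1-in-10 each
few_perfect = [0.3, 0.3]   # two "perfect" chances (hypothetical values)
for label, shots in [("10 decent", many_decent), ("2 perfect", few_perfect)]:
    print(label, round(expected_goals(shots), 2), round(prob_at_least_one_goal(shots), 2))
```

Under these assumptions, ten decent chances are worth more expected goals and a better chance of scoring at least once than two “perfect” ones.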

New Blog “Tag”: Soccer Analytics

Arizona Youth Soccer, credit Tod Newman

I’ve been thinking about soccer analytics for some time now. I coached a middle school soccer team last season and decided to develop some simple measurements that might allow the team to see its improvement. I selected shots, shots on goal (good shots), and turnovers (losing the ball for more than 3 seconds). As it turns out, without a focused team manager, it is difficult to collect these simple measures, even when they are carefully defined. Middle school attention spans are not long, everyone!
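The turnover definition above (losing the ball for more than 3 seconds) is easy to apply once ball losses are logged with timestamps. This is a minimal sketch under assumptions: the `(time_lost, time_regained)` log format is hypothetical, and a possession that is never won back is counted as a turnover.

```python
TURNOVER_THRESHOLD_S = 3.0  # "losing the ball for more than 3 seconds"


def count_turnovers(losses, threshold=TURNOVER_THRESHOLD_S):
    """Count ball losses that lasted longer than `threshold` seconds.

    Each loss is (time_lost, time_regained) in seconds; time_regained is None
    if the team never won that possession back (counted as a turnover here).
    """
    count = 0
    for lost_at, regained_at in losses:
        if regained_at is None or regained_at - lost_at > threshold:
            count += 1
    return count


# Hypothetical sideline log from one half.
log = [(62.0, 64.5), (130.0, 141.0), (300.0, 303.0), (610.0, None)]
print(count_turnovers(log))
```

A loss of exactly 3 seconds does not count, because the definition says “more than” 3 seconds.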

So in this light, I recently picked up a copy of Ryan O’Hanlon’s book “Net Gains” (link to Amazon) and was inspired to tune up my old COVID stats and visualizations (check out my COVID-19 tag if you really want to relive those times) for something much more interesting to me now. Since I haven’t seen much in the way of MLS analytics, I figured that might be a good place to start.

What do we Know about Soccer Analytics?

First, soccer is a highly unstructured game that typically has low scores. Think about baseball: the players are often in set positions, both defensively and offensively. The batter stands in the batter’s box, the pitcher is on the mound, and runners stay within the base paths and stand on the bases. Defensive players spend most of their time in the same spots. A baseball diamond is a huge space, only a small portion of which players occupy throughout the game. It is rare for an official to make a single call that flips the outcome of the game.

Soccer Analytics Are Hard because Soccer has Low Structure!

Now think about soccer. There are many variants of soccer formations. Some clubs have traditionally used a 4x4x2 or a 4x3x3, but there are creative variants of these formations that can be adopted for special situations. There are very few spots on the pitch that players are unlikely to occupy during a game. This makes soccer a very hard game to collect data on and analyze. This difficulty has also led to a lack of “killer” metrics that are indicative of team success. Indeed, in the book ‘Soccernomics’, authors Simon Kuper and Stefan Szymanski find that in European leagues, the amount spent on players’ wages is the known measurable most highly correlated with team success! And of course, with many games decided by one goal or tied, a single call from an official can reverse the outcome of the game. This is discouraging, to say the least, for anyone who wants to find other signals hiding under that noise. Billy Beane, the former GM of the Oakland Athletics baseball team, became famous for finding soft signals in the data that the high-spending teams hadn’t been paying attention to. These are hard to find in soccer.

One Early Metric I Like (And Think I Can Collect)

One metric that I’m interested in is Expected Goals (both For and Against). This is a measure (in my words) of how often a team makes good decisions that put it in position to take a good shot. Most of the data indicates that in soccer, ten decent shots on goal will, in aggregate, score one real goal. So most shots have an Expected Goals (xG) value of about 0.1, and some shots from better locations have a higher value. Overall, it isn’t hard to add up the xG during a game. A team that has 3 xG but only 1 goal in a game could be thought of as having fallen on the bad side of the luck that drives much of what happens in soccer. The xG for a team’s opponent can also be calculated. I use a feature called npxG (non-penalty xG) from the site FBRef.com (link to site) because it takes penalty shots out of the mix (I’m not a big fan of penalty shots, which seem highly subjective to me, and therefore unpredictable). The ratio of npxG “for” your team to npxG “against” your team is then a very good single number for measuring how your team performed.
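Given per-shot xG values, the npxG ratio described above takes only a few lines. This sketch only illustrates what the number means: FBRef.com reports npxG directly from its own shot model, which is not reproduced here, and the shot lists below (including the penalty's xG value) are hypothetical.

```python
def npxg(shots):
    """Non-penalty expected goals: sum of xG over shots that were not penalty kicks."""
    return sum(xg for xg, is_penalty in shots if not is_penalty)


def npxg_ratio(shots_for, shots_against):
    """npxG 'for' divided by npxG 'against'; above 1 means better chances than the opponent.

    Raises ZeroDivisionError if the opponent had no non-penalty shots.
    """
    return npxg(shots_for) / npxg(shots_against)


# Hypothetical shot lists: (xG, was it a penalty?)
ours = [(0.1, False)] * 8 + [(0.4, False), (0.76, True)]
theirs = [(0.1, False)] * 6
print(round(npxg(ours), 2), round(npxg(theirs), 2), round(npxg_ratio(ours, theirs), 2))
```

The penalty in `ours` is excluded, so it has no effect on the ratio.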

Early Analysis on MLS Soccer

I collected the data and did some data engineering on it so I could plot two things for each MLS club: first, the club’s annual salary (in millions of dollars), and second, its npxG ratio. The hypothesis is that when these are plotted for the teams in rank order by season points, we may see some trends.

2022 MLS Results

2022 MLS End of Season Results, comparing final points, npxG Ratio, and Team Annual Salary

This is a pretty satisfying result, and it shows a trend that correlates a high npxG ratio with success. In fact, the top npxG ratio of all belongs to LAFC, who won the 2022 championship over Philadelphia (the second-highest ratio). The trend does not descend in a straight line, reflecting the impact of chance on the results of individual games. Note, however, that there is no trend at all between team salaries and final results. I have seen papers indicating that others haven’t found any trends with MLS salaries either, ostensibly due to the way MLS implements its salary cap.

Can we Predict the 2023 MLS Championship yet?

2023 MLS Current Status, comparing current points standings, npxG Ratio, and Team Annual Salary

So, as is obvious, the trend is nonexistent after 15 or so games of the 2023 MLS season. My suspicion is that it’s too early in the season for “luck” to have averaged out to its normal level.
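One way to see why 15 games may be too few: if a team’s per-game luck is pure noise, the spread of its season average shrinks roughly with the square root of the number of games. The sketch below simulates that under an assumed normal per-game luck with standard deviation 1.0, an arbitrary choice for illustration rather than a measured value.

```python
import random
from statistics import pstdev


def spread_of_average_luck(games, seasons=5000, per_game_sd=1.0, seed=7):
    """Std. dev. of a season-average luck value when each game's luck is pure noise."""
    rng = random.Random(seed)
    averages = [
        sum(rng.gauss(0.0, per_game_sd) for _ in range(games)) / games
        for _ in range(seasons)
    ]
    return pstdev(averages)


for n in (5, 15, 34):  # 34 = length of an MLS regular season
    print(f"{n:>2} games: spread ~ {spread_of_average_luck(n):.2f}")
```

The spread after 15 games is about 1.5 times the spread after a full 34-game season, so early-season luck averages can still swamp the underlying trend.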

Plans for Future Analysis

I’m planning to evaluate more MLS seasons for this trend and to incorporate a number of other metrics that are interesting and available (% possession is one that I tried to estimate for my middle school soccer team; an accurate % possession might correlate well with performance). I’ll roll these kinds of articles out periodically. Please weigh in if you have interest and/or expertise to contribute!
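For a youth team, % possession could be estimated from a simple log of who had the ball and when. This is a minimal sketch under assumptions: the `(team, start, end)` spell format is hypothetical, and dead-ball time between spells is simply ignored, which is only one of several reasonable ways to define possession.

```python
def possession_pct(possessions, team):
    """Share of total possession time held by `team`, as a percentage.

    `possessions` is a list of (team, start_s, end_s) spells of controlled possession;
    dead-ball time between spells is ignored (an assumption).
    """
    total = sum(end - start for _, start, end in possessions)
    ours = sum(end - start for t, start, end in possessions if t == team)
    return 100.0 * ours / total


spells = [("us", 0, 40), ("them", 40, 70), ("us", 75, 135), ("them", 140, 160)]
print(f"{possession_pct(spells, 'us'):.1f}%")
```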