What Are Expected Goals in Football?

How the most effective advanced statistic applicable to football works.


Sabermetrics (from SABR, the Society for American Baseball Research), the empirical analysis of what happens on the diamond through advanced statistics, has radically changed how performance is analyzed and how players are scouted in baseball. Bill James, one of its earliest and staunchest advocates, called sabermetric analysis “the pursuit of objective knowledge about baseball”. Simple statistics are often purely descriptive and, taken on their own, do not tell us whether one player is better than another. By aggregating them, sabermetrics aims to build advanced metrics that correlate well with the probability of winning, and that can therefore be used not only to evaluate a player’s performance in past seasons but also to anticipate his future performance.

 

In the wake of the success of sabermetrics in baseball, exemplified by the well-known story of Billy Beane, general manager of the Oakland Athletics, on which the book “Moneyball: The Art of Winning an Unfair Game” and the later film are based, advanced statistics spread to other sports, starting with American ones such as basketball and hockey, and eventually reached football. On the football pitch, however, this kind of analysis has not found the fertile ground it found in the sports where it has truly flourished.

 

The main problem, at least as far as “public” analytics is concerned, is that football, despite offering a broad range of simple statistics, produces a much lower volume of individual events. In one of his last pieces before Grantland closed, Mike L. Goodman pointed out that in the previous MLB season as many as 143 players had recorded more than 500 plate appearances, a figure that bears no comparison with the mere 7 players who took at least 100 shots in the last Premier League season, let alone the 2 players with at least 50 shots on target.


In the last Serie A season, the distribution of individual shot volumes was very similar to the Premier League one described by Goodman: only 7 players took more than 100 shots in total, and only one (Gonzalo Higuaín) had more than 50 shots on target.

 



The comparison with other American sports is just as stark. Last season in the NBA, Stephen Curry attempted 1,341 field goals; in the NHL, Alexander Ovechkin took 795 shots. In football, the only players to have reached 1,000 total shots (excluding penalties) over the last six years (!) are Cristiano Ronaldo and Lionel Messi.

 

“If you don’t shoot, you can’t score”
Analysts have therefore tried to get around this far from trivial obstacle by using team statistics rather than individual ones, working mainly on shots. By aggregating large amounts of data over multiple seasons, several analysts have reached the same (predictable) conclusion: the best teams are those that take the most shots, ideally on target, while conceding the fewest. After all, a quote curiously attributed to both Johan Cruyff and Wayne Gretzky (!) pragmatically reads “If you don’t shoot, you can’t score”.
Match by match, the outcome of the shots taken and conceded (that is, the conversion rates that determine the number of goals scored and conceded) is subject to considerable variance, but over a whole season the difference between shots taken and conceded tends to correlate well with points in the table. Put simply, a team that concedes few shots and takes many may not win every game in which it imposes its superiority, but by the end of the season, barring a significant influence of variance (and therefore of luck), it will most likely find itself near the top of the table, and vice versa.
The advanced statistic that measures shot superiority is the Total Shots Ratio (TSR): the number of shots taken divided by the sum of shots taken and shots conceded. It can be expressed as a decimal or as a percentage: for every 10 shots in its matches, a team with a TSR of 70% (0.7) takes 7 and concedes 3. Closely related is the Shots on Target Ratio (SoTR), the share of shots on target taken out of the sum of shots on target taken and conceded.
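Both ratios can be computed directly from season totals. The Python sketch below is only an illustration: the function names `total_shots_ratio` and `shots_on_target_ratio` are hypothetical, and the inputs are assumed to be raw counts.

```python
def total_shots_ratio(shots_for: int, shots_against: int) -> float:
    """TSR: shots taken / (shots taken + shots conceded)."""
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("TSR is undefined when no shots were taken or conceded")
    return shots_for / total


def shots_on_target_ratio(sot_for: int, sot_against: int) -> float:
    """SoTR: the same ratio, restricted to shots on target."""
    total = sot_for + sot_against
    if total == 0:
        raise ValueError("SoTR is undefined when no shots on target were recorded")
    return sot_for / total


# The example from the text: out of every 10 shots, 7 taken and 3 conceded.
print(total_shots_ratio(7, 3))  # 0.7
print(shots_on_target_ratio(3, 1))  # 0.75
```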

 

Both statistics derive in some way from equivalent statistics that already existed in hockey (in particular indices such as Corsi, Fenwick and FenClose), as does PDO. PDO is not an acronym: its creator, Vic Ferrari, named it in honor of one of the commenters on his blog, Irreverent Oiler Fans. It is the sum of a team’s conversion rate (goals scored divided by shots taken) and its save rate (1 minus goals conceded divided by shots conceded), expressed as percentages and multiplied by 10. Across a whole league PDO always averages 1000, since every goal scored by one team is a goal conceded by another. Given how random this index is and how quickly it regresses towards that average, we can, as a rough approximation, consider teams with a PDO below 1000 “unlucky” and teams with a PDO above 1000 “lucky”.
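Written as code, the PDO definition amounts to a few lines. A minimal Python sketch, assuming season totals for goals and shots; the function name `pdo` and the team figures in the usage lines are invented for illustration. Expressing both rates as percentages and multiplying by 10 is the same as multiplying their sum, as fractions, by 1000.

```python
def pdo(goals_for: int, shots_for: int, goals_against: int, shots_against: int) -> float:
    """PDO on the usual 1000 scale: (conversion rate + save rate) as percentages, times 10."""
    if shots_for == 0 or shots_against == 0:
        raise ValueError("PDO needs at least one shot taken and one shot conceded")
    conversion_rate = goals_for / shots_for          # goals scored per shot taken
    save_rate = 1 - goals_against / shots_against    # share of conceded shots that did not go in
    return (conversion_rate + save_rate) * 100 * 10  # percentages, multiplied by 10


# Hypothetical teams: one exactly at the league norm, one running "hot".
print(round(pdo(50, 500, 40, 400)))  # 1000 (10% conversion + 90% save rate)
print(round(pdo(60, 400, 30, 400)))  # 1075 (15% conversion + 92.5% save rate)
```

Feeding in league-wide totals, where goals and shots are by definition the same on both sides, always returns 1000 (up to floating-point rounding).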

 

These first advanced metrics applied to football have proven to have predictive value, although to different degrees. Studying TSR and PDO, both James Grayson and Sander Ijtsma concluded that TSR depends heavily on a team’s skill, since a given team’s TSR tends to remain fairly stable even across several consecutive seasons. PDO and its components (conversion rate and save rate), by contrast, fluctuate considerably even from one game to the next, so they can serve only as a rudimentary measure of how much luck (in the form of variance) affects a team’s short-term performance.

 

Quality over quantity

Indices such as TSR, SoTR and PDO rest on the assumption that all shots are equal, i.e. that every shot has the same probability of being converted, ignoring decisive aspects of the shot itself such as distance from goal, position and game situation. A shot from inside the six-yard box after rounding the goalkeeper counts towards these statistics exactly like a shot struck from 40 metres through a forest of legs.

 

On a purely intuitive level, it is fair to say that some shots are “worth” more than others. Reading in a match report that a team took 20 shots in a game might suggest that it was very dangerous and created many scoring opportunities. But if 15 of those shots were hopeful efforts from long range, even a simple eye test immediately puts the “danger” of those chances into context and, above all, calls it into question.

 


The Viola’s 21 shots in Fiorentina 1-2 Roma would suggest a thoroughly undeserved defeat, but considering that only 6 of the 21 were taken inside the box (2 of them blocked), Sousa’s men can hardly blame bad luck.

 

But in an empirical analysis nothing can be left to chance, much less to intuition, so the assumption that position influences the outcome of a shot also needed a statistical basis. One of the best-known studies in this sense is Paul Riley’s Shot Position Average Model (SPAM). Analyzing more than 30,000 shots from three Premier League seasons, Riley determined how many shots are needed on average to score a goal from outside the box, from the wide areas of the penalty area and from its centre, as well as from set situations such as penalty kicks and direct free kicks. The most important finding was that the average number of shots needed to score from each position remained substantially unchanged from season to season.
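The core of a zone-based model like SPAM is a count per zone. The Python sketch below assumes each shot is already labelled with a zone and an outcome; the function name, the zone names and the toy data are invented, not Riley’s.

```python
from collections import defaultdict


def shots_per_goal_by_zone(shots):
    """Average number of shots needed to score one goal from each zone.

    `shots` is an iterable of (zone, is_goal) pairs.
    Zones with no goals are reported as infinity.
    """
    attempts = defaultdict(int)
    goals = defaultdict(int)
    for zone, is_goal in shots:
        attempts[zone] += 1
        if is_goal:
            goals[zone] += 1
    return {
        zone: attempts[zone] / goals[zone] if goals[zone] else float("inf")
        for zone in attempts
    }


# Toy data (invented): 1 goal from 10 long-range shots, 1 from 4 central shots in the box.
sample = (
    [("outside_box", False)] * 9 + [("outside_box", True)]
    + [("central_box", False)] * 3 + [("central_box", True)]
)
print(shots_per_goal_by_zone(sample))  # {'outside_box': 10.0, 'central_box': 4.0}
```

Running the same function on each season separately is one way to check whether these figures stay stable from year to year, as Riley found.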

 


Paul Riley’s SPAM model, summarized in a simple explanatory diagram.

 

The next step, taken by Riley and, at around the same time, by many other members of the public analytics community, was to increase the “granularity” of the model. In a “discrete model” like Riley’s, this means dividing the pitch into more zones; pushed to the maximum level of detail, it leads to a “continuous model”, in which each shot is evaluated individually rather than according to the zone it was taken from.

 

This is how the first Expected Goals models (abbreviated xG or ExpG) were born: a method of measuring the quality of the chances created (or conceded) by a team, with the aim of estimating how many goals that team would have scored (or conceded) on average, given the quantity and quality of the shots it took (or allowed). By assigning every shot, past, present or future, its probability of ending up in the net, each shot can be rated on a scale from 0 to 1: the higher the value, the more likely the shot is to be converted. All Expected Goals models, public or proprietary, produce this same kind of output; what distinguishes one from another is the calculation method and the parameters it uses. Position and shot type (for example, foot or header) are commonly used factors; beyond those, the approaches diverge considerably. We will therefore look at two public models, trying to highlight their particularities and differences.
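Because each shot’s xG is a probability, a team’s expected goals for a match is the sum of those probabilities. The Python sketch below illustrates this with a hypothetical `expected_goals` function; the per-shot values are invented and loosely mirror the 20-shot example above (15 speculative long-range efforts plus 5 better chances).

```python
def expected_goals(shot_xg):
    """Sum per-shot scoring probabilities into a team's expected goals."""
    values = list(shot_xg)
    for p in values:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"xG must lie between 0 and 1, got {p}")
    return sum(values)


# Hypothetical values: 15 long-range shots at 0.03 xG and 5 chances at 0.20 xG.
shots = [0.03] * 15 + [0.20] * 5
print(round(expected_goals(shots), 2))  # 1.45
```

Under these invented values, twenty shots are worth fewer than one and a half goals, the kind of correction the eye test suggests.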

 

Paul Riley’s xG model
As anticipated, Paul Riley continued his research and developed an Expected Goals model, limited to the Premier League, which has the advantage of being “100% public”. In this model Riley abandoned the idea of considering all shots and only considers shots on target: after all, if a shot is not on target, it cannot end up in the net.

 

The other determining factors in Riley’s model are the location and the type of shot. Position is fundamental: examining a sample of 13,000 shots on target, Riley divided the pitch into 46 sectors and calculated, for each of them, the probability that a shot on target from that sector ends up in the net. The other key discriminant, shot type, distinguishes between open-play shots, direct free kicks and penalty kicks.
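Structurally, a model of this kind is a lookup table: a conversion probability per sector, with separate handling for each shot type. The Python sketch below shows that structure only. The function name, the sector names and every probability in it are placeholders invented for illustration; they are not Riley’s 46 sectors or his published values.

```python
# Placeholder conversion probabilities for shots ON TARGET (invented values).
OPEN_PLAY = {"six_yard_box": 0.60, "central_box": 0.35, "wide_box": 0.20, "outside_box": 0.10}
DIRECT_FREE_KICK = {"outside_box": 0.20}
PENALTY = 0.80  # placeholder: a single value for every penalty


def riley_style_xg(shot_type: str, sector: str, on_target: bool) -> float:
    """Look up the xG of a shot by type and sector, in the spirit of a sector-based model."""
    if not on_target:
        # Only shots on target are considered: an off-target shot cannot score.
        return 0.0
    if shot_type == "penalty":
        return PENALTY
    table = {"open_play": OPEN_PLAY, "free_kick": DIRECT_FREE_KICK}[shot_type]
    return table[sector]  # an unknown type or sector raises KeyError


print(riley_style_xg("open_play", "central_box", on_target=True))   # 0.35
print(riley_style_xg("open_play", "central_box", on_target=False))  # 0.0
```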

 

The strength of Riley’s model is that anyone can check its (periodically updated) results, including the Expected Goals of every Premier League team and of every player with at least one shot on target in the season, simply by consulting the interactive charts on his blog, where a more detailed explanation of the model is also available.

 


According to Riley’s model, Jamie Vardy’s shots on target should have produced 10.97 “virtual” goals (xG), against the 15 the Leicester forward actually scored. In this sense Vardy is exceeding expectations (he is an “over-performer”).

 

The Michael Caley Model
One of the best-known and most sophisticated public Expected Goals models is Michael Caley’s. Over the years Caley has constantly updated it, refining the computation method, broadening the statistical base (adding more and more shots to the dataset) and revising the factors the model takes into account. Caley has also published a very detailed explanation of the latest version of his method.

 

Unlike Riley, Caley considers all shots, whether on target or not, dividing them by type but excluding penalties and, like Riley, own goals. His model distinguishes between six shot types, each with its own equation:

 

  1. Direct free kicks
  2. Shots following a dribble past the goalkeeper
  3. Headers assisted by a cross
  4. Headers not assisted by a cross
  5. Non-headed shots assisted by a cross
  6. Non-headed shots not assisted by a cross (“regular shots”)
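One way to picture the six categories is as a routing step that decides which equation a shot is sent to. In this Python sketch the `Shot` fields and the `shot_category` function are hypothetical, and the order of the checks (free kick first, then a dribble past the goalkeeper, then header and cross) is an assumption about how overlapping cases would be resolved.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    direct_free_kick: bool = False
    dribbled_goalkeeper: bool = False
    header: bool = False
    from_cross: bool = False


def shot_category(shot: Shot) -> int:
    """Return the category number (1-6) from the list above."""
    if shot.direct_free_kick:
        return 1
    if shot.dribbled_goalkeeper:
        return 2
    if shot.header:
        return 3 if shot.from_cross else 4
    return 5 if shot.from_cross else 6


print(shot_category(Shot(header=True, from_cross=True)))  # 3
print(shot_category(Shot()))                              # 6 ("regular shot")
```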

 

Within these categories, the first two factors considered are the distance from goal and the shot angle (in the form of a “relative shot angle”).
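Distance and angle can both be derived from a shot’s coordinates and then fed into a logistic function, a common way of turning inputs into a probability between 0 and 1. The Python sketch below makes several assumptions not taken from Caley’s model: coordinates in metres measured from the centre of the goal line, a 7.32 m goalmouth, the plain angle subtended by the two posts in place of Caley’s “relative” angle, and invented coefficients. All function names are hypothetical.

```python
import math

GOAL_WIDTH = 7.32  # metres between the posts


def shot_distance(x: float, y: float) -> float:
    """Distance to the centre of the goal; x is depth from the goal line, y the lateral offset."""
    return math.hypot(x, y)


def shot_angle(x: float, y: float) -> float:
    """Angle (radians) subtended by the goalmouth as seen from the shot location."""
    half = GOAL_WIDTH / 2
    # atan2 stays correct very close to goal, where the denominator turns negative.
    return math.atan2(GOAL_WIDTH * x, x * x + y * y - half * half)


def simple_xg(x: float, y: float, b0: float = -1.0, b_dist: float = -0.1, b_angle: float = 1.5) -> float:
    """Logistic model on distance and angle; the default coefficients are invented."""
    z = b0 + b_dist * shot_distance(x, y) + b_angle * shot_angle(x, y)
    return 1.0 / (1.0 + math.exp(-z))


print(round(simple_xg(11, 0), 2))   # from the penalty spot
print(round(simple_xg(25, 0), 2))   # from 25 m, central
print(round(simple_xg(11, 15), 2))  # same depth, much wider
```

In a real model the coefficients would be fitted to historical shots (one set per shot category); here they only illustrate that closer, more central shots score more often.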

 


In this conversion map developed by Caley, the influence of distance and shot angle on the probability of scoring is evident.

 

Another important factor is the type of pass that led to the shot. Caley’s studies showed that through balls and the “danger zone” are key determinants of the quality of an assist, in terms of the probability that the chance created is converted. On this basis, Caley divided the key passes that created chances into several types, assigning each type a different efficiency based on empirical evidence.

 


Caley’s research has shown empirically just how inefficient crosses are.

 

Another differentiating factor is the type of attacking move (for example, counter-attacks versus settled possession), which is also an indirect measure of defensive pressure, one of the most important factors and one that public Expected Goals models can currently capture only indirectly. Other indirect indicators of defensive pressure used in the classification are the so-called “big chances”, defined by Opta as situations in which a player should reasonably be expected to score, dribbles before the shot, and defensive errors: all factors that usually increase the probability of scoring. Finally, Caley also takes into account the finishing skill of the individual player.
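Contextual factors like these are commonly added to a logistic xG model as extra terms that shift a shot’s log-odds up or down. The Python sketch below shows that mechanism only: the function name and feature names are hypothetical, every magnitude is invented rather than taken from Caley, and only some signs follow the text (crosses lower the chances of scoring; big chances, through balls, dribbles and defensive errors raise them). The positive sign for counter-attacks is an assumption.

```python
import math
from typing import Mapping

# Invented log-odds adjustments; magnitudes are placeholders.
CONTEXT_EFFECTS = {
    "through_ball": 0.6,
    "cross": -0.4,
    "counter_attack": 0.3,
    "big_chance": 1.2,
    "dribble_before_shot": 0.2,
    "defensive_error": 0.5,
}


def adjust_xg(base_xg: float, context: Mapping[str, bool]) -> float:
    """Shift a base xG (e.g. from distance and angle) by the context features present."""
    if not 0.0 < base_xg < 1.0:
        raise ValueError("base_xg must lie strictly between 0 and 1")
    log_odds = math.log(base_xg / (1.0 - base_xg))
    for feature, present in context.items():
        if present:
            log_odds += CONTEXT_EFFECTS[feature]  # unknown features raise KeyError
    return 1.0 / (1.0 + math.exp(-log_odds))


print(round(adjust_xg(0.10, {"big_chance": True, "through_ball": True}), 2))
print(round(adjust_xg(0.10, {"cross": True}), 2))
```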

 

Expected goals can measure any type of scoring opportunity. In Caley’s model, this incredible miss by Luis Suárez was worth 0.91 xG.

 

Caley regularly updates his database, where, in addition to xG, he records other important advanced statistics for the Premier League and the other three major European leagues. He is also very active on Twitter, where he regularly posts xG maps of individual matches, players or teams that you have probably already seen on the pages of l’Ultimo Uomo.

 

xG map for #Arsenal – #LCFC . Arsenal’s unrelenting pressure at 11v10 (helped by some odd Ranieri subs) pays off. pic.twitter.com/dz3rZZ1wkn

– Michael Caley (@MC_of_A) February 14, 2016

Arsenal-Leicester as seen through Caley’s Expected Goals. At the top is the “virtual score”, calculated as the sum of the xG values of every shot (excluding penalties and own goals), each represented on the map by a square marker. The larger the marker, the higher the xG of that shot.
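The “virtual score” is a per-team sum that skips penalties and own goals. A minimal Python sketch; the `ShotEvent` structure and `virtual_score` function are hypothetical, and the values in the example are invented, not the actual Arsenal-Leicester data.

```python
from dataclasses import dataclass


@dataclass
class ShotEvent:
    team: str
    xg: float
    penalty: bool = False
    own_goal: bool = False


def virtual_score(events):
    """Sum xG per team, leaving out penalties and own goals."""
    totals = {}
    for event in events:
        totals.setdefault(event.team, 0.0)  # a team with only excluded events still shows 0.0
        if event.penalty or event.own_goal:
            continue
        totals[event.team] += event.xg
    return totals


match = [
    ShotEvent("Arsenal", 0.5),
    ShotEvent("Arsenal", 0.25),
    ShotEvent("Leicester", 0.125),
    ShotEvent("Leicester", 0.75, penalty=True),  # excluded from the virtual score
]
print(virtual_score(match))  # {'Arsenal': 0.75, 'Leicester': 0.125}
```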

 

In his latest update of the model, Caley showed how effective xG is in predictive terms compared with other advanced statistics, so much so that he uses it to simulate the outcome of league seasons through Monte Carlo simulations, based on each team’s xG from this season and last season and on its wage bill.
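The Monte Carlo idea can be illustrated at the scale of a single match: treat each shot as an independent event that scores with probability equal to its xG, replay the match many times and count the outcomes. This Python sketch is a deliberate simplification, not Caley’s simulator, which works on whole seasons and also uses wages; the independence of shots is an assumption, and the function name and inputs are hypothetical.

```python
import random


def simulate_match(home_xg, away_xg, n_sims=10_000, seed=0):
    """Estimate (home win, draw, away win) probabilities from per-shot xG values."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    home_wins = draws = away_wins = 0
    for _ in range(n_sims):
        # Each shot scores independently with probability equal to its xG.
        home_goals = sum(rng.random() < p for p in home_xg)
        away_goals = sum(rng.random() < p for p in away_xg)
        if home_goals > away_goals:
            home_wins += 1
        elif home_goals == away_goals:
            draws += 1
        else:
            away_wins += 1
    return home_wins / n_sims, draws / n_sims, away_wins / n_sims


# Invented shot lists: many poor chances against a few good ones.
home = [0.03] * 15 + [0.20] * 5
away = [0.40, 0.35, 0.10]
print(simulate_match(home, away))
```

Repeating this for every fixture and adding up the resulting points is one way a season-level simulation could be built.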

 

In the Premier League, Bundesliga, Serie A and La Liga, Caley’s model correlates very well with the final standings. In lower-level leagues such as Ligue 1, the Eredivisie and MLS, however, the correlation is much weaker, to the point that other statistics are preferred. A plausible explanation is that lower average quality introduces enough variance in conversion to undermine the predictive power of Expected Goals.

 

Performance measurement and the problems with xG
Expected goals make it possible to assess the level of performance, determining whether a team (or a single player) is exceeding expectations (over-performing) or falling short of them (under-performing), simply by calculating the difference between goals scored (or conceded) and expected goals.
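The difference is a single subtraction, but the sign convention matters: for goals scored a positive gap means over-performance, while for goals conceded it is the other way round. A minimal Python sketch with hypothetical function names, using Vardy’s figures from Riley’s model quoted earlier.

```python
def attacking_delta(goals_scored: float, xg_for: float) -> float:
    """Positive when a team or player scores more than expected (over-performing)."""
    return goals_scored - xg_for


def defensive_delta(goals_conceded: float, xg_against: float) -> float:
    """Positive when a team concedes fewer goals than expected (over-performing)."""
    return xg_against - goals_conceded


# Vardy: 15 goals from 10.97 xG in Riley's model.
print(round(attacking_delta(15, 10.97), 2))  # 4.03
```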

 


Goal difference and expected goal difference of the Serie A teams. According to Caley’s model, Chievo and Bologna are exceeding their expected performance, while Udinese and Carpi are falling well short of expectations.

 

Because of regression towards the mean, the size of the gap between expected and observed performance tells us whether a team is realistically going to feel the effects of regression, and by how much and in which direction it could be affected. For a practical example, think of Inter, who already at Christmas seemed destined for an inevitable drop in their save percentage.

 

At club level, xG could make it possible to anticipate a potential regression and take corrective action that at least “cushions the fall” caused, for example, by a drop in conversion rate, or to postpone (or bring forward) decisions, even drastic ones, such as dropping a striker or even sacking a coach. xG is already used in professional football, so much so that it is now (almost) normal to hear Tuchel talk about expected goals in an interview with Die Zeit, or Wenger say that in August Arsenal’s results were poor relative to what they were creating in terms of xG.

 

A significant limitation of xG is that this kind of analysis cannot tell us how and why observed performance is diverging from expectations. Measuring the performance gap is a fundamental starting point that until a few years ago seemed little more than science fiction (at least at the public level), but an in-depth analysis aimed at answering how and why cannot do without a tactical analysis of what is happening on the pitch.

 

Another major criticism of Expected Goals is the lack of a direct indicator of defensive pressure in the calculation: in a nutshell, where the defenders are at the moment of the shot. There are already studies heading in this direction, but unlike in NBA basketball, player-tracking data recording movements in space and time is still not widely used in football, let alone at a public level, which limits the possibilities of developing this idea.

 

Then there are the effects of world-class players, which, as we have seen, Caley has tried to correct for (we know, for example, that over his career Messi has a ratio of goals to expected goals of about 1.3), and those of the “super-teams” (Barcelona, Real Madrid and Bayern Munich), the sides that consistently exceed the expected performance calculated by an xG-based model.

 

It could also be objected that during a match not every dangerous move ends in a shot, so expected goals tend to underestimate the overall danger of the chances created. To address this, models have been developed that complement shot-based models by calculating the danger of passes in xG terms, or by measuring how the ball actually progresses up the pitch (including through dribbles and ball carries), as in the case of the BPI devised by Daniel Altman.

 

In a low-scoring sport like football, anything that provides even a minimal edge over the opponent becomes fundamental. In this sense, advanced statistics, of which xG is the main expression, are already revolutionizing the behind-the-scenes work of analysts and observers. Expected Goals are a statistic that can certainly be refined, is very flexible and has the considerable advantage of representing well what actually happens on the pitch. Their superiority over all other public shot statistics has been demonstrated, and they are currently the most advanced area of football analytics, a discipline that is still in its infancy compared with its development in other sports.

 

 

by Abdullah Sam
I’m a teacher, researcher and writer. I write about study subjects to improve the learning of college and university students. I write top-quality study notes, mostly on tech, games, education, and solutions, tips and tricks. I help students acquire knowledge, competence or virtue.
