In this post, we look forward to the new season of La Liga by looking back at the previous one. More specifically, we’ll zoom in on the offensive contribution of the forwards who put in stellar performances on this front. A player’s offensive contribution is typically determined in a quick-and-dirty way from goals and assists, often by simply adding the two. Naturally, there are much more sophisticated (and more accurate) ways to gauge the offensive contribution a player makes to his team. The method I will employ here was proposed by Thomas Severini, professor of statistics at Northwestern University, and is based on a regression model with multiple predictor variables.
The intuition and rationale behind the statistical model are as follows. For every team in the Primera División, we know how many goals it scored in each of the last few seasons, along with a number of other statistics: shots on goal, attempts from outside and from inside the box, dribbles, passes, and so on. What we want to find out is which of these variables relate most strongly to the number of goals a team scores. To this end, the model considers each team’s performance over the last five seasons, giving 100 observations (5 seasons times 20 teams). The regression then lets us keep only those variables that carry important additional information about the dependent variable (here: goals scored).
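To make the fitting step concrete, here is a minimal sketch of a multiple regression fit, assuming ordinary least squares (the post does not say which estimator was used) and leaving out the variable-selection step. The function name `fit_ols` and the rows in `toy_rows` are hypothetical: the rows are made-up numbers, not La Liga data, and the toy targets are generated from the coefficients quoted in this post, so the fit should simply recover them.

```python
def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: one list of predictor values per team-season.
    targets: goals scored, same length as rows.
    Returns [intercept, coef_1, ..., coef_k].
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # intercept column first
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Coefficients quoted in the post: intercept, shots_outside_of_box,
# shots_on_target, successful_passes.
POST_COEFS = [-13.7, -0.10112, 0.41682, 0.0132]

# MADE-UP team-season rows for illustration only (not real La Liga data):
# [shots_outside_of_box, shots_on_target, successful_passes]
toy_rows = [
    [150, 160, 1200],
    [180, 200, 1500],
    [120, 140, 1000],
    [200, 250, 2000],
    [160, 150, 1100],
    [140, 210, 1700],
]
# Toy targets generated exactly from the post's equation (no noise added).
toy_goals = [POST_COEFS[0] + sum(c * v for c, v in zip(POST_COEFS[1:], row))
             for row in toy_rows]

fitted = fit_ols(toy_rows, toy_goals)
print([round(c, 4) for c in fitted])
```

With real data, the targets would of course contain noise, and the fitted coefficients would only approximate the underlying relationship; the share of variance they explain is what R² reports.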
A concise model that explains no less than 85 percent of the variation in goals scored over the past five seasons of La Liga (i.e. an R² of 0.85) turns out to be the following:
Goals scored = – 13.7 – 0.10112*shots_outside_of_box + 0.41682*shots_on_target + 0.0132*successful_passes
Most important when interpreting this model are the variables included and their respective signs, rather than the actual coefficient values, which are hard to interpret. Shots on target has a positive sign, implying that more shots on target tend to coincide with more goals. The sign of shots outside of box is negative: it adjusts downward the impact of those shots on target that were attempted from outside the box, reflecting their lower probability of success. Successful passes turn out to be another important aspect of offensive contribution, whereas successful dribbles, for example, do not. Also noteworthy is the relative size of the coefficients of shots on target and successful passes: a shot on target has more than thirty times the impact of a successful pass on offensive contribution (0.41682/0.0132 ≈ 31.6). If I had data on the area of the pitch where the passes took place (e.g. the final third), the model could be made even more accurate and I could also include non-forwards.
Following Severini's Analytic Methods in Sports (2015), to apply the above team-level model at the level of the individual player, we merely need to divide the intercept by 10 (because of the ten outfield players). Thus, for an individual player:
Offensive Contribution = – 1.37 – 0.10112*shots_outside_of_box + 0.41682*shots_on_target + 0.0132*successful_passes
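The player-level formula translates directly into a few lines of code. This is only a sketch: the function name `offensive_contribution` and the example inputs are hypothetical, while the coefficients are the ones given above.

```python
TEAM_INTERCEPT = -13.7   # intercept of the team-level model
OUTFIELD_PLAYERS = 10    # the intercept is shared among ten outfield players


def offensive_contribution(shots_outside_of_box, shots_on_target, successful_passes):
    """Player-level offensive contribution using the coefficients from the post."""
    return (TEAM_INTERCEPT / OUTFIELD_PLAYERS
            - 0.10112 * shots_outside_of_box
            + 0.41682 * shots_on_target
            + 0.0132 * successful_passes)


# Hypothetical season totals for an imaginary player (not real data).
example = offensive_contribution(shots_outside_of_box=10,
                                 shots_on_target=20,
                                 successful_passes=100)
print(round(example, 4))
```

Note that a player with no shots and no successful passes ends up at the intercept of –1.37, so the measure can be negative for players with very little involvement.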
The top 10 offensive contributors for the 2015-2016 La Liga season are as follows:
Figure 1: Top 10 offensive contributors, La Liga 2015-2016
Naturally, some players received more playing time than others, e.g. due to injury. For comparative purposes, it is therefore also helpful to consider what each player's offensive contribution would have been had he played all matches, at the same level of contribution as when he was actually fielded:
Figure 2: Top 10 offensive contributors, La Liga 2015-2016, assuming all players had played all of their team's games
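A playing-time adjustment of this kind could be sketched as below. The post does not spell out whether the scaling was done per match or per minute; this sketch assumes a simple per-match scaling, and the function name `full_season_contribution` and the example numbers are hypothetical.

```python
MATCHES_PER_SEASON = 38  # 20 teams, each playing the other 19 home and away


def full_season_contribution(contribution, matches_played,
                             matches_total=MATCHES_PER_SEASON):
    """Scale a season total as if the player had appeared in every match.

    Assumes the per-match contribution stays at the level observed in the
    matches the player actually played.
    """
    if not 0 < matches_played <= matches_total:
        raise ValueError("matches_played must be between 1 and matches_total")
    return contribution * matches_total / matches_played


# Hypothetical example: a contribution of 10.0 built up in 19 of 38 matches.
print(full_season_contribution(10.0, 19))
```

The fewer matches a player actually played, the more this extrapolation leans on a small sample, so projected values for players with long absences should be read with caution.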
Key observations:
1. The number 1 in terms of offensive contribution is Lionel Messi.
2. Although Cristiano Ronaldo came closest to Messi in terms of offensive contribution, Neymar would have overtaken Cristiano had they both had the same playing time.
3. All members of MSN as well as of BBC are included in the top ten, assuming all players played the same number of games.
4. MSN’s combined offensive contribution is larger than BBC’s.
5. There is one “ugly duckling” in the top 7 (8), otherwise made up entirely of “usual suspect” stars: Jonathan Viera of Las Palmas. The 26-year-old Canary Islander tends to remain under the radar of more traditional measures of offensive contribution.
6. The top 10 is completed by club top scorers who had an exceptionally prolific season: Depor’s Lucas Pérez, Betis’ Rubén Castro, Bilbao’s Aduriz and Real Sociedad’s Agirretxe, who missed over half the season due to injury.
7. Relative differences are quite substantial, e.g. Messi's offensive contribution is almost double that of the player ranked tenth.
While it is difficult to perform even better in the season following an exceptionally good one (partly because of a principle known in statistics as “regression to the mean”), I particularly look forward to finding out whether Las Palmas’ Viera can really make a name for himself this season, and whether Messi and Cristiano Ronaldo can remain at the very top or youngster Neymar will make a move towards absolute supremacy.
Note that our regression model measures correlation, not causation. Concretely, over the past five seasons the selected variables were accurate predictors of offensive contribution in La Liga. This does not imply, however, that a player’s future offensive contribution will move accordingly. In particular, if players were to “act upon” the above model, e.g. by dribbling less and shooting or passing more instead, the included predictor variables may (or may not) lose part of their positive correlation with the dependent variable.
Let the new season commence! :)