Friday, January 24, 2014
There is a lot of skepticism about passing statistics. In particular, the number of passes completed and pass completion percentage are generally eschewed by the soccer analytics community. For the most part these criticisms are valid: the metrics don't control at all for the difficulty or type of pass being attempted. A player such as Barcelona's Xavi might rack up 100 completed passes a game at a 90% clip, but for the most part they are short passes made under minimal defensive duress. His passing statistics are as much a product of Barcelona's "tiki-taka" system as they are a reflection of his individual skill. Therefore, if we place these statistics within the context of a team's system, might that be more informative? Perhaps.
Pass Usage Rate (%)
The concept is pretty simple: take a player's passes/90 and divide it by the team's passes/90. I have also introduced a related concept, Pass Share, which frames Pass Usage Rate (%) in a slightly different manner: it divides a player's Pass Usage Rate (%) by the 9.1% an average player would post if all 11 players shared equally in passing.
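For concreteness, here is a minimal sketch of the arithmetic in Python; the passing numbers are hypothetical, chosen only to illustrate the calculation:

```python
# Pass Usage Rate: a player's passes per 90 as a share of his team's passes per 90.
# Pass Share: that rate relative to an even 11-way split (1/11, about 9.1%).

def pass_usage_rate(player_passes_per90, team_passes_per90):
    return player_passes_per90 / team_passes_per90

def pass_share(usage_rate):
    average_usage = 1 / 11  # ~9.1%: every player passing equally
    return usage_rate / average_usage

# Hypothetical example: a midfielder playing 72 of his team's 480 passes per 90.
usage = pass_usage_rate(72.0, 480.0)
print(f"Pass Usage Rate: {usage:.1%}")         # 15.0%
print(f"Pass Share: {pass_share(usage):.2f}")  # 1.65, i.e. 1.65x an equal share
```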
The results from MLS 2013 aren't that surprising for avid watchers, though seeing the two Montreal Impact central defenders (Nesta and Ferrari) so high up the list is interesting.
Monday, November 11, 2013
Parity is a popular topic of conversation in MLS. Because of league salary and roster rules, it has traditionally been very difficult for any team to consistently stay at the top. Similarly, unless you are Toronto FC, it is not unusual for teams to go from the bottom of the table one year to the top the next. Alexi Lalas famously (infamously?) proclaimed the league to be “the most competitive league in the world.” Is he right?
We looked at a representative group of 14 other leagues from around the world and tested them on three key metrics we believe are the best measures of league parity (or competitiveness; I consider the terms interchangeable).
Single Season Parity
To measure parity within a single season, we looked at the standard deviation of points per game (PPG) for each league. In effect, this measures the spread of results across the league: a lower number means teams are grouped more closely around the average, while a higher number means more teams sit further from the average (both good and bad).
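As a quick sketch of the computation (the PPG values below are invented, not the real league numbers):

```python
import numpy as np

# Hypothetical points-per-game values for each team in one league season.
ppg = np.array([2.1, 1.9, 1.7, 1.6, 1.5, 1.4, 1.3, 1.2, 1.1, 0.9])

# Standard deviation of PPG: lower means results are bunched near the league
# average (more parity); higher means more teams sit far above or below it.
print(f"PPG standard deviation: {np.std(ppg):.3f}")
```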
Year Over Year Parity
The second metric is the average change in year over year points per game, which measures how much results vary from season to season. The EPL unsurprisingly has a very low number here, as the top 5 teams have generally been the same for the past handful of years (as have the mid-table teams). It should be noted that this uses only one year's worth of data, and the picture would likely differ if examined over multiple years.
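A minimal sketch of this metric, again with invented numbers:

```python
import numpy as np

# Hypothetical PPG for the same five clubs in consecutive seasons.
ppg_year1 = np.array([2.1, 1.8, 1.5, 1.2, 0.9])
ppg_year2 = np.array([1.4, 1.9, 1.1, 1.6, 1.3])

# Mean absolute year-over-year change in PPG: larger swings mean more
# turnover in the table, i.e. more year-over-year parity.
print(f"Average YoY change in PPG: {np.mean(np.abs(ppg_year2 - ppg_year1)):.2f}")
```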
The Haves (10%) vs. The Have Nots (90%)
Quite simply, this measures how much of the goal differential in each league the top 10% of clubs are responsible for. A competitive league should not have the top couple of teams hoarding all the results. For example, look at the difference between who is responsible for the majority of the goal differential in the Bundesliga (Bayern/Dortmund) and in MLS (Chivas USA/DC United).
We took these three factors, weighted each one equally, and assigned each league a z-score (standard deviations above or below the sample mean, positive or negative) on each metric. Add them up and MLS is indeed the most competitive league in this 15-league sample. Interestingly, Brazil was not far behind. Of course, there are multiple ways one can measure parity and competitiveness; this is just one of many approaches.
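A sketch of the weighting scheme as I have described it; the metric values and the sign conventions below are my illustrative assumptions, not the actual data:

```python
import numpy as np

# Hypothetical values for five leagues (rows) on the three metrics (columns):
# [PPG std dev, avg YoY PPG change, top-10% share of goal differential].
metrics = np.array([
    [0.45, 0.30, 0.55],
    [0.30, 0.45, 0.35],
    [0.55, 0.25, 0.60],
    [0.35, 0.40, 0.40],
    [0.50, 0.20, 0.65],
])

# Standardize each metric to z-scores across the league sample, then weight
# equally. Lower PPG std dev and a lower top-10% share indicate more parity
# (flip the sign); a higher YoY change indicates more parity (keep the sign).
z = (metrics - metrics.mean(axis=0)) / metrics.std(axis=0)
parity_score = -z[:, 0] + z[:, 1] - z[:, 2]
print(parity_score)  # the highest score is the most competitive league here
```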
Wednesday, October 30, 2013
This article originally appeared on Sounder at Heart
1. Sporting Kansas City: 5.7
2. Seattle Sounders FC: 7.2
4. Real Salt Lake: 7.7
6. San Jose Earthquakes: 8.5
17. New England Rev.: 9.3
18. New York Red Bulls: 9.3
19. CD Chivas USA: 9.7
Friday, September 27, 2013
This article originally appeared on Statsbomb
The two most ubiquitous stats for attackers are goals and assists. And why shouldn't they be? After all, goal differential explains ~85% of the variance in a league table, and creating goals is a very valuable skill. But how repeatable a skill is it? After scoring 11 goals in his first 19 EPL games for Newcastle United in 2010, Andy Carroll was sold to Liverpool for a staggering 35 million pounds. Carroll would score only six times in his next 44 EPL appearances before he was loaned out and subsequently sold to West Ham. I do not bring up Carroll because I think he was a poor acquisition for Liverpool; I bring him up because he exemplifies the variable nature of goalscoring. When it comes to goals and assists, what is the signal and what is the noise?
Key Passes are better than Assists
The original intent of this piece was to test the persistence or repeatability of key passes. To the uninitiated, key passes are passes that directly lead to an attempt on goal. There has been some legitimate criticism of the fact that key passes don’t take proper account of the quality of the chances being created, but for now it’s the metric we have. I looked at every player in the EPL who averaged over 0.7 key passes per 90 in any season from 2009-2013. I then looked at the year over year relationship for key passes: how well do year 1 key passes predict year 2 key passes (n=184)? Quite well, it turns out. While not overwhelming, the relationship is evidence that key passes are a somewhat repeatable statistic.
Next, I took the same sample and looked at how well year 1 assists predicted year 2 assists. There really isn’t a relationship. Assists are basically random from year to year.
On a hunch, I looked at how well year 1 key passes predicted year 2 assists.
Granted, this is not a great relationship either, but it is significant that key passes actually predict future assists better than assists themselves do. And, unlike assists, key passes have some degree of repeatability.
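For anyone who wants to replicate the approach, the year-over-year test is just a simple linear regression; this sketch uses a handful of invented player rates rather than the real n=184 sample:

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical per-player rates: key passes per 90 in year 1 and
# assists per 90 in year 2 (stand-ins for the real EPL sample).
kp_year1 = np.array([2.1, 1.4, 0.9, 2.5, 1.1, 1.8, 0.8, 1.6])
ast_year2 = np.array([0.25, 0.15, 0.10, 0.30, 0.08, 0.22, 0.05, 0.12])

# r^2 measures how much of the year 2 variance year 1 explains,
# i.e. how repeatable (or predictive) the statistic is.
fit = linregress(kp_year1, ast_year2)
print(f"r^2 = {fit.rvalue ** 2:.2f}")
```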
Shots are better than Goals
Earlier this year Ben Pugsley undertook a similar study, but he primarily looked at shooting statistics. The statistic with the best predictive relationship? Shots per 90.
Ben also found no year over year relationship for assists (although his r^2 differed slightly from mine) or for goals.
Ben was kind enough to, as I had done with key passes and assists, run a regression comparing year 1 shots to year 2 goals.
As with key passes and assists, shots predict goals better than goals predict goals. Of course an r^2 of 0.12 is hardly predictive, but at this point in soccer analytics knowing what does not work is just as important as finding out what works.
Expected Goals Created Model
So if goals and assists don’t work, what might? Key passes and shots, taken at their face, are not nearly sophisticated enough. Luckily, much work has been done on shot location/type and expected goals (here and here and many other places). As far as I know, adjusting for shot location/type hasn’t been attempted yet for shots resulting from key passes, but that is a logical next step. Theoretically, an expected goal and expected assist model would be the best predictor.
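To make the idea concrete, here is one minimal formulation of such a model; the features (distance, angle, header) and the training shots are illustrative assumptions, not a fitted production model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one row per shot with distance to goal (m),
# angle to goal (radians), and a header flag; y = 1 if the shot scored.
X = np.array([
    [8.0,  1.2, 0],
    [11.0, 0.9, 0],
    [18.0, 0.6, 0],
    [25.0, 0.4, 0],
    [6.0,  1.4, 1],
    [10.0, 1.0, 1],
    [16.0, 0.5, 0],
    [9.0,  1.1, 0],
])
y = np.array([1, 1, 0, 0, 1, 0, 0, 1])

# Logistic regression maps shot location/type to a scoring probability;
# summing those probabilities over a player's shots gives expected goals,
# and summing over the shots following his key passes gives expected assists.
model = LogisticRegression().fit(X, y)
xg = model.predict_proba(X)[:, 1]
print(f"Expected goals across these shots: {xg.sum():.2f}")
```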
Goals and assists are the unpredictable results of a more repeatable underlying process. By understanding and quantifying this process, we can move towards the signal and away from the noise.
Thursday, September 19, 2013
Below, I also wrote a lot of words about my methodology and my criticisms of the process.
I have been a little critical of the "MLS 24 Under 24." It is still so early in these players' careers, and so much can change. For evidence, look no further than last year's winner Darren Mattocks, who has fallen completely out of favor with Vancouver. Naturally, as one does, I voiced my skepticism over Twitter:
"24 under 24 is like looking at a "power ranking" 5 games into the year. Fun, but hopelessly futile."
Andrew Wiebe of mlssoccer.com responded and offered to send me a ballot. "Hopelessly futile" challenge accepted.
I am a little familiar with the ballot after seeing Matt Tomaszewicz, aka "The Shin Guardian," post his on his website. Finding a consistent approach to grading players is the hardest part. Contrary to its intention, I actually think the rubric makes the process harder rather than easier (see Criticism below). There are five categories in which you must grade each player from 1-20 (why 1-20?): Technical, Tactical, Physical, Personality, Potential. Not wanting to get bogged down in this process just yet, I instead "force ranked" my top 24 and only after doing so assigned point totals. I assume the majority of media members who filled this out did the same.
In my opinion, the two questions that really matter in this exercise are "how good is this player now?" and "how good could this player become?" I am a stats guy so, to answer the question of how good each player is now, I started with each player's statistical profile, courtesy of whoscored.com and squawka.com. Context is key when looking at statistics, so every player's profile is compared against: other players at their position around the league, other players at their position in 24 Under 24, other players at their position on their own team.
Figuring out the potential of each player is obviously a much more subjective process. Most players on the list I have only seen play 5-15 times, limiting my ability to pass any sort of conclusive judgement. Large factors in determining potential are a player's physical capability and their age/how much time they have spent in MLS.
Even once the players were force ranked, coming up with a point total for each was fairly difficult. The Personality category immediately jumps out as problematic: who cares if you're a nice guy or a jerk, as long as you play well and make your team better? Further, while I have my biases and might be able to guess what type of personality a player has, I really do not know these players well enough to have an informed opinion. I am sure this goes for most media members as well. I basically punted and gave everyone a "10".
To further simplify matters, I took Technical and Tactical to represent how good the player is now and Physical and Potential to represent how good this player could become. As such, a player's score was the same across each set of categories.
The grading rubric is unnecessarily complex and, in my opinion, distorts the voting process. A player's personality counts just as much as their technical ability or potential? It does not make sense. But the real problem is the variance in point totals. The range in my point total was 14 (between 74 and 60) across 24 players, less than a point difference between each ranking. Looking at the only other public ballot (that I could find), Matt Tomaszewicz had the exact same range of 14 (between 72 and 58), though he somehow scored 25 players (?).
The problem is best exemplified by a player like Shane O'Neill, who was ranked #3 by Matt and just outside the 24 by me. Under the current methodology, combining our two ballots, O'Neill would slide all the way down to #20. Harsh. Instead of measuring variance of opinion, the current methodology ends up measuring variance of methodology. The easy solution is a more traditional (e.g. Heisman, MVP) style of ballot, where #1 gets 24 points, #2 gets 23 points, #3 gets 22 points, and so on down to 1 point for #24.
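To illustrate the proposed fix, here is a toy version of the traditional scoring; the two ballots are hypothetical, loosely modeled on the O'Neill split:

```python
def traditional_points(rank, list_size=24):
    """Heisman/MVP style: #1 gets 24 points, #2 gets 23, ... #24 gets 1;
    a player left off the ballot gets 0."""
    return 0 if rank is None else list_size - rank + 1

# Hypothetical two-voter split: ranked #3 by one voter, left off by the other.
ballots = [3, None]
total = sum(traditional_points(r) for r in ballots)
print(f"Combined points: {total}")  # 22 + 0 = 22
```

Under this scheme a strong #3 vote still carries real weight even when another voter omits the player, instead of being dragged down by differences in how each voter calibrates a 1-20 rubric.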