By now, expected goals are a part of the hockey lexicon. There is tremendous value in assigning a probability to each shot finding the back of the net, especially since properly built models can be far more predictive of future scoring than any other publicly available measure. This has been exhaustively studied and the research is compelling.
But expected goal models need to be consistently calibrated and fine-tuned to ensure that observed changes are appropriately modeled. For an extreme example, think about how scoring rates would change if defencemen were banned from the National Hockey League and teams could only use forwards. Or, for a real-life example, if the league decided to significantly downsize goaltender equipment. Both changes – one significant, one subtle – could influence the probability of any shot finding the back of the net. Therefore, we must consistently study these outputs to ensure that changes within the construct of the league are appropriately reflected.
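For readers who want to see what a calibration check looks like in practice, here is a minimal sketch. It is not how any of the public models are actually validated; the function name `calibration_table`, the bin count, and the sample shots are all made up for illustration. The idea is simply to group shots by their predicted probability and compare the average prediction in each group to how often those shots actually went in.

```python
from collections import defaultdict


def calibration_table(shots, n_bins=5):
    """Compare predicted goal probability to observed scoring, bin by bin.

    `shots` is a list of (xg, is_goal) pairs, where xg is the model's
    probability in [0, 1] and is_goal is 1 if the shot scored, else 0.
    """
    # Each bin tracks: [number of shots, summed xG, goals scored]
    bins = defaultdict(lambda: [0, 0.0, 0])
    for xg, is_goal in shots:
        # Clamp so that xg == 1.0 lands in the top bin
        idx = min(int(xg * n_bins), n_bins - 1)
        bins[idx][0] += 1
        bins[idx][1] += xg
        bins[idx][2] += is_goal

    rows = []
    for idx in sorted(bins):
        n, xg_sum, goals = bins[idx]
        rows.append({
            "bin": idx,
            "shots": n,
            "mean_xg": xg_sum / n,    # what the model expected
            "goal_rate": goals / n,   # what actually happened
        })
    return rows


# Hypothetical shots, invented purely to exercise the function
sample = [
    (0.05, 0), (0.05, 0), (0.10, 1), (0.10, 0),
    (0.30, 1), (0.30, 0), (0.30, 1), (0.30, 0),
]

for row in calibration_table(sample):
    print(row)
```

In a well-calibrated model, `mean_xg` and `goal_rate` sit close together in every bin. If `goal_rate` runs above `mean_xg` across the board, the model is understating scoring.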
That brings me to Saturday night’s game between Chicago and Toronto, which was a bit of a light-bulb moment for me. Some Maple Leafs fans on social media were certain that Frederik Andersen, and perhaps only Frederik Andersen, was responsible for the blowout loss. Responses were eerily similar to one another, with people rightly pointing out how few expected goals the team had given up relative to how many actual goals were conceded.
There is no doubt Andersen played poorly. But I went back and looked at every goal and scoring chance, and I certainly felt as if the expected goals were being understated relative to the danger Andersen faced. It was another data point in what has become a common theme – teams seeing a higher rate of goals against relative to expected goals against during the 2019-20 season.
Consider our expected goal and actual goal variance rate by season, and notice the spike this year:
Every year has in-season variance, but in 2019-20 it has all been in one direction. Teams are simply outscoring our expectations, repeatedly. And this isn’t some clunky model issue – league scoring is at the highest it has been since the 1995-96 and 2005-06 seasons. Year to date, expected goals have understated actual goals by 240.
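If you want to run the same comparison yourself, the arithmetic is just actual goals minus expected goals, summed by season. The sketch below assumes you have team-level rows with a season label, goals, and expected goals; the field names, the function name `season_gaps`, and every number in it are placeholders rather than real league data.

```python
def season_gaps(team_seasons):
    """Sum goals and expected goals by season and report the difference.

    `team_seasons` is a list of dicts with keys "season", "goals", "xg".
    A positive gap means actual goals outran expected goals.
    """
    totals = {}
    for row in team_seasons:
        goals, xg = totals.get(row["season"], (0.0, 0.0))
        totals[row["season"]] = (goals + row["goals"], xg + row["xg"])

    return {
        season: {
            "goals": goals,
            "xg": xg,
            "gap": goals - xg,              # raw goal difference
            "gap_pct": (goals - xg) / xg,   # difference relative to expectation
        }
        for season, (goals, xg) in totals.items()
    }


# Placeholder rows: two made-up seasons, two made-up teams each
rows = [
    {"season": "Season A", "team": "Team 1", "goals": 100, "xg": 98.0},
    {"season": "Season A", "team": "Team 2", "goals": 90, "xg": 93.0},
    {"season": "Season B", "team": "Team 1", "goals": 110, "xg": 100.0},
    {"season": "Season B", "team": "Team 2", "goals": 105, "xg": 95.0},
]

for season, summary in season_gaps(rows).items():
    print(season, summary)
```

In the placeholder data, Season A nets out close to zero because one team ran hot and the other ran cold, while Season B misses in the same direction for both teams. That second pattern is the one that should make you suspicious of the model rather than of luck.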
There are obviously a number of theories as to why. Some asked whether the league’s early decision to play with shot coordinates – a key predictor variable in expected goal models – was having an impact. But that issue was resolved in November, and in January 2020 alone we are still seeing significant variance despite the verified correction.
It also doesn’t appear to be a specific model issue. Three of the most robust and accurate models – Evolving Hockey, Natural Stat Trick, and Hockey Viz – are all reporting expected goal totals that fall well short of actual goals.
There are any number of on-ice theories as to why scoring rates have increased, and/or why expected goal rates are underwhelming relative to those numbers. The goaltenders did experience a downsizing in equipment. Teams did get more sophisticated with matters like power play deployment (and the associated four-forward approach), or when to pull the goalie when trailing late, or the emphasis on hockey skill over toughness and physicality in the bottom six of forward groups. All are contributors, yet in aggregate, it’s highly unlikely they would explain such a significant change.
There may be another question at hand, too. Let’s move away from expected goals and look at shot metrics, which have always been reliable in terms of their predictive power in the modern era of hockey.
For years, any shot-based measure has regressed quite favourably against goal scoring. In other words, if you know how frequently a team is shooting, you reasonably know how well a team is scoring – true for both sides of the ice. So if you had an expected goal problem, you could always revert to using shot rates.
That’s not the case in 2019-20. The relationship between any of these measures and scoring is weak. Look at the r-squared values when regressing any of our shot measures against scoring rates this season. (I will use the Evolving Hockey model for a moment here.)
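For anyone unfamiliar with the statistic, r-squared is the share of the variation in one measure (here, team scoring rates) that a straight-line fit on another measure (shot rates) accounts for. With a single predictor it equals the squared correlation between the two. The sketch below computes it from scratch; the function name `r_squared` and the team rates are invented for illustration and are not pulled from Evolving Hockey or any other source.

```python
def r_squared(x, y):
    """R-squared from a simple linear regression of y on x.

    With one predictor this equals the squared Pearson correlation.
    Returns 0.0 when either series has no variation.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be the same length, with at least 2 points")

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products around the means
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


# Made-up per-60 rates for six hypothetical teams
shots_for_per60 = [28.5, 30.1, 31.4, 32.0, 33.2, 34.8]
goals_for_per60 = [2.4, 2.9, 2.5, 3.1, 2.7, 3.0]

print(round(r_squared(shots_for_per60, goals_for_per60), 3))
```

To compare both sides of the ice, run the same function once on shots for against goals for, and once on shots against against goals against. A value near 1 means shot volume tells you most of what you need to know about scoring; a value near 0 means it tells you very little.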
It is worth noting that the relationship on the offensive side (how many shots a team takes relative to how many goals a team scores) is materially stronger than it is on the defensive side this year, which suggests – at least in part – that a piece of this may boil down to goaltender play. As in: the more variable goaltender performance is, the less we understand about the relationship between shots against and goals against, or chances against and goals against, or expected goals against and goals against.
This is one of those stories where I don’t quite have a conclusion, and I’m not sure I will anytime soon. Having done this for more than a decade, the one thing that I know is that when you introduce these questions to a large public forum, someone is smart enough to figure out the answer.
That person just isn’t me.
Data via Evolving Hockey, Natural Stat Trick, and Hockey Reference