Do Naive xG Models Underestimate Expected Goals for Top Teams?

After documenting the implementation of a simple xG model, I have spent quite a bit of time thinking about what makes a good model and how to quantify its quality. Coincidentally, a few weeks ago Tom Worville posted the chart below, which sparked a discussion around the relationship between high shooting output and overperformance of xG: "Look at how clinical Son's been in terms of scoring above expected (t/t @Torvaney for the gridline idea here)".

xG Model - Accuracy and Goodness-Of-Fit

In the first part of this series we constructed a simple expected goals model relying on just two predictors: each shot's distance and angle from goal. As a reminder, the visualization of our xG estimates from that post showed the model passing the eye test, i.e. it maps shot locations to xG values that make intuitive sense. In this post we want to evaluate the quality of this model more formally with tidymodels’ yardstick package.
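As a rough illustration of what such an evaluation could look like, here is a minimal sketch using yardstick's class-probability metrics. The `shots` data frame and its column names (`is_goal`, `xg`) are stand-ins made up for this example, not the actual data or metric choices from the post.

```r
library(yardstick)
library(dplyr)

# Toy stand-in for the shot data: the model's predicted goal probability (xg)
# and the observed outcome (is_goal), with "goal" as the event level.
shots <- tibble(
  is_goal = factor(c("goal", "no_goal", "no_goal", "goal", "no_goal"),
                   levels = c("goal", "no_goal")),
  xg      = c(0.76, 0.12, 0.05, 0.33, 0.45)
)

# Bundle two probability-based metrics: mean log loss and ROC AUC
xg_metrics <- metric_set(mn_log_loss, roc_auc)

# Lower log loss and higher AUC indicate a better-fitting model
xg_metrics(shots, truth = is_goal, xg)
```

Log loss rewards well-calibrated probabilities, while ROC AUC only measures how well the model ranks goals above non-goals, so the two capture different aspects of model quality.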

xG Model - Design and Implementation with R Tidymodels

I recently went through Google's Machine Learning Crash Course and was looking for a project to apply those skills to. Coincidentally, tidymodels has lately gained some traction (at least in my Twitter feed) and I have been keen to try it out. An expected goals model is, of course, a great excuse to combine the two: it is relatively easy to set up, readers of this blog will not need a lengthy introduction to the thought process behind it, and the feature set used to explain the probability of a shot leading to a goal is very intuitive.
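To give a flavour of that setup, below is a minimal tidymodels sketch: a plain logistic regression predicting the goal probability of a shot from its distance and angle. The randomly generated `shots` data frame and its column names are assumptions for illustration only, not the post's actual data or final model choice.

```r
library(tidymodels)

# Randomly generated toy shot data; the real data and column names
# (is_goal, distance, angle) are assumptions for this sketch.
set.seed(42)
shots <- tibble(
  is_goal  = factor(sample(c("goal", "no_goal"), 200, replace = TRUE,
                           prob = c(0.1, 0.9)),
                    levels = c("goal", "no_goal")),
  distance = runif(200, 5, 35),   # distance from goal
  angle    = runif(200, 5, 120)   # shooting angle
)

# Preprocessing: model the goal probability from distance and angle only
xg_recipe <- recipe(is_goal ~ distance + angle, data = shots)

# A plain logistic regression as the underlying classifier
xg_spec <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

# Bundle recipe and model into a workflow and fit it
xg_fit <- workflow() %>%
  add_recipe(xg_recipe) %>%
  add_model(xg_spec) %>%
  fit(data = shots)

# The predicted probability of a goal for each shot is the xG value
predict(xg_fit, shots, type = "prob")
```

Keeping the preprocessing in a recipe and the model in a parsnip spec makes it straightforward to swap either part later, which is one of the main attractions of the tidymodels framework for a project like this.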