Baby Scouting : World Edition

The other day I wrote a piece that used a bit of data to question people’s beliefs (on both sides of the spectrum) about when English Premier League clubs should start to recruit players (Baby England : Scout Onwards).

The piece stimulated a bit of debate on Twitter with two camps of thought clearly emerging: 1) Talent is Talent – Scout them early! 2) What the hell are they doing!

The debate has encouraged me to extend the data collection process that was completed for Baby England, in order to create a more worldwide view. The results and discussion of the new data are presented after a brief detour. The debate on Twitter touched on a few different areas of research and it’s worthwhile delving into some of the thinking that is generating this schism of thought:

Camp 1: Talent is Talent – Scout them early!

“Premier League Teams are scouting players early and they are making it to the England Senior Team, so it’s working… The reason it’s working is because talented players are talented from an early age, it’s quite simple. After 16 it’s only Smalling and Carrick – England are fine without them. Also… just look at Messi when he was 7”

Camp 1 Spokesperson, 2016

Evidence 1:

Evidence 2: See 1:14

“Case Closed!”

Camp 1 Spokesperson, 2016

Camp 2: What the hell are they doing!

The backbone of Camp 2’s argument is that children develop emotionally, mentally, socially and athletically at very different rates, and recruiting players at a young age will let huge amounts of talent drift away. Essentially, this is ‘Selection Bias’, and we will use the concept of the Relative Age Effect (RAE) to show that Selection Bias is very evident in professional football.

Relative Age Effect

The Relative Age Effect is a phenomenon that suggests that athletes at elite level are more likely to be born in the first 3 months after the eligibility cut-off date for a particular age group in sports. Credit

NB: Eligibility in the UK runs from August – July and elsewhere from January – December
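As a minimal sketch of how a birth date maps onto a relative-age quarter, the snippet below takes the first month of the selection year as a parameter (8 for an August – July year, 1 for a January – December year, following the note above). The function name and the example dates are made up for illustration; this is not the method used to produce the charts in this post.

```python
from datetime import date


def relative_quarter(birth: date, cutoff_month: int) -> int:
    """Return 1-4: the quarter of the selection year the birthday falls in.

    cutoff_month is the first month of the selection year
    (assumed: 8 for August-July, 1 for January-December).
    """
    # Number of whole months between the start of the selection year
    # and the birth month, wrapping around the calendar year.
    months_after_cutoff = (birth.month - cutoff_month) % 12
    return months_after_cutoff // 3 + 1


# Hypothetical birthdays, not real players.
october_baby = relative_quarter(date(2000, 10, 31), cutoff_month=8)   # quarter 1
july_baby = relative_quarter(date(2000, 7, 1), cutoff_month=8)        # quarter 4
january_baby = relative_quarter(date(2000, 1, 15), cutoff_month=1)    # quarter 1
```

The same birthday can land in different quarters depending on the cut-off, which is why the eligibility year matters when comparing countries.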

Further Reading :
1. Football talent spotting: Are clubs getting it wrong with kids? – Alistair Magowan (BBC Sport)
2. Relative age effect in european professional football. Analysis by position – J.Salinero et al
3. The ‘Matthew Effect’ – Ross Tucker

Back to ‘Selection Bias’

Simon Gleave was fighting for Camp 2 and stated:

Based on the evidence found within RAE research, it is logical to think that Simon’s point does have a huge impact on the outcomes for the players selected and also for the players deselected. This point is at the core of Ross Tucker’s philosophy on Talent ID and is expressed excellently in this presentation and explanatory article.

New Research : World Edition

The aim is to look at when ‘successful’ players are signed up to academies in more countries than just England, to see if there are any global patterns, evidence of best practice and anything to add to the discussions.

Sample Size

  1. Countries selected: England, France, Netherlands, Belgium, Italy, Spain, Portugal, Germany, Brazil and Argentina
  2. All players who have won 10 caps or more for the countries selected for the study

Data Integrity Acknowledgements

  1. Squad lists for the national teams were obtained from Wikipedia – there are a few casualties of this method (David De Gea and more)
  2. Each player’s youth team history was gleaned from Wikipedia – there will obviously be some mistakes and inaccuracies
  3. If a player’s youth history, or his age at sign-up, was difficult to ascertain or clarify, he was removed
  4. In some cases I had to decide if a non-top tier club would count as an ‘academy’; I will have made some mistakes in classification.
  5. I used ‘Peak Career Market Value’ from Transfermarkt as a proxy for quality. Some players are yet to reach their peak value.
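The selection rules above can be sketched as a simple filter. Everything here is illustrative: the `Player` fields, the `build_sample` name and the three records are invented, and the real data was collected by hand from Wikipedia and Transfermarkt rather than with this code.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_CAPS = 10  # threshold stated in the sample description


@dataclass
class Player:
    name: str
    country: str
    caps: int
    signup_age: Optional[int]  # None when the youth history was unclear
    peak_value_m: float        # peak career market value, in millions


def build_sample(players: List[Player]) -> List[Player]:
    """Keep players with enough caps and a known academy sign-up age."""
    return [
        p for p in players
        if p.caps >= MIN_CAPS and p.signup_age is not None
    ]


# Made-up records purely to exercise the filter.
raw = [
    Player("Player A", "England", 25, 8, 30.0),
    Player("Player B", "Germany", 4, 12, 10.0),     # too few caps
    Player("Player C", "Italy", 40, None, 20.0),    # unclear youth history
]
sample = build_sample(raw)
```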

Results : Overview

[Screenshot: overview of results]

Results: Volume Distribution

[Chart: volume distribution, all players]

Results : Per Country Volume Distribution

[Chart: per-country volume distribution, all positions]

  • There are obviously a lot of differences between countries, with Germany looking like ‘The Sausage of Equality’ and England visually differing from the others.
  • Italy is the only country to have its widest distribution during or after adolescence

Results : Per Country Volume Distribution – by Position

Goalkeepers: [chart]
Defenders: [chart]
Midfielders: [chart]
Forwards: [chart]

  • England’s desire to find their forwards early is amazingly evident, especially when compared to Spain.
  • Germany’s ‘Sausage of Equality’ is once again a joy to behold!

Results: Are the Best Players Found Early?

[Chart: all players]

  • There is a general linear relationship observed with the tightest confidence brackets at the age of 11.
  • There are some big names recruited after 16 – Sergio Busquets!
  • Angel Di Maria is the youngest player to be signed up to an academy, at the age of 4… the transfer fee was 35 footballs.
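One possible reading of the tight confidence brackets at 11: if the chart is an ordinary least-squares line with a standard confidence band (an assumption, as the fitting method is not stated), the band is always narrowest at the mean of the x values. The sketch below uses invented sign-up ages and values whose mean age happens to be 11, purely to show the effect.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def band_standard_error(xs, ys, x0):
    """Standard error of the fitted line at x0; the band is a multiple of this."""
    n = len(xs)
    slope, intercept = fit_line(xs, ys)
    x_bar = mean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(residual_ss / (n - 2))  # residual standard deviation
    return s * sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)


# Made-up data: sign-up ages and peak values (millions), not the study sample.
signup_ages = [6, 8, 10, 11, 12, 14, 16]
peak_values = [40, 35, 30, 20, 25, 15, 10]

widths = {age: band_standard_error(signup_ages, peak_values, age)
          for age in signup_ages}
narrowest_age = min(widths, key=widths.get)
```

Under that assumption, a pinch point at 11 would mostly say that the average sign-up age in the sample is around 11, rather than anything special about that age.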

Results: Are the Best Players Found Early? – by Position

Goalkeepers: [chart]
Defenders: [chart]
Midfielders: [chart]
Forwards: [chart]

  • Interestingly, goalkeepers join forwards as the positions where more value is found at the younger age groups
  • There is a slight trend towards finding better midfielders later in children’s development
  • I would prefer to pick my back 4 from a group that were signed at 10 or below… maybe personal preference.

Results: Are the Best Players Found Early? – by Country

Spain: [chart]
Portugal: [chart]
Italy: [chart]
Holland: [chart]
Germany: [chart]
France: [chart]
England: [chart]
Brazil: [chart]
Belgium: [chart]
Argentina: [chart]

  • To be honest, I am not sure what insights to draw from these charts… any suggestions?
  • Germany’s plot is interesting as it’s so wide and open, with players of differing values joining at differing times.
  • Argentina’s plot potentially shows a football culture / infrastructure where the best young players can gain exposure to scouts… but I am clutching at straws.

Relative Age Effect on Show?

[Screenshot: Relative Age Effect across the sample]

  • RAE is very clearly visible and active within this sample
  • Interestingly, the age of academy sign-up of 3rd and 4th quarter players is higher, suggesting that late-developers are spotted but scouts have a tendency to get drawn towards ‘older’ players at the younger ages.
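The two observations above (how many players fall in each birth quarter, and how old they were at sign-up) can be summarised with a small grouping step. This is a sketch with invented `(birth_quarter, signup_age)` pairs; the function name is hypothetical and the numbers are not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def summarise_by_quarter(records):
    """records: iterable of (birth_quarter, signup_age) pairs."""
    ages = defaultdict(list)
    for quarter, age in records:
        ages[quarter].append(age)
    total = sum(len(group) for group in ages.values())
    return {
        quarter: {
            "share": len(group) / total,       # proportion of the sample
            "mean_signup_age": mean(group),    # average age at academy sign-up
        }
        for quarter, group in sorted(ages.items())
    }


# Made-up records skewed towards quarter 1, as an RAE-affected sample would be.
records = [(1, 8), (1, 9), (1, 10), (1, 9), (2, 10), (2, 10), (3, 12), (4, 13)]
summary = summarise_by_quarter(records)
```

With no Relative Age Effect each quarter's share would sit near 0.25, so the gap from 0.25 is a quick way to read the size of the effect.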

Germany vs England

[Screenshot: Germany vs England, two plots]

  • 1st plot: Interestingly, Germany shows an RAE that is closer to the average of the sample
  • 2nd plot: Although the ages are comparable, this does suggest that Germany are better at identifying/developing ‘late bloomers’

The Race to the Bottom

Ross Tucker writes: (link)

The answer to that question should be clear by now.  The major sports teams have engaged in a progressive race to the bottom because of competition between themselves, and also between sports.  This latter battle cannot be ignored – Real Madrid can lose out to Barcelona when a potentially great football player chooses the Catalans, or they could lose out to rugby if that player decides to stay in Argentina to play for the Pumas.  Similarly, track and field loses to basketball, rowing to rugby, rugby to football, triathlon to swimming, and so on.

And so is created a competitive market where supply is very limited, but demand is enormous.  The cost doesn’t appear until much later, however, when the player reaches maturity and the ‘bet’ made by the team has actually come to maturity.  That is, Messi was a bargain at 8, he was getting costly by 15, and by 21, priceless.

Therefore, what the major sports have done is assess their desire for efficiency, and given how much money they have, realised that it doesn’t actually matter if they waste $100,000 on 100 players (cost of $10 million), because if the 101st player is Lionel Messi, then they’re way ahead of the game.  That’s not to say they don’t care at all – if they could find a Messi once every 50 rather than 100 players, then they save $5 million, but the cost of NOT identifying him is enormous.  That’s the opportunity cost I spoke of earlier, and to wealthy sports and teams, this is the driver, not expense.
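The arithmetic in the passage above can be checked with a few lines. The cost per player and the two hit rates come straight from the quote; the function name is made up for illustration.

```python
COST_PER_PLAYER = 100_000  # dollars per recruited player, from the quote


def cost_to_find_one(players_per_star: int,
                     cost_per_player: int = COST_PER_PLAYER) -> int:
    """Total spend on the players recruited for each star found."""
    return players_per_star * cost_per_player


one_in_100 = cost_to_find_one(100)   # 10,000,000
one_in_50 = cost_to_find_one(50)     # 5,000,000
saving = one_in_100 - one_in_50      # 5,000,000
```

Doubling the hit rate halves the spend, but in Tucker's argument both figures are small next to the cost of missing the star altogether.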

Can we see this ‘Race to the Bottom’ from the sample?

[Screenshot: Race to the Bottom]

  • There is a trend that clearly agrees with Ross Tucker and shows the Race to the Bottom
  • However, this may well be down to sample size issues and we have to take the graph with a pinch of salt.
  • That being said, most readers will agree that the Race to the Bottom exists.

Conclusion:

  1. Elite talent can be found at a younger age, therefore clubs have to scout at a young age. Significant talent is snapped up by the age of 8, so economic realities mean clubs will and should scout at these age groups.
  2. ‘Late-Developers’ or ‘Over-Looked Talent’ are a major market!! To name a few: Sergio Busquets, Daniele De Rossi, Paul Pogba, David Silva and many more. England are very poor at recruiting and developing the late-developers. However, I feel it’s an easy argument to make against the clubs and the scouts. The majority of the burden should be carried by the English FA for not implementing structures for elite development and coaching outside of the club system. In Germany there are expert coaches working with talents that were not selected by the professional clubs, whereas in England those players can languish in their grassroots teams and continue to be coached by parents or volunteers. The DFB have taken responsibility for the development of the ‘deselected’ and that is why we see Germany’s impressive ‘Sausage of Equality’.
  3. Camp 2 are definitely the winners! However, they must concede that it is worthwhile aggressively scouting and recruiting the best U8s. Yet this has to be balanced with acknowledgment of the RAE and a contribution to the development of ‘deselected’ talent.

Closing Take Home

I can’t find the tweet now, but Simon Gleave tweeted something along the lines of:

“The academy that takes advantage of the talent left behind from recruiting ‘old’ players will see huge benefits”

Simon – please feel free to correct me on the above…

 

STRASLE Graph : Striker’s Adrenaline-Stimulated Luck Evaluator

On the wet streets of Manchester there is a subdued mood: United seem rudderless, whilst City have been limping over the line and have taken recent solace in celebrating the anniversary of Sergio Agüero’s goal against QPR. Grasping at straws through the mist, both sets of fans have latched onto two young strikers whose goalscoring records have been helping to absorb the fans’ tears.

Marcus Rashford and Kelechi Iheanacho have been hitting the back of the net at such an impressive rate that it has created a whirlwind of excitement, fuelling projections of the pair being world-class strikers of the future.

“He’s a very good young player. I see some of myself in him for sure – he has courage and he’s fast and is very good with the ball. I think for the strikers they have to be hungry to score and I see that with him. He has an amazing future”

Ronaldo (BRA) on Rashford

No doubt, an exciting narrative, but how excited should the fans be? Are their performances repeatable? Expected Goals legend Michael Caley recently stirred the debate with an xG + xA / 90 graph that inspired this post.

The objective analysis of goal scorers has evolved and improved in recent years, with per90 metrics and expected goal models. Yet some consensus has also emerged that the output of goal scorers is best observed over a longer period of time, circa 18 months and 4,500+ minutes. When it comes to young strikers we don’t currently have the required sample size to reach more reliable projections of future goal scoring output. In the absence of reliable modelling there needs to be some data-stimulated debate within the recruitment departments of clubs looking to invest in a young high-scoring striker.
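For readers less familiar with per90 metrics, here is a minimal sketch of the calculation along with a flag for the 4,500-minute rule of thumb mentioned above. The function names and the example numbers are invented for illustration.

```python
MIN_RELIABLE_MINUTES = 4_500  # rule of thumb quoted in the text


def per90(total: float, minutes: float) -> float:
    """Scale a season total to a rate per 90 minutes played."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total * 90 / minutes


def is_reliable_sample(minutes: float) -> bool:
    """True when the player has reached the rule-of-thumb minutes threshold."""
    return minutes >= MIN_RELIABLE_MINUTES


# Hypothetical young striker: 5 goals in 900 minutes.
goals_per90 = per90(5, 900)
enough_minutes = is_reliable_sample(900)
```

A rate of 0.5 goals per 90 looks the same whether it comes from 900 minutes or 4,500, which is why the minutes flag needs to travel with the number.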

Introducing… The STRASLE Graph (Striker’s Adrenaline-Stimulated Luck Evaluator)

The purpose of STRASLE is to help stimulate debate: is a striker’s goal scoring output sustainable and repeatable? The Actual-Expected Goals difference (+/-) and the Conversion Rate are plotted for all midfielders and forwards that have scored but played less than 1,200 minutes. The filtering does not remove older players; however, STRASLE could also be used to help assess goal scorers with a low number of minutes due to injury or deselection.
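The two STRASLE axes and the filter can be sketched as below. Two things are assumptions rather than anything stated in the post: conversion rate is taken as goals divided by total shots, and positions are coded as "MF" and "FW". The player records are invented.

```python
from dataclasses import dataclass

MAX_MINUTES = 1_200  # players must have played less than this


@dataclass
class ShooterSeason:
    name: str
    position: str   # assumed coding: "MF" or "FW" are eligible
    minutes: int
    shots: int
    goals: int
    xg: float       # expected goals from any xG model


def in_strasle_sample(p: ShooterSeason) -> bool:
    """Midfielders and forwards who have scored in under 1,200 minutes."""
    return p.position in {"MF", "FW"} and p.goals > 0 and p.minutes < MAX_MINUTES


def strasle_point(p: ShooterSeason):
    """Return (conversion_rate, actual_minus_expected) for plotting."""
    conversion_rate = p.goals / p.shots   # assumption: goals per shot taken
    xg_difference = p.goals - p.xg
    return conversion_rate, xg_difference


# Made-up players, not real data.
players = [
    ShooterSeason("Player A", "FW", 800, 10, 4, 2.5),
    ShooterSeason("Player B", "FW", 2_500, 60, 12, 11.0),  # too many minutes
    ShooterSeason("Player C", "MF", 600, 8, 0, 0.9),       # has not scored
]
points = {p.name: strasle_point(p) for p in players if in_strasle_sample(p)}
```

A point far up and to the right on both axes is the combination the post treats as most likely to fall back with more minutes.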

[Chart: STRASLE Graph]

The Eye-Test

  1. If a club were weighing up transfers for Rashford or Iheanacho, they should proceed with caution due to both players having an unrepeatable conversion rate as well as a high Actual-Expected Goals difference.
  2. Daniel Sturridge shows excellent levels of performance despite his low minutes this season. Given his high performance in previous seasons, these numbers would give a recruitment team confidence in their repeatability, especially as the number of shots taken gives the conversion rate greater validity.
  3. The locations of Bertrand Traore, Iwobi and Origi would facilitate an interesting discussion.

Conclusion

Rashford and Iheanacho’s scoring output is certainly not sustainable! They both sit in the realms of extreme conversion rates and positive xG difference. With more minutes their conversion rates will fall and their xG difference may move closer to average. The whirlwind of excitement will drop, but left behind could be some amazing players.

It is therefore the job of the clubs and coaching staff to manage expectations and to keep the players working on continued improvements rather than getting swept up in the noise.

City seem to be doing this with Iheanacho…

“It was very important for Kelechi to demonstrate once again that he’s a very good player. He’s not just a striker – he provided two assists. I’m very happy for him. He has things to improve but he’s working in the correct way”

Manuel Pellegrini on Iheanacho

Notes:

  1. Thanks to Paul Riley for his amazing Premier League 2015/16 xG Map and Table as well as the data behind it.

KPI Density Plots V.2

After posting my KPI Density Plots (post containing explanation) yesterday, they got some positive response on Twitter and, importantly, I got some good constructive feedback from Marek Kwiatkowski and Thom Lawrence. I quickly implemented their idea and it no doubt improved things, although Marek has tempted me into the difficult task of finding a different font… leaving that one for a rainy day… (font suggestions welcome!)

Within the comments section of the blog post I received this great idea from Boris Zlatopolsky. It is an idea I feel has some legs and I will bash it out soon:

Like your use of distributions, gives the various metrics a context. However when comparing two (or a few) players within the same distribution, I feel you’re putting too much accent on the distribution. I wonder what a more understated (almost transparent) but larger (at least taller) distribution would look like, with the players compared shown as circles within the distribution. Then you can fit a few players even if they’re in the same area of the distribution. You can then also bring in the size of the circle, for example to show number of 90s. Perhaps Walcott does score at a similar rate to Mahrez but doing it consistently over more games is interesting.

In addition to these changes I also wanted to add more KPIs and section them into logical groupings: 1) Finishing 2) Creating 3) Defence. I feel these give a good, quick overview of offensive players in comparison with their competitors.

I therefore introduce KPI Density Plots V.2, with the assistance of three excellent attacking midfielders – Ozil, Payet and Coutinho:

[Charts: KPI Density Plots V.2 for Ozil, Payet and Coutinho]

KPI Density Charts – An R Experiment

Sliding through my Twitter feed a few months ago I saw the following visualisation from the Analytics FC Gang (Tom Worville, Ben Torvaney, Sam Gregory and Bobby Gardiner) and it struck a chord with me for the following reasons:

  1. Neymar is bloody good!
  2. It provides an easy and instinctive way of showing where players are placed in relation to their peers across various KPIs.
  3. It offers excellent opportunities to compare players side-by-side in a visually intuitive way.
  4. It offers a good initial way of exploring your data and attempting to pick up on patterns and narratives that could be drawn out of the data.

[Image: the Analytics FC visualisation]

I have been learning R over the last few months and have been meaning to try and recreate the visualisation and ideally improve on it.  Well, I finally got around to it and wanted to share the process as my first post on this blog, share some of the outputs from my version and in a second post cobble together a mini-tutorial as I certainly learnt one or two things when I was coding it up.

I have used an OPTA data set that fell off the back of a truck… a little bit battered, but it was easy to dust off and utilise. The data was at a player level of aggregation with totals and averages for the EPL 2015-16 season. There are some limitations to the data but it will do for the purpose of exploring an idea.

Design

Improvements

I felt I could tweak a few things to improve the graphs, namely:

  1. Focus the KPIs on more specific areas and create a few different types of graphs i.e. finishing or creation
  2. Add quartile ranges to the graphs to give even more context while keeping them easy to read
  3. Add a little bit of colour
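The numbers behind a density plot with quartile ranges can be sketched without any plotting library. The charts in this post were built in R; the Python below is only a language-neutral illustration, with a hand-rolled Gaussian kernel density, a normal-reference rule of thumb for the bandwidth (an assumption, not necessarily what the charts use) and invented shots-per-90 values.

```python
from math import exp, pi, sqrt
from statistics import quantiles, stdev


def gaussian_kde(values, bandwidth):
    """Return a function giving the kernel density estimate at x."""
    n = len(values)
    norm = 1 / (n * bandwidth * sqrt(2 * pi))

    def density(x):
        return norm * sum(exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


def percentile_rank(values, x):
    """Percentage of the league at or below x."""
    return 100 * sum(v <= x for v in values) / len(values)


# Made-up shots-per-90 values for eight players.
shots_per90 = [0.8, 1.1, 1.5, 1.9, 2.2, 2.6, 3.0, 3.4]

# Rule-of-thumb bandwidth (assumed): 1.06 * sd * n^(-1/5).
bandwidth = 1.06 * stdev(shots_per90) * len(shots_per90) ** (-1 / 5)
density = gaussian_kde(shots_per90, bandwidth)

q1, q2, q3 = quantiles(shots_per90, n=4)         # quartile cut points
player_rank = percentile_rank(shots_per90, 2.2)  # where one player sits
```

The density curve gives the shape of the league, the quartile cut points give the shaded ranges, and the percentile rank places an individual player within them.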

Sample Size

  1. Midfielders and Forwards
  2. Players having played 750 minutes or more in the EPL 2015-16 Season
  3. Jamie Vardy has been removed due to racist tendencies

Graph 1. The Changing of the Guard 

[Chart: Rooney v Kane]

Graph 2. The New Kid on The Block and the Flop(?)

[Chart: Walcott v Mahrez]

Other Profiles

There are many KPIs to use and more profiles to set up other than just finishing. I will get to these over the coming days and push them out along with the tutorial on how to create these in R. In the meantime, if you want to see the finishing profiles of any players let me know.