Introduction
For some reason, it crossed my mind to try to find and download the world cup squad details from the 2022 Cup, held in Qatar. I found the details easily on Wikipedia although every squad was contained in its own table,
The details are all here
Please note, the Regional analysis and the number of countries by club analyses that I have created and added here are only being made available to Blog readers and not included in the video that also accompanies this post.
Power Query
So, I asked Power Query to go to that web page and I downloaded all 32 separate tables: one table for each country's squad.
I have to Append the 32 tables to generate my all squads combined table and while that was not difficult, it took me time because of the connection to Wikipedia. I then created an index number in each of the 32 pages, before I combined them, so that I could use them to attach the Squad/country names to them.
My initial squad tables looked like this, the England table:
Note the Index number 5 column ... England was Wikipedia's table number five!
Attach the append as new so that all 32 tables retained their own identity.
By the way, initially, I saved all of the queries as connection only but that didn't work so well and I had to start the job again.
Table to Append
I had to find a way to attach the name of the squad to its table as there were separated. I did that by creating a table, as follows and merging that table with my now merged all squads combined table
Once I had put the above two tables together and made some other changes, my final table looks like this:
Analysis and Exploration
Once I had my table finished, these are the things I did by way of analysis and exploration:
- Histograms of Age, Caps and goals
- Correlation matrix:
- Graphs to support the matrix, here's just one of them:
I used this formula to find out how many players there were in each of the four positions that players were assigned to: =COUNTIFS(All_Countries[Pos],AA3#)
I created a unique list of all home clubs and evaluated the number of players on duty at the tournament: here are just the top 19 clubs, sorted A-Z:
I used this formula, =SORT(UNIQUE(YEAR(All_Countries[DOB]))), to evaluate the distribution of players according to their birth year:
Here is the table:
They provided us with estimates of the players' ages but I reworked that column, using the opening date of the tournament as the cut off for finding their true ages: =(AI1-All_Countries[DOB])/365.25
Histograms
And here are the histograms of their published ages, graph 1 and their actual ages, as per my calculations, graph 2: you can see that the age they provided in their table was nowhere near enough and I have included a histogram of the residuals, the differences between their estimate and my estimate.
Which Club: Which Players?
Since football is a family game and we usually adopt our team at or near birth, I am sure many people who come to this page and my file would want to be able to find out who were the players, if any, who represented their country and the club they support.
For example, I have followed and supported Burnley FC all of my life, ever since Primary School and I could see from the listings that two players from Burnley FC were present in Qatar for the tournament.
I have included a Data Validation cell and then a FILTER() formula that tells me who the players were who represented their club and their country: By the way, this output is on the same worksheet as the columns that show the count of players and the clubs they represented.
Barcelona had 17 players on duty in Qatar and I used an adjusted FILTER() formula to highlight the players and the country they represented:
Finally, I set out an evaluation of average ages by country:
I used the GROUPBY( function for that
=GROUPBY(All_Countries[[#All],[Country]],AH2:AH833,AVERAGE,3)
Region Analysis
This section is found here in this blog post only and is not in the video.
I thought it would be interesting to name the regions of the world from which the club and players came and analyse my findings. This is my table of the Regions I created: I should add that I am certain that you could suggest other region names, for example, Costa Rica could be in Central and South America by some standards but, for this analysis, I divided the Americas into North and South. Similarly wit the South East Asia region.
Here is my analysis: Players by Region
Clubs by Region
There we are but we have to note that these assignments only place players in the countries that they represent. Therefore if we assign by Club and Region, that is not the same as assigning them by the country they represent and/or were born in;
Countries by Club
The final thought that struck me as I got to this point in my analysis is that it could be interesting to find out how many countries each club represented. For example, Burnley FC had two players in their squads but they represented Wales and Morocco, two countries. What are the top ten clubs in terms of their representations at Qatar FIFA World Cup 2022?
Do note, in my analysis of the players by club, we can see that Tottenham Hotspur had 11 players at the tournament but they represented just 10 countries. Similarly, Barcelona: 17 players representing 8 countries. Here are the top ten most diverse clubs at the FIFA World Cup, 2022:
Conclusions
Overall, this analysis took time because of the speed of the connection to Wikipedia and then because I had to carry out a relatively large number of relatively small changes and formulations and so on.
You can Download my file here and you can watch my presentation of these materials on my YouTube channel: https://youtu.be/n3ed00HZmzQ
Duncan Williamson
11th April 2025 updated 12th April 2025
Regional and Countries by Club Analysis added 13th March 2025
No comments:
Post a Comment