Baad Stats

If you read my blog at all, you’ll know I bristle at the misuse of statistics in order to ‘prove’ dubious theories. I stumbled across another example on Twitter this afternoon. 


On the face of it, a third-tier English side selling twice as many tickets as a first-tier Scottish team doesn’t look good. But this is where regression analysis comes in. For a start, Bradford has a higher population than Aberdeen (roughly twice the size). England in general has ten times the population of Scotland, so direct comparisons like this aren’t helpful.

Bradford have also introduced cut-price season tickets for under-11s – £5 for the entire season (46 league games).


Aberdeen are in “the big boy’s league”. And they’ll probably sell more season tickets than any other Scottish club, outwith the Old Firm. Leicester City and Villarreal only sell around 20,000 season tickets in their first tier leagues, and they’re able to compete with the big boys. 

None of that really matters though. The original tweet was meant as a dig at Aberdeen, and with 96 retweets and 242 likes, it certainly succeeded in its goal.


Lies, Damned Lies, and the Big Count 2006.

In my previous blog, I wrote about the emergence of soccer data analysts focussing on the Scottish game. I wasn’t convinced that a lot of the metrics they presented were accurate, and I was disconcerted by how opaque the methodology behind those metrics is, proper methodology being key to producing accurate statistics. Since that blog, Rangers and St Johnstone were both knocked out of the Europa League at the first possible hurdle, the sort of thing that always hurls Scottish football into apocalyptic self-flagellation. It also sent the analysis into overdrive, with all sorts of information and graphs being posted online.

One such chart I saw on Twitter was the number of youth players in Scotland, compared to Denmark, Norway and the Republic of Ireland, nations of comparable size and geographically local. At first glance, the table was damning, with Scotland only having half as many youth players as the other three. It was implied that this paucity of young players was the reason Scotland were struggling in international competition.

Over the last few years, despite being keen on the use of statistics to aid informed decision-making, I’ve come to realise there’s a lot of misinformation and bias about, deliberate or otherwise. Whenever any source publishes a big stat such as the one above, regardless of whether it’s in the sphere of politics or football, my snidey senses start to tingle. Where’s the bullshit? I reflexively ask.

There are a few ways that statistics can be presented that look a bit iffy. One is cherry-picking: why have Denmark and Norway been selected as comparisons, and not England, Wales or Northern Ireland, to which Scotland is more akin? What’s the source of the original data? Have variables and outliers been accounted for? What’s the baseline data? What’s the standard deviation? Are like items being compared?

The source of the data was shared later in a reply, and I had a quick look at it on Friday. At first glance, it didn’t fill me with confidence. Firstly, the methodology for the data gathering was described as follows:

“The data for Big Count 2006 was collated in the first half of 2006 via the standard practice of a questionnaire as well as an online tool. The response rate was over 75%. FIFA used Big Count 2000, a UEFA survey from 2005 and other internal analyses to supplement missing data from associations and for plausibility purposes.”

A survey was sent out to the member associations to be completed, and if the information wasn’t returned, they would fill in the gaps from other sources.

The information for England, Scotland, Northern Ireland, and Wales was also interesting. Always a bit of an anomaly in World football due to being 4 sporting nations within 1 sovereign country, the FIFA report has applied the UK’s population to each of its constituent countries, which dramatically skews the players per population percentage for the smaller three countries. Finally, the data was from 11 years ago, which is useful in telling us how events 11 years ago are shaping today, but the fact no follow-ups seem to have been carried out means that we only have a snapshot in time, with no progress or decline to benchmark the figures against.

I started my own analysis by copying the data into a spreadsheet. This would allow me to calculate the figure for youth players as a percentage of each country’s respective population, allowing for a more like-for-like comparison.

Rank | Association | Region | Population | Youth players | Youth % of population
1 | Faroe Islands | Nordic | 47246 | 4040 | 8.55%
2 | Norway | Nordic | 4610820 | 272958 | 5.92%
3 | Iceland | Nordic | 299388 | 16000 | 5.34%
4 | Republic of Ireland | UK & Ireland | 4062235 | 174498 | 4.30%
5 | Liechtenstein | Central | 33987 | 1400 | 4.12%
6 | Sweden | Nordic | 9016596 | 319599 | 3.54%
7 | Denmark | Nordic | 5450661 | 188724 | 3.46%
8 | Slovakia | Central | 5439448 | 169561 | 3.12%
9 | Netherlands | Western | 16491461 | 510091 | 3.09%
10 | San Marino | Southern | 29251 | 823 | 2.81%
11 | Austria | Central | 8192880 | 221547 | 2.70%
12 | Germany | Central | 82422299 | 2081912 | 2.53%
13 | Luxembourg | Western | 474413 | 11874 | 2.50%
14 | Czech Rep | Central | 10235455 | 208451 | 2.04%
15 | Belgium | Western | 10379067 | 208551 | 2.01%
16 | Finland | Nordic | 5231372 | 101334 | 1.94%
17 | Andorra | Southwestern | 71201 | 1366 | 1.92%
18 | France | Central | 60876136 | 1034046 | 1.70%
19 | Switzerland | Central | 7523934 | 127700 | 1.70%
20 | Croatia | Southeastern | 4494749 | 64495 | 1.43%
21 | Ukraine | Eastern | 46710816 | 658540 | 1.41%
22 | England | UK & Ireland | 60609153 | 820000 | 1.35%
23 | Scotland | UK & Ireland | 5116900 | 67123 | 1.31%
24 | Wales | UK & Ireland | 2966000 | 31200 | 1.05%
25 | Spain | Southwestern | 40397842 | 419485 | 1.04%
26 | Slovenia | Southeastern | 2010347 | 20831 | 1.04%
27 | Italy | Southern | 58133509 | 557453 | 0.96%
28 | N Ireland | UK & Ireland | 1742000 | 16000 | 0.92%
29 | Serbia | Southeastern | 9396411 | 85412 | 0.91%
30 | Cyprus | Eastern | 784301 | 6644 | 0.85%
31 | Greece | Southeastern | 10688058 | 86779 | 0.81%
32 | Malta | Southern | 400214 | 2773 | 0.69%
33 | Hungary | Central | 9981334 | 63744 | 0.64%
34 | Portugal | Southwestern | 10605870 | 64922 | 0.61%
35 | Bosnia-Herzegovina | Southeastern | 4498976 | 26570 | 0.59%
36 | Israel | Eastern | 6352117 | 33883 | 0.53%
37 | Georgia | Eastern | 4661473 | 23990 | 0.51%
38 | Poland | Central | 38536869 | 185808 | 0.48%
39 | Albania | Southeastern | 3581655 | 14000 | 0.39%
40 | Estonia | Northern | 1324333 | 5042 | 0.38%
41 | Macedonia | Southeastern | 2050554 | 7760 | 0.38%
42 | Latvia | Northern | 2274735 | 6550 | 0.29%
43 | Lithuania | Northern | 3585906 | 9764 | 0.27%
44 | Bulgaria | Southeastern | 7385367 | 17389 | 0.24%
45 | Romania | Southeastern | 22303552 | 48010 | 0.22%
46 | Turkey | Eastern | 70413958 | 131916 | 0.19%
47 | Belarus | Eastern | 10293011 | 18760 | 0.18%
48 | Azerbaijan | Eastern | 7961619 | 14120 | 0.18%
49 | Russia | Eastern | 142893540 | 196170 | 0.14%
50 | Kazakhstan | Eastern | 15233244 | 20500 | 0.13%
51 | Armenia | Eastern | 2976372 | 2915 | 0.10%
52 | Moldova | Eastern | 4466706 | 2603 | 0.06%

 

In 2006, Gibraltar and Kosovo were not members of FIFA, and there was apparently no data available for Montenegro, so they’re not included in the above table.

Compared with the average of all nations across Europe, Scotland’s number of youth players is slightly under par (1.31% against an average of 1.61%), but was healthier than that of Spain, Italy, and Portugal. Intriguingly, the top 7 nations for youth players in the above table include 5 of the 6 Nordic countries. Whether this is down to the Nordic countries having a more Sportacus approach to getting kids active, or a different definition of what a ‘registered youth player’ is, I couldn’t say. Broadly the same countries all top the table for ‘Teams’ per head of population, but as there doesn’t seem to be any hard and fast definition of what a ‘Team’ actually is, we might do well to take the FIFA stats with a pinch of salt. The Faroe Islands apparently have a football team for every 155 people, which seems implausible to me.
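The post doesn’t say how the 1.61% average was worked out. As a rough sketch, assuming it is a simple unweighted mean of the 52 per-country percentages exactly as printed in the table (so already rounded to two decimal places), the figure can be reproduced like this. The variable names are mine, not FIFA’s.

```python
from statistics import mean

# Youth players as a percentage of population, copied from the table above
# in rank order. These are the rounded figures as printed, not raw ratios.
youth_pct = [
    8.55, 5.92, 5.34, 4.30, 4.12, 3.54, 3.46, 3.12, 3.09, 2.81,
    2.70, 2.53, 2.50, 2.04, 2.01, 1.94, 1.92, 1.70, 1.70, 1.43,
    1.41, 1.35, 1.31, 1.05, 1.04, 1.04, 0.96, 0.92, 0.91, 0.85,
    0.81, 0.69, 0.64, 0.61, 0.59, 0.53, 0.51, 0.48, 0.39, 0.38,
    0.38, 0.29, 0.27, 0.24, 0.22, 0.19, 0.18, 0.18, 0.14, 0.13,
    0.10, 0.06,
]
SCOTLAND_PCT = 1.31

# Assumption: every association counts equally, whatever its population.
unweighted_mean = mean(youth_pct)

# Scotland's position when the list is sorted from highest to lowest.
scotland_rank = sorted(youth_pct, reverse=True).index(SCOTLAND_PCT) + 1

print(f"Unweighted mean: {unweighted_mean:.2f}%")
print(f"Scotland: {SCOTLAND_PCT:.2f}% (rank {scotland_rank} of {len(youth_pct)})")
```

An unweighted mean treats San Marino and Russia as equals, so a handful of tiny associations with high percentages can pull the average up. A population-weighted figure would be a different number and arguably a different question.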

The Republic of Ireland having about 2.6 times as many youth footballers as Scotland (nearly 3.3 times as many per head of population) was another interesting data point, given that football on the Emerald Isle has to compete with rugby and Gaelic Athletic Association sports. Football does appear to be the most popular team sport in Ireland, with a 2008 Sport Ireland report suggesting 9% of people play soccer. The four countries with the highest number of clubs per population are Ireland, Scotland, England, and Wales in that order, with the first two having twice as many as the latter two. Ireland however apparently has three times as many teams as Scotland, as well as more youth players.
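To make the Ireland and Scotland comparison concrete, here is a minimal sketch of the spreadsheet calculation using only the two rows from the table. The helper name `youth_share` is mine and purely illustrative.

```python
def youth_share(youth_players: int, population: int) -> float:
    """Youth players as a percentage of the total population."""
    return 100 * youth_players / population


# Figures as given in the Big Count 2006 table above.
ireland = {"population": 4_062_235, "youth": 174_498}
scotland = {"population": 5_116_900, "youth": 67_123}

ireland_pct = youth_share(ireland["youth"], ireland["population"])
scotland_pct = youth_share(scotland["youth"], scotland["population"])

# Two different ways of saying "Ireland has more youth players":
raw_ratio = ireland["youth"] / scotland["youth"]   # head count only
per_head_ratio = ireland_pct / scotland_pct        # adjusted for population

print(f"Ireland:  {ireland_pct:.2f}% of population")
print(f"Scotland: {scotland_pct:.2f}% of population")
print(f"Raw ratio: {raw_ratio:.2f}x, per-head ratio: {per_head_ratio:.2f}x")
```

The two ratios differ because Ireland’s population in this dataset is smaller than Scotland’s, which is exactly why the raw head count on its own is not a like-for-like comparison.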

I looked for breakdowns of population by age, and using the CIA World Factbook for 2016 (yes, I know) and the 2011 UK Census, I was able to generate numbers of how many people aged between 0 and 14 each country had. While not exactly aligning with the 0-18 age bracket of the dataset, I felt this would give a better grasp of what percentage of young people were registered as footballers in each country.

Interestingly, this suggested that the percentage of young people aged 0-14 in Ireland was roughly 10 percentage points higher than in Scotland.

Association | Population aged 0-14 | 0-14s as % of total population
Republic of Ireland | 1065440 | 26.23%
N Ireland | 354703 | 20.36%
Wales | 519128 | 17.50%
Scotland | 855000 | 16.71%
England | 9372010 | 15.46%

 

So Ireland have a larger pool of younger players to get involved in football. But what about the remaining deficit?

Rank | Association | Population | Other | Other % of population
1 | Germany | 82422299 | 10000000 | 12.13%
2 | Scotland | 5116900 | 302500 | 5.91%
3 | Italy | 58133509 | 3207700 | 5.52%
4 | Croatia | 4494749 | 232715 | 5.18%
5 | Spain | 40397842 | 1915000 | 4.74%
6 | Faroe Islands | 47246 | 2000 | 4.23%
7 | Sweden | 9016596 | 375000 | 4.16%
8 | England | 60609153 | 2415200 | 3.98%
9 | Switzerland | 7523934 | 240000 | 3.19%
10 | Austria | 8192880 | 260000 | 3.17%
11 | Slovenia | 2010347 | 55200 | 2.75%
12 | Georgia | 4661473 | 122400 | 2.63%
13 | Cyprus | 784301 | 20200 | 2.58%
14 | Romania | 22303552 | 556700 | 2.50%
15 | Republic of Ireland | 4062235 | 98800 | 2.43%
16 | Norway | 4610820 | 110000 | 2.39%
17 | Finland | 5231372 | 120000 | 2.29%
18 | San Marino | 29251 | 650 | 2.22%
19 | Hungary | 9981334 | 203100 | 2.03%
20 | France | 60876136 | 1233100 | 2.03%
21 | Portugal | 10605870 | 210000 | 1.98%
22 | Denmark | 5450661 | 100000 | 1.83%
23 | Iceland | 299388 | 5100 | 1.70%
24 | Netherlands | 16491461 | 250000 | 1.52%
25 | Moldova | 4466706 | 66150 | 1.48%
26 | Serbia | 9396411 | 134500 | 1.43%
27 | Wales | 2966000 | 41000 | 1.38%
28 | Greece | 10688058 | 145400 | 1.36%
29 | Armenia | 2976372 | 37900 | 1.27%
30 | Slovakia | 5439448 | 68700 | 1.26%
31 | Belgium | 10379067 | 128200 | 1.24%
32 | Bulgaria | 7385367 | 90400 | 1.22%
33 | Israel | 6352117 | 77000 | 1.21%
34 | Turkey | 70413958 | 847000 | 1.20%
35 | Estonia | 1324333 | 15700 | 1.19%
36 | Poland | 38536869 | 424300 | 1.10%
37 | Belarus | 10293011 | 113000 | 1.10%
38 | Lithuania | 3585906 | 38100 | 1.06%
39 | Azerbaijan | 7961619 | 82700 | 1.04%
40 | Russia | 142893540 | 1443800 | 1.01%
41 | Czech Rep | 10235455 | 103100 | 1.01%
42 | Andorra | 71201 | 700 | 0.98%
43 | Albania | 3581655 | 34000 | 0.95%
44 | Macedonia | 2050554 | 19000 | 0.93%
45 | Liechtenstein | 33987 | 310 | 0.91%
46 | Luxembourg | 474413 | 4100 | 0.86%
47 | Bosnia-Herzegovina | 4498976 | 36200 | 0.80%
48 | Latvia | 2274735 | 17900 | 0.79%
49 | Malta | 400214 | 3100 | 0.77%
50 | Ukraine | 46710816 | 314700 | 0.67%
51 | N Ireland | 1742000 | 10500 | 0.60%
52 | Kazakhstan | 15233244 | 79600 | 0.52%

 

There’s a column further along to the right in the FIFA dataset marked ‘Company or Army Teams, Schools and Universities, Street Football’. Scotland has the second highest % of population for this metric, behind Germany’s curiously round number. It might not be well known, but many club teams in Scotland don’t allow their young players to also play for their school team. Could it be that a high number of Scottish youth players are included within this figure of 302,500? Without any clear guidance on methodology from FIFA, it would be down to the individual associations to interpret what data fell under which category. And colour me cynical, but I’m sceptical that roughly 16% of Irish children between 0-14 are registered footballers.
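The 16% figure comes from dividing the 2006 youth player counts by the 0-14 populations from the later tables. A minimal sketch of that division for the five home associations follows. Bear in mind the mismatch already flagged above: the player counts are from 2006 and cover under-18s, while the population figures are 0-14s from the 2011 census and the 2016 Factbook, so treat the output as a rough indicator only.

```python
# (youth players from Big Count 2006, population aged 0-14 from the later table)
rows = {
    "Republic of Ireland": (174_498, 1_065_440),
    "N Ireland": (16_000, 354_703),
    "Wales": (31_200, 519_128),
    "Scotland": (67_123, 855_000),
    "England": (820_000, 9_372_010),
}

# Youth players as a percentage of the 0-14 population.
share_of_young = {
    name: 100 * youth / pop_0_14 for name, (youth, pop_0_14) in rows.items()
}

# Print from highest share to lowest.
for name, pct in sorted(share_of_young.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {pct:5.2f}%")
```

On these numbers Ireland’s share is roughly double that of any of the four UK associations, which is the gap the paragraph above is sceptical about.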

There’s one more oddity with this collection of data. When I added up the numbers of Professionals, Amateurs, Youth, Futsal, Beach Soccer, and ‘Other’, the totals didn’t match the total number of players in the third column. What’s more, they didn’t match in a very specific way, as you can see in the table below.

Association | Population | Players (FIFA total) | Check (sum of categories) | Difference
Russia | 142893540 | 5802536 | 2290536 | 3512000
Germany | 82422299 | 16308946 | 16308946 | 0
Turkey | 70413958 | 2748657 | 1044657 | 1704000
France | 60876136 | 4190040 | 3028040 | 1162000
England | 60609153 | 4164110 | 3901110 | 263000
Italy | 58133509 | 4980296 | 4721296 | 259000
Ukraine | 46710816 | 2273017 | 1007017 | 1266000
Spain | 40397842 | 2834190 | 2568190 | 266000
Poland | 38536869 | 2000264 | 1081264 | 919000
Romania | 22303552 | 1034320 | 665320 | 369000
Netherlands | 16491461 | 1745860 | 1388860 | 357000
Kazakhstan | 15233244 | 510420 | 107420 | 403000
Greece | 10688058 | 760621 | 504621 | 256000
Portugal | 10605870 | 547734 | 342734 | 205000
Belgium | 10379067 | 816583 | 571583 | 245000
Belarus | 10293011 | 373810 | 138810 | 235000
Czech Rep | 10235455 | 1040357 | 789357 | 251000
Hungary | 9981334 | 527326 | 330326 | 197000
Serbia | 9396411 | 441682 | 266682 | 175000
Sweden | 9016596 | 1006939 | 927939 | 79000
Austria | 8192880 | 967281 | 856281 | 111000
Azerbaijan | 7961619 | 306370 | 102370 | 204000
Switzerland | 7523934 | 571700 | 472700 | 99000
Bulgaria | 7385367 | 327033 | 141033 | 186000
Israel | 6352117 | 283866 | 120866 | 163000
Denmark | 5450661 | 511333 | 401333 | 110000
Slovakia | 5439448 | 622668 | 497668 | 125000
Finland | 5231372 | 362649 | 250649 | 112000
Scotland | 5116900 | 420589 | 413589 | 7000
Georgia | 4661473 | 222186 | 149186 | 73000
Norway | 4610820 | 543165 | 462165 | 81000
Bosnia-Herzegovina | 4498976 | 200240 | 105240 | 95000
Croatia | 4494749 | 362514 | 342514 | 20000
Moldova | 4466706 | 168570 | 76570 | 92000
Republic of Ireland | 4062235 | 421644 | 351644 | 70000
Lithuania | 3585906 | 135874 | 53874 | 82000
Albania | 3581655 | 164730 | 87730 | 77000
Armenia | 2976372 | 151353 | 79353 | 72000
Wales | 2966000 | 173550 | 108550 | 65000
Latvia | 2274735 | 85285 | 26285 | 59000
Macedonia | 2050554 | 93896 | 41896 | 52000
Slovenia | 2010347 | 116925 | 85925 | 31000
N Ireland | 1742000 | 92320 | 49320 | 43000
Estonia | 1324333 | 57024 | 25024 | 32000
Cyprus | 784301 | 52403 | 39403 | 13000
Luxembourg | 474413 | 47580 | 36580 | 11000
Malta | 400214 | 24853 | 13853 | 11000
Iceland | 299388 | 32408 | 26608 | 5800
Andorra | 71201 | 5037 | 3737 | 1300
Faroe Islands | 47246 | 8094 | 7694 | 400
Liechtenstein | 33987 | 3315 | 2515 | 800
San Marino | 29251 | 2836 | 2236 | 600

 

Apart from Germany’s data, the numbers for every other nation are out by a considerable, and suspiciously round, amount. Why? Some of the individual data is rounded up, but what could be missing that inflates the total players by this amount? Margin of error? The percentages aren’t the same. Perhaps some formula was used that is unknown to me as an amateur statistician, but I find it all a bit odd.
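For anyone who wants to repeat the check, here is a small sketch using a handful of rows from the table above. It computes the gap between FIFA’s stated total and the sum of the categories, tests whether the gap is a round number, and expresses it as a share of the stated total. The row selection and names are mine, and the sketch describes the pattern without explaining it.

```python
# (FIFA's stated total players, my sum of the individual categories)
rows = {
    "Russia": (5_802_536, 2_290_536),
    "Germany": (16_308_946, 16_308_946),
    "England": (4_164_110, 3_901_110),
    "Scotland": (420_589, 413_589),
    "Republic of Ireland": (421_644, 351_644),
    "Iceland": (32_408, 26_608),
    "Faroe Islands": (8_094, 7_694),
}

report = {}
for name, (stated_total, category_sum) in rows.items():
    gap = stated_total - category_sum
    report[name] = {
        "gap": gap,
        # A gap that divides exactly by 100 looks like an estimate or a
        # rounded top-up, not a count of registered individuals.
        "round": gap % 100 == 0,
        "pct_of_total": 100 * gap / stated_total,
    }

for name, r in report.items():
    print(f"{name:<20} gap={r['gap']:>9,} round={r['round']} "
          f"({r['pct_of_total']:.1f}% of stated total)")
```

Every gap in this sample divides exactly by 100, yet as a share of the stated total it ranges from zero for Germany to well over half for Russia, so a single fixed percentage uplift doesn’t look like the explanation.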

I’m loath to read too much into these statistics. Or rather, I’m loath to take them at face value. That’s often the case with statistics, and carrying out regression analyses helps us better understand the data we have, and as a result, the subject we’re trying to research. Perhaps I’m naïve, but I would like the football statisticians out there to examine things in a little more depth than they currently do.

Sources:

http://resources.fifa.com/mm/document/fifafacts/bcoffsurv/statsumrepassoc_10342.pdf

http://www.fifa.com/mm/document/fifafacts/bcoffsurv/bigcount.statspackage_7024.pdf

http://www.sportireland.ie/Research/Ballpark_Figures_2008_/Ballpark_Figures.pdf

https://www.cia.gov/library/publications/the-world-factbook/

http://www.ons.gov.uk/ons/rel/census/2011-census/population-and-household-estimates-for-the-united-kingdom/rft-table-3-census-2011.xls

 

 

Frightening Reverse to a Club Ranked 4th in Luxembourg

There’s a tendency in our hyperbolic, disposable culture where anyone can amplify their brain farts to the world using social media, to hype everything as the ‘best’ or ‘worst’ ever, when it’s probably not the best or worst that week. That said, it’s difficult not to reflect on Rangers’ 2-0 defeat to Progrès Niederkorn as the club’s worst ever. A part-time team, from Luxembourg, knocking out one of the biggest clubs in the U.K., who’d spent millions in player recruitment over the summer? Inconceivable!

And yet it happened. It’s hard to actually think of a worse result in the club’s history. Some people consider the league defeat to Annan in 2013 to be the worst, but that was more-or-less a one-off at the end of a horribly gruelling season, physically and mentally, when the team were already 20-odd points clear at the top of the league. Failing to progress (you just can’t avoid that pun) past the Europa League first qualifying round at the hands of a part-time team from Luxembourg? That’s pretty bad, particularly when you consider that getting into the next round would have at least seen the club net over a million pounds in ticket sales for the home leg (not including expenses).

I’m angrier about this Rangers result than I have been for years. I think I felt that this was the club’s chance to make a proper fresh start, and to try and close the gap on Celtic. But no. This Wednesday afternoon Rangers have made themselves, and Scottish football, a laughing stock.

So where did things go wrong? I blogged last week about how Rangers’ defence had been suspect for the last season or so, but in all honesty the attack hasn’t been any better. While many of us bought into Mark Warburton’s concept of attacking, pressing, fluid football, the fact is that by February 2017 his Rangers team had all the urgency and drive of a Britpop band in 1998. Match after match would see the midfield languorously stroking the ball along the edge of the opposition penalty area, but no-one seemed to have any idea of how to convert possession into goals.

The formation didn’t help. Warburton’s 4-3-3 was often more of a 2-1-2-5-0, with Wallace and McKay playing wide left, Tavernier and Miller drifting all over the place, and generally nobody in an actual centre-forward position. I have my issues with James Tavernier, and I’ve made clear my scepticism about his alleged high xA scores, but there’s no denying he occasionally puts a fantastic cross in. When he does, though, there never seem to be any Rangers players in the box. Or, as happened last night, the other full-back puts a cross in and it’s Tavernier and the attacking midfielder getting in each other’s way trying to head the ball in, and still no actual attacking players in the box.

Caixinha’s side, even with his signings, seem to suffer from the same existential malaise regarding scoring goals that Warburton’s did. He preferred a 4-2-3-1 formation in both legs against Progrès, but they rapidly descended into a globular amorphous mess again. But it’s not just the formation; the style of play is chronic as well. Attacks are built up slowly, with pretty but inconsequential passing about in midfield (the one exception, the quick free kick in the home leg against Progrès, we scored from). None of the players seem to know where their team-mates are supposed to be at any time. I never know where Tavernier and Miller (I love Kenny, but he needs to pick a position) are supposed to be playing at any time. We break like glue being poured over quicksand. ‘Transition’, changing defence into attack, is one of the new football buzzwords. We counter-attack so slowly that we let the opposition filter back into their defensive shape, and I honestly think that’s by design on the part of both Warburton and Caixinha. The Blizzard, the football magazine, often makes jokes about editor Jonathan Wilson once commenting that ‘goals are overrated’ in modern, tactical football. He meant that midfield dominance was more important than pumping six past the opposition. But contemporary football has forgotten how to score at all now. I’m reminded of Chris Brookmyre’s novel One Fine Day in the Middle of the Night, where he talks about the ‘Bullet Deadliness Quotient’ of cinema.

“An action film establishes its own rules of gunplay. In some, every bullet is potentially lethal — even the old shot to the shoulder can look worryingly near to the upper-chest area. But in others, machine guns can seem the least deadly weapon known to man. To illustrate, at one end of the spectrum there’s your Tarantino movies: reputations aside, there’s not that much gunplay, so when somebody lets off a shot, it’s for real, and it’s usually fatal. High bullet-deadliness quotient. At the other end, there’s your John Woo movies: zillions of rounds goin’ off an’ the only thing they ever hit is glass. Low bullet-deadliness quotient. In a high BDQ film, if the baddie draws a bead on somebody, get ready for ketchup. In a low BDQ film, that’s just a bad day for the janitor. And both types are fine by me, as long as the rules are followed consistently.”

Modern football has a High Goal Deadliness Quotient. Winning the game isn’t the thing now. It’s win the midfield battle, then score the only goal of the game. And for Rangers, winning the midfield battle has become so all-consuming that the thought of scoring a goal has become a massive Herculean undertaking. No-one seems able to handle the responsibility, save Kenny Miller, occasionally. The tie against Progrès should have been put to bed in the first leg at Ibrox, but because we’re playing this odd ‘false 9’ system, scoring goals has become an afterthought. A plan B, if you will.

Absurdly, Rangers now have a month until their next competitive game, against Motherwell in the league. Our pre-season starts here, after we’ve already been knocked out of Europe. Pedro Caixinha has the opportunity to work with his squad (at least eight, potentially more, new players will need time to bed in, plus Rossiter and Kranjcar are almost like new signings) on tactics, systems, shapes, etc., to ensure that when we start the league season, we hit the ground running. Oddly enough, the Twitter account Football Cliches posted last night about German Bundesliga teams running up huge scores against regional amateur teams in their pre-season preparations. I’m sure Rangers used to do this in the 90s. I think it’d be useful to have a couple of bounce matches against amateur teams and give them a right doing. Well, attempt to anyway. It would do the team’s confidence a world of good.

I miss goals. They definitely weren’t overrated.

Soccermetrics and Scottish Football

Sabermetrics and the Oakland A’s

 

Sabermetrics has a lot to answer for. Perhaps chiefly the question “what on earth are sabermetrics?”

The term derives from the acronym of the Society for American Baseball Research (SABR), an amateur baseball statistics organisation. Set up in the 1970s by Bob Davids, its ethos is to analyse baseball by poring over statistics outwith the traditional ‘box scores’, to try and discern whether any patterns are apparent.

Box scores refer to a certain set of numbers in baseball that record all the various elements of the game – number of times a player got up to bat, how many individual runs they scored, how many home runs, etc. It’s quite a fearsome looking set of data, but more importantly, box scores are used to generate some of baseball’s most beloved and widely used statistics, such as batting average and runs batted in. For most people reading this in the UK that aren’t cricket aficionados, these terms will probably make very little sense. The first refers to hits divided by at bats, or roughly speaking, how many times a player hit the ball safely divided by how many times he came up to bat. In modern baseball, a good hitter manages about three in ten, which is notated as .300, for instance.

Runs batted in, on the other hand, refers to the number of plays that a batter was involved in that resulted in a run for his team, roughly analogous to an assist in football. Roughly. The final component of the three major baseball statistics is home runs, the number of times a player hit the ball so far, he was able to run around all four bases and back home before the defence were able to catch him out.

The SABR were dissatisfied with baseball’s reliance on these statistics and scores. They believed that they didn’t truly represent the complexity of the game and started looking for other metrics that would help them understand what was going on.

Fast forward thirty or so years to 1999, and the Oakland Athletics baseball team general manager Billy Beane hires Paul DePodesta, a young Harvard graduate and sabermetrics nerd. The two had similar misgivings about the traditional baseball scouting methods. On Beane’s part this came from retrospectively analysing his own relative failure of a career, and realising that his decline could have been predicted when he was a teenager if statistics had been used.

Baseball scouting has traditionally been mostly a subjective affair. Prospective talents were assessed on their possession of the 5 tools: running, throwing, fielding, hitting, and hitting for power, as well as some other tangentially related criteria. Beane possessed all five tools in prodigious amounts, and signed with the New York Mets in 1980, but his big league career wasn’t all that it promised. He eventually hit 3 career home runs, and had a batting average of .219, a number that is generally considered poor. By comparison, Darryl Strawberry, who was the Mets’ contemporaneous first round draft pick, went on to hit 335 home runs, with a batting average of .259. More notably, Strawberry was successful enough to have appeared in a baseball-themed episode of The Simpsons in 1992, by which time Beane’s Major League career was already over.

Prior to the 2002 Major League season, Beane and DePodesta hatched a plan to recruit a number of players to help them challenge for the championship and potentially the World Series with the big-budget teams in the league. DePodesta’s theory was that two baseball statistics correlated with winning more strongly than any other – on-base percentage (OBP) and slugging percentage (SP). As these weren’t traditionally highly regarded statistics, impoverished baseball teams could look for value by signing players that had high OBP or SP, but not necessarily high batting averages or RBI numbers.

Perhaps the key thread of the Moneyball story (obscuring some of the nuanced elements) is the signing of Scott Hatteberg, a catcher by trade until he ruptured a nerve in his throwing arm. Surgery didn’t entirely resolve the problem, and ahead of the 2002 season he was a free agent, looking at the prospect of having to retire…until Beane offered him a contract as a first baseman. According to baseball wisdom, retraining from one position to the other, at the age of 32, is very difficult. But the A’s were happy to write this off in order to get Hatteberg’s OBP stats in their locker.

On-base percentage, in short, refers to how often a batter gets on base. It’s a different beast to batting average and runs batted in, because it factors in ‘balls’. Even for those people that aren’t remotely interested in baseball, the phrase ‘3 strikes and you’re out’ will probably be in their idiolect. This refers to the batter having three attempts to hit a ball thrown by the pitcher – if he or she fails, then they are out. However, it’s perhaps less well known that the pitcher gets four attempts to throw a valid pitch. If he or she doesn’t manage that, then the batter gets to ‘walk’ to first base (and by extension, any team-mates who have to make way for him get to move on a base too). And if the player on third advances one base, then that results in a run.
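To show how the three numbers relate, here is a short sketch using the standard textbook formulas for batting average, on-base percentage and slugging percentage. The season line is entirely made up for illustration and is not Hatteberg’s or anyone else’s. The fuller formula also counts being hit by a pitch and sacrifice flies, which the description above leaves out for simplicity.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits divided by at bats."""
    return hits / at_bats


def on_base_percentage(hits: int, walks: int, hit_by_pitch: int,
                       at_bats: int, sacrifice_flies: int) -> float:
    """Times on base divided by the plate appearances that count towards OBP."""
    reached = hits + walks + hit_by_pitch
    chances = at_bats + walks + hit_by_pitch + sacrifice_flies
    return reached / chances


def slugging_percentage(singles: int, doubles: int, triples: int,
                        home_runs: int, at_bats: int) -> float:
    """Total bases divided by at bats, so bigger hits count for more."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def baseball_notation(value: float) -> str:
    """Format 0.3 the way baseball writes it: '.300'."""
    return f"{value:.3f}".lstrip("0")


# Hypothetical season line, invented purely for this example.
at_bats, hits = 500, 130
doubles, triples, home_runs = 25, 2, 15
walks, hit_by_pitch, sacrifice_flies = 70, 5, 5
singles = hits - doubles - triples - home_runs

avg = batting_average(hits, at_bats)
obp = on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies)
slg = slugging_percentage(singles, doubles, triples, home_runs, at_bats)

print("AVG", baseball_notation(avg))
print("OBP", baseball_notation(obp))
print("SLG", baseball_notation(slg))
```

The invented batter has an unremarkable batting average, but the 70 walks lift his on-base percentage well above it. That gap between the two numbers is the kind of undervalued quality the A’s were shopping for.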

Despite his injury (which he still has issues with 15 years later), Hatteberg could still hit. One of the memorable moments of Moneyball, both the book and the film, is when Hatteberg hits a home run in the last inning of a game to secure a 12-11 win for the A’s, and a record 20th win in a row, after they’d let an 11-run lead slip.

While U.S. sports (and cricket for that matter) have a highly-developed relationship with statistics, the same can’t be said for football. That’s perhaps because football has a comparatively simpler scoring method, and isn’t as easy to compartmentalise as say baseball or American football. Nevertheless, in the last couple of decades, statistics have started to play a more important role in analysis of the game, perhaps due to the emergence of sabermetrics, fantasy football, and more developed betting markets. Many teams now have in-house statisticians that analyse various different metrics to see where they can eke out that vital additional percentage point of performance. And it’s not just the clubs – fans themselves have enthusiastically adopted various statistical methodologies to try and pinpoint where teams are doing well and where they’re performing badly.

Soccermetrics and the Magic Bullet

 

When I first started taking an interest in football in the late 80s, the only data that statisticians concerned themselves with recording were goals scored and attendances. The first I remember hearing of ‘assists’ was in the early 90s when Fantasy Football took off. Suddenly, goals alone weren’t enough of a metric to make the game interesting enough for punters. Smartphones and broadband ushered in the era of online betting, and the resultant boom of new markets needed new data – number of corners, time of first throw-in…suddenly you could bet on anything, and in order to maintain the market, the bookmakers had to send bodies to games and record this data.

The problem with football, unlike baseball, and cricket, and American football, and basketball and ice hockey, is that it’s a very different sport. It’s almost unique in its gameplay. It’s not predicated on set plays, like baseball, cricket and US football. It has a bigger playing area than basketball. There are more players than ice hockey. Finally, while football has some rigid laws (set number of subs, offside, handball), it’s incredibly fluid in terms of gameplay. You have a reasonably large goal at one end of the pitch, and you can use any part of your body apart from your hands to get the ball into the goal. While the process of retraining Scott Hatteberg from catcher to first baseman was long and painful and gave everyone involved kittens, a footballer can play, and be expected to be more than competent, in three different positions in as many minutes.

And on top of that, there are a multitude of combinations that teams, and pods of players, can use to create goal scoring chances – throw-ins, corners, free kicks, 1-2s, solo runs, crosses, incisive passes. Combined with the interchangeability of the outfield players from one position to another, the sport is incredibly complex. And that’s why I’m slightly suspicious of attempts to break it down into one number, be that Expected Goals (xG), Expected Assists (xA), or some other metric. Football fans, I think, have a bit of a thing for the ‘Magic Bullet’ – one factor, or more commonly one player, that if removed from the team, will make everything perfect and guarantee success. Otherwise known as the scapegoat, this was the role Andy Halliday ended up filling in the second half of the 2016/17 season.

It’s intriguing to see the number of blogs and Twitter accounts that have emerged over the last year or so, providing statistical analysis of the Scottish game, using some fairly complex methodologies that require copious amounts of data to function. I’m not sure where this data comes from, so I’m prone to take their findings with a pinch of salt, if only because I can’t see their working. I’m also slightly wary about the trend to break football down into numbers, because I’m not sure the nature of the game lends itself to that.

Baseball, and American football for example, are very compartmentalised sports, with clear delineation between offensive and defensive plays. In both sports, one team gets a chance to attack while the other defends – once their allotted turn is over, they switch sides. In gridiron, there’s at least an opportunity for an interception to allow the defending team to score points. Further to that, in baseball attack and defence comes down to one pitcher versus one batter. Their team-mates can’t help much; it’s all down to them. Soccer on the other hand is very organic, moving from defensive to attacking to not much happening phases within moments of one another. Defenders score, attackers tackle. It’s telling then that many of the stats now used by football analysts are imported from ice-hockey, which is closer in nature to football. But hockey’s still a different game, with half as many players, smaller goals, and a smaller pitch. So how effective are these metrics?

I’m also a bit wary about where the raw data comes from. I’ve been interested in football statistics for the last ten years, and I maintain my own records of Rangers and Scotland games, with all the data I can find. The trouble is, within Scottish football, comprehensive data isn’t always available. As far as I know, Opta, one of the world’s leading football data providers, doesn’t cover the Scottish game. The BBC provides attempts at goal, fouls, and possession as part of their match report, but not much else. One blog, The Backpass Rule, uses data from Stratabet, which is a source I may have to investigate in more detail.

Expected Goals and Expected Assists

 

I’d been struggling with some of my own statistical models recently when I saw a post on Twitter by a Scottish football analyst suggesting that Kyle Lafferty would be a decent signing for Hearts because he has a consistently good Goals Above Replacement (GAR) stat, despite the fact he’s rarely played or scored over the last five years. GAR appears to have come from ice hockey via baseball, and purports to show how many more or fewer goals a player is scoring than a replacement you could procure for less money. If they have a negative GAR score, then they’re not doing very well at all. That’s all well and good, but it’s not immediately clear how the GAR baseline is calculated, or if it’s consistent from one statistician to the next. That’s the same reason why I have issues with the increasing use of Expected Goals. Each statistician keeps their source data and calculation methods private, because they’re trying to eke out a living in the world of sports data. That’s fair enough, but it means that we can’t see how the baseline data was created, or how new analyses are mapped to the baseline. I’ll give you an example.

Statistician A creates an xG model using the goals scored in the English Premier League, from 2012 to 2017. He includes where the shot was taken from on the pitch. Statistician B includes the geographical location of the shot, but also factors in whether the attempt was a shot or a header. Statistician C includes both of the above parameters, then factors in if the team taking the shot were winning, losing or drawing at the time. All three of these models will generate different results, particularly when mapping new shots at goal against the baseline data. Will they match exactly? There’s a fair amount of subjectivity when it comes to this sort of metric, and that’s why I think I’m a little sceptical of them, particularly when you can’t see the source data. The blog 11Tegen11 explains this well.
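To make that concrete, here is a minimal sketch of three toy xG models that take the inputs described above. It is not any real analyst's model: the logistic form and every coefficient are invented purely for illustration, and shot location is reduced to a single distance in yards.

```python
import math


def logistic(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Every coefficient below is made up for illustration only.
def xg_model_a(distance_yards):
    # Statistician A: shot location only (simplified to distance from goal).
    return logistic(0.5 - 0.15 * distance_yards)


def xg_model_b(distance_yards, is_header):
    # Statistician B: location plus whether the attempt was a header (1) or a shot (0).
    return logistic(0.5 - 0.15 * distance_yards - 0.8 * is_header)


def xg_model_c(distance_yards, is_header, game_state):
    # Statistician C: as B, plus game state (-1 losing, 0 drawing, 1 winning).
    return logistic(0.5 - 0.15 * distance_yards - 0.8 * is_header + 0.2 * game_state)


# One hypothetical chance: a header from 12 yards while the team is losing.
xg_a = xg_model_a(12)
xg_b = xg_model_b(12, 1)
xg_c = xg_model_c(12, 1, -1)

for name, value in [("A", xg_a), ("B", xg_b), ("C", xg_c)]:
    print(f"Model {name}: xG = {value:.3f}")
```

The same chance gets three different values, and without sight of the inputs and coefficients there is no way to tell which is closest to the truth.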

xG is the big thing in football analytics right now. I like the general concept of it, but I feel there are limitations to it. Firstly, there’s the source data – how big is the sample size, how was the regression analysis carried out? Secondly, is the analyst matching the chances to the data fairly? I sometimes think the way the resultant data is presented can be misleading, due to axis contraction, or lack of labels etc. One of the most common ways to present xG data is in accumulative units, either as a running total over the season, or presented as a per 90 minute metric. While I think xG can be useful to analyse a team’s shooting performance, I think it would be more helpful if the number of shots attempted were factored in.

One of my issues with xG is when it’s used as a running total. For example, Player A has 5 attempts at goal. His first four are decent chances (xG of 0.255 each,) but he only scores with his 5th, a thunderbolt from 30 yards out (xG of 0.05.) His total xG is 1.07, and his actual goals are 1. (That’s not unusual in football – against Switzerland in Euro 96, Ally McCoist had four attempts at goal. Three were from the edge of the six yard box, and two of them were on target. The actual attempt he scored from was from 22 yards out, and would have a far lower xG value.)

Player B has 7 attempts at goal. His first six are speculative efforts from distance, with an xG of 0.06 each. He then scores a tap in with an xG value of 0.5 in the last minute – total xG 0.86, actual goals 1. In this situation, Player B looks to be more efficient, but actually took more shots. While I realise that the point of xG is to show the respective quality of attempts, I’m not sure how we get to see that from the use of cumulative xG total over the course of a season.
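The arithmetic for the two players can be checked with a short sketch. It uses only the per-shot values quoted above (four chances at 0.255 plus one at 0.05 for Player A, six at 0.06 plus one at 0.5 for Player B); the function name and the extra per-shot columns are my own illustration of what a running total hides.

```python
def summarise(shot_xg, goals):
    """Summarise a list of per-shot xG values alongside actual goals."""
    shots = len(shot_xg)
    total_xg = sum(shot_xg)
    return {
        "shots": shots,
        "total_xg": round(total_xg, 2),
        "xg_per_shot": round(total_xg / shots, 3),
        "goals_per_shot": round(goals / shots, 3),
    }


player_a = summarise([0.255] * 4 + [0.05], goals=1)
player_b = summarise([0.06] * 6 + [0.5], goals=1)

print("Player A:", player_a)
print("Player B:", player_b)
```

Putting the shot count and xG per shot next to the cumulative total makes it obvious that Player B needed more attempts, of lower average quality, to reach the same single goal.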

Expected Goals versus Actual Goals

 

The table below shows the goals per shot stats for all Rangers players that had an attempt at goal last season. The column labelled TGPS represents how much over or under the team goal per shot figure for 2016/17 (0.104) each player was.

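| Season | Shots For | Goals | Shots Against | GA | GFPS | GAPS |
| --- | --- | --- | --- | --- | --- | --- |
| 2016/17 | 537 | 56 | 381 | 44 | 0.104 | 0.115 |
| 2015/16 | 608 | 88 | 230 | 34 | 0.145 | 0.148 |
| 2014/15 | 507 | 69 | 280 | 39 | 0.136 | 0.139 |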

Table 1: (For the last three seasons at least, Rangers have had a lower goals per shot ratio than their opponents, over the course of a season. I’d be interested to see xG used to try and work out why this is the case.)
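For anyone wanting to reproduce the GFPS and GAPS columns, a minimal sketch follows. It assumes both are plain goals divided by shots, rounded to three decimal places, which matches the figures in Table 1; the variable names are mine.

```python
# (shots for, goals for, shots against, goals against) per season, from Table 1
seasons = {
    "2016/17": (537, 56, 381, 44),
    "2015/16": (608, 88, 230, 34),
    "2014/15": (507, 69, 280, 39),
}


def goals_per_shot(goals, shots):
    """Goals divided by shots, rounded to three decimal places."""
    return round(goals / shots, 3)


ratios = {}
for season, (shots_for, goals_for, shots_against, goals_against) in seasons.items():
    gfps = goals_per_shot(goals_for, shots_for)
    gaps = goals_per_shot(goals_against, shots_against)
    ratios[season] = (gfps, gaps)
    print(f"{season}: GFPS {gfps:.3f}, GAPS {gaps:.3f}")
```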

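| Player | Shots | Goals | GPS | TGPS |
| --- | --- | --- | --- | --- |
| Waghorn | 71 | 7 | 0.099 | -0.005 |
| Miller | 63 | 11 | 0.175 | 0.071 |
| Garner | 62 | 7 | 0.113 | 0.009 |
| Tavernier | 45 | 1 | 0.022 | -0.082 |
| McKay | 39 | 5 | 0.128 | 0.024 |
| Holt | 30 | 0 | | |
| Forrester | 29 | 3 | 0.103 | -0.001 |
| Windass | 25 | 0 | | |
| Halliday | 23 | 3 | 0.130 | 0.026 |
| Wallace | 23 | 3 | 0.130 | 0.026 |
| Toral | 23 | 2 | 0.087 | -0.017 |
| Hyndman | 20 | 4 | 0.200 | 0.096 |
| Hill | 15 | 3 | 0.200 | 0.096 |
| Dodoo | 13 | 3 | 0.231 | 0.127 |
| Kranjcar | 11 | 1 | 0.091 | -0.013 |
| Wilson | 10 | 0 | | |
| Kiernan | 10 | 1 | 0.100 | -0.004 |
| Barton | 6 | 0 | | |
| Hodson | 4 | 1 | 0.250 | 0.146 |
| Senderos | 3 | 0 | | |
| Beerman | 3 | 0 | | |
| O’Halloran | 3 | 0 | | |
| Rossiter | 2 | 0 | | |
| Barjonas | 1 | 0 | | |
| Bates | 1 | 0 | | |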

Table 2: Rangers players’ goals per shot, and above/below team average goals per shot.
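The GPS and TGPS columns can be rebuilt from the shots and goals with a few lines of Python. This is a sketch under one assumption: the TGPS figures in Table 2 only come out exactly if the player's GPS is rounded to three places before the team figure of 0.104 is subtracted, so that is what the code does. Only four players are included to keep it short.

```python
TEAM_GPS = 0.104  # Rangers' team goals per shot for 2016/17

# (shots, goals) for a handful of players, from Table 2
players = {
    "Waghorn": (71, 7),
    "Miller": (63, 11),
    "Tavernier": (45, 1),
    "Dodoo": (13, 3),
}


def gps_and_tgps(shots, goals, team_gps=TEAM_GPS):
    """Return (goals per shot, difference from the team goals per shot)."""
    gps = round(goals / shots, 3)
    # Assumption: the difference is taken from the already-rounded GPS.
    tgps = round(gps - team_gps, 3)
    return gps, tgps


results = {name: gps_and_tgps(shots, goals) for name, (shots, goals) in players.items()}

for name, (gps, tgps) in results.items():
    print(f"{name}: GPS {gps:.3f}, TGPS {tgps:+.3f}")
```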

As you can see, some players had a pretty bad goals to shot ratio, while still being around the GPS average (Tavernier, Forrester, Toral, and Waghorn being the main culprits.) Some were a lot better (Dodoo, Hodson, Hill, Hyndman, and Miller.) Here’s the thing though – some players shoot a lot. Martyn Waghorn for instance ranked 1st in terms of shots attempted, despite being only 11th in terms of minutes played. His goals per shot ratio wasn’t far off Scott Brown’s. Which is bad. Likewise, Harry Forrester was 17th for minutes played, but 7th in terms of shots attempted. In turn, you could reasonably expect Tavernier, Forrester, and Waghorn to have relatively large xG scores.

In the table below, I’ve projected an xG figure for each player based on Rangers’ average GPS figure (0.104) for the season. It’s not exactly how an xG model works, but it shouldn’t be far off.

| Player | Open Play Shots | Open Play Goals | Projected Total xG | GPS |
| --- | --- | --- | --- | --- |
| Harry Forrester | 29 | | 3.016 | 0.103 |
| Joe Garner | 62 | | 6.448 | 0.113 |
| Kenny Miller | 63 | 11 | 6.552 | 0.175 |
| James Tavernier | 38 | | 3.952 | 0.000 |
| Martyn Waghorn | 69 | | 7.176 | 0.072 |

Table 3: Open play shots, goals, goals per shot, and projected xG for key attackers in 2016/17
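The projection in Table 3 is nothing more than open play shots multiplied by the team goals per shot figure of 0.104. A minimal sketch, using the shot counts from the table (the names are mine, and this is a flat average rather than a real xG model, which would weight each shot individually):

```python
TEAM_GPS = 0.104  # team goals per shot, 2016/17

# Open play shots per player, from Table 3
open_play_shots = {
    "Harry Forrester": 29,
    "Joe Garner": 62,
    "Kenny Miller": 63,
    "James Tavernier": 38,
    "Martyn Waghorn": 69,
}

# Projected xG: every shot is assumed to be worth the team average.
projected_xg = {
    player: round(shots * TEAM_GPS, 3) for player, shots in open_play_shots.items()
}

for player, xg in projected_xg.items():
    print(f"{player}: projected xG {xg:.3f}")

# Miller's 11 open play goals against his projection
miller_over_projection = round(11 - projected_xg["Kenny Miller"], 3)
print("Miller goals above projection:", miller_over_projection)
```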

So, if there’s a correlation between xG and actual goals, then looking at the above, Tavernier and Waghorn were attempting a lot of poor quality shots, while Forrester, Garner, and Miller were a little more judicious with their efforts. The point is that xG is only a useful statistic when number of shots taken and actual goals scored are factored in, otherwise it doesn’t necessarily show us the whole picture. A player might just be attempting lots of shots. And then there’s still the question about subjectivity on the part of the person compiling the stats, especially when it’s one individual who’s doing it all by themselves in their spare time. Errors and bias can creep into the reporting. Even the BBC aren’t immune to transcription errors in their match reports.

Expected Assists is an even more convoluted statistic, by its very nature. At least with xG, the end result will either be a goal or not a goal. Trying to work out if a pass results in a shot or not adds another layer of complexity to the equation. You have to factor in things like the start and end points of the pass, its velocity, the position of the recipient, the location of any defenders, and so on. I say these are factors, but I haven’t actually found any sources that reveal what their xA models are built from. We see the same issues with the presentation of xA as we do with xG – obfuscation, strange labelling, and cumulative totals. How many passes were attempted to reach the xA total? How does the xA stand up against the actual assist total?

Interestingly, for xG, if a player’s actual goal tally is lower than his cumulative xG, it’s considered to be due to his own poor finishing. However, from what I’ve seen of xA, low actual to xA ratios seem to be considered the fault of the recipient of the pass. This is interesting, but it remains to be seen if it’s actually true or not – I’ve yet to find a precis of an xA model, or xA data that I can scrutinise. Does xA factor in overhit crosses, or those that are too close to the goalkeeper? Many xA models credit James Tavernier with large xA scores, but he had a relatively low number of actual assists – more on that later.

The Myth of Billy Beane

 

As the years have progressed, despite my love of both the original Moneyball book and its film adaptation, doubt began to prey on my mind, mainly surrounding Scott Hatteberg’s role in the story. Were his on base percentages so crucial to the A’s that they would risk playing him at first base? How many runs did he actually help deliver? In the 2002 season, Hatteberg’s RBI was 61, of 800 runs scored in total, or 7.6%. In contrast, top scorer Miguel Tejada’s RBI was 131, or 16.4%. (Tejada would go on to win the American League MVP award.) Baseball teams do tend to play a lot of games in comparison to football teams – the 2016 World Series Champions, the Chicago Cubs, played 178 games on their way to glory. So they do need a lot more players on their roster.

There are a few articles, thinkpieces, and books that debunk Michael Lewis’ book. https://www.si.com/more-sports/2011/09/22/moneyball-impact, https://www.theatlantic.com/entertainment/archive/2011/09/the-many-problems-with-moneyball/245769/, https://www.amazon.com/Beauty-Short-Hops-Circumstance-Moneyball/dp/0786462884 are 3 for a start. The 2002 A’s were something of a (literal) Hollywood story, and it’s therefore no surprise to learn that it potentially contains a few liberties and errors. One of the biggest appears to be the notion that Jeremy Giambi’s on-base percentage would replace that of his elder brother Jason…but Jeremy had played most of the A’s games in 2001. So how could he replace anyone?

Hirsch and Hirsch put much of the A’s success down to the fortitude of their defence, rather than any additional runs earned by Beane’s sabermetric driven trading, pointing out that while the A’s had the 9th best batting record in the league in terms of runs scored per game, they had the sixth best defence. Cliff Corcoran observes: “It (Moneyball) isn’t about trying to turn water into wine by pretending Scott Hatteberg can replace Jason Giambi. It is about wringing the maximum number of wins out of each additional dollar by identifying value where other teams have yet to detect it. It’s about marginal wins, the ones that separate a playoff team from a runner up. It’s about getting over the hump.”

The concept of Moneyball would spread throughout the rest of MLB, and eventually gain a foothold in soccer, with clubs throughout Europe adopting statistics to try and target key players that might otherwise be overlooked. Billy Beane himself has advised the San Jose Earthquakes and AZ Alkmaar, and famously former Rangers manager Mark Warburton left his previous club Brentford because they were implementing a Moneyball type approach to scouting that he didn’t agree with. Despite the growth of analytics in football over the last decade, they’re still treated with suspicion by some in the game, much like the baseball traditionalists didn’t agree with what Billy Beane was doing. Moneyball probably represents the battleground between traditional scouting and post-match analysis, and the brave new world of metrics. But does there have to be a winner and a loser? What if there’s some middle ground between the two?

The Battle of James Tavernier

 

There are few players in the Rangers squad that divide opinion quite like James Tavernier. For every fan that thinks his lack of defensive nous means we should cut our losses, another thinks he’s our most important player. A number of soccermetric sites have published stats that purport to show just how vital a player he is, but I’m not convinced.

Nailing my colours firmly to the mast, I not only disagree that he’s one of our best players, I think he’s actively one of our least effective. And here’s why.

Firstly, he can’t defend very well. Not many modern full backs can, in fairness, but Tavernier takes it to new levels. In his defence (pardon the pun,) I don’t think right-back is his best position and I’m not entirely sure how he’s ended up there. His lack of concentration, awareness, tackling, and marking have cost Rangers a number of goals in the last couple of seasons. His fans will tell you that he’ll do better with better players alongside him, and covering for him, and that his offensive stats outweigh his defensive errors, and will refer you to xA data to back this up.

He does, in fairness, play as a de facto attacker, which perhaps excuses his defensive frailties. The problem is that his attacking stats aren’t great – 1 goal and 7 assists (as per my methodology) in 35 games were among the worst of all Rangers’ regular starters. And if you defend his attacking stats by pointing out he’s a full-back, then surely it’s fair to expect his defending to be better?

Let’s look at his stats from the 2016-17 season.

Minutes per goal

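| Player | Minutes | Goals | MPG |
| --- | --- | --- | --- |
| Dodoo | 611 | 3 | 203.67 |
| Forrester | 728 | 3 | 242.67 |
| Miller | 2767 | 11 | 251.55 |
| Hyndman | 1027 | 4 | 256.75 |
| Waghorn | 1885 | 7 | 269.29 |
| Garner | 1889 | 7 | 269.86 |
| Kranjcar | 392 | 1 | 392.00 |
| McKay | 2402 | 5 | 480.40 |
| Toral | 961 | 2 | 480.50 |
| Hill | 1979 | 3 | 659.67 |
| Halliday | 2241 | 3 | 747.00 |
| Wallace | 2386 | 3 | 795.33 |
| Hodson | 808 | 1 | 808.00 |
| Kiernan | 2130 | 1 | 2130.00 |
| Tavernier | 3145 | 1 | 3145.00 |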

Table 4: Minutes per goal 2016/17

Out of all the 15 players that scored goals for Rangers last season, Tavernier scored the fewest per minutes played.
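The minutes-per-goal figure is simply minutes played divided by goals, and the same helper works for assists and second assists. A minimal sketch using four rows from Table 4; the function name and the choice to return None for a player with no goals are my own.

```python
def minutes_per_event(minutes, events):
    """Minutes played per goal (or assist); None if the player has none."""
    if events == 0:
        return None
    return round(minutes / events, 2)


# (minutes, goals) for a few players, from Table 4
scorers = {
    "Dodoo": (611, 3),
    "Miller": (2767, 11),
    "Waghorn": (1885, 7),
    "Tavernier": (3145, 1),
}

mpg = {name: minutes_per_event(minutes, goals) for name, (minutes, goals) in scorers.items()}

# Lower is better: fewest minutes needed per goal comes first.
ranking = sorted(mpg, key=mpg.get)

for name in ranking:
    print(f"{name}: {mpg[name]:.2f} minutes per goal")
```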

Minutes per assist

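| Player | Minutes | Assists | MPA |
| --- | --- | --- | --- |
| Windass | 1251 | 5 | 250.20 |
| Waghorn | 1885 | 7 | 269.29 |
| Dodoo | 611 | 2 | 305.50 |
| Toral | 961 | 3 | 320.33 |
| Wallace | 2386 | 7 | 340.86 |
| Hyndman | 1027 | 3 | 342.33 |
| Miller | 2767 | 8 | 345.88 |
| Forrester | 728 | 2 | 364.00 |
| Tavernier | 3145 | 7 | 449.29 |
| McKay | 2402 | 4 | 600.50 |
| O’Halloran | 626 | 1 | 626.00 |
| Garner | 1889 | 3 | 629.67 |
| Holt | 2491 | 2 | 1245.50 |
| Hill | 1979 | 1 | 1979.00 |
| Foderingham | 3330 | 1 | 3330.00 |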

Table 5: Minutes per assist 2016/17

In terms of minutes per assist, he ranked 9th.

Minutes per second assist

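| Player | Minutes | 2nd Assists | MP2A |
| --- | --- | --- | --- |
| Kranjcar | 392 | 3 | 130.67 |
| Wilson A | 180 | 1 | 180.00 |
| Hyndman | 1027 | 3 | 342.33 |
| Miller | 2767 | 7 | 395.29 |
| McKay | 2402 | 5 | 480.40 |
| Toral | 961 | 2 | 480.50 |
| Holt | 2491 | 5 | 498.20 |
| O’Halloran | 626 | 1 | 626.00 |
| Waghorn | 1885 | 3 | 628.33 |
| Tavernier | 3145 | 5 | 629.00 |
| Halliday | 2241 | 3 | 747.00 |
| Wallace | 2386 | 3 | 795.33 |
| Windass | 1251 | 1 | 1251.00 |
| Wilson | 1762 | 1 | 1762.00 |
| Hill | 1979 | 1 | 1979.00 |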

Table 6: Minutes per second assist 2016/17

And for second assists (where he set up the assist for a goal,) he was 10th.

The season before was a different story. Tavernier racked up 29 combined goals and assists in 36 starts, but of Warburton’s core of key players, he seemed among the least able to reproduce his Championship form in the Premiership. The table below sets out the contributions (goals + assists) per minute played for those players that appeared and contributed a goal in 15/16 and 16/17.

Table 7: Season on Season contributions per minute difference

Several statistic blogs have published figures showing that Tavernier has a high xA score, among other stats. As mentioned above, the workings of xA models remain quite opaque, so it’s hard to tell how assist quality is actually measured.

In further defence of Tavernier, many will point you in the direction of the SPFLRadar, a stats blog that specialises in using radar charts to benchmark SPFL Premiership players against others in their position. In April they generated a radar chart for Tavernier, comparing his stats at the time to those of every other full-back in the league. And the Englishman did come away looking good, being near the top of most of the 11 criteria. But again, I have a couple of issues. Firstly, I’m not sure where the account gets their data from – I did ask, but the person that runs the account said he wasn’t at liberty to tell me. That’s fair enough. But I’m too cynical to buy into data I can’t validate myself.

Secondly, I’m not a huge fan of radar charts in general. Aside from the use of truncated axes, I don’t think they’re good at displaying more than, say, three data points – i.e., minimum, maximum, and the x value. When you’re using radar charts to compare one player’s stats against 12-20 other players, we don’t get a feeling for outliers, or where the mean is. Let me illustrate.

Table 8: Example player radar chart

Here’s a radar chart for Player X’s stats, comparing him against all other players in his position in the league. You might look at it and think “wow, he’s really good at running, shooting, tackling and passing, but his heading’s awful. He’s much better than Player Y and Player Z.”

Here’s the same data, but with non-truncated axes, and showing you the other data ranges.

Table 9: Example player radar chart without truncated axes

Presented in this format, we can see that Player X is only slightly better than Player Y. Even Player Z, who scored the 2nd lowest in almost every category, looks average instead of appalling. Even adding an ‘average’ figure to a radar chart will help the viewer better ascertain if a player’s stats are really good, or simply better than most.
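The effect of a truncated axis is easy to show with numbers. The sketch below uses invented pass completion percentages for Player X, Y and Z and an invented league range; none of it is real data. It compares how far along the axis each player sits when the axis runs from the league minimum to the league maximum, against an axis that starts at zero.

```python
def scale_truncated(value, league_min, league_max):
    """Position on an axis that runs from the league minimum to the league maximum."""
    return (value - league_min) / (league_max - league_min)


def scale_from_zero(value, league_max):
    """Position on an axis that runs from zero to the league maximum."""
    return value / league_max


# Hypothetical pass completion percentages, for illustration only
league_min, league_max = 73.0, 83.0
players = {"Player X": 82.0, "Player Y": 80.0, "Player Z": 74.0}

truncated = {
    name: round(scale_truncated(value, league_min, league_max), 2)
    for name, value in players.items()
}
zero_based = {
    name: round(scale_from_zero(value, league_max), 2)
    for name, value in players.items()
}

for name in players:
    print(f"{name}: truncated axis {truncated[name]:.2f}, zero-based axis {zero_based[name]:.2f}")
```

On the truncated axis Player Z looks like he barely registers, while on the zero-based axis the three players are separated by a few hundredths of the scale.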

The other thing about using relative values rather than absolute is that you get a slightly skewed perspective on the actual data. For instance, Tavernier scored highly on attempted crosses per 90, with 5.7. But his cross completion was only 26%, so he was only completing around one and a half crosses per match. That might explain why his assist stats were relatively low compared to other right backs in the division – 0.15 per 90 versus 0.24 per 90, according to SPFLRadar.

It’s interesting to compare Wallace and Tavernier’s assist stats here:

 

| Stat | Tavernier | Wallace |
| --- | --- | --- |
| Crossing % (as per SPFLRadar) | ~25% | ~34% |
| Assists per 90 (as per Jay Mansfield) | 0.200 | 0.264 |
| Pass Completion (as per SPFLRadar) | ~78% | ~88% |

Table 10: Rangers’ full-backs’ assist stats 2016/17

Note that my assists per 90 stats in the table don’t match SPFLRadar’s – an occupational hazard of soccermetrics.
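For reference, the per-90 figures in Table 10 can be reproduced from the minutes and assists in Table 5, assuming "per 90" means the count divided by minutes played and multiplied by 90. The same sketch also works out completed crosses per 90 from the 5.7 attempts and 26% completion quoted earlier; the function name is mine.

```python
def per_90(count, minutes):
    """Rate per 90 minutes played, rounded to three decimal places."""
    return round(count / minutes * 90, 3)


# Assists and minutes from Table 5
tavernier_assists_per_90 = per_90(7, 3145)
wallace_assists_per_90 = per_90(7, 2386)

# Attempted crosses per 90 multiplied by the completion rate
tavernier_completed_crosses_per_90 = round(5.7 * 0.26, 2)

print("Tavernier assists per 90:", tavernier_assists_per_90)
print("Wallace assists per 90:", wallace_assists_per_90)
print("Tavernier completed crosses per 90:", tavernier_completed_crosses_per_90)
```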

I constantly read on Twitter and on Rangers’ fan sites that Tavernier and Barrie McKay would have a higher number of assists if it weren’t for the profligacy of the strikers they were playing with last season. While I can’t deny that the Rangers attack last season wasn’t as potent as it could have been, every other player in the team had the same shower of numpties to aim their passes at, and it didn’t hold them back any. Well, any less at least. Windass, Waghorn, Toral, Wallace, Hyndman, and Miller all played more than 900 minutes in the league, and all had better assist rates than Tavernier.

A common phrase mentioned in relation to Tavernier is ‘positioning’, and this is, for me, where his main flaws as a player lie. When Mark Warburton came in as Rangers manager, he brought with him a playing style I would describe as ‘Total Football-lite’, based on the Ajax philosophy where any player can move into any other position during the course of the match, retaining the set formation, but ensuring that constant attacking pressure is put on the opponents. Under Warburton, Tavernier seemed to have a lot of freedom to drift from his right-back berth. Looking at the goals he’s scored from open play for Rangers (all under Warburton I might add,) 50% came from right-back starting positions, while 17% and 33% came from central-midfield and centre-forward positions respectively. As a comparison, over the same time period, Lee Wallace has scored the same number of goals from open play, but with 2/3rds coming from a left-back starting position.

I often wonder if players drifting from their natural position causes more problems than it solves. Wayne Rooney for instance, and particularly since Sir Alex Ferguson retired, never seems to have a set position throughout a game. Which is fine if you want an influential player to roam the pitch, but it has knock-on effects – 1, the player can’t be in two places at once, so if your striker is in the left-back area when you’re trying to score, that’s an issue. 2, the player who would normally be in the position the wandering player is in has to either find another position, or get in the way.

That’s how it should work in theory – if Tavernier goes forward from right-back, a midfielder should drop in to cover. But if Tavernier moves into a centre-forward position, then an attacker has to drop out wide, and that for me is part of the reason why the team has been so disjointed going forward the last 18 months or so. Look at the goal Tavernier scored against Dumbarton in April 2016 – he took up a centre-forward position, meaning that Jason Holt had to drop in to right-back. There was a similar case against Morton in September of 2015, with Waghorn having to drop into midfield to accommodate his right-back’s foray into the penalty area. Admittedly Tavernier did eventually put the ball in the net on both occasions, but so far he’s found it difficult to replicate his goal-scoring form at a higher level than the Scottish Championship, although Warburton’s selection had a huge part to play in that. It’ll be interesting to see what tactical changes Pedro Caixinha makes – David Bates has already noted that the Portuguese’s approach is more tactical than Warburton’s was.

However, in terms of analysing Tavernier’s defensive performance, we have to do things a little differently…

The Case Against the Defence

 

Defensively, barely any Rangers players escaped the 16/17 season with any credit, apart from maybe David Bates, Myles Beerman, and Aidan Wilson.

 

I don’t know if it’s obvious or not, but there’s a fairly strong correlation between the number of goals a team concedes and where they finish in the table: the fewer you let in, the higher up you’re likely to be. More or less. And Rangers have conceded the highest goals per game by a Rangers team in a top flight season since 1985-86, when they finished fifth.

Table 11: Goals conceded per game against points earned per game, from SPFL Premiership, Premier League, La Liga, Bundesliga, and Serie A, 2016-17
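The strength of a relationship like this is usually summarised with a Pearson correlation coefficient, where a value close to -1 means that conceding more goes hand in hand with earning fewer points. Here is a minimal sketch of that calculation. The five teams' figures are invented for illustration and are not the league data behind the chart.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical teams: goals conceded per game and points earned per game
conceded_per_game = [0.6, 0.9, 1.2, 1.5, 1.9]
points_per_game = [2.4, 1.9, 1.5, 1.2, 0.8]

r = pearson(conceded_per_game, points_per_game)
print(f"Correlation between goals conceded and points: {r:.2f}")
```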

As per the chart above, there is a strong correlation between conceding more goals and earning fewer points, with not a huge number of outliers (a good defence contributing towards wins was also true for the 2002 Oakland A’s, as mentioned earlier.) When Mark Warburton took control of Rangers in the summer of 2015, we were told that was the way his teams played – a cavalier type of total football, the archetypal ‘you score 3, and we’ll score 4’. Many people in the Rangers support welcomed this approach, some genuinely because they wanted to see out-and-out attacking football, and some because they always want what they don’t currently have. And while the football at times was enjoyable to watch as the light blues skipped to the SPFL Championship title in 2015-16, in retrospect, the cracks were starting to show – points dropped towards the end of the campaign, the Scottish Cup Final lost, and a poor away record.

Warburton’s recruitment involved disposing of the solid and unspectacular club player of the year Darren McGregor, who would eventually lift the Championship title with Hibs, and every other centre half at the club. He replaced them with the modern1 defensive duo of Rob Kiernan and Danny Wilson, the latter returning to the club after six years’ absence. Fitting with the Total Football style ethos, Warburton preferred his defenders to be comfortable on the ball, and both players do like to stroll out from the back to link with the midfield. Unfortunately, like many of their contemporaries, neither is particularly gifted at preventing children from gaining access to a tenement entranceway.

1 ‘Modern’ defenders don’t appear to have any concept of tackling, or clearing, or man-marking, or you know, defending.

Warburton appeared to address this problem ahead of the club’s return to the top-flight when he secured the services of Clint Hill and Philip Senderos, two players with experience at the highest levels of the game. However, the latter would barely feature, drawing criticism when he did. Hill fared better, especially since he turned 38 midway through the campaign, and developed a good reputation with the fans – scoring against Celtic always helps in that regard.

However, Warburton’s defence reinforcements failed to deliver. In only the third game of the season, the club were hammered 5-1 by their arch rivals Celtic, and the pressure on the manager began to grow. And the defence continued to leak goals. It wasn’t for lack of trying on Warburton’s part; he tried different players, and occasionally a different formation. A 4-1 drubbing at Tynecastle followed in February, and by then the Englishman was on the brink. Warburton and the club parted company a week later, with the latter claiming the manager resigned, and the former saying ‘Naw ah didnae.’

After the odd resignation/sacking episode, Graeme Murty was brought in as caretaker, but he couldn’t do much better, shipping 6 goals in four games. And then Pedro Caixinha was appointed Warburton’s permanent replacement. Remarkably the club kept 4 clean sheets in Caixinha’s first five games in charge, after managing only 8 in the previous 28 games, before the Portuguese suffered his own 5-1 defeat to Celtic, Rangers’ biggest ever defeat to their arch-rivals at Ibrox. That said, performances seemed to improve a little under the new manager, and he even gave debuts to some of the Development Squad stalwarts, such as the aforementioned Bates, Beerman, and Wilson.

Looking at raw data about the goals Rangers conceded isn’t very instructive, and this I think is another area where analysis of football will diverge from analysis of baseball. In a baseball duel between a pitcher and a batter, an individual error by either player is likely to have a potent impact on their team’s score, while football, more of a team sport, is a bit more forgiving. Besides, I simply don’t have the granularity of data to look at tackles and headers won to see if there are any interesting patterns. I had a metric in my seasonal data spreadsheets that noted the win percentage for any given player, and at the end of the 2016/17 season, I noted it had thrown up that Rangers had only won 39% of matches that Clint Hill started, 11 percentage points below average. This was interesting as Clint Hill had won some renown as being one of Rangers’ least-worst players last season. However, when I carried out some regression analysis on that figure, it turned out that Rangers’ goals conceded wasn’t any worse when Hill played, but that the team did draw a lot – the low win % was down to Rangers not converting drawing positions into winning positions through the midfield and attack not converting enough chances.
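A sketch of that kind of check is below: split the matches by whether a player started, then compare win percentage, draw percentage and goals conceded per game. The ten matches are invented to show the pattern described and are not my actual records, and a simple split like this stands in for the regression analysis rather than reproducing it.

```python
def summarise(matches):
    """Win %, draw % and goals conceded per game for a list of matches."""
    played = len(matches)
    wins = sum(1 for match in matches if match["result"] == "W")
    draws = sum(1 for match in matches if match["result"] == "D")
    conceded = sum(match["conceded"] for match in matches)
    return {
        "played": played,
        "win_pct": round(100 * wins / played, 1),
        "draw_pct": round(100 * draws / played, 1),
        "conceded_per_game": round(conceded / played, 2),
    }


# Hypothetical results: "started" marks whether the player was in the starting XI
matches = [
    {"result": "W", "conceded": 0, "started": True},
    {"result": "D", "conceded": 1, "started": True},
    {"result": "D", "conceded": 1, "started": True},
    {"result": "L", "conceded": 2, "started": True},
    {"result": "W", "conceded": 1, "started": True},
    {"result": "D", "conceded": 1, "started": True},
    {"result": "W", "conceded": 1, "started": False},
    {"result": "W", "conceded": 2, "started": False},
    {"result": "L", "conceded": 2, "started": False},
    {"result": "W", "conceded": 0, "started": False},
]

with_player = summarise([match for match in matches if match["started"]])
without_player = summarise([match for match in matches if not match["started"]])

print("Started:", with_player)
print("Did not start:", without_player)
```

In this made-up sample the win percentage is lower when the player starts even though the team concedes no more often, because so many of those games end in draws.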

That’s one of the pitfalls of trying to boil a complex sport down into a handful of metrics, and particularly individual metrics. An action in football is the result of a number of individual stakeholders’ actions creating events that either result in a goal or not. It’s all a bit chaotic despite people being paid millions of pounds a year to oversee the process strategically, and I’m still not sure you can condense all that information down into a single number.

What I can do is look at the 44 goals Rangers conceded, and log individual errors. There’s a bit of subjectivity here, but I have to work with what I have. Rangers’ most commonly selected defenders last season were Lee Wallace, Robbie Kiernan, Clint Hill, Danny Wilson, and James Tavernier. I analysed the 44 goals, and noted any errors any of the 5 might have made in the lead up to conceding. The results are below:

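| Player | No Error | No % | Minor | Minor % | Major | Major % | Total | Minutes | Major per minute | Minor per minute |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Wallace | 25 | 86% | 3 | 10% | 1 | 3% | 29 | 2386 | 0.0004 | 0.0013 |
| Hill | 22 | 81% | 3 | 11% | 2 | 7% | 27 | 1979 | 0.0010 | 0.0015 |
| Kiernan | 17 | 63% | 3 | 11% | 7 | 26% | 27 | 2130 | 0.0033 | 0.0014 |
| Tavernier | 23 | 55% | 10 | 24% | 9 | 21% | 42 | 3145 | 0.0029 | 0.0032 |
| D Wilson | 12 | 55% | 7 | 32% | 3 | 14% | 22 | 1762 | 0.0017 | 0.0040 |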

Table 12: Defensive errors by player, per minute for 2016/17
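The percentage and per-minute columns in Table 12 follow directly from the error counts and minutes played. A minimal sketch, assuming percentages are shares of the goals conceded while the player was on the pitch (the Total column) and per-minute rates are errors divided by minutes; the dictionary keys and function name are mine.

```python
# Error counts and minutes played, from Table 12
defenders = {
    "Wallace": {"none": 25, "minor": 3, "major": 1, "minutes": 2386},
    "Hill": {"none": 22, "minor": 3, "major": 2, "minutes": 1979},
    "Kiernan": {"none": 17, "minor": 3, "major": 7, "minutes": 2130},
    "Tavernier": {"none": 23, "minor": 10, "major": 9, "minutes": 3145},
    "D Wilson": {"none": 12, "minor": 7, "major": 3, "minutes": 1762},
}


def error_rates(row):
    """Percentages of goals with an error, and errors per minute played."""
    total = row["none"] + row["minor"] + row["major"]
    return {
        "total": total,
        "minor_pct": round(100 * row["minor"] / total),
        "major_pct": round(100 * row["major"] / total),
        "major_per_minute": round(row["major"] / row["minutes"], 4),
        "minor_per_minute": round(row["minor"] / row["minutes"], 4),
    }


rates = {name: error_rates(row) for name, row in defenders.items()}

for name, rate in rates.items():
    print(name, rate)
```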

Major errors were those that directly led to conceding a goal. Minor errors were those that indirectly led to a goal. I can already hear people yelling that Tavernier wasn’t at fault for 19 goals last season, so in the interests of transparency, my working is in the table below:

| Opponent | Venue | Score | Description | Tavernier Error? |
| --- | --- | --- | --- | --- |
| Hamilton | Home | 1-1 | Cross from right, cut out by Hill. Falls to Accie player, who turns and scores. | No |
| Dundee | Away | 2-1 | Corner comes in from the left, Barton fails to track his man. | No |
| Motherwell | Home | 2-1 | Motherwell break down right, cut inside Tav, deflected cross to middle of 6 yard box. | Yes – caught upfield, does well to chase back, but tackle is weak. |
| Kilmarnock | Away | 1-1 | Break down R’s inside right, ball in behind CB for Boyd to score. | Partial – caught upfield and on wrong side of attacker. |
| Celtic | Away | 1-5 | Corner from right. Header at back post. | No |
| Celtic | Away | 1-5 | Slack pass from Kiernan. Opp FW breaks, cuts inside, scores. | No |
| Celtic | Away | 1-5 | Break through centre, pass in behind, Opp FW finishes. | No |
| Celtic | Away | 1-5 | Break down left, long cross to back post. Opp FW brings down and finishes. | Partial – caught upfield and tracks back for a bit before seemingly losing interest. |
| Celtic | Away | 1-5 | Cross to back post, Opp FW slots home. | Yes – half-hearted attempt to cut out cross |
| Aberdeen | Away | 1-2 | Long ball headed on, Opp FW in behind, slots home. | Yes – ballwatching. Man runs off him |
| Aberdeen | Away | 1-2 | Direct free kick. | Yes – ballwatching, gets wrong side, then dives into tackle |
| St. Johnstone | Home | 1-1 | Hodson caught in possession, Opp FW turns and shoots. | N/A |
| Ross Cty | Away | 1-1 | Corner from right. Opp player unmarked and scores a header. | No |
| Partick | Away | 2-1 | Cross from left bounces about a bit, opp wins bounce, and bundles home. | Yes – half-hearted challenge |
| Hearts | Away | 0-2 | Ball across face of goal, opp player sneaks in to score. | Yes – ball watching and fails to cut out cross. |
| Hearts | Away | 0-2 | Deep run down left, ball across face of goal, tap in for opp. | Partial – caught upfield |
| Aberdeen | Home | 2-1 | Free kick from left headed in. | No |
| Hamilton | Away | 2-1 | Mistake by Halliday, cross converted. | No |
| St. Johnstone | Away | 1-1 | Short pass by Kiernan, converted by opp. | No |
| Celtic | Home | 1-2 | Corner from right missed by defence, converted at back post. | No |
| Celtic | Home | 1-2 | Cross across box from left side, converted at back post. | Partial – ballwatching and failure to pick up any player. |
| Hearts | Away | 1-4 | Long cross to the back post, converted with a looping header. | No |
| Hearts | Away | 1-4 | Halliday loses possession, opp converts with a low strike from distance. | No |

Hearts 

Away 

1-4 

Quick free kick down the right, cross-shot across goal, tap in. 

Yes – ballwatching, then lazy run back

Hearts 

Away 

1-4 

Deep cross, keeper fumbles, opp tap-in. 

No 

Ross Cty 

Home 

1-1 

Break down left, deep cross to centre of PA where opp converts. 

No 

Dundee 

Away 

1-2 

Break down the right, cross to the back post, low shot into bottom corner.

Yes – gets nowhere near anyone.

Dundee 

Away 

1-2 

Deep free kick to back post goes in. 

No 

Inverness 

Away 

1-2 

Shot from edge of area rebounds to opp player. Long shot goes into bottom corner. 

Partial – deflects throw in into path of opp. 

Inverness

Away 

1-2 

Overhead kick from a long bouncing ball. 

No 

St. Johnstone 

Home 

3-2 

Short pass to teammate inside box, curls past the keeper. 

No 

St. Johnstone 

Home 

3-2 

Corner from left, defence fails to clear, opp scores. 

No 

Celtic 

Away 

1-1 

Miscontrol by Holt, pass to opp, low shot into goal.

No 

Motherwell 

Home 

1-1 

Corner from right, easy header for opp. 

N/A 

Celtic 

Home 

1-5 

Foul by Beerman for penalty. 

No 

Celtic 

Home 

1-5 

Miscontrol by Hyndman falls to opp. Feeds teammate to score. 

Partial – ballwatching 

Celtic

Home 

1-5 

Break through centre, pass to inside left, shot low into goal. 

No 

Celtic 

Home 

1-5 

Free kick from left headed in. Player ran off Windass. 

Partial – misses chance to clear 

Celtic 

Home 

1-5 

Tav gives ball away, opp drives forward and scores a low shot.

Yes – gives ball away, lazy in chasing back. 

Partick 

Away 

2-1 

Switch of play from right to left, low cross headed home. 

Partial – weak challenge allows switch of play. 

Hearts 

Home 

2-1 

Break from Rangers attack, break tackle, pass for easy conversion.

No 

Aberdeen 

Home 

1-2 

Interception, 1-2, solo run and finish. 

No 

Aberdeen 

Home 

1-2 

Cross to back post, headed back across, looping header in. 

Partial – fails to stop cross coming in 

St. Johnstone 

Away 

2-1 

Stramash in penalty area, falls to opp, shoots home.

Partial – not paying any attention 

Table 13: Tavernier’s defensive errors
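
To show how Table 13 rolls up into Tavernier’s row in Table 12, here is a rough Python sketch that tallies the 44 verdicts in the order they appear above. It assumes that “Yes” rows count as major errors and “Partial” rows as minor errors, and that “N/A” rows are left out of his total; the table doesn’t spell those rules out, but they are the ones that reproduce the published counts. The one-letter codes are my own shorthand.

```python
from collections import Counter

# Verdicts from Table 13, top to bottom.
# N = No, Y = Yes, P = Partial, X = N/A (shorthand codes, not from the table).
CODES = (
    "N N Y P N N N P Y Y Y X N Y Y P N N N N P N "
    "N Y N N Y N P N N N N X N P N P Y P N N P P"
).split()

LABELS = {"N": "no_error", "Y": "major", "P": "minor", "X": "not_applicable"}


def tally(codes):
    """Count each verdict; percentages use only the goals that were assessed."""
    counts = Counter(LABELS[code] for code in codes)
    assessed = len(codes) - counts["not_applicable"]
    percentages = {
        key: round(100 * counts[key] / assessed)
        for key in ("no_error", "minor", "major")
    }
    return counts, assessed, percentages


counts, assessed, percentages = tally(CODES)
print(dict(counts))
print("goals assessed:", assessed)
print(percentages)
```

Changing a single letter in the list shows how much one subjective call moves the percentages, which is a fair reminder of how small this sample is.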

I do have some sympathy for the argument that Tavernier has spent the best part of two seasons being left exposed by his defensive midfield colleagues not covering for him. At the same time, one wonders if poor decision-making is behind him making so many offensive runs when a break might not be on. Besides, we didn’t actually concede a lot of goals as a result of our right-back being stranded up the pitch. We lost goals because of our right-back not paying attention, and that’s arguably a harder trait to coach out of someone. All things considered, I’m not sure the signing of Ryan Jack (a more consummate defensive midfielder than Andy Halliday or Jason Holt) will actually benefit Tavernier any. Unless Jack is to man-mark Tavernier at all times. Individual defensive errors is not a stat I’ve seen applied in Scottish football as yet, but Squawka feature it. That would be a useful metric for measuring a defender’s defensive performance.

Conclusion

 

It’s taken me a long while to write this blog, partly because it’s required a lot of research and analysis, and partly because I think it makes me sound like a massive Luddite. As I’ve noted, I’ve been a keen proponent of using statistics to analyse football for years now, as humans are subjective beasts and we view the world through a lens of our own bias. Data helps us understand what is actually happening. But data is data – raw numbers. It needs to be interpreted by a human mind to become information. And bias can creep in there. I’m sceptical about boiling down all chance quality data into one cumulative metric, be that xG, or something else, but that doesn’t mean that quality data isn’t useful to help us understand how effective our teams are.

Overall, I’m not sure what role all this data and analytics is filling. Are the clubs using xG and xA and controlled zone entry to analyse their own performances? Are we to use these various metrics produced by keen fans for information only, or to influence opinion? The way they’re presented at the moment seems to lean more towards the latter, which concerns me as the average person doesn’t really look much into statistics that back up the opinions they’ve already formed. What would be ideal would be something like Squawka carrying data on the Scottish game, so that fans can be better informed about the analyses, indexes and ratings that have been carried out.

I’m watching the Confederations Cup final as I write this, a tournament where referees have had the option to peruse video evidence if they wish. But as we’ve seen in other sports, the final judgement in these situations is still made by a human being, and humans make mistakes and odd decisions. I’m not quite ready to put all my faith in soccermetrics yet. I think we can certainly use data, but we still need to rely on our own judgement as fans, and as coaches.