At the end of the 2017/18 Fantasy Premier League season I found my enthusiasm waning somewhat. I went into the final run of games quite dejected at my team’s performance over a season in which I was hoping to build upon 2016/17’s overall rank of 16k but found myself repeatedly languishing just outside the top 100k. In truth, this ennui had been dragging since the anger phase had slowly burned out around Christmas when I slipped into a resigned state of torpor. The poor start I had experienced knocked me back somewhat and I became unable to trust my data or my instincts. The fact is I became lazy, leaning on the ‘wisdom of the crowds’ approach (e.g. aggregating and subsequently poaching ideas from the FPL communities) and focusing heavily on forthcoming fixture lists to make my transfers. When I did attempt to add a little flourish by careening off-piste early in the season – a Mahrez captaincy and, forgive me, a Bakayoko 4th midfield punt – the moves backfired less in a blaze of glory than in a dribble of a nosebleed, which ensured that I remained gun-shy and treading water for the remainder of the season.

The focus of this blog is to explore whether paying specific attention to the fixtures, as in the approach mentioned above, was a sound strategy, or whether we collectively fetishize the ‘good run of fixtures’ by assuming that the ‘lesser’ opponents provide more opportunities for points. (Note: I’m setting aside the ‘wisdom of the crowds’ approach for the moment.)

In order to obtain an objective measure of a fixture’s favourability I collected the average UK bookmakers’ odds for Premier League games for the last four seasons (1,520 games, including 2014/15) from the superb treasure trove that is football-data.co.uk. The odds collected there are in decimal format (e.g. 5.00) rather than the traditional shop-window format of fractions (e.g. 4/1).

The decimal format communicates how much money will be returned from a £1.00 stake; in the case of a 4/1 bet, the decimal value is 5.00, which means that for every £1 placed you will get back £4 profit plus your original £1 stake, making a total of £5 (5.00). We can use this to calculate the implied probability of a team winning: 5.00 is the same as 4/1 odds, which in simple terms can be expressed as “will succeed one time for every four times it fails.” If we think about it as succeeding once in every five attempts (1/5), then the implied probability of success is 20%; in other words, the implied probability is 1 divided by the decimal odds. Therefore, a team with an average betting market value of 5.00 has an implied 20% chance of winning that game. For reference, here is a table of odds related to percentage:

| Implied probability | Decimal | Fraction |
| --- | --- | --- |
| 10% | 10.00 | 9/1 |
| 20% | 5.00 | 4/1 |
| 30% | 3.33 | 23/10 |
| 40% | 2.50 | 3/2 |
| 50% | 2.00 | 1/1 (evens) |
| 60% | 1.67 | 2/3 |
| 70% | 1.43 | 10/23 |
| 80% | 1.25 | 1/4 |
| 90% | 1.11 | 1/9 |
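For anyone who wants to reproduce the conversion, here is a minimal Python sketch. The function names are illustrative rather than taken from the original analysis, and the fractional odds are only an approximation to the nearest fraction with a small denominator, so they will not always match the prices a bookmaker would display.

```python
from fractions import Fraction


def implied_probability(decimal_odds: float) -> float:
    """Implied probability of an outcome priced at the given decimal odds."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return 1.0 / decimal_odds


def decimal_to_fractional(decimal_odds: float, max_denominator: int = 10) -> Fraction:
    """Approximate fractional odds (profit per unit staked) for decimal odds.

    The cap on the denominator is an assumption made for readability only.
    """
    return Fraction(decimal_odds - 1.0).limit_denominator(max_denominator)


for odds in (10.00, 5.00, 2.50, 2.00, 1.25):
    frac = decimal_to_fractional(odds)
    print(f"{odds:.2f} -> {implied_probability(odds):.0%} -> {frac.numerator}/{frac.denominator}")
```

Decimal odds of 5.00, for example, come out as a 20% implied probability and fractional odds of 4/1, matching the table.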

It is worth noting that the implied home, away and draw percentages in many cases did not equal 100% when added together. Bookmakers build a margin (the ‘overround’) into their prices, so the implied probabilities typically sum to a little over 100%, and taking averages from more than a dozen bookmakers may add some further noise. An example picked at random would be Everton v Sunderland from 25/02/2017, where the implied probabilities were: home, 71.43%; draw, 20.12%; away, 12.71% = 104.26%. In order to put all fixtures on an equal baseline, I have proportionally adjusted the probabilities to equal 100% in all cases, so this fixture becomes: home, 68.51%; draw, 19.30%; away, 12.19% = 100%.
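The proportional adjustment can be sketched in a few lines of Python. This is an illustration of the arithmetic described above, not the original spreadsheet; the function name is hypothetical and the inputs are the Everton v Sunderland figures quoted in the text.

```python
def normalise_probabilities(home: float, draw: float, away: float) -> tuple:
    """Scale three implied probabilities proportionally so that they sum to 1."""
    total = home + draw + away
    return tuple(p / total for p in (home, draw, away))


# Everton v Sunderland, 25/02/2017: raw implied probabilities sum to 104.26%
raw = (0.7143, 0.2012, 0.1271)
adjusted = normalise_probabilities(*raw)

for label, p in zip(("home", "draw", "away"), adjusted):
    print(f"{label}: {p:.2%}")
# home: 68.51%
# draw: 19.30%
# away: 12.19%
```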

The distribution of odds across the 1,520 games is as follows:

*Figure 1: Count of matches by adjusted probability odds of home and away team winning, 2014/15 to 2017/18*

For the curious, the highest adjusted odds for an away team victory in the last four Premier League seasons, that lonely ‘1’ in the top right corner, was when Huddersfield hosted Manchester City in November 2017, a match which the eventual champions won 2-1 after being a goal behind at half-time. The highest adjusted odds for a home game in the same time period? The reverse fixture: Manchester City’s 88.25% against Huddersfield in May 2018, which finished 0-0.

In reality, the two meetings between Huddersfield and Manchester City in 2017/18 were partly what prompted this research. My laziness in picking Raheem Sterling as Triple Captain in gameweek 37 was predominantly due to two ‘good’ fixtures, one of which was at home to Huddersfield despite Huddersfield’s strong showing in the reverse fixture. The definition of good was obviously in part driven by Manchester City’s form, but also by Huddersfield’s position towards the bottom of the table and inferior resources. However, as it played out, these two fixtures, in which Huddersfield had adjusted odds of just 6.07% and 3.44% to win, finished with an aggregate score of just 2-1 to Manchester City.

From here on out I will be mostly referring to odds in decimal terms to avoid confusion with other percentage measurements later in the article.

##### Exam Questions

I approached this research without a specific idea in mind, but the guiding principles seemed to be:

*Do favourable odds increase a team’s chance of a clean sheet? Conversely, do poor odds decrease a team’s chance of a shut out?*

*Do favourable odds increase the probability that a team’s midfielders and forwards will deliver attacking returns?*

Of course, the answers to these should be apparent on the surface. After all, good teams score more goals and concede fewer than the bad teams. The bookmakers know this and price according to probability, as do the price-setters at FPL Towers; there is a reason why the defenders from the top six cost more than those from newly promoted clubs. However, I have come to be more interested in the frequency of such events, to explore whether it is worth the extra outlay to get David de Gea over Fulham’s Marcus Bettinelli in the 2018/19 game, for example.

##### Goalkeepers / Defenders – Clean Sheets

*Figure 2: Share of games where a clean sheet is kept, by the odds of the team winning*

The data doesn’t initially reveal anything ground-shaking: the higher the probability of a team winning, the higher the chance of it keeping a clean sheet. What is interesting though is that there appear to be three distinct groups within these data:

- The rank outsiders (>5.00 odds, or <20% probability of winning), which will keep a clean sheet infrequently (21% of the time when combined together). 42% of games will feature a team which fits this description.
- There is a prominent middle group, where the odds of winning are between 1.67 and 5.00, in which the chances of a clean sheet are bunched up between 37% and 48%. 72% of games will feature a team that fits this description, and the chances of a clean sheet being kept do not differ as much as we might expect; for example, the near 5.00s include Brighton at home to Manchester United in 2017/18 whilst the near 1.67s include Liverpool at home to Norwich in 2015/16. On paper, few would back Brighton and most would back Liverpool for a clean sheet, but the former sit within a group that will keep a clean sheet 37% of the time and the latter 48% of the time. So whilst Liverpool would be more favoured to keep a clean sheet (note: they didn’t), their probability is not streaks ahead of Brighton’s (note: they did keep a clean sheet in this game) as I would have instinctively thought.
- The overwhelming favourites (<1.67); 28% of games will feature an overwhelming favourite, and they will keep a clean sheet on 65% of occasions.
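As a rough sketch of how these three groups could be derived, the Python below buckets each team-game by its decimal odds and counts the share of clean sheets per bucket. The thresholds come from the list above; the function names and the handful of `toy_team_games` rows are made up purely for illustration and are not the real match data.

```python
from collections import defaultdict


def favourability_group(decimal_odds: float) -> str:
    """Assign a team-game to one of the three groups described in the text."""
    if decimal_odds > 5.00:
        return "rank outsider"
    if decimal_odds >= 1.67:
        return "middle group"
    return "overwhelming favourite"


def clean_sheet_share(team_games):
    """Share of team-games with zero goals conceded, per favourability group.

    team_games: iterable of (decimal_odds_to_win, goals_conceded) pairs.
    """
    kept = defaultdict(int)
    played = defaultdict(int)
    for decimal_odds, goals_conceded in team_games:
        group = favourability_group(decimal_odds)
        played[group] += 1
        if goals_conceded == 0:
            kept[group] += 1
    return {group: kept[group] / played[group] for group in played}


# Hypothetical rows, NOT taken from the real dataset
toy_team_games = [(8.00, 1), (7.50, 2), (6.00, 0), (3.10, 1), (2.40, 0), (1.45, 0), (1.30, 0)]
print(clean_sheet_share(toy_team_games))
```

Run against the full four seasons of odds and results, the same grouping would be expected to reproduce the percentages quoted above.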

When we split the clean sheet success groupings by whether the teams played home or away, we reveal a few more curiosities about the data.

*Figure 3: Share of games where a clean sheet is kept, by the odds of the team winning, split by home and away*

- If a team has a lower than 30% probability of winning (i.e. >3.33 odds), they will keep a clean sheet in 30% of games away vs. just 24% at home.
- Teams with odds of winning between 3.33 and 2.00 playing away from home dramatically underperform relative to their home counterparts. In the 2.00-2.50 range, which for context includes games such as reigning champions Leicester away to Hull on the opening day of 2016/17, Arsenal away to Southampton and Newcastle in 2017/18, and Tottenham away to Middlesbrough in 2016/17, the clean sheet success percentage is just 19%.

There is subjectivity in how this data can be interpreted, and I would encourage the reader to explore his/her own interpretation from this. My view is:

- The relative success of the rank outsiders away from home can be viewed as a reflection of the requirement to play on the back foot in the face of significantly stronger opposition, and it is interesting to see that it works three times out of every 10.
- The 3.33 to 2.00 odds teams away from home are favourites and expected to take the initiative, but do not appear to be sufficiently intimidating that their opponents do not see an opportunity for goals and thus come onto them, resulting in the poor away from home clean sheet record.
- The strength of the <1.67 teams is such that they are formidable away and at home, although home advantage does improve their chances of keeping a clean sheet further. (Note: the 0% success of the 1.25-1.11 group away from home is misleading as there is only one game in this group, the aforementioned Huddersfield vs. Manchester City game.)

##### Goalkeepers / Defenders – Average Points per Player

We established that there are a few quirks in the clean sheet data, such as outsiders faring better away from home than at home, and slightly favoured teams performing poorly away from home in terms of keeping clean sheets. Now, let’s translate that to FPL performance. The chart below shows the average points per game per player (60+ mins played) in each of the team’s probability ranges.

*Figure 4: Average FPL points per player, by the odds of the team winning (60+ minutes played)*

This data shows that there is a steady climb through the probability odds for both defenders and goalkeepers, but the goalkeepers start from a higher baseline whereas the defenders climb at a steeper rate. Let’s think about this in practical terms:

- A goalkeeper for a rank outsider will score roughly double the average score of a defender on the same team in the same game (3.06 vs. 1.56 for the >10.00 group). They may not keep a clean sheet (they will fail in 82% of games) but they will find more opportunities for save points, and there will be less attacking opportunity for the defenders. Bookings for goalkeepers are also far rarer, whereas defenders attempting ‘tactical’ fouls will get pulled up in such games.
- The cross-over where the average defender points exceed that of the goalkeepers occurs around 50% (2.00). This is when the favoured team’s defenders will be permitted time on the ball to execute passes (bonus points) and get more involved in attacking plays. Goalkeepers, by contrast, are likely to find they have less to do as the opposition sits back more hoping for the occasional chance.

A previous blog of mine suggested that there was very little value to be found in the budget defenders compared to the upper-mid to premium price range and better value to be found in the budget goalkeepers. The data here seems to back this up from a different approach.

The shallower gradient of the goalkeepers’ line is suggestive that the goalkeepers from unfavoured teams can perform to a similar level to those from favoured teams. The other way to consider this though is to assume that there is little relationship between a goalkeeper’s team’s probability of winning and the FPL points they will score. This latter point is evidenced in the following chart, which plots the average probability of a team throughout a season with the average points per game scored by the goalkeeper.

*Figure 5: Average FPL points per goalkeeper, by the average odds of the team winning throughout the season*

The R² value represents how closely the data fits the trend line: 1.000 represents a perfect fit, whereas 0 would indicate that the data is completely random and that there is absolutely no linear relationship between the numbers. What we’re seeing here is 0.248, which is closer to 0 than to 1.000. This tells us that FPL points for goalkeepers and the team’s probability of winning are related, but only very loosely; it is closer to random than a co-dependent relationship. Taking this same approach for the defenders’ data shows an R² value of 0.718, which indicates that the relationship between odds and FPL performance is far more closely aligned.
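For readers who want to check this kind of figure themselves, here is a minimal sketch of the calculation in plain Python. For a straight-line fit, R² is the square of Pearson’s correlation coefficient r; r runs from -1 to +1, so R² runs from 0 to 1. The function names and the `toy_*` numbers are hypothetical and chosen only to show the mechanics, not to reproduce the chart.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def r_squared(xs, ys):
    """R² of a simple linear fit, i.e. the square of Pearson's r."""
    return pearson_r(xs, ys) ** 2


# Hypothetical season averages: team win probability vs. goalkeeper points per game
toy_win_probability = [0.25, 0.30, 0.35, 0.45, 0.60]
toy_goalkeeper_ppg = [3.9, 3.2, 4.1, 3.6, 4.3]

print(f"r  = {pearson_r(toy_win_probability, toy_goalkeeper_ppg):.3f}")
print(f"R² = {r_squared(toy_win_probability, toy_goalkeeper_ppg):.3f}")
```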

From this we can infer the following: the average points of each defender are related to the probability of a team winning far more than those of the goalkeeper. We can therefore be far less confident in the performance of expensive goalkeepers than expensive defenders (based on the fact that FPL price is generally related to team probability). However, we must be careful before jumping in and declaring that budget goalkeepers from low-probability teams are the best option, because we have seen that the correlation is only very loose. In 2017/18, Burnley, Swansea and Brighton were all teams which had a low average probability but whose goalkeepers scored high, whereas Bournemouth and Watford also had low probabilities but their goalkeepers did not score high.

The lesson I am taking away from this research (along with other analyses I have done previously) is that with defenders it is better to pick players from higher probability teams as there is a higher chance of points (related to clean sheets, undoubtedly). However, for goalkeepers there is no evidence that a low probability of the team winning hinders points-scoring potential, so fixtures or strength of the team do not need to be taken into consideration when picking a goalkeeper. This implies a strategy where I pick up a £4.5m goalkeeper from an unfavoured team and his £4.0m deputy to maximise spend elsewhere in my squad. The difficulty of course is going to be in selecting a Swansea or Brighton and not a Bournemouth or Watford.

##### Midfielders / Forwards

In order to explore the question of whether fixtures are beneficial for midfielders and forwards I have taken a slightly different approach. For defenders, the average points across the backline and team clean sheets are acceptable metrics because the main source of a defender’s points is the clean sheets which are team achievements. By contrast, the attacking eight in an FPL line up are dependent primarily on individual actions, namely goals and assists for the points.

As with clean sheets, the charts below show that the distribution of goals per game follows a logical progression with team favourability; the better the odds of a team winning, the higher the number of goals they are likely to score. Similarly, the better the probability, the more points midfielders and forwards will score. This provides no great surprises. For the FPL manager, however, the task is to understand whether his/her particular player is likely to score or assist during the game.

*Figure 6: Average goals per game, by the odds of the team winning*

*Figure 7: Average FPL points per player, by the odds of the team winning (60+ minutes played)*

Unfortunately, that is far more difficult to predict. The below chart plots the number of points scored by a premium FPL asset (Eden Hazard, selected… not quite purely at random, but rather because he was the first player I thought of to have been active for the past four seasons) against the favourability of the opponent to win in those games.

*Figure 8: FPL points by the odds of the team winning (60+ minutes played) – Eden Hazard 2014/15 – 2017/18*

We can see that there is a slight correlation: the size of the maximum hauls decreases the harder the opposition becomes. In reality, though, the data is almost randomly distributed, and the R² value of 0.05 does not lend itself to any kind of predictability (a reminder that an R² value of 1 would indicate a perfect fit and 0 no relationship at all).

This is not limited to Hazard. The R² values of midfielders mostly sit within 0.2 of zero, the point at which the data would be completely random. The average of all midfielders and forwards who have scored at least one goal and played more than 2,000 minutes in the last four seasons is 0.083.

The data suggests that on an individual player level it is not possible to guarantee point returns based on the perceived strength of the opposition. This very much explains the frustration when ‘banker’ picks like Harry Kane occasionally labour against teams Tottenham are expected to dominate, but then within weeks hit consecutive hat-tricks against similarly unfancied teams. The reality of the situation is that, although Tottenham forwards and midfielders in aggregate will score more points against a weak team, the average R² value across all midfielders and forwards is nowhere near strong enough to say that you shouldn’t be backing Kane almost as much against a top team as a bottom team.

The probability of goals for a team does follow the favourability of the team, as seen earlier. However, this does not transfer to individual players; chasing attacking points in FPL does inevitably involve some luck. The increase in average goals and FPL points per team does mean that there is an increase in the number of points being scored by a favoured team. This is where the concept of ‘coverage’ comes into play. Many FPL managers will be familiar with the term, but for the uninitiated it refers to the practice of buying a cheaper attacking asset (e.g. Son, Alli, Eriksen) in the hope of offsetting the potentially explosive impact of a premium player (Kane); the theory goes that if the threat of Kane scoring a double-digit FPL haul in a game against weak opposition is great, then his attacking teammates will likely pick up auxiliary points by providing assists or even the odd goal.

It turns out there is merit in this theory, although it is not especially strong. The below chart shows the minimum number of midfielders or forwards per team to score six or more points in games (x axis) relative to the favourability of the team (represented by the lines). The y axis shows probability of the event occurring.

*Figure 9: Probability of at least X players scoring six or more points in a game, midfielders and forwards only*

To put this into simple terms, every favourability group has a 100% chance of at least 0 attacking players scoring six or more (obviously). Moving along the x axis, we see the variations in favourability start to have an impact on the FPL potential; in 35% of the games where a team with >10.00 odds plays, at least one attacking player will score six or more points, but for the 5.00-10.00 group (still rank outsiders but marginally less so) the probability increases to 50%. For both groups, however, the prospect of at least three players scoring a decent attacking return decreases to less than 4%.

At the other end of the spectrum, the odds improve but perhaps not as much as FPL managers would like. At least three players playing for teams that have odds of between 1.67 and 2.00 will score six or more points around 24% of the time. This means that, referencing the previous paragraph, there was very little chance of three Huddersfield attacking players scoring decent returns against Manchester City, but if we then look at the same happening for Leicester against Huddersfield the probability is not vastly higher, only around 21 percentage points more. If we extend this to the Manchester City team in the same Huddersfield game, the figure increases to around 40%. Even so, assuming you can even pick the correct three players, the ‘banker’ game for Manchester City will fail to generate returns for the ‘triple up’ strategy six in every 10 times.
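The “at least X players” figures are simple empirical shares, which can be sketched as follows. The input is, for each team-game in a favourability group, the number of midfielders and forwards who scored six or more points; the function name and the `toy_counts` list are invented for illustration and do not come from the real data.

```python
def share_with_at_least(counts, threshold):
    """Share of team-games in which at least `threshold` attackers scored 6+ points."""
    return sum(1 for c in counts if c >= threshold) / len(counts)


# Hypothetical counts of attackers with 6+ points in ten team-games
toy_counts = [0, 1, 0, 2, 1, 3, 0, 1, 2, 0]

for x in range(4):
    print(f"at least {x}: {share_with_at_least(toy_counts, x):.0%}")
# at least 0: 100%
# at least 1: 60%
# at least 2: 30%
# at least 3: 10%
```

By construction the share can only stay level or fall as X increases, which is why every line in the chart starts at 100% and slopes downwards.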

##### Conclusion

Goals, it transpires, are rare commodities in football. What I take from these data (and again, I encourage all reading this to draw their own conclusions) is that odds are good indicators of a team’s likelihood of scoring goals or preventing the opposition from doing so. However, from an individual player / FPL perspective this is only relevant data for defenders. There is a steep rise in average points scored by the defenders as we move through the favourability groups. For goalkeepers, the presence of save points means that the position starts to show behaviour where FPL performance begins to detach itself from fixtures, although the presence of team-earned points from clean sheets means there remains some positive correlation linked to fixtures.

Within the attacking groups (forwards / midfielders), the individual FPL performance (not team goals it must be stressed) is even more loosely correlated with fixtures than the goalkeepers; there is an almost complete disconnect between a team’s fixture favourability and an individual’s FPL points. However, there is a connection between fixture favourability and the number of players within a team to score a decent amount of points although this is by no means a guarantee that the ‘coverage’ strategy will work.

Perhaps it is the lack of a recognisable pattern in who an attacking player will score against that leads to the high prices for forwards in the FPL game. Whilst much ire is directed towards a £10m+ striker for failing to score in a game where his team has a 60%+ probability of winning, we need to consider it from the other perspective too: he is still a threat when his team’s odds drop, which is not true for defenders.

##### Application to team structure

In summary, most of what is found here follows common sense; good teams win more, which follows the bookies’ odds. However, the data does back up findings from an earlier blog I wrote regarding how a squad is assembled.

**Goalkeepers:** the earlier blog showed that goalkeepers increased in points per minute the more you spent, but decreased in value (points per million). Based on what I’ve seen in this blog, that fixtures are not a key indicator of points, I’m tempted to go with a one club, starter and back-up tactic for £8.5m (£4.5m starter, £4.0m back-up) to save on a x2 £4.5m rotation. My provisional thinking for this is the Lossl / Hamer combination at Huddersfield.

**Defenders:** the upper-premium defender range (£5.5-6.5m defenders from the top six teams) has the highest points per million and points per minute value, and this analysis shows that the number of points these players score will increase significantly when their team is highly favoured in the odds.

**Midfielders:** this new research has shown that team odds do not impact a midfielder’s points potential significantly, but previous work has shown that you get what you pay for in this department. Therefore the top options on mid-ranking teams, as well as as many premium options as you can afford, should be considered, although economies will clearly need to be made somewhere.

**Forwards:** previous research has shown that there is diminishing value the more you spend, and that points per minute are erratic and difficult to predict because lower-priced forwards so often come off the bench. However, data here shows that there is more points potential in the forwards than the midfielders. My initial conclusion from the older blog was that one premium forward and two rotating lower-standard forwards (Zaha and Arnautovic are obvious considerations), or even one throwaway £5.0m option, would be the best approach considering the better certainty of points per million in the midfield bracket. How I rotate those two (studying fixtures for a team is useful, for an individual forward not so much, which is the unhelpful paradox at the denouement of this analysis) and the teams they come from may well be a defining feature of my FPL season.

Thanks for reading. As always, contact me on @mathsafe_fpl on Twitter if you’d like to

Am thinking the same possibility about 8.5 spend on keepers. Fabianski and a non-playing 4 (if no playing fours are available), because like you say keepers often rack up the points when not expected. Penalty saves and save points are more likely to be earned in tougher fixtures.

There are two risks to this approach though: 1) injury (if the 4.0m keeper is not from the same club); 2) picking a goalie who turns out to be playing for a rubbish team that underperforms expectations in terms of clean sheets etc. The upside is saving 0.5 and not missing out on the one or two big hauls.

great work ! thanks !

Fantastic analysis. Entertaining and enlightening read. Really helps understand and balance empirical evidence that aids (and punishes) many of my statistical decisions

I think that there is an error in Figure 3. If the clean sheet probability of 2.0 to 2.5 in Figure 2 is 44%, then the equivalent two bars in Figure 3 have to average to 44%. Probabilities should therefore be 47% home, 41% away.

I’d also suggest that the coverage argument does apply to individual players, as it reduces the chance of say Kane blanking. Even assuming that goals are scored at random, 4 players getting returns means that Kane is half as likely to blank as in a game where only two players get returns. Given that a high proportion of goals go through Kane, I’d suggest that the relationship should be even stronger.

Cracking post though. Will return to read more.

Great blog, cheers
