Olivier Jean is a Canadian Olympic gold medalist with a long, successful career in both short-track and long-track speed skating. After retiring from skating and completing a Master’s degree in Management Innovation & Entrepreneurship, he has taken an interest in applying data analytics to speed skating to help coaches and athletes improve their race outcomes.
ProSensus recently worked with Olivier to find out whether some key talent-prospecting questions can be answered by analyzing existing speed skating data. Olivier’s vast subject-matter expertise was combined with ProSensus’ strength in data analysis for this short, collaborative study. Multivariate modeling was used to identify key performance indicators of athletes who are likely to succeed in the new speed skating race format known as “Mass Start”.
Data-Driven Talent Prospecting for Mass Start
In recent years, talent prospecting in professional sports has come to rely more on pure numbers and stats, while traditional scouting (which draws on a subject-matter expert to evaluate less quantifiable physical and intellectual assets such as agility, strength, drive, and aggression) has taken a back seat. In 2020, as the COVID-19 pandemic continues to restrict the size of group gatherings, in-person scouting is more difficult than ever. Moreover, sports management professionals are recognizing the competitive advantage that can be gained by intelligently analyzing the vast amounts of data that are readily available.
Jac Orie brought attention to the use of analytics in speed skating by coaching numerous Dutch gold-medal skaters between 2002 and 2018. (Notable athletes include Gerard van Velde in 2002, Marianne Timmer in 2006, Mark Tuitert in 2010, Stefan Groothuis in 2014, and Sven Kramer in 2014 and 2018.) Orie showed that using historical data can lead to success: he used test data generated by his skaters to calculate speed and stamina and to improve his training program.
Mass Start is a relatively new long-track speed skating event that was featured at the Winter Olympics for the first time in 2018. Mass Start involves up to 24 skaters racing simultaneously from the same starting line and completing 16 laps of the Olympic track. This unique race format requires a premeditated strategy. For example, some athletes intentionally start out slow and then attempt to break away into the lead at a rehearsed distance or time in the race.
Applying Multivariate Statistics to Ladies Mass Start
Since Mass Start is a new race format, there isn’t much historical data on how each skater has performed in previous Mass Start races. However, any skater a country might select for the Mass Start would have results from previous long-track races (such as the 500m, 1000m, 1500m, etc.).
In the following example, data from 15 female skaters were used to investigate the correlation between historical performance in other long-track events and success in the Mass Start race. A partial least squares (PLS) model was developed to explore this correlation.
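To make the idea of a PLS model more concrete, here is a minimal sketch of how the first PLS component could be computed by hand. It is not the model used in this study: the six skaters, their numbers, and the names `autoscale` and `pls1_first_component` are invented purely for illustration. Because the sign of a latent component is arbitrary, the response is taken as the negative of the average Mass Start rank so that better skaters get positive T1 scores.

```python
import math

# Made-up illustrative numbers, NOT the study's data. One row per skater:
# [season-best 3000m (s), season-best 5000m (s), World Cup events, pack-race experience (1 = yes)]
X = [
    [242.0, 415.0, 12, 1],
    [244.5, 418.0, 11, 1],
    [247.0, 424.0, 9, 1],
    [250.0, 431.0, 7, 0],
    [253.5, 436.0, 6, 0],
    [256.0, 442.0, 4, 0],
]
# Hypothetical average Mass Start rank per skater (lower = better).
avg_ms_rank = [1.5, 3.0, 6.0, 11.0, 14.0, 18.0]


def autoscale(values):
    """Centre to mean 0 and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]


def pls1_first_component(X, y):
    """Return (weights, scores) of the first PLS component for a single response."""
    columns = [autoscale(list(col)) for col in zip(*X)]
    y_scaled = autoscale(y)
    # Each weight is proportional to the covariance between a variable and y.
    cov = [sum(x * t for x, t in zip(col, y_scaled)) for col in columns]
    norm = math.sqrt(sum(c * c for c in cov))
    weights = [c / norm for c in cov]
    # Each score is the weighted sum of one skater's scaled variables.
    scores = [
        sum(col[i] * w for col, w in zip(columns, weights))
        for i in range(len(y))
    ]
    return weights, scores


# Negate the rank so that a positive T1 score means a better Mass Start skater.
response = [-r for r in avg_ms_rank]
weights, scores = pls1_first_component(X, response)

for i, t1 in enumerate(scores):
    print(f"skater {i + 1}: T1 = {t1:+.2f}")
print("weights:", [round(w, 2) for w in weights])
```

The scores are what a score plot displays (one point per skater), and the weights play the role of the drivers read off a loading plot. In practice a dedicated multivariate package would be used rather than hand-written code.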
The Score Plot
The score plot shown on the right provides a summary of the data where each point represents a skater. Skaters close to each other tend to have similar performance.
When colored by Mass Start ranking, a clear gradient exists across the score plot (blue = lower rank number = better skater). This indicates that a consistent correlation exists between the model input data and the model output; in other words, athletes with the best potential for success in Mass Start can be identified from the model input data.
Looking at the score plot, it is evident that all of the best-performing Mass Start skaters (blue) are located in the positive T1 region. Therefore, we can investigate the “high-potential skater” criteria using a loading plot that shows the positive drivers (variables) of T1.
The Loading Plot
The loading plot highlights the (unsurprising) characteristics of a “high-potential” Mass Start skater:
- Previous pack-style race experience (Head-to-head competition decision making experience measure)
- Faster time in the long races (3000m and 5000m) in the current and/or previous seasons (Endurance measure)
- Better overall world cup ranking (Performance constancy measure)
- Participated in more world cup events in the current season and/or overall (Long track experience measure)
Let’s translate these findings back to some familiar names. Ivanie Blondin (named one of Speed Skating Canada’s Athletes of the Year in September 2020), who topped the 2020 overall Mass Start World Cup ranking, and Irene Schouten, who came second, were both located in the positive T1 region of the score plot.
Looking back at their past performance, we can see that they both met the “high-potential skater” criteria: previous pack-style race experience, faster times in all the races (including both the 3000m and 5000m), participation in more events, and a better overall ranking.
Comparing these results to Karolina Bosiek, who came last in our dataset, we can see that she had no previous pack-style race experience, was usually much slower (longer time) than average in the long races, did not participate in the 5000m race that year, had participated in fewer events in the 2018/2019 seasons, and did not have a good ranking overall that year.
The next step in any modeling exercise is validation, to assess whether or not the “high-potential skater” criteria are applicable to new skaters that were not included in the original dataset. In this example, four additional skaters were used to test the model:
- Karolina Gasecka (average MS rank 18)
- Qi Yin (average MS rank 14)
- Saskia Alusalu (average MS rank 17)
- Valerie Maltais (average MS rank 16)
As expected, these skaters (who all had a poor MS ranking) were projected onto the negative T1 region, which was shown earlier to lie outside the “high-potential skater” region.
Qi Yin and Valerie Maltais are good examples to analyze in this dataset. Despite having previous pack-style race experience, their season-best performances were mostly average, if not slightly slower, in the longer races. They also had less overall long track skating experience and had participated in fewer World Cup events in the 2018/2019 season, so taken together they did not meet the “high-potential skater” criteria. Both skaters have only recently transitioned to long track speed skating after successful short track careers, which explains their limited long track racing experience. These two skaters are a great example of why one should not use a single variable to make a decision but rather consider multiple variables simultaneously.
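Projecting a new skater onto an existing model can be sketched in a few lines. The example below assumes a one-component model has already been fitted. The training means, standard deviations, weights, and the two new skaters are all hypothetical values chosen for illustration; they are not taken from this study.

```python
# Hypothetical parameters standing in for a previously fitted one-component model.
# Variable order: [season-best 3000m (s), season-best 5000m (s),
#                  World Cup events, pack-race experience (1 = yes)]
train_mean = [248.8, 427.7, 8.2, 0.5]
train_std = [5.4, 10.4, 3.1, 0.55]
weights = [-0.51, -0.51, 0.50, 0.48]


def project(x):
    """Scale a new observation with the TRAINING statistics, then compute its T1 score."""
    return sum(
        w * (value - mean) / std
        for value, mean, std, w in zip(x, train_mean, train_std, weights)
    )


# Invented skaters that were not part of the (hypothetical) training set.
new_skaters = {
    "new_skater_1": [252.0, 434.0, 5, 1],   # pack experience, but slower and fewer events
    "new_skater_2": [246.0, 421.0, 10, 1],  # pack experience, faster and more events
}

for name, x in new_skaters.items():
    t1 = project(x)
    region = "positive" if t1 > 0 else "negative"
    print(f"{name}: T1 = {t1:+.2f} ({region} region)")
```

In this sketch, pack-style experience alone is not enough to place the first skater in the positive T1 region, because the other variables pull the score down. That is the multivariate point made above.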
Technical efficiency, fitness and pacing strategy are recognized as key performance factors during individual speed skating events.[5-6] Naturally, these characteristics are also important for the Mass Start, as highlighted by the importance of having a faster time in long races. However, tactical decision-making seems to be the critical success factor for head-to-head competition.
Expert decision-making, especially under stress and fatigue, takes time and experience to develop. This is consistent with our data showing previous pack-style racing experience as a characteristic of high-potential Mass Start skaters. It is important to point out that athletes with extensive pack-racing experience obtained outside long track speed skating dominated the 2018 Olympic Mass Start podiums. Five of the six Mass Start medal winners in Pyeongchang had international success in short track speed skating, inline speed skating, or ice skating marathons before transitioning to long track speed skating.
Extending our analysis, knowing the importance of strategy and decision making, we can also look at race data to identify what factors during the race are most important for winning. In this analysis, we are including the skater’s time for each of the 16 laps as well as their position for each lap and correlating that to their final position at the end of the race.
Comparing the 2019 World Cup (Japan) race profiles of Ivanie Blondin (1st) and Karolina Bosiek (10th), one can see that they had very similar race profiles over the first 10 laps, but Ivanie was able to skate much faster (lap times under 30 seconds) over the last 4 laps, securing first place.
Ivanie Blondin’s ability to finish the race faster than Karolina Bosiek can probably be explained in large part by her better fitness and technical efficiency as demonstrated by her faster season best times over individual distances.
An analysis of all international Mass Start finals since 2015 shows that the lap times in the later laps of the race (laps 12-16), and the skater’s position in those laps, were the strongest contributors to success. The skater’s position in the first lap was also detected as a significant contributor, as we would expect because skaters must start the race according to their international ranking.
For this analysis, we used data from the International Skating Union electronic race timing chips, which give positions only once per lap. Many movements can happen within a 400m lap, so the lack of intra-lap position data is the biggest limitation of this analysis strategy. Video analysis that tracks athletes’ positions every 50m would give a more precise picture of winning decision-making.
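As a much simpler, univariate stand-in for that multivariate analysis, one could correlate each lap’s position with the final position. The sketch below uses invented positions for six skaters at three laps of a single race; the data and the `pearson` helper are illustrative assumptions, not the ISU data or the method used in this study.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Invented data: final position of six skaters, and their position at selected laps.
final_position = [1, 2, 3, 4, 5, 6]
position_by_lap = {
    4: [5, 2, 6, 1, 3, 4],
    10: [3, 1, 4, 2, 6, 5],
    15: [1, 2, 3, 5, 4, 6],
}

lap_correlation = {
    lap: pearson(positions, final_position)
    for lap, positions in position_by_lap.items()
}

for lap, r in sorted(lap_correlation.items()):
    print(f"lap {lap:2d}: correlation with final position = {r:+.2f}")
```

With real race data, a multivariate method has the advantage of handling the strong correlation between neighboring laps, which lap-by-lap correlations like these ignore.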
This analysis and dataset certainly have limitations. Nonetheless, they highlight some clear KPIs that a skater should have in order to be successful in the Mass Start. KPIs become important when leaders have to form their team and choose the athletes with the highest potential for success. Choosing Mass Start athletes based on results in the time-trial distances presents particular difficulties because of the fundamental differences between the two types of races.
Many countries have to make this hard decision because the Mass Start athlete pool is often too small for local competitions to be representative of the demands of international races. We are confident that any informed decision on choosing a female Mass Start athlete should always take the following data into account:
- Previous pack-style race experience
- Endurance measure
- Performance constancy measure
- Long track experience measure
This is just the beginning of where data analytics can be used in speed skating and other sports with track or lap race formats. To better understand the success factors of the Mass Start, many other questions could be answered using the data currently available for free on the International Skating Union web page:
- What are the characteristics of high-potential male Mass Start skaters?
- Does age matter?
- How much long track experience is needed for athletes transitioning from other sports?
- What are the impacts of team skating strategies?
- Could individual race split time be a better Mass Start success indicator than final race time?
If the 2018 Winter Olympics is being referred to as the first “Big Data Olympics”, it’s definitely not going to be the last one!
1. Blog.unbelievable-machine.com. 2020. Big Data Olympics: The Secret Behind The Dutch Speed Skaters’ Successes. [online] Available at: <https://blog.unbelievable-machine.com/en/big-data-olympics>
2. Thrillist. 2020. What You Need To Know About Mass Start, The Winter Olympics’ Most Exciting New Event. [online] Available at: <https://www.thrillist.com/news/nation/what-is-speed-skating-mass-start-winter-olympics-2018>
3. 2020. [online] Available at: <https://www.betsul.com/spbr2/sqi/50/stage/546293/items#> [Accessed 30 November 2020]
4. Speedskating.ca. 2020. Speed Skating Canada – Our Sport. Our History. Our Passion. [online] Available at: <https://www.speedskating.ca/>
5. Konings MJ, Hettinga FJ. Pacing Decision Making in Sport and the Effects of Interpersonal Competition: A Critical Review. Sports Med 48, 1829–1843 (2018).
6. Konings MJ, Elferink-Gemser MT, Stoter IK, van der Meer D, Otten E, Hettinga FJ. Performance characteristics of long-track speed skaters: a literature review. Sports Med. 2015.
7. Ichinose G, Miyagawa D, Ito J, Masuda N. Winning by hiding behind others: An analysis of speed skating data. PLoS One. 2020.