Ian Williams: An Updated Race Equivalency Calculator Attempt
Ian Williams: An updated formula for marathon-running success
This is another attempt to optimize the marathon portion of the race equivalency calculation. There's little doubt that the 5k, 10k, and HM relationships are strong. But the M sits atop the mountain, with a difficulty unlike the others when it comes to properly estimating finish time from previously completed races at other distances. Why is it even important to have a good race equivalency going into race day? Well, running a marathon can literally come down to a few seconds per mile between a best performance and a literal blow-up. It all comes down to the physiological difference between the M and the other races. Once you pass a certain threshold, the ticking time bomb that is pace will start counting down. And unless you pace perfectly, things can go haywire quickly. So a good race equivalency or honest assessment of race day goal pace can be extremely beneficial. The classic formula used in most online calculators is Peter Riegel's formula:
M = HM x 2^1.06
Which means your M time is about 2.08 times your HM time.
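As a quick sketch of that arithmetic (the function names here are my own, not from any calculator), the formula can be written so the exponent is a parameter, defaulting to the classic 1.06:

```python
def predict_marathon_seconds(hm_seconds: float, r: float = 1.06) -> float:
    """Predict a marathon time from a half marathon time: M = HM x 2^R."""
    return hm_seconds * 2 ** r


def format_hms(total_seconds: float) -> str:
    """Render a duration in seconds as h:mm:ss, rounded to the nearest second."""
    total = round(total_seconds)
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


hm = 2 * 3600  # a 2:00:00 half marathon, in seconds
print(format_hms(predict_marathon_seconds(hm)))  # a little over 4:10
```

Keeping R as an argument makes it easy to swap in a different exponent without touching the rest of the calculation.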
I've previously reviewed a new-age calculation from Vickers (LINK TO COME).
So let's dive right into Ian Williams's attempt at adjusting the classic marathon race equivalency calculator.
Sample size - 1071 different HM to M relationships. Good, but about half the size of Vickers's data set (although Vickers used 5k, 10k, and HM performances). Williams did cut the data set to runners who had completed at least 5 HMs and Ms, thus more experienced runners who knew what they were getting themselves into.
Sample collection - An internet "logging system" open to anyone using fetcheveryone.com to find participants. The article does not speak to potential issues of representativeness and selection bias. I'm not terribly concerned about the selection bias. There is literally no data as to whether this data set resembles a normal population set (male/female/age/training history/representative finishing times, etc.). I have reason to believe that the majority of Williams's data set is from runners at a 2:00 half marathon or faster (based on the displayed data and the groups he chooses to display). The male median time in this study was UNK for the marathon versus 4:11 for the NYC marathon and 4:16 for Running in the USA. The female median time in this study was UNK, versus 4:38 in NYC and 4:41 in Running in the USA. So my best guess from what I can surmise is that while the median national time is close to 4:16-4:41 in the US, very little of this data set (if any) was based on runners around or slower than the national average.
I can't tell initially from the article whether the data is logged daily or just once at the end. That would call into question the chance for error. More measurements would reduce the chance for error. If you've got the entire data set (like a Strava history), then everything is there. But if the dataset Williams used relied solely on self-reporting, then it could make for a much higher chance for error.
Also, I can't tell if this is recent HM vs recent M. Or if it is PR HM vs recent M.
Alright, so let's dive in!
As previously stated, Riegel's formula is:
M = HM x 2^R
Williams sets out to redefine R with a new value that makes the calculator more accurate for more people.
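To see what "redefining R" means for a single runner, the formula can be inverted: given one HM time and one M time, R is the base-2 logarithm of their ratio. This is just a rearrangement of M = HM x 2^R; the function name is my own, and I'm not claiming this is how Williams computed his values.

```python
import math


def implied_exponent(hm_seconds: float, marathon_seconds: float) -> float:
    """Solve M = HM x 2^R for R, i.e. R = log2(M / HM)."""
    return math.log2(marathon_seconds / hm_seconds)


hm = 2 * 3600                 # 2:00:00 half marathon
m = 4 * 3600 + 26 * 60 + 18   # 4:26:18 marathon
print(round(implied_exponent(hm, m), 2))
```

A runner whose M is exactly double their HM has R = 1.00, and every bit of slowdown beyond that pushes R higher.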
Williams starts by using his dataset of 1071 runners to define the relationship between their HM and M performances.
The very first thing that sticks out to me: no y-axis defined. What exactly am I looking at here? It would appear to be a histogram or distribution plot of the HM-to-M relationship of the 1071 runners. 1.06 represents the current Riegel value. Williams proposes that 1.15 is a better R value since it falls closer to the middle. I would not deny that based on the graph. It certainly appears that 1.15 falls much closer to the middle than 1.06. And if being conservative on pacing for the marathon is an important variable (which I believe it is), then being on the slower side for predicting won't prevent a great marathon performance (because you can negative split the back half of the race). But I wasn't satisfied having no y-axis. So I made one for him:
I actually used Photoshop to measure the height of each of his bars. Then I assumed the displayed data represented the whole 1071 runners, which may or may not be the case. I don't believe anyone is faster than 1.01, but slower than 1.30 is certainly possible. Although, I certainly don't know. I feel relatively confident because the total height of the bars added together was 49.79, very close to a whole number of 50. That means I could calculate the number of runners per bar:
So when I look back at the 1.01 bar, it really represents 0.25% of the population or a guess of 2.7 runners. Makes sense. Only 3 runners out of 1071 were able to hit a 1.01 R value. So, does my data extraction work? Well, Williams states in the article that less than 5% of the runners had an R of 1.06. His other linked article says 49 total runners at 1.06 or less. That jibes closely with what I've got. Remember, my bars are labeled by a single value, so the 1.06 bar probably really means 1.055 to 1.064. So the numbers will be off slightly, but not terribly. So keep in mind when the data set talks about runners at exactly 1.06, it's really only talking about 29 total runners. A much, much, much smaller data set suddenly.
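The height-to-headcount conversion described above can be sketched roughly like this. The bar heights in the example are made-up placeholders, not the measurements taken from the graph, and the whole approach rests on the stated assumption that the displayed bars cover all 1071 runners.

```python
TOTAL_RUNNERS = 1071  # dataset size stated in the article


def runners_per_bar(bar_heights, total_runners=TOTAL_RUNNERS):
    """Scale measured bar heights so that together they account for every runner."""
    total_height = sum(bar_heights.values())
    return {r: height / total_height * total_runners
            for r, height in bar_heights.items()}


# Hypothetical heights in arbitrary units, for illustration only.
example_heights = {1.05: 2.0, 1.06: 3.0, 1.07: 5.0}
counts = runners_per_bar(example_heights)
print(counts)

# The one share quoted in the text: a bar holding 0.25% of the runners.
print(round(0.0025 * TOTAL_RUNNERS, 1))  # about 2.7 runners
```

Because each bar is expressed as a share of the total height, the absolute unit of measurement (pixels, millimeters) doesn't matter.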
But what does that mean in actual time conversions?
So for example, someone with an R of 1.01 and a HM time of 2:00:00 was able to run a M in 4:01:40. For someone with 5 HM/Ms behind them to run a virtually identical pace in their HM and M is astounding. Almost too astounding... That brings up another question about the dataset. The relationship between HM and M can't be viewed in a vacuum. There are variables of race day that matter so much for performance. Race crowding, elevation, and weather, just to name a few. If someone is running an uphill HM in hot weather in 2:00:00 and then a downhill cold weather M in 4:01:40, then the data starts making more sense. Regardless, it's another reason to cast doubt on this. Vickers did a better job attempting to correct for this. And since Vickers is such a great guy and released his dataset to the public, we can map Vickers's dataset in the same manner as Williams's. Vickers has a total of 862 runners in his dataset (including what I believe is a slower median population, meaning it is more representative of the US population of marathon runners) that have HM and M races run in matching conditions (and if not matching, then an adjustment was used).
Hooray! I'd say for the most part the datasets follow a similar trend. Not the same, but similar.
So the initial conclusion was that 1.15 is a better predictor R for HM to M than 1.06. It does split the middle of the data set (with 47% on both sides). So better. Williams's dataset says the midpoint is 1.15 with a 25-75% range of 1.10 to 1.19, and Vickers's dataset says the midpoint is 1.13 with a 25-75% range of 1.09 to 1.17.
So for a 2:00 HM runner, what does that mean?
Riegel - traditional calculator (1.06) = M of 4:10:12
Williams - 1.15 = M of 4:26:18 (range of 4:17-4:33)
Vickers - 1.13 = M of 4:22:38 (range of 4:15-4:30)
Since you are likely to see a better performance in the marathon with a conservative start, this new value of around 1.13-1.15 looks good to me. Slower is better at the beginning so you can leave some room for error in the second half of the race. Go out too fast in the beginning and the risk of blowing up is much much higher.
The problems start to arise when he starts to parse the data apart to draw other conclusions about the training in general that leads to performance.
Does gender matter?
Matches what I've read before. Women are better pacers during a marathon (more even or negative splits, meaning faster at the end, and fewer positive splits), are hypothesized to be better at burning fat than men, and are hypothesized to be better at dissipating heat than men. So if a woman and a man have equal HM times going into the M, the woman is more often than not going to beat the man.
So I agree with the conclusion.
Are faster runners better?
The bottom grey line represents the top 10% of runners with that HM time in each subset of data. So Williams pieced apart the dataset into secondary pools with HM times of 1:20, 1:25, 1:30, 1:35, 1:40, etc. Given the relative smoothness of the line, we can tell this is the case. Remembering back, there are only 67 total runners with a 1.06 or less in the dataset of 1071. There are only 256 with a 1.10 or less. There appear to be 9 subsets of data. As would make sense, there are likely fewer runners in the dataset at 1:20-1:30 than there are at 1:50-2:00 (if this dataset is anything like a normal population of HM runners). So the data at the beginning of the line is probably based on very few runners.
The first thing that jumps out to me is that the relationship between HM time and R (for M) is pretty equal for the top 10% across all HM times. A 1:20 10% runner is around 1.06, but so is a 1:55 runner. And the difference between the two is quite small anywhere in-between.
So the variation of the mean is not coming from the top 10% becoming worse converters, but the bottom portion of the population as the HM time slows are getting worse at being converters. So the question would follow, what are the top 10% runners doing that are all near 1.06 across all HM times that the bottom 10% are not? Seems to suggest that regardless of HM time you can be a good converter if you're doing the right things in training. And those in the slower HM times tend to have more runners doing the wrong thing in training (hence bad converters).
What about training mileage?
So per Williams this graph is the "typical" amount of miles run by experienced marathon runners (not their first) going for a PR marathon attempt. This does not have to be the same dataset he used to create the previous graph, but rather a measuring stick he created. So this original dataset doesn't have to be correlated with success in any way or being a good converter.
So the graph on the surface tells a story that most of us know. The people with faster marathon finishing times run more miles. But you know me, I don't like to look at miles, I like duration. So if I were to standardize these mileages across each subset by either Marathon Pace or EB pace (which tends to be the average pace I schedule runners at or 1.12 times slower than MP), then what does the dataset look like?
A 2:20 runner runs 1200 miles in 16 weeks. The MP of 2:20 is a 5:21 min/mile. If the 2:20 runner were to average MP for the 16 weeks of training, then they would do 6:40 hours of training per week (or 106 hours total). If we instead used EB, then the 2:20 runner averages 7:28 hours per week. The 2:20 is clearly the outlier, because look at the other subsets of data. The 2:40, 3:00, 3:20, 3:40, 4:00, 4:20, and 4:40 all run about 5:00 hours (if at MP) or 5:30 hours (if at EB) per week. So on the surface the 2:40 to 4:40 runners would appear different, but when taking into account their relative training pace, they're all actually very similar. This comes down to training load and is why I like to evaluate training plans by time more so than mileage. Two runners doing 80% of their training at easy pace with 9 hours of total running per week will be reaping similar training benefits regardless of whether one runs a 2:20 M and the other a 4:40 M.
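The mileage-to-hours conversion above is simple enough to sketch. The names are my own, and I'm assuming a 26.2 mile marathon and the 1.12 EB multiplier mentioned earlier; small rounding differences from the figures in the text are expected.

```python
MARATHON_MILES = 26.2  # assumed marathon distance in miles
EB_FACTOR = 1.12       # EB pace taken as 1.12 times slower than MP


def weekly_hours(marathon_minutes, total_miles, weeks=16, pace_factor=1.0):
    """Average weekly training hours if every mile is run at MP x pace_factor."""
    mp_min_per_mile = marathon_minutes / MARATHON_MILES
    total_minutes = total_miles * mp_min_per_mile * pace_factor
    return total_minutes / 60 / weeks


# 2:20 marathoner (140 minutes) running 1200 miles in 16 weeks.
print(round(weekly_hours(140, 1200), 2))                         # at MP, roughly 6.7 h
print(round(weekly_hours(140, 1200, pace_factor=EB_FACTOR), 2))  # at EB, roughly 7.5 h
```

Plugging in the other finish-time groups and their mileages is how the "about 5:00 to 5:30 hours per week" comparison falls out.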
For reference, the marathon training plans I write tend to be in the 7-8 hours average range for 16 weeks. So my plans are like the outliers in the 2:20 M time group.
This is a hard graph for me to interpret. Based on the shape and description, I believe this is a cumulative graph, meaning that once a runner has been passed in the data set, that runner continues to be counted. So the 1.06 point means that 12% of runners who have sufficient mileage achieve a 1.06 OR LESS, and the 1.15 point means that 60% of runners with sufficient mileage achieve a 1.15 OR LESS. The alternative reading would be that, of the runners who achieved R = 1.15, 60% had sufficient mileage. I don't believe that is right: the graph does not go down EVER, and under that reading the insufficient and sufficient values on the graph would always have to add up to 100%.
Here's where the interpretation of the graph gets tricky for me. Going back up to the original dataset, there are 580 runners who achieved a 1.15 or better (or 54.19% of the dataset). A total of 60% of runners with sufficient mileage ran 1.15 or better. So the sufficient mileage group and the total group are at 60% vs 54.2%. Seems to me these are not very far off from each other. Using this information, I should be able to calculate the number of runners in the 1071 dataset with sufficient mileage and insufficient mileage. I'll spare you the math, but it comes down to 820 runners with sufficient mileage and 251 runners with insufficient mileage. That allows a 60% success rate in sufficient and a 35% success rate in insufficient while maintaining a total of 580 runners in the total dataset.
So going back to 1.06 then, we have 67 total runners at 1.06 OR LESS. From the graph, approximately 12% of the sufficient group hits a 1.06 vs ~5% for insufficient. So, how does that look in the raw numbers? Well that's where I can't make sense of it. If 12% of 820 runners are successful at 1.06 OR LESS, then I've got 98.4 runners. But only 67 runners in the whole data set were successful at 1.06 or LESS. So my original interpretation can't be right, can it? Therefore, I'm confused on this one.
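For anyone who wants the skipped math, here is a rough sketch of it under the cumulative reading of the graph. The function name is mine, and the 60%, 35%, 12%, and 5% rates are the approximate values read off the graph as quoted above.

```python
def split_by_mileage(total_runners, total_successes, rate_sufficient, rate_insufficient):
    """Solve s*rate_sufficient + (N - s)*rate_insufficient = successes for s."""
    sufficient = (total_successes - rate_insufficient * total_runners) / (
        rate_sufficient - rate_insufficient
    )
    return sufficient, total_runners - sufficient


# 580 of 1071 runners at 1.15 or better; 60% vs 35% success by mileage group.
sufficient, insufficient = split_by_mileage(1071, 580, 0.60, 0.35)
print(round(sufficient), round(insufficient))

# Consistency check at 1.06 OR LESS, using the rounded 820 / 251 split.
implied = 0.12 * 820 + 0.05 * 251
print(round(implied, 1), "implied vs 67 counted")
```

The sufficient group alone implies more runners at 1.06 or less than the whole dataset contains, which is exactly the mismatch that makes the graph hard to read.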
I believe the basic premise is correct, those who run more tend to be more successful. But I can't figure out how to interpret this graph.
What about long runs?
A common consideration for marathon training plans is the long run.
Can't say I've ever heard of the 5L = 100 mile rule of thumb, where it's a good sign if the 5 longest runs in a 16 week plan sum to over 100 miles. Again, I standardized this information by time:
If a 2:20 runner does 110 miles, then they are averaging 22 miles per Long Run. If the pace is MP, then they are doing it in 1:57:33. If the pace is LR pace (roughly 8% slower than MP), then its duration is 2:07. So the faster runners tend to spend less total time on their longest runs over the course of the plan. Sounds about right to me. I'm of the mindset that the cutoff should be around 2:30 for a training run duration limit at LR pace. Seems like the runners doing 2:20-3:00 marathon times are in that range. And many of the runners doing up to 2:45+ are in the 3:20 or slower M time range. So faster runners are spending less total time in any single training run.
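The same standardization for a single long run can be sketched as follows. The helper name is my own, and I'm assuming a 26.2 mile marathon and LR pace at 1.08 times MP, per the "roughly 8% slower" note.

```python
MARATHON_MILES = 26.2  # assumed marathon distance in miles
LR_FACTOR = 1.08       # LR pace taken as roughly 8% slower than MP


def long_run_minutes(marathon_minutes, run_miles, pace_factor=1.0):
    """Duration of a run of run_miles at MP x pace_factor, in minutes."""
    return run_miles * marathon_minutes / MARATHON_MILES * pace_factor


avg_long_run = 110 / 5  # 5L of 110 miles means a 22 mile average
print(round(long_run_minutes(140, avg_long_run), 1))             # at MP
print(round(long_run_minutes(140, avg_long_run, LR_FACTOR), 1))  # at LR pace
```

For the 2:20 group this lands just under two hours at MP and just over two hours at LR pace, comfortably inside a 2:30 cap.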
So where the amount of time spent training was near equal across the board, the same doesn't appear for the 5L evaluation. On a training plan like mine, where does 5L typically fall?
Since it's based on time, I've got the MP and LR paces for different paced M finish times. I then calculated the peak of training as 2:30 duration limit. So a 3:20 runner will max at 18 miles and a 4:40 runner at 13 miles. Then, I like to hit peak only twice during a plan and then reduce every previous "high" week by one mile. So for me, a 2:20 runner would be doing 124 miles as 5L (higher than the 110 from Williams dataset) and a 4:00 runner would do 70 miles (or far lower than the 95 miles in Williams dataset).
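A rough reconstruction of that scheme is below. The step-down pattern (peak, peak, then one mile less for each earlier high week) is my reading of the description above, and the names and the 26.2 mile distance are assumptions.

```python
MARATHON_MILES = 26.2   # assumed marathon distance in miles
LR_FACTOR = 1.08        # LR pace taken as roughly 8% slower than MP
LONG_RUN_CAP_MIN = 150  # 2:30 duration limit


def peak_long_run_miles(marathon_minutes):
    """Distance covered in 2:30 at LR pace."""
    lr_pace = marathon_minutes / MARATHON_MILES * LR_FACTOR
    return LONG_RUN_CAP_MIN / lr_pace


def five_longest_miles(marathon_minutes):
    """5L assuming two peak runs, then runs 1, 2, and 3 miles shorter."""
    peak = peak_long_run_miles(marathon_minutes)
    return sum(peak - drop for drop in (0, 0, 1, 2, 3))


for label, minutes in (("2:20", 140), ("3:20", 200), ("4:00", 240), ("4:40", 280)):
    print(label, round(peak_long_run_miles(minutes)), round(five_longest_miles(minutes)))
```

Under these assumptions the 2:20 runner comes out near 124 miles of 5L and the 4:00 runner near 70, in line with the figures quoted above.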
Now, since all of the runners' 5Ls are pooled together, I can't evaluate this graph by duration. But I can point out something troubling to me. The grey lines again represent the top and bottom 10%. I already showed reasonably well that my assumed dataset matched Williams's graphed dataset. Yet I estimate he has maybe 10 to 11 total runners out of 1071 at or above 1.30 R. This graph shows the bottom 10% of the 85, 90, 95, 100, and 110 groups at or higher than 1.30. How can that possibly be when there are only 10 to 11 runners in this area? Another new dataset? Confused again.
What I do get from this graph is that a difference of 85 (17 mile avg) vs 100 (20 mile avg) yields an R difference of 1.15 vs 1.21. For a 2:00 HM runner, that's 4:26 vs 4:37 (4% diff). Not an insignificant difference, but not as big a difference as the "are faster runners better" difference which was more like 1.10 vs 1.20 from faster runners to slower runners (5-7% difference). So something other than 5L plays a bigger role in predicting good converters vs bad ones.
So I can take this graph one step further. Williams gives data on 16 week training mileage and 5L from 16 weeks. Which means I can calculate his subset data's % by Marathon time.
The 2:20 runners had a 5L of 110 and a 16 week total of 1200 miles. Therefore, their %5L of total was 9.2%. So not only are the better converters around 10% of total, but so are the faster runners. It's possible then to think that if one were to train like a faster runner/better converter, they could achieve a lower R (and a better M time relative to HM performance). So balance is important. I preach that a ton. It's not the total mileage of the 5L that matters nearly as much as the % of the total plan that the 5L makes up. So spend less time on the long run, and more time training during the week.
So where do my plans fall?
As covered previously, a 5L for me for a 3:00 runner will be around 95 miles. They'll do about 7 hours of training on average regardless of current fitness level. Their pace will be around EB (1.12x slower than MP) as an average for the plan. Therefore, we can calculate the average mileage and total mileage for each subset underneath my scheme. This comes out to a nearly identical 11% 5L as a % of the total training mileage across the board. So my plans are closer to the R values of 1.06-1.07 (or my training plans are better representative of runners who tend to get faster M times relative to their HM times).
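Here is a hedged sketch of how that percentage can be checked. It assumes 7 hours per week for 16 weeks at EB pace, a 2:30 long run cap at LR pace, and a 5L made of two peak runs plus runs 1, 2, and 3 miles shorter; the step-down pattern and all names are my own reading, not a published method.

```python
MARATHON_MILES = 26.2   # assumed marathon distance in miles
LR_FACTOR = 1.08        # LR pace relative to MP
EB_FACTOR = 1.12        # EB pace relative to MP
LONG_RUN_CAP_MIN = 150  # 2:30 long run limit
WEEKLY_HOURS = 7        # assumed average training time per week
WEEKS = 16


def five_longest_share(marathon_minutes):
    """5L as a fraction of total plan mileage under the assumptions above."""
    mp = marathon_minutes / MARATHON_MILES
    peak = LONG_RUN_CAP_MIN / (mp * LR_FACTOR)
    five_longest = sum(peak - drop for drop in (0, 0, 1, 2, 3))
    total_miles = WEEKLY_HOURS * 60 * WEEKS / (mp * EB_FACTOR)
    return five_longest / total_miles


for label, minutes in (("2:20", 140), ("3:00", 180), ("4:00", 240), ("4:40", 280)):
    print(label, f"{five_longest_share(minutes):.1%}")
```

Because both the long runs and the weekly total are set by time rather than distance, the share barely moves from one finish-time group to the next.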
What about training pace?
In my book, it's pretty darn important. Pace matters more than mileage, because to me mileage is just a function of time and pace spent training.
Unsurprising to me, runners at the faster paces actually train far slower than final race pace. I harp on this all the time. It suggests that if someone were to slow down in training, they too might yield better race results (or be faster).
An interesting graph. I interpret this to mean that until your average is about 40 seconds slower than race pace, you are more likely to run slower than a 1.15 conversion than you are to run faster than it. Those who run too fast in training tend to be the ones who run worse relative performances against their HM times. So, train slower! Sure seems like somewhere between 40-70 seconds is a sweet spot. There aren't actually that many runners at 80+ seconds, but those who are there are pretty successful, relatively, at achieving a less than 1.15 R value.
So what about my plans?
According to Williams, runners with a race pace of 6:00 tend to run on average 72 seconds slower. So they'd be doing about a 7:12 average. Those at 8:00 run 35 seconds slower, at 8:35. For me, my training plans nearly always average out to EB, which is 1.12x slower than MP. So a 6:00 runner would average 6:43 and an 8:00 runner 8:58. So my time differential across the board falls between 40-72 seconds. Going back to the graph Williams presented, that just so happens to be the sweet spot for beating the R of 1.15 (or being a better converter and achieving a faster M relative to HM performance).
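The EB pace gap is a one-line calculation; a small sketch, with names of my own choosing and the 1.12 multiplier taken from the text:

```python
EB_FACTOR = 1.12  # EB pace taken as 1.12 times slower than MP


def eb_gap_seconds(race_pace_seconds, eb_factor=EB_FACTOR):
    """Seconds per mile slower than race pace when training at EB."""
    return race_pace_seconds * (eb_factor - 1)


def format_pace(seconds_per_mile):
    """Render a pace in seconds per mile as m:ss."""
    total = round(seconds_per_mile)
    return f"{total // 60}:{total % 60:02d}"


for race_pace in (6 * 60, 8 * 60):  # 6:00 and 8:00 per mile
    gap = eb_gap_seconds(race_pace)
    print(format_pace(race_pace), "->", format_pace(race_pace + gap), f"(+{gap:.0f} s)")
```

Since the gap is a fixed percentage of race pace, it grows with slower paces: roughly 43 seconds at 6:00 and 58 seconds at 8:00.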
On being rested
And then there's this graph....
This appears to be saying that peak mileage is reached in "x" week of the 16 weeks. Not surprising to see that the tallest bar is 13 weeks of the traditional taper (3 weeks out). Using the y-axis I can determine how many runners peak at either 13, 14, or 15 weeks of the training plan. It is about 170. The total dataset is 1071 runners. Problem is, when I run through the numbers I only get ~680 total runners, not 1071 runners. Where did the rest of the data set go???
But even ignoring that, the alarming part is this. The traditional taper is 3 weeks. Some do 4 weeks and others 2 weeks. But in this specific dataset there are a huge (roughly 65%) number of runners doing the taper at 5 weeks out or MORE??? Some hitting highest mileage week in Week 1? And not just a few people, but 3% of this graph's population. That seems astoundingly high. Maybe they did 10 miles every week for 16 weeks and thus hit their max mileage in week 1, but that seems odd to me from a dataset standpoint.
The conclusions we can draw from this: