The Hidden Truth About Garmin and Coros: VO2 Max Accuracy That Will Shock You
By NovumWorld Editorial Team

Executive Summary
- Garmin’s VO2 Max estimation accuracy can plummet to 91% for users who underestimate their maximum heart rate by just 15 bpm, demonstrating that the algorithm is only as good as the biometric input it receives.
- A recent study by Engel et al. (2025) found that the Garmin Forerunner 245 consistently underestimated VO2 max in highly trained athletes by approximately 6.3 ml/kg/min, exposing a critical failure point in the regression models used by major tech companies.
- Smartwatch-based lactate threshold estimates, while convenient, lack the granularity of blood analysis and should be viewed as rough approximations rather than prescriptive training targets.
The quantified-self movement has sold millions of athletes a comforting lie: that a $400 computer on your wrist can replicate the physiological rigor of a laboratory. The reality is that the gap between consumer wearables and gold-standard metabolic testing is widening, not shrinking, despite the aggressive marketing claims from Silicon Valley. While companies like Garmin and Coros battle for market share with increasingly complex algorithms, the fundamental physiology they attempt to capture remains stubbornly resistant to simplification.
The Garmin vs. Coros Showdown: Who Really Gets VO2 Max Right?
The market rivalry between Garmin and Coros is less about hardware and more about a war of proprietary algorithms, both claiming to solve the impossible equation of estimating maximal oxygen uptake without gas analysis. Garmin, leveraging its acquisition of Firstbeat Analytics, has long dominated the narrative with its “Performance Condition” and VO2 max metrics, often citing a study on the Garmin Fenix 6 that reported a 95% accuracy rate with a margin of error less than 3.5 ml/kg/min. This statistic is frequently parroted in running forums as proof of wrist-based infallibility, yet it obscures the specific conditions under which this accuracy was achieved: controlled, steady-state efforts with verified heart rate straps.
Coros, the challenger in this space, argues that their error margins are lower, particularly for ultra-distance runners, but they face the same insurmountable physics problem: they cannot measure oxygen consumption. Joe Heikes, a Garmin Product Manager, admitted that the algorithm relies entirely on correlation, using inputs like age, sex, estimated maximum heart rate, height, and weight as the foundation for the calculation. The mechanism is purely mathematical; the watch observes your pace and your heart rate, then compares that relationship to a massive dataset of lab-tested athletes to guess where you fall on the fitness spectrum.
This creates a “black box” problem where the user has no visibility into the regression coefficients. If the watch assumes a standard age-based decline in maximum heart rate, but your true maximum sits well above or below that curve, the VO2 max estimation will be fundamentally flawed. The validity of wrist-worn trackers is frequently called into question because these devices are essentially guessing your physiology based on how you compare to the average population, rather than measuring your individual metabolic response.
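To make the correlation approach concrete, here is a minimal sketch of one textbook way to turn pace and heart rate into a VO2 max guess. It is not Garmin's or Coros's method, which is proprietary. It assumes the published ACSM running equation for the oxygen cost of a given speed, a straight-line relationship between heart rate and oxygen cost, and a known maximum heart rate. The function names and sample values are invented for illustration.

```python
def running_vo2_demand(speed_m_per_min, grade=0.0):
    """ACSM running equation: estimated oxygen cost in ml/kg/min."""
    return 0.2 * speed_m_per_min + 0.9 * speed_m_per_min * grade + 3.5


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_vo2max(samples, hr_max):
    """Extrapolate the heart rate vs. oxygen cost line up to hr_max.

    samples: list of (heart_rate_bpm, speed_m_per_min) pairs taken from
    steady-state running on flat ground (an assumption of this sketch).
    """
    heart_rates = [hr for hr, _ in samples]
    oxygen_costs = [running_vo2_demand(speed) for _, speed in samples]
    slope, intercept = fit_line(heart_rates, oxygen_costs)
    return slope * hr_max + intercept


# Hypothetical steady-state samples: (heart rate in bpm, speed in m/min).
SAMPLES = [(130, 160), (145, 190), (160, 220)]

print(round(estimate_vo2max(SAMPLES, hr_max=185), 1))  # 57.5
```

Every part of the result depends on `hr_max` and on the population-level equation for oxygen cost, neither of which the watch measures on the individual wearing it.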
The Overlooked Impact of Heart Rate on VO2 Max Accuracy
The single greatest point of failure in the VO2 max estimation chain is not the GPS tracking or the pace calculation, but the optical heart rate sensor. Garmin’s algorithm is heavily dependent on the assumption that the heart rate data fed into it is an accurate reflection of cardiac strain. Research indicates that Garmin VO2 max accuracy can drop to 91% for runners who underestimate their max heart rate by 15 bpm and to 93% for those who overestimate it by the same amount. This sensitivity turns a simple setting in your user profile into a massive variable that can distort your fitness score by several points.
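A toy calculation shows why the maximum heart rate setting matters so much. The sketch below assumes a simple linear extrapolation from heart rate to oxygen uptake. The slope, intercept, and 185 bpm true maximum are hypothetical values, not figures from the research cited above, so it will not reproduce the 91% and 93% results exactly.

```python
def extrapolate_vo2max(slope, intercept, hr_max):
    """Read the assumed heart rate vs. oxygen uptake line at hr_max."""
    return slope * hr_max + intercept


def hr_max_error_effect(slope, intercept, true_hr_max, error_bpm):
    """Return (shift in ml/kg/min, shift in percent) caused by a wrong hr_max."""
    reference = extrapolate_vo2max(slope, intercept, true_hr_max)
    biased = extrapolate_vo2max(slope, intercept, true_hr_max + error_bpm)
    shift = biased - reference
    return shift, 100.0 * shift / reference


# Hypothetical runner: line coefficients and true maximum are made up.
SLOPE, INTERCEPT, TRUE_HR_MAX = 0.4, -16.5, 185

for error in (-15, 15):
    shift, percent = hr_max_error_effect(SLOPE, INTERCEPT, TRUE_HR_MAX, error)
    print(f"hr_max off by {error:+d} bpm -> {shift:+.1f} ml/kg/min ({percent:+.1f}%)")
```

With these made-up coefficients, a 15 bpm error in the profile moves the estimate by 6 ml/kg/min, roughly 10%, without a single change in the runner's actual fitness.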
Heather Milton, an exercise physiologist at NYU Langone’s Sports Performance Center, emphasizes that tracking VO2 max over time is useful for determining training effectiveness, but she implicitly acknowledges the fragility of the data inputs. The mechanism here is straightforward: oxygen uptake is the product of cardiac output and arterial-venous oxygen difference, and cardiac output is itself heart rate multiplied by stroke volume. The watch can measure neither stroke volume nor oxygen extraction, so heart rate is the only term it actually observes. If the optical sensor loses lock during high-intensity intervals, a common occurrence due to motion artifact and poor perfusion, the algorithm receives corrupted data.
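Written out, the relationship described above is the Fick principle. The symbols are standard physiology notation rather than anything taken from Garmin's documentation: Q is cardiac output, HR is heart rate, SV is stroke volume, and the bracketed term is the arterial-venous oxygen difference.

```latex
\dot{V}O_2 = Q \times (C_aO_2 - C_vO_2), \qquad Q = HR \times SV
\quad\Longrightarrow\quad
\dot{V}O_2 = HR \times SV \times (C_aO_2 - C_vO_2)
```

Of the three factors on the right-hand side, a wrist sensor observes only HR. The other two have to be inferred from population averages.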
Wrist-based photoplethysmography (PPG) shines light into the skin to detect blood volume changes, but this technology is easily disrupted by the mechanical vibrations of running. During interval sessions, where heart rate rises and falls quickly, the lag in optical readings can be significant. This lag pairs each pace with the wrong heart rate: a reading that trails behind at the start of a hard repetition makes the effort look easier than it is and inflates the estimate, while a reading that stays elevated into the recovery jog makes the runner look less fit and deflates it. The accuracy of smartwatches in predicting performance is therefore capped by the physical limitations of optical sensors, which cannot yet match the fidelity of a chest strap.
The Lactate Threshold Debate: What Experts Aren’t Telling You
Beyond VO2 max, both Garmin and Coros have ventured into the treacherous territory of estimating lactate threshold (LT), a metric that is notoriously difficult to pinpoint without invasive blood sampling. Garmin devices provide lactate threshold estimates that show “acceptable agreement” in some studies (MAPE = 7.52%), but this statistical acceptance masks a dangerous potential for misuse. Nattai Borges of Central Queensland University highlighted the use of lactate threshold to dictate training practices, noting that these figures should be interpreted alongside perceived exertion.
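For readers unfamiliar with the statistic, mean absolute percentage error (MAPE) averages the size of each error relative to the reference value and discards its sign. The sketch below shows the calculation on made-up lab and watch values. It assumes the compared quantity is threshold heart rate in bpm, which may not match the metric used in the studies mentioned above.

```python
def mape(reference, estimates):
    """Mean absolute percentage error of estimates against reference values."""
    if not reference or len(reference) != len(estimates):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(est - ref) / abs(ref) for ref, est in zip(reference, estimates))
    return 100.0 * total / len(reference)


def mean_signed_error(reference, estimates):
    """Average error with its sign kept, to show the direction of any bias."""
    return sum(est - ref for ref, est in zip(reference, estimates)) / len(reference)


# Hypothetical lactate threshold heart rates (bpm) for four runners.
lab_values = [160, 170, 150, 180]
watch_values = [168, 161, 156, 171]

print(f"MAPE: {mape(lab_values, watch_values):.2f}%")                       # 4.82%
print(f"Mean signed error: {mean_signed_error(lab_values, watch_values):+.1f} bpm")  # -1.0 bpm
```

In this invented sample the average signed error is only 1 bpm, yet individual runners are off by 6 to 9 bpm. A group-level statistic can look acceptable while the number on any one wrist is not.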
The mechanism of lactate threshold estimation involves analyzing the relationship between heart rate and pace, looking for a non-linear inflection point where the body begins to accumulate lactate faster than it can clear it. However, this physiological inflection point is often subtle and influenced by factors like hydration, glycogen status, and fatigue—variables the watch cannot detect. The algorithm is essentially looking for a pattern in the noise of heart rate drift.
Alicia Dodd from My Vital Metrics states that Garmin cannot directly measure VO2 Max or lactate threshold and instead estimates them using algorithms based on correlations. This distinction is vital: correlation is not measurement. The watch might guess your lactate threshold correctly because you happen to fit the average profile, but it has no way of knowing if you are a “responder” or “non-responder” to specific training stimuli. Relying on these estimates to set precise training zones is a trap that can lead to overtraining or undertraining, as the watch lacks the biochemical context to define the true boundary between aerobic and anaerobic effort.
The Performance Pitfalls: Real-World Limitations of Smartwatch Metrics
The marketing gloss of “smart coaching” often obscures the raw data limitations inherent in wrist-worn devices. A study assessing the accuracy of smartwatch-based estimation found that while devices like the Apple Watch Series 7 showed promise, they still exhibited mean absolute percentage errors exceeding 15% in certain cohorts. The error is not random; it is systematic. Devices tend to overestimate VO2 max in users with poor fitness levels and underestimate it in those with higher fitness levels, compressing every score toward the population average and penalizing elite performers most.
This phenomenon is particularly evident in the findings of Engel et al. (2025), who found that the Garmin Forerunner 245 consistently underestimated VO2 max in highly trained athletes by approximately 6.3 ml/kg/min. For an elite runner, 6 points of real VO2 max is the difference between a podium finish and packing up early, so an error of that size makes the estimate useless for fine-grained decisions. The algorithm is likely trained on a dataset dominated by recreational athletes, creating a model that fails to extrapolate to the physiological extremes of the elite population.
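The pattern described above, overestimation at low fitness and underestimation at high fitness, can be checked by regressing the error (watch minus lab) on the lab value: a negative slope means the error grows more negative as fitness rises. The sketch below uses invented numbers purely to show the calculation. They are not data from Engel et al. or any other study.

```python
def bias_and_trend(lab_values, watch_values):
    """Return (mean error, slope of error vs. lab value) for paired readings."""
    n = len(lab_values)
    errors = [watch - lab for lab, watch in zip(lab_values, watch_values)]
    mean_error = sum(errors) / n
    mean_lab = sum(lab_values) / n
    sxx = sum((lab - mean_lab) ** 2 for lab in lab_values)
    sxy = sum((lab - mean_lab) * (err - mean_error)
              for lab, err in zip(lab_values, errors))
    return mean_error, sxy / sxx


# Hypothetical paired VO2 max readings in ml/kg/min.
lab = [30, 40, 50, 60, 70]
watch = [34, 42, 50, 57, 64]

mean_error, slope = bias_and_trend(lab, watch)
print(f"mean error: {mean_error:+.2f} ml/kg/min")   # -0.60
print(f"error slope: {slope:+.2f} per ml/kg/min")   # -0.25
```

In this made-up sample the average error is under 1 ml/kg/min, which would look excellent in a headline, while the fittest runner is short-changed by 6. The slope, not the mean, is what exposes the problem.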
Furthermore, the reliance on GPS for pace introduces another layer of error. In urban environments or on trails with poor satellite reception, GPS drift can make the runner appear slower than they actually are. When the algorithm sees a slow pace paired with a high heart rate (due to anxiety or terrain), it assumes a low level of fitness. This “data pollution” means that your VO2 max score is as much a reflection of your running environment as it is of your cardiovascular health.
Looking Ahead: The Real Consequences of Misinterpreting VO2 Max Data
As the wearable market matures—Wearable Devices Ltd. reported revenues jumping from $82,000 in 2023 to $522,000 in 2024—the pressure to deliver actionable insights is pushing companies to make increasingly definitive claims based on probabilistic data. The danger lies in the user interface: a single number displayed prominently on the wrist, updated daily, which implies a precision that the underlying technology cannot support. Athletes who base their tapering or peak training on these fluctuating numbers are essentially letting a blindfolded algorithm drive their training bus.
The discrepancy in VO2 max estimates is not merely an academic concern; it leads to misguided training strategies. If a watch underestimates an athlete’s capacity, they may hold back in key workouts, failing to stimulate the necessary adaptations for improvement. Conversely, an overestimation can push an athlete toward overtraining, increasing the risk of injury and burnout. The Engel et al. (2025) study serves as a stark warning that for high-performance scenarios, relying on consumer-grade tech is a liability.
The industry is moving toward multi-sensor fusion, attempting to combine optical HR, GPS, and even altimeter data to refine these estimates. However, without a direct measure of oxygen uptake or blood lactate concentration, these remain approximations. The “future” promised by marketing departments, where the watch knows your body better than you do, is still constrained by the laws of physics and the limitations of regression analysis.
The Bottom Line
The evidence tilts toward Garmin having a more reliable VO2 max estimation for the general population compared to some competitors, but the systematic underestimation for elite athletes renders the metric dangerous for the very demographic that cares about it most. The technology is impressive as a trend-tracking tool, but it is woefully inadequate as a diagnostic instrument.
Athletes must stop treating the number on the screen as ground truth. The actionable protocol is simple but requires discipline: use a chest strap for every run to ensure heart rate accuracy, validate your maximum heart rate through a field test rather than relying on the “220 minus age” formula, and treat the watch’s VO2 max and lactate threshold estimates as “ballpark” figures rather than definitive training zones. For anyone serious about performance, a yearly lab test remains the only gold standard, and the smartwatch should be used to track the direction of fitness, not the magnitude.