The Headline
In the first 11 days of ABS, teams have challenged 541 called pitches across 137 games — about 3.9 challenges per game. More often than not, they were right: 55.1% of challenges were overturned, meaning the umpire's original call was wrong more than half the time a team thought it was worth challenging.
That's the surface number. The interesting story is underneath.
The Fielder Side Owns the Challenge Game
Of the 541 challenges, 294 (54%) were initiated by the fielder side — primarily catchers — and 247 (46%) by batters. The fielder side doesn't just challenge more; they win at a significantly higher rate.
The 10-percentage-point gap (59.5% vs. 49.8%) persists after controlling for count state, inning, edge distance, pitch type, and batter handedness in a logistic regression (OR = 1.48, p = 0.034). It is the only non-count feature that reaches statistical significance.
Why does the fielder side win more often? We can describe the gap but not yet explain it. Without opportunity denominators (how many challengeable pitches were NOT challenged), we cannot distinguish "catchers pick better pitches to challenge" from "catchers have more obvious pitches available" or "batters challenge less selectively." Three hypotheses about catcher judgment remain plausible but untested:
- Trajectory read: Catchers see the pitch's full trajectory and may have better spatial calibration for where it crossed the plate.
- Framing instinct: Years of framing may give catchers calibrated intuition for zone-edge location.
- Information advantage: Catchers know the intended target and can estimate the pitch's actual location relative to the zone.
Whether this is a population-level phenomenon or an individual skill cannot yet be established with 5–13 challenges per catcher. Wilson 95% confidence intervals on individual rates overlap substantially — Dingler's 7/7 spans [64.6%, 100%], O'Hoppe's 10/12 spans [55.2%, 95.3%], Quero's 5/13 spans [17.7%, 64.5%]. We're watching for skill separation as the season progresses.
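For readers who want to check the per-catcher intervals, here is a minimal sketch of the Wilson score interval using only the standard library. The function name `wilson_interval` is our own; z = 1.96 is the usual 95% critical value.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% by default)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point drift at 0/n and n/n.
    return max(0.0, center - half_width), min(1.0, center + half_width)


for name, made, total in [("Dingler", 7, 7), ("O'Hoppe", 10, 12), ("Quero", 5, 13)]:
    lo, hi = wilson_interval(made, total)
    print(f"{name}: {made}/{total} -> [{lo:.1%}, {hi:.1%}]")
```

Unlike the plain normal-approximation interval, the Wilson interval stays inside [0, 1] and does not collapse to zero width at 7/7, which is why it suits these tiny per-player samples.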
The Count Tradeoff
Count state is the strongest predictor of challenge outcomes. But the relationship creates a genuine strategic tension: the counts where challenges succeed most often are worth the least, and the counts where they're worth the most succeed least often.
At 0-0, challenges overturn 62% of the time — but the run-value swing of flipping a single ball/strike at 0-0 is only 0.070 runs. At 3-2, challenges succeed just 39% of the time, but when they work, the swing is 0.305 runs (a walk vs. a strikeout). The expected value per challenge at 3-2 (0.119) is nearly three times the EV at 0-0 (0.043).
The 3-0 count is an outlier: all 8 challenges were overturned (100%), with the highest swing value (0.230 runs). But with only 8 observations and confidence intervals spanning 68–100%, this is a suggestive data point, not a reliable rate.
The strategic implication: teams should not evaluate challenge decisions by overturn rate alone. A 39% success rate at 3-2 generates more expected value than a 62% success rate at 0-0.
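The expected-value comparison is simple arithmetic: probability of overturn times the run-value swing if overturned. The sketch below uses the 0-0 and 3-2 figures quoted above. Like the text's EV numbers, it does not charge anything for a failed challenge, such as the cost of losing one from a finite budget.

```python
# Overturn rates and swing values (runs) as quoted in the text.
COUNT_STATS = {
    "0-0": {"overturn_rate": 0.62, "swing_runs": 0.070},
    "3-2": {"overturn_rate": 0.39, "swing_runs": 0.305},
}


def expected_value(overturn_rate: float, swing_runs: float) -> float:
    """Expected run value of one challenge: P(overturn) x swing if overturned."""
    return overturn_rate * swing_runs


ev = {count: expected_value(**stats) for count, stats in COUNT_STATS.items()}
for count, value in ev.items():
    print(f"{count}: EV = {value:.3f} runs per challenge")
print(f"3-2 vs 0-0 ratio: {ev['3-2'] / ev['0-0']:.2f}")
```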
Defense Is Winning the Run-Value Transfer
Every successful challenge shifts run expectancy from one side to the other. When the fielder side overturns a ball to a strike, the defense gains; when a batter overturns a strike to a ball, the offense gains. These are opposing sides of the same transfer — challenges don't create runs, they move them.
So far, defense has captured the larger share: 20.0 runs of corrected leverage from fielder-side overturns vs. 15.3 from batter overturns, a net -4.8 run impact for offenses. This asymmetry flows directly from the fielder-side success rate advantage — they challenge more often and win more often, so they capture more of the run-value transfer.
The 3-2 count generates the most total corrected leverage (7.0 runs) despite the low overturn rate, because the swing value per overturn is so large. The 0-0 count generates the second most (4.8 runs) through sheer volume (110 challenges, 68 overturned).
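The run-value transfer can be tallied by summing the swing value of every overturn, grouped by beneficiary and by count. Below is a minimal sketch. The record keys (`side`, `count`, `overturned`) and the sample records are hypothetical, and the swing table covers only the three counts quoted in this article. As a sanity check on the 0-0 total: 68 overturns × 0.070 runs = 4.76, which rounds to the 4.8 runs reported.

```python
from collections import defaultdict

# Swing values (runs) for the counts quoted in the text; a full table would cover all 12 counts.
SWING_RUNS = {"0-0": 0.070, "3-0": 0.230, "3-2": 0.305}


def corrected_leverage(challenges):
    """Sum swing values of overturned challenges by challenger side and by count.

    Each challenge is a dict with hypothetical keys 'side' ('batter' or 'fielder'),
    'count' (e.g. '3-2'), and 'overturned' (bool).
    """
    by_side = defaultdict(float)
    by_count = defaultdict(float)
    for c in challenges:
        if not c["overturned"]:
            continue  # a failed challenge moves no run value
        swing = SWING_RUNS[c["count"]]
        by_side[c["side"]] += swing
        by_count[c["count"]] += swing
    # Offense gains from batter overturns and loses from fielder-side overturns.
    net_offense = by_side["batter"] - by_side["fielder"]
    return dict(by_side), dict(by_count), net_offense


sample = [  # hypothetical records, not real data
    {"side": "fielder", "count": "3-2", "overturned": True},
    {"side": "fielder", "count": "0-0", "overturned": True},
    {"side": "batter", "count": "0-0", "overturned": True},
    {"side": "batter", "count": "3-2", "overturned": False},
]
by_side, by_count, net = corrected_leverage(sample)
print(by_side, by_count, round(net, 3))
```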
The Late-Inning Fade
Overturn rates decline as games progress, from roughly 60% in innings 1–5 to roughly 48% in innings 8–9.
The effect does not reach conventional significance in the regression (OR = 0.93 per inning, p = 0.051). Three possible explanations, none confirmed:
- Declining selectivity: Teams may burn their clearest challenges early and challenge more marginal pitches late.
- Higher stakes, lower threshold: Late-game leverage makes teams more willing to challenge even low-probability pitches.
- Umpire focus: Umpires may concentrate more in high-leverage late situations, though we have no direct evidence for this.
More data is needed to confirm this pattern. With only 11 game dates and per-inning samples of 48–71 challenges, the noise level is high.
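To get a feel for what OR = 0.93 per inning implies, the sketch below compounds that odds ratio over eight innings. Two assumptions apply: we treat the roughly 60% early-inning rate as the inning-1 rate (the text reports it for innings 1–5 combined), and we hold every other covariate fixed. The result lands in the mid-40s, in the same neighborhood as the observed ~48% late-inning rate, though the coefficient itself is not significant.

```python
def shift_probability(p_start: float, odds_ratio: float, steps: int) -> float:
    """Apply a per-step odds ratio `steps` times to a starting probability."""
    odds = p_start / (1 - p_start)
    new_odds = odds * odds_ratio**steps
    return new_odds / (1 + new_odds)


# Assumed starting point: 60% overturn rate in inning 1; eight innings at OR = 0.93.
p_ninth = shift_probability(0.60, 0.93, 8)
print(f"Implied inning-9 overturn rate: {p_ninth:.1%}")
```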
Edge Distance Doesn't Predict Outcomes
Within the challenged-pitch sample, overturn rates are remarkably flat across edge distances: 56.8% for pitches within half an inch of the zone edge, 50.8% for pitches 1.5–2 inches away, and 57.1% for pitches 3+ inches away. The logistic regression is consistent with this (OR = 1.03 per standard deviation of edge distance, p = 0.73).
Critical caveat: this is only true conditional on a challenge being made. Edge distance almost certainly matters for umpire accuracy in general (our earlier analysis shows a cliff at 0.5 inches). But within the self-selected sample of pitches that teams chose to challenge, edge distance doesn't predict success. Challengers appear to self-select based on their own confidence rather than a simple distance threshold.
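A binned overturn-rate table like the one above can be produced with a few lines of standard-library Python. This sketch assumes edge distance is a non-negative distance in inches. The intermediate bin edges (0.5–1, 1–1.5, 2–3 inches) are our own choice, since the text reports only three of the bins. The input pairs are hypothetical.

```python
import bisect

# Hypothetical bin edges in inches; the text reports the <0.5, 1.5-2, and 3+ bins.
EDGES = [0.5, 1.0, 1.5, 2.0, 3.0]
LABELS = ["<0.5", "0.5-1", "1-1.5", "1.5-2", "2-3", "3+"]


def overturn_rate_by_edge(challenges):
    """Overturn rate per edge-distance bin.

    `challenges` is an iterable of (edge_distance_in, overturned) pairs.
    Bins are left-closed: a pitch exactly 0.5 in away falls in '0.5-1'.
    """
    made = [0] * len(LABELS)
    total = [0] * len(LABELS)
    for dist, overturned in challenges:
        i = bisect.bisect_right(EDGES, dist)  # index of the bin containing dist
        total[i] += 1
        made[i] += int(overturned)
    # Skip empty bins rather than dividing by zero.
    return {LABELS[i]: made[i] / total[i] for i in range(len(LABELS)) if total[i]}


rates = overturn_rate_by_edge([(0.2, True), (0.4, False), (1.7, True), (3.5, True)])
print(rates)
```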
What We Can't Say Yet
This analysis covers only the 541 pitches that were challenged, not the full population of called pitches. Without opportunity denominators — the count of challengeable pitches that weren't challenged — we cannot answer:
- Are teams under-challenging or over-challenging at any count?
- Is the current challenge mix optimal under a finite challenge budget?
- Would different challenge allocation strategies produce better outcomes?
We also can't yet separate individual skill from noise. In our data, all fielder-side challenges have challenging_player_type = "catcher" — catchers appear to be the ones pressing the button. But with only 5–13 challenges per catcher, the early leaders — Dillon Dingler (7/7), Logan O'Hoppe (10/12), Agustín Ramírez (7/9) — have confidence intervals that overlap heavily. A hierarchical model with shrinkage would be the right tool for skill estimation, and it needs more data than 11 days provides.
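As a simpler stand-in for a full hierarchical model, empirical-Bayes beta-binomial shrinkage shows the idea: each catcher's rate is pulled toward the fielder-side average in proportion to how little data he has. In this sketch the prior mean is the 59.5% fielder-side rate from the text. `PRIOR_STRENGTH` is a hypothetical constant that a real hierarchical model would estimate from the data.

```python
def shrunk_rate(successes: int, n: int, prior_mean: float, prior_strength: float) -> float:
    """Posterior mean under a Beta(prior_mean*k, (1-prior_mean)*k) prior, k = prior_strength."""
    alpha = prior_mean * prior_strength
    beta = (1 - prior_mean) * prior_strength
    return (successes + alpha) / (n + alpha + beta)


PRIOR_MEAN = 0.595     # fielder-side overturn rate quoted in the text
PRIOR_STRENGTH = 20.0  # hypothetical: behaves like 20 pseudo-challenges at the prior mean

for name, made, total in [("Dingler", 7, 7), ("O'Hoppe", 10, 12), ("Ramírez", 7, 9)]:
    raw = made / total
    shrunk = shrunk_rate(made, total, PRIOR_MEAN, PRIOR_STRENGTH)
    print(f"{name}: raw {raw:.1%} -> shrunk {shrunk:.1%}")
```

With a prior this strong relative to 7–12 challenges, a perfect 7/7 shrinks to about 70%, which illustrates why early leaderboards say little about skill.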
CalledThird tracks all of this nightly. As the sample grows, the signal will separate from the noise.
Methodology
Data: 541 ABS challenges scraped from the Baseball Savant gamefeed API, Mar 26 – Apr 5, 2026. Each challenge record includes the abs_challenge nested object with is_overturned, challenging_player_type, and edge_distance.
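A minimal sketch of flattening one challenge record follows. Only the field names `abs_challenge`, `is_overturned`, `challenging_player_type`, and `edge_distance` come from the text. The surrounding JSON layout, the output key names, and the sample payload are assumptions, not the documented Baseball Savant schema.

```python
import json


def parse_challenge(record: dict) -> dict:
    """Flatten one challenge record (assumed layout: fields nested under 'abs_challenge')."""
    abs_challenge = record["abs_challenge"]
    return {
        "overturned": bool(abs_challenge["is_overturned"]),
        "challenger": abs_challenge["challenging_player_type"],  # "batter" or "catcher"
        "edge_distance": float(abs_challenge["edge_distance"]),
    }


# Hypothetical payload for illustration only.
raw = '{"abs_challenge": {"is_overturned": true, "challenging_player_type": "catcher", "edge_distance": 0.4}}'
row = parse_challenge(json.loads(raw))
print(row)
```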
Challenge value: Computed using Tom Tango's RE288 count-state linear weights. The swing value at each count is the run-expectancy difference between a ball and a strike at that count. Corrected leverage = swing value of each overturn, signed by beneficiary.
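The swing-value definition can be written directly as code. In this sketch we assume the count-state run values are stored in a dict keyed by `(balls, strikes)`, with the terminal outcomes under `"walk"` and `"strikeout"`. Tango's published RE288 table is organized differently, and the toy numbers below are made up purely to exercise the functions.

```python
def swing_value(run_value: dict, balls: int, strikes: int) -> float:
    """Run-value difference between a ball and a strike at (balls, strikes).

    A ball at 3 balls is a walk; a strike at 2 strikes is a strikeout.
    """
    after_ball = run_value["walk"] if balls == 3 else run_value[(balls + 1, strikes)]
    after_strike = run_value["strikeout"] if strikes == 2 else run_value[(balls, strikes + 1)]
    return after_ball - after_strike


def signed_leverage(swing: float, beneficiary: str) -> float:
    """Corrected leverage of one overturn: positive if the batter gains, negative if the fielder side does."""
    return swing if beneficiary == "batter" else -swing


# Made-up run values, only to exercise the functions; not Tango's numbers.
toy = {(1, 0): 0.10, (0, 1): 0.03, "walk": 0.30, "strikeout": -0.20}
print(swing_value(toy, 0, 0), swing_value(toy, 3, 2))
```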
Regression: Logistic regression with 11 count-state indicator variables (0-0 reference), is_fielder_side (1 if challenger is catcher/fielder, 0 if batter), inning, edge_distance_in (standardized), is_fastball, is_righty_batter. 17 features total. Model accuracy: 60.4% vs 55.1% baseline. Full regression output saved in analysis-abs/data/regression_output.json.
Challenger identity: The Baseball Savant API field challenging_player_type returns "batter" or "catcher" for all 541 challenges in our sample. We use "fielder-side" at the topline level because MLB rules allow any defensive player to challenge, even though catchers appear to initiate all fielder-side challenges in practice.
Confidence intervals: Wilson score intervals throughout. Two-proportion z-tests for key comparisons.
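As an illustration of the two-proportion z-test, the sketch below compares fielder-side and batter overturn rates. The counts 175/294 and 123/247 are back-calculated from the reported rates; they reproduce 59.5%, 49.8%, and the 55.1% overall rate. This unadjusted test is separate from the regression, which controls for count, inning, and the other covariates.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided normal tail.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = two_proportion_z(175, 294, 123, 247)
print(f"z = {z:.2f}, p = {p:.3f}")
```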
Limitations: No opportunity denominators (only challenged pitches, not all called pitches). Selection bias from challenger self-selection. Small player-level samples (5–13 per catcher). 11 game dates. The fielder-side success rate advantage may reflect pitch selection differences rather than decision-making skill.