How We Do the Math
Full transparency on our data sources, zone model, and statistical methods. If you can't see the work, you can't trust the numbers.
Data Sources
| Source | Data | Frequency |
|---|---|---|
| Statcast (via Baseball Savant) | Pitch-level tracking: location (plate_x, plate_z), strike zone boundaries (sz_top, sz_bot), pitch type, velocity, outcome | Daily, 6h delay |
| MLB Stats API | Umpire assignments, game metadata, player identities | Real-time |
| ABS Challenge Data | Challenge usage, outcomes (when MLB publishes this data) | Per-game |
All pitch-tracking data comes from MLB's Statcast system, which uses Hawk-Eye cameras installed in all 30 MLB ballparks. Pitch location accuracy is ±0.3 inches. We access this data through the public pybaseball Python library.
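As a rough illustration of that access path, the sketch below pulls a date range with pybaseball's `statcast` function and keeps only rows usable for umpire analysis. The helper names (`fetch_pitches`, `called_pitches`, `has_location`) and the set of `description` values treated as called pitches are assumptions made for this example, not a description of our production pipeline.

```python
import math
from typing import Iterable

# Statcast `description` values treated as umpire-called pitches.
# Assumption: this set is chosen for the example only.
CALLED_DESCRIPTIONS = {"called_strike", "ball", "blocked_ball"}

# Location fields named in the Data Sources table.
LOCATION_FIELDS = ("plate_x", "plate_z", "sz_top", "sz_bot")


def fetch_pitches(start_date: str, end_date: str) -> list[dict]:
    """Download pitch-level Statcast rows for a date range (YYYY-MM-DD)."""
    # Imported lazily so the helpers below work without pybaseball installed.
    from pybaseball import statcast

    frame = statcast(start_dt=start_date, end_dt=end_date)
    return frame.to_dict("records")


def has_location(row: dict) -> bool:
    """True when every location field is present and not NaN."""
    for field in LOCATION_FIELDS:
        value = row.get(field)
        if value is None or math.isnan(value):
            return False
    return True


def called_pitches(rows: Iterable[dict]) -> list[dict]:
    """Keep rows where the umpire made a call and tracking data exists."""
    return [
        row
        for row in rows
        if row.get("description") in CALLED_DESCRIPTIONS and has_location(row)
    ]
```

With network access, `called_pitches(fetch_pitches(start, end))` would return the called pitches for the chosen dates; the filtering helper also works on any list of dictionaries with the same keys.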
ABS Zone Model
The Automated Ball-Strike (ABS) system defines its strike zone differently from the traditional rulebook zone. Our zone model replicates the ABS definition as closely as the public documentation allows:
Horizontal Boundaries
The plate is 17 inches wide, and the ABS zone spans its full width. A pitch is a strike horizontally if any part of the ball crosses any part of the 17-inch plate. Because we track the center of the ball and a baseball is ~2.9 inches in diameter, the effective zone for the ball center extends one ball radius (~1.45 inches) past each edge of the plate, for a total width of ~19.9 inches.
Vertical Boundaries
The top and bottom of the zone are defined per-batter using Statcast's sz_top and sz_bot fields (reported in feet), which are measured from the batter's stance. The ABS zone applies the same ball-radius extension (~1.45 inches) at top and bottom.
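The horizontal and vertical boundaries can be sketched as follows. This is a minimal illustration, assuming Statcast's `plate_x`, `plate_z`, `sz_top`, and `sz_bot` are in feet and that the ball-radius extension applies on all four sides; the `Zone` class and function names are hypothetical.

```python
from dataclasses import dataclass

PLATE_WIDTH_IN = 17.0
BALL_DIAMETER_IN = 2.9  # approximate value used in the text
BALL_RADIUS_IN = BALL_DIAMETER_IN / 2
INCHES_PER_FOOT = 12.0


@dataclass(frozen=True)
class Zone:
    """Zone bounds for the ball CENTER, in inches (already radius-extended)."""
    left: float
    right: float
    bottom: float
    top: float


def abs_zone(sz_top_ft: float, sz_bot_ft: float) -> Zone:
    """Build the extended zone from a batter's sz_top / sz_bot (feet)."""
    half_width = PLATE_WIDTH_IN / 2 + BALL_RADIUS_IN
    return Zone(
        left=-half_width,
        right=half_width,
        bottom=sz_bot_ft * INCHES_PER_FOOT - BALL_RADIUS_IN,
        top=sz_top_ft * INCHES_PER_FOOT + BALL_RADIUS_IN,
    )


def in_zone(plate_x_ft: float, plate_z_ft: float, zone: Zone) -> bool:
    """True when the ball center falls inside the extended zone."""
    x = plate_x_ft * INCHES_PER_FOOT
    z = plate_z_ft * INCHES_PER_FOOT
    return zone.left <= x <= zone.right and zone.bottom <= z <= zone.top
```

Extending the zone by one radius and then testing the ball center is equivalent to asking whether any part of the ball touches the unextended zone along each axis, which keeps the membership test to simple comparisons.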
Zone Distance
For each called pitch, we compute the minimum distance from the ball center to the nearest zone edge, in inches. A negative distance means the ball was inside the zone; a positive distance means outside.
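One way to compute such a signed distance is the standard signed-distance formula for an axis-aligned rectangle, sketched below. The function name and argument layout are assumptions for the example; the bounds are taken to be in inches and to describe whichever zone rectangle the distance is measured against.

```python
import math


def zone_distance(x_in: float, z_in: float,
                  left: float, right: float,
                  bottom: float, top: float) -> float:
    """Signed distance (inches) from the ball center to the zone boundary.

    Negative inside the rectangle, positive outside, zero on the edge.
    """
    center_x = (left + right) / 2
    center_z = (bottom + top) / 2
    half_w = (right - left) / 2
    half_h = (top - bottom) / 2

    # Per-axis overshoot beyond the rectangle (negative when inside on that axis).
    dx = abs(x_in - center_x) - half_w
    dz = abs(z_in - center_z) - half_h

    # Outside: Euclidean distance to the nearest edge or corner.
    outside = math.hypot(max(dx, 0.0), max(dz, 0.0))
    # Inside: negative distance to the closest edge.
    inside = min(max(dx, dz), 0.0)
    return outside + inside
```

For a pitch beyond a corner of the zone, this returns the straight-line distance to that corner rather than the larger of the two axis overshoots, which matches the "minimum distance" wording.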
Accuracy Computation
An umpire's accuracy on a given pitch is binary: correct or incorrect.
- Correct call: Called strike and the ball was in the ABS zone, OR called ball and the ball was outside the ABS zone.
- False strike: Called strike, but the ball was outside the ABS zone.
- Missed strike: Called ball, but the ball was in the ABS zone.
Game-level accuracy = correct_calls / total_called_pitches × 100. We only count called pitches — swings, foul balls, and hit-by-pitches are excluded because the umpire doesn't make a judgment on those.
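The three call categories and the game-level formula can be expressed in a few lines. This is a sketch with hypothetical function names; it assumes each called pitch has already been reduced to two booleans (the umpire's call and ABS zone membership).

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_call(called_strike: bool, in_abs_zone: bool) -> str:
    """Label one called pitch as correct, false_strike, or missed_strike."""
    if called_strike == in_abs_zone:
        return "correct"
    return "false_strike" if called_strike else "missed_strike"


def game_accuracy(calls: Iterable[Tuple[bool, bool]]) -> float:
    """Percent of called pitches judged correctly.

    `calls` holds (called_strike, in_abs_zone) pairs for called pitches only;
    swings, fouls, and hit-by-pitches must be filtered out beforehand.
    """
    counts = Counter(classify_call(c, z) for c, z in calls)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no called pitches to score")
    return 100.0 * counts["correct"] / total
```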
Run Value Impact
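Not all missed calls are equal. A missed strike with the bases loaded in the 9th inning matters more than one in the 2nd with nobody on. We quantify this using delta run expectancy: the change in expected runs for the batting team caused by the count change resulting from the miss. Run expectancy depends on the bases, outs, and count, not on the inning or score, so this metric measures impact on runs rather than on win probability.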
We build the base-out-count run expectancy matrix from 2025 season Statcast data.
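A simplified sketch of the delta calculation is shown below. It assumes the run expectancy matrix is available as a dictionary keyed by `(bases, outs, balls, strikes)`, and it only handles counts where neither a ball nor a strike ends the plate appearance; walks and strikeouts change the base-out state and need extra handling not shown here. The function name and key layout are hypothetical.

```python
from typing import Dict, Tuple

# (bases, outs, balls, strikes) -> expected runs for the rest of the inning
StateKey = Tuple[str, int, int, int]


def missed_call_run_delta(run_exp: Dict[StateKey, float],
                          bases: str, outs: int,
                          balls: int, strikes: int,
                          called_strike: bool) -> float:
    """Run expectancy change for the batting team from one INCORRECT call.

    Returns RE(state after the actual call) - RE(state after the correct call).
    Positive values favor the batting team.
    """
    if balls >= 3 or strikes >= 2:
        raise ValueError("terminal counts are outside this sketch")

    after_strike = run_exp[(bases, outs, balls, strikes + 1)]
    after_ball = run_exp[(bases, outs, balls + 1, strikes)]

    if called_strike:
        # False strike: the count should have advanced by a ball.
        return after_strike - after_ball
    # Missed strike: the count should have advanced by a strike.
    return after_ball - after_strike
```

For example, with placeholder (not measured) values of 0.50 runs at a 1-0 count and 0.40 at 0-1 with bases empty and nobody out, a missed strike on 0-0 would be worth roughly +0.10 runs to the batting team and a false strike roughly -0.10.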
Statistical Standards
- All sample sizes are stated explicitly
- Correlations are reported with r-values and p-values
- We distinguish between statistical significance and practical significance
- Uncertainty ranges are shown when relevant (confidence intervals, prediction intervals)
- We use proper corrections for multiple comparisons when applicable
- All code and data processing is version-controlled in our GitHub repository
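As an illustration of a multiple-comparisons correction, the sketch below implements the Holm-Bonferroni step-down procedure. It is one common choice, shown here as an example only; the list above does not commit to a specific method, and the function name is hypothetical.

```python
from typing import List, Sequence


def holm_bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return a reject/keep flag per hypothesis, in the original order.

    Sorts p-values ascending and compares the k-th smallest (k = 0, 1, ...)
    against alpha / (m - k), stopping at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is also not rejected
    return reject
```

When testing, say, whether each of many umpires differs from league-average accuracy, a correction of this kind keeps the chance of any false positive across the whole family of tests at or below `alpha`.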
Limitations
- Statcast tracking has ±0.3 inch precision, so a call on a pitch within 0.3 inches of the zone edge is genuinely ambiguous and cannot be reliably judged right or wrong
- The ABS zone definition may differ slightly from MLB's actual implementation (which is not fully public)
- Batter height calibration (sz_top/sz_bot) can vary within a game as batters change stance
- We cannot isolate the effect of catcher framing on umpire perception
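The first limitation suggests a simple sensitivity check: flag pitches whose zone distance falls within the tracking margin and recompute accuracy without them. The sketch below is one possible way to do that, not a description of a published metric; the function names are hypothetical.

```python
from typing import Iterable, Tuple

TRACKING_MARGIN_IN = 0.3  # tracking precision quoted in the text


def is_ambiguous(distance_in: float, margin_in: float = TRACKING_MARGIN_IN) -> bool:
    """True when the signed zone distance is within the tracking margin."""
    return abs(distance_in) <= margin_in


def accuracy_excluding_ambiguous(calls: Iterable[Tuple[bool, float]]) -> float:
    """Percent correct among calls that are clearly inside or outside the zone.

    `calls` holds (is_correct, zone_distance_in) pairs for called pitches.
    """
    clear = [correct for correct, dist in calls if not is_ambiguous(dist)]
    if not clear:
        raise ValueError("no unambiguous called pitches to score")
    return 100.0 * sum(clear) / len(clear)
```

Comparing this figure with the unfiltered accuracy gives a sense of how much of an umpire's error rate could be explained by measurement noise alone.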