Data Sources

| Source | Data | Frequency |
| --- | --- | --- |
| Statcast (via Baseball Savant) | Pitch-level tracking: location (plate_x, plate_z), strike zone boundaries (sz_top, sz_bot), pitch type, velocity, outcome | Daily, 6h delay |
| MLB Stats API | Umpire assignments, game metadata, player identities | Real-time |
| ABS Challenge Data | Challenge usage, outcomes (when MLB publishes this data) | Per-game |

All pitch-tracking data comes from MLB's Statcast system, which uses Hawk-Eye cameras installed in all 30 MLB ballparks. Pitch location accuracy is ±0.3 inches. We access this data through the public pybaseball Python library.

ABS Zone Model

The Automated Ball-Strike (ABS) system defines the strike zone differently from the traditional rulebook zone. Our zone model replicates the ABS definition as closely as the public documentation allows:

Horizontal Boundaries

The plate is 17 inches wide, and the ABS zone spans its full width. A pitch is a strike if any part of the ball crosses any part of the 17-inch plate. Given a baseball diameter of ~2.9 inches, the center of the ball can be up to one radius (~1.45 inches) outside either edge and still count, so the effective zone for the ball center is about 19.9 inches wide.

Vertical Boundaries

The top and bottom of the zone are defined per batter using Statcast's sz_top and sz_bot fields, which are set from the batter's stance. The same ball-radius extension (~1.45 inches, half the ball diameter) applies at the top and bottom.
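The boundary rules above can be sketched as a single membership test. This is a minimal illustration, not the site's actual code: the function and constant names are hypothetical, and it assumes the inputs arrive in feet, as in Statcast's plate_x, plate_z, sz_top, and sz_bot fields.

```python
# Hypothetical constants; the ball diameter (~2.9 in) comes from the text above.
BALL_RADIUS_IN = 2.9 / 2       # ~1.45 in
PLATE_HALF_WIDTH_IN = 17 / 2   # 8.5 in either side of the plate's center line
FT_TO_IN = 12.0                # assumption: Statcast location fields are in feet


def in_abs_zone(plate_x, plate_z, sz_top, sz_bot):
    """Return True if any part of the ball touches the modeled zone.

    The zone is widened by one ball radius on every side, so the test
    only needs the position of the ball's center. Treating the widened
    zone as a rectangle slightly over-counts pitches at the four corners.
    """
    x = plate_x * FT_TO_IN
    z = plate_z * FT_TO_IN
    half_width = PLATE_HALF_WIDTH_IN + BALL_RADIUS_IN
    top = sz_top * FT_TO_IN + BALL_RADIUS_IN
    bottom = sz_bot * FT_TO_IN - BALL_RADIUS_IN
    return abs(x) <= half_width and bottom <= z <= top
```

For example, a pitch whose center is 9.6 inches from the plate's center line (plate_x = 0.8) is still inside the widened zone, while one at 10.8 inches (plate_x = 0.9) is not.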

Zone Distance

For each called pitch, we compute the minimum distance from the ball center to the nearest zone edge, in inches. A negative distance means the ball was inside the zone; a positive distance means outside.
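One way to compute such a signed distance is shown below. It is a sketch under stated assumptions: the function name and parameters are hypothetical, all values are in inches, the zone is centered on the plate (x = 0), and the bounds passed in are whichever zone edges the model uses. For pitches outside a corner, it assumes the straight-line distance to that corner.

```python
import math


def zone_distance(x, z, half_width, bottom, top):
    """Signed distance in inches from the ball center to the nearest zone edge.

    Negative: the center is inside the zone (magnitude = distance to the
    closest edge). Positive: the center is outside the zone.
    """
    dx = abs(x) - half_width          # > 0 when left or right of the zone
    dz = max(bottom - z, z - top)     # > 0 when below or above the zone
    if dx <= 0 and dz <= 0:
        # Inside: the larger (less negative) value is the nearest edge.
        return max(dx, dz)
    # Outside: combine only the axes on which the ball is out of bounds.
    return math.hypot(max(dx, 0.0), max(dz, 0.0))


print(zone_distance(9, 30, 10, 18, 42))    # -1: one inch inside the side edge
print(zone_distance(13, 46, 10, 18, 42))   # 5.0: outside the upper corner
```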

Accuracy Computation

An umpire's accuracy on a given pitch is binary: correct or incorrect.

  • Correct call: Called strike and the ball was in the ABS zone, OR called ball and the ball was outside the ABS zone.
  • False strike: Called strike, but the ball was outside the ABS zone.
  • Missed strike: Called ball, but the ball was in the ABS zone.

Game-level accuracy = correct_calls / total_called_pitches × 100. We only count called pitches — swings, foul balls, and hit-by-pitches are excluded because the umpire doesn't make a judgment on those.
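The classification and the game-level formula can be sketched as follows. The function names are hypothetical, and the sketch assumes called pitches are identified by the Statcast `description` values "called_strike", "ball", and "blocked_ball"; every other description is treated as not called and is skipped.

```python
CALLED_STRIKE = "called_strike"
CALLED_BALLS = {"ball", "blocked_ball"}  # assumption: both count as called balls


def classify_call(description, in_zone):
    """Label one pitch; return None when the umpire made no ball/strike call."""
    if description == CALLED_STRIKE:
        return "correct" if in_zone else "false_strike"
    if description in CALLED_BALLS:
        return "missed_strike" if in_zone else "correct"
    return None  # swing, foul, hit-by-pitch, ball in play, ...


def game_accuracy(pitches):
    """Percent of called pitches judged correctly, or None if there are none.

    `pitches` is an iterable of (description, in_zone) pairs.
    """
    labels = [classify_call(desc, in_zone) for desc, in_zone in pitches]
    called = [label for label in labels if label is not None]
    if not called:
        return None
    return 100.0 * called.count("correct") / len(called)


sample = [
    ("called_strike", True),   # correct
    ("called_strike", False),  # false strike
    ("ball", False),           # correct
    ("ball", True),            # missed strike
    ("foul", True),            # not a called pitch, excluded
]
print(game_accuracy(sample))   # 50.0
```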

Run Value Impact

Not all missed calls are equal. A missed strike on a full count with the bases loaded matters more than one on an 0-0 count with nobody on. We quantify this using delta run expectancy: the change in expected runs for the batting team caused by the count change resulting from the miss.

Run expectancy values are computed from the 2025 season's base-out-count state matrix using Statcast data.
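A minimal sketch of the count-change calculation follows. The names, the key layout (bases, outs, balls, strikes), and the table values are all hypothetical placeholders, not figures from the 2025 matrix. The sketch covers only calls that leave the plate appearance in progress; a miss that produces or prevents a walk or strikeout also changes the base-out state and would need separate handling.

```python
def next_count(balls, strikes, call):
    """Return the count after a called ball or called strike."""
    if call == "ball":
        return (balls + 1, strikes)
    if call == "strike":
        return (balls, strikes + 1)
    raise ValueError(f"unknown call: {call!r}")


def delta_run_expectancy(run_exp, bases, outs, balls, strikes, actual_call):
    """Run expectancy after the actual (missed) call minus after the correct one.

    Positive values mean the miss helped the batting team.
    `run_exp` maps (bases, outs, balls, strikes) to expected runs.
    """
    correct_call = "strike" if actual_call == "ball" else "ball"
    actual = next_count(balls, strikes, actual_call)
    correct = next_count(balls, strikes, correct_call)
    for b, s in (actual, correct):
        if b > 3 or s > 2:
            raise NotImplementedError("walk/strikeout transitions not modeled")
    return run_exp[(bases, outs) + actual] - run_exp[(bases, outs) + correct]


# Placeholder values for illustration only; not real run expectancies.
toy_run_exp = {
    ("___", 0, 1, 0): 0.55,
    ("___", 0, 0, 1): 0.45,
}
# A strike-zone pitch called a ball on 0-0, bases empty, no outs.
delta = delta_run_expectancy(toy_run_exp, "___", 0, 0, 0, "ball")
```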

Statistical Standards

  • All sample sizes are stated explicitly
  • Correlations are reported with r-values and p-values
  • We distinguish between statistical significance and practical significance
  • Uncertainty ranges are shown when relevant (confidence intervals, prediction intervals)
  • We use proper corrections for multiple comparisons when applicable
  • All code and data processing is version-controlled in our GitHub repository
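As an illustration of the uncertainty ranges mentioned above, an umpire's accuracy can be reported with a confidence interval for a proportion. The sketch below uses the Wilson score interval; that choice is an assumption made for this example (the text does not say which interval method is used), and the function name is hypothetical.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for accuracy, returned in percent.

    z = 1.96 gives an approximate 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return 100 * (center - half), 100 * (center + half)


# Example: 140 correct calls out of 150 called pitches in one game.
low, high = wilson_interval(140, 150)
```

With only about 150 called pitches in a game, the interval spans several percentage points, which is why single-game accuracy differences between umpires should be read with caution.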

Limitations

  • Statcast tracking has ±0.3 inch precision — borderline calls within this margin are genuinely ambiguous
  • The ABS zone definition may differ slightly from MLB's actual implementation (which is not fully public)
  • Batter height calibration (sz_top/sz_bot) can vary within a game as batters change stance
  • We cannot account for intentional catcher framing effects on umpire perception
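The first limitation above can be made operational by flagging pitches whose signed zone distance falls within the tracking margin. This is a hedged sketch: the names are hypothetical, and how flagged pitches are then treated (reported separately, excluded, or down-weighted) is left open.

```python
TRACKING_MARGIN_IN = 0.3  # the ±0.3 inch tracking precision stated above


def is_borderline(zone_distance_in, margin_in=TRACKING_MARGIN_IN):
    """True when the ball center is within the tracking margin of a zone edge."""
    return abs(zone_distance_in) <= margin_in


# Signed distances in inches (negative = inside the zone).
distances = [-2.4, -0.2, 0.1, 0.9]
borderline = [d for d in distances if is_borderline(d)]
print(borderline)  # [-0.2, 0.1]
```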