Reference

Metrics Glossary

CalledThird invents some metrics and inherits others. This page is the reference: every metric we use, what it measures, and how to read the numbers. Each entry links to the article that introduced or validated it.

Umpire metrics

IOAI — In-Out Asymmetry Index

How much more (or less) generous an umpire is on the outside corner versus the inside corner, relative to the league baseline.

Formula: Residual called-strike rate on the 4-inch outside edge band (|plate_x| ∈ [0.83, 1.17] ft, normalized so “outside” is always away from the batter) minus residual called-strike rate on the inside edge band. Residual = empirical − baseline-model expected.
Range: Roughly −0.20 to +0.20 in 2025 data. Positive = outside-generous. Negative = inside-generous.
Worked example: Stu Scheurwater: +0.120 (his outside-edge residual runs 12 percentage points above his inside-edge residual). Alex MacKay: −0.070. The full 2025 league spread is 18.9pp.
Reliability: Most stable umpire feature. Cross-half 2025 correlation r = 0.51. Cross-method per-umpire correlation r = 0.98.
Read more: The 19-Point Strike Zone
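A minimal sketch of the IOAI formula, assuming each called pitch already carries a batter-normalized horizontal location and a baseline-model strike probability. The field names (x_norm, called_strike, expected) and the sign convention (positive x_norm = away from the batter) are hypothetical, chosen for illustration rather than taken from the actual pipeline.

```python
from statistics import mean

EDGE_LO, EDGE_HI = 0.83, 1.17  # edge band in feet, from the formula above


def residual(pitches):
    """Empirical called-strike rate minus baseline-model expected rate."""
    observed = mean(p["called_strike"] for p in pitches)
    expected = mean(p["expected"] for p in pitches)
    return observed - expected


def ioai(pitches):
    """Outside-band residual minus inside-band residual.

    Assumes x_norm > 0 means away from the batter (hypothetical convention).
    """
    outside = [p for p in pitches if EDGE_LO <= p["x_norm"] <= EDGE_HI]
    inside = [p for p in pitches if -EDGE_HI <= p["x_norm"] <= -EDGE_LO]
    return residual(outside) - residual(inside)


# Toy data: 2 of 4 outside-edge pitches called strikes, 1 of 4 inside-edge,
# with a flat 0.40 expected strike probability on every pitch.
sample = (
    [{"x_norm": 1.0, "called_strike": s, "expected": 0.40} for s in (1, 1, 0, 0)]
    + [{"x_norm": -1.0, "called_strike": s, "expected": 0.40} for s in (1, 0, 0, 0)]
)
print(round(ioai(sample), 3))  # 0.25
```

In the toy data the outside residual is +0.10 and the inside residual is −0.15, so the index is +0.25, which reads as outside-generous.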

EAR — Edge Aggression Rate

How aggressively an umpire patrols the rulebook edge band — does he give the edge or shrink the zone?

Formula: Residual called-strike rate on pitches with zone_dist_inches ∈ [0, 2] — the “shadow band” just outside the rulebook zone.
Range: Roughly −0.10 to +0.10. Positive = gives the edge. Negative = shrinks the zone.
Reliability: Second-most stable feature. Cross-half r ≈ 0.5. Persistent across both our independent statistical pipelines.
Read more: The 19-Point Strike Zone
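The EAR definition can be sketched the same way, assuming each pitch record has the zone_dist_inches field named in the formula plus a baseline-model probability. The called_strike and expected field names are assumptions for illustration.

```python
def edge_aggression_rate(pitches):
    """Residual called-strike rate inside the 0-2 inch shadow band."""
    band = [p for p in pitches if 0 <= p["zone_dist_inches"] <= 2]
    if not band:
        return None  # no shadow-band pitches to evaluate
    observed = sum(p["called_strike"] for p in band) / len(band)
    expected = sum(p["expected"] for p in band) / len(band)
    return observed - expected


sample = [
    {"zone_dist_inches": 0.5, "called_strike": 1, "expected": 0.45},
    {"zone_dist_inches": 1.5, "called_strike": 0, "expected": 0.25},
    {"zone_dist_inches": 4.0, "called_strike": 0, "expected": 0.02},  # outside the band, ignored
]
print(round(edge_aggression_rate(sample), 2))  # 0.15
```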

BSR — Borderline Strike Rate

Of pitches in the borderline region (within about 3 inches of the rulebook edge), what fraction are called strikes? A descriptive aggregate, not residualized.

Range: League average is about 50%. Aggressive umpires push it to ~60%+; conservative umpires fall to ~40%.
Note: Used in our Four Kinds of Zone 2×2 (accuracy × BSR). BSR is related to EAR but is descriptive rather than residualized — it doesn’t correct for the location distribution of pitches.
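To make the contrast with EAR concrete, here is a sketch of BSR as a plain fraction with no baseline correction. It assumes a hypothetical signed field, edge_dist_inches (negative = inside the rulebook zone), and treats the borderline region as 3 inches on either side of the edge; both are assumptions for illustration.

```python
def borderline_strike_rate(pitches, band_inches=3.0):
    """Share of borderline called pitches that were called strikes."""
    borderline = [p for p in pitches if abs(p["edge_dist_inches"]) <= band_inches]
    if not borderline:
        return None  # no borderline pitches in the sample
    return sum(p["called_strike"] for p in borderline) / len(borderline)


sample = [
    {"edge_dist_inches": -2.0, "called_strike": 1},
    {"edge_dist_inches": 1.0, "called_strike": 1},
    {"edge_dist_inches": 2.5, "called_strike": 0},
    {"edge_dist_inches": -0.5, "called_strike": 0},
    {"edge_dist_inches": 6.0, "called_strike": 0},  # well off the plate, ignored
]
print(borderline_strike_rate(sample))  # 0.5
```

Because nothing here references an expected strike probability, two umpires who see different pitch locations can post different BSR values while calling the same zone.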

Wrong Calls / Game

The average count of false strikes (called strikes that were balls) plus missed strikes (called balls that were strikes) per game an umpire works.

League average 2025: About 10.9 wrong calls per game (about 7.2% of all called pitches).
Read more: The Umpire Effect
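A minimal sketch of the count, assuming each called pitch is tagged with a game identifier, the umpire's call, and whether the pitch was actually in the zone. The field names game_id, called_strike, and in_zone are hypothetical.

```python
from collections import Counter


def wrong_calls_per_game(calls):
    """Average of (false strikes + missed strikes) per game worked."""
    wrong = Counter()
    games = set()
    for c in calls:
        games.add(c["game_id"])
        false_strike = c["called_strike"] and not c["in_zone"]
        missed_strike = (not c["called_strike"]) and c["in_zone"]
        if false_strike or missed_strike:
            wrong[c["game_id"]] += 1
    if not games:
        return None
    return sum(wrong.values()) / len(games)


calls = [
    {"game_id": "A", "called_strike": True, "in_zone": True},
    {"game_id": "A", "called_strike": True, "in_zone": False},   # false strike
    {"game_id": "A", "called_strike": False, "in_zone": False},
    {"game_id": "B", "called_strike": False, "in_zone": True},   # missed strike
    {"game_id": "B", "called_strike": True, "in_zone": False},   # false strike
]
print(wrong_calls_per_game(calls))  # 1.5
```

As a sanity check on the published figures, 10.9 wrong calls at 7.2% of called pitches implies roughly 150 called pitches per game (10.9 / 0.072 ≈ 151).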

HLB — High-Low Bias [DISPUTED]

Tendency to favor the top of the zone over the bottom (or the reverse) when calling strikes. We don’t feature it.

Status: Method-dependent. One of our pipelines finds r = 0.42 cross-half; the other finds r = 0.19 with a 95% CI that crosses zero. We report it but don’t rank umpires on it.

CCZE — Count-Conditioned Zone Expansion [NOISE, per-umpire]

Per-umpire tendency to give a wider zone on 3-0 vs 0-2 counts. Real at the population level, not at the individual umpire level.

Why it’s here: The broadcast claim “Ump X expands the zone with two strikes” doesn’t hold up — cross-half r ≈ 0 in both our methods. We report it as the canonical example of a folk-wisdom metric that’s actually noise.
Read more: The myth of the two-strike zone
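The cross-half r quoted in the umpire entries can be illustrated with a plain Pearson correlation between each umpire's value in one half of the season and his value in the other. This is a sketch only: how the halves are split and the per-umpire inputs below are assumptions, and the numbers are toy values.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def cross_half_r(first_half, second_half):
    """Correlate a metric across halves, using umpires present in both."""
    umpires = sorted(set(first_half) & set(second_half))
    return pearson_r(
        [first_half[u] for u in umpires],
        [second_half[u] for u in umpires],
    )


# Toy per-umpire metric values for each half (hypothetical).
first = {"ump_a": 0.10, "ump_b": 0.00, "ump_c": -0.05}
second = {"ump_a": 0.08, "ump_b": 0.01, "ump_c": -0.06}
print(round(cross_half_r(first, second), 2))
```

A value near zero, as reported for CCZE, means an umpire's first-half number says almost nothing about his second-half number.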

Pitcher metrics

Tunneling Divergence

How separated a pitcher’s pitch types are at the plate, after looking identical at the hitter’s decision point ~23.9 ft from the rubber.

Formula: Jensen-Shannon divergence between per-pitch-type centroids at decision-point coordinates (dec_x, dec_z) vs final plate coordinates (plate_x, plate_z). Higher = pitches look more alike at the decision point but more different at the plate.
Range: Roughly 2 to 20 in raw units (inches of effective separation). Top tunnelers cluster around 10+.
Read more: The Pitch Tunneling Atlas · The physics behind it

Plate Sep / Decision Sep

The raw geometric separations between pitch-type centroids at the plate (plate_sep) and at the decision point (dec_sep). The ratio drives Tunneling Divergence.

Units: Inches. A high-tunneling pitcher has small dec_sep (pitches look alike when the hitter has to commit) and large plate_sep (they end up far apart).
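A sketch of how the two separations might be computed, assuming pitch-level records with the dec_x, dec_z, plate_x, and plate_z coordinates in feet and a pitch_type label. Averaging over all pairs of pitch-type centroids is an assumption made here for illustration; the source does not say how pairs are aggregated.

```python
from itertools import combinations
from math import dist
from statistics import mean

FEET_TO_INCHES = 12.0


def centroids(pitches, x_key, z_key):
    """Mean (x, z) location per pitch type."""
    by_type = {}
    for p in pitches:
        by_type.setdefault(p["pitch_type"], []).append((p[x_key], p[z_key]))
    return {
        t: (mean(x for x, _ in pts), mean(z for _, z in pts))
        for t, pts in by_type.items()
    }


def separation_inches(pitches, x_key, z_key):
    """Mean pairwise centroid distance (assumed aggregation), in inches."""
    pairs = list(combinations(centroids(pitches, x_key, z_key).values(), 2))
    if not pairs:
        return None  # fewer than two pitch types
    return FEET_TO_INCHES * mean(dist(a, b) for a, b in pairs)


pitches = [
    {"pitch_type": "FF", "dec_x": 0.0, "dec_z": 5.0, "plate_x": 0.0, "plate_z": 2.5},
    {"pitch_type": "SL", "dec_x": 0.1, "dec_z": 5.0, "plate_x": 0.6, "plate_z": 1.7},
]
dec_sep = separation_inches(pitches, "dec_x", "dec_z")
plate_sep = separation_inches(pitches, "plate_x", "plate_z")
print(round(dec_sep, 1), round(plate_sep, 1))  # 1.2 12.0
```

The toy pitcher fits the high-tunneling profile described above: about an inch apart at the decision point, a foot apart at the plate.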

Command Variance / Starter Scatter

How tightly a starter hits his intended target locations across an outing. Captured as a scatter of intended vs actual locations.

Use: Lower scatter = better command. The Explore tab shows this as a tightness metric on the Pitchers > Command sub-tab.
Read more: Do pitchers lose their command?

Walk Spike Attribution

For a 2026 pitcher, how much of his walk-rate change is attributable to the ABS zone change vs his own pitch-mix behavior.

Read more: Three weeks later: The Walk Spike is fading

Hitter metrics

Coaching Gap (Δ predictable wOBA)

The wOBA edge a hitter extracts specifically on predictable pitches — pitches where the at-bat context strongly forecasts what’s coming.

Headline finding: Low-chase hitters extract roughly +0.04 wOBA more on predictable pitches than high-chase hitters. Power doesn’t matter for this; contact rate doesn’t matter; discipline does.
Read more: The Coaching Gap that lives where hitters don’t chase

Chase Tertile

Hitter discipline classification — low / mid / high — based on how often they swing at pitches outside the zone.

Use: The pivotal split in the Coaching Gap finding. Low-chase tertile is where the predictable-pitch edge lives.
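A minimal sketch of a tertile split, assuming a mapping from hitter to chase rate (swings at out-of-zone pitches divided by out-of-zone pitches seen). Ranking hitters and cutting into equal thirds is one reasonable reading of "tertile"; the exact cut points used in the article are not specified here.

```python
def chase_tertiles(chase_rates):
    """Label each hitter low / mid / high by rank of chase rate."""
    ranked = sorted(chase_rates, key=chase_rates.get)
    n = len(ranked)
    names = ("low", "mid", "high")
    # Integer arithmetic maps rank i in [0, n) onto bucket 0, 1, or 2.
    return {hitter: names[3 * i // n] for i, hitter in enumerate(ranked)}


# Hypothetical chase rates for six hitters.
rates = {"A": 0.18, "B": 0.22, "C": 0.27, "D": 0.30, "E": 0.34, "F": 0.41}
labels = chase_tertiles(rates)
print(labels["A"], labels["C"], labels["F"])  # low mid high
```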

April Sell Score / Hot Start Stability

For 2026 hot starts, a regression-aware projection of how much of the early-season pace is likely to sustain.

Use: Splits the 2026 hot starts into “sell” (likely regression) vs “sleeper” (probably real, baseball isn’t talking about them yet). Dual-agent validated.
Read more: The April Sell List

ABS & Challenges

ACG — ABS Conformance Gap

Among pitches that became ABS challenges, the fraction where the umpire’s original call was overturned by the robot.

Status: Defined per-umpire but currently too thin to publish (most umpires have 10–15 challenges in 2026 so far). We will revisit at the All-Star break.
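One way to see why 10–15 challenges is too thin is to put a confidence interval around a single umpire's rate. The sketch below uses the Wilson score interval, which is a standard choice for small-sample proportions but is swapped in here for illustration; it is not stated to be the method behind the publishing threshold.

```python
from math import sqrt


def wilson_interval(overturned, challenges, z=1.96):
    """Approximate 95% Wilson score interval for an overturn fraction."""
    if challenges == 0:
        raise ValueError("need at least one challenge")
    p = overturned / challenges
    denom = 1 + z**2 / challenges
    center = (p + z**2 / (2 * challenges)) / denom
    half = z * sqrt(p * (1 - p) / challenges + z**2 / (4 * challenges**2)) / denom
    return center - half, center + half


# Hypothetical umpire: 6 of 12 challenged calls overturned.
low, high = wilson_interval(6, 12)
print(round(low, 2), round(high, 2))  # 0.25 0.75
```

With 12 challenges, an observed 50% is compatible with anything from about 25% to 75%, which is too wide to separate one umpire from another.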

Challenge Overturn Rate

The fraction of challenges (by player, team, or count) that the ABS system overturns.

League average 2026: About 53%. Catcher-initiated challenges hit ~61%; batter-initiated ~45%.
Read more: The best ABS challengers are catchers
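A sketch of the grouped rate, assuming one record per challenge with an overturned flag and whatever grouping field is of interest. The field names initiator and overturned are hypothetical.

```python
from collections import Counter


def overturn_rate_by(challenges, key):
    """Fraction of challenges overturned, grouped by the given field."""
    total, overturned = Counter(), Counter()
    for c in challenges:
        total[c[key]] += 1
        overturned[c[key]] += int(c["overturned"])
    return {group: overturned[group] / total[group] for group in total}


challenges = [
    {"initiator": "catcher", "overturned": True},
    {"initiator": "catcher", "overturned": True},
    {"initiator": "catcher", "overturned": False},
    {"initiator": "batter", "overturned": True},
    {"initiator": "batter", "overturned": False},
]
rates = overturn_rate_by(challenges, "initiator")
print(round(rates["catcher"], 3), rates["batter"])  # 0.667 0.5
```

The same function groups by team or count by passing a different key.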

Challenge Value

For each overturned challenge, the wOBA-shift saved or captured based on the count it occurred in.

Why count matters: A 3-2 wrong call is worth roughly 0.690 wOBA. A 1-0 wrong call is worth 0.130. Smart challenges happen at high-value counts.
Read more: Minnesota buys leverage. Cincinnati buys certainty.
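A minimal sketch of summing challenge value, using only the four per-count figures quoted on this page. The table is deliberately partial: values for other counts are not given here, so the function raises a KeyError rather than guessing. Field names are hypothetical.

```python
# wOBA value of a wrong call by count, limited to figures quoted on this page.
CALL_VALUE = {"3-2": 0.690, "2-2": 0.384, "1-0": 0.130, "0-0": 0.095}


def challenge_value(challenges, table=CALL_VALUE):
    """Total wOBA shift captured by overturned challenges."""
    return sum(table[c["count"]] for c in challenges if c["overturned"])


team = [
    {"count": "3-2", "overturned": True},
    {"count": "1-0", "overturned": True},
    {"count": "2-2", "overturned": False},  # call stood, no value captured
]
print(round(challenge_value(team), 3))  # 0.82
```

One overturned 3-2 call is worth more than five overturned 1-0 calls (0.690 vs 5 × 0.130 = 0.650), which is the sense in which smart challenges happen at high-value counts.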

Counts & Game state

Count Leverage / wOBA per Count

The expected wOBA outcome of a plate appearance given the current count state.

Anchors (wOBA swing of a single call at that count): 3-2: 0.690 · 2-2: 0.384 · 0-0: 0.095. The cost of a wrong call scales with this.
Read more: The count tells you everything · The count that matters

Reliever Strand Rate

For relief pitchers, the fraction of inherited runners that fail to score after the reliever enters.

Note: 2026 leaderboard live; uses MLB official inherited-runner counts joined from boxscores. Stability is a separate question we’ll revisit at season end.
Read more: The fireman’s dilemma
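The strand rate itself is simple arithmetic. A sketch, assuming season totals of inherited runners and inherited runners who scored are already joined per reliever (the argument names are hypothetical):

```python
def strand_rate(inherited, scored):
    """Fraction of inherited runners who did not score."""
    if inherited == 0:
        return None  # undefined with no inherited runners
    if not 0 <= scored <= inherited:
        raise ValueError("scored must be between 0 and inherited")
    return (inherited - scored) / inherited


print(strand_rate(20, 5))  # 0.75
```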