When MLB pitchers throw their most predictable pitches, the ones a model can flag as “we know what's coming,” the top third of hitters by plate discipline produce a .366 wOBA and the bottom third produce .321. Measured against each group's wOBA on every other pitch, that's a 62-point edge for patient hitters versus a 25-point edge for chasers: a 2.5× difference in how much advantage each hitter type extracts from pitcher predictability.
In other words, the coaching gap is real. It just requires the right hitter to collect it.
The finding, in one chart
Each pair of bars is one chase tertile. The darker bar is wOBA on predictable pitches, the lighter bar is wOBA on everything else. 2022–2025 pooled (~393k terminal-outcome pitches).
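The headline numbers reduce to a subtraction and a ratio. This is a minimal sketch that reads the 62- and 25-point figures as each tertile's predictable-minus-other wOBA gap (darker bar minus lighter bar). The constant and function names are illustrative, not taken from the project's code.

```python
# Per-tertile "coaching gaps" quoted in the text, in wOBA units.
# Assumption: each is predictable-pitch wOBA minus wOBA on all other pitches.
GAP_LOW_CHASE = 0.062   # bottom chase tertile (patient hitters)
GAP_HIGH_CHASE = 0.025  # top chase tertile (chasers)


def gap_ratio(low: float, high: float) -> float:
    """How many times larger the patient hitters' gap is than the chasers'."""
    return low / high


def gap_spread(low: float, high: float) -> float:
    """Low-chase advantage: difference between the two tertile gaps."""
    return low - high


ratio = gap_ratio(GAP_LOW_CHASE, GAP_HIGH_CHASE)
spread = gap_spread(GAP_LOW_CHASE, GAP_HIGH_CHASE)
print(f"ratio = {ratio:.2f}x, spread = {spread:.3f} wOBA")  # ratio = 2.48x, spread = 0.037 wOBA
```

The 0.037 spread falls close to the +.038 to +.044 range the pooled and fixed-effects estimators report. That is what you would expect if those estimators measure the same low-minus-high difference.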
Five seasons, five replications
To make sure we weren't chasing a one-season pattern, we ran the same test on every MLB season from 2022 through 2026. The low-chase advantage shows up every single year. Four of the five seasons reach permutation p<0.01 on the spread across chase tertiles. The fifth, 2026, covers only the opening month of the current season, with roughly one-sixth the sample of a full year, and its tertile gaps still each clear zero individually.
Per-tertile wOBA gap with 95% CIs for each MLB season, 2022–2026. 2026 is the strict holdout: the pitch-prediction model was fit on 2022–2024 and never saw 2026 data in training.
The 2026 replication is the cleanest out-of-sample test available. The prediction model never saw those pitches, so the effect can't be an artifact of in-sample fitting. That points to a real structural feature of how MLB hitters respond to predictability.
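The permutation test on the spread can be sketched as follows, under several assumptions. Each row is one batter with a chase rate and a predictable-minus-other wOBA gap. Chase tertiles are assigned by rank. The test is one-sided and shuffles the tertile labels. The article's own test may permute at the pitch level instead, and the data in the usage lines is synthetic.

```python
import random
from statistics import mean


def tertiles(values):
    """Label each value 0/1/2 by rank: bottom, middle, or top third."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = 3 * rank // len(values)
    return labels


def spread(gaps, labels):
    """Mean gap in the low-chase tertile minus mean gap in the high-chase tertile."""
    low = [g for g, t in zip(gaps, labels) if t == 0]
    high = [g for g, t in zip(gaps, labels) if t == 2]
    return mean(low) - mean(high)


def permutation_test(chase, gaps, n_perm=2000, seed=0):
    """One-sided p-value: share of label shuffles with a spread at least as large."""
    rng = random.Random(seed)
    labels = tertiles(chase)
    observed = spread(gaps, labels)
    shuffled = labels[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between chase tertile and gap
        if spread(gaps, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)  # +1 keeps p above zero


# Synthetic batters: lower chase rate -> larger predictable-pitch gap, plus noise.
rng = random.Random(42)
chase = [rng.uniform(0.18, 0.40) for _ in range(300)]
gaps = [0.08 - 0.15 * c + rng.gauss(0, 0.03) for c in chase]
observed, p = permutation_test(chase, gaps)
print(f"spread = {observed:.3f}, p = {p:.4f}")
```

Shuffling labels rather than raw chase values keeps each tertile the same size in every permutation. Only the link between discipline and gap is destroyed.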
Two agents, three estimators, one survivor
Every finding in this piece was produced twice, independently. A Bayesian hierarchical model built by Claude and a gradient-boosted machine-learning pipeline built by Codex analyzed the same dataset in parallel, then cross-reviewed each other's work. We report all three standard estimators: pooled between-batter, batter fixed effects, and matched pairs within pitcher-season.
Low-chase coaching gap across six estimates. The pooled and fixed-effects estimators converge tightly near +.040 wOBA in both analyses. Matched pairs, the strictest design (matched within pitcher-season), attenuates toward zero because pitch-level outcomes are dominated by noise rather than structural variance.
The pooled and fixed-effects estimators land between +.038 and +.044 wOBA across both agents. The matched-pairs estimator is attenuated and implementation-sensitive — not because the effect disappears, but because at our sample size it simply doesn't have the power. We verified this directly:
Simulated statistical power for each estimator against an implanted +.025 wOBA effect, as a function of sample size (units = matched pitcher-batter pairs). Pooled between-batter reaches 80% power around N=1800. Matched pairs is still below 50% at N=2000.
This is the honest version of the story. The between-batter estimators agree across two independent methods and five seasons. The matched-pairs estimator is noisier at the sample sizes baseball affords us — something to improve with more seasons, not a reason to discount what the other two designs already show.
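A power simulation of this kind can be sketched by implanting a known effect, adding noise, and counting how often a test detects it. This sketch makes several assumptions. It uses a two-sample z-test with known noise, and both noise standard deviations (`POOLED_SIGMA`, `MATCHED_SIGMA`) are hypothetical, not estimated from the data. With these values it roughly reproduces the caption's pattern (near 80% versus well under 50%), but only because the values were chosen to do so.

```python
import random
from math import sqrt


def simulate_power(effect, sigma, n, n_sims=200, z_crit=1.96, seed=0):
    """Share of simulated studies that detect an implanted effect.

    Each study draws n noisy unit-level outcomes with the effect and n without,
    then runs a two-sided z-test assuming the noise SD (sigma) is known.
    """
    rng = random.Random(seed)
    se = sigma * sqrt(2 / n)  # standard error of the difference in means
    detected = 0
    for _ in range(n_sims):
        with_effect = sum(effect + rng.gauss(0, sigma) for _ in range(n)) / n
        without = sum(rng.gauss(0, sigma) for _ in range(n)) / n
        if abs(with_effect - without) / se > z_crit:
            detected += 1
    return detected / n_sims


# Hypothetical unit-level noise SDs: a tighter design vs a noisier one.
POOLED_SIGMA = 0.27
MATCHED_SIGMA = 0.50

pooled = simulate_power(0.025, POOLED_SIGMA, 1800)
matched = simulate_power(0.025, MATCHED_SIGMA, 2000)
print(f"pooled @1800: {pooled:.2f}, matched @2000: {matched:.2f}")
```

The same implanted +.025 effect is easy to find or nearly invisible depending only on how much noise each design leaves per unit.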
The mechanism, hitter by hitter
The finding only works if hitters who improved their overall chase also improved their discipline on the specific pitches the model flagged as predictable. Across 659 year-over-year hitter-season transitions spanning 2022 through 2025 (three completed year-pairs), they do.
Changes in overall chase track changes in chase on predictable bait closely (Spearman r = +0.53). A hitter who trims overall chase by a percentage point typically trims predictable-bait chase by about a point too. The contact payoff, fewer whiffs on predictable swings, is real but weaker (+0.17), consistent with discipline mattering more than bat-to-ball for this story. Green dots in the upper-left of each plot are where the coaching gap actually gets collected: hitters who got more patient and cashed that patience on the pitches a model could already flag.
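Spearman's r is Pearson's correlation computed on ranks, which makes it insensitive to outliers and to the exact scale of each change. This is a dependency-free sketch in which tied values get averaged ranks. The per-hitter deltas in the usage lines are made up to show the call, not data from the study.

```python
from math import sqrt


def average_ranks(xs):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical year-over-year changes for five hitters.
d_chase_overall = [-0.03, -0.01, 0.00, 0.02, 0.04]
d_chase_bait = [-0.02, -0.015, 0.01, 0.005, 0.05]
print(spearman(d_chase_overall, d_chase_bait))  # 0.9
```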
Who actually cashes in
We identified 36 MLB hitters in 2025 who sit in the “quality hitter” quadrant — low chase rate (bottom tertile) crossed with high expected wOBA on contact (top tertile). This is the archetype that extracts the full gap.
36 qualifying 2025 hitters in the low-chase × high-xwOBAcon quadrant.
The mirror image is the 18 qualifying 2025 hitters in the high-chase × low-xwOBAcon quadrant: the free swingers. When these hitters face a predictable pitcher, they keep swinging at whatever comes. The scouting report goes unused.
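The 2×2 split can be sketched as two rank-based tertile assignments followed by a lookup. The sketch assumes that tertiles are computed within the pool of qualifying hitters and that ties are broken by input order. The six hitters below are placeholders, not real players.

```python
from dataclasses import dataclass


@dataclass
class Hitter:
    name: str
    chase: float      # share of out-of-zone pitches swung at
    xwoba_con: float  # expected wOBA on contact


def tertiles(values):
    """Label each value 0/1/2 by rank: bottom, middle, or top third."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = 3 * rank // len(values)
    return labels


def quadrant(hitters):
    """Map each hitter to 'quality', 'free_swinger', or 'other'."""
    chase_t = tertiles([h.chase for h in hitters])
    xw_t = tertiles([h.xwoba_con for h in hitters])
    result = {}
    for h, c, x in zip(hitters, chase_t, xw_t):
        if c == 0 and x == 2:      # low chase, high contact quality
            result[h.name] = "quality"
        elif c == 2 and x == 0:    # high chase, low contact quality
            result[h.name] = "free_swinger"
        else:
            result[h.name] = "other"
    return result


pool = [
    Hitter("A", 0.18, 0.45), Hitter("B", 0.20, 0.30), Hitter("C", 0.28, 0.38),
    Hitter("D", 0.30, 0.40), Hitter("E", 0.36, 0.28), Hitter("F", 0.40, 0.42),
]
print(quadrant(pool))
```

Hitter F shows why both axes matter. A high-chase hitter with strong contact lands in "other", not "quality".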
The most disciplined hitter inside the quality-hitter group is Juan Soto at a .180 chase rate — no qualifying hitter with his level of contact quality swings at fewer out-of-zone pitches. Behind him: Kyle Schwarber (.213), Seiya Suzuki (.216), Aaron Judge (.224), Spencer Torkelson (.230), James Wood (.246), Francisco Lindor (.250), Bryan Reynolds (.252), Matt Olson (.256), Willy Adames (.258).
A revealing edge case: Luis Arráez. He's the most extreme contact hitter in baseball (.058 whiff rate, roughly a quarter of the league average) and a three-time batting champion. But his chase rate is .357, top-third in the league. Result: he's classified as a free-swinger in our framework, and despite his contact wizardry he extracts only a league-baseline coaching-gap boost, not the quality-hitter premium. Contact ability without discipline doesn't cash the predictability check.
The quality-hitter premium, replicated
These 36 hitters extract an extra +0.029 wOBA on predictable pitches beyond the league baseline — confirmed by two independent analyses using different statistical methods, across all three estimators.
Extra wOBA gap for the quality-hitter 2×2 group beyond the league baseline, by estimator and agent. Both methods agree on the sign and approximate size across all three designs.
How to spot the pitcher
Some 2025 MLB pitchers were far more readable than others. For each qualifying pitcher (≥400 pitches in 2025) we fit a per-pitcher fastball-vs-offspeed logistic model using count state, batter handedness, and pitch number, then report the 5-fold cross-validated AUC. 0.5 is the coin-flip baseline; 1.0 means count and context tell you everything. The 2025 final-season range stretched from 0.497 to 0.838.
Predictability score: mean next-pitch accuracy of the Layer-2 contextual model across 2025 outings. Higher = more predictable.
The most readable arm in baseball in 2025 was José Alvarado (0.838): his cutter-sinker mix reads strongly off the count. Garrett Crochet leads starters at 0.768. Given count and context, the model ranks a randomly chosen Crochet fastball above a randomly chosen Crochet offspeed pitch about 77% of the time. At the other end, genuine mixers sit right at the baseline. Dylan Cease is the clearest ace-level example (0.525 over 3,113 pitches), and Jason Adam is the reliever benchmark (0.517, with the broadest pitch-type palette in baseball). Kyle Hart (0.497) and Matt Gage (0.499) dip just below the coin-flip line in 2025, which is sampling variance rather than anything meaningful.
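The per-pitcher readability score can be sketched in plain Python: a fastball-vs-offspeed logistic regression on count, handedness, and pitch number, scored by 5-fold cross-validated AUC. Everything here is an assumption for illustration: the feature encoding, the gradient-descent fit, averaging fold AUCs rather than pooling out-of-fold predictions, and the two synthetic pitchers. A production version would more likely use a library such as scikit-learn (`LogisticRegression` with `cross_val_score(..., cv=5, scoring="roc_auc")`).

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def features(p):
    """Hypothetical encoding: intercept, balls, strikes, same-handed matchup, scaled pitch count."""
    return [1.0, p["balls"], p["strikes"], 1.0 if p["same_hand"] else 0.0, p["pitch_number"] / 100]


def fit_logistic(X, y, lr=0.3, epochs=100):
    """Plain batch gradient descent on the mean log-loss (no regularization)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(a * b for a, b in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def auc(scores, labels):
    """Probability a random fastball outscores a random offspeed pitch (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def cv_auc(pitches, k=5, seed=0):
    """Mean held-out AUC over k shuffled folds."""
    idx = list(range(len(pitches)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    aucs = []
    for f in range(k):
        held = set(folds[f])
        train = [pitches[i] for i in idx if i not in held]
        test = [pitches[i] for i in folds[f]]
        w = fit_logistic([features(p) for p in train], [p["is_fastball"] for p in train])
        scores = [sigmoid(sum(a * b for a, b in zip(w, features(p)))) for p in test]
        aucs.append(auc(scores, [p["is_fastball"] for p in test]))
    return sum(aucs) / k


def synthetic_pitcher(n, readable, seed):
    """Made-up pitches: a 'readable' arm throws more fastballs when behind in the count."""
    rng = random.Random(seed)
    pitches = []
    for _ in range(n):
        b, s = rng.randint(0, 3), rng.randint(0, 2)
        logit = 0.5 + 1.2 * b - 1.5 * s if readable else 0.0
        pitches.append({
            "balls": b, "strikes": s,
            "same_hand": rng.random() < 0.5,
            "pitch_number": rng.randint(1, 100),
            "is_fastball": 1 if rng.random() < sigmoid(logit) else 0,
        })
    return pitches


readable = cv_auc(synthetic_pitcher(400, readable=True, seed=1))
mixer = cv_auc(synthetic_pitcher(400, readable=False, seed=2))
print(f"readable AUC ~ {readable:.2f}, mixer AUC ~ {mixer:.2f}")
```

The synthetic mixer lands near 0.5 because its pitch choice ignores the count entirely. Small dips below 0.5, like the Hart and Gage values, are what fold-to-fold noise looks like around that baseline.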
The live leaderboard on the Coaching Gap tracker updates nightly with 2026 in-progress values. Because 2026 is a partial season, the live tracker uses a looser 150-pitch floor and individual estimates have wider noise — expect AUCs ranging roughly 0.40 to 0.85 there until the full season lands.
The lineup arithmetic
A single quality hitter, facing predictable-pitch exposure across a full season, collects about 4 extra runs (roughly 0.4 wins) compared to a league-baseline hitter. Run that out across a full batting order:
- A lineup of 9 quality hitters vs a lineup of 9 free-swingers (worst-to-best tertile differential): ~52 runs per season, or about 5 wins.
- That's roughly the difference between a wild-card team and a division winner — produced entirely by hitter discipline choices in response to predictable pitching.
Teams don't field 9 quality hitters; no team in 2025 fielded more than 5. But the edge is additive, at roughly 4 runs per quality hitter, so even two or three of them in a lineup bank real value against the league's roughly 15% of genuinely predictable pitchers.
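The lineup figures follow from standard back-of-envelope conversions. This sketch assumes the wRAA-style linear-weights formula (runs = ΔwOBA / wOBA scale × PA), a wOBA scale of 1.2, and the 10-runs-per-win rule of thumb. The article does not state how many plate appearances end on predictable pitches, so the PA figure below is what those assumptions imply, not a reported number.

```python
WOBA_SCALE = 1.2      # assumption: typical wOBA scale; the real value varies by season
RUNS_PER_WIN = 10.0   # common sabermetric rule of thumb


def runs_from_woba(delta_woba: float, pa: float, scale: float = WOBA_SCALE) -> float:
    """Linear-weights conversion: extra runs = (delta wOBA / wOBA scale) * PA."""
    return delta_woba / scale * pa


def implied_pa(runs: float, delta_woba: float, scale: float = WOBA_SCALE) -> float:
    """Invert the conversion: PA needed to turn a wOBA edge into a run total."""
    return runs * scale / delta_woba


def wins(runs: float) -> float:
    return runs / RUNS_PER_WIN


pa = implied_pa(4.0, 0.029)  # quality-hitter premium from the text, 4 runs per season
print(f"implied predictable-pitch PA: {pa:.0f}")                          # 166
print(f"one hitter: {wins(4.0):.1f} wins")                                # 0.4
print(f"lineup swing: {wins(52.0):.1f} wins, {52 / 9:.1f} runs per slot")  # 5.2 wins, 5.8 runs per slot
```

The implied ~166 PA is a consistency check, not a finding. If the real exposure or wOBA scale differs, the run totals scale proportionally.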
What didn't matter
We spent six rounds testing every other dimension we could think of. Most of them didn't survive.
All 17 hypotheses tested across six rounds. Two survivors, three disputed between the two agents, twelve nulls.
Specifically:
- Pitcher type: archetypes built via unsupervised clustering failed stability tests.
- Pitch sequencing: a famous 2023 finding that “same pitch twice in a row” costs hitters 0.030 wOBA did not replicate on our larger dataset; the effect is effectively zero.
- Team scouting heterogeneity: no single team's lineup systematically extracts more gap than another.
- Times through the order × predictability: the classic TTO penalty doesn't sharpen against predictable pitchers.
- Stuff quality × predictability: no interaction.
Seventeen tests. Twelve nulls. Three where the two methods disagreed. Only chase-rate discipline and the quality-hitter 2×2 survived cross-method replication.
How we know it's real (and why it isn't a bet for tonight's slate)
An effect this counter-intuitive deserves a sharper validation than “the regression coefficient is significant.” We ran two:
The per-matchup test — null
For each of more than 70,000 starter-vs-batter pairings from 2022–2026 games, we computed the slate-style predicted edge from this same model and checked whether the at-bats that actually happened bore it out. The Spearman correlation between predicted and realized edge is ≈ 0, and the realized wOBA gap hovers around zero in every predicted bin. Individual matchups are dominated by sampling noise. Don't bet single games on this.
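The per-matchup check amounts to a binned calibration table. Matchups are sorted by predicted edge and cut into equal-count bins, and each bin's mean predicted edge is compared with its mean realized gap. This is a minimal sketch under assumed inputs: one predicted edge and one realized wOBA outcome per matchup, and five bins. The synthetic data mimics the null result described above, where realized means stay flat across bins.

```python
import random
from statistics import mean


def calibration_bins(predicted, realized, n_bins=5):
    """Sort matchups by predicted edge, cut into equal-count bins,
    and return (mean predicted, mean realized) for each bin."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    bins = []
    for b in range(n_bins):
        chunk = order[b * n // n_bins:(b + 1) * n // n_bins]
        bins.append((mean(predicted[i] for i in chunk), mean(realized[i] for i in chunk)))
    return bins


# Synthetic null: realized outcomes ignore the prediction entirely.
rng = random.Random(7)
predicted = [rng.uniform(-0.05, 0.05) for _ in range(5000)]
realized = [rng.gauss(0, 0.5) for _ in predicted]
for pred, real in calibration_bins(predicted, realized):
    print(f"predicted {pred:+.3f}  realized {real:+.3f}")
```

A real signal would show realized means climbing with the predicted bins. Under the null they scatter around zero with no trend.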
The longitudinal test — real
We then asked the harder, more honest question: do batters who actually improved their chase rate year-over-year see better wOBA on predictable pitches the next year? Across 495 batter-season transitions:
- Big improvers (chase rate dropped ≥ 0.04, n=68): gained +0.012 wOBA on predictable pitches
- Stable (n=366): essentially zero change
- Big decliners (chase rate rose ≥ 0.04, n=61): lost 0.029 wOBA, a statistically significant drop (CI excludes zero)
Spearman correlation between Δchase and Δpredictable-pitch-wOBA: r = −0.17 across all 495 transitions. The mechanism is real on the timescale of months and seasons. It's not real on the timescale of a single Tuesday-night at-bat.
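The longitudinal buckets can be sketched by pairing each batter's consecutive seasons and sorting the chase-rate change by the ±0.04 threshold from the text. The input layout (a dict keyed by batter and year) and the sample values are hypothetical.

```python
from statistics import mean

BIG_MOVE = 0.04  # chase-rate change treated as a "big" move, per the text


def transitions(seasons):
    """Pair each batter's consecutive seasons.

    seasons maps (batter, year) -> (chase_rate, predictable_pitch_woba);
    returns a list of (delta_chase, delta_woba) for every year-to-year pair.
    """
    pairs = []
    for (batter, year), (chase, woba) in seasons.items():
        nxt = seasons.get((batter, year + 1))
        if nxt is not None:
            pairs.append((nxt[0] - chase, nxt[1] - woba))
    return pairs


def bucket(delta_chase, threshold=BIG_MOVE):
    if delta_chase <= -threshold:
        return "improver"   # chase rate dropped by at least the threshold
    if delta_chase >= threshold:
        return "decliner"   # chase rate rose by at least the threshold
    return "stable"


def summarize(pairs):
    """Count and mean delta-wOBA for each bucket."""
    groups = {}
    for d_chase, d_woba in pairs:
        groups.setdefault(bucket(d_chase), []).append(d_woba)
    return {name: (len(v), mean(v)) for name, v in groups.items()}


# Hypothetical batters; "D" has no following season, so it yields no transition.
seasons = {
    ("A", 2023): (0.30, 0.350), ("A", 2024): (0.25, 0.365),
    ("B", 2023): (0.28, 0.340), ("B", 2024): (0.29, 0.338),
    ("C", 2023): (0.24, 0.360), ("C", 2024): (0.30, 0.330),
    ("D", 2024): (0.27, 0.350),
}
pairs = transitions(seasons)
print(summarize(pairs))
```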
The live tracker follows this at the level where the signal actually lives: the Trends tab shows current MLB chase improvers and decliners, and the Validation tab walks through both calibration tests with current numbers.
What it means
For teams
Plate discipline is the scouting multiplier. If your front office is paying for a predictive-analytics department, the value of that investment depends on whether your hitters can act on the output. Aggressive swingers can't — they're making roughly the same swing decisions regardless of what the scouting report says.
Roster construction should weight chase rate as the marginal trait that unlocks scouting value. Two hitters with identical wOBA but different chase rates have very different real values to a team that knows which opposing pitchers are most predictable.
For hitters and hitting coaches
The longitudinal test gives you a number you can actually trust. Hitters who cut their chase rate by 4 or more percentage points added about 0.012 wOBA on predictable pitches the next year; hitters whose chase rate climbed 4 or more points lost nearly 0.030. Discipline-driven swing-decision work, the kind Driveline and modern hitting labs have been refining, pays in the exact place this finding identifies.
For fans
The matchup story is real at the league level but unreliable at the at-bat level. So don't try to bet tonight's slate on it. Track the season-long trends instead: which hitters' chase rates are dropping (likely to over-perform vs predictable arms), which are rising (likely to give back value). The live tracker lists the top 20 of each every night.
For broadcasters
When a well-known “predictable” pitcher comes in — a José Alvarado, a Garrett Crochet — the at-bat worth watching is the patient one. Those are the contrasts where the predictability actually becomes a liability, even if any single such PA is dominated by sample noise.
Where this came from
This analysis ran across six rounds of parallel dual-agent research. A Bayesian hierarchical model (built by Claude) and a gradient-boosting machine-learning pipeline (built by Codex) independently analyzed the same dataset, then cross-reviewed each other's work at every stage. We built a pitch-prediction model on 2.1M training pitches (2022–2024), held out 2025 for validation and 2026 as a strict test set, and tested 17 hypotheses about where the “coaching gap” against predictable pitchers might concentrate.
Only two survived the cross-method gauntlet: the chase-rate tertile gradient (this article) and the quality-hitter 2×2 (its refinement).
Methodology note. Pitch-level analysis used terminal-outcome wOBA against contextualized predictability flags from a held-out Layer-2 contextual pitch-prediction model. Between-batter pooled estimates were 0.038–0.043 wOBA across chase tertiles (p<0.001, both methods). The matched-pairs estimate at our sample size was implementation-sensitive and ranged 0.006–0.017 wOBA, reflecting the signal-to-noise ratio of pitch-level outcomes (approximately 1.5% structural variance). Full transparency: seven more hypotheses passed marginal significance under one method but not the other — see the comparison memo in the public research repo for the honest accounting.
Full methodology, code, and reproducibility artifacts are linked below.
Cite this analysis
CalledThird. "The Coaching Gap That Lives Where Hitters Don't Chase." CalledThird.com, April 19, 2026. https://calledthird.com/analysis/coaching-gap-patience
All CalledThird analysis is original research. If you reference our findings, data, or charts in your work, please link back to the original article. For data inquiries: hello@calledthird.com