In The Count Tells You Everything, we showed that broad-population pitch prediction barely beats knowing the count. But that was a league-wide result across 1,081 pitchers. The natural follow-up: are some individual pitchers reliably predictable?

We selected 14 elite starting pitchers — all with 4,000+ pitches across the 2024–2025 seasons — and trained individual models for each. The answer: most aren't. But five are.

The Expanded Sample: 14 Starters

We pulled 73,735 pitches across 14 starters from 2024–2025 Statcast data. Each pitcher got their own XGBoost, random forest, extra trees, and logistic regression models trained on a strict temporal split (training through June 2025, testing on the rest).

The baseline: each pitcher's pitch mix conditioned on count and batter handedness. If a pitcher throws fastballs 60% of the time on 3-0 counts against righties, the baseline "predicts" fastball for every such pitch. Prediction accuracy is the percentage of pitches where the model correctly identified the pitch type before it was thrown.
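This count-conditional baseline is simple enough to sketch directly. A minimal version, assuming pitch records carry hypothetical `balls`, `strikes`, `stand` (batter side), and `pitch_type` fields in the Statcast style:

```python
from collections import Counter

def fit_count_baseline(pitches):
    """Most common pitch type for each (balls, strikes, batter_hand) state.

    `pitches` is a list of dicts with 'balls', 'strikes', 'stand', and
    'pitch_type' keys (field names are illustrative, echoing Statcast columns).
    """
    counts = {}
    for p in pitches:
        key = (p['balls'], p['strikes'], p['stand'])
        counts.setdefault(key, Counter())[p['pitch_type']] += 1
    return {k: c.most_common(1)[0][0] for k, c in counts.items()}

def baseline_accuracy(model, pitches, fallback='FF'):
    """Share of pitches where the modal pitch for the state was thrown."""
    hits = sum(
        model.get((p['balls'], p['strikes'], p['stand']), fallback) == p['pitch_type']
        for p in pitches
    )
    return hits / len(pitches)
```

The `fallback` for unseen states is a placeholder; in practice you would fall back to the pitcher's overall modal pitch.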

Approach                           Pred. Acc.   vs. Baseline
Best pooled model (XGBoost)        43.83%       +1.68pp
Best per-pitcher (Random Forest)   44.21%       +1.32pp

Neither passes our +2pp threshold. As a group, these 14 elite starters are about as predictable as the league average. The broad per-pitcher thesis fails.

The Five Who Stood Out

But averages hide variation. When we look pitcher-by-pitcher, five exceeded the +2pp gate — and one of them blew past it:

Pitcher         Model           Pred. Acc.   Top-2     Gain
Chris Sale      Random Forest   58.14%       89.76%    +8.48pp
Tarik Skubal    Logistic Reg.   38.61%       63.30%    +4.17pp
Seth Lugo       Random Forest   27.98%       45.72%    +3.35pp
Logan Gilbert   Logistic Reg.   51.24%       80.86%    +2.92pp
Corbin Burnes   Extra Trees     56.67%       80.51%    +2.39pp

As a cohort, these five produce a +3.49pp weighted gain over the count+handedness baseline. But what does "predictable" mean when accuracy ranges from 28% to 58%?

The key is repertoire size. Chris Sale throws 4 pitches, so random guessing gets you ~25%. His 58% accuracy is more than double chance. Seth Lugo throws 7 pitch types — random guessing is ~14%, and even the count-conditional baseline only reaches 24.6%. Lugo's 28% beats that baseline by +3.35pp, which is statistically meaningful for a 7-way classification. But practically, you're still wrong 72% of the time.

The honest framing: prediction accuracy is how often the model names the exact pitch type correctly, and it scales inversely with repertoire complexity. Sale at 58% is genuinely actionable — his top-2 accuracy of 90% means you can narrow his next pitch to two options almost every time. Lugo at 28% passes the statistical gate but isn't practically useful on its own.
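Top-2 accuracy is the metric doing the practical work here: it credits the model whenever the true pitch is among its two highest-probability guesses. A minimal sketch against any sklearn-style `predict_proba` output (the array shapes and class names are illustrative, not our exact pipeline):

```python
import numpy as np

def top_k_accuracy(proba, y_true, classes, k=2):
    """Share of pitches whose true type is among the k highest-probability
    predictions. `proba` is an (n, n_classes) array as returned by a
    predict_proba-style method; `classes` aligns with its columns."""
    top_k = np.argsort(proba, axis=1)[:, -k:]          # indices of the k best
    class_idx = {c: i for i, c in enumerate(classes)}
    true_idx = np.array([class_idx[y] for y in y_true])
    return float(np.mean([t in row for t, row in zip(true_idx, top_k)]))
```

With k=1 this reduces to ordinary prediction accuracy, which makes the Sale comparison (58% exact vs. 90% top-2) easy to reproduce from the same probability matrix.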

Chris Sale: The Proof of Concept

Sale's behavioral fingerprint is unusually compact. Four pitches, with two accounting for 83% of his repertoire:

SL 43%
FF 40%
CH 11%
SI 6%

Chris Sale's 2024–2025 pitch mix. Lowest repertoire entropy in the 14-pitcher sample.

The model exploits Sale's count-driven tendencies. His behavior in specific states is extremely concentrated:

Situation       Dominant Pitch       Share
3-0 count       Four-seam fastball   90.0%
0-2 vs LHH      Slider               76.1%
1-1 vs LHH      Slider               64.4%
3-1 vs RHH      Four-seam fastball   63.2%
0-2 (overall)   Slider               55.2%

Sale also shows strong pitch-to-pitch sequencing patterns: slider-after-slider 55.6% of the time, fastball-after-fastball 47.3%. These dependencies give models additional structure to exploit beyond just the count.

Does It Hold Up Across Seasons?

Predictability that only works within a single season isn't useful. We tested whether 2024-trained models still work on 2025 data — a full cross-season test (H2):

Pitcher         Cross-Season Pred.   Top-2     Mix Stability
Chris Sale      57.36%               87.94%    0.016 avg JSD
Corbin Burnes   56.53%               79.90%    0.032 avg JSD
Logan Gilbert   46.61%               75.56%    0.035 avg JSD
Tarik Skubal    36.54%               62.06%    0.023 avg JSD
Seth Lugo       28.64%               46.07%    0.036 avg JSD

Sale's cross-season accuracy (57.36%) barely drops from his within-season number (58.14%). His pitch-mix drift, measured by Jensen–Shannon divergence across rolling 14-day windows, is the lowest in the cohort (mean JSD = 0.016). Even though he shifted from more changeup/sinker in 2024 to more slider in 2025, his count-dependent structure remained remarkably stable.

Corbin Burnes is the second-strongest case: 56.53% cross-season accuracy with moderate mix stability. Gilbert, Skubal, and Lugo show weaker but positive cross-season carry.

Why Five Pitchers, Not Five Hundred?

What separates the predictable pitchers from the rest? Three factors emerge from the data:

  1. Low repertoire entropy. Sale has the lowest overall pitch entropy (1.65 bits) in the 14-pitcher sample. Fewer pitch types with more concentrated usage = more structure for models to learn.
  2. Extreme count concentration. When Sale's count is 3-0, the pitch is determined 90% of the time. The less his behavior varies within a game state, the more predictable he is.
  3. Pitch-to-pitch dependency. Strong sequencing patterns (slider-after-slider at 56%) provide information beyond the count. Most pitchers show much weaker sequential structure.
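Repertoire entropy is just the Shannon entropy of the pitch-type distribution. A minimal sketch; plugging in Sale's published mix lands near the quoted 1.65 bits, though the article's figure is computed over all pitches rather than rounded mix shares:

```python
import math

def repertoire_entropy(mix):
    """Shannon entropy (bits) of a pitch-type distribution.
    `mix` maps pitch type to usage share; shares should sum to ~1."""
    return -sum(p * math.log2(p) for p in mix.values() if p > 0)
```

A pitcher who split four pitches evenly would sit at exactly 2.0 bits; Sale's concentration on slider and four-seamer is what pulls him below that.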

The nine pitchers who didn't pass (Logan Webb, Zack Wheeler, Framber Valdez, Sonny Gray, George Kirby, Bailey Ober, Aaron Nola, Shota Imanaga, and Freddy Peralta) either have too many pitch types, too little count-dependent structure, or too much within-season variation for models to exploit consistently.

The Honest Limitation

Even for the predictable five, there's a fundamental asymmetry problem. If a model can learn that Sale throws a slider 76% of the time on 0-2 against lefties, the opposing team's advance scouts know this too. The information isn't proprietary — it's visible in the same public Statcast data we used.

This doesn't mean the analysis is useless. It means the right application isn't "predict the next pitch for betting edge" — it's scouting, game preparation, and understanding which pitchers to target with ABS challenges on borderline counts where their pitch selection is most predictable.

Verdict

Most pitchers are not reliably predictable beyond the count baseline. But a narrow cohort exists where structured behavior — concentrated repertoire, rigid count tendencies, strong sequencing — creates genuine, season-stable signal. Chris Sale is the clearest case: 58% next-pitch accuracy, 90% top-2, stable across seasons, and driven by readable patterns that our models can quantify even if scouts have always sensed them intuitively.

The path forward isn't a generic prediction engine. It's pitcher-specific intelligence for the five to ten starters whose behavior is structured enough to matter.


Methodology

Data source: 2024–2025 MLB Statcast data. 73,735 pitches across 14 starting pitchers with 4,000+ pitches each. Accessed via pybaseball.

Features: Count (balls, strikes), batter handedness, pitcher handedness, previous pitch type and result, pitch count in at-bat, times through the order, runner positions, score differential, inning. No post-pitch measurements (release_speed, spin_rate, etc.).

Models: XGBoost, random forest, extra trees, logistic regression. Both pooled (all 14 pitchers) and strict per-pitcher variants.

Baselines: Count + batter handedness conditional: most common pitch type for each (count, handedness) state, per pitcher.

Splits: Chronological train/test split at 2025-06-07 (70/30). No random shuffling. No future information in any feature.
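The chronological split is deliberately boring. A sketch with pandas, assuming a `game_date` column in ISO format:

```python
import pandas as pd

def temporal_split(df, cutoff="2025-06-07"):
    """Chronological train/test split: everything before the cutoff date
    trains; everything on or after it tests. No shuffling, so no future
    pitches can leak into training."""
    df = df.sort_values("game_date")
    train = df[df["game_date"] < cutoff]
    test = df[df["game_date"] >= cutoff]
    return train, test
```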

Cross-season test (H2): Train on all of 2024, test on all of 2025. Measures whether predictability is stable across years.

Stability metric: Jensen–Shannon divergence (JSD) computed on pitch-type distributions across rolling 14-day windows. Lower JSD = more stable repertoire.
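A sketch of the stability metric using SciPy. Note that `scipy.spatial.distance.jensenshannon` returns the square root of the divergence, so we square it; the base-2 choice and the squaring are assumptions about the convention, not a guarantee of reproducing the table's values:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def window_mix(pitches, types):
    """Pitch-type distribution over one window, aligned to a fixed type order."""
    counts = np.array([pitches.count(t) for t in types], dtype=float)
    return counts / counts.sum()

def mean_jsd(windows, types):
    """Average Jensen-Shannon divergence between consecutive windows' mixes."""
    mixes = [window_mix(w, types) for w in windows]
    return float(np.mean([
        jensenshannon(a, b, base=2) ** 2 for a, b in zip(mixes, mixes[1:])
    ]))
```

Identical windows score 0; fully disjoint mixes score 1 bit, so Sale's 0.016 reflects near-identical rolling pitch mixes.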

Selection gate: +2pp accuracy gain vs. count+handedness baseline in strict per-pitcher evaluation.

If you find an error, tell us — we'd rather be corrected than wrong.

Full methodology documentation →