Thursday, September 1, 2016

Hitter-Controlled/Pitcher-Controlled and Component Top 5's/Bottom 5's

While I was renovating the FanDuel spreadsheet, I was researching something - I have no idea what, now - and I stumbled upon an online archive of The Baseball Analyst and the introduction to the October 1987 issue, written by Bill James.

In it, James observed that the spread of home runs hit by hitters was greater than the spread of home runs allowed by pitchers. The worst home run hitter (in 1986) hit 0 home runs per 1,000 plate appearances, while the most prolific hit 59.6 per 1,000 - a range of zero to 59.6. Pitchers, meanwhile, allowed between 10.5 (the stingiest) and 44.4 (the most homer-prone) per 1,000 batters faced. In other words, pitchers allowed "a substantially tighter shot pattern", or a smaller standard deviation, of home runs per plate appearance than hitters produced.

James continued:

"Calculating this for each pitcher and each hitter, you would find that the standard deviation of home runs/1000 plate appearances was something for hitters and something less for pitchers--let's say, at a guess, 6.0 for hitters and 4.0 for pitchers. If the shot pattern (or the standard deviation) was exactly the same for both pitchers and hitters, then one would have to say that the occurrence was 50% controlled by the hitter, and 50% controlled by the pitcher. In this situation, we might say that the home run was 60% controlled by the hitter, and 40% controlled by the pitcher.

"I have thought of doing this for years, but one problem was always getting stable data."

So I thought I would test this with my eight binary components, and fortunately, nearly 30 years later, "getting stable data" is no longer a problem. (Thank you, internet.) But as Bill James pointed out, using single-season data is problematic, because

"...a great percentage of the spread of occurrence over the course of a season is actually random variance....If you study spreads of occurrence over the course of a single season, then, you're going to be comparing a pitcher with 1100 BFP against a hitter with 600 PA, and you're going to get very different variance in the two samples simply because one of them is much larger than the other."
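James's sample-size point can be made concrete with the binomial standard deviation, sqrt(p(1 - p)/n), which is the spread in observed rates that chance alone produces. A minimal Python sketch; the true rate of 3% is a hypothetical value chosen for illustration, not a figure from the study:

```python
import math

def random_sd(p: float, n: int) -> float:
    """Standard deviation of an observed rate produced by binomial chance alone."""
    return math.sqrt(p * (1 - p) / n)

p = 0.03  # hypothetical true home-run rate per PA, for illustration only
for n in (600, 1100):  # James's hitter PA vs. pitcher BFP
    print(n, f"{random_sd(p, n) * 1000:.1f} HR per 1,000 from chance alone")
```

With that assumed rate, pure chance spreads home runs by roughly 7.0 per 1,000 at 600 PA but only about 5.1 per 1,000 at 1,100 BFP, so the bigger pitcher samples look tighter even if true talent were identical.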

To counteract this, I'll look at statistics for the last ten years - 2006 to 2015. For each component, I'll look at the top 200 batters and top 200 pitchers, ranked in order of opportunities. But "opportunities" means something different depending on the component. For the first component, $BB, opportunities are simply PA (or BF). For the next one, $SO, opportunities are PA/BF minus walks and HBP (or, plate appearances that didn't result in a walk or a hit batter). $SB opportunities are SB + CS (stolen base attempts).

This should give us a stable data set for each component for hitters and for pitchers. For example, Adrian Gonzalez had the most plate appearances in baseball between 2006 and 2015, but since he attempted only ten stolen bases in that time (tied for 601st-most), he should not be included in the data set for $SB. Also, using floating groups of 200 players should even out the spread of opportunities for hitters and pitchers.
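A minimal Python sketch of that selection rule, assuming a hypothetical `Player` record (the field names are made up) and covering only the three components whose opportunities are spelled out above. It also assumes a population standard deviation of the rates; the post doesn't say which variant was used.

```python
import statistics
from dataclasses import dataclass

@dataclass
class Player:
    """Hypothetical 2006-2015 career line; field names are assumptions."""
    name: str
    pa: int   # plate appearances (batters faced, for a pitcher)
    bb: int   # walks
    hbp: int  # hit by pitch
    so: int   # strikeouts
    sb: int   # stolen bases
    cs: int   # caught stealing

# (successes, opportunities) for the three components whose opportunities are defined above
COMPONENTS = {
    "$BB": lambda p: (p.bb, p.pa),                 # assumption: walks only; HBP handling isn't stated
    "$SO": lambda p: (p.so, p.pa - p.bb - p.hbp),  # PA that didn't end in a walk or HBP
    "$SB": lambda p: (p.sb, p.sb + p.cs),          # stolen-base attempts
}

def component_sd(players, component, n=200):
    """Keep the n players with the most opportunities for one component, then
    return the population standard deviation of their success rates."""
    counts = [COMPONENTS[component](p) for p in players]
    top = sorted(counts, key=lambda c: c[1], reverse=True)[:n]  # stable sort: ties keep input order
    rates = [succ / opp for succ, opp in top if opp > 0]
    return statistics.pstdev(rates)

# Toy roster with made-up numbers: B has the most PA but never attempts a steal
roster = [
    Player("A", 600, 60, 5, 120, 10, 4),
    Player("B", 700, 25, 0, 100, 0, 0),
    Player("C", 50, 20, 0, 10, 30, 0),
]
print(component_sd(roster, "$SB", n=2))  # pool is C and A; B is left out
```

Player B is the Adrian Gonzalez case in miniature: the most PA on the roster, but outside the $SB pool because the ranking uses attempts, not PA.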

The results:

           ------Batters-------  ------Pitchers------  % Determined by
Component  Opp. Spread  St.Dev.  Opp. Spread  St.Dev.  Batter  Pitcher
$BB        2932-6768       2.8%  2496-8923       2.0%     59%      41%
$SO        2594-6081       5.8%  2229-8247       4.1%     59%      41%
$HR        2126-5338       2.0%  1808-7004       0.6%     76%      24%
$H         2030-5277       2.0%  1741-6772       1.3%     61%      39%
$XBH        603-1744       3.6%   504-2014       2.4%     60%      40%
$3B         153-439        5.9%   130-482        2.6%     70%      30%
$SA         738-1957       8.2%   598-2085       3.4%     70%      30%
$SB          56-484        7.3%    54-340        9.0%     45%      55%

The "Opp. Spread" shows the range of opportunities for each data set: the number of opportunities for the batter or pitcher with the 200th-most opportunities, followed by the number for the batter or pitcher with the most. (Remember, opportunities = PA or BF for $BB.) As you can see, the spread for each component is broadly similar for batters and pitchers, but wider at both ends for pitchers (except at the top end for $SB).

As it turns out, Bill James' "guess" of a 60/40 split is almost dead-on for $BB, $SO, $H, and $XBH. Ironically, the example he used, home runs, is much higher - more than 3-to-1 determined by the hitter (although $HR removes walks and strikeouts from the equation, whereas Bill used raw HR per PA). $HR is the most hitter-determined component, while $SB is the only component more pitcher-controlled than hitter-controlled. Once a runner takes off, his success depends even more on the pitcher's ability to prevent stolen bases than on his own base-stealing skill.
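James's 6.0/4.0 to 60/40 example suggests each side's share is its standard deviation divided by the sum of the two, SD_b / (SD_b + SD_p). Assuming that reading (it is inferred from his example, and `batter_share` is a hypothetical name), a quick Python check of the percentage columns against the rounded St.Dev. columns as printed:

```python
def batter_share(sd_batters: float, sd_pitchers: float) -> float:
    """Share credited to the batter, assuming it is SD_b / (SD_b + SD_p)."""
    return sd_batters / (sd_batters + sd_pitchers)

# Sanity check against James's guessed 6.0 (hitters) and 4.0 (pitchers)
print(f"James's example: batter {batter_share(6.0, 4.0):.0%}")  # batter 60%

# (batter St.Dev., pitcher St.Dev.) as printed in the table, in percentage points
table_sd = {
    "$BB": (2.8, 2.0), "$SO": (5.8, 4.1), "$HR": (2.0, 0.6), "$H": (2.0, 1.3),
    "$XBH": (3.6, 2.4), "$3B": (5.9, 2.6), "$SA": (8.2, 3.4), "$SB": (7.3, 9.0),
}
shares = {comp: batter_share(b, p) for comp, (b, p) in table_sd.items()}
for comp, share in shares.items():
    print(f"{comp:5} batter {share:.0%}  pitcher {1 - share:.0%}")
```

This lands within one percentage point of the table everywhere ($BB comes out at 58% rather than 59%, for instance), presumably because the displayed standard deviations are rounded to one decimal. The ordering holds: $HR has the largest batter share, and $SB is the only component under 50%.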

So all of this might be useful...if I were planning on creating my own batter and pitcher projections and regressing them to the mean. Luckily, since I'm still partially sane, I'm perfectly happy to trust the good folks at Steamer and the projections they've created.

However, this study did show which batters and pitchers of the last ten years have been the best and worst at each component. I found it interesting, so I thought I'd share.

First, the top 5 and bottom 5 batters for each component:

Rk  Player             $BB

1   Jim Thome          17%
2   Carlos Pena        17%
3   Joey Votto         17%
4   Adam Dunn          16%
5   Carlos Santana     16%

196 Delmon Young        5%
197 Mike Aviles         5%
198 Jose Lopez          4%
199 Miguel Olivo        4%

Rk  Player             $SO

1   Juan Pierre         6%
2   Placido Polanco     7%
3   Jeff Keppinger      7%
4   Carlos Lee         10%
5   Alberto Callaspo   10%

196 J. Saltalamacchia  33%
197 Drew Stubbs        33%
198 Chris Davis        35%
199 Adam Dunn          35%
200 Mark Reynolds      36%

Rk  Player             $HR

1   Adam Dunn          10%
2   Ryan Howard        10%
3   Carlos Pena         9%
4   Mark Reynolds       9%
5   Mike Napoli         8%

196 Nick Punto          1%
197 Jamey Carroll       0%
198 Jason Kendall       0%
199 Juan Pierre         0%
200 Ben Revere          0%

Adam Dunn - master of the Three True Outcomes - had only a 49% chance ((1 - .16) * (1 - .35) * (1 - .1)) of putting the ball in play.
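The 49% works because the components are chained: $SO is measured only on plate appearances that weren't walks or HBP, and $HR only on those left after walks and strikeouts, so the chance of avoiding all three outcomes is the product of the complements. A short Python sketch of that arithmetic (the function name is hypothetical; like the formula above, it treats hit-by-pitches as negligible):

```python
def in_play_chance(bb: float, so: float, hr: float) -> float:
    """Chance a PA ends with the ball in play, chaining the binary components:
    avoid a walk, then a strikeout, then a home run."""
    return (1 - bb) * (1 - so) * (1 - hr)

# Adam Dunn's rounded 2006-2015 rates from the lists above
print(f"{in_play_chance(0.16, 0.35, 0.10):.0%}")  # 49%
```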

Rk  Player              $H

1   Joey Votto         36%
2   Miguel Cabrera     35%
3   Austin Jackson     35%
4   Joe Mauer          35%
5   Matt Kemp          35%

196 Vernon Wells       27%
197 Carlos Pena        27%
198 Jose Bautista      26%
199 Pedro Feliz        26%
200 Carlos Quentin     25%

Ranking players by component makes a batter's strengths easy to see. In Joey Votto's case, it mostly confirms what we already knew: his greatness as a hitter comes from dominating two components, $BB (3rd) and $H (1st). That, along with very good power (28th in $HR), makes up for his below-average contact (146th in $SO).
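The rank direction depends on the component: for $SO a low rate ranks first (Juan Pierre is 1st at 6%), which is why Votto's 146th means below-average contact, while for the other batter components a high rate ranks first. A small Python sketch of that ordering, using a few $SO rates from the list above (the `rank` helper is hypothetical):

```python
# A few $SO rates taken from the batter list above (lower is better for $SO)
so_rates = {"Juan Pierre": 0.06, "Placido Polanco": 0.07, "Adam Dunn": 0.35, "Mark Reynolds": 0.36}

def rank(rates: dict, lower_is_better: bool) -> list:
    """Return player names ordered best-first for one component."""
    return sorted(rates, key=rates.get, reverse=not lower_is_better)

print(rank(so_rates, lower_is_better=True))
```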

Rk  Player            $XBH

1   Chris Young        34%
2   Seth Smith         34%
3   David Ortiz        33%
4   Stephen Drew       32%
5   Scott Rolen        32%

196 Jamey Carroll      17%
197 Juan Pierre        16%
198 Ichiro Suzuki      15%
199 Luis Castillo      13%
200 Ben Revere         13%

Rk  Player             $3B

1   Dexter Fowler      28%
2   Michael Bourn      26%
3   Carl Crawford      25%
4   Juan Pierre        25%
5   Will Venable       25%

196 Yadier Molina       2%
197 Brian McCann        2%
198 Albert Pujols       1%
199 Paul Konerko        1%
200 Victor Martinez     1%

Rk  Player             $SA

1   Rajai Davis        48%
2   Juan Pierre        35%
3   Carlos Gomez       32%
4   Carl Crawford      31%
5   Jose Reyes         31%

196 Paul Konerko        1%
197 Adrian Gonzalez     1%
198 Billy Butler        1%
199 Pat Burrell         0%
200 Jim Thome           0%

Billy Hamilton, who ranked 544th in opportunities (times on first), had a $SA of 63% through 2015.

Rk  Player             $SB

1   Chase Utley        90%
2   Jason Bay          88%
3   Tony Campana       88%
4   Alexi Casilla      88%
5   Jayson Werth       87%

197 Yunel Escobar      57%
198 David DeJesus      54%
199 Andre Ethier       51%
200 Kosuke Fukudome    50%

Click here for the top 5's and bottom 5's for pitchers.
