Friday, August 18, 2017

Generations of Baseball Players

"A generation, like an individual, merges many different qualities, no one of which is definitive standing alone. But once all the evidence is assembled, we can build a persuasive case for identifying (by birthyear) eighteen generations over the course of American history. All Americans born over the past four centuries have belonged to one or another of these generations" (Generations page 68).
William Strauss and Neil Howe wrote the book on generations - literally. The aim of this post - and, indeed, this entire blog - is to apply their theory to baseball. You can call it my manifesto.

Therefore, by the end of this post, I hope to have built a persuasive case for identifying (by birthyear) NINE generations over the course of baseball history. I will claim that all ballplayers born over the past two centuries belong to one or another of these generations.

Why do we need definitive baseball generations? For one, to provide context for "best of their generation" conversations and arguments (best player, best pitcher, best 3rd baseman, best leadoff batter, etc.). Beyond that, having definitive generations restores meaning to baseball's hallowed leaderboards. For example, Roger Connor hit 138 home runs in his career - a modest total by today's standards, but the most ever hit by a player born before 1887. Similarly, Miguel Cabrera's career batting average of .318 ranks 55th all-time (as of this writing), but it ranks first among players born after 1960.

So if we can agree that sorting baseball players by generation is useful, how exactly should we go about doing it? I'll start with Strauss & Howe's definition:
"A GENERATION is a cohort-group whose length approximates the span of a phase of life and whose boundaries are fixed by peer personality" (page 60).
Earlier in the book (page 44), Strauss & Howe defined a "cohort" as "any set of persons born in the same year" and a "cohort-group" as "any wider set of persons born in a limited set of consecutive years."

The authors laid out (on page 56) four "phases of life," each 22 years long: youth (age 0 to 21), rising adulthood (age 22 to 43), midlife (age 44 to 65), and elderhood (age 66 to 87). Obviously, most major league careers fall almost entirely within the second phase, rising adulthood. And since the longest careers tend to last about 22 years, we can say then that the length of a baseball generation approximates the span of a very long major league career.

And that brings us to "peer personality" - "the element in our definition that distinguishes a generation as a cohesive cohort-group" (page 63). Strauss & Howe measured the similarity of cohorts by the similarity of their peer personality. In a pair of articles published to his website in July 2015 (and placed behind a subscription paywall), Bill James "measured the Similarity of Seasons...by the similarity of their statistical image."

Strauss & Howe "use peer personality to identify a generation and find the boundaries separating it from its neighbors" (page 64). Bill James used similarity scores to identify "natural groups of seasons" and find the "fault lines" separating them.

And while Strauss & Howe could "apply no reductive rules for comparing the beliefs and behavior of one cohort-group with those of its neighbors" (page 67), we have baseball's rich statistical record at our disposal for comparing the "behavior" of baseball cohort-groups.

So here then is my modified definition:
A BASEBALL GENERATION is a cohort-group whose length approximates the span of a very long major league career and whose boundaries are fixed by statistical image.
Bill James' similarity scores for seasons used 30 statistical categories, including both counting stats (hits, homeruns, strikeouts) and rate stats (batting average, on-base percentage, earned run average). But, as Kerry Whisnant explained, using counting stats to compare cohorts, like using them to compare players or seasons, will mean that only cohorts with "similar numbers of plate appearances" will score as similar.

And the traditional rate stats "confound" talents, as Jim Albert explained on page 24 of an article in By the Numbers. "A batting average confounds three batter talents: the talent not to strikeout, the talent to hit a home run, and the talent to hit an in-play ball for a hit."

Peer personality has nothing to do with the raw numbers of a generation, but rather the collective behavior of its members. Strauss & Howe elaborated (on page 63 of Generations):
"The peer personality of a generation is essentially a caricature of its prototypical member. It is, in its sum of attributes, a distinctly personlike creation. A generation...can be safe or reckless, calm or aggressive, self-absorbed or outer-driven, generous or selfish, spiritual or secular, interested in culture or interested in politics."
Likewise, the "statistical image" of a baseball generation (like the statistical image of a team or a league) is essentially a caricature of its average player. It is, in its sum of attributes, like an individual player. A baseball generation can be patient or free-swinging, adept at making contact or prone to striking out, powerful or light-hitting, good at hitting the ball "where they ain't" or bad at avoiding defenders, aggressive on the base-paths or station-to-station.

So I'll use eight rate statistics - what I'll call the "attribute rates" - to measure the similarity of baseball cohorts. ("Each rate describes something specific," Tom Tango wrote, describing four of the rates.) These eight rates - representing eight different skills, or tools - taken together, reveal the "statistical image" or "peer personality" of a baseball generation; how its members collectively played the game.

The first rate, BF/G, uses pitching stats only. The next three, $BB, $SO, and $HR - the "three true outcomes" - draw from both batting and pitching stats (for the formulas listed below, I differentiate between batting and pitching with a small 'b' or 'p' in the variables). The last four use batting stats only.

BF/G - the number of batters a pitcher faces per game.
= BF / G

$BB - the percentage of plate appearances that end in a walk or a hit by pitch.
= (bBB + bHBP + pBB + pHBP) / (PA + BF)

$SO - the percentage of plate appearances ending in a strike (called, swung and missed, or batted) that are strikeouts.
= (bSO + pSO) / (PA + BF - bBB - bHBP - pBB - pHBP)

$HR - the percentage of batted balls that are homeruns.
= (bHR + pHR) / (PA + BF - bBB - bSO - bHBP - pBB - pSO - pHBP)

$H - the percentage of balls batted into the field of play that are hits.
= (H - HR) / (PA - HR - BB - SO - HBP)

$XBH - the percentage of non-homerun base hits that go for extra bases (doubles or triples).
= (2B + 3B) / (H - HR)

$3B - the percentage of non-homerun extra-base hits (doubles and triples) that are triples.
= 3B / (2B + 3B)

$SB - the number of successful stolen bases per (approximate) time on first base.
= SB / (H - 2B - 3B - HR + BB + HBP)

So, now all we need are the career totals for every batter and every pitcher in MLB history. Then, on a separate sheet in Excel, it's just a matter of using SUMIF formulas to add up the necessary batting and pitching totals for each cohort. From the firstborn player (Nate Berkenstock, 1831) to the lastborn (Julio Urias, 1996), there are 166 MLB cohort birthyears (through 2016; I'm typing this just days after Ozzie Albies became the first 1997-born major-leaguer). The table below shows the career totals for 1980s cohorts:

Batting Totals
Born  PA       H       2B      3B   HR     BB      SO      HBP    SB
1980  151,352  34,688  7,083   671  4,575  13,621  28,195  1,570  2,243
1981  138,495  32,250  6,336   834  3,409  10,444  25,592  1,068  2,766
1982  167,402  39,289  7,957   813  4,302  13,465  30,397  1,597  2,514
1983  210,540  49,437  10,073  981  5,721  18,420  38,815  1,721  3,422
1984  144,812  33,278  6,555   716  3,553  11,529  28,456  1,187  2,414
1985  109,480  25,513  5,128   610  2,836  8,077   22,168  1,007  1,882
1986  124,542  27,408  5,397   595  3,345  10,219  27,121  1,122  1,777
1987  121,873  27,901  5,613   699  3,340  9,785   25,865  976    1,872
1988  62,593   14,235  2,607   404  1,251  4,442   12,814  510    1,392
1989  69,496   15,655  3,024   348  1,752  5,221   14,455  623    859

Pitching Totals
Born  G       BB      SO      HR     BF       HBP
1980  13,657  11,496  24,982  3,630  137,580  1,321
1981  14,427  14,339  29,760  4,482  162,821  1,476
1982  16,642  13,586  28,085  4,128  153,448  1,434
1983  19,557  16,301  36,786  5,174  197,787  1,708
1984  16,438  14,779  34,708  4,425  175,098  1,496
1985  16,635  11,104  26,232  3,256  128,041  1,199
1986  12,899  11,477  29,364  3,770  147,602  1,205
1987  12,667  9,868   23,021  3,000  115,816  1,004
1988  10,079  8,432   22,006  2,518  102,863  801
1989  7,291   6,304   16,419  2,066  79,846   717

All the batters born in 1980 combined for 151,352 plate appearances, 34,688 hits, and 4,575 homeruns. All the pitchers born that year combined for 11,496 bases on balls, 24,982 strikeouts, etc.
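
The SUMIF step could also be scripted. Below is a minimal Python sketch of the same kind of aggregation, not the spreadsheet actually used for this post. The record layout (`born`, `PA`, `H`, `HR` keys), the `cohort_totals` name, and the three sample career lines are all made up for illustration.

```python
from collections import defaultdict

def cohort_totals(players, stats):
    """Sum each stat over all players sharing a birth year (like SUMIF)."""
    totals = defaultdict(lambda: dict.fromkeys(stats, 0))
    for player in players:
        row = totals[player["born"]]
        for stat in stats:
            row[stat] += player.get(stat, 0)
    return dict(totals)

# Made-up career lines, purely for illustration (not real players).
batters = [
    {"name": "Batter A", "born": 1980, "PA": 5000, "H": 1300, "HR": 150},
    {"name": "Batter B", "born": 1980, "PA": 1200, "H": 280, "HR": 20},
    {"name": "Batter C", "born": 1981, "PA": 3000, "H": 750, "HR": 90},
]

batting = cohort_totals(batters, ["PA", "H", "HR"])
for year in sorted(batting):
    print(year, batting[year])
```

The same function would work for pitching lines by passing a different list of stat names.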

Then I can calculate the attribute rates for each cohort:

Born  BF/G  $BB   $SO   $HR   $H    $XBH  $3B   $SB
1980  10.1  .097  .204  .039  .291  .257  .087  .060
1981  11.3  .091  .202  .036  .294  .249  .116  .083
1982  9.2   .094  .201  .036  .297  .251  .093  .061
1983  10.1  .093  .204  .037  .300  .253  .089  .065
1984  10.7  .091  .217  .035  .297  .245  .098  .069
1985  7.7   .090  .224  .036  .301  .253  .106  .072
1986  11.4  .088  .228  .037  .291  .249  .099  .060
1987  9.1   .091  .226  .038  .300  .257  .111  .065
1988  10.2  .086  .230  .032  .298  .232  .134  .093
1989  11.0  .086  .226  .036  .293  .243  .103  .052
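
As a check on the formulas, here is a minimal Python sketch that recomputes the 1980 row from that cohort's batting and pitching totals. The `attribute_rates` function and the dictionary layout are illustrative assumptions; the post's own numbers were worked out in Excel.

```python
def attribute_rates(bat, pit):
    """Return the eight attribute rates from batting and pitching totals."""
    walks = bat["BB"] + bat["HBP"] + pit["BB"] + pit["HBP"]
    strikeouts = bat["SO"] + pit["SO"]
    plate_apps = bat["PA"] + pit["BF"]
    non_hr_hits = bat["H"] - bat["HR"]
    doubles_triples = bat["2B"] + bat["3B"]
    balls_in_play = bat["PA"] - bat["HR"] - bat["BB"] - bat["SO"] - bat["HBP"]
    times_on_first = non_hr_hits - doubles_triples + bat["BB"] + bat["HBP"]
    return {
        "BF/G": pit["BF"] / pit["G"],
        "$BB": walks / plate_apps,
        "$SO": strikeouts / (plate_apps - walks),
        "$HR": (bat["HR"] + pit["HR"]) / (plate_apps - walks - strikeouts),
        "$H": non_hr_hits / balls_in_play,
        "$XBH": doubles_triples / non_hr_hits,
        "$3B": bat["3B"] / doubles_triples,
        "$SB": bat["SB"] / times_on_first,
    }

# 1980 cohort totals, copied from the table of 1980s cohorts in this post.
bat_1980 = {"PA": 151352, "H": 34688, "2B": 7083, "3B": 671, "HR": 4575,
            "BB": 13621, "SO": 28195, "HBP": 1570, "SB": 2243}
pit_1980 = {"G": 13657, "BB": 11496, "SO": 24982, "HR": 3630,
            "BF": 137580, "HBP": 1321}

rates_1980 = attribute_rates(bat_1980, pit_1980)
for name, value in rates_1980.items():
    print(f"{name:5s} {value:.3f}")
```

Rounded to three places (one place for BF/G), the results should line up with the 1980 row of the rates table.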

Next I need the standard deviations of each rate. I have 166 cohort birthyears, but many of the very early and very recent cohorts were not (or aren't yet) well-represented in the major leagues. So I'll set minimum requirements of 10,000 total plate appearances and 10,000 total batters faced, and therefore only include the 143 cohorts from 1850 (Al Spalding) through 1992 (Bryce Harper) in the population for my standard deviations.
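
A rough sketch of that filtering step in Python follows. The post doesn't say whether the population or the sample standard deviation was used, so `pstdev` (population) is an assumption here, and the cohort figures in the example are invented placeholders rather than real totals.

```python
from statistics import pstdev

MIN_PA = 10_000  # minimum total plate appearances for a cohort
MIN_BF = 10_000  # minimum total batters faced for a cohort

def qualified_years(cohorts):
    """Return the birth years whose cohorts meet both minimums."""
    return sorted(year for year, c in cohorts.items()
                  if c["PA"] >= MIN_PA and c["BF"] >= MIN_BF)

def rate_std_dev(cohorts, years, rate):
    """Population standard deviation of one rate over the chosen years."""
    return pstdev([cohorts[year][rate] for year in years])

# Invented placeholder cohorts (hypothetical values, not real totals).
cohorts = {
    1849: {"PA": 4_000, "BF": 2_500, "$BB": 0.030},
    1850: {"PA": 12_000, "BF": 15_000, "$BB": 0.040},
    1851: {"PA": 20_000, "BF": 18_000, "$BB": 0.060},
}

years = qualified_years(cohorts)
bb_std_dev = rate_std_dev(cohorts, years, "$BB")
print(years, bb_std_dev)
```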

Also, I'll need to assign weights to each rate. I wanted the "three true outcomes" rates ($BB, $SO, and $HR) to weigh double the other rates, because they use both hitting and pitching statistics, and I wanted the $XBH and $3B rates to weigh half the other rates, because they both deal with breakdowns of base hits. Finally, I wanted the weights to add up to 1,000, so that if two groups are exactly four standard deviations apart in every category, their similarity score will be zero.

            BF/G  $BB   $SO   $HR   $H    $XBH  $3B   $SB
St. Dev.    8.3   .013  .047  .012  .011  .019  .071  .040
Weight      100   200   200   200   100   50    50    100
Multiplier  3.0   3870  1061  4229  2241  648   175   633

To find the similarity score between two groups, start at 1,000 and subtract a penalty for each attribute rate. The penalty is the absolute difference between the two groups' rates, times a multiplier. The multiplier is the rate's weight divided by (4 times its standard deviation).

For example, the 1980 cohort has a $BB rate of .097 and the 1981 cohort has a $BB rate of .091, a difference of .006. So the $BB penalty for 1980 and 1981 would be the difference (.006) times the multiplier (3,870), which is about 23. Add up the penalties for all eight rates and subtract from 1,000, and that is the similarity score.
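
The whole calculation can be sketched in a few lines of Python. This is an illustration under assumptions, not the spreadsheet behind the scores in this post: it uses the rounded rates and multipliers printed in the tables, so its results can differ slightly from scores computed on unrounded values, and the function names are made up.

```python
# Multipliers as printed in the weights table: weight / (4 * st. dev.).
MULTIPLIERS = {"BF/G": 3.0, "$BB": 3870, "$SO": 1061, "$HR": 4229,
               "$H": 2241, "$XBH": 648, "$3B": 175, "$SB": 633}

def penalties(rates_a, rates_b):
    """Penalty per rate: absolute difference times that rate's multiplier."""
    return {name: abs(rates_a[name] - rates_b[name]) * mult
            for name, mult in MULTIPLIERS.items()}

def similarity(rates_a, rates_b):
    """Start at 1,000 and subtract the penalty for every attribute rate."""
    return 1000 - sum(penalties(rates_a, rates_b).values())

# Rounded rates for the 1980 and 1981 cohorts, from the rates table.
rates_1980 = {"BF/G": 10.1, "$BB": .097, "$SO": .204, "$HR": .039,
              "$H": .291, "$XBH": .257, "$3B": .087, "$SB": .060}
rates_1981 = {"BF/G": 11.3, "$BB": .091, "$SO": .202, "$HR": .036,
              "$H": .294, "$XBH": .249, "$3B": .116, "$SB": .083}

bb_penalty = penalties(rates_1980, rates_1981)["$BB"]
print(f"$BB penalty: {bb_penalty:.0f}")  # about 23, as in the example
print(f"similarity:  {similarity(rates_1980, rates_1981):.0f}")
```

Two identical groups score exactly 1,000, since every penalty is zero.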

To find "Epochs and Eras," Bill James asked of "every season in baseball history: Is it more like the season before it, or more like the season after it?" He then made two-year comparisons, three-year comparisons, four-year comparisons... comparing "each season to every other season in baseball history within 15 years before or after."

Instead of comparing each baseball cohort to other neighboring cohorts, I'm comparing them to the 15-year cohort-groups before and after. To find Baseball Generations, I'm asking of every baseball cohort: Is it more like the 15-year cohort-group before it, or more like the 15-year cohort-group after it?

Is the 1980 cohort more similar to the 1965-1979 cohort-group, or more similar to the 1981-1995 cohort-group?

1980 to 1965-1979 - 943
1980 to 1981-1995 - 916

The 1980 cohort is backward-looking, more similar to the cohort-group before it (943) than the cohort-group after it (916). What about 1981?

1981 to 1966-1980 - 938
1981 to 1982-1996 - 954

The 1981 cohort is forward-looking, more similar to the cohort-group after it (954) than the cohort-group before it (938).

To get previous and next cohort-groups for all 166 cohorts, I calculated attribute rates for every possible 15-year group, from the group before the first cohort (1816-1830) to the group after the last cohort (1997-2011), and all 180 groups in between. Attribute rates are calculated from batting and pitching totals. Cohort batting and pitching totals are found by adding up the career totals of the individual batters and pitchers belonging to each cohort; cohort-group batting and pitching totals are found by adding up the cohort totals of the 15 individual cohorts belonging to each cohort-group. (The 1816-1830 and 1997-2011 groups both have totals and rates of zero across the board, of course.)

The table below shows the similarity scores of the 1973-1993 cohorts to their respective previous and next 15-year groups. I also calculated a "forward score" for each cohort, which is simply its next-group similarity score MINUS its previous-group similarity score. The forward score shows just HOW forward- or backward-looking a cohort is. A positive forward score indicates a cohort is forward-looking and a negative score indicates it is backward-looking, and a score above +50 (or below -50) means that the cohort is VERY forward- (or backward-) looking.

Born  Prev.  Next  Forward
1973  957    950   -7
1974  953    919   -34
1975  950    941   -9
1976  951    945   -7
1977  955    944   -12
1978  940    948   8
1979  939    931   -8
1980  943    916   -27
1981  938    954   16
1982  958    948   -10
1983  956    941   -15
1984  932    961   28
1985  915    965   50
1986  927    955   27
1987  934    941   7
1988  878    942   65
1989  926    933   7
1990  884    892   7
1991  886    938   52
1992  891    943   52
1993  881    906   25

I've shaded the positive forward scores green and the negative forward scores red. Every cohort from 1973 to 1983, except for 1978 and 1981, is backward-looking. Every cohort from 1984 to 1993 is forward-looking.
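
For anyone who wants to script the comparison, here is a minimal Python sketch of how the two neighboring 15-year groups and the forward score could be derived. The function names are made up, and treating a forward score of exactly zero as "neutral" is an assumption, since the post doesn't cover that case.

```python
GROUP_SPAN = 15        # years in each neighboring cohort-group
STRONG_THRESHOLD = 50  # beyond this, a cohort is "VERY" forward/backward

def neighbor_groups(birth_year, span=GROUP_SPAN):
    """Return (first, last) birth years of the previous and next groups."""
    previous_group = (birth_year - span, birth_year - 1)
    next_group = (birth_year + 1, birth_year + span)
    return previous_group, next_group

def forward_score(prev_similarity, next_similarity):
    """Next-group similarity minus previous-group similarity."""
    return next_similarity - prev_similarity

def describe(score):
    """Label a forward score the way the text does."""
    if score == 0:
        return "neutral"  # assumption: the post does not define this case
    label = "forward-looking" if score > 0 else "backward-looking"
    return ("very " if abs(score) > STRONG_THRESHOLD else "") + label

print(neighbor_groups(1980))
print(forward_score(943, 916), describe(forward_score(943, 916)))
print(forward_score(938, 954), describe(forward_score(938, 954)))
```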

While Bill James declined to develop a "specific protocol...based on this method," he did state, as a general rule, that "an 'epoch' is formed by a series of forward-looking seasons, followed by a series of backward-looking seasons." But what he was really looking for was the "hard break" between epochs - a series of backward-looking seasons (the end of one epoch) followed by a series of forward-looking seasons (the beginning of a new epoch). He was looking for "fault lines" separating "natural groups of seasons," just as Strauss & Howe looked for boundaries separating cohesive cohort-groups.

I showed the 1973-1993 cohorts in the table above, not because those cohorts form a cohesive group, but because they're halves of two different groups; the second half of one group and the first half of the next group, with the boundary between the two groups appearing to fall between 1983 and 1984. But it's not a clean break; not all of the 1973-1983 cohorts are backward-looking, and the 1981-1983 cohorts are all fairly similar to both their respective groups.

I know if a cohort is forward- or backward-looking, and how forward- or backward-looking it is; now I need a way to determine if a cohort is part of a forward- or backward-looking trend. And since I do want a specific protocol for defining generations by an objective process, I'm adding what I'll call a "trend score" for each cohort. The trend score - as its name implies - checks each cohort's forward score to see if it is part of a trend. If a cohort's forward score is positive (forward-looking), the trend score adds to it the forward scores of the next two cohorts. If its forward score is negative (backward-looking), the trend score adds to it the forward scores of the previous two cohorts.
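
Here is a minimal Python sketch of the trend score, using the rounded forward scores from the 1973-1993 table. Two details are assumptions rather than anything stated in the post: neighbors outside the data count as zero, and a forward score of exactly zero is left unchanged. Because the inputs are rounded, results can be off by a point from trend scores computed on unrounded values.

```python
def trend_scores(forward):
    """Map each birth year to its trend score.

    Positive forward scores add the next two cohorts' forward scores;
    negative ones add the previous two cohorts' forward scores.
    """
    trend = {}
    for year, score in forward.items():
        if score > 0:
            neighbors = (year + 1, year + 2)
        elif score < 0:
            neighbors = (year - 1, year - 2)
        else:
            neighbors = ()  # assumption: a zero score stands alone
        # Assumption: cohorts missing from the data contribute zero.
        trend[year] = score + sum(forward.get(n, 0) for n in neighbors)
    return trend

# Rounded forward scores from the 1973-1993 table.
forward = {1973: -7, 1974: -34, 1975: -9, 1976: -7, 1977: -12, 1978: 8,
           1979: -8, 1980: -27, 1981: 16, 1982: -10, 1983: -15, 1984: 28,
           1985: 50, 1986: 27, 1987: 7, 1988: 65, 1989: 7, 1990: 7,
           1991: 52, 1992: 52, 1993: 25}

trend = trend_scores(forward)
for year in range(1979, 1989):
    print(year, forward[year], trend[year])
```

Note how 1981 comes out backward-trending even though it is forward-looking: the two backward-looking cohorts after it outweigh its small positive score.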

When at least three backward-trending cohorts are followed by at least three forward-trending cohorts, I draw a generational boundary between the last backward-trending cohort and the first forward-trending cohort.
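
That rule is simple enough to sketch in Python. This is an illustration, not the process actually used to produce the boundaries in this post; the function name is made up, and it assumes the trend scores cover consecutive birth years. The sample trend scores are copied from the 1979-1988 and 1918-1927 tables in this post.

```python
def find_boundaries(trend, min_run=3):
    """Return (last backward year, first forward year) pairs.

    A boundary needs at least `min_run` backward-trending cohorts
    immediately followed by at least `min_run` forward-trending cohorts.
    Assumes `trend` covers consecutive birth years.
    """
    years = sorted(trend)
    boundaries = []
    for i in range(min_run, len(years) - min_run + 1):
        before = years[i - min_run:i]
        after = years[i:i + min_run]
        if (all(trend[y] < 0 for y in before)
                and all(trend[y] > 0 for y in after)):
            boundaries.append((years[i - 1], years[i]))
    return boundaries

# Trend scores for 1979-1988: a real boundary.
trend_1979_1988 = {1979: -12, 1980: -27, 1981: -9, 1982: -21, 1983: -9,
                   1984: 106, 1985: 85, 1986: 99, 1987: 79, 1988: 79}

# Trend scores for 1918-1927: the forward trend fizzles after two cohorts.
trend_1918_1927 = {1918: -27, 1919: -112, 1920: -26, 1921: -98, 1922: -26,
                   1923: 52, 1924: 46, 1925: -13, 1926: -28, 1927: -13}

print(find_boundaries(trend_1979_1988))
print(find_boundaries(trend_1918_1927))
```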

Rather than trying to explain any further how or why trend scores work now, I'll go ahead and start locating generational boundaries and explain them as I go. Here are the first ten baseball cohorts, 1831 to 1840:

Born  Prev.  Next  Forward  Trend
1831  204    -788  -992     -992
1832  204    -71   -275     -1,268
1833  545    -59   -604     -1,872
1834  545    -82   -627     -1,507
1835  -424   806   1,230    929
1836  821    732   -89      514
1837  97     -115  -211     929
1838  834    732   -101     -402
1839  109    -70   -179     -492
1840  736    665   -71      -352

Bill James gave a couple of caveats to his rule for defining epochs:
"1) Sometimes it is not a series of backward-looking years that ends an epoch, but just one year, and 2) Sometimes what ends an epoch is not a backward-looking phase, but rather a large difference between two adjacent seasons."
I take it to also be true that sometimes it is just one cohort, or a large difference between two adjacent cohorts, that STARTS a generation, and I tried to build these caveats into my trend scores. Even though the 1835 cohort is the only forward-looking cohort in its group, it is SO different from the cohorts that came before that it should be the start of a new generation. (The 1835 cohort consists of Harry Wright, the firstborn player to have a real major league career; the two players older than him appeared in just one game each as forty-somethings.) So even though the 1836 and 1837 cohorts are backward-looking, they're forward-trending because 1835's forward score is so high it overwhelms their negative scores.

The next several generational boundaries are easy to spot, without trend scores. We can draw one between 1856 and 1857:

Born  Prev.  Next  Forward  Trend
1852  877    756   -121     -576
1853  878    624   -254     -578
1854  799    795   -4       -380
1855  893    735   -158     -417
1856  861    758   -103     -265
1857  783    842   59       206
1858  826    844   18       174
1859  787    916   129      206
1860  847    875   28       165
1861  842    891   49       238

And 1873 and 1874:

Born  Prev.  Next  Forward  Trend
1869  940    933   -7       27
1870  900    884   -16      -3
1871  909    886   -22      -45
1872  923    885   -38      -76
1873  943    917   -26      -86
1874  919    929   10       44
1875  853    885   32       82
1876  920    921   1        58
1877  877    925   48       76
1878  911    920   9        42

And 1892 and 1893:

Born  Prev.  Next  Forward  Trend
1888  897    909   12       -67
1889  935    894   -42      -97
1890  930    892   -38      -67
1891  926    863   -63      -142
1892  922    862   -61      -161
1893  884    920   36       132
1894  905    917   11       185
1895  850    934   84       225
1896  864    954   89       171
1897  871    922   51       110

And 1911 and 1912:

Born  Prev.  Next  Forward  Trend
1907  904    952   48       -83
1908  956    903   -53      -49
1909  959    880   -79      -83
1910  963    879   -84      -216
1911  944    882   -61      -224
1912  886    944   58       212
1913  875    947   72       177
1914  881    963   82       116
1915  904    927   23       47
1916  912    924   11       -27

It looks like there might be a boundary between 1922 and 1923:

Born  Prev.  Next  Forward  Trend
1918  927    876   -51      -27
1919  941    867   -74      -112
1920  915    921   6        -26
1921  926    896   -30      -98
1922  898    896   -2       -26
1923  902    920   18       52
1924  882    911   29       46
1925  919    924   5        -13
1926  913    925   12       -28
1927  936    905   -31      -13

There are several mostly backward-looking cohorts followed by several forward-looking cohorts. But the 1925 and 1926 cohorts aren't forward-looking enough to be forward-trending, so the forward trend fizzles after the 1923 and 1924 cohorts, which means it doesn't meet my standard of at least three forward-trending cohorts. Besides, a boundary here would mean a generation of just 11 cohort birthyears (1912-1922), which is too short to be a true generation.

The actual boundary is six years later, between 1928 and 1929:

Born  Prev.  Next  Forward  Trend
1924  882    911   29       46
1925  919    924   5        -13
1926  913    925   12       -28
1927  936    905   -31      -13
1928  932    922   -10      -28
1929  882    943   61       94
1930  884    912   28       90
1931  885    889   4        59
1932  878    935   57       69
1933  931    929   -2       59

This time the backward-trending cohorts (1925-1928) are followed by a sustained forward trend. The 1929 cohort is the MOST forward-looking cohort since 1914, in the first wave of the previous generation.

The forward trend lasts through the 1941 cohort, and is then followed by 20 consecutive backward-trending cohorts. The next generational boundary isn't until 1961/62, 33 birthyears after the last one.

Born  Prev.  Next  Forward  Trend
1957  959    863   -95      -185
1958  921    896   -24      -154
1959  928    877   -51      -170
1960  938    906   -32      -107
1961  934    875   -59      -142
1962  905    938   33       144
1963  864    975   110      108
1964  906    907   1        48
1965  926    924   -3       108
1966  895    945   50       137

And finally, we're back to the boundary between the two currently-active generations:

Born  Prev.  Next  Forward  Trend
1979  939    931   -8       -12
1980  943    916   -27      -27
1981  938    954   16       -9
1982  958    948   -10      -21
1983  956    941   -15      -9
1984  932    961   28       106
1985  915    965   50       85
1986  927    955   27       99
1987  934    941   7        79
1988  878    942   65       79

And it looks like the boundary is indeed between 1983 and 1984, at least for now. These cohorts are still adding to their batting and pitching totals. The 1981-1983 group could possibly slip into the younger generation (I hope it does, anyway; it's hard to imagine the baby-faced Miguel Cabrera of the 1983 cohort being in the same generation as Clemens and Bonds).

So that's eight boundaries, which divides all ballplayers born between 1831 and 1996 into nine generations. Ignoring the first and last (partial) generations, the seven generations in the middle have an average length of 21.3 years, which nearly matches Strauss & Howe's 22-year "phase of life", or the length of a very long major league career.

Generation     Birth Years  Best Player
Knickerbocker  1831-1834
National       1835-1856    Cap Anson
American       1857-1873    Cy Young
Deadball       1874-1892    Ty Cobb
Ruthian        1893-1911    Babe Ruth
G.I.           1912-1928    Ted Williams
Expansion      1929-1961    Willie Mays
Steroid        1962-1983    Barry Bonds
Millennial     1984-1996    Mike Trout
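
The average length quoted above can be checked from the birth-year ranges in the table. A quick Python sketch (the dictionary is just a transcription of the seven complete generations; the variable names are my own):

```python
# The seven complete generations (first and last partial ones left out).
generations = {
    "National": (1835, 1856),
    "American": (1857, 1873),
    "Deadball": (1874, 1892),
    "Ruthian": (1893, 1911),
    "G.I.": (1912, 1928),
    "Expansion": (1929, 1961),
    "Steroid": (1962, 1983),
}

# Both endpoints are birthyears in the generation, hence the +1.
lengths = {name: last - first + 1
           for name, (first, last) in generations.items()}
average_length = sum(lengths.values()) / len(lengths)

for name, years in lengths.items():
    print(f"{name:10s} {years}")
print(f"average    {average_length:.1f}")
```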

Tuesday, July 25, 2017

Marty's Orioles

Since Marty has also seen a lot of the Reds' opponents in his 44 years of broadcasting, I thought I'd do all-star teams for other franchises, too, as the Reds encounter them.

The Orioles are the one Reds' opponent I didn't make an all-star team for when they came to town in April, and since the Reds only have a couple opponents left that they haven't already seen in 2017 (they're playing the Yankees today and tomorrow), I decided I'd go ahead and knock the O's out now.

Same rules apply: highest career WAR while playing for the franchise from 1974 to 2017; roster mix of 5 starting pitchers, 5 relievers, and 7 bench players (6 on an AL team with the DH); minimum 200 games at a position for position players, 50 games started for starting pitchers, and 100 relief appearances for relievers.

In the stats listed below, WAR is the player's career total with the franchise, but other counting stats are at a per-season rate (162 games for position players, 34 starts for starting pitchers, and 68 relief appearances for relievers).


Line Up

Pos  Player          WAR   BA    HR  RBI  SB  OBP   SLG
LF   Brady Anderson  34.9  .257  19  69   28  .364  .430
DH   Ken Singleton   29.9  .284  20  86   1   .388  .445
SS   Cal Ripken      95.6  .276  23  91   2   .340  .447
1B   Eddie Murray    56.2  .294  29  105  5   .370  .498
3B   Manny Machado   26.6  .278  28  84   8   .330  .471
CF   Adam Jones      29.4  .278  28  90   9   .319  .463
C    Chris Hoiles    23.5  .262  27  81   1   .366  .467
RF   Nick Markakis   25.6  .290  17  78   7   .358  .435
2B   Brian Roberts   28.8  .278  11  64   34  .349  .412

Bench

Pos  Player           WAR   BA    HR  RBI  SB  OBP   SLG
IF   Melvin Mora      28.8  .280  20  85   11  .355  .438
IF   Rafael Palmeiro  24.3  .284  36  114  6   .366  .520
IF   Mark Belanger    23.1  .223  2   30   15  .294  .280
C    Rick Dempsey     21.4  .238  10  46   2   .319  .355
IF   Bobby Grich      20.7  .263  16  69   16  .379  .416
OF   Al Bumbry        20.3  .279  6   44   28  .341  .370

Pitchers

Pos  Player          WAR   W   L   SV  ERA   IP   SO   WHIP
SP   Mike Mussina    47.5  17  10  0   3.53  237  181  1.18
SP   Jim Palmer      39.3  16  10  0   3.00  248  123  1.19
SP   Scott McGregor  20.2  14  11  1   3.99  219  92   1.29
SP   Mike Boddicker  17.4  15  13  0   3.73  234  154  1.29
SP   Jeremy Guthrie  16.5  10  14  0   4.12  213  130  1.27
RP   Mike Flanagan   21.8  12  10  0   3.89  203  113  1.32
RP   Darren O'Day    10.3  5   2   3   2.38  65   72   0.99
RP   Jim Johnson     10.4  3   5   23  3.10  75   50   1.23
RP   Zach Britton    10.4  6   5   27  3.22  104  86   1.26
RP   Gregg Olson     11.7  4   4   34  2.26  74   74   1.25

Saturday, July 22, 2017

All-time Marlins

Since Marty has also seen a lot of the Reds' opponents in his 44 years of broadcasting, I thought I'd do all-star teams for other franchises, too, as the Reds encounter them. (And since the Miami Marlins have only existed as a franchise since 1993, Marty's Marlins are the all-time Marlins.)

Same rules apply: highest career WAR while playing for the franchise from 1974 to 2017; roster mix of 5 starting pitchers, 5 relievers, and 7 bench players (6 on an AL team with the DH); minimum 200 games at a position for position players, 50 games started for starters, and 100 relief appearances for relievers.

In the stats listed below, WAR is the player's career total with the franchise, but other counting stats are at a per-season rate (162 games for position players, 34 starts for starting pitchers, and 68 relief appearances for relievers).


Line Up

Pos  Player             WAR   BA    HR  RBI  SB  OBP   SLG
2B   Luis Castillo      22.5  .293  3   39   40  .370  .356
SS   Hanley Ramirez     26.8  .300  25  83   40  .374  .499
3B   Miguel Cabrera     18.2  .313  31  118  4   .388  .542
RF   Giancarlo Stanton  30.3  .267  42  106  6   .357  .544
LF   Cliff Floyd        16.8  .294  28  104  23  .374  .523
1B   Jeff Conine        13.6  .290  19  88   2   .358  .455
CF   Marcell Ozuna      11.8  .273  23  85   3   .324  .448
C    Charles Johnson    11.7  .241  21  76   0   .324  .418

Bench

Pos  Player          WAR   BA    HR  RBI  SB  OBP   SLG
IF   Dan Uggla       15.6  .263  32  97   4   .349  .488
IF   Mike Lowell     14.1  .272  24  95   3   .339  .462
OF   Gary Sheffield  12.9  .288  35  110  21  .426  .543
IF   Derrek Lee      9.8   .264  25  80   10  .353  .469
OF   Cody Ross       9.5   .265  23  84   6   .322  .465
C    J.T. Realmuto   7.3   .285  13  64   11  .327  .424
IF   Edgar Renteria  5.0   .288  5   47   37  .342  .357

Pitchers

Pos  Player            WAR   W   L   SV  ERA   IP   SO   WHIP
SP   Josh Johnson      25.3  13  8   0   3.15  209  190  1.23
SP   Dontrelle Willis  17.1  14  11  0   3.78  215  159  1.36
SP   Kevin Brown       15.0  17  10  0   2.30  246  190  1.06
SP   Anibal Sanchez    13.7  11  12  0   3.75  204  173  1.35
SP   Jose Fernandez    13.1  17  8   0   2.58  211  264  1.05
RP   Renyel Pinto      3.4   2   3   1   3.62  64   62   1.46
RP   Braden Looper     3.5   4   3   9   3.69  72   45   1.39
RP   AJ Ramos          6.3   3   3   19  2.81  68   80   1.23
RP   Steve Cishek      4.8   4   5   23  2.86  69   75   1.22
RP   Robb Nen          5.0   5   4   27  3.41  79   83   1.28

Wednesday, July 19, 2017

All-time Diamondbacks

Since Marty has also seen a lot of the Reds' opponents in his 44 years of broadcasting, I thought I'd do all-star teams for other franchises, too, as the Reds encounter them. (And since the Arizona Diamondbacks have only existed as a franchise since 1998, Marty's Diamondbacks are the all-time Diamondbacks.)

Same rules apply: highest career WAR while playing for the franchise from 1974 to 2017; roster mix of 5 starting pitchers, 5 relievers, and 7 bench players (6 on an AL team with the DH); minimum 200 games at a position for position players, 50 games started for starters, and 100 relief appearances for relievers.

In the stats listed below, WAR is the player's career total with the franchise, but other counting stats are at a per-season rate (162 games for position players, 34 starts for starting pitchers, and 68 relief appearances for relievers).


Line Up

Pos  Player            WAR   BA    HR  RBI  SB  OBP   SLG
3B   Craig Counsell    12.6  .266  6   47   17  .348  .357
2B   Orlando Hudson    10.5  .294  13  69   9   .365  .448
1B   Paul Goldschmidt  33.3  .301  30  108  21  .401  .531
LF   Luis Gonzalez     29.9  .298  30  105  4   .391  .529
CF   Steve Finley      18.0  .278  29  91   13  .351  .500
RF   Justin Upton      14.3  .278  24  80   18  .357  .475
SS   Stephen Drew      13.1  .266  15  70   7   .328  .436
C    Miguel Montero    12.4  .264  17  80   0   .342  .421

Bench

Pos  Player         WAR   BA    HR  RBI  SB  OBP   SLG
OF   A.J. Pollock   16.3  .291  15  57   30  .345  .459
OF   Chris Young    14.7  .239  24  75   21  .318  .437
OF   Gerardo Parra  11.7  .274  8   51   10  .326  .395
IF   Jay Bell       9.8   .263  24  80   4   .355  .458
IF   Matt Williams  8.3   .278  27  104  3   .327  .471
C    Damian Miller  6.0   .269  17  67   1   .336  .437
IF   Chad Tracy     5.9   .280  18  73   3   .339  .453

Pitchers

Pos  Player           WAR   W   L   SV  ERA   IP   SO   WHIP
SP   Randy Johnson    52.9  17  9   0   2.83  238  304  1.07
SP   Brandon Webb     33.4  15  11  0   3.27  226  182  1.24
SP   Curt Schilling   26.0  18  9   0   3.14  247  277  1.04
SP   Dan Haren        13.2  14  10  0   3.56  229  223  1.13
SP   Miguel Batista   11.6  10  9   0   3.99  187  117  1.36
RP   Josh Collmenter  7.7   9   8   0   3.54  163  114  1.18
RP   Greg Swindell    4.1   2   4   1   3.76  69   54   1.18
RP   Brad Ziegler     7.2   4   2   12  2.49  65   43   1.15
RP   Byung-Hyun Kim   8.4   6   6   19  3.43  87   102  1.21
RP   Jose Valverde    5.5   2   4   26  3.29  70   89   1.17

Tuesday, July 11, 2017

Here Comes da Judge

I mentioned in my post yesterday how until Mark McGwire, no player in history had a career $HR component of .100 or better, which means no player had homered on 10% or more of his batted balls for his entire career. Babe Ruth, at 9.9%, looked like he had reached the limit of what was humanly possible. That is, until the last two generations of ballplayers came along.

I made a list of all the players in history with 300 or more PA and an ISO of .200 or better, which is 298 players, and I ranked them by $HR rate. The top 25 are shown below:

Rk  Player             PA     HR   BB    SO    HBP  $HR
1   Joey Gallo         444    28   59    188   4    .145
2   Aaron Judge        461    34   70    151   5    .145
3   Ryan Schimpf       527    34   69    175   12   .125
4   Mark McGwire       7660   583  1317  1596  75   .125
5   Matt Davidson      346    21   23    131   2    .111
6   Gary Sanchez       473    33   47    116   7    .109
7   Miguel Sano        1175   64   151   417   2    .106
8   Russell Branyan    3398   194  403   1118  30   .105
9   Giancarlo Stanton  3797   234  441   1065  33   .104
10  Jim Thome          10313  612  1747  2548  69   .103
11  Chris Carter       2853   158  327   951   26   .102
12  Adam Dunn          8328   462  1317  2379  86   .102
13  Chris Davis        4427   255  441   1404  46   .101
14  Babe Ruth          10623  714  2062  1330  43   .099
15  Ryan Howard        6531   382  709   1843  59   .097
16  Trevor Story       699    38   63    230   6    .095
17  Khris Davis        2120   126  172   561   26   .093
18  Sammy Sosa         9896   609  929   2306  59   .092
19  Rob Deer           4513   230  575   1409  32   .092
20  Kyle Schwarber     555    29   75    157   7    .092
21  Kevin Roberson     345    20   27    93    7    .092
22  Barry Bonds        12606  762  2558  1539  106  .091
23  Bo Jackson         2626   141  200   841   14   .090
24  Dave Kingman       7429   442  608   1816  53   .089
25  Jose Canseco       8129   462  906   1942  84   .089

The reason I set the PA threshold so low was to include all the young players just starting their careers, like 2017 Home Run Derby champion Aaron Judge. Counting these short-timers, there are now thirteen players who have hit homeruns on 10% or more of their batted balls, the oldest of whom is Mark McGwire (born 1963). Four of them were contestants in the Derby last night.

The reason I think McGwire took the baseball world by storm in the '90s was that he gave us something no one had ever seen before. There had been a lot of exciting power hitters since Babe Ruth, but none of them had ever equaled or surpassed his 9.9% $HR rate. McGwire, at 12.5%, obliterated that mark (like he obliterated the single-season homerun record). On average, one in every eight pitches he connected with over the course of his career cleared the fences.

Out of the Gen-Xers (known as baseball's Steroid Generation) with 10%+ $HR rates (McGwire, Branyan, Thome, and Dunn), only McGwire was an admitted (or even suspected) PED user. And baseball has adopted strict drug testing since Big Mac retired.

So why are there now NINE Millennials with $HR rates north of 10%? The reason for the surge in homeruns over the last couple seasons may provide the answer. Many people suspect that the balls are now juiced (to make up for the fact that the players no longer are). The table above lends credence to this theory. Of the top seven batters in career $HR rate, six of them began their careers in 2013 or later. The other one is Mark McGwire.

But beyond that, hitters these days are more willing than ever to strike out an exorbitant number of times in order to hit the ball HARD when they do connect. In a way, they are following Babe Ruth's approach to its logical conclusion.

McGwire still has by far the best $HR rate for players with long (3000 PA) careers, but three very young players have surpassed him, for now. Joey Gallo and Aaron Judge each have $HR rates of .145, meaning better than one in seven of their batted balls leave the yard. Ryan Schimpf, meanwhile, is homering once every eight batted balls, the same rate as McGwire.

But while Gallo and Schimpf (career batting averages of .186 and .195, respectively) haven't made consistent enough contact (or hit safely on enough of the balls they do put in play) to truly be offensive powerhouses, Aaron Judge (career .296 BA) and, to a slightly lesser extent, his teammate Gary Sanchez (.285) have stayed true to the all-or-nothing approach without seeing their batting averages suffer. Whether they can keep it up remains to be seen, but as of now, Judge is beginning to fill the role in the baseball public's mind of the larger-than-life home run hero, more so than any other player has since Mark McGwire.

Being 6-7, 280 doesn't hurt, either.

Monday, July 10, 2017

Home Run Derby

The 2017 Home Run Derby is about to get underway as I type this, and just for fun I thought I'd look at the 2017 leaders in homeruns per batted ball, measured by the $HR component, the formula for which is

HR / (PA - BB - SO - HBP).

I would argue that this metric, rather than raw homeruns or homeruns per total plate appearances, is the best measure of a player's true homerun power, because it takes walks and strikeouts out of the equation and only counts instances where the batter made contact.
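
For anyone who wants to compute it themselves, the formula is a one-liner. Here is a small Python sketch; the `hr_rate` name and the error check are my own additions, and the two sample lines are Aaron Judge's 2017 totals and Mark McGwire's career totals from the leaderboards in this post.

```python
def hr_rate(pa, hr, bb, so, hbp):
    """Homeruns per batted ball: HR / (PA - BB - SO - HBP)."""
    batted_balls = pa - bb - so - hbp
    if batted_balls <= 0:
        raise ValueError("no batted balls to divide by")
    return hr / batted_balls

# Aaron Judge, 2017 season to date: 366 PA, 30 HR, 61 BB, 109 SO, 4 HBP.
judge_2017 = hr_rate(pa=366, hr=30, bb=61, so=109, hbp=4)

# Mark McGwire, career: 7660 PA, 583 HR, 1317 BB, 1596 SO, 75 HBP.
mcgwire_career = hr_rate(pa=7660, hr=583, bb=1317, so=1596, hbp=75)

print(f"Judge 2017:     {judge_2017:.3f}")
print(f"McGwire career: {mcgwire_career:.3f}")
```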

Here are the top 10 batters for the 2017 season so far (min. 275 PA):

Rk  Player             Tm   PA   HR  BB  SO   HBP  $HR
1   Aaron Judge        NYY  366  30  61  109  4    .156
2   Joey Gallo         TEX  291  21  39  112  4    .154
3   Cody Bellinger     LAD  292  25  33  85   0    .144
4   Eric Thames        MIL  329  23  51  89   4    .124
5   Khris Davis        OAK  368  24  43  117  2    .117
6   Miguel Sano        MIN  345  21  44  120  0    .116
7   Logan Morrison     TBR  346  24  50  78   2    .111
8   Mike Napoli        TEX  283  18  24  92   4    .110
9   Giancarlo Stanton  MIA  369  26  39  88   4    .109
10  Yonder Alonso      OAK  298  20  39  69   1    .106

Going by $HR, it looks like Joey Gallo and Khris Davis should have joined Aaron Judge and Miguel Sano in this year's Derby. On the NL side, Eric Thames would have been a more logical choice than Charlie Blackmon.

Here are the active career leaders (min. 3000 PA):

Rk  Player             PA    HR   BB   SO    HBP  $HR
1   Giancarlo Stanton  3797  234  441  1065  33   .104
2   Chris Davis        4427  255  441  1404  46   .101
3   Ryan Howard        6531  382  709  1843  59   .097
4   Mark Reynolds      5607  270  638  1732  52   .085
5   Mike Napoli        5128  256  625  1397  66   .084
6   Nelson Cruz        5526  301  463  1238  40   .080
7   Pedro Alvarez      3160  153  296  906   11   .079
8   Mike Trout         3764  184  513  826   52   .078
9   Jose Bautista      6540  322  934  1202  71   .074
10  Mark Trumbo        3804  192  262  934   13   .074

Only two players in the top 10, Giancarlo Stanton and Mark Reynolds, currently play for an NL squad. The next two players in career $HR currently playing in the NL are Jay Bruce (.071) and Bryce Harper (.069).

And here are the all-time career leaders (min. 3000 PA and 200 HR):

Rk  Player             PA     HR   BB    SO    HBP  $HR
1   Mark McGwire       7660   583  1317  1596  75   .125
2   Giancarlo Stanton  3797   234  441   1065  33   .104
3   Jim Thome          10313  612  1747  2548  69   .103
4   Adam Dunn          8328   462  1317  2379  86   .102
5   Chris Davis        4427   255  441   1404  46   .101
6   Babe Ruth          10623  714  2062  1330  43   .099
7   Ryan Howard        6531   382  709   1843  59   .097
8   Sammy Sosa         9896   609  929   2306  59   .092
9   Rob Deer           4513   230  575   1409  32   .092
10  Barry Bonds        12606  762  2558  1539  106  .091

And finally, here are the career leaders for each generation (min. 3000 PA):

Generation Player            $HR
National   Harry Stovey      .021
American   Bill Joyce        .024
Deadball   Ken Williams      .041
Ruthian    Babe Ruth         .099
G.I.       Ralph Kiner       .083
Silent     Harmon Killebrew  .088
Boom       Rob Deer          .092
Gen X      Mark McGwire      .125
Millennial Giancarlo Stanton .104

A homerun rate of 10% of batted balls used to be impossible; Babe Ruth finished his career with what looked to be the maximum human limit of 9.9%. But now six players in the last two generations have topped the 10% threshold for their careers (including Russell Branyan, who missed my career leaders table because he hit fewer than 200 homeruns, but who homered at an astounding .105 $HR rate, second only to McGwire).