Tuesday, November 14, 2017

Projecting Otani

With Shohei Otani coming to the major leagues in 2018, I thought it would be fun to attempt a simple projection of what his big-league stats might look like. I'll do batting stats first. There have been nine Japanese players who amassed at least 300 plate appearances their rookie years in MLB:

Player           Year  Team    G   PA   AB    R    H  2B  3B  HR  RBI  SB   AVG   OBP   SLG   OPS
Ichiro Suzuki    2001  SEA   157  738  692  127  242  34   8   8   69  56  .350  .381  .457  .838
Hideki Matsui    2003  NYY   163  695  623   82  179  42   1  16  106   2  .287  .353  .435  .788
Kosuke Fukudome  2008  CHC   150  590  501   79  129  25   3  10   58  12  .257  .359  .379  .738
Norichika Aoki   2012  MIL   151  588  520   81  150  37   4  10   50  30  .288  .355  .433  .787
Tadahito Iguchi  2005  CHW   135  582  511   74  142  25   6  15   71  15  .278  .342  .438  .780
Akinori Iwamura  2007  TBD   123  559  491   82  140  21  10   7   34  12  .285  .359  .411  .770
Kenji Johjima    2006  SEA   144  542  506   61  147  25   1  18   76   3  .291  .332  .451  .783
Kazuo Matsui     2004  NYM   114  509  460   65  125  32   2   7   44  14  .272  .331  .396  .727
Tsuyoshi Shinjo  2001  NYM   123  438  400   46  107  23   1  10   56   4  .268  .320  .405  .725

Next is a weighted average of those players' last three seasons in Japanese ball, with their final season weighted at .5, their next-to-last weighted at .3, and their third-to-last weighted at .2:

Player             G   PA   AB    R    H  2B  3B  HR  RBI  SB   AVG   OBP   SLG    OPS
Ichiro Suzuki    110  482  422   76  155  26   2  15   71  16  .367  .435  .544   .979
Hideki Matsui    139  613  489  111  162  27   2  44  106   3  .330  .457  .663  1.120
Kosuke Fukudome  108  470  386   88  126  33   3  21   76   8  .325  .437  .590  1.028
Norichika Aoki   143  646  573   82  180  27   3   9   54  13  .314  .390  .421   .811
Tadahito Iguchi  125  567  495   94  160  31   2  24   88  26  .323  .396  .535   .930
Akinori Iwamura  143  618  544   87  170  27   2  34   90   7  .312  .387  .555   .942
Kenji Johjima    121  506  444   83  143  26   3  30   80   5  .323  .401  .594   .995
Kazuo Matsui     140  645  579  107  181  37   4  32   83  22  .314  .372  .559   .931
Tsuyoshi Shinjo  129  517  480   59  125  22   3  19   65  10  .261  .307  .442   .749

And here are those numbers again, but prorated to the number of plate appearances they had their rookie seasons in MLB:

Player             G   PA   AB    R    H  2B  3B  HR  RBI  SB   AVG   OBP   SLG    OPS
Ichiro Suzuki    169  738  647  117  238  40   3  23  109  25  .367  .435  .544   .979
Hideki Matsui    158  695  555  126  183  30   2  50  121   4  .330  .457  .663  1.120
Kosuke Fukudome  136  590  485  110  158  41   3  27   95  11  .325  .437  .590  1.028
Norichika Aoki   130  588  521   74  164  24   3   9   49  12  .314  .390  .421   .811
Tadahito Iguchi  129  582  509   97  164  32   2  24   90  27  .323  .396  .535   .930
Akinori Iwamura  130  559  492   78  153  24   2  31   81   7  .312  .387  .555   .942
Kenji Johjima    129  542  475   88  153  28   3  32   85   5  .323  .401  .594   .995
Kazuo Matsui     110  509  456   84  143  29   3  25   66  17  .314  .372  .559   .931
Tsuyoshi Shinjo  109  438  406   50  106  19   3  16   55   9  .261  .307  .442   .749
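
The weighting and prorating steps can be sketched in a few lines of Python. This is only an illustration under two assumptions of mine: that the .5/.3/.2 weights are applied directly to each counting stat, and that prorating is a straight scale by plate appearances. The three season lines are made up, and the function names are hypothetical.

```python
# Weights for the final, next-to-last, and third-to-last seasons in Japan
WEIGHTS = (0.5, 0.3, 0.2)

def weighted_line(seasons):
    """Weighted average of counting stats; seasons are listed most recent first."""
    stats = seasons[0].keys()
    return {s: sum(w * season[s] for w, season in zip(WEIGHTS, seasons)) for s in stats}

def prorate(line, target_pa):
    """Scale every counting stat so the line has target_pa plate appearances."""
    factor = target_pa / line["PA"]
    return {stat: value * factor for stat, value in line.items()}

# Made-up seasons for illustration only (not any real player's numbers)
seasons = [
    {"PA": 600, "H": 180, "HR": 20},  # final season, weight .5
    {"PA": 500, "H": 140, "HR": 15},  # next-to-last, weight .3
    {"PA": 400, "H": 100, "HR": 10},  # third-to-last, weight .2
]

average = weighted_line(seasons)  # PA 530, H 152, HR 16.5
scaled = prorate(average, 650)    # same line, scaled to 650 PA
print(average)
print(scaled)
```

Rate stats like batting average are unchanged by prorating, which is why the rate columns in the second and third tables match.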

And here are Shohei Otani's batting stats and weighted average for his last three seasons with the Nippon Ham Fighters:

            G   PA   AB   R   H  2B  3B  HR  RBI  SB   AVG   OBP   SLG   OPS
Wt. Avg.   78  254  220  35  69  14   1  12   39   2  .315  .395  .545  .939

Now, I'll move through binary component rates, projecting his stats one at a time. First, walks and HBP:

                     $BB
Rookies (Japan)    12.4%
Rookies (MLB)       8.8%
Otani (Japan)      12.2%
Otani (proj. MLB)   8.7%

The nine Japanese players walked or were hit by a pitch in 12.4% of their plate appearances in Japan, compared to just 8.8% of their PA their rookie seasons in MLB. So we would expect a Japanese rookie to walk or be hit by a pitch about 71% (8.8 / 12.4) as often as his established rate in Japanese baseball. Otani's weighted $BB rate is right in line with the other Japanese rookies, just a smidge lower. So his projected rookie MLB $BB rate is 8.7%, which means that in a 600 PA season, he would walk or be hit by a pitch 52 times. (I know Otani won't actually get 600 PA in the majors if he's pitching, but it's more interesting to see what a full season of batting stats would look like.)
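
The translation step is just a ratio, and a small Python sketch makes the arithmetic explicit. The function name is hypothetical, and the inputs are the rounded percentages from the table, so other rates recomputed this way may differ from the published ones in the last digit.

```python
def translate_rate(player_japan, rookies_japan, rookies_mlb):
    """Scale a player's Japanese rate by the MLB/Japan ratio of earlier rookies."""
    return player_japan * (rookies_mlb / rookies_japan)

# Walks plus HBP per plate appearance, from the table above
projected_bb = translate_rate(0.122, 0.124, 0.088)  # about 0.0866
expected_bb_hbp = round(600 * projected_bb)         # walks + HBP in 600 PA

print(round(projected_bb, 3), expected_bb_hbp)
```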

Next, I isolate HBP by finding its rate per (BB + HBP):

                     $BB    HBP
Rookies (Japan)    12.4%  10.6%
Rookies (MLB)       8.8%  11.7%
Otani (Japan)      12.2%   4.2%
Otani (proj. MLB)   8.7%   4.6%

So here are Otani's projected BB and HBP, in a 600 PA season:


Next, strikeouts per plate appearances ending in a strike (PA - BB - HBP):

                     $BB    HBP     SO
Rookies (Japan)    12.4%  10.6%  16.4%
Rookies (MLB)       8.8%  11.7%  15.5%
Otani (Japan)      12.2%   4.2%  31.2%
Otani (proj. MLB)   8.7%   4.6%  29.3%

Japanese rookies actually struck out a little less often in MLB than they had in Japan. Otani strikes out a lot - 31% of PA ending in a strike - which translates to a projected 29% as an MLB rookie. That means we would expect about 161 strikeouts ((600 - 52) * .293) in a 600 PA season for Otani:


Next is home runs per batted ball (PA - BB - SO - HBP):

                     $BB    HBP     SO     HR
Rookies (Japan)    12.4%  10.6%  16.4%   6.2%
Rookies (MLB)       8.8%  11.7%  15.5%   2.5%
Otani (Japan)      12.2%   4.2%  31.2%   7.6%
Otani (proj. MLB)   8.7%   4.6%  29.3%   3.1%

Here is the biggest difference from Japanese ball: Japanese players have seen their rates of homers per batted ball plummet 60% from their established rates in Japan. And here is where I'd take this projection with the biggest grain of salt. The nine Japanese players who came before Otani are of the previous generation. The youngest is Aoki, who was 35 in 2017 (Otani was 22). These players were entering a league where they were clean, but many of the American (and Latin) players were juiced. Now, the players are clean, but the ball is likely juiced. But that means it's juiced for everybody. Also, there's the possibility that the quality of competition in Japanese baseball has improved faster in the last 15 years than it has in MLB.

But based on what Japanese players have previously done, we would expect Otani's rate of home runs per batted ball to fall from 7.6% to 3.1% his rookie year in MLB. That means in a 600 PA season, he would hit just 12 (387 * .031) home runs:


Next is hits per ball in play, (H - HR) / (PA - HR - BB - SO - HBP):

                     $BB    HBP     SO     HR  BAbip
Rookies (Japan)    12.4%  10.6%  16.4%   6.2%  34.0%
Rookies (MLB)       8.8%  11.7%  15.5%   2.5%  32.0%
Otani (Japan)      12.2%   4.2%  31.2%   7.6%  40.6%
Otani (proj. MLB)   8.7%   4.6%  29.3%   3.1%  38.1%

Japanese players have seen their BAbip fall a little bit, but remain above the Major League average. This projection doesn't regress to the mean, so it assumes Otani would retain most of his absurdly high .406 BAbip:


To save time, here are the rates of extra-base hits ((2B + 3B) / (H - HR)), triples (3B / (2B + 3B)), steal attempts ((SB + CS) / (1B + BB + HBP)), and steal successes (SB / (SB + CS)):

                     $BB    HBP     SO     HR  BAbip    XBH     3B  SB att.  SB succ.
Rookies (Japan)    12.4%  10.6%  16.4%   6.2%  34.0%  23.8%   7.9%     9.6%     76.3%
Rookies (MLB)       8.8%  11.7%  15.5%   2.5%  32.0%  23.8%  12.0%    13.9%     74.7%
Otani (Japan)      12.2%   4.2%  31.2%   7.6%  40.6%  26.1%   5.3%     4.6%     67.6%
Otani (proj. MLB)   8.7%   4.6%  29.3%   3.1%  38.1%  26.1%   8.1%     6.7%     66.3%
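
To show how the component rates chain together, here is a rough Python sketch that walks a 600 PA season through the projected MLB rates in the last row. Rounding to whole numbers at every step is my assumption, and the function and key names are hypothetical. The first three steps reproduce the 52, 161, and 12 quoted in the text; the later counts are simply what the same arithmetic produces and should not be read as the post's published stat line.

```python
# Projected MLB rates copied from the "Otani (proj. MLB)" row
RATES = {
    "bb_hbp": 0.087,   # (BB + HBP) / PA
    "hbp": 0.046,      # HBP / (BB + HBP)
    "so": 0.293,       # SO / (PA - BB - HBP)
    "hr": 0.031,       # HR / batted balls
    "babip": 0.381,    # (H - HR) / balls in play
    "xbh": 0.261,      # (2B + 3B) / (H - HR)
    "triple": 0.081,   # 3B / (2B + 3B)
    "sb_att": 0.067,   # (SB + CS) / (1B + BB + HBP)
    "sb_succ": 0.663,  # SB / (SB + CS)
}

def project(pa, r):
    """Peel outcomes off one at a time, rounding at each step (an assumption)."""
    bb_hbp = round(pa * r["bb_hbp"])
    hbp = round(bb_hbp * r["hbp"])
    so = round((pa - bb_hbp) * r["so"])
    batted = pa - bb_hbp - so                    # batted balls
    hr = round(batted * r["hr"])
    non_hr_hits = round((batted - hr) * r["babip"])
    xbh = round(non_hr_hits * r["xbh"])          # doubles + triples
    triples = round(xbh * r["triple"])
    singles = non_hr_hits - xbh
    attempts = round((singles + bb_hbp) * r["sb_att"])
    sb = round(attempts * r["sb_succ"])
    return {
        "BB": bb_hbp - hbp, "HBP": hbp, "SO": so, "HR": hr,
        "H": non_hr_hits + hr, "2B": xbh - triples, "3B": triples,
        "SB": sb, "CS": attempts - sb,
    }

line = project(600, RATES)
print(line)
```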

To get runs scored and at-bats (and therefore, traditional rate stats), I use the following formulas:

$R = (R - HR) / (H - HR + BB + HBP)
$SF = SF / (PA - H - BB - SO - HBP)

Otani didn't have any sac bunts in the last three years, so I don't have to worry about those. Then I just subtract BB, HBP and SF from PA to get his at-bats. I didn't project RBI, but I think you get the picture. The result is a season reminiscent of Hideki Matsui's rookie year:


This projection might be pessimistic, because it doesn't account for Otani's young age (younger than any previous Japanese MLB rookie by four years) or the allegedly tighter ball MLB now employs. But even if he homers more often than I project him to, the moniker "Japanese Babe Ruth" might be giving people unrealistic expectations. He's no Babe Ruth as a hitter, but he IS good enough to be a starting outfielder or DH on the days he's not pitching, and, given he's also an exceptional pitcher, that's exciting enough.

Just how exceptional is Otani's pitching? I'll project his pitching stats in a separate post.

Friday, October 27, 2017

Marty's Astros

The Reds' season is over, but it's October and ten teams (now just two) are still playing baseball. I already made Marty's teams for all 15 NL teams, as well as the Yankees and Indians, but I neglected to make a team for the Red Sox when they were in Cincy for the final home series, and the Reds didn't play the Twins or Astros this year.

Same rules apply: highest career WAR while playing for the franchise from 1974 to 2017; roster mix of 5 starting pitchers, 5 relievers, and 7 bench players (6 on an AL team with the DH); minimum 200 games at a position for position players, 50 games started for starters, and 100 relief appearances for relievers.

In the stats listed below, WAR is the player's career total with the franchise, but other counting stats are at a per-season rate (162 games for position players, 34 starts for starting pitchers, and 68 relief appearances for relievers).
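
As a quick illustration of the per-season rates, here is a minimal Python sketch. The function name is hypothetical and the totals in the example are made up; it only shows the scaling described above.

```python
# Season lengths used for the per-season rates
SEASON_UNITS = {"position": 162, "starter": 34, "reliever": 68}

def per_season(total, units_played, role):
    """Scale a franchise total to a full season of games, starts, or relief appearances."""
    return total * SEASON_UNITS[role] / units_played

# Made-up example: 150 home runs in 1,000 games with the franchise
hr_rate = per_season(150, 1000, "position")
print(round(hr_rate, 1))
```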

Line Up

Pos Player               WAR   AVG   HR  RBI  SB   OBP   SLG
C   Craig Biggio        65.2  .281   17   67  24  .363  .433
2B  Jose Altuve         29.6  .316   14   66  38  .362  .453
1B  Jeff Bagwell        79.8  .297   34  115  15  .408  .540
RF  Lance Berkman       48.2  .296   33  111   8  .410  .549
SS  Carlos Correa       16.4  .288   30  111  13  .366  .498
CF  Cesar Cedeno        30.6  .283   16   83  56  .353  .439
LF  Jose Cruz           51.4  .292   12   82  25  .359  .429
DH  Evan Gattis          4.7  .251   32   95   1  .303  .477
3B  Ken Caminiti        16.4  .264   15   82   7  .330  .402

Pos Player               WAR   AVG   HR  RBI  SB   OBP   SLG
IF  Bill Doran          30.2  .267   10   56  27  .355  .374
OF  Terry Puhl          28.6  .281    7   46  23  .349  .389
OF  Richard Hidalgo     17.7  .278   27   93   9  .356  .501
IF  Dickie Thon         16.1  .270    9   49  27  .329  .395
IF  Morgan Ensberg      14.2  .266   25   81   5  .367  .475
C   Jason Castro         9.6  .232   16   56   1  .309  .390

Pos Player               WAR    W   L  SV   ERA   IP   SO  WHIP
SP  Roy Oswalt          45.7   16   9   0  3.24  221  182  1.20
SP  Nolan Ryan          25.4   13  11   0  3.13  224  225  1.21
SP  Mike Scott          24.8   14  11   0  3.30  222  172  1.14
SP  Joe Niekro          23.1   14  11   1  3.22  221  115  1.26
SP  J.R. Richard        21.4   16  11   0  3.06  244  224  1.23
RP  Danny Darwin        12.4   11   8   3  3.21  174  123  1.16
RP  Ken Forsch          13.9   10   9   8  2.90  163   72  1.22
RP  Octavio Dotel       11.1    5   5   9  3.25   95  117  1.17
RP  Dave Smith          12.6    6   6  24  2.53   92   64  1.19
RP  Billy Wagner        16.2    4   4  33  2.53   74  102  1.04

Thursday, October 26, 2017

Marty's Red Sox

The Reds' season is over, but it's October and ten teams (now just two) are still playing baseball. I already made Marty's teams for all 15 NL teams, as well as the Yankees and Indians, but I neglected to make a team for the Red Sox when they were in Cincy for the final home series, and the Reds didn't play the Twins or Astros this year.

Same rules apply: highest career WAR while playing for the franchise from 1974 to 2017; roster mix of 5 starting pitchers, 5 relievers, and 7 bench players (6 on an AL team with the DH); minimum 200 games at a position for position players, 50 games started for starters, and 100 relief appearances for relievers.

In the stats listed below, WAR is the player's career total with the franchise, but other counting stats are at a per-season rate (162 games for position players, 34 starts for starting pitchers, and 68 relief appearances for relievers).

This Red Sox team is ridiculously stacked. In order to include David Ortiz (a DH only) and a backup at every position, I basically had a choice between Manny Ramirez (33 WAR) and Mo Vaughn (25), or John Valentin (32) and Carl Yastrzemski (21 WAR on the backside of his career). I opted for Manny and Mo. But that means Jody Reed would be busy backing up Nomar AND Pedroia.

Line Up

Pos Player               WAR   AVG   HR  RBI  SB   OBP   SLG
3B  Wade Boggs          71.8  .338    8   68   2  .428  .462
1B  Kevin Youkilis      31.5  .287   23   96   4  .388  .487
DH  David Ortiz         52.9  .290   40  127   1  .386  .570
SS  Nomar Garciaparra   41.2  .323   30  116  14  .370  .553
CF  Fred Lynn           32.0  .308   24  102   8  .383  .520
LF  Jim Rice            47.3  .298   30  113   4  .352  .502
RF  Dwight Evans        65.3  .274   25   89   5  .371  .477
C   Carlton Fisk        27.9  .290   23   88  10  .363  .479
2B  Dustin Pedroia      52.2  .300   15   78  15  .366  .441

Pos Player               WAR   AVG   HR  RBI  SB   OBP   SLG
OF  Manny Ramirez       33.2  .312   41  130   1  .411  .588
IF  Mo Vaughn           24.8  .304   36  116   4  .394  .542
C   Jason Varitek       24.4  .256   20   79   3  .341  .435
OF  Mookie Betts        24.0  .292   25   99  26  .351  .488
OF  Jacoby Ellsbury     21.1  .297   15   71  55  .350  .439
IF  Jody Reed           14.2  .280    4   51   5  .358  .372

Pos Player               WAR    W   L  SV   ERA   IP   SO  WHIP
SP  Roger Clemens       81.2   17  10   0  3.06  247  230  1.16
SP  Pedro Martinez      53.7   20   6   0  2.52  233  283  0.98
SP  Jon Lester          30.9   15   9   0  3.64  214  195  1.29
SP  Luis Tiant          24.8   17  11   0  3.49  244  134  1.23
SP  Josh Beckett        22.6   16  10   0  4.17  217  194  1.22
RP  Tim Wakefield       32.5   12  11   1  4.43  200  136  1.34
RP  Derek Lowe          19.4   10   8  12  3.72  142   92  1.29
RP  Bob Stanley         23.8   11   9  12  3.64  161   65  1.36
RP  Tom Burgmeier       10.5    7   4  13  2.72  131   69  1.23
RP  Jonathan Papelbon   16.2    4   3  37  2.33   73   87  1.02

Wednesday, October 25, 2017

Marty's Twins

The Reds' season is over, but it's October and ten teams (now just two) are still playing baseball. I already made Marty's teams for all 15 NL teams, as well as the Yankees and Indians, but I neglected to make a team for the Red Sox when they were in Cincy for the final home series, and the Reds didn't play the Twins or Astros this year.

Same rules apply: highest career WAR while playing for the franchise from 1974 to 2017; roster mix of 5 starting pitchers, 5 relievers, and 7 bench players (6 on an AL team with the DH); minimum 200 games at a position for position players, 50 games started for starters, and 100 relief appearances for relievers.

In the stats listed below, WAR is the player's career total with the franchise, but other counting stats are at a per-season rate (162 games for position players, 34 starts for starting pitchers, and 68 relief appearances for relievers).

Line Up

Pos Player               WAR   AVG   HR  RBI  SB   OBP   SLG
2B  Rod Carew           36.6  .355   10   84  37  .422  .484
C   Joe Mauer           53.6  .308   13   82   5  .391  .443
RF  Kirby Puckett       50.9  .318   19   99  12  .360  .477
1B  Kent Hrbek          38.2  .282   27  101   3  .367  .481
LF  Shane Mack          19.6  .309   17   81  18  .375  .479
CF  Torii Hunter        26.3  .268   25   93  15  .321  .462
DH  Roy Smalley         20.8  .262   16   68   2  .350  .401
3B  Gary Gaetti         27.1  .256   24   90   9  .307  .437
SS  Greg Gagne          18.0  .249   10   48  11  .292  .385

Pos Player               WAR   AVG   HR  RBI  SB   OBP   SLG
IF  Chuck Knoblauch     37.7  .304    7   63  44  .391  .416
IF  Justin Morneau      23.3  .278   28  109   1  .347  .485
IF  Corey Koskie        22.2  .280   20   87  13  .373  .463
OF  Tom Brunansky       15.9  .250   29   83   6  .330  .452
C   Butch Wynegar       15.4  .254    8   66   2  .340  .342
OF  Larry Hisle         13.9  .290   22  108  25  .355  .467

Pos Player               WAR    W   L  SV   ERA   IP   SO  WHIP
SP  Brad Radke          45.6   13  13   0  4.22  221  132  1.26
SP  Johan Santana       35.5   15   7   0  3.22  209  220  1.09
SP  Frank Viola         27.2   15  12   0  3.86  232  159  1.30
SP  Bert Blyleven       25.9   14  13   0  3.64  252  198  1.22
SP  Dave Goltz          24.4   15  12   0  3.40  249  135  1.30
RP  Eddie Guardado       9.7    4   5  12  4.53   71   62  1.34
RP  Glen Perkins         8.7    5   4  18  3.88   94   76  1.29
RP  Doug Corbett         8.3    5   7  21  2.49  122   81  1.20
RP  Rick Aguilera       15.7    5   6  33  3.50   91   77  1.18
RP  Joe Nathan          18.3    4   2  38  2.16   68   83  0.96

Tuesday, August 29, 2017

Marty's Mets

Since Marty has also seen a lot of the Reds' opponents in his 44 years of broadcasting, I thought I'd do all-star teams for other franchises, too, as the Reds encounter them.

Same rules apply: highest career WAR while playing for the franchise from 1974 to 2017; roster mix of 5 starting pitchers, 5 relievers, and 7 bench players (6 on an AL team with the DH); minimum 200 games at a position for position players, 50 games started for starters, and 100 relief appearances for relievers.

In the stats listed below, WAR is the player's career total with the franchise, but other counting stats are at a per-season rate (162 games for position players, 34 starts for starting pitchers, and 68 relief appearances for relievers).

Line Up

Pos Player               WAR   AVG   HR  RBI  SB   OBP   SLG
SS  Jose Reyes          27.5  .286   13   64  52  .336  .436
3B  David Wright        50.0  .296   25   99  20  .376  .491
CF  Carlos Beltran      31.3  .280   29  108  19  .369  .500
C   Mike Piazza         24.5  .296   37  109   1  .373  .542
RF  Darryl Strawberry   36.4  .263   37  107  28  .359  .520
1B  Keith Hernandez     26.5  .297   15   86   3  .387  .429
LF  Kevin McReynolds    15.7  .272   25   94  14  .331  .460
2B  Edgardo Alfonzo     29.5  .292   18   80   7  .367  .445

Pos Player               WAR   AVG   HR  RBI  SB   OBP   SLG
IF  Howard Johnson      21.8  .251   27   88  28  .341  .459
OF  Mookie Wilson       20.6  .276    9   50  41  .318  .394
C   John Stearns        19.5  .259    9   62  18  .341  .375
IF  John Olerud         17.3  .315   21   99   2  .425  .501
IF  Daniel Murphy       12.5  .288   11   72  10  .331  .424
OF  Curtis Granderson   10.6  .239   27   70   8  .341  .444
OF  Bernard Gilkey      10.1  .273   22   95  12  .357  .461

Pos Player               WAR    W   L  SV   ERA   IP   SO  WHIP
SP  Dwight Gooden       41.5   18  10   0  3.10  243  210  1.18
SP  Al Leiter           28.0   15  11   0  3.42  217  177  1.30
SP  Sid Fernandez       27.6   13  11   0  3.14  213  195  1.11
SP  Tom Seaver          24.2   14  11   0  2.90  253  201  1.13
SP  David Cone          19.5   15  10   0  3.13  231  224  1.19
RP  Pedro Feliciano      5.6    3   3   1  3.33   54   49  1.38
RP  Skip Lockwood        7.7    7  11  19  2.80  114  110  1.12
RP  Jesse Orosco        12.2    9   9  19  2.73  108   92  1.21
RP  John Franco         11.0    5   5  27  3.10   69   58  1.37
RP  Armando Benitez      9.8    4   3  33  2.70   71   93  1.13

Friday, August 18, 2017

Generations of Baseball Players

"A generation, like an individual, merges many different qualities, no one of which is definitive standing alone. But once all the evidence is assembled, we can build a persuasive case for identifying (by birthyear) eighteen generations over the course of American history. All Americans born over the past four centuries have belonged to one or another of these generations" (Generations page 68).
William Strauss and Neil Howe wrote the book on generations (literally). The aim of this post - and, indeed, this entire blog - is to apply their theory to baseball. You can call it my manifesto.

Therefore, by the end of this post, I hope to have built a persuasive case for identifying (by birthyear) NINE generations over the course of baseball history. I will claim that all ballplayers born over the past two centuries belong to one or another of these generations.

Why do we need definitive baseball generations? For one, to provide context for "best of their generation" conversations and arguments (best player, best pitcher, best 3rd baseman, best leadoff batter, etc.). Beyond that, having definitive generations restores meaning to baseball's hallowed leaderboards. For example, Roger Connor hit 138 home runs in his career - a modest total by today's standards, but the most ever hit by a player born before 1887. Similarly, Miguel Cabrera's career batting average of .318 ranks 55th all-time (as of this writing), but it ranks first among players born after 1960.

So if we can agree that sorting baseball players by generation is useful, how exactly should we go about doing it? I'll start with Strauss & Howe's definition:
"A GENERATION is a cohort-group whose length approximates the span of a phase of life and whose boundaries are fixed by peer personality" (page 60).
Earlier in the book (page 44), Strauss & Howe defined a "cohort" as "any set of persons born in the same year" and a "cohort-group" as "any wider set of persons born in a limited set of consecutive years."

The authors laid out (on page 56) four "phases of life," each 22 years long: youth (age 0 to 21), rising adulthood (age 22 to 43), midlife (age 44 to 65), and elderhood (age 66 to 87). Obviously, most major league careers fall almost entirely within the second phase, rising adulthood. And since the longest careers tend to last about 22 years, we can say then that the length of a baseball generation approximates the span of a very long major league career.

And that brings us to "peer personality" - "the element in our definition that distinguishes a generation as a cohesive cohort-group" (page 63). Strauss & Howe measured the similarity of cohorts by the similarity of their peer personality. In a pair of articles published to his website in July 2015 (and placed behind a subscription paywall), Bill James "measured the Similarity of Seasons...by the similarity of their statistical image."

Strauss & Howe "use peer personality to identify a generation and find the boundaries separating it from its neighbors" (page 64). Bill James used similarity scores to identify "natural groups of seasons" and find the "fault lines" separating them.

And while Strauss & Howe could "apply no reductive rules for comparing the beliefs and behavior of one cohort-group with those of its neighbors" (page 67), we have baseball's rich statistical record at our disposal for comparing the "behavior" of baseball cohort-groups.

So here then is my modified definition:
A BASEBALL GENERATION is a cohort-group whose length approximates the span of a very long major league career and whose boundaries are fixed by statistical image.
Bill James' similarity scores for seasons used 30 statistical categories, including both counting stats (hits, homeruns, strikeouts), and rate stats (batting average, on-base percentage, earned run average). But, as Kerry Whisnant explained, using counting stats to compare cohorts, like using them to compare players or seasons, will mean that only cohorts with "similar numbers of plate appearances" will be similar.

And the traditional rate stats "confound" talents, as Jim Albert explained on page 24 of an article in By the Numbers. "A batting average confounds three batter talents: the talent not to strikeout, the talent to hit a home run, and the talent to hit an in-play ball for a hit."

Peer personality has nothing to do with the raw numbers of a generation, but rather the collective behavior of its members. Strauss & Howe elaborated (on page 63 of Generations):
"The peer personality of a generation is essentially a caricature of its prototypical member. It is, in its sum of attributes, a distinctly personlike creation. A generation...can be safe or reckless, calm or aggressive, self-absorbed or outer-driven, generous or selfish, spiritual or secular, interested in culture or interested in politics."
Likewise, the "statistical image" of a baseball generation (like the statistical image of a team or a league) is essentially a caricature of its average player. It is, in its sum of attributes, like an individual player. A baseball generation can be patient or free-swinging, adept at making contact or prone to striking out, powerful or light-hitting, good at hitting the ball "where they ain't" or bad at avoiding defenders, aggressive on the base-paths or station-to-station.

So I'll use eight rate statistics - what I'll call the "attribute rates" - to measure the similarity of baseball cohorts. ("Each rate describes something specific," Tom Tango wrote, describing four of the rates.) These eight rates - representing eight different skills, or tools - taken together, reveal the "statistical image" or "peer personality" of a baseball generation; how its members collectively played the game.

The first rate, BF/G, uses pitching stats only. The next three, $BB, $SO, and $HR - the "three true outcomes" - draw from both batting and pitching stats (for the formulas listed below, I differentiate between batting and pitching with a small 'b' or 'p' in the variables). The last four use batting stats only.

BF/G - the number of batters a pitcher faces per game.
= BF / G

$BB - the percentage of plate appearances that end in a walk or a hit by pitch.
= (bBB + bHBP + pBB + pHBP) / (PA + BF)

$SO - the percentage of plate appearances ending in a strike (called, swung and missed, or batted) that are strikeouts.
= (bSO + pSO) / (PA + BF - bBB - bHBP - pBB - pHBP)

$HR - the percentage of batted balls that are homeruns.
= (bHR + pHR) / (PA + BF - bBB - bSO - bHBP - pBB - pSO - pHBP)

$H - the percentage of balls batted into the field of play that are hits.
= (H - HR) / (PA - HR - BB - SO - HBP)

$XBH - the percentage of base hits that go for extra bases (doubles or triples).
= (2B + 3B) / (H - HR)

$3B - the percentage of extra-base hits that are triples.
= 3B / (2B + 3B)

$SB - the percentage of successful stolen bases per (approximate) times on first.
= SB / (H - 2B - 3B - HR + BB + HBP)
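
Here is how the eight formulas might look as a single Python function. It is a sketch, not the spreadsheet I actually used: the dictionary keys and the function name are hypothetical, and the totals in the example are made up so the arithmetic is easy to check by hand.

```python
def attribute_rates(b, p):
    """Eight attribute rates from batting totals b and pitching totals p."""
    free = b["BB"] + b["HBP"] + p["BB"] + p["HBP"]   # walks and HBP, both sides
    pa_all = b["PA"] + p["BF"]                       # PA + BF
    so = b["SO"] + p["SO"]
    hr = b["HR"] + p["HR"]
    singles = b["H"] - b["2B"] - b["3B"] - b["HR"]
    return {
        "BF/G": p["BF"] / p["G"],
        "$BB": free / pa_all,
        "$SO": so / (pa_all - free),
        "$HR": hr / (pa_all - free - so),
        "$H": (b["H"] - b["HR"]) / (b["PA"] - b["HR"] - b["BB"] - b["SO"] - b["HBP"]),
        "$XBH": (b["2B"] + b["3B"]) / (b["H"] - b["HR"]),
        "$3B": b["3B"] / (b["2B"] + b["3B"]),
        "$SB": b["SB"] / (singles + b["BB"] + b["HBP"]),
    }

# Made-up cohort totals for illustration only
batting = {"PA": 1000, "H": 250, "2B": 50, "3B": 5, "HR": 25,
           "BB": 90, "HBP": 10, "SO": 200, "SB": 20}
pitching = {"G": 100, "BF": 1000, "BB": 90, "HBP": 10, "SO": 200, "HR": 25}

rates = attribute_rates(batting, pitching)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```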

So, now all we need are the career totals for every batter and every pitcher in MLB history. Then, on a separate sheet in Excel, it's just a matter of using SUMIF formulas to add up the necessary batting and pitching totals for each cohort. From the firstborn player (Nate Berkenstock, 1831) to the lastborn (Julio Urias, 1996), there are 166 MLB cohort birthyears (through 2016; I'm typing this just days after Ozzie Albies became the first 1997-born major-leaguer). The table below shows the career totals for 1980s cohorts:

[Table: batting totals and pitching totals for the 1980s cohorts]

All the batters born in 1980 combined for 151,352 plate appearances, 34,688 hits, and 4,575 homeruns. All the pitchers born that year combined for 11,496 bases on balls, 24,982 strikeouts, etc.
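
For anyone who would rather not do this in Excel, the SUMIF step amounts to grouping career totals by birth year. A minimal Python sketch follows; the player records are made up and the field names are hypothetical.

```python
from collections import defaultdict

def cohort_totals(players):
    """Sum each career stat over all players sharing a birth year (like SUMIF)."""
    totals = defaultdict(lambda: defaultdict(int))
    for player in players:
        for stat, value in player["stats"].items():
            totals[player["birth_year"]][stat] += value
    return totals

# Made-up career lines for illustration only
players = [
    {"name": "Batter A", "birth_year": 1980, "stats": {"PA": 5000, "H": 1300, "HR": 150}},
    {"name": "Batter B", "birth_year": 1980, "stats": {"PA": 1200, "H": 280, "HR": 20}},
    {"name": "Batter C", "birth_year": 1981, "stats": {"PA": 3000, "H": 800, "HR": 90}},
]

totals = cohort_totals(players)
print(dict(totals[1980]))  # combined line for the 1980 cohort
```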

Then I can calculate the attribute rates for each cohort:


Next I need the standard deviations of each rate. I have 166 cohort birthyears, but many of the very early and very recent cohorts were not (or aren't yet) well-represented in the major leagues. So I'll set minimum requirements of 10,000 total plate appearances and 10,000 total batters faced, and therefore only include the 143 cohorts from 1850 (Al Spalding) through 1992 (Bryce Harper) in the population for my standard deviations.

Also, I'll need to assign weights to each rate. I wanted the "three true outcomes" rates ($BB, $SO, and $HR) to weigh double the other rates, because they use both hitting and pitching statistics, and I wanted the $XBH and $3B rates to weigh half the other rates, because they both deal with breakdowns of base hits. Finally, I wanted the weights to add up to 1,000, so that if two groups are exactly four standard deviations apart in every category, their similarity score will be zero.
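
The weighting rule and the minimum requirements can be written out as a short Python sketch. The weights below are what the rule implies if the "other" rates (BF/G, $H, $SB) each count as one share; that reading, the field names, and the use of a population standard deviation are my assumptions here, and the sample cohorts are made up.

```python
from statistics import pstdev

# Relative shares: three true outcomes double, $XBH and $3B half
SHARES = {"BF/G": 1, "$BB": 2, "$SO": 2, "$HR": 2,
          "$H": 1, "$XBH": 0.5, "$3B": 0.5, "$SB": 1}
UNIT = 1000 / sum(SHARES.values())          # shares total 10, so 100 points each
WEIGHTS = {name: UNIT * share for name, share in SHARES.items()}

def qualifying(cohorts, min_pa=10_000, min_bf=10_000):
    """Keep only cohorts meeting both minimum requirements."""
    return [c for c in cohorts if c["PA"] >= min_pa and c["BF"] >= min_bf]

def rate_st_dev(cohorts, rate):
    """Population standard deviation of one rate across qualifying cohorts."""
    return pstdev([c[rate] for c in qualifying(cohorts)])

# Made-up cohorts: the third fails the minimums and is ignored
sample = [
    {"PA": 20_000, "BF": 20_000, "$BB": 0.08},
    {"PA": 20_000, "BF": 20_000, "$BB": 0.10},
    {"PA": 500, "BF": 500, "$BB": 0.30},
]
print(WEIGHTS)
print(rate_st_dev(sample, "$BB"))
```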

[Table: St. Dev. for each attribute rate]

To find the similarity score between two groups, start at 1,000 and subtract a penalty for each attribute rate. The penalty is the difference between the two groups, times a multiplier. The multiplier is the rate's weight divided by (4 times its standard deviation).

For example, the 1980 cohort has a $BB rate of .097 and the 1981 cohort has a $BB rate of .091, a difference of .006. So the $BB penalty for 1980 and 1981 would be the difference (.006) times the multiplier (3,870), which is about 23. Add up the penalties for all eight rates and subtract from 1,000, and that is the similarity score.
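
The scoring rule can be sketched in Python as follows. The weights are the ones implied by the weighting rule above (100 for a normal rate, 200 for the three true outcomes, 50 for $XBH and $3B), which is my reading rather than a copied table, and the $BB standard deviation is back-derived from the 3,870 multiplier purely for illustration.

```python
# Weights implied by the rule above (an assumption, not a copied table)
WEIGHTS = {"BF/G": 100, "$BB": 200, "$SO": 200, "$HR": 200,
           "$H": 100, "$XBH": 50, "$3B": 50, "$SB": 100}

def similarity(rates_a, rates_b, weights, st_devs):
    """Start at 1,000 and subtract a penalty for each attribute rate."""
    score = 1000.0
    for name, weight in weights.items():
        multiplier = weight / (4 * st_devs[name])
        score -= abs(rates_a[name] - rates_b[name]) * multiplier
    return score

# The $BB example from the text, looking at that one rate in isolation
bb_weight = {"$BB": 200}
bb_st_dev = {"$BB": 200 / (4 * 3870)}   # back-derived so the multiplier is 3,870
bb_only = similarity({"$BB": 0.097}, {"$BB": 0.091}, bb_weight, bb_st_dev)
print(round(1000 - bb_only))            # the $BB penalty, about 23

# Two groups four standard deviations apart in every category score zero
st_devs = {name: 1.0 for name in WEIGHTS}
group_a = {name: 0.0 for name in WEIGHTS}
group_b = {name: 4.0 for name in WEIGHTS}
far_apart = similarity(group_a, group_b, WEIGHTS, st_devs)
print(far_apart)
```

Nothing in this sketch stops a score from going below zero when two groups are more than four standard deviations apart on average.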

To find "Epochs and Eras," Bill James asked of "every season in baseball history: Is it more like the season before it, or more like the season after it?" He then made two-year comparisons, three-year comparisons, four-year comparisons... comparing "each season to every other season in baseball history within 15 years before or after."

Instead of comparing each baseball cohort to other neighboring cohorts, I'm comparing them to the 15-year cohort-groups before and after. To find Baseball Generations, I'm asking of every baseball cohort: Is it more like the 15-year cohort-group before it, or more like the 15-year cohort-group after it?

Is the 1980 cohort more similar to the 1965-1979 cohort-group, or more similar to the 1981-1995 cohort-group?

1980 to 1965-1979 - 943
1980 to 1981-1995 - 916

The 1980 cohort is backward-looking, more similar to the cohort-group before it (943) than the cohort-group after it (916). What about 1981?

1981 to 1966-1980 - 938
1981 to 1982-1996 - 954

The 1981 cohort is forward-looking, more similar to the cohort-group after it (954) than the cohort-group before it (938).

To get previous and next cohort-groups for all 166 cohorts, I calculated attribute rates for every possible 15-year group, from the group before the first cohort (1816-1830) to the group after the last cohort (1997-2011), and all 180 groups in between. Attribute rates are calculated from batting and pitching totals. Cohort batting and pitching totals are found by adding up the career totals of the individual batters and pitchers belonging to each cohort; cohort-group batting and pitching totals are found by adding up the cohort totals of the 15 individual cohorts belonging to each cohort-group. (The 1816-1830 and 1997-2011 groups both have totals and rates of zero across the board, of course.)

The table below shows the similarity scores of the 1973-1993 cohorts to their respective previous and next 15-year groups. I also calculated a "forward score" for each cohort, which is simply its next-group similarity score MINUS its previous-group similarity score. The forward score shows just HOW forward- or backward-looking a cohort is. A positive forward score indicates a cohort is forward-looking and a negative score indicates it is backward-looking, and a score above +50 (or below -50) means that the cohort is VERY forward- (or backward-) looking.
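
In code, the neighboring groups and the forward score come down to a couple of one-liners. This Python sketch uses hypothetical function names and plugs in the 1980 and 1981 similarity scores quoted above; treating a score of exactly zero as "neutral" is my own choice.

```python
def neighbor_groups(year, span=15):
    """Birth-year ranges of the cohort-groups before and after a cohort."""
    return (year - span, year - 1), (year + 1, year + span)

def forward_score(sim_previous, sim_next):
    """Next-group similarity minus previous-group similarity."""
    return sim_next - sim_previous

def describe(score, strong=50):
    """Label a forward score; beyond +/-50 counts as 'very'."""
    if score == 0:
        return "neutral"  # assumption: the post does not cover an exact tie
    direction = "forward" if score > 0 else "backward"
    prefix = "very " if abs(score) > strong else ""
    return f"{prefix}{direction}-looking"

print(neighbor_groups(1980))               # ((1965, 1979), (1981, 1995))
print(describe(forward_score(943, 916)))   # 1980 cohort
print(describe(forward_score(938, 954)))   # 1981 cohort
```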


I've shaded the positive forward scores green and the negative forward scores red. Every cohort from 1973 to 1983, except for 1978 and 1981, is backward-looking. Every cohort from 1984 to 1993 is forward-looking.

While Bill James declined to develop a "specific protocol...based on this method," he did state, as a general rule, that "an 'epoch' is formed by a series of forward-looking seasons, followed by a series of backward-looking seasons." But what he was really looking for was the "hard break" between epochs - a series of backward-looking seasons (the end of one epoch) followed by a series of forward-looking seasons (the beginning of a new epoch). He was looking for "fault lines" separating "natural groups of seasons," just as Strauss & Howe looked for boundaries separating cohesive cohort-groups.

I showed the 1973-1993 cohorts in the table above, not because those cohorts form a cohesive group, but because they're halves of two different groups; the second half of one group and the first half of the next group, with the boundary between the two groups appearing to fall between 1983 and 1984. But it's not a clean break; not all of the 1973-1983 cohorts are backward-looking, and the 1981-1983 cohorts are all fairly similar to both their respective groups.

I know if a cohort is forward- or backward-looking, and how forward- or backward-looking it is; now I need a way to determine if a cohort is part of a forward- or backward-looking trend. And since I do want a specific protocol for defining generations by an objective process, I'm adding what I'll call a "trend score" for each cohort. The trend score - as its name implies - checks each cohort's forward score to see if it is part of a trend. If a cohort's forward score is positive (forward-looking), the trend score adds to it the forward scores of the next two cohorts. If its forward score is negative (backward-looking), the trend score adds to it the forward scores of the previous two cohorts.

When at least three backward-trending cohorts are followed by at least three forward-trending cohorts, I draw a generational boundary between the last backward-trending cohort and the first forward-trending cohort.
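
The trend score and the boundary rule can be sketched in Python like this. The forward scores in the example are made up. Two details are my assumptions because the rule doesn't spell them out: neighbors that don't exist count as zero, and a forward score of exactly zero is handled like a negative one.

```python
def trend_scores(forward):
    """Trend score for each cohort, given forward scores in birth-year order."""
    def at(i):
        # Assumption: a missing neighbor contributes 0
        return forward[i] if 0 <= i < len(forward) else 0

    scores = []
    for i, f in enumerate(forward):
        if f > 0:   # forward-looking: add the next two cohorts
            scores.append(f + at(i + 1) + at(i + 2))
        else:       # backward-looking (or zero): add the previous two cohorts
            scores.append(f + at(i - 1) + at(i - 2))
    return scores

def boundaries(years, trend, run=3):
    """Find spots where `run` backward-trending cohorts meet `run` forward-trending ones."""
    found = []
    for i in range(run, len(trend) - run + 1):
        before = trend[i - run:i]
        after = trend[i:i + run]
        if all(t < 0 for t in before) and all(t > 0 for t in after):
            found.append((years[i - 1], years[i]))
    return found

# Made-up forward scores for eight consecutive cohorts
years = list(range(1900, 1908))
forward = [-10, -20, -5, -15, 30, 10, 20, 5]

trend = trend_scores(forward)
print(trend)                     # [-10, -30, -35, -40, 60, 35, 25, 5]
print(boundaries(years, trend))  # [(1903, 1904)]
```

Because a backward-looking cohort borrows from the two cohorts before it, one very large positive forward score can flip the trend of the next two cohorts, which is the behavior the 1835 example below relies on.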

Rather than trying to explain any further how or why trend scores work now, I'll go ahead and start locating generational boundaries and explain them as I go. Here are the first ten baseball cohorts, 1831 to 1840:


Bill James gave a couple of caveats to his rule for defining epochs:
"1) Sometimes it is not a series of backward-looking years that ends an epoch, but just one year, and 2) Sometimes what ends an epoch is not a backward-looking phase, but rather a large difference between two adjacent seasons."
I take it to also be true that sometimes it is just one cohort, or a large difference between two adjacent cohorts, that STARTS a generation, and I tried to build these caveats into my trend scores. Even though the 1835 cohort is the only forward-looking cohort in its group, it is SO different from the cohorts that came before that it should be the start of a new generation. (The 1835 cohort consists of Harry Wright, the firstborn player to have a real major league career; the two players older than him appeared in just one game each as forty-somethings.) So even though the 1836 and 1837 cohorts are backward-looking, they're forward-trending because 1835's forward score is so high it overwhelms their negative scores.

The next several generational boundaries are easy to spot, without trend scores. We can draw one between 1856 and 1857:


And 1873 and 1874:


And 1892 and 1893:


And 1911 and 1912:


It looks like there might be a boundary between 1922 and 1923:


There are several mostly backward-looking cohorts followed by several forward-looking cohorts. But the 1925 and 1926 cohorts aren't forward-looking enough to be forward-trending; so the forward trend fizzles after the 1923 and 1924 cohorts, which means it doesn't meet my standard of at least three forward-trending cohorts. Besides, a boundary here would mean a generation of just 11 cohort birthyears (1912-1922), which is too short to be a true generation.

The actual boundary is six years later, between 1928 and 1929:


This time the backward-trending cohorts (1925-1928) are followed by a sustained forward trend. The 1929 cohort is the MOST forward-looking cohort since 1914, in the first wave of the previous generation.

The forward trend lasts through the 1941 cohort, and is then followed by 20 consecutive backward-trending cohorts. The next generational boundary isn't until 1961/62, 33 birthyears after the last one.


And finally, we're back to the boundary between the two currently-active generations:


And it looks like the boundary is indeed between 1983 and 1984, at least for now. These cohorts are still adding to their batting and pitching totals. The 1981-1983 group could possibly slip into the younger generation (I hope it does, anyway; it's hard to imagine the baby-faced Miguel Cabrera of the 1983 cohort being in the same generation as Clemens and Bonds).

So that's eight boundaries, which divide every MLB player in history into nine generations. Leaving out the first and last (partial) generations, the cohort lengths of the middle seven baseball generations range from 17 to 33 years and average 21.3 years. This average nearly matches Strauss & Howe's 22-year "phase of life," or the length of a very long major league career.

Generation   Birth Years   Best Player
National     1835-1856     Cap Anson
American     1857-1873     Cy Young
Deadball     1874-1892     Ty Cobb
Ruthian      1893-1911     Babe Ruth
G.I.         1912-1928     Ted Williams
Expansion    1929-1961     Willie Mays
Steroid      1962-1983     Barry Bonds
Millennial   1984-1996     Mike Trout