Sunday, November 20, 2016

The Generations of Baseball, part three

Continued from part two.
"In some respects, a peer personality gives heavy focus to the attitudes and experiences of the generational elite...('the heads of society, the kings of thought, the lords of the generation'). But while they commonly express the tone of a generation's peer personality, the personality itself is often established by non-elites" (Generations page 64).
The following table shows PA and BF totals and attribute rates for the 1964 cohort. It lists all batters with at least 8,000 PA and all pitchers with at least 8,000 BF ("the generational elite"), then all other batters and pitchers combined (the "non-elites"), and then the cohort totals. As you can see, the non-elites as a whole far outnumber the elites in raw PA and BF. Even so, the elites clearly have a large impact on their cohort's attribute rates: Barry Bonds accounted for roughly 6% of his cohort's total plate appearances, and Kenny Rogers pitched to about 8% of its batters faced.

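Player | PA | BF | BF/G | $BB | $SO | $HR | $H | $XBH | $3B | $SB
Barry Bonds | 12,606 | | | .211 | .155 | .091 | .284 | .312 | .114 | .124
Rafael Palmeiro | 12,046 | | | .120 | .127 | .061 | .282 | .254 | .061 | .030
Mark Grace | 9,290 | | | .119 | .078 | .023 | .308 | .245 | .081 | .025
B.J. Surhoff | 9,106 | | | .074 | .099 | .025 | .289 | .225 | .087 | .061
Barry Larkin | 9,057 | | | .110 | .101 | .027 | .304 | .241 | .147 | .145
Will Clark | 8,283 | | | .120 | .163 | .047 | .325 | .257 | .097 | .028
Ellis Burks | 8,177 | | | .104 | .183 | .059 | .312 | .265 | .135 | .084
Jose Canseco | 8,129 | | | .122 | .272 | .089 | .299 | .250 | .040 | .098
Kenny Rogers | | 14,280 | 18.7 | .091 | .152 | .031
Dwight Gooden | | 11,705 | 27.2 | .088 | .215 | .025
John Burkett | | 11,324 | 25.4 | .070 | .168 | .029
Bobby Witt | | 11,003 | 25.6 | .129 | .204 | .033
Bret Saberhagen | | 10,421 | 26.1 | .051 | .173 | .027
Kevin Tapani | | 9,600 | 26.6 | .063 | .165 | .035
All other batters | 138,971 | | | .091 | .169 | .023 | .279 | .221 | .128 | .081
All other pitchers | | 114,129 | 9.3 | .101 | .168 | .032
Totals | 215,665 | 182,462 | 12.0 | .099 | .166 | .032 | .286 | .234 | .115 | .079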

(Batting stats for pitchers are included in "All other batters"; pitching stats for position players are included in "All other pitchers").
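The percentages quoted above can be checked against the table. Here is a quick Python sketch; the PA and BF values are copied from the 1964 table, and the variable and function names are just illustrative.

```python
# PA for the 1964 cohort's elite batters and BF for its elite pitchers,
# copied from the table above.
ELITE_PA = {
    "Barry Bonds": 12606, "Rafael Palmeiro": 12046, "Mark Grace": 9290,
    "B.J. Surhoff": 9106, "Barry Larkin": 9057, "Will Clark": 8283,
    "Ellis Burks": 8177, "Jose Canseco": 8129,
}
ELITE_BF = {
    "Kenny Rogers": 14280, "Dwight Gooden": 11705, "John Burkett": 11324,
    "Bobby Witt": 11003, "Bret Saberhagen": 10421, "Kevin Tapani": 9600,
}
OTHER_PA = 138971  # "All other batters"
OTHER_BF = 114129  # "All other pitchers"

# Cohort totals rebuilt from the parts (the Totals row says 215,665 and 182,462).
TOTAL_PA = sum(ELITE_PA.values()) + OTHER_PA
TOTAL_BF = sum(ELITE_BF.values()) + OTHER_BF


def share(part: int, whole: int) -> float:
    """Fraction of a cohort total contributed by one player or group."""
    return part / whole


if __name__ == "__main__":
    print(f"Bonds:          {share(ELITE_PA['Barry Bonds'], TOTAL_PA):.1%} of PA")
    print(f"Rogers:         {share(ELITE_BF['Kenny Rogers'], TOTAL_BF):.1%} of BF")
    print(f"Elite batters:  {share(sum(ELITE_PA.values()), TOTAL_PA):.1%} of PA")
    print(f"Elite pitchers: {share(sum(ELITE_BF.values()), TOTAL_BF):.1%} of BF")
```

Bonds comes out at about 5.8% and Rogers at about 7.8%, in line with the rounded figures in the text. Together, the eight elite batters account for roughly 36% of the cohort's PA, and the six elite pitchers account for roughly 37% of its BF.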

Here are stats for seven consecutive cohorts, 1961-1967 (the first wave of Generation X):

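Cohort | Sample Member | PA | BF | BF/G | $BB | $SO | $HR | $H | $XBH | $3B | $SB
1961 | Don Mattingly | 114,102 | 114,888 | 11.8 | .088 | .170 | .029 | .283 | .221 | .127 | .082
1962 | Roger Clemens | 162,446 | 181,147 | 13.7 | .093 | .179 | .031 | .289 | .234 | .112 | .073
1963 | Randy Johnson | 189,945 | 163,102 | 11.3 | .096 | .183 | .035 | .292 | .229 | .107 | .073
1964 | Barry Bonds | 215,665 | 182,462 | 12.0 | .099 | .166 | .032 | .286 | .234 | .115 | .079
1965 | Craig Biggio | 192,592 | 167,783 | 12.7 | .093 | .177 | .033 | .285 | .233 | .111 | .065
1966 | Greg Maddux | 138,533 | 224,287 | 11.1 | .093 | .180 | .034 | .288 | .244 | .088 | .052
1967 | John Smoltz | 215,318 | 197,169 | 10.9 | .095 | .177 | .033 | .289 | .236 | .109 | .090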

And here is Generation X divided into three cohort-groups:

Cohort-Group | Sample Member | PA | BF | BF/G | $BB | $SO | $HR | $H | $XBH | $3B | $SB
1961-1967 | Barry Bonds | 1,228,601 | 1,230,838 | 11.8 | .094 | .176 | .033 | .288 | .233 | .110 | .074
1968-1974 | Pedro Martinez | 1,252,518 | 1,167,274 | 10.1 | .097 | .183 | .037 | .295 | .244 | .091 | .060
1975-1981 | Alex Rodriguez | 1,168,441 | 1,206,186 | 10.7 | .093 | .194 | .038 | .292 | .251 | .098 | .065
Generation X | | 3,649,560 | 3,604,298 | 10.9 | .095 | .184 | .036 | .292 | .243 | .099 | .066
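PA and BF are counts, so each cohort-group's totals should equal the sum of its cohorts' totals, and the three cohort-groups should add up to Generation X. The Python sketch below checks only that, using numbers copied from the two tables above. It leaves out the rate columns (BF/G and the $ rates) because this post doesn't say exactly how those are combined across cohorts.

```python
from __future__ import annotations

# (PA, BF) for each 1961-1967 cohort, copied from the cohort table above.
COHORTS = {
    1961: (114102, 114888), 1962: (162446, 181147), 1963: (189945, 163102),
    1964: (215665, 182462), 1965: (192592, 167783), 1966: (138533, 224287),
    1967: (215318, 197169),
}
# (PA, BF) for the three Generation X cohort-groups, from the cohort-group table.
GROUPS = {
    "1961-1967": (1228601, 1230838),
    "1968-1974": (1252518, 1167274),
    "1975-1981": (1168441, 1206186),
}
GEN_X = (3649560, 3604298)


def add_counts(rows) -> tuple[int, int]:
    """Sum an iterable of (PA, BF) pairs column by column."""
    rows = list(rows)
    return sum(pa for pa, _ in rows), sum(bf for _, bf in rows)


if __name__ == "__main__":
    print(add_counts(COHORTS.values()) == GROUPS["1961-1967"])  # cohorts -> first wave
    print(add_counts(GROUPS.values()) == GEN_X)                 # waves -> Generation X
```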

And here again are the eleven baseball-playing Strauss & Howe generations. These include the nine MLB generations (Gilded through Millennial), the purely amateur Transcendentals, and the (so far) purely little-league Homelanders:

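GENERATION | BIRTH YEARS | PA | BF | BF/G | $BB | $SO | $HR | $H | $XBH | $3B | $SB
Transcendental | 1792-1821 | 0 | 0
Gilded | 1822-1842 | 10,042 | 5,945 | 35.4 | .028 | .016 | .002 | .283 | .142 | .293 | .039
Progressive | 1843-1859 | 653,547 | 512,027 | 35.9 | .051 | .067 | .005 | .272 | .210 | .277 | .093
Missionary | 1860-1882 | 1,822,804 | 1,926,470 | 31.5 | .083 | .091 | .006 | .276 | .201 | .294 | .143
Lost | 1883-1900 | 1,868,490 | 1,941,129 | 22.2 | .084 | .095 | .009 | .281 | .216 | .248 | .086
G.I. | 1901-1924 | 2,265,039 | 2,225,348 | 19.1 | .093 | .100 | .019 | .280 | .224 | .185 | .036
Silent | 1925-1942 | 1,989,655 | 1,939,356 | 15.3 | .091 | .152 | .028 | .272 | .203 | .159 | .040
Boom | 1943-1960 | 2,915,943 | 2,904,739 | 14.4 | .091 | .152 | .026 | .279 | .212 | .134 | .071
Generation X | 1961-1981 | 3,649,560 | 3,604,298 | 10.9 | .095 | .184 | .036 | .292 | .243 | .099 | .066
Millennial | 1982-2004 | 1,154,483 | 1,234,298 | 10.0 | .090 | .218 | .036 | .298 | .249 | .103 | .068
Homelanders | 2005-? | 0 | 0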

But like I said in part one, these are social generations. Strauss & Howe were trying to identify generations based on how they shape and react to history and each other.

My goal is much simpler: I want to identify baseball generations based on how (if at all) they played major league baseball. In part two I defined my tool for identifying baseball generations: similarity scores based on attribute rates.

I'll use the Strauss & Howe generations as a starting point. Then I can ask of each cohort: does it actually belong in this generation, or should it be in a neighboring one?

Let's look again at the first wave of Generation X, the 1961-1967 cohorts. Here are the similarity scores of each cohort to its own generation (Gen X) and to its next-older generation (Boom):

Cohort | Boom | Gen X
1961 | 929 | 870
1962 | 893 | 940
1963 | 856 | 968
1964 | 878 | 916
1965 | 894 | 939
1966 | 847 | 961
1967 | 858 | 953

As you can see, each cohort is in fact more similar to Generation X than it is to the Boom Generation, except for the first one, 1961, which is more similar to the Boomers (929 to 870). So based on these numbers, the 1961 cohort belongs in the Boom Generation instead of Generation X.

So I compared every cohort to every generation. Starting with the firstborn major leaguer (Nate Berkenstock, born 1831) and ending with 2016's youngest player (Julio Urias, born 1996), there are 166 MLB cohorts. There are eleven generations, although the first and last (Transcendentals and Homelanders) are statistically identical because their members had no major league experience.

To find the statistical similarity of a cohort and a generation (or of any two players or groups), start at 1,000 and subtract a penalty for each attribute rate. The penalty is the absolute value of the difference between the cohort's rate and the generation's rate, times a multiplier. The multiplier is the weight I assigned to the attribute rate, divided by four times its standard deviation.
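The same rule written as a formula. The notation is my own shorthand and does not come from the original. Here k runs over the eight attribute rates (BF/G, $BB, $SO, $HR, $H, $XBH, $3B, $SB), r_k(c) and r_k(g) are the cohort's and the generation's values of rate k, w_k is that rate's weight, and σ_k is its standard deviation across cohorts:

```latex
S(c, g) = 1000 - \sum_{k=1}^{8} m_k \,\lvert r_k(c) - r_k(g) \rvert,
\qquad
m_k = \frac{w_k}{4\,\sigma_k}
```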

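Measure | BF/G | $BB | $SO | $HR | $H | $XBH | $3B | $SB
St. Dev. | 8.3 | .013 | .047 | .012 | .011 | .019 | .071 | .040
Weight | 90 | 180 | 180 | 180 | 90 | 90 | 90 | 90
Multiplier | 2.7 | 3,483 | 955 | 3,806 | 2,017 | 1,166 | 315 | 569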

I have 166 total cohorts, but for standard deviations, I only wanted to include cohorts with at least 10,000 total PA and 10,000 total BF, so I limited my population to the 143 cohorts born between 1850 and 1992.

I wanted the weights of the "three true outcomes" rates ($BB, $SO, and $HR) to be double the other weights, because they use both hitting AND pitching stats, while the other rates only use one or the other. And I wanted the weights to add up to close to 1,000, so that if two groups are exactly four standard deviations apart in every category, they will have a similarity score near zero.
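Here is that arithmetic as a Python sketch, using the standard deviations and weights from the table above. Multipliers recomputed from the rounded standard deviations shown there fall within 2% of the published ones, presumably because the published multipliers were computed from unrounded values. The last computation checks the four-standard-deviation case. There, each penalty equals its weight, so the score is 1,000 minus the weight total of 990, or 10.

```python
# Attribute rates, their standard deviations across the 143 cohorts,
# and their weights, as shown in the table above.
RATES = ["BF/G", "$BB", "$SO", "$HR", "$H", "$XBH", "$3B", "$SB"]
ST_DEV = [8.3, 0.013, 0.047, 0.012, 0.011, 0.019, 0.071, 0.040]
WEIGHT = [90, 180, 180, 180, 90, 90, 90, 90]
PUBLISHED = [2.7, 3483, 955, 3806, 2017, 1166, 315, 569]


def multiplier(weight: float, st_dev: float) -> float:
    """Penalty multiplier: the rate's weight divided by four standard deviations."""
    return weight / (4 * st_dev)


MULT = [multiplier(w, s) for w, s in zip(WEIGHT, ST_DEV)]

# If two groups differ by exactly 4 SD in every rate, each penalty is
# (4 * sd) * weight / (4 * sd) = weight, so the score is 1000 - sum(WEIGHT).
SCORE_AT_4_SD = 1000 - sum(m * 4 * s for m, s in zip(MULT, ST_DEV))

if __name__ == "__main__":
    for name, m, p in zip(RATES, MULT, PUBLISHED):
        print(f"{name:>5}: recomputed {m:8.1f}, published {p}")
    print(f"sum of weights: {sum(WEIGHT)}, score at 4 SD: {SCORE_AT_4_SD:.1f}")
```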

Don Mattingly's 1961 cohort has a $HR rate of 2.9%, while the $HR rate of Generation X overall is 3.6%. For the similarity score of that cohort to its generation, the $HR penalty is .007 (.036 - .029) times the rate multiplier of 3,806, or about 27.

Add up all eight penalties and subtract from 1,000, and that is the similarity score.
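Here is a minimal Python sketch of the whole score, applied to the 1961 cohort and Generation X as a whole. The rates and multipliers are copied from the tables above. `similarity` is just an illustrative name.

```python
# Attribute rates in the same order as the multiplier table above.
RATES = ("BF/G", "$BB", "$SO", "$HR", "$H", "$XBH", "$3B", "$SB")
# Published multipliers (weight divided by four standard deviations).
MULT = (2.7, 3483, 955, 3806, 2017, 1166, 315, 569)

# Rates copied from the cohort and generation tables above.
COHORT_1961 = (11.8, 0.088, 0.170, 0.029, 0.283, 0.221, 0.127, 0.082)
GEN_X = (10.9, 0.095, 0.184, 0.036, 0.292, 0.243, 0.099, 0.066)


def similarity(a, b, mult=MULT):
    """Return (score, penalties): 1,000 minus the sum of each rate's
    absolute difference times its multiplier."""
    penalties = [m * abs(x - y) for m, x, y in zip(mult, a, b)]
    return 1000 - sum(penalties), penalties


if __name__ == "__main__":
    score, penalties = similarity(COHORT_1961, GEN_X)
    for name, p in zip(RATES, penalties):
        print(f"{name:>5} penalty: {p:5.1f}")
    print(f"similarity: {score:.0f}")
```

With these inputs, the $HR penalty comes out at about 26.6 (the "about 27" above) and the score at about 871. That is a point off the 870 in the Boom/Gen X table, presumably because the rates and multipliers used here are rounded.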

I made a worksheet of the similarity scores of every cohort to every generation, and used conditional formatting to create a "heat map" of the scores, where 1,000 is green and zero (or negative) is red, and everything in between is on a gradient. Below is a screenshot of the portion of the sheet showing the Gen X cohorts (and the first Millennial cohort):


And next to it, I made another table. For each cohort, I added a constant to all of its similarity scores so that its highest score equals 1,000, and then I highlighted all the 1,000's. It's not as pretty as the table above, but it's more useful for showing which generation each cohort belongs in.
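That second table amounts to shifting each cohort's scores by a constant. Here is a small Python sketch of the idea. The function names are mine, and the example uses only the two columns (Boom and Gen X) reproduced earlier in this post, not all eleven generations in the worksheet.

```python
from __future__ import annotations


def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Add one constant to all of a cohort's scores so its best score becomes 1,000."""
    offset = 1000 - max(scores.values())
    return {gen: s + offset for gen, s in scores.items()}


def best_fits(scores: dict[str, float]) -> list[str]:
    """Generations that would be highlighted: those tied for the cohort's top score.

    Shifting by a constant doesn't change the ranking, so comparing the raw
    scores with their maximum picks the same generations as looking for 1,000's.
    """
    top = max(scores.values())
    return [gen for gen, s in scores.items() if s == top]


if __name__ == "__main__":
    # Boom and Gen X scores for two cohorts, from the 1961-1967 table above.
    examples = {1961: {"Boom": 929, "Gen X": 870},
                1966: {"Boom": 847, "Gen X": 961}}
    for cohort, scores in examples.items():
        print(cohort, normalize(scores), best_fits(scores))
```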


Generation X's last-born cohort, 1981, also jumps ship, going over to the Millennials. This is in fact where I end up for the birthyears of baseball's Generation X: 1962 to 1980. But I didn't start here. I started at the beginning.


The 1831-1834 (and 1837) Gilded cohorts are much more similar to the Transcendentals (and to the Homelanders, who, as I said, are statistically identical to the Transcendentals). So I'll move the 1831-1834 cohorts to the Transcendental Generation (the pre-MLB 1822-1830 cohorts go too). That pushes the Transcendental/Gilded boundary from 1821-1822 to 1834-1835:

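GENERATION | BIRTH YEARS | PA | BF | BF/G | $BB | $SO | $HR | $H | $XBH | $3B | $SB
Transcendental | 1792-1834 | 7 | 0 | 0.0 | .000 | .429 | .000 | .000 | .000 | .000 | .000
Gilded | 1835-1842 | 10,035 | 5,945 | 35.4 | .028 | .016 | .002 | .283 | .142 | .293 | .039
Progressive | 1843-1859 | 653,547 | 512,027 | 35.9 | .051 | .067 | .005 | .272 | .210 | .277 | .093
Missionary | 1860-1882 | 1,822,804 | 1,926,470 | 31.5 | .083 | .091 | .006 | .276 | .201 | .294 | .143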

You can tell from the above table of highlighted 1,000's that most of the 1835 to 1851 cohorts were already more similar to the Gilded Generation. But before I even look at the similarity scores again, I'm enforcing a minimum length for these generations. The shortest Strauss & Howe generation is 17 years; I'll relax that by one year and say the baseball generations must be at least 16 years.

The Gilded Generation is currently reduced to eight years (1835-1842). So it gets pushed out eight more years to 1850, which means the Progressive Generation gets extended, too (which shortens the Missionary Generation to 16 years exactly):

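GENERATION | BIRTH YEARS | PA | BF | BF/G | $BB | $SO | $HR | $H | $XBH | $3B | $SB
Transcendental | 1792-1834 | 7 | 0 | 0.0 | .000 | .429 | .000 | .000 | .000 | .000 | .000
Gilded | 1835-1850 | 127,179 | 81,279 | 37.4 | .027 | .031 | .003 | .285 | .179 | .256 | .058
Progressive | 1851-1866 | 1,119,915 | 916,498 | 35.4 | .070 | .078 | .007 | .271 | .210 | .287 | .137
Missionary | 1867-1882 | 1,239,292 | 1,446,665 | 30.4 | .083 | .094 | .006 | .277 | .200 | .294 | .131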
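The boundary bookkeeping is mechanical enough to sketch in Python: each generation begins the year after the previous one ends, and none may be shorter than 16 years. This sketch covers only that step. Deciding which cohorts to move is done with the similarity scores, not by this code. The helper names are illustrative, and the list stops at the G.I. Generation because none of these adjustments reach a later boundary.

```python
from __future__ import annotations

MIN_YEARS = 16  # shortest allowed baseball generation, per the text


def cascade(gens: list[tuple[str, int, int]],
            min_years: int = MIN_YEARS) -> list[tuple[str, int, int]]:
    """Re-derive (name, first, last) birth years from oldest to youngest.

    Every generation after the first starts the year after its predecessor
    ends. If that leaves it shorter than min_years, its last year is pushed
    out until it is exactly min_years long. cascade never moves a last year
    earlier.
    """
    result = []
    prev_last = None
    for name, first, last in gens:
        if prev_last is not None:
            first = prev_last + 1
        last = max(last, first + min_years - 1)
        result.append((name, first, last))
        prev_last = last
    return result


def move_last_year(gens, name, new_last, min_years=MIN_YEARS):
    """Give generation `name` a new last birth year, then re-cascade."""
    changed = [(n, first, new_last if n == name else last)
               for n, first, last in gens]
    return cascade(changed, min_years)


# Strauss & Howe years after moving the 1822-1834 cohorts to the Transcendentals.
START = [("Transcendental", 1792, 1834), ("Gilded", 1835, 1842),
         ("Progressive", 1843, 1859), ("Missionary", 1860, 1882),
         ("Lost", 1883, 1900), ("G.I.", 1901, 1924)]

if __name__ == "__main__":
    step1 = cascade(START)                         # enforce the 16-year minimum
    step2 = move_last_year(step1, "Gilded", 1853)  # add 1851-1853 to the Gildeds
    step3 = move_last_year(step2, "Gilded", 1855)  # then 1854-1855 as well
    for gens in (step1, step2, step3):
        print(gens)
```

Run in sequence, the three steps reproduce the birth years in this walkthrough's tables, with the Gilded end year at 1850, then 1853, then 1855.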

Since the generations' birthyears have changed, so too have their attribute rates. Which means the similarity scores will be different:


Since the Gilded birthyears are now 1835-1850, that generation "pulls in" not only most of those cohorts, but 1851-1853 and 1855 as well. So I can add the 1851-1853 cohorts to the Gildeds, which means I have to extend the Progressives, Missionaries, and Lost (to get them back to 16 years), and cut into the G.I. birthyears:

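GENERATION | BIRTH YEARS | PA | BF | BF/G | $BB | $SO | $HR | $H | $XBH | $3B | $SB
Gilded | 1835-1853 | 205,755 | 138,212 | 37.0 | .031 | .039 | .003 | .279 | .181 | .251 | .053
Progressive | 1854-1869 | 1,275,211 | 1,226,755 | 34.8 | .076 | .080 | .007 | .274 | .211 | .294 | .144
Missionary | 1870-1885 | 1,290,112 | 1,313,810 | 28.2 | .082 | .101 | .005 | .274 | .199 | .289 | .127
Lost | 1886-1901 | 1,663,176 | 1,787,799 | 22.0 | .084 | .092 | .010 | .283 | .219 | .240 | .078
G.I. | 1902-1924 | 2,185,661 | 2,144,343 | 19.0 | .094 | .101 | .019 | .280 | .224 | .184 | .036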

Which changes the similarity scores again:


The 1854 cohort is still most similar to the Progressives, but now it's more similar to the Gildeds than the 1855 cohort is to the Progressives. So I'll move both cohorts to the Gildeds, which pushes each of the next three generations out another two years:

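GENERATION | BIRTH YEARS | PA | BF | BF/G | $BB | $SO | $HR | $H | $XBH | $3B | $SB
Gilded | 1835-1855 | 303,360 | 210,479 | 36.4 | .036 | .049 | .004 | .274 | .189 | .254 | .055
Progressive | 1856-1871 | 1,314,692 | 1,253,848 | 34.5 | .079 | .080 | .007 | .276 | .211 | .296 | .149
Missionary | 1872-1887 | 1,390,129 | 1,429,228 | 27.3 | .082 | .103 | .006 | .274 | .200 | .287 | .126
Lost | 1888-1903 | 1,615,472 | 1,732,268 | 21.6 | .084 | .090 | .011 | .285 | .224 | .231 | .066
G.I. | 1904-1924 | 1,996,262 | 1,985,096 | 18.9 | .094 | .103 | .019 | .278 | .222 | .181 | .035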

Which pulls not only the 1854 cohort but also the 1856 cohort into the Gilded Generation:


I think you get the idea by now. After I add the 1856 cohort to the Gilded group, the 1857 and later cohorts remain more similar to the Progressives, which means the Gilded group isn't pulling in any more cohorts. The birthyears have "locked", giving me an opportunity to paraphrase page 82 of Generations:

All things have a beginning, and so must the story of baseball generations.

I start with the cohort-group of 1835 through 1856. I call it the "National Generation." Of this group, 469 members appeared in at least one major league game. (For the purposes of this study, I consider the National Association, 1871-1875, a "major" league; it certainly was to Harry Wright and his peers.) It includes every member of the 1869 Cincinnati Red Stockings (the first openly professional team), nearly every player in the National Association (the first professional league), and most of the players in William Hulbert's National League (1876-1881).

A couple of earlier-born players appeared in the National Association, but they were both forty-somethings who played just one game each. Besides them, the firstborn MLB players were Harry Wright (born 1835) and Dickey Pearce (born 1836). Both played seven years between the NA and NL, and both were key pioneers of the professional game.

After applying the method described in this post through to the end, I arrive at these birthyears for the generations of baseball:

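GENERATION | BIRTH YEARS | PA | BF | BF/G | $BB | $SO | $HR | $H | $XBH | $3B | $SB
Transcendental | 1792-1834 | 7 | 0 | 0.0 | .000 | .429 | .000 | .000 | .000 | .000 | .000
Gilded | 1835-1856 | 379,169 | 319,457 | 36.3 | .038 | .055 | .004 | .272 | .195 | .262 | .063
Progressive | 1857-1873 | 1,397,585 | 1,336,700 | 34.1 | .082 | .080 | .007 | .278 | .209 | .297 | .149
Missionary | 1874-1892 | 1,801,743 | 1,875,106 | 25.4 | .082 | .104 | .006 | .274 | .204 | .275 | .115
Lost | 1893-1911 | 1,829,043 | 1,928,192 | 20.6 | .086 | .089 | .014 | .287 | .229 | .204 | .046
G.I. | 1912-1928 | 1,618,936 | 1,487,750 | 17.4 | .098 | .113 | .022 | .273 | .215 | .176 | .032
Silent | 1929-1944 | 1,893,201 | 1,941,919 | 15.2 | .090 | .158 | .028 | .272 | .202 | .154 | .046
Boom | 1945-1961 | 2,719,938 | 2,680,778 | 14.1 | .091 | .152 | .027 | .280 | .214 | .133 | .073
Generation X | 1962-1980 | 3,396,963 | 3,326,589 | 10.8 | .095 | .184 | .036 | .292 | .243 | .098 | .065
Millennial | 1981-1996 | 1,292,978 | 1,397,119 | 10.1 | .090 | .217 | .036 | .298 | .249 | .105 | .070
Homelanders | 1997-? | 0 | 0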
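As a final sanity check, the finished birth years can be tested against the two rules used along the way. There should be no gaps or overlaps between neighboring generations, and no generation shorter than 16 years. Here is a small Python sketch. The years are copied from the table above, the open-ended Homelanders are left out, and the function names are illustrative.

```python
# Final birth-year ranges from the table above (Homelanders omitted: no end year yet).
FINAL = [
    ("Transcendental", 1792, 1834), ("Gilded", 1835, 1856),
    ("Progressive", 1857, 1873), ("Missionary", 1874, 1892),
    ("Lost", 1893, 1911), ("G.I.", 1912, 1928), ("Silent", 1929, 1944),
    ("Boom", 1945, 1961), ("Generation X", 1962, 1980),
    ("Millennial", 1981, 1996),
]


def length(first: int, last: int) -> int:
    """Number of birth-year cohorts in an inclusive range."""
    return last - first + 1


def check(gens, min_years=16):
    """List problems: non-contiguous neighbors or generations shorter than min_years."""
    problems = []
    for (n1, _, last1), (n2, first2, _) in zip(gens, gens[1:]):
        if first2 != last1 + 1:
            problems.append(f"{n1}/{n2} boundary is not contiguous")
    for name, first, last in gens:
        if length(first, last) < min_years:
            problems.append(f"{name} is only {length(first, last)} years")
    return problems


if __name__ == "__main__":
    print(check(FINAL))
```

The check comes back empty. The Silent (1929-1944) and Millennial (1981-1996) generations sit exactly at the 16-year minimum.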

Here is the same table again, but with a few changes. I gave the earlier generations more appropriate names. Also, I dropped the first 20 cohorts of the generation formerly known as the Transcendentals, and started it at 1812 (birthyear of Duncan Curry, first Knickerbocker president). Lastly, since this study is about grown men playing organized baseball at the highest level (which is what sets the Knickerbocker Generation apart from earlier generations), I omitted the Homelanders (for now):

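GENERATION | BIRTH YEARS | PA | BF | BF/G | $BB | $SO | $HR | $H | $XBH | $3B | $SB
Knickerbocker | 1812-1834 | 7 | 0 | 0.0 | .000 | .429 | .000 | .000 | .000 | .000 | .000
National | 1835-1856 | 379,169 | 319,457 | 36.3 | .038 | .055 | .004 | .272 | .195 | .262 | .063
American | 1857-1873 | 1,397,585 | 1,336,700 | 34.1 | .082 | .080 | .007 | .278 | .209 | .297 | .149
Dead-Ball | 1874-1892 | 1,801,743 | 1,875,106 | 25.4 | .082 | .104 | .006 | .274 | .204 | .275 | .115
Live-Ball | 1893-1911 | 1,829,043 | 1,928,192 | 20.6 | .086 | .089 | .014 | .287 | .229 | .204 | .046
G.I. | 1912-1928 | 1,618,936 | 1,487,750 | 17.4 | .098 | .113 | .022 | .273 | .215 | .176 | .032
Silent | 1929-1944 | 1,893,201 | 1,941,919 | 15.2 | .090 | .158 | .028 | .272 | .202 | .154 | .046
Boom | 1945-1961 | 2,719,938 | 2,680,778 | 14.1 | .091 | .152 | .027 | .280 | .214 | .133 | .073
Generation X | 1962-1980 | 3,396,963 | 3,326,589 | 10.8 | .095 | .184 | .036 | .292 | .243 | .098 | .065
Millennial | 1981-1996 | 1,292,978 | 1,397,119 | 10.1 | .090 | .217 | .036 | .298 | .249 | .105 | .070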
