Tuesday, August 30, 2016

The Odds Ratio Method

So the other day I wrote about how I break projections down into eight binary components. I do this for everything: batter projections, pitcher projections, league averages, and all kinds of splits.

Then, for each set of components except the batter's, I'd convert the rates to factors of the league average, where 1 equals league average.

The first component is $BB, which equals (BB + HBP) / PA.

Jered Weaver has a $BB factor of about .86 (7.3% / 8.6%). Angel Stadium in Anaheim has a $BB factor of .98 (yes, ballparks can affect walk rates, too). Left-handed batters facing right-handed pitchers have a $BB factor of 1.02, and batters on the road have a $BB factor of .96.

Then all those factors are multiplied by the batter's rate to find the probability for the match-up.

Joey Votto has a $BB of 19%. So when he faces Jered Weaver at Angel Stadium tonight, we would expect him to walk in 16% of his plate appearances against him (19% * .86 * .98 * 1.02 * .96 = 16%).
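The multiplication above is easy to check in code. Here is a minimal Python sketch of it; the function name `factor_matchup` is made up for illustration, and the numbers are the ones quoted in this post.

```python
from math import prod

def factor_matchup(batter_rate, factors):
    """Old approach: scale the batter's rate by each league-relative factor."""
    return batter_rate * prod(factors)

# Votto's $BB, then the pitcher, park, platoon, and road factors from the text
votto_bb = factor_matchup(0.19, [0.86, 0.98, 1.02, 0.96])
print(f"{votto_bb:.1%}")  # prints 15.7%, which rounds to the 16% above
```

Nothing in this function keeps the result below 100%, which matters for the strikeout examples that follow.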

OK. Let's try another example: the Orioles' Chris Davis vs. the Cubs' Aroldis Chapman. Davis' $BB rate is 14%, and Chapman's $BB factor is 1.29. So the $BB of their matchup would be 14% * 1.29 = 18%. We'll ignore other factors this time and leave it at that. 18% seems pretty reasonable.

Next, $SO. Chris Davis' $SO rate is 38%, one of the highest in the majors for a regular. Chapman has a $SO factor of 1.94 - nearly twice the league average. So the $SO for their match-up would be 38% * 1.94 = 74%. I'm sure a Davis/Chapman match-up would produce a lot of whiffs, but 74% seems a little extreme. And that's without multiplying in the lefty-lefty platoon factor of 1.11, which brings the $SO rate for their match-up up to 82%.

Now imagine a hitter even more prone to strikeouts - a player with a 60% $SO rate against the MLB average. Against Chapman, his $SO rate would be... 116%! Obviously, no batter, no matter how inept, can strike out more times than he has opportunities. Anybody - if I went up there against Chapman, if your grandma did - would have a less than 100% chance of striking out, even if it were 99.9%.

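So my old system was breaking down at the extremes. High-strikeout batters facing high-strikeout pitchers were being underrated, because their projected strikeout rates came out too high. Power hitters facing homer-prone pitchers were being overrated, because their projected home run rates came out too high.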

Back in the '80s, Bill James introduced the Log5 method to find the probability that Team A will defeat Team B, based on Team A's and Team B's winning percentages. Later, with the help of a colleague, he expanded on the formula to account for an average other than .500 (an average winning percentage), so the formula could be used to find the probability of any kind of match-up - batting average, on-base percentage, free throw percentage - where the league average is something other than .500.
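For reference, the basic Log5 formula for two teams with winning percentages a and b is usually written as a(1 - b) / (a(1 - b) + b(1 - a)). Below is a small Python sketch of that .500-baseline version only; the function name `log5` is just a label chosen here, and the .600 and .400 teams are an illustrative pairing.

```python
def log5(a, b):
    """Probability that a team with win% a beats a team with win% b,
    assuming the league average is .500."""
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))

print(round(log5(0.600, 0.400), 3))  # 0.692
```

Two evenly matched teams come out at .500, as they should.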

Bill recently wrote about the method again on his website (again, subscription required), describing in detail (as only Bill James can) the logic behind the method and the steps involved in calculating it.

But as Tom Tango explained, the Odds Ratio method gives you identical results.

"...you can use the Odds Ratio method for any mean.  For example, assume the league OBP is .333, you have a hitter who is .400 and the pitcher is .250.  What’s the resulting OBP?

"Odds(H) = .400/.600 = .667
Odds(P) = .250/.750 = .333
Odds(L) = .333/.667 = .500

"Odds = Odds(H) * Odds(P) / Odds(L)
= .667*.333/.500=.444

"If the Odds are .444 safe to 1 out, the Rate is .444/(.444+1) = .308"
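Tango's three steps (rate to odds, combine, odds back to rate) translate almost line for line into code. This is a minimal Python sketch of the quoted example; the function names are made up for illustration and are not Tango's.

```python
def to_odds(rate):
    """Convert a rate (successes per opportunity) to odds (successes per failure)."""
    return rate / (1 - rate)

def to_rate(odds):
    """Convert odds back to a rate."""
    return odds / (odds + 1)

def odds_ratio_matchup(hitter, pitcher, league):
    """Odds(H) * Odds(P) / Odds(L), converted back to a rate."""
    odds = to_odds(hitter) * to_odds(pitcher) / to_odds(league)
    return to_rate(odds)

obp = odds_ratio_matchup(0.400, 0.250, 0.333)
print(round(obp, 3))  # 0.308
```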

Let's try it with our Davis/Chapman matchup.

Davis' $SO rate is 38%, Chapman's is 43%, and the league average is 22%.

Odds(H) = .38/.62 = .613
Odds(P) = .43/.57 = .754
Odds(L) = .22/.78 = .282

Odds = Odds(H) * Odds(P) / Odds(L)
= .613 * .754 / .282 = 1.639

The odds of a strikeout are 1.639 to 1, or 1.639 / (1.639 + 1) = 62%.

62% is still very high, but more sane than 74%.

Now let's try our theoretical 60% whiffer.

Odds(H) = .60/.40 = 1.5
Odds(P) = .43/.57 = .754
Odds(L) = .22/.78 = .282

Odds = 1.5 * .754 / .282 = 4.01

4.01 / (4.01 + 1) = 80%
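To see the two approaches side by side, here is a rough Python comparison of the old factor method and the odds ratio method against Chapman. The function names are made up for illustration, and the 90% batter is a made-up extreme case that does not appear elsewhere in this post.

```python
def factor_method(batter, pitcher_factor):
    """Old approach: batter rate times the pitcher's league-relative factor."""
    return batter * pitcher_factor

def odds_ratio_method(batter, pitcher, league):
    """Odds(H) * Odds(P) / Odds(L), converted back to a rate."""
    odds = (batter / (1 - batter)) * (pitcher / (1 - pitcher)) / (league / (1 - league))
    return odds / (odds + 1)

# Chapman: $SO factor 1.94, $SO rate 43%; league $SO 22%
for batter in (0.38, 0.60, 0.90):
    old = factor_method(batter, 1.94)
    new = odds_ratio_method(batter, 0.43, 0.22)
    print(f"{batter:.0%} batter: factor method {old:.0%}, odds ratio {new:.0%}")
```

The factor method passes 100% once the batter's rate gets high enough, while the odds ratio result always stays below 100%, because any finite odds convert back to a rate under 1.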

Even a bad hitter (think an average college player, maybe) would avoid a strikeout about once every five times up against Chapman.

Tango goes on to expand the formula to include different league averages for the hitter and the pitcher...and honestly, I don't understand it:

"The full equation is:
Odds(matchup) / Odds(environment) = (Odds(H) * Odds(P)) / (Odds(envH) * Odds(envP))

"So, you have a hitter with an OBP of .400 in a league of .300 facing a pitcher with an OBP of .250 in a league of .350, and they are both playing in a league (or park) where the OBP is expected to be .380 for the league average player.  What’s the resulting OBP?

"Odds(matchup) / (.380/.620) = ((.400/.600) * (.250/.750)) / ((.300/.700) * (.350/.650))

"Odds(matchup) = .590
Matchup OBP = .590/1.590 = .371"
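Working Tango's numbers through in code helps show where each environment goes. This is a minimal Python sketch, assuming the formula reads Odds(matchup) / Odds(environment) = Odds(H) * Odds(P) / (Odds(envH) * Odds(envP)); the function and parameter names are made up for illustration.

```python
def to_odds(rate):
    """Convert a rate to odds."""
    return rate / (1 - rate)

def general_matchup(hitter, hitter_env, pitcher, pitcher_env, matchup_env):
    """Odds ratio method with separate environments for hitter, pitcher, and matchup."""
    # How far the hitter and pitcher sit from their own environments, in odds terms
    ratio = (to_odds(hitter) * to_odds(pitcher)) / (to_odds(hitter_env) * to_odds(pitcher_env))
    # Apply that ratio to the environment the matchup is actually played in
    odds = to_odds(matchup_env) * ratio
    return odds / (odds + 1)

obp = general_matchup(0.400, 0.300, 0.250, 0.350, 0.380)
print(round(obp, 3))  # 0.371
```

When all three environments are the same league average, this reduces to the simpler Odds(H) * Odds(P) / Odds(L) form.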

So if anyone more schooled in math than I am (or who can decipher the above equation) is reading this...help? Tango? (edit: I've since figured this out - Tango just typed the formula in a confusing way. The Odds(environment) term - the .380/.620 - belongs under Odds(matchup) on the left side only, so it ends up multiplying the right side rather than sitting in its denominator.)

Anyways, Matt Haechrel expresses the modified Log5 formula like this (in an article on sabr.org):

((x * y) / z) / (((x * y) / z) + ((1 - x) * (1 - y)) / (1 - z))

...which is the formula I currently use in my spreadsheet. x = batter, y = pitcher, z = league. Gee, maybe I should test it to make sure it gives the same result as Tango's Odds Ratio Method.

Plugging the $SO rates for Davis and Chapman into it:

((.38*.43)/.22)/(((.38*.43)/.22)+((1-.38)*(1-.43))/(1-.22)) =

(.1634/.22)/((.1634/.22)+(.62*.57)/(.78)) =

.7427 / (.7427 + .4531) = .7427 / 1.1958 = 62%

Ok, good. They're identical.
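The same check can be run in code. Here is a minimal Python sketch comparing the spreadsheet formula with the odds ratio version, using x, y, and z as defined above; both function names are made up for illustration.

```python
import math

def modified_log5(x, y, z):
    """Spreadsheet form: x = batter, y = pitcher, z = league."""
    success = x * y / z
    failure = (1 - x) * (1 - y) / (1 - z)
    return success / (success + failure)

def odds_ratio_matchup(x, y, z):
    """Odds(H) * Odds(P) / Odds(L), converted back to a rate."""
    odds = (x / (1 - x)) * (y / (1 - y)) / (z / (1 - z))
    return odds / (odds + 1)

a = modified_log5(0.38, 0.43, 0.22)
b = odds_ratio_matchup(0.38, 0.43, 0.22)
print(round(a, 3), round(b, 3), math.isclose(a, b))
```

The two agree for any rates between 0 and 1, not just these: dividing the top and bottom of the spreadsheet form by its second term turns it into odds / (odds + 1).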

So from there, I multiply this match-up rate by all the other factors (platoon, park, etc.) like I did before. So with platoon factor multiplied in, the Davis/Chapman $SO is 62% * 1.11 = 69%.

This probably still isn't right. You're probably supposed to include the other factors in the rates for the batter or pitcher. Tango commented on Bill's Log5 article:

"Now, the power of the Odds Ratio form is that you can extend it to include other variables. Say you want to include the Home field advantage. That's a .540 record for the average team, or .54 wins per .46 losses, or 1.17 wins per loss.

"A .6 v .4 team at home:
1.5 x 1.17 / .67 = 2.63 wins per 1 loss, or .725 win%

"A .6 v .4 team on road:
1.5 / (.67 x 1.17) = 1.92 wins per 1 loss, or .658 win%

"You can also use it for things other than a .500 baseline, and include even more things like batter v pitcher to include home field and platoon advantage, etc. It's a bit more complex to account for the non-.500 baseline, but it flows right in once you see it."
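Tango's home-field arithmetic can be sketched in a few lines of Python. This covers only the .500-baseline case he works through; the names `home_edge` and `win_prob` are made up for illustration.

```python
def to_odds(rate):
    """Convert a rate to odds."""
    return rate / (1 - rate)

def to_rate(odds):
    """Convert odds back to a rate."""
    return odds / (odds + 1)

# Average team wins .540 at home, so about 1.17 wins per loss
home_edge = to_odds(0.540)

def win_prob(team, opponent, at_home):
    """Odds of the team beating the opponent, adjusted for home field."""
    odds = to_odds(team) / to_odds(opponent)
    odds = odds * home_edge if at_home else odds / home_edge
    return to_rate(odds)

print(round(win_prob(0.600, 0.400, at_home=True), 3))   # 0.725
print(round(win_prob(0.600, 0.400, at_home=False), 3))  # 0.657
```

The road figure comes out at .657 rather than Tango's .658 only because he rounds the odds to two decimals along the way.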

I don't see it...not yet anyway. But this method is more right than my old way of doing it.

