Run Expectancy and Base-Out Leverage Index
December 19, 2010. Posted in Databases, Leverage Index, Retrosheet, Run/Win Expectancy
Before I get into win expectancy, Win Probability Added, Leverage Index, and WPA/LI, I want to take a look at run expectancy, RE24, Base-Out Leverage Index, and RE24/boLI. Most of these stats were created and/or popularized by Tangotiger (his intro to Leverage Index is here). I have tried to mimic his methodology as closely as possible, but there may be some differences.
Run Expectancy by Base-Out State
From the Retrosheet database I detailed in my last post, I used the following SQL query to generate a run expectancy matrix for 1993-2010 (Retrosheet recently released their event files for the 2010 season, which I added to my database). Note that I am limiting the matrix to “batter events” (using “BAT_EVENT_FL”), to eliminate base-running events such as steals.
SELECT e.OUTS_CT
     , e.START_BASES_CD
     , SUM(e.EVENT_RUNS_CT + e.FATE_RUNS_CT) AS RUNS
     , COUNT(*) AS PA
     , AVG(e.EVENT_RUNS_CT + e.FATE_RUNS_CT) AS RUN_EXP
FROM retrosheet.events e
   , retrosheet.non_partial_non_home_half_ninth_plus_innings i
WHERE e.GAME_ID = i.GAME_ID
  AND e.INN_CT = i.INN_CT
  AND e.BAT_HOME_ID = i.BAT_HOME_ID
  AND e.BAT_EVENT_FL = 'T'
  AND e.YEAR_ID >= 1993
  AND e.YEAR_ID <= 2010
GROUP BY OUTS_CT
       , START_BASES_CD
;
Here is a reformatted summary of the resulting table, showing the run expectancy for each base-out state:
Run Expectancy Matrix, 1993-2010
BASES | 0 OUTS | 1 OUT | 2 OUTS
___ | 0.539 | 0.289 | 0.111
1__ | 0.929 | 0.555 | 0.240
_2_ | 1.172 | 0.714 | 0.342
__3 | 1.444 | 0.984 | 0.373
12_ | 1.542 | 0.948 | 0.464
1_3 | 1.844 | 1.204 | 0.512
_23 | 2.047 | 1.438 | 0.604
123 | 2.381 | 1.620 | 0.798
Using these numbers, we can calculate a player’s RE24. For each of a player’s plate appearances, we take the RE from the base-out state when they came to bat, and subtract it from the sum of the runs scored on the play and the RE of the base-out state following the at-bat.
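As a rough illustration of that arithmetic, here is a minimal Python sketch of RE24 for a single plate appearance. The run expectancies are copied from the matrix above; the names `RUN_EXP`, `run_exp`, and `re24` are my own and are not part of the original database or queries.

```python
# Run expectancy by bases and outs (0, 1, 2), copied from the 1993-2010 matrix.
RUN_EXP = {
    "___": (0.539, 0.289, 0.111),
    "1__": (0.929, 0.555, 0.240),
    "_2_": (1.172, 0.714, 0.342),
    "__3": (1.444, 0.984, 0.373),
    "12_": (1.542, 0.948, 0.464),
    "1_3": (1.844, 1.204, 0.512),
    "_23": (2.047, 1.438, 0.604),
    "123": (2.381, 1.620, 0.798),
}


def run_exp(bases, outs):
    """RE of a base-out state; once there are three outs the inning is over, so RE is 0."""
    return 0.0 if outs >= 3 else RUN_EXP[bases][outs]


def re24(start_bases, start_outs, end_bases, end_outs, runs_scored):
    """Runs scored on the play, plus the RE of the end state, minus the RE of the start state."""
    return runs_scored + run_exp(end_bases, end_outs) - run_exp(start_bases, start_outs)


# Leadoff double: bases empty, 0 outs -> runner on second, 0 outs, no runs score.
print(round(re24("___", 0, "_2_", 0, 0), 3))  # 0.633
```

A player's RE24 for a season would then be the sum of these values over all of his plate appearances.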
Base-Out State Transitions
RE24 looks at the change in run expectancy from one base-out state to another on a player level. Base-Out Leverage Index is a way of looking at the distribution of such changes starting from each of the 24 base-out states. Not only does each base-out state have a different starting run expectancy (as displayed above), it also has a different range of possible changes in run expectancy. To examine this we need to see how frequently each base-out state transitions into each of the other base-out states. Here’s a query to do that:
SELECT e.OUTS_CT
     , e.START_BASES_CD
     , e.EVENT_OUTS_CT
     , e.END_BASES_CD
     , e.EVENT_RUNS_CT
     , COUNT(*) AS COUNT
FROM retrosheet.events e
   , retrosheet.non_partial_non_home_half_ninth_plus_innings i
WHERE e.GAME_ID = i.GAME_ID
  AND e.INN_CT = i.INN_CT
  AND e.BAT_HOME_ID = i.BAT_HOME_ID
  AND e.BAT_EVENT_FL = 'T'
  AND e.YEAR_ID >= 1993
  AND e.YEAR_ID <= 2010
GROUP BY OUTS_CT
       , START_BASES_CD
       , EVENT_OUTS_CT
       , END_BASES_CD
       , EVENT_RUNS_CT
;
I’ve summarized the resulting data in a table. I think this is a pretty interesting way of looking at the data. Each row represents a different starting base-out state, and each column a different ending base-out state. The percentages within each row show how frequently that start state transitioned into each of the end states. The run expectancies for all the start and end states are listed on the side and at the top, and I have highlighted each percentage to indicate whether that particular transition resulted in 0, 1, 2, 3 or 4 runs scoring on the play. This gives all the information needed to calculate the changes in run expectancy that make up RE24. For a transition from starting state S to ending state E, just add the runs scored on the transition to the RE of E, and subtract the RE of S.
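To show how the raw counts from the transitions query turn into row percentages, here is a small Python sketch. It assumes the query output has been loaded as tuples in the same column order, with the base codes already translated into the `1__`-style notation used in this post. The function name and the three sample rows are invented for illustration; they are not real Retrosheet counts.

```python
from collections import defaultdict


def transition_frequencies(rows):
    """rows: list of (outs, start_bases, event_outs, end_bases, event_runs, count).

    Returns {(start_bases, outs): {((end_bases, end_outs), runs): share}},
    where the shares within each starting state sum to 1.
    """
    totals = defaultdict(int)
    for outs, start, _ev_outs, _end, _runs, n in rows:
        totals[(start, outs)] += n

    freqs = defaultdict(lambda: defaultdict(float))
    for outs, start, ev_outs, end, runs, n in rows:
        end_outs = outs + ev_outs
        # Any transition that reaches three outs lands in one inning-over state.
        end_state = ("___", 3) if end_outs >= 3 else (end, end_outs)
        freqs[(start, outs)][(end_state, runs)] += n / totals[(start, outs)]
    return {state: dict(ends) for state, ends in freqs.items()}


# Made-up counts for bases empty, 2 outs: an out, a single, and a home run.
rows = [
    (2, "___", 1, "___", 0, 70),
    (2, "___", 0, "1__", 0, 25),
    (2, "___", 0, "___", 1, 5),
]
freqs = transition_frequencies(rows)
for (end_state, runs), share in freqs[("___", 2)].items():
    print(end_state, runs, f"{share:.0%}")
```

Keeping the runs scored in the key matters because the same pair of states can be reached with different numbers of runs scoring, and those transitions have different changes in run expectancy.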
Base-Out Leverage Index
Using the data from these two queries we can calculate the Base-Out Leverage Index for each base-out state. I have put everything together in a Google spreadsheet. I have left in all the formulas so you can follow along with the steps I took. The first several columns of the “states” sheet are the results from the run expectancy query above, and the first several columns of the “transitions” sheet are the results from the base-out state transitions query.
Base-Out Leverage Index for a given starting base-out state is calculated by taking a weighted average of the absolute changes in run expectancy arising from all of the possible transitions to different ending base-out states. The different changes in RE are weighted by the frequency of each transition. On the “transitions” sheet I first calculated the change in run expectancy for each base-out state transition, and then on the “states” sheet I calculated the frequency-weighted average of the absolute values of all the possible changes from each state. The result is the “AVG_ABS_CHG_RE” column, which expresses the leverage of each state in runs. Here is a summary table of that data:
Base-Out Leverage Index, 1993-2010 (unscaled, units are runs)
BASES | 0 OUTS | 1 OUT | 2 OUTS
___ | 0.332 | 0.237 | 0.147
1__ | 0.561 | 0.451 | 0.316
_2_ | 0.465 | 0.454 | 0.420
__3 | 0.423 | 0.472 | 0.460
12_ | 0.737 | 0.732 | 0.625
1_3 | 0.655 | 0.676 | 0.669
_23 | 0.564 | 0.566 | 0.713
123 | 0.880 | 0.963 | 1.071
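The frequency-weighted average of absolute changes can be sketched in a few lines of Python. This is only an illustration of the calculation, not the spreadsheet itself: `avg_abs_change` is a name I made up, the start and end run expectancies come from the matrix earlier in the post, and the transition counts are invented so the arithmetic is easy to check.

```python
def avg_abs_change(start_re, transitions):
    """Frequency-weighted mean of |runs + end RE - start RE| for one starting state.

    transitions: list of (count, runs_scored, end_re) for every observed
    transition out of the starting state.
    """
    total = sum(count for count, _runs, _end_re in transitions)
    weighted = sum(
        count * abs(runs + end_re - start_re)
        for count, runs, end_re in transitions
    )
    return weighted / total


# Bases empty, 2 outs (start RE 0.111), with made-up counts:
transitions = [
    (70, 0, 0.0),    # out: inning over, so end RE is 0
    (25, 0, 0.240),  # single: runner on first, 2 outs
    (5, 1, 0.111),   # home run: one run scores, back to bases empty, 2 outs
]
print(round(avg_abs_change(0.111, transitions), 5))  # 0.15995
```

Taking the absolute value is what makes this a measure of leverage: without it, the weighted changes from each state would average out to roughly zero.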
The final step is to re-scale these values from runs into unitless ratios where the average plate appearance has a leverage of 1, with higher leverage states being greater than 1 and lower leverage states being between 0 and 1. Here is a table showing the final Base-Out Leverage Index values (the values are pretty close to those listed here):
Base-Out Leverage Index, 1993-2010 (scaled so average is 1, unitless)
BASES | 0 OUTS | 1 OUT | 2 OUTS
___ | 0.87 | 0.62 | 0.38
1__ | 1.46 | 1.18 | 0.83
_2_ | 1.21 | 1.18 | 1.09
__3 | 1.10 | 1.23 | 1.20
12_ | 1.92 | 1.91 | 1.63
1_3 | 1.71 | 1.76 | 1.74
_23 | 1.47 | 1.48 | 1.86
123 | 2.29 | 2.51 | 2.79
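Here is a hedged Python sketch of the re-scaling step. It assumes that "average plate appearance" means an average weighted by how often each state occurs, that is, by the PA column from the run expectancy query. The function name `scale_to_boli` and the PA counts in the example are hypothetical; only the two AVG_ABS_CHG_RE values are taken from the table above.

```python
def scale_to_boli(avg_abs_chg, pa_counts):
    """Divide each state's average absolute RE change by the PA-weighted mean.

    avg_abs_chg: {state: average absolute change in RE, in runs}
    pa_counts:   {state: number of plate appearances starting in that state}
    """
    total_pa = sum(pa_counts[state] for state in avg_abs_chg)
    mean_leverage = (
        sum(avg_abs_chg[state] * pa_counts[state] for state in avg_abs_chg)
        / total_pa
    )
    return {state: value / mean_leverage for state, value in avg_abs_chg.items()}


# Two states only, with made-up PA counts, just to show the mechanics.
avg_abs_chg = {("___", 2): 0.147, ("123", 2): 1.071}
pa_counts = {("___", 2): 3, ("123", 2): 1}
boli = scale_to_boli(avg_abs_chg, pa_counts)
print({state: round(value, 2) for state, value in boli.items()})
```

By construction, the PA-weighted average of the scaled values is exactly 1, which is the property the final table is meant to have.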
RE24/boLI
Now that we have the data for RE24 and boLI, we can calculate RE24/boLI. By taking the RE24 for a state transition and dividing it by the boLI for the starting state, we are basically normalizing (or standardizing) RE24. We are de-leveraging RE24, which means we are removing the impact of the leverage of the state. If a player comes up to bat in a high leverage state (where large positive or negative changes in run expectancy are typical), RE24 credits that to the player, while RE24/boLI takes the initial leverage as a given and just gives the player credit for how his transition fares relative to the other possible transitions from that state.
For example, we can compare the values of an out and a home run in the lowest boLI state (2 outs, bases empty) and the highest boLI state (2 outs, bases loaded).
State | Low boLI | Low boLI | High boLI | High boLI
Event | Out | HR | Out | HR
Outs | 2 | 2 | 2 | 2
Bases | ___ | ___ | 123 | 123
Avg Abs Chg RE | 0.147 | 0.147 | 1.071 | 1.071
boLI | 0.38 | 0.38 | 2.79 | 2.79
Start RE | 0.111 | 0.111 | 0.798 | 0.798
Runs on Play | 0 | 1 | 0 | 4
End RE | 0 | 0.111 | 0 | 0.111
RE24 | -0.11 | 1.00 | -0.80 | 3.31
RE24/boLI | -0.29 | 2.61 | -0.29 | 1.19
With the bases empty, a home run has an RE24 of exactly 1, while with the bases loaded a home run has an RE24 of 3.31. For assigning retrospective value, this makes sense. But if we want to evaluate players relative to the context in which they happened to come up to bat, we can de-leverage the situations by dividing by boLI. In the low-leverage none-on state, the home run may only add 1 run, but that is almost seven times larger than the typical result from that state (average absolute RE24 of 0.147). On the other hand, in the high-leverage bases-loaded state, the home run adds 3.31 runs, but that is only three times larger than the typical change in RE of +/- 1.071. Thus in RE24/boLI, the bases empty home run has over twice the value of the bases loaded home run. This also highlights how RE24 and RE24/boLI differ from a linear weights formula like wOBA — RE24 values the high leverage home run more than the low leverage home run, linear weights formulas value them equally (as they treat all base-out states identically), and RE24/boLI values the high leverage home run less than the low leverage home run.
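The comparison can be reproduced with a short Python sketch using the figures from the tables in this post. The helper name `deleveraged` is my own. Note that the sketch divides by the rounded boLI values (0.38 and 2.79), so the bases-empty home run comes out at about 2.63 rather than the 2.61 in the table, which presumably used unrounded boLI.

```python
def deleveraged(start_re, boli, runs_on_play, end_re):
    """Return (RE24, RE24/boLI) for one transition."""
    re24 = runs_on_play + end_re - start_re
    return re24, re24 / boli


# (label, start RE, boLI, runs on play, end RE), all with 2 outs at the start.
cases = [
    ("low boLI out", 0.111, 0.38, 0, 0.0),
    ("low boLI HR", 0.111, 0.38, 1, 0.111),
    ("high boLI out", 0.798, 2.79, 0, 0.0),
    ("high boLI HR", 0.798, 2.79, 4, 0.111),
]
results = {label: deleveraged(*args) for label, *args in cases}
for label, (re24, ratio) in results.items():
    print(f"{label:14s} RE24 = {re24:6.2f}   RE24/boLI = {ratio:6.2f}")
```

The ordering flips exactly as described: the bases-loaded home run has the larger RE24, while the bases-empty home run has the larger RE24/boLI.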
March 11th, 2012 at 10:45 am
This is an amazing article. Thank you for taking the time to break down the process step by step and allowing me to follow on SQL. LI and RunEx have this intimidating aura about them that only quantum physicists can understand, but this helps illustrate that it’s actually quite simple.
I think one of the queries took about 74 hours to run on my old laptop, hahah.
Ideally, though, I’d like to create an RE24 table in order to calculate it for individual players in particular circumstances using Retrosheet. But I imagine the queries would be considerably long. Has anyone tried it?
June 28th, 2012 at 7:18 pm
Given the lower-scoring era we appear to be in now, especially as described by Eric Walker at his website, it would be interesting to see the run expectancy table for just the past few seasons, even if that is not enough data to use that table (I don’t know how much data is necessary), just to see the difference between this new lower era and the higher 1993-2008 (or 2009; I don’t recall) period.