Archive for the ‘Retrosheet’ Category

Run Expectancy and Markov Chains

Sunday, August 14th, 2011

Sorry for the long interval between entries – I hope to get back to posting on a more regular basis. Continuing in the vein of my previous two posts, I’m still working my way towards baseball win expectancy, but I’m going to pause to examine run expectancy in a more detailed manner.

First, let’s look back at the run expectancy matrix from my last post. It was built by looking at each time a given base-out state occurred, and seeing how many runs were scored in the remainder of those innings (by utilizing the FATE_RUNS_CT field from Chadwick). I will refer to this as empirical run expectancy, as it is based on how many runs were actually scored following each base-out state.

Run Expectancy Matrix, Empirical
BASES   0 OUTS   1 OUT   2 OUTS
___     0.539    0.289   0.111
1__     0.929    0.555   0.240
_2_     1.172    0.714   0.342
__3     1.444    0.984   0.373
12_     1.542    0.948   0.464
1_3     1.844    1.204   0.512
_23     2.047    1.438   0.604
123     2.381    1.620   0.798
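
To make the averaging behind the empirical matrix concrete, here is a minimal Python sketch. It assumes each play has already been reduced to a (bases, outs, runs to end of inning) record, with the last value derived from Chadwick's FATE_RUNS_CT. The function name, the tuple layout, and the toy records are hypothetical and for illustration only; this is not the code used to build the table above.

```python
from collections import defaultdict


def empirical_run_expectancy(events):
    """Average the rest-of-inning runs observed for each base-out state.

    `events` is an iterable of (bases, outs, runs_to_end_of_inning) tuples.
    This layout is an assumption for illustration, not Chadwick's output format.
    """
    totals = defaultdict(float)  # runs summed per base-out state
    counts = defaultdict(int)    # number of times each state occurred
    for bases, outs, runs in events:
        totals[(bases, outs)] += runs
        counts[(bases, outs)] += 1
    return {state: totals[state] / counts[state] for state in counts}


# Toy records, made up for illustration (not Retrosheet data).
sample_events = [
    ("___", 0, 0),
    ("___", 0, 2),
    ("___", 0, 1),
    ("1__", 1, 0),
    ("1__", 1, 1),
]

matrix = empirical_run_expectancy(sample_events)
for (bases, outs), value in sorted(matrix.items()):
    print(f"{bases} with {outs} out(s): {value:.3f}")
```

Run over every play in a real event file, the same grouping would fill all 24 base-out cells, one average per cell.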

(more…)

Run Expectancy and Base-Out Leverage Index

Sunday, December 19th, 2010

Before I get into win expectancy, Win Probability Added, Leverage Index, and WPA/LI, I want to take a look at run expectancy, RE24, Base-Out Leverage Index, and RE24/boLI. Most of these stats were created and/or popularized by Tangotiger (his intro to Leverage Index is here). I have tried to mimic his methodology as closely as possible, but there may be some differences.

(more…)

Building a Retrosheet Database, Part 1

Wednesday, October 27th, 2010

I want to be able to calculate Tangotiger’s WPA/LI stat (Win Probability Added/Leverage Index, a.k.a. situational wins, context neutral wins, or game state linear weights). To do that, I need to be able to calculate WPA and LI. To do that, I need to construct a Win Expectancy matrix. To do that, I need to build a Retrosheet database. So that’s where I’m going to start. I’ve never worked with a database or explored any Retrosheet data before, so I am starting from scratch (though I will be utilizing a lot of great resources from around the web). In a series of posts I will describe my process step-by-step. If you want to follow along, make sure you have a lot of free disk space (the parsed data files for all seasons take up over 5 GB). Also be aware that some of my instructions will be Windows-specific.

(more…)