Archive for the ‘R’ Category

Run Expectancy and Markov Chains

Sunday, August 14th, 2011

Sorry for the long interval between entries. I hope to get back to posting on a more regular basis. Continuing in the vein of my previous two posts, I’m still working my way toward baseball win expectancy, but I’m going to pause to examine run expectancy in more detail.

First, let’s look back at the run expectancy matrix from my last post. It was built by finding every occurrence of each base-out state, counting how many runs were scored in the remainder of that inning (using the FATE_RUNS_CT field from Chadwick), and averaging those counts. I will refer to this as empirical run expectancy, since it is based on how many runs were actually scored following each base-out state.
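
As a rough illustration of that tabulation, here is a minimal R sketch. It assumes the Chadwick event output has already been loaded into a data frame; the column names base_state, outs and fate_runs (the last standing in for FATE_RUNS_CT) and the function empirical_re are hypothetical, and the five toy rows are made up purely so the sketch runs. They are not real data.

```r
# The eight base states, in the same order as the matrix that follows
base_levels <- c("___", "1__", "_2_", "__3", "12_", "1_3", "_23", "123")

# Hypothetical helper: average runs scored through the end of the inning
# for each (base state, outs) cell. Cells never observed come back as NA.
empirical_re <- function(events) {
  tapply(events$fate_runs,
         list(factor(events$base_state, levels = base_levels),
              factor(events$outs, levels = 0:2)),
         mean)
}

# Toy events (made-up values, for illustration only)
events <- data.frame(
  base_state = c("___", "___", "1__", "1__", "___"),
  outs       = c(0, 0, 0, 1, 1),
  fate_runs  = c(0, 1, 2, 0, 0),
  stringsAsFactors = FALSE
)

re <- empirical_re(events)
print(round(re, 3))
```

With a full season of events in place of the toy rows, every cell would be filled and the result would have the same 8 × 3 shape as the matrix.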

Run Expectancy Matrix, Empirical
(Bases: a digit marks an occupied base and an underscore an empty one, so 1_3 means runners on first and third.)
BASES   0 OUTS   1 OUT   2 OUTS
___     0.539    0.289   0.111
1__     0.929    0.555   0.24
_2_     1.172    0.714   0.342
__3     1.444    0.984   0.373
12_     1.542    0.948   0.464
1_3     1.844    1.204   0.512
_23     2.047    1.438   0.604
123     2.381    1.62    0.798
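
For anyone following along in R, the matrix is naturally stored as a named 8 × 3 matrix indexed by base state and outs. The sketch below simply types in the published values (0.24 and 1.62 entered as 0.240 and 1.620). The object name re is my own choice, and the final check only confirms a property visible in the table: within each base state, expected runs fall as outs accumulate.

```r
# Empirical run expectancy matrix, values copied from the table
# (rows: base state, columns: number of outs)
re <- matrix(
  c(0.539, 0.289, 0.111,
    0.929, 0.555, 0.240,
    1.172, 0.714, 0.342,
    1.444, 0.984, 0.373,
    1.542, 0.948, 0.464,
    1.844, 1.204, 0.512,
    2.047, 1.438, 0.604,
    2.381, 1.620, 0.798),
  nrow = 8, byrow = TRUE,
  dimnames = list(c("___", "1__", "_2_", "__3", "12_", "1_3", "_23", "123"),
                  c("0", "1", "2")))

# Look up one state: runner on first, nobody out
re["1__", "0"]

# Within every base state, expected runs decrease as outs increase
all(re[, "0"] > re[, "1"] & re[, "1"] > re[, "2"])
```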


The Distribution of Talent Between Teams

Wednesday, October 20th, 2010

Four years ago Tango had a very interesting post on how talent is distributed between teams in different sports leagues. I want to revisit and expand upon some of the points that came up in that discussion.

First, let’s look at some empirical data. I scraped end-of-season records for the last ten years for the NFL, NBA and MLB from ShrpSports (I omitted the NHL from this analysis because of the prevalence of ties). The data is available here as a tab-delimited text file. I used R to analyze the data; if you don’t have R, you can download it for free. If you use Windows, I recommend pairing it with Tinn-R, which is great for editing and interactively running R scripts. Here is the R code I used:

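# End-of-season records, tab-delimited (columns include league and win_pct)
records = read.delim(file = "records.txt")
lgs = data.frame(league = c("NFL", "NBA", "MLB"), teams = c(32, 30, 30),
                 games = c(16, 82, 162), stringsAsFactors = FALSE)
# Observed variance of winning percentage within each league
lgs$var.obs = sapply(lgs$league, function(lg) var(records$win_pct[records$league == lg]),
                     USE.NAMES = FALSE)
# Variance expected from luck alone: binomial variance of a .500 team over one season
lgs$var.rand.est = .5*(1-.5)/lgs$games
# Variance attributable to true talent
lgs$var.true.est = lgs$var.obs - lgs$var.rand.est
# Games at which luck variance equals talent variance (regress halfway to the mean)
lgs$regress.halfway.games = lgs$games*lgs$var.rand.est/lgs$var.true.est
lgs$regress.halfway.pct.season = lgs$regress.halfway.games/lgs$games
# Noll-Scully ratio: observed SD relative to the SD expected from luck alone
lgs$noll.scully = sqrt(lgs$var.obs)/sqrt(lgs$var.rand.est)
# Probability that the more talented of two teams finishes with the better record
lgs$better.team.better.record.pct = 0.5 + atan(sqrt(lgs$var.true.est)/sqrt(lgs$var.rand.est))/pi
lgs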
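
The derived columns in that code follow from a simple variance decomposition. Written out as a sketch, with p = 0.5 and n games per season, and under my own assumptions rather than anything stated in the code: luck is independent of talent, and for the last column both talent and luck are treated as normally distributed. T and E are symbols introduced here for the talent gap and luck gap between two randomly chosen teams.

```latex
\sigma^2_{\text{rand}} = \frac{p(1-p)}{n} = \frac{0.25}{n}, \qquad
\sigma^2_{\text{true}} = \sigma^2_{\text{obs}} - \sigma^2_{\text{rand}}

% Regress halfway: the season length n^* at which luck variance equals talent variance
\frac{0.25}{n^*} = \sigma^2_{\text{true}}
\;\Longrightarrow\;
n^* = \frac{0.25}{\sigma^2_{\text{true}}} = n\,\frac{\sigma^2_{\text{rand}}}{\sigma^2_{\text{true}}}

% Noll-Scully ratio
R_{\text{NS}} = \frac{\sigma_{\text{obs}}}{\sigma_{\text{rand}}}

% Two teams: talent gap T ~ N(0, 2\sigma^2_{true}), observed gap T + E with E ~ N(0, 2\sigma^2_{rand})
\rho = \operatorname{corr}(T,\, T+E) = \frac{\sigma_{\text{true}}}{\sqrt{\sigma^2_{\text{true}} + \sigma^2_{\text{rand}}}}

P(\text{better team has better record})
= P\big(\operatorname{sign}(T) = \operatorname{sign}(T+E)\big)
= \frac{1}{2} + \frac{\arcsin\rho}{\pi}
= \frac{1}{2} + \frac{1}{\pi}\arctan\frac{\sigma_{\text{true}}}{\sigma_{\text{rand}}}
```

The last step uses the fact that if \sin\theta = \sigma_{true}/\sqrt{\sigma^2_{true}+\sigma^2_{rand}}, then \tan\theta = \sigma_{true}/\sigma_{rand}, which is why the code can use atan directly.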

Here is the resulting table:
