So started playing with the retrosheet data. Retrosheet is a non-profit organization that gathers baseball stats and presents them in consumable formats.
Then some dude wrote this software called chadwick which allows you to parse the data and import it into a database. The article I found created a MySQL database which is ok for small stuff. The event data (every plate appearance since 1952) is 9.4 million rows. I didn’t really have the patience to learn how to optimize this using MySQL, so I quickly moved the data over to a SQL Server instance. So now I have a SQL Server database with every play from 1952-2012.
I have run several queries against the data, created some stored procedures for Run Expectancy and such. One of the searches I am working on is who is the player that has hit the most Home runs with an 0-2 count with 2 outs. See the chart below.
Sammy “No Comprende” Sosa tops the list with 11 jacks on an 0-2 2 out count. In fact we have several who’s who of the steroid era in this list. Jose Valentin has the least amount of career bombs but hits our list at number 7. 7.32% of the time with a 0-2 2 out count he hits a home run, and a full 3.61% of his Home runs were hit on this count. Kendrys Morales has the fewest TotalABs at this count and is leading in PCT HRs on this count. His PCT will go down as he gets more ABs.
CTE used to get Initial Counts.
WITH HRCTE AS
(select p.firstname + ' ' + p.lastname Player, COUNT(1) TotalHR, p.id
FROM events e JOIN lkup_people p ON e.BAT_ID = p.id
WHERE STRIKES_CT = 2
and BALLS_CT = 0
AND OUTS_CT = 2
AND EVENT_CD = 23
GROUP BY p.firstname + ' ' + p.lastname, p.id)
SELECT r.player, r.TotalHR, COUNT(1)
FROM events e JOIN HRCTE r on e.BAT_ID = r.id
WHERE STRIKES_CT = 2
and BALLS_CT = 0
AND OUTS_CT = 2
GROUP BY r.player, r.TotalHR
ORDER BY 2 DESC
0 comments:
Post a Comment