Statcast is slightly upon us. There is a new table in the gameday schema with the following fields: 'gameName', 'description', 'player_id', 'mph', 'distance', 'result'. The data is the same as what you see in the "Feed" tab of mlb.com's GameDay app: a player ID, a play result and a description. The description looks like this: "George Springer flies out to right fielder Shin-Soo Choo. Batted ball speed 75 mph; distance of 233 feet." If you are following, this means there is no foolproof way to link the Statcast data to the atbat it refers to, as no atbat ID is provided. It is what it is. The mph and distance fields have been parsed from the description for you. Hopefully more will come later this season. As of 5/5 there were 14,751 hits to examine from 2015, if you include Spring Training. Good hunting.
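If you want to redo the parsing yourself, something like the following should work. This is only a rough sketch, not the code BBOS actually uses: the pattern is guessed from the single example description quoted above, and the function name `parse_batted_ball` is made up for illustration.

```python
import re

# Pattern assumed from the one example description in this post;
# real feed descriptions may use other wordings that this will miss.
BATTED_BALL = re.compile(
    r"Batted ball speed (?P<mph>\d+(?:\.\d+)?) mph; "
    r"distance of (?P<distance>\d+(?:\.\d+)?) feet"
)


def parse_batted_ball(description):
    """Return (mph, distance) as floats, or None when nothing matches."""
    match = BATTED_BALL.search(description)
    if match is None:
        return None
    return float(match.group("mph")), float(match.group("distance"))


example = (
    "George Springer flies out to right fielder Shin-Soo Choo. "
    "Batted ball speed 75 mph; distance of 233 feet."
)
print(parse_batted_ball(example))  # (75.0, 233.0)
```

Descriptions without batted ball numbers simply come back as None, so they can be skipped or loaded with NULLs.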
If you were not getting the new data, there was a bug in 6.2 that has just been fixed. Making the new load compatible with old games broke it :( Sorry for the inconvenience.
Well, that was fast! It seems there is some new data available this year. Whether this is StatCast data or not I do not know but it is found in the pregumbo.json file. The data looks like this for each ball in play.
The play_guid field has also been added to the pitch data and is now loaded to that table. This and the game name allow you to relate the ball in play to the pitch and atbat it occurred on.
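As a rough illustration of that relationship, here is a join on game name plus play_guid. Everything except `play_guid` is an assumption: the table names (`pitch`, `ball_in_play`), the other column names and the sample rows are hypothetical, and an in-memory SQLite database stands in for MySQL so the sketch runs anywhere.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database.
# Table and column names other than play_guid are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pitch (
        game_name TEXT, play_guid TEXT, ab_num INTEGER, pitch_type TEXT
    );
    CREATE TABLE ball_in_play (
        game_name TEXT, play_guid TEXT, mph REAL, distance REAL
    );
    INSERT INTO pitch VALUES
        ('game_a', 'guid-1', 1, 'FF'),
        ('game_a', 'guid-2', 1, 'SL');
    INSERT INTO ball_in_play VALUES ('game_a', 'guid-2', 75, 233);
""")

# Match each ball in play to the pitch it came off of.
rows = conn.execute("""
    SELECT p.game_name, p.ab_num, p.pitch_type, b.mph, b.distance
    FROM ball_in_play AS b
    JOIN pitch AS p
      ON p.game_name = b.game_name
     AND p.play_guid = b.play_guid
""").fetchall()
print(rows)
```

The same SELECT should carry over to MySQL once the real table and column names are substituted.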
Thanks to some helpful users, and with the Statcast data yet to be unveiled, we have published BBOS 6.1. The errors with Unicode text in play descriptions have been solved. Thanks to emschorsch for the code.
The order of game loading has been returned to chronological forward.
And if you loaded today's spring games, you noticed that some of the atbat Spanish tags had non-Unicode Latin-1 characters in them. The load has been updated to strip those characters, as is already done for the Spanish commentary fields on the pitch data.
The version was not updated.
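For the curious, the stripping mentioned above can be approximated as below. This is a minimal sketch that assumes "stripped" means dropping every non-ASCII character; the helper name `strip_non_ascii` is hypothetical and the actual BBOS code may handle it differently.

```python
def strip_non_ascii(text):
    """Drop every character outside the ASCII range (assumed behavior)."""
    return text.encode("ascii", errors="ignore").decode("ascii")


# Accented letters are removed outright, not replaced with a plain letter.
print(strip_non_ascii("Jonr\u00f3n de Jos\u00e9"))  # Jonrn de Jos
```

Note that this loses information: the accented letters disappear completely, which is the price of avoiding the load errors.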
-The full release number was given because threading for Pitch F/X is finally here! Loading games takes about 1/3 of the time. Probably even less if you have more cores. The count of threads (instances of the program doing the same task at the same time) is set in the config file.
-Unicode errors when loading default pitch fields which include color commentary no longer happen. Color commentary is scrubbed of non-Unicode characters prior to loading.
-.bat files have been removed. They needed yearly updates and did a disservice to our users. If you would like a .bat file to run BBOS, simply make your own with whichever command line options you would like.
-Retrosheet support has added loading of the Comment file for each game. Now you can read things like "and 50 soda bottles were thrown onto the field in protest" regarding a controversial call during a 1921 game.
-Retrosheet was updated to the latest version of Chadwick which contained two fields not previously loaded.
-Retrosheet and Pitch F/X no longer have the -All options. For either data set it is simply too much data to wisely load more than a year at one time.
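To give a feel for the threaded loading mentioned in the first item, here is a minimal sketch using Python's standard thread pool. It is not the BBOS implementation: `THREAD_COUNT`, `load_game` and `load_games` are hypothetical names, and the real program reads its thread count from the config file.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the thread count read from the config file.
THREAD_COUNT = 4


def load_game(game_name):
    """Placeholder for downloading and loading a single game."""
    return f"loaded {game_name}"


def load_games(game_names, thread_count=THREAD_COUNT):
    """Load several games at once; results keep the input order."""
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        return list(pool.map(load_game, game_names))


results = load_games(["game_a", "game_b", "game_c"])
print(results)
```

Each worker handles one game at a time, so raising the thread count only helps while there are more games waiting than workers.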
I had some issues with Retrosheet running the old schema. Perhaps they updated some data, perhaps MySQL 5.5 is more finicky. Regardless, the updated schema is running the event files well now. Also added loading scripts for 2012 and 2013 where missing.
Updated to play nicely with the latest versions of MySQL and to expand and detail the installation instructions.
Look for a "follow along" install video to be posted as soon as we figure out how one makes a web video...
All available leagues will have their own easy load scripts.
Upcoming functions for BBOS 4.0 will be:
1) Support unmodified standalone MySQL 5.X instance and internal server.
2) Investigate possible slowdown due to schema changes, performance enhancements
3) Retrosheet support returns!
ETA January, anyone who would like to help is more than welcome!
Contains many schema changes to shrink the size of the database to field minimums (thanks to Colin Wyers) and support for player demographic information, including the ability to load demographic information separately from game information. Details on loading demographic information are in the installation guide in the doc folder.
Has support for 2009. If you already have a database and do not want to reload all the games, you must add the following columns to your tables to run against 2009.
`spin_dir` float DEFAULT NULL,
`spin_rate` float DEFAULT NULL,
sb SMALLINT(2), -- stolen bases
3.1.0 adds a log file which contains most of the actions BBOS is taking and all SQL it executes. The screen now only displays the progress through each loading game. All years 2006-2008 now load without SQL warnings. The location of MySQL is now contained in the log file and any MySQL install should be able to be used with the code.
Project re-released in an all-Python form with no Retrosheet data as of yet.
Simple way to get your own MySQL Pitch
It is an open source project and so all the code is
available in the download.
Goal was to stop many different people from having to create their own database.
The database has two schemas. The Gameday schema largely contains
simply a translation of the Gameday XML files into MySQL tables. This
should accommodate anyone who is simply looking for an easy way to grab
the data. The MLB schema contains a lot of information built from the
raw data. Each pitch/atbat/player/game etc has been given an ID and
each type of information derived from the data is split out into
tables and some bad information has been filtered out. These tables
should greatly enhance a person's ability to quickly start asking
questions of the data. If you have custom data needs then there is
even a place to put your custom SQL so that it will get run each time
you update the database.
Baseball on a Stick is being converted to Python for simplicity and ease of enhancement. It will also integrate with the data model published by Dan Turkenkopf. The first Python release will not contain Retrosheet data and should be out soon.