Errors when importing Gameday pitch data with comments

Help
Jeremy
2013-05-11
2014-03-08
  • Jeremy

    Jeremy - 2013-05-11

    I noticed I am getting some errors downloading Gameday pitch data. The problem seems to be limited to pitches where there is a commentary from the "Gameday Scout". I've tracked down one example pitch that gave an error and wasn't added to the database. From the 5/8 game between KC and Baltimore, Chris Tillman's fourth pitch to Eric Hosmer in the top of the 6th wasn't recorded. Here's the Gameday xml for that pitch:

    <pitch des="In play, out(s)" des_es="En juego, out(s)" id="339" type="X" tfs="003109" tfs_zulu="2013-05-09T00:31:09Z" x="80.69" y="170.96" sv_id="130508_203159" start_speed="88.8" end_speed="81.3" sz_top="3.41" sz_bot="1.5" pfx_x="-3.68" pfx_z="12.51" px="0.51" pz="1.471" x0="-2.008" y0="50.0" z0="6.666" vx0="7.638" vy0="-129.543" vz0="-11.085" ax="-6.203" ay="28.938" az="-11.006" break_y="23.8" break_angle="18.8" break_length="3.4" pitch_type="FF" type_confidence=".894" zone="14" nasty="56" spin_dir="196.331" spin_rate="2470.342" cc="Chris Tillman really has his changeup working; he�s allowing a .223 average against it this season and Royals hitters are 1-for-4 against it in this game." mt=""/>
    

    The scout commentary is contained in the cc field (second to last field in the series). I think the problem is the weird character that is supposed to be an apostrophe in "he's". Has anyone seen this, and if so, is there a solution/workaround?

    Thanks for your help.
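One possible explanation for the odd character, offered only as a sketch: if the feed wrote the apostrophe as a Windows-1252 "smart quote" (byte 0x92) while the importer expected UTF-8, a strict decode would raise an error and a lenient decode would produce the replacement character seen above. The byte value and the fallback helper `try_decode` are assumptions for illustration, not something confirmed about Gameday or BBOS, and the snippet uses Python 3 syntax.

```python
# Assumption: the commentary contained a Windows-1252 right single quote (0x92).
raw = b"he\x92s allowing a .223 average"


def try_decode(data):
    """Decode as UTF-8, falling back to Windows-1252 if that fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # 0x92 is not a valid UTF-8 start byte, so strict decoding fails here.
        return data.decode("cp1252")


# Lenient decoding keeps going but swaps the bad byte for U+FFFD.
lossy = raw.decode("utf-8", errors="replace")

# The fallback recovers the intended curly apostrophe (U+2019).
recovered = try_decode(raw)

print(repr(lossy))
print(repr(recovered))
```

If this is what happened, the XML itself is readable, and the failure would come from whichever step insists on strict UTF-8 or ASCII.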

     
    • Brian L Cartwright

      Yes, I’ve seen this.

      This year they apparently began using “he’s” in the cc field, and the apostrophe is throwing an encoding error.

      I do not have a good fix for this yet, and am very interested to get one.

      What I did instead was to delete the cc column from the pitches table (I never use that column). The pitch gets recorded, but once per game there’s an error message about the missing column, which lengthens the time to download.

      Brian Cartwright


      Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/baseballonastic/discussion/820145/

      To unsubscribe from further messages, please visit https://sourceforge.net/auth/subscriptions/

       
  • Jeremy

    Jeremy - 2013-05-13

    Thanks for the reply, Brian. That's one workaround. I don't use that column either, so it wouldn't bother me too much to delete it. Do you have any sense of how many pitches have this problem? I don't watch the updates very carefully, but I've noticed maybe one pitch per day. You think that this problem only started this year? I'm wondering if it's worth dumping all of 2013 data and reimporting without the cc column.

    As far as another solution goes, is the problem in reading from the xml file or is it writing into the database? I assume it's the latter. Maybe there's some way for Python to either delete or replace that character? I'm a serious novice with Python, but I'd be happy to look into it if there was some indication that a solution was possible.
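As a rough illustration of the "delete or replace that character" idea, the sketch below maps common typographic punctuation to ASCII and replaces anything else non-ASCII with a question mark before the text would be written to the database. The function name `to_ascii` and the mapping table are hypothetical and not part of BBOS; where such a call would go in the importer is not known from this thread. Python 3 syntax is assumed.

```python
# Typographic characters mapped to plain ASCII equivalents (illustrative list).
ASCII_EQUIVALENTS = {
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote / curly apostrophe
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
}
_TRANSLATION = str.maketrans(ASCII_EQUIVALENTS)


def to_ascii(text):
    """Return text with known punctuation normalized and the rest forced to ASCII."""
    normalized = text.translate(_TRANSLATION)
    # Any character still outside ASCII becomes "?" instead of raising an error.
    return normalized.encode("ascii", errors="replace").decode("ascii")


print(to_ascii("he\u2019s allowing a .223 average"))
print(to_ascii("he\ufffds allowing a .223 average"))
```

The first call keeps a readable apostrophe; the second shows that an already-damaged character can only be replaced, not recovered.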

     
    • Deez Nutz

      Deez Nutz - 2013-05-13

      There is a way, I just don't have any time right now. I believe you also
      could just remove that column from the config file in
      src/bbos/config/gamedayConfig.py. I believe we are referring to the des
      field on the pitch tag. That should cause bbos to download without error
      and neither read nor write that field.


       
      • Brian L Cartwright

        It's not necessary to reimport existing data; just delete the column (and its data) from your existing table.

        The 2nd step is to tell BBOS not to attempt to import the column in the future, and Kyle might have the solution to that in modifying gamedayConfig.py (I am going to try that now).

        Brian


         
      • Brian L Cartwright

        Correcting my previous comment, you would want to reload 2013 if you are missing pitches that weren't saved because of the encoding error.

        gamedayConfig.py has a list of the various fields found in the Gameday xml. Kyle can correct me if I'm wrong, but to download and save a field, it must be in the xml, in the config file, and in the database (the config file loads a list of Gameday fields into a named array, the array being written to the MySQL db).

        So to not download 'cc' or 'mt', delete it from parser_inning_pitch in the config file.

        If in the future Gameday adds a new column that you would like to download, add it to the appropriate parser list, and add a column to the db table (the trick would be knowing, depending on which xml file the field is in, which parser string and which db table to add it to).
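The relationship described above (a field is saved only if it is in the XML, in the config list, and in the table) can be sketched roughly as follows. The actual layout of parser_inning_pitch in gamedayConfig.py is not shown in this thread, so `PITCH_FIELDS`, `SKIPPED_FIELDS`, and `extract_row` are hypothetical stand-ins; only the attribute names come from the pitch XML posted earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the pitch field list kept in the config file.
PITCH_FIELDS = ["des", "id", "type", "start_speed", "pitch_type", "cc", "mt"]

# Fields to stop importing, as in the workaround discussed in this thread.
SKIPPED_FIELDS = {"cc", "mt"}
active_fields = [name for name in PITCH_FIELDS if name not in SKIPPED_FIELDS]


def extract_row(pitch_xml, fields):
    """Read only the configured attributes from one <pitch> element."""
    element = ET.fromstring(pitch_xml)
    return {name: element.get(name) for name in fields}


sample = (
    '<pitch des="In play, out(s)" id="339" type="X" start_speed="88.8" '
    'pitch_type="FF" cc="Scout commentary goes here" mt=""/>'
)
row = extract_row(sample, active_fields)
print(row)
```

Because 'cc' and 'mt' are never read, they never reach the insert step, which is why dropping them from the list avoids the error regardless of what the commentary contains.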

        Brian


         
        • Brian L Cartwright

          After deleting 'cc' & 'mt' from gamedayConfig.py, I deleted all games within the past 120 days and am now downloading again. The games are coming in fast & clean, with no error messages.

          Brian


           
  • Tony

    Tony - 2014-03-08

    Deleting 'cc' & 'mt' from gamedayConfig.py seemed to resolve my issues as well, although in my case the problem was causing the load to hang on specific games (not the same errors posted above).

     
  • Deez Nutz

    Deez Nutz - 2014-03-08

    The Unicode errors with commentary fields mentioned here should no longer happen in 6.0.

     
