#71 Cannot fetch listings with tv_grab_fi

closed-accepted
tv_grab_fi (2)
5
2010-12-04
2010-11-26
No

Due to a recent change to telkku.com, listings in Finland are broken.

getting list of channels: 0% [ ]could not fetch http://www.telkku.com/telkku?tila=knvt&kan=149, error: 404 Not Found, aborting

Discussion

  • Stefan Becker

    Stefan Becker - 2010-11-27

    Yup, looks like it. I'll have a look at the changes today.

     
  • Stefan Becker

    Stefan Becker - 2010-11-27

    One piece of bad news: telkku.com has re-numbered the channels, so a simple update of tv_grab_fi will not be enough. You'll have to update your configuration or run XMLTV configure again.

     
  • Stefan Becker

    Stefan Becker - 2010-11-27

    There were more changes required than I anticipated, but it is working again:

    - URL format changed
    - channel list HTML code changed
    - program info HTML code changed
    - handle UTF-8 encoded web pages
    - make sure to convert Unicode characters not compatible with XMLTV's ISO-8859-1 encoding (currently U+2013, U+2019, U+201D)
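To illustrate the last item in the list above, here is a minimal sketch in Python (not the Perl that tv_grab_fi is written in, and not the code from the patch). It maps the three code points mentioned to characters that ISO-8859-1 can represent. The ASCII stand-ins and the names `FALLBACKS` and `to_latin1` are assumptions made for this example only.

```python
# Replacements for the three code points named in the list above.
# The ASCII stand-ins are an assumption chosen for illustration.
FALLBACKS = {
    0x2013: "-",   # EN DASH
    0x2019: "'",   # RIGHT SINGLE QUOTATION MARK
    0x201D: '"',   # RIGHT DOUBLE QUOTATION MARK
}


def to_latin1(text: str) -> bytes:
    """Encode text as ISO-8859-1 after substituting known problem characters.

    Anything else that ISO-8859-1 cannot represent becomes '?'.
    """
    return text.translate(FALLBACKS).encode("iso-8859-1", errors="replace")


sample = "Y\u00f6 \u2013 70\u2019s \u201dshow\u201d"
encoded = to_latin1(sample)
```

Characters that already exist in ISO-8859-1, such as "ö", pass through unchanged; only the listed code points are rewritten.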

    I can't add files to this bug report, so it has to be via pastebin:

    Updated script: <http://pastebin.com/Qfyr9iDh>
    Diffs for review: <http://pastebin.com/MmLXCVDg>

    Ville: I could commit the changes directly from my CVS repository if I had write access.

     
  • Petri Airio

    Petri Airio - 2010-11-27

    Both of your pastebin links give a 502 Bad Gateway error...

     
  • Stefan Becker

    Stefan Becker - 2010-11-27

    It seems http://pastebin.com is down. Please try again later.

     
  • Stefan Becker

    Stefan Becker - 2010-11-27

    I have uploaded the files to an alternative service:

    Script: <http://filebin.ca/zotny/tv_grab_fi>
    Patch: <http://filebin.ca/hdrmjb/tv_grab_fi.patch>

    I hope this works better.

     
  • Ville Ahonen

    Ville Ahonen - 2010-11-27

    Hi,

    Stefan, thanks again for the swift response; I'll try out your patch and commit the changes as soon as pastebin starts working again. I'll also see that you get write access to cvs asap (I don't have permission to give write access myself but will contact the admins about it).

     
  • Karl Dietz

    Karl Dietz - 2010-11-28

    The data contains duplicate programs; it seems the site lists the program at the day boundary on both days.
    Please verify and leave out either the first or the last program on each day (as the exact times vary between channels, there's no other logic I could come up with quickly).

     
  • Karl Dietz

    Karl Dietz - 2010-11-28
    • labels: 724077 --> tv_grab_fi
    • status: open --> open-accepted
     
  • Anonymous

    Anonymous - 2010-11-29

    Here is a small diff that makes tv_grab_fi output utf-8 instead of latin1 now that the source data is also in utf-8

    http://pastebin.com/PSCgHhXH

     
  • Anonymous

    Anonymous - 2010-11-29

    Now that the source data is in utf-8 the tv_grab_fi should pass the data through as is without hacking it to iso-8859-1
    Example diff:
    http://pastebin.com/PSCgHhXH

     
  • Stefan Becker

    Stefan Becker - 2010-11-29

    Ahh, I was wondering if XMLTV supports UTF-8. Thanks.

     
  • Stefan Becker

    Stefan Becker - 2010-11-29

    Here is the updated version: <http://filebin.ca/mpsmxv/tv_grab_fi>

    - Applied the UTF-8 patch, with fixes:
      * reverted the decode_utf8 removal, as XMLTV only seems to read octets, even if the web page is UTF-8 encoded
      * set utf-8 encoding on the output XML file
    - As we no longer tidy the program data, we now have to process the config file as a UTF-8 encoded file (example: "70's show" on "TV Viisi/The Voice"); otherwise the user won't be able to specify a "series description".
    - Drop the last program entry from each page, as telkku.com now also puts the last show of the day at the start of the next day. This avoids duplicate programme entries.
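The last item in the list above (dropping the final entry of each page) can be sketched as follows. This is an illustration in Python, not the Perl code of tv_grab_fi; the function name `merge_pages` and the `(start, title)` tuple layout are hypothetical.

```python
def merge_pages(pages):
    """Concatenate per-day programme lists, dropping each page's last entry.

    Assumes, as described above, that the last show of a day is listed
    again at the start of the next day's page.
    """
    merged = []
    for programmes in pages:
        # Slicing is safe on an empty page and simply yields nothing.
        merged.extend(programmes[:-1])
    return merged


day1 = [("2010-12-03 06:00", "A"), ("2010-12-03 23:30", "B")]
day2 = [("2010-12-03 23:30", "B"), ("2010-12-04 06:00", "C"),
        ("2010-12-04 23:45", "D")]
merged = merge_pages([day1, day2])
```

Note that under this scheme the final entry of the last fetched page ("D" here) is dropped as well, since no following page is available to supply it.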

     
  • Karl Dietz

    Karl Dietz - 2010-12-01

    There is one small issue left with the encoding (as found by
    http://www.crustynet.org.uk/~xmltv-tester/squeeze/nightly/0/result.html#tv_grab_fi ).
    The output gets encoded as utf-8 only when writing directly to a file (--output tv.xml), but not when writing to STDOUT and redirecting to a file (> tv.xml).
    The tester complains about iso-8859-1 encoding in the redirected output, where it expects utf-8.
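One general way to avoid this class of problem is to encode explicitly on both output paths instead of relying on the stream's default encoding. The sketch below shows the idea in Python; it is not the fix that went into tv_grab_fi, and `emit_xml` and its parameters are hypothetical names.

```python
import io
import sys


def emit_xml(text, path=None, raw_stdout=None):
    """Write text as UTF-8 bytes, whether to a file or to standard output."""
    data = text.encode("utf-8")
    if path is not None:
        # Equivalent of "--output tv.xml".
        with open(path, "wb") as fh:
            fh.write(data)
    else:
        # Equivalent of "> tv.xml": bypass the text layer and write bytes.
        target = raw_stdout if raw_stdout is not None else sys.stdout.buffer
        target.write(data)


# Demo with an in-memory buffer standing in for redirected STDOUT.
demo = io.BytesIO()
emit_xml("<title>Y\u00f6</title>\n", raw_stdout=demo)
```

Because the bytes are produced before the choice of destination, both paths yield identical output.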

     
  • Stefan Becker

    Stefan Becker - 2010-12-01

    *sigh* I had been wondering about the encoding handling of the default XMLTV::Writer output. But as I didn't see any "wide character" warnings in my tests any more, I thought XML::Writer would set the file encoding automatically. Apparently MythTV uses the --output option; otherwise I would have seen those warnings during my test runs.

    Well, it's easy to fix.

     
  • Nobody/Anonymous

    I'm getting programs listed one day too late when the first program in the scraped page is from the day before. Example: http://telkku.com/channel/list/1/20101204 . Changing tv_grab_fi line 529: "pop(@data) if @data;" that removes the last program to "shift(@data) if @data;" fixes the problem, but possibly causes others.

     
  • Stefan Becker

    Stefan Becker - 2010-12-03

    @nobody: this is a separate problem, which is tracked in bug #3125542. Your proposed solution isn't correct either.

    @va1210: please close this bug.

     
  • Ville Ahonen

    Ville Ahonen - 2010-12-04
    • status: open-accepted --> closed-accepted
     
