#94 tv_grab_es_laguiatv extremely slow downloading data

none
closed
nobody
5
2014-04-07
2010-12-29
murrayf
No

Please consider revising the code of tv_grab_es_laguiatv, because downloading data for 10 to 15 channels over 2 days takes a very long time. This may depend on the website the script retrieves data from, but please speed it up if possible.

Thanks,

Alberto.

Discussion

  • Chris Owen
    2011-01-10

    Hi there,

    I've had a quick look at the code and the parsing of the web pages could definitely be more efficient. I will try to find some time to look at it, but please be patient as I am very busy right now.

    Chris

     
  • murrayf
    2011-01-10

    Thank you very much candu, take the time you need...

    :)

     
  • Chris Owen
    2011-01-11

    I have checked in two changes to do with getting icons:

    * icon url grabbing is done with string functions rather than html tree functions
    * icon urls can be grabbed and cached when running --configure

    To compare performance I am doing --days 2 on configs with all channels. Here are grabbing times for the three setups on my machine:

    original: 19m14s
    new icon grabber: 14m35s
    cached icons: 12m51s

    Time to grab schedule is almost halved if you are prepared to run --configure again. Even without this it is still much faster.

    I have not yet decided whether to rework the schedule grab in a similar way, since string functions are more likely to break if the site changes.

    If you can get the latest from CVS and test using it for your situation it would help a lot.

    Chris
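The trade-off Chris describes above (string functions are faster than HTML tree functions but break more easily when the site changes) can be illustrated with a minimal sketch. The markup below is hypothetical and is not the real laguiatv page; `icon_url_by_string` and `icon_url_by_parser` are made-up names, and the actual grabber is not written in Python.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration only; NOT the real laguiatv page.
SAMPLE = '<div><img class="logo" src="http://example.com/icons/la1.png" alt="La 1"></div>'
# Same element with the attributes in a different order, as after a site redesign.
REORDERED = '<div><img src="http://example.com/icons/la1.png" class="logo" alt="La 1"></div>'


def icon_url_by_string(page):
    """Plain string search: cheap, but tied to the exact attribute order."""
    start = page.find('class="logo"')
    if start == -1:
        return None
    key = 'src="'
    begin = page.find(key, start)  # only looks AFTER the class marker
    if begin == -1:
        return None
    begin += len(key)
    end = page.find('"', begin)
    return page[begin:end] if end != -1 else None


class _LogoFinder(HTMLParser):
    """Parser-based search: more work per page, indifferent to attribute order."""

    def __init__(self):
        super().__init__()
        self.url = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("class") == "logo" and self.url is None:
            self.url = attrs.get("src")


def icon_url_by_parser(page):
    finder = _LogoFinder()
    finder.feed(page)
    return finder.url
```

With the original layout both functions return the same URL. With the reordered layout the string version returns `None` while the parser version still finds the icon, which is the kind of breakage Chris is wary of.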

     

  • Anonymous
    2011-01-12

    Hi.
    I've updated via CVS and these are my times:

    9 channels, 5 days

    original: 39m7.815s
    new icon grabber: 37m55.498s
    cached icons: 37m36.626s

    In my case there is not much difference in the times.

    Thanks.

     

  • Anonymous
    2011-01-12

    Hi, again.
    By the way, with no descriptions the times are very similar:

    9 channels, 5 days

    original: 2m27.771s
    new icon grabber: 2m11.791s
    cached icons: 2m6.151s

    kralizeck

     
  • Chris Owen
    2011-01-12

    Thanks for the feedback!

    Getting descriptions is extremely slow; that's why it's an option with a warning and defaults to no. To get them, the grabber has to do a separate get and parse for every show.

    I don't think I can get around this, since it depends on how the site is set up (AFAICS they don't include any description in the table; you need to go to another page for each show).

    I will look into how much time is spent getting and how much is spent parsing to see if rewriting would help.

    Chris
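The measurement Chris mentions above (how much time goes into getting pages versus parsing them) could be taken along these lines. This is only a sketch: `time_phases`, `fake_fetch` and `fake_parse` are hypothetical names, and the stand-in functions simulate a network delay instead of contacting any real site.

```python
import time


def time_phases(urls, fetch, parse):
    """Return (seconds spent fetching, seconds spent parsing) over all URLs."""
    fetch_s = 0.0
    parse_s = 0.0
    for url in urls:
        t0 = time.perf_counter()
        page = fetch(url)
        t1 = time.perf_counter()
        parse(page)
        t2 = time.perf_counter()
        fetch_s += t1 - t0
        parse_s += t2 - t1
    return fetch_s, parse_s


def fake_fetch(url):
    """Stand-in for an HTTP GET: sleeps briefly to mimic network latency."""
    time.sleep(0.01)
    return "<html><p>" + url + "</p></html>"


def fake_parse(page):
    """Stand-in for real parsing: just counts opening angle brackets."""
    return page.count("<")


if __name__ == "__main__":
    fetch_s, parse_s = time_phases(["show1", "show2", "show3"], fake_fetch, fake_parse)
    print("fetch: %.3fs, parse: %.3fs" % (fetch_s, parse_s))
```

If fetching dominates, rewriting the parser would help little and the per-show requests are the real cost; if parsing dominates, a lighter parsing approach is worth the effort.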

     
  • murrayf
    2011-01-12

    Hello again,

    I've tested the new modifications to the script, both through tv_grab_combiner and with tv_grab_es_laguiatv alone, using a Java program I use for retrieving data with xmltv (http://www.artificialworlds.net/freeguide/Main/HomePage), with descriptions enabled. Speed may be better (thanks candu for your interest), but there is something in this grabber that is not working well. For Spanish TV it is possible to use another grabber, tv_grab_es_miguiatv, which works pretty well, but its website doesn't publish all the channels that laguiatv does (http://www.miguiatv.com/todos-los-canales). Maybe its code contains something inspiring for better results. ;) Anyway, thanks a lot...
    ... and let me mention another issue with laguiatv (miguiatv probably has the same error): channel names containing Spanish accented vowels (áéíóú) are not shown correctly. I don't remember exactly how they appear, but it is something like {aoute} instead of á or {eoute} instead of é. Maybe this can also be fixed.
    :)
    Alberto.
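The thread does not identify the cause of the garbled accents. One possible explanation, offered here purely as an assumption, is that the channel names arrive as HTML character entities (for example `&aacute;` for á) and are not decoded before being written out. Under that assumption, decoding would look like this sketch; `decode_channel_name` and the sample names are hypothetical.

```python
import html


def decode_channel_name(raw):
    """Turn HTML character entities in a channel name into real characters."""
    return html.unescape(raw)


if __name__ == "__main__":
    # Made-up names, used only to show the decoding.
    for raw in ["Televisi&oacute;n Espa&ntilde;ola", "Canal &aacute;&eacute;&iacute;&oacute;&uacute;"]:
        print(decode_channel_name(raw))
```

Names without entities pass through unchanged, so applying the decoding to every channel name would be harmless if the assumption turns out to be wrong for some of them.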

     
  • Hi,

    Can you explain what is "not working well"? I haven't used the grabber for a few years, so I don't see issues myself. Is it just the speed thing or do you have other issues? If you have other problems please submit bugs for them or if you can, submit a patch that fixes it.

    I will soon check in a change to speed up descriptions as much as I can. As I said before, the problem comes from the way the site is set up, so there is only so much I can do.

    Chris

     
  • murrayf
    2011-01-16

    OK, maybe the grabber does its job, but I think it is not very practical for downloading a lot of data. If the problem is the website, there's nothing more to talk about.

    Thanks,

    Alberto.

     
  • I have checked in a change to the description grabbing code that seems much faster to me; it also did a better job with the various types of descriptions in my test.

    Please test and let me know if it is better for you. Generally critical comments are counter-productive - I am much more likely to continue to help if you give accurate feedback and I think my work is appreciated.

    Chris

     

  • Anonymous
    2011-01-17

    Hi.
    I've updated from CVS and the time is similar (even worse this time):

    original: 39m7.815s
    prev. CVS ver.: 37m36.626s
    current CVS: 40m37.435s

    All with 9 channels, 5 days.
    I tested it twice to be sure. Same result.

    Thanks for the support.

    kralizeck

     
  • murrayf
    2011-01-17

    Hello,

    I've tested the latest code with all Perl dependencies satisfied, and the results when invoking tv_grab_es_laguiatv from the terminal are not good.

    Yesterday: 25 channels, 2 days, getting descriptions --> stopped at 50% after 35 minutes.
    Today: 25 channels, 1 day, getting descriptions --> stopped at 35% after 40 minutes.

    Thanks,

    Alberto.

     
  • Chris Owen
    2011-01-17

    It seems my latest change didn't make it into CVS, which is why you saw no improvement, kralizeck. Really sorry to have wasted your time testing :(

    The change is in now, but I see Alberto's lock-up, so I will prioritize that first.

    @alberto - I created a new bug for the locking issue https://sourceforge.net/tracker/?func=detail&aid=3160229&group_id=39046&atid=424135

    Progress on the locking issue will be posted there.

    Chris

     
  • murrayf
    2011-01-17

    Sorry Chris,

    Maybe I didn't explain myself correctly: the grabber does not lock up; I stopped it because the delay was excessive. Bug ID 3160229 does not need to be considered.

    Alberto.

     

  • Anonymous
    2011-01-18

    Hi, Chris.
    Don't worry, the tests only took me a few minutes.

    I've updated to the new version and the times are much better:

    9 channels, 5 days

    es_laguiatv v1.16: 39m7.815s
    es_laguiatv v1.17: 37m36.626s
    es_laguiatv v1.18: 9m36.903s (6m13.525s the second time, 6m36.519s third)

    v1.17 and v1.18 with cached icons.

    I tested v1.18 three times... just to be sure ;-)

    Tell me if you want me to do more tests.

    Very good job, Chris!!

    Many thanks.

    kralizeck
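To put kralizeck's figures above into perspective, the reported times can be converted to seconds and compared. This is a small sketch using only the numbers quoted in the post; `to_seconds` and `speedup` are hypothetical helper names.

```python
import re


def to_seconds(stamp):
    """Convert a duration such as '39m7.815s' (shell `time` style) to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError("unrecognised duration: " + stamp)
    return int(match.group(1)) * 60 + float(match.group(2))


def speedup(before, after):
    """How many times faster `after` is than `before`."""
    return to_seconds(before) / to_seconds(after)


if __name__ == "__main__":
    # Times reported for 9 channels, 5 days.
    print("v1.16 -> v1.18, first run:  %.1fx" % speedup("39m7.815s", "9m36.903s"))
    print("v1.16 -> v1.18, second run: %.1fx" % speedup("39m7.815s", "6m13.525s"))
```

On these figures v1.18 is roughly four times faster than v1.16 on the first run and roughly six times faster on the second.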

     
  • murrayf
    2011-01-18

    Yes, after upgrading to 1.18 the improvement is big (25 channels, 2 days, 15 minutes). In my opinion the issue can be considered solved.
    Congratulations on your work, Chris, and thanks.

    Alberto.

    :)

     
  • Geoff
    2014-04-06

    • status: open --> closed
    • Group: --> Next Release (example)
     
  • Geoff
    2014-04-06

    A replacement tv_grab_es_laguiatv grabber (version 1.24) is available in CVS (will be in XMLTV release 0.5.65). This replaces the previous grabber which broke in January 2014 due to source website changes.

    This grabber (like many) will appear "slow" if you run it in real time, since it has to fetch a lot of pages to get all the information. In general you should only fetch data for the channels you actually need (i.e. don't get all channels if you don't need to). You should also try to run it as a scheduled job at a convenient time so you don't have to sit and wait for it.