From: <zi...@al...> - 2004-11-29 20:37:01
Stefan Siegl wrote:

>see attached patch below. In case you're in need of a complete file,
>it's available from [1].

I got it.

>It'd be nice if you could give it a try and test whether it's output
>is okay.

Now testing; it will take some time, since port.hu listings are slow.
Meanwhile, I have subscribed to xml-devel. It seems a nice community and
responses are fast, so you really deserve nice users without extra trouble
(e.g., asking to be CC-ed, etc.).

>Comparing the website to the output is a little bit painful

I usually do this with tvime and a web browser :-))

>(I've compared only a couple of channels per source so far).
>
>Especially, are there some guys, using the Romanian grabber, here?

Not me, unfortunately.

>Your port.ro page is even more distorted. There are images after
>some shows, however the admins prefer to begin a new <table> after
>each image instead of surrounding each with a <td> ... strange.
>Would be nice if you could compare some pages as well - even so it
>should be okay ..
>
>Since the grabber's sources provide three days of data per page I
>furthermore changed it to increase by three days in the bumping
>function nextday().

This seems suspicious to me. The URLs used are the following, e.g. for
station "mtv" today ("mtv" stands for "Magyar Televízió", not for
"Music Television"):

http://www.port.hu/pls/tv/tv.prog?i_days=1&i_ch_nr=1&i_ch=1

i_days is the day offset from the current date; the current day has an
offset of 1 (one). Beware: the page always shows ONE DAY.

i_ch is the channel identifier.

i_ch_nr is the number of channels per page. It is used for customizable
programme pages. When using a web browser and the usual combobox on the
web page, i_ch_nr is equal to or bigger than 3, that is, the page displays
channels (i_ch)th, (i_ch+1)th and (i_ch+2)th, to speed things up. The
programmes of three channels fit nicely on 1024x768 screens.

Gabor

>Seems to me that it works, and speeds up things
>a lot ...
>
>@{Ed,Robert}: As I'm not the maintainer of tv_grab_huro, is it okay
>for you that I commit the patch to CVS?
>
>best regards,
>   Stef@n
>
>[1] http://home.vr-web.de/stefan-siegl/xmltv/tv_grab_huro/
>
>------------------------------------------------------------------------
>
>? new
>? new2
>? new3
>? new3s
>? new4
>? new4s
>? new5
>? new5s
>? new6
>? new6s
>? new7
>? out
>? patch
>? test
>? tv_grab_huro.cache
>Index: tv_grab_huro
>===================================================================
>RCS file: /cvsroot/xmltv/xmltv/grab/huro/tv_grab_huro,v
>retrieving revision 1.6
>diff -u -5 -r1.6 tv_grab_huro
>--- tv_grab_huro 19 Sep 2004 14:16:01 -0000 1.6
>+++ tv_grab_huro 29 Nov 2004 19:37:21 -0000
>@@ -322,11 +322,11 @@
> }
>
> # Make list of pages to fetch for each day.
> my @days;
> my $day=UnixDate($now,'%Q');
>-for (my $i=1+$opt_offset;$i<$opt_days+$opt_offset+1;$i++) {
>+for (my $i=1+$opt_offset;$i<$opt_days+$opt_offset+1;$i+=3) {
>     push @days, [ $day, $i ];
>     $day=nextday($day); die if not defined $day;
> }
>
> # This progress bar is for both downloading and parsing. Maybe
>@@ -393,11 +393,89 @@
>     };
>
>     # parse the page to a document object
>     my $tree = HTML::TreeBuilder->new();
>     $tree->parse($data);
>-    my @program_data = get_program_data($tree);
>+
>+    my @datatables;
>+    # page consists of two main tables, split by an advertisement
>+    # we need to reorder those tables, to grab continued column by column
>+    #
>+    # actually we assign to @datatables like this:
>+
>+    # UPPER MAJOR TABLE:
>+    #    0   10   20
>+    #    1   11   21
>+    #        12
>+    #
>+    # <<< the ad >>>
>+    # LOWER MAJOR TABLE:
>+    #    5   15   25
>+    #        16
>+    #
>+
>+    my $i = 0;
>+    my $lasttime = 0;
>+    # assign to @datatables in order: 0, 2, 4, 1, 3, 5, etc.
>+    foreach my $tab($tree->look_down
>+                    # "width"=>215 unfortunately isn't specified all the time
>+                    ("_tag"=>"table", "cellspacing"=>2)) {
>+        my $width = $tab->attr(qw(width));
>+        next unless($width eq "100%" || $width == 215);
>+
>+        # time is printed in <strong /> tags, require those to skip the
>+        # headings - which we don't really care for ...
>+        next unless($tab->look_down("_tag"=>"strong"));
>+
>+
>+        # especially on port.ro there aren't only two tables per day column,
>+        # there are even more, split by images, etc.
>+        #
>+        # why the hell don't they continue the table and put the image
>+        # right into a <tr><td> <img> </td></tr> thingy??
>+        #
>+        # tsts...
>+        #
>+
>+
>+        # extract the first time specified in this table-piece ...
>+        $tab->as_text() =~ m/([012][0-9]):([0-5][0-9])/
>+            or die "unable to parse returned html page";
>+        my $time = $1 * 60 + $2;
>+        $time += 24 * 60 if($time < 6 * 60 && ($i % 10 > 4));
>+
>+        #print "this: $time, last: $lasttime ...\n";
>+
>+        if($time < $lasttime) {
>+            # this table is in the same major table, but in the next column
>+            # since it's first time is before the last time of the prev. tab.
>+            $i = $i - ($i % 5) + 10;
>+        }
>+
>+        if($time > 19 * 60 && ($i % 10 < 5)) {
>+            # first time time's after 19 o'clock => lower table
>+            $i = 5;
>+        }
>+
>+
>+        # lookup last time in this minor table ... (as base for comparing)
>+        $tab->as_text() =~ m/.*([012][0-9]):([0-5][0-9])/
>+            or die "unable to parse returned html page";
>+        $lasttime = $1 * 60 + $2;
>+        $lasttime += 24 * 60 if($lasttime < 6 * 60 && ($i % 10 > 4));
>+
>+
>+        #print "assigning datatables entry ", $i, ".\n";
>+        #$tab->dump();
>+        $datatables[$i++] = $tab;
>+    }
>+
>+    my @program_data;
>+    foreach(@datatables) {
>+        push @program_data, get_program_data($_)
>+            if defined;
>+    }
>
>     if (not @program_data) {
>         warn "no programs found, skipping\n";
>         return ();
>     }
>@@ -586,9 +664,9 @@
>
> # Bump a YYYYMMDD date by one.
> sub nextday( $ ) {
>     my $d = shift;
>     my $p = parse_date($d);
>-    my $n = DateCalc($p, '+ 1 day');
>+    my $n = DateCalc($p, '+ 3 day');
>     return UnixDate($n, '%Q');
> }
>
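
To make the concern about the three-day step concrete, here is a minimal Perl sketch of the port.hu URL parameters described in the reply above, together with the day offsets the fetch loop would request for a given step. The helper names listing_url and day_offsets are hypothetical and not part of tv_grab_huro; the URL layout is copied from the example URL in this message, and defaulting i_ch_nr to 1 is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in tv_grab_huro): build a listing URL from the
# three query parameters explained in the message.
#   i_days  - day offset from today; today is 1
#   i_ch    - channel identifier
#   i_ch_nr - number of channels per page (assumed default: 1)
sub listing_url {
    my ($day_offset, $channel, $channels_per_page) = @_;
    $channels_per_page = 1 unless defined $channels_per_page;
    return sprintf(
        'http://www.port.hu/pls/tv/tv.prog?i_days=%d&i_ch_nr=%d&i_ch=%d',
        $day_offset, $channels_per_page, $channel);
}

# Hypothetical helper: list the i_days values visited by a loop with the
# same bounds as the one in the patch, for a given step size.
sub day_offsets {
    my ($offset, $days, $step) = @_;
    my @offsets;
    for (my $i = 1 + $offset; $i < $days + $offset + 1; $i += $step) {
        push @offsets, $i;
    }
    return @offsets;
}

print listing_url(1, 1), "\n";
print 'step 1: ', join(',', day_offsets(0, 7, 1)), "\n";   # 1,2,3,4,5,6,7
print 'step 3: ', join(',', day_offsets(0, 7, 3)), "\n";   # 1,4,7
```

For a seven-day run with no offset, a step of 1 requests i_days 1 through 7, while a step of 3 requests only 1, 4 and 7. If each page really shows a single day, as described above, the stepped loop would leave the days in between unfetched; that is worth checking against the actual grabber output.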