[xmltv-commit] CVS: xmltv tv_extractinfo_en,NONE,1.1

Update of /cvsroot/xmltv/xmltv
In directory usw-pr-cvs1:/tmp/cvs-serv23236

Added Files:
	tv_extractinfo_en 
Log Message:
Added tv_extractinfo_en, which reads English-language programme
descriptions and attempts to sniff out information which could better
be stored in machine-readable form.  This is mostly code which used to
live in the old scrapped_getlistings_uk_ananova in the attic/
directory; I've just ported it to the new data structures and tidied
it up.

This sort of regular expression matching works well on the long
detailed descriptions Ananova provides.  It's not so good on the North
American listings because they have shorter descriptions.  But it did
manage to extract the names of quiz show hosts.
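
To illustrate the kind of matching described above, here is a minimal
sketch rather than code taken from tv_extractinfo_en.  The helper name
extract_presenter, the pattern, and the sample description are all
hypothetical.  The programme layout (title and desc as lists of
[text, lang] pairs, credits as a hash of role to list of names) is an
assumption about the new data structures.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of tv_extractinfo_en: look for
# "hosted by NAME" or "presented by NAME" in a description and
# return the name, or undef if nothing matches.
sub extract_presenter {
    my ($desc) = @_;
    # Assumption: a name is two or more capitalised words in a row.
    if ($desc =~ /\b(?:hosted|presented) by ((?:[A-Z][a-z'-]+)(?: [A-Z][a-z'-]+)+)/) {
        return $1;
    }
    return;
}

# Assumed programme layout: title and desc are lists of [text, lang].
my %prog = (
    title => [ [ 'The Weakest Link', 'en' ] ],
    desc  => [ [ 'Quiz show hosted by Anne Robinson. Nine contestants compete.', 'en' ] ],
);

# Move the name out of free text into a machine-readable credit.
my $name = extract_presenter($prog{desc}[0][0]);
push @{ $prog{credits}{presenter} }, $name if defined $name;

print "presenter: $prog{credits}{presenter}[0]\n" if defined $name;
```

A pattern this simple needs the name to be followed by punctuation or a
lower-case word, so it will miss or over-match on many real listings.
Short descriptions give it less text to work with.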

--- NEW FILE: tv_extractinfo_en ---
#!/usr/bin/perl -w
#
# tv_extractinfo_en
#
# Look at programme descriptions and other text, and extract
# information from the textual descriptions into subelements of
# <programme>.  This tv_extractinfo handles English-language
# descriptions.
#
# It also attempts to split multipart programmes into their
# constituents, by looking for a description that seems to contain
# lots of times and titles.  But this depends on the description
# following the particular style used by Ananova.  If I find more
# examples of listings with multipart programmes it can be extended.
#
# -- Ed Avis, ep...@do..., 2002-01-31
# $Id: tv_extractinfo_en,v 1.1 2002/01/31 15:39:31 epaepa Exp $
#

[...1409 lines suppressed...]
    }
}

# More debugging aids.
# cst: croak if a programme's stop time equals the debugging sentinel
# string tested below.
sub cst( $ ) {
    my $p = shift;
    croak "prog $p->{title}->[0]->[0] has bogus stop time"
      if exists $p->{stop} and $p->{stop} eq 'boogus FIXME XXX';
}

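# Takes a reference to a list of hashes and dies if the same value
# slot (compared by address) is seen twice across them.
sub no_shared_scalars( $ ) {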
    my %seen;
    foreach my $h (@{$_[0]}) {
	foreach my $k (keys %$h) {
	    my $ref = \ ($h->{$k});
	    my $addr = "$ref";
	    $seen{$addr}++ && die "scalar $addr seen twice";
	}
    }
}
