From: <ik...@us...> - 2009-09-05 09:35:19
Revision: 45
http://webfetch.svn.sourceforge.net/webfetch/?rev=45&view=rev
Author: ikluft
Date: 2009-09-05 09:35:07 +0000 (Sat, 05 Sep 2009)
Log Message:
-----------
expand docs, expand error reporting, streamline debugging, add AUTOLOAD
Modified Paths:
--------------
branches/v0.13/lib/WebFetch.pm
Modified: branches/v0.13/lib/WebFetch.pm
===================================================================
--- branches/v0.13/lib/WebFetch.pm 2009-08-29 00:38:38 UTC (rev 44)
+++ branches/v0.13/lib/WebFetch.pm 2009-09-05 09:35:07 UTC (rev 45)
@@ -19,15 +19,28 @@
=head1 DESCRIPTION
-The WebFetch module is a general framework for downloading and saving
-information from the web, and for display on the web.
-It requires another module to inherit it and fill in the specifics of
-what and how to download.
-WebFetch provides a generalized interface for saving to a file
+The WebFetch module is a framework for downloading and saving
+information from the web, and for saving or re-displaying it.
+It provides a generalized interface for saving to a file
while keeping the previous version as a backup.
-This is expected to be used for periodically-updated information
-which is run as a cron job.
+This is mainly intended for use in a cron-job to acquire
+periodically-updated information.
+WebFetch allows the user to specify a source and destination, and
+the input and output formats. It is possible to write new Perl modules
+to the WebFetch API in order to add more input and output formats.
+
+The currently-provided input formats are Atom, RSS, WebFetch "SiteNews" files
+and raw Perl data structures.
+
+The currently-provided output formats are RSS, WebFetch "SiteNews" files,
+the Perl Template Toolkit, and export into a TWiki site.
+
+Some modules which were specific to pre-RSS/Atom web syndication formats
+have been deprecated. Those modules can be found in the CPAN archive
+in WebFetch 0.10. Those modules are no longer compatible with changes
+in the current WebFetch API.
+
=head1 INSTALLATION
After unpacking and the module sources from the tar file, run
@@ -63,41 +76,19 @@
=head2 SETTING UP CRONTAB ENTRIES
-First of all, if you don't have crontab access or don't know what they are,
-contact your site's system administrator(s). Only local help will do any
-good on local-configuration issues. No one on the Internet can help.
-(If you are the administrator for your system, see the crontab(1) and
-crontab(5) manpages and nearly any book on Unix system administration.)
+If needed, see the manual pages for crontab(1), crontab(5) and any
+web sites or books on Unix system administration.
-Since the WebFetch command lines are usually very long, you may prefer
-to make one or more scripts as front-ends so your crontab entries aren't
-so huge.
+Since WebFetch command lines are usually very long, the user may prefer
+to make one or more scripts as front-ends so crontab entries aren't so big.
-Do not run the crontab entries too often - be a good net.citizen and
-do your updates no more often than necessary.
-Popular sites need their users to refrain from making automated
-requests too often because they add up on an enormous scale
-on the Internet.
-Some sites such as Freshmeat prefer no shorter than hourly intervals.
-Slashdot prefers no shorter than half-hourly intervals.
-When in doubt, ask the site maintainers what they prefer.
+Try not to run crontab entries too often - be aware if the site you're
+accessing has any resource constraints, and how often their information
+gets updated. If they request users not to access a feed more often
+than a certain interval, respect it. (It isn't hard to find violators
+in server logs.) If in doubt, try every 30 minutes until more information
+becomes available.
-(Then again, there are a very few sites like Yahoo and CNN who don't
-mind getting the extra hits if you're going to create links to them.
-Even so, more often than every 20 minutes would still be excessive
-to the biggest web sites.)
-
-=head2 SETTING UP SERVER-SIDE INCLUDES
-
-See the manual for your web server to make sure you have server-side include
-(SSI) enabled for the files that need it.
-(It's wasteful to enable it for all your files so be careful.)
-
-When using Apache HTTPD,
-a line like this will include a WebFetch-generated file:
-
-<!--#include file="fetch/slashdot.html"-->
-
=head1 WebFetch FUNCTIONS
The following function definitions assume B<C<$obj>> is a blessed
@@ -147,10 +138,10 @@
description => "unable to save: no data or nowhere to save it",
},
- 'WebFetch::Exception::NoInputHandler' => {
+ 'WebFetch::Exception::NoHandler' => {
isa => 'WebFetch::Exception',
- alias => 'throw_no_input_handler',
- description => "no input handler was found",
+ alias => 'throw_no_handler',
+ description => "no handler was found",
},
'WebFetch::Exception::MustOverride' => {
@@ -182,10 +173,16 @@
description => "no module was found to run the request",
},
+ 'WebFetch::Exception::AutoRunFailure' => {
+ isa => 'WebFetch::TracedException',
+ alias => 'throw_autoload_fail',
+ description => "AUTORUN failed to handle function call",
+ },
+
);
# initialize class variables
-our $VERSION = '0.12';
+our $VERSION = '0.13-pre29';
our %default_modules = (
"input" => {
"rss" => "WebFetch::Input::RSS",
@@ -203,12 +200,14 @@
}
);
our %modules;
+our $AUTOLOAD;
my $debug;
-=item import( "param-name" => "value", ... )
+sub debug
+{
+ $debug and print STDERR "debug: ".join( " ", @_ )."\n";
+}
-=cut
-
=item WebFetch::module_register( $module, @capabilities );
This function allows a Perl module to register itself with the WebFetch API
@@ -223,9 +222,15 @@
The @capabilities array is any number of strings as needed to list the
capabilities which the module performs for the WebFetch API.
The currently-recognized capabilities are "cmdline", "input" and "output".
-"config" and "storage" are reserved for future use. The function will save
-all the capability names that the module provides.
+"config", "filter", "save" and "storage" are reserved for future use. The
+function will save all the capability names that the module provides, without
+checking whether any code will use it.
+For example, the WebFetch::Output::TT module registers itself like this:
+ C<__PACKAGE__->module_register( "cmdline", "output:tt" );>
+meaning that it defines additional command-line options, and it provides an
+output format handler for the "tt" format, the Perl Template Toolkit.
+
=cut
sub module_register
@@ -258,10 +263,6 @@
}
}
-# satisfy POD coverage test - but don't put this function in the user manual
-=pod
-=cut
-
# module selection - choose WebFetch module based on selected file format
# for WebFetch internal use only
sub module_select
@@ -269,8 +270,7 @@
my $capability = shift;
my $is_optional = shift;
- $debug and print STDERR "debug: "
- ."module_select($capability,$is_optional)\n";
+ debug "module_select($capability,$is_optional)";
# parse the capability string
my ( $group, $topic );
if ( $capability =~ /([^:]*):(.*)/ ) {
@@ -320,13 +320,12 @@
}
}
- # check if any handlers were found for this input format
+ # check if any handlers were found for this format
if ( ! @handlers and ! $is_optional ) {
- throw_no_input_handler( "handler not found for $capability" );
+ throw_no_handler( "handler not found for $capability" );
}
- $debug and print STDERR "debug: module_select: "
- .join( " ", @handlers )."\n";
+ debug "module_select: ".join( " ", @handlers );
return @handlers;
}
@@ -341,7 +340,7 @@
{
my $group = shift;
- $debug and print STDERR "debug: singular_handler($group)\n";
+ debug "singular_handler($group)";
my $count = 0;
my ( $entry, $last );
foreach $entry ( keys %{$modules{$group}} ) {
@@ -358,8 +357,7 @@
}
# if there's only one registered, that's the one to use
- $debug and print STDERR "debug: singular_handler: "
- ."count=$count last=$last\n";
+ debug "singular_handler: count=$count last=$last";
return $count == 1 ? $last : undef;
}
@@ -432,11 +430,13 @@
and ( ref $modules{cmdline} eq "ARRAY" ))
{
foreach $cli_mod ( @{$modules{cmdline}}) {
- if ( defined @cli_mod::Options ) {
- push @mod_options, @cli_mod::Options;
+ if ( eval "defined \@{".$cli_mod."::Options}" ) {
+ eval "push \@mod_options,"
+ ."\@{".$cli_mod."::Options}";
}
- if ( defined @cli_mod::Usage ) {
- push @mod_options, @cli_mod::Usage;
+ if ( eval "defined \@{".$cli_mod."::Usage}" ) {
+ eval "push \@mod_options, \@{"
+ .$cli_mod."::Usage}";
}
}
}
@@ -470,7 +470,7 @@
if (( exists $options{debug}) and $options{debug}) {
$debug = 1;
}
- $debug and print STDERR "debug: fetch_main\n";
+ debug "fetch_main";
# if either source/input or dest/output formats were not provided,
@@ -511,7 +511,7 @@
# check if any handlers were found for this input format
if ( ! @handlers ) {
- throw_no_input_handler( "input handler not found for "
+ throw_no_handler( "input handler not found for "
.$options{source_format});
}
@@ -519,7 +519,7 @@
my $pkgname;
my $run_count = 0;
foreach $pkgname ( @handlers ) {
- $debug and print STDERR "debug: running for $pkgname\n";
+ debug "running for $pkgname";
eval { &WebFetch::run( $pkgname, \%options )};
if ( $@ ) {
print STDERR "WebFetch: run eval error: $@\n";
@@ -661,7 +661,7 @@
my $options_ref = shift;
my $obj;
- $debug and print STDERR "debug: entered run for $run_pkg\n";
+ debug "entered run for $run_pkg";
# make sure we have the run package loaded
mod_load $run_pkg;
@@ -677,7 +677,7 @@
# create the new object
# this also calls the $obj->fetch() routine for the module which
# has inherited from WebFetch to do this
- $debug and print STDERR "debug: run before new\n";
+ debug "run before new";
$obj = eval $run_pkg."->new( \%\$options_ref )";
if ( $@ ) {
throw_mod_run_failure( "module run failure: ".$@ );
@@ -686,7 +686,7 @@
# if the object had data for the WebFetch-embedding API,
# then data processing is external to the fetch routine
# (This externalizes the data for other software to capture it.)
- $debug and print STDERR "run before output\n";
+ debug "run before output";
my $dest_format = $obj->{dest_format};
if ( !exists $obj->{actions}) {
$obj->{actions} = {};
@@ -705,22 +705,26 @@
throw_no_save( "save failed: no data or nowhere to save it" );
}
- $debug and print STDERR "run before save\n";
+ debug "run before save";
my $result = $obj->save();
- # Old WebFetch pre-0.9 API code, should not be needed any more
- #if ( ! $result ) {
- # my $savable;
- # foreach $savable ( @{$obj->{savable}}) {
- # (ref $savable eq "HASH") or next;
- # if ( exists $savable->{error}) {
- # throw_save_error( "error saving in "
- # .$obj->{dir}
- # ."file: ".$savable->{file}
- # ."error: " .$savable->{error} );
- # }
- # }
- #}
+ # check for errors, throw exception to report errors per savable item
+ if ( ! $result ) {
+ my $savable;
+ my @errors;
+ foreach $savable ( @{$obj->{savable}}) {
+ (ref $savable eq "HASH") or next;
+ if ( exists $savable->{error}) {
+ push @errors, "file: ".$savable->{file}
+ ."error: " .$savable->{error};
+ }
+ }
+ if ( @errors ) {
+ throw_save_error( "error saving results in "
+ .$obj->{dir}
+ ."\n".join( "\n", @errors )."\n" );
+ }
+ }
return $result ? 0 : 1;
}
@@ -765,6 +769,10 @@
URL or file path (as appropriate) to the news source
+=item id
+
+unique identifier string for the entry
+
=item date
a date stamp,
@@ -942,6 +950,7 @@
sub do_actions
{
my ( $self ) = @_;
+ debug "in WebFetch::do_actions";
# we *really* need the data and actions to be set!
# otherwise assume we're in WebFetch 0.09 compatibility mode and
@@ -962,7 +971,6 @@
if ( exists $modules{output}{$action_spec}) {
my $class;
foreach $class ( @{$modules{output}{$action_spec}}) {
- print STDERR "can test on $class\n";
if ( $class->can( $action_handler )) {
$handler_ref = \&{$class."::".$action_handler};
last;
@@ -1258,6 +1266,24 @@
});
}
+=item $obj->no_savables_ok
+
+This can be used by an output function which handles its own intricate output
+operation (such as WebFetch::Output::TWiki). If the savables array is empty,
+it would cause an error. Using this function drops a note in it which
+basically says that's OK.
+
+=cut
+
+sub no_savables_ok
+{
+ my $self = shift;
+
+ push ( @{$self->{savable}}, {
+ 'ok_empty' => 1,
+ });
+}
+
=item $obj->save
This WebFetch utility function goes through all the entries in the
@@ -1330,6 +1356,11 @@
print STDERR "saving ".$savable->{file}."\n";
}
+ # an output module may have handled a more intricate operation
+ if ( exists $savable->{ok_empty}) {
+ last;
+ }
+
# verify contents of savable record
if ( !exists $savable->{file}) {
$savable->{error} = "missing file name - skipped";
@@ -1571,6 +1602,10 @@
return 1;
}
+=item $obj->wk2fname( $wk )
+
+=cut
+
# convert well-known name to field name
sub wk2fname
{
@@ -1604,6 +1639,10 @@
return undef;
}
+=item $obj->fname2fnum( $fname )
+
+=cut
+
# convert a field name to a field number
sub fname2fnum
{
@@ -1614,6 +1653,10 @@
? $self->{fname2fnum}{$fname} : undef;
}
+=item $obj->wk2fnum( $wk )
+
+=cut
+
# convert well-known name to field number
sub wk2fnum
{
@@ -1624,6 +1667,54 @@
? $self->{wk2fnum}{$wk} : undef;
}
+=item AUTOLOAD
+
+=cut
+
+# autoloader catches calls to unknown functions
+# first try: redirect to the class which made the call, if the function exists
+# second try: act as a read-only accessor for object data
+# (want a read/write accessor? define the function explicitly)
+sub AUTOLOAD
+{
+ my $self = shift;
+ my $type = ref($self) or throw_autoload_fail "self is not an object";
+
+ my $name = $AUTOLOAD;
+ $name =~ s/.*://; # strip fully-qualified portion
+
+ # skip all-caps special Perl functions
+ if ( $name =~ /^[A-Z]+$/ ) {
+ return;
+ }
+
+ # check for function in caller package
+ # (WebFetch may hand an input module's object to an output module)
+ my ( $package, $filename, $line ) = caller;
+ if ( $package->can( $name )) {
+ my $retval = eval $package."::".$name."( \$self, \@_ )";
+ if ( $@ ) {
+ my $e = Exception::Class->caught();
+ ref $e ? $e->rethrow
+ : throw_autoload_fail "failure in "
+ ."autoloaded function: ".$e;
+ }
+ return $retval;
+ }
+
+ # act as a read-only accessor
+ # add write accessors when API can specify what's OK to write
+ if ( exists $self->{$name}) {
+ # define the sub for better efficiency next time
+ eval "sub WebFetch::$name { return \$_[0]->{$name}; }";
+ return $self->{$name};
+ }
+
+ # if we got here, we failed
+ throw_autoload_fail "function $name not found - "
+ ."called by $package ($filename line $line)";
+}
+
1;
__END__
# remainder of POD docs follow
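
The read-only accessor fallback that this revision adds to AUTOLOAD can be illustrated in isolation. The following is a minimal sketch, not the WebFetch code: the package `Demo::Record` and its fields are hypothetical, it dies with a plain string rather than throwing an Exception::Class exception, and it installs the accessor with a glob assignment where the commit uses a string `eval`.

```perl
use strict;
use warnings;

# Hypothetical package for illustration only; not part of WebFetch.
package Demo::Record;

our $AUTOLOAD;

sub new
{
	my ( $class, %fields ) = @_;
	return bless { %fields }, $class;
}

# catch calls to undefined methods and treat them as read-only accessors
sub AUTOLOAD
{
	my $self = shift;
	ref $self or die "AUTOLOAD: not called on an object\n";

	my $name = $AUTOLOAD;
	$name =~ s/.*:://;	# strip the package qualifier

	# skip DESTROY and other all-caps special names
	return if $name =~ /^[A-Z]+$/;

	if ( exists $self->{$name}) {
		# install a real method so later calls bypass AUTOLOAD
		my $full_name = __PACKAGE__."::".$name;
		no strict 'refs';
		*{$full_name} = sub { return $_[0]->{$name} };
		return $self->{$name};
	}
	die "method $name not found\n";
}

package main;

my $rec = Demo::Record->new(
	title => "example",
	url => "http://example.com/" );
print $rec->title, "\n";	# first call is handled by AUTOLOAD
print $rec->title, "\n";	# later calls use the installed method
```

Because the accessor is only installed for keys that already exist in the object, a call to any other name still fails, which mirrors the "if we got here, we failed" branch in the diff above.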
This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
From: <ik...@us...> - 2009-08-29 01:15:01
Revision: 43
http://webfetch.svn.sourceforge.net/webfetch/?rev=43&view=rev
Author: ikluft
Date: 2009-08-29 00:14:57 +0000 (Sat, 29 Aug 2009)
Log Message:
-----------
branch with major changes replaces old CVS structure on trunk
Added Paths:
-----------
trunk/
Property changes on: trunk
___________________________________________________________________
Added: svn:mergeinfo
+ /trunk:38
From: <ik...@us...> - 2009-08-29 01:14:50
Revision: 42
http://webfetch.svn.sourceforge.net/webfetch/?rev=42&view=rev
Author: ikluft
Date: 2009-08-29 00:13:25 +0000 (Sat, 29 Aug 2009)
Log Message:
-----------
move old trunk from CVS conversion to branches
Added Paths:
-----------
branches/trunk-from-cvs-conversion/
Removed Paths:
-------------
trunk/
From: <ik...@us...> - 2009-08-29 00:38:48
Revision: 44
http://webfetch.svn.sourceforge.net/webfetch/?rev=44&view=rev
Author: ikluft
Date: 2009-08-29 00:38:38 +0000 (Sat, 29 Aug 2009)
Log Message:
-----------
new branch for 0.13
Added Paths:
-----------
branches/v0.13/
Property changes on: branches/v0.13
___________________________________________________________________
Added: svn:mergeinfo
+ /trunk:38
From: <ik...@us...> - 2009-08-29 00:02:41
Revision: 41
http://webfetch.svn.sourceforge.net/webfetch/?rev=41&view=rev
Author: ikluft
Date: 2009-08-29 00:02:26 +0000 (Sat, 29 Aug 2009)
Log Message:
-----------
move it back and try that again
Added Paths:
-----------
branches/v0.11/
Removed Paths:
-------------
trunk/v0.11/
Property changes on: branches/v0.11
___________________________________________________________________
Added: svn:mergeinfo
+ /trunk:38
From: <ik...@us...> - 2009-08-29 00:00:08
Revision: 40
http://webfetch.svn.sourceforge.net/webfetch/?rev=40&view=rev
Author: ikluft
Date: 2009-08-28 23:59:58 +0000 (Fri, 28 Aug 2009)
Log Message:
-----------
branch with major changes replaces old CVS structure on trunk
Added Paths:
-----------
trunk/v0.11/
Removed Paths:
-------------
branches/v0.11/
Property changes on: trunk/v0.11
___________________________________________________________________
Added: svn:mergeinfo
+ /trunk:38
From: <ik...@us...> - 2009-08-28 23:40:30
Revision: 39
http://webfetch.svn.sourceforge.net/webfetch/?rev=39&view=rev
Author: ikluft
Date: 2009-08-28 23:40:20 +0000 (Fri, 28 Aug 2009)
Log Message:
-----------
mark as merged from trunk - these major changes are intended to replace it
Property Changed:
----------------
branches/v0.11/
Property changes on: branches/v0.11
___________________________________________________________________
Added: svn:mergeinfo
+ /trunk:38
From: <ik...@us...> - 2009-08-26 21:42:47
Revision: 38
http://webfetch.svn.sourceforge.net/webfetch/?rev=38&view=rev
Author: ikluft
Date: 2009-08-26 21:42:35 +0000 (Wed, 26 Aug 2009)
Log Message:
-----------
final cleanup of MANIFEST for 0.12
Modified Paths:
--------------
branches/v0.11/MANIFEST
Modified: branches/v0.11/MANIFEST
===================================================================
--- branches/v0.11/MANIFEST 2009-08-26 21:37:57 UTC (rev 37)
+++ branches/v0.11/MANIFEST 2009-08-26 21:42:35 UTC (rev 38)
@@ -4,7 +4,6 @@
README
TODO
Makefile.PL
-test.pl
lib/WebFetch.pm
lib/WebFetch/Input/Atom.pm
lib/WebFetch/Input/PerlStruct.pm
@@ -12,5 +11,3 @@
lib/WebFetch/Input/SiteNews.pm
lib/WebFetch/Output/Dump.pm
lib/WebFetch/Output/TT.pm
-misc/module-list
-misc/webfetch-pb.gif
From: <ik...@us...> - 2009-08-26 21:38:16
Revision: 37
http://webfetch.svn.sourceforge.net/webfetch/?rev=37&view=rev
Author: ikluft
Date: 2009-08-26 21:37:57 +0000 (Wed, 26 Aug 2009)
Log Message:
-----------
clean up tests
Modified Paths:
--------------
branches/v0.11/MANIFEST
branches/v0.11/t/00-load.t
branches/v0.11/t/pod-coverage.t
Modified: branches/v0.11/MANIFEST
===================================================================
--- branches/v0.11/MANIFEST 2009-08-26 21:28:38 UTC (rev 36)
+++ branches/v0.11/MANIFEST 2009-08-26 21:37:57 UTC (rev 37)
@@ -6,9 +6,11 @@
Makefile.PL
test.pl
lib/WebFetch.pm
-lib/WebFetch/General.pm
-lib/WebFetch/ListSubs.pm
-lib/WebFetch/PerlStruct.pm
-lib/WebFetch/SiteNews.pm
+lib/WebFetch/Input/Atom.pm
+lib/WebFetch/Input/PerlStruct.pm
+lib/WebFetch/Input/RSS.pm
+lib/WebFetch/Input/SiteNews.pm
+lib/WebFetch/Output/Dump.pm
+lib/WebFetch/Output/TT.pm
misc/module-list
misc/webfetch-pb.gif
Modified: branches/v0.11/t/00-load.t
===================================================================
--- branches/v0.11/t/00-load.t 2009-08-26 21:28:38 UTC (rev 36)
+++ branches/v0.11/t/00-load.t 2009-08-26 21:37:57 UTC (rev 37)
@@ -1,13 +1,15 @@
#!perl -T
-use Test::More tests => 5;
+use Test::More tests => 7;
BEGIN {
use_ok( 'WebFetch' );
- use_ok( 'WebFetch::General' );
- use_ok( 'WebFetch::ListSubs' );
- use_ok( 'WebFetch::PerlStruct' );
- use_ok( 'WebFetch::SiteNews' );
+ use_ok( 'WebFetch::Input::Atom' );
+ use_ok( 'WebFetch::Input::PerlStruct' );
+ use_ok( 'WebFetch::Input::RSS' );
+ use_ok( 'WebFetch::Input::SiteNews' );
+ use_ok( 'WebFetch::Output::Dump' );
+ use_ok( 'WebFetch::Output::TT' );
}
diag( "Testing WebFetch $WebFetch::VERSION, Perl $], $^X" );
Modified: branches/v0.11/t/pod-coverage.t
===================================================================
--- branches/v0.11/t/pod-coverage.t 2009-08-26 21:28:38 UTC (rev 36)
+++ branches/v0.11/t/pod-coverage.t 2009-08-26 21:37:57 UTC (rev 37)
@@ -1,6 +1,6 @@
use strict;
use warnings;
-use Test::More skip_all => "not ready for coverage test - modernization in progress";
+use Test::More skip_all => "to-do";
# Ensure a recent version of Test::Pod::Coverage
my $min_tpc = 1.08;
From: <ik...@us...> - 2009-08-26 21:28:48
Revision: 36
http://webfetch.svn.sourceforge.net/webfetch/?rev=36&view=rev
Author: ikluft
Date: 2009-08-26 21:28:38 +0000 (Wed, 26 Aug 2009)
Log Message:
-----------
clean up
Modified Paths:
--------------
branches/v0.11/lib/WebFetch/Input/Atom.pm
branches/v0.11/lib/WebFetch/Input/PerlStruct.pm
branches/v0.11/lib/WebFetch/Input/RSS.pm
Modified: branches/v0.11/lib/WebFetch/Input/Atom.pm
===================================================================
--- branches/v0.11/lib/WebFetch/Input/Atom.pm 2009-08-26 21:27:09 UTC (rev 35)
+++ branches/v0.11/lib/WebFetch/Input/Atom.pm 2009-08-26 21:28:38 UTC (rev 36)
@@ -28,7 +28,7 @@
# no user-servicable parts beyond this point
# register capabilities with WebFetch
-__PACKAGE__->module_register( "input:atom" );
+__PACKAGE__->module_register( "cmdline", "input:atom" );
# called from WebFetch main routine
sub fetch
@@ -46,7 +46,6 @@
# set up Webfetch Embedding API data
$self->{data} = {};
- $self->{actions} = {};
$self->{data}{fields} = [ "id", "updated", "title", "author", "link",
"summary", "content", "xml" ];
# defined which fields match to which "well-known field names"
@@ -102,8 +101,8 @@
# save the data record
my $id = extract_value( $entry->id() );
my $title = extract_value( $entry->title() );
- my $author = extract_value( $entry->author() );
- my $link = extract_value( $entry->link() );
+ my $author = extract_value( $entry->author->name );
+ my $link = extract_value( $entry->link->href );
my $updated = extract_value( $entry->updated() );
my $summary = extract_value( $entry->summary() );
my $content = extract_value( $entry->content() );
Modified: branches/v0.11/lib/WebFetch/Input/PerlStruct.pm
===================================================================
--- branches/v0.11/lib/WebFetch/Input/PerlStruct.pm 2009-08-26 21:27:09 UTC (rev 35)
+++ branches/v0.11/lib/WebFetch/Input/PerlStruct.pm 2009-08-26 21:28:38 UTC (rev 36)
@@ -59,7 +59,8 @@
push ( @content_links, $subparts );
}
- # TODO: build data and actions structures
+ # build data structure
+ $self->{data} = {};
}
Modified: branches/v0.11/lib/WebFetch/Input/RSS.pm
===================================================================
--- branches/v0.11/lib/WebFetch/Input/RSS.pm 2009-08-26 21:27:09 UTC (rev 35)
+++ branches/v0.11/lib/WebFetch/Input/RSS.pm 2009-08-26 21:28:38 UTC (rev 36)
@@ -46,7 +46,6 @@
# set up Webfetch Embedding API data
$self->{data} = {};
- $self->{actions} = {};
$self->{data}{fields} = [ "pubDate", "title", "link", "category",
"description" ];
# defined which fields match to which "well-known field names"
@@ -55,7 +54,7 @@
"url" => "link",
"date" => "pubDate",
"summary" => "description",
- "category" => "category"
+ "category" => "category",
};
$self->{data}{records} = [];
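
The `module_register` change in Atom.pm above passes a plain capability ("cmdline") together with a "group:topic" capability ("input:atom"). A minimal sketch of how such strings could be sorted into a registry follows. It assumes a layout similar to the `%modules` hash used in WebFetch.pm, but `%registry`, `register_capabilities` and the `Demo::` module names are hypothetical and are not the WebFetch API.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a module registry; not the WebFetch internals.
my %registry;

sub register_capabilities
{
	my ( $module, @capabilities ) = @_;
	foreach my $capability ( @capabilities ) {
		if ( $capability =~ /^([^:]*):(.*)$/ ) {
			# "group:topic" form, such as "input:atom"
			my ( $group, $topic ) = ( $1, $2 );
			push @{$registry{$group}{$topic}}, $module;
		} else {
			# plain capability, such as "cmdline"
			push @{$registry{$capability}}, $module;
		}
	}
	return;
}

register_capabilities( "Demo::Input::Atom", "cmdline", "input:atom" );
register_capabilities( "Demo::Output::TT", "cmdline", "output:tt" );

print "cmdline: ", join( " ", @{$registry{cmdline}} ), "\n";
print "input:atom: ", join( " ", @{$registry{input}{atom}} ), "\n";
```

Keeping plain capabilities as a flat list and grouped ones as a nested hash lets a lookup such as "which modules handle input format atom" be a single hash access.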
From: <ik...@us...> - 2009-08-26 21:27:23
Revision: 35
http://webfetch.svn.sourceforge.net/webfetch/?rev=35&view=rev
Author: ikluft
Date: 2009-08-26 21:27:09 +0000 (Wed, 26 Aug 2009)
Log Message:
-----------
expand get and save functions, clean up
Modified Paths:
--------------
branches/v0.11/lib/WebFetch.pm
Modified: branches/v0.11/lib/WebFetch.pm
===================================================================
--- branches/v0.11/lib/WebFetch.pm 2009-08-25 22:19:47 UTC (rev 34)
+++ branches/v0.11/lib/WebFetch.pm 2009-08-26 21:27:09 UTC (rev 35)
@@ -113,6 +113,7 @@
use Getopt::Long;
use LWP::UserAgent;
use HTTP::Request;
+use Date::Calc;
use Data::Dumper;
# define exceptions/errors
@@ -143,7 +144,7 @@
'WebFetch::Exception::NoSave' => {
isa => 'WebFetch::Exception',
alias => 'throw_no_save',
- description => "unable to save data because of no data or nowhere to save it",
+ description => "unable to save: no data or nowhere to save it",
},
'WebFetch::Exception::NoInputHandler' => {
@@ -183,6 +184,7 @@
);
+# initialize class variables
our $VERSION = '0.12';
our %default_modules = (
"input" => {
@@ -221,8 +223,8 @@
The @capabilities array is any number of strings as needed to list the
capabilities which the module performs for the WebFetch API.
The currently-recognized capabilities are "cmdline", "input" and "output".
-However, the function will save all the capability names that the module
-provides.
+"config" and "storage" are reserved for future use. The function will save
+all the capability names that the module provides.
=cut
@@ -449,6 +451,7 @@
"source_format:s",
"dest=s",
"dest_format:s",
+ "fetch_urls",
"quiet",
"debug",
@mod_options ) };
@@ -459,7 +462,8 @@
."[--group group] [--mode mode] "
."[--source file] [--source_format fmt-string] "
."[--dest file] [--dest_format fmt-string] "
- ."[--quiet] ".join( " ", @mod_usage ));
+ ."[--fetch_urls] [--quiet] "
+ .join( " ", @mod_usage ));
}
# set debugging mode
@@ -470,7 +474,7 @@
# if either source/input or dest/output formats were not provided,
- # check if only one handler is defined - if so that's the one to use
+ # check if only one handler is registered - if so that's the default
if ( !exists $options{source_format}) {
if ( my $fmt = singular_handler( "input" )) {
$options{source_format} = $fmt;
@@ -550,7 +554,7 @@
# go fetch the data
# this function must be provided by a derived module
- if (( ! defined $self->{no_fetch}) or ! $self->{no_fetch}) {
+ if (( ! exists $self->{no_fetch}) or ! $self->{no_fetch}) {
$self->fetch();
}
@@ -657,34 +661,6 @@
my $options_ref = shift;
my $obj;
- #my ( $obj, $dir, $group, $mode,
- # $dest, $dest_format,
- # $quiet, $source, $source_format );
- #my $result = GetOptions (
- # "dir=s" => \$dir,
- # "group:s" => \$group,
- # "mode:s" => \$mode,
- # "dest:s" => \$dest,
- # "dest_format:s" => \$dest_format,
- # "source:s" => \$source,
- # "source_format:s" => \$source_format,
- # "quiet" => \$quiet,
- # "debug" => \$debug,
- # ( eval "defined \@".$run_pkg."::Options" )
- # ? eval "\@".$run_pkg."::Options"
- # : ());
- #if ( ! $result ) {
- # print STDERR "usage: $0 --dir dirpath "
- # ."[--group group] [--mode mode] "
- # ."[--source file] [--source_format fmt-string] "
- # ."[--dest file] [--dest_format fmt-string] "
- # ."[--quiet]\n";
- # if ( eval "defined \$".$run_pkg."::Usage" ) {
- # print STDERR " "
- # .( eval "\$".$run_pkg."::Usage" )."\n";
- # }
- # exit 1;
- #}
$debug and print STDERR "debug: entered run for $run_pkg\n";
# make sure we have the run package loaded
@@ -711,12 +687,15 @@
# then data processing is external to the fetch routine
# (This externalizes the data for other software to capture it.)
$debug and print STDERR "run before output\n";
- my $dest_format = $options_ref->{dest_format};
- if (( defined $obj->{data}) and ( defined $obj->{actions})) {
- if ( defined $obj->{dest}) {
- ( defined $obj->{actions}) or $obj->{actions} = {};
- ( defined $obj->{actions}{$dest_format})
- or $obj->{actions}{$dest_format} = [];
+ my $dest_format = $obj->{dest_format};
+ if ( !exists $obj->{actions}) {
+ $obj->{actions} = {};
+ }
+ if (( exists $obj->{data})) {
+ if ( exists $obj->{dest}) {
+ if ( !exists $obj->{actions}{$dest_format}) {
+ $obj->{actions}{$dest_format} = [];
+ }
push @{$obj->{actions}{$dest_format}}, [ $obj->{dest} ];
}
@@ -728,18 +707,21 @@
$debug and print STDERR "run before save\n";
my $result = $obj->save();
- if ( ! $result ) {
- my $savable;
- foreach $savable ( @{$obj->{savable}}) {
- (ref $savable eq "HASH") or next;
- if ( defined $savable->{error}) {
- throw_save_error( "error saving in "
- .$obj->{dir}
- ."file: ".$savable->{file}
- ."error: " .$savable->{error} );
- }
- }
- }
+
+ # Old WebFetch pre-0.9 API code, should not be needed any more
+ #if ( ! $result ) {
+ # my $savable;
+ # foreach $savable ( @{$obj->{savable}}) {
+ # (ref $savable eq "HASH") or next;
+ # if ( exists $savable->{error}) {
+ # throw_save_error( "error saving in "
+ # .$obj->{dir}
+ # ."file: ".$savable->{file}
+ # ."error: " .$savable->{error} );
+ # }
+ # }
+ #}
+
return $result ? 0 : 1;
}
@@ -779,7 +761,7 @@
a one-liner banner or title text
(plain text, no HTML tags)
-=item source
+=item url
URL or file path (as appropriate) to the news source
@@ -964,7 +946,7 @@
# we *really* need the data and actions to be set!
# otherwise assume we're in WebFetch 0.09 compatibility mode and
# $self->fetch() better have created its own savables already
- if (( !defined $self->{data}) or ( !defined $self->{actions})) {
+ if (( !exists $self->{data}) or ( !exists $self->{actions})) {
return
}
@@ -1155,16 +1137,19 @@
# utility function to get the contents of a URL
sub get
{
- my ( $self ) = @_;
+ my ( $self, $source ) = @_;
+ if ( ! defined $source ) {
+ $source = $self->{source};
+ }
if ( $self->{debug}) {
- print STDERR "debug: get(".$self->{source}.")\n";
+ print STDERR "debug: get(".$source.")\n";
}
# send request, capture response
my $ua = LWP::UserAgent->new;
$ua->agent("WebFetch/$VERSION ".$ua->agent);
- my $request = HTTP::Request->new(GET => $self->{source});
+ my $request = HTTP::Request->new(GET => $source);
my $response = $ua->request($request);
# abort on failure
@@ -1232,17 +1217,47 @@
{
my ( $self, $filename, $content ) = @_;
- if ( !defined $self->{savable}) {
+ if ( !exists $self->{savable}) {
$self->{savable} = [];
}
push ( @{$self->{savable}}, {
'file' => $filename,
'content' => $content,
- (( defined $self->{group}) ? ('group' => $self->{group}) : ()),
- (( defined $self->{mode}) ? ('mode' => $self->{mode}) : ())
+ (( exists $self->{group}) ? ('group' => $self->{group}) : ()),
+ (( exists $self->{mode}) ? ('mode' => $self->{mode}) : ())
});
}
+=item $obj->direct_fetch_savable( $filename, $source )
+
+I<This should be used only in format handler functions.
+See do_actions() for details.>
+
+This adds a task for the save function to fetch a URL and save it
+verbatim in a file. This can be used to download links contained
+in a news feed.
+
+=cut
+
+sub direct_fetch_savable
+{
+ my ( $self, $url ) = @_;
+
+ if ( !exists $self->{savable}) {
+ $self->{savable} = [];
+ }
+ my $filename = $url;
+ $filename =~ s=[;?].*==;
+ $filename =~ s=^.*/==;
+ push ( @{$self->{savable}}, {
+ 'url' => $url,
+ 'file' => $filename,
+ 'index' => 1,
+ (( exists $self->{group}) ? ('group' => $self->{group}) : ()),
+ (( exists $self->{mode}) ? ('mode' => $self->{mode}) : ())
+ });
+}
+
=item $obj->save
This WebFetch utility function goes through all the entries in the
@@ -1284,17 +1299,29 @@
}
# check if we have attributes needed to proceed
- if ( !defined $self->{"dir"}) {
+ if ( !exists $self->{"dir"}) {
die "WebFetch: directory path missing - "
."required for save\n";
}
- if ( !defined $self->{savable}) {
+ if ( !exists $self->{savable}) {
die "WebFetch: nothing to save\n";
}
if ( ref($self->{savable}) ne "ARRAY" ) {
die "WebFetch: cannot save - savable is not an array\n";
}
+ # if fetch_urls is defined, turn link fields in the data to savables
+ if (( exists $self->{fetch_urls}) and $self->{fetch_urls}) {
+ my $url_fnum = $self->wk2fnum( "url" );
+ my $entry;
+ foreach $entry ( @{$self->{data}{records}}) {
+ if ( defined $entry->[$url_fnum]) {
+ $self->direct_fetch_savable(
+ $entry->[$url_fnum]);
+ }
+ }
+ }
+
# loop through "savable" (grouped content and filename destination)
my $savable;
foreach $savable ( @{$self->{savable}}) {
@@ -1304,12 +1331,14 @@
}
# verify contents of savable record
- if ( !defined $savable->{file}) {
+ if ( !exists $savable->{file}) {
$savable->{error} = "missing file name - skipped";
next;
}
- if ( !defined $savable->{content}) {
- $savable->{error} = "missing content text - skipped";
+ if (( !exists $savable->{content})
+ and ( !exists $savable->{url}))
+ {
+ $savable->{error} = "missing content or URL - skipped";
next;
}
@@ -1327,6 +1356,45 @@
}
}
+ # if a URL was provided and index flag is set, use index file
+ my %id_index;
+ my ( $timestamp, $filename );
+ my $was_in_index = 0;
+ if (( exists $savable->{url}) and ( exists $savable->{index}))
+ {
+ require DB_File;
+ tie %id_index, 'DB_File',
+ $self->{dir}."/id_index.db",
+ &DB_File::O_CREAT|&DB_File::O_RDWR, 0640;
+ if ( exists $id_index{$savable->{url}}) {
+ ( $timestamp, $filename ) =
+ split /#/, $id_index{$savable->{url}};
+ $was_in_index = 1;
+ } else {
+ $timestamp = time;
+ $id_index{$savable->{url}} =
+ $timestamp."#".$savable->{file};
+ }
+ untie %id_index ;
+ }
+
+ # For now, we consider it done if the file was in the index.
+ # Future options would be to check if URL was modified.
+ if ( $was_in_index ) {
+ next;
+ }
+
+ # if a URL was provided and no content, get content from URL
+ if (( ! exists $savable->{content})
+ and ( exists $savable->{url}))
+ {
+ $savable->{content} =
+ eval { ${$self->get($savable->{url})} };
+ if ( $@ ) {
+ next;
+ }
+ }
+
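
The index logic above stores one "timestamp#filename" string per URL and skips any URL it has already seen. The sketch below shows that bookkeeping with an ordinary in-memory hash standing in for the DB_File-tied C<%id_index>, which is an assumption made so it runs without a database file. The helper name and the split limit of 2 are additions, not part of the diff.

```perl
use strict;
use warnings;

# In-memory stand-in for the tied %id_index (assumption for illustration).
my %id_index;

# Returns ( $timestamp, $filename, $was_in_index ) for a URL, recording
# the URL on first sight in the same "timestamp#filename" format.
sub index_lookup {
    my ( $url, $file, $now ) = @_;
    if ( exists $id_index{$url} ) {
        # a limit of 2 keeps any later '#' inside the file name
        my ( $timestamp, $filename ) = split /#/, $id_index{$url}, 2;
        return ( $timestamp, $filename, 1 );
    }
    $id_index{$url} = $now . "#" . $file;
    return ( $now, $file, 0 );
}

my @r = index_lookup( "http://example.com/ep1.mp3", "ep1.mp3", time );
print "first sight: was_in_index=$r[2]\n";
```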
# write content to the "new content" file
if ( ! open ( new_content, ">$new_content" )) {
$savable->{error} = "cannot open "
@@ -1366,7 +1434,7 @@
}
# chgrp the "new content" before final installation
- if ( defined $savable->{group}) {
+ if ( exists $savable->{group}) {
my $gid = $savable->{group};
if ( $gid !~ /^[0-9]+$/o ) {
$gid = (getgrnam($gid))[2];
@@ -1387,7 +1455,7 @@
}
# chmod the "new content" before final installation
- if ( defined $savable->{mode}) {
+ if ( exists $savable->{mode}) {
if ( ! chmod oct($savable->{mode}), $new_content ) {
$savable->{error} = "cannot chmod "
.$new_content." to "
@@ -1410,7 +1478,7 @@
# loop through savable to report any errors
my $err_count = 0;
foreach $savable ( @{$self->{savable}}) {
- if ( defined $savable->{error}) {
+ if ( exists $savable->{error}) {
print STDERR "WebFetch: failed to save "
.$savable->{file}.": "
.$savable->{error}."\n";
@@ -1435,7 +1503,7 @@
my ( $self ) = @_;
# check if fname2fnum is already initialized
- if (( defined $self->{fname2fnum})
+ if (( exists $self->{fname2fnum})
and ref $self->{fname2fnum} eq "HASH" )
{
# already done - success
@@ -1443,8 +1511,8 @@
}
# check if prerequisite data exists
- if (( ! defined $self->{data} )
- or ( ! defined $self->{data}{fields}))
+ if (( ! exists $self->{data} )
+ or ( ! exists $self->{data}{fields}))
{
# missing prerequisites - failed
return 0;
@@ -1470,7 +1538,7 @@
$self->init_fname2fnum() or return 0;
# check if wk2fnum is already initialized
- if (( defined $self->{wk2fnum})
+ if (( exists $self->{wk2fnum})
and ref $self->{wk2fnum} eq "HASH" )
{
# already done - success
@@ -1478,7 +1546,7 @@
}
# check for prerequisite data
- if ( ! defined $self->{data}{wk_names}) {
+ if ( ! exists $self->{data}{wk_names}) {
return 0;
}
@@ -1486,7 +1554,7 @@
$self->{wk2fnum} = {};
foreach $wk_key ( keys %{$self->{data}{wk_names}}) {
# perform consistency cross-check between wk_names and fields
- if ( !defined $self->{fname2fnum}{$self->{data}{wk_names}{$wk_key}})
+ if ( !exists $self->{fname2fnum}{$self->{data}{wk_names}{$wk_key}})
{
# wk_names has a bad field name - carp about it!
carp "warning: wk_names contains $wk_key"."->"
@@ -1511,22 +1579,22 @@
$self->init_fname2fnum() or return undef;
# check for prerequisite data
- if (( ! defined $self->{data}{wk_names})
- or ( ! defined $self->{data}{wk_names}{$wk}))
+ if (( ! exists $self->{data}{wk_names})
+ or ( ! exists $self->{data}{wk_names}{$wk}))
{
return undef;
}
# double check that the field exists before pronouncing it OK
# (perform consistency cross-check between wk_names and fields)
- if ( defined $self->{fname2fnum}{$self->{data}{wk_names}{$wk}}) {
+ if ( exists $self->{fname2fnum}{$self->{data}{wk_names}{$wk}}) {
return $self->{data}{wk_names}{$wk};
}
# otherwise, wk_names has a bad field name.
# But init_wk2fnum() may have already carped about it
# so check whether we need to carp about it or not.
- if ( ! defined $self->{wk2fnum}) {
+ if ( ! exists $self->{wk2fnum}) {
carp "warning: wk_names contains $wk"."->"
.$self->{data}{wk_names}{$wk}
." but "
@@ -1542,7 +1610,8 @@
my ( $self, $fname ) = @_;
$self->init_fname2fnum() or return undef;
- return $self->{fname2fnum}{$fname};
+ return ( exists $self->{fname2fnum}{$fname})
+ ? $self->{fname2fnum}{$fname} : undef;
}
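
The two lookup tables map a field name, or a "well-known" name, to a column number in each record. A small sketch of that chain follows, assuming a field list and wk_names map in the shape the input modules build; the data and the stand-alone C<wk2fnum> function are illustrative, not the WebFetch methods themselves.

```perl
use strict;
use warnings;

# Field list and well-known-name map; the values are invented.
my %data = (
    fields   => [ "id", "updated", "title", "link" ],
    wk_names => { title => "title", url => "link", date => "updated" },
);

# Field name -> column number.
my %fname2fnum;
@fname2fnum{ @{ $data{fields} } } = 0 .. $#{ $data{fields} };

# Well-known name -> column number, or undef when either lookup misses.
sub wk2fnum {
    my ( $wk ) = @_;
    return undef unless exists $data{wk_names}{$wk};
    my $fname = $data{wk_names}{$wk};
    return exists $fname2fnum{$fname} ? $fname2fnum{$fname} : undef;
}

print "url is column ", wk2fnum( "url" ), "\n";
```

Column 0 is a valid result, so callers need to test the return value with C<defined> rather than for truth.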
# convert well-known name to field number
@@ -1551,14 +1620,15 @@
my ( $self, $wk ) = @_;
$self->init_wk2fnum() or return undef;
- return $self->{wk2fnum}{$wk};
+ return ( exists $self->{wk2fnum}{$wk})
+ ? $self->{wk2fnum}{$wk} : undef;
}
1;
__END__
# remainder of POD docs follow
-=head2 WRITING NEW WebFetch-DERIVED MODULES
+=head2 WRITING WebFetch-DERIVED MODULES
The easiest way to make a new WebFetch-derived module is to start
from the module closest to your fetch operation and modify it.
@@ -1609,17 +1679,21 @@
Please consider contributing any useful changes back to the WebFetch
project at C<ma...@we...>.
-=head1 AUTHOR
+=head1 ACKNOWLEDGEMENTS
WebFetch was written by Ian Kluft.
Send patches, bug reports, suggestions and questions to
C<ma...@we...>.
+=head1 LICENSE
+
WebFetch is Open Source software distributed via the
Comprehensive Perl Archive Network (CPAN),
a worldwide network of Perl web mirror sites.
WebFetch may be copied under the same terms and licensing as Perl itself.
+=head1 SEE ALSO
+
=for html
A current copy of the source code and documentation may be found at
<a href="http://www.webfetch.org/">http://www.webfetch.org/</a>
@@ -1632,27 +1706,26 @@
A current copy of the source code and documentation may be found at
http://www.webfetch.org/
-=head1 SEE ALSO
-
TODO: fill in these lists
=for html
<a href="http://www.perl.org/">perl</a>(1),
<a href="WebFetch::Input::PerlStruct.html">WebFetch::Input::PerlStruct</a>,
<a href="WebFetch::Input::SiteNews.html">WebFetch::Input::SiteNews</a>,
+<a href="WebFetch::Input::Atom.html">WebFetch::Input::Atom</a>,
<a href="WebFetch::Input::RSS.html">WebFetch::Input::RSS</a>,
<a href="WebFetch::Input::Dump.html">WebFetch::Input::Dump</a>,
-<a href="WebFetch::Output::RSS.html">WebFetch::Output::RSS</a>,
+<a href="WebFetch::Output::TT.html">WebFetch::Output::TT</a>,
<a href="WebFetch::Output::Dump.html">WebFetch::Output::Dump</a>,
=for text
perl(1), WebFetch::Input::PerlStruct, WebFetch::Input::SiteNews,
-WebFetch::Input::RSS, WebFetch::Input::Dump,
-WebFetch::Output::RSS, WebFetch::Output::Dump
+WebFetch::Input::Atom, WebFetch::Input::RSS, WebFetch::Input::Dump,
+WebFetch::Output::TT, WebFetch::Output::Dump
=for man
perl(1), WebFetch::Input::PerlStruct, WebFetch::Input::SiteNews,
-WebFetch::Input::RSS, WebFetch::Input::Dump,
-WebFetch::Output::RSS, WebFetch::Output::Dump
+WebFetch::Input::Atom, WebFetch::Input::RSS, WebFetch::Input::Dump,
+WebFetch::Output::TT, WebFetch::Output::Dump
=cut
This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
|
|
From: <ik...@us...> - 2009-08-25 22:20:03
|
Revision: 34
http://webfetch.svn.sourceforge.net/webfetch/?rev=34&view=rev
Author: ikluft
Date: 2009-08-25 22:19:47 +0000 (Tue, 25 Aug 2009)
Log Message:
-----------
add WebFetch::Input::Atom and WebFetch::Output::TT
Added Paths:
-----------
branches/v0.11/lib/WebFetch/Input/Atom.pm
branches/v0.11/lib/WebFetch/Output/TT.pm
Added: branches/v0.11/lib/WebFetch/Input/Atom.pm
===================================================================
--- branches/v0.11/lib/WebFetch/Input/Atom.pm (rev 0)
+++ branches/v0.11/lib/WebFetch/Input/Atom.pm 2009-08-25 22:19:47 UTC (rev 34)
@@ -0,0 +1,174 @@
+#
+# WebFetch::Input::Atom - get headlines from remote Atom feed
+#
+# Copyright (c) 1998-2009 Ian Kluft. This program is free software; you can
+# redistribute it and/or modify it under the terms of the GNU General Public
+# License Version 3. See http://www.webfetch.org/GPLv3.txt
+
+package WebFetch::Input::Atom;
+
+use strict;
+use base "WebFetch";
+
+use Carp;
+use Scalar::Util qw( blessed );
+use Date::Calc qw(Today Delta_Days Month_to_Text);
+use XML::Atom::Client;
+use LWP::UserAgent;
+
+use Exception::Class (
+);
+
+our @Options = ();
+our $Usage = "";
+
+# configuration parameters
+our $num_links = 5;
+
+# no user-serviceable parts beyond this point
+
+# register capabilities with WebFetch
+__PACKAGE__->module_register( "input:atom" );
+
+# called from WebFetch main routine
+sub fetch
+{
+ my ( $self ) = @_;
+
+ # set parameters for WebFetch routines
+ if ( !defined $self->{num_links}) {
+ $self->{num_links} = $WebFetch::Input::Atom::num_links;
+ }
+ if ( !defined $self->{style}) {
+ $self->{style} = {};
+ $self->{style}{para} = 1;
+ }
+
+ # set up WebFetch Embedding API data
+ $self->{data} = {};
+ $self->{actions} = {};
+ $self->{data}{fields} = [ "id", "updated", "title", "author", "link",
+ "summary", "content", "xml" ];
+ # define which fields match which "well-known field names"
+ $self->{data}{wk_names} = {
+ "title" => "title",
+ "url" => "link",
+ "date" => "updated",
+ "summary" => "summary",
+ };
+ $self->{data}{records} = [];
+
+ # process the links
+
+ # parse data file
+ $self->parse_input();
+
+ # return and let WebFetch handle the data
+}
+
+# extract a string value from a scalar/ref if possible
+sub extract_value
+{
+ my $thing = shift;
+
+ ( defined $thing ) or return undef;
+ if ( ref $thing ) {
+ if ( !blessed $thing ) {
+ # it's a HASH/ARRAY/etc, not an object
+ return undef;
+ }
+ if ( $thing->can( "as_string" )) {
+ return $thing->as_string;
+ }
+ return undef;
+ } else {
+ $thing =~ s/\s+$//s;
+ length $thing > 0 or return undef;
+ return $thing;
+ }
+}
+
+# parse Atom input
+sub parse_input
+{
+ my $self = shift;
+ my $atom_api = XML::Atom::Client->new;
+ my $atom_feed = $atom_api->getFeed( $self->{source} );
+
+ # parse values from top of structure
+ my ( %feed, @entries, $entry );
+ @entries = $atom_feed->entries;
+ foreach $entry ( @entries ) {
+ # save the data record
+ my $id = extract_value( $entry->id() );
+ my $title = extract_value( $entry->title() );
+ my $author = extract_value( $entry->author() );
+ my $link = extract_value( $entry->link() );
+ my $updated = extract_value( $entry->updated() );
+ my $summary = extract_value( $entry->summary() );
+ my $content = extract_value( $entry->content() );
+ my $xml = $entry->as_xml();
+ push @{$self->{data}{records}},
+ [ $id, $updated, $title, $author, $link, $summary,
+ $content, $xml ];
+ }
+}
+
+1;
+__END__
+# POD docs follow
+
+=head1 NAME
+
+WebFetch::Input::Atom - download and save an Atom feed
+
+=head1 SYNOPSIS
+
+In perl scripts:
+
+C<use WebFetch::Input::Atom;>
+
+From the command line:
+
+C<perl -w -MWebFetch::Input::Atom -e "&fetch_main" -- --dir directory
+ --source atom-feed-url [...WebFetch output options...]>
+
+=head1 DESCRIPTION
+
+This module gets the current headlines from a remote Atom feed.
+
+The I<--source> parameter specifies the URL of the Atom feed to read.
+Each entry in the feed becomes one WebFetch data record with the fields
+C<id>, C<updated>, C<title>, C<author>, C<link>, C<summary>, C<content>
+and C<xml>.
+
+The records are then handed back to WebFetch, which writes them out
+according to the output options given on the command line.
+
+=head1 Atom FORMAT
+
+Atom is an XML format defined at http://atompub.org/rfc4287.html
+
+WebFetch::Input::Atom uses Perl's XML::Atom::Client to parse Atom feeds.
+
+=head1 AUTHOR
+
+WebFetch was written by Ian Kluft.
+Send patches, bug reports, suggestions and questions to
+C<ma...@we...>.
+
+=head1 SEE ALSO
+
+=for html
+<a href="WebFetch.html">WebFetch</a>
+
+=for text
+WebFetch
+
+=for man
+WebFetch
+
+=cut
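
The extract_value() helper in Atom.pm accepts either a plain string or an object and returns a trimmed string or undef. The sketch below is a condensed version of the same logic that can be run on its own. C<Demo::Stringable> is a hypothetical stand-in for an object with an C<as_string> method; whether a given XML::Atom accessor returns a string or an object is not shown in the diff.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed );

# Hypothetical class used only to exercise the object branch.
{
    package Demo::Stringable;
    sub new       { my ( $class, $text ) = @_; return bless { text => $text }, $class; }
    sub as_string { return $_[0]{text}; }
}

# Condensed form of extract_value() from the diff.
sub extract_value {
    my $thing = shift;
    defined $thing or return undef;
    if ( ref $thing ) {
        # plain HASH/ARRAY refs and objects without as_string give undef
        return undef unless blessed $thing and $thing->can( "as_string" );
        return $thing->as_string;
    }
    $thing =~ s/\s+$//s;                     # trim trailing whitespace
    return length $thing ? $thing : undef;   # empty string counts as missing
}

print extract_value( "Headline  \n" ), "\n";
```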
Added: branches/v0.11/lib/WebFetch/Output/TT.pm
===================================================================
--- branches/v0.11/lib/WebFetch/Output/TT.pm (rev 0)
+++ branches/v0.11/lib/WebFetch/Output/TT.pm 2009-08-25 22:19:47 UTC (rev 34)
@@ -0,0 +1,106 @@
+#
+# WebFetch::Output::TT - save data via the Perl Template Toolkit
+#
+# Copyright (c) 1998-2009 Ian Kluft. This program is free software; you can
+# redistribute it and/or modify it under the terms of the GNU General Public
+# License Version 3. See http://www.webfetch.org/GPLv3.txt
+
+package WebFetch::Output::TT;
+
+use strict;
+use base "WebFetch";
+
+use Carp;
+use Template;
+
+# define exceptions/errors
+use Exception::Class (
+ "WebFetch::Output::TT::Exception::Template" => {
+ isa => "WebFetch::TracedException",
+ alias => "throw_template",
+ description => "error during template processing",
+ },
+
+);
+
+
+# set defaults
+
+our @Options = ( "template=s", "tt_include:s" );
+our $Usage = "--template template-file [--tt_include include-path]";
+
+# no user-serviceable parts beyond this point
+
+# register capabilities with WebFetch
+__PACKAGE__->module_register( "cmdline", "output:tt" );
+
+# Perl Template Toolkit format handler
+sub fmt_handler_tt
+{
+ my $self = shift;
+ my $filename = shift;
+ my $output;
+
+ # configure and create template object
+ my %tt_config = (
+ ABSOLUTE => 1,
+ RELATIVE => 1,
+ );
+ if ( exists $self->{tt_include}) {
+ $tt_config{INCLUDE_PATH} = $self->{tt_include}
+ }
+ my $template = Template->new( \%tt_config );
+
+ # process template
+ my $result = $template->process( $self->{template}, $self->{data},
+ \$output );
+
+ $result or throw_template ( $template->error());
+
+ $self->raw_savable( $filename, $output );
+ 1;
+}
+
+1;
+__END__
+# POD docs follow
+
+=head1 NAME
+
+WebFetch::Output::TT - save data via the Perl Template Toolkit
+
+=head1 SYNOPSIS
+
+In perl scripts:
+
+C<use WebFetch::Output::TT;>
+
+From the command line:
+
+C<perl -w -MWebFetch::Output::TT -e "&fetch_main" -- --dir directory
+ --dest_format tt --dest dest-path [...WebFetch output options...]>
+
+=head1 DESCRIPTION
+
+This module saves output via the Perl Template Toolkit.
+
+TODO: add description
+
+=head1 AUTHOR
+
+WebFetch was written by Ian Kluft.
+Send patches, bug reports, suggestions and questions to
+C<ma...@we...>.
+
+=head1 SEE ALSO
+
+=for html
+<a href="WebFetch.html">WebFetch</a>
+
+=for text
+WebFetch
+
+=for man
+WebFetch
+
+=cut
|
|
From: <ik...@us...> - 2009-08-25 22:17:42
|
Revision: 33
http://webfetch.svn.sourceforge.net/webfetch/?rev=33&view=rev
Author: ikluft
Date: 2009-08-25 22:17:25 +0000 (Tue, 25 Aug 2009)
Log Message:
-----------
code modernization
version bumped to 0.12
WebFetch.pm: eval wrapper for main
changes to plugin API for automated selection of plugin modules
changed command line processing
replaced "fetch" capability string with "input" group
Modified Paths:
--------------
branches/v0.11/lib/WebFetch/Input/PerlStruct.pm
branches/v0.11/lib/WebFetch/Input/RSS.pm
branches/v0.11/lib/WebFetch/Input/SiteNews.pm
branches/v0.11/lib/WebFetch/Output/Dump.pm
branches/v0.11/lib/WebFetch.pm
Modified: branches/v0.11/lib/WebFetch/Input/PerlStruct.pm
===================================================================
--- branches/v0.11/lib/WebFetch/Input/PerlStruct.pm 2009-08-22 05:44:49 UTC (rev 32)
+++ branches/v0.11/lib/WebFetch/Input/PerlStruct.pm 2009-08-25 22:17:25 UTC (rev 33)
@@ -14,7 +14,7 @@
use Carp;
our $format;
-our @Options = ( "format:s" => \$format );
+our @Options = ( "format:s" );
our $Usage = "";
# configuration parameters
@@ -23,8 +23,8 @@
# no user-serviceable parts beyond this point
-# register with WebFetch to provide "fetch" capability
-__PACKAGE__->module_register( "fetch", "input:perlstruct" );
+# register capabilities with WebFetch
+__PACKAGE__->module_register( "input:perlstruct" );
sub fetch
{
Modified: branches/v0.11/lib/WebFetch/Input/RSS.pm
===================================================================
--- branches/v0.11/lib/WebFetch/Input/RSS.pm 2009-08-22 05:44:49 UTC (rev 32)
+++ branches/v0.11/lib/WebFetch/Input/RSS.pm 2009-08-25 22:17:25 UTC (rev 33)
@@ -27,8 +27,8 @@
# no user-serviceable parts beyond this point
-# register with WebFetch to provide "fetch" capability
-__PACKAGE__->module_register( "fetch" );
+# register capabilities with WebFetch
+__PACKAGE__->module_register( "input:rss" );
# called from WebFetch main routine
sub fetch
Modified: branches/v0.11/lib/WebFetch/Input/SiteNews.pm
===================================================================
--- branches/v0.11/lib/WebFetch/Input/SiteNews.pm 2009-08-22 05:44:49 UTC (rev 32)
+++ branches/v0.11/lib/WebFetch/Input/SiteNews.pm 2009-08-25 22:17:25 UTC (rev 33)
@@ -14,23 +14,21 @@
use Date::Calc qw(Today Delta_Days Month_to_Text);
# set defaults
-our ( @input, $short_path, $long_path, $cat_priorities, $now, $nowstamp );
-our $short_path = undef;
-our $long_path = undef;
+our ( $cat_priorities, $now, $nowstamp );
our @Options = (
- "input=s@" => \@input,
- "short=s" => \$short_path,
- "long=s" => \$long_path);
-our $Usage = "--input news-file --short short-output-file --long long-output-file";
+ "short=s",
+ "long=s",
+);
+our $Usage = "--short short-output-file --long long-output-file";
# configuration parameters
our $num_links = 5;
# no user-serviceable parts beyond this point
-# register with WebFetch to provide "fetch" capability
-__PACKAGE__->module_register( "fetch", "input:sitenews" );
+# register capabilities with WebFetch
+__PACKAGE__->module_register( "cmdline", "input:sitenews" );
# constants for state names
sub initial_state { 0; }
@@ -72,13 +70,15 @@
$nowstamp = sprintf "%04d%02d%02d", @$now;
# parse data file
- my $input;
- foreach $input ( @input ) {
- $self->parse_input( $input );
+ my $source;
+ if (( exists $self->{sources}) and ( ref $self->{sources} eq "ARRAY" )) {
+ foreach $source ( @{$self->{sources}}) {
+ $self->parse_input( $source );
+ }
}
# set parameters for the short news format
- if ( defined $short_path ) {
+ if ( defined $self->{short_path} ) {
# create the HTML actions list
$self->{actions}{html} = [];
@@ -123,18 +123,18 @@
};
# put parameters for fmt_handler_html() on the html list
- push @{$self->{actions}{html}}, [ $short_path, $params ];
+ push @{$self->{actions}{html}}, [ $self->{short_path}, $params ];
}
# set parameters for the long news format
- if ( defined $long_path ) {
+ if ( defined $self->{long_path} ) {
# create the SiteNews-specific action list
# It will use WebFetch::Input::SiteNews::fmt_handler_sitenews_long()
# which is defined in this file
$self->{actions}{sitenews_long} = [];
# put parameters for fmt_handler_sitenews_long() on the list
- push @{$self->{actions}{sitenews_long}}, [ $long_path ];
+ push @{$self->{actions}{sitenews_long}}, [ $self->{long_path} ];
}
}
@@ -300,7 +300,7 @@
push @long_text, "</dl>";
# store it for later save to disk
- $self->html_savable( $long_path, join("\n",@long_text)."\n" );
+ $self->html_savable( $self->{long_path}, join("\n",@long_text)."\n" );
}
#---------------------------------------------------------------------------
@@ -342,8 +342,9 @@
if (( defined $entry->{category}) and
( defined $cat_priorities->{$entry->{category}}))
{
- return $cat_priorities->{$entry->{category}} + $age * 0.025
- + $bonus;
+ my $cat_pri = ( exists $cat_priorities->{$entry->{category}})
+ ? $cat_priorities->{$entry->{category}} : 0;
+ return $cat_pri + $age * 0.025 + $bonus;
} else {
return $cat_priorities->{"default"} + $age * 0.025
+ $bonus;
@@ -369,16 +370,16 @@
From the command line:
C<perl -w -MWebFetch::Input::SiteNews -e "&fetch_main" -- --dir directory
- --input news-file --short short-form-output-file
+ --source news-file --short short-form-output-file
--long long-form-output-file>
=head1 DESCRIPTION
This module gets the current headlines from a site-local file.
-The I<--input> parameter specifies a file name which contains news to be
+The I<--source> parameter specifies a file name which contains news to be
posted. See L<"FILE FORMAT"> below for details on contents to put in the
-file. I<--input> may be specified more than once, allowing a single news
+file. I<--source> may be specified more than once, allowing a single news
output to come from more than one input. For example, one file could be
manually maintained in CVS or RCS and another could be entered from a
web form.
Modified: branches/v0.11/lib/WebFetch/Output/Dump.pm
===================================================================
--- branches/v0.11/lib/WebFetch/Output/Dump.pm 2009-08-22 05:44:49 UTC (rev 32)
+++ branches/v0.11/lib/WebFetch/Output/Dump.pm 2009-08-25 22:17:25 UTC (rev 33)
@@ -32,8 +32,8 @@
# no user-serviceable parts beyond this point
-# register with WebFetch to provide "fetch" capability
-__PACKAGE__->module_register( "save", "output:dump" );
+# register capabilities with WebFetch
+__PACKAGE__->module_register( "output:dump" );
# Perl structure dump format handler
sub fmt_handler_dump
@@ -65,7 +65,7 @@
=head1 DESCRIPTION
-This module gets the current headlines from a site-local file.
+This module gets the current news headlines from a site-local file.
TODO: add description
Modified: branches/v0.11/lib/WebFetch.pm
===================================================================
--- branches/v0.11/lib/WebFetch.pm 2009-08-22 05:44:49 UTC (rev 32)
+++ branches/v0.11/lib/WebFetch.pm 2009-08-25 22:17:25 UTC (rev 33)
@@ -118,28 +118,72 @@
# define exceptions/errors
use Exception::Class (
'WebFetch::Exception',
+ 'WebFetch::TracedException' => {
+ isa => 'WebFetch::Exception',
+ },
+ 'WebFetch::Exception::GetoptError' => {
+ isa => 'WebFetch::Exception',
+ alias => 'throw_getopt_error',
+ description => "software error during command line processing",
+ },
+
+ 'WebFetch::Exception::Usage' => {
+ isa => 'WebFetch::Exception',
+ alias => 'throw_cli_usage',
+ description => "command line processing failed",
+ },
+
'WebFetch::Exception::Save' => {
isa => 'WebFetch::Exception',
+ alias => 'throw_save_error',
description => "an error occurred while saving the data",
- trace => 0,
},
+ 'WebFetch::Exception::NoSave' => {
+ isa => 'WebFetch::Exception',
+ alias => 'throw_no_save',
+ description => "unable to save data because of no data or nowhere to save it",
+ },
+
+ 'WebFetch::Exception::NoInputHandler' => {
+ isa => 'WebFetch::Exception',
+ alias => 'throw_no_input_handler',
+ description => "no input handler was found",
+ },
+
'WebFetch::Exception::MustOverride' => {
- isa => 'WebFetch::Exception',
+ isa => 'WebFetch::TracedException',
description => "A WebFetch function was called which is "
."supposed to be overridden by a subclass",
- trace => 1,
},
+
'WebFetch::Exception::NetworkGet' => {
isa => 'WebFetch::Exception',
description => "Failed to access RSS feed",
- trace => 0,
},
+ 'WebFetch::Exception::ModLoadFailure' => {
+ isa => 'WebFetch::Exception',
+ alias => 'throw_mod_load_failure',
+ description => "failed to load a WebFetch Perl module",
+ },
+
+ 'WebFetch::Exception::ModRunFailure' => {
+ isa => 'WebFetch::Exception',
+ alias => 'throw_mod_run_failure',
+ description => "failed to run a WebFetch module",
+ },
+
+ 'WebFetch::Exception::ModNoRunModule' => {
+ isa => 'WebFetch::Exception',
+ alias => 'throw_no_run',
+ description => "no module was found to run the request",
+ },
+
);
-our $VERSION = '0.11-pre3';
+our $VERSION = '0.12';
our %default_modules = (
"input" => {
"rss" => "WebFetch::Input::RSS",
@@ -150,6 +194,7 @@
},
"output" => {
"rss" => "WebFetch::Output::RSS",
+ "atom" => "WebFetch::Output::Atom",
"tt" => "WebFetch::Output::TT",
"perlstruct" => "WebFetch::Output::PerlStruct",
"dump" => "WebFetch::Output::Dump",
@@ -173,10 +218,11 @@
For the $module parameter, the Perl module should provide its own
name, usually via the __PACKAGE__ string.
-The @capabilities array is any number of strings as needed to list the capabilities which
-the module performs for the WebFetch API.
-The currently-recognized capabilities are "fetch" and "save".
-However, the function will save all the capability names that the module provides.
+The @capabilities array is any number of strings as needed to list the
+capabilities which the module performs for the WebFetch API.
+The currently-recognized capabilities are "cmdline", "input" and "output".
+However, the function will save all the capability names that the module
+provides.
=cut
@@ -210,13 +256,118 @@
}
}
+# satisfy POD coverage test - but don't put this function in the user manual
+=pod
+=cut
+# module selection - choose WebFetch module based on selected file format
+# for WebFetch internal use only
+sub module_select
+{
+ my $capability = shift;
+ my $is_optional = shift;
+
+ $debug and print STDERR "debug: "
+ ."module_select($capability,$is_optional)\n";
+ # parse the capability string
+ my ( $group, $topic );
+ if ( $capability =~ /([^:]*):(.*)/ ) {
+ $group = $1;
+ $topic = $2
+ } else {
+ $topic = $capability;
+ }
+
+ # check for modules to handle the specified source_format
+ my ( @handlers, %handlers, $handler );
+
+ # consider whether a group is in use (single or double-level scan)
+ if ( $group ) {
+ # double-level scan
+
+ # if the group exists, search in it
+ if (( exists $modules{$group}{$topic} )
+ and ( ref $modules{$group}{$topic} eq "ARRAY" ))
+ {
+ # search group for topic
+ foreach $handler (@{$modules{$group}{$topic}})
+ {
+ if ( !exists $handlers{$handler}) {
+ push @handlers, $handler;
+ $handlers{$handler} = 1;
+ }
+ }
+
+ # otherwise check the defaults
+ } elsif ( exists $default_modules{$group}{$topic} ) {
+ # check default handlers
+ $handler = $default_modules{$group}{$topic};
+ if ( !exists $handlers{$handler}) {
+ push @handlers, $handler;
+ $handlers{$handler} = 1;
+ }
+ }
+ } else {
+ # single-level scan
+
+ # if the topic exists, the search is a success
+ if (( exists $modules{$topic})
+ and ( ref $modules{$topic} eq "ARRAY" ))
+ {
+ @handlers = @{$modules{$topic}};
+ }
+ }
+
+ # check if any handlers were found for this input format
+ if ( ! @handlers and ! $is_optional ) {
+ throw_no_input_handler( "handler not found for $capability" );
+ }
+
+ $debug and print STDERR "debug: module_select: "
+ .join( " ", @handlers )."\n";
+ return @handlers;
+}
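
module_select() begins by splitting the capability string at its first colon into a group and a topic. The sketch below isolates that step with the same regular expression; the helper name C<parse_capability> is hypothetical.

```perl
use strict;
use warnings;

# Split a capability string into ( group, topic ) using the same regex
# as module_select(); group is undef when the string has no colon.
sub parse_capability {
    my ( $capability ) = @_;
    if ( $capability =~ /([^:]*):(.*)/ ) {
        return ( $1, $2 );
    }
    return ( undef, $capability );
}

for my $cap ( "input:atom", "cmdline" ) {
    my ( $group, $topic ) = parse_capability( $cap );
    printf "%s: group=%s topic=%s\n", $cap, $group // "(none)", $topic;
}
```

A string that starts with a colon yields an empty group, which module_select() treats as false and therefore handles with the single-level scan.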
+
+# satisfy POD coverage test - but don't put this function in the user manual
+=pod
+=cut
+
+# if no input or output format was specified, but only 1 is registered, pick it
+# $group parameter should be config group to search, i.e. "input" or "output"
+# returns the format string which will be provided
+sub singular_handler
+{
+ my $group = shift;
+
+ $debug and print STDERR "debug: singular_handler($group)\n";
+ my $count = 0;
+ my ( $entry, $last );
+ foreach $entry ( keys %{$modules{$group}} ) {
+ if ( ref $modules{$group}{$entry} eq "ARRAY" ) {
+ my $entry_count = scalar @{$modules{$group}{$entry}};
+ $count += $entry_count;
+ if ( $count > 1 ) {
+ return undef;
+ }
+ if ( $entry_count == 1 ) {
+ $last = $entry;
+ }
+ }
+ }
+
+ # if there's only one registered, that's the one to use
+ $debug and print STDERR "debug: singular_handler: "
+ ."count=$count last=$last\n";
+ return $count == 1 ? $last : undef;
+}
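
singular_handler() picks a format automatically only when exactly one handler is registered in the whole group. The sketch below is a compact equivalent over a hypothetical registry. The nesting of group, then format, then list of packages is inferred from how the diff indexes C<%modules>, and the function name is an assumption.

```perl
use strict;
use warnings;

# Hypothetical registry: group -> format -> list of handler packages.
my %modules = (
    input  => { atom => [ "WebFetch::Input::Atom" ] },
    output => {
        tt   => [ "WebFetch::Output::TT" ],
        dump => [ "WebFetch::Output::Dump" ],
    },
);

# Return the only registered format in a group, or undef when the group
# has no handlers or more than one.
sub singular_format {
    my ( $group ) = @_;
    my @formats = grep {
        ref $modules{$group}{$_} eq "ARRAY" and @{ $modules{$group}{$_} }
    } keys %{ $modules{$group} || {} };
    my $handlers = 0;
    $handlers += @{ $modules{$group}{$_} } for @formats;
    return $handlers == 1 ? $formats[0] : undef;
}

print "input default: ", singular_format( "input" ), "\n";
```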
+
+
=item fetch_main
This function is exported into the main package.
-For all modules which registered with the "fetch" capability at the time
-this is called, it will call the run() function on behalf of each of the
-packages.
+For all modules which registered with an "input" capability for the requested
+file format at the time this is called, it will call the run() function on
+behalf of each of the packages.
=cut
@@ -224,20 +375,159 @@
# This eliminates the need for the sub-packages to export their own
# fetch_main(), which users found conflicted with each other when
# loading more than one WebFetch-derived module.
+=head2 eval_wrapper ( $code, $throw_func, [ name => value, ...] )
+
+=cut
+
+# fetch_main - eval wrapper for fetch_main2 to catch and display errors
sub main::fetch_main
{
- my ( $pkgname );
+ # run fetch_main2 in an eval so we can catch exceptions
+ my $result = eval { &WebFetch::fetch_main2; };
- # loop through the packages which registered with fetch capability
- print STDERR "WebFetch: fetch_main\n";
- foreach $pkgname ( @{$modules{fetch}}) {
- print STDERR "WebFetch: running for $pkgname\n";
- eval "\&WebFetch::run(\$pkgname)";
+ # process any error/exception that we may have gotten
+ if ( $@ ) {
+ my $ex = $@;
+
+ # determine if there's an error message available to display
+ my $pkg = __PACKAGE__;
+ if ( ref $ex ) {
+ if ( my $ex_cap = Exception::Class->caught(
+ "WebFetch::Exception"))
+ {
+ if ( $ex_cap->isa( "WebFetch::TracedException" )) {
+ warn $ex_cap->trace->as_string, "\n";
+ }
+
+ die "$pkg: ".$ex_cap->error."\n";
+ }
+ if ( $ex->can("stringify")) {
+ # Error.pm, possibly others
+ die "$pkg: ".$ex->stringify."\n";
+ } elsif ( $ex->can("as_string")) {
+ # generic - should work for many classes
+ die "$pkg: ".$ex->as_string."\n";
+ } else {
+ die "$pkg: unknown exception of type "
+ .(ref $ex)."\n";
+ }
+ } else {
+ die "$pkg: $@\n";
+ }
+ }
+
+ # success
+ exit 0;
+}
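
The wrapper above runs the real work inside an eval and then decides how to report whatever was thrown. The sketch below shows the same dispatch using only core Perl. C<Demo::Exception> is a hand-rolled stand-in for the Exception::Class hierarchy declared in WebFetch.pm, and C<run_and_report> returns the message instead of calling die; both are assumptions made for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed );

# Minimal exception class standing in for WebFetch::Exception.
{
    package Demo::Exception;
    sub throw { my ( $class, $msg ) = @_; die bless { error => $msg }, $class; }
    sub error { return $_[0]{error}; }
}

# Run $code in an eval and turn whatever it threw into one message
# string; returns undef on success.
sub run_and_report {
    my ( $code ) = @_;
    my $ok = eval { $code->(); 1 };
    return undef if $ok;
    my $ex = $@;
    if ( blessed $ex and $ex->can( "error" ) ) {
        return "WebFetch: " . $ex->error;
    } elsif ( ref $ex ) {
        return "WebFetch: unknown exception of type " . ref $ex;
    }
    chomp $ex;
    return "WebFetch: $ex";
}

print run_and_report( sub { Demo::Exception->throw( "handler not found" ) } ), "\n";
```

Ending the eval block with a true value and testing that result, rather than testing C<$@> alone, is a common way to avoid missing an error when C<$@> is cleared by a destructor.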
+
+
+sub fetch_main2
+{
+ # search for modules which have registered "cmdline" capability
+ # collect their command line options
+ my ( $cli_mod, @mod_options, @mod_usage );
+ if (( exists $modules{cmdline} )
+ and ( ref $modules{cmdline} eq "ARRAY" ))
+ {
+ foreach $cli_mod ( @{$modules{cmdline}}) {
+ # package name is held in a variable, so use symbolic refs
+ no strict 'refs';
+ if ( @{$cli_mod."::Options"} ) {
+ push @mod_options, @{$cli_mod."::Options"};
+ }
+ if ( defined ${$cli_mod."::Usage"} ) {
+ push @mod_usage, ${$cli_mod."::Usage"};
+ }
+ }
+ }
+
+ # process command line
+ my ( $result, %options );
+ $result = eval { GetOptions ( \%options,
+ "dir:s",
+ "group:s",
+ "mode:s",
+ "source=s",
+ "source_format:s",
+ "dest=s",
+ "dest_format:s",
+ "quiet",
+ "debug",
+ @mod_options ) };
+ if ( $@ ) {
+ throw_getopt_error ( "command line processing failed: $@" );
+ } elsif ( ! $result ) {
+ throw_cli_usage ( "usage: $0 --dir dirpath "
+ ."[--group group] [--mode mode] "
+ ."[--source file] [--source_format fmt-string] "
+ ."[--dest file] [--dest_format fmt-string] "
+ ."[--quiet] ".join( " ", @mod_usage ));
+ }
+
+ # set debugging mode
+ if (( exists $options{debug}) and $options{debug}) {
+ $debug = 1;
+ }
+ $debug and print STDERR "debug: fetch_main\n";
+
+
+ # if either source/input or dest/output formats were not provided,
+ # check if only one handler is defined - if so that's the one to use
+ if ( !exists $options{source_format}) {
+ if ( my $fmt = singular_handler( "input" )) {
+ $options{source_format} = $fmt;
+ }
+ }
+ if ( !exists $options{dest_format}) {
+ if ( my $fmt = singular_handler( "output" )) {
+ $options{dest_format} = $fmt;
+ }
+ }
+
+ # check for modules to handle the specified source_format
+ my ( @handlers, %handlers );
+ if (( exists $modules{input}{ $options{source_format}} )
+ and ( ref $modules{input}{ $options{source_format}}
+ eq "ARRAY" ))
+ {
+ my $handler;
+ foreach $handler (@{$modules{input}{$options{source_format}}})
+ {
+ if ( !exists $handlers{$handler}) {
+ push @handlers, $handler;
+ $handlers{$handler} = 1;
+ }
+ }
+ }
+ if ( exists $default_modules{ $options{source_format}} ) {
+ my $handler = $default_modules{ $options{source_format}};
+ if ( !exists $handlers{$handler}) {
+ push @handlers, $handler;
+ $handlers{$handler} = 1;
+ }
+ }
+
+ # check if any handlers were found for this input format
+ if ( ! @handlers ) {
+ throw_no_input_handler( "input handler not found for "
+ .$options{source_format});
+ }
+
+ # run the available handlers until one succeeds or none are left
+ my $pkgname;
+ my $run_count = 0;
+ foreach $pkgname ( @handlers ) {
+ $debug and print STDERR "debug: running for $pkgname\n";
+ eval { &WebFetch::run( $pkgname, \%options )};
if ( $@ ) {
print STDERR "WebFetch: run eval error: $@\n";
+ } else {
+ $run_count++;
+ last;
}
}
-
+ if ( $run_count == 0 ) {
+ throw_no_run( "no handlers were able or available to process "
+ ."source format" );
+ }
}
=item Do not use the new() function directly from WebFetch.
@@ -286,6 +576,24 @@
}
}
+=item WebFetch::mod_load ( $class )
+
+This specifies a WebFetch module (Perl class) which needs to be loaded.
+In case of an error, it throws an exception.
+
+=cut
+
+sub mod_load
+{
+ my $pkg = shift;
+
+ # make sure we have the run package loaded
+ eval "require $pkg";
+ if ( $@ ) {
+ throw_mod_load_failure( "failed to load $pkg: $@" );
+ }
+}
+
=item WebFetch::run
This function can be called by the C<main::fetch_main> function
@@ -314,8 +622,8 @@
(optional) save a copy of the fetched info
in the file named by this parameter.
-The contents of the file are determined by the C<--save_format> parameter.
-If C<--save_format> isn't defined but only one module has registered a
+The contents of the file are determined by the C<--dest_format> parameter.
+If C<--dest_format> isn't defined but only one module has registered a
file format for saving, then that will be used by default.
=item --quiet
@@ -346,35 +654,42 @@
sub run
{
my $run_pkg = shift;
- my ( $obj, $dir, $group, $mode,
- $dest, $save_format,
- $quiet, $source, $source_format );
+ my $options_ref = shift;
+ my $obj;
- my $result = GetOptions (
- "dir=s" => \$dir,
- "group:s" => \$group,
- "mode:s" => \$mode,
- "dest:s" => \$dest,
- "save_format:s" => \$save_format,
- "source:s" => \$source,
- "quiet" => \$quiet,
- "debug" => \$debug,
- ( eval "defined \@".$run_pkg."::Options" )
- ? eval "\@".$run_pkg."::Options"
- : ());
- if ( ! $result ) {
- print STDERR "usage: $0 --dir dirpath "
- ."[--group group] [--mode mode] "
- ."[--save file] [--save_format fmt-string] "
- ."[--quiet]\n";
- if ( eval "defined \$".$run_pkg."::Usage" ) {
- print STDERR " "
- .( eval "\$".$run_pkg."::Usage" )."\n";
- }
- exit 1;
- }
- $debug and print STDERR "WebFetch: entered run for $run_pkg\n";
+ #my ( $obj, $dir, $group, $mode,
+ # $dest, $dest_format,
+ # $quiet, $source, $source_format );
+ #my $result = GetOptions (
+ # "dir=s" => \$dir,
+ # "group:s" => \$group,
+ # "mode:s" => \$mode,
+ # "dest:s" => \$dest,
+ # "dest_format:s" => \$dest_format,
+ # "source:s" => \$source,
+ # "source_format:s" => \$source_format,
+ # "quiet" => \$quiet,
+ # "debug" => \$debug,
+ # ( eval "defined \@".$run_pkg."::Options" )
+ # ? eval "\@".$run_pkg."::Options"
+ # : ());
+ #if ( ! $result ) {
+ # print STDERR "usage: $0 --dir dirpath "
+ # ."[--group group] [--mode mode] "
+ # ."[--source file] [--source_format fmt-string] "
+ # ."[--dest file] [--dest_format fmt-string] "
+ # ."[--quiet]\n";
+ # if ( eval "defined \$".$run_pkg."::Usage" ) {
+ # print STDERR " "
+ # .( eval "\$".$run_pkg."::Usage" )."\n";
+ # }
+ # exit 1;
+ #}
+ $debug and print STDERR "debug: entered run for $run_pkg\n";
+ # make sure we have the run package loaded
+ mod_load $run_pkg;
+
# Note: in order to add WebFetch-embedding capability, the fetch
# routine saves its raw data without any HTML/XML/etc formatting
# in @{$obj->{data}} and data-to-savable conversion routines in
@@ -386,45 +701,40 @@
# create the new object
# this also calls the $obj->fetch() routine for the module which
# has inherited from WebFetch to do this
- $obj = eval 'new '.$run_pkg.' (
- "dir" => $dir,
- (defined $group) ? ( "group" => $group ) : (),
- (defined $mode) ? ( "mode" => $mode ) : (),
- (defined $debug) ? ( "debug" => $debug ) : (),
- (defined $dest) ? ( "dest" => $dest ) : (),
- (defined $save_format) ? ( "save_format" => $save_format ) : (),
- (defined $source) ? ( "source" => $source ) : (),
- (defined $quiet) ? ( "quiet" => $quiet ) : (),
- )';
+ $debug and print STDERR "debug: run before new\n";
+ $obj = eval $run_pkg."->new( \%\$options_ref )";
if ( $@ ) {
- print STDERR "WebFetch: error: $@\n";
- exit 1;
+ throw_mod_run_failure( "module run failure: ".$@ );
}
- # if the object had the data for the WebFetch-embedding API,
+ # if the object had data for the WebFetch-embedding API,
# then data processing is external to the fetch routine
# (This externalizes the data for other software to capture it.)
+ $debug and print STDERR "debug: run before output\n";
+ my $dest_format = $options_ref->{dest_format};
if (( defined $obj->{data}) and ( defined $obj->{actions})) {
-
if ( defined $obj->{dest}) {
( defined $obj->{actions}) or $obj->{actions} = {};
- ( defined $obj->{actions}{$save_format})
- or $obj->{actions}{$save_format} = [];
- push @{$obj->{actions}{$save_format}}, [ $obj->{dest} ];
+ ( defined $obj->{actions}{$dest_format})
+ or $obj->{actions}{$dest_format} = [];
+ push @{$obj->{actions}{$dest_format}}, [ $obj->{dest} ];
}
# perform requested actions on the data
$obj->do_actions();
+ } else {
+ throw_no_save( "save failed: no data or nowhere to save it" );
}
- $result = $obj->save();
+ $debug and print STDERR "debug: run before save\n";
+ my $result = $obj->save();
if ( ! $result ) {
my $savable;
foreach $savable ( @{$obj->{savable}}) {
(ref $savable eq "HASH") or next;
if ( defined $savable->{error}) {
- WebFetch::Exception::Save->throw(
- "error saving in ".$obj->{dir}
+ throw_save_error( "error saving in "
+ .$obj->{dir}
."file: ".$savable->{file}
."error: " .$savable->{error} );
}
@@ -664,7 +974,8 @@
foreach $action_spec ( keys %{$self->{actions}} ) {
my $handler_ref;
- # check if there's a handler function for this action
+ # check for modules to handle the specified dest_format
+ my ( @handlers, %handlers );
my $action_handler = "fmt_handler_".$action_spec;
if ( exists $modules{output}{$action_spec}) {
my $class;
@@ -816,8 +1127,7 @@
sub fetch
{
WebFetch::Exception::MustOverride->throw(
- "WebFetch: fetch() "
- ."function must be overridden by a derived module\n" );
+ "fetch() function must be overridden by a derived module\n" );
}
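
The handler selection added to fetch_main2 in this revision (collect the registered handlers for a format, append the default handler, skip duplicates, then run each until one succeeds) can be sketched in isolation as below. This is a minimal illustration rather than the WebFetch code itself: the "demo" format, the Demo::Broken and Demo::Works classes, and the helper names handlers_for and run_first_working are all hypothetical, and plain die stands in for the throw_* exception helpers used in the commit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical registries standing in for WebFetch's %modules and
# %default_modules; the format and class names are made up.
my %modules = (
    input => { demo => [ "Demo::Broken", "Demo::Works", "Demo::Broken" ] },
);
my %default_modules = ( demo => "Demo::Works" );

# Hypothetical handler classes, defined inline to keep the sketch
# self-contained.
{
    package Demo::Broken;
    sub run { die "simulated failure\n" }

    package Demo::Works;
    sub run { return "ok from ".__PACKAGE__ }
}

# Build an ordered, duplicate-free handler list for a format:
# registered handlers first, then the default handler if not yet listed.
sub handlers_for
{
    my $format = shift;
    my @candidates;
    push @candidates, @{$modules{input}{$format}}
        if ref $modules{input}{$format} eq "ARRAY";
    push @candidates, $default_modules{$format}
        if exists $default_modules{$format};
    my %seen;
    return grep { !$seen{$_}++ } @candidates;
}

# Try each handler in turn and stop at the first one that does not die.
sub run_first_working
{
    my $format = shift;
    my @handlers = handlers_for( $format )
        or die "input handler not found for $format\n";
    foreach my $pkgname ( @handlers ) {
        my $result = eval { $pkgname->run() };
        if ( $@ ) {
            print STDERR "run eval error in $pkgname: $@";
            next;
        }
        return $result;
    }
    die "no handlers were able or available to process source format\n";
}

print run_first_working( "demo" ), "\n";
```

In this sketch the failing handler is reported on STDERR and skipped, so the second handler's result is what gets printed. Keeping a %seen hash alongside the ordered list preserves registration order while avoiding a second run of the same class, which is the same purpose the %handlers hash serves in the diff above.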
|
|
From: Ian K. <ik...@us...> - 2003-02-10 23:07:10
|
Update of /cvsroot/webfetch/src/sub
In directory sc8-pr-cvs1:/tmp/cvs-serv11595
Removed Files:
Makefile
Log Message:
Makefile should be generated from Makefile.PL, not checked into CVS
--- Makefile DELETED ---
|
|
From: Ian K. <ik...@us...> - 2001-08-18 01:56:05
|
Update of /cvsroot/webfetch/src
In directory usw-pr-cvs1:/tmp/cvs-serv21331
Modified Files:
TODO
Log Message:
added a TODO for a module to watch for changes to monitored web pages
Index: TODO
===================================================================
RCS file: /cvsroot/webfetch/src/TODO,v
retrieving revision 1.1.1.1
retrieving revision 1.2
diff -C2 -d -r1.1.1.1 -r1.2
*** TODO 2000/02/28 23:38:13 1.1.1.1
--- TODO 2001/08/18 01:56:02 1.2
***************
*** 22,25 ****
--- 22,26 ----
* ARRL W1AW bulletins
* California State OES bulletins
+ * watch for updates to any web page from a specified list
* others as submitted by users
|