From: <jgr...@us...> - 2003-07-09 18:18:57
Update of /cvsroot/popfile/engine/UI
In directory sc8-pr-cvs1:/tmp/cvs-serv2831/UI
Modified Files:
HTML.pm
Log Message:
PERFORMANCE CHANGES
Bayes.pm: Added new add_messages_to_bucket API to add multiple messages to a
bucket at the same time with a single read/write of the appropriate
corpus table for speed.
New write_line__ method to write a line to a MSG file and optionally
to the parse_line API of MailParse.pm. Now we write a file to disk
and parse it without reloading the MSG file from disk for speed.
The MSG gets a temporary name until the CLS file is written. This prevents
the history from reloading in the middle of a download and ending up with
a message with a class file error.
classify_file becomes classify and can classify either from a file
or from the preparsed information in the parser
classify_and_modify returns the name of the file where the message
was stored in addition to the classification.
HTML.pm: Use add_messages_to_bucket API for reclassification for speed.
Use the new classify method in Bayes.pm to classify a file after it
has been digested by the parser for colorization and get the word
scores. This means we only load the MSG file once (used to be
twice) and hence double the speed of viewing a colorized message.
New methods load_disk_cache__ and save_disk_cache__ are used to
keep a copy of the history cache on disk between sessions so that
session start up is as fast as possible. There will be no need
to parse messages for header information on start up if the
history_cache file is present.
Removed the boundary feature because it is incompatible with the
concept of a "download" since we now send new history file messages
async. through the MQ.
Load the history cache progressively as files are written. The proxies
send the message NEWFL and the method new_history_file__ adds the
file to the history. This is done so that when the user hits the
History tab button after a mail download the history cache is
already loaded and there should be no delay in displaying the
history page.
MailParse.pm: Renamed parse_stream to parse_file since that's a better name
New start_parse, stop_parse and parse_line APIs so that a file can
be parsed line by line.
MQ.pm: Defined a new message type NEWFL which is used to indicate that
a file has been added to the history cache. NEWFL's message
is the name of the file (the MSG file) that was added.
POP3.pm: Send the NEWFL message through the pipe to the parent so that
the history is aware of new messages.
SMTP.pm:
NNTP.pm: Send CLASS and NEWFL messages through the pipe to the parent.
insert.pl: Updated to use new parse_file API
bayes.pl: Updated to use the new classify API instead of classify_file.
TEST SUITE CHANGES
tests.pl: New test_assert_regexp function for doing fuzzy matching of
test results.
Returns 0 if all tests run successfully, and 1 if there are
any errors.
TestLogger.tst: New file for testing POPFile::Logger functionality.
Makefile: The test target has a variable TESTARGS that can be set to the
specific module (or modules using glob patterns) to run.
For example: gmake test TESTARGS='TestLogger'
There's a new coverage target to run the test suite and output
code coverage information for the modules used.
TestCoverage.pm: New module that provides line coverage information for
the test suite. Executed as a Perl debugger using the -d
switch and outputs code coverage information for all
POPFile files tested.
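Note on the history_cache format: the load_disk_cache__ and save_disk_cache__
methods described above store one record per message as a boundary line, the
key (the MSG file name), and then seven field lines. The following is a
minimal stand-alone sketch of that round trip, not the code in HTML.pm. The
helper names save_cache and load_cache, the @FIELDS list, and the sample data
are assumptions made for illustration. Unlike the committed code, the sketch
uses three-argument open with error checks and stops on a truncated record.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Field order used for each record (assumed to mirror the order in the log message above)
my @FIELDS = qw( bucket reclassified magnet subject from short_subject short_from );

# Hypothetical helper: write a history hash to $path, one record per key
sub save_cache
{
    my ( $path, $history ) = @_;

    open my $out, '>', $path or die "Cannot write $path: $!";
    print $out "___HISTORY__ __ VERSION__ 1\n";
    foreach my $key ( sort keys %$history ) {
        print $out "__HISTORY__ __BOUNDARY__\n";
        print $out "$key\n";
        print $out "$history->{$key}{$_}\n" foreach @FIELDS;
    }
    close $out;
}

# Hypothetical helper: read the file back, returning a reference to a hash
# (empty if the file is missing or has an unexpected version line)
sub load_cache
{
    my ( $path ) = @_;
    my %history;

    open my $in, '<', $path or return \%history;
    my $first = <$in>;
    if ( defined( $first ) && ( $first =~ /___HISTORY__ __ VERSION__ 1/ ) ) {
        while ( my $line = <$in> ) {
            last if ( $line !~ /__HISTORY__ __BOUNDARY__/ );

            # Read the key line followed by one line per field
            my @values = map { scalar <$in> } ( 'key', @FIELDS );
            last if ( grep { !defined } @values );    # truncated record
            s/[\r\n]//g foreach @values;

            my $key = shift @values;
            @{$history{$key}}{@FIELDS} = @values;
            $history{$key}{cull} = 0;
        }
    }
    close $in;
    return \%history;
}

# Usage: save one made-up entry and read it back
my ( $fh, $path ) = tempfile( UNLINK => 1 );
close $fh;

my %demo = ( 'popfile1=1.msg' => {
    bucket => 'spam', reclassified => 0, magnet => '',
    subject => 'Hello', from => 'a@example.com',
    short_subject => 'Hello', short_from => 'a@example.com' } );

save_cache( $path, \%demo );
my $loaded = load_cache( $path );
print $loaded->{'popfile1=1.msg'}{bucket}, "\n";
```

Because each field occupies exactly one line, a field value containing an
embedded newline would corrupt the record; the sketch, like the format it
illustrates, assumes header values have already been reduced to a single line.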
Index: HTML.pm
===================================================================
RCS file: /cvsroot/popfile/engine/UI/HTML.pm,v
retrieving revision 1.177
retrieving revision 1.178
diff -C2 -d -r1.177 -r1.178
*** HTML.pm 6 Jul 2003 01:18:38 -0000 1.177
--- HTML.pm 9 Jul 2003 18:18:23 -0000 1.178
***************
*** 77,84 ****
--- 77,89 ----
# history_invalid is set to cause the history cache to be reloaded by a call to
# load_history_cache__, and is set by a call to invalidate_history_cache
+ #
+ # If new items have been added to the history then set need_resort__ to 1 to ensure
+ # that the next time a history page is being displayed the appropriate sort, search
+ # and filter is applied
$self->{history__} = {};
$self->{history_keys__} = ();
$self->{history_invalid__} = 0;
+ $self->{need_resort__} = 0;
# A hash containing a mapping between alphanumeric identifiers and appropriate strings used
***************
*** 199,203 ****
# Finally register for the messages that we need to receive
! $self->mq_register_( 'CLASS', $self );
$self->mq_register_( 'UIREG', $self );
$self->mq_register_( 'TICKD', $self );
--- 204,208 ----
# Finally register for the messages that we need to receive
! $self->mq_register_( 'NEWFL', $self );
$self->mq_register_( 'UIREG', $self );
$self->mq_register_( 'TICKD', $self );
***************
*** 237,240 ****
--- 242,246 ----
$self->invalidate_history_cache();
+ $self->load_disk_cache__();
$self->load_history_cache__();
$self->sort_filter_history( '', '', '' );
***************
*** 247,250 ****
--- 253,270 ----
# ---------------------------------------------------------------------------------------------
#
+ # stop
+ #
+ # Called to stop the HTML interface running
+ #
+ # ---------------------------------------------------------------------------------------------
+ sub stop
+ {
+ my ( $self ) = @_;
+
+ $self->save_disk_cache__();
+ }
+
+ # ---------------------------------------------------------------------------------------------
+ #
# deliver
#
***************
*** 266,273 ****
}
! # Invalidate the history cache if a classification occurs
! if ( $type eq 'CLASS' ) {
! $self->invalidate_history_cache();
}
--- 286,294 ----
}
! # Get the new file in the history
! if ( $type eq 'NEWFL' ) {
! $self->new_history_file__( $message );
! $self->{need_resort__} = 1;
}
***************
*** 2363,2366 ****
--- 2384,2474 ----
@{$self->{history_keys__}} = reverse @{$self->{history_keys__}} if ($descending);
+
+ $self->{need_resort__} = 0;
+ }
+
+ # ---------------------------------------------------------------------------------------------
+ #
+ # load_disk_cache__
+ #
+ # Preloads the history__ hash with information from the disk which will have been saved
+ # the last time we shutdown
+ #
+ # ---------------------------------------------------------------------------------------------
+ sub load_disk_cache__
+ {
+ my ( $self ) = @_;
+
+ my $cache_file = $self->global_config_( 'msgdir' ) . 'history_cache';
+ if ( !(-e $cache_file) ) {
+ return;
+ }
+
+ open CACHE, "<$cache_file";
+
+ my $first = <CACHE>;
+
+ if ( $first =~ /___HISTORY__ __ VERSION__ 1/ ) {
+ while ( my $line = <CACHE> ) {
+ if ( !( $line =~ /__HISTORY__ __BOUNDARY__/ ) ) {
+ $self->log_( "Problem in history_cache file, expecting boundary got $line" );
+ last;
+ }
+
+ $line = <CACHE>;
+ $line =~ s/[\r\n]//g;
+ my $key = $line;
+ $line = <CACHE>;
+ $line =~ s/[\r\n]//g;
+ $self->{history__}{$key}{bucket} = $line;
+ $line = <CACHE>;
+ $line =~ s/[\r\n]//g;
+ $self->{history__}{$key}{reclassified} = $line;
+ $line = <CACHE>;
+ $line =~ s/[\r\n]//g;
+ $self->{history__}{$key}{magnet} = $line;
+ $line = <CACHE>;
+ $line =~ s/[\r\n]//g;
+ $self->{history__}{$key}{subject} = $line;
+ $line = <CACHE>;
+ $line =~ s/[\r\n]//g;
+ $self->{history__}{$key}{from} = $line;
+ $line = <CACHE>;
+ $line =~ s/[\r\n]//g;
+ $self->{history__}{$key}{short_subject} = $line;
+ $line = <CACHE>;
+ $line =~ s/[\r\n]//g;
+ $self->{history__}{$key}{short_from} = $line;
+ $self->{history__}{$key}{cull} = 0;
+ }
+ }
+ close CACHE;
+ }
+
+ # ---------------------------------------------------------------------------------------------
+ #
+ # save_disk_cache__
+ #
+ # Save the current state of the history cache so that it can be reloaded next time on startup
+ #
+ # ---------------------------------------------------------------------------------------------
+ sub save_disk_cache__
+ {
+ my ( $self ) = @_;
+
+ open CACHE, '>' . $self->global_config_( 'msgdir' ) . 'history_cache';
+ print CACHE "___HISTORY__ __ VERSION__ 1\n";
+ foreach my $key (keys %{$self->{history__}}) {
+ print CACHE "__HISTORY__ __BOUNDARY__\n";
+ print CACHE "$key\n";
+ print CACHE "$self->{history__}{$key}{bucket}\n";
+ print CACHE "$self->{history__}{$key}{reclassified}\n";
+ print CACHE "$self->{history__}{$key}{magnet}\n";
+ print CACHE "$self->{history__}{$key}{subject}\n";
+ print CACHE "$self->{history__}{$key}{from}\n";
+ print CACHE "$self->{history__}{$key}{short_subject}\n";
+ print CACHE "$self->{history__}{$key}{short_from}\n";
+ }
+ close CACHE;
}
***************
*** 2397,2406 ****
my @history_files = sort compare_mf glob( $self->global_config_( 'msgdir' ) . "popfile*=*.msg" );
- # This will get set the first time we add a new message to the history
- # cache and is used to control where we place boundaries in the history
- # to show where a user left off
-
- my $set_boundary = 0;
-
foreach my $i ( 0 .. $#history_files ) {
--- 2505,2508 ----
***************
*** 2415,2504 ****
if ( defined( $self->{history__}{$history_files[$i]} ) ) {
! $self->{history__}{$history_files[$i]}{cull} = 0;
! $self->{history__}{$history_files[$i]}{index} = $i;
} else {
! # Find the class information for this file using the history_load_class helper
! # function, and then parse the MSG file for the From and Subject information
!
! my ( $reclassified, $bucket, $usedtobe, $magnet ) = $self->{classifier__}->history_load_class( $history_files[$i] );
! my $from = '';
! my $subject = '';
! if ( open MAIL, '<'. $self->global_config_( 'msgdir' ) . "$history_files[$i]" ) {
! while ( <MAIL> ) {
! last if ( /^(\r\n|\r|\n)/ );
! $from = $1 if ( /^From:(.*)/i );
! $subject = $1 if ( /^Subject:(.*)/i );
! last if ( ( $from ne '' ) && ( $subject ne '' ) );
! }
! close MAIL;
! }
! $from = "<$self->{language__}{History_NoFrom}>" if ( $from eq '' );
! $subject = "<$self->{language__}{History_NoSubject}>" if ( !( $subject =~ /[^ \t\r\n]/ ) );
! $from =~ s/\"(.*)\"/$1/g;
! $subject =~ s/\"(.*)\"/$1/g;
! # TODO Interface violation here, need to clean up
! $from = $self->{classifier__}->{parser__}->decode_string( $from );
! $subject = $self->{classifier__}->{parser__}->decode_string( $subject );
! my ( $short_from, $short_subject ) = ( $from, $subject );
! if ( length($short_from)>40 ) {
! $short_from =~ /(.{40})/;
! $short_from = "$1...";
! }
! if ( length($short_subject)>40 ) {
! $short_subject =~ s/=20/ /g;
! $short_subject =~ /(.{40})/;
! $short_subject = "$1...";
! }
$from =~ s/&/&amp;/g;
$from =~ s/</&lt;/g;
$from =~ s/>/&gt;/g;
$short_from =~ s/&/&amp;/g;
$short_from =~ s/</&lt;/g;
$short_from =~ s/>/&gt;/g;
$subject =~ s/&/&amp;/g;
$subject =~ s/</&lt;/g;
$subject =~ s/>/&gt;/g;
$short_subject =~ s/&/&amp;/g;
$short_subject =~ s/</&lt;/g;
$short_subject =~ s/>/&gt;/g;
! $self->{history__}{$history_files[$i]}{bucket} = $bucket;
! $self->{history__}{$history_files[$i]}{reclassified} = $reclassified;
! $self->{history__}{$history_files[$i]}{magnet} = $magnet;
! $self->{history__}{$history_files[$i]}{subject} = $subject;
! $self->{history__}{$history_files[$i]}{from} = $from;
! $self->{history__}{$history_files[$i]}{short_subject} = $short_subject;
! $self->{history__}{$history_files[$i]}{short_from} = $short_from;
! $self->{history__}{$history_files[$i]}{cull} = 0;
! $self->{history__}{$history_files[$i]}{index} = $i;
! $self->{history__}{$history_files[$i]}{boundary} = !$set_boundary;
! $set_boundary = 1;
! }
}
! # Remove any entries from the history that have been removed from disk, see the big
! # comment at the start of this function for more detail
! foreach my $key (keys %{$self->{history__}}) {
! if ( $self->{history__}{$key}{cull} == 1 ) {
! delete $self->{history__}{$key};
! }
! }
! $self->{history_invalid__} = 0;
! $self->sort_filter_history( '', '', '' );
}
--- 2517,2622 ----
if ( defined( $self->{history__}{$history_files[$i]} ) ) {
! $self->{history__}{$history_files[$i]}{cull} = 0;
! $self->{history__}{$history_files[$i]}{index} = $i;
} else {
+ $self->new_history_file__( $history_files[$i], $i );
+ }
+ }
! # Remove any entries from the history that have been removed from disk, see the big
! # comment at the start of this function for more detail
! foreach my $key (keys %{$self->{history__}}) {
! if ( $self->{history__}{$key}{cull} == 1 ) {
! delete $self->{history__}{$key};
! }
! }
! $self->{history_invalid__} = 0;
! $self->sort_filter_history( '', '', '' );
! }
! # ---------------------------------------------------------------------------------------------
! #
! # new_history_file__
! #
! # Adds a new file to the history cache
! #
! # $file The name of the file added
! # $index (optional) The history keys index
! #
! # ---------------------------------------------------------------------------------------------
! sub new_history_file__
! {
! my ( $self, $file, $index ) = @_;
! # Find the class information for this file using the history_load_class helper
! # function, and then parse the MSG file for the From and Subject information
! my ( $reclassified, $bucket, $usedtobe, $magnet ) = $self->{classifier__}->history_load_class( $file );
! my $from = '';
! my $subject = '';
! if ( open MAIL, '<'. $self->global_config_( 'msgdir' ) . $file ) {
! while ( <MAIL> ) {
! last if ( /^(\r\n|\r|\n)/ );
! $from = $1 if ( /^From:(.*)/i );
! $subject = $1 if ( /^Subject:(.*)/i );
! last if ( ( $from ne '' ) && ( $subject ne '' ) );
! }
! close MAIL;
! }
! $from = "<$self->{language__}{History_NoFrom}>" if ( $from eq '' );
! $subject = "<$self->{language__}{History_NoSubject}>" if ( !( $subject =~ /[^ \t\r\n]/ ) );
! $from =~ s/\"(.*)\"/$1/g;
! $subject =~ s/\"(.*)\"/$1/g;
! # TODO Interface violation here, need to clean up
! $from = $self->{classifier__}->{parser__}->decode_string( $from );
! $subject = $self->{classifier__}->{parser__}->decode_string( $subject );
! my ( $short_from, $short_subject ) = ( $from, $subject );
! if ( length($short_from)>40 ) {
! $short_from =~ /(.{40})/;
! $short_from = "$1...";
! }
! if ( length($short_subject)>40 ) {
! $short_subject =~ s/=20/ /g;
! $short_subject =~ /(.{40})/;
! $short_subject = "$1...";
}
! $from =~ s/&/&amp;/g;
$from =~ s/</&lt;/g;
$from =~ s/>/&gt;/g;
$short_from =~ s/&/&amp;/g;
$short_from =~ s/</&lt;/g;
$short_from =~ s/>/&gt;/g;
$subject =~ s/&/&amp;/g;
$subject =~ s/</&lt;/g;
$subject =~ s/>/&gt;/g;
!
! $short_subject =~ s/&/&amp;/g;
$short_subject =~ s/</&lt;/g;
$short_subject =~ s/>/&gt;/g;
!
! $self->{history__}{$file}{bucket} = $bucket;
! $self->{history__}{$file}{reclassified} = $reclassified;
! $self->{history__}{$file}{magnet} = $magnet;
! $self->{history__}{$file}{subject} = $subject;
! $self->{history__}{$file}{from} = $from;
! $self->{history__}{$file}{short_subject} = $short_subject;
! $self->{history__}{$file}{short_from} = $short_from;
! $self->{history__}{$file}{cull} = 0;
!
! $index = $self->history_size() if ( !defined( $index ) );
! $self->{history__}{$file}{index} = $index;
}
***************
*** 2665,2668 ****
--- 2783,2790 ----
# new bucket classification
+ # This hash maps buckets to list of files to place in those buckets
+
+ my %work;
+
while ( my ($mail_file, $newbucket) = each %messages ) {
***************
*** 2671,2678 ****
my ( $reclassified, $bucket, $usedtobe, $magnet) = $self->{classifier__}->history_load_class( $mail_file );
! # Only reclassify messages that havn't been reclassified before
if ( !$reclassified ) {
! $self->{classifier__}->add_message_to_bucket( $self->global_config_( 'msgdir' ) . $mail_file, $newbucket );
$self->log_( "Reclassifying $mail_file from $bucket to $newbucket" );
--- 2793,2800 ----
my ( $reclassified, $bucket, $usedtobe, $magnet) = $self->{classifier__}->history_load_class( $mail_file );
! # Only reclassify messages that haven't been reclassified before
if ( !$reclassified ) {
! push @{$work{$newbucket}}, $self->global_config_( 'msgdir' ) . $mail_file;
$self->log_( "Reclassifying $mail_file from $bucket to $newbucket" );
***************
*** 2715,2718 ****
--- 2837,2847 ----
}
}
+
+ # At this point the work hash maps the buckets to lists of files to reclassify, so run through
+ # them doing bulk updates
+
+ foreach my $newbucket (keys %work) {
+ $self->{classifier__}->add_messages_to_bucket( $newbucket, @{$work{$newbucket}} );
+ }
}
}
***************
*** 2944,2948 ****
( defined( $self->{form_}{deletemessage} ) ) ||
( defined( $self->{form_}{clearall} ) ) ||
! ( defined( $self->{form_}{clearpage} ) ) );
# Redirect somewhere safe if non-idempotent action has been taken
--- 3073,3078 ----
( defined( $self->{form_}{deletemessage} ) ) ||
( defined( $self->{form_}{clearall} ) ) ||
! ( defined( $self->{form_}{clearpage} ) ) ||
! ( $self->{need_resort__} == 1 ) );
# Redirect somewhere safe if non-idempotent action has been taken
***************
*** 3045,3053 ****
my $reclassified = $self->{history__}{$mail_file}{reclassified};
my $index = $self->{history__}{$mail_file}{index} + 1;
- my $boundary = $self->{history__}{$mail_file}{boundary};
-
- if ( $boundary && ( $self->{form_}{sort} eq '' ) && ( $i != $start_message ) ) {
- $body .= "<tr class=\"rowHighlighted\" height=\"2\"><td colspan=\"6\"></td></tr>";
- }
$body .= "<tr";
--- 3175,3178 ----
***************
*** 3267,3271 ****
if ( $self->{history__}{$mail_file}{magnet} eq '' ) {
! $body .= $self->{classifier__}->get_html_colored_message($self->global_config_( 'msgdir' ) . $mail_file);
} else {
$self->{history__}{$mail_file}{magnet} =~ /(.+): ([^\r\n]+)/;
--- 3392,3409 ----
if ( $self->{history__}{$mail_file}{magnet} eq '' ) {
! $body .= $self->{classifier__}->get_html_colored_message($self->global_config_( 'msgdir' ) . $mail_file);
!
! # Enable saving of word-scores
!
! $self->{classifier__}->wordscores( 1 );
!
! # Build the scores by classifying the message, since get_html_colored_message has parsed the message
! # for us we do not need to parse it again and hence we pass in undef for the filename
!
! $self->{classifier__}->classify( undef, $self );
!
! # Disable, print, and clear saved word-scores
!
! $self->{classifier__}->wordscores( 0 );
} else {
$self->{history__}{$mail_file}{magnet} =~ /(.+): ([^\r\n]+)/;
***************
*** 3312,3327 ****
if ($self->{history__}{$mail_file}{magnet} eq '') {
-
- # Enable saving of word-scores
-
- $self->{classifier__}->wordscores( 1 );
-
- # Build the scores by classifying the message
-
- $self->{classifier__}->classify_file($self->global_config_( 'msgdir' ) . $mail_file, $self);
-
- # Disable, print, and clear saved word-scores
-
- $self->{classifier__}->wordscores( 0 );
$body .= $self->{classifier__}->scores();
$self->{classifier__}->scores('');
--- 3450,3453 ----