From: John Graham-C. <jgr...@us...> - 2005-08-24 18:35:25
Update of /cvsroot/popfile/engine/Classifier
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv8371/Classifier

Modified Files:
    MailParse.pm
Log Message:
More v0.23.0 work, this time around getting more parts of the test suite to work.

*** POPFile/Database.pm:

Replace INSERT with INSERT OR IGNORE statements. This suppresses the errors we were seeing with duplicate primary keys, caused by the insertion of data from the schema file followed by the insertion of data with the same keys. Because IGNORE skips a row whose key already exists, data inserted by the schema file takes precedence over old data.

Making this change also causes the following tests to now pass: TestBayesScript, TestInsertScript.

*** tests/TestConfiguration.tst

Update tests for the new global parameter GLOBAL_single_user.

Fix a bug that was causing the command-line parser to erroneously complain about an empty -- command-line option.

Changed the default options for Getopt::Long so that they are in line with the usage in POPFile/Loader.pm.

The TestConfiguration test suite now passes.

*** POPFile/History.pm, tests/TestHistory.tst

Fix a minor oddity in History where the query used for searches had an LF in the middle of it (this was harmless but ugly).

Update the test suite so that start_query now gets a valid session.

The TestHistory suite now passes.

*** Current state of the test suite:

TestBayesScript    PASS
TestBayes          PASS
TestConfiguration  PASS
TestHistory        PASS
TestHTML           fail (horribly)
TestHTTP           PASS
TestIMAP           PASS
TestInsertScript   PASS
TestLogger         PASS
TestMailParse      fail
TestModule         PASS
TestMQ             PASS
TestMutex          PASS
TestPipeScript     PASS
TestPOP3           fail
TestProxy          PASS
TestWordMangle     PASS
TestXMLRPC         fail

TODO Why is there no TestDatabase?

TestMailParse is failing on my machine because accented characters are not being recognized as part of the [:alpha:] character class. I have not fully understood why yet. Is anyone else seeing this? Try running:

print +(sort grep /[[:alpha:]]/, map { chr } 0..255), "\n";

to find out what [:alpha:] maps to.
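For anyone comparing results across machines, the one-liner above can be expanded into a small script that reports matching code points in hex, so that the output does not depend on terminal encoding. This is only a sketch: alpha_chars is a hypothetical helper name, not part of POPFile, and the script makes no claim about why TestMailParse fails. What it prints can vary with the Perl version, with whether "use locale" is in effect, and with the current LC_CTYPE setting.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of POPFile): return the characters with
# code points 0..255 that match the POSIX [:alpha:] class.
sub alpha_chars {
    return grep { /[[:alpha:]]/ } map { chr } 0 .. 255;
}

my @alpha = alpha_chars();

# Characters above 127 are where accented letters would show up.
my @high = grep { ord($_) > 127 } @alpha;

printf "%d characters match [:alpha:], %d of them above 127\n",
    scalar @alpha, scalar @high;

# Print code points in hex rather than the raw characters.
printf "matched code points: %s\n",
    join ' ', map { sprintf '%02X', ord } @alpha;
```

If the second count is zero, no accented characters in the 128..255 range are being treated as alphabetic in that environment.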
Index: MailParse.pm
===================================================================
RCS file: /cvsroot/popfile/engine/Classifier/MailParse.pm,v
retrieving revision 1.219
retrieving revision 1.220
diff -C2 -d -r1.219 -r1.220
*** MailParse.pm 14 Aug 2005 03:57:26 -0000 1.219
--- MailParse.pm 24 Aug 2005 18:35:13 -0000 1.220
***************
*** 602,606 ****
  # Don't decode odd (nonprintable) characters or < >'s.
! if ( ( ( $2 < 255 ) && ( $2 > 63 ) ) || ( $2 == 61 ) || ( ( $2 < 60 ) && ( $2 > 31 ) ) ) {
  my $from = $1;
  my $to = chr($2);
--- 602,608 ----
  # Don't decode odd (nonprintable) characters or < >'s.
! if ( ( ( $2 < 255 ) && ( $2 > 63 ) ) ||
! ( $2 == 61 ) ||
! ( ( $2 < 60 ) && ( $2 > 31 ) ) ) {
  my $from = $1;
  my $to = chr($2);
***************
*** 610,614 ****
  $self->{ut__} =~ s/$from/$to/g;
  print "$from -> $to\n" if $self->{debug__};
! $self->update_pseudoword( 'html', 'numericentity', $encoded, $from );
  }
  }
--- 612,617 ----
  $self->{ut__} =~ s/$from/$to/g;
  print "$from -> $to\n" if $self->{debug__};
! $self->update_pseudoword( 'html',
! 'numericentity', $encoded, $from );
  }
  }
***************
*** 624,627 ****
--- 627,631 ----
  # Grab domain names
+
  while ( $line =~ s/(([[:alpha:]0-9\-_]+\.)+)(com|edu|gov|int|mil|net|org|aero|biz|coop|info|museum|name|pro)([^[:alpha:]0-9\-_\.]|$)/$4/i ) {
  add_url($self, "$1$3", $encoded, '', '', $prefix);
***************
*** 655,658 ****
--- 659,663 ----
  if ( $self->{lang__} eq 'Nihongo' ) {
+
  # In Japanese mode, non-symbol EUC-JP characters should be
  # matched.
***************
*** 686,690 ****
  # 2 byte characters.
  #
! # In Korean, care about words between 2 and 45 characters.
  while ( $line =~ s/(([A-Za-z]|$eksc)([A-Za-z\']|$eksc){1,44})([_\-,\.\"\'\)\?!:;\/& \t\n\r]{0,5}|$)// ) {
--- 691,696 ----
  # 2 byte characters.
  #
! # In Korean, care about words between 2 and 45
! # characters.
  while ( $line =~ s/(([A-Za-z]|$eksc)([A-Za-z\']|$eksc){1,44})([_\-,\.\"\'\)\?!:;\/& \t\n\r]{0,5}|$)// ) {