[popfile-commit] engine/Devel TestCoverage.pm,NONE,1.1


Update of /cvsroot/popfile/engine/Devel
In directory sc8-pr-cvs1:/tmp/cvs-serv2831/Devel

Added Files:
	TestCoverage.pm 
Log Message:
PERFORMANCE CHANGES

Bayes.pm:     Added new add_messages_to_bucket API to add multiple messages to a 
              bucket at the same time with a single read/write of the appropriate 
              corpus table for speed.
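The batching idea can be illustrated with a minimal sketch. It assumes a hypothetical corpus stored as a tab-separated "word<TAB>count" file; the file format and the names load_corpus, save_corpus and add_messages_to_bucket_sketch are assumptions for illustration, not POPFile's actual Bayes.pm code.

```perl
use strict;
use warnings;

# Hypothetical corpus format: one "word<TAB>count" pair per line
sub load_corpus {
    my ($path) = @_;
    my %corpus;
    if ( open my $in, '<', $path ) {
        while ( my $line = <$in> ) {
            chomp $line;
            my ( $word, $count ) = split /\t/, $line;
            $corpus{$word} = $count if defined $count;
        }
        close $in;
    }
    return \%corpus;    # empty if the file does not exist yet
}

sub save_corpus {
    my ( $path, $corpus ) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} "$_\t$corpus->{$_}\n" for sort keys %$corpus;
    close $out;
}

# Add the words of several messages in one pass: the corpus is read once,
# updated in memory for every message, then written back once
sub add_messages_to_bucket_sketch {
    my ( $path, @messages ) = @_;    # each message is an arrayref of words
    my $corpus = load_corpus($path);
    for my $words (@messages) {
        $corpus->{$_}++ for @$words;
    }
    save_corpus( $path, $corpus );
    return $corpus;
}
```

Adding N messages one at a time would read and rewrite the corpus N times; the batched version does it once regardless of N.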

              New write_line__ method to write a line to a MSG file and optionally
              pass it to the parse_line API of MailParse.pm.  Now we write a file
              to disk and parse it without reloading the MSG file from disk, for speed.
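A minimal sketch of the write-and-parse idea, assuming a parser object that exposes a parse_line method. ToyParser and write_line are hypothetical stand-ins, not the real write_line__ or MailParse.pm.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a line-oriented parser: counts header lines
package ToyParser;

sub new {
    my $class = shift;
    my $self  = { headers => 0, in_body => 0 };
    return bless $self, $class;
}

sub parse_line {
    my ( $self, $line ) = @_;
    if ( $line =~ /^\s*$/ ) {    # a blank line ends the headers
        $self->{in_body} = 1;
        return;
    }
    $self->{headers}++ unless $self->{in_body};
}

package main;

# Write one line to the message file and, if a parser is supplied, hand
# the same line to it so the file never has to be re-read for parsing
sub write_line {
    my ( $fh, $line, $parser ) = @_;
    print {$fh} $line;
    $parser->parse_line($line) if defined $parser;
}
```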

              The MSG file gets a temporary name until the CLS file is written.
              This prevents the history from reloading in the middle of a download
              and ending up with a message that has a class file error.
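The write ordering that avoids this race can be sketched as follows. It assumes the history code only picks up files whose names end in .msg; the .tmp suffix, the file naming and save_message_sketch are hypothetical.

```perl
use strict;
use warnings;

# Write the message under a temporary name, write its class file, and only
# then rename the message to its final name. A history scan that looks for
# *.msg files therefore never sees a message whose .cls file is missing.
sub save_message_sketch {
    my ( $dir, $base, $message, $bucket ) = @_;
    my $tmp   = "$dir/$base.tmp";
    my $final = "$dir/$base.msg";
    my $cls   = "$dir/$base.cls";

    open my $msg_fh, '>', $tmp or die "Cannot write $tmp: $!";
    print {$msg_fh} $message;
    close $msg_fh or die "Cannot close $tmp: $!";

    open my $cls_fh, '>', $cls or die "Cannot write $cls: $!";
    print {$cls_fh} "$bucket\n";
    close $cls_fh or die "Cannot close $cls: $!";

    rename $tmp, $final or die "Cannot rename $tmp to $final: $!";
    return $final;
}
```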

              classify_file becomes classify and can classify either from a file
              or from the preparsed information in the parser.

              classify_and_modify returns the name of the file where the message
              was stored in addition to the classification.

HTML.pm:      Use the add_messages_to_bucket API for reclassification, for speed.

              Use the new classify method in Bayes.pm to classify a file after it
              has been digested by the parser for colorization and to get the word
              scores.  This means we only load the MSG file once (it used to be
              twice), roughly doubling the speed of viewing a colorized message.

              New methods load_disk_cache__ and save_disk_cache__ keep a copy of
              the history cache on disk between sessions so that session start-up
              is as fast as possible.  There is no need to parse messages for
              header information on start-up if the history_cache file is present.
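One way to persist such a cache between sessions is Perl's core Storable module, shown in this minimal sketch. The cache layout (file name mapped to a hash of header fields) and the sub names are assumptions for illustration, not what HTML.pm actually stores.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

# Save the in-memory history cache so the next session can skip parsing
sub save_disk_cache_sketch {
    my ( $path, $cache ) = @_;
    nstore( $cache, $path ) or die "Cannot store cache in $path";
}

# Load the cache if present; otherwise return an empty cache and let the
# caller fall back to parsing the message headers
sub load_disk_cache_sketch {
    my ($path) = @_;
    return {} unless -e $path;
    my $cache = eval { retrieve($path) };
    return ( ref($cache) eq 'HASH' ) ? $cache : {};
}
```

nstore writes in network byte order, so a cache file stays readable if it is moved to a machine with a different architecture.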

              Removed the boundary feature because it is incompatible with the
              concept of a "download", since we now send new history file messages
              asynchronously through the MQ.

              Load the history cache progressively as files are written.  The
              proxies send the message NEWFL and the method new_history_file__
              adds the file to the history.  This is done so that when the user
              clicks the History tab after a mail download the history cache is
              already loaded and there should be no delay in displaying the
              history page.
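The progressive loading can be sketched as a handler that adds each announced file to the cache as soon as its NEWFL message arrives. Only the NEWFL type comes from this log; the HistorySketch class, the deliver method name, the cache structure and the header reading are hypothetical.

```perl
use strict;
use warnings;

package HistorySketch;

sub new {
    my $class = shift;
    my $self  = { cache => {} };
    return bless $self, $class;
}

# Called for every queued message; only NEWFL is handled here
sub deliver {
    my ( $self, $type, $file ) = @_;
    $self->new_history_file($file) if $type eq 'NEWFL';
}

# Add a single file to the cache by reading its headers once, so that the
# History page finds it already loaded
sub new_history_file {
    my ( $self, $file ) = @_;
    my %entry = ( subject => '', from => '' );
    if ( open my $fh, '<', $file ) {
        while ( my $line = <$fh> ) {
            last if $line =~ /^\s*$/;    # end of headers
            $entry{subject} = $1 if $line =~ /^Subject:\s*(.*?)\s*$/i;
            $entry{from}    = $1 if $line =~ /^From:\s*(.*?)\s*$/i;
        }
        close $fh;
    }
    $self->{cache}{$file} = \%entry;
}

package main;
```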

MailParse.pm: Renamed parse_stream to parse_file since that's a better name.
              New start_parse, stop_parse and parse_line APIs so that a file can
              be parsed line by line.
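Line-by-line parsing of this kind can be sketched as a small object with the three calls named above. The word counting is a deliberately simplified, hypothetical stand-in for MailParse.pm's real tokenisation, and this parse_file is written only to show how a whole-file parse can be built on the incremental calls.

```perl
use strict;
use warnings;

package LineParserSketch;

sub new {
    my $class = shift;
    my $self  = { words => {} };
    return bless $self, $class;
}

# Reset state before a new message
sub start_parse { my ($self) = @_; $self->{words} = {}; }

# Feed one line; words of three or more letters are counted as they arrive
sub parse_line {
    my ( $self, $line ) = @_;
    $self->{words}{ lc $1 }++ while $line =~ /([A-Za-z]{3,})/g;
}

# Finish the message and return the collected word counts
sub stop_parse { my ($self) = @_; return $self->{words}; }

# A whole-file parse is just the incremental API driven by a read loop
sub parse_file {
    my ( $self, $path ) = @_;
    open my $fh, '<', $path or die "Cannot read $path: $!";
    $self->start_parse;
    $self->parse_line($_) while <$fh>;
    close $fh;
    return $self->stop_parse;
}

package main;
```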

MQ.pm:        Defined a new message type NEWFL which is used to indicate that
              a file has been added to the history cache.  NEWFL's message
              is the name of the file (the MSG file) that was added.

POP3.pm:      Send the NEWFL message through the pipe to the parent so that 
              the history is aware of new messages.

SMTP.pm:
NNTP.pm:      Send CLASS and NEWFL messages through the pipe to the parent.
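The child-to-parent notification can be sketched with a plain pipe carrying one message per line. The TYPE:payload line format and the sub names are assumptions made for this example, not necessarily POPFile's wire format; both ends live in one process here only to keep it self-contained (a proxy would fork and keep one end each).

```perl
use strict;
use warnings;
use IO::Handle;

# Child side: announce a new history file on the pipe
sub send_newfl {
    my ( $writer, $file ) = @_;
    print {$writer} "NEWFL:$file\n";
    $writer->flush;
}

# Parent side: read messages until the writing end is closed and return
# them as [type, payload] pairs
sub read_messages {
    my ($reader) = @_;
    my @messages;
    while ( my $line = <$reader> ) {
        chomp $line;
        my ( $type, $payload ) = split /:/, $line, 2;
        push @messages, [ $type, $payload ];
    }
    return @messages;
}
```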

insert.pl:    Updated to use the new parse_file API.

bayes.pl:     Updated to use the new classify API instead of classify_file.

TEST SUITE CHANGES

tests.pl:     New test_assert_regexp function for fuzzy matching of
              test results.

              Returns 0 if all tests run successfully, and 1 if there are
              any errors.
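A minimal sketch of a regexp-based assertion together with the 0/1 status. Only the name test_assert_regexp and the exit-code convention come from this log; the argument order, the failure message and the test_report_status helper are assumptions.

```perl
use strict;
use warnings;

my $test_failures = 0;

# Pass if $got matches $regexp; otherwise report where the check was made
sub test_assert_regexp {
    my ( $got, $regexp ) = @_;
    return 1 if $got =~ /$regexp/;
    my ( undef, $file, $line ) = caller;
    print "Failed at $file:$line: '$got' does not match /$regexp/\n";
    $test_failures++;
    return 0;
}

# Status for the whole run: 0 when every test passed, 1 otherwise,
# suitable for passing to exit()
sub test_report_status {
    return $test_failures ? 1 : 0;
}
```

A script built this way would end with exit( test_report_status() ), so that make can tell a failing run from a clean one.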

TestLogger.tst: New file for testing POPFile::Logger functionality.

Makefile:     The test target has a variable TESTARGS that can be set to the
              specific module (or modules, using glob patterns) to run.

              For example: gmake test TESTARGS='TestLogger'

              There's a new coverage target to run the test suite and output
              code coverage information for the modules used.
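How a TESTARGS value might select test files can be sketched in Perl. This assumes test files are named *.tst, that TESTARGS holds space-separated glob patterns, and that an empty value means run everything; select_tests is hypothetical and the real tests.pl may work differently.

```perl
use strict;
use warnings;

# Return the test files matching any of the patterns (or all of them when
# no pattern is given). Patterns use shell-style * and ? wildcards.
sub select_tests {
    my ( $available, @patterns ) = @_;
    return sort @$available unless @patterns;

    # Turn each glob into an anchored regexp matching "<pattern>.tst"
    my @regexps = map {
        my $re = quotemeta $_;
        $re =~ s/\\\*/.*/g;    # escaped * becomes .*
        $re =~ s/\\\?/./g;     # escaped ? becomes .
        qr/^$re\.tst$/;
    } @patterns;

    return grep { my $t = $_; grep { $t =~ $_ } @regexps } sort @$available;
}
```

With TESTARGS='TestLogger' only TestLogger.tst would be selected, matching the example above.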

TestCoverage.pm: New module that provides line coverage information for
              the test suite.  It runs as a Perl debugger module loaded with
              the -d switch and outputs code coverage information for all
              POPFile files tested.

--- NEW FILE: TestCoverage.pm ---
# ---------------------------------------------------------------------------------------------
#
# Devel::TestCoverage - Module to measure code coverage in the test suite
#
# Copyright (c) 2001-2003 John Graham-Cumming
#
# ---------------------------------------------------------------------------------------------

package Devel::TestCoverage;

package DB;

# This hash will store a count of the number of times each line is executed in
# each file; it is in fact a hash of hashes used as
# $count{filename}{linenumber}
my %count;

# This is called when we begin the code coverage (or debugging) session
BEGIN
{
	# Setting $DB::trace makes perl call DB::DB for every statement it
	# executes, including those inside subroutines
	$DB::trace = 1;
}

# Perl will call this function for every line of code it executes.  We keep
# a count for each time a line is executed
sub DB
{
	# The caller function will tell us which package, file and line of
	# code called us
	my ($package, $file, $line) = caller;

	# A specific line in a specific file just got executed.  Skip code
	# compiled from eval strings (its file name looks like '(eval 12)'),
	# since there is no source file to match it against later
	$count{$file}{$line} += 1 unless ( $file =~ /\(eval/ );
}

END
{
	# This hash will map file names of POPFile modules to their coverage
	# percentage
	my %files;

	# Work out the coverage of each file that was executed
	for my $file (keys %count)
	{
		# Only relative paths (POPFile's own files) are reported; absolute
		# paths belong to installed Perl modules.  The test driver
		# tests.pl is skipped too
		next unless ( ( $file =~ /^[^\/]/ ) && ( $file ne 'tests.pl' ) );

		my $current_line = 0;

		open SOURCE_FILE, "<$file" or next;

		# Read in each line of the source file and keep track of whether
		# it was executed or not using a new couple of keys in the
		# %count hash for each file: total_lines, total_executable_lines
		# and total_executed
		while (<SOURCE_FILE>)
		{
			# Keep count of the total number of lines in this file
			$current_line              += 1;
			$count{$file}{total_lines} += 1;

			# We do not count lines that are blank or exclusively
			# comments or just have braces on them or
			# just an else or just a subroutine definition
			if ( ( /^\s*\#/ == 0 ) && ( /^\s*$/ == 0 ) && ( /^\s*(\{|\}|else)\s*$/ == 0 ) && ( /^\s*sub \w+( \{)?\s*$/ == 0 ) )
			{
				$count{$file}{total_executable_lines} += 1;

				# If this line was executed then keep count of
				# that fact
				if ( $count{$file}{$current_line} ) {
					$count{$file}{total_executed} += 1;
				}
			}
		}

		close SOURCE_FILE;

		# Percentage of executable lines that ran (skipped for files
		# with no executable lines, to avoid dividing by zero)
		if ( $count{$file}{total_executable_lines} ) {
			$files{$file} = int( 100 * ( $count{$file}{total_executed} || 0 ) / $count{$file}{total_executable_lines} );
		}
	}

	# Print the files from best to worst coverage
	foreach my $file (sort { $files{$b} <=> $files{$a} } keys %files) {
		printf( "Coverage of %-32s %d%%\n", "$file...", $files{$file} );
	}
}

1;
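To see which lines the END block treats as executable, its four tests can be pulled out into a standalone function for experimentation. This is a sketch, not part of the module; is_executable_line is a hypothetical name, and the patterns are copied from the code above.

```perl
use strict;
use warnings;

# Mirrors the filter in Devel::TestCoverage's END block: blank lines,
# comment-only lines, lone braces or else, and bare sub declarations
# are not counted as executable
sub is_executable_line {
    my ($line) = @_;
    return 0 if $line =~ /^\s*\#/;
    return 0 if $line =~ /^\s*$/;
    return 0 if $line =~ /^\s*(\{|\}|else)\s*$/;
    return 0 if $line =~ /^\s*sub \w+( \{)?\s*$/;
    return 1;
}
```

Note that a line such as "} else {" passes the filter and is counted as executable; since no statement starts on that line it is unlikely ever to be recorded as executed, so it can slightly lower the reported percentage.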