[Nms-cgi-devel] TFMail patch for anti-spam and CRLF decoding

Hello, world!

  I just joined this list last week.  Haven't seen any traffic yet.
Anybody here?

  Below is a patch against TFmail 1.38.  It makes the following changes:

1. In submitted form fields, converts &#13;&#10; (encoded CR LF)
sequences into newlines.  Some browsers encode newlines in TEXTAREA
fields this way.  Before, users would get a hard-to-read mess.  Now
they get properly formatted text.  This may break scripts that assume
one-line-per-field, but our submissions go to people, not scripts.

2. Some basic features to counter spam submissions to HTML forms.

  The anti-spam features discard form submissions which trigger spam
detectors.  They are controlled by the following new config options:

reject_html: declares form fields which should never contain HTML

trapfield: declares form fields which should never contain anything

spam_redirect: URL to redirect browser to if spam detected

spam_template: file name of template to feed browser if spam detected

spamlog: file name of log for spam detections
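
  As a rough illustration, a form's configuration might enable the new
options like this.  It assumes the usual "name: value" layout of a
TFmail config file; the field names (comments, message, website) and
the file names are made up for the example.  The patch splits
reject_html and trapfield on commas, so several fields can be listed.

```
reject_html: comments, message
trapfield: website
spam_template: spam
spamlog: contact-spam
```

  In the patch, spam_redirect takes precedence over spam_template when
both are set, and the spamlog name gets LOGFILE_ROOT and LOGFILE_EXT
added around it.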

  It appears most spam scripts blast HTML into TEXTAREAs, so by
looking for HTML in fields that should never contain it, we can detect
spam.  Likewise, most spam scripts appear to blast something into
every field.  By configuring a "trap field" which should be left
blank, spam can be detected.  CSS can be used to make a trap hidden;
most spam scripts apparently don't parse CSS.

  The HTML detection is currently ridiculously simplistic, but it also
appears to be effective for almost all cases.  In fact, I haven't had
a need to employ a trap field yet.
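
  The two checks described above can be sketched outside of TFmail as
follows.  This is only an illustration: find_spam_fields and the plain
hash of fields are hypothetical stand-ins for the patch's
check_spam_content and TFmail's request object, and the field names
are invented.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the two checks, working on a
# plain hash of field name => value rather than a TFmail request.
sub find_spam_fields
{
   my ($fields, $reject_html, $trapfields) = @_;
   my @has_spam;

   foreach my $name (@$reject_html)
   {
      my $value = $fields->{$name};
      next unless defined $value;
      # a closing anchor tag is taken as a sign of injected HTML
      push @has_spam, $name if $value =~ m{<\s*/a\s*>}i;
   }

   foreach my $name (@$trapfields)
   {
      my $value = $fields->{$name};
      next unless defined $value;
      # any non-whitespace content in a trap field counts as spam
      push @has_spam, $name if $value =~ m{\S};
   }

   return @has_spam;
}

my %submission = (
   comments => 'Buy now <a href="http://example.invalid/">here</a>',
   website  => '',
);
my @hits = find_spam_fields(\%submission, ['comments'], ['website']);
print "triggered: @hits\n";
```

  Here only the comments field triggers, because the trap field was
left empty.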

  The response to a spam detection works much like the response to
missing required fields.  You can use an HTML template or redirect to a
URL.  The template is parsed like all the others, so if you want, you
can even have it list what fields triggered.  Useful for debugging.
For real-world use, it's not a good idea to tell spammers what to do
differently.  We use a generic "We could not process your submission.
Please contact blah blah blah." message for that.

  You can optionally log spam detections to a log file.  This could be
used to monitor how many spam attempts you're getting.  It also logs
the fields which triggered the detection, so you can see if the
spammers are trying different things.  It may also reveal false
positives.
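
  The general logging approach (append a timestamped line while
holding an exclusive lock) can be sketched on its own like this.  It
is a simplified illustration, not the patch's spam_log: the helper
name append_log_line, the log file name, and the message text are all
hypothetical, and the real code builds its entry from a TFmail
template instead of a plain string.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use POSIX qw(strftime);

# Hypothetical helper: append one timestamped line to a log file,
# holding an exclusive lock while writing.
sub append_log_line
{
   my ($path, $message) = @_;

   open(my $fh, '>>', $path) or die "$path: open: $!";
   flock($fh, LOCK_EX)       or die "$path: flock: $!";

   my $stamp = strftime('%Y-%m-%d %H:%M:%S', localtime);
   print {$fh} "$stamp $message\n" or die "$path: write: $!";

   # closing the handle also releases the lock
   close($fh) or die "$path: close: $!";
   return;
}

# Example entry: remote address, then the fields which triggered
append_log_line('spam-example.log', '192.0.2.1: <comments>=<spammy text>');
```

  Counting lines in such a file gives a quick measure of how many
attempts are arriving, and reading the logged values is a simple way
to spot false positives.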

  Hope this helps someone!

-- Ben Scott (AKA bscott, AKA DragonHawk, AKA mailvortex)

--- TFmail-1.38.pl	2006-02-09 21:40:04.000000000 -0500
+++ TFmail.pl	2009-08-03 22:12:16.481039400 -0400
@@ -2,6 +2,7 @@
 use strict;
 #
 # $Id: TFmail.pl,v 1.38 2006/02/09 21:40:27 gellyfish Exp $
+# Modifications by bscott, release 1, 2009 Aug 03
 #
 # USER CONFIGURATION SECTION
 # --------------------------
@@ -52,6 +53,8 @@
 use lib LIBDIR;
 use NMStreq;
 use NMSCharset;
+use POSIX qw(strftime); # spam_log generates its own datestamps
+
 BEGIN
 {
    if (MIME_LITE)
@@ -71,6 +74,7 @@

    use vars qw($VERSION);
    $VERSION = substr q$Revision: 1.38 $, 10, -1;
+   $VERSION .= '-bscott1';
 }

 delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};
@@ -168,6 +172,8 @@

       if ( check_required_fields($treq) )
       {
+         if ( check_spam_content($treq) )
+         {
          setup_input_fields($treq);
          my $confto = send_main_email($treq, $recipients);
          if ( HTMLFILE_ROOT ne '' )
@@ -191,6 +197,12 @@
       }
       else
       {
+            spam_html($treq);
+            spam_log($treq) if (LOGFILE_ROOT ne '');
+         }
+      }
+      else
+      {
           missing_html($treq);
       }
    }
@@ -529,6 +541,50 @@
    }
 }

+=item check_spam_content ( TREQ )
+
+Returns false if any fields contain apparent spam,
+true otherwise.
+
+=cut
+
+sub check_spam_content
+{
+   my ($treq) = @_;
+
+   my @has_spam = (); # fields found to have spam
+
+   # check fields configured to prohibit HTML
+   my @reject_html = split /\s*,\s*/, $treq->config('reject_html', '');
+   foreach my $r (@reject_html)
+   {
+      # this is VERY basic at this point. we are only looking for </a>, something common in almost all
+      # spam messages that come through and very unlikely to be entered by a legitimate user.
+      push @has_spam, $r if $treq->param($r) =~ m{<\s*/a\s*>}i;
+   }
+
+   my @trapfield = split /\s*,\s*/, $treq->config('trapfield', '');
+   foreach my $r (@trapfield)
+   {
+      # if a trapfield has any non-whitespace content, it is spam
+      push @has_spam, $r if $treq->param($r) =~ m{\S+}i;
+   }
+
+   # if any fields had spam, we reject the submission, after installing field handlers
+   if (scalar @has_spam)
+   {
+      $treq->install_foreach(
+         'spam_field',
+         [map { {name => $_, value => $treq->param($_)} } @has_spam]
+      );
+      return 0; # has spam
+   }
+   else
+   {
+      return 1; # no spam detected
+   }
+}
+
 =item setup_input_fields ( TREQ )

 Installs a FOREACH directive in the TREQ object to
@@ -713,7 +769,10 @@
       $save = clean_template($treq);
    }

-   $msg->{body} = $treq->process_template($template, 'email', undef);
+   my $body;
+   $body = $treq->process_template($template, 'email', undef);
+   $body =~ s{&#13;&#10;}{\n}g;	# convert any &#13;&#10; sequences to newlines
+   $msg->{body} = $body;

    if ( dangerous_recipient($treq))
    {
@@ -1043,8 +1102,9 @@
    my ($treq) = @_;

    my $file = $treq->config('logfile', '');
+   return unless $file; # no logging if config didn't request it
    $file = $treq->process_template("\%$file",'email', undef);
-   return unless $file;
+   return unless $file; # no logging if that left us nothing
    $file =~ m#^([\/\-\w]{1,100})$# or die "bad logfile name [$file]";
    $file = $1;

@@ -1177,6 +1237,87 @@
    }
 }

+=item spam_html ( TREQ )
+
+Generates the output page in the case where submission
+failed anti-spam checks.
+
+=cut
+
+sub spam_html
+{
+   my ($treq) = @_;
+
+   my $redirect = $treq->config('spam_redirect');
+   if ( $redirect )
+   {
+      print "Location: $redirect\n\n";
+   }
+   else
+   {
+      html_page($treq, $treq->config('spam_template','spam'));
+   }
+}
+
+=item spam_log ( TREQ )
+
+Logs submissions rejected as spam, if so configured.
+
+=cut
+
+sub spam_log
+{
+
+   my ($treq) = @_;
+
+   # get requested log file name from config
+   my $spamlog = $treq->config('spamlog', '');
+   return unless $spamlog; # no logging if config didn't request it
+
+   # make sure log file contains only word characters, dir separator (/), or dash (-)
+   $spamlog =~ m#^([\/\-\w]{1,100})$# or die "bad spam log file name [$spamlog]";
+   $spamlog = $1; # de-taint
+
+   # build full path name of log file
+   $spamlog = LOGFILE_ROOT . '/' . $spamlog . LOGFILE_EXT;
+
+   # build log message
+   my $logtrt =
+      '%' .
+      strftime ('%Y-%m-%d %H:%M:%S ', localtime) .
+      '{= env.REMOTE_ADDR =} ' .
+      '(' .
+      'via=<{= env.HTTP_VIA =}> ' .
+      'UA=<{= env.HTTP_USER_AGENT =}>' .
+      '): ' .
+      '{= FOREACH spam_field =}<{= name =}>=<{= value =}> {= END =}'
+   ;
+
+   open SPAMLOG,"+>>$spamlog" or die "$spamlog: open: $!";
+
+   # get lock for writing, or abort
+   if (!(flock SPAMLOG, LOCK_EX))
+   {
+      warn "$spamlog: flock: $!";
+      close SPAMLOG or die "$spamlog: close: $!";
+      return;
+   }
+
+   # seek to end for append
+   seek SPAMLOG, 0, 2 or die "$spamlog: seek: $!";
+
+   # record the log entry
+   $treq->process_template(
+      $logtrt,
+      'email',
+      \*SPAMLOG
+   );
+
+   # finish up
+   close SPAMLOG or die "$spamlog: close: $!";
+
+} # spam_log
+
 =item return_html ( TREQ )

 Generates the output page in the case where the email has been