==================================================
PUREPOSTPRO - My Approach To Managing User Uploads
==================================================

************
* Manifest *
************

README.TXT          - you are reading it!
SetupPurePostPro    - shell script to set up the purepostpro database
                      and optionally the entries in mysql
PurePostPro         - the script itself
gpl.txt             - the GNU General Public License


***********
* License *
***********

PurePostPro - a Pure-FTPD Post processing system.
Copyright (C) 2001 Peter Garner

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.



************
* Overview *
************


Ever since I have been involved at the receiving end of FTP servers,
and having to handle user uploads, I have longed for an intelligent
and automatic way of managing incoming data. This is best explained by
looking at my situation. I have a growing number of users that send in
data on a daily basis. Logic says that the data should not be uploaded
(and then processed) more than once, but human nature ensures that
this is not always the case. People go on holiday, or off sick, and
other people step in to "help". Quite often I get the same file with a
different name. Another possibility is that more than one user sends
in the same file, differently named - this hasn't happened to me yet,
but it could. At the end of the day, if data is duplicated in a
business environment, there could be all sorts of dire repercussions.



**********************************
* Where does pure-ftpd come in ? *
**********************************


I originally started off, like many people, with WU-ftpd. It was ok,
but I saw it as a standalone application that didn't lend itself to
external management. I tried a number of other FTP servers over the
years, and then pure-ftpd appeared. With the news that it would
support an uploaded files script capability, things started to look
more interesting, and I devised PurePostPro, a script that would "do
something" with an uploaded file. I wanted to keep it fast and simple,
so I used Perl and MySQL, with which I am familiar. The result is
what you have just downloaded. I have tried the software with files
up to 28 MB in size and it seems to run through quite quickly.


*********************
* What Does It Do ? *
*********************


The script has two main functions:

1. It logs all incoming uploads to a two-table database;

2. It calculates an MD5 checksum for each file and stores that in the
   database too.


**********************
* How does it work ? *
**********************

The MySQL database, 'purepostpro', consists of two tables - 'audit' and
'logging'. I have created a user for that database, also called
'purepostpro' (see the setup script under Requirements).

I decided to use Perl's Digest::MD5 module to calculate a checksum for
each incoming file. Once the checksum has been calculated, a connection
is made to the 'purepostpro' database.
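
As a rough illustration of the checksum step only, here is a minimal
sketch in Python (not the Perl that PurePostPro itself is written in).
The function name md5_of_file and the chunk size are made up for this
example; reading in chunks is simply one way to avoid holding a large
upload in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the 32-character hex MD5 digest of the file at `path`.

    Hypothetical helper for illustration; the file is read in chunks
    so that large uploads are never loaded into memory in one go.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The 32-character hex string it returns is the same shape as the value
stored in the CHAR(32) checksum columns.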

The audit table is updated with all the environment variables that
pure-uploadscript provides, plus the checksum. This is so that you
can actually audit what has been uploaded. A second step then uses
the logging table. If the checksum has not been seen before, a new
row is added to the logging table. If the same file has already been
uploaded (same checksum, so it's a duplicate) we just update the
timestamp in the logging table. The rationale here is that at the
end of the day we want a table (logging) that contains a list of
unique files that can be processed.
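
The two-step logic described above can be sketched as follows. This
is an illustration under stated assumptions, not the actual script:
it is written in Python rather than Perl, it uses an in-memory SQLite
database as a stand-in for MySQL, the tables are trimmed to a few
columns, and the function names open_demo_db and record_upload are
invented for the example. It follows the description above in
refreshing the timestamp on a duplicate; the real script may differ
in that detail.

```python
import sqlite3


def open_demo_db():
    """Create cut-down 'audit' and 'logging' tables in memory (SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE audit (
            counter    INTEGER PRIMARY KEY AUTOINCREMENT,
            a_time     TEXT,
            a_user     TEXT,
            a_checksum TEXT,
            a_file     TEXT
        );
        CREATE TABLE logging (
            counter    INTEGER PRIMARY KEY AUTOINCREMENT,
            l_time     TEXT,
            u_user     TEXT NOT NULL,
            u_checksum TEXT UNIQUE NOT NULL,
            u_file     TEXT NOT NULL,
            processed  TEXT DEFAULT 'N'
        );
    """)
    return conn


def record_upload(conn, when, user, checksum, path):
    """Audit every upload; keep one 'logging' row per distinct checksum.

    Returns True if the checksum was new, False if it was a duplicate.
    """
    # Step 1: the audit table gets a row for every upload, duplicate or not.
    conn.execute(
        "INSERT INTO audit (a_time, a_user, a_checksum, a_file) VALUES (?, ?, ?, ?)",
        (when, user, checksum, path),
    )
    # Step 2: the logging table only gets a row for unseen checksums.
    row = conn.execute(
        "SELECT counter FROM logging WHERE u_checksum = ?", (checksum,)
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO logging (l_time, u_user, u_checksum, u_file, processed) "
            "VALUES (?, ?, ?, ?, 'N')",
            (when, user, checksum, path),
        )
        is_new = True
    else:
        # Duplicate: only refresh the timestamp, as described in the text.
        conn.execute(
            "UPDATE logging SET l_time = ? WHERE counter = ?", (when, row[0])
        )
        is_new = False
    conn.commit()
    return is_new
```

Calling record_upload() three times, with the third call reusing the
first checksum under a different path, leaves three rows in audit and
two in logging.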

The logging table contains an additional column - 'processed' - that
contains an 'N' when a new entry is added. The rationale here is that
another application can then be run as required to process the
received files based on the 'processed' flag. When a file has been
processed, its flag will be updated to 'Y'. I am working on this
at the moment (see ToDo).
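
To make the idea of the 'processed' flag concrete, here is one
possible shape for such an external processor. It is only a sketch:
it is in Python, it uses SQLite in memory in place of MySQL, and the
names open_demo_logging, process_pending and handler are hypothetical.
It only uses the 'N' and 'Y' values described above.

```python
import sqlite3


def open_demo_logging(paths):
    """Build an in-memory 'logging' table with one unprocessed row per path."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logging ("
        " counter INTEGER PRIMARY KEY AUTOINCREMENT,"
        " u_file TEXT NOT NULL,"
        " processed TEXT DEFAULT 'N')"
    )
    conn.executemany(
        "INSERT INTO logging (u_file, processed) VALUES (?, 'N')",
        [(p,) for p in paths],
    )
    conn.commit()
    return conn


def process_pending(conn, handler):
    """Call handler(path) for each row flagged 'N', then flag that row 'Y'.

    Returns the list of paths handled. If handler raises, the row keeps
    its 'N' flag and will be picked up again on the next run.
    """
    pending = conn.execute(
        "SELECT counter, u_file FROM logging "
        "WHERE processed = 'N' ORDER BY counter"
    ).fetchall()
    handled = []
    for counter, path in pending:
        handler(path)
        conn.execute(
            "UPDATE logging SET processed = 'Y' WHERE counter = ?", (counter,)
        )
        conn.commit()  # commit per file so finished work survives a later failure
        handled.append(path)
    return handled
```

Because the full path comes from the table, the handler never needs
to know which directory a user chose to upload into.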

One benefit of this approach is that it doesn't really matter where
users upload to. I'm not suggesting they upload files indiscriminately
but if your processing program always expects an uploaded file to be
in $HOME/DATA and for some reason they have created a new directory
into which they upload (e.g. $HOME/DATA/October), then your standard
processing software just isn't going to pick it up ;-) Also, if the
person that normally sends the data goes away on holiday, someone else
may decide to upload the data to, say, the anonymous /incoming
directory, once again causing problems for your processing software.
Because the logging table records the full upload path, your
processing software can read the location from there instead, so it
all becomes easier.


***********************
* And another thing.. *
***********************


I have to admit that I am not a MySQL expert. I am doing this in my
spare time and have had no formal database training. Someone may spot
a better way of doing things - if so, please write in and we can
improve things. All suggestions are taken seriously!


****************
* Requirements *
****************

* Hardware

I have run this on a Pentium III 450 box with 128 MB of memory, but it
should run on a lower spec machine.

* Software

I use Mysql Ver 14.14 Distrib 5.1.61, for debian-linux-gnu (i686) using readline 6.1.
If you are going to modify the software, you should be familiar with Perl DBI/DBD.

There are some Perl modules needed too: you can get these via CPAN
(http://www.cpan.org). These are:

DBI          The Perl Database Interface by Tim Bunce. I'm currently
             using version 1.18 but I believe there's a later version.

Digest::MD5  Digest::MD5 - Perl interface to the MD5 Algorithm. I'm
             currently using version 2.16.

DBD::mysql   MySQL driver for the Perl5 Database Interface (DBI). I'm
             currently using version 1.2212.


The thing about this software is that if you feel that MySQL isn't
right for you, and want to use, say, Or*cle, you can simply
specify a new DBD name and use the script with probably minimal
changes. To set up the database from scratch, I have created a shell
script - SetupPurePostPro - reproduced below. This also includes the
MySQL commands to update the user entries in the 'mysql' database -
make sure you are happy doing this before you run the script -
personally I prefer to use something like phpMyAdmin to do this ;-)

NOTE! I've used a password of "purepass", below, but you'll be using 
your own, naturally :-)

-------- >8 snip ---------

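#!/bin/sh

# SUPERUSER and SUPERPASS must hold your MySQL administrator's
# username and password; the script stops here if either is unset.
: "${SUPERUSER:?set SUPERUSER first}"
: "${SUPERPASS:?set SUPERPASS first}"

mysql -u${SUPERUSER} -p${SUPERPASS} << BOF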

DROP DATABASE IF EXISTS purepostpro;
CREATE DATABASE purepostpro;
USE purepostpro;

DROP TABLE IF EXISTS logging;
CREATE TABLE logging
(
 counter int(6) NOT NULL AUTO_INCREMENT PRIMARY KEY,
 l_time DATETIME,
 u_user VARCHAR(12) NOT NULL,
 u_group VARCHAR(20),
 u_uid VARCHAR(5),
 u_gid VARCHAR(5),
 u_perms CHAR(4),
 u_size VARCHAR(10),
 u_checksum CHAR(32) UNIQUE NOT NULL,
 u_file CHAR(255) NOT NULL,
 processed ENUM('N','P','Y')
);
CREATE INDEX checksum ON logging(u_checksum);
CREATE INDEX username ON logging(u_user);
CREATE INDEX file     ON logging(u_file);

DROP TABLE IF EXISTS audit;
CREATE TABLE audit
(
 counter int(6) NOT NULL AUTO_INCREMENT PRIMARY KEY,
 a_time TIMESTAMP(14),
 a_user VARCHAR(12),
 a_group VARCHAR(20),
 a_uid VARCHAR(5),
 a_gid VARCHAR(5),
 a_perms CHAR(4),
 a_size VARCHAR(10),
 a_checksum CHAR(32),
 a_file CHAR(255)
);

use mysql;
DELETE FROM user WHERE User='purepostpro';
INSERT INTO user(Host,User,Password,Select_priv,Insert_priv,Update_priv) VALUES('localhost','purepostpro',password('purepass'),'Y','Y','Y');

DELETE FROM host WHERE Db='purepostpro';
INSERT INTO host(Host,Db,Select_priv,Insert_priv) VALUES ('localhost','purepostpro','Y','Y');

BOF

mysqladmin -u${SUPERUSER} -p${SUPERPASS} flush-privileges

exit

-------- >8 snip ---------

Now that you have set up the purepostpro database and tables, running
mysqlaccess with the parameters shown below will confirm that
everything has been set up:

> mysqlaccess localhost -U superuser -P superpass -u purepostpro -p purepass -d purepostpro

mysqlaccess Version 2.06, 20 Dec 2000
By RUG-AIV, by Yves Carlier (Yves.Carlier@rug.ac.be)
Changes by Steve Harvey (sgh@vex.net)
This software comes with ABSOLUTELY NO WARRANTY.

Access-rights
for USER 'purepostpro', from HOST 'localhost', to DB 'purepostpro'
        +-----------------------+---+ +----------------------+---+
        | Select_priv           | Y | | Execute_priv         | N |
        | Insert_priv           | Y | | Repl_slave_priv      | N |
        | Update_priv           | Y | | Repl_client_priv     | N |
        | Delete_priv           | N | | Create_view_priv     | N |
        | Create_priv           | N | | Show_view_priv       | N |
        | Drop_priv             | N | | Create_routine_priv  | N |
        | Reload_priv           | N | | Alter_routine_priv   | N |
        | Shutdown_priv         | N | | Create_user_priv     | N |
        | Process_priv          | N | | Event_priv           | N |
        | File_priv             | N | | Trigger_priv         | N |
        | Grant_priv            | N | | Ssl_type             | ? |
        | References_priv       | N | | Ssl_cipher           | ? |
        | Index_priv            | N | | X509_issuer          | ? |
        | Alter_priv            | N | | X509_subject         | ? |
        | Show_db_priv          | N | | Max_questions        | 0 |
        | Super_priv            | N | | Max_updates          | 0 |
        | Create_tmp_table_priv | N | | Max_connections      | 0 |
        | Lock_tables_priv      | Y | | Max_user_connections | 0 |
        +-----------------------+---+ +----------------------+---+

**************
* Running It *
**************

In theory, everything should be set up now. Here's my (wrapped)
command-line to start PureFTPD:

/usr1/pureftpd/sbin/pure-ftpd -f ftp -1 -a 100 -B -c 10 -C 3 -d -E -F
   /etc/.ftpbanner -H -k 60% -l puredb:/etc/pureftpd.pdb -m 60 -o -p
   40000:50000 -r -R -s -U 177:022 -O /var/log/pureftpd.log

and the upload script handler:

 /usr/local/sbin/pure-uploadscript -r /usr/bin/PurePostPro &


For this example, we'll get our sample virtual user 'joe' to upload a
file, file1.txt. Once it's completed, we can look at the logging
table and the audit table to see our entry. As this is the first time
our database has been used, there is only one entry in each.

ftp log entry:
-------------
joe [12/Dec/2001:10:39:14 -0000]
    "PUT /home/vftp/joe/FILE1.TXT" 200 308

MySQL log entry:
---------------
mysql> select * from logging\G
*************************** 1. row ***************************
   counter: 1
    l_time: 2001-12-12 10:39:15
    u_user: joe
   u_group: ftpgroup
     u_uid: 101
     u_gid: 101
   u_perms: 600
    u_size: 308
u_checksum: bc0a47f7126a615980a62b7a49667a3d
    u_file: /home/vftp/joe/FILE1.TXT
 processed: N
1 row in set (0.00 sec)

mysql> select * from audit\G
*************************** 1. row ***************************
   counter: 1
    a_time: 20011212103915
    a_user: ftpuser
   a_group: ftpgroup
     a_uid: 101
     a_gid: 101
   a_perms: 600
    a_size: 308
a_checksum: bc0a47f7126a615980a62b7a49667a3d
    a_file: /home/vftp/joe/FILE1.TXT
1 row in set (0.00 sec)



So far, so good.. Now 'joe' sends in another file, file2.txt, which is
different (internally) from file1.txt. Once again, the upload is logged
to both tables in our database as this is a new file:


ftp log entry:
-------------

joe [12/Dec/2001:10:41:58 -0000]
    "PUT /home/vftp/joe/file2.txt" 200 308

MySQL log entry:
---------------

mysql> select * from logging\G
*************************** 1. row ***************************
   counter: 1
    l_time: 2001-12-12 10:39:15
    u_user: joe
   u_group: ftpgroup
     u_uid: 101
     u_gid: 101
   u_perms: 600
    u_size: 308
u_checksum: bc0a47f7126a615980a62b7a49667a3d
    u_file: /home/vftp/joe/FILE1.TXT
 processed: N
*************************** 2. row ***************************
   counter: 2
    l_time: 2001-12-12 10:41:58
    u_user: joe
   u_group: ftpgroup
     u_uid: 101
     u_gid: 101
   u_perms: 600
    u_size: 308
u_checksum: 9e2c6552dfabd4080fe8575cecd276ba
    u_file: /home/vftp/joe/file2.txt
 processed: N
2 rows in set (0.00 sec)

mysql> select * from audit\G
*************************** 1. row ***************************
   counter: 1
    a_time: 20011212103915
    a_user: ftpuser
   a_group: ftpgroup
     a_uid: 101
     a_gid: 101
   a_perms: 600
    a_size: 308
a_checksum: bc0a47f7126a615980a62b7a49667a3d
    a_file: /home/vftp/joe/FILE1.TXT
*************************** 2. row ***************************
   counter: 2
    a_time: 20011212104158
    a_user: ftpuser
   a_group: ftpgroup
     a_uid: 101
     a_gid: 101
   a_perms: 600
    a_size: 308
a_checksum: 9e2c6552dfabd4080fe8575cecd276ba
    a_file: /home/vftp/joe/file2.txt
2 rows in set (0.00 sec)


Now 'joe' sends in another file, my_resume.txt, a copy of file1.txt
uploaded earlier, to a directory named 'secret' which he creates:

ftp log entry:
-------------

Dec 12 11:00:23 entropy pure-ftpd[9713]:
    (?@10.67.43.200) [INFO] joe is now logged in
Dec 12 11:00:29 entropy pure-ftpd[9713]:
    (joe@10.67.43.200) [DEBUG] Command [xmkd] [secret]
Dec 12 11:00:32 entropy pure-ftpd[9713]:
    (joe@10.67.43.200) [DEBUG] Command [cwd] [secret]
Dec 12 11:00:52 entropy pure-ftpd[9713]:
    (joe@10.67.43.200) [DEBUG] Command [port] [10,67,43,200,9,113]
Dec 12 11:00:52 entropy pure-ftpd[9713]:
    (joe@10.67.43.200) [DEBUG] Command [stor] [my_resume.txt]
Dec 12 11:00:53 entropy pure-ftpd[9713]:
    (joe@10.67.43.200) [NOTICE] /home/vftp/joe/secret/my_resume.txt
    uploaded  (308 bytes, 0.97KB/sec)

Now if we have a look at the database entries, we can see that the
completed upload has been recorded in the 'audit' table:

MySQL log entry:
---------------

mysql> select * from audit\G
*************************** 1. row ***************************
   counter: 1
    a_time: 20011212103915
    a_user: ftpuser
   a_group: ftpgroup
     a_uid: 101
     a_gid: 101
   a_perms: 600
    a_size: 308
a_checksum: bc0a47f7126a615980a62b7a49667a3d
    a_file: /home/vftp/joe/FILE1.TXT
*************************** 2. row ***************************
   counter: 2
    a_time: 20011212104158
    a_user: ftpuser
   a_group: ftpgroup
     a_uid: 101
     a_gid: 101
   a_perms: 600
    a_size: 308
a_checksum: 9e2c6552dfabd4080fe8575cecd276ba
    a_file: /home/vftp/joe/file2.txt
*************************** 3. row ***************************
   counter: 3
    a_time: 20011212110053
    a_user: ftpuser
   a_group: ftpgroup
     a_uid: 101
     a_gid: 101
   a_perms: 600
    a_size: 308
a_checksum: bc0a47f7126a615980a62b7a49667a3d
    a_file: /home/vftp/joe/secret/my_resume.txt
3 rows in set (0.00 sec)

 ... and in the 'logging' table, the new upload has been ignored as it
is a duplicate (same checksum) of file1.txt even though both the location
and filename are different.

mysql> select * from logging\G
*************************** 1. row ***************************
   counter: 1
    l_time: 2001-12-12 10:39:15
    u_user: joe
   u_group: ftpgroup
     u_uid: 101
     u_gid: 101
   u_perms: 600
    u_size: 308
u_checksum: bc0a47f7126a615980a62b7a49667a3d
    u_file: /home/vftp/joe/FILE1.TXT
 processed: N
*************************** 2. row ***************************
   counter: 2
    l_time: 2001-12-12 10:41:58
    u_user: joe
   u_group: ftpgroup
     u_uid: 101
     u_gid: 101
   u_perms: 600
    u_size: 308
u_checksum: 9e2c6552dfabd4080fe8575cecd276ba
    u_file: /home/vftp/joe/file2.txt
 processed: N
2 rows in set (0.00 sec)

Now, when your external processing script runs, it will reference the
logging table and only process truly unique files.
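
If you also want to see which uploads were the duplicates, the two
tables can be compared. The sketch below is one way to do that, under
the same assumptions as before: Python instead of Perl, in-memory
SQLite instead of MySQL, and invented function names
(load_walkthrough_data, find_duplicate_uploads). The rows loaded are
the paths and checksums from the walk-through above.

```python
import sqlite3

FILE1_SUM = "bc0a47f7126a615980a62b7a49667a3d"
FILE2_SUM = "9e2c6552dfabd4080fe8575cecd276ba"


def load_walkthrough_data():
    """Recreate the walk-through's audit and logging rows (paths and checksums only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE audit (
            counter INTEGER PRIMARY KEY AUTOINCREMENT,
            a_checksum TEXT, a_file TEXT);
        CREATE TABLE logging (
            counter INTEGER PRIMARY KEY AUTOINCREMENT,
            u_checksum TEXT UNIQUE NOT NULL, u_file TEXT NOT NULL);
    """)
    conn.executemany(
        "INSERT INTO audit (a_checksum, a_file) VALUES (?, ?)",
        [
            (FILE1_SUM, "/home/vftp/joe/FILE1.TXT"),
            (FILE2_SUM, "/home/vftp/joe/file2.txt"),
            (FILE1_SUM, "/home/vftp/joe/secret/my_resume.txt"),
        ],
    )
    conn.executemany(
        "INSERT INTO logging (u_checksum, u_file) VALUES (?, ?)",
        [
            (FILE1_SUM, "/home/vftp/joe/FILE1.TXT"),
            (FILE2_SUM, "/home/vftp/joe/file2.txt"),
        ],
    )
    conn.commit()
    return conn


def find_duplicate_uploads(conn):
    """Return (duplicate_path, original_path, checksum) for each repeat upload.

    An audit row counts as a repeat when an earlier audit row already
    carries the same checksum; the original path comes from 'logging'.
    """
    return conn.execute("""
        SELECT a.a_file, l.u_file, a.a_checksum
        FROM audit AS a
        JOIN logging AS l ON l.u_checksum = a.a_checksum
        WHERE a.counter > (SELECT MIN(f.counter)
                           FROM audit AS f
                           WHERE f.a_checksum = a.a_checksum)
        ORDER BY a.counter
    """).fetchall()
```

With the walk-through data, the only row returned pairs
/home/vftp/joe/secret/my_resume.txt with /home/vftp/joe/FILE1.TXT.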

********
* ToDo *
********

* Understand MySQL better, maybe go on a course :-)

* Tidy up the script - add some sort of error handling;

* Reinstate the email subroutine - this will send mail to someone when
  a certain condition occurs: possibly on receipt of a duplicate file?

* Develop the external processor to process the logging table - I am
  currently working on a Perl daemon to do this. Early tests are very
  encouraging..


****************
* Contact info *
****************

You can email me at: peterg [at] mhmediaonline.eu

Please bear in mind that I have a full-time job, so my "free" time is
quite limited, but I'm determined to make this work so I will respond!


*************
* Thanks .. *
*************

.. to the whole PureFTPd team - it's great software, and a pleasure to
work with.

Source: README.TXT, updated 2012-04-04