File Date Author Commit
 dat 2009-04-02 omedalus [r4] Adding some sample training and testing data to...
 include 2009-04-02 omedalus [r1] Migrating seppuku to SourceForge
 obj 2009-04-02 omedalus [r1] Migrating seppuku to SourceForge
 src 2009-04-02 omedalus [r2] Adding some readme notes, and making it print h...
 Makefile 2009-04-02 omedalus [r1] Migrating seppuku to SourceForge
 README 2009-04-02 omedalus [r2] Adding some readme notes, and making it print h...
 license.txt 2009-04-02 omedalus [r1] Migrating seppuku to SourceForge

Read Me

seppuku v1.0.1
Copyright (c) 2009, Mikhail Voloshin
See "license.txt" for license information.


BUILD IT
---------------

     > cd seppuku-1.0.1
     > make seppuku

If you want, you can also make and run the unit tests.
     > make unit
     > ./unit
     All test passed.
     >


RUN IT
-----------------
1. Decide what kind of packet you want seppuku to learn to recognize.
2. Using wireshark or tcpdump or what-have-you, create two packet capture (.pcap or .cap) files:
     1. A "hits" (or "examples") file that consists entirely of packets of the type you want seppuku to recognize.
     2. A "misses" (or "counterexamples") file that consists entirely of packets that are NOT of the type you want seppuku to recognize. It actually helps if they're very similar to the desired packet class, but of course not actually identical.
3. Run the hits and misses files through seppuku to create a classifier. Save the classifier.
    > seppuku --trainhit examples.cap --trainmiss counterexamples.cap --save myclass.seppuku
4. Take a new capture file filled with packets seppuku has never seen before, and feed it through seppuku. Seppuku will write out two files, one containing just the hits, another containing just the misses. (Notice that this version of seppuku royally mangles the packet size and time fields.)
    > seppuku --load myclass.seppuku newfile.cap newfile-hits.cap newfile-misses.cap
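
For context on the "packet size and time fields" mentioned in step 4: in the classic libpcap file format, every packet is preceded by a 16-byte record header holding a timestamp (seconds and microseconds), the number of bytes captured, and the original length on the wire. The sketch below decodes one such header from a byte buffer. It assumes a little-endian capture file, and the names (pcap_record_header_t, parse_record_header) are made up for illustration; this is not how seppuku itself reads or writes its files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-packet record header of the classic pcap file format (16 bytes). */
typedef struct {
    uint32_t ts_sec;    /* timestamp, seconds */
    uint32_t ts_usec;   /* timestamp, microseconds */
    uint32_t incl_len;  /* bytes of packet data stored in the file */
    uint32_t orig_len;  /* length of the packet as seen on the wire */
} pcap_record_header_t;

/* Decode a 32-bit little-endian value (assumption: little-endian capture). */
uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Fill `out` from the first 16 bytes of `buf`.
   Returns 0 on success, -1 if the buffer is too short. */
int parse_record_header(const unsigned char *buf, size_t len,
                        pcap_record_header_t *out) {
    assert(buf != NULL && out != NULL);
    if (len < 16) {
        return -1;
    }
    out->ts_sec   = read_le32(buf);
    out->ts_usec  = read_le32(buf + 4);
    out->incl_len = read_le32(buf + 8);
    out->orig_len = read_le32(buf + 12);
    return 0;
}
```

If the hit and miss files written in step 4 look odd in wireshark, these four fields are the ones to compare against the original capture.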

EXAMPLE USE CASES:
1. You're being flooded with packets from a botnet in China, and you want to see if there's a pattern to the source IP addresses so you know what to block.
2. You run an ISP and you want to try to identify packets from a new filesharing protocol.
3. You have a quarantined machine with a virus on it and you'd like to see if there's anything distinctive about the packets that the virus is sending out.


UNDERSTAND IT
-------------

Seppuku is a tool for packet dissection and bitwise analysis. It uses simple bitwise machine learning algorithms from the ID3 family and (eventually) other plane-splitters in order to perform automated packet analysis.
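
To give a feel for what "bitwise" decision-tree learning means, here is a minimal sketch of choosing the single bit position that best separates hits from misses. Two loud caveats: ID3 proper scores a split by entropy-based information gain, whereas this sketch swaps in Gini impurity (a closely related measure) so that it needs no math library; and every name here (sample_t, bit_at, best_split_bit) is hypothetical, not taken from the Seppuku source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical training sample: raw packet bytes plus a hit/miss label. */
typedef struct {
    const unsigned char *data;
    size_t len;     /* length in bytes */
    int is_hit;     /* 1 = example, 0 = counterexample */
} sample_t;

/* Value of bit `pos`, where 0 is the most significant bit of byte 0.
   Bits past the end of the packet read as 0. */
int bit_at(const sample_t *s, size_t pos) {
    size_t byte = pos / 8;
    if (byte >= s->len) {
        return 0;
    }
    return (s->data[byte] >> (7 - pos % 8)) & 1;
}

/* Gini impurity 2p(1-p) of a group, weighted by the group's size. */
double weighted_gini(size_t hits, size_t total) {
    if (total == 0) {
        return 0.0;
    }
    return 2.0 * (double)hits * (double)(total - hits) / (double)total;
}

/* Total weighted impurity of the two groups made by splitting on `pos`.
   Lower is better; 0 means the bit separates hits from misses perfectly. */
double split_impurity(const sample_t *s, size_t n, size_t pos) {
    size_t hits[2] = {0, 0};
    size_t count[2] = {0, 0};
    for (size_t i = 0; i < n; i++) {
        int b = bit_at(&s[i], pos);
        count[b]++;
        if (s[i].is_hit) {
            hits[b]++;
        }
    }
    return weighted_gini(hits[0], count[0]) + weighted_gini(hits[1], count[1]);
}

/* Bit position in [0, nbits) with the lowest split impurity
   (the earliest such position on ties). */
size_t best_split_bit(const sample_t *s, size_t n, size_t nbits) {
    assert(s != NULL && nbits > 0);
    size_t best = 0;
    double best_score = split_impurity(s, n, 0);
    for (size_t pos = 1; pos < nbits; pos++) {
        double score = split_impurity(s, n, pos);
        if (score < best_score) {
            best_score = score;
            best = pos;
        }
    }
    return best;
}
```

A full tree learner would apply best_split_bit recursively to each of the two groups until every group is pure or no bit improves the split.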

Because it uses machine learning, you use Seppuku by training it to recognize the type of packets you're interested in. Training Seppuku involves giving it the following packet capture (pcap) files:
1. A "train-hits.pcap" file containing only the type of packets you're interested in.
2. A "train-misses.pcap" file containing only packets that are NOT of the type you're interested in. For example, if you're training Seppuku to recognize TCP packets, this file would contain UDP packets, ARP messages, and so on.
3. Optional "test-hits.pcap" and "test-misses.pcap" files, which contain known desired and undesired packets, respectively. Once trained, Seppuku will read these files and try to guess which packet is which; the closer it comes to your known desired results, the more accurately Seppuku has been trained. It is strongly advised that none of the packets in these files be identical to any of the packets in the "hits" and "misses" files.
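
The test step boils down to counting how often the classifier's guesses agree with the labels you already know. Here is a rough sketch of that bookkeeping, assuming the classifier can be called as a plain function on one packet. The types, the function names, and the toy classifier are all invented for illustration and do not come from the Seppuku source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet: just a byte buffer. */
typedef struct {
    const unsigned char *data;
    size_t len;
} packet_t;

/* Hypothetical classifier interface: nonzero means "hit". */
typedef int (*classify_fn)(const packet_t *pkt);

typedef struct {
    size_t true_hits;     /* known hits classified as hits */
    size_t false_misses;  /* known hits classified as misses */
    size_t true_misses;   /* known misses classified as misses */
    size_t false_hits;    /* known misses classified as hits */
} score_t;

/* Run the classifier over packets whose correct labels are known. */
score_t score_classifier(classify_fn classify,
                         const packet_t *hits, size_t n_hits,
                         const packet_t *misses, size_t n_misses) {
    score_t s = {0, 0, 0, 0};
    assert(classify != NULL);
    for (size_t i = 0; i < n_hits; i++) {
        if (classify(&hits[i])) s.true_hits++; else s.false_misses++;
    }
    for (size_t i = 0; i < n_misses; i++) {
        if (classify(&misses[i])) s.false_hits++; else s.true_misses++;
    }
    return s;
}

/* Fraction of test packets classified correctly (0.0 if there were none). */
double accuracy(const score_t *s) {
    size_t total = s->true_hits + s->false_misses
                 + s->true_misses + s->false_hits;
    if (total == 0) {
        return 0.0;
    }
    return (double)(s->true_hits + s->true_misses) / (double)total;
}

/* Toy stand-in classifier: "hit" if the top bit of the first byte is set. */
int toy_high_bit(const packet_t *pkt) {
    return pkt->len > 0 && (pkt->data[0] & 0x80) != 0;
}
```

Looking at false_hits and false_misses separately is usually more informative than accuracy alone, since one kind of mistake may matter more to you than the other.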

This tool was developed as a software development demonstration. The author makes no claims of its readiness for commercial use.

The Seppuku source code makes as little use of third-party dependencies as possible. This is for the following reasons:
- Size of the compiled executable. The less unneeded functionality I bring in, the better.
- Portability. The lack of third-party dependencies makes it very easy to distribute and build the project.
- Future migration to kernel mode. If you believe that, you may also be interested in this one bridge I have for sale...
- Laziness. Do you know how much I hate dealing with third-party libraries? A lot. In fact, the one thing I hate more than having to try to wrap my brain around someone else's implementation of some algorithm... is implementing that algorithm myself. So whenever I embark on a software project, I always weigh the pain-in-the-assiness of writing a component myself versus using a third-party implementation and learning someone else's interfaces. Usually I just suck it up and use a third-party implementation whenever I can, but in this particular project, the components are so simple and straightforward that I might as well write them myself. It's the software equivalent of using a manual screwdriver to drive in a woodscrew because the power drill is all the way in the basement behind the dryer and you don't feel like getting your hands covered in lint.

The name "Seppuku" derives from this tool's self-generating packet dissection capabilities. That is, it effectively makes packets dissect themselves.


IMPROVE IT
-----------

I wrote this thing in a single weekend, with a couple extra nights spent tweaking it and making the command-line behavior semi-sensible. But it could really benefit from using appropriate third-party packages to represent simple things like lists and bitvectors and other basic data structures. I wrote the whole thing in C only to show that I can... but in the real world I'd use C++ with boost. So if anybody wants to replace the C junk with, you know, *real* code, then please be my guest!