Re: [Sguil-users] Sguil vs packet captures, not live traffic
From: Michael H. <mic...@ut...> - 2013-03-05 16:57:19
I may be a bit late to the conversation, but I just subscribed to the list because of this thread. Forgive the long post, but here's what I know; I would love to get some feedback on our techniques from this group.

We've been doing research here that involves looking at "big data" network traffic with open source tools. The gist of it is how we sift through tons of data, parse it, structure it, query it, visualize it, and extract knowledge from it. That's the concept anyway, and something this group knows a lot about. In the process I've been moving the same pcap files around and doing a lot of scripting with tcpreplay and now netsniff-ng.

What we're working on is a solid methodology for a discovery and analysis process, so big organizations can take the plunge into broader network monitoring by building some situational awareness up front. Put another way: how do you tune the IDS offline with lots of raw data and then put a production-ready system in place, so other analysts aren't overwhelmed and don't end up just turning it off? At the heart of this methodology is replaying captured traffic over and over again with increasingly fine-tuned filters and data partitioning. We replay the traffic through the stack of tools (we're using Security Onion as the base system build) and examine it with everything available there. Then, as we tune Snort sigs, thresholds, or BPF filters to look at subsets of traffic, we replay it again into a clean sguil DB and have another look. I hope the value of doing this is clear, and Richard, I'm guessing you're looking for the same capabilities in a classroom setting.

For our data collection, we were able to record traffic on a big enterprise network with interesting things to look at (internal corporate IT and SCADA) and got about 4 TB in 2 weeks. I recorded that with daemonlogger into 1-minute pcap files stored in a directory structure of YYYY/MM/DD/HH/file.interface.####(unix time).pcap. Then we sneaker-net the pcaps over to our lab computers for replay and analysis. The concern about timestamps is a big one (more on this in a second), and this layout helps us stay organized. The filename itself can be parsed to get the unix epoch time into YYYY/MM/DD/HH, and the directory structure tells us where to find any given original pcap in the filesystem. To confirm, we run tcpdump with a packet count of one and pull the timestamp off the first packet in the file.

So, to "play" that into the DB, we went through a number of trials to determine the best method. All I know is that I've found the best method /so far/. 1.8 TB (one week's worth of traffic) can be played into Security Onion over a loopback adapter in about 36 hours. Kind of cool that it doesn't take a whole week, but it's not cool to have to wait beyond one working day; it's no fun to run one search and come back a couple of days later to see the results. That time is achieved if you run with "tcpreplay -t" for "as fast as possible". We upgraded hardware to give it a lot of processors and RAM, and it turns out we couldn't make it go any faster over loopback. There is a limit somewhere, and I believe it's in some buffer size in the driver or kernel (any thoughts?). The loopback is /not/ intended to carry 1 GbE traffic. Nor is it a good place to keep your traffic stream clean, because apparently there are other processes running that think the loopback is just for internal communications... ;^)
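In case it helps, here's roughly what those two steps look like at a shell prompt; the path and file name below are just placeholders standing in for the YYYY/MM/DD/HH layout described above:

    # confirm the capture window by pulling the timestamp off the first packet
    tcpdump -n -c 1 -tttt -r /data/pcap/2013/02/15/09/dump.eth0.1360918800.pcap

    # "as fast as possible" replay of an hour's worth of 1-minute files into the loopback
    sudo tcpreplay -i lo -t /data/pcap/2013/02/15/09/*.pcap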
A better way to do it is to create a dummy0 interface (it may be called other things in *BSD, etc., but it seems to be widely available). This is useful for local playback of traffic without crowding the loopback interface. By the way, I modified the "sosetup" script that comes with Security Onion to change the regex that looks for available interfaces. By allowing it to match "lo" or "dummy[0-9]" along with the current choices of "eth" and the others, it recognizes the interface, names the sensors appropriately, sets up the directories correctly, etc., and works like a charm.

Here's the problem: apparently the dummy0 interface (on my build of Ubuntu Server 12.04) has the same networking buffer bottleneck as "lo". I'm not a driver or kernel developer, though I'd love to roll up my sleeves with the guidance of someone who knows why that might be happening.

Something else that doesn't work: if you play the traffic back out eth1 or the like when it's not connected to a physical network, the NIC "knows" and acts like it's playing the traffic without error, but it shortcuts the process and drops all the packets. So we've concluded you've got to have two separate NICs: one to play, one to receive. Someone here mentioned that there are drivers or options available for testing NICs that will act like a loopback on the NIC hardware instead of in the OS, so you should be able to achieve 1 GbE in test mode. Anyone know anything about this?

One other option to avoid: we tried two separate VMs on the same ESX server and played it all "internally" with the magic of virtualization. The virtual switch in ESX flew at lightning speed, but with very serious packet dropping (higher than 50% at some points), so I guess "as fast as possible" is not actually possible. There are some configuration issues to consider here that aren't worth getting into because of that. I'd love to troubleshoot this one, too, but...

The best solution was a couple of 1 GbE NICs and two physical boxes. The receiver (the sguil or Security Onion build) gets set up with a private address on a second interface (keeping one interface on a real network for management), and the sender can be a cheap system, so long as it also has a 1 GbE NIC and enough RAM to hold the pipeline. Put it on the same private subnet and use a cross-over cable to attach the two computers directly. Then just cram packets down the pipe. Now we can play a week's worth of traffic in a few hours and spend more time analyzing and looking for needles instead of waiting for the haystack to show up.
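For anyone who wants to reproduce either setup, it boils down to something like this; the interface names and private addresses are just examples, so adjust for your own boxes:

    # single-box alternative: a dummy interface for local playback
    sudo modprobe dummy
    sudo ip link set dummy0 up

    # two-box setup over a cross-over cable
    # on the receiver (the Security Onion/sguil box), second NIC:
    sudo ip addr add 192.168.100.1/24 dev eth1
    sudo ip link set eth1 up
    # on the sender (the cheap playback box):
    sudo ip addr add 192.168.100.2/24 dev eth1
    sudo ip link set eth1 up
    sudo tcpreplay -i eth1 -t /data/pcap/2013/02/15/09/*.pcap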
Now, as for the timestamp issue, we deal with it in two ways. On the one hand, we're less interested in the actual time than in relative time, because with one NIC and PF_RING handling the packet stream, all the tools running on the box (sguil, sancp, argus, ntop, etc.) are in sync. If you want to find an event of interest in the original pcaps, you need to identify the stream of interest (src IP, src port, dst IP, dst port, proto) and run a search; I like ngrep for this. There are lots of tools for pcaps, really, but trying to load everything into Wireshark with pcaps the size we have is just not practical (or possible?). Several sustained spikes in traffic push even the 1-minute files to as much as 700 MB, so you need something like grep that reads through the file without having to load the whole thing into memory. We also end up with two copies of the pcap files this way, which is extra disk space that should be accounted for.

We played off of a USB3 external drive and wrote to a big iSCSI disk array. Disk bottlenecking will also be an issue if the disks can't keep up with the network. I like RAID striping; my disks don't need to be huge, I just want more of them.

The other way to deal with the timestamps would be to set the system clock to the time of the recorded pcap. I think this can be synchronized if you run the replay via an "at" scheduled task (as opposed to, I guess, hovering your finger over the enter key and trying to catch the right nanosecond). Then, instead of using the --topspeed option, you have to play it back in real time. But there are options, at least with tcpreplay, for how "real time" is implemented (based on machine clock cycles, etc.), and I think that makes the process less than precise. So I don't think you need to do it this way.

I like the idea of loading the pcap files into each tool (or component group of tools) separately, by just reading the files directly and copying things into the right places. I just found the replay method easier to set up, and it helps preserve the "integrity" of the monitoring system while keeping the original pcaps offline. This lets us freely destroy the database, wipe out the file system, or do other crazy things and start all over.

I imagine that in an educational setting, with any group of analysts, the ability to replay the same traffic again and again with refined signatures and filters would be a useful exercise. We're able to peel back a lot of layers of data, find all kinds of interesting things to isolate, and launch another replay. It keeps things very organized, and I find it has been a lot easier to deal with than just turning on the production-network fire-hose and drinking until you get the hang of it.

I'm assuming (but don't know; I'd love to hear) that most folks have smaller data loads, and this should all definitely scale down, too. Without the need to play back terabytes, I think setting up a sensor in one VM and playing into it from another VM at normal speeds would be suitable. But if anyone is interested in scaling up larger, I'd be interested in talking to you.

Regards,

Michael Haney
CISSP, QSA, GSEC, GCIA, GCIH, GCFA
Graduate Student
The University of Tulsa
Institute for Information Security