Re: [Sguil-users] Sguil vs packet captures, not live traffic
From: Michael H. <mic...@ut...> - 2013-03-05 16:57:19
I may be a bit late to the conversation, but I just subscribed to the list because of this thread. Forgive the long post, but here's what I know; I would love to get some feedback on our techniques from this group.

We've been doing research here that involves looking at "big data" network traffic with open source tools. The gist of it is how we sift through tons of data, parse it, structure it, query it, visualize it, and extract knowledge from it. That's the concept anyway, and something this group knows a lot about. In the process I've been moving the same pcap files around and doing a lot of scripting with tcpreplay and now netsniff-ng.

What we're working on is a solid methodology for a discovery and analysis process, so big organizations can take the plunge into broader network monitoring by building some situational awareness up front. Put another way: how do you tune the IDS offline with lots of raw data and then put a production-ready system in place, so other analysts aren't overwhelmed and don't end up just turning it off? At the heart of this methodology is replaying captured traffic over and over again with increasingly fine-tuned filters and data partitioning. We replay the traffic through the stack of tools (we're using Security Onion as the base system build) and examine it with everything available there. Then, as we tune Snort sigs, thresholds, or BPF filters to look at subsets of traffic, we replay it again into a clean sguil DB and have another look. I hope the value of doing this is clear, and Richard, I'm guessing you're looking for the same capabilities in a classroom setting.

For our data collection, we were able to record traffic on a big enterprise network with interesting things to look at (internal corporate IT and SCADA) and got about 4 TB in 2 weeks. I recorded that with daemonlogger into 1-minute pcap files stored in a directory structure of YYYY/MM/DD/HH/file.interface.####(unix time).pcap. Then we sneaker-net the pcaps over to our lab computers for replay and analysis. The concern about timestamps is a big one (more on this in a second), and this layout helps us stay organized. The filename itself can be parsed to get the unix epoch time into YYYY/MM/DD/HH, and the directory structure tells us where to find any given original pcap in the filesystem. To confirm, we run tcpdump with a packet count of one and pull the timestamp off the first packet in the file.

So, to "play" that into the DB, we went through a number of trials to determine the best method. All I know is that I've found the best method /so far/. 1.8 TB (one week's worth of traffic) can be played into Security Onion over a loopback adapter in about 36 hours. Kind of cool that it doesn't take a whole week, but it's not cool to have to wait beyond one working day; it's no fun to run one search and come back a couple of days later to see the results. That time is achieved if you run with "tcpreplay -t" for "as fast as possible". We upgraded hardware to give it a lot of processors and RAM, and it turns out we couldn't make it go any faster over loopback. There is a limit somewhere, and I believe it's in some buffer size in the driver or kernel (any thoughts?). The loopback is /not/ intended to carry 1 GbE traffic. Nor is it a good place to keep your traffic stream clean, because apparently there are other processes running that think the loopback is just for internal communications... ;^)
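In case it helps, here's roughly what those two steps look like at a shell prompt; the path and file name below are just placeholders standing in for the YYYY/MM/DD/HH layout described above:

    # confirm the capture window by pulling the timestamp off the first packet
    tcpdump -n -c 1 -tttt -r /data/pcap/2013/02/15/09/dump.eth0.1360918800.pcap

    # "as fast as possible" replay of an hour's worth of 1-minute files into the loopback
    sudo tcpreplay -i lo -t /data/pcap/2013/02/15/09/*.pcap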
A better way to do it is to create a dummy0 interface (it may be called other things in *BSD, etc., but it seems to be widely available). This is useful for local playback of traffic without crowding the loopback interface. By the way, I modified the "sosetup" script that comes with Security Onion to change the regex that looks for available interfaces. By allowing it to match "lo" or "dummy[0-9]" along with the current choices of "eth" and the others, it recognizes the interface, names the sensors appropriately, sets up the directories correctly, etc., and works like a charm.

Here's the problem: apparently the dummy0 interface (on my build of Ubuntu Server 12.04) has the same networking buffer bottleneck as "lo". I'm not a driver or kernel developer, though I'd love to roll up my sleeves with the guidance of someone who knows why that might be happening.

Something else that doesn't work: if you play the traffic back out eth1 or the like when it's not connected to a physical network, the NIC "knows" and acts like it's playing the traffic without error, but it shortcuts the process and drops all the packets. So we've concluded you've got to have two separate NICs: one to play, one to receive. Someone here mentioned that there are drivers or options available for testing NICs that will act like a loopback on the NIC hardware instead of in the OS, so you should be able to achieve 1 GbE in test mode. Anyone know anything about this?

One other option to avoid: we tried two separate VMs on the same ESX server and played it all "internally" with the magic of virtualization. The virtual switch in ESX flew at lightning speed, but with very serious packet dropping (higher than 50% at some points), so I guess "as fast as possible" is not actually possible. There are some configuration issues to consider here that aren't worth getting into because of that. I'd love to troubleshoot this one, too, but...

The best solution was a couple of 1 GbE NICs and two physical boxes. The receiver (the sguil or Security Onion build) gets set up with a private address on a second interface (keeping one interface on a real network for management), and the sender can be a cheap system, so long as it also has a 1 GbE NIC and enough RAM to hold the pipeline. Put it on the same private subnet and use a cross-over cable to attach the two computers directly. Then just cram packets down the pipe. Now we can play a week's worth of traffic in a few hours and spend more time analyzing and looking for needles instead of waiting for the haystack to show up.
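For anyone who wants to reproduce either setup, it boils down to something like this; the interface names and private addresses are just examples, so adjust for your own boxes:

    # single-box alternative: a dummy interface for local playback
    sudo modprobe dummy
    sudo ip link set dummy0 up

    # two-box setup over a cross-over cable
    # on the receiver (the Security Onion/sguil box), second NIC:
    sudo ip addr add 192.168.100.1/24 dev eth1
    sudo ip link set eth1 up
    # on the sender (the cheap playback box):
    sudo ip addr add 192.168.100.2/24 dev eth1
    sudo ip link set eth1 up
    sudo tcpreplay -i eth1 -t /data/pcap/2013/02/15/09/*.pcap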
Now, as for the timestamp issue, we deal with it in two ways. On the one hand, we're less interested in the actual time than in relative time, because with one NIC and PF_RING handling the packet stream, all the tools running on the box (sguil, sancp, argus, ntop, etc.) are in sync. If you want to find an event of interest in the original pcaps, you need to identify the stream of interest (src IP, src port, dst IP, dst port, proto) and run a search; I like ngrep for this. There are lots of tools for pcaps, really, but trying to load everything into Wireshark with pcaps the size we have is just not practical (or possible?). Several sustained spikes in traffic push even the 1-minute files to as much as 700 MB, so you need something like grep that reads through the file without having to load the whole thing into memory. We also end up with two copies of the pcap files this way, which is extra disk space that should be accounted for.

We played off of a USB3 external drive and wrote to a big iSCSI disk array. Disk bottlenecking will also be an issue if the disks can't keep up with the network. I like RAID striping; my disks don't need to be huge, I just want more of them.

The other way to deal with the timestamps would be to set the system clock to the time of the recorded pcap. I think this can be synchronized if you run the replay via an "at" scheduled task (as opposed to, I guess, hovering your finger over the enter key and trying to catch the right nanosecond). Then, instead of using the --topspeed option, you have to play it back in real time. But there are options, at least with tcpreplay, for how "real time" is implemented (based on machine clock cycles, etc.), and I think that makes the process less than precise. So I don't think you need to do it this way.

I like the idea of loading the pcap files into each tool (or component group of tools) separately, by just reading the files directly and copying things into the right places. I just found the replay method easier to set up, and it helps preserve the "integrity" of the monitoring system while keeping the original pcaps offline. This lets us freely destroy the database, wipe out the file system, or do other crazy things and start all over.

I imagine that in an educational setting, with any group of analysts, the ability to replay the same traffic again and again with refined signatures and filters would be a useful exercise. We're able to peel back a lot of layers of data, find all kinds of interesting things to isolate, and launch another replay. It keeps things very organized, and I find it has been a lot easier to deal with than just turning on the production-network fire-hose and drinking until you get the hang of it.

I'm assuming (but don't know; I'd love to hear) that most folks have smaller data loads, and this should all definitely scale down, too. Without the need to play back terabytes, I think setting up a sensor in one VM and playing into it from another VM at normal speeds would be suitable. But if anyone is interested in scaling up larger, I'd be interested in talking to you.

Regards,

Michael Haney
CISSP, QSA, GSEC, GCIA, GCIH, GCFA
Graduate Student
The University of Tulsa
Institute for Information Security