Re: [Nfsen-discuss] [Nfdump-discuss] Request for comment: flow tagging
From: Peter H. <ph...@us...> - 2012-08-21 20:03:20
Hi Tomas,

Thanks for your feedback.

On 21/8/12 1:30 PM, Tomas Podermanski wrote:
> Hi Peter,
>
> snip
>
> Definitely the second one - today, flexibility is more important than
> storage and performance.

Well - storage means I/O, and performance is still a main goal of nfdump. Therefore I still try to keep it as tight as possible. But it seems that everybody wants multiple tags per flow anyway, which means a total of 32 tags (a 32-bit bitmask) per flow record.

>> o tags are numerical IDs with optional string labels. These
>>   string labels are stored along with the flows in the nfdump file.
>
> I expect that labels will be tied to a unique number, so if two different
> labels are set for one tag, only one will be stored.
>
> set tag 20(mystring1) if dst port 80
> set tag 20(mystring2) if dst port 1234
>
> Either mystring1 or mystring2 will be stored in the file. Am I right?

This will result in an error at filter compile time. You cannot assign the same ID to multiple labels. Once a tag/label combination is given in a filter, that relation is unique for that filter.

>> o The nfdump filter language is extended, such that each valid
>>   nfdump filter expression can assign or filter a tag:
>>     set tag <nr>[(label)] if <expr>
>>   for example:
>>     # numerical assignment:
>>     set tag 10 if dst port 80
>>     # numerical and string assignment:
>>     set tag 20(http) if dst port 80
>> o matching tags in the filter language:
>>     tag <nr>
>>     tag <label>
>
> I would appreciate the possibility to define explicitly whether the exact
> value of the tag or the label is matched in the filter. For
> example, it can cause trouble in cases where tags are used for storing
> user IDs and login names and some users choose a purely numerical login -
> there might be a conflict between user ID and login, and improper data could
> be returned for this condition.
> The following syntax and semantics of the
> filter would avoid this problem:
>
> tag <value>       : matches tag value or string label
> tag value <value> : matches only the numerical value of the tag
> tag label <label> : matches only the string label of the tag

If the value is given as a number from 1..32, it's the tag ID; otherwise it's taken as the label. Having a label string 1..32 does not make much sense, so let's keep it simple. This should not result in conflicts.

>> o printing tags in output with %tag
>> o instead of a new tag file, tag assignment can be specified in
>>   a standard nfdump filter file such as:
>>
>>     # tags to be assigned:
>>     set tag 10(http) if ( src port 80 ) or
>>
>>     # comment your tags/labels
>>     set tag 11(https) if ( dst port 443) or
>>     ...
>
> At first sight it looks good, but there are two problems:
> 1. I need to know all values that could appear in the data. It causes a
> problem when related data for a tag has to be looked up, for example in a
> DNS, SQL or geolocation database. It can be solved by going through the data
> twice - a first run to get all values and prepare the filter file from them,
> and a second run for data modification/update. It seems to me a little bit
> cumbersome.

nfdump cannot handle your requests anyway if you want more specific assignments of labels, such as geolocation (maybe one day it will...), RADIUS identification or anything else that needs some kind of preprocessing of the flow data to create the labels. Assigning labels according to any given filter that nfdump understands and is optimised to process is the strength and power of nfdump. It also seems to me the most elegant way, by using what you already know and are working with - the filter language.

> 2. There might be another problem with the number of rules in the filter
> file. Let's say that I want to use tags to add information about AS numbers.
> In that case I would have to create a filter file with more than 600
> thousand records that would look like:
>
> set tag 1 8974(VUTBR) if ( src net 147.229.0.0/16 or src net
> 2001:67c:1220::/32 ) or
> set tag 2 8974(VUTBR) if ( dst net 147.229.0.0/16 or dst net
> 2001:67c:1220::/32 ) or
> set tag 1 2852(CESNET) if (src dst net 191.123.0.0/16 or dst net
> 2001:233::/30 ) or
> set tag 2 2852(CESNET) if (src dst net 191.123.0.0/16 or dst net
> 2001:233::/30 ) or
> ... and the next 600 thousand rows

Not sure if we misunderstood each other. You have a total of 32 tags, 1..32 (bitmap), which means you have an 'or' chain of at most 32 terms. nfdump is very strong in evaluating terms, and therefore label assignment runs at about the same speed as applying a filter.

Assigning AS numbers with tags is not what tags are intended for. I guess nfbgpd would do a better job here. It adds AS information to flow records for networks or exporters without full routing, and puts the AS information into the proper AS fields of the flow record by looking up the src/dst IP addresses in BGP data. But nfbgpd is still experimental lab software.

In any case, if you want such extensive tagging for several hundred thousand terms, I'm not sure tagging is the way to go.

> I am not sure whether it would be effective enough to have such a big
> filter definition. I am sure that extending the filter definition as you
> proposed should be available in nfdump - it is easy to use and in many
> cases solves the problem (for example adding a tag identifying a customer's
> network range). I will mention a proposal to solve that problem
> later in the text.

As you mentioned, it should be easy to use and should do the job for most cases. Needless to say, more extreme requirements are not likely to work that way.

> Related to this innovation that you are planning, I'd kindly ask three
> questions:
> 1. Do you plan that tags could be used for aggregation (-A) and
> statistics (-s)?

Can be done - sure.
That's only a few lines of additional code.

> 2. Do you plan that the set option in the filter file will also work in a more
> generic way, so that it would be possible to update not only tag items, but
> AS numbers or MAC addresses as well? It could work in the following way:
>
> set srcas 8974 if (src dst net 147.229.0.0/16) or
> set dstas 12345 if (src dst net 147.229.0.0/16) or
>
> set insrcmac aa:bb:cc:dd:ee:ff if (src host 147.229.3.15) or
> set outsrcmac 12345 if (src host 147.229.3.15) or
> ...

There are no plans for that now.

> 3. Do you plan that labels will also work for existing items (IP
> addresses, port numbers)? For example:
> set srcip 147.229.3.123(hawk.cis.vutbr.cz) if (src host 147.229.3.123) or
> set dstip 147.229.3.123(hawk.cis.vutbr.cz) if (dst host 147.229.3.123) or
> set srcport 80(http) if (src src port 80) or
> set dstport 80(http) if (dst src port 80) or
> ...

A label is a synonym for a tag, and for now I would like to keep it simple to start.

> Not to have only questions, I have one idea that could solve some of the
> problems I mentioned above. Currently nfdump accepts input data only in the
> native/binary nfdump format. If nfdump accepted data in, let's say,
> CSV format, there would be an open and easy way to create one's own
> preprocessor/filter that can modify the data in whatever way is needed. It
> could work like this:
>
> nfdump -R nfcapd.xxx -o csv | my_processing_tool | nfdump -C -w nfcapd.yyy
>
> -C says that the input data will be read in CSV format, and
> my_processing_tool can perform all necessary operations (geolocation
> database lookup, SQL query, ...), modify the data and return it to nfdump.
> I think that many people would appreciate that feature :-). It also
> solves two problems at once - it allows modifying the value of any item, and
> more effective 'tagging' can be performed.

Hmm .. frankly, I'm not a fan of multiple ASCII conversions.
First you convert the data to ASCII csv and feed it into the data processing, which in turn most likely translates the ASCII back to binary before processing and converts it back to ASCII for csv export, which nfdump then reads and again converts back to binary. For large-volume record files this kills your performance. I perfectly understand the need for csv output to post-process data in any way you like, but I do not really see a need to accept csv to store back into binary. If this processing chain is really needed, I would rather think about a plugin-type interface for nfdump - say a Perl, Python or C interface - to process each individual record.

I'd happily put these ideas into the nfdump feature idea collection. :)
For now, I would like to concentrate on the tags and make them work as well and as efficiently as possible.

Thanks for the valuable feedback.

- Peter

> Sorry for the somewhat longer email :-) Of course, if you need somebody
> to test these features, we are eager and ready to help with that. We can
> test them in scenarios where we use our own patched features, which are very
> similar to tags.
>
> Thanks for your effort and work on such a great tool.
>
> Tomas
>
>> which can be given to nfdump as an argument -f <filter>
>>
>> Would the tagging system as described above match the
>> requirements for those planning to use tags?
>>
>> Feedback is welcomed.
>>
>> - Peter
>
> _______________________________________________
> Nfdump-discuss mailing list
> Nfd...@li...
> https://lists.sourceforge.net/lists/listinfo/nfdump-discuss

--
Be nice to your netflow data. Use NfSen and nfdump :)
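
For readers following the thread, here is a minimal Python sketch of the tag model as described in the replies above: at most 32 tags per flow kept in a 32-bit bitmask, an ID-to-label relation that must stay unique within one filter, and a lookup that treats a number in 1..32 as the tag ID and anything else as a label. It is not nfdump code. The names `TagTable`, `set_tag` and `has_tag` are hypothetical, and rejecting a label bound to two different IDs is an assumption that goes slightly beyond what the thread states.

```python
MAX_TAGS = 32  # tags are numbered 1..32, one bit each in a 32-bit mask


class TagTable:
    """Hypothetical helper: keeps the tag ID <-> label relation of one filter."""

    def __init__(self):
        self.label_of = {}  # tag ID -> label
        self.id_of = {}     # label  -> tag ID

    def declare(self, tag_id, label=None):
        """Register 'set tag <nr>[(label)]'; conflicts fail at compile time."""
        if not 1 <= tag_id <= MAX_TAGS:
            raise ValueError(f"tag ID {tag_id} is outside 1..{MAX_TAGS}")
        if label is None:
            return
        # The same ID must not be assigned to multiple labels.
        if self.label_of.get(tag_id, label) != label:
            raise ValueError(f"tag {tag_id} already has a different label")
        # Assumption: one label must not name two different IDs either.
        if self.id_of.get(label, tag_id) != tag_id:
            raise ValueError(f"label {label!r} already names another tag")
        self.label_of[tag_id] = label
        self.id_of[label] = tag_id

    def resolve(self, token):
        """Resolve 'tag <nr>' or 'tag <label>' to a tag ID."""
        # A number from 1..32 is the tag ID, anything else is a label.
        if token.isdecimal() and 1 <= int(token) <= MAX_TAGS:
            return int(token)
        if token not in self.id_of:
            raise ValueError(f"unknown tag label {token!r}")
        return self.id_of[token]


def set_tag(mask, tag_id):
    """Return the flow's tag bitmask with tag_id switched on."""
    return mask | (1 << (tag_id - 1))


def has_tag(mask, tag_id):
    """True if tag_id is set in the flow's tag bitmask."""
    return bool(mask & (1 << (tag_id - 1)))


if __name__ == "__main__":
    table = TagTable()
    table.declare(10)          # set tag 10 if ...
    table.declare(20, "http")  # set tag 20(http) if ...
    mask = set_tag(0, table.resolve("http"))
    print(has_tag(mask, 20), has_tag(mask, 10))
```

With this model, a second `declare(20, "mystring2")` after `declare(20, "mystring1")` raises an error, mirroring the compile-time error mentioned for the `mystring1`/`mystring2` example, and the cost per flow stays at one 32-bit word no matter how many of the 32 tags are set.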