Thanks for your feedback.
On 21/8/12 1:30 PM, Tomas Podermanski wrote:
> Hi Peter,
> Definitely the second one - today, flexibility is more important than
> storage and performance.
Well - storage means I/O, and performance is still a main goal of
nfdump. Therefore I still try to keep it as tight as possible.
But it seems anyway that everybody wants to have multiple tags
per flow, which means a total of 32 tags ( 32 bit bitmask ) per
>> o tags are numerical ids with an optional string labels. These
>> string labels are stored along the flows in the nfdump file.
> I expect that labels will be tied to a unique number, so if two different
> labels are set for one tag, only one will be stored.
> set tag 20(mystring1) if dst port 80
> set tag 20(mystring2) if dst port 1234
> Either mystring1 or mystring2 will be stored into the file. Am I right?
This will result in an error at filter compile time. You cannot assign the
same ID to multiple labels. Once a tag/label combination is given in a filter,
that relation is unique for that filter.
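The compile-time check described above could look roughly like the following Python sketch. It is an illustration only, not nfdump code: the regular expression and the function name collect_labels are hypothetical, and it only understands lines of the proposed form `set tag <nr>[(label)] if <expr>`.

```python
import re

# Hypothetical pattern for the proposed syntax: set tag <nr>[(label)] if <expr>
SET_TAG = re.compile(r"^\s*set\s+tag\s+(\d+)(?:\((\w+)\))?\s+if\s+(.+)$")


def collect_labels(filter_lines):
    """Return {tag_id: label}; raise ValueError if one ID gets two labels."""
    labels = {}
    for line in filter_lines:
        match = SET_TAG.match(line)
        if match is None:
            continue  # not a tag assignment; ignored in this sketch
        tag_id, label = int(match.group(1)), match.group(2)
        if label is None:
            continue  # purely numerical assignment, nothing to check
        previous = labels.setdefault(tag_id, label)
        if previous != label:
            raise ValueError(
                f"tag {tag_id} is already labelled '{previous}', "
                f"cannot also label it '{label}'"
            )
    return labels
```

With the two lines from the example above ( tag 20 labelled mystring1 and then mystring2 ), this sketch raises an error instead of silently keeping one of the labels.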
>> o The nfdump filter language is extended, such that each valid
>> nfdump filter expression can assign or filter a tag:
>> set tag <nr>[(label)] if <expr> for example:
>> # numerical assignment:
>> set tag 10 if dst port 80
>> # numerical and string assignment:
>> set tag 20(http) if dst port 80
>> o matching tags in the filter language:
>> tag <nr>
>> tag <label>
> I would appreciate the possibility to explicitly define whether the exact
> value of the tag or the label is matched in the filter. For example, it
> can cause trouble in cases where tags are used for storing user IDs and
> login names and some users choose a purely numerical login - there might
> be a conflict between userid and login, and improper data could be
> returned for this condition. The following syntax and semantics of the
> filter would avoid this problem:
> tag <value> : matches tag value or string label
> tag value <value> : matches only numerical value of the tag
> tag label <label> : matches only string label of the tag
If the value is given as a number from 1..32 it's the tag ID, otherwise
it's taken as the label. Having a label string 1..32 does not make much
sense, therefore keep it simple. This should not result in conflicts.
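As a rough illustration of that rule, here is a minimal Python sketch. The function name resolve_tag_argument and the label table passed to it are hypothetical, not part of nfdump; the only thing taken from the text above is that a number from 1..32 is the tag ID and anything else is treated as a label.

```python
MAX_TAGS = 32  # tags 1..32, as described above


def resolve_tag_argument(arg, label_to_id):
    """Map the argument of 'tag <nr>' / 'tag <label>' to a tag ID.

    arg         -- the token following the 'tag' keyword, as a string
    label_to_id -- hypothetical table of labels defined in the filter
    """
    # A plain number in the range 1..32 is always the tag ID.
    if arg.isascii() and arg.isdigit() and 1 <= int(arg) <= MAX_TAGS:
        return int(arg)
    # Everything else is taken as a label and must have been defined.
    if arg in label_to_id:
        return label_to_id[arg]
    raise ValueError(f"unknown tag label '{arg}'")
```

For example, with the label table {"http": 20}, both "20" and "http" resolve to tag 20 in this sketch.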
>> o printing tags in output with %tag
>> o instead of a new tag file, tag assignment can be specified in
>> a standard nfdump filter file such as:
>> # tags to be assigned:
>> set tag 10(http) if ( src port 80 ) or
>> # comment your tags/labels
>> set tag 11(https) if ( dst port 443) or
> At first sight it looks good, but there are two problems:
> 1. I need to know all values that could appear in the data. It causes a
> problem when related data for a tag has to be looked up, for example in
> DNS, SQL or a geolocation database. It can be solved by going through data
> twice - first to get all values, based on that prepare filter file and
> the second run for data modification/update. It seems to me a little bit
nfdump cannot handle such requests anyway if you want more specific
label assignments, such as geolocation ( maybe one day it will.. ),
radius identification or anything else that needs some kind of
preprocessing of the flow data to create the labels. Assigning labels
according to any given filter, which nfdump understands and is optimised
to process, is the strength and power of nfdump. It also seems to me the
most elegant way, because it uses what you already know and are working
with - the filter language.
> 2. There might be another problem with the number of rules in the filter
> file. Let's say that I want to use tags to add information about AS
> numbers. In that case I would have to create a filter file with more than
> 600 thousand records that would look like:
> set tag 1 8974(VUTBR) if ( src net 188.8.131.52/16 or src net
> 2001:67c:1220::/32 ) or
> set tag 2 8974(VUTBR) if ( dst net 184.108.40.206/16 or dst net
> 2001:67c:1220::/32 ) or
> set tag 1 2852(CESNET) if (src dst net 220.127.116.11/16 or dst net
> 2001:233::/30 ) or
> set tag 2 2852(CESNET) if (src dst net 18.104.22.168/16 or dst net
> 2001:233::/30 ) or
> ... and next 600 thousand rows
Not sure if we misunderstood each other. You have a total of 32 tags
1..32 ( bitmap ), which means at maximum you have an 'or' chain of 32
terms. nfdump is very strong in evaluating terms and therefore label
assignment is about the same speed as applying a filter. Assigning AS
numbers with tags is not what tags are intended for. I guess nfbgpd would
do a better job here. It adds AS information to flow records for networks
or exporters without full routing and puts the AS information into the
proper AS fields of the flow record according to the src/dst IP addresses
looked up in BGP data. But nfbgpd is still experimental.
In any case, if you want to have such extensive tagging for several
hundred thousand terms, I'm not sure if tagging is the way to go.
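To make the 32-tag bitmap concrete, here is a minimal Python sketch. It is not nfdump code: the mapping of tag n to bit n-1, the function names, and the representation of a flow as a dict with a predicate per rule are all assumptions made for illustration.

```python
MAX_TAGS = 32  # tags 1..32 fit into one 32 bit bitmask


def set_tag(mask, tag_id):
    """Return mask with the bit for tag_id set (assumed: tag n -> bit n-1)."""
    if not 1 <= tag_id <= MAX_TAGS:
        raise ValueError(f"tag id {tag_id} is outside 1..{MAX_TAGS}")
    return mask | (1 << (tag_id - 1))


def has_tag(mask, tag_id):
    """Return True if the bit for tag_id is set in mask."""
    if not 1 <= tag_id <= MAX_TAGS:
        raise ValueError(f"tag id {tag_id} is outside 1..{MAX_TAGS}")
    return bool(mask & (1 << (tag_id - 1)))


def assign_tags(flow, rules):
    """Evaluate at most 32 (tag_id, predicate) rules against one flow."""
    mask = 0
    for tag_id, predicate in rules:
        if predicate(flow):
            mask = set_tag(mask, tag_id)
    return mask


# Hypothetical rules mirroring 'set tag 10(http) ...' and 'set tag 11(https) ...'
rules = [
    (10, lambda f: f["src_port"] == 80 or f["dst_port"] == 80),
    (11, lambda f: f["dst_port"] == 443),
]
mask = assign_tags({"src_port": 51234, "dst_port": 80}, rules)
```

Since there is one bit per tag, a flow can carry any combination of the 32 tags, but there is no room for hundreds of thousands of distinct values such as AS numbers.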
> I am not sure whether it would be effective enough to have such a big
> filter definition. I am sure that extending the filter definition as you
> proposed should be available in nfdump - it is easy to use and in many
> cases solves the problem (for example adding a tag identifying a
> customer's network range). I will mention a proposal to solve that
> problem later in the text.
As you mentioned, it should be easy to use and should do the job for most
cases. Needless to say, more extreme requirements are not likely to
work that way.
> Related to this innovation that you are planning, I'd kindly ask three
> 1. Do you plan that tags could be used for aggregation (-A) and
> statistics (-s) ?
Can be done - sure. That's only a few lines of additional code.
> 2. Do you plan that the set option in the filter file will also work in
> a more generic way, so that it would be possible to update not only tag
> items, but AS numbers or mac addresses as well? It could work in the
> following way:
> set srcas 8974 if (src dst net 22.214.171.124/16) or
> set dstas 12345 if (src dst net 126.96.36.199/16) or
> set insrcmac aa:bb:cc:dd:ee:ff if (src host 188.8.131.52) or
> set outsrcmac 12345 if (src host 184.108.40.206) or
There are no plans for that now.
> 3. Do you plan that labels will also work for existing items (ip
> address, port numbers), example:
> set srcip 220.127.116.11(hawk.cis.vutbr.cz) if (src host 18.104.22.168) or
> set dstip 22.214.171.124(hawk.cis.vutbr.cz) if (dst host 126.96.36.199) or
> set srcport 80(http) if (src src port 80) or
> set dstport 80(http) if (dst src port 80) or
A label is a synonym for a tag, and for now, I would like to keep it simple.
> Not to have only questions, I have one idea that could solve some of the
> problems I mentioned above. Currently nfdump accepts input data only in
> the native/binary nfdump format. If nfdump would accept data in, let's
> say, CSV format, there would be an open and easy way to create your own
> preprocessor/filter that can modify the data in whatever way is needed.
> It could work in such a way:
> nfdump -R nfcapd.xxx -o csv | my_processing_tool | nfdump -C -w nfcapd.yyy
> -C says that input data will be handled in CSV format, and
> my_processing_tool can perform all necessary operations (geolocation
> database lookup, SQL query, ... ), modify the data and return it back to
> nfdump. I think that many people would appreciate that feature :-). It
> also solves two problems at once - it allows modifying the value of any
> item, and more effective 'tagging' can be performed.
Hmm .. frankly, I'm not a fan of multiple ASCII conversions. First you
convert the data to ASCII (csv) and feed it into data processing, which in
turn most likely translates ASCII back to binary before processing and
converts back to ASCII for csv export, which nfdump then reads and again
converts back to binary. For large volume record files this kills your
performance. I perfectly understand the need for csv output to post
process data in any way you like, but I do not really see a need to
accept csv to store back into binary. If this processing chain is
really needed, I'd rather think about a plugin type interface for
nfdump - say a Perl, Python or C interface to process each individual record.
I'd happily put these ideas into the nfdump feature idea collection. :)
For now, I would like to concentrate on the tags, and make them
work as well and as efficiently as possible.
Thanks for the valuable feedback.
> Sorry for the somewhat long email :-) Of course, if you need somebody
> to test these features, we are eager and ready to help with that. We can
> test it in scenarios where we use our own patched features very similar to
> Thanks for your effort and work on such a great tool.
>> which can be given to nfdump as an argument -f <filter>
>> Would the tagging system as described above match the
>> requirements for those planing to use tags?
>> Feedback is welcomed.
>> - Peter
Be nice to your netflow data. Use NfSen and nfdump :)