Having some sort of index would speed up searches through large numbers of nfcapd files for a given IP or net.
A high-level design could work as follows:
nfcapd (when given a flag enabling the option) would hold a structure (hash, trie, etc.) in memory of the IP addresses it has seen. When nfcapd finalizes an nfcapd file, it would also write the seen IP addresses (or a serialized form of that structure) to a file associated with the nfcapd file.
Example file names:
nfcapd.201110260000 # nfcapd file, full of flows
.ndx.201110260000 # index of IP addresses seen during the time period of the associated nfcapd file
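A minimal sketch of the writer side, in Python for brevity rather than in nfcapd's own C. Everything here is an assumption made for illustration: the names `SeenAddresses` and `index_path_for` are hypothetical, the in-memory structure is a plain set rather than a trie, and the index is written as sorted text with one address per line. Only the `nfcapd.<timestamp>` / `.ndx.<timestamp>` naming comes from the example above.

```python
import ipaddress
import os


def index_path_for(flow_path):
    """Map .../nfcapd.201110260000 to .../.ndx.201110260000."""
    directory, name = os.path.split(flow_path)
    prefix, _, timestamp = name.partition(".")
    if prefix != "nfcapd" or not timestamp:
        raise ValueError(f"not an nfcapd file name: {name!r}")
    return os.path.join(directory, ".ndx." + timestamp)


class SeenAddresses:
    """Hypothetical in-memory record of addresses seen for one flow file."""

    def __init__(self):
        self._seen = set()

    def add(self, address):
        # Parsing normalizes the text form, so duplicates collapse in the set.
        self._seen.add(ipaddress.ip_address(address))

    def write(self, flow_path):
        """Write the index next to the flow file and reset for the next one."""
        path = index_path_for(flow_path)
        # IPv4 and IPv6 objects cannot be compared directly, so sort by
        # (version, integer value) instead.
        ordered = sorted(self._seen, key=lambda a: (a.version, int(a)))
        with open(path, "w", encoding="ascii") as fh:
            for addr in ordered:
                fh.write(f"{addr}\n")
        self._seen.clear()
        return path
```

In this sketch, `add()` would be called for the source and destination address of every flow, and `write()` once when the flow file is rotated.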
When it comes time to search the flows for a particular IP or network, nfdump would first check whether an index file exists. If one does, nfdump would use it to decide whether the IP/net is present in the associated flow file; if not, it would read the flow file as it does today.
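The lookup side could be sketched as follows, again in Python and again under assumptions: the function names `load_index` and `may_contain` are hypothetical, and the index is assumed to be a text file with one IP address per line. The point shown is the decision itself: a missing index means "scan the file as usual", while an existing index can rule the file out.

```python
import ipaddress
import os


def load_index(index_path):
    """Read an index file (assumed format: one IP address per line)."""
    with open(index_path, encoding="ascii") as fh:
        return [ipaddress.ip_address(line.strip()) for line in fh if line.strip()]


def may_contain(flow_path, target):
    """Return False only when the index proves the flow file has no match."""
    directory, name = os.path.split(flow_path)
    # nfcapd.201110260000 -> .ndx.201110260000 (naming from the example above)
    index_path = os.path.join(directory, ".ndx." + name.partition(".")[2])
    if not os.path.exists(index_path):
        return True  # no index: the flow file has to be scanned as usual
    # A single address is treated as a /32 (or /128) network.
    network = ipaddress.ip_network(target, strict=False)
    return any(
        addr.version == network.version and addr in network
        for addr in load_index(index_path)
    )
```

A query over many files would call `may_contain()` for each one and only open the flow files for which it returns True. A linear scan of the index is used here for simplicity; a sorted or trie-based index would allow faster network lookups.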
I believe this would drastically speed up nfdump queries over large numbers of nfcapd files when the filter criteria include IPs or networks, since any file whose index shows no match could be skipped without being read.