#45 Log to CDF fix

Milestone: v1.0 (example)
Status: open
Owner: nobody
Priority: 9
Updated: 2014-08-15
Created: 2013-07-25
Creator: Brian Fife
Private: No

Hi,

I think this is why recovering logs doesn't work:

Index: bandwidthd.c

RCS file: /cvsroot/bandwidthd/bandwidthd/bandwidthd.c,v
retrieving revision 1.57
diff -u -r1.57 bandwidthd.c
--- bandwidthd.c 16 Jun 2011 19:34:56 -0000 1.57
+++ bandwidthd.c 25 Jul 2013 18:40:52 -0000
@@ -824,7 +824,7 @@
HostIp2CharIp(IPData->ip, IPBuffer);
fprintf(cdf, "%s,%lu,", IPBuffer, IPData->timestamp);
Stats = &(IPData->Send);
- fprintf(cdf, "%llu,%llu,%llu,%llu,%llu,%llu,%llu,%llu", Stats->total, Stats->icmp, Stats->udp, Stats->tcp, Stats->ftp, Stats->http, Stats->mail, Stats->p2p);
+ fprintf(cdf, "%llu,%llu,%llu,%llu,%llu,%llu,%llu,%llu,", Stats->total, Stats->icmp, Stats->udp, Stats->tcp, Stats->ftp, Stats->http, Stats->mail, Stats->p2p);
Stats = &(IPData->Receive);
fprintf(cdf, "%llu,%llu,%llu,%llu,%llu,%llu,%llu,%llu\n", Stats->total, Stats->icmp, Stats->udp, Stats->tcp, Stats->ftp, Stats->http, Stats->mail, Stats->p2p);
}

You were missing a ',' after the Send Stats->p2p value, so that value ran straight into the Receive Stats->total value and the two were written as a single field. I was seeing 17 columns in my .cdf files instead of the expected 18 (IP, timestamp, 8x send, 8x receive).
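To make the effect concrete, here is a small Python sketch that mimics the three fprintf calls with and without the trailing comma. The IP address, timestamp, and counter values are made up for illustration, and `cdf_line` is a hypothetical helper, not part of bandwidthd.

```python
# Illustrative values only; field order follows the fprintf calls in the patch:
# total, icmp, udp, tcp, ftp, http, mail, p2p
send = [1500, 0, 200, 1300, 0, 900, 0, 42]
recv = [7300, 0, 300, 7000, 0, 6500, 0, 0]

def cdf_line(ip, timestamp, send, recv, patched):
    """Build one .cdf line, with or without the comma after the Send p2p value."""
    head = "%s,%d," % (ip, timestamp)
    send_part = ",".join(str(v) for v in send) + ("," if patched else "")
    recv_part = ",".join(str(v) for v in recv)
    return head + send_part + recv_part

broken = cdf_line("192.0.2.10", 1374777652, send, recv, patched=False)
fixed = cdf_line("192.0.2.10", 1374777652, send, recv, patched=True)

print(len(broken.split(",")))  # 17
print(broken.split(",")[9])    # 427300 (Send p2p "42" fused with Receive total "7300")
print(len(fixed.split(",")))   # 18
```

Without the comma, column 9 (counting from 0) holds both numbers back to back, which is why the recovery code cannot read the line as 18 fields.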

Discussion

  • Brian Fife
    2013-07-28

    If you have bad .cdf files with 17 columns, you can run the following script to repair the data. Specify the .cdf file as an argument, and redirect the output to a new file.

    ./parse.py old/log.1.0.cdf > log.1.0.cdf

    #!/usr/bin/python
    # Repair bandwidthd .cdf lines where the Send p2p and Receive total values
    # were written without a separating comma (17 columns instead of 18).
    import sys

    with open(sys.argv[1]) as lines:
      for line in lines:
        line = line.rstrip()
        cols = line.split(',')
        if len(cols) == 17:
          # cols[9] holds Send p2p fused with Receive total
          p2p = 0
          total = cols[9]
          icmp = cols[10]
          udp = cols[11]
          tcp = cols[12]
          if total.find("0") == 0:
            # Leading zero: treat p2p as 0 and the remainder as the total
            total = total[1:]
          else:
            # Guess the split point by looking for the leading digits of
            # icmp + udp + tcp near the end of the fused field
            newtotal = str(int(icmp) + int(udp) + int(tcp))
            offset = len(total) - len(newtotal)
            for i in range(len(newtotal), 0, -1):
              tindex = total[offset:].find(newtotal[:i])
              if tindex != -1:
                p2p = total[:offset+tindex]
                total = total[offset+tindex:]
                break
          print(",".join(cols[:9] + [str(p2p), total] + cols[10:]))
        else:
          # Lines that already have the expected layout pass through unchanged
          print(line)
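As a quick sanity check after running the repair, you might count how many lines have each column count. This is a minimal sketch, not part of the original script; `column_counts` is a hypothetical helper and the sample lines are invented.

```python
from collections import Counter

def column_counts(lines):
    """Count how many non-empty lines have each number of comma-separated columns."""
    return Counter(len(line.rstrip().split(",")) for line in lines if line.strip())

# Invented sample: one healthy 18-column line and one damaged 17-column line.
good = ",".join(["192.0.2.10", "1374777652"] + ["0"] * 16)
bad = ",".join(["192.0.2.10", "1374777652"] + ["0"] * 15)

counts = column_counts([good + "\n", bad + "\n", "\n"])
print(sorted(counts.items()))  # [(17, 1), (18, 1)]
```

For a real file, pass the open file object instead of the list, for example `with open("log.1.0.cdf") as f: print(column_counts(f))`. A repaired file should report only 18-column lines.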