FlowGrapher Analysis - Feature request - Multithreading

2016-03-28
2016-04-01
  • linuxtardis

    linuxtardis - 2016-03-28

    Hi,
    I'd like to ask whether multithreading support could be added to the graph analysis. When I click one of DPT, SPT, DIP, or SIP on a graph with a large number of flows, it takes a long time. Disk I/O is not the problem here, because I've mounted a ramdisk on the Flow_Working directory. A very quick look at the code suggests that most of the time is probably spent iterating through flows and/or drawing the graph.
    This is the CPU usage when I run the analysis:
    [Screenshot: FlowGrapher Analysis]
    This is the CPU usage when I run a FlowViewer report. Note that I've enabled SiLK multithreading, which gave a noticeable speedup. Here the bottleneck is disk I/O.
    [Screenshot: FlowViewer Report]
    Thanks,
    Jakub Vanek

     

    Last edit: linuxtardis 2016-03-28
  • linuxtardis

    linuxtardis - 2016-03-29

    I tried to overload FlowGrapher (bps analysis) while measuring the CPU load. The first part is the multithreaded rwfilter, then comes rwcut, and last is the FlowGrapher Perl script. The script's execution took the longest.
    Link to screenshots
    I have SiLK 3.10.1.

     
  • Joe Loiacono

    Joe Loiacono - 2016-04-01

    Jakub,

    Thanks for the data. Yes, FlowGrapher (and FlowGrapher_Analyze) work best on shorter time spans. I do understand the desire to use longer periods, but the turnaround is too long. I have trained myself to work in shorter time frames. An alternative is to use FlowMonitor Groups, though I know it may not generate what you're really looking for.

    I would love to speed up the processing. The code is written in Perl and I'm not sure how, or even if, Perl can do multithreading. Perhaps FlowGrapher should be rewritten in C and called from the rest of the code, which is fine in Perl. I suppose the longer time period would be subdivided, with each thread handling a shorter period and the 'master' code gathering the results and building the tables needed by the graphing code.
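The split-and-gather idea described above can be sketched roughly as follows. This is only an illustration in Python, not FlowGrapher code: the names `split_range`, `sum_slice`, and `bucket_totals`, and the `(timestamp, bytes)` record layout, are assumptions made for the example. Each worker totals bytes per graphing bucket for its own time slice, and the 'master' adds the partial tables together.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start, end, parts):
    """Split [start, end) into at most `parts` contiguous sub-ranges."""
    step = max(1, -(-(end - start) // parts))  # ceiling division
    return [(s, min(s + step, end)) for s in range(start, end, step)]


def sum_slice(flows, lo, hi, start, bucket_secs):
    """Worker: bytes per bucket for flows with lo <= timestamp < hi.

    For simplicity every worker scans the whole list here; in a real
    setup each worker would read only the records of its own slice.
    """
    totals = {}
    for ts, nbytes in flows:
        if lo <= ts < hi:
            idx = (ts - start) // bucket_secs
            totals[idx] = totals.get(idx, 0) + nbytes
    return totals


def bucket_totals(flows, start, end, bucket_secs, workers=4,
                  pool_cls=ThreadPoolExecutor):
    """Master: hand out time slices, then merge the partial tables."""
    n_buckets = -(-(end - start) // bucket_secs)
    merged = [0] * n_buckets
    with pool_cls(max_workers=workers) as pool:
        futures = [pool.submit(sum_slice, flows, lo, hi, start, bucket_secs)
                   for lo, hi in split_range(start, end, workers)]
        for fut in futures:
            # A bucket may straddle two slices, so partial sums are added.
            for idx, total in fut.result().items():
                merged[idx] += total
    return merged


flows = [(0, 10), (59, 5), (60, 7), (125, 1), (299, 2)]
print(bucket_totals(flows, start=0, end=300, bucket_secs=60))
```

The thread pool only shows the structure. For CPU-bound work in Python, a process pool (`ProcessPoolExecutor`) passed as `pool_cls` would be the likelier choice, and a Perl version would need an equivalent worker mechanism.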

    I've tried to speed it up as much as I can. For example, the Perl package that converts a time to an "epoch" time (i.e., seconds since 1970), which is used to quickly determine where a data point falls relative to the graphing buckets, is really slow and should not be called on a record-by-record basis. Instead, I calculate a substitute myself based on the first of the year. The FlowGrapher_Main.cgi script has embedded time checks (via the Time::HiRes package), so you can see where it spends CPU time. Perhaps there are some remaining speedups to be found.
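One way to read the "first of the year" shortcut is sketched below, in Python rather than Perl. It is a guess at the general technique, not the code in FlowGrapher_Main.cgi: the epoch value for 1 January is looked up once per year with the general-purpose library call, and every record after that needs only integer arithmetic. The function name `fast_epoch` is hypothetical, and the sketch assumes timestamps are in UTC.

```python
import calendar

# Days elapsed before the first of each month in a non-leap year.
_CUM_DAYS = (0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334)

# Cache: year -> epoch seconds at Jan 1 00:00:00 UTC of that year.
_year_start = {}


def fast_epoch(year, month, day, hour, minute, second):
    """Epoch seconds for a UTC timestamp, using a cached start of year."""
    base = _year_start.get(year)
    if base is None:
        # The slower general-purpose conversion runs once per year only.
        base = calendar.timegm((year, 1, 1, 0, 0, 0))
        _year_start[year] = base
    days = _CUM_DAYS[month - 1] + (day - 1)
    if month > 2 and calendar.isleap(year):
        days += 1  # account for Feb 29
    return base + days * 86400 + hour * 3600 + minute * 60 + second


ts = fast_epoch(2016, 3, 28, 12, 30, 15)
print(ts == calendar.timegm((2016, 3, 28, 12, 30, 15)))
```

Given a graph start time and bucket width in seconds, the bucket index for a record is then `(ts - start) // bucket_secs`.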

    If you have some ideas I would be very open to considering them. However, I've gotten myself into a very 'busy' employment situation right now and may not be able to help for some time. :-)

    Best regards, thanks for your interest in FlowViewer, and thanks for helping others,

    Joe

     
