Hi,
I want to ask whether it would be possible to add multithreading support to the graph analysis. When I click one of DPT, SPT, DIP, or SIP on a graph with a large number of flows, it takes a long time. Disk I/O is not the problem here, because I've mounted a ramdisk on the Flow_Working directory. A very quick look at the code suggests that most of the time is probably spent iterating through the flows and/or drawing the graph.
This is the CPU usage when I use the analysis:
This is the CPU usage when I use a FlowViewer report. I would also like to note that I've enabled SiLK multithreading, which gave a noticeable speedup. Here the bottleneck is disk I/O.
Thanks,
Jakub Vanek
Last edit: linuxtardis 2016-03-28
I tried to overload FlowGrapher (bps analysis) while measuring the CPU load. The first part is the multithreaded rwfilter, then comes rwcut, and the last part is the FlowGrapher Perl script. The script's execution took the longest. Link to screenshots
I have SiLK 3.10.1.
Jakub,
Thanks for the data. Yes, FlowGrapher (and FlowGrapher_Analyze) are better used on smaller time spans. I do understand the desire to use longer periods, but the turnaround is too long. I have trained myself to work in shorter time frames. An alternative is to use FlowMonitor Groups, though I know it may not generate what you're really looking for.
I would love to speed up the processing. The code is written in Perl, and I'm not sure how, or even if, Perl can do multithreading. Perhaps FlowGrapher should be written in C and called from the remainder of the code, which is fine written in Perl. I suppose the longer time period would be subdivided, with each thread handling a shorter time period and the 'master' code gathering the results and building the tables necessary for the graphing code.
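To make the subdivision idea concrete, here is a minimal sketch in Python (not FlowViewer's Perl, and not its actual code). It assumes each flow is a hypothetical `(epoch_seconds, bytes)` pair; the names `split_range`, `bucket_chunk`, and `graph_buckets` are made up for illustration. Each worker totals the bytes per graph bucket for its own sub-period, and the master merges the partial tables.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_range(start, end, parts):
    """Split [start, end) into up to `parts` contiguous sub-ranges."""
    step = max(1, -(-(end - start) // parts))  # ceiling division
    return [(s, min(s + step, end)) for s in range(start, end, step)]


def bucket_chunk(flows, lo, hi, origin, bucket_secs):
    """Worker: sum bytes per graph bucket for flows starting in [lo, hi)."""
    totals = Counter()
    for ts, nbytes in flows:
        if lo <= ts < hi:
            totals[(ts - origin) // bucket_secs] += nbytes
    return totals


def graph_buckets(flows, start, end, bucket_secs, executor, parts=4):
    """Master: fan the sub-ranges out to workers, then merge their tables."""
    futures = [
        executor.submit(bucket_chunk, flows, lo, hi, start, bucket_secs)
        for lo, hi in split_range(start, end, parts)
    ]
    merged = Counter()
    for future in futures:
        merged.update(future.result())  # Counter.update adds the totals
    return dict(merged)


if __name__ == "__main__":
    # Hypothetical flows: (start time in seconds, byte count).
    flows = [(0, 100), (30, 50), (65, 10), (119, 5)]
    # A thread pool keeps the demo simple. For CPU-bound work in CPython,
    # a ProcessPoolExecutor could be passed in instead.
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(graph_buckets(flows, 0, 120, 60, pool, parts=2))
```

In this sketch every worker scans the full flow list and filters by time. A real version would more likely have each worker read only the records for its own sub-period, so that the per-record work is actually divided.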
I've tried to speed it up as much as I can. For example, the Perl package that converts a time to an "epoch" time (i.e., seconds since 1970), which I use to quickly determine where a data point falls in relation to the graphing buckets, etc., is really slow and should not be used on a flow record-by-record basis. Instead, I calculate a substitute myself based on the first of the year. The FlowGrapher_Main.cgi script has embedded time checks (via the Time::HiRes package), so you can see how it uses CPU time. Perhaps there are some remaining "speed ups".
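The "first of the year" substitute can be sketched roughly as follows, again in Python and only as an illustration of the idea rather than the code in FlowGrapher_Main.cgi. It assumes timestamps are in UTC, and the names `year_start_epoch` and `fast_epoch` are hypothetical. The library conversion runs once per year and is cached; each record then needs only integer arithmetic.

```python
import calendar
from functools import lru_cache

# Days elapsed before the first of each month in a non-leap year.
_DAYS_BEFORE = (0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334)


@lru_cache(maxsize=None)
def year_start_epoch(year):
    """Epoch seconds for Jan 1 00:00:00 UTC; computed once per year, then cached."""
    return calendar.timegm((year, 1, 1, 0, 0, 0))


def fast_epoch(year, month, day, hour, minute, second):
    """Per-record conversion: offset from the cached start of the year."""
    days = _DAYS_BEFORE[month - 1] + day - 1
    if month > 2 and calendar.isleap(year):
        days += 1  # account for Feb 29 in leap years
    return year_start_epoch(year) + days * 86400 + hour * 3600 + minute * 60 + second
```

The result should match the library conversion for the same UTC timestamp. Local time zones and daylight saving changes are deliberately left out of this sketch.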
If you have some ideas, I would be very open to considering them. However, I have gotten myself into a very 'busy' employment situation right now and may not be able to help for some time. :-)
Best regards, thanks for your interest in FlowViewer, and thanks for helping others,
Joe