From: Neeraj K. <ne...@cs...> - 2006-08-15 17:42:25
Hi all,

Thanks for your quick replies!

Mr. Scott: you are indeed correct that most of the time is being spent in vil_fill_line. As I mentioned in my previous email, I am using gcc 3.4.4 on cygwin, and my compiler flags are -O3 -pg. Using those settings, the top few lines of the generated profile are:

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 64.62      3.16     3.16 76662274     0.00     0.00  void vil_fill_line<bool>(bool*, unsigned int, int, bool)
 10.02      3.65     0.49       20     0.02     0.02  andImages(vil_image_view<bool> const&, vil_image_view<bool> const&, vil_image_view<bool>&)
  4.91      3.89     0.24       20     0.01     0.18  getProbImage(vil_image_view<unsigned int> const&, unsigned int, double, int, int)
  4.50      4.11     0.22        1     0.22     0.22  UniformHistoProgrammer<unsigned char>::convert()
  3.68      4.29     0.18  1348222     0.00     0.00  void fillRect<bool>(vil_image_view<bool>&, vgl_box_2d<int>, bool)

As you can see, my program spends about 65% of its time in vil_fill_line, with the next-most expensive function taking less than 1/6th that time.

As for why vil_fill_line is still showing up in the profiled output, I have no idea. I do not know if there are more advanced optimization settings than -O3 (I have to use -pg to enable profiling). I have even tried turning off the profiler, but that doesn't seem to affect performance very much.

My only thought for making it run faster was to use memset, as Mr. Vanroose has suggested.
Making that change results in the following profile:

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 27.78      0.50     0.50       20     0.03     0.03  andImages(vil_image_view<bool> const&, vil_image_view<bool> const&, vil_image_view<bool>&)
 15.56      0.78     0.28       20     0.01     0.03  getProbImage(vil_image_view<unsigned int> const&, unsigned int, double, int, int)
 14.44      1.04     0.26        1     0.26     0.26  UniformHistoProgrammer<unsigned char>::convert()
 11.67      1.25     0.21  1348222     0.00     0.00  void fillRect<bool>(vil_image_view<bool>&, vgl_box_2d<int>, bool)

As expected, vil_fill_line is gone, and we see that even the calling function fillRect (which now has a call to memset instead of vil_fill_line) has not become much more expensive. Also, the total running time of the program has been cut by about half, suggesting that nearly all of the time spent in vil_fill_line can be avoided by using memset.

Of course, as Mr. Vanroose pointed out, this can only be applied to images with 1-byte pixels and istep == 1. However, I know that for me (and I would imagine for many others), this is a very common case, and so I agree with Mr. Vanroose's suggestion to explicitly test for that case.

My thanks to both of you for your help.

--Neeraj