From: Ron v. O. <R.A...@Wb...> - 2000-05-06 12:40:15
|
Robert L Krawitz wrote: > > Date: Fri, 05 May 2000 22:33:08 +0200 > From: Ron van Ostayen <R.A...@wb...> > > Each sample counts as 0.01 seconds. > % cumulative self self total > time seconds seconds calls ms/call ms/call name > 58.38 48.02 48.02 7650 6.28 8.45 dither_cmyk > 20.23 64.66 16.64 187272000 0.00 0.00 print_color > 18.73 80.07 15.41 7650 2.01 2.01 rgb_to_rgb > 0.97 80.87 0.80 151700 0.01 0.01 > mem_true24_fill_rectangle > 0.52 81.30 0.43 24857 0.02 0.02 escp2_pack > 0.16 81.43 0.13 3521567 0.00 0.00 > gs_debug_c > <snip> > > OK. The results are clear. dither_cmyk is expensive because it > takes a long time each time its called. I looked at this routine > and I must say: I haven't got a clue what's going on :-). > > print_color is expensive partly because it's called so many times. > This routine can therefore perhaps run faster if the large number of > arguments to the routine is reduced, perhaps by combining them in a > structure or by moving them to global space. > > rgb_to_rgb is expensive because it takes a long time each time its > called. > > Well, dither_cmyk and rgb_to_rgb are really just loops over all the > pixels in a row. print_color is actually called four times per > iteration (once per color), except when black is printed, when it's > called only once. > > I'm surprised (and not entirely displeased) to see print_color taking > so little time, actually. You might try marking it inline to see if > that improves matters any. The calls are quick enough that things Debug and profiling information but NO inlining: 46s Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 32.58 9.28 9.28 3825 2.43 5.01 dither_cmyk 26.58 16.85 7.57 3825 1.98 1.98 rgb_to_rgb 20.22 22.61 5.76 46818000 0.00 0.00 update_dither 14.50 26.74 4.13 46818000 0.00 0.00 print_color 3.05 27.61 0.87 151700 0.01 0.01 mem_true24_fill_rectangle OK. I have inlined update_dither, print_color, calc_rgb_to_hsv and calc_hsv_to_rgb. (with debug and profiling information) Runtime: 22.3s !! Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 56.80 11.86 11.86 3825 3.10 3.10 dither_cmyk 35.15 19.20 7.34 3825 1.92 1.92 rgb_to_rgb 3.74 19.98 0.78 151700 0.01 0.01 mem_true24_fill_rectangle 0.62 20.11 0.13 14283 0.01 0.01 escp2_pack As you can see, update_dither and print_color have been succesfully inlined and more important the runtime is MUCH better. However, the performance gain is smaller if you run code without debug and profiling information. The gain is about 30%. But I think this is substantial enough to put the change into the repository (Robert?). > just might; each call is taking less than 100 NANOseconds. I figure > they're all running out of cache, which is good, but that's still > amazingly fast. What kind of processor do you have? Nothing special. A dual PII-450 system (ASUS P2B-DS, 128Mb). However ghostscript isn't compiled with SMP support. > > Also, try setting saturation to 1.0 to see if rgb_to_rgb speeds up any > (and if so, how much). The real problem is dither_cmyk, though. I -dSaturation=1.0 : 46s (includes debug and profiling information, therefore not optimal) Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 32.58 9.28 9.28 3825 2.43 5.01 dither_cmyk 26.58 16.85 7.57 3825 1.98 1.98 rgb_to_rgb 20.22 22.61 5.76 46818000 0.00 0.00 update_dither 14.50 26.74 4.13 46818000 0.00 0.00 print_color 3.05 27.61 0.87 151700 0.01 0.01 mem_true24_fill_rectangle -dSaturation=1.2 : 65s (include debug and profiling information, no inline functions) Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 29.09 12.31 12.31 3825 3.22 5.61 rgb_to_rgb 22.19 21.70 9.39 3825 2.45 5.02 dither_cmyk 12.90 27.16 5.46 46818000 0.00 0.00 update_dither 10.96 31.80 4.64 11704500 0.00 0.00 calc_rgb_to_hsv 10.66 36.31 4.51 11704500 0.00 0.00 calc_hsv_to_rgb 10.30 40.67 4.36 46818000 0.00 0.00 print_color 1.91 41.48 0.81 151700 0.01 0.01 mem_true24_fill_rectangle Although dither_cmyk is big, a lot of time is also spend in rgb_to_rgb even with saturation=1.0. Once again, a lot of this extra time is required to log profiling information. Nevertheless I think the code would speed up a lot if all functions that are called for every dot are inlined. > think I know in general what's wrong there, but actually fixing it > won't be trivial. I was afraid it wouldn't be. > > BTW, I couldn't reproduce your ESC900 problem. That output below is > definitely for the 900. Trust me, even if you don't know what to look > for, I do :-) Double check what model you're using. Also, after you > check out the CVS repository, do "make ghost" before copying the .c > and .h files from Ghost into your GhostScript source. > I rechecked my batch file again and I finally found the error. #!/bin/sh for q in 0 1 2 3 do echo Quality $q time gs -q -dNOPAUSE -dBATCH -sDEVICE=stp -dMODEL=13 \ -dQuality=$q -sOutputFile=tiger$q.prn tiger.ps done The -dMODEL=13 is wrong of course. It should be -dModel=13. Sometimes you just HAVE to make a mistake like that. I have rechecked all quality modes and those that should work on an ESC900 do work. Ron van Ostayen R.A...@Wb... Laboratory of Tribology Department of Mechanical Engineering University of Technology Delft |