From: Colin P. A. <co...@co...> - 2008-03-24 15:12:06
|
I've spent the last three weekends making a serious attempt to get to the bottom of the problem of quadratic performance in gexslt that has been present ever since I first successfully ran an identity transformation back in 2004. I used the profiler in EiffelStudio 6.1 (a run that normally last 71 minutes when compiled with gec takes about 14 hours when compiled with ISE 6.1 with the profiler on), and spent lots of time following false leads (the report gives exact numbers for routine calls, which is very useful, and nonsense for time spent e.g. 873.49% for one routine, which is definitely not useful). By this morning I had concluded there was nothing in my code that could explain the results seen - or at least, if there was, I wasn't going to be able to find it using the EiffelStudio profiler. So I edited $GOBO/tool/gec/config/c/gcc.cfg to write out profiling information for gprof (Eric, when can we have the setting of cflags and lflags in system.xace?). The resulting runtime was 118 minutes - a much more acceptable overhead. And the resulting report from gprof is highly illuminating. here follows the first few lines: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 36.58 2943.00 2943.00 GC_mark_from 16.32 4256.06 1313.06 GC_header_cache_miss 13.18 5316.64 1060.58 GC_mark_local 10.57 6166.79 850.15 GC_steal_mark_stack 4.32 6514.36 347.57 GC_add_to_black_list_stack 1.55 6639.04 124.68 GC_push_marked 1.26 6740.27 101.23 GC_find_header 1.22 6838.10 97.83 GC_reclaim_clear 1.10 6926.51 88.41 GC_do_local_mark 0.82 6992.39 65.88 GC_generic_malloc_many 0.64 7044.16 51.77 GC_block_was_dirty 0.63 7095.09 50.93 GC_install_header 0.58 7141.48 46.39 GC_apply_to_all_blocks 0.47 7179.41 37.93 GC_build_fl 0.44 7214.44 35.03 530952040 0.00 0.00 T122f10 0.40 7246.41 31.97 GC_allochblk_nth 0.29 7269.71 23.30 GC_enclosing_mapping 0.26 7290.70 20.99 262713928 0.00 0.00 T506f27 0.25 7311.07 20.37 GC_malloc 0.25 7331.32 20.25 127354522 0.00 0.00 T162x16589 0.22 7348.82 17.50 GC_free_block_ending_at 0.22 7366.13 17.31 GC_malloc_atomic 0.20 7382.11 15.99 3529690218 0.00 0.00 T240f8 0.19 7397.55 15.44 GC_reclaim_block 0.18 7412.41 14.86 GC_reclaim_uninit 0.18 7427.16 14.75 529087403 0.00 0.00 T57f9 0.18 7441.84 14.68 529541505 0.00 0.00 T15f4 0.13 7452.31 10.47 2030496041 0.00 0.00 T240c7 0.11 7461.08 8.77 105048154 0.00 0.00 T27f8 0.10 7469.40 8.32 2049701317 0.00 0.00 GE_new239 0.10 7477.42 8.02 7679890135 0.00 0.00 GE_check_null 0.10 7485.32 7.90 172970992 0.00 0.00 T772f27p1 0.10 7493.20 7.88 2049636054 0.00 0.00 GE_new240 in short, almost all the run time is for memory management (this did not surprise me as typically I see the program using 125-290% CPU - I have a quad-core processor, so fully utilized = 400% - and I configured the boehm-gc for parallel marking). Now my problem is what to do about it (incidentally T122f10 is {DS_ARRAYED_LIST}.item - nothing very surprising about that either, I guess.). Does anyone have any suggested approaches? -- Colin Adams Preston Lancashire |