From: Colin P. A. <co...@co...> - 2008-03-24 15:12:06
|
I've spent the last three weekends making a serious attempt to get to the bottom of the problem of quadratic performance in gexslt, which has been present ever since I first successfully ran an identity transformation back in 2004.

I used the profiler in EiffelStudio 6.1 (a run that normally lasts 71 minutes when compiled with gec takes about 14 hours when compiled with ISE 6.1 with the profiler on), and spent lots of time following false leads (the report gives exact numbers for routine calls, which is very useful, and nonsense for time spent, e.g. 873.49% for one routine, which is definitely not useful). By this morning I had concluded there was nothing in my code that could explain the results seen - or at least, if there was, I wasn't going to be able to find it using the EiffelStudio profiler.

So I edited $GOBO/tool/gec/config/c/gcc.cfg to write out profiling information for gprof (Eric, when can we have the setting of cflags and lflags in system.xace?). The resulting runtime was 118 minutes - a much more acceptable overhead. And the resulting report from gprof is highly illuminating. Here are the first few lines:

Each sample counts as 0.01 seconds.
  %   cumulative     self                 self    total
 time    seconds   seconds       calls   s/call   s/call  name
36.58    2943.00   2943.00                               GC_mark_from
16.32    4256.06   1313.06                               GC_header_cache_miss
13.18    5316.64   1060.58                               GC_mark_local
10.57    6166.79    850.15                               GC_steal_mark_stack
 4.32    6514.36    347.57                               GC_add_to_black_list_stack
 1.55    6639.04    124.68                               GC_push_marked
 1.26    6740.27    101.23                               GC_find_header
 1.22    6838.10     97.83                               GC_reclaim_clear
 1.10    6926.51     88.41                               GC_do_local_mark
 0.82    6992.39     65.88                               GC_generic_malloc_many
 0.64    7044.16     51.77                               GC_block_was_dirty
 0.63    7095.09     50.93                               GC_install_header
 0.58    7141.48     46.39                               GC_apply_to_all_blocks
 0.47    7179.41     37.93                               GC_build_fl
 0.44    7214.44     35.03   530952040     0.00     0.00  T122f10
 0.40    7246.41     31.97                               GC_allochblk_nth
 0.29    7269.71     23.30                               GC_enclosing_mapping
 0.26    7290.70     20.99   262713928     0.00     0.00  T506f27
 0.25    7311.07     20.37                               GC_malloc
 0.25    7331.32     20.25   127354522     0.00     0.00  T162x16589
 0.22    7348.82     17.50                               GC_free_block_ending_at
 0.22    7366.13     17.31                               GC_malloc_atomic
 0.20    7382.11     15.99  3529690218     0.00     0.00  T240f8
 0.19    7397.55     15.44                               GC_reclaim_block
 0.18    7412.41     14.86                               GC_reclaim_uninit
 0.18    7427.16     14.75   529087403     0.00     0.00  T57f9
 0.18    7441.84     14.68   529541505     0.00     0.00  T15f4
 0.13    7452.31     10.47  2030496041     0.00     0.00  T240c7
 0.11    7461.08      8.77   105048154     0.00     0.00  T27f8
 0.10    7469.40      8.32  2049701317     0.00     0.00  GE_new239
 0.10    7477.42      8.02  7679890135     0.00     0.00  GE_check_null
 0.10    7485.32      7.90   172970992     0.00     0.00  T772f27p1
 0.10    7493.20      7.88  2049636054     0.00     0.00  GE_new240

In short, almost all the run time is for memory management. (This did not surprise me, as typically I see the program using 125-290% CPU - I have a quad-core processor, so fully utilized = 400% - and I configured the boehm-gc for parallel marking.)

Now my problem is what to do about it (incidentally, T122f10 is {DS_ARRAYED_LIST}.item - nothing very surprising about that either, I guess). Does anyone have any suggested approaches? -- Colin Adams Preston Lancashire |
From: Eric B. <er...@go...> - 2008-03-24 16:51:46
|
Colin Paul Adams wrote: > So I edited $GOBO/tool/gec/config/c/gcc.cfg to write out profiling > information for gprof (Eric, when can we have the setting of cflags > and lflags in system.xace?). It's already implemented as far as I know: <option name="c_compiler_options" value="..."/> <option name="link" value="..."/> > Now my problem is what to do about it (incidentally T122f10 is > {DS_ARRAYED_LIST}.item - nothing very surprising about that either, I > guess.). Does anyone have any suggested approaches? You can try reducing the amount of garbage generated, and hence put less stress on the GC. I try to do that on all the Gobo tools, even gec. This is one of my primary goals during the design phase. The net advantage is that these tools can run even without the GC. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com |
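For readers trying this: the two option names Eric quotes would be used like this in system.xace (the -pg values are illustrative gcc/gprof flags, not Gobo defaults, and the placement within the file follows your existing options):

```xml
<!-- Illustrative: pass profiling flags through to the C compiler and linker. -->
<option name="c_compiler_options" value="-pg"/>
<option name="link" value="-pg"/>
```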
From: Colin P. A. <co...@co...> - 2008-03-24 17:35:56
|
>>>>> "Eric" == Eric Bezault <er...@go...> writes: Eric> You can try reducing the amount of garbage generated So as an example, in XM_XPATH_STATIC_PROPERTY, I have four ARRAY [BOOLEAN] which are included in every expression. Presumably it would be much cheaper to remove the booleans from the arrays (I can't think why I put them in ARRAYs in the first place). -- Colin Adams Preston Lancashire |
From: Colin P. A. <co...@co...> - 2008-03-24 22:28:34
|
>>>>> "Colin" == Colin Paul Adams <co...@co...> writes:
>>>>> "Eric" == Eric Bezault <er...@go...> writes:

Eric> You can try reducing the amount of garbage generated

Colin> So as an example, in XM_XPATH_STATIC_PROPERTY, I have four
Colin> ARRAY [BOOLEAN] which are included in every
Colin> expression. Presumably it would be much cheaper to remove
Colin> the booleans from the arrays (I can't think why I put them
Colin> in ARRAYs in the first place).

This seems very productive. I changed the first two arrays to be just a series of BOOLEANs, and the runtime came down from 71 minutes to 50 minutes (my target is sub-one-minute, so there's a way to go, but it's a good start). So I'll do the other two ARRAYs next.

Another possibility is actually to merge all the BOOLEANs into a bit-field. This will save memory but not the number of objects, so I'm not really interested in doing this. It might hurt performance.

But I'm interested in what approaches/techniques you use to avoid extra allocations.

Do you use free at all? Obviously that won't reduce the number of allocations, but it will reduce the number of objects the GC has to deal with. Does it give any advantage over assigning Void to the reference?

Expanded attributes? This implies having default_create as a creation procedure, whereas the Gobo standard is to have an empty make.

Another possibility is to avoid OO techniques. For instance, I know from last weekend's profiling that there is a VERY large number of XM_XPATH_UNTYPED_ATOMIC_VALUE objects created by conversion from XM_XPATH_STRING_VALUE. Untyped-atomic is XPath's coercible data type. Although untyped-atomic does not inherit from xs:string in the XPath type hierarchy, I have implemented XM_XPATH_UNTYPED_ATOMIC_VALUE as inheriting from XM_XPATH_STRING_VALUE for convenience, as they are nearly identical except for the coercing behaviour. So a clear saving could be made by merging the two classes with a BOOLEAN to indicate which type is actually meant.
Then the coercion to xs:string is simply implemented by flipping the BOOLEAN. I suspect this is going to be a big saving, but it is very anti-OO. -- Colin Adams Preston Lancashire |
From: Daniel T. <dan...@gm...> - 2008-03-25 09:51:44
|
Eric Bezault wrote:
> Colin Paul Adams wrote:
>> But I'm interested in what approaches/techniques you use to avoid
>> extra allocations.
>>
>> Do you use free at all? Obviously that won't reduce the number of
>> allocations, but it will reduce the number of objects the GC has to
>> deal with. Does it give any advantage over assigning Void to the
>> reference?
>
> I don't use free.
>
>> Expanded attributes? This implies having default_create as a creation
>> procedure, whereas the Gobo standard is to have an empty make.
>
> I don't use expanded.
>
>> Another possibility is to avoid OO techniques. For instance, I know
>> from last weekend's profiling that there is a VERY large number of
>> XM_XPATH_UNTYPED_ATOMIC_VALUE objects created by conversion from
>> XM_XPATH_STRING_VALUE. Untyped-atomic is XPath's coercible data
>> type. Although untyped-atomic does not inherit from xs:string in the
>> XPath type hierarchy, I have implemented XM_XPATH_UNTYPED_ATOMIC_VALUE
>> as inheriting from XM_XPATH_STRING_VALUE for convenience, as they are
>> nearly identical except for the coercing behaviour. So a clear saving
>> could be made by merging the two classes with a BOOLEAN to indicate
>> which type is actually meant. Then the coercion to xs:string is simply
>> implemented by flipping the BOOLEAN. I suspect this is going to be a
>> big saving, but it is very anti-OO.
>
> This might be the kind of thing I could use indeed. Otherwise I try
> to be very careful with strings. String concatenations and substring
> operations create a lot of intermediary objects. Be careful as well
> when strings get resized (even implicitly by some operations). Likewise
> with DS_ARRAYED_... classes: try to create them with a capacity which
> is not too big but not too small, to avoid too many resizings. I know,
> it's often not easy to choose the best capacity at creation time!
> I also try to share objects as much as possible.
> Of course, when shared, we have to be very careful that these objects
> don't have their state modified. When that happens, then we need to
> clone it beforehand. I use a lot of shared objects for the tokens when
> generating the ASTs in gec. Another technique that I use is to try to
> reuse objects, rather than giving them back to the GC and creating new
> ones right after. In gec, I have AST visitor classes to implement
> the different Eiffel "Degree" compilation passes. They all try to
> keep the intermediary objects that they need to process a given
> input class. They reuse these intermediary objects when processing
> the next input class, and so forth. I probably use other techniques,
> but it's already a good start. One thing to remember is that when an
> implementation technique is not "very" OO, try at least to make it
> look as if it was OO in the class interface.

This kind of information is very interesting in my opinion, but I have never read anything comparable. I am not sure whether you have a long-term plan to use http://gobo-eiffel.wiki.sourceforge.net/, but I would be willing to publish it there. Is that OK? |
From: Eric B. <er...@go...> - 2008-03-25 14:01:05
|
Daniel Tuser wrote: > This kind of information is very interesting in my opinion. But I never > read something comparable. I am not sure if you have a long term plan to > use http://gobo-eiffel.wiki.sourceforge.net/, but I would be willing to > publish it there. Is that ok? I'm not a big fan of Wikis, so I have no long term plan to use the Gobo project Wiki. I'm fine if Gobo users want to use this space to publish Gobo related information. So feel free to use it. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com |
From: Eric B. <er...@go...> - 2008-03-24 23:21:57
|
Colin Paul Adams wrote:
> But I'm interested in what approaches/techniques you use to avoid
> extra allocations.
>
> Do you use free at all? Obviously that won't reduce the number of
> allocations, but it will reduce the number of objects the GC has to
> deal with. Does it give any advantage over assigning Void to the
> reference?

I don't use free.

> Expanded attributes? This implies having default_create as a creation
> procedure, whereas the Gobo standard is to have an empty make.

I don't use expanded.

> Another possibility is to avoid OO techniques. For instance, I know
> from last weekend's profiling that there is a VERY large number of
> XM_XPATH_UNTYPED_ATOMIC_VALUE objects created by conversion from
> XM_XPATH_STRING_VALUE. Untyped-atomic is XPath's coercible data
> type. Although untyped-atomic does not inherit from xs:string in the
> XPath type hierarchy, I have implemented XM_XPATH_UNTYPED_ATOMIC_VALUE
> as inheriting from XM_XPATH_STRING_VALUE for convenience, as they are
> nearly identical except for the coercing behaviour. So a clear saving
> could be made by merging the two classes with a BOOLEAN to indicate
> which type is actually meant. Then the coercion to xs:string is simply
> implemented by flipping the BOOLEAN. I suspect this is going to be a
> big saving, but it is very anti-OO.

This might be the kind of thing I could use indeed. Otherwise I try to be very careful with strings. String concatenations and substring operations create a lot of intermediary objects. Be careful as well when strings get resized (even implicitly by some operations). Likewise with DS_ARRAYED_... classes: try to create them with a capacity which is not too big but not too small, to avoid too many resizings. I know, it's often not easy to choose the best capacity at creation time! I also try to share objects as much as possible. Of course, when shared, we have to be very careful that these objects don't have their state modified.
When that happens, then we need to clone it beforehand. I use a lot of shared objects for the tokens when generating the ASTs in gec. Another technique that I use is to try to reuse objects, rather than giving them back to the GC and creating new ones right after. In gec, I have AST visitor classes to implement the different Eiffel "Degree" compilation passes. They all try to keep the intermediary objects that they need to process a given input class. They reuse these intermediary objects when processing the next input class, and so forth. I probably use other techniques, but it's already a good start. One thing to remember is that when an implementation technique is not "very" OO, try at least to make it look as if it was OO in the class interface. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com |
From: Colin P. A. <co...@co...> - 2008-03-28 19:07:07
|
>>>>> "Lothar" == Lothar Scholz <ll...@we...> writes:

Lothar> Exactly for this reason some Java compilers have a special
Lothar> optimization inside the code generator when it detects
Lothar> sequences of string concatenations. It's just too
Lothar> important an operation.

So what do they do about it? -- Colin Adams Preston Lancashire |
From: Lothar S. <ll...@we...> - 2008-03-29 06:37:18
|
Hello Colin,

Saturday, March 29, 2008, 2:07:09 AM, you wrote:

>>>>>> "Lothar" == Lothar Scholz <ll...@we...> writes:

CPA> Lothar> Exactly for this reason some Java compilers have a special
CPA> Lothar> optimization inside the code generator when it detects
CPA> Lothar> sequences of string concatenations. It's just too
CPA> Lothar> important an operation.

CPA> So what do they do about it?

The compiler catches this and generates code for a string buffer object that gets filled with the concatenated parts. Very simple to implement. -- Best regards, Lothar mailto:ll...@we... |
From: Colin P. A. <co...@co...> - 2008-03-29 13:07:53
|
>>>>> "Eric" == Eric Bezault <er...@go...> writes: Eric> That does not work in Eiffel. Each time you will access the Eric> expanded C, you will not get a reference to it but a copy of Eric> it. That's what I was afraid of. Perhaps we should have a syntax for this - reference assignment (note that I don't know if I can benefit from this or not - but even if I can't there are surely applications that can). -- Colin Adams Preston Lancashire |
From: Eric B. <er...@go...> - 2008-03-29 13:36:26
|
Colin Paul Adams wrote: >>>>>> "Eric" == Eric Bezault <er...@go...> writes: > > Eric> That does not work in Eiffel. Each time you will access the > Eric> expanded C, you will not get a reference to it but a copy of > Eric> it. > > That's what I was afraid of. > > Perhaps we should have a syntax for this - reference assignment (note > that I don't know if I can benefit from this or not - but even if I > can't there are surely applications that can). In Eiffel, expanded does not mean that its memory is a subpart of the memory of another object (this is the compiler which does -- or possibly does not -- optimize it in such a way). What Eiffel means by expanded is that an expanded object cannot be shared by two different objects, and as a consequence the language talks about expanded in terms of copy semantics. Allowing reference assignment as you suggest would just open a can of worms in the language definition. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com |
From: Colin P. A. <co...@co...> - 2008-03-30 09:54:13
|
>>>>> "Eric" == Eric Bezault <er...@go...> writes: Eric> technique that I use is to try to reuse objects, rather than Eric> giving them back to the GC and creating new ones right Eric> after. In gec, I have AST visitor classes to implement the Eric> different Eiffel "Degree" compilation passes. They all try Eric> to keep the intermediary objects that they need to process a Eric> given imput class. They reuse these intermediary objects Eric> when processing the next input class, and so forth. Can you point me to a specific example? I am currently trying this with the iterators used to evaluate XPath sequences. I am retaining copies on a once DS_ARRAYED_STACK. So far, this is actually increasing the runtime (but it appears sensitive to the size of the stack, so I am currently trying with just a DS_CELL to see if this is better). -- Colin Adams Preston Lancashire |
From: Eric B. <er...@go...> - 2008-04-04 12:17:06
|
Colin Paul Adams wrote:
>>>>>> "Eric" == Eric Bezault <er...@go...> writes:
>
> Eric> technique that I use is to try to reuse objects, rather than
> Eric> giving them back to the GC and creating new ones right
> Eric> after. In gec, I have AST visitor classes to implement the
> Eric> different Eiffel "Degree" compilation passes. They all try
> Eric> to keep the intermediary objects that they need to process a
> Eric> given input class. They reuse these intermediary objects
> Eric> when processing the next input class, and so forth.
>
> Can you point me to a specific example?

Instead of having iterator objects, I use the visitor pattern. See the descendants of ET_AST_PROCESSOR. And for a given task I use the same visitor object on all classes, instead of having a different object each time. This object can then keep some context that will be reinitialized each time (without necessarily having to create new objects). For example, in ET_FEATURE_ADAPTATION_RESOLVER, this visitor uses hash-tables in order to make sense of the inheritance clause feature adaptation of the class being processed. This visitor object is reused for all input classes, without having to create a new set of hash-tables each time.

> I am currently trying this with the iterators used to evaluate XPath
> sequences. I am retaining copies on a once DS_ARRAYED_STACK. So far,
> this is actually increasing the runtime (but it appears sensitive to
> the size of the stack, so I am currently trying with just a DS_CELL to
> see if this is better).

I didn't mean that you should implement a memory management system by hand yourself. It's not obvious that the time spent doing this manual memory management, with a pool of objects, will be more efficient than the GC itself. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com |
From: Colin P. A. <co...@co...> - 2008-03-30 16:25:01
|
>>>>> "Colin" == Colin Paul Adams <co...@co...> writes:

Colin> I am currently trying this with the iterators used to
Colin> evaluate XPath sequences. I am retaining copies on a once
Colin> DS_ARRAYED_STACK. So far, this is actually increasing the
Colin> runtime (but it appears sensitive to the size of the stack,
Colin> so I am currently trying with just a DS_CELL to see if this
Colin> is better).

In fact the DS_CELL proved to be slower than a small stack. But I am now wondering about code inlining. It appears that gec does not inline any of the three routines new_child_tree_iterator, release_iterator and new_descendant_tree_iterator in the code below (note that the bulk of the class is commented out). The test program then runs in 21 minutes and 8 seconds, as opposed to 20 minutes and 28 seconds (+/- 3 seconds) with the creations inlined and no call to release_iterator (there is an extra assignment to Void of the iterator passed to release_iterator following the call to release_iterator, but I hardly think that can account for the difference). With all the commented-out code restored (and assignment of 4 default values to attributes in the creation procedures, which costs about 6 seconds), the time rises to 21 minutes and 52 seconds. I can't account for any of this.
class XM_XPATH_ITERATOR_POOL

feature -- Creation

	new_child_tree_iterator (a_starting_node: XM_XPATH_TREE_NODE; a_node_test: XM_XPATH_NODE_TEST): XM_XPATH_TREE_CHILD_ENUMERATION is
			-- New or reused child iterator for `a_starting_node'
		require
			starting_node_not_void: a_starting_node /= Void
			node_test_not_void: a_node_test /= Void
		do
--			if child_tree_iterators.is_empty then
				create Result.make (a_starting_node, a_node_test)
--			else
--				Result := child_tree_iterators.item
--				Result.make (a_starting_node, a_node_test)
--				child_tree_iterators.remove
--			end
		ensure
			new_child_tree_iterator_not_void: Result /= Void
		end

	new_descendant_tree_iterator (a_starting_node: XM_XPATH_TREE_NODE; a_node_test: XM_XPATH_NODE_TEST; a_include_self: BOOLEAN): XM_XPATH_TREE_DESCENDANT_ENUMERATION is
			-- New or reused descendant iterator for `a_starting_node'
		require
			starting_node_not_void: a_starting_node /= Void
			node_test_not_void: a_node_test /= Void
		do
--			if descendant_tree_iterators.is_empty then
				create Result.make (a_starting_node, a_node_test, a_include_self)
--			else
--				Result := descendant_tree_iterators.item
--				Result.make (a_starting_node, a_node_test, a_include_self)
--				descendant_tree_iterators.remove
--			end
		ensure
			new_descendant_tree_iterator_not_void: Result /= Void
		end

feature -- Removal

	release_iterator (a_iterator: XM_XPATH_SEQUENCE_ITERATOR [XM_XPATH_ITEM]) is
			-- Return `a_iterator' to free memory.
		require
			a_iterator_not_void: a_iterator /= Void
		do
--			if a_iterator.is_tree_child_enumeration then
--				if child_tree_iterators.count < Maximum_queue_length then
--					child_tree_iterators.put (a_iterator.as_tree_child_enumeration)
--				end
--			elseif a_iterator.is_tree_descendant_enumeration then
--				if descendant_tree_iterators.count < Maximum_queue_length then
--					descendant_tree_iterators.put (a_iterator.as_tree_descendant_enumeration)
--				end
--			end
		end

feature {NONE} -- Implementation

	Maximum_queue_length: INTEGER is 5
			-- Limit on queue size for any single iterator type

--	child_tree_iterators: DS_ARRAYED_STACK [XM_XPATH_TREE_CHILD_ENUMERATION]
--			-- Spare iterators over the child axis of XM_XPATH_TREE_NODEs
--		once
--			create Result.make (Maximum_queue_length)
--		ensure
--			child_tree_iterators_not_void: Result /= Void
--		end

--	descendant_tree_iterators: DS_ARRAYED_STACK [XM_XPATH_TREE_DESCENDANT_ENUMERATION]
--			-- Spare iterators over the descendant axis of XM_XPATH_TREE_NODEs
--		once
--			create Result.make (Maximum_queue_length)
--		ensure
--			descendant_tree_iterators_not_void: Result /= Void
--		end

end

-- Colin Adams Preston Lancashire |
From: Eric B. <er...@go...> - 2008-04-04 16:53:50
|
Eric Bezault wrote: > Colin Paul Adams wrote: >> Do you use free at all? Obviously that won't reduce the number of >> allocations, but it will reduce the number of objects the GC has to >> deal with. Does it give any advantage over assigning Void to the >> reference? > > I don't use free. I just saw that in your last check-in you used MEMORY.free. This is a bad idea in my opinion. And it won't help anyway when using gec+boehm (it's currently implemented as a no-op). I talked about reducing the amount of garbage generated, not about manually managing the memory. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com |
From: Colin P. A. <co...@co...> - 2008-04-05 06:47:37
|
>>>>> "Eric" == Eric Bezault <er...@go...> writes: Eric> I just saw that in your last check-in you used MEMORY.free. I've removed it now. -- Colin Adams Preston Lancashire |
From: Colin P. A. <co...@co...> - 2008-04-05 07:08:44
|
>>>>> "Eric" == Eric Bezault <er...@go...> writes: Eric> I just saw that in your last check-in you used MEMORY.free. Eric> This is a bad idea in my opinion. And it won't help anyway Eric> when using gec+boehm (it's currently implemented as a Eric> no-op). Are other features of MEMORY (such as allocate_fast and set_memory_threshold) implemented? -- Colin Adams Preston Lancashire |
From: Colin P. A. <co...@co...> - 2008-04-05 07:49:47
|
>>>>> "Eric" == Eric Bezault <er...@go...> writes:

Eric> It would surprise me if the level of inlining currently
Eric> implemented in gec had anything to do with the time actually
Eric> spent in the GC that was shown in your profiling results.

That may be the case, but there is some (doubtful) evidence to the contrary. Since my execution time is now approaching 4 times faster than when I previously profiled, I thought I would profile again to confirm that GC problems are still predominant (I was fairly sure they were, as top still shows 200-300% CPU utilization, which can only come from parallel marking). The new profile confirms this is the case, but there is the suggestion that a failure to inline will be significant if the GC overhead can be largely eliminated. (I have named the top three Eiffel calls - I'm assuming the comment in the .h file immediately precedes the extern declaration.)

Each sample counts as 0.01 seconds.

  %   cumulative     self                 self    total
 time    seconds   seconds       calls   s/call   s/call  name
31.55     675.04    675.04                               GC_mark_from
16.24    1022.55    347.51                               GC_steal_mark_stack
16.17    1368.49    345.94                               GC_header_cache_miss
 8.08    1541.30    172.81                               GC_mark_local
 1.44    1572.07     30.77                               GC_reclaim_clear
 1.27    1599.21     27.14   530952040     0.00     0.00  T131f10     DS_ARRAYED_LIST [INTEGER_32].item
 1.19    1624.57     25.36                               GC_push_marked
 0.84    1642.47     17.90                               GC_generic_malloc_many
 0.81    1659.80     17.34   128695782     0.00     0.00  T171x16745  XM_XPATH_NODE.as_tree_node
 0.77    1676.37     16.57   262642848     0.00     0.00  T515f29     XM_XPATH_ATTRIBUTE_COLLECTION.is_attribute_index_valid
 0.64    1690.09     13.72                               GC_do_local_mark
 0.58    1702.48     12.39   181451145     0.00     0.00  T17x34
 0.52    1713.50     11.02   181905247     0.00     0.00  T15f4
 0.50    1724.23     10.73                               GC_malloc
 0.50    1734.90     10.67                               GC_block_was_dirty
 0.47    1745.00     10.10                               GC_apply_to_all_blocks
 0.34    1752.22      7.22                               GC_install_header

So we can see that the second highest Eiffel routine is XM_XPATH_NODE.as_tree_node.
Since the definition of this is:

   as_tree_node: XM_XPATH_TREE_NODE is
   		-- `Current' seen as a tree node
   	do
   		Result := Current
   	end

I would expect this to be a no-op. Is it not a call that provides typing information only to the compiler, so that at run time there is no need for it? -- Colin Adams Preston Lancashire |
From: Eric B. <er...@go...> - 2008-04-05 08:29:20
|
Colin Paul Adams wrote:
> I would expect this to be a no-op. Is it not a call that provides
> typing information only to the compiler, so that at run time there is
> no need for it?

As I have claimed many times in the past, no effort has been made yet to improve the performance of the generated C code, in particular in terms of inlining or dynamic binding. My priority is to be fully compliant with ECMA and ISE before tackling performance issues. And from what I can see, spending time on this issue now will not significantly improve your figures as long as most of the time is still spent in the GC. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com |
From: Colin P. A. <co...@co...> - 2008-04-06 10:33:10
|
>>>>> "Lothar" == Lothar Scholz <ll...@we...> writes:

EB> I just saw that in your last check-in you used MEMORY.free.
EB> This is a bad idea in my opinion.

Why? I was only using it in places where I could guarantee that the object wouldn't be used again (not a frequent case). It can cut down the amount of work needed by the next mark cycle.

EB> And it won't help anyway
EB> when using gec+boehm (it's currently implemented as a no-op).

Lothar> Which GC version are you using? GC_free is implemented in
Lothar> "malloc.c" and it works fine. I used it starting with 6.7
Lothar> up to the latest 7.1.beta3 snapshot; it was never a no-op.
Lothar> I'm using it a lot to free large buffers, and it helps to
Lothar> keep the memory footprint small. Waiting for a 500 KByte
Lothar> memory chunk to get freed by a conservative GC is just
Lothar> extremely risky.

Lothar> Remember, if you are using double and real values you might
Lothar> have a lot of false positives.

But Gobo code has to work with ISE as well as gec. And Manu warned that it might fail with ISE, as the C code might continue to hold a reference to the object. That's why I removed the calls to free. -- Colin Adams Preston Lancashire |
From: Eric B. <er...@go...> - 2008-04-06 11:37:55
|
Colin Paul Adams wrote:
> EB> I just saw that in your last check-in you used MEMORY.free.
> EB> This is a bad idea in my opinion.
>
> Why? I was only using it in places where I could guarantee that the
> object wouldn't be used again (not a frequent case).

It's a bad idea for two reasons. If GCs were invented, it's because we cannot trust humans, including you. You can guarantee that the object won't be used again today, but what about two years from now, after several iterations of refactoring? And I usually don't trust features/constructs that nobody else (except Lothar) has used despite being available for more than a decade, even more so when they are related to something as complex as a GC. You can use it at your own risk in gestalt or other developments, but if possible it would be nice if we could refrain from using it in the Gobo package. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com |
From: Emmanuel S. [ES] <ma...@ei...> - 2008-03-24 17:05:19
|
> I used the profiler in EiffelStudio 6.1 (a run that normally
> lasts 71 minutes when compiled with gec takes about 14 hours
> when compiled with ISE 6.1 with the profiler on), and spent

You should not enable the profiler on all classes, but just on yours. Then you will see much more quickly where the bottleneck is in your code, rather than trying to profile STRING/ARRAY and other basic stuff, which is not going to help you much. Once you have identified the bottleneck, you can try to reduce the scope of profiling to just the bottleneck. Manu |
From: Eric B. <er...@go...> - 2008-03-24 17:16:08
|
Colin Paul Adams wrote: > Now my problem is what to do about it (incidentally T122f10 is > {DS_ARRAYED_LIST}.item - nothing very surprising about that either, I > guess.). Does anyone have any suggested approaches? Can you send me the C code generated for T122f10? I don't understand how we can have 35 seconds self and 7214 seconds cumulative. Unless its thread is blocked when the GC is doing something special on its own thread. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com |
From: Eric B. <er...@go...> - 2008-03-24 17:35:26
|
Colin Paul Adams wrote:
>>>>>> "Eric" == Eric Bezault <er...@go...> writes:
>
> Eric> Can you send me the C code generated for T122f10? I don't
> Eric> understand how we can have 35 seconds self and 7214 seconds
> Eric> cumulative. Unless its thread is blocked when the GC is
> Eric> doing something special on its own thread.
>
> /* DS_ARRAYED_LIST [INTEGER_32].item */
> T6 T122f10(T0* C, T6 a1)
> {
>     T6 R = 0;
>     R = (((T112*)(GE_void(((T122*)(C))->a2)))->z2[a1]);
>     return R;
> }

Hmmm, unless you have calls-on-void-target, GE_void is a macro that does not call another function:

#define GE_void(obj) (!(obj)?GE_check_void(obj):(obj))

So T122f10 does not call any other C function, and I would believe that the values for self and cumulative should be the same. The only explanation that I can think of is that the GC thread is blocking the other thread while executing T122f10. I know nothing about threads, and even less about how it works in the Boehm GC, so what I just said might be completely stupid. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com |
From: Colin P. A. <co...@co...> - 2008-03-28 12:38:45
|
>>>>> "Eric" == Eric Bezault <er...@go...> writes:

>> Another possibility is to avoid OO techniques. For instance, I
>> know from last weekend's profiling that there is a VERY large
>> number of XM_XPATH_UNTYPED_ATOMIC_VALUE objects created by
>> conversion from XM_XPATH_STRING_VALUE. Untyped-atomic is
>> XPath's coercible data type. Although untyped-atomic does not
>> inherit from xs:string in the XPath type hierarchy, I have
>> implemented XM_XPATH_UNTYPED_ATOMIC_VALUE as inheriting from
>> XM_XPATH_STRING_VALUE for convenience, as they are nearly
>> identical except for the coercing behaviour. So a clear saving
>> could be made by merging the two classes with a BOOLEAN to
>> indicate which type is actually meant. Then the coercion to
>> xs:string is simply implemented by flipping the BOOLEAN. I
>> suspect this is going to be a big saving, but it is very
>> anti-OO.

Eric> This might be the kind of thing I could use indeed.

It helped, although not as dramatically as eliminating my ARRAY [BOOLEAN]s. Eliminating those 4 arrays takes the time down from 71 minutes to 30 minutes. This change to the untyped atomic values brings it further down to 22 minutes.

Eric> Another technique that I use is to try to reuse objects,
Eric> rather than giving them back to the GC and creating new ones
Eric> right after. In gec, I have AST visitor classes to implement
Eric> the different Eiffel "Degree" compilation passes. They all
Eric> try to keep the intermediary objects that they need to
Eric> process a given input class. They reuse these intermediary
Eric> objects when processing the next input class, and so forth.

Presumably you can only do this because you know when you have finished with an object, and don't have to rely on the garbage collector working out when it can free memory. I don't know how many such cases I am likely to find.
If I do, then perhaps there is the possibility of doing memory pooling - allocating a single chunk of memory for a large number of identical objects. I'm not sure how to go about that in Eiffel. I would guess that if I have a reference class C, with default_create as a creation procedure (is that necessary?) then if I create an ARRAY [expanded C] the memory will all be claimed in one call to malloc. Then if I can get a reference to one of these objects (how do I do that in Eiffel?), I can call the real creation procedure (it will need to be exported to the calling class) to initialize the referenced expanded object. I then just have to keep track of free slots for reusing the memory. -- Colin Adams Preston Lancashire |