From: Colin P. A. <co...@co...> - 2008-03-24 15:12:06
|
I've spent the last three weekends making a serious attempt to get to the bottom of the problem of quadratic performance in gexslt, which has been present ever since I first successfully ran an identity transformation back in 2004.

I used the profiler in EiffelStudio 6.1 (a run that normally lasts 71 minutes when compiled with gec takes about 14 hours when compiled with ISE 6.1 with the profiler on), and spent lots of time following false leads (the report gives exact numbers for routine calls, which is very useful, and nonsense for time spent, e.g. 873.49% for one routine, which is definitely not useful). By this morning I had concluded there was nothing in my code that could explain the results seen - or at least, if there was, I wasn't going to be able to find it using the EiffelStudio profiler.

So I edited $GOBO/tool/gec/config/c/gcc.cfg to write out profiling information for gprof (Eric, when can we have the setting of cflags and lflags in system.xace?). The resulting runtime was 118 minutes - a much more acceptable overhead. And the resulting report from gprof is highly illuminating. Here are the first few lines:

Each sample counts as 0.01 seconds.
  %   cumulative     self                 self    total
 time    seconds   seconds       calls   s/call   s/call  name
36.58    2943.00   2943.00                               GC_mark_from
16.32    4256.06   1313.06                               GC_header_cache_miss
13.18    5316.64   1060.58                               GC_mark_local
10.57    6166.79    850.15                               GC_steal_mark_stack
 4.32    6514.36    347.57                               GC_add_to_black_list_stack
 1.55    6639.04    124.68                               GC_push_marked
 1.26    6740.27    101.23                               GC_find_header
 1.22    6838.10     97.83                               GC_reclaim_clear
 1.10    6926.51     88.41                               GC_do_local_mark
 0.82    6992.39     65.88                               GC_generic_malloc_many
 0.64    7044.16     51.77                               GC_block_was_dirty
 0.63    7095.09     50.93                               GC_install_header
 0.58    7141.48     46.39                               GC_apply_to_all_blocks
 0.47    7179.41     37.93                               GC_build_fl
 0.44    7214.44     35.03   530952040     0.00     0.00  T122f10
 0.40    7246.41     31.97                               GC_allochblk_nth
 0.29    7269.71     23.30                               GC_enclosing_mapping
 0.26    7290.70     20.99   262713928     0.00     0.00  T506f27
 0.25    7311.07     20.37                               GC_malloc
 0.25    7331.32     20.25   127354522     0.00     0.00  T162x16589
 0.22    7348.82     17.50                               GC_free_block_ending_at
 0.22    7366.13     17.31                               GC_malloc_atomic
 0.20    7382.11     15.99  3529690218     0.00     0.00  T240f8
 0.19    7397.55     15.44                               GC_reclaim_block
 0.18    7412.41     14.86                               GC_reclaim_uninit
 0.18    7427.16     14.75   529087403     0.00     0.00  T57f9
 0.18    7441.84     14.68   529541505     0.00     0.00  T15f4
 0.13    7452.31     10.47  2030496041     0.00     0.00  T240c7
 0.11    7461.08      8.77   105048154     0.00     0.00  T27f8
 0.10    7469.40      8.32  2049701317     0.00     0.00  GE_new239
 0.10    7477.42      8.02  7679890135     0.00     0.00  GE_check_null
 0.10    7485.32      7.90   172970992     0.00     0.00  T772f27p1
 0.10    7493.20      7.88  2049636054     0.00     0.00  GE_new240

In short, almost all the run time is for memory management. (This did not surprise me, as typically I see the program using 125-290% CPU - I have a quad-core processor, so fully utilized = 400% - and I configured the boehm-gc for parallel marking.)

Now my problem is what to do about it (incidentally, T122f10 is {DS_ARRAYED_LIST}.item - nothing very surprising about that either, I guess). Does anyone have any suggested approaches? -- Colin Adams Preston Lancashire |
From: Eric B. <er...@go...> - 2008-03-24 16:51:46
|
Colin Paul Adams wrote: > So I edited $GOBO/tool/gec/config/c/gcc.cfg to write out profiling > information for gprof (Eric, when can we have the setting of cflags > and lflags in system.xace?). It's already implemented as far as I know: <option name="c_compiler_options" value="..."/> <option name="link" value="..."/> > Now my problem is what to do about it (incidentally T122f10 is > {DS_ARRAYED_LIST}.item - nothing very surprising about that either, I > guess.). Does anyone have any suggested approaches? You can try reducing the amount of garbage generated, and hence put less stress on the GC. I try to do that on all the Gobo tools, even gec. This is one of my primary goals during the design phase. The net advantage is that these tools can run even without the GC. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com |
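For readers trying this: the two option names Eric quotes would be used like this in system.xace (the -pg values are illustrative gcc/gprof flags, not Gobo defaults, and the placement within the file follows your existing options):

```xml
<!-- Illustrative: pass profiling flags through to the C compiler and linker. -->
<option name="c_compiler_options" value="-pg"/>
<option name="link" value="-pg"/>
```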
From: Colin P. A. <co...@co...> - 2008-03-24 17:35:56
|
>>>>> "Eric" == Eric Bezault <er...@go...> writes: Eric> You can try reducing the amount of garbage generated So as an example, in XM_XPATH_STATIC_PROPERTY, I have four ARRAY [BOOLEAN] which are included in every expression. Presumably it would be much cheaper to remove the booleans from the arrays (I can't think why I put them in ARRAYs in the first place). -- Colin Adams Preston Lancashire |
From: Colin P. A. <co...@co...> - 2008-03-24 22:28:34
|
>>>>> "Colin" == Colin Paul Adams <co...@co...> writes:
>>>>> "Eric" == Eric Bezault <er...@go...> writes:

Eric> You can try reducing the amount of garbage generated

Colin> So as an example, in XM_XPATH_STATIC_PROPERTY, I have four
Colin> ARRAY [BOOLEAN] which are included in every
Colin> expression. Presumably it would be much cheaper to remove
Colin> the booleans from the arrays (I can't think why I put them
Colin> in ARRAYs in the first place).

This seems very productive. I changed the first two arrays to be just a series of BOOLEANs, and the runtime came down from 71 minutes to 50 minutes (my target is sub-one-minute, so there's a way to go, but it's a good start). So I'll do the other two ARRAYs next.

Another possibility is actually to merge all the BOOLEANs into a bit-field. This will save memory but not the number of objects, so I'm not really interested in doing this. It might hurt performance.

But I'm interested in what approaches/techniques you use to avoid extra allocations.

Do you use free at all? Obviously that won't reduce the number of allocations, but it will reduce the number of objects the GC has to deal with. Does it give any advantage over assigning Void to the reference?

Expanded attributes? This implies having default_create as a creation procedure, whereas the Gobo standard is to have an empty make.

Another possibility is to avoid OO techniques. For instance, I know from last weekend's profiling that there is a VERY large number of XM_XPATH_UNTYPED_ATOMIC_VALUE objects created by conversion from XM_XPATH_STRING_VALUE. Untyped-atomic is XPath's coercible data type. Although untyped-atomic does not inherit from xs:string in the XPath type hierarchy, I have implemented XM_XPATH_UNTYPED_ATOMIC_VALUE as inheriting from XM_XPATH_STRING_VALUE for convenience, as they are nearly identical except for the coercing behaviour. So a clear saving could be made by merging the two classes with a BOOLEAN to indicate which type is actually meant.
Then the coercion to xs:string is simply implemented by flipping the BOOLEAN. I suspect this is going to be a big saving, but it is very anti-OO. -- Colin Adams Preston Lancashire |
From: Daniel T. <dan...@gm...> - 2008-03-25 09:51:44
|
Eric Bezault wrote:
> Colin Paul Adams wrote:
>> But I'm interested in what approaches/techniques you use to avoid
>> extra allocations.
>>
>> Do you use free at all? Obviously that won't reduce the number of
>> allocations, but it will reduce the number of objects the GC has to
>> deal with. Does it give any advantage over assigning Void to the
>> reference?
>
> I don't use free.
>
>> Expanded attributes? This implies having default_create as a creation
>> procedure, whereas the Gobo standard is to have an empty make.
>
> I don't use expanded.
>
>> Another possibility is to avoid OO techniques. For instance, I know
>> from last weekend's profiling that there is a VERY large number of
>> XM_XPATH_UNTYPED_ATOMIC_VALUE objects created by conversion from
>> XM_XPATH_STRING_VALUE. Untyped-atomic is XPath's coercible data
>> type. Although untyped-atomic does not inherit from xs:string in the
>> XPath type hierarchy, I have implemented XM_XPATH_UNTYPED_ATOMIC_VALUE
>> as inheriting from XM_XPATH_STRING_VALUE for convenience, as they are
>> nearly identical except for the coercing behaviour. So a clear saving
>> could be made by merging the two classes with a BOOLEAN to indicate
>> which type is actually meant. Then the coercion to xs:string is simply
>> implemented by flipping the BOOLEAN. I suspect this is going to be a
>> big saving, but it is very anti-OO.
>
> This might be the kind of thing I could use indeed. Otherwise I try
> to be very careful with strings. String concatenations and substring
> operations create a lot of intermediary objects. Be careful as well
> when strings get resized (even implicitly by some operations). Likewise
> with DS_ARRAYED_... classes: try to create them with a capacity which
> is not too big but not too small, to avoid too many resizings. I know,
> it's often not easy to choose the best capacity at creation time!
> I also try to share objects as much as possible.
> Of course, when shared, we have to be very careful that these objects
> don't have their state modified. When that happens, then we need to
> clone it beforehand. I use a lot of shared objects for the tokens when
> generating the ASTs in gec. Another technique that I use is to try to
> reuse objects, rather than giving them back to the GC and creating new
> ones right after. In gec, I have AST visitor classes to implement
> the different Eiffel "Degree" compilation passes. They all try to
> keep the intermediary objects that they need to process a given
> input class. They reuse these intermediary objects when processing
> the next input class, and so forth. I probably use other techniques,
> but it's already a good start. One thing to remember is that when an
> implementation technique is not "very" OO, try at least to make it
> look as if it was OO in the class interface.

This kind of information is very interesting in my opinion, but I have never read anything comparable. I am not sure whether you have a long-term plan to use http://gobo-eiffel.wiki.sourceforge.net/, but I would be willing to publish it there. Is that OK? |
From: Eric B. <er...@go...> - 2008-03-25 14:01:05
|
Daniel Tuser wrote: > This kind of information is very interesting in my opinion. But I never > read something comparable. I am not sure if you have a long term plan to > use http://gobo-eiffel.wiki.sourceforge.net/, but I would be willing to > publish it there. Is that ok? I'm not a big fan of Wikis, so I have no long term plan to use the Gobo project Wiki. I'm fine if Gobo users want to use this space to publish Gobo related information. So feel free to use it. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com |
From: Eric B. <er...@go...> - 2008-03-24 23:21:57
|
Colin Paul Adams wrote:
> But I'm interested in what approaches/techniques you use to avoid
> extra allocations.
>
> Do you use free at all? Obviously that won't reduce the number of
> allocations, but it will reduce the number of objects the GC has to
> deal with. Does it give any advantage over assigning Void to the
> reference?

I don't use free.

> Expanded attributes? This implies having default_create as a creation
> procedure, whereas the Gobo standard is to have an empty make.

I don't use expanded.

> Another possibility is to avoid OO techniques. For instance, I know
> from last weekend's profiling that there is a VERY large number of
> XM_XPATH_UNTYPED_ATOMIC_VALUE objects created by conversion from
> XM_XPATH_STRING_VALUE. Untyped-atomic is XPath's coercible data
> type. Although untyped-atomic does not inherit from xs:string in the
> XPath type hierarchy, I have implemented XM_XPATH_UNTYPED_ATOMIC_VALUE
> as inheriting from XM_XPATH_STRING_VALUE for convenience, as they are
> nearly identical except for the coercing behaviour. So a clear saving
> could be made by merging the two classes with a BOOLEAN to indicate
> which type is actually meant. Then the coercion to xs:string is simply
> implemented by flipping the BOOLEAN. I suspect this is going to be a
> big saving, but it is very anti-OO.

This might be the kind of thing I could use indeed. Otherwise I try to be very careful with strings. String concatenations and substring operations create a lot of intermediary objects. Be careful as well when strings get resized (even implicitly by some operations). Likewise with DS_ARRAYED_... classes: try to create them with a capacity which is not too big but not too small, to avoid too many resizings. I know, it's often not easy to choose the best capacity at creation time! I also try to share objects as much as possible. Of course, when shared, we have to be very careful that these objects don't have their state modified.
When that happens, then we need to clone it beforehand. I use a lot of shared objects for the tokens when generating the ASTs in gec. Another technique that I use is to try to reuse objects, rather than giving them back to the GC and creating new ones right after. In gec, I have AST visitor classes to implement the different Eiffel "Degree" compilation passes. They all try to keep the intermediary objects that they need to process a given input class. They reuse these intermediary objects when processing the next input class, and so forth. I probably use other techniques, but it's already a good start. One thing to remember is that when an implementation technique is not "very" OO, try at least to make it look as if it was OO in the class interface. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com |
From: Colin P. A. <co...@co...> - 2008-03-28 19:07:07
|
>>>>> "Lothar" == Lothar Scholz <ll...@we...> writes:

Lothar> Exactly for this reason some Java compilers have a special
Lothar> optimization inside the code generator when it detects
Lothar> sequences of string concatenations. It's just too
Lothar> important an operation.

So what do they do about it? -- Colin Adams Preston Lancashire |
From: Lothar S. <ll...@we...> - 2008-03-29 06:37:18
|
Hello Colin,

Saturday, March 29, 2008, 2:07:09 AM, you wrote:

>>>>>> "Lothar" == Lothar Scholz <ll...@we...> writes:

CPA> Lothar> Exactly for this reason some Java compilers have a special
CPA> Lothar> optimization inside the code generator when it detects
CPA> Lothar> sequences of string concatenations. It's just too
CPA> Lothar> important an operation.

CPA> So what do they do about it?

The compiler catches this and generates code for a string buffer object that gets filled with the concatenated parts. Very simple to implement. -- Best regards, Lothar mailto:ll...@we... |
From: Colin P. A. <co...@co...> - 2008-03-29 13:07:53
|
>>>>> "Eric" == Eric Bezault <er...@go...> writes: Eric> That does not work in Eiffel. Each time you will access the Eric> expanded C, you will not get a reference to it but a copy of Eric> it. That's what I was afraid of. Perhaps we should have a syntax for this - reference assignment (note that I don't know if I can benefit from this or not - but even if I can't there are surely applications that can). -- Colin Adams Preston Lancashire |
From: Eric B. <er...@go...> - 2008-03-29 13:36:26
|
Colin Paul Adams wrote: >>>>>> "Eric" == Eric Bezault <er...@go...> writes: > > Eric> That does not work in Eiffel. Each time you will access the > Eric> expanded C, you will not get a reference to it but a copy of > Eric> it. > > That's what I was afraid of. > > Perhaps we should have a syntax for this - reference assignment (note > that I don't know if I can benefit from this or not - but even if I > can't there are surely applications that can). In Eiffel, expanded does not mean that its memory is a subpart of the memory of another object (this is the compiler which does -- or possibly does not -- optimize it in such a way). What Eiffel means by expanded is that an expanded object cannot be shared by two different objects, and as a consequence the language talks about expanded in terms of copy semantics. Allowing reference assignment as you suggest would just open a can of worms in the language definition. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com |
From: Colin P. A. <co...@co...> - 2008-03-30 09:54:13
|
>>>>> "Eric" == Eric Bezault <er...@go...> writes: Eric> technique that I use is to try to reuse objects, rather than Eric> giving them back to the GC and creating new ones right Eric> after. In gec, I have AST visitor classes to implement the Eric> different Eiffel "Degree" compilation passes. They all try Eric> to keep the intermediary objects that they need to process a Eric> given imput class. They reuse these intermediary objects Eric> when processing the next input class, and so forth. Can you point me to a specific example? I am currently trying this with the iterators used to evaluate XPath sequences. I am retaining copies on a once DS_ARRAYED_STACK. So far, this is actually increasing the runtime (but it appears sensitive to the size of the stack, so I am currently trying with just a DS_CELL to see if this is better). -- Colin Adams Preston Lancashire |
From: Eric B. <er...@go...> - 2008-04-04 12:17:06
|
Colin Paul Adams wrote:
>>>>>> "Eric" == Eric Bezault <er...@go...> writes:
>
> Eric> technique that I use is to try to reuse objects, rather than
> Eric> giving them back to the GC and creating new ones right
> Eric> after. In gec, I have AST visitor classes to implement the
> Eric> different Eiffel "Degree" compilation passes. They all try
> Eric> to keep the intermediary objects that they need to process a
> Eric> given input class. They reuse these intermediary objects
> Eric> when processing the next input class, and so forth.
>
> Can you point me to a specific example?

Instead of having iterator objects, I use the visitor pattern. See the descendants of ET_AST_PROCESSOR. And for a given task I use the same visitor object on all classes, instead of having a different object each time. This object can then keep some context that will be reinitialized each time (without necessarily having to create new objects). For example, in ET_FEATURE_ADAPTATION_RESOLVER, this visitor uses hash-tables in order to make sense of the inheritance clause feature adaptation of the class being processed. This visitor object is reused for all input classes, without having to create a new set of hash-tables each time.

> I am currently trying this with the iterators used to evaluate XPath
> sequences. I am retaining copies on a once DS_ARRAYED_STACK. So far,
> this is actually increasing the runtime (but it appears sensitive to
> the size of the stack, so I am currently trying with just a DS_CELL to
> see if this is better).

I didn't mean that you should implement a memory management system by hand yourself. It's not obvious that the time spent doing this manual memory management, with a pool of objects, will be more efficient than the GC itself. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com |
From: Colin P. A. <co...@co...> - 2008-03-30 16:25:01
|
>>>>> "Colin" == Colin Paul Adams <co...@co...> writes:

Colin> I am currently trying this with the iterators used to
Colin> evaluate XPath sequences. I am retaining copies on a once
Colin> DS_ARRAYED_STACK. So far, this is actually increasing the
Colin> runtime (but it appears sensitive to the size of the stack,
Colin> so I am currently trying with just a DS_CELL to see if this
Colin> is better).

In fact the DS_CELL proved to be slower than a small stack. But I am now wondering about code inlining. It appears that gec does not inline any of the three routines new_child_tree_iterator, release_iterator and new_descendant_tree_iterator in the code below (note that the bulk of the class is commented out). The test program then runs in 21 minutes and 8 seconds, as opposed to 20 minutes and 28 seconds (+/- 3 seconds) with the creations inlined and no call to release_iterator (there is an extra assignment to Void of the iterator passed to release_iterator following the call to release_iterator, but I hardly think that can account for the difference). With all the commented-out code restored (and assignment of 4 default values to attributes in the creation procedures, which costs about 6 seconds), the time rises to 21 minutes and 52 seconds. I can't account for any of this.
class XM_XPATH_ITERATOR_POOL

feature -- Creation

	new_child_tree_iterator (a_starting_node: XM_XPATH_TREE_NODE; a_node_test: XM_XPATH_NODE_TEST): XM_XPATH_TREE_CHILD_ENUMERATION is
			-- New or reused child iterator for `a_starting_node'
		require
			starting_node_not_void: a_starting_node /= Void
			node_test_not_void: a_node_test /= Void
		do
--			if child_tree_iterators.is_empty then
				create Result.make (a_starting_node, a_node_test)
--			else
--				Result := child_tree_iterators.item
--				Result.make (a_starting_node, a_node_test)
--				child_tree_iterators.remove
--			end
		ensure
			new_child_tree_iterator_not_void: Result /= Void
		end

	new_descendant_tree_iterator (a_starting_node: XM_XPATH_TREE_NODE; a_node_test: XM_XPATH_NODE_TEST; a_include_self: BOOLEAN): XM_XPATH_TREE_DESCENDANT_ENUMERATION is
			-- New or reused descendant iterator for `a_starting_node'
		require
			starting_node_not_void: a_starting_node /= Void
			node_test_not_void: a_node_test /= Void
		do
--			if descendant_tree_iterators.is_empty then
				create Result.make (a_starting_node, a_node_test, a_include_self)
--			else
--				Result := descendant_tree_iterators.item
--				Result.make (a_starting_node, a_node_test, a_include_self)
--				descendant_tree_iterators.remove
--			end
		ensure
			new_descendant_tree_iterator_not_void: Result /= Void
		end

feature -- Removal

	release_iterator (a_iterator: XM_XPATH_SEQUENCE_ITERATOR [XM_XPATH_ITEM]) is
			-- Return `a_iterator' to free memory.
		require
			a_iterator_not_void: a_iterator /= Void
		do
--			if a_iterator.is_tree_child_enumeration then
--				if child_tree_iterators.count < Maximum_queue_length then
--					child_tree_iterators.put (a_iterator.as_tree_child_enumeration)
--				end
--			elseif a_iterator.is_tree_descendant_enumeration then
--				if descendant_tree_iterators.count < Maximum_queue_length then
--					descendant_tree_iterators.put (a_iterator.as_tree_descendant_enumeration)
--				end
--			end
		end

feature {NONE} -- Implementation

	Maximum_queue_length: INTEGER is 5
			-- Limit on queue size for any single iterator type

--	child_tree_iterators: DS_ARRAYED_STACK [XM_XPATH_TREE_CHILD_ENUMERATION]
--			-- Spare iterators over the child axis of XM_XPATH_TREE_NODEs
--		once
--			create Result.make (Maximum_queue_length)
--		ensure
--			child_tree_iterators_not_void: Result /= Void
--		end

--	descendant_tree_iterators: DS_ARRAYED_STACK [XM_XPATH_TREE_DESCENDANT_ENUMERATION]
--			-- Spare iterators over the descendant axis of XM_XPATH_TREE_NODEs
--		once
--			create Result.make (Maximum_queue_length)
--		ensure
--			descendant_tree_iterators_not_void: Result /= Void
--		end

end

-- Colin Adams Preston Lancashire |
From: Eric B. <er...@go...> - 2008-04-04 16:53:50
|
Eric Bezault wrote: > Colin Paul Adams wrote: >> Do you use free at all? Obviously that won't reduce the number of >> allocations, but it will reduce the number of objects the GC has to >> deal with. Does it give any advantage over assigning Void to the >> reference? > > I don't use free. I just saw that in your last check-in you used MEMORY.free. This is a bad idea in my opinion. And it won't help anyway when using gec+boehm (it's currently implemented as a no-op). I talked about reducing the amount of garbage generated, not about manually managing the memory. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com |
From: Colin P. A. <co...@co...> - 2008-04-05 06:47:37
|
>>>>> "Eric" == Eric Bezault <er...@go...> writes: Eric> I just saw that in your last check-in you used MEMORY.free. I've removed it now. -- Colin Adams Preston Lancashire |
From: Colin P. A. <co...@co...> - 2008-04-05 07:08:44
|
>>>>> "Eric" == Eric Bezault <er...@go...> writes: Eric> I just saw that in your last check-in you used MEMORY.free. Eric> This is a bad idea in my opinion. And it won't help anyway Eric> when using gec+boehm (it's currently implemented as a Eric> no-op). Are other features of MEMORY (such as allocate_fast and set_memory_threshold) implemented? -- Colin Adams Preston Lancashire |
From: Colin P. A. <co...@co...> - 2008-04-05 07:49:47
|
>>>>> "Eric" == Eric Bezault <er...@go...> writes:

Eric> It would surprise me if the level of inlining currently
Eric> implemented in gec had anything to do with the time actually
Eric> spent in the GC that was shown in your profiling results.

That may be the case, but there is some (doubtful) evidence to the contrary. Since my execution time is now approaching 4 times faster than when I previously profiled, I thought I would profile again to confirm that GC problems are still predominant (I was fairly sure they were, as top still shows 200-300% CPU utilization, which can only come from parallel marking). The new profile confirms this is the case, but there is the suggestion that a failure to inline will be significant if the GC overhead can be largely eliminated. (I have named the top three Eiffel calls - I'm assuming the comment in the .h file immediately precedes the extern declaration.)

Each sample counts as 0.01 seconds.

  %   cumulative     self                 self    total
 time    seconds   seconds       calls   s/call   s/call  name
31.55     675.04    675.04                               GC_mark_from
16.24    1022.55    347.51                               GC_steal_mark_stack
16.17    1368.49    345.94                               GC_header_cache_miss
 8.08    1541.30    172.81                               GC_mark_local
 1.44    1572.07     30.77                               GC_reclaim_clear
 1.27    1599.21     27.14   530952040     0.00     0.00  T131f10     DS_ARRAYED_LIST [INTEGER_32].item
 1.19    1624.57     25.36                               GC_push_marked
 0.84    1642.47     17.90                               GC_generic_malloc_many
 0.81    1659.80     17.34   128695782     0.00     0.00  T171x16745  XM_XPATH_NODE.as_tree_node
 0.77    1676.37     16.57   262642848     0.00     0.00  T515f29     XM_XPATH_ATTRIBUTE_COLLECTION.is_attribute_index_valid
 0.64    1690.09     13.72                               GC_do_local_mark
 0.58    1702.48     12.39   181451145     0.00     0.00  T17x34
 0.52    1713.50     11.02   181905247     0.00     0.00  T15f4
 0.50    1724.23     10.73                               GC_malloc
 0.50    1734.90     10.67                               GC_block_was_dirty
 0.47    1745.00     10.10                               GC_apply_to_all_blocks
 0.34    1752.22      7.22                               GC_install_header

So we can see that the second highest Eiffel routine is XM_XPATH_NODE.as_tree_node.
Since the definition of this is:

   as_tree_node: XM_XPATH_TREE_NODE is
   		-- `Current' seen as a tree node
   	do
   		Result := Current
   	end

I would expect this to be a no-op. Is it not a call that provides typing information only to the compiler, so that at run time there is no need for it? -- Colin Adams Preston Lancashire |
From: Eric B. <er...@go...> - 2008-04-05 08:29:20
|
Colin Paul Adams wrote:
> I would expect this to be a no-op. Is it not a call that provides
> typing information only to the compiler, so that at run time there is
> no need for it?

As I have claimed many times in the past, no effort has been made yet to improve the performance of the generated C code, in particular in terms of inlining or dynamic binding. My priority is to be fully compliant with ECMA and ISE before tackling performance issues. And from what I can see, spending time on this issue now will not significantly improve your figures as long as most of the time is still spent in the GC. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com |
From: Colin P. A. <co...@co...> - 2008-04-06 10:33:10
|
>>>>> "Lothar" == Lothar Scholz <ll...@we...> writes:

EB> I just saw that in your last check-in you used MEMORY.free.
EB> This is a bad idea in my opinion.

Why? I was only using it in places where I could guarantee that the object wouldn't be used again (not a frequent case). It can cut down the amount of work needed by the next mark cycle.

EB> And it won't help anyway
EB> when using gec+boehm (it's currently implemented as a no-op).

Lothar> Which GC version are you using? GC_free is implemented in
Lothar> "malloc.c" and it works fine. I used it starting with 6.7
Lothar> up to the latest 7.1.beta3 snapshot; it was never a no-op.
Lothar> I'm using it a lot to free large buffers, and it helps to
Lothar> keep the memory footprint small. Waiting for a 500 KByte
Lothar> memory chunk to get freed by a conservative GC is just
Lothar> extremely risky.

Lothar> Remember, if you are using double and real values you might
Lothar> have a lot of false positives.

But Gobo code has to work with ISE as well as gec. And Manu warned that it might fail with ISE, as the C code might continue to hold a reference to the object. That's why I removed the calls to free. -- Colin Adams Preston Lancashire |
From: Eric B. <er...@go...> - 2008-04-06 11:37:55
|
Colin Paul Adams wrote:
> EB> I just saw that in your last check-in you used MEMORY.free.
> EB> This is a bad idea in my opinion.
>
> Why? I was only using it in places where I could guarantee that the
> object wouldn't be used again (not a frequent case).

It's a bad idea for two reasons. If GCs were invented, it's because we cannot trust humans, including you. You can guarantee that the object won't be used again today, but what about two years from now, after several iterations of refactoring? And I usually don't trust features/constructs that nobody else (except Lothar) has used despite being available for more than a decade, even more so when they are related to something as complex as a GC. You can use it at your own risk in gestalt or other developments, but if possible it would be nice if we could refrain from using it in the Gobo package. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com |
From: Emmanuel S. [ES] <ma...@ei...> - 2008-03-24 17:05:19
|
> I used the profiler in EiffelStudio 6.1 (a run that normally
> lasts 71 minutes when compiled with gec takes about 14 hours
> when compiled with ISE 6.1 with the profiler on), and spent

You should not enable the profiler on all classes, but just on yours. Then you will see much more quickly where the bottleneck is in your code, rather than trying to profile STRING/ARRAY and other basic stuff, which is not going to help you much. Once you have identified the bottleneck, you can try to reduce the scope of profiling to just the bottleneck. Manu |
From: Eric B. <er...@go...> - 2008-03-24 17:16:08
|
Colin Paul Adams wrote: > Now my problem is what to do about it (incidentally T122f10 is > {DS_ARRAYED_LIST}.item - nothing very surprising about that either, I > guess.). Does anyone have any suggested approaches? Can you send me the C code generated for T122f10? I don't understand how we can have 35 seconds self and 7214 seconds cumulative. Unless its thread is blocked when the GC is doing something special on its own thread. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com |
From: Eric B. <er...@go...> - 2008-03-24 17:35:26
|
Colin Paul Adams wrote:
>>>>>> "Eric" == Eric Bezault <er...@go...> writes:
>
> Eric> Can you send me the C code generated for T122f10? I don't
> Eric> understand how we can have 35 seconds self and 7214 seconds
> Eric> cumulative. Unless its thread is blocked when the GC is
> Eric> doing something special on its own thread.
>
> /* DS_ARRAYED_LIST [INTEGER_32].item */
> T6 T122f10(T0* C, T6 a1)
> {
>     T6 R = 0;
>     R = (((T112*)(GE_void(((T122*)(C))->a2)))->z2[a1]);
>     return R;
> }

Hmmm, unless you have calls-on-void-target, GE_void is a macro that does not call another function:

#define GE_void(obj) (!(obj)?GE_check_void(obj):(obj))

So T122f10 does not call any other C function, and I would believe that the values for self and cumulative should be the same. The only explanation that I can think of is that the GC thread is blocking the other thread while executing T122f10. I know nothing about threads, and even less about how it works in the Boehm GC, so what I just said might be completely stupid. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com |
From: Colin P. A. <co...@co...> - 2008-03-28 12:38:45
|
>>>>> "Eric" == Eric Bezault <er...@go...> writes:

>> Another possibility is to avoid OO techniques. For instance, I
>> know from last weekend's profiling that there is a VERY large
>> number of XM_XPATH_UNTYPED_ATOMIC_VALUE objects created by
>> conversion from XM_XPATH_STRING_VALUE. Untyped-atomic is
>> XPath's coercible data type. Although untyped-atomic does not
>> inherit from xs:string in the XPath type hierarchy, I have
>> implemented XM_XPATH_UNTYPED_ATOMIC_VALUE as inheriting from
>> XM_XPATH_STRING_VALUE for convenience, as they are nearly
>> identical except for the coercing behaviour. So a clear saving
>> could be made by merging the two classes with a BOOLEAN to
>> indicate which type is actually meant. Then the coercion to
>> xs:string is simply implemented by flipping the BOOLEAN. I
>> suspect this is going to be a big saving, but it is very
>> anti-OO.

Eric> This might be the kind of thing I could use indeed.

It helped, although not as dramatically as eliminating my ARRAY [BOOLEAN]s. Eliminating those 4 arrays takes the time down from 71 minutes to 30 minutes. This change to the untyped atomic values brings it further down to 22 minutes.

Eric> Another technique that I use is to try to reuse objects,
Eric> rather than giving them back to the GC and creating new ones
Eric> right after. In gec, I have AST visitor classes to implement
Eric> the different Eiffel "Degree" compilation passes. They all
Eric> try to keep the intermediary objects that they need to
Eric> process a given input class. They reuse these intermediary
Eric> objects when processing the next input class, and so forth.

Presumably you can only do this because you know when you have finished with an object, and don't have to rely on the garbage collector working out when it can free memory. I don't know how many such cases I am likely to find.
If I do, then perhaps there is the possibility of doing memory pooling - allocating a single chunk of memory for a large number of identical objects. I'm not sure how to go about that in Eiffel. I would guess that if I have a reference class C, with default_create as a creation procedure (is that necessary?) then if I create an ARRAY [expanded C] the memory will all be claimed in one call to malloc. Then if I can get a reference to one of these objects (how do I do that in Eiffel?), I can call the real creation procedure (it will need to be exported to the calling class) to initialize the referenced expanded object. I then just have to keep track of free slots for reusing the memory. -- Colin Adams Preston Lancashire |