From: Colin P. A. <co...@co...> - 2008-03-28 12:50:12
>>>>> "Eric" == Eric Bezault <er...@go...> writes:

    Eric> Otherwise I try to be very careful with
    Eric> strings. String concatenations and substring operations
    Eric> create a lot of intermediary objects. Be careful as well
    Eric> when strings get resized (even implicitly by some
    Eric> operations).

Is there a more efficient way of doing this sort of thing?

    a_preceding_path := parent.path
    if STRING_.same_string (a_preceding_path, "/") then
        Result := STRING_.concat (a_preceding_path, node_name)
    else
        Result := STRING_.concat (a_preceding_path, "/")
        Result := STRING_.appended_string (Result, node_name)
        Result := STRING_.appended_string (Result, "[")
        Result := STRING_.appended_string (Result, simple_number)
        Result := STRING_.appended_string (Result, "]")
    end

--
Colin Adams
Preston Lancashire
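One way to cut down the intermediate objects in the else branch above is to build the whole result in a single buffer that is sized up front. The sketch below is illustrative only: it assumes Result is a plain STRING and uses only standard ELKS features (make, append_string, append_character); it is not the change that was actually adopted in the library.

    a_preceding_path := parent.path
    if STRING_.same_string (a_preceding_path, "/") then
        Result := STRING_.concat (a_preceding_path, node_name)
    else
            -- Reserve room for the whole result once ('/'+'['+']' account
            -- for the "+ 3"), then append into that single buffer: no
            -- intermediate strings are created.
        create Result.make (a_preceding_path.count + node_name.count + simple_number.count + 3)
        Result.append_string (a_preceding_path)
        Result.append_character ('/')
        Result.append_string (node_name)
        Result.append_character ('[')
        Result.append_string (simple_number)
        Result.append_character (']')
    end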
From: Lothar S. <ll...@we...> - 2008-03-28 19:23:09
Hello Colin,

Friday, March 28, 2008, 7:50:06 PM, you wrote:

>>>>>> "Eric" == Eric Bezault <er...@go...> writes:

CPA> Eric> Otherwise I try to be very careful with
CPA> Eric> strings. String concatenations and substring operations
CPA> Eric> create a lot of intermediary objects. Be careful as well
CPA> Eric> when strings get resized (even implicitly by some
CPA> Eric> operations).

CPA> Is there a more efficient way of doing this sort of thing?

CPA>     a_preceding_path := parent.path
CPA>     if STRING_.same_string (a_preceding_path, "/") then
CPA>         Result := STRING_.concat (a_preceding_path, node_name)
CPA>     else
CPA>         Result := STRING_.concat (a_preceding_path, "/")
CPA>         Result := STRING_.appended_string (Result, node_name)
CPA>         Result := STRING_.appended_string (Result, "[")
CPA>         Result := STRING_.appended_string (Result, simple_number)
CPA>         Result := STRING_.appended_string (Result, "]")
CPA>     end

Exactly for this reason, some Java compilers have a special
optimization in the code generator for when they detect sequences of
string concatenations. It is just too important an operation.

--
Best regards,
Lothar                          mailto:ll...@we...
From: Berend de B. <be...@po...> - 2008-03-29 03:43:54
>>>>> "Colin" == Colin Paul Adams <co...@co...> writes:
>>>>> "Eric" == Eric Bezault <er...@go...> writes:

    Colin> Is there a more efficient way of doing this sort of thing?

Perhaps we should have:

    STRING_.concat (<<once "1", once "2", once "3", once "4">>)

Strings are indeed a bit of a trouble spot in Eiffel. Although the
default case is safe, immutable strings and syntactic sugar for
efficient string concatenation would have been nice.

--
Cheers,
Berend de Boer
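A hedged sketch of what such a helper could look like. The name concat_all and its body are illustrative, not part of the existing Gobo KL_STRING_ROUTINES interface, and plain ELKS STRINGs are assumed:

    concat_all (a_strings: ARRAY [STRING]): STRING
            -- New string made of all items of `a_strings', concatenated in order.
            -- The result is sized once, so no intermediate strings are created.
        require
            a_strings_not_void: a_strings /= Void
        local
            i, nb, total: INTEGER
        do
                -- First pass: compute the total length.
            from
                i := a_strings.lower
                nb := a_strings.upper
            until
                i > nb
            loop
                total := total + a_strings.item (i).count
                i := i + 1
            end
                -- Second pass: append everything into one pre-sized buffer.
            create Result.make (total)
            from
                i := a_strings.lower
            until
                i > nb
            loop
                Result.append_string (a_strings.item (i))
                i := i + 1
            end
        ensure
            concat_all_not_void: Result /= Void
        end

With such a feature, the example above would read Result := concat_all (<<once "1", once "2", once "3", once "4">>).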
From: Eric B. <er...@go...> - 2008-04-04 12:17:20
Colin Paul Adams wrote:
> But I am now wondering about code inlining. It appears that gec does
> not inline all three routines new_child_tree_iterator, release_iterator
> and new_descendant_tree_iterator in the code below. (Note that the bulk
> of the class is commented out.) The test program then runs in 21 minutes
> and 8 seconds, as opposed to 20 minutes and 28 seconds (+/- 3 seconds)
> with the creations inlined and no call to release_iterator (there is
> an extra assignment of Void to the iterator passed to release_iterator
> following the call to release_iterator, but I hardly think that can
> account for the difference). With all the commented-out code restored
> (and assignment of 4 default values to attributes in the creation
> procedures, which costs about 6 seconds), the time rises to 21 minutes
> and 52 seconds.

It would surprise me if the level of inlining currently implemented in
gec had anything to do with the time actually spent in the GC that was
shown in your profiling results.

--
Eric Bezault
mailto:er...@go...
http://www.gobosoft.com
From: Colin P. A. <co...@co...> - 2008-04-05 06:51:11
>>>>> "Eric" == Eric Bezault <er...@go...> writes:

    Eric> It would surprise me if the level of inlining currently
    Eric> implemented in gec had anything to do with the time actually
    Eric> spent in the GC that was shown in your profiling results.

I later looked in the gestalt.h file, and there was no trace of these
routines. So I guess they were optimized away (in one case) and inlined
in the others. I don't understand how the runtime increase could have
occurred, but I have abandoned that approach now.

--
Colin Adams
Preston Lancashire
From: Emmanuel S. [ES] <ma...@ei...> - 2008-04-04 17:15:45
> I just saw that in your last check-in you used MEMORY.free.
> This is a bad idea in my opinion. And it won't help anyway
> when using gec+boehm (it's currently implemented as a no-op).

And this is actually very dangerous, because even if you think there
are no references to the object, the generated C code might still have
a reference or two. So at the next GC cycle it could basically cause a
problem (a segmentation violation or something else). I'm not sure why
it was added to MEMORY in the first place, but I would not use it.

Manu
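In other words, the safer pattern is simply to drop the last Eiffel-level reference and let the collector reclaim the object once it is genuinely unreachable. A tiny illustrative snippet (last_iterator and memory are hypothetical names, not from the Gobo code):

        -- Rather than asserting that the object is dead:
        --     memory.free (last_iterator)
        -- detach the reference and leave reclamation to the GC:
    last_iterator := Void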
From: Colin P. A. <co...@co...> - 2008-04-05 06:26:37
>>>>> "Emmanuel" == Emmanuel Stapf [ES] <ma...@ei...> writes:

    Emmanuel> think there are no references to the object, the
    Emmanuel> generated C code might still have a reference or two. So
    Emmanuel> at the next GC cycle it could basically cause a problem
    Emmanuel> (a segmentation violation or something else). I'm not
    Emmanuel> sure why it was added to MEMORY in the first place, but
    Emmanuel> I would not use it.

In that case it had better be marked obsolete.

--
Colin Adams
Preston Lancashire
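A minimal sketch of what that could look like, assuming the standard Eiffel obsolete clause and the free (object: ANY) signature; the message text is illustrative only:

    free (object: ANY)
        obsolete
            "Drop the last reference and let the garbage collector reclaim the object instead."
        do
            -- Existing implementation unchanged.
        end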
From: Eric B. <er...@go...> - 2008-04-05 08:21:18
Colin Paul Adams wrote:
>>>>>> "Eric" == Eric Bezault <er...@go...> writes:
>
> Eric> I just saw that in your last check-in you used MEMORY.free.
> Eric> This is a bad idea in my opinion. And it won't help anyway
> Eric> when using gec+boehm (it's currently implemented as a
> Eric> no-op).
>
> Are other features of MEMORY (such as allocate_fast and
> set_memory_threshold) implemented?

Not yet in gec. And I haven't studied yet whether the Boehm GC would
provide such functionality.

--
Eric Bezault
mailto:er...@go...
http://www.gobosoft.com
From: Colin P. A. <co...@co...> - 2008-04-05 08:37:48
>>>>> "Eric" == Eric Bezault <er...@go...> writes:

    >> Are other features of MEMORY (such as allocate_fast and
    >> set_memory_threshold) implemented?

    Eric> Not yet in gec. And I haven't studied yet whether the Boehm
    Eric> GC would provide such functionality.

OK. But in any case, I am fairly confident that I know what my basic
problem is. Fixing it is another matter.

XSLT is largely concerned with manipulating strings. So I think the lack
of read-only strings, which would enable substring to avoid copying the
string contents, is likely a major factor. I have some evidence in
support of this:

1) The runtime of my program is non-linear with respect to the size of
   the input data set.

2) If I use the tiny tree implementation for the input data set, the
   runtime increases by more than 4 times. The tiny tree implementation
   is an application of the flyweight pattern, designed to reduce the
   number of objects created. I copied the idea from Saxon (the contents
   of all text and comment nodes are held in a single STRING object
   within the document, and access to them is by substring, avoiding the
   creation of text and comment node objects for this purpose), but I
   overlooked that substring copies the data in Eiffel.

Accordingly, I think I shall abandon my efforts for now, and pursue
Manu's suggestion in FreeELKS for an aliased substring feature (I think
full copy-on-write semantics will need to be available for STRINGs). I
don't doubt that there are opportunities to improve my use of STRINGs in
the code (a review would help), but I think this needs tackling first.

--
Colin Adams
Preston Lancashire
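To make the flyweight point concrete, here is a minimal sketch of the tiny-tree idea under discussion; the class and feature names are hypothetical, not the actual Gobo XM classes. Each text node records only an offset and a count into one shared document buffer and materialises its content on demand, which with the standard ELKS substring means copying count characters on every access:

    class TINY_TEXT_NODE
        -- Flyweight text node: it holds no private copy of its character data.

    create
        make

    feature {NONE} -- Initialization

        make (a_buffer: STRING; a_start, a_count: INTEGER)
                -- Record the position of this node's text within the shared buffer.
            do
                document_buffer := a_buffer
                start_index := a_start
                count := a_count
            end

    feature -- Access

        string_value: STRING
                -- Character content of this node.
                -- ELKS `substring' allocates and copies `count' characters
                -- on every call; an aliased (shared-area) substring would not.
            do
                Result := document_buffer.substring (start_index, start_index + count - 1)
            end

    feature {NONE} -- Implementation

        document_buffer: STRING
                -- Single buffer holding the text of all text and comment nodes.

        start_index, count: INTEGER
                -- Position and length of this node's text within `document_buffer'.

    end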
From: Lothar S. <ll...@we...> - 2008-04-06 10:21:35
Hello Eric,

Friday, April 4, 2008, 11:55:10 PM, you wrote:

EB> Eric Bezault wrote:
>> Colin Paul Adams wrote:
>>> Do you use free at all? Obviously that won't reduce the number of
>>> allocations, but it will reduce the number of objects the GC has to
>>> deal with. Does it give any advantage over assigning Void to the
>>> reference?
>>
>> I don't use free.

EB> I just saw that in your last check-in you used MEMORY.free.
EB> This is a bad idea in my opinion. And it won't help anyway
EB> when using gec+boehm (it's currently implemented as a no-op).

Which GC version are you using? GC_free is implemented in "malloc.c"
and it works fine. I have used it from 6.7 up to the latest
7.1.beta3 snapshot, and it was never a no-op. I use it a lot to free
large buffers, and it helps to keep the memory footprint small. Waiting
for a 500 KByte memory chunk to get freed by a conservative GC is just
extremely risky. Remember, if you are using double and real values you
might get a lot of false positives.

--
Best regards,
Lothar                          mailto:ll...@we...
From: Eric B. <er...@go...> - 2008-04-06 11:29:37
Lothar Scholz wrote:
> EB> I just saw that in your last check-in you used MEMORY.free.
> EB> This is a bad idea in my opinion. And it won't help anyway
> EB> when using gec+boehm (it's currently implemented as a no-op).
>
> Which GC version are you using? GC_free is implemented in "malloc.c"
> and it works fine. I have used it from 6.7 up to the latest
> 7.1.beta3 snapshot, and it was never a no-op.

I didn't say that GC_free was a no-op. MEMORY.free is.

--
Eric Bezault
mailto:er...@go...
http://www.gobosoft.com
From: Colin P. A. <co...@co...> - 2008-05-11 19:35:33
>>>>> "Colin" == Colin Paul Adams <co...@co...> writes:
>>>>> "Eric" == Eric Bezault <er...@go...> writes:

    >>> Another possibility is to avoid OO techniques. For instance, I
    >>> know from last weekend's profiling that there is a VERY large
    >>> number of XM_XPATH_UNTYPED_ATOMIC_VALUE objects created by
    >>> conversion from XM_XPATH_STRING_VALUE. Untyped-atomic is
    >>> XPath's coercible data type. Although untyped-atomic does not
    >>> inherit from xs:string in the XPath type hierarchy, I have
    >>> implemented XM_XPATH_UNTYPED_ATOMIC_VALUE as inheriting from
    >>> XM_XPATH_STRING_VALUE for convenience, as they are nearly
    >>> identical except for the coercing behaviour. So a clear saving
    >>> could be made by merging the two classes, with a BOOLEAN to
    >>> indicate which type is actually meant. Then the coercion to
    >>> xs:string is simply implemented by flipping the BOOLEAN. I
    >>> suspect this is going to be a big saving, but it is very
    >>> anti-OO.

    Eric> This might be the kind of thing I could use indeed.

    Colin> It helped, although not as dramatically as eliminating my
    Colin> ARRAY [BOOLEAN]s. Eliminating these 4 arrays takes the time
    Colin> down from 71 minutes to 30 minutes. This change to the
    Colin> untyped atomic values brings it further down to 22 minutes.

I decided to take it out in the end. Aliasing meant that I kept having
to add bodges to get round bugs, and I couldn't be sure another one
wasn't going to spring up. The runtime is now back up to 31 minutes. I
may take another look at this possibility in the future, but only after
I have implemented my next plan.

I have written a class (provisionally named ST_STRING) for fast
read-only Unicode strings, plus an accompanying class ST_STRING_BUILDER.
The implementation is UTF-32, and substring operations result in two
objects sharing the same SPECIAL [INTEGER_32]. It will take me a long
time, but I am going to convert the XPath/XPointer/XSLT libraries to use
this class (the Unicode regular expression work has been on hold since
February, and will continue to be so until I finish this). I expect it
will make a very significant difference. If it does, I will post the two
classes here for review prior to any check-ins.

What would be nice would be a common interface between this class and
STRING_GENERAL (named READABLE_STRING, perhaps), so as to reduce the
duplication of interface routines in the rest of the string library. But
I don't know yet whether that will be practical.

--
Colin Adams
Preston Lancashire
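A minimal sketch of the merged-class idea quoted above; the class name and features here are hypothetical and much simplified compared with the real XM_XPATH classes:

    class MERGED_STRING_VALUE
        -- One value object playing the role of both xs:string and
        -- xs:untypedAtomic, distinguished by a flag instead of a subclass.

    create
        make_string, make_untyped_atomic

    feature {NONE} -- Initialization

        make_string (a_value: STRING)
                -- Create as an xs:string value.
            do
                value := a_value
            end

        make_untyped_atomic (a_value: STRING)
                -- Create as an xs:untypedAtomic value.
            do
                value := a_value
                is_untyped_atomic := True
            end

    feature -- Access

        value: STRING
                -- Character content.

        is_untyped_atomic: BOOLEAN
                -- Does this object currently represent xs:untypedAtomic?

    feature -- Conversion

        convert_to_string
                -- Coerce to xs:string without allocating a new object.
            do
                is_untyped_atomic := False
            end

    end

The aliasing trouble mentioned above is easy to see in convert_to_string: flipping the flag mutates the one shared object, so every other holder of a reference to it sees its type change as well.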
From: Colin P. A. <co...@co...> - 2008-05-11 19:38:32
>>>>> "Colin" == Colin Paul Adams <co...@co...> writes:

    Colin> The implementation is UTF-32, and substring operations
    Colin> result in two objects sharing the same SPECIAL [INTEGER_32].

That should have read NATURAL_32.

--
Colin Adams
Preston Lancashire
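A minimal sketch of the shared-storage substring idea described in the last two messages; ST_STRING itself was not posted in this thread, so the names and features below are illustrative only:

    class ST_STRING_SKETCH
        -- Read-only UTF-32 string: `substring' shares the underlying
        -- SPECIAL [NATURAL_32] instead of copying it.

    create
        make_from_area

    feature {NONE} -- Initialization

        make_from_area (an_area: SPECIAL [NATURAL_32]; a_start, a_count: INTEGER)
                -- Create a view of `a_count' code points of `an_area',
                -- starting at 0-based index `a_start'.
            do
                area := an_area
                start_index := a_start
                count := a_count
            end

    feature -- Access

        count: INTEGER
                -- Number of code points.

        code (i: INTEGER): NATURAL_32
                -- Code point at 1-based position `i'.
            do
                Result := area.item (start_index + i - 1)
            end

    feature -- Operations

        substring (a_start, an_end: INTEGER): ST_STRING_SKETCH
                -- View of code points `a_start' to `an_end' (1-based).
                -- O(1): the result shares `area' with `Current'; nothing is copied.
            do
                create Result.make_from_area (area, start_index + a_start - 1, an_end - a_start + 1)
            end

    feature {NONE} -- Implementation

        area: SPECIAL [NATURAL_32]
                -- Shared UTF-32 storage.

        start_index: INTEGER
                -- 0-based index in `area' of this string's first code point.

    end

Because the result of substring is just another view onto the same SPECIAL, the strings have to stay read-only (or go copy-on-write) for this sharing to be safe, which is exactly the constraint described above.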