From: Colin P. A. <co...@co...> - 2008-05-11 19:35:33
>>>>> "Colin" == Colin Paul Adams <co...@co...> writes:
>>>>> "Eric" == Eric Bezault <er...@go...> writes:

    >>> Another possibility is to avoid OO techniques. For instance, I
    >>> know from last weekend's profiling that there is a VERY large
    >>> number of XM_XPATH_UNTYPED_ATOMIC_VALUE objects created by
    >>> conversion from XM_XPATH_STRING_VALUE. Untyped-atomic is
    >>> XPath's coercible data type. Although untyped-atomic does not
    >>> inherit from xs:string in the XPath type hierarchy, I have
    >>> implemented XM_XPATH_UNTYPED_ATOMIC_VALUE as inheriting from
    >>> XM_XPATH_STRING_VALUE for convenience, as they are nearly
    >>> identical except for the coercing behaviour. So a clear saving
    >>> could be made by merging the two classes with a BOOLEAN to
    >>> indicate which type is actually meant. Then the coercion to
    >>> xs:string is simply implemented by flipping the BOOLEAN. I
    >>> suspect this is going to be a big saving, but it is very
    >>> anti-OO.

    Eric> This might be the kind of thing I could use indeed.

    Colin> It helped, although not as dramatically as eliminating my
    Colin> ARRAY [BOOLEAN]s. Eliminating these 4 arrays takes the time
    Colin> down from 71 minutes to 30 minutes. This change to the
    Colin> untyped atomic values brings it further down to 22 minutes.

I decided to take it out in the end. Aliasing meant that I kept
having to add bodges to get round bugs, and I couldn't be sure that
another one wouldn't spring up. The runtime is now back up to 31
minutes.

I may take another look at this possibility in the future, but only
after I have implemented my next plan.

I have written a class (provisionally named ST_STRING) for fast
read-only Unicode strings, plus an accompanying class
ST_STRING_BUILDER. The implementation is UTF-32, and a substring
operation does not copy: the result is a second object sharing the
same SPECIAL [INTEGER_32] as the original.
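
The sharing idea can be sketched as follows. This is a minimal
illustration in Python rather than Eiffel, and it is not the actual
ST_STRING code: the name SharedString, its methods, and the 0-based
half-open indexing are all assumptions made for the example. A Python
array of code points stands in for the SPECIAL [INTEGER_32].

```python
from array import array


class SharedString:
    """Read-only view over a shared buffer of UTF-32 code points.

    Hypothetical stand-in for the idea behind ST_STRING; not its interface.
    """

    __slots__ = ("_buf", "_start", "_count")

    def __init__(self, buf, start, count):
        if start < 0 or count < 0 or start + count > len(buf):
            raise ValueError("view out of range")
        self._buf = buf        # shared storage, never mutated after creation
        self._start = start    # offset of this view in the shared buffer
        self._count = count    # number of code points in this view

    @classmethod
    def from_text(cls, text):
        # "L" is an unsigned integer of at least 32 bits: one item per code point.
        buf = array("L", (ord(ch) for ch in text))
        return cls(buf, 0, len(buf))

    def __len__(self):
        return self._count

    def code_point(self, index):
        """Code point at 0-based index within this view."""
        if not 0 <= index < self._count:
            raise IndexError(index)
        return self._buf[self._start + index]

    def substring(self, start, end):
        """View of [start, end); shares storage, so no code points are copied."""
        if not 0 <= start <= end <= self._count:
            raise IndexError((start, end))
        return SharedString(self._buf, self._start + start, end - start)

    def shares_storage_with(self, other):
        return self._buf is other._buf

    def __str__(self):
        view = self._buf[self._start:self._start + self._count]
        return "".join(chr(cp) for cp in view)


s = SharedString.from_text("untypedAtomic")
t = s.substring(7, 13)
print(t, len(t), t.shares_storage_with(s))
```

Because the buffer is never written to after construction, the
aliasing between s and t is harmless, and taking a substring costs one
small object regardless of its length.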

It will take me a long time, but I am going to convert the
XPath/XPointer/XSLT libraries to use this class (the Unicode regular
expression stuff has been on hold since February, and will stay on
hold until I finish this). I expect it will make a very significant
difference. If it does, I will post the two classes here for review,
prior to any check-ins.

It would be nice to have a common interface between this class and
STRING_GENERAL (named READABLE_STRING, perhaps), so as to reduce the
amount of duplication of interface routines in the rest of the string
library. But I don't know yet whether that will be practical.
-- 
Colin Adams
Preston Lancashire