## saxon-help

 [saxon] Questions about the implementation of sequences in Saxon From: Dimitre Novatchev - 2007-07-25 00:47:39
```
Hi,

I have the following questions concerning the implementation of sequences in Saxon:

1. Will the timings for evaluating $seq[1], $seq[x], $seq[2000000] be approximately the same or hugely different? $seq is a sequence of item(); it may contain simple built-in datatypes such as xs:integer and xs:string, or nodes (node()), or document nodes.

2. Will the times for evaluating substring(unparsed-text(), 1, 1), substring(unparsed-text(), x, 1) and substring(unparsed-text(), 2000000, 1) be approximately the same or hugely different? More specifically, will the evaluation of the first expression be almost "instantaneous"?

3. Will future versions of Saxon maintain the same behaviour for 1. and 2., or are there any plans to change it?

--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
You've achieved success in your field when you don't know whether what you're doing is work or play
```
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Michael Kay - 2007-07-25 06:57:20
```
> 1. Will the timings for evaluating $seq[1], $seq[x],
> $seq[2000000] be approximately the same or hugely different?
> $seq is a sequence of item(), it may contain simple builtin
> datatypes as xs:integer and xs:string, or nodes (node()) or
> document nodes.

The first time $seq[N] is evaluated, the time will be proportional to N. On subsequent occasions it should be effectively instantaneous. A sequence held in a variable is generally evaluated "on demand": each request evaluates as much of the sequence as is needed to satisfy that request, and saves what it has read in a directly addressable buffer.

A few special cases apply; for example, if $seq was created as the tail of another sequence then it shares its memory.

> 2. Will the time for the evaluation of
> substring(unparsed-text(), 1, 1),
> substring(unparsed-text(), x, 1), substring(unparsed-text(),
> 2000000, 1) be approximately the same or hugely different.
> More specifically, will the evaluation of the first
> expression above be almost "instantaneous"?

unparsed-text() reads the whole of the identified resource into a StringValue, which wraps a FastStringBuffer, which wraps a char[] array. The first call of substring() on such a StringValue will scan the whole array to see whether it contains any UTF-16 surrogate pairs. If it does not, then subsequent calls on substring() will have constant performance; if it does, then subsequent calls on substring() will scan the array from the start each time to count character positions.

> 3. Will the future versions of Saxon maintain the same
> behaviour for 1. and 2. or are there any plans to change it?

If opportunities arise to improve performance then I will take them... But there are no specific plans.

Michael Kay
http://www.saxonica.com/
```
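The "on demand" behaviour described in this reply can be pictured with a small Java sketch. It is an illustration only, not Saxon's code: the class `LazyBufferedSequence` and its methods `itemAt` and `itemsRead` are hypothetical names invented here. It shows why the first access to position N costs time proportional to N while later accesses to any position up to N are a plain buffer lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration of a lazily evaluated, buffered sequence; not a Saxon class. */
final class LazyBufferedSequence<T> {
    private final Iterator<T> source;                  // yields items only when asked
    private final List<T> buffer = new ArrayList<>();  // items read so far, directly addressable

    LazyBufferedSequence(Iterator<T> source) {
        this.source = source;
    }

    /** Returns the item at 1-based position n, or null if the sequence is shorter than n. */
    T itemAt(int n) {
        // Pull from the source only as far as this request needs.
        while (buffer.size() < n && source.hasNext()) {
            buffer.add(source.next());
        }
        return (n >= 1 && n <= buffer.size()) ? buffer.get(n - 1) : null;
    }

    /** Number of items pulled from the source so far. */
    int itemsRead() {
        return buffer.size();
    }

    public static void main(String[] args) {
        LazyBufferedSequence<String> seq =
                new LazyBufferedSequence<>(Arrays.asList("a", "b", "c", "d").iterator());
        System.out.println(seq.itemAt(3));    // first request: reads three items
        System.out.println(seq.itemsRead());
        System.out.println(seq.itemAt(1));    // served from the buffer, nothing new is read
        System.out.println(seq.itemsRead());
    }
}
```

In this sketch a request for the last position forces the whole source to be read, after which every lookup is a direct index into the buffer.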
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Dimitre Novatchev - 2007-07-25 12:45:58
```
Thanks a lot. This was exactly the information I needed.

--
Cheers,
Dimitre Novatchev
```
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Dimitre Novatchev - 2007-07-25 13:18:03
```
> The first time $seq[N] is evaluated, the time will be proportional to N. On
> subsequent occasions it should be effectively instantaneous. A sequence held
> in a variable is generally evaluated "on demand"; each request evaluates as
> much of the sequence as is needed to satisfy that request, and saves what it
> has read in a directly-addressible buffer.

Therefore, if we need the same instantaneous (O(C)) access to any item of the sequence, would it be best to initially evaluate:

   $seq[count($seq)]

or just

   count($seq)

> A few special cases apply, for example if $seq was created as the tail of
> another sequence then it shares its memory.

That's great! Is my understanding correct that in this case, if the bigger sequence (of which $seq is a part) already has all of its items instantaneously accessible, then the items of $seq are instantaneously accessible too?

> unparsed-text() reads the whole of the identified resource into a
> StringValue which wraps a FastStringBuffer which wraps a char[] array. The
> first call of substring() on such a StringValue will scan the whole array to
> see whether it contains any UTF-16 surrogate pairs. If it does not, then
> subsequent calls on substring() will have constant performance; if it does,
> then subsequent calls on substring() will scan the array from the start each
> time to count character positions.

So, UTF-16 is the worst case...

Is it correct then that a file can be read lazily if it starts with a BOM specifying a non-UTF-16 encoding?

--
Cheers,
Dimitre Novatchev
```
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Colin Paul Adams - 2007-07-25 13:32:19
```
>>>>> "Dimitre" == Dimitre Novatchev writes:

    Dimitre> So, UTF-16 is the worst case...
    Dimitre> Is it correct then that a file can be read lazily if it
    Dimitre> starts with a BOM specifying a non-UTF-16 encoding?

If you know the encoding of the file, then you can specify it, so there is no need for it to start with a BOM. Which is just as well, as most encodings don't have BOMs.

--
Colin Adams
Preston Lancashire
```
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Michael Kay - 2007-07-25 13:57:04
```
> Therefore, if we need the same instantaneous (O(C)) access to
> any item of the sequence, would it be best to initially evaluate:
>
> $seq[count($seq)]
>
> or just
>
> count($seq)

Either would force full evaluation of the sequence, so they're essentially equivalent. But why not let Saxon's lazy strategy be used?

> > A few special cases apply, for example if $seq was created
> > as the tail of another sequence then it shares its memory.
>
> That's great! Is my understanding correct that in this case
> (if the bigger sequence (part of which is $seq) has already
> all of its items instantaneously accessible then) the items
> of $seq are instantaneously accessible?

I'd be reluctant to say that will always happen - there are a lot of special cases - but in general I think this should be true. I'm working on a problem from Andrew Welch at the moment that claims subsequence($seq, 2) is faster than remove($seq, 1), for example - they should be the same.

> > unparsed-text() ...
>
> So, UTF-16 is the worst case...

No, once it's in a char[] array it's in UTF-16 whatever the original encoding was. The worst case arises when the text contains non-BMP characters that occupy two char[] positions; this prevents direct indexing into the char[] array.

Michael Kay
http://www.saxonica.com/
```
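The effect of non-BMP characters on indexing can be shown with standard `java.lang.String` and `java.lang.Character` methods. This is a sketch of the general technique, not Saxon's implementation: the class `CodePointSubstring` and its method names are hypothetical. Without surrogates, character position and `char[]` index coincide, so a one-character substring is a direct lookup. With surrogates, the offset has to be found by counting code points from the start.

```java
/** Hypothetical sketch of surrogate-aware single-character lookup; not a Saxon class. */
final class CodePointSubstring {

    /** Scans the text once and reports whether it holds any UTF-16 surrogate code unit. */
    static boolean containsSurrogates(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            if (Character.isSurrogate(text.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    /** Returns the character at 1-based character position pos, as XPath substring(s, pos, 1) would. */
    static String characterAt(String text, int pos, boolean hasSurrogates) {
        if (!hasSurrogates) {
            // One char per character: constant-time direct indexing.
            return text.substring(pos - 1, pos);
        }
        // Some characters occupy two chars: count code points from the start.
        int start = text.offsetByCodePoints(0, pos - 1);
        int end = text.offsetByCodePoints(start, 1);
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String plain = "abc";
        String mixed = "a\uD835\uDD38b";   // 'a', U+1D538 (a surrogate pair), 'b'
        System.out.println(containsSurrogates(plain));            // false
        System.out.println(containsSurrogates(mixed));            // true
        System.out.println(characterAt(mixed, 3, true));          // b
        System.out.println(characterAt(mixed, 2, true).length()); // 2
    }
}
```

Caching the result of the one-time scan, as the `hasSurrogates` parameter imitates here, is what lets later lookups on surrogate-free text skip the counting step.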
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Dimitre Novatchev - 2007-07-25 14:19:23
```
> But why not let Saxon's lazy strategy be used?

There are cases when someone may want an array-like datatype with the same O(c) access time to all the members guaranteed.

Further, this will make any item addressable by its index, so one could use the index essentially as a pointer:

   $MEM[ $i ]

This would not have been needed if XSLT had an inverse of generate-id() (and anyway, generate-id() only works on nodes).

Maybe a generalisation of generate-id() could be specified by EXSLT:

   genID($anything)

returning a handle (hate to use the word "pointer"), and a

   get($handle)

such that

   get( genID($something)) = $something

--
Cheers,
Dimitre Novatchev
```
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Michael Kay - 2007-07-25 14:29:58
```
> There are cases when someone may want an array-like datatype
> with the same O(c) access time to all the members guaranteed.

Well, there's going to be a fixed time T to create the array, and then a constant time t to access each member. If you use lazy evaluation, the only difference is that the cost T will be distributed so that some of the member access values are increased to t+T/x. The total time will be the same or smaller, so how can you tell the difference? You're going to get some variations due to things like garbage collection anyway.

> Maybe a generalisation of generate-id() could be specified by EXSLT:
>
> genID($anything) returning a handle (hate to use the word
> "pointer")

Sorry, don't understand your use case here at all. Surely if generate-id() returns anything for an atomic value, it would return that value? Atomic values and sequences don't have an identity, they only have a value, so the notion of an id (or pointer) seems flawed.

Michael Kay
http://www.saxonica.com/
```
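The accounting argument in this reply can be written out as a short worked inequality. It is a sketch under an assumption the thread does not state: that building the buffer costs the same for every item, so that reading the first m of N items costs T·m/N. The symbols n (number of accesses), p_i (the position requested by access i) and m (the furthest position ever requested) are introduced here for illustration only.

```latex
\begin{align*}
\text{eager total} &= T + n\,t \\
m &= \max_{1 \le i \le n} p_i \;\le\; N \\
\text{lazy total} &= T\,\frac{m}{N} + n\,t \;\le\; T + n\,t
\end{align*}
```

Under that assumption the two totals agree exactly when some access reaches the last item (m = N), and the lazy total is smaller otherwise. What lazy evaluation changes is which individual accesses carry the construction cost.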
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Dimitre Novatchev - 2007-07-25 16:12:05
```
> Well, there's going to be a fixed time T to create the array, and then a
> constant time t to access each member. If you use lazy evaluation, the only
> difference is that the cost T will be distributed so that some of the member
> access values are increased to t+T/x. The total time will be the same or
> smaller, so how can you tell the difference?

Some people need to be able to time their implementations in a reliable/consistent way. Having an accidental T/x distorts the picture (and will distort it most of the time).

> Sorry, don't understand your use case here at all. Surely if generate-id()
> returns anything for an atomic value, it would return that value? Atomic
> values and sequences don't have an identity, they only have a value, so the
> notion of an id (or pointer) seems flawed.

If the value is a 2MB string, then why not identify this 2MB string with something much more compact (say with an integer)?

The same goes for a sequence of 2000000 items.

--
Cheers,
Dimitre Novatchev
```
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Dimitre Novatchev - 2007-07-25 16:36:01
```
> If the value is a 2MB string, then why not identify this 2MB string
> with something much more compact (say with an integer)?
>
> The same goes for a sequence of 2000000 items.

And if we need to persist a *reference* to this 2MB string, let's say as a node in an XML document, and later retrieve this reference when we need to do something with the string, we must copy the whole 2MB if there is no way to provide just a short reference.
```
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Andrew Welch - 2007-07-29 13:30:24
```
On 7/25/07, Dimitre Novatchev wrote:
> If the value is a 2MB string, then why not identify this 2MB string
> with something much more compact (say with an integer)?
>
> The same goes for a sequence of 2000000 items.

Hi Dimitre,

I'm not too sure what your goal is here, but I think a simple extension function might be what you're requesting:

import java.util.HashMap;

public class HashMapExt {

    // Maps each generated integer ID back to the stored item.
    private static HashMap<Integer, Object> hashmap = new HashMap<Integer, Object>();

    // Stores the item under its hash code and returns that hash code as the ID.
    public static int put(Object item) {
        int hc = item.hashCode();
        hashmap.put(hc, item);
        return hc;
    }

    // Returns the item previously stored under the given ID.
    public static Object get(int key) {
        return hashmap.get(key);
    }
}

and then:

produces:

abc def

The extension takes the item and returns the hash code for it - the simple integer representation you were after. The $stringIDs variable holds a sequence of integers, and the get() method returns the original strings given the hash code.

Is that what you meant?

--
http://andrewjwelch.com
```
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Dimitre Novatchev - 2007-07-29 14:37:30
```
Hi Andrew,

Yes, this code will do the work, especially if item.hashCode() can be guaranteed to produce different values for different "item"s.

Cheers,
Dimitre
```
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Florent Georges - 2007-07-29 16:44:32
```
Andrew Welch wrote:

Hi

> public static int put(Object item) {
>     int hc = item.hashCode();
>     hashmap.put(hc, item);
>     return hc;
> }

> public static Object get(int key) {
>     return hashmap.get(key);
> }

That won't work, as two different Java objects can have the same hash code. It should be possible to generate unique IDs externally (not from a method of the object itself), but I don't have an idea off the top of my head (I guess that if this is possible, it should be easy to find a solution with your favorite search engine).

Regards,

--drkm
```
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Andrew Welch - 2007-07-29 18:51:17
```
On 7/29/07, Florent Georges wrote:
> That won't work as two different Java objects can have the same hash
> code. That should be possible to generate unique IDs externally (not
> from a method of the object itself), but I don't have an idea out of
> the top of my head (I guess that if this is possible, that should be
> easy to find a solution with your favorite search engine).

hashCode() is _the_ way to generate an ID for an object (which is probably why you can't think of anything better....!)

--
http://andrewjwelch.com
```
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Dimitre Novatchev - 2007-07-29 19:31:22
```
On 7/29/07, Andrew Welch wrote:
> hashCode() is _the_ way to generate an ID for an object (which is
> probably why you can't think of anything better....!)

Florent was saying (if I understand him right) that if we have two objects of two different types, it may happen that

   o1.hashCode() == o2.hashCode()

--
Cheers,
Dimitre Novatchev
```
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Andrew Welch - 2007-07-29 19:39:41
```
On 7/29/07, Dimitre Novatchev wrote:
> Florent was saying (if I understand him right) that if we have two
> objects of two different types, it may happen that
>
> o1.hashCode() == o2.hashCode()

Sure - it depends on the object and the quality of the hashing algorithm - if you are talking just about strings then you shouldn't have too many problems, unless you would like two instances of the same string to have two different ids.

What are the requirements?

--
http://andrewjwelch.com
```
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Florent Georges - 2007-07-29 20:29:28
```
Andrew Welch wrote:
> On 7/29/07, Dimitre Novatchev wrote:

Hi

> > Florent was saying (if I understand him right) that if
> > we have two objects of two different types, it may
> > happen that
> > o1.hashCode() == o2.hashCode()

> Sure - it depends on the object and the quality of the
> hashing algorithm - if you are talking just about
> strings then you shouldn't have too many problems, unless
> you would like two instances of the same string to have
> two different ids.

> What are the requirements?

Dimitre was talking about two functions genID() and get() that would help to reference any item by an atomic value and later retrieve the item from this value:

>>> Maybe a generalisation of generate-id() could be
>>> specified by EXSLT:
>>>    genID($anything) returning a handle
>>>    (hate to use the word "pointer")
>>> and a
>>>    get($handle)
>>> such that
>>>    get( genID($something)) = $something

You proposed to use hashCode() (and a map to store the associations between items and handles). I just pointed out that hashCode() is not suitable to solve this problem (at least not like that). Because if you have two different items whose Java objects' hashCode() returns the same value, you will have:

   get(genID($i1)) = get(genID($i2))

which is obviously wrong (ok, because you can't predict the order of evaluation, you *can* have the above equality, or not, which is worse).

Actually, I don't think this is a trivial problem. How to deal with atomic values, for instance? (Items in XPath, except nodes, don't have identity, but Java objects do.)

I could see a need for such behaviour, but only in cases where sequence "pointers" or nested sequences would solve the problem. And I think these would be far easier to implement (at least the former).

Regards,

--drkm
```
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Michael Kay - 2007-07-29 20:21:50
```
Andrew Welch wrote (addressing Dimitre):

> I'm not too sure what your goal is here...

I think you're not alone, and I think it's premature to propose solutions until the goal is understood.

Dimitre wrote:

> Yes, this code will do the work... if hashCode() can be guaranteed to
> produce different values for different "item"s

But what does "different" mean? Perhaps "produces false or an error when compared using the XPath eq operator under the default collation" (or the codepoint collation, perhaps?). Or in the case of nodes, produces false when compared using the "is" operator. Or perhaps the comparison will use deep-equal(). We can only guess.

Michael Kay
http://www.saxonica.com/
```
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Dimitre Novatchev - 2007-07-29 20:34:49
```
Yes,

It is premature to go deeper into this. I got the answers to my initial questions from Dr. Kay and I thank him again for this.

I was able to build a prototype of "addressable memory" in pure XSLT, which models an array of item(), and every item in the array is addressable by its index in this array. The initial population may take quite long, but after that the access to any item using its index is very fast.

Thanks again,
Dimitre
```
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Trevor Nash - 2007-07-29 23:07:25
```
Andrew Welch wrote:
> hashCode() is _the_ way to generate an ID for an object (which is
> probably why you can't think of anything better....!)

Nonsense. hashCode() returns an int, so there are only 2^32 possible different values.

To take strings as an example (never mind general XML objects), these can be any length, so far more than 2^32 of them are possible even in a finite program. So no algorithm can assign a different hash code to each one of them.

The point about hashCode is that it guarantees to generate the same number given the same value, and it tries to produce a random result given random values. It does not, and cannot, guarantee to give different values for different objects. The best it can do is approach a probability of 1/2^32 that two different objects produce the same value.

If you tell me you have built programs using your assertion and they work, I will believe you. You just haven't hit the 2^32 case yet.

Trevor

--
Melvaig Software Engineering Limited
voice: +44 (0) 1445 771363
email: tcn@...
web: http://www.melvaig.co.uk
Registered in Scotland No 194737
5 Melvaig, Gairloch, Ross-shire IV21 2EA
```
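A collision does not take anything like 2^32 strings to find. The sketch below re-creates the hash-code-keyed map proposed earlier in the thread under a different, hypothetical class name (`HashCollisionDemo`) and feeds it the two-character strings "Aa" and "BB", which `java.lang.String.hashCode()` maps to the same value (65·31 + 97 = 66·31 + 66 = 2112).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical demo of a hash-code-keyed store losing an item on collision. */
final class HashCollisionDemo {
    private static final Map<Integer, Object> store = new HashMap<>();

    /** Stores the item under its hash code and returns the hash code as its ID. */
    static int put(Object item) {
        int hc = item.hashCode();
        store.put(hc, item);   // silently replaces any earlier item with the same hash code
        return hc;
    }

    /** Returns whatever item is currently stored under the given ID. */
    static Object get(int key) {
        return store.get(key);
    }

    public static void main(String[] args) {
        int id1 = put("Aa");
        int id2 = put("BB");
        System.out.println(id1 == id2);   // prints "true": both IDs are 2112
        System.out.println(get(id1));     // prints "BB": "Aa" can no longer be retrieved
    }
}
```

The second `put` overwrites the first entry, so `get(genID($i1))` returns the wrong item, which is the failure Florent described.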
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Andrew Welch - 2007-07-30 08:58:34
```
On 7/30/07, Trevor Nash wrote:
> The point about hashCode is that it guarantees to generate the same
> number given the same value, and it tries to produce a random result
> given random values. It does not, and cannot, guarantee to give
> different values for different objects.
>
> If you tell me you have built programs using your assertion and they
> work, I will believe you. You just haven't hit the 2^32 case yet.

The problem as I understood it was to generate an integer ID for a String - a generate-id() for Strings if you like - and perhaps even one that persisted across runs. If you can do a better job than String.hashCode() then please let me know.

(I completely agree with you by the way - I'm sure you'll agree with me given the requirements)

--
http://andrewjwelch.com
```
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Dimitre Novatchev - 2007-07-30 12:50:37 ```> The problem as I understood it was to generate an integer ID for a > String - a generate-id() for Strings if you like - and perhaps even one > that persisted across runs. Actually no. The problem was to generate an Id for any item() (of any type). I have found a solution to a relaxed problem and it serves my purpose. The relaxed problem doesn't require returning the same Id for two items if they are "equal". It leaves the responsibility to the caller. This means that if I use this feature twice for the string "Hello", I will get two different Ids, and I know what I am doing -- the fact that I need to Id these two identical strings twice means that for me they are "different". This is exactly the same as when you put the string "Hello" in memory twice. The two copies have different addresses and can be used independently of one another. The fact that I have allocated two of them means that I need it this way. -- Cheers, Dimitre Novatchev ```
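The relaxed scheme described in the message above can be sketched in Java as follows. This is only an illustration under assumptions: `HandleRegistry`, `register` and `lookup` are hypothetical names, and nothing here is Saxon's API or the poster's actual implementation. Each registration appends the item to a list and returns its index, so two equal values registered twice receive two different Ids.

```java
import java.util.ArrayList;
import java.util.List;

final class HandleRegistry {

    // Registered items, in registration order; the list index is the handle.
    private final List<Object> items = new ArrayList<>();

    /** Stores the item and returns a fresh handle, even if an equal item was registered before. */
    int register(Object item) {
        items.add(item);
        return items.size() - 1;
    }

    /** Returns the item that was registered under the given handle. */
    Object lookup(int handle) {
        return items.get(handle);
    }

    public static void main(String[] args) {
        HandleRegistry registry = new HandleRegistry();
        int first = registry.register("Hello");
        int second = registry.register("Hello");
        // Equal values, but two distinct handles: the caller decides what counts as "the same".
        System.out.println(first != second);
        System.out.println(registry.lookup(first).equals(registry.lookup(second)));
    }
}
```

Because the handle is just a position in a sequence, it stays small however large the registered value is, and uniqueness does not depend on any hash function.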
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Michael Kay - 2007-07-25 16:46:54 ```There are plenty of use cases for richer data types in XPath, for example dictionaries or nested sequences or references to sequences. I'm reluctant to add things piecemeal without careful consideration of what the future language direction is likely to be. Michael Kay http://www.saxonica.com/ > -----Original Message----- > From: saxon-help-bounces@... > [mailto:saxon-help-bounces@...] On Behalf > Of Dimitre Novatchev > Sent: 25 July 2007 17:36 > To: Mailing list for SAXON XSLT queries > Subject: Re: [saxon] Questions about the implementation of > sequences in Saxon > > > If the value is a 2MB string, then why not identify this 2MB string > > with something much more compact (say with an integer)? > > > > The same goes for a sequence of 2000000 items. > > > And if we need to persist a *reference* to this 2MB string, > let's say as a node in an xml document and later on retrieve > this reference when we need to do something with this string, > we must copy the whole 2MB, in case there isn't a way to > provide just a short reference. ```
 Re: [saxon] Questions about the implementation of sequences in Saxon From: Dimitre Novatchev - 2007-07-25 17:29:57 ```On 7/25/07, Michael Kay wrote: > There are plenty of use cases for richer data types in XPath, for example > dictionaries or nested sequences or references to sequences. I'm reluctant > to add things piecemeal without careful consideration of what the future > language direction is likely to be. Then the only remaining way to refer to an item of *any* type is by using its index in a sequence. I hope this answers your question why a "handle" to any type of item was needed. -- Cheers, Dimitre Novatchev ```