From: Richard J. <ri...@an...> - 2004-07-25 15:43:31
|
On Sun, Jul 25, 2004 at 05:20:57PM +0200, Falk Hueffner wrote: > Hi, > > I have an alternate implementation of bitsets which stores the length > explicitely, thereby avoiding the costly String.length and improving > run time of calculating the first 10000000 primes from 3.43s to 2.33s. > Is there general interest in this? If so, I could add the missing > functions from BitSet.mli and submit it... I don't doubt that it runs faster, but 'String.length' can't be the reason. In OCaml this is O(1), not O(n) as in C. Rich. -- Richard Jones. http://www.annexia.org/ http://www.j-london.com/ Merjis Ltd. http://www.merjis.com/ - improving website return on investment PTHRLIB is a library for writing small, efficient and fast servers in C. HTTP, CGI, DBI, lightweight threads: http://www.annexia.org/freeware/pthrlib/ |
From: Falk H. <hue...@in...> - 2004-07-25 15:58:43
|
Richard Jones <ri...@an...> writes: > On Sun, Jul 25, 2004 at 05:20:57PM +0200, Falk Hueffner wrote: >> I have an alternate implementation of bitsets which stores the length >> explicitely, thereby avoiding the costly String.length and improving >> run time of calculating the first 10000000 primes from 3.43s to 2.33s. >> Is there general interest in this? If so, I could add the missing >> functions from BitSet.mli and submit it... > > I don't doubt that it runs faster, but 'String.length' can't be the > reason. In OCaml this is O(1), not O(n) as in C. Well, as you mentioned in the other mail, while O(1), it's somewhat more costly than a simple memory read. Another reason why my version is faster is that the extlib version contains: let pos , delta = x lsr log_int_size , x land int_size in for which Ocaml will happily allocate a pair on the minor heap. -- Falk |
From: Nicolas C. <war...@fr...> - 2004-07-26 19:11:29
|
> > On Sun, Jul 25, 2004 at 05:20:57PM +0200, Falk Hueffner wrote: > >> I have an alternate implementation of bitsets which stores the length > >> explicitely, thereby avoiding the costly String.length and improving > >> run time of calculating the first 10000000 primes from 3.43s to 2.33s. > >> Is there general interest in this? If so, I could add the missing > >> functions from BitSet.mli and submit it... > > > > I don't doubt that it runs faster, but 'String.length' can't be the > > reason. In OCaml this is O(1), not O(n) as in C. > > Well, as you mentioned in the other mail, while O(1), it's somewhat > more costly than a simple memory read. It's actually a memory read, only with an additionnal shift that can't make such big difference. > Another reason why my version is faster is that the extlib version > contains: > > let pos , delta = x lsr log_int_size , x land int_size in > > for which Ocaml will happily allocate a pair on the minor heap. That's correct. it is optimized now on the CVS. You can check your benchmark again, please do not forgot also to copy the CMX file in the /ocaml/lib directory (the install script is only recently doing this) in order to get cross-modules inlinings. Nicolas Cannasse |
From: Falk H. <hue...@in...> - 2004-07-27 08:04:54
|
"Nicolas Cannasse" <war...@fr...> writes: >> > On Sun, Jul 25, 2004 at 05:20:57PM +0200, Falk Hueffner wrote: >> >> I have an alternate implementation of bitsets which stores the length >> >> explicitely, thereby avoiding the costly String.length and improving >> >> run time of calculating the first 10000000 primes from 3.43s to 2.33s. >> > >> > I don't doubt that it runs faster, but 'String.length' can't be the >> > reason. In OCaml this is O(1), not O(n) as in C. >> >> Well, as you mentioned in the other mail, while O(1), it's somewhat >> more costly than a simple memory read. > > It's actually a memory read, only with an additionnal shift that > can't make such big difference. Well, for example on Alpha, String.length is: ldq t6,-8(a0) lda t7,1 srl t6,0xa,t5 s8subq t5,t7,t4 addq a0,t4,t3 lda at,0(t3) ldq_u t2,0(at) extbl t2,at,t2 subq t4,t2,t1 addq t1,t1,t0 addq t0,0x1,v0 while type t = { len: int; data: string };; let f x = x.len is ldq v0,0(a0) >> Another reason why my version is faster is that the extlib version >> contains: >> >> let pos , delta = x lsr log_int_size , x land int_size in >> >> for which Ocaml will happily allocate a pair on the minor heap. > > That's correct. it is optimized now on the CVS. > You can check your benchmark again, please do not forgot also to copy the > CMX file in the /ocaml/lib directory (the install script is only recently > doing this) in order to get cross-modules inlinings. The difference is now 2.30s vs 2.88s on an Alpha EV68, that is, 25%. -- Falk |
From: Richard J. <ri...@an...> - 2004-07-27 08:33:05
|
On Tue, Jul 27, 2004 at 10:04:34AM +0200, Falk Hueffner wrote: > Well, for example on Alpha, String.length is: > ldq t6,-8(a0) > lda t7,1 > srl t6,0xa,t5 > s8subq t5,t7,t4 > addq a0,t4,t3 > lda at,0(t3) > ldq_u t2,0(at) > extbl t2,at,t2 > subq t4,t2,t1 > addq t1,t1,t0 > addq t0,0x1,v0 > > while type t = { len: int; data: string };; let f x = x.len is > > ldq v0,0(a0) Don't forget that storing the extra int isn't free either. It increases pressure on the cache, particularly for programs which store many short strings. However because it is much less easy to measure, memory pressure is often wrongly ignored. Rich. -- Richard Jones. http://www.annexia.org/ http://www.j-london.com/ Merjis Ltd. http://www.merjis.com/ - improving website return on investment MOD_CAML lets you run type-safe Objective CAML programs inside the Apache webserver. http://www.merjis.com/developers/mod_caml/ |
From: Falk H. <hue...@in...> - 2004-07-27 09:25:04
|
Richard Jones <ri...@an...> writes: > Don't forget that storing the extra int isn't free either. It > increases pressure on the cache, particularly for programs which > store many short strings. Well, Java saves 3 extra words per string ;-) But seriously, you're right, for strings this isn't all that bad, although on 64 bit platforms, it seems to me it should have been possible to use 3 more bits in the tag word. For bit sets, I still think storing the length explicitely is the right thing, since it improves setting and getting bits noticeably, which are much more important operations than setting and getting single characters, and because it seems less likely to me to have lots of small bit sets than lots of small strings. (For my particular application, I need it anyway, since I need "find next 0 bit"...) -- Falk |
From: Nicolas C. <war...@fr...> - 2004-07-27 11:59:19
|
> > Don't forget that storing the extra int isn't free either. It > > increases pressure on the cache, particularly for programs which > > store many short strings. > > Well, Java saves 3 extra words per string ;-) But seriously, you're > right, for strings this isn't all that bad, although on 64 bit > platforms, it seems to me it should have been possible to use 3 more > bits in the tag word. > > For bit sets, I still think storing the length explicitely is the > right thing, since it improves setting and getting bits noticeably, > which are much more important operations than setting and getting > single characters, and because it seems less likely to me to have lots > of small bit sets than lots of small strings. (For my particular > application, I need it anyway, since I need "find next 0 bit"...) > > -- > Falk I commited the changes, using an extra field to store len. Please tell me about it. Regards, Nicolas Cannasse |
From: Markus M. <ma...@oe...> - 2004-07-27 10:02:27
|
On Tue, 27 Jul 2004, Richard Jones wrote: > Don't forget that storing the extra int isn't free either. It > increases pressure on the cache, particularly for programs which store > many short strings. However because it is much less easy to measure, > memory pressure is often wrongly ignored. Not only this: the GC will have more work, because it has to chase another pointer from the structure to the string, and because there is another word (the length field) to scan in this block. Regards, Markus -- Markus Mottl http://www.oefai.at/~markus ma...@oe... |
From: skaller <sk...@us...> - 2004-07-25 19:54:51
|
On Mon, 2004-07-26 at 01:43, Richard Jones wrote: > On Sun, Jul 25, 2004 at 05:20:57PM +0200, Falk Hueffner wrote: > > Hi, > > > > I have an alternate implementation of bitsets which stores the length > > explicitely, thereby avoiding the costly String.length and improving > > run time of calculating the first 10000000 primes from 3.43s to 2.33s. > > Is there general interest in this? If so, I could add the missing > > functions from BitSet.mli and submit it... > > I don't doubt that it runs faster, but 'String.length' can't be the > reason. In OCaml this is O(1), not O(n) as in C. That's irrelevant. I can tell you on a profile of Felix, after the horrid 60% CPU doing polymorphic comparisons, almost the next most expensive function after GC routines was .. String.length. Yes, the code generator does a lot of ugly string concatenations like: a ^ b ^ c ^ d ^ e .. and I'm sure I could improve the performance, however .. My guess is .. every single node in my term data structures contains a source reference, including a filename, which is a string .. so String.length was being called heaps and heaps by the polymorphic comparison.. I'm not claiming you're wrong that this isn't the problem in BitSet .. just that your argument might be suspect: O(1) and fast doesn't make the total overhead small: the number of calls counts too, and also where they're located (cache misses etc ..) -- John Skaller, mailto:sk...@us... voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net |