From: Prakash C. <Pra...@im...> - 2004-03-19 12:15:26
Attachments:
bitSet.ml
bitSet.mli
|
Hello ! I made a few changes on the files bitSet.ml and bitSet.mli (see attachements). I : 1 - corrected a few orthograph mistakes (I hope I didn't added new ones) 2 - corrected an "lsl" -> "lsr" (the size of a BitSet was 8 times bigger instead of 8 times smaller) 3 - suggested the name "copy" which is more usual than "clone" 4 - used "blit_string" and "fill_string" for more efficiency 5 - added some functions on sets I hope there are no more mistakes. I have got a question : isn't it possible to use the 32 bits of an int (or 64 bits, depending on the architecture) for efficients BitSets ? The String functions seem to work on 8 bits only. TIA for your answers, Prakash |
From: Nicolas C. <war...@fr...> - 2004-03-19 12:28:21
|
> Hello ! > > I made a few changes on the files bitSet.ml and bitSet.mli (see > attachements). I : > > 1 - corrected a few orthograph mistakes (I hope I didn't added new ones) > 2 - corrected an "lsl" -> "lsr" (the size of a BitSet was 8 times bigger > instead of 8 times smaller) > 3 - suggested the name "copy" which is more usual than "clone" > 4 - used "blit_string" and "fill_string" for more efficiency > 5 - added some functions on sets > > I hope there are no more mistakes. I have got a question : isn't it > possible to use the 32 bits of an int (or 64 bits, depending on the > architecture) for efficients BitSets ? The String functions seem to work > on 8 bits only. > > TIA for your answers, Thanks, I'll review theses changes. Concerning the usage of int, OCaml runtime representation is 31 bits. They are unboxed and shifted by 1 and the lowest bit is set to 1 so it can't be mistaken with an address since all addresses are always aligned on 2 bytes (even 4 on most of 32-bits architectures) . So we could use ints as 31 bits but we would then need to div/mod to get the correct integer. Another nice thing about strings is that their contents are not scanned by the GC while Array of ints are. Regards, Nicolas Cannasse |
From: Brian H. <bh...@sp...> - 2004-03-19 18:49:01
|
On Fri, 19 Mar 2004, Nicolas Cannasse wrote: > Concerning the usage of int, OCaml runtime representation is 31 bits. They > are unboxed and shifted by 1 and the lowest bit is set to 1 so it can't be > mistaken with an address since all addresses are always aligned on 2 bytes > (even 4 on most of 32-bits architectures) . So we could use ints as 31 bits > but we would then need to div/mod to get the correct integer. Another nice > thing about strings is that their contents are not scanned by the GC while > Array of ints are. > Having looked at implementing bitset myself, I came to the following conclusions: 1) bitset and big int support should be combined, much like an int can be viewed both as a small integer and as a small bitset, depending. 2) Both should be built on top of gmp. 3) GMP doesn't do everything I want bitset to be able to do yet. Most everything, but not everything (bitblit doesn't exist). 4) I'm on-again off-again trying to get gmp to accept patches to let it do what I want it to, but I'm getting push-back. I'm tempted to just provide the wrappers for the existing gmp functionality, and support the rest when it shows up. Thoughts? -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |
From: Nicolas C. <war...@fr...> - 2004-03-20 10:23:48
|
> > Concerning the usage of int, OCaml runtime representation is 31 bits. They > > are unboxed and shifted by 1 and the lowest bit is set to 1 so it can't be > > mistaken with an address since all addresses are always aligned on 2 bytes > > (even 4 on most of 32-bits architectures) . So we could use ints as 31 bits > > but we would then need to div/mod to get the correct integer. Another nice > > thing about strings is that their contents are not scanned by the GC while > > Array of ints are. > > > > Having looked at implementing bitset myself, I came to the following > conclusions: > > 1) bitset and big int support should be combined, much like an int can be > viewed both as a small integer and as a small bitset, depending. > > 2) Both should be built on top of gmp. > > 3) GMP doesn't do everything I want bitset to be able to do yet. Most > everything, but not everything (bitblit doesn't exist). > > 4) I'm on-again off-again trying to get gmp to accept patches to let it > do what I want it to, but I'm getting push-back. > > I'm tempted to just provide the wrappers for the existing gmp > functionality, and support the rest when it shows up. Thoughts? GMP is C, ExtLib is OCaml If you want to provide a 3rd party library having very efficient bitsets and big ints based on GMP, then I think it's nice idea. But since we agreed in the beginning that ExtLib will not have C parts (since with that comes a lot of compability, compilation, and installation problems) then I can't see that being put into ExtLib. Current bitsets are useful and enough efficient for the average user. Regards, Nicolas Cannasse |
From: Brian H. <bh...@sp...> - 2004-03-20 19:25:10
|
On Sat, 20 Mar 2004, Nicolas Cannasse wrote: > GMP is C, ExtLib is OCaml > If you want to provide a 3rd party library having very efficient bitsets and > big ints based on GMP, then I think it's nice idea. But since we agreed in > the beginning that ExtLib will not have C parts (since with that comes a lot > of compability, compilation, and installation problems) then I can't see > that being put into ExtLib. Current bitsets are useful and enough efficient > for the average user. > A point I had forgotten- my apologies. Also, thinking on it for a bit, gmp has development problems on Windows- I'm unsure of how well it plays with Microsoft's compiler. You're right- it belongs in it's own optional library. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |
From: Nicolas C. <war...@fr...> - 2004-03-20 10:03:46
|
> I made a few changes on the files bitSet.ml and bitSet.mli (see > attachements). I : > > 1 - corrected a few orthograph mistakes (I hope I didn't added new ones) > 2 - corrected an "lsl" -> "lsr" (the size of a BitSet was 8 times bigger > instead of 8 times smaller) > 3 - suggested the name "copy" which is more usual than "clone" > 4 - used "blit_string" and "fill_string" for more efficiency > 5 - added some functions on sets > > I hope there are no more mistakes. I have got a question : isn't it > possible to use the 32 bits of an int (or 64 bits, depending on the > architecture) for efficients BitSets ? The String functions seem to work > on 8 bits only. Thanks, The CVS have been updated with your changes. To answer more precisely to your questions : - we could actualy use ints as 16 bits values if we add the API to extract them from strings, but we don't :) - we can't use Pervasives.compare because bitsets might be of different sizes but are actually the same integer - I disabled the "deprecated" on "clone", having both copy and clone is okay. Regards, Nicolas Cannasse |
From: Prakash C. <Pra...@im...> - 2004-03-20 11:05:02
|
Nicolas Cannasse wrote: > Thanks, You're welcome :) > The CVS have been updated with your changes. To answer more precisely > to your questions : > - we could actualy use ints as 16 bits values if we add the API to > extract them from strings, but we don't :) Why not to add this API ? It might also be useful in to improve other data structures. And why only 16 bits and not 32 ? > - I disabled the "deprecated" on "clone", having both copy and clone > is okay. Doesn't matter, it was only a suggestion to avoid too much duplicate functions. > - we can't use Pervasives.compare because bitsets might be of > different sizes but are actually the same integer I didn't want to say that Pervasives.compare should be used directly. I suggested something like that : (************************************************************) let extend t old_size new_size = let b = bcreate new_size in bblit !t 0 b 0 old_size; bfill b old_size (new_size - old_size) 0; b let compare t1 t2 = let size1 , size2 = blen !t1 , blen !t2 in if size1 < size2 then Pervasives.compare (extend t1 size1 size2) !t2 else if size1 > size2 then Pervasives.compare !t1 (extend t2 size2 size1) else Pervasives.compare !t1 !t2 (************************************************************) And the function 'extend' can be used in both 'set' and 'toggle' : (************************************************************) let set t x = if x < 0 then error "set"; let pos , delta = x lsr log_int_size , x land int_size in let size = blen !t in if pos >= size then t := extend t size (pos+1); bset !t pos ((bget !t pos) lor (1 lsl delta)) let toggle t x = if x < 0 then error "toggle"; let pos , delta = x lsr log_int_size , x land int_size in let size = blen !t in if pos >= size then t := extend t size (pos+1); bset !t pos ((bget !t pos) lxor (1 lsl delta)) (************************************************************) Bye, -- Prakash |
From: Nicolas C. <war...@fr...> - 2004-03-21 10:02:32
|
> > Thanks, > > You're welcome :) > > > The CVS have been updated with your changes. To answer more precisely > > to your questions : > > - we could actualy use ints as 16 bits values if we add the API to > > extract them from strings, but we don't :) > > Why not to add this API ? It might also be useful in to improve other > data structures. And why only 16 bits and not 32 ? - We have to write C functions for that API. - 16 bits are currently handled by a single unboxed OCaml int. Using all 32-bits is possible if we box integers -> that means that we will alloc/free a lot of blocks around, that's not efficient. > > - we can't use Pervasives.compare because bitsets might be of > > different sizes but are actually the same integer > > I didn't want to say that Pervasives.compare should be used directly. I > suggested something like that : > > (************************************************************) > let extend t old_size new_size = > let b = bcreate new_size in > bblit !t 0 b 0 old_size; > bfill b old_size (new_size - old_size) 0; > b > > let compare t1 t2 = > let size1 , size2 = blen !t1 , blen !t2 in > if size1 < size2 then > Pervasives.compare (extend t1 size1 size2) !t2 > else if size1 > size2 then > Pervasives.compare !t1 (extend t2 size2 size1) > else > Pervasives.compare !t1 !t2 In this case, I think custom compare function is better since it doesn't require an allocation + a copy. There is not so big speed difference between (correctly written) OCaml code and C when compiled in native mode :) Nicolas Cannasse |