From: Janne H. <ja...@hy...> - 2004-12-18 18:46:15
|
I was writing a test for Extlib's BitSet and encountered a weird case. The following does not seem to work as it is supposed to work (intersection of a set with the empty set produces a non-empty set). Run function test_rnd_creation for the buggy behaviour. For some reason test_intersect does not exhibit the same behaviour. A quick glance at the BitSet.intersect would suggest that it is reading past the second argument bit array (bget uses string_unsafe_get). I incorporated this into my tester, but didn't enable it yet. Apologies if this has already been fixed in CVS. I'm using the released 1.3 version. Here is the source for the test case: <snip> module B = BitSet let bitset_of_int n = assert (n <= (1 lsl 29)); let s = B.create 30 in for i = 0 to 29 do if (n land (1 lsl i)) <> 0 then B.set s i done; s let int_of_bitset s = let n = ref 0 in for i = 0 to 29 do if B.is_set s i then n := !n lor (1 lsl i) done; !n let test_rnd_creation () = for i = 0 to 255 do let r1 = Random.int (1 lsl 28) in let s = bitset_of_int r1 in let c = B.copy s in assert (int_of_bitset s = r1); assert (c = s); assert (B.compare c s = 0); B.unite c s; assert (c = s); B.intersect c (B.empty ()); assert (B.count c = 0); done let test_intersect () = for i = 0 to 2550 do let s = bitset_of_int (Random.int 1 lsl 28) in B.intersect s (B.empty ()); assert (B.count s = 0) done </snip> ciao, janne |
From: Nicolas C. <war...@fr...> - 2004-12-19 10:43:19
|
> I was writing a test for Extlib's BitSet and encountered a weird case. > The following does not seem to work as it is supposed to work > (intersection of a set with the empty set produces a non-empty set). Looks like all intersect, unite, and friends have serious flaws. They have been posted on this list by some people, and I added them without checking. Next time I will ;) The functions themselves are not designed correctly, I would like to change them from : val intersect : t -> t -> unit to : val intersect : t -> t -> t which will produce a new bitset. Sounds more logical to me in Ocaml world. Any volunteer for implementation ? Real Life takes quite a lot of my time too ;) Nicolas Cannasse |
From: Janne H. <ja...@hy...> - 2004-12-19 11:34:40
|
> Looks like all intersect, unite, and friends have serious flaws. > They have been posted on this list by some people, and I added them without > checking. Next time I will ;) Yep, that's what I figured as well. But decided I will point my finger at only the ones that I have a tester ready :) I am not yet 100% sure, but it also seems that BitSet.compare is broken. According to the docs, it is not absolutely clear how the comparison should be interpreted, but to me a sensible compare would work the same way as forming a bignum of the bitset and comparing the bignums. Currently this does not appear to be the case. > The functions themselves are not designed correctly, I would like to change > them from : > > val intersect : t -> t -> unit > > to : > > val intersect : t -> t -> t > > which will produce a new bitset. Sounds more logical to me in Ocaml world. > Any volunteer for implementation ? Yes, this is one nitpick that I had in mind. Is it possible to just go and change the API on these? I also think that "union" would be a nicer name "unite" and "difference" (or "subtract") instead of "differentiate". In my mind differentiate associates to calculating derivatives. I might have a little time during the next couple of weeks. So if there's no hurry, perhaps I could volunteer for making these changes. ciao, janne |
From: Bardur A. <oca...@sc...> - 2004-12-19 12:11:59
|
On Sun, Dec 19, 2004 at 01:34:26PM +0200, Janne Hellsten wrote: > > >Looks like all intersect, unite, and friends have serious flaws. > >They have been posted on this list by some people, and I added them without > >checking. Next time I will ;) > > Yep, that's what I figured as well. But decided I will point my > finger at only the ones that I have a tester ready :) > > I am not yet 100% sure, but it also seems that BitSet.compare is broken. > According to the docs, it is not absolutely clear how the comparison > should be interpreted, but to me a sensible compare would work the same > way as forming a bignum of the bitset and comparing the bignums. Compare seems completely broken to me too, at least if one expects it to work like you suggest (which is IMHO the correct way). It doesn't seem to properly account for the two bitsets possibly being of different lengths, it just starts comparing at the MSB even if the difference between the lengths of the sets is more than a byte. It gets more complicated because you can't even use |t1|>|t2| as an indication that t1>t2, because there is AFAICT no guarantee that the leading bit (or even just one bit in the first byte of the representation!) is set. > > Currently this does not appear to be the case. > > >The functions themselves are not designed correctly, I would like to change > >them from : > > > >val intersect : t -> t -> unit > > > >to : > > > >val intersect : t -> t -> t > > > >which will produce a new bitset. Sounds more logical to me in Ocaml world. > >Any volunteer for implementation ? > > Yes, this is one nitpick that I had in mind. Is it possible to just go > and change the API on these? I think the general concensus on this list is to avoid gratuitous changes, but necessary changes for a cleaner/better API are OK. > > I also think that "union" would be a nicer name "unite" and "difference" > (or "subtract") instead of "differentiate". In my mind differentiate > associates to calculating derivatives. If they're changed to behaving in a non-destructive way, then the names should *definitely* be changed. For the sake of consistency I would suggest using the same names as are used in the standard library's Set: union, inter, diff Cheers, -- Bardur Arantsson <ba...@im...> <ba...@sc...> - I'm not evil, I'm... differently motivated! |
From: Janne H. <ja...@hy...> - 2004-12-19 13:07:19
|
On Sun, 19 Dec 2004, Bardur Arantsson wrote: > Compare seems completely broken to me too, at least if one > expects it to work like you suggest (which is IMHO the > correct way). > > It doesn't seem to properly account for the two bitsets > possibly being of different lengths, it just starts > comparing at the MSB even if the difference between the > lengths of the sets is more than a byte. It gets more > complicated because you can't even use |t1|>|t2| as an > indication that t1>t2, because there is AFAICT no > guarantee that the leading bit (or even just one bit in > the first byte of the representation!) is set. I'll fix this one as well. > If they're changed to behaving in a non-destructive way, > then the names should *definitely* be changed. For the > sake of consistency I would suggest using the same names > as are used in the standard library's Set: > > union, inter, diff I will add non-destructive union, diff and inter and make the old ones use the new non-destructive ones. They will make unite et al. just a little slower -- but I guess slower is better than unsafely reading past the arrays. I wonder if anyone has any ideas about the Obj.magic trickery in BitSet? To me it seems that using string.[] with int_of_char would do the same thing. Is it just an optimization? I would assume that ocamlopt with proper inlining should be able to do pretty good job on it without the added trickery. It would be nice to be able to compile ExtLib with bounds checking on, at least when running the test suite. ciao, janne |
From: Bardur A. <oca...@sc...> - 2004-12-19 13:29:59
|
On Sun, Dec 19, 2004 at 03:07:03PM +0200, Janne Hellsten wrote: [--snip--] > >If they're changed to behaving in a non-destructive way, > >then the names should *definitely* be changed. For the > >sake of consistency I would suggest using the same names > >as are used in the standard library's Set: > > > > union, inter, diff > > I will add non-destructive union, diff and inter and make the old ones use > the new non-destructive ones. They will make unite et al. just a little > slower -- but I guess slower is better than unsafely reading past the > arrays. Speed is irrelevant if the correctness isn't there. So, always go for correctness over speed. If someone is concerned about the speed they can always implement/submit a faster version later. > > I wonder if anyone has any ideas about the Obj.magic trickery in BitSet? > To me it seems that using string.[] with int_of_char would do the same > thing. Is it just an optimization? Yes, Obj.magic is basically just a typecast... which works as long as the internal representations of types are "compatible" in some useful way. An example: If I recall correctly, ('a list) is compatible with the record type { elem : 'a; tail : mutable 'a list } In this case Obj.magic can be exploited to implement lists with mutable tails (and similar trickery). I'm no expert on OCaml optimization, but I believe there is a considerable performance benefit in using unsafe_get (bget), unsafe_set (bset), etc. because you can avoid bounds checking and always get proper inlining. Whether this level of optimization is really necessary for BitSet I'll leave as an exercise for the reader. :) > I would assume that ocamlopt with proper inlining should > be able to do pretty good job on it without the added > trickery. I don't think the compiler is currently clever enough (or maybe the language semantics are not strict enough?) to do all the needed inlining, and the bounds checking would still slow you down. > > It would be nice to be able to compile ExtLib with bounds > checking on, at least when running the test suite. > I think compiling without bounds checking for "release" builds is unwise since it may obscure serious bugs which haven't been caught by the test suite. It's basically rude to anyone using the library to (possibly) obscure bugs within the library this way. -- Bardur Arantsson <ba...@im...> <ba...@sc...> God is REAL unless explicitly declared INTEGER. Unknown FORTRAN programmer |
From: Janne H. <ja...@hy...> - 2004-12-19 13:45:30
|
On Sun, 19 Dec 2004, Bardur Arantsson wrote: > I'm no expert on OCaml optimization, but I believe there > is a considerable performance benefit in using unsafe_get > (bget), unsafe_set (bset), etc. because you can avoid > bounds checking and always get proper inlining. Whether > this level of optimization is really necessary for BitSet > I'll leave as an exercise for the reader. :) ... > I don't think the compiler is currently clever enough (or > maybe the language semantics are not strict enough?) to do > all the needed inlining, and the bounds checking would > still slow you down. I took the current BitSet module, modified bget and bset to just use String.[] normally and added the couple of necessary char_of_ints and int_of_chars. I then went on a compiled the BitSet module with ocamlopt -unsafe (to turn off bounds checking). The resulting assembly is pretty much the same as with the Obj.magic+unsafe_get trickery, except that ocamlopt seems to be unable to inline int_of_char calls (which was a surprise to me). > I think compiling without bounds checking for "release" > builds is unwise since it may obscure serious bugs which > haven't been caught by the test suite. It's basically rude > to anyone using the library to (possibly) obscure bugs > within the library this way. Yes. However, it seems that many parts of the library *are* already using unsafe_gets. But I guess people can go and compile with -unsafe themselves if they want the extra performance (if any). ciao, janne |
From: Bardur A. <oca...@sc...> - 2004-12-19 14:21:41
|
On Sun, Dec 19, 2004 at 03:44:54PM +0200, Janne Hellsten wrote: > > On Sun, 19 Dec 2004, Bardur Arantsson wrote: > >I'm no expert on OCaml optimization, but I believe there > >is a considerable performance benefit in using unsafe_get > >(bget), unsafe_set (bset), etc. because you can avoid > >bounds checking and always get proper inlining. Whether > >this level of optimization is really necessary for BitSet > >I'll leave as an exercise for the reader. :) > > ... > > >I don't think the compiler is currently clever enough (or > >maybe the language semantics are not strict enough?) to do > >all the needed inlining, and the bounds checking would > >still slow you down. > > I took the current BitSet module, modified bget and bset to just use > String.[] normally and added the couple of necessary char_of_ints and > int_of_chars. I then went on a compiled the BitSet module with ocamlopt > -unsafe (to turn off bounds checking). The resulting assembly is pretty > much the same as with the Obj.magic+unsafe_get trickery, except that > ocamlopt seems to be unable to inline int_of_char calls (which was a > surprise to me). Generally I would say one should always compile without -unsafe and use unsafe_get and companions in those instances where it can be *guaranteed* to stay within the bounds *AND* where it matters for performance. Even in cases where it is easy to prove that the index is always within bounds, I wouldn't use unsafe_get/set unless it actually matters a great deal for performance. It reduces readability and makes code harder to modify later. > > >I think compiling without bounds checking for "release" > >builds is unwise since it may obscure serious bugs which > >haven't been caught by the test suite. It's basically rude > >to anyone using the library to (possibly) obscure bugs > >within the library this way. > > Yes. However, it seems that many parts of the library > *are* already using unsafe_gets. Yup. Personally, I'd like to see some instances of such use removed from some ExtLib code, but I don't care enough to submit patches :). Examples include Dbi.string_escaped and UTF8.look where the performance difference is most likely tiny. In my opinion overusing unsafe_get/set sets a bad example because people will assume they'll always write bug-free code and just end up using unsafe_get/set "by default". -- Bardur Arantsson <ba...@im...> <ba...@sc...> - If you can remember drinking it, you obviously didn't drink enough of it. Me |
From: Nicolas C. <war...@fr...> - 2004-12-19 15:25:07
|
> In my opinion overusing unsafe_get/set sets a bad example > because people will assume they'll always write bug-free > code and just end up using unsafe_get/set "by default". I agree in general : I never use "unsafe" features in my application code. But ExtLib is a library, and one used for very basic operations. It should then provide the user with the best speed available, or users interested in speed might end up rewriting the code themselves with is not what we want. As for bitset, checks should be done when the interface functions are called (negative index and others) so if they're correctly done we don't need additionnal checks. If they're not, this is a bug and have to be fixed. Nicolas |
From: Janne H. <ja...@hy...> - 2004-12-24 00:31:51
|
Bardur Arantsson wrote: > Generally I would say one should always compile without > -unsafe and use unsafe_get and companions in those > instances where it can be *guaranteed* to stay within the > bounds *AND* where it matters for performance. > > Even in cases where it is easy to prove that the index is > always within bounds, I wouldn't use unsafe_get/set unless > it actually matters a great deal for performance. It > reduces readability and makes code harder to modify later. unsafe_get/set have one problem w.r.t. testing: when they're used, it is impossible to turn *on* bounds checking. When using .()/.[] and compiling with -unsafe, it is at least possible to turn on bounds checking when testing. Having integrated more tests into the extlib testing suite, I have come to the conclusion that bounds checking would be really nice during testing. The current situation is that ExtLib seems to be buffer overflowing in a few places, causing a segfault somewhere when testing ExtString module. This happens with just a handful of test cases, so I'm guessing that there are more problems like this waiting to be uncovered. As most parts of ExtLib are using unsafe_get/set, it is impossible to catch buffer under/overflows in places where they actually happen. As an example, a buffer overflow somewhere can cause an infinite loop inside O'Caml's garbage collector (happens currently in native code). It would be really nice to be able to catch cases like these *before* they cause any damage. Bounds checking does exactly this. Since I guess people are not willing to stop using unsafe_get/put (at least not completely), we should think of some other way to achieve bounds checking so that it doesn't hurt performance. I'm proposing the use of asserts to check bounds when accessing arrays and other data with unsafe methods. We could turn asserts off when building the release library, but we could always have asserts turned on when running the testing suite. This way we would catch possible buffer overruns immediately and not when their damage shows up elsewhere. How would people feel about having release and debug builds of ExtLib? The release would have assertions turned off. This is basically the same as the current release version. The debug build would always have assertions on and thus be slower. However, it would only be used for testing purposes. Merry Christmas, Janne |
From: skaller <sk...@us...> - 2004-12-24 11:05:18
|
On Fri, 2004-12-24 at 11:31, Janne Hellsten wrote: > Since I guess people are not willing to stop using unsafe_get/put (at > least not completely), we should think of some other way to achieve > bounds checking so that it doesn't hurt performance. Why doesn't just the usual get/put and compilation with -unsafe for a production build work? I mean that's the idea of -unsafe isn't it? My guess is that bytecode should always do the check, and native code never. Bytecode is slower, however the bounds check for bytecode is probably *relatively* faster (i.e. a smaller fraction of the access time than for native code). >The debug build would always have >assertions on and thus be slower. However, it would only be used for >testing purposes. Hmm. Sometimes, functions have preconditions which assertions can check. If the assertion is in the function, and debugging is off in a production build, a *client error* can't be detected. This kind of check isn't an internal consistency check.. in theory, the precondition should be checked in the client code -- I'm planning this for Felix, but most programming systems don't support precondition checking in the calling code, only in the called code. So I would question whether the debugging version shouldn't also be available to end users... Why not change the Extlib build to make both? Then the client can switch between debug and production by changing libraries on the command line? -- John Skaller, mailto:sk...@us... voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net |
From: Nicolas C. <war...@fr...> - 2004-12-24 12:01:17
|
> How would people feel about having release and debug builds of ExtLib? > The release would have assertions turned off. This is basically the > same as the current release version. The debug build would always have > assertions on and thus be slower. However, it would only be used for > testing purposes. It's not so easy. Let's take a blit for example, you might not want to put assert before every blit call so you will write a blit stub that does asserts before directly calling blit, and I'm not sure ocamlopt can inline such method easily. We should not modify the source code of ExtLib for testing, since testing purpose is actually to detect wrong behavior. Having separate tests is ok for me, but if we need to add also tests inside extLib sources (assert) that make twice the same somehow. If there is some overflow problems we can detect them with appropriate testing (for example run a GC major after every call ). Nicolas Cannasse |
From: Janne H. <ja...@hy...> - 2004-12-27 15:53:56
|
make sure |
From: Janne H. <ja...@hy...> - 2004-12-27 15:57:23
|
Sorry about that previous broken mail.. Had a little accident with my mouse. Nicolas Cannasse wrote: > It's not so easy. > Let's take a blit for example, you might not want to put assert before every > blit call so you will write a blit stub that does asserts before directly > calling blit, and I'm not sure ocamlopt can inline such method easily. > We should not modify the source code of ExtLib for testing, since testing > purpose is actually to detect wrong behavior. Having separate tests is ok > for me, but if we need to add also tests inside extLib sources (assert) that > make twice the same somehow. If there is some overflow problems we can > detect them with appropriate testing (for example run a GC major after every > call ). I have to disagree here. Tests are written to find bugs. In order to make the tests more useful, they should also be written such that they make it more easy to actually fix the bugs. Assertions have proven themselves in practice to do just that: to make it more easy locate and fix bugs. Debugging with asserts is easy, since you can isolate your bug into a small region of code. The same cannot be said about getting a random segfault in a random function when executing some totally unrelated code. If properly used, assertions do not incur a performance cost. As you suggested, small wrapper functions go extremely well with assertions. A compiler shouldn't have much of a problem inlining such wrapper functions, since with -noassert (assertions turned off) these wrappers will only contain a jump to the "wrappee" function. (See more details below) I am not proposing big changes to the library. What I'm suggesting is that if we find a problematic piece of code (such as the current BitSet in CVS), we could minefield a couple strategic points with assertions and leave them there. This would allow us to check that there are no buffer overruns in the code. These checks would only get compiled in "debug build". In all other occasions they would be turned off and thus shouldn't cause any overhead. I also think that any new code should be allowed to have assertions in it. This suggests that -noassert should be in the Makefile, at least if you want these checks to be turned off. *** Since earlier in my post I am claiming that assertions do not incur a run-time cost, I made a test to provide some evidence that this is indeed the case. I have an example from BitSet.ml. In BitSet, there exists the following loop (which reads out-of-bounds): --cut-- let intersect t t' = for i = 0 to t.len - 1 do bset t.data i ((bget t.data i) land (bget t'.data i)) done --cut-- The innerloop of the above when compiled with ocamlopt for x86: --cut-- .L548: movl %ecx, %edx sarl $1, %edx movl %edx, 0(%esp) movl (%eax), %edi movl %ecx, %esi sarl $1, %esi movl (%ebx), %edx movzbl (%edx, %esi), %edx lea 1(%edx, %edx), %esi movl %ecx, %ebp sarl $1, %ebp movl (%eax), %edx movzbl (%edx, %ebp), %edx lea 1(%edx, %edx), %edx andl %esi, %edx sarl $1, %edx movl 0(%esp), %esi movb %dl, (%edi, %esi) movl %ecx, %esi addl $2, %ecx movl 4(%esp), %edx cmpl %edx, %esi jne .L548 --cut-- As expected, there are no bound checks. The next step I did was to change bget and bget functions so that they are actually wrappers for functions fast_get and fast_set. I added bounds check assertions to bget and bset. The changes are detailed in the patch below. Naturally, if we compile with assertions on, the bget/bsets are not inlined. Instead, they get turned into function calls: --cut-- .L588: movl 0(%esp), %eax movl (%eax), %eax call camlBitSet__bget_65 .L590: movl %eax, 4(%esp) movl 8(%esp), %eax movl (%eax), %eax movl 12(%esp), %ebx call camlBitSet__bget_65 .L591: movl %eax, %ecx movl 4(%esp), %ebx andl %ebx, %ecx movl 8(%esp), %ebx movl (%ebx), %eax movl 12(%esp), %ebx call camlBitSet__bset_68 .L592: movl 12(%esp), %ebx movl %ebx, %ecx addl $2, %ebx movl %ebx, 12(%esp) movl 16(%esp), %eax cmpl %eax, %ecx jne .L588 --cut-- Again, this was expected. Now, we turn off assertions by adding -noassert to ocamlopt command line and compile the code with wrapped bget/bset. We get the optimized loop back: --cut-- .L553: movl (%eax), %edi movl (%ebx), %edx movl %esi, %ecx sarl $1, %ecx movzbl (%edx, %ecx), %ecx lea 1(%ecx, %ecx), %ecx movl (%eax), %ebp movl %esi, %edx sarl $1, %edx movzbl (%ebp, %edx), %edx lea 1(%edx, %edx), %edx andl %ecx, %edx movl %esi, %ecx sarl $1, %ecx sarl $1, %edx movb %dl, (%edi, %ecx) movl %esi, %edx addl $2, %esi movl 0(%esp), %ecx cmpl %ecx, %edx jne .L553 --cut-- You can do a little better than that by changing "intersect" function by reading t/t'.data pointers into local variables outside of the loop: --cut-- let intersect t t' = let buf = t.data and buf' = t'.data in for i = 0 to t.len - 1 do bset buf i ((bget buf i) land (bget buf' i)) done --cut-- Which yields: --cut-- .L553: movl %ebx, %eax sarl $1, %eax movzbl (%edi, %eax), %eax lea 1(%eax, %eax), %eax movl %ebx, %edx sarl $1, %edx movzbl (%esi, %edx), %edx lea 1(%edx, %edx), %edx andl %eax, %edx movl %ebx, %eax sarl $1, %eax sarl $1, %edx movb %dl, (%esi, %eax) movl %ebx, %eax addl $2, %ebx cmpl %ecx, %eax jne .L553 --cut-- Well it still isn't all that hot, but I guess it's been already discussed and agreed that if you want more performance, you should use a library written more specifically for your purpose. Here are the assertion changes made to BitSet.ml: <patch> diff -u -r1.12 bitSet.ml --- bitSet.ml 22 Dec 2004 07:59:46 -0000 1.12 +++ bitSet.ml 27 Dec 2004 14:42:26 -0000 @@ -21,11 +21,27 @@ type intern let bcreate : int -> intern = Obj.magic String.create -external bget : intern -> int -> int = "%string_unsafe_get" -external bset : intern -> int -> int -> unit = "%string_unsafe_set" +external fast_get : intern -> int -> int = "%string_unsafe_get" +external fast_set : intern -> int -> int -> unit = "%string_unsafe_set" external fast_bool : int -> bool = "%identity" -let bblit : intern -> int -> intern -> int -> int -> unit = Obj.magic String.blit -let bfill : intern -> int -> int -> int -> unit = Obj.magic String.fill +let fast_blit : intern -> int -> intern -> int -> int -> unit = Obj.magic String.blit +let fast_fill : intern -> int -> int -> int -> unit = Obj.magic String.fill +let fast_length : intern -> int= Obj.magic String.length + +let bget s ndx = + assert (ndx >= 0 && ndx < fast_length s); + fast_get s ndx + +let bset s ndx v = + assert (ndx >= 0 && ndx < fast_length s); + fast_set s ndx v + +let bblit src srcoff dst dstoff len = + fast_blit src srcoff dst dstoff len + +let bfill dst start len c = + fast_fill dst start len c + </patch> Best regards, Janne |
From: Nicolas C. <war...@fr...> - 2004-12-27 17:25:53
|
> Here are the assertion changes made to BitSet.ml: [...] Looks ok to me. I just wanted to be sure that you checked that ocamlopt would inline it correctly. Nicolas |
From: skaller <sk...@us...> - 2004-12-20 09:26:14
|
On Sun, 2004-12-19 at 23:11, Bardur Arantsson wrote: > If they're changed to behaving in a non-destructive way, > then the names should *definitely* be changed. For the > sake of consistency I would suggest using the same names > as are used in the standard library's Set: > > union, inter, diff There many be a good reason for a non-functional API as well (performance) -- but please note I said 'may'. In C it would definitely be the case, but in Ocaml it may not be so -- the functional interface is vital, and the same names as Set module is clearly the way to go where appropriate. -- John Skaller, mailto:sk...@us... voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net |
From: Janne H. <ja...@hy...> - 2004-12-22 22:00:47
Attachments:
patch_BitSet.txt
|
Here is a patch that should fix BitSet.compare bugs. I made a relatively extensive tester for it and couldn't find any problems. The compare now works as if the bitvector was a BigInt. I am guessing that that has been the original intent as well. Perhaps someone can take a look before I commit the changes? ciao, janne > Compare seems completely broken to me too, at least if one > expects it to work like you suggest (which is IMHO the > correct way). > > It doesn't seem to properly account for the two bitsets > possibly being of different lengths, it just starts > comparing at the MSB even if the difference between the > lengths of the sets is more than a byte. It gets more > complicated because you can't even use |t1|>|t2| as an > indication that t1>t2, because there is AFAICT no > guarantee that the leading bit (or even just one bit in > the first byte of the representation!) is set. |
From: Janne H. <ja...@hy...> - 2004-12-27 16:07:43
|
I wonder if anyone had a chance to look at this? I'd like to submit this before I embark on other bugfixes on the same module. I'm a little hesitant, since I'm new in ExtLib town. Summary for the patch: fixes the completely broken BitSet.compare. Has been tested relatively well with ExtLib testing suite. ciao, janne Janne Hellsten wrote: > Here is a patch that should fix BitSet.compare bugs. I made a > relatively extensive tester for it and couldn't find any problems. > > The compare now works as if the bitvector was a BigInt. I am guessing > that that has been the original intent as well. > > Perhaps someone can take a look before I commit the changes? > > ciao, > janne > >> Compare seems completely broken to me too, at least if one >> expects it to work like you suggest (which is IMHO the >> correct way). >> It doesn't seem to properly account for the two bitsets >> possibly being of different lengths, it just starts >> comparing at the MSB even if the difference between the >> lengths of the sets is more than a byte. It gets more >> complicated because you can't even use |t1|>|t2| as an >> indication that t1>t2, because there is AFAICT no >> guarantee that the leading bit (or even just one bit in >> the first byte of the representation!) is set. > > > ------------------------------------------------------------------------ > > Index: bitSet.ml > =================================================================== > RCS file: /cvsroot/ocaml-lib/extlib-dev/bitSet.ml,v > retrieving revision 1.12 > diff -u -r1.12 bitSet.ml > --- bitSet.ml 22 Dec 2004 07:59:46 -0000 1.12 > +++ bitSet.ml 22 Dec 2004 21:52:46 -0000 > @@ -109,40 +109,56 @@ > else > false > > -(* we can't use Pervasives.compare because bitsets might be of different > - sizes but are actually the same integer *) > + > +exception Break_int of int > + > +(* Find highest set element or raise Not_found *) > +let find_msb t = > + (* Find highest set bit in a byte. Does not work with zero. *) > + let byte_msb b = > + assert (b <> 0); > + let rec loop n = > + if b land (1 lsl n) = 0 then > + loop (n-1) > + else n in > + loop 7 in > + let n = t.len - 1 > + and buf = t.data in > + try > + for i = n downto 0 do > + let byte = bget buf i in > + if byte <> 0 then raise (Break_int ((i lsl log_int_size)+(byte_msb byte))) > + done; > + raise Not_found > + with > + Break_int n -> n > + | _ -> raise Not_found > + > let compare t1 t2 = > - let size1 = t1.len and size2 = t2.len in > - let size = (if size1 < size2 then size1 else size2) in > - let rec loop2 n = > - if n >= size2 then > - 0 > - else if bget t2.data n <> 0 then > - 1 > - else > - loop2 (n+1) > - in > - let rec loop1 n = > - if n >= size1 then > - 0 > - else if bget t1.data n <> 0 then > - -1 > - else > - loop1 (n+1) > - in > - let rec loop n = > - if n = size then > - (if size1 > size2 then loop1 n else loop2 n) > - else > - let d = bget t2.data n - bget t1.data n in > - if d = 0 then > - loop (n+1) > - else if d < 0 then > - -1 > - else > - 1 > - in > - loop 0 > + let some_msb b = try Some (find_msb b) with Not_found -> None in > + match (some_msb t1, some_msb t2) with > + (None, Some _) -> -1 (* 0-y -> -1 *) > + | (Some _, None) -> 1 (* x-0 -> 1 *) > + | (None, None) -> 0 (* 0-0 -> 0 *) > + | (Some a, Some b) -> (* x-y *) > + if a < b then -1 > + else if a > b then 1 > + else > + begin > + (* MSBs differ, we need to scan arrays until we find a > + difference *) > + let ndx = a lsr log_int_size in > + assert (ndx < t1.len && ndx < t2.len); > + try > + for i = ndx downto 0 do > + let b1 = bget t1.data i > + and b2 = bget t2.data i in > + if b1 <> b2 then raise (Break_int (compare b1 b2)) > + done; > + 0 > + with > + Break_int res -> res > + end > > let equals t1 t2 = > compare t1 t2 = 0 |
From: Bardur A. <oca...@sc...> - 2004-12-27 17:38:52
|
On Mon, Dec 27, 2004 at 06:07:36PM +0200, Janne Hellsten wrote: > I wonder if anyone had a chance to look at this? I'd like to submit > this before I embark on other bugfixes on the same module. I'm a little > hesitant, since I'm new in ExtLib town. > > Summary for the patch: fixes the completely broken BitSet.compare. Has > been tested relatively well with ExtLib testing suite. > I haven't looked at it in detail, but it looks reasonable enough to me... -- Bardur Arantsson <ba...@im...> <ba...@sc...> - We do try to accomodate our customers, but not being a hotel we find it almost impossible. Model Shop Clerk | A Bit of Fry and Laurie |
From: Bardur A. <oca...@sc...> - 2004-12-24 11:32:16
|
On Fri, Dec 24, 2004 at 10:05:10PM +1100, skaller wrote: > On Fri, 2004-12-24 at 11:31, Janne Hellsten wrote: > > > Since I guess people are not willing to stop using unsafe_get/put (at > > least not completely), we should think of some other way to achieve > > bounds checking so that it doesn't hurt performance. > > Why doesn't just the usual get/put and compilation with -unsafe > for a production build work? I mean that's the idea of -unsafe > isn't it? Well, it probably works, but it requires even more confidence that the code is correct, because suddenly *every* array/string access becomes unsafe. I don't think I have that much confidence in the extlib code right now... [--snip--] > >The debug build would always have > >assertions on and thus be slower. However, it would only be used for > >testing purposes. I think this could be implemented by using pa_macro and adding something like (syntax might be slightly wrong, I usually use revised syntax) IFDEF DEBUG module String = struct include String val unsafe_get = get val unsafe_set = set end module Array = struct include Array val unsafe_get = ... ... end ENDIF at the beginning of modules using unsafe_get/set... (You can also use the pa_macro INCLUDE directive, add the above to a file and include that in the appropriate places). This should allow you to enable/disable bounds checking at compilation time. -- Bardur Arantsson <ba...@im...> <ba...@sc...> The best part? I became an ordained minister while not wearing pants. |
From: skaller <sk...@us...> - 2004-12-24 14:13:22
|
On Fri, 2004-12-24 at 22:32, Bardur Arantsson wrote: > On Fri, Dec 24, 2004 at 10:05:10PM +1100, skaller wrote: > > > On Fri, 2004-12-24 at 11:31, Janne Hellsten wrote: > > > > > Since I guess people are not willing to stop using unsafe_get/put (at > > > least not completely), we should think of some other way to achieve > > > bounds checking so that it doesn't hurt performance. > > > > Why doesn't just the usual get/put and compilation with -unsafe > > for a production build work? I mean that's the idea of -unsafe > > isn't it? > > Well, it probably works, but it requires even more > confidence that the code is correct, because suddenly > *every* array/string access becomes unsafe. > I don't think I have that much confidence in the extlib > code right now... The point is that the change needs to be made to test it .. and gain that confidence. I think you should be able to run a version with the checks still in place if you want. > I think this could be implemented by using pa_macro Yes, but I think we should avoid camlp4. This is supposed to be a core library. -- John Skaller, mailto:sk...@us... voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net |