From: Nicolas C. <war...@fr...> - 2003-06-25 11:13:56
Attachments:
dynArray.ml
dynArray.mli
|
Hi, This new version of Dynarray is far better that the previous ones (including the one I posted) : - using a raw ocaml block with a polymorphic type - doing unchecked field access (since we're checking -by hand-) - no problems with floats (but they are unboxed) - testing for resizing only when removing an element or when not enough space for adding. - correct a lot of bugs (including a infinite loop in exp_resizer when having a length of 0 !) Right now, I think the code is quite stable, although I can I have made some mistake here and there. Notice that I have removed a couple of *enum* functions since they need a little bit rethinking because the array can be muted after the enum is created : the ones currently defined are ok. Comments/bugs report (of course) apprecied. Nicolas Cannasse |
From: Brian H. <bri...@ql...> - 2003-06-25 20:09:19
|
A bunch of minor comments: - I don't think we're losing much (if anything) to using for loops instead of array blits. We're moving whole words which are word aligned by definition, which is generally fast in general. Memory bandwidth dominates. I just used the array functions because I could. - Are floats unboxed or boxed? I thought boxed. Note: in this case boxed floats are not the memory hog they might otherwise be, because of the unused elements in the array. The arrays are probably on average 5/8th to 3/4th full. Assuming 3/4th full (assuming the array has been mostly growing, and that it is on average halfway between just having been expanded (50% full) and needing to be expanded again (100% full)), an array of n unboxed floats would take up 2*n*4/3 or 8/3 words per float. Meanwhile, with boxed floats, it would take 2*n + n*4/3 or 10/3 words per float, meaning the boxed version is only 25% more memory usage than the unboxed version. If the dynarray has mostly shrinking, it's on average halfway between having just been shrunk (25% full) and just about to be shrunk (12.5% full), meaning it's on average 3/16th full. At this point unboxing floats takes up 2*n*16/3 or 32/2 words per float. Boxing the floats, on the other hand, takes up 2*n + n*16/3 or 22/3 words per float, or 68.75% the memory unboxing floats would take. So it can even save a signifigant amount of memory to not unbox floats. - the problem of the array being mutated behind enum's back is a general problem with imperitive programming. It's known as a "bug". Note that my implementation may have had other problems, I didn't think about the enumerations that much. I'd say leave the problem children out for now, and if there's demand for them later, we'll add them then. Fight feeping creaturitist! - Is there a reason you split off conservative_exponential_resizer from the normal exponential_resizer? I didn't think the extra couple of branchs were that bad, especially considering that normally you don't even reach them, and of the times you do you are hard limited to O(log N) iterations of either loop. Especially considering you are calling resizer less often. The default definately should be the conservative one. Correctness over small gains in speed. - delete has a bug if you're not deleting the last element and not reallocating the array- the last element doesn't get nulled out. It should be (about line 194): end else begin if idx = d.len then (* erase for GC *) iset d.arr idx (Obj.magic 0) else for i = idx to d.len - 1 do iset d.arr i (iget d.arr (i+1)); done; iset d.arr d.len (Obj.magic 0) (* <-- !!! *) end - in to_array, can we replace the for loop with just: Array.init d.len (fun i -> iget d.arr i) ? This allocates a closure, which is it's only downside. - of_array: do we want to call resizer to see what size to make the original array? For example, if we're using step_resizer, and the array length is not a multiple of the step, the next time we do much of anything with the array we're reallocating the array. Also, this means we don't have any space initially for inserting. But this will make of_array more complicated, we can't just use idup. Brian On Wed, 25 Jun 2003, Nicolas Cannasse wrote: > Hi, > > This new version of Dynarray is far better that the previous ones (including > the one I posted) : > - using a raw ocaml block with a polymorphic type > - doing unchecked field access (since we're checking -by hand-) > - no problems with floats (but they are unboxed) > - testing for resizing only when removing an element or when not enough > space for adding. > - correct a lot of bugs (including a infinite loop in exp_resizer when > having a length of 0 !) > > Right now, I think the code is quite stable, although I can I have made some > mistake here and there. > Notice that I have removed a couple of *enum* functions since they need a > little bit rethinking because the array can be muted after the enum is > created : the ones currently defined are ok. > > Comments/bugs report (of course) apprecied. > > Nicolas Cannasse > |
From: Nicolas C. <war...@fr...> - 2003-06-26 01:59:27
|
> - I don't think we're losing much (if anything) to using for loops instead > of array blits. We're moving whole words which are word aligned by > definition, which is generally fast in general. Memory bandwidth > dominates. I just used the array functions because I could. If you look at the array.ml sources, you'll see that blit is actually doing a for loop :-) > - Are floats unboxed or boxed? I thought boxed. Note: in this case boxed Sorry , I messed up when posting the mail : <<< floats ARE boxed >>> [..] > Meanwhile, with boxed floats, it would take 2*n + n*4/3 or 10/3 words per > float, meaning the boxed version is only 25% more memory usage than the > unboxed version. [...] Agree with you. > - the problem of the array being mutated behind enum's back is a general > problem with imperitive programming. It's known as a "bug". Note that my > implementation may have had other problems, I didn't think about the > enumerations that much. I'd say leave the problem children out for now, > and if there's demand for them later, we'll add them then. Fight feeping > creaturitist! If I can't feel asleep one of theses nights, I will surely add them and eventualy catch headache :) > - Is there a reason you split off conservative_exponential_resizer from > the normal exponential_resizer? I didn't think the extra couple of What do you mean here ? > branchs were that bad, especially considering that normally you don't even > reach them, and of the times you do you are hard limited to O(log N) > iterations of either loop. Especially considering you are calling resizer > less often. The default definately should be the conservative one. > Correctness over small gains in speed. I agree for conservative as default. > - delete has a bug if you're not deleting the last element and not > reallocating the array- the last element doesn't get nulled out. It > should be (about line 194): Thanks, corrected > - in to_array, can we replace the for loop with just: > Array.init d.len (fun i -> iget d.arr i) > ? This allocates a closure, which is it's only downside. This one is only a inlining of the Array.init function. A little better for speed, because no closure and better compilation since same module. > - of_array: do we want to call resizer to see what size to make the > original array? For example, if we're using step_resizer, and the array > length is not a multiple of the step, the next time we do much of anything > with the array we're reallocating the array. Also, this means we don't > have any space initially for inserting. But this will make of_array more > complicated, we can't just use idup. Yes, but since resizes are most of the time amortized , resizing now or later is not very important. I also added "compact" that is shrinking the dynarray to it's minimum size. I will commit the sources on the CVS now. Nicolas Cannasse |
From: John M. S. <sk...@oz...> - 2003-06-27 11:47:15
|
Brian Hurt wrote: > - the problem of the array being mutated behind enum's back is a general > problem with imperitive programming. It's known as a "bug". LOL! Sequenced modifications of data structures is the whole *point* of imperative programming. Nothing wrong with it. The problem is that unlike functional programming there has been no real theoretical understanding how to build understandable imperative programs. I suspect that is no longer the case, with the dualisation of functional programming theory in bisimulation stuff -- a most woeful name if every I hear one :-) There is already a good idea how to model coinductive data structures like streams. -- John Max Skaller, mailto:sk...@oz... snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia. voice:61-2-9660-0850 |
From: John M. S. <sk...@oz...> - 2003-06-27 11:52:54
|
Brian Hurt wrote: > A bunch of minor comments: > - Are floats unboxed or boxed? I thought boxed. Float arrays are unboxed. AFAIK, not just floats either: there are also unboxed integral type arrays (if not done, in the works). A hacky work around is a DynArray of arrays of floats. My guess, however, is that dynamic arrays are more useful for symbolic computing than numerical -- although they might be good for sparse matrices, in that case the lack of boxing is likely to *save* a heap of space :-) -- John Max Skaller, mailto:sk...@oz... snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia. voice:61-2-9660-0850 |
From: John M. S. <sk...@oz...> - 2003-06-27 11:20:38
|
Nicolas Cannasse wrote: > mistake here and there. > Notice that I have removed a couple of *enum* functions since they need a > little bit rethinking because the array can be muted after the enum is > created : the ones currently defined are ok. This is a nasty problem with the enum concept. In C++ this is dealt with by saying that iterators ar invalidated by certain operations. In Ocaml, we probably want something better, but it is hard to see how this is possible with a lazy enum constructor for *any* type. Even ordinary arrays can be muted, if their enum constructors are lazy. This problem cannot be made to go away, and shouldn't be made to go away: you may in fact *want* to modify an array before it is finally enumerated, particularly if you need to use the lazy constructors to include the array in some overall enumeration before you can calculate the values. Indeed, one could argue that is a big advantage of the enum concept, that you can do this. As before, I think the real issue to is to precisely define when forcing occurs, because enum is designed to manipulate *objects* and not values. -- John Max Skaller, mailto:sk...@oz... snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia. voice:61-2-9660-0850 |