From: Brian H. <bh...@sp...> - 2004-05-27 18:49:57
|
(Sent this to the wrong mailing list originally- sorry. My address book has been fixed, and I apologize for the inconvience) I've attached an ExtMap module to this message. This is an extension/improvement on the standard library's Map module, and (with one caveat, see below) should be drop-in replaceable. I'm not done commenting the code, and I haven't tested it at all (it compiles, though- which is a stronger statement in Ocaml than it is in other languages), which is why it's a request for comments and not an actual submission. The differences are: - I've moved OrderedType into it's own library, Type, and renamed it to be just Ordered. So it's Type.Ordered now, not Map.OrderedType. We have the same ordered type signature in Map and Set. Rather than duplicating this code, we should just have it in a common module. This is a cleanliness issue, not a performance issue. - The tree is now a red-black tree, not a height-balanced tree. The advantage here is that this form uses one less word per entry than height balanced does. The disadvantage is that it's a lot more code. Between this and the extra functionality, the map module now weighs in at about 48K on x86-32 Redhat 9.0 Ocaml 3.07. - Lots of new functions has been added. Some major areas of expansion are: - conversions to and from lists. This is especially usefull for loading up maps with constant values. - conversions to and from enums, to make it compatible with the rest of the library - fold_left and fold_right with the same semantics/calling structure as List.fold_left and List.fold_right - iters and folds on 2 maps- this is what got me started on the project. I want to do sparse vectors as (int, float) maps- and a lot of the BLAS functions want to do iters or folds on 2 maps. Which is why, although they are basically implemented with enums, I've included them. - a compare function. Unfortunately, this doesn't mean that the module, as it is written at the moment, qualifies for Type.Ordered. But it does make it easier to get there. If the response is positive, I'll finish documenting, do some testing, and submit it. I'm also considering the following other small projects: extSet: Set with similiar extensions and based on RB trees. funcarray: a functional array module. This is a tree-based structure that looks like an array- elements have positions 0 .. N-1, are accessed by position (this is different than sets), and are always densely packed (this is different than a map with ints as keys). Accessing any element, adding an element, overwritting an element, or removing an element are O(log N) operations and strictly applicative. Removing element i decrements the locations of all nodes i..N-1. Inserting an element at location i moves everything in i..N-1 to i+1..N, still in O(log N). quadtree- a 2D array of objects based on a quad-tree structure. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |
From: Bardur A. <oca...@sc...> - 2004-06-02 13:17:30
|
On Thu, May 27, 2004 at 01:56:35PM -0500, Brian Hurt wrote: > [--snip--] > - conversions to and from enums, to make it compatible with the > rest of the library From just glancing quickly at the code: Shouldn't "enum_values" and "enum_keys" be called "values" and "keys" for consistency with the ExtHashtbl naming scheme? > > - fold_left and fold_right with the same semantics/calling > structure as List.fold_left and List.fold_right > > - iters and folds on 2 maps- this is what got me started on the > project. I want to do sparse vectors as (int, float) maps- and a lot of > the BLAS functions want to do iters or folds on 2 maps. Which is why, > although they are basically implemented with enums, I've included them. > > - a compare function. Unfortunately, this doesn't mean that the > module, as it is written at the moment, qualifies for Type.Ordered. But > it does make it easier to get there. > > If the response is positive, I'll finish documenting, do some testing, and > submit it. I'm also considering the following other small projects: I was wondering whether it might be a good idea to actually specify that map/iter/... apply the function in a particular order (i.e. with keys 'sorted' according to the Type.Ordered.compare function)...? I seem to often need to iterate over or extract elements in this order, but cannot currently because the stock Map module doesn't actually specify an ordering (although the current implementation does order elements in 'key-sorted' order). This leads to all sorts of ugly hacks, depending on performance requirements. -- Bardur Arantsson <ba...@im...> <ba...@sc...> - That plot makes perfect sense. Wink, wink - Bender, you said "wink, wink" out loud. - No, I didn't. Raise middle finger. Bender and Zoidberg | Futurama |
From: Brian H. <bh...@sp...> - 2004-06-02 19:59:42
|
On Wed, 2 Jun 2004, Bardur Arantsson wrote: > >From just glancing quickly at the code: Shouldn't > "enum_values" and "enum_keys" be called "values" and > "keys" for consistency with the ExtHashtbl naming scheme? Sure. > I was wondering whether it might be a good idea to > actually specify that map/iter/... apply the function in a > particular order (i.e. with keys 'sorted' according to the > Type.Ordered.compare function)...? > > I seem to often need to iterate over or extract elements > in this order, but cannot currently because the stock Map > module doesn't actually specify an ordering (although the > current implementation does order elements in 'key-sorted' > order). This leads to all sorts of ugly hacks, depending > on performance requirements. > Two questions with regard to this: 1) Would this require that map be based on a tree? a) The specifications I'm giving is tending hard that way anyways, I'm not sure what other data structure would work, and b) even if it is required to be a tree, is this bad? 2) Would we need to supply both forwards and backwards maps and iters? The obvious way to implement these are analogs to fold_left and fold_right- fold, map, and iter have no defined behavior, fold_left, iter_left, and map_left all present increasing keys, fold_right, iter_right, and map_right all present decreasing keys. Would this cause an useless code bloat? I will note that I didn't supply a map2 function, as I couldn't think of a better way to implement it than the obvious fold implementation. My requirements to include new code were not only that the code be usefull, but that I could at least possibly implement a faster/better solution internally than you could implement with the interface supplied. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |
From: Bardur A. <oca...@sc...> - 2004-06-02 20:21:12
|
On Wed, Jun 02, 2004 at 03:05:30PM -0500, Brian Hurt wrote: > On Wed, 2 Jun 2004, Bardur Arantsson wrote: > > > I was wondering whether it might be a good idea to > > actually specify that map/iter/... apply the function in a > > particular order (i.e. with keys 'sorted' according to the > > Type.Ordered.compare function)...? > > > > I seem to often need to iterate over or extract elements > > in this order, but cannot currently because the stock Map > > module doesn't actually specify an ordering (although the > > current implementation does order elements in 'key-sorted' > > order). This leads to all sorts of ugly hacks, depending > > on performance requirements. > > > > Two questions with regard to this: > > 1) Would this require that map be based on a tree? a) The specifications > I'm giving is tending hard that way anyways, I'm not sure what other data > structure would work, and b) even if it is required to be a tree, is this > bad? I suppose it would require slightly more than strictly necessary, but I can't think of any non-tree-like implementation of a non-destructive map-type structure which can achieve the same low complexity for all the needed operations. (But I'm not _that_ familiar with more advanced data structures, so you should take that with a grain of salt...) > > 2) Would we need to supply both forwards and backwards maps and iters? The > obvious way to implement these are analogs to fold_left and fold_right- > fold, map, and iter have no defined behavior, fold_left, iter_left, and > map_left all present increasing keys, fold_right, iter_right, and > map_right all present decreasing keys. Would this cause an useless code > bloat? I don't think it would necessarily bloat the code since you can implement one (two, perhaps?) generic traversal algorithm which can support all of the above, and then just call it with appropriate closures. > > I will note that I didn't supply a map2 function, as I couldn't think of a > better way to implement it than the obvious fold implementation. My > requirements to include new code were not only that the code be usefull, > but that I could at least possibly implement a faster/better solution > internally than you could implement with the interface supplied. > Well, technically it's quite possible to maintain the key values separately in a Set, and get a "sorted map" that way. Insertion/deletion/etc. costs about twice as much as just using the Map. But if we want to iterate through the (key,value) pairs in that Map in sorted order, there are basically two approaches: 1) Use Set to get sorted list of keys; iterate through keys, using ExtMap.find to retrieve the corresponding value. O(n*log n), n lookups and O(log n) lookup time. 2) Use ExtMap.elements to get a list of (key,value) pairs, sort that and iterate. O(n*log n) from the sort. Neither of which is as fast as just direct traversal of the Map tree would be. -- Bardur Arantsson <ba...@im...> <ba...@sc...> - And... wouldn't you know it, Mianus has a general store. Johnny Knoxville | Jackass |
From: Brian H. <bh...@sp...> - 2004-06-02 20:37:24
|
On Wed, 2 Jun 2004, Bardur Arantsson wrote: > I suppose it would require slightly more than strictly > necessary, but I can't think of any non-tree-like > implementation of a non-destructive map-type structure > which can achieve the same low complexity for all the > needed operations. (But I'm not _that_ familiar with more > advanced data structures, so you should take that with a > grain of salt...) The more I think about it, the more I like basically requiring a tree structure. For the core operations (find, add, remove), a tree structure is already O(log N). The next step faster than that is O(1)- which I don't think we'll get. Well, we get them with a hash, but I don't think you could implement a functional hash, and we already have an imperitive hash. Note that when I say "tree", this also includes "tree-like" structures, like 2,3,4-trees and b-trees. All I need is that the nodes be kept in some sort of sorted order. Allowing you to traverse them in a forward or backwards order in O(N). I notice that I specified that the enumerations come out in sorted order- I wanted to do this so that I could do efficient (O(N)) maps using enums, so long as the map maintained the sorted property. So this may already be a requirement in the new interface. > I don't think it would necessarily bloat the code since > you can implement one (two, perhaps?) generic traversal > algorithm which can support all of the above, and then > just call it with appropriate closures. True. I'd be inclined to declare the directionless comprehensions to be deprecated, and that new code should specify a direction (left or right). > Well, technically it's quite possible to maintain the key > values separately in a Set, and get a "sorted map" that > way. Insertion/deletion/etc. costs about twice as much as > just using the Map. But if we want to iterate through the > (key,value) pairs in that Map in sorted order, there are > basically two approaches: > > 1) Use Set to get sorted list of keys; iterate through > keys, using ExtMap.find to retrieve the corresponding > value. O(n*log n), n lookups and O(log n) lookup time. > > 2) Use ExtMap.elements to get a list of (key,value) pairs, > sort that and iterate. O(n*log n) from the sort. > > Neither of which is as fast as just direct traversal of > the Map tree would be. Yep, they'd qualify. Although I suppose you could use the sorted gaurentee of enums to your advantage- but it'd still be more efficient to do it directly. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |
From: Bardur A. <oca...@sc...> - 2004-06-02 21:38:35
|
On Wed, Jun 02, 2004 at 03:43:15PM -0500, Brian Hurt wrote: > > > I don't think it would necessarily bloat the code since > > you can implement one (two, perhaps?) generic traversal > > algorithm which can support all of the above, and then > > just call it with appropriate closures. > > True. I'd be inclined to declare the directionless comprehensions to be > deprecated, and that new code should specify a direction (left or right). > That's probably a good idea. I just though of something else which might be useful both for internal use, but also for users of the library (well, me in any case ;)): What about introducing a "node-pointer" type (and "index" if you will) which could be used as a starting point/stopping point for iteration? Then one could do things like: let b = find_ix some_key1 map and e = find_ix some_key2 map in ExtMap.iter ~first:b ~last:e (fun ...) map (where both ~first and ~last are _optional_) for range traversals. Of course this could also be done with a more general "iter" function which could just take the first/last keys themselves as parameters. The latter is slightly less efficient because indexes cannot be reused directly (i.e. extraneous lookups occur if you reuse one of the 'endpoints' in subsequent code). Another option is to just add ~first and ~last parameters to "enum" -- this would allow reuse of the same range (through Enum.clone), but not just one of the 'endpoints'. -- Bardur Arantsson <ba...@im...> <ba...@sc...> |
From: Brian H. <bh...@sp...> - 2004-06-02 21:52:31
|
On Wed, 2 Jun 2004, Bardur Arantsson wrote: > I just though of something else which might be useful both > for internal use, but also for users of the library (well, > me in any case ;)): What about introducing a > "node-pointer" type (and "index" if you will) which could > be used as a starting point/stopping point for iteration? > > Then one could do things like: > > let b = find_ix some_key1 map > and e = find_ix some_key2 map > in > ExtMap.iter ~first:b ~last:e (fun ...) map Why not just make the first and last arguments keys? In other words: ExtMap.iter ~first:some_key1 ~last:some_key2 (fun ...) map How usefull (to people other than you) would such functions be? They wouldn't be that complicated, but I'd be inclined to make them a different function rather than using optional arguments. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |
From: Bardur A. <oca...@sc...> - 2004-06-02 22:06:38
|
On Wed, Jun 02, 2004 at 04:58:25PM -0500, Brian Hurt wrote: > On Wed, 2 Jun 2004, Bardur Arantsson wrote: > > > I just though of something else which might be useful both > > for internal use, but also for users of the library (well, > > me in any case ;)): What about introducing a > > "node-pointer" type (and "index" if you will) which could > > be used as a starting point/stopping point for iteration? > > > > Then one could do things like: > > > > let b = find_ix some_key1 map > > and e = find_ix some_key2 map > > in > > ExtMap.iter ~first:b ~last:e (fun ...) map > > Why not just make the first and last arguments keys? In other words: > ExtMap.iter ~first:some_key1 ~last:some_key2 (fun ...) map Well, the thing is that by using 'indexes' you can reuse the range 'boundary elements' in case you need to do two traversals with the same 'first' element, but a different 'last' element (or vice versa). The idea is that the actual key lookup happens when the 'index' is built, so you can avoid looking up the keys multiple times. Granted, lookup is only an O(logN) operation, but still... > > How usefull (to people other than you) would such functions be? Of course, I'm not qualified to answer that :) > They wouldn't be that complicated, but I'd be inclined to > make them a different function rather than using optional > arguments. The good thing about using optional arguments for this is that you essentially get 4 functions from 1: You can support ranges of the types [...,b] [a,...] [a,...,b] [...] with one function. (And yes, I *am* currently doing something which requires all four types of ranges for iteration :)). The interface quickly becomes very ugly if you don't use optional arguments in this sort of situation... -- Bardur Arantsson <ba...@im...> <ba...@sc...> - And the best part? I didn't learn a thing. |
From: Brian H. <bh...@sp...> - 2004-06-02 22:20:43
|
On Thu, 3 Jun 2004, Bardur Arantsson wrote: > On Wed, Jun 02, 2004 at 04:58:25PM -0500, Brian Hurt wrote: > > > On Wed, 2 Jun 2004, Bardur Arantsson wrote: > > > > > I just though of something else which might be useful both > > > for internal use, but also for users of the library (well, > > > me in any case ;)): What about introducing a > > > "node-pointer" type (and "index" if you will) which could > > > be used as a starting point/stopping point for iteration? > > > > > > Then one could do things like: > > > > > > let b = find_ix some_key1 map > > > and e = find_ix some_key2 map > > > in > > > ExtMap.iter ~first:b ~last:e (fun ...) map > > > > Why not just make the first and last arguments keys? In other words: > > ExtMap.iter ~first:some_key1 ~last:some_key2 (fun ...) map > > Well, the thing is that by using 'indexes' you can reuse > the range 'boundary elements' in case you need to do two > traversals with the same 'first' element, but a different > 'last' element (or vice versa). The idea is that the > actual key lookup happens when the 'index' is built, so > you can avoid looking up the keys multiple times. Granted, > lookup is only an O(logN) operation, but still... I don't see what we could store in an index that'd make the traversal faster than this code (assuming we have both a first and a last): let iter_range first last f map = let rec loop = function | Empty -> () | R(k, d, l, r) | B(k, d, l, r) -> let rval1 = Ord.compare k first and rval2 = Ord.compare k last in if rval1 >= 0 then begin loop l; if rval2 <= 0 then f k d end; if rval2 <= 0 then loop r in loop map ;; > > They wouldn't be that complicated, but I'd be inclined to > > make them a different function rather than using optional > > arguments. > > The good thing about using optional arguments for this is > that you essentially get 4 functions from 1: You can > support ranges of the types > > [...,b] > [a,...] > [a,...,b] > [...] The disadvantage is that it makes that one function way more complicated. More than twice as complicated, easily, as each place we want to access either argument, we have to deal with both the presence and abscence of said argument. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |
From: Bardur A. <oca...@sc...> - 2004-06-02 22:47:45
|
On Wed, Jun 02, 2004 at 05:26:41PM -0500, Brian Hurt wrote: > On Thu, 3 Jun 2004, Bardur Arantsson wrote: > > > On Wed, Jun 02, 2004 at 04:58:25PM -0500, Brian Hurt wrote: > > > > > On Wed, 2 Jun 2004, Bardur Arantsson wrote: > > > > > > > I just though of something else which might be useful both > > > > for internal use, but also for users of the library (well, > > > > me in any case ;)): What about introducing a > > > > "node-pointer" type (and "index" if you will) which could > > > > be used as a starting point/stopping point for iteration? > > > > > > > > Then one could do things like: > > > > > > > > let b = find_ix some_key1 map > > > > and e = find_ix some_key2 map > > > > in > > > > ExtMap.iter ~first:b ~last:e (fun ...) map > > > > > > Why not just make the first and last arguments keys? In other words: > > > ExtMap.iter ~first:some_key1 ~last:some_key2 (fun ...) map > > > > Well, the thing is that by using 'indexes' you can reuse > > the range 'boundary elements' in case you need to do two > > traversals with the same 'first' element, but a different > > 'last' element (or vice versa). The idea is that the > > actual key lookup happens when the 'index' is built, so > > you can avoid looking up the keys multiple times. Granted, > > lookup is only an O(logN) operation, but still... > > I don't see what we could store in an index that'd make the traversal > faster than this code (assuming we have both a first and a last): > Sorry, what I was thinking of originally actually requires parent pointers, so it's out of the question for a functional data structure. However, you could just store the comlete path to the 'first' ('last' respectively) key in an index. Then you could effectively "jumpstart" the traversal from the correct point. It would only work if an explicit stack is maintained for the traversal, so it's probably overkill. Just specifying keys is fine, so just forget I mentioned it. :) > > > > They wouldn't be that complicated, but I'd be inclined to > > > make them a different function rather than using optional > > > arguments. > > > > The good thing about using optional arguments for this is > > that you essentially get 4 functions from 1: You can > > support ranges of the types > > > > [...,b] > > [a,...] > > [a,...,b] > > [...] > > The disadvantage is that it makes that one function way more complicated. > More than twice as complicated, easily, as each place we want to access > either argument, we have to deal with both the presence and abscence of > said argument. I think it should definitely be preferred to hide the complexity _inside_ the library functions instead of making the interface uglier. But I don't think the code needs to become needlessly complicated in this case. The below "diff" should do it (I've forgotten traditional syntax, so it's probably full of syntax errors -- I hope the idea is clear, though...): let iter_range first last f map = + let comp_first = + match first with + | Some f -> (fun k -> Ord.compare k first) + | None -> (fun k -> 1) + + and comp_last = + match last with + | Some f -> (fun k -> Ord.compare k last) + | None -> (fun k -> -1) + let rec loop = function | Empty -> () | R(k, d, l, r) | B(k, d, l, r) -> - let rval1 = Ord.compare k first + let rval1 = comp_first k - and rval2 = Ord.compare k last + and rval2 = comp_last k in if rval1 >= 0 then begin loop l; if rval2 <= 0 then f k d end; if rval2 <= 0 then loop r in loop map ;; That should do it...? Technically this is slightly slower than using custom function bodies for each situation because of the extra function calls. -- Bardur Arantsson <ba...@im...> <ba...@sc...> Press any key to continue, or any other key to cancel. |
From: Brian H. <bh...@sp...> - 2004-06-02 18:49:02
|
Finally got approved, I see. This code has at least two bugs, by the way, which I have fixs for. The of_list and of_enum functions didn't build valid trees. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |