From: Brian H. <bh...@sp...> - 2004-08-22 15:30:54
Attachments:
funarr.mli
funarr.ml
|
No enumerations, no to/from lists, no sorting- all things I'd want to add before doing an official submission. Any interest? -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |
From: Jesse G. <je...@wi...> - 2004-08-22 21:20:38
|
Brian Hurt wrote: > > No enumerations, no to/from lists, no sorting- all things I'd want to add > before doing an official submission. Any interest? Advantages and disadvantages? -- Jesse Guardiani, Systems Administrator WingNET Internet Services, P.O. Box 2605 // Cleveland, TN 37320-2605 423-559-LINK (v) 423-559-5145 (f) http://www.wingnet.net |
From: Brian H. <bh...@sp...> - 2004-08-23 13:42:49
|
On Sun, 22 Aug 2004, Jesse Guardiani wrote: > Brian Hurt wrote: > > > > > No enumerations, no to/from lists, no sorting- all things I'd want to add > > before doing an official submission. Any interest? > > Advantages and disadvantages? > > O(log N) insertion, deletion, and access anywhere. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |
From: Nicolas C. <war...@fr...> - 2004-08-22 21:25:57
|
> No enumerations, no to/from lists, no sorting- all things I'd want to add > before doing an official submission. Any interest? Wouldn't that be a specialized version of (int,'a) PMap.t ? Nicolas |
From: Bardur A. <oca...@sc...> - 2004-08-23 07:14:24
|
On Sun, Aug 22, 2004 at 11:26:36PM +0200, Nicolas Cannasse wrote: > > No enumerations, no to/from lists, no sorting- all things I'd want to add > > before doing an official submission. Any interest? > > Wouldn't that be a specialized version of (int,'a) PMap.t ? > Sort of, but I guess you could add more array-like operations, e.g. Funarray.sub a i n Funarray.cut a i n (i.e. leave everything except sub a i n) which don't really "fit" into a PMap, but which *might* be (theoretically, probably not with the current implementation) more efficiently implemented in a purely applicative (immutable) data structure than in an iterative array. Cheers, -- Bardur Arantsson <ba...@im...> <ba...@sc...> - No beer and TV make Homer something something. Homer Simpson | The Simpsons |
From: Brian H. <bh...@sp...> - 2004-08-23 14:06:42
|
On Mon, 23 Aug 2004, Bardur Arantsson wrote: > On Sun, Aug 22, 2004 at 11:26:36PM +0200, Nicolas Cannasse wrote: > > > > No enumerations, no to/from lists, no sorting- all things I'd want to add > > > before doing an official submission. Any interest? > > > > Wouldn't that be a specialized version of (int,'a) PMap.t ? > > > > Sort of, but I guess you could add more array-like > operations, e.g. > > Funarray.sub a i n > Funarray.cut a i n (i.e. leave everything > except sub a i n) I'm not sure how to implement cut in less than O(N log N) without possibly leaving the entire array unbalanced. OK, sub I could implement with an enumerator (of sorts) and a balanced O(N) tree building algorithm. But cut remains a problem. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |
From: Bardur A. <oca...@sc...> - 2004-08-23 14:29:20
|
On Mon, Aug 23, 2004 at 09:13:35AM -0500, Brian Hurt wrote: > On Mon, 23 Aug 2004, Bardur Arantsson wrote: > > > On Sun, Aug 22, 2004 at 11:26:36PM +0200, Nicolas Cannasse wrote: > > > > > > No enumerations, no to/from lists, no sorting- all things I'd want to add > > > > before doing an official submission. Any interest? > > > > > > Wouldn't that be a specialized version of (int,'a) PMap.t ? > > > > > > > Sort of, but I guess you could add more array-like > > operations, e.g. > > > > Funarray.sub a i n > > Funarray.cut a i n (i.e. leave everything > > except sub a i n) > > I'm not sure how to implement cut in less than O(N log N) > without possibly leaving the entire array unbalanced. > > OK, sub I could implement with an enumerator (of sorts) and a balanced > O(N) tree building algorithm. But cut remains a problem. > I have a gut feeling it might be possible if the balancing constraints are relaxed and we look at amortized cost instead of worst-case case cost. But if not really sure it can be done if you always require your trees to be balanced. Anyway, your example of delete shows that it's different enough from existing structures to be useful. -- Bardur Arantsson <ba...@im...> <ba...@sc...> - First you get de sugar, den you get de power, den you get de women. Homer Simpson | The Simpsons |
From: Brian H. <bh...@sp...> - 2004-08-23 22:55:09
|
On Mon, 23 Aug 2004, Bardur Arantsson wrote: > I have a gut feeling it might be possible if the balancing > constraints are relaxed and we look at amortized cost > instead of worst-case case cost. But if not really sure it > can be done if you always require your trees to be > balanced. Anyway, your example of delete shows that it's > different enough from existing structures to be useful. I'm actually having thoughts here. To remove any contiguous subrange of length N, you need to remove O(log N) subtrees. The key op to removing a subrange fast is the removal of subtrees fast. Consider the following situation, you have a tree that looks like: A / \ / \ B C / \ / \ ... D E / \ / \ ... ... You want to delete E and it's entire subtree. Now, normally, you'd end up with a tree like this: A / \ / \ B C / \ / ... D / \ ... Which is horribly unbalanced. Instead, what you can do is delete C as well, so you end up with a tree like: A / \ / \ B D / \ / \ ... ... You then need to balance the tree, starting at A and working up, and reinsert C. OK, it's a weight balanced tree, so the weight of B is less than 2x the weight of the original C subtree. Call C 1/2 the weight of B, worst case. Now, assume that E is 2x the weight of D- this means D is 1/3rd the weight of C, or 1/6th the weight of B. But when you do a rotate right, B gives up it's right child. Assume B's right child is 1/2 the weight of it's left child, or 1/3rd the total weight of B. So after the rotate, B's left child will be 2/3rds the original weight of B, and B's right child (now A) will be 1/3 + 1/6 the weight of B, or 1/2 the weight of B. And Balancing constraints are maintained. Ah, here's where the balancing constraint can get violated. Assume B's right child is the heavy child, 2x the weight of B's left child, or 2/3rds the total weight of B. Then when we rotate, we move too many children over. No. In this case you only need to rotate B left before rotating the entire structure right to rebalance. Worst case you need to do two rotates and one reinsertion to rebalance the whole subtree. Humorously, the reinsertion is the expensive part, at O(log N). Deleting M elements from an N element tree is then O((log N) * (log M)). Another thought: appending and prepending one funcarr to another should be an O(log N) operation. I walk down the left or right subtree of the larger tree until I hit node with about the same weight as the smaller tree. I then do a join of those two trees, and rebalance as I walk back up. Inserting is a little bit more tricky, as I can't swap appending and prepending. Might still be doable cheaply. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |
From: Brian H. <bh...@sp...> - 2004-08-23 23:54:36
|
Further thoughts: O(log N) appending is the critical function. Delete springs from that. The algorithm works thusly: I want to delete a range a..b of nodes in the current tree. There will be some number c such that a..c are all in the left subtree, and c..b are all in the right subtree. So I delete range a..c in the left subtree, and c..b in the right subtree- each deletion returning a balanced tree. I then append to the two balanced trees, creating another balanced tree (and insert the forgotten root node if it hasn't been deleted). This is the balanced tree I then return. My brain is frying, which is why I didn't see this straight off. Code will show up tomorrow. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |
From: Brian H. <bh...@sp...> - 2004-08-23 13:46:09
|
On Sun, 22 Aug 2004, Nicolas Cannasse wrote: > > No enumerations, no to/from lists, no sorting- all things I'd want to add > > before doing an official submission. Any interest? > > Wouldn't that be a specialized version of (int,'a) PMap.t ? > No. With a (int, 'a) PMap.t, if you delete node i, you don't change the index of node i+1. With a 'a Funcarr.t, deleting node i causes node i+1 to move down to become the new node i. There are times when I want the PMap behavior, and times when I want the Funcarr behavior, but they are different behaviors. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |
From: Nicolas C. <war...@fr...> - 2004-08-23 15:52:59
|
> > > No enumerations, no to/from lists, no sorting- all things I'd want to add > > > before doing an official submission. Any interest? > > > > Wouldn't that be a specialized version of (int,'a) PMap.t ? > > > > No. With a (int, 'a) PMap.t, if you delete node i, you don't change the > index of node i+1. With a 'a Funcarr.t, deleting node i causes node i+1 > to move down to become the new node i. > > There are times when I want the PMap behavior, and times when I want the > Funcarr behavior, but they are different behaviors. That's interesting ! I'm just a little worried about the name, since in general people expect O(1) Arrays accesses. Nicolas |
From: Jim F. <ji...@fa...> - 2004-08-23 16:53:26
|
On Mon, Aug 23, 2004 at 05:51:54PM +0200, Nicolas Cannasse wrote: > That's interesting ! > I'm just a little worried about the name, since in general people expect > O(1) Arrays accesses. It is possible to create functional arrays with O(1) access in many circumstances. For example, the data structure desribed in Melissa O'Neil's PhD thesis[1], has O(1) get/set an element on the most recent version of an array, and O(log n) for prior version. (Typically when using arrays, you only get/set the most recent version of the array. Indeed, for standard O'Caml arrays this is the only option, because they are mutable there are no "prior" versions, they are destroyed whenever you call set.) I wonder if anyone has tried implementing this data structure? I suspect that it would outperform tree based implementations in many situations. I tried implementing it myself, but failed, so I don't actually know how fast it would be in practice. IIRR, I failed because I couldn't understand the explanation of the O(1) solution to the list order problem sufficiently well to implement it. It definitely seems like a useful data structure, and well suited to O'Caml, as it provides a purely functional interface, but does require side effects internally. [1] http://www.cs.hmc.edu/~oneill/research/ (See botom of page) -- Jim Farrand |
From: Diego O. F. P. <Die...@et...> - 2004-08-24 08:50:28
|
Bonjour, > It is possible to create functional arrays with O(1) access in many > circumstances. For example, the data structure desribed in Melissa > O'Neil's PhD thesis[1], has O(1) get/set an element on the most recent > version of an array, and O(log n) for prior version. > > I wonder if anyone has tried implementing this data structure ? Yes : Melissa O'Neill. She may still have the sml sources. They were based on SML/NJ standard library implementation of splay trees. > I suspect that it would outperform tree based implementations in > many situations. Yes : acces to the last version of the array. Slower for older versions and a few garbage-collecting problems. > I tried implementing it myself, but failed, so I don't actually know > how fast it would be in practice. IIRR, I failed because I couldn't > understand the explanation of the O(1) solution to the list order > problem sufficiently well to implement it. The list order problem is related to garbage collection. That is the 'hard' part. Diego Olivier |
From: Brian H. <bh...@sp...> - 2004-08-23 22:57:54
|
On Mon, 23 Aug 2004, Nicolas Cannasse wrote: > I'm just a little worried about the name, since in general people expect > O(1) Arrays accesses. O(1), or just fast? A different name would be happily accepted. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |
From: Nicolas C. <war...@fr...> - 2004-08-24 06:58:44
|
> On Mon, 23 Aug 2004, Nicolas Cannasse wrote: > > > I'm just a little worried about the name, since in general people expect > > O(1) Arrays accesses. > > O(1), or just fast? > > A different name would be happily accepted. I don't know if such a structure has already a given name in the literature but what about IndexedMap ? Nicolas |
From: Bardur A. <oca...@sc...> - 2004-08-24 07:18:10
|
On Tue, Aug 24, 2004 at 08:57:40AM +0200, Nicolas Cannasse wrote: > > On Mon, 23 Aug 2004, Nicolas Cannasse wrote: > > > > > I'm just a little worried about the name, since in general people expect > > > O(1) Arrays accesses. > > > > O(1), or just fast? > > > > A different name would be happily accepted. > > I don't know if such a structure has already a given name in the literature > but what about IndexedMap ? But IndexedMap might lead people to expect Map-like behavior which e.g. delete doesn't have. I think Funarray is fine, I think most people who really care about the complexity would actually check the documentation/implementation. I would also think that most Array users would use e.g. iter for iteration, and iter *is* O(log n + n) = O(n) which is just the same as for regular arrays. For the occasional "get", O(log n) is *practically* O(1), with 10m elements you get 24 steps, and 20m elements only costs 25 steps. Sure, it's ~25 times slower than an array access, but unless you're only doing array accesses it shouldn't really matter overall. Btw, a small feature request for Funarray (and possibly the other structures where it applies): How about a "bounded" iter function e.g. "iter_range"? The lack of such a function on Arrays/Maps/... has always annoyed me. -- Bardur Arantsson <ba...@im...> <ba...@sc...> - Lousy two-legged pants. Homer Simpson | The Simpsons |
From: Nicolas C. <war...@fr...> - 2004-08-24 07:23:26
|
> Btw, a small feature request for Funarray (and possibly > the other structures where it applies): How about a > "bounded" iter function e.g. "iter_range"? The lack of > such a function on Arrays/Maps/... has always annoyed me. and also map_range, fold_range, .... Theses ops should also be useful on all containers. Guess what ? I think it's a good job for enums :) What about : module Enum ---- val range : t -> int -> int -> t Write once, use alot Nicolas |
From: Bardur A. <oca...@sc...> - 2004-08-24 07:49:25
|
On Tue, Aug 24, 2004 at 09:22:23AM +0200, Nicolas Cannasse wrote: > > Btw, a small feature request for Funarray (and possibly > > the other structures where it applies): How about a > > "bounded" iter function e.g. "iter_range"? The lack of > > such a function on Arrays/Maps/... has always annoyed me. > > and also map_range, fold_range, .... > Theses ops should also be useful on all containers. > Guess what ? I think it's a good job for enums :) > > What about : > > module Enum > ---- > val range : t -> int -> int -> t > > Write once, use alot > Well, almost... 1) Keys are not always ints, so this interface wouldn't work for maps. (You'd have to find out what "index" a given key has in its enumeration). 2) I don't believe the current Enum.t can support this is a efficiently as one would want. For example, (Enum.range e a b) on a Funarray enum would cause b enumeration elements to be visited whereas a good implementation would only cause b-a elements to be visited. There can be a significant difference, just think of a Funarray of 1m elements where you only want to look at subranges of 10 elements at a time. I've proposed it before and I'll propose it again: Maybe ADT.enum should have *optional* parameters ?start, ?stop and ?step where they can be supported. That way you could create enumerations of subranges where they are actually supported and get compile-time errors if they are not supported (i.e. if you decide to change an underlying ADT for another which doesn't support ranges). They could also use the proper "index" types. -- Bardur Arantsson <ba...@im...> <ba...@sc...> - No beer and TV make Homer something something. Homer Simpson | The Simpsons |
From: Diego O. F. P. <Die...@et...> - 2004-08-24 07:55:08
|
Bonjour, > But IndexedMap might lead people to expect Map-like > behavior which e.g. delete doesn't have. I would have named that BinaryTreeArray or MapArray Diego Olivier |
From: Brian H. <bh...@sp...> - 2004-08-24 14:14:14
|
On Tue, 24 Aug 2004, Nicolas Cannasse wrote: > > On Mon, 23 Aug 2004, Nicolas Cannasse wrote: > > > > > I'm just a little worried about the name, since in general people expect > > > O(1) Arrays accesses. > > > > O(1), or just fast? > > > > A different name would be happily accepted. > > I don't know if such a structure has already a given name in the literature > but what about IndexedMap ? It's not really a map. Maybe IndexedTree? -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |
From: Nicolas C. <war...@fr...> - 2004-08-24 17:50:42
|
> > > > I'm just a little worried about the name, since in general people expect > > > > O(1) Arrays accesses. > > > > > > O(1), or just fast? > > > > > > A different name would be happily accepted. > > > > I don't know if such a structure has already a given name in the literature > > but what about IndexedMap ? > > It's not really a map. Maybe IndexedTree? Also ok for me. Nicolas |
From: Peter J. <pe...@jo...> - 2004-08-24 19:14:52
|
>>>>>I'm just a little worried about the name, since in general people >>>>>expect O(1) Arrays accesses. >>>> >>>>O(1), or just fast? >>>> >>>>A different name would be happily accepted. >>> >>>I don't know if such a structure has already a given name in the >>>literature but what about IndexedMap ? >> >>It's not really a map. Maybe IndexedTree? The problem with that is that it describes the implementation, rather than the interface. A tree suggests a 2D branching structure, while the FuncArray interface behaves as though it were linear. I'm not convinced having the word Array in the name is really a problem; that's the best description of the interface. Plus, if you were going to try to meet everyone's assumptions about complexity, you'd have to rename ExtList too, because Python "lists" have O(1) accesses... |
From: Jesse G. <je...@wi...> - 2004-08-24 14:22:50
|
Nicolas Cannasse wrote: [...] > That's interesting ! > I'm just a little worried about the name, since in general people expect > O(1) Arrays accesses. I think the fact that Brian publishes the theoretical Big-O scalability of every function in his interface will quickly correct any misconceptions. -- Jesse Guardiani, Systems Administrator WingNET Internet Services, P.O. Box 2605 // Cleveland, TN 37320-2605 423-559-LINK (v) 423-559-5145 (f) http://www.wingnet.net |
From: Richard J. <ri...@an...> - 2004-08-24 14:57:52
|
On Mon, Aug 23, 2004 at 05:51:54PM +0200, Nicolas Cannasse wrote: > That's interesting ! > I'm just a little worried about the name, since in general people expect > O(1) Arrays accesses. Isn't O(1) access possible with a different kind of functional array structure? One where the "head revision" of the array is stored as an array, and previous "versions" are stored as a list of differences. I vaguely recall such a structure being discussed in either Okasaki or "ML for the Working Programmer", both books having been lent to people so I can't check up the reference right now. Rich. -- Richard Jones. http://www.annexia.org/ http://www.j-london.com/ Merjis Ltd. http://www.merjis.com/ - improving website return on investment MONOLITH is an advanced framework for writing web applications in C, easier than using Perl & Java, much faster and smaller, reusable widget-based arch, database-backed, discussion, chat, calendaring: http://www.annexia.org/freeware/monolith/ |
From: Brian H. <bh...@sp...> - 2004-08-24 15:54:58
|
On Tue, 24 Aug 2004, Richard Jones wrote: > On Mon, Aug 23, 2004 at 05:51:54PM +0200, Nicolas Cannasse wrote: > > That's interesting ! > > I'm just a little worried about the name, since in general people expect > > O(1) Arrays accesses. > > Isn't O(1) access possible with a different kind of functional array > structure? One where the "head revision" of the array is stored as an > array, and previous "versions" are stored as a list of differences. How do you know when someone else has made a change to the array, and you no longer have the most recent version? The "someone else" might be a different thread, or it might just be a different function. Also, the most recent version might be garbage, and you might have different "branchs" of the same data structure. Consider the following scenario: Function a has the most recent version of a funcarr. It passes it to function b, which internally modifies the funcarr. But in returning, the new most recent version is discarded. a accesses the funcarr- it needs to know that it no longer has the most recent version. The a passes the old version of the funcarr to function c, which modifies it yet again- and now we're on a different revision branch than function b's version. And so on. I haven't read said papers yet, but the problems I can see are fearsome. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian |