From: Christophe T. <chr...@us...> - 2005-05-25 17:46:57
Attachments:
dllist.mli
|
Hi, Is there a good reason (aside from implementation easiness) not to allow Dllist to be empty? IMHO, empty lists are very useful and it is rather an hindrance that Dllists cannot be. To have an idea of what I have in mind, see the attached interface. Comments are welcome. Regards, ChriS -- P.S. Following the standard library, [create] should be used for _uninitialized_ structures. Here [make] is more standard. |
From: Nicolas C. <war...@fr...> - 2005-05-26 02:18:53
|
> Hi, > > Is there a good reason (aside from implementation easiness) not to > allow Dllist to be empty? IMHO, empty lists are very useful and it is > rather an hindrance that Dllists cannot be. > > To have an idea of what I have in mind, see the attached interface. > Comments are welcome. > > Regards, > ChriS I can't tell exactly since I neither wrote or use Dlllist. Could the Dlllist authors comment on this proposal ? Nicolas |
From: Brian H. <bh...@sp...> - 2005-05-27 02:26:22
|
On Wed, 25 May 2005, Christophe TROESTLER wrote: > Hi, > > Is there a good reason (aside from implementation easiness) not to > allow Dllist to be empty? IMHO, empty lists are very useful and it is > rather an hindrance that Dllists cannot be. My original implementation did allow for empty lists, but it got overridden. Brian |
From: John S. <sk...@us...> - 2005-05-27 12:55:21
|
On Fri, 2005-05-27 at 12:06 +0200, Christophe TROESTLER wrote: > On Thu, 26 May 2005, Brian Hurt <bh...@sp...> wrote: > > > > On Wed, 25 May 2005, Christophe TROESTLER wrote: > > > > > Hi, > > > > > > Is there a good reason (aside from implementation easiness) not to > > > allow Dllist to be empty? IMHO, empty lists are very useful and it is > > > rather an hindrance that Dllists cannot be. > > > > My original implementation did allow for empty lists, but it got > > overridden. > > Where? The CVS version does not seem to allow it. Originally, the dlists had 'header' objects. I think I'm responsible for arguing that this is too complex, the lists should be made from a single simple data structure. As such, an empty list cannot be represented: use dlist opt if you want possibly empty dlists. -- John Skaller, skaller at users.sf.net PO Box 401 Glebe, NSW 2037, Australia Ph:61-2-96600850 Download Felix here: http://felix.sf.net |
From: John S. <sk...@us...> - 2005-05-27 23:35:19
|
On Fri, 2005-05-27 at 16:23 +0200, Christophe TROESTLER wrote: > On Fri, 27 May 2005, John Skaller wrote: > Can you explain a bit more your point? In C notation slists are made of struct slist { T data; slist *next; }; and dlists are made of struct { T data; dlist * next; dlist *prev; }; neither can represent an empty list in Ocaml. The library supplies the maximally efficient and simplest representation by Occam's Razor, and that cannot be empty. It isn't a bug, but a consequence of the mathematics. Really, what is represented is a doubly linked list node (not a doubly linked list). -- John Skaller, skaller at users.sf.net PO Box 401 Glebe, NSW 2037, Australia Ph:61-2-96600850 Download Felix here: http://felix.sf.net |
From: John S. <sk...@us...> - 2005-05-28 19:24:51
|
On Sat, 2005-05-28 at 13:44 +0200, Christophe TROESTLER wrote: > On Sat, 28 May 2005, John Skaller <sk...@us...> wrote: > > > > The library supplies the maximally efficient and simplest > > representation by Occam's Razor, and that cannot be empty. > > So it all boils down to efficiency. Well my view is that a *base* library provides simple efficient things -- it is oriented to the computer NOT the user. >From these basic libraries, more abstract and more 'user friendly' interfaces can be implemented. > (Hereafter, Dllist' refers to this > library and Dllist to Extlib one. The Benchmark module is used.) That's kind of confusing .. :) What you have is the same node structure, but have added a 'head' element. The problem with doing that is that you can't so easily chop bits of the list up, or splice bits of a list together. For example, consider two lists A and B, and you join them to create a list C. What happens to A and B? With the current implementation as I understand it, there isn't an issue, A, B, and C are now all the one list. With your implementation the operation is impossible. Joining A to B is a destructive operation: A and B don't exist aftwards. Sorry I am not explaining this well. -- John Skaller, skaller at users.sf.net PO Box 401 Glebe, NSW 2037, Australia Ph:61-2-96600850 Download Felix here: http://felix.sf.net |
From: Christophe T. <chr...@us...> - 2005-05-29 14:26:40
|
On Sun, 29 May 2005, John Skaller <sk...@us...> wrote: > > Well my view is that a *base* library provides simple efficient > things -- it is oriented to the computer NOT the user. In the first mail your justification calls upon "mathematics" (in a flawed way IMHO), now you refer to the computer. Aren't these at the opposite ends of the spectrum? IMO, the interface is for the user (thus must be mathematically well defined, simple yet powerful) and the implementation is for the computer (thus allowed to use tricks to achieve efficiency). > ExtLib is positioned as a replacement system library. Precisely! If you look at the std lib, it does not provide e.g. balanced binary trees leaving to you the task of implementing Map and Set. It provides data-structure interfaces, not down to the metal low-level libraries (use C if that's what you are after). > The point is that you can construct a 'container' design from a > 'nodes-only' design with a wrapper, but not the other way around, You haven't much looked at the code, have you. If you had, you would have noticed that the wrapper would only use the node data-structure, the code for all functions being different enough to mandate rewriting them. Also, from a point of view of efficiency, providing efficient low level data-structures that become inefficient (compared to a direct implementation) when wrapped to do what the user really wants is a waste of computer and human resources. > For example, consider two lists A and B, and you join them to create > a list C. What happens to A and B? With the current implementation > as I understand it, there isn't an issue, A, B, and C are now all > the one list. It is quite funny to take the [splice] function as a justification for your design when, IMO, its utility is far lower than the possibility of having empty lists. In fact [splice l1 l2] will leave [l1] and [l2] in invalid states when [l1] and [l2] are lists _with_ends_. It is not the case for _circular_ lists -- but given the name of the module, it is my understanding that the doubly-linked list is what is targeted. A circular list module [CircularList] can also be built (and different tricks may be imagined for its implementation). In the end, it all boils down to which ADT we want to build. One wants to use a library for the operations it provides (in an efficient manner) not because it is nice to the computer. > Also, I can't contribute any experience with the particular > implementation in Extlib, because I don't use it. But eventually intend to? ChriS |
From: John S. <sk...@us...> - 2005-05-30 01:37:59
|
On Sun, 2005-05-29 at 16:25 +0200, Christophe TROESTLER wrote: > You haven't much looked at the code, have you. No, i haven't. > Also, from a point of view of efficiency, providing efficient low > level data-structures that become inefficient (compared to a direct > implementation) when wrapped to do what the user really wants is a > waste of computer and human resources. No dispute. The problem is deciding 'what the user really wants'. For some interfaces that is easy, for others it is not. Just try to design a graph library for example :) > It is quite funny to take the [splice] function as a justification for > your design when, IMO, its utility is far lower than the possibility > of having empty lists. In your opinion. However it is NOT my design. I suggested to the original author a nodes-only interface would be more appropriate. I am simply stating it again as best I can, so you can understand the rationale. I'm not claiming it is the best design for Extlib, I'm just explaining that there is a reason it is the way it is. Reopening the discussion is fine by me. If there is a decision to replace the existing code, that is fine by me too. I don't use dlist, it won't affect me. In fact I don't use extlib either (it's too heavy to include in source form in my product). > In the end, it all boils down to which ADT we want to build. Yes. I agree. Except I'd say 'ADTS' plural. > > Also, I can't contribute any experience with the particular > > implementation in Extlib, because I don't use it. > > But eventually intend to? Not in the Felix project, unless INRIA adopts it, which was the original hope. For some other Ocaml development, I may well use it. Part of the reason is enums: they're central to the design of extlib, they're a great idea, the implementation is probably efficient .. but I have some misgivings about the interface. So I'm in the interesting position I think enums are important and useful but not quite right: roughly speaking they confuse forward and input iterators, which C++ takes pains to distinguish: input iterators MUST use eager semantics (or at least schedulable semantics), whereas forward iterators would use lazy semantics. -- John Skaller, skaller at users.sf.net PO Box 401 Glebe, NSW 2037, Australia Ph:61-2-96600850 Download Felix here: http://felix.sf.net |
From: Christophe T. <chr...@us...> - 2005-05-28 11:48:38
Attachments:
dllist.mli
dllist.ml
|
On Sat, 28 May 2005, John Skaller <sk...@us...> wrote: > > The library supplies the maximally efficient and simplest > representation by Occam's Razor, and that cannot be empty. So it all boils down to efficiency. Of course the problem -- as we experience with the shootout -- is that it is not so easy to measure (because we are talking of the big-O constant, not about asymptotic complexity). So instead of discussing, I propose to experiment! Attached is an updated version of the interface together with an implementation (hopefully without mistakes :). According to my little experiments, it is not so bad. (Hereafter, Dllist' refers to this library and Dllist to Extlib one. The Benchmark module is used.) Example 1: inserting 10000 elements using Dllist'.append and ---------- Dllist.add (they don't exactly do the same thing but Dllist.add doesn't need the "list" to be handled through a reference): Throughputs for Dllist' running 4 times for at least 4 CPU seconds: Dllist': 7 WALL ( 6.25 usr + 0.31 sys = 6.56 CPU) @ 185.98/s (n=1220) 7 WALL ( 6.25 usr + 0.29 sys = 6.54 CPU) @ 186.54/s (n=1220) 7 WALL ( 5.62 usr + 0.30 sys = 5.92 CPU) @ 206.08/s (n=1220) 4 WALL ( 3.92 usr + 0.22 sys = 4.14 CPU) @ 294.69/s (n=1220) Throughputs for Extlib running 4 times for at least 4 CPU seconds: Extlib: 5 WALL ( 4.02 usr + 0.19 sys = 4.21 CPU) @ 187.41/s (n=789) 4 WALL ( 4.01 usr + 0.21 sys = 4.22 CPU) @ 186.97/s (n=789) 4 WALL ( 4.00 usr + 0.20 sys = 4.20 CPU) @ 187.86/s (n=789) 5 WALL ( 4.02 usr + 0.18 sys = 4.20 CPU) @ 187.86/s (n=789) Rate Extlib Dllist' Extlib 188+- 0/s -- [-14%] Dllist' 218+-43/s [16%] -- (The results are between brackets because the variability in the measures does not confirm that the rates are significantly different. The experiment should be repeated more times. A full_major garbage collection was done between the two tests.) Example 2: Inserting 10000 elements and removing them with ---------- Dllist'.remove_last and Dllist.remove (again same canva applies): Throughputs for Dllist' running 4 times for at least 4 CPU seconds: Dllist': 8 WALL ( 7.29 usr + 0.25 sys = 7.54 CPU) @ 295.36/s (n=2227) 6 WALL ( 5.72 usr + 0.21 sys = 5.93 CPU) @ 375.55/s (n=2227) 5 WALL ( 4.85 usr + 0.16 sys = 5.01 CPU) @ 444.51/s (n=2227) 5 WALL ( 3.97 usr + 0.15 sys = 4.12 CPU) @ 540.53/s (n=2227) Throughputs for Extlib running 4 times for at least 4 CPU seconds: Extlib: 5 WALL ( 4.03 usr + 0.19 sys = 4.22 CPU) @ 152.84/s (n=645) 4 WALL ( 3.98 usr + 0.15 sys = 4.13 CPU) @ 156.17/s (n=645) 5 WALL ( 3.96 usr + 0.17 sys = 4.13 CPU) @ 156.17/s (n=645) 4 WALL ( 3.95 usr + 0.17 sys = 4.12 CPU) @ 156.55/s (n=645) Rate Extlib Dllist' Extlib 155+- 1/s -- -62% Dllist' 414+-86/s 166% -- You will probably not agree with the above results -- and maybe I made mistakes in my hastiness -- but the code is there to discuss upon. Cheers, ChriS |
From: Brian H. <bh...@sp...> - 2005-05-28 20:37:26
|
On Sun, 29 May 2005, John Skaller wrote: > Well my view is that a *base* library provides simple > efficient things -- it is oriented to the computer NOT the user. This is where you and I part company. Computer cycles are cheap compared to human cycles. > The problem with doing that is that you can't so easily > chop bits of the list up, or splice bits of a list together. Yep. This is a problem with imperitive data structures. > > For example, consider two lists A and B, and you join them > to create a list C. What happens to A and B? A and B are now invalid lists. Unless A is C. Note that you can leave A and B in valid states. Consider my example implementation I posted earlier, I couple implement the O(1) append like this: let append a b = match a.tail, b.head with | None, _ -> a.head <- b.head; a.tail <- b.tail; b.head <- None; b.tail <- None; () | _, None -> () | Some(x), Some(y) -> x.next <- y; y.prev <- x; a.tail <- b.tail; b.head <- None; b.tail <- None; () ;; Conceptually, all the elements of B are removed from it an appended to A. Yes, it's a destructive operation. So? Welcome to the imperitive world. Brian |
From: John S. <sk...@us...> - 2005-05-29 01:52:17
|
On Sat, 2005-05-28 at 15:39 -0500, Brian Hurt wrote: > > On Sun, 29 May 2005, John Skaller wrote: > > > Well my view is that a *base* library provides simple > > efficient things -- it is oriented to the computer NOT the user. > > This is where you and I part company. OK. > Computer cycles are cheap compared > to human cycles. We don't part company on that statement however. There is actually reason for orienting the library components to what is simple for the machine, as well as plenty of experience that it is a good idea, precisely to make subsequent human use simple. So there is no dispute on goals, merely on how to achieve them. The bottom line is that library design is difficult. It is always hard to choose between a lean, simple, efficient code, and a richer and more immediately useful one. In my opinion .. and it is only that .. programming language standard libraries should not pre-empt the building of the richer, easier to use but more application specific style of library -- they should instead wrap up only low level concepts. ExtLib is positioned as a replacement system library. > > The problem with doing that is that you can't so easily > > chop bits of the list up, or splice bits of a list together. > > Yep. This is a problem with imperitive data structures. > > > > > For example, consider two lists A and B, and you join them > > to create a list C. What happens to A and B? > > A and B are now invalid lists. Unless A is C. With the container design. But this doesn't happen with the nodes only design. That's the point -- all lists are always valid at all times. > Note that you can leave A and B in valid states. Yes, you can in this case, the issue isn't if they're valid, but what they *should* represent. That issue doesn't arise with the nodes only design. > Conceptually, all the elements of B are removed from it an appended to A. > > Yes, it's a destructive operation. So? Welcome to the imperitive world. No, the 'so' is that (a) arbitrary decisions have to be made, this is not the case for append (its clear you're emptying list B when you append it to A), but some other cut and paste operations leave one wondering which lists contain what, and (b) maintaining coherence turns out to be very expensive. The point is that you can construct a 'container' design from a 'nodes-only' design with a wrapper, but not the other way around, so the nodes-only design is the right one to implement in a low level library by Occams razor. Note that I am NOT saying this is any kind of proof the right choice was made -- I'm just saying there is a rational basis for *both* choices. In fact, I'll bet there exist more than just these two designs .. Also, I can't contribute any experience with the particular implementation in Extlib, because I don't use it. -- John Skaller, skaller at users.sf.net PO Box 401 Glebe, NSW 2037, Australia Ph:61-2-96600850 Download Felix here: http://felix.sf.net |
From: Brian H. <bh...@sp...> - 2005-05-29 04:08:46
|
On Sun, 29 May 2005, John Skaller wrote: >> Computer cycles are cheap compared >> to human cycles. > > We don't part company on that statement however. > > There is actually reason for orienting the library > components to what is simple for the machine, > as well as plenty of experience that it is a good idea, > precisely to make subsequent human use simple. That's what I meant. I prefer to optimize for human use over computer use. And I don't sweat squezing every last cycle out of my code. If I did, I'd be writting in assembler, or C at the most, not Ocaml. > So there is no dispute on goals, merely on how to > achieve them. No, there is a dispute on goals. Easy for humans to use, or easy for computers to use? Or to put it a different way: > > The bottom line is that library design is difficult. > It is always hard to choose between a lean, simple, > efficient code, and a richer and more immediately > useful one. Yep. And I'm down on the richer and more immediately usefull side. > > In my opinion .. and it is only that .. programming language > standard libraries should not pre-empt the building of > the richer, easier to use but more application specific > style of library -- they should instead wrap up only > low level concepts. And if the library was written in Ocaml, it can be rewritten in Ocaml as well. I've yet to see a library in extlib that'd take me more than a day to recreate, worst case. And if I'm recreating the library for a special purpose, often times it takes less time than that, as I only recreate those functions I actually need. Hell, I wrote the central data structure definitions and two core functions as throw-aways into an email. > > ExtLib is positioned as a replacement system library. I'm more of the Java school of system libraries than the C school. > With the container design. But this doesn't happen with the > nodes only design. That's the point -- all lists are always > valid at all times. Define valid. By one definition, in the above example all lists are always valid too. Having no elements does not mean the list is invalid. If you mean that some other code somewhere can get confused because list B changed- that can happen in both implementations. > >> Note that you can leave A and B in valid states. > > Yes, you can in this case, the issue isn't if they're > valid, but what they *should* represent. That issue > doesn't arise with the nodes only design. Yes it does, it's just hidden behind the introduced complexity. In appending B to A, you've changed list B. In my implementation, you empty B. In your implementation, you've appended list A to list B. In both cases, you have still changed list B. Worse yet, now changes to list B also change list A. By the act of appending the two lists, I've created an identity that didn't exist before. We can argue til the cows come home which behavior is better- but you can't pretend that the node-based implementation doesn't have side-effect problems. > (a) arbitrary decisions have to be made, > this is not the case for append (its clear you're emptying list B > when you append it to A), but some other cut and paste operations > leave one wondering which lists contain what, and Yep. And these decisions need to be made no matter what basic design we go with. And they should be consistent, and they should be documented. > > (b) maintaining coherence turns out to be very expensive. > Yep. On the other hand, if you want coherence, you should probably be using an applicative data structure. For imperitive data structures, I'd be inclined to be maximally destructive, because it's likely the destructive variation is what is wanted. > The point is that you can construct a 'container' design > from a 'nodes-only' design with a wrapper, but not the > other way around, so the nodes-only design is the right > one to implement in a low level library by Occams razor. And my point is that if I have to write a wrapper library, I won't use the base library because reimplementing it isn't that hard. Brian |
From: Christophe T. <Chr...@um...> - 2005-05-27 18:48:58
|
On Fri, 27 May 2005, John Skaller wrote: > > On Fri, 2005-05-27 at 12:06 +0200, Christophe TROESTLER wrote: > > On Thu, 26 May 2005, Brian Hurt <bh...@sp...> wrote: > > > > > > On Wed, 25 May 2005, Christophe TROESTLER wrote: > > > > > > > Is there a good reason (aside from implementation easiness) > > > > not to allow Dllist to be empty? IMHO, empty lists are very > > > > useful and it is rather an hindrance that Dllists cannot be. > > > > > > My original implementation did allow for empty lists, but it got > > > overridden. > > > > Where? The CVS version does not seem to allow it. > > Originally, the dlists had 'header' objects. No sure what you mean precisely by that. > I think I'm responsible for arguing that this is too complex, the > lists should be made from a single simple data structure. Can you explain a bit more your point? IMO, when one designs a library for wide use, one should take care about simplicity of the interface, usability and (if relevant) performance. If that means a bit more of coding, that's not a big deal. Here the usability is at stake: I think a [mutable,...] list that cannot be empty is not nearly as useful as one that can. > As such, an empty list cannot be represented: use dlist opt > if you want possibly empty dlists. And redefine every single function of the module to cope with that! That defeats the purpose of using the library doesn't it. ChriS |
From: Christophe T. <chr...@us...> - 2005-05-27 10:08:39
|
On Thu, 26 May 2005, Brian Hurt <bh...@sp...> wrote: > > On Wed, 25 May 2005, Christophe TROESTLER wrote: > > > Hi, > > > > Is there a good reason (aside from implementation easiness) not to > > allow Dllist to be empty? IMHO, empty lists are very useful and it is > > rather an hindrance that Dllists cannot be. > > My original implementation did allow for empty lists, but it got > overridden. Where? The CVS version does not seem to allow it. As an example, consider: # let l = create 1 in demote l; length l;; - : int = 1 (hum). How do you create an empty list? ChriS |
From: Brian H. <bh...@sp...> - 2005-05-28 15:02:47
|
On Fri, 27 May 2005, Christophe TROESTLER wrote: > On Thu, 26 May 2005, Brian Hurt <bh...@sp...> wrote: >> >> On Wed, 25 May 2005, Christophe TROESTLER wrote: >> >>> Hi, >>> >>> Is there a good reason (aside from implementation easiness) not to >>> allow Dllist to be empty? IMHO, empty lists are very useful and it is >>> rather an hindrance that Dllists cannot be. >> >> My original implementation did allow for empty lists, but it got >> overridden. > > Where? The CVS version does not seem to allow it. As an example, > consider: > > # let l = create 1 in > demote l; > length l;; > - : int = 1 > > (hum). How do you create an empty list? If I recall correctly, the original implementation of dllist I submitted way back when worked like this: type 'a node_t = { data: 'a; mutable next: 'a node_t; mutable prev: 'a node_t; };; type 'a t = { mutable head: 'a node_t option; mutable tail: 'a node_t option; };; let empty () = { head = None; tail = None };; let prepend x lst = let rec newnode = { data = x; next = newnode; prev = newnode } in match lst.head with | Some(node) -> newnode.next <- node; node.prev <- newnode; lst.head <- Some(newnode) | None -> let opt = Some(newnode) in lst.tail <- opt; lst.head <- opt ;; Which allows for the obvious empty list representation. It got overrulled in an attempt to save a couple of clock of cycles and maybe a few bytes. Having fought this battle before, at length, and lost, I just said "yeah, sure, whatever". Brian |