From: Brian H. <bh...@sp...> - 2003-05-25 02:55:49
|
Added a create function in place of the old empty. Fixed queue_to_string while I was in there to use a string buffer instead of just endlessly concating strings. This should be more efficient. Note that priorities are still implicitly part of the data. Not that I'm opposed to making priorities just ints, but one major change at a time. Also todo: to_enum, of_enum, to_stream, and of_stream should be added. The to_* should be both by priority and by key. Brian |
From: Nicolas C. <war...@fr...> - 2003-05-26 02:07:09
|
> Also todo: to_enum, of_enum, to_stream, and of_stream should be added. > The to_* should be both by priority and by key. Even if I like Streams I think that enums are somehow much powerful and less error prone (since you can't do next() ). Adding Stream support to psqueue would imply adding it for other data structures as well ( since we went to make things consistent ). I would prefer an Enum.stream / Enum.of_stream and keep only "enum" usage into data structures. Nicolas Cannasse |
From: Brian H. <bri...@ql...> - 2003-05-26 22:36:51
|
On Mon, 26 May 2003, Nicolas Cannasse wrote: > > Also todo: to_enum, of_enum, to_stream, and of_stream should be added. > > The to_* should be both by priority and by key. > > Even if I like Streams I think that enums are somehow much powerful and less > error prone (since you can't do next() ). The more I think about this, the more I disagree. Case in point: consider taking a list of items and turning it into a balanced tree. Using the standard list operations, this is not too horribly ugly: type 'a tree_t = Void | Node of 'a tree_t * 'a * 'a tree_t let tree_from_lst lst = let rec subtree lst len = if (len > 1) then let pivot = len/2 in let left, lst' = subtree lst pivot in match lst' with h :: t -> let right, t' = subtree t (len - pivot - 1) in Node(left, h, right), t' | _ -> assert false else if (len > 0) then match lst with h :: t -> Node(Void, h, Void), t | _ -> assert false else Void, lst in let len = List.length lst in if (len > 0) then let tree, _ = subtree lst (List.length lst) in tree else Void Now consider rewritting this to take an Enum instead of a list. You have to force the algorithm into fold somehow- which means managing a handrolled stack (the above code is double non-tail recursive). Signifigantly uglier. On the other hand, using a stream is almost as bad, as you need a way to find out the number of elements in the stream before starting (above done with a call to String.length). The above code, or rather code much like it, showed up in some exploratory code I was doing to implement a interval map (see the thread in the main list). The idea being that a tree would make inserts and removals O(log N), while I could do (for example) unions by first converting both trees to a list, unioning the lists, and recreating the tree from the unioned list, list: let union a b = let rec loop alst blst accum = match alst, blst with (amin, amax):: atail, (bmin, bmax)::btail -> if (amax < bmin) then loop atail blst ((amin, amax) :: accum) else if (bmax < amin) then loop alst btail ((bmin, bmax) :: accum) else loop atail btail (((min amin bmin), (max amax bmax)) :: accum) | _, [] -> List.rev_append accum alst | [], _ -> List.rev_append accum blst | [], [] -> List.rev accum in let alist = list_from_tree a and blist = list_from_tree b in tree_from_list (loop alist blist) ;; But this requires me to instantiate both lists completely- requiring O(N) memory overhead to complete. If I could "virtualize" the construction of the lists as an enumeration or stream, I could reduce the memory requirements to O(log N). It's not hard to force list_from_tree into an enumeration. I need to fake a stack, but in this case it's cleaner- all I need for a stack is the list of nodes I need to go down the left hand side of. But forcing tree_from_list into Enum.map is neither simpler than the above code, nor less error prone. Signifigantly more complicated and error prone, I would think. > Adding Stream support to psqueue > would imply adding it for other data structures as well ( since we went to > make things consistent ). > I would prefer an Enum.stream / Enum.of_stream and keep only "enum" usage > into data structures. The more I think about, the more I think the solution is to merge Stream and Enum, and create a best-of-breed interface (probably called ExtStream). IMHO, this api would: - have an exported next, like Stream - have an exported count, like Enum - have the usual gang of suspects- map, iter, fold, etc. - next returns 'a option, with None meaning no more elements, like Stream, instead of returning 'a and throwing an exception on no more elements. For the last one, fold (for example) currently looks like: let fold f init t = let ret = ref init in let rec loop accu = loop (f (try t.next() with No_more_elements as e -> ret := accu; raise e) accu) in try loop init with No_more_elements -> !ret If next returns 'a option instead of 'a, we could rewrite it like: let fold f init t = let rec loop accu = match t.next() with None -> accu | Some x -> loop (f x accu) in loop init For sheer code clarity, I think 'a option is a cleaner response than throwing an exception. Brian |
From: Manos R. <er...@cs...> - 2003-05-26 23:07:08
|
On Mon, May 26, 2003 at 05:51:36PM -0500, Brian Hurt wrote: > > The more I think about, the more I think the solution is to merge Stream > and Enum, and create a best-of-breed interface (probably called > ExtStream). IMHO, this api would: > - have an exported next, like Stream > - have an exported count, like Enum On streams generated from list's or files, count is not a sensible operation, or am I mistaken? -- Manos |
From: Brian H. <bri...@ql...> - 2003-05-26 23:17:00
|
On Mon, 26 May 2003, Manos Renieris wrote: > On Mon, May 26, 2003 at 05:51:36PM -0500, Brian Hurt wrote: > > > > The more I think about, the more I think the solution is to merge Stream > > and Enum, and create a best-of-breed interface (probably called > > ExtStream). IMHO, this api would: > > - have an exported next, like Stream > > - have an exported count, like Enum > > On streams generated from list's or files, count is not a sensible > operation, or am I mistaken? > From lists, it's an O(N) operation but easy enough (List.length). For any datastructure where traversing the elements is not destructive, count is also easily doable. But for files, "traversing the elements" is destructive- you read the file. And if the file really is the console and the user typing stuff in, you can't just rewind and reread the file either. At which point you need to create the interim list of objects. But you're right, count needs to be an optional thing. Brian |
From: Brian H. <bri...@ql...> - 2003-05-26 23:34:28
|
Replying to myself... On Mon, 26 May 2003, Brian Hurt wrote: > But you're right, count needs to be an optional thing. > val count: ExtStream.t -> int option where a return value of Some x means the stream has x elements, and a return value of None means that count is not implemented. This has the advantage of roping the type checker in to make sure everyone checks for the non-implemented version. Now, two possibilities for doing a destructive count (i.e. one that clones the ExtStream and then reads ahead to see how many items are left). The first one is: val destructive_count : ExtStream.t -> int let destructive_count enum = match enum.count () with Some x -> x | None -> let e = ExtStream.clone enum in let rec loop count = match e.next() with None -> count | Some _ -> loop (count + 1) in loop 0 i.e. we clone and loop everytime destructive_count is called. The other one is: val destructive_count: ExtStream.t -> ExtStream.t let destructive_count enum = let e = ExtStream.clone enum in let rec loop count = match e.next () in None -> count | Some _ -> loop (count + 1) in let countref = ref (loop 0) in let next () = match enum.next () with None -> None | Some x -> decr countref; Some x and count () = !countref in ExtStream.make next count In this case, we run through the enumeration only once- and create a new enumeration which decrements a reference every call to next. There are advantages and disadvantages to each. Brian |
From: Manos R. <er...@cs...> - 2003-05-26 23:53:44
|
On Mon, May 26, 2003 at 06:49:15PM -0500, Brian Hurt wrote: > > Replying to myself... > > On Mon, 26 May 2003, Brian Hurt wrote: > > > But you're right, count needs to be an optional thing. > > > > val count: ExtStream.t -> int option > > where a return value of Some x means the stream has x elements, and a > return value of None means that count is not implemented. This has the > advantage of roping the type checker in to make sure everyone checks for > the non-implemented version. You have two types (Enum and Stream) and a clear way to distinguish them (whether count always makes sense). If you merge them, you push the distinguishing feature to runtime. Also, it's not clear what the semantics of count on Stream are. Consider Stream.of_list; does it do one traversal on creation to take the count? Does it do it on the first call to count? What if it's on a List with setcdr? Should count be treated as an expensive or an inexpensive op? I think it makes sense to create a stream from an enum, but the opposite is not true. I think we are lucky count is there to remind us. If you agree with this, then the two types should not be unified. -- Manos |
From: Nicolas C. <war...@fr...> - 2003-05-27 07:05:20
|
> You have two types (Enum and Stream) and a clear way to distinguish > them (whether count always makes sense). If you merge them, you push > the distinguishing feature to runtime. > > Also, it's not clear what the semantics of count on Stream are. Consider > Stream.of_list; does it do one traversal on creation to take the count? > Does it do it on the first call to count? What if it's on a List with > setcdr? Should count be treated as an expensive or an inexpensive op? > > I think it makes sense to create a stream from an enum, but the opposite > is not true. I think we are lucky count is there to remind us. If you > agree with this, then the two types should not be unified. I totally agree with you here. The only thing that should be mentionned, is that you can always use Enum instead of Stream, because it's better : - if the user doesn't need count, then it's better because he still can use all functionnal ops in Enum , as well as of_enum implementations in several data structures - if the user does need count, then Enum is providing the best way of doing it (that is : actually really "counting" elements, one by one) in an efficient way (see Enum.force) since only the first call to count will be costly. Nicolas Cannasse |
From: Nicolas C. <war...@fr...> - 2003-05-27 06:58:02
|
> > Even if I like Streams I think that enums are somehow much powerful and less > > error prone (since you can't do next() ). > > The more I think about this, the more I disagree. Case in point: consider > taking a list of items and turning it into a balanced tree. Using the > standard list operations, this is not too horribly ugly: > > > type 'a tree_t = Void > | Node of 'a tree_t * 'a * 'a tree_t Errr... I didn't think about recursive structures. It's true that having Enum.next() would be here very better. So ok let's add : val next : 'a Enum.t -> 'a option (updated CVS) > Now consider rewritting this to take an Enum instead of a list. You have > to force the algorithm into fold somehow- which means managing a > handrolled stack (the above code is double non-tail recursive). Since you're building a tree you don't need to be tail rec ( the stack will grow in O(log n) ! ) > For sheer code clarity, I think 'a option is a cleaner response than > throwing an exception. I don't agree here : - the new "exported" next is returning 'a option and that's a good thing ( no need to catch exception, code is more clear ) - BUT the internal next is most of the time "concatening" the next functions in a purely functional way, so for example after two Enum.maps and doing an Enum.iter, you'll have to check the None/Some three times ! Raising an exception which propagate to the upper level is really a better choice. - this choice is only making the code a little harder for the Enum modules, other modules will never have to catch No_more_elements. I think this is a good compromise. Nicolas Cannasse |
From: Brian H. <bri...@ql...> - 2003-05-27 18:54:54
|
On Tue, 27 May 2003, Nicolas Cannasse wrote: > > Now consider rewritting this to take an Enum instead of a list. You have > > to force the algorithm into fold somehow- which means managing a > > handrolled stack (the above code is double non-tail recursive). > > Since you're building a tree you don't need to be tail rec ( the stack will > grow in O(log n) ! ) Yeah. Building a stack isn't the problem- in fact, I have to do that for tree_to_enum. It's the complexity of the stack. > > > For sheer code clarity, I think 'a option is a cleaner response than > > throwing an exception. > > I don't agree here : > - BUT the internal next is most of the time "concatening" the next functions > in a purely functional way, so for example after two Enum.maps and doing an > Enum.iter, you'll have to check the None/Some three times ! For Enum.iter this is correct. However, for doing just about anything else you need to try/catch the exception the same number of times you need to check for None/Some. And I would expect map and fold to be at least as important as iter. In both cases the expected case is for it not to be the end of the enumeration- for the exception not to be thrown, or for Some x to be returned. I don't know how expensive setting up a try/catch block is, but since the branch between None/Some is a highly predictable branch, the cost of the branch is 1 clock cycle *or less*. Add another clock cycle for the memory load (from L1 cache- the word was just allocated and filled). Below the level that would be measurable in any real-world case, would be my guess. In either case, I think this is a premature optimization. Plus there is the problem of having the exported next behaving differently from the imported next- a likely source of confusion. Brian |
From: Nicolas C. <war...@fr...> - 2003-05-28 09:06:56
|
> > I don't agree here : > > - BUT the internal next is most of the time "concatening" the next functions > > in a purely functional way, so for example after two Enum.maps and doing an > > Enum.iter, you'll have to check the None/Some three times ! > > For Enum.iter this is correct. However, for doing just about anything > else you need to try/catch the exception the same number of times you need > to check for None/Some. And I would expect map and fold to be at least as > important as iter. I don't think so... here's current Enum.map implementation : let map f t = { count = t.count; next = (fun () -> f (t.next())); } Every time we're catching only the exception one at the very end, since the map doesn't check for it, and the fold is only used to "finalize" an d Enum. Common usage of enums is the following : let e = DataStructure.enum d in let e = Enum.map f e in ... several maps / filter .... let d = DataStructure.of_enum d (* this one will call Enum.fold or Enum.iter *) In this sequence, there is only 1 catch for the exception if you're using Enum.iter or 2 if you're using Enum.fold , and it does not depends of the number of maps / filters you have done ! Here will be a map with next returning 'a option : let map f t = { count = t.count; next = (fun () -> match t.next() with None -> None | Some e -> Some (f e)); } and thus matching and allocating blocks all around... Exceptions are really cheap in ocaml, better than pattern matching since the caml stay unwinding is very efficient. Nicolas Cannasse |
From: Brian H. <bri...@ql...> - 2003-05-28 16:12:28
|
On Wed, 28 May 2003, Nicolas Cannasse wrote: > > > I don't agree here : > > > - BUT the internal next is most of the time "concatening" the next > functions > > > in a purely functional way, so for example after two Enum.maps and doing > an > > > Enum.iter, you'll have to check the None/Some three times ! > > > > For Enum.iter this is correct. However, for doing just about anything > > else you need to try/catch the exception the same number of times you need > > to check for None/Some. And I would expect map and fold to be at least as > > important as iter. > > I don't think so... here's current Enum.map implementation : Oh. You're right- I forgot Enum.map was O(1). I was mainly thinking of fold, and tossed map in as well without thinking about it. > In this sequence, there is only 1 catch for the exception if you're using > Enum.iter or 2 if you're using Enum.fold , and it does not depends of the > number of maps / filters you have done ! Here will be a map with next > returning 'a option : Enum.fold has to set up an exception stack every time it calls next. It only catches an exception once- but it has to set up the stack frame every single time. And that costs. > Exceptions are really cheap in ocaml, better than pattern matching since the > caml stay unwinding is very efficient. Attached are a pair of simple benchmarks I whipped up. Basically, they compare the relative costs of raising an exception vr.s returing None/Some. I checked the output, and compiling with "ocamlopt -o test test1.ml" (or 2, as the case may be) doesn't "overoptimize" the test. The file called test1.ml uses exceptions, and runs in 2.18 seconds on my system. The file called test2.ml uses Some/None, and runs in 1.3 seconds. Exceptions are fast. Just not as fast as allocating a word of memory and taking a branch. Playing around with the code a little bit leads me to beleive that 0.7 seconds are program overhead, so as an approximation, catching an exception is 2.5x as expensive as Some/None. Rough approximation here. But it's actually a lot closer than I would have guessed. In either case, interface harmonization is a more important issue IMHO. The exported definition of next should not be any more complicated than: let next t = t.next () Brian |
From: Nicolas C. <war...@fr...> - 2003-05-29 02:11:00
|
> Enum.fold has to set up an exception stack every time it calls next. It > only catches an exception once- but it has to set up the stack frame every > single time. And that costs. You're benchmark is just comparing one try...catch with one None/Some matching. When you are doing some O(1) functional operations you have to do 1 try...catch and 'N' None/Some matching. But according to your results I modified the implementation of fold in order not to setup a stack frame every time : let fold f init t = let acc = ref init in let rec loop() = acc := f (t.next()) !acc; loop() in try loop() with No_more_elements -> !acc It should now be very efficient. > In either case, interface harmonization is a more important issue IMHO. > The exported definition of next should not be any more complicated than: > let next t = t.next () I already told several times about this : if we let escape the No_more_elements exception outside of the API, then maybe one iter upper in the call stack will catch it and break iteration , so if the user fail to catch it properly when calling next() it can cause very error prone behaviors since he won't be noticed about the exception being raised. exemple : let e1 = List.enum [1;2;3] let e2 = List.enum [] let combine x = (x , Enum.next e2) let l = Enum.of_list (Enum.map combine e1) => [] So we have to keep the current next implementation. Nicolas Cannasse |
From: Brian H. <bri...@ql...> - 2003-05-29 18:44:30
|
Sorry, I just had an epiphany on this issue. With enums we always have a good count, right? I mean, we're not implementing my "-1 means we don't know how many elements are in the enumeration". The enum code should use this and never cause an exception to be thrown. Therefor it doesn't need to catch the exceptions either. let iter f e = let c = e.count() in for i = 1 to c do f (e.next()) done let fold f x e = let rec loop c accu = if c <= 0 then accu else loop (c-1) (f (e.next ()) accu) in loop (e.count ()) x Calling next when count == 0, when there are no more elements, is an *error*. Of course it throws an exception. But the Enum code shouldn't do that. And if next doesn't have any more elements and count > 0, then something exceptional did happen. The above code strikes me as being clean and nice. And it solves your problem of an exception from next being "accidentally" caught at a higher level, and enum doesn't catch exceptions. And Enum.next has the same interface as the next passed into Enum.make has. The above code strikes me as being clean and easy to understand. Two comments on this: 1) on some data structures, calling count will be expensive- for example, on lists it's O(N). 2) Your example changes behavior: > > exemple : > > let e1 = List.enum [1;2;3] > let e2 = List.enum [] > > let combine x = > (x , Enum.next e2) > > let l = Enum.of_list (Enum.map combine e1) > > => [] Here, instead of returning the empty list, the above code throws an uncaught exception. Which is actually, I think, the correct thing to do. What you wanted to express above is, I think: let combine a b = a, b in let l = List.of_enum (Enum.map2 combine e1 e2) But this also allows you to do: let combine x = try (x, Enum.next e2) with No_more_elements -> x, 0 in let l = List.of_enum (Enum.map combine e1) As a general note, data structures we add in the future should have fast (O(1)) length routines, even at the cost of a little more bookkeeping. Not a lot we can do with lists at this point, but if everything else should be fast. And at worst case we spin through the data structure twice. This actually encourages piling multiple maps onto a single list, and not actually instantiating the intermediate lists. Sort of the cheap version of lazy evaluation. Brian |
From: Alan P. <ap...@re...> - 2003-05-31 21:43:06
|
In article <Pin...@ea...>, Brian Hurt wrote: > > Sorry, I just had an epiphany on this issue. With enums we always have a > good count, right? I mean, we're not implementing my "-1 means we don't > know how many elements are in the enumeration". > > The enum code should use this and never cause an exception to be thrown. > Therefor it doesn't need to catch the exceptions either. > > let iter f e = > let c = e.count() in > for i = 1 to c do > f (e.next()) > done > > let fold f x e = > let rec loop c accu = > if c <= 0 then accu > else loop (c-1) (f (e.next ()) accu) > in > loop (e.count ()) x This undoes the lazy nature of Enum.filter, Enum.filter_map, and even more seriously, input_enum and input_char_enum. Basically, anywhere the code uses Enum.from will be a problem when relying on Enum.count. |
From: Brian H. <bri...@ql...> - 2003-06-01 19:49:12
|
On Thu, 29 May 2003, Alan Post wrote: > > let iter f e = > > let c = e.count() in > > for i = 1 to c do > > f (e.next()) > > done > > > This undoes the lazy nature of Enum.filter, Enum.filter_map, and even > more seriously, input_enum and input_char_enum. Basically, anywhere > the code uses Enum.from will be a problem when relying on Enum.count. > This doesn't change the lazy nature of map- map remains the same. Calling count on the result of Enum.from does force the entire enumeration into memory- granted. Which is why I thought of the has_next() idea. Which means we only need to be one element ahead ever. Personally, I'd also like a way for count to be able to say "I don't know how many elements are in the enumeration". But I got shot down on that last time around. Brian |