From: Nicolas C. <war...@fr...> - 2003-06-04 02:38:39

Hi list,

After the recent talks about enum features, I have made a major update.

First, I started to think about the "clone" implementation: it was actually
very complicated and pretty inefficient. Cloning an enumeration built on a
list or an array should be O(1), since we only need a different pointer into
the list, or a copy of the current index for an array. Clone was trying its
best by using a shared list as a cache between the clone and the original,
but this was costly in the end, and doing several clones added more and more
overhead. So I decided to add clone:(unit -> 'a t) as an argument to the
Enum.make function. This way, we can have an efficient clone when one is
available. Enum.from then gets a default clone which calls "force" and then
clone() again; since force builds the list of elements, it can provide an
O(1) clone function.

Now that the clone issues have been resolved, I have added peek (the real
one, which doesn't fetch any data) and "empty : 'a t -> bool". To satisfy
people who care about the cost of count(), I have added a boolean flag
"fast_count" that gives a hint about that cost. So we have, for example:

    let empty t =
        if fast_count t then t.count() = 0 else peek t = None

Please note that fast_count doesn't mean O(1), since it depends on the
underlying implementation of the count function given to make.

After that, I updated the *.enum functions of DynArray, ExtList and
ExtString to provide a good clone implementation. I also improved the
count() of ExtList.enum with a little hack so that List.length is called
only once.

My next thought is about making "clone" an optional parameter of Enum.make.
Most of the time, when you create a data structure with Enum.make and
provide next() and count() functions, clone() can be implemented
efficiently, but it requires a little thinking and some recursion, which
makes Enum.make a bit more (too much?) difficult to use for the average
user. So perhaps clone should be "inefficient by default", and the user
would only provide an efficient primitive if he actually needs it.

Nicolas Cannasse
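[Editor's note: a minimal sketch of what an O(1) clone for a list-backed
enumeration could look like, assuming Enum.make accepts the ~next, ~count
and new ~clone arguments described above; the exact labels may differ from
the committed code.]

    (* Sketch only: clone just shares the current tail pointer, so it is
       O(1); count stays lazy and is only O(n) when actually called. *)
    let rec enum_of_list l =
        let cur = ref l in
        Enum.make
            ~next:(fun () ->
                match !cur with
                | [] -> raise Enum.No_more_elements
                | x :: rest -> cur := rest; x)
            ~count:(fun () -> List.length !cur)
            ~clone:(fun () -> enum_of_list !cur)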
From: Alan P. <ap...@re...> - 2003-06-03 02:10:18

In article <Pin...@ea...>, Brian Hurt wrote:

> That depends upon the program. I'd be most worried about 10+ maps on
> 10_000_000 elements myself, with non-trivial map functions. That's where
> Enum really starts to shine, as you do wavefront processing, and where
> tail recursion stops being an optimization and starts being required for
> correctness. This is the problem with microbenchmarks.
>
> I'll see if I can't whip something up.

Yes, real-world applications are better than benchmarks. I suggest,
however, that benchmarks are better than speculation.
From: Nicolas C. <war...@fr...> - 2003-06-03 01:21:43

> > Thanks, I get the same results on the Windows MSVC version of OCaml.
> > The None/Some approach is really slow.
> > (BTW, I resolved Brian's problem of Enum's exported next differing from
> > the imported one by renaming the exported "next" to "peek".)
>
> Nope. Just moved it around. "peek" to me doesn't have side effects.
> I'll think on names for a bit.

True. I was thinking that the peek function of Stream removed the element,
but after re-reading the documentation, it doesn't. What about "get" then?

Nicolas
From: Brian H. <bri...@ql...> - 2003-06-02 22:24:11

On Mon, 2 Jun 2003, Nicolas Cannasse wrote:

> > I think it would be a good idea to add unit tests to all the library
> > functions using OUnit:
> > http://home.wanadoo.nl/maas/ocaml/
> >
> > I think unit tests are *less* necessary in OCaml due to our wonderful
> > type system, but they're still a good idea.
>
> I agree, but I'm not familiar enough with other languages' unit test
> libraries to tell whether this implementation is a convenient way of
> doing unit tests. Maybe we could write a library mixing unit tests and
> benchmarks? (Something similar to Doug Bagley's.) These two should fit
> well together.

My preference would be to be able to compile the library *without* the
unit testing code in it. This will require some thought, and probably some
makefile hacking.

Brian
From: Brian H. <bri...@ql...> - 2003-06-02 22:23:41

On Mon, 2 Jun 2003, Nicolas Cannasse wrote:

> Thanks, I get the same results on the Windows MSVC version of OCaml. The
> None/Some approach is really slow.
> (BTW, I resolved Brian's problem of Enum's exported next differing from
> the imported one by renaming the exported "next" to "peek".)

Nope. Just moved it around. "peek" to me doesn't have side effects. I'll
think on names for a bit.

> Please notice that you're testing the best case here, where count is
> O(1), but sometimes it can be O(N) or even worse, since sometimes we need
> to create an intermediate list and do a lot of I/O (reading a file, for
> example).

We can always implement an O(1) has_next given an O(1) next. This is only
an advantage if we avoid creating an intermediate data structure, however.
My personal opinion is that count should be able to return "I don't know" /
"finding that out requires the creation of an intermediate data structure".
Assuming count returns int option, this would allow for code like:

    let rec use_enum e =
        match Enum.count e with
            Some c -> (* we have a good count *) ...
          | None ->
                (* explicitly create the intermediate data structure *)
                use_enum (List.enum (List.of_enum e))

Alternatively, the user code might have more efficient ways of dealing with
unknown-length enumerations. For example, DynArray.append_enum might look
something like:

    let append_enum arr e =
        match Enum.count e with
            Some c ->
                (* Normal code, which prereserves c elements in the array,
                   so the data is only copied once *)
          | None ->
                (* Just repeatedly call add *)
                Enum.iter (DynArray.add arr) e

> Interesting. At first, I would say this is a good idea, since your
> benchmarks show that there is little improvement when iterating with an
> O(1) counter compared to using an exception. But how can we tell that
> e.count() is fast? We can surely tell when it is slow: Enum.from could
> set a boolean to true saying that the "worst count" will be called. But
> what about Enum.make? For example, ExtList.enum has an O(N) count, since
> we don't want to count the elements when creating an enum (this is the
> lazy approach: do it only when we need it).

We only call the O(N) count when we're about to do an O(N) function anyway
(we never need to call count when doing a map). I see three possibilities
here (assuming we're calling it on a list):

1) The list is short enough to fit into cache and is already in cache. This
   seems to be what we're benchmarking most of the time, in which case
   calling List.length will be very cheap. It will probably be about the
   same cost as N has_next calls, maybe even less.

2) The list is short enough to fit into cache, and is not already
   (entirely) in cache. In this case, the main cost (unless we're doing a
   very expensive function on each element) will be the cost of loading the
   list into cache. Remember that on modern CPUs, a cache miss can be 300+
   clock cycles.

3) The list is too long to fit into cache. Here, has_next really shines: if
   we use count, we have to load the entire list into cache twice.
   has_next, however, simply loads the next element into cache (which we
   were going to have to do anyway).

Case #3 is when performance matters the most, but paradoxically it's when
the various approaches we are discussing change performance the least. The
cost is dominated by memory latency, or by the user functions.

> When is the O(N) count faster than the exception catching? I would say
> "it depends on the size of the list"... but we don't know the size :-)
> So while we can classify "bad" counts, we can't do it for "good" counts,
> and this optimization will not work...

No matter which way we pick to do things, I can probably come up with a
scenario where "it'd be faster to do it this other way". I'd rate the
performance of all the approaches as "acceptable" and move on.

> > So I think the question is: is it worth a 10-20% performance hit in
> > the library to unify the stream and enum types?

Myself: yes. If I'm worrying about every single clock cycle, I'm not
writing in OCaml, I'm writing in assembler. If I'm writing in OCaml, I'm
looking for power and expressivity (this is, in fact, the normal case) and
giving up much more than 10-20% in performance.

> My question is: when do we need speed? For small lists with few
> operations, or for big lists with lots of operations? With 5 maps on a
> 1000-element list, I got 200% speed using exceptions instead of None/Some.

That depends upon the program. I'd be most worried about 10+ maps on
10_000_000 elements myself, with non-trivial map functions. That's where
Enum really starts to shine, as you do wavefront processing, and where tail
recursion stops being an optimization and starts being required for
correctness. This is the problem with microbenchmarks.

I'll see if I can't whip something up.

Brian
From: Brian H. <bri...@ql...> - 2003-06-02 20:57:34

On Mon, 2 Jun 2003, Nicolas Cannasse wrote:

> But we can't rewrite fold/iter using has_next because, as you mentioned,
> the default has_next will test count > 0, and the default count will
> force the enumeration of the data structure, resulting in the building of
> an intermediate list, which is exactly what we don't want to do.

Actually, we could have the default do a lookahead, like:

    let default_has_next next count =
        let lookahead = ref None in
        let has_next' () =
            match !lookahead with
                Some _ -> true
              | None ->
                    (try lookahead := Some (next ()); true
                     with No_more_elements -> false)
        and next' () =
            match !lookahead with
                Some x -> lookahead := None; x
              | None -> next ()
        and count' () =
            match !lookahead with
                Some _ -> 1 + (count ())
              | None -> count ()
        in
        Enum.make ~has_next:has_next' ~next:next' ~count:count'

And then override it when count is O(1). Note that a map always gets the
count and has_next functions, so we only ever allocate a single Some block
(note that next doesn't allocate a Some block).

> Of course we could add the has_next function as a parameter to make, but:
> - when we call make, we usually have a good count, so testing count > 0
>   is not so costly

I agree that this is the default case, a fast count, which is why I was
making it the default case. On the other hand, handling the case when count
is expensive is more complex code, so maybe the library should be supplying
that.

> - as the next function is returning an 'a option (and I still think
>   that's the best way of doing it), the user will not need a has_next to
>   loop.

No. I was thinking of doing it for the library.

Another question I have: how often are we going to get "deep maps" (i.e.
more than 1 layer deep)? I expect the vast majority of cases are going to
be either: 1) no maps at all (converting from one data structure to
another), or 2) exactly one map (doing a single conversion). So that, on
average, we have less than 1 map we're going through. Now, I want to
encourage deep maps, as it's a form of wavefront processing and thus more
efficient in general. But I don't think they will be common. In optimizing,
you optimize for the common case.

Brian
From: Alan P. <ap...@re...> - 2003-06-02 09:15:00

In article <000d01c328df$e687e6b0$2713f9ca@WARP>, Nicolas Cannasse wrote:

>> Another possibility would be to be clever about having fold and iter
>> use (e.count ()) when it is fast, and use exception catching when
>> (e.count ()) is slow, without actually having separate types.
>
> Interesting. At first, I would say this is a good idea, since your
> benchmarks show that there is little improvement when iterating with an
> O(1) counter compared to using an exception. But how can we tell that
> e.count() is fast? We can surely tell when it is slow: Enum.from could
> set a boolean to true saying that the "worst count" will be called. But
> what about Enum.make? For example, ExtList.enum has an O(N) count, since
> we don't want to count the elements when creating an enum (this is the
> lazy approach: do it only when we need it).
> When is the O(N) count faster than the exception catching? I would say
> "it depends on the size of the list"... but we don't know the size :-)
> So while we can classify "bad" counts, we can't do it for "good" counts,
> and this optimization will not work...

It doesn't have to be perfect to be an improvement. Enums from arrays or
dynarrays, for instance, have fast counts. Enum.map would inherit the
fast_count flag, whereas most other enum transforms (e.g. Enum.filter)
would clear the fast_count flag.

This could also be done by keeping the stream and enum types separate, with
the stream type having no count. Either way would give the performance
benefit; it's really a matter of making a nice library API for people.
Doing it with types, there would be:

    Enum.map:      ('a -> 'b) -> 'a enum -> 'b enum
    Stream.map:    ('a -> 'b) -> 'a stream -> 'b stream
    Enum.filter:   ('a -> bool) -> 'a enum -> 'a stream    *** Note ***
    Stream.filter: ('a -> bool) -> 'a stream -> 'a stream
    Enum.iter:     ('a -> unit) -> 'a enum -> unit
    Stream.iter:   ('a -> unit) -> 'a stream -> unit

where Enum.iter and Enum.filter use count, and Stream.iter and
Stream.filter use a try-catch.

    List.stream:       'a list -> 'a stream
    List.from_stream:  'a stream -> 'a list
    List.from_enum:    'a enum -> 'a list
    Array.enum:        'a array -> 'a enum
    Array.from_stream: 'a stream -> 'a array
    Array.from_enum:   'a enum -> 'a array

There is no List.enum, since you can't count a list easily.
List.from_stream does a try-catch, and List.from_enum uses count. Of
course, reading lines from a file handle would be a stream, rather than an
enum.

There could be a function:

    Enum.from_stream: 'a stream -> 'a enum

with warnings on it that it may be rather expensive, to do the Enum.force
magic. Array.from_stream could be implemented as:

    let from_stream s = from_enum (Enum.from_stream s);;

I don't see how a Stream.from_enum function would be useful.

>> So I think the question is: is it worth a 10-20% performance hit in
>> the library to unify the stream and enum types?
>
> Which unification? You mean, using a None/Some next function?

I am not at all suggesting using options to do the end-of-stream
notification. Certainly I think that using exceptions is a much better way
to go. The "unification" I was referring to was the decision about whether
to have distinct enum and stream types. The "performance hit" I was
referring to was the cost of using exceptions to iterate over an enum when
you could have used the count.
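[Editor's note: a rough sketch of how the two iter variants proposed above
might differ, using hypothetical record types in the style of the benchmark
later in this thread; the field names and the stream type are illustrative,
not part of the actual library.]

    exception No_more_elements

    type 'a enum   = { e_next : unit -> 'a; e_count : unit -> int }
    type 'a stream = { s_next : unit -> 'a }

    (* Enum.iter can rely on the (fast) count... *)
    let enum_iter f e =
        for _i = 1 to e.e_count () do
            f (e.e_next ())
        done

    (* ...while Stream.iter has to loop until the exception is raised. *)
    let stream_iter f s =
        let rec loop () = f (s.s_next ()); loop () in
        try loop () with No_more_elements -> ()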
From: Nicolas C. <war...@fr...> - 2003-06-02 08:22:26

> I wrote a little program to benchmark the three approaches I've seen
> discussed on the list for implementing streams/enums:
>
> 1. (e.next ()) returns an option
> 2. for-loop style, relying on the count (not suitable for
>    streams, including filtered enums)
> 3. exception-catching style (the current implementation)
>
> It tests folds over maps of enums of constant ints.
[...]

Thanks, I get the same results on the Windows MSVC version of OCaml. The
None/Some approach is really slow.

(BTW, I resolved Brian's problem of Enum's exported next differing from the
imported one by renaming the exported "next" to "peek".)

Please notice that you're testing the best case here, where count is O(1),
but sometimes it can be O(N) or even worse, since sometimes we need to
create an intermediate list and do a lot of I/O (reading a file, for
example). That leads to your next question...

> Another possibility would be to be clever about having fold and iter
> use (e.count ()) when it is fast, and use exception catching when
> (e.count ()) is slow, without actually having separate types.

Interesting. At first, I would say this is a good idea, since your
benchmarks show that there is little improvement when iterating with an
O(1) counter compared to using an exception. But how can we tell that
e.count() is fast? We can surely tell when it is slow: Enum.from could set
a boolean to true saying that the "worst count" will be called. But what
about Enum.make? For example, ExtList.enum has an O(N) count, since we
don't want to count the elements when creating an enum (this is the lazy
approach: do it only when we need it).

When is the O(N) count faster than the exception catching? I would say "it
depends on the size of the list"... but we don't know the size :-) So while
we can classify "bad" counts, we can't do it for "good" counts, and this
optimization will not work...

> So I think the question is: is it worth a 10-20% performance hit in
> the library to unify the stream and enum types?

Which unification? You mean, using a None/Some next function?

As expected, the smaller the list and the fewer map operations applied to
it, the better None/Some performs compared to exceptions (note that you
need 0 maps and a list of fewer than 50 elements for None/Some to be faster
than exceptions). My question is: when do we need speed? For small lists
with few operations, or for big lists with lots of operations? With 5 maps
on a 1000-element list, I got 200% speed using exceptions instead of
None/Some.

Nicolas Cannasse
From: Alan P. <ap...@re...> - 2003-06-02 06:58:03

I wrote a little program to benchmark the three approaches I've seen
discussed on the list for implementing streams/enums:

1. (e.next ()) returns an option
2. for-loop style, relying on the count (not suitable for streams,
   including filtered enums)
3. exception-catching style (the current implementation)

It tests folds over maps of enums of constant ints. Runs of it (ocamlopt
compiled, of course) demonstrate to me that:

a) The option-based approach is slowest, especially when doing lots of
   nested maps.
b) The for-loop approach is fastest.
c) The exception-based approach is in the middle, performance-wise. In the
   pathological case (only one map, with only five elements in the enum),
   this approach takes 1.4 times as much CPU as the for-loop approach. With
   three nested maps and 500 elements in the enum, it requires 1.13 times
   the CPU.

So I think the question is: is it worth a 10-20% performance hit in the
library to unify the stream and enum types? Another possibility would be to
be clever about having fold and iter use (e.count ()) when it is fast, and
use exception catching when (e.count ()) is slow, without actually having
separate types.

Below is the program listing. Please let me know if I screwed up somewhere.

Alan

------------------------------------------------------------

(*
 * bench.ml
 *
 * Author: Alan Post <ap...@re...>
 * Usage: test performance of various implementations of Enum
 *
 *   1: option-based (also suitable for streams)
 *   2: count-based (not suitable for streams)
 *   3: exception-based (also suitable for streams)
 *)

type 'a enum_1 = { mutable next: unit -> 'a option; mutable count: unit -> int };;
type 'a enum_2 = { mutable next: unit -> 'a; mutable count: unit -> int };;

exception No_more_elements;;

let rec fold_1 f x e =
    match e.next () with
        None -> x
      | Some v -> fold_1 f (f v x) e

and map_1 f e =
    { count = e.count;
      next = (fun () ->
                match e.next () with
                    None -> None
                  | Some v -> Some (f v)) }

and init_1 value num =
    let rnum = ref num in
    { count = (fun () -> !rnum);
      next = (fun () ->
                match !rnum with
                    0 -> None
                  | _ -> Some (decr rnum; value)) }

and fold_2 f x e =
    let rec loop c accu =
        match c with
            0 -> accu
          | _ -> loop (c-1) (f (e.next ()) accu)
    in
    loop (e.count ()) x

and map_2 f e =
    { count = e.count;
      next = (fun () -> f (e.next ())) }

and init_2 value num =
    let rnum = ref num in
    { count = (fun () -> !rnum);
      next = (fun () ->
                match !rnum with
                    0 -> raise No_more_elements
                  | _ -> (decr rnum; value)) }

and fold_3 f x e =
    let result = ref x in
    let rec loop () =
        result := f (e.next ()) !result;
        loop ()
    in
    try loop () with No_more_elements -> !result
in

let map_3 = map_2
and init_3 = init_2 in

let nest f n =
    let rec loop f g n =
        match n with
            0 -> g
          | _ -> loop f (fun e -> f (g e)) (n-1)
    in
    loop f f n

and which = Sys.argv.(1)
and items = (int_of_string Sys.argv.(2))
and nests = (int_of_string Sys.argv.(3))
and iters = (int_of_string Sys.argv.(4)) in

let once =
    match which with
        "1" -> (fun () -> ignore (fold_1 (+) 0
                    ((nest (map_1 (( * ) 2)) nests) (init_1 1 items))))
      | "2" -> (fun () -> ignore (fold_2 (+) 0
                    ((nest (map_2 (( * ) 2)) nests) (init_2 1 items))))
      | "3" -> (fun () -> ignore (fold_3 (+) 0
                    ((nest (map_3 (( * ) 2)) nests) (init_3 1 items))))
      | _ -> raise (Invalid_argument which)
in

let rec loop n =
    if n == 0 then () else (once (); loop (n-1))
in
loop iters;
print_string "done\n";;
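[Editor's note: per the listing above, the program takes four positional
arguments: the implementation to test (1, 2 or 3), the number of elements,
the number of nested maps, and the number of iterations. A comparison run
might therefore look like the following; the executable name is
illustrative.]

    ./bench 1 500 3 100000
    ./bench 2 500 3 100000
    ./bench 3 500 3 100000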
From: Nicolas C. <war...@fr...> - 2003-06-02 03:09:28

> I think it would be a good idea to add unit tests to all the library
> functions using OUnit:
> http://home.wanadoo.nl/maas/ocaml/
>
> I think unit tests are *less* necessary in OCaml due to our wonderful
> type system, but they're still a good idea.

I agree, but I'm not familiar enough with other languages' unit test
libraries to tell whether this implementation is a convenient way of doing
unit tests. Maybe we could write a library mixing unit tests and
benchmarks? (Something similar to Doug Bagley's.) These two should fit well
together.

Nicolas Cannasse
From: Nicolas C. <war...@fr...> - 2003-06-02 02:52:22

> Sorry, forgot to mention: the default has_next calls count() and compares
> it against 0. Which means only in places where count() would be
> significantly more expensive than has_next() do we need to (or can)
> supply a separate has_next().

Thanks Brian, I already had some thoughts about adding this has_next
function, which can be useful. But we can't rewrite fold/iter using
has_next because, as you mentioned, the default has_next will test
count > 0, and the default count will force the enumeration of the data
structure, resulting in the building of an intermediate list, which is
exactly what we don't want to do.

Seen like this, has_next should only be used on the user side, and could be
implemented as:

    let has_next t = t.count() > 0

with some documentation saying that it can be pretty inefficient (on the
first call only).

Of course we could add the has_next function as a parameter to make, but:

- when we call make, we usually have a good count, so testing count > 0 is
  not so costly;
- when we don't call make (we call from instead), we don't have a count...
  I can see a few cases where we can peek at the data (without actually
  reading it) to see if some is available, but I'm not sure they're worth
  it;
- as the next function returns an 'a option (and I still think that's the
  best way of doing it), the user will not need a has_next to loop.

Nicolas Cannasse
From: Brian H. <bri...@ql...> - 2003-06-01 19:49:12

On Thu, 29 May 2003, Alan Post wrote:

> > let iter f e =
> >     let c = e.count() in
> >     for i = 1 to c do
> >         f (e.next())
> >     done
>
> This undoes the lazy nature of Enum.filter, Enum.filter_map, and even
> more seriously, input_enum and input_char_enum. Basically, anywhere the
> code uses Enum.from will be a problem when relying on Enum.count.

This doesn't change the lazy nature of map; map remains the same. Calling
count on the result of Enum.from does force the entire enumeration into
memory, granted. Which is why I thought of the has_next() idea, which means
we only ever need to be one element ahead.

Personally, I'd also like a way for count to be able to say "I don't know
how many elements are in the enumeration". But I got shot down on that last
time around.

Brian
From: Alan P. <ap...@re...> - 2003-05-31 21:43:06

In article <Pin...@ea...>, Brian Hurt wrote:

> Sorry, I just had an epiphany on this issue. With enums we always have a
> good count, right? I mean, we're not implementing my "-1 means we don't
> know how many elements are in the enumeration".
>
> The enum code should use this and never cause an exception to be thrown.
> Therefore it doesn't need to catch the exceptions either.
>
>     let iter f e =
>         let c = e.count() in
>         for i = 1 to c do
>             f (e.next())
>         done
>
>     let fold f x e =
>         let rec loop c accu =
>             if c <= 0 then accu
>             else loop (c-1) (f (e.next ()) accu)
>         in
>         loop (e.count ()) x

This undoes the lazy nature of Enum.filter, Enum.filter_map, and even more
seriously, input_enum and input_char_enum. Basically, anywhere the code
uses Enum.from will be a problem when relying on Enum.count.
From: Brian H. <bri...@ql...> - 2003-05-30 21:39:59

Sorry, forgot to mention: the default has_next calls count() and compares
it against 0. Which means only in places where count() would be
significantly more expensive than has_next() do we need to (or can) supply
a separate has_next().

Brian
From: Brian H. <bri...@ql...> - 2003-05-30 21:33:53

Count is called, generally, for two reasons:

1) To pre-reserve space for all the elements in the enumeration. This is
   how DynArray uses it.

2) To see if there are any more elements left in the enumeration
   (count > 0).

The second case is the interesting one: it can always be O(1), and it's the
information we most often need. So why not provide it directly? Iter and
fold then start looking like:

    let iter f t =
        let rec loop () =
            if t.has_next() then begin
                f (t.next());
                loop ()
            end else ()
        in
        loop ()

    let fold f init t =
        let rec loop accu =
            if t.has_next() then
                loop (f (t.next()) accu)
            else
                accu
        in
        loop init

In ExtList, we'd have:

    let enum l =
        let lr = ref l in
        Enum.make
            ~next:(fun () ->
                match !lr with
                | [] -> raise Enum.No_more_elements
                | h :: t -> lr := t; h)
            ~count:(fun () -> length !lr)
            ~has_next:(fun () ->
                match !lr with
                | [] -> false
                | _ -> true)

This gets rid of the annoying O(N) count cost when all we need to know is
whether any elements are left.

Brian
From: Brian H. <bri...@ql...> - 2003-05-30 21:16:01

I think it would be a good idea to add unit tests to all the library
functions using OUnit:

http://home.wanadoo.nl/maas/ocaml/

I think unit tests are *less* necessary in OCaml due to our wonderful type
system, but they're still a good idea.

Brian
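[Editor's note: for readers unfamiliar with OUnit, a minimal sketch of what
such a test might look like, using the combinators I believe OUnit provides
(>::, >:::, assert_equal, run_test_tt); the exact names in the 2003 release
may differ, and the test body is a trivial placeholder rather than an
actual ExtLib test.]

    open OUnit

    (* Placeholder test case: a real library function would go here. *)
    let test_length () =
        assert_equal 3 (List.length [1; 2; 3])

    let suite =
        "ExtLib" >::: [ "length" >:: test_length ]

    let _ = run_test_tt suite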
From: Brian H. <bri...@ql...> - 2003-05-30 20:22:27

On Fri, 30 May 2003, Diego Olivier Fernandez Pons wrote:

> Bonjour,
>
> > Because I have this dream of someday going through and changing the
> > node definitions so that instead of:
>
> Of course I understand... but having a data structure that might be
> slower but exists and works is better than a fast data structure which
> does not exist yet or is buggy.

Agreed. But the bug occurred because I misremembered how red-black trees
worked, and didn't work through the logic enough (and obviously didn't test
enough). Not because I picked too complex a data structure.

Brian
From: Diego O. F. P. <Die...@et...> - 2003-05-30 19:37:51

Bonjour,

> Because I have this dream of someday going through and changing the node
> definitions so that instead of:

Of course I understand... but having a data structure that might be slower
but exists and works is better than a fast data structure which does not
exist yet or is buggy.

Diego Olivier
From: Brian H. <bri...@ql...> - 2003-05-30 19:13:42

On Fri, 30 May 2003, Diego Olivier Fernandez Pons wrote:

> Why don't you try the AVL balancing scheme? As stated by Ralf Hinze, it
> does not have the property of a finite number of rebalancing operations
> in the delete process (logarithmic instead), but it should be much easier
> to implement. Ralf Hinze uses the weight-balanced scheme in his paper,
> which is quite similar.

Because I have this dream of someday going through and changing the node
definitions so that instead of:

    type 'a node_t = Node of color_t * dominance_t * ...

(i.e. using a word each for color and dominance), we instead do:

    type 'a node_t =
        RedLeft of ...    (* red color, left dominance *)
      | RedRight of ...   (* red color, right dominance *)
      | BlackLeft of ...
      | BlackRight of ...

Color and dominance then effectively become 1-bit values. The pattern
matching, however, then becomes *ahem* interesting. I can't simply throw in
a _ in the proper place to ignore color or dominance. I may at that point
switch away from pattern matching to branching conditionals, I don't know.
But this is the sort of thing which a library can do.

Brian
From: Diego O. F. P. <Die...@et...> - 2003-05-30 19:00:02

Bonjour,

On Fri, 30 May 2003, Brian Hurt wrote:

> Basically, what I need to do is break my single modify function into
> three different functions: one for insert (number of nodes increases),
> one for delete (number of nodes decreases), and while I'm at it one for
> update (number of nodes and structure of the tree remain the same).

Why don't you try the AVL balancing scheme? As stated by Ralf Hinze, it
does not have the property of a finite number of rebalancing operations in
the delete process (logarithmic instead), but it should be much easier to
implement. Ralf Hinze uses the weight-balanced scheme in his paper, which
is quite similar.

Diego Olivier
From: Brian H. <bri...@ql...> - 2003-05-30 18:49:56

There is a balancing problem with PSQueue on delete: deletes do not
preserve the balancing constraints and produce illegal red-black trees. The
problem was that until now I didn't know how to fix it. I think I've
figured out what needs to happen. Unfortunately, the fix is going to be
somewhat ugly. It'll happen this weekend, and I should get working code in
next week.

Basically, what I need to do is break my single modify function into three
different functions: one for insert (number of nodes increases), one for
delete (number of nodes decreases), and while I'm at it one for update
(number of nodes and structure of the tree remain the same).

Brian
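[Editor's note: a sketch of how the proposed insert/delete/update split
might read as an interface; every name, label and type here is an
assumption for illustration only, not the actual PSQueue code being
discussed.]

    (* Hypothetical interface for the three-way split described above. *)
    module type PSQUEUE = sig
        type 'a t

        (* Node count grows by one; rebalancing for insertion. *)
        val insert : key:int -> prio:int -> 'a -> 'a t -> 'a t

        (* Node count shrinks by one; a different rebalancing path. *)
        val delete : key:int -> 'a t -> 'a t

        (* Node count and tree structure unchanged; no rebalancing needed. *)
        val update : key:int -> ('a -> 'a) -> 'a t -> 'a t
    end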
From: Nicolas C. <war...@fr...> - 2003-05-30 07:33:06

Corrected the implementations of append, concat and clone, which were
causing strange behaviors when mixed with calls to force(). The main
problem is that only force should be allowed to modify the next and count
functions after creation. I modified append and concat so they don't, but
clone needs to modify the enum passed as parameter, and it needs to store
the current next and count functions when called, which was giving buggy
behaviors, especially when counting elements. The current implementation
seems to resolve all the problems; I ran some tests cloning twice and then
printing/counting in different orders to check that the results are the
same.

Nicolas Cannasse
From: Nicolas C. <war...@fr...> - 2003-05-30 03:59:36

> let enum_dup c v =
>     (* returns an enumeration which is just v repeated c times *)
>     let cref = ref c in
>     let next () =
>         if !cref > 0 then (decr cref; v)
>         else raise No_more_elements
>     and count () = !cref in
>     Enum.make ~next:next ~count:count

Added to the CVS:

    Enum.init : int -> (int -> 'a) -> 'a t

Similar to List.init and Array.init.

Nicolas Cannasse
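[Editor's note: a quick illustration of the new Enum.init signature quoted
above, replacing the enum_dup helper; the usage is inferred from the
List.init/Array.init analogy, not taken from the actual commit.]

    (* Enum.init n f should enumerate f 0, f 1, ..., f (n-1),
       by analogy with List.init and Array.init. *)
    let squares = Enum.init 5 (fun i -> i * i)    (* 0, 1, 4, 9, 16 *)

    (* The repeated-value enumeration from the quoted message becomes: *)
    let dup c v = Enum.init c (fun _ -> v)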
From: Brian H. <bri...@ql...> - 2003-05-29 18:44:30

Sorry, I just had an epiphany on this issue. With enums we always have a
good count, right? I mean, we're not implementing my "-1 means we don't
know how many elements are in the enumeration".

The enum code should use this and never cause an exception to be thrown.
Therefore it doesn't need to catch the exceptions either.

    let iter f e =
        let c = e.count() in
        for i = 1 to c do
            f (e.next())
        done

    let fold f x e =
        let rec loop c accu =
            if c <= 0 then accu
            else loop (c-1) (f (e.next ()) accu)
        in
        loop (e.count ()) x

Calling next when count == 0, when there are no more elements, is an
*error*. Of course it throws an exception. But the Enum code shouldn't do
that. And if next doesn't have any more elements while count > 0, then
something exceptional did happen.

The above code strikes me as being clean and easy to understand. It solves
your problem of an exception from next being "accidentally" caught at a
higher level, enum doesn't catch exceptions, and Enum.next has the same
interface as the next passed into Enum.make.

Two comments on this:

1) On some data structures, calling count will be expensive; for example,
   on lists it's O(N).

2) Your example changes behavior:

> example:
>
> let e1 = List.enum [1;2;3]
> let e2 = List.enum []
>
> let combine x =
>     (x , Enum.next e2)
>
> let l = Enum.of_list (Enum.map combine e1)
>
> => []

Here, instead of returning the empty list, the above code throws an
uncaught exception. Which is actually, I think, the correct thing to do.
What you wanted to express above is, I think:

    let combine a b = a, b in
    let l = List.of_enum (Enum.map2 combine e1 e2)

But this also allows you to do:

    let combine x =
        try (x, Enum.next e2) with No_more_elements -> x, 0
    in
    let l = List.of_enum (Enum.map combine e1)

As a general note, data structures we add in the future should have fast
(O(1)) length routines, even at the cost of a little more bookkeeping. Not
a lot we can do with lists at this point, but everything else should be
fast, and in the worst case we spin through the data structure twice. This
actually encourages piling multiple maps onto a single list, and not
actually instantiating the intermediate lists. Sort of a cheap version of
lazy evaluation.

Brian
From: Brian H. <bri...@ql...> - 2003-05-29 17:53:51

On Thu, 29 May 2003, William D. Neumann wrote:

> On Thu, 29 May 2003, Brian Hurt wrote:
>
> Yeah, exposing phony nulls was one of the philosophical issues I was
> wrestling with as I added my expansion hack. At the time I decided that
> the ability to do this outweighed the potential to do something silly
> (and I promised myself to always look out for null values; the null I'm
> using in this case would never be a valid value, so I can ignore them
> when I run across them).

This is one of the reasons I didn't want to use options as the array
elements: for the cases where they aren't needed, and there really is a
safe null.

> And you're right, expanding the array the way I described it won't work
> like I wanted, you have to do something along the lines of:
>
> let null = "";;
> let j = DynArray.make 2 null;;
> DynArray.add j "blah";;
> DynArray.add j "flah";;
>
> (* Now I want to add something to slot 9 *)
>
> ignore (DynArray.append j (DynArray.init 8 8 null (fun x -> null)));;
> DynArray.set j 9 "flee";;

Given:

    let enum_dup c v =
        (* returns an enumeration which is just v repeated c times *)
        let cref = ref c in
        let next () =
            if !cref > 0 then (decr cref; v)
            else raise No_more_elements
        and count () = !cref in
        Enum.make ~next:next ~count:count

you don't actually need to create the intermediate array. You can just do:

    DynArray.set_enum j (DynArray.length j) (enum_dup 8 "");
    DynArray.set j 9 "flee";;

Mental note to self: need to add DynArray.append_enum.

> Sigh... why does doing things the right way always interfere with doing
> what I think I want...

Because what you think you wanted to do is not what you really wanted to
do. Perhaps instead of a dynamic array, you wanted a hash table mapping
int -> 'a?

Brian
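[Editor's note: a minimal sketch of the hash table alternative Brian
suggests at the end, using the standard Hashtbl module; purely
illustrative.]

    (* Sparse int-indexed storage via Hashtbl: no null padding required. *)
    let j : (int, string) Hashtbl.t = Hashtbl.create 16;;
    Hashtbl.replace j 0 "blah";;
    Hashtbl.replace j 1 "flah";;
    Hashtbl.replace j 9 "flee";;   (* slot 9 can be filled directly *)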