From: Alan P. <ap...@re...> - 2003-06-02 06:58:03
|
I wrote a little program to benchmark the three approaches I've seen
discussed on the list for implementing streams/enums:

 1. (e.next ()) returns an option
 2. for-loop style, relying on the count (not suitable for streams,
    including filtered enums)
 3. exception-catching style (the current implementation)

It tests folds over maps of enums of constant ints. Runs of it
(ocamlopt compiled, of course) demonstrate to me that:

 a) The option-based approach is slowest, especially when doing lots
    of nested maps.
 b) The for-loop approach is fastest.
 c) The exception-based approach is in the middle, performance-wise.
    In the pathological case (only one map, with only five elements in
    the enum), this approach takes 1.4 times as much CPU as the
    for-loop approach. With three nested maps, and 500 elements in the
    enum, it requires 1.13 times the CPU.

So I think the question is: is it worth a 10-20% performance hit in
the library to unify the stream and enum types?

Another possibility would be to be clever about having fold and iter
use (e.count ()) when it is fast, and use exception catching when
(e.count ()) is slow, without actually having separate types.

Below is the program listing. Please let me know if I screwed up
somewhere.

Alan

------------------------------------------------------------

(*
 * bench.ml
 *
 * Author: Alan Post <ap...@re...>
 * Usage: test performance of various implementations of Enum
 *
 * 1: option-based (also suitable for streams)
 * 2: count-based (not suitable for streams)
 * 3: exception-based (also suitable for streams)
 *)

type 'a enum_1 = {
  mutable next: unit -> 'a option;
  mutable count: unit -> int
};;

type 'a enum_2 = {
  mutable next: unit -> 'a;
  mutable count: unit -> int
};;

exception No_more_elements;;

let rec fold_1 f x e =
  match e.next () with
    None -> x
  | Some v -> fold_1 f (f v x) e

and map_1 f e =
  { count = e.count;
    next = (fun () ->
      match e.next () with
        None -> None
      | Some v -> Some (f v)) }

and init_1 value num =
  let rnum = ref num in
  { count = (fun () -> !rnum);
    next = (fun () ->
      match !rnum with
        0 -> None
      | _ -> Some (decr rnum; value)) }

and fold_2 f x e =
  let rec loop c accu =
    match c with
      0 -> accu
    | _ -> loop (c-1) (f (e.next ()) accu)
  in
  loop (e.count ()) x

and map_2 f e =
  { count = e.count;
    next = (fun () -> f (e.next ())) }

and init_2 value num =
  let rnum = ref num in
  { count = (fun () -> !rnum);
    next = (fun () ->
      match !rnum with
        0 -> raise No_more_elements
      | _ -> (decr rnum; value)) }

and fold_3 f x e =
  let result = ref x in
  let rec loop () =
    result := f (e.next ()) !result;
    loop ()
  in
  try loop () with No_more_elements -> !result
in

let map_3 = map_2
and init_3 = init_2 in

let nest f n =
  let rec loop f g n =
    match n with
      0 -> g
    | _ -> loop f (fun e -> f (g e)) (n-1)
  in
  loop f f n

and which = Sys.argv.(1)
and items = (int_of_string Sys.argv.(2))
and nests = (int_of_string Sys.argv.(3))
and iters = (int_of_string Sys.argv.(4)) in

let once =
  match which with
    "1" -> (fun () -> ignore (fold_1 (+) 0
              ((nest (map_1 (( * ) 2)) nests) (init_1 1 items))))
  | "2" -> (fun () -> ignore (fold_2 (+) 0
              ((nest (map_2 (( * ) 2)) nests) (init_2 1 items))))
  | "3" -> (fun () -> ignore (fold_3 (+) 0
              ((nest (map_3 (( * ) 2)) nests) (init_3 1 items))))
  | _ -> raise (Invalid_argument which)
in

let rec loop n =
  if n == 0 then () else (once (); loop (n-1))
in
loop iters;
print_string "done\n";;
|
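A side note on approach 2 above: the reason the count-driven style
breaks down for filtered enums can be made concrete with a small sketch.
The filter below is purely illustrative (it is not ExtLib's filter) and
reuses the enum_2 record and No_more_elements exception from the
listing; the count it exposes is a placeholder, because a filtered enum
has no honest O(1) count.

(* Illustrative filter over the exception-based representation.  next
   skips non-matching elements; any truthful count would have to run
   the predicate over the remaining input, so no cheap count exists. *)
let filter_3 p e =
  { count = (fun () -> invalid_arg "no cheap count for a filtered enum");
    next = (fun () ->
      let rec skip () =
        let v = e.next () in            (* raises No_more_elements at the end *)
        if p v then v else skip ()
      in
      skip ()) }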
From: Nicolas C. <war...@fr...> - 2003-06-02 08:22:26
|
> I wrote a little program to benchmark the three approaches I've seen
> discussed on the list for implementing streams/enums:
>
>  1. (e.next ()) returns an option
>  2. for-loop style, relying on the count (not suitable for streams,
>     including filtered enums)
>  3. exception-catching style (the current implementation)
>
> It tests folds over maps of enums of constant ints. [...]

Thanks, I get the same results on the Windows MSVC version of OCaml.
The None/Some approach is really slow.

(BTW, I resolved Brian's problem with Enum's exported "next" being
different from the imported one by renaming the exported "next" to
"peek".)

Please notice that you are testing the best case here, where count is
O(1), but sometimes it can be O(N) or even worse, since sometimes we
need to create an intermediate list and do a lot of I/O (reading a
file, for example). That leads to your next question...

> Another possibility would be to be clever about having fold and iter
> use (e.count ()) when it is fast, and use exception catching when
> (e.count ()) is slow, without actually having separate types.

Interesting. At first I would say this is a good idea, since your
benchmarks show there is little improvement when iterating with an
O(1) count compared to using an exception. But how can we tell that
e.count () is fast? We can certainly tell when it is slow: Enum.from
could set a boolean saying that the "worst count" will be called. But
what about Enum.make? For example, ExtList.enum has an O(N) count,
since we don't want to count the elements while creating the enum
(this is the lazy approach: do it only when we need it).
When is the O(N) count faster than the exception catching? I would say
"it depends on the size of the list"... but we don't know the size :-)
So while we can classify "bad" counts, we can't do it for "good"
counts, and this optimization will not work...

> So I think the question is: is it worth a 10-20% performance hit in
> the library to unify the stream and enum types?

Which unification? You mean, using a None/Some next function? As
expected, the smaller the list and the fewer map operations applied to
it, the better None/Some performs compared to exceptions (note that
you need 0 maps and a list of fewer than 50 elements for None/Some to
be faster than exceptions). My question is: when do we need speed? For
small lists with few operations, or for big lists with lots of
operations? With 5 maps on a 1000-element list, I got 200% speed using
exceptions instead of None/Some.

Nicolas Cannasse
|
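To make the ExtList.enum point concrete, here is a rough sketch of a
lazily-counted list enum, written against the enum_2 record and
No_more_elements exception from Alan's listing. It is illustrative only
(not ExtLib's actual code): the only honest count it can offer walks
whatever remains of the list, hence O(N).

(* Sketch: an enum over a list that defers counting. *)
let enum_of_list l =
  let rest = ref l in
  { count = (fun () -> List.length !rest);   (* O(N): walks the remaining list *)
    next = (fun () ->
      match !rest with
        [] -> raise No_more_elements
      | x :: tl -> rest := tl; x) }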
From: Alan P. <ap...@re...> - 2003-06-02 09:15:00
|
In article <000d01c328df$e687e6b0$2713f9ca@WARP>, Nicolas Cannasse wrote:
>
>> Another possibility would be to be clever about having fold and iter
>> use (e.count ()) when it is fast, and use exception catching when
>> (e.count ()) is slow, without actually having separate types.
>
> Interesting. At first I would say this is a good idea, since your
> benchmarks show there is little improvement when iterating with an
> O(1) count compared to using an exception. But how can we tell that
> e.count () is fast? We can certainly tell when it is slow: Enum.from
> could set a boolean saying that the "worst count" will be called. But
> what about Enum.make? For example, ExtList.enum has an O(N) count,
> since we don't want to count the elements while creating the enum
> (this is the lazy approach: do it only when we need it).
> When is the O(N) count faster than the exception catching? I would
> say "it depends on the size of the list"... but we don't know the
> size :-) So while we can classify "bad" counts, we can't do it for
> "good" counts, and this optimization will not work...

It doesn't have to be perfect to be an improvement. Enums from arrays
or dynarrays, for instance, have fast counts. Enum.map would inherit
the fast_count flag, whereas most other enum transforms (e.g.
Enum.filter) would clear it.

This could also be done by keeping the stream and enum types separate,
with the stream type having no count. Either way would give the
performance benefit; it's really a matter of making a nice library API
for people. Doing it with types, there would be:

Enum.map: ('a -> 'b) -> 'a enum -> 'b enum
Stream.map: ('a -> 'b) -> 'a stream -> 'b stream
Enum.filter: ('a -> bool) -> 'a enum -> 'a stream    *** Note ***
Stream.filter: ('a -> bool) -> 'a stream -> 'a stream
Enum.iter: ('a -> unit) -> 'a enum -> unit
Stream.iter: ('a -> unit) -> 'a stream -> unit

where Enum.iter and Enum.filter use count, and Stream.iter and
Stream.filter use a try-catch.

List.stream: 'a list -> 'a stream
List.from_stream: 'a stream -> 'a list
List.from_enum: 'a enum -> 'a list
Array.enum: 'a array -> 'a enum
Array.from_stream: 'a stream -> 'a array
Array.from_enum: 'a enum -> 'a array

There is no List.enum, since you can't count a list easily.
List.from_stream does a try-catch, and List.from_enum uses count. Of
course reading lines from a file handle would be a stream, rather than
an enum.

There could be a function:

Enum.from_stream: 'a stream -> 'a enum

with warnings on it that it may be rather expensive, to do the
Enum.force magic. Array.from_stream could be implemented as:

let from_stream s = from_enum (Enum.from_stream s);;

I don't see how a Stream.from_enum function would be useful.

>> So I think the question is: is it worth a 10-20% performance hit in
>> the library to unify the stream and enum types?
>
> Which unification? You mean, using a None/Some next function?

I am not at all suggesting using options to do the end-of-stream
notification. Certainly I think that using exceptions is a much better
way to go. The "unification" I was referring to was the decision about
whether to have distinct enum and stream types. The "performance hit"
I was referring to was the cost of using exceptions to iterate over an
enum when you could have used the count.
|
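A rough sketch of what the fast_count dispatch could look like, building
on the shape of enum_2 from the benchmark. The record type, its field
names, and the fold below are hypothetical (they are not the actual
ExtLib interface), and the sketch assumes the No_more_elements exception
from Alan's listing is in scope.

(* Hypothetical enum record: enum_2 plus a fast_count flag.  Array-backed
   enums would set fast_count to true; filter would clear it. *)
type 'a enum_fc = {
  mutable fc_next : unit -> 'a;          (* raises No_more_elements when done *)
  mutable fc_count : unit -> int;
  mutable fast_count : bool;
}

(* fold picks the cheaper strategy: the count-driven loop when the count
   is known to be O(1), the exception-driven loop otherwise. *)
let fold f init e =
  if e.fast_count then
    let rec loop n acc =
      if n = 0 then acc else loop (n - 1) (f (e.fc_next ()) acc)
    in
    loop (e.fc_count ()) init
  else begin
    let acc = ref init in
    try
      while true do acc := f (e.fc_next ()) !acc done;
      assert false
    with No_more_elements -> !acc
  end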
From: Brian H. <bri...@ql...> - 2003-06-02 22:23:41
|
On Mon, 2 Jun 2003, Nicolas Cannasse wrote:

> Thanks, I get the same results on the Windows MSVC version of OCaml.
> The None/Some approach is really slow.
>
> (BTW, I resolved Brian's problem with Enum's exported "next" being
> different from the imported one by renaming the exported "next" to
> "peek".)

Nope. Just moved it around. "peek" to me doesn't have side effects.
I'll think on names for a bit.

> Please notice that you are testing the best case here, where count is
> O(1), but sometimes it can be O(N) or even worse, since sometimes we
> need to create an intermediate list and do a lot of I/O (reading a
> file, for example).

We can always implement an O(1) has_next given an O(1) next. This is
only an advantage if we avoid creating an intermediate data structure,
however.

My personal opinion is that count should be able to return "I don't
know"/"to find that out requires the creation of an intermediate data
structure". Assuming count returns an int option, this would allow for
code like:

let rec use_enum e =
  match Enum.count e with
    Some c -> (* we have a good count *) ...
  | None ->
      (* explicitly create the intermediate data structure *)
      use_enum (List.enum (List.of_enum e))

Alternatively, the user code might have more efficient ways of dealing
with unknown-length enumerations. For example, Dynarray.append_enum
might look something like:

let append_enum arr e =
  match Enum.count e with
    Some c ->
      (* Normal code, that prereserves c elements in the array, so the
         data is only copied once *)
      ...
  | None ->
      (* Just repeatedly call add *)
      Enum.iter (Dynarray.add arr) e

> Interesting. At first I would say this is a good idea, since your
> benchmarks show there is little improvement when iterating with an
> O(1) count compared to using an exception. But how can we tell that
> e.count () is fast? We can certainly tell when it is slow: Enum.from
> could set a boolean saying that the "worst count" will be called. But
> what about Enum.make? For example, ExtList.enum has an O(N) count,
> since we don't want to count the elements while creating the enum
> (this is the lazy approach: do it only when we need it).

We only call the O(N) count when we're about to do an O(N) function
anyways (we never need to call count when doing a map). I see three
possibilities here (assuming we're calling it on a list):

1) The list is short enough to fit into cache and is already in cache.
This seems to be what we're benchmarking most of the time - in which
case calling List.length will be very cheap. It will probably be about
the same cost as N has_next calls, maybe even less.

2) The list is short enough to fit into cache, and is not already
(entirely) in cache. In which case, the main cost (unless we're doing a
very expensive function on each element) will be the cost of loading
the list into cache. Remember that on modern CPUs, a cache miss can be
300+ clock cycles.

3) The list is too long to fit into cache. Here, has_next really
shines. If we use count, we have to load the entire list into cache
twice. has_next, however, will have the effect of simply loading the
next element into cache (which we were going to have to do anyways).

Case #3 is when performance matters the most - but paradoxically, it's
when the various approaches we are discussing change performance the
least. The cost is dominated by the memory latency, or by the user
functions.

> When is the O(N) count faster than the exception catching? I would
> say "it depends on the size of the list"... but we don't know the
> size :-) So while we can classify "bad" counts, we can't do it for
> "good" counts, and this optimization will not work...

No matter which way we pick to do things, I can probably come up with a
scenario where "it'd be faster to do it this other way". I'd rate the
performance of all the approaches as "acceptable" and go on.

> > So I think the question is: is it worth a 10-20% performance hit in
> > the library to unify the stream and enum types?

Myself: yes. If I'm worrying about every single clock cycle, I'm not
writing in OCaml, I'm writing in assembler. If I'm writing in OCaml I'm
looking for power and expressivity (this is, in fact, the normal case)
and giving up much more than 10-20% in performance.

> My question is: when do we need speed? For small lists with few
> operations, or for big lists with lots of operations? With 5 maps on
> a 1000-element list, I got 200% speed using exceptions instead of
> None/Some.

That depends upon the program. I'd be most worried about 10+ maps on
10_000_000 elements myself. With non-trivial map functions. As that's
where Enum really starts to shine - as you do wavefront processing. And
where tail recursion stops being an optimization and starts being
required for correctness. This is the problem with microbenchmarks.

I'll see if I can't whip something up.

Brian
|
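Brian's point that an O(1) has_next (or a side-effect-free peek) can
always be layered on top of an O(1) next deserves a sketch. The
one-element lookahead buffer below is a guess at how that might be done,
not the code that went into ExtLib; it assumes the No_more_elements
exception from Alan's listing.

(* Wrap an exception-raising next with a one-element lookahead buffer.
   peek fetches at most one element ahead, so has_next and the wrapped
   next stay O(1) on top of an O(1) underlying next. *)
let with_lookahead (next : unit -> 'a) =
  let buffer = ref None in
  let peek () =
    match !buffer with
      Some _ as v -> v
    | None ->
        (try buffer := Some (next ()) with No_more_elements -> ());
        !buffer
  in
  let has_next () =
    match peek () with Some _ -> true | None -> false
  in
  let next' () =
    match peek () with
      Some v -> (buffer := None; v)
    | None -> raise No_more_elements
  in
  (peek, has_next, next')

Calling peek twice in a row returns the same element, which is the
behaviour Brian expects from something named "peek"; next' still raises
No_more_elements at the end, so it drops into the existing fold/iter
code unchanged.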
From: Nicolas C. <war...@fr...> - 2003-06-03 01:21:43
|
> > Thanks, I get the same results on the Windows MSVC version of OCaml.
> > The None/Some approach is really slow.
> >
> > (BTW, I resolved Brian's problem with Enum's exported "next" being
> > different from the imported one by renaming the exported "next" to
> > "peek".)
>
> Nope. Just moved it around. "peek" to me doesn't have side effects.
> I'll think on names for a bit.

True. I was thinking that the peek function of Stream actually removed
the element, but after re-reading the documentation, it doesn't. What
about "get" then?

Nicolas
|
From: Alan P. <ap...@re...> - 2003-06-03 02:10:18
|
In article <Pin...@ea...>, Brian Hurt wrote:
>
> That depends upon the program. I'd be most worried about 10+ maps on
> 10_000_000 elements myself. With non-trivial map functions. As that's
> where Enum really starts to shine - as you do wavefront processing.
> And where tail recursion stops being an optimization and starts being
> required for correctness. This is the problem with microbenchmarks.
>
> I'll see if I can't whip something up.

Yes, real-world applications are better than benchmarks. I suggest,
however, that benchmarks are better than speculation.
|