From: Brian H. <bri...@ql...> - 2003-06-02 22:23:41
On Mon, 2 Jun 2003, Nicolas Cannasse wrote:

> Thanks, I get the same results on the windows MSVC version of OCaml. The
> None/Some approach is really slow.
> (BTW, I resolved Brian's problem with having Enum's exported next different
> from the imported one by renaming the exported "next" to "peek")

Nope. Just moved it around. "peek" to me doesn't have side effects. I'll
think on names for a bit.

> Please notice that you're here testing the best case, where count is O(1),
> but sometimes it can be O(N) or even worse, since sometimes we need to
> create an intermediate list and do a lot of I/O (reading a file, for
> example).

We can always implement an O(1) has_next given an O(1) next. This is only an
advantage if we avoid creating an intermediate data structure, however. My
personal opinion is that count should be able to return "I don't know" / "to
find that out requires the creation of an intermediate data structure".
Assuming count returns int option, this would allow for code like:

    let rec use_enum e =
        match Enum.count e with
            Some c -> (* we have a good count *)
                ...
          | None ->
                (* explicitly create the intermediate data structure *)
                use_enum (List.enum (List.of_enum e))

Alternatively, the user code might have more efficient ways of dealing with
unknown-length enumerations. For example, Dynarray.append_enum might look
something like:

    let append_enum arr e =
        match Enum.count e with
            Some c ->
                (* Normal code that pre-reserves c elements in the array, so
                   the data is only copied once *)
          | None ->
                (* Just repeatedly call add *)
                Enum.iter (Dynarray.add arr) e
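To make the two ideas above concrete, here is a minimal, self-contained
sketch. It is not the actual ExtLib interface: the record type, the field
names, No_more_elements as used here, and length_hint are all hypothetical,
purely for illustration. It shows a count that can answer "I don't know" and
an O(1) has_next built on top of a raw next by buffering one element ahead:

    (* Hypothetical enum: a raw next that raises No_more_elements when
       exhausted, a count that may answer None ("not known cheaply"), and a
       one-element read-ahead buffer used to implement has_next. *)
    exception No_more_elements

    type 'a enum = {
        mutable buffered : 'a option;   (* element read ahead, if any *)
        next_raw : unit -> 'a;          (* raises No_more_elements at the end *)
        count : unit -> int option;     (* None = would need an intermediate
                                           structure or a full traversal *)
    }

    (* O(1) has_next from an O(1) next: read one element ahead and keep it. *)
    let has_next e =
        match e.buffered with
            Some _ -> true
          | None ->
                (try e.buffered <- Some (e.next_raw ()); true
                 with No_more_elements -> false)

    let next e =
        match e.buffered with
            Some x -> e.buffered <- None; x
          | None -> e.next_raw ()

    (* A consumer in the same spirit as append_enum above: use the cheap
       count when there is one, otherwise walk (and consume) the enum. *)
    let length_hint e =
        match e.count () with
            Some c -> c
          | None ->
                let n = ref 0 in
                while has_next e do
                    ignore (next e);
                    incr n
                done;
                !n

With this shape, the Some branch of append_enum can pre-reserve c slots and
the None branch can fall back to repeated add, without the library ever
having to build an intermediate list just to learn the length.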
> Interesting. At first I would say this is a good idea, since with your
> benchmarks we can see that there is little improvement when iterating
> using an O(1) counter versus using an exception. But how can we tell that
> e.count() is fast? We can surely tell when it is slow: Enum.from could set
> a boolean to true saying that the "worst count" will be called. But what
> about Enum.make? For example, ExtList.enum has an O(N) count, since we
> don't want to count the elements when creating an enum (this is the lazy
> approach: do it only when we need it).

We only call the O(N) count when we're about to do an O(N) function anyway
(we never need to call count when doing a map). I see three possibilities
here (assuming we're calling it on a list):

1) The list is short enough to fit into cache and is already in cache. This
seems to be what we're benchmarking most of the time - in which case calling
List.length will be very cheap. It will probably be about the same cost as N
has_next calls, maybe even less.

2) The list is short enough to fit into cache, and is not already (entirely)
in cache. In which case, the main cost (unless we're doing a very expensive
function on each element) will be the cost of loading the list into cache.
Remember that on modern CPUs, a cache miss can be 300+ clock cycles.

3) The list is too long to fit into cache. Here, has_next really shines,
since if we use count, we have to load the entire list into cache twice.
has_next, however, will have the effect of simply loading the next element
into cache (which we were going to have to do anyway).

Case #3 is when performance matters the most - but paradoxically, it's when
the various approaches we are discussing change performance the least. The
cost is dominated by memory latency, or by the user functions.

> When is the O(N) count faster than exception catching? I would say "it
> depends on the size of the list"... but we don't know the size :-) So even
> if we can classify "bad" counts, we can't do it for "good" counts, and
> this optimization will not work...

No matter which way we pick to do things, I can probably come up with a
scenario where "it'd be faster to do it this other way". I'd rate the
performance of all the approaches as "acceptable" and go on.

> > So I think the question is: is it worth a 10-20% performance hit in
> > the library to unify the stream and enum types?

Myself: yes. If I'm worrying about every single clock cycle, I'm not writing
in OCaml, I'm writing in assembler. If I'm writing in OCaml I'm looking for
power and expressivity (this is, in fact, the normal case) and I'm already
giving up much more than 10-20% in performance.

> My question is: when do we need speed? For small lists with few operations
> or for big lists with lots of operations? With 5 maps on a 1000-element
> list, I got 200% speed using exceptions instead of None/Some.

That depends upon the program. I'd be most worried about 10+ maps on
10_000_000 elements myself, with non-trivial map functions. That's where
Enum really starts to shine - as you do wavefront processing - and where
tail recursion stops being an optimization and starts being required for
correctness. This is the problem with microbenchmarks. I'll see if I can't
whip something up.

Brian