[Ocaml-lib-devel] Re: benchmarking enum approaches

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

In article <000d01c328df$e687e6b0$2713f9ca@WARP>, Nicolas Cannasse wrote:
> 
>> Another possibility would be to be clever about having fold and iter
>> use (e.count ()) when it is fast, and use exception catching when
>> (e.count ()) is slow, without actually having separate types.
> 
> Interesting. In the first time, I would say this is a good idea, since with
> your benchs we can see that there is little improvement when iterating using
> a O(1) counter against using an exception . But how can we tell that
> e.count() is fast ? We can surely tells when it is slow : Enum.from could
> put a boolean to true saying that the "worst count" will be called. But what
> about Enum.make ? For example , ExtList.enum has an O(N) count, since we
> don't want to count the elements we creating an enum ( this is the lazy
> approach : do it only when we need it ).
> When is the O(N) count faster than the exception catching ? I would say
> "depends on the size of the list"... but we don't know the size :-) So if we
> can classify "bad" counts we can't do it for "good" counts, and this
> optimization will not work...

It doesn't have to be perfect to be an improvement.  Enums from arrays
or dynarrays, for instance, have fast counts.  Enum.map would inherit
the fast_count flag, whereas most other enum transforms (e.g.
Enum.filter) would set the fast_count flag to zero.

This could also be done by keeping the stream and enum types separate,
with the stream type having no count.  Either way would give the
performance benefit; it's really a matter of making a nice library API
for people.

Doing it with types, there would be:

  Enum.map:      ('a -> 'b)   -> 'a enum   -> 'b enum
  Stream.map:    ('a -> 'b)   -> 'a stream -> 'b stream
  Enum.filter:   ('a -> bool) -> 'a enum   -> 'a stream    *** Note ***
  Stream.filter: ('a -> bool) -> 'a stream -> 'a stream
  Enum.iter:     ('a -> unit) -> 'a enum   -> unit
  Stream.iter:   ('a -> unit) -> 'a stream -> unit

where Enum.iter and Enum.filter use count, and Stream.iter and
Stream.filter use a try-catch.

  List.stream: 'a list -> 'a stream
  List.from_stream: 'a stream -> 'a list
  List.from_enum: 'a enum -> 'a list

  Array.enum: 'a array -> 'a enum
  Array.from_stream: 'a stream -> 'a array
  Array.from_enum: 'a enum -> 'a array

There is no List.enum, since you can't count a list easily.
List.from_stream does a try-catch, and List.from_enum uses count.

Of course reading lines from a file handle would be a stream, rather
than an enum.

There could be a function:

  Enum.from_stream: 'a stream -> 'a enum

with warnings on it that it may be rather expensive, to do the
Enum.force magic.  Array.from_stream could be implemented as:

  let from_stream s = from_enum (Enum.from_stream s);;

I don't see how a Stream.from_enum function would be useful.

>> So I think the question is:  is it worth a 10-20% performance hit in
>> the library to unify the stream and enum types?
> 
> Which unification ? You mean, using None/Some next function ?

I am not at all suggesting using options to do the end-of-stream
notification.  Certainly I think that using exceptions is a much
better way to go.  The "unification" I was referring to was the
decision about whether to have distinct enum and stream types.  The
"performance hit" I was referring to was the cost of using exceptions
to iterate over an enum when you could have used the count.