From: John M. S. <sk...@oz...> - 2003-06-22 00:58:25
Brian Hurt wrote:
> On Sat, 21 Jun 2003, John Skaller wrote:
>
>>The fast_count function must be elided:
>>it is a symptom of the designer not being
>>able to make up his mind exactly what
>>the model is.
>>
>>This is non-trivial. The only way to count
>>a filtered enumeration is eagerly,
>>on the other hand next can be applied lazily.
>>
>
> I think part of the problem is that enumeration is becoming a victim of
> its own success. We're starting to want to apply it in ways not
> originally conceived of. It's a great way to "glue together" parts of a
> program. But this is putting stress on the original model, as we keep
> going "well, this situation is almost exactly like an enumeration,
> except we need to be able to do this...".

Yes, but it isn't really the model that is being stressed, it is its
specification. It is OK to fix a specification in a way that eliminates
some uses; in fact it is necessary.

The same problem had to be dealt with for C++ STL iterators. There, they
have conceptual distinctions between:

    input iterator
    forward iterator
    bidirectional iterator
    random access iterator

Whilst input iterators can be copied, nothing is said about the currency
or validity of any iterator other than the one last incremented: the
others are all invalidated by that operation. Forward iterators, on the
other hand, are required to hold their position.

So: a file handle is an input iterator, since when you seek on it, the
seek point of all copies changes. A file handle together with a file
position is a forward iterator, since it always seeks from the place it
last finished reading.

In STL, algorithms specify requirements on iterators. Here, you're trying
to package up the algorithms *with* the iterators. [The beauty of STL is
that containers and algorithms interact VIA iterators, and so iterators
are used to factor container design and algorithm design.]

In doing that, you are more or less forced to decide which kind of
iterator the algorithms will work on. If you choose input iterators, a
count consumes the enumeration: it has to, and it also must be REQUIRED
to. Cheating with a fast transparent count can't be allowed: someone
might actually rely on the enumeration being consumed, for example in an
iter2 which ought to terminate immediately if you count one of the
arguments.

Input iterators are good for coinductive data structures like streams,
where the next operation is destructive. Forward iterators are more
useful in general, and almost always work fine with inductive data
structures: although, for example, counting might consume an enumeration,
you do the count on a clone of the iterator, so the original iterator is
not affected.

> There are times we *need* a count, almost no matter how expensive it is.

Yes. The question is: can we allow it to consume the enumeration? In the
spec I committed, the answer is NO: the user supplied count is not
permitted to consume the enumeration. Effectively, I'm specifying forward
iterators. This rules out enumerations of coinductive data structures
like streams.

> All the places where I can see from being useful, init as defined above
> is just as useful. I don't have a problem with this. But I'm not sure
> it solves the basic problems.

Sure it does. The problem is simply that you have to specify whether
count is non-consuming or consuming. If you cannot implement such a
count, you cannot use the Enum module. No problem. You may need TWO enum
modules: there are TWO kinds of iterator.
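To make the "two kinds of iterator" idea concrete, here is a rough OCaml
sketch of what the two module signatures might look like. The names
INPUT_ENUM, FORWARD_ENUM and clone are mine, not part of the existing
Enum interface; the exception name follows the example later in this
message.

    (* Hypothetical sketch only: these signatures are not the actual
       Enum module under discussion. *)

    module type INPUT_ENUM = sig
      type 'a t
      exception No_value
      val next : 'a t -> 'a
      (* next is destructive; any count would consume the
         enumeration, so no count is provided *)
    end

    module type FORWARD_ENUM = sig
      type 'a t
      exception No_value
      val next  : 'a t -> 'a
      val clone : 'a t -> 'a t   (* copies advance independently *)
      val count : 'a t -> int    (* must NOT consume the enumeration *)
    end

    (* With clone, even an internally consuming count can be hidden:
       count e may walk a clone of e to the end, leaving e untouched. *)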
For 'from', you can even leave it in, provided you make it eager: it must
expand the function immediately to obtain the count. The point is that
then the client *knows* by specification when the function is called:
immediately. Otherwise, it is useless, because you *cannot* count it
without destroying the enumeration permanently, defeating the purpose.

Clearly, delaying the expansion doesn't work. If you append two copies of
the same function, what happens? When the expansion is forced, the
function is called in arbitrary order:

    let seq = ref 0 in
    let g () =
      if !seq > 2000 then raise No_value;
      incr seq;
      !seq
    in
    Enum.append (Enum.from g, Enum.from g)

The constructed enum (at the moment) is an input iterator. Clearly any
counting will skip some values of seq .. indeed, it will change the
behaviour of subsequent calls.

On the other hand, init, whilst not necessarily producing a forward
iterator, WILL produce one if the argument is a function (mathematically).
So the interface is good; the onus is on the client to provide a purely
functional argument.

The reason I think counting should be required NOT to consume the
enumeration is that it means counting can be made lazy. BUT you have to
eliminate filters :-)

    count (add (f, g)) = add (count f, count g)

is the requirement. Here, 'add' means 'add integers' in the count value
category, and 'add enumerations' in the enumeration category (of type T).
For map:

    count (map f) = map (count f)

Here the RHS map is the identity function.

    count (filter f) = filter (count f)

Whoops! There is no possible RHS filter, because count has lost the
values of f, and filter needs them. There is some category theory here
about functors and natural transformations .. but the idea should be
clear that count must commute with any constructor you use.

Fold preserves additive structure :-)

    count (fold f) = fold (count f)

The RHS fold is the function f x = 1 :-) Note this is true EVEN IF THE
RESULT OF THE FOLD IS AN ENUMERATION.

In particular, to make say an array from a lazy enumeration, you can
count the enumeration, allocate the array, and provide an extraction
function to initialise it, all without needing to expand the lazy data
structure. It doesn't matter if this is slow or internally clumsy: the
client can improve performance by calling 'force' first. Otherwise they
know the order next is called on their data structures (the order of
elements in the array).

So the rule would be, for forward iterators: force is only called
internally when all the forced enumerations would be consumed anyhow. In
that case, it is OK to force: note that forward iterators BY
SPECIFICATION act independently. So the order of forcing subcomponents is
irrelevant; what matters is that next is called until the end on each
subcomponent enumeration. Err... does that make sense?

    iter, iteri et al: may call force
    fold:              must not
    count:             must not
    ...

I think the overall design is good. You only need to commit to a
specification and then follow through the implications. The only real
difficulty is that there is more than one possible specification. If that
means two enum modules, so be it: don't try to make a single module do
two distinct jobs.

A module with 'count' intrinsic should be forward iterators. A module
without count should be input iterators. You do not need count to make an
array. You make a list, count that, and then make an array .. of course
Dynarray might be better than a list or whatever.
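To see concretely why count commutes with map but not with filter, here
is a small self-contained OCaml sketch. It is not the real Enum
implementation; the record type and field names are made up for
illustration only.

    exception No_value

    (* a hypothetical enumeration carrying a lazy count alongside next *)
    type 'a enum = {
      next  : unit -> 'a;       (* raises No_value when exhausted *)
      count : unit -> int;      (* number of elements remaining *)
    }

    (* count (map f) = map (count f), with the RHS map the identity:
       the lazy count thunk passes through untouched *)
    let map f e =
      { next = (fun () -> f (e.next ())); count = e.count }

    (* filter has no such luck: whether an element survives depends on
       its value, which the count has already lost.  The only honest
       count forces the enumeration and counts the survivors, which
       consumes it (or a clone of it, for forward iterators). *)
    let filter p e =
      let rec next () = let x = e.next () in if p x then x else next () in
      let count () =
        let n = ref 0 in
        (try while true do ignore (next ()); incr n done
         with No_value -> ());
        !n
      in
      { next = next; count = count }

In other words, the count can be carried lazily through map (and, with
f x = 1, through fold), but filter forces it to become eager.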
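The array construction mentioned above might look something like the
following, assuming Enum.count and Enum.next with the obvious types and a
non-consuming count (i.e. forward iterators); of_enum is a made-up name,
not a claim about the actual library code.

    (* Hypothetical: build an array from an enumeration by using count
       to pre-allocate, then next to fill it, in index order. *)
    let of_enum e =
      let n = Enum.count e in
      if n = 0 then [||]
      else begin
        let a = Array.make n (Enum.next e) in
        for i = 1 to n - 1 do
          a.(i) <- Enum.next e
        done;
        a
      end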
If there is duplicated code in these two modules, you can factor it out
into a third algorithms module, and then include it.

By the way, I think being able to lazily append input iterators and then
force them, for example to concatenate two files, is very useful.

--
John Max Skaller,
mailto:sk...@oz...
snail: 10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice: 61-2-9660-0850