From: John M. S. <sk...@oz...> - 2003-06-22 00:58:25
Brian Hurt wrote:
> On Sat, 21 Jun 2003, John Skaller wrote:
>
>>The fast_count function must be elided:
>>it is a symptom of the designer not being
>>able to make up his mind exactly what
>>the model is.
>>
>>This is non-trivial. The only way to count
>>a filtered enumeration is eagerly,
>>on the other hand next can be applied lazily.
>>
>
> I think part of the problem is that enumeration is becoming a victim of
> its own success. We're starting to want to apply it in ways not
> originally conceived of. It's a great way to "glue together" parts of a
> program. But this is putting stress on the original model, as we keep
> going "well, this situation is almost exactly like an enumeration,
> except we need to be able to do this...".

Yes, but it isn't really the model that is being stressed, it is its
specification. It is OK to fix a specification in a way that eliminates
some uses; in fact it is necessary.

The same problem had to be dealt with for C++ STL iterators. There, they
have conceptual distinctions between:

    input iterator
    forward iterator
    bidirectional iterator
    random access iterator

Whilst input iterators can be copied, nothing is said about the currency
or validity of any iterator other than the one last incremented: the
others are all invalidated by that operation. Forward iterators, on the
other hand, are required to hold their position.

So: a file handle is an input iterator, since when you seek on it, the
seek point of all copies changes. A file handle together with a file
position is a forward iterator, since it always seeks from the place it
last finished reading.

In STL, algorithms specify requirements on iterators. Here, you're trying
to package up the algorithms *with* the iterators. [The beauty of STL is
that containers and algorithms interact VIA iterators, and so iterators
are used to factor container design and algorithm design.]

In doing that, you are more or less forced to decide which kind of
iterator the algorithms will work on. If you choose input iterators, a
count consumes the enumeration: it has to, and it also must be REQUIRED
to. Cheating with a fast transparent count can't be allowed: someone
might actually rely on the enumeration being consumed, for example in an
iter2 which ought to terminate immediately if you count one of the
arguments.

Input iterators are good for coinductive data structures like streams,
where the next operation is destructive. Forward iterators are more
useful in general, and almost always work fine with inductive data
structures: although, for example, counting might consume an enumeration,
you do the count on a clone of the iterator, so the original iterator is
not affected.

> There are times we *need* a count, almost no matter how expensive it is.

Yes. The question is: can we allow it to consume the enumeration? In the
spec I committed, the answer is NO: the user supplied count is not
permitted to consume the enumeration. Effectively, I'm specifying forward
iterators. This rules out enumerations of coinductive data structures
like streams.

> All the places where I can see from being useful, init as defined above
> is just as useful. I don't have a problem with this. But I'm not sure
> it solves the basic problems.

Sure it does. The problem is simply that you have to specify whether
count is non-consuming or consuming. If you cannot implement such a
count, you cannot use the Enum module. No problem. You may need TWO enum
modules: there are TWO kinds of iterator.
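To make the "two kinds of iterator" idea concrete, here is a rough OCaml
sketch of what the two module signatures might look like. The names
INPUT_ENUM, FORWARD_ENUM and clone are mine, not part of the existing
Enum interface; the exception name follows the example later in this
message.

    (* Hypothetical sketch only: these signatures are not the actual
       Enum module under discussion. *)

    module type INPUT_ENUM = sig
      type 'a t
      exception No_value
      val next : 'a t -> 'a
      (* next is destructive; any count would consume the
         enumeration, so no count is provided *)
    end

    module type FORWARD_ENUM = sig
      type 'a t
      exception No_value
      val next  : 'a t -> 'a
      val clone : 'a t -> 'a t   (* copies advance independently *)
      val count : 'a t -> int    (* must NOT consume the enumeration *)
    end

    (* With clone, even an internally consuming count can be hidden:
       count e may walk a clone of e to the end, leaving e untouched. *)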
For 'from', you can even leave it in, provided you make it eager: it must
expand the function immediately to obtain the count. The point is that
then the client *knows* by specification when the function is called:
immediately. Otherwise, it is useless, because you *cannot* count it
without destroying the enumeration permanently, defeating the purpose.

Clearly, delaying the expansion doesn't work. If you append two copies of
the same function, what happens? When the expansion is forced, the
function is called in arbitrary order:

    let seq = ref 0 in
    let g () =
      if !seq > 2000 then raise No_value;
      incr seq;
      !seq
    in
    Enum.append (Enum.from g, Enum.from g)

The constructed enum (at the moment) is an input iterator. Clearly any
counting will skip some values of seq .. indeed, it will change the
behaviour of subsequent calls.

On the other hand, init, whilst not necessarily producing a forward
iterator, WILL produce one if the argument is a function (mathematically).
So the interface is good; the onus is on the client to provide a purely
functional argument.

The reason I think counting should be required NOT to consume the
enumeration is that it means counting can be made lazy. BUT you have to
eliminate filters :-)

    count (add (f, g)) = add (count f, count g)

is the requirement. Here, 'add' means 'add integers' in the count value
category, and 'add enumerations' in the enumeration category (of type T).
For map:

    count (map f) = map (count f)

Here the RHS map is the identity function.

    count (filter f) = filter (count f)

Whoops! There is no possible RHS filter, because count has lost the
values of f, and filter needs them. There is some category theory here
about functors and natural transformations .. but the idea should be
clear that count must commute with any constructor you use.

Fold preserves additive structure :-)

    count (fold f) = fold (count f)

The RHS fold is the function f x = 1 :-) Note this is true EVEN IF THE
RESULT OF THE FOLD IS AN ENUMERATION.

In particular, to make say an array from a lazy enumeration, you can
count the enumeration, allocate the array, and provide an extraction
function to initialise it, all without needing to expand the lazy data
structure. It doesn't matter if this is slow or internally clumsy: the
client can improve performance by calling 'force' first. Otherwise they
know the order next is called on their data structures (the order of
elements in the array).

So the rule would be, for forward iterators: force is only called
internally when all the forced enumerations would be consumed anyhow. In
that case, it is OK to force: note that forward iterators BY
SPECIFICATION act independently. So the order of forcing subcomponents is
irrelevant; what matters is that next is called until the end on each
subcomponent enumeration. Err... does that make sense?

    iter, iteri et al: may call force
    fold:              must not
    count:             must not
    ...

I think the overall design is good. You only need to commit to a
specification and then follow through the implications. The only real
difficulty is that there is more than one possible specification. If that
means two enum modules, so be it: don't try to make a single module do
two distinct jobs.

A module with 'count' intrinsic should be forward iterators. A module
without count should be input iterators. You do not need count to make an
array. You make a list, count that, and then make an array .. of course
Dynarray might be better than a list or whatever.
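To see concretely why count commutes with map but not with filter, here
is a small self-contained OCaml sketch. It is not the real Enum
implementation; the record type and field names are made up for
illustration only.

    exception No_value

    (* a hypothetical enumeration carrying a lazy count alongside next *)
    type 'a enum = {
      next  : unit -> 'a;       (* raises No_value when exhausted *)
      count : unit -> int;      (* number of elements remaining *)
    }

    (* count (map f) = map (count f), with the RHS map the identity:
       the lazy count thunk passes through untouched *)
    let map f e =
      { next = (fun () -> f (e.next ())); count = e.count }

    (* filter has no such luck: whether an element survives depends on
       its value, which the count has already lost.  The only honest
       count forces the enumeration and counts the survivors, which
       consumes it (or a clone of it, for forward iterators). *)
    let filter p e =
      let rec next () = let x = e.next () in if p x then x else next () in
      let count () =
        let n = ref 0 in
        (try while true do ignore (next ()); incr n done
         with No_value -> ());
        !n
      in
      { next = next; count = count }

In other words, the count can be carried lazily through map (and, with
f x = 1, through fold), but filter forces it to become eager.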
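The array construction mentioned above might look something like the
following, assuming Enum.count and Enum.next with the obvious types and a
non-consuming count (i.e. forward iterators); of_enum is a made-up name,
not a claim about the actual library code.

    (* Hypothetical: build an array from an enumeration by using count
       to pre-allocate, then next to fill it, in index order. *)
    let of_enum e =
      let n = Enum.count e in
      if n = 0 then [||]
      else begin
        let a = Array.make n (Enum.next e) in
        for i = 1 to n - 1 do
          a.(i) <- Enum.next e
        done;
        a
      end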
If there is duplicated code in these two modules, you can factor it out
into a third algorithms module, and then include it.

By the way, I think being able to lazily append input iterators and then
force them, for example to concatenate two files, is very useful.

--
John Max Skaller,
mailto:sk...@oz...
snail: 10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice: 61-2-9660-0850