From: Andreas R. <ros...@ps...> - 2002-12-13 14:59:43
|
Here is a collection of issues and comments we gathered when implementing the I/O stack from the Standard Basis (primitive, stream, imperative I/O) for Alice. While in general the specification seems to be pretty precise and complete, we sometimes found it hard to understand the semantic details of stream I/O, especially since many of them can only be derived indirectly from the examples in the discussion section and there appear to be some minor ambiguities and inconsistencies. Also, the PrimIO and StreamIO functors cannot always be implemented as suggested, because of their parametricity in types such as position and element. As a general note, the I/O interface does not seem to have been designed with concurrency in mind. In particular, augmenting readers and writers cannot be made thread-safe, AFAWCS. This is a bit of a problem for us, since Alice is relying on concurrency. However, that does not seem to be an issue easily solved. - Leif Kornstaedt, Andreas Rossberg The IO structure ---------------- * exception Io: - function field: (pedantic) The wording seems to imply that only functions from STREAM_IO raise the Io exception, but this is clearly not the case (consider TextIO.openIn to name just one). * datatype buffer_mode: - There is no specification of what precisely line buffering is supposed to mean, in particular for non-text streams. The PRIM_IO signature --------------------- * Synopsis: - (pedantic) It says that "higher level I/O facilities do not access the OS structure directly...". That's somewhat misleading since OS does not provide the same functionality anyway (if any, it was the Posix structure). * type reader: - Unlike for writers, it is not specified what the minimal set of operations is that a reader must support. - It is not specified whether multiple end-of-streams may occur. Since they are anticipated for StreamIO, one should expect them to be possible for underlying readers as well. However, this requires clarification of the semantics of several operations. - readArr, readArrNB: It is specified nowhere what the option for sz is supposed to mean, i.e. what the semantics of NONE is (presumably as for slices). - readVec, readVecNB: Unlike all other similar read and write functions, these two do not accept an option for the size argument. - avail: The description suggests that the function can be used as a hint by inputAll. However, this information is too inaccurate to be useful, since (apart from translation issues) the physical size of elements cannot be obtained (in particular in the StreamIO functor, which is parametric in the element type). In practice, endPos seems to be more useful for this purpose. So it is not clear what purpose avail could actually serve at all at the abstraction level provided by readers. - endPos: (1) May it block? For example, when reading from terminal or from another kind of stream, this can be naturally expected. (2) Which position is returned if there are multiple end-of-streams? - getPos, setPos, endPos, verifyPos: Description should start with "when present". - setPos, endPos: Should not raise an exception if unimplemented, but rather be NONE. Actually, the implementation notes on writers state that endPos *must* be implemented for readers. - Implementation note, item 6: Why is it likely that the client uses getPos frequently? And why should the reader count *untranslated* elements (and how would there be actual elements before translation)? (See also comments on STREAM_IO.filePosIn) * type writer: - writeVec, writeArr, writeVecNB, writeArrNB: (1) Again, it is not specified what the optional size means. (2) When may k < sz occur without having IO failure? If it is arbitrary, then there appears to be no correct way to write a sequence of elements, because it is neither possible to detect partial element writes (which are explained in the paragraph before the Implementation Notes), nor to complete such writes. This particularly implies that the StreamIO functor cannot implement flushing correctly (see below). - getPos, setPos, endPos, verifyPos: Description should start with "when present". - getPos, setPos: Should not raise an exception if unimplemented, but rather be NONE. - last paragraph before Implementation Note: Typo, double "plus". - first sentence in Implementation Note: (pedantic) Why is this put into the implementation notes when it actually seems to be a requirement of the specification? - last paragraph of Implementation Note: (1) States that readers must implement getPos, which seems to be contradicted by its optional type. (2) Typo, double "need". * openVector: - Is this supposed to support random access? Note that for types generated with the PrimIO functor it cannot (see below)! That seems to make this function rather useless. * augmentReader, augmentWriter: - It is not possible to synthesize operations in a way that is thread-safe in concurrent systems, hence it should be noted that augmenting is potentially dangerous. * There is no reference to the PrimIO functor. The PrimIO functor ------------------ * General problems: - Since the implementation is necessarily parametric in the pos type, openVector, nullRd, nullWr cannot create readers that allow random access, although one would expect that at least for openVector. * Functor argument: - Structure names A and V are inconsistent with the StreamIO and ImperativeIO functors. - Type pos has to be an eqtype to match the result signature. - Since the extract and copy functions have been removed/changed from ARRAY and VECTOR signatures, the PrimIO functor now naturally requires slice structures for efficient implementation. (Likewise the StreamIO functor) * Functor result: - Type sharing of the pos type is not specified, though essential for this functor being useful at all. The STREAM_IO signature ----------------------- * Synopsis: - An exception likely to be raised in by the underlying reader/writer is Size, which is not mentioned. OTOH, Fail can only occur in the rare case of user-supplied readers/writers, as the Basis itself is supposed to never raise it. * type out_pos: - A note on the meaning of this type would be desirable, since its canonical representation is (outstream * pos) rather than pos. (That also may have caused confusion in the discussion of imperative I/O, see below.) * input1: - The signature of this function is inconsistent with all other input functions. It should rather have type instream -> elem option * instream which in fact appears to be the type assumed in the discussion example relating input1 to inputN. * input: - Typo, s/ay/may/ * inputN: - This function is somewhat underspecified for n=0. In particular, may it block? Is it required to raise Io if the underlying reader is closed? * input, input1, inputN, inputAll: - (pedantic) Descriptions speak of "underlying system calls", although the reader may not actually depend on system calls. Preferably speak of "underlying reader" only. * closeIn: - Likewise, description speaks of "releasing system resources". This should be replaced by saying that it closes the underlying reader (which is not even specified as is). * closeOut: - Does the function attempt to close the stream even if flushing fails? - Why is it possible to close terminated streams? That seems to allow unfortunate interference with another stream that has been created from the extracted writer. * mkInstream, getReader: - The table seems to imply that mkInstream always augments its reader. This is inappropriate for concurrent environments (see above). - Should getReader return the original or the augmented reader? - The table still includes the removed getPosIn and setPosIn functions. * mkOutstream, getWriter: - Likewise. * filePosIn: - There seems to be no way to implement this function for buffered I/O, because the reader position that corresponds to a mid-block-element is not available and cannot be calculated in general. So how is this meant? - Typo, s/character/element/ * filePosOut: - Likewise. * getWriter: - It is non-obvious what the precise meaning of "terminating" a stream is. If this is merely setting a status flag then a corresponding note would be helpful. * getPosOut: - May this flush the stream (and hence raise Io exceptions)? * setPosOut: - This may raise an exception because the position has been invalidated after obtaining it (e.g. by file truncation performed by another process). - Typo, s/underlying device/underlying writer/ * setBufferMode, getBufferMode: - There is no specification of the semantics of line buffering, in particular for non-text streams. (See also comments on StreamIO functor) - It is not specified whether the stream may be flushed when set to LINE_BUF mode (may cause Io exception). It seems unreasonable to require it not to do so (assuming that line buffering is intended to maintain the invariant that the buffer never contains line breaks). - The synopsis of this function uses "ostr", while all others use "f" for streams. * setPosOut, setBufferMode, getWriter: - Can raise an exception if flushing fails. * Discussion: - The statement that closing a stream just causes the not-yet-determined part of the stream to be empty should probably be generalised to explain what *truncating* a stream means (getReader also truncates the stream). - Example of freshly opened stream: s/mkInstream r/mkInstream(r, vector [])/ s/size/length/ - nreads example: s/mkInstream r/mkInstream(r, vector [])/ s/size/length/ - input1/inputN relation example: (1) Inconsistent with the actual typing of input1 (see above). (2) Typo, s/inputN f/inputN(f,1)/ - Unbuffered I/O, 1st example: (1) Typos, s/mkInstream(reader)/mkInstream(reader, vector [])/ s/PrimIO.Rd{chunkSize,...}/(PrimIO.RD{chunksize,...}, v)/ (2) More importantly, the actual condition appears to be incorrect. It should read: (chunkSize > 1 orelse length v = 1) andalso endOfStream f' - Unbuffered I/O, 2nd example: s/mkInstream(reader)/mkInstream(reader, vector [])/ s/PrimIO.Rd{chunkSize,...}/(PrimIO.RD{chunksize,...}, v)/ The condition must be corrected as above. * There is no reference to the StreamIO functor. The StreamIO functor -------------------- * General problems: - It is impossible for this functor to support line buffering, since it has no way of knowing which element consists a line break. This could be solved by changing the someElem functor argument to a breakElem argument. - It is also impossible to utilize reader's endPos for pre-allocation, because the functor is parametric in the position type. * Functor argument: - Since the extract and copy functions have been removed/changed from ARRAY and VECTOR signatures, the StreamIO functor now naturally requires slice structures for efficient implementation. (Likewise the PrimIO functor) * Functor result: - Type sharing of the result types is not specified. * Discussion, paragraph on flushing: - Most of this discussion rather belongs to the description of STREAM_IO. - Everything said here is not restricted to flushOut, but applies to flushing in general. - Unfortunately, it is left unspecified where flushing may happen and, consequently, where respective Io exceptions may occur. - Write retries as suggested here seem to be impossible to implement correctly using the writer interface as specified (see comments on PRIM_IO.writer). - According to the writer description, write operations may never return an element count of 0, so the last sentence is misleading. * Discussion, last paragraph: - Typo, missing ")" * Implementation note: - 3rd bullet: typo, s/PrimIO.augmentIn/PrimIO.augmentReader/ - 5th and 6th bullet: The endPos function cannot be utilized as suggested, because the functor is necessarily parametric in the position type. The IMPERATIVE_IO signature --------------------------- * General comment: - It is unfortunate that imperative I/O is asymmetric with respect to providing (limited) random access on input vs. output streams - the former requires going down to the lower-level stream I/O. That makes imperative I/O a somewhat incomplete abstraction layer. - Likewise, it would be desirable if there were ways for performing full-fledged random access without leaving the imperative I/O abstraction layer, at least for streams were it is suitable (e.g. BinIO). Despite the statement in the discussion this is neither available for input nor for output streams (see comments below). * closeIn: - Typo, s/S.closeIn/StreamIO.closeIn/ * flushOut: - Typo, s/S.flushOut/StreamIO.flushOut/ * closeOut: - Typo, s/S.closeOut/StreamIO.closeOut/ * Discussion: - Equivalences, last line: s/StreamIO.output/StreamIO.flushOut/ - Paragraph about random-access on output streams: It says that BinIO.StreamIO.out_pos = Position.int. This is not true, we have BinPrimIO.pos = Position.int, but that is a completely different type. In fact, it is impossible to implement out_pos as Position.int. * There is no reference to the ImperativeIO functor. The ImperativeIO functor ------------------------ * Functor argument: - The Array argument is unnecessary. * Functor result: - Type sharing of the result types is not specified. The TEXT_STREAM_IO signature ---------------------------- * General comment: - Why bother separating this signature from STREAM_IO? => outputSubstr can easily be generalised to outputSlice (for good), => if line buffering is part of STREAM_IO, inputLine might be as well. The TextIO structure -------------------- * General comment: - Systems providing WideText should also provide a WideTextIO structure (they have to provide WideTextPrimIO already, which seems inconsistent). * Interface: - Duplicated type constraints for StreamIO.reader and StreamIO.writer. The BinIO structure -------------------- * Interface: - Type sharing with BinPrimIO is not specified (unlike for TextIO), i.e. the following constraints are missing: where type StreamIO.reader = BinPrimIO.reader where type StreamIO.writer = BinPrimIO.writer where type StreamIO.pos = BinPrimIO.pos |