From: Andreas R. <ros...@ps...> - 2002-07-25 13:14:27
Like Matthew I started implementing the latest version of the Basis spec
for Alice and Hamlet. I'm quite happy with most of the changes. It was a
surprise to discover the presence of a Windows structure, though :-)

Here is my list of comments, some of which may duplicate observations
already made by Matthew. They primarily cover global issues and the
required part of the library, though I haven't looked deeper into the IO
and Posix parts yet. I also included some proposals for modest additions
to the library, which I believe are useful and fit its spirit.


Trivial bugs, typos, cosmetics
------------------------------

* Overview:
  - INT_INF appears in the list of required signatures.
  - WordArray2 appears under the list of required structures, instead of
    optional ones.

* LIST_PAIR:
  - Typo in description of allEq: double "the".

* SUBSTRING:
  - The scan example uses the deprecated "all" function.

* VECTOR_SLICE:
  - Typo in synopsis of subslice: s/opt/sz/.
  - Typo in description of subslice: s/|arr|/|sl|/.
  - Typo in description of findi: s/appi/findi/.
  - Signature sometimes uses Vector.vector instead of plain vector.
  - The equation for mapi can be simplified to:
      Vector.fromList (foldri (fn (i,a,l) => f(i,a)::l) [] slice)

* MONO_VECTOR_SLICE and ARRAY_SLICE and MONO_ARRAY_SLICE:
  - Typo in synopsis of subslice: s/opt/sz/.
  - Typo in description of findi: s/appi/findi/.

* BYTE:
  - Accidental "val" keyword in synopsis of some functions.

* TEXT_IO:
  - The "where" constraints contain erroneously qualified ids.
  - The specification of the TEXT_IO signature is not valid SML'97, since
    StreamIO is specified twice. You might want to add a comment regarding
    that.
  - The constraints for types vector and elem are redundant (in fact,
    invalid), because the signature TEXT_STREAM_IO already specifies the
    necessary equations.

* The use of variable names is sometimes inconsistent:
  - Predicate arguments to higher-order functions are usually named "f"
    (e.g. List.all), sometimes "p" (e.g. String.tokens, StringCvt.splitl),
    and sometimes even "pred" (e.g. ListPair.all).
  - Similarly, fold functions mostly use "init" to name initial
    accumulators, except in the List and ListPair modules.


Ambiguities / Unclear Details
-----------------------------

* Overview:
  - The subsection about dependencies among optional modules has
    disappeared. Does that mean that there aren't any anymore? (The nice
    subsection about design rules and conventions also has gone.)

* The intended meaning of opaque signature constraints is not always clear
  to me. Sometimes the prose contains remarks about additional equalities
  that are not apparent from the signature constraints. For example, is or
  isn't
  - Text.Char.char = Char.char ? (and so on for the rest of Text)
  - LargeInt.int = IntN.int (for some structure IntN) ?
    (likewise LargeWord.word, LargeReal.real)
  - Char.string = String.string ?
  - Math.real = Real.real ?
  In particular, the spec sometimes speaks of "equal structures", which has
  no real technical meaning in SML'97. Note that from the opaque matching
  on the overview page one might even conclude that General.unit <> {} !

* The type specification of String.string and CharVector.vector is
  circular:

    structure String :> STRING
      where type string = CharVector.vector
    structure CharVector :> MONO_VECTOR
      where type vector = String.string

  Likewise for Substring.substring and CharVectorSlice.slice. A respective
  defining structure should be chosen.

* STRING:
  - Function fromString has a special case that is not covered by
    implementing the function through straightforward iterative
    application of the Char.scan function, namely a trailing gap escape
    (\f...f\) as in "foo\\ \\" or "foo\\ \\\000" (where \000 is a
    non-convertible character). Several implementations I tried get that
    detail wrong, so a corresponding note might be in order.
    Moreover, it is not completely obvious from the description what the
    result should be for strings that contain a gap escape as the only
    convertible sequence, e.g. "\\ \\" or "\\ \\\000" - it is supposed to
    be SOME "", I guess.

* SUBSTRING:
  - Shouldn't span raise Span if i' < i? Otherwise, contrary to the prose,
    it in fact accepts arguments where ss' is left to ss, as long as they
    overlap (which is rather odd).
  - For the curried triml/trimr it is not clear whether a Subscript
    exception has to be raised already if k < 0 but no second argument is
    applied.


Naming and structuring
----------------------

Its nicely chosen regular naming conventions and structure are two of the
aspects I like most about the Standard Basis. The following list enumerates
the few cases where I feel that the spec violates its own conventions.

* WORD:
  - The fromLargeWord and toLargeWord functions should drop the "Word"
    suffix to be consistent with the corresponding functions in the REAL
    and INTEGER signatures.

* CHAR:
  - The functions contains/notContains should be moved to the STRING
    signature, as they are similar to find/exist operations and thus
    functionality of the aggregate. The type string could then be removed
    from the signature.

* ARRAY_SLICE and MONO_ARRAY_SLICE:
  - The function copyVec seems completely out of place: it operates
    neither on array slices nor on vectors. But honestly I have got no
    idea where else to put it :-(

* STRING and SUBSTRING:
  - There is a certain asymmetry between slices and substrings which tends
    to confuse at least myself when hacking.
    For more consistency I propose:
    (1) changing the type of Substring.substring to
          string * int * int option -> substring
        (for consistency with VectorSlice.slice),
    (2) renaming Substring.slice to Substring.subsubstring
        (for consistency with VectorSlice.subslice),
    (3) removing Substring.{app,foldl,foldr}
        (there are no similar functions in the STRING signature, and in
        both cases they are available through CharVector/CharVectorSlice),
    (4) removing String.extract and Substring.extract
        (the same functionality is available through CharVector[Slice]).
  - I believe the deprecated Substring.all can be removed for good. After
    all, there are more serious incompatible changes being made (e.g.
    array copying functions).

* Vectors and arrays:
  - While the lib consistently uses the to/from convention for conversions
    on basic types, it sometimes uses ad hoc conventions for aggregates.
    I propose renaming:
    (1) Array.vector to Array.toVector,
    (2) VectorSlice.vector to VectorSlice.toVector,
    (3) ArraySlice.vector to ArraySlice.toVector,
    (4) Substring.string to Substring.toString.
  - Since the copy functions have only 3, mostly distinctly typed
    arguments now, there no longer seems to be a strong reason to require
    passing those by notationally heavy records.

* INT_INF:
  - The presence of bit fiddling operators in that signature is something
    that feels exceptionally ad hoc. Either they should be available for
    all integer types, or there should be a separate WORD_INF, with
    appropriate conversions, that makes these available.

* Toplevel:
  - Now that there is Word.~ (which is good) it seems rather odd that the
    toplevel ~ is not overloaded for words, i.e. does not have type
    num -> num.

* Net functionality:
  - I really like the idea of structuring the library namespace as it has
    been done with the OS and Posix structures. I would prefer to see
    something similar being done for the added network functionality.
    More precisely, I propose
    (1) moving the structures Socket, INetSock, GenericSock, and the three
        Net*DB structures into a new wrapper structure Net (renaming
        Net*DB to *DB),
    (2) defining a corresponding signature NET,
    (3) renaming the signatures SOCKET, GENERIC_SOCK and INET_SOCK to
        NET_SOCKET, NET_GENERIC_SOCK and NET_INET_SOCK, resp.,
    (4) moving UnixSock to the Unix structure (renamed as Socket).


Misc. proposals for additional functionality
--------------------------------------------

Here is a small collection of miscellaneous simple functions which I
believe the library is still lacking, either because they are commonly
useful or because they would make the library more regular.

* LIST and LIST_PAIR:
  - The IMHO single most convenient extension to the library would be
    indexed morphisms on lists, i.e. adding

      val appi :   (int * 'a -> unit) -> 'a list -> unit
      val mapi :   (int * 'a -> 'b) -> 'a list -> 'b list
      val foldli : (int * 'a * 'b -> 'b) -> 'b -> 'a list -> 'b
      val foldri : (int * 'a * 'b -> 'b) -> 'b -> 'a list -> 'b
      val findi :  (int * 'a -> bool) -> 'a list -> (int * 'a) option

  - Likewise for LIST_PAIR.
  - LIST_PAIR does not support partial mapping:

      val mapPartial : ('a * 'b -> 'c option) -> 'a list * 'b list -> 'c list

* LIST, VECTOR, ARRAY, etc.:
  - Another function on lists that would be very useful from my
    perspective is

      val appr : ('a -> unit) -> 'a list -> unit

    and its indexed sibling

      val appri : (int * 'a -> unit) -> 'a list -> unit

    which traverse the list from right to left.
  - Likewise for all aggregate types.
  - All aggregates come with a fromList function. I often feel the need to
    have inverse toList functions. Use of foldr is obfuscating.

* OPTION:
  - Often using isSome is a bit clumsy.
    I thus propose adding the dual

      val isNone : 'a option -> bool

* STRING and SUBSTRING:
  - For historical reasons we have {String,Substring}.size instead of
    *.length, which is inconsistent with all other aggregates and
    frequently lets me mix them up when I use them side by side. I propose
    adding aliases

      String.maxLen
      String.length
      Substring.length

* WideChar and WideString:
  - There is no convenient way to convert between the standard and wide
    character set. Would it be reasonable to introduce LargeChar and
    LargeString structures (and so on) and have the CHAR and STRING
    signatures enriched by fromLarge/toLarge functions, as for numbers?
    That would also allow a program to select the widest character set
    available (which is currently impossible within the language).

* String conversion:
  - I don't quite see the rationale for which signatures contain a scan
    function and which don't. I believe it makes sense to have scan in
    every signature that has fromString.
  - There should be a function

      val scanC : (Char.char, 'a) StringCvt.reader
                    -> (char, 'a) StringCvt.reader

    to scan strings as C characters. This would make Char.fromCString and
    particularly String.fromCString more modular.
  - How about a dual writer abstraction as with

      type ('a,'b) writer = 'a * 'b -> 'b option

    and supporting fmt functions for basic types? Such a thing might be
    useful for writing to streams or buffers.

* Vectors:
  For some time now I have been trying to use vectors more often instead
  of an often inappropriate list representation. This is sometimes made
  more difficult simply because the library support isn't as good as for
  lists. It improved in the updated version but still I miss:
  - Array.fromVector,
  - Vector.mapPartial,
  - Vector.rev,
  - Vector.append (though I guess concat is good enough),
  - most of all: a VectorPair structure.

* Hash functions:
  - Giving every basic type a (default) hash function in addition to
    comparison would be quite useful in conjunction with container
    libraries.
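
To make the vector wish list above a bit more concrete, here is a minimal
sketch of how some of those helpers (plus a toList) could be written with
functions the Basis already provides. The structure name VectorExtras and
the name arrayFromVector are hypothetical and only serve the illustration;
they are not part of the spec, and a real implementation would presumably
live in Vector and Array directly.

```sml
(* Hypothetical helper structure: the names VectorExtras and
   arrayFromVector are assumptions made for this sketch only. *)
structure VectorExtras =
struct
  (* Inverse of Vector.fromList, written with foldr. *)
  fun toList v = Vector.foldr (op ::) [] v

  (* Reverse a vector by index arithmetic. *)
  fun rev v =
    let
      val n = Vector.length v
    in
      Vector.tabulate (n, fn i => Vector.sub (v, n - 1 - i))
    end

  (* Keep only the results for which f returns SOME. *)
  fun mapPartial f v =
    Vector.fromList
      (Vector.foldr
         (fn (a, l) => case f a of SOME b => b :: l | NONE => l)
         [] v)

  (* Append two vectors; this is just Vector.concat on a two-element list. *)
  fun append (v1, v2) = Vector.concat [v1, v2]

  (* Build a fresh array holding the elements of a vector. *)
  fun arrayFromVector v =
    Array.tabulate (Vector.length v, fn i => Vector.sub (v, i))
end
```

The list-based mapPartial allocates an intermediate list, which is the
simplest formulation but probably not what a library implementor would
choose.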

* There is no defining structure for references. I would like to see

    signature REF
    structure Ref : REF

  where REF contains:

    datatype ref = datatype ref
    val !    : 'a ref -> 'a
    val :=   : 'a ref * 'a -> unit
    val swap : 'a ref * 'a ref -> unit      (* or :=: ? *)
    val map  : ('a -> 'a) -> 'a ref -> 'a ref

  You might then consider removing ! and := from GENERAL.

* Signature conventions:
  Some additional conventions would make use of Basis types as functor
  arguments more convenient:
  - Each signature defining an abstract type should make that type
    available under the alias "t" as well (this includes monomorphic types
    as well as polymorphic ones).
  - Every equality type should come with an explicit equality function

      val eq : t * t -> bool

    to move away from the reliance on eqtypes.
  - There should be a uniform name for canonical constructor functions,
    e.g. "new" (or at least an alias).

--
Andreas Rossberg, ros...@ps...

"Computer games don't affect kids; I mean if Pac Man affected us
 as kids, we would all be running around in darkened rooms, munching
 magic pills, and listening to repetitive electronic music."
 - Kristian Wilson, Nintendo Inc.