[ooc-compiler] Re: Strings
From: dirk.muysers <dir...@li...> - 2001-05-17 10:53:45
Please have a close look at the way strings are implemented in the
Critical Mass Modula-3 (cm3) library.
It uses a single abstract type that hides both where the string
lives (program constant or reference type) and what it contains
(ASCII, Unicode, or a mixture of the two), and yet it is implemented
in a rather efficient way.
REVEAL
  TEXT = BRANDED Brand OBJECT
  METHODS
    get_info       (VAR i: Info);
    get_char       (i: CARDINAL): CHAR := GetChar;
    get_wide_char  (i: CARDINAL): WIDECHAR := GetWideChar;
    get_chars      (VAR a: ARRAY OF CHAR; start: CARDINAL) := GetChars;
    get_wide_chars (VAR a: ARRAY OF WIDECHAR; start: CARDINAL) := GetWideChars;
  END;

TYPE
  Info = RECORD
    start  : ADDRESS;  (* non-NIL => string is at [start .. start+length) *)
    length : CARDINAL; (* length of string in characters *)
    wide   : BOOLEAN;  (* => string contains WIDECHARs. *)
  END;
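
To make the role of get_info concrete, here is a rough sketch of how I
read the interface above. It is written in Python purely for
illustration and is not the cm3 code: the class names FlatNarrow,
FlatWide and Repeated and the helper to_str are hypothetical, and the
`data` field merely stands in for the `start` address. The idea is that
a caller asks get_info first and copies the buffer in one go when there
is one, falling back to get_char otherwise.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Info:
    # Stand-in for the Modula-3 Info record. `data` plays the role of
    # `start`: None means the text has no contiguous buffer.
    data: Optional[Union[bytes, str]]
    length: int
    wide: bool


class Text:
    """Abstract text; subclasses decide where and how characters live."""

    def get_info(self) -> Info:
        raise NotImplementedError

    def get_char(self, i: int) -> str:
        raise NotImplementedError


class FlatNarrow(Text):
    """Contiguous 8-bit characters (hypothetical class)."""

    def __init__(self, raw: bytes):
        self._raw = raw

    def get_info(self) -> Info:
        return Info(data=self._raw, length=len(self._raw), wide=False)

    def get_char(self, i: int) -> str:
        return chr(self._raw[i])


class FlatWide(Text):
    """Contiguous wide characters (hypothetical class)."""

    def __init__(self, s: str):
        self._s = s

    def get_info(self) -> Info:
        return Info(data=self._s, length=len(self._s), wide=True)

    def get_char(self, i: int) -> str:
        return self._s[i]


class Repeated(Text):
    """One character repeated n times; there is no buffer to point at."""

    def __init__(self, ch: str, n: int):
        self._ch, self._n = ch, n

    def get_info(self) -> Info:
        return Info(data=None, length=self._n, wide=ord(self._ch) > 255)

    def get_char(self, i: int) -> str:
        if not 0 <= i < self._n:
            raise IndexError(i)
        return self._ch


def to_str(t: Text) -> str:
    """Copy a text out, using the buffer directly when one is exposed."""
    info = t.get_info()
    if info.data is not None:
        # Fast path: contiguous storage, no per-character method calls.
        return info.data if info.wide else info.data.decode("latin-1")
    # Slow path: go through the abstract accessor.
    return "".join(t.get_char(i) for i in range(info.length))
```

The caller never needs to know which concrete class it was handed;
only the Info record decides whether the fast path applies.
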
From this simple abstraction they have derived a number of concrete
classes that cover, among others, string constants in the program
text and dynamically allocated strings. One of the subclasses is a
pair, which allows a text to be structured as a tree (e.g. as a lazy
evaluation of concatenation), and there are many other optimisations
to avoid excessive consing.
I think it really is worth a look, as there is food for thought
here.
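
As a small illustration of the pair idea (again a Python sketch under
my own assumptions, not the cm3 implementation; the names Flat, Cat and
flatten are made up for the example): concatenation only allocates one
node that points at its two halves, and characters are copied only when
somebody asks for the flat string.

```python
class Text:
    """Minimal abstract text for this sketch."""

    def length(self) -> int:
        raise NotImplementedError

    def get_char(self, i: int) -> str:
        raise NotImplementedError


class Flat(Text):
    """A text stored contiguously (hypothetical class)."""

    def __init__(self, s: str):
        self._s = s

    def length(self) -> int:
        return len(self._s)

    def get_char(self, i: int) -> str:
        return self._s[i]


class Cat(Text):
    """A pair of texts: concatenation without copying any characters."""

    def __init__(self, left: Text, right: Text):
        self._left, self._right = left, right
        self._left_len = left.length()
        self._len = self._left_len + right.length()

    def length(self) -> int:
        return self._len

    def get_char(self, i: int) -> str:
        if not 0 <= i < self._len:
            raise IndexError(i)
        # Descend into whichever half holds index i.
        if i < self._left_len:
            return self._left.get_char(i)
        return self._right.get_char(i - self._left_len)


def flatten(t: Text) -> str:
    """Copy the tree into one string, left to right, without recursion."""
    parts = []
    stack = [t]
    while stack:
        node = stack.pop()
        if isinstance(node, Cat):
            # Push right first so that the left half is visited first.
            stack.append(node._right)
            stack.append(node._left)
        else:
            parts.append(
                "".join(node.get_char(i) for i in range(node.length()))
            )
    return "".join(parts)
```

Building Cat(Cat(Flat("foo"), Flat("bar")), Flat("!")) costs two small
nodes regardless of how long the operands are; the price is that
get_char has to walk the tree, which is why a real library would also
want to flatten or rebalance at some point.
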
----- Original Message -----
From: "Michael van Acken" <mi...@de...>
To: "Marco Oetken" <Mar...@we...>
Cc: <ooc...@li...>
Sent: Tuesday, May 15, 2001 11:09 PM
Subject: Re: [ooc-compiler] Proposal for an Email-Module; RFC
| While I have in mind to add an immutable string data type to the
| language eventually, I don't think this will happen in the near
| future.  Adding immutable strings as a library is an option, but I
| believe this is also a rather huge leap in Oberon philosophy (see
| below).
[...]
|
| The Oberon view on "strings" is quite fuzzy IMO.  We have the concept
| of string constants, for which only the compiler has a type of its
| own, and some built-in operators and the `Strings' module.  In a way,
| the operators and `Strings' _define_ the Oberon concept of strings.
| The key "ideas" presented there are: a) memory management is done by
| the user ("if you use `Strings', make sure you have allocated enough
| space"), and b) mutability (changes work within the pre-allocated
| space).
|
| The current ADT:String addresses the memory management issue by
| managing a heap object on its own, but explicitly permits changes to
| the string object.  Because the low-level character data is also
| exposed (ADT:String.array is an exported read-only pointer), there is
| no way the compiler can enforce that the array is not modified.
| Please note that I want to keep the "array" field exported even with
| this loophole, because I'm using it quite often to analyse character
| data or to write out the string's data.
|
| I'm using ADT:String quite often (a quick count reveals 60+ imports),
| but this is no reason to change it -- provided only the module and
| type name must be replaced.
|
| _______________________________________________
| ooc-compiler mailing list
| ooc...@li...
| http://lists.sourceforge.net/lists/listinfo/ooc-compiler
|