Re: [Ocaml-lib-devel] regeps (repost)

On Sat, 2004-09-11 at 02:54, Richard Jones wrote:
> (Repost: original was apparently tagged as sp*m).
> 
> Not really sure I understand it from a user's point of view.
> 
> For me what really matters is that I could write something like:
> 
>   if string =~ /(a+)(b*)/ then ...

This can be done down the track.

That's under the heading 'parsers'. One writes a parser
for 'Perl Re' or 'Emacs' or 'Glob' or 'Posix' which translates
the string to a REGEXP, and then runs that through the engine.
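
To make the 'parser in front of the engine' idea concrete, here is a
minimal sketch. Everything in it is an assumption made for illustration:
the regexp type, the names regexp_of_glob and matches, and the use of
Brzozowski derivatives as a stand-in engine. It is not the proposed
library API, and it handles only the '*' and '?' glob operators.

```ocaml
(* Hypothetical structured regexp type, for illustration only. *)
type regexp =
  | Empty                    (* matches nothing *)
  | Eps                      (* matches the empty string *)
  | Char of char             (* one specific character *)
  | Any                      (* any single character *)
  | Cat of regexp * regexp   (* concatenation *)
  | Alt of regexp * regexp   (* alternation *)
  | Star of regexp           (* zero or more repetitions *)

(* Does r match the empty string? *)
let rec nullable = function
  | Empty | Char _ | Any -> false
  | Eps | Star _ -> true
  | Cat (a, b) -> nullable a && nullable b
  | Alt (a, b) -> nullable a || nullable b

(* Derivative of r with respect to c: what is left to match after c. *)
let rec deriv c = function
  | Empty | Eps -> Empty
  | Char d -> if c = d then Eps else Empty
  | Any -> Eps
  | Cat (a, b) ->
    let left = Cat (deriv c a, b) in
    if nullable a then Alt (left, deriv c b) else left
  | Alt (a, b) -> Alt (deriv c a, deriv c b)
  | Star a as s -> Cat (deriv c a, s)

(* Stand-in engine: whole-string match by repeated derivatives. *)
let matches (r : regexp) (s : string) : bool =
  let cur = ref r in
  String.iter (fun c -> cur := deriv c !cur) s;
  nullable !cur

(* The 'parser': translate a glob string into the structured form.
   Only '*' and '?' are treated as operators in this sketch. *)
let regexp_of_glob (g : string) : regexp =
  let r = ref Eps in
  for i = String.length g - 1 downto 0 do
    let atom =
      match g.[i] with
      | '*' -> Star Any
      | '?' -> Any
      | c -> Char c
    in
    r := Cat (atom, !r)
  done;
  !r
```

With this shape, a Perl or Posix front end would be another function
returning the same regexp type, and the engine never sees the string
syntax at all.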

String based Re's are convenient for 'Mickey Mouse' jobs.
They're totally unsuitable as fundamental components.

Reasons:

(1) for complex regexps, regular *expressions* are untenable.
Regular *definitions* are mandatory. That's when you use
a sequence of named regexps like in Lex.

(2) Encoding both the regexp operators and data in
the same string is untenable for complex regexps.
All that escaping is a problem. Numbered groups
are a very bad idea for complex regexps like a lexer
specification for a programming language.

And there are further reasons too:

(3) Encoding character data and regexp operators
in a string is not i18n compatible. It isn't
possible to deem certain characters as the operators,
because you don't know what the character set is.

(4) Strings are 8 bit. The engine must support 32 bit characters.

(5) String based Re's aren't typesafe.
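
The reasons above can be sketched in a few lines. This is an
illustration under assumptions, not the actual design: the regexp type,
the Code constructor holding an integer code point, and the helper
names range and literal are all hypothetical. It shows regular
definitions as named values, operators kept apart from character data,
and characters that are not limited to 8 bits.

```ocaml
(* Hypothetical structured regexp over integer code points. *)
type regexp =
  | Eps                      (* the empty string *)
  | Code of int              (* one character, given as a code point *)
  | Range of int * int       (* inclusive range of code points *)
  | Cat of regexp * regexp   (* concatenation *)
  | Alt of regexp * regexp   (* alternation *)
  | Star of regexp           (* zero or more repetitions *)

(* Regular definitions: a sequence of named regexps, as in Lex. *)
let range a b = Range (Char.code a, Char.code b)
let letter = Alt (range 'a' 'z', range 'A' 'Z')
let digit = range '0' '9'
let ident_start = Alt (letter, Code (Char.code '_'))
let identifier = Cat (ident_start, Star (Alt (ident_start, digit)))

(* Character data never needs escaping, because operators are
   constructors and data are plain integers. *)
let literal (codes : int list) : regexp =
  List.fold_right (fun c r -> Cat (Code c, r)) codes Eps

(* '*' (0x2A) followed by Greek small alpha (0x3B1), both as data. *)
let star_then_alpha = literal [0x2A; 0x3B1]
```

Because identifier and friends are ordinary typed values, a malformed
construction such as applying Star to a string is rejected by the
compiler rather than discovered when a pattern string is parsed at
run time.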

-- 
John Skaller, mailto:sk...@us...
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net