Hi Ken,
Ken Williams wrote:
> ---------------------------
> ROMANPAT="m"*
> ("d"?"c"{0,3}|"c"("d"|"m"))
> ("l"?"x"{0,3}|"x"("l"|"c"))
> ("v"?"i"{0,3}|"i"("v"|"x"))
> ...
> {ROMANPAT} / [^a-z] { return newTok("ROMAN"); }
> {ROMANPAT} $ { return newTok("ROMAN"); }
> ---------------------------
> ---------------------------
> Lookahead expression must have match with at least length 1.
> {ROMANPAT} / [^a-z] { return newTok("ROMAN"); }
> ---------------------------
>
> Does the error refer to the lookahead (what’s after the slash), or the
> main regex (before the slash)? IIUC the lookahead itself matches 1
> character, so that shouldn’t be an issue. The ROMANPAT regex can indeed
> match a zero-length string, so perhaps that’s the problem.
Yes, the problem is the ROMANPAT macro: every part of it is optional,
so it can match the empty string.
> Is there some way to force this rule to have a non-zero-length match?
How about spelling out all of the alternatives such that at least one
character is mandatory? Something like (warning: untested):
ROMANPAT = "m"+ ("d"? "c"{0,3} | "c" [dm])
                ("l"? "x"{0,3} | "x" [lc])
                ("v"? "i"{0,3} | "i" [vx])
         | ("d" "c"{0,3} | "c"{1,3} | "c" [dm])
           ("l"? "x"{0,3} | "x" [lc])
           ("v"? "i"{0,3} | "i" [vx])
         | ("l" "x"{0,3} | "x"{1,3} | "x" [lc])
           ("v"? "i"{0,3} | "i" [vx])
         | ("v" "i"{0,3} | "i"{1,3} | "i" [vx])
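
Since the macro above is untested, one way to sanity-check it outside JFlex is to transliterate both versions into an ordinary regex engine and compare them by brute force. The sketch below does that in Python rather than JFlex. The names ORIGINAL, REWRITTEN and differing_strings are made up for illustration, and it assumes the Python transliteration is faithful to the JFlex syntax for these simple constructs. It only compares whole-string matches and does not model the lookahead rules.

```python
import re
from itertools import product

# Shared sub-patterns, each of which may match the empty string.
HUNDREDS = r"(d?c{0,3}|c[dm])"
TENS = r"(l?x{0,3}|x[lc])"
ONES = r"(v?i{0,3}|i[vx])"

# Original macro from the quoted message: every part is optional,
# so the whole pattern accepts the empty string.
ORIGINAL = re.compile(rf"m*{HUNDREDS}{TENS}{ONES}")

# Rewritten macro: each alternative forces at least one character
# in its leading group ("m"+, then non-empty hundreds, tens, ones).
REWRITTEN = re.compile(
    rf"m+{HUNDREDS}{TENS}{ONES}"
    rf"|(dc{{0,3}}|c{{1,3}}|c[dm]){TENS}{ONES}"
    rf"|(lx{{0,3}}|x{{1,3}}|x[lc]){ONES}"
    rf"|(vi{{0,3}}|i{{1,3}}|i[vx])"
)


def differing_strings(max_len=5, alphabet="mdclxvi"):
    """Return non-empty strings (up to max_len) on which the two patterns disagree."""
    diffs = []
    for n in range(1, max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            old = ORIGINAL.fullmatch(s) is not None
            new = REWRITTEN.fullmatch(s) is not None
            if old != new:
                diffs.append(s)
    return diffs


if __name__ == "__main__":
    print("original matches empty string: ", ORIGINAL.fullmatch("") is not None)
    print("rewritten matches empty string:", REWRITTEN.fullmatch("") is not None)
    print("disagreements on non-empty input:", differing_strings())
```

If the rewrite is right, the two patterns should differ only on the empty string, which is exactly the case the lookahead rule complains about.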
Steve
P.S.: Wikipedia says that the formulation you've specified is modern,
and that things like IIIII (5) and VV (10) and XIIII (14) were once used.