Paul Paterson wrote:
> How do people usually handle case-insensitive matching?
>
Usually (until now) I've just hacked it by explicitly creating [aA][tT]
ranges in the very few cases it came up. However, given that it's an
occasional request, I've just added support for it to SimpleParse.
Usage is like so:
production := c"match without case" # note 'c' prefix
which will match at run-time as if the production had been
spelled:
production := ([mM],[aA],[tT],[cC],[hH],' ',[wW],...)
The implication is that case-insensitive literals are *far* heavier than
regular literals (probably a few orders of magnitude slower), since under
the covers they actually create a sequential group with literal
and range sub-elements. Runs of multiple non-case-carrying characters
(spaces, digits, punctuation) will be treated as a single literal, BTW.
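To make the expansion concrete, here is a rough sketch in plain Python of
how such a literal could be spelled out by hand. It is not SimpleParse's
actual code; expand_ci_literal is a hypothetical helper name, and it
assumes simple (ASCII-style) case mapping where each letter has exactly
one lower and one upper form.

```python
def expand_ci_literal(text):
    """Hypothetical helper: spell out a case-insensitive literal as a
    sequential group of two-character ranges and plain literals."""
    parts = []
    run = ""  # pending run of characters that carry no case
    for ch in text:
        if ch.lower() != ch.upper():
            # A cased letter: flush any pending caseless run first,
            # then emit a range such as [mM].
            if run:
                parts.append(repr(run))
                run = ""
            parts.append("[%s%s]" % (ch.lower(), ch.upper()))
        else:
            # Caseless characters accumulate into one literal.
            run += ch
    if run:
        parts.append(repr(run))
    return "(" + ",".join(parts) + ")"
```

For example, expand_ci_literal("at 12 b") gives ([aA],[tT],' 12 ',[bB]):
one range per letter, with the caseless run ' 12 ' kept as a single
literal.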
> My source text (VB) is generally well behaved with respect to case
> (the VB IDE formats all keywords etc) but now I am planning on trying
> to parse VBScript and I can't make the same assumption since a lot of
> it is edited outside the VB IDE.
>
> The options seem to be:
>
> 1. duplicate all keywords in the grammar (but what if someone types
>    "SubRoutine"?)
> 2. convert all text to lower case prior to parsing (but my text
>    contains strings and variable names where case is important)
> 3. parse the lower case text but then get the parse tree elements from
>    the original text
>
That might work (it's pretty easy too, as the dispatchprocessor has an
argument for which source-string to use), but could mess up when a
keyword is case-sensitive.
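A minimal sketch of option 3, using the standard re module as a stand-in
for the real parser so it runs on its own. KEYWORDS and parse_lowered are
hypothetical names for illustration only, and the approach assumes that
lower-casing does not change the length of the text (true for ASCII
source), so that offsets found in the lowered copy are valid in the
original.

```python
import re

# Hypothetical keyword set, for illustration only.
KEYWORDS = {"sub", "end", "dim"}

def parse_lowered(source):
    """Match against a lower-cased copy of the text, but report the
    matched text as it appears in the original source."""
    lowered = source.lower()
    if len(lowered) != len(source):
        # Offsets would no longer line up between the two strings.
        raise ValueError("lower-casing changed the text length")
    results = []
    for m in re.finditer(r"[a-z_][a-z0-9_]*", lowered):
        start, stop = m.span()
        tag = "keyword" if m.group() in KEYWORDS else "name"
        # Slice the ORIGINAL text with offsets found in the lowered copy.
        results.append((tag, source[start:stop]))
    return results
```

Called on "SUB Foo\nEnd Sub", this tags SUB, End and Sub as keywords
regardless of how they were typed, while Foo comes back with its original
capitalisation intact.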
> 4. do a pre-parse to replace all keywords with a standard form
> 5. assume everything is ok and try to recover when parsing fails
> 6. something else ...
>
> Has anyone else "solved" this problem?
>
Hopefully the CILiteral feature will solve it for you. Let me know,
Mike
_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/