-
It would be very useful to support these to avoid the [Ww][Hh][Yy] idiom, which gets tedious after the first 10 words or so. Ideally, this would be supported by implementing UTR #30 (http://unicode.org/reports/tr30/tr30-3.html).
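Until something like that exists, the idiom can at least be generated rather than typed by hand. A minimal sketch in Python, assuming plain alphabetic words; the helper name case_insensitive is made up for illustration and is not part of Quex, and characters that are special in a pattern are passed through unescaped:

```python
def case_insensitive(word: str) -> str:
    """Expand a word into the [Ww][Hh][Yy] character-class idiom.

    Hypothetical helper, not part of Quex. Characters without a distinct
    single-character upper/lower form (digits, apostrophes, ...) are copied
    through unchanged; pattern metacharacters are NOT escaped.
    """
    parts = []
    for ch in word:
        lower, upper = ch.lower(), ch.upper()
        if lower != upper and len(lower) == 1 and len(upper) == 1:
            parts.append(f"[{upper}{lower}]")
        else:
            parts.append(ch)
    return "".join(parts)


if __name__ == "__main__":
    for w in ("why", "can't"):
        print(w, "->", case_insensitive(w))
```

This only covers simple one-to-one case pairs; the fuller folding described in UTR #30 is out of scope for such a sketch.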
2009-07-14 17:37:24 UTC in Lexical Analyzer Generator Quex
-
Working well, including the /x+ case. Thanks.
2009-07-10 18:43:57 UTC in Lexical Analyzer Generator Quex
-
Thanks, works fine.
2009-07-10 18:41:02 UTC in Lexical Analyzer Generator Quex
-
Currently, to split contractions, I have to put the following in the mode (which could be more concisely expressed with a single action):
CAN/NOT => QUEX_TKN_WORD(Lexeme);
[Cc]an/not => QUEX_TKN_WORD(Lexeme);
GIM/ME => QUEX_TKN_WORD(Lexeme);
[Gg]im/me => QUEX_TKN_WORD(Lexeme);
GON/NA =>...
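As a stopgap, the repetitive rule pairs above could be generated by a small script and pasted into the mode. A rough sketch in Python, assuming each contraction is given as a (head, tail) pair; the function name contraction_rules and its parameters are hypothetical, and the output simply mirrors the two rule shapes shown above:

```python
def contraction_rules(splits, token="QUEX_TKN_WORD"):
    """Emit the all-caps and the capitalised/lower-case rule for each split.

    Hypothetical generator, not part of Quex. `splits` is a list of
    (head, tail) pairs, e.g. ("can", "not") for CAN/NOT and [Cc]an/not.
    """
    rules = []
    for head, tail in splits:
        # All-caps variant, e.g. CAN/NOT
        rules.append(f"{head.upper()}/{tail.upper()} => {token}(Lexeme);")
        # First letter in either case, rest lower case, e.g. [Cc]an/not
        first = f"[{head[0].upper()}{head[0].lower()}]"
        rules.append(
            f"{first}{head[1:].lower()}/{tail.lower()} => {token}(Lexeme);"
        )
    return rules


if __name__ == "__main__":
    for line in contraction_rules([("can", "not"), ("gim", "me")]):
        print(line)
```

This does not reduce the number of rules in the generated lexer; it only saves the typing.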
2009-07-10 18:39:31 UTC in Lexical Analyzer Generator Quex
-
Sorry, I didn't test this sufficiently; I had the idea that it was working for gim/me|lem/me, but it wasn't. The error message is definitely much more useful, thanks.
2009-07-10 18:36:52 UTC in Lexical Analyzer Generator Quex
-
Thanks very much for the quick fix; yes, looks to be working.
2009-07-10 18:01:56 UTC in Lexical Analyzer Generator Quex
-
token {
WORD
LASTWORD
SENTBOUND
OTHER
}
define {
WHITESPACE [ \t\n]+
WORDCHAR [_A-Za-z0-9]
ALLWCHAR {WORDCHAR}|[.]
UENDCNT N'T|'S|'D|'M|'LL|'RE|'VE|'YE
LENDCNT n't|'s|'d|'m|'ll|'re|'ve|'ye
ENDCONT {UENDCNT}|{LENDCNT}
USTARTCNT GIM/ME|LEM/ME
LSTARTCNT [Gg]im/me|[Ll]em/me
STARTCNT {USTARTCNT}|{LSTARTCNT}.
2009-07-09 14:03:18 UTC in Lexical Analyzer Generator Quex
-
(from quex compilation):
token {
WORD
LASTWORD
SENTBOUND
OTHER
}
define {
WHITESPACE [ \t\n]+
WORDCHAR [_A-Za-z0-9]
ALLWCHAR {WORDCHAR}|[.]
UENDCNT N'T|'S|'D|'M|'LL|'RE|'VE|'YE
LENDCNT n't|'s|'d|'m|'ll|'re|'ve|'ye
ENDCONT {UENDCNT}|{LENDCNT}
USTARTCNT GIM/ME|LEM/ME
LSTARTCNT [Gg]im/me|[Ll]em/me...
2009-07-09 13:59:45 UTC in Lexical Analyzer Generator Quex
-
BLAH gim/me|lem/me|d/'ye
gives:
tokelex.qx:13:error: Missing identifier for pattern definition.
2009-07-09 13:29:29 UTC in Lexical Analyzer Generator Quex
-
Sorry, I'm too lazy to split these, partly because I suspect they're all a problem with pre-condition handling.
OTHER matches two chars on input dog's 'silly'
define {
WORD [A-Za-z0-9]+
WHITESPACE [ \r\t\n]+
}
mode standard
{
<<EOF>> => QUEX_TKN_TERMINATION;
{WHITESPACE} {}
{WORD}/'s => QUEX_TKN_WORD(Lexeme);
{WORD}/'s/ => QUEX_TKN_WORD(Lexeme);
{WORD} =>...
2009-07-09 13:28:04 UTC in Lexical Analyzer Generator Quex