Re: [SimpleParse] Having SimpleParser ignore whitespaces
From: Mike C. F. <mcf...@ro...> - 2002-09-10 16:12:43
At the moment, there's no such mechanism. This is largely a historic
artefact of the parsing mechanism that SimpleParse uses. "Normal" EBNF
tools are 2-stage processors. You define a set of tokens, a set of
punctuation, and a set of whitespace, then tokenise into discrete tokens
using those low-level definitions (lexing). You then just deal with the
resulting tokens. The definitions are a requirement for the engine and
if they happen to make certain things more convenient for the user,
well, that's okay, as long as they aren't too happy about it :) ;) .
Since SimpleParse never had the need to require those 3 definitions (it
doesn't lex), it's never grown a way to specify or use them.
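To make the two-stage idea concrete, here is a minimal sketch in plain Python using only the re module. It is not SimpleParse code: the token pattern and the names lex and parse_funcall are made up for this illustration. Stage one recognises whitespace and throws it away; stage two only ever sees tokens, so the grammar never has to mention whitespace.

```python
import re

# Hypothetical token definitions for this sketch (not part of SimpleParse):
# whitespace, identifiers, and punctuation.
TOKEN_RE = re.compile(r"(?P<ws>[ \t\r\n]+)|(?P<id>[A-Za-z_]\w*)|(?P<punct>[();,])")

def lex(text):
    """Stage 1: split text into (kind, value) tokens, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError("unexpected character %r at offset %d" % (text[pos], pos))
        if match.lastgroup != "ws":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

def parse_funcall(tokens):
    """Stage 2: match funcall := id, '(', arglist, ')', ';' on tokens only."""
    pos = 0

    def take(kind, value=None):
        nonlocal pos
        if pos < len(tokens):
            tok_kind, tok_value = tokens[pos]
            if tok_kind == kind and value in (None, tok_value):
                pos += 1
                return tok_value
        raise SyntaxError("expected %s %r at token %d" % (kind, value, pos))

    name = take("id")
    take("punct", "(")
    args = []
    # arglist is assumed here to be zero or more comma-separated identifiers.
    if pos < len(tokens) and tokens[pos][0] == "id":
        args.append(take("id"))
        while pos < len(tokens) and tokens[pos] == ("punct", ","):
            take("punct", ",")
            args.append(take("id"))
    take("punct", ")")
    take("punct", ";")
    return name, args
```

With this split, parse_funcall(lex("foo ( a ,\n b ) ;")) and parse_funcall(lex("foo(a,b);")) see exactly the same token list, which is why lexing tools get whitespace handling more or less for free.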
If you wanted to add this functionality, you'd have lots of ways to do
it (here's three off the top of my head):
 * Modify objectgenerator.Range and Literal to always add a generic
   "consume whitespace after parse" tag to the tag-table they produce.
 * Modify objectgenerator.SequentialGroup to insert a whitespace-consumer
   between each pair of elements.
 * Add a new group-type to the SimpleParse EBNF format (e.g. using just a
   space between the element tokens) which defines a white-space-separated
   group. You will then need to make sure that a := b c d := e is
   unambiguous (basically a name is now a name iff it is not followed by a
   := token, so use a negative look-ahead check).
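The negative look-ahead in the third option can be sketched with Python's re module. This is only an illustration of the disambiguation rule, not how the SimpleParse grammar parser is written; the patterns and the split_rules helper are hypothetical names for this example.

```python
import re

# The trailing \b stops the regex engine from backtracking into the middle
# of a name (e.g. treating the first "d" of "dd := e" as an element).
NAME = r"[A-Za-z_]\w*\b"

# A rule head is a name followed by ":=".
RULE_HEAD = re.compile(r"\s*(" + NAME + r")\s*:=")
# An element is a name that is NOT followed by ":=" (negative look-ahead).
ELEMENT = re.compile(r"\s*(" + NAME + r")(?!\s*:=)")

def split_rules(text):
    """Split space-separated productions into (name, [elements]) pairs."""
    rules = []
    pos = 0
    while True:
        head = RULE_HEAD.match(text, pos)
        if head is None:
            break
        pos = head.end()
        elements = []
        while True:
            element = ELEMENT.match(text, pos)
            if element is None:
                break
            elements.append(element.group(1))
            pos = element.end()
        rules.append((head.group(1), elements))
    if text[pos:].strip():
        raise SyntaxError("could not parse from offset %d" % pos)
    return rules
```

Here split_rules("a := b c d := e") yields [("a", ["b", "c"]), ("d", ["e"])]: the look-ahead refuses to consume d as an element of a because it is followed by :=.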
In all of those cases, you need to declare the composition of
"whitespace" somewhere and make it available to your objectgenerator
classes (likely in the generator object). In all save the last, you'd
need a way to differentiate when you do/do-not want the whitespace
consumption.
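As a rough picture of what "declare whitespace once and let the sequence consume it, with an opt-out" might look like, here is a toy sketch in plain Python. None of these functions exist in SimpleParse or objectgenerator; literal, sequence and the skip_whitespace flag are invented for the illustration, and real tag-table generation is considerably more involved.

```python
# Declared once, and shared by every matcher built below.
WHITESPACE = " \t\r\n"

def skip_ws(text, pos):
    """Advance pos past any run of whitespace characters."""
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    return pos

def literal(value):
    """Matcher returning the new position, or None if value is not at pos."""
    def match(text, pos):
        return pos + len(value) if text.startswith(value, pos) else None
    return match

def sequence(*parts, skip_whitespace=True):
    """Match parts in order, consuming whitespace between each pair
    unless skip_whitespace is switched off for this group."""
    def match(text, pos):
        for index, part in enumerate(parts):
            if skip_whitespace and index > 0:
                pos = skip_ws(text, pos)
            pos = part(text, pos)
            if pos is None:
                return None
        return pos
    return match

# Whitespace is tolerated between the elements of this group...
funcall = sequence(literal("f"), literal("("), literal(")"), literal(";"))
# ...but not inside this one, where the two characters must be adjacent.
less_equal = sequence(literal("<"), literal("="), skip_whitespace=False)
```

The flag is the "differentiate when you do/do-not want it" part: funcall accepts both "f();" and "f ( ) ;", while less_equal accepts "<=" but rejects "< =".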
BTW: Manually altering the tag-tables SimpleParse produces is probably
one of the most painful ways to have your brain explode :o) . I don't
even look at them 99% of the time I'm working with the system. The only
real reason to do it is to debug an error in SimpleParse or to try to
optimise the tables it generates.
Feel free to shout if this was unclear,
Mike
Karl Trygve Kalleberg wrote:
> Hi fellow parsists.
>
> I notice that all of the example grammars include whitespaces in the
> productions explicitly. Is there any simple way to tell SimpleParse that
> the charset "[ \t\n\r]+" is considered a generic token separator, as is
> customary with other EBNF tools ?
>
> funcall := id, '(', arglist, ')', ';'
>
> is most definitely easier to read and reason about than
>
> funcall := id, ws, '(', ws, arglist, ws, ')', ws, ';'
>
> I tried modifying the resultant tuple returned by generator.buildParser
> thusly;
>
> parser = generator.buildParser(decl).parserbyname('root')
> parser = ((None,TextTools.AllInSet,TextTools.set(' \r\n\t'),+1),) + parser
> pprint.pprint( TextTools.tag( input, parser ))
>
> but that does not seem to have any effect.
>
> Any suggestions/pointers to solutions are most welcome.
>
>
> Kind regards,
>
> Karl T
--
_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/