Re: [SimpleParse] Having SimpleParser ignore whitespaces
From: Mike C. F. <mcf...@ro...> - 2002-09-10 16:12:43
At the moment, there's no such mechanism. This is largely a historic artefact of the parsing mechanism that SimpleParse uses.

"Normal" EBNF tools are 2-stage processors. You define a set of tokens, a set of punctuation, and a set of whitespace, then tokenise the input into discrete tokens using those low-level definitions (lexing). You then just deal with the resulting tokens. The definitions are a requirement for the engine and if they happen to make certain things more convenient for the user, well, that's okay, as long as they aren't too happy about it :) ;) .

Since SimpleParse never needed those 3 definitions (it doesn't lex), it's never grown a way to specify or use them.

If you wanted to add this functionality, you'd have lots of ways to do it (here are three off the top of my head):

  * Modify objectgenerator.Range and Literal to always add a generic "consume whitespace after parse" tag to the tag-table they produce.

  * Modify objectgenerator.SequentialGroup to insert a whitespace-consumer between each pair of elements.

  * Add a new group-type to the SimpleParse EBNF format (e.g. using just a space between the element tokens) which defines a whitespace-separated group. You will then need to make sure that

        a := b c d := e

    is unambiguous (basically a name is now a name iff it is not followed by a := token, so use a negative look-ahead check).

In all of those cases, you need to declare the composition of "whitespace" somewhere and make it available to your objectgenerator classes (likely in the generator object). In all save the last, you'd need a way to differentiate when you do/do-not want the whitespace consumption.

BTW: Manually altering the tag-tables SimpleParse produces is probably one of the most painful ways to have your brain explode :o) . I don't even look at them 99% of the time I'm working with the system. The only real reason to do it is to debug an error in SimpleParse or to try to optimise the tables it generates.
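To make the "whitespace-consumer between each pair" idea concrete, here is a rough sketch that gets a similar effect by rewriting the grammar text before it is handed to the generator. Note that this is a swapped-in technique: it does not touch objectgenerator.SequentialGroup or any tag-tables, and it is not part of SimpleParse. The helper name `insert_ws`, the production name `ws`, and the `WS_RULE` definition are all hypothetical, and the sketch assumes commas only matter outside quoted literals and `[...]` ranges (it ignores backslash escapes and comments).

```python
# Hypothetical whitespace production; assumes the range syntax accepts these
# escapes, as the charset "[ \t\n\r]+" quoted in the question suggests.
WS_RULE = r"ws := [ \t\n\r]*"


def insert_ws(declaration, ws_name="ws"):
    """Insert `ws_name` between sequence elements of a comma-separated grammar.

    Every comma outside a quoted literal or a [...] range is replaced by
    ", <ws_name>,", so "a, b" becomes "a, ws, b". Escaped quotes and
    escaped brackets are NOT handled in this sketch.
    """
    out = []
    quote = None       # the quote character we are currently inside, if any
    in_range = False   # True while inside a [...] character range
    for ch in declaration:
        if quote:
            if ch == quote:
                quote = None
        elif in_range:
            if ch == "]":
                in_range = False
        elif ch in "'\"":
            quote = ch
        elif ch == "[":
            in_range = True
        elif ch == ",":
            # A sequence separator: splice the whitespace element in here.
            out.append(", " + ws_name + ",")
            continue
        out.append(ch)
    return "".join(out)


readable = "funcall := id, '(', arglist, ')', ';'"
print(insert_ws(readable))
# prints: funcall := id, ws, '(', ws, arglist, ws, ')', ws, ';'
```

The rewritten text plus `WS_RULE` would then go to generator.buildParser in place of the original declaration. This applies the whitespace everywhere, so it has the same drawback mentioned above: there is no way to say where you do not want whitespace consumed.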
Feel free to shout if this was unclear,
Mike

Karl Trygve Kalleberg wrote:

> Hi fellow parsists.
>
> I notice that all of the example grammars include whitespaces in the
> productions explicitly. Is there any simple way to tell SimpleParse that
> the charset "[ \t\n\r]+" is considered a generic token separator, as is
> customary with other EBNF tools ?
>
> funcall := id, '(', arglist, ')', ';'
>
> is most definitely easier to read and reason about than
>
> funcall := id, ws, '(', ws, arglist, ws, ')', ws, ';'
>
> I tried modifying the resultant tuple returned by generator.buildParser
> thusly;
>
> parser = generator.buildParser(decl).parserbyname('root')
> parser = ((None,TextTools.AllInSet,TextTools.set(' \r\n\t'),+1),) + parser
> pprint.pprint( TextTools.tag( input, parser ))
>
> but that does not seem to have any effect.
>
> Any suggestions/pointers to solutions are most welcome.
>
>
> Kind regards,
>
> Karl T

-- 
_______________________________________
  Mike C. Fletcher
  Designer, VR Plumber, Coder
  http://members.rogers.com/mcfletch/