[Pyparsing] RE: python indentation grammar
From: Michel P. <mi...@di...> - 2005-08-18 20:58:24
On Thu, 2005-08-18 at 13:57 -0500, Paul McGuire wrote:
> Michel -
>
> Not so much global data, as it is parsing state preserved inside the
> pyparsing class instances (namely the caching of exception instances). I
> am fairly certain that calling parseString is not thread-safe, and you
> should interlock calls to it if you have multiple threads calling it.

Oh, I'm sorry, what I meant to say was that different threads will be calling different instances, not the same instance. That is, every thread will have its own SPARQLGrammar.Query instance. sliplib used module variables and declared globals, so the _whole module_, and all of its features, cannot be used from different threads. Different instances of pyparsing classes should be fine, though. I think. ;)

> I have made a few attempts at indentation-based parsing in the past, but I
> looked at them last night, and they are really not so good. I think the key
> will be in a) using a parse action with col() to detect the indentation
> level of the current line, and b) keeping a global stack of indentation
> levels seen thus far, so that you can tell if your current line is part of
> the current indent level, a deeper level or a higher level.

Sounds good. Something to think about would be encapsulating the indentation level in something other than a global variable, so that it is thread-safe. Maybe the parse action could be a callable instance that keeps this level internally?

    from pyparsing import White

    class IndentationAction(object):
        def __init__(self):
            self.level = 0  # per-instance state, so threads do not share it
        def __call__(self, s, loc, toks):
            # ... indentation tracking logic
            pass

    indent = White().setParseAction(IndentationAction())

or something like that.

> When creating your test cases, be sure to add unfriendly tests, such as
> nested levels that unwind to a higher nesting than just the immediate
> parent. That is:
>
> A
>     A1
>     A2
>         A2a
>             A2aa
>             A2ab
>         A2b
>             A2ba
>     A3
>
> Since there is no A2c entry (to be a peer of A2a and A2b), your parser will
> end up doing a double pop from the indentation stack.
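Paul's recipe (a stack of indentation levels consulted for each line), combined with Michel's idea of keeping that state inside a callable instance instead of a global, could look roughly like this. It is a minimal sketch, not pyparsing code: `IndentTracker` and `leading_column` are hypothetical names, indentation is assumed to be spaces only, and the column is computed by hand (1-based, which is how pyparsing's col() numbers columns) so the example runs without pyparsing. Unwinding several levels at once, the "double pop" Paul warns about, falls out of a simple loop.

```python
def leading_column(line):
    """Hypothetical helper: 1-based column of the first non-blank character.

    Assumes space-only indentation; callers should skip blank lines.
    """
    return len(line) - len(line.lstrip(" ")) + 1


class IndentTracker(object):
    """Hypothetical per-instance indentation stack (not a pyparsing class).

    Each parser gets its own instance, so no module-level state is shared
    between threads.
    """

    def __init__(self):
        self.stack = [1]  # column 1 is the outermost level

    def __call__(self, column):
        """Return the INDENT/DEDENT events for a line starting at `column`."""
        if column > self.stack[-1]:
            self.stack.append(column)  # one level deeper
            return ["INDENT"]
        events = []
        # Unwinding more than one level (the "double pop") is just a loop.
        while column < self.stack[-1]:
            self.stack.pop()
            events.append("DEDENT")
        if column != self.stack[-1]:
            # The dedent landed between two open levels.
            raise ValueError("inconsistent dedent at column %d" % column)
        return events
```

Fed the columns of the A/A1/A2 outline, the tracker returns a single INDENT at A1 and two DEDENTs at A3. In a real grammar it would be called from a parse action and its events would drive the nesting; that wiring is left out here.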
>
> Also, what would this data signify?
>
> A
>     A1
>     A2
>         A2a
>             A2aa
>             A2ab
>         A2b
>             A2ba
>       A2.5
>     A3
>
> Note that A2.5 is more indented than A2 and A3, but less indented than A2a
> and A2b. I'm guessing this case should probably be an error (and if you
> detect it in a parse action, you should raise ParseFatalException instead of
> simple ParseException, to halt parsing immediately).

Right. Obviously we want a good structured representation, but not necessarily the exact semantics of Python, unless desired. I'll work some more on this over the weekend and let you know what my results are.

-Michel
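The goal Michel states, a good structured representation rather than Python's exact semantics, can be sketched as a function that turns the outline into nested `(label, children)` tuples and stops at the first A2.5-style line. This is an illustration under stated assumptions, not the thread's implementation: `parse_outline` and `OutlineFatalError` are hypothetical names, indentation is assumed to be spaces only, and `OutlineFatalError` merely stands in for pyparsing's ParseFatalException so the example runs without pyparsing.

```python
class OutlineFatalError(Exception):
    """Hypothetical stand-in for pyparsing's ParseFatalException."""


def parse_outline(text):
    """Turn a space-indented outline into nested (label, children) tuples.

    Raises OutlineFatalError, halting immediately, when a line's indent
    matches no open level (e.g. deeper than its parent but shallower than
    the parent's existing children).
    """
    root = []
    # Stack of open nodes: (indent of that node, list of its children).
    stack = [(-1, root)]
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue  # ignore blank lines
        indent = len(line) - len(line.lstrip(" "))
        popped = None
        # Close every node at this indent or deeper (may pop several levels).
        while stack[-1][0] >= indent:
            popped = stack.pop()[0]
        # After a dedent, the new line must line up with a level just closed.
        if popped is not None and popped != indent:
            raise OutlineFatalError(
                "line %d: indent %d matches no open level" % (lineno, indent))
        children = []
        stack[-1][1].append((line.strip(), children))
        stack.append((indent, children))
    return root
```

On the first outline, A3 lands back among A's children after two levels are closed; on the second, parsing stops at A2.5 (line 9). In a pyparsing grammar the same check would live in a parse action that raises ParseFatalException, so parsing halts instead of backtracking into other alternatives.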