[Pyparsing] Slow parsing with indentedBlock()
Brought to you by:
ptmcg
From: Philipp R. <phi...@gm...> - 2009-11-15 16:55:22
|
Hi guys, I have a problem with indentedBlock() and a relatively simple grammar that relies on indentation and recursion. I'm seeing extremely long execution times. On one example that I've attached it doesn't terminate within an hour. The trick suggested elswehere with commenting out a FollowedBy statement in Pyparsing (line 3633) doesn't work, as it breaks the grammar. Packrat parsing doesn't work either; it appears to be incompatible with indentedBlock(). I've attached a test case that gets broken by enabling Packrat. I'm using Pyparsing 1.5.2 on Python 2.6.4. I'd be extremely grateful for hints how to redesign my grammar so that I get more controlled execution times. The grammar itself isn't really intricate. It's for an obscure data representation format for entity-relationship data. It works around nested entity and relationship declarations. Entity-relationship dependencies are marked by indentation. Entities are declared by entity names followed by a number. In addition, URI references can appear as entities, either inline in angle brackets, or using a one-time external URI declaration. The grammar is fairly self-explanatory; here's the basics in BNF: entity-reference: (entity-name number) | (<inlineURI>) external-URI-decl: entity-reference = <inlineURI> entity-declaration: entity-reference ["parameter"] [par_n:value_n]+ inline-relation: @relation-name [par_n:value_n] entity-reference multiline-relation: @relation_name [par_n:value_n] NEWLINE INDENT entity-declaration-block UNDENT entity-declaration-block: entity-declaration NEWLINE INDENT [inline-relation|multiline-relation]* UNDENT relationClause: entity-reference [inline-relation|multiline-relation]+ suite: [entity-declaration-block | relationClause | external-URI-decl]+ I've attached a (simple but rather verbose) Pyparsing version of the grammar, as well as a couple of test cases. Execution time seems to go up extremaly fast as things become more indented. Thanks, Philipp # ================= GRAMMAR DEFINITION ==================== import pyparsing as pp #pp.ParserElement.enablePackrat() indentStack = [1] colon = pp.Literal(u":") equal = pp.Literal(u"=") relMarker = pp.Literal(u"@") srelMarker = pp.Literal(u"$") subrelMarker = pp.Literal(u"_") entityName = pp.Word(pp.alphas) entityNumber = pp.Word(pp.nums).setParseAction(lambda t: int(t[0])) entityRef = pp.Group(pp.Combine(entityName + entityNumber) ).setResultsName("entityRef") inlineURI = pp.Group(pp.QuotedString(u'<', escQuote=None, multiline=False, endQuoteChar=u">")).setResultsName("inlineURI") externalURI = pp.Group( entityRef + equal.suppress() + inlineURI ).setResultsName("externalURI") relationName = pp.Word(pp.alphanums+"-") relationRef = pp.Combine(relMarker + relationName) superrelationRef = pp.Combine(srelMarker + relationName) subrelationRef = pp.Combine(subrelMarker + relationName) parameterName = pp.Word(pp.alphanums+"-").setResultsName("parameterName") parValueQuote = pp.QuotedString( u'"',u"\\",escQuote=None, multiline=True ).setResultsName("parameterValue") parValueDirect = pp.Word( pp.alphanums+"-_" ).setResultsName("parameterValue") parameterStmt = pp.Group( parameterName + colon.suppress() + (parValueQuote | parValueDirect) ).setResultsName("parameterStmt") parameterBlock = pp.Group( pp.OneOrMore( parameterStmt ) ).setResultsName("parameterBlock") superparameter = pp.QuotedString( u'"',u"\\",escQuote=None, multiline=True ).setResultsName("superparameter") entityDecl = pp.Group( entityRef + (parameterBlock | superparameter + pp.Optional(parameterBlock)) ).setResultsName("entityDecl") entityDeclBlock = pp.Forward() relationInline = pp.Group( relationRef + pp.Optional(parameterBlock) + (inlineURI | entityRef) ).setResultsName("relationInline") relationMultiline = pp.Group( relationRef + pp.Optional(parameterBlock) + pp.indentedBlock( entityDeclBlock, indentStack).setResultsName("indentedBlock") ).setResultsName("relationMultiline") entityDeclBlock << pp.Group( entityDecl + pp.Optional ( pp.indentedBlock( relationMultiline | relationInline, indentStack).setResultsName("indentedBlock") ) ).setResultsName("entityDeclBlock") relationClause = pp.Group( (inlineURI | entityRef) + (relationInline | relationMultiline # ) ).setResultsName("relationClause") suite = pp.OneOrMore ( relationClause | externalURI | entityDeclBlock ).setResultsName("suite").setDebug() # ================= TEST CASES ==================== TESTCASES = { "basic": u"""\ A1 par:value @rel relpar:value A2 @rel A3 entpar:value @rel relpar:value A4 entpar:value @rel <some:URI/reference> A2=<some:URI/reference> """, "packrat": u"""\ C1 id:"mutawalli" type:"appointment" @refersToAppointmentTarget <waqf:ent/office/d408:d408-mutawalli> @regulatesSuccession AO1 id:"muezzinAppointment" @hasAppointee <waqf:ent/office/world:muezzin-ayyub> C2 id:"revenue" type:"revenue" @refersToIncome <waqf:ent/payable/d408:waqfIncome> @regulatesDistribution D1 id:"distStart" @hasChildNode DF1 id:"haqq at-tawliya" denom:"10" num:"1" @hasRecipient <waqf:ent/office/d408:d408-mutawalli> DR1 id:"remaining90percent" @hasRecipient <waqf:ent/office/d408:d408-mutawalli> @inExchangeFor <waqf:ent/duty/d408:d408-light> @inExchangeFor <waqf:ent/duty/d408:d408-furniture>""", "slow": u"""\ DT1 id:"dist-taxation" tax:"xaraj" @hasChildNode D1 id:"dist-repair" @hasRecipient <waqf:ent/office/d405:mutawalli-405> @inExchangeFor X1 id:"graverepair" type:"repair" @dutyTarget <waqf:ent/structure/d405:grave-waqif> @hasChildNode DR1 id:"dist-remainder" @hasChildNode DF1 id:"dist-haqqattawliya" denom:"10" num:"1" @hasRecipient <waqf:ent/office/d405:mutawalli-405> @hasChildNode DR2 id:"dist-remainder2" @hasChildNode DF2 id:"dist-imam" denom:"3" num:"2" @hasRecipient <waqf:ent/office/world:imam-ayyub> @inExchangeFor X2 id:"prayer-imam" type:"prayer" @dutyTarget Y1 "text" id:"pid1" schedule:"sat" type:"m" @hasChildNode DF3 id:"dist-muezzin" denom:"3" num:"1" @hasRecipient <waqf:ent/office/world:muezzin-ayyub> @inExchangeFor X2 id:"prayer-muezzin" type:"prayer" @dutyTarget Y2 "text" id:"pid2" schedule:"fri" type:"m" @hasChildNode DR3 id:"dist-fuqara-symbolic" @hasRecipient <waqf:ent/group/world:fuqara> """} suite.parseString(TESTCASES["basic"], parseAll = True) # Packrat breaks the following suite.parseString(TESTCASES["packrat"], parseAll = True) # The following takes extremely long suite.parseString(TESTCASES["slow"], parseAll = True) |