Thread: [Pyparsing] Beginner parsing problem
From: Florian L. <mai...@xg...> - 2013-01-05 10:40:38
Hello,

I've just started working with pyparsing

>>>
from pyparsing import *

text = """
FoamFile
{
version 2.0;
format ascii;
class volVectorField;
object U;
}"""

text2 = " class volVectorField;"

FKeyValue = Word(printables) + Word(printables) + ";"
FDictionary = Word(printables) + "{" + OneOrMore( FKeyValue ) + "}"

print FKeyValue.parseString(text2) # Works fine
print FDictionary.parseString(text) # Fails
<<<

(I use the F prefix to avoid name clashes with pyparsing stuff; I might change that and switch to a more selective import.)

The last print fails:

Traceback (most recent call last):
  File "parse.py", line 22, in <module>
    print FKeyValue.parseString(text2)
  File "/home/florian/scratch/pyparsing.py", line 1006, in parseString
    raise exc
pyparsing.ParseException: Expected ";" (at char 28), (line:1, col:29)

What is wrong there? If I understood the documentation right, newlines are ignored, just like whitespace.

It's pyparsing downloaded from the 1.5.x svn branch.

Thanks,
Florian
From: Paul M. <pt...@au...> - 2013-01-05 16:59:23
Florian -

Welcome to pyparsing!

When writing your parser, you'll have to keep in mind that pyparsing does not do any kind of lookahead unless you explicitly tell it to. "printables" is a string containing all ASCII characters that are not whitespace - this includes the ';' character. So when you define your FKeyValue value part as "Word(printables)", this will consume all non-whitespace characters, even the terminating ';'. This is in contrast to something you might do in a regular expression, in which ".*;" would match "lslsd;" - the regular expression implicitly terminates the ".*" when it sees the semicolon. But pyparsing is purely left-to-right, unless you include some lookahead escapes of your own.

One way to do this in the Word construct is to be more selective in the string that you use to create the expression - in this case, we'll try just doing every printable character except for ';'. Instead of "Word(printables)", you could do "Word(''.join(c for c in printables if c != ';'))". I found myself doing this quite a lot and it annoyed me, so I added a convenience argument to Word, excludeChars. You can define a Word using a large string of characters, and then just exclude one or two of them, in your case like this:

Word(printables, excludeChars=';')

Now if you use this expression for your value expression in FKeyValue, it should parse better.

By extension, I would also suggest that you narrow down what you expect to see as the identifiers in your key and dictionary, so that you don't accidentally read in braces or other punctuation, perhaps something like:

identifier = Word(alphas, alphanums)
FKeyValue = identifier + Word(printables,excludeChars=';') + ";"
FDictionary = identifier + "{" + OneOrMore( Group(FKeyValue) ) + "}"

Also, by Grouping your FKeyValue's, it will help you iterate over the key-value pairs, as it will give them more organizing structure.
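The points above can be tried out with a short script. This is only a sketch: it assumes a pyparsing release that supports the excludeChars argument, uses Python 3 print syntax rather than the Python 2 style of the original post, and the indentation of the sample text and the names greedy_failed and settings are made up for illustration.

```python
import re

from pyparsing import (Group, OneOrMore, ParseException, Word,
                       alphanums, alphas, printables)

text = """
FoamFile
{
    version     2.0;
    format      ascii;
    class       volVectorField;
    object      U;
}"""

# A regular expression backtracks, so ".*;" still matches the whole string.
assert re.match(r".*;", "lslsd;").group(0) == "lslsd;"

# Word(printables) does not backtrack: it consumes the ';' as well,
# so the literal ";" that follows has nothing left to match.
try:
    (Word(printables) + ";").parseString("volVectorField;")
    greedy_failed = False
except ParseException:
    greedy_failed = True

# Excluding ';' makes the value stop right before the terminator.
value = Word(printables, excludeChars=";")
identifier = Word(alphas, alphanums)
FKeyValue = identifier + value + ";"
FDictionary = identifier + "{" + OneOrMore(Group(FKeyValue)) + "}"

result = FDictionary.parseString(text)

# result is: name, "{", one group per key-value pair, "}".
# Each group holds [key, value, ";"].
settings = {entry[0]: entry[1] for entry in result[2:-1]}
print(settings)
```

Because each key-value pair is wrapped in Group, the pairs can be picked out of the result one at a time instead of arriving as one flat list of tokens.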
Please look over some of the articles that are linked from the wiki's Documentation page (http://pyparsing.wikispaces.com/Documentation), for more examples and expression topics. Also, the Discussion tab (http://pyparsing.wikispaces.com/page/messages/home) of the wiki's Home page includes many Q&A threads on various pyparsing problems.

Best of luck,
-- Paul McGuire
From: Florian L. <mai...@xg...> - 2013-01-05 20:16:52
Hey Paul!

Thanks for the welcome and the thorough explanation!

I have more or less solved my original problem in the meantime, but I'm still at the very beginning!

Unfortunately I don't have a formal description of the language I'm trying to model. It's designed along the lines of C++, so I'll need to fiddle with the allowed characters in my primitives. (BTW: It's the configuration language of the OpenFOAM CFD tool box http://www.openfoam.com/)

My primitives are:

ident = Word(alphanums + ".")
semi = Literal(";").suppress()
lcb = Literal("{").suppress()
rcb = Literal("}").suppress()

I'll keep in mind what you said about excludeChars and maybe change ident that way; I'll have to try it out.

One problem I've encountered is that a key-value pair could look like this:

key and all this is value;

I catch that with:

FKeyValue = Group(ident + SkipTo(semi) + semi)

Since the file could also have key-value pairs at the root level (not within any dict) I do:

ParameterFile = ZeroOrMore(FDictionary | FKeyValue)

Dictionaries could be arbitrarily nested:

FDictionary = Forward()
FDictionary << Dict(Group(ident + lcb + Dict(ZeroOrMore(FKeyValue | FDictionary)) + rcb))

I still have problems getting the recursive definition right (the underlying problem is probably getting the recursive definition right ;-)

My sample text is:

prob = """dictname
{
subdict
{
key value;
key2 value2;
}
}"""

and dumping the parse result with dump() gives:

[['dictname', ['subdict', '{\n key value'], ['key2', 'value2']]]
- dictname: [['subdict', '{\n key value'], ['key2', 'value2']]
  - key2: value2
  - subdict: { key value

Thanks for any suggestions and have a nice weekend!

Florian
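The dump above is consistent with FKeyValue being tried before FDictionary inside the braces: "subdict" matches ident, and SkipTo(semi) then takes everything up to the first ';', including the opening brace. One possible way around this, offered only as a sketch and not as an answer given in this thread, is to try FDictionary first, since it fails quickly when no '{' follows the identifier. The helper make_parser, its dictionary_first flag, and the indentation of the sample text are made up for this illustration.

```python
from pyparsing import (Dict, Forward, Group, Literal, SkipTo, Word,
                       ZeroOrMore, alphanums)

ident = Word(alphanums + ".")
semi = Literal(";").suppress()
lcb = Literal("{").suppress()
rcb = Literal("}").suppress()


def make_parser(dictionary_first):
    """Build the grammar from the post; only the order of the
    alternatives inside the braces depends on the flag."""
    FKeyValue = Group(ident + SkipTo(semi) + semi)
    FDictionary = Forward()
    if dictionary_first:
        entry = FDictionary | FKeyValue
    else:
        entry = FKeyValue | FDictionary  # order used in the post
    FDictionary << Dict(Group(ident + lcb + Dict(ZeroOrMore(entry)) + rcb))
    return ZeroOrMore(FDictionary | FKeyValue)


prob = """dictname
{
    subdict
    {
        key value;
        key2 value2;
    }
}"""

# Order from the post: "subdict" is read as a key, and its value runs
# up to the first ';', swallowing the opening brace.
broken = make_parser(dictionary_first=False).parseString(prob).asList()
print(broken)

# Dictionary first: "subdict {" is recognised as a nested dictionary,
# and the whole input is consumed (parseAll=True would raise otherwise).
fixed = make_parser(dictionary_first=True).parseString(prob, parseAll=True).asList()
print(fixed)
```

The reordering relies on '|' picking the first alternative that matches. A value that itself contains '{' before its ';' would still need a tighter value expression.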
From: Florian L. <mai...@xg...> - 2013-01-07 13:21:55
Hey Paul!

Thanks for the welcome and the thorough explanation!

I have more or less solved my original problem in the meantime, but I'm still at the very beginning!

Unfortunately I don't have a formal description of the language I'm trying to model. It's designed along the lines of C++, so I'll need to fiddle with the allowed characters in my primitives. (BTW: It's the configuration language of the OpenFOAM CFD tool box http://www.openfoam.com/)

My primitives are:

ident = Word(alphanums + ".")
semi = Literal(";").suppress()
lcb = Literal("{").suppress()
rcb = Literal("}").suppress()

I'll keep in mind what you said about excludeChars and maybe change ident that way; I'll have to try it out.

One problem I've encountered is that a key-value pair could look like this:

key and all this is value;

I catch that with:

FKeyValue = Group(ident + SkipTo(semi) + semi)

Since the file could also have key-value pairs at the root level (not within any dict) I do:

ParameterFile = ZeroOrMore(FDictionary | FKeyValue)

Dictionaries could be arbitrarily nested:

FDictionary = Forward()
FDictionary << Dict(Group(ident + lcb + Dict(ZeroOrMore(FKeyValue | FDictionary)) + rcb))

I still have problems getting the recursive definition right (the underlying problem is probably getting the recursive definition right ;-)

My sample text is:

prob = """dictname
{
subdict
{
key value;
key2 value2;
}
}"""

and dumping the parse result with dump() gives:

[['dictname', ['subdict', '{\n key value'], ['key2', 'value2']]]
- dictname: [['subdict', '{\n key value'], ['key2', 'value2']]
  - key2: value2
  - subdict: { key value

Thanks for any suggestions!

Florian
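For reference, the SkipTo-based FKeyValue from the message can be checked on its own against a multi-word value. This is a small sketch in Python 3 syntax; the sample string and the names entries and parsed are invented for the illustration.

```python
from pyparsing import Group, Literal, SkipTo, Word, ZeroOrMore, alphanums

ident = Word(alphanums + ".")
semi = Literal(";").suppress()

# The key is a single identifier; the value is whatever precedes the next ';'.
FKeyValue = Group(ident + SkipTo(semi) + semi)
entries = ZeroOrMore(FKeyValue)

sample = """
key and all this is value;
version 2.0;
"""

parsed = entries.parseString(sample, parseAll=True).asList()
print(parsed)
```

SkipTo returns the skipped text as a single string, so a value made of several words stays together as one token, and the suppressed ';' does not appear in the result.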