pyparsing-users Mailing List for Python parsing module (Page 30)
From: Tom W. <tom...@gm...> - 2006-03-07 01:06:42
|
Cool, I was afraid that I was missing something simple, some sort of 'splitOn('?')' function. Glad to see that I'm not quite that dense. That example works well, and urlparse looks rather interesting too. I may go with it for now; it seems a little less wordy in this case (Python seems to be well oriented to support my innate desire to type less).

BTW, the examples are the best part about pyparsing. While the docs are fairly clear as well, the examples made it a real breeze to dig in. Now I guess I just need to spend some serious time going through the Python library to see what other nifty gems I'm missing.

Thx,
Tom
|
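For reference, here is a minimal sketch of the stdlib urlparse route that both posters mention (not code from the thread; the module name assumes Python 2 of that era, and the same function lives in urllib.parse on Python 3):

    from urlparse import urlparse

    scheme, netloc, path, params, query, fragment = urlparse(
        "http://11.11.111.11/adframe.php?n=ad1f311a&what=zone:56")
    print(path)    # -> /adframe.php
    print(query)   # -> n=ad1f311a&what=zone:56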
From: Paul M. <pa...@al...> - 2006-03-06 21:19:51
|
Tom -

Thanks for the glowing compliments on pyparsing! For your immediate question, standard Python includes a module called urlparse that may be sufficient for you.

On the other hand, if you are set on using a pure-pyparsing solution, I looked at the source for urlparse a while ago, and came up with this (patterned after urlparse's strange logic):

    scheme_chars = alphanums + "+-."
    urlscheme = Word( scheme_chars )
    netloc_chars = "".join( [ c for c in printables if c not in "/." ] )
    netloc = delimitedList( Word( netloc_chars ), ".", combine=True )
    path_chars = "".join( [ c for c in printables if c not in "?" ] )
    path = Word( path_chars )
    query_chars = "".join( [ c for c in printables if c not in "#" ] )
    query = Word( query_chars )
    fragment = Word( printables+" " )
    _urlBNF = Combine( Optional(urlscheme.setResultsName("scheme") + ":" ) +
                       Optional(Literal("//").suppress() + netloc,
                                default="").setResultsName("netloc") +
                       Optional(path.setResultsName("path"), default="") +
                       Optional(Literal("?").suppress() + query,
                                default="").setResultsName("query") +
                       Optional(Literal("#").suppress() + fragment,
                                default="").setResultsName("fragment") )

Using your test string, I wrote the following test code:

    testurl = "http://11.11.111.11/adframe.php?n=ad1f311a&what=zone:56"
    urlParts = _urlBNF.parseString(testurl)
    print testurl
    for k in urlParts.keys():
        print "urlParts.%s = %s" % (k,urlParts[k])

Giving:

    http://11.11.111.11/adframe.php?n=ad1f311a&what=zone:56
    urlParts.fragment =
    urlParts.path = /adframe.php
    urlParts.scheme = http
    urlParts.netloc = 11.11.111.11
    urlParts.query = n=ad1f311a&what=zone:56

I hope this gets you going - let us know!

Regards,
-- Paul
|
From: Tom W. <tom...@gm...> - 2006-03-06 17:58:37
|
Hi all,

First off, thanks for this wonderful module. I was able to extend the httpserverlogparser.py example to do 90% of what I need in a matter of minutes, with a bare minimum of Python experience. I can see using PyParsing a lot moving forward.

Stuck on one last little bit though. Given a fairly standard combined log from apache with the form:

    www.domain.com 11.111.11.111 - - [16/Feb/2004:10:35:12 -0800] "GET /ads/redirectads/468x60redirect.htm?foo=bar&bar=foo HTTP/1.1" 200 541 "http://11.11.111.11/adframe.php?n=ad1f311a&what=zone:56" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.20 [ru\"]"

I've added support for the virtualhost at the start of the log, and have split the http action, request and http version into separate entities. What I want to do now is split the query off from the url in the request.

    request = Word( printables )

works to grab the whole request, URL and query combined, but everything I've tried thus far to split on the (optional) ? that starts a get query has failed. Basically, I think what I'm trying to get is "everything up to the question mark, if it's there, otherwise everything til the next field".

For this case, I'm actually going to be just throwing the query away, so doing anything of note with it really doesn't matter right now.

I know I'll feel dumb for asking this as soon as I see the answer, but a gentle nudge in the right direction would be greatly appreciated.

Tom
|
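One simple way to get that "everything up to the question mark" behavior (an illustrative sketch, not the answer given in the thread; the variable names are just for illustration) is to build the request word from every printable character except '?', then add an optional, suppressed '?' plus the query (results shown as printed by a recent pyparsing):

    from pyparsing import Word, printables, Optional, Suppress

    url_chars = "".join( [ c for c in printables if c != "?" ] )
    url = Word(url_chars).setResultsName("url")
    query = Word(printables).setResultsName("query")
    request = url + Optional(Suppress("?") + query)

    print(request.parseString("/adframe.php?n=ad1f311a&what=zone:56"))
    # -> ['/adframe.php', 'n=ad1f311a&what=zone:56']
    print(request.parseString("/index.html"))
    # -> ['/index.html']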
From: Tim C. <ti...@ea...> - 2006-02-24 00:55:14
|
Hello,

I am starting to use pyparsing as a way to input delimited data into my scripts. I want to get the next integer and haven't figured out how to do it. Here is what I have been able to come up with for the next float:

----
p_point = Literal('.')
p_plusorminus = Optional(Literal('+') | Literal('-'))
p_number = Word(nums)
NextFloat = SkipTo(Group(p_plusorminus + Optional(p_number) + p_point + p_number)) + Word(nums + 'eE-+.')
----

The next integer has perplexed me though. Here is what I want:

    '34' would parse as (['', '34'], {})
    '45.3 23' would parse as (['45.3 ', '23'], {})
    '4.5 4.7e11 10/12/2006' would parse as (['4.5 4.7e11 ', '10'], {})

As an example of how I hope to use NextInteger...

    p_string = NextInteger + NextInteger
    p_string.parseString('4.5 4.7e11 10/12/2006')
    (['4.5 4.7e11 ', '10', '/', '12'], {})

Thank you for pyparsing. I have just scratched the surface and I am really impressed.

Kindest regards,
Tim Cera
|
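One possible way to define that "next integer" (purely a suggestion, not from the thread) is to let a regular expression reject digit runs that are part of a float or exponent, and pair it with SkipTo just as in the NextFloat definition above (results shown as printed by a recent pyparsing):

    from pyparsing import Regex, SkipTo

    # a run of digits not preceded by a digit, '.', 'e' or 'E',
    # and not followed by a digit or '.'
    p_integer = Regex(r"(?<![\d.eE])\d+(?![.\d])")
    NextInteger = SkipTo(p_integer) + p_integer

    print(NextInteger.parseString("45.3 23"))
    # -> ['45.3 ', '23']
    print(NextInteger.parseString("4.5 4.7e11 10/12/2006"))
    # -> ['4.5 4.7e11 ', '10']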
From: Paul M. <pa...@al...> - 2006-01-30 19:45:50
|
This is a known problem with pyparsing 1.4: I accidentally introduced a generator expression in the new QuotedString class. I have a corrected version that I can release as 1.4.1; I was hoping to bundle in any other code fixes too. Since you are already the second person to have this problem, I'll go ahead and release 1.4.1 as it currently stands.

If you need an immediate fix in the next 10 minutes, make this change to your pyparsing code. Lines 1220-1, replace:

-------------
    '|(' + ')|('.join("%s[^%s]" % (re.escape(self.quoteChar[:i]),
                                   _escapeRegexRangeChars(self.quoteChar[i]))
                      for i in range(len(self.quoteChar)-1,0,-1)) + ')'
-------------

with:

-------------
    '|(' + ')|('.join(["%s[^%s]" % (re.escape(self.quoteChar[:i]),
                                    _escapeRegexRangeChars(self.quoteChar[i]))
                       for i in range(len(self.quoteChar)-1,0,-1)]) + ')'
-------------

(Note the inserted []'s that change the generator expression to a list comprehension, supported under Py2.3.)

Sorry for being so sloppy!
- Paul
|
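For context, the incompatibility is simply that generator expressions were added in Python 2.4, so the unbracketed form does not even compile on 2.3 (a quick illustration, not from the thread):

    print(", ".join(str(i) for i in range(3)))    # generator expression: SyntaxError on Python 2.3
    print(", ".join([str(i) for i in range(3)]))  # list comprehension: works on 2.3 and later -> 0, 1, 2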
From: Stephen W. <go...@co...> - 2006-01-30 18:35:19
|
I get the following error:

    waterbug@bigboote:/usr/local/src/Python/pyparsing/pyparsing-1.4$ python setup.py install
    Traceback (most recent call last):
      File "setup.py", line 6, in ?
        from pyparsing import __version__
      File "/usr/local/src/Python/pyparsing/pyparsing-1.4/pyparsing.py", line 1221
        _escapeRegexRangeChars(self.quoteChar[i])) for i in range(len(self.quoteChar)-1,0,-1)) + ')'
        ^
    SyntaxError: invalid syntax

--------------------------------------------

I'm on Debian testing, running the standard debian python package:

    Python 2.3.5 (#2, Aug 30 2005, 15:50:26)
    [GCC 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)] on linux2

Steve
|
From: Paul M. <pa...@al...> - 2006-01-02 05:56:27
|
Here's a word-to-value converter, just as a sample. This is the kind of problem that is most suitable for pyparsing. I'll include it in the examples directory for the next release.

-- Paul

    # wordsToNum.py
    # Copyright 2006, Paul McGuire
    #
    # Sample parser grammar to read a number given in words, and return the numeric value.
    #
    from pyparsing import *

    def makeNumericParseAction(val):
        return lambda s,l,t: val

    def makeLit(s,val):
        return CaselessLiteral(s).setName(s).setParseAction( makeNumericParseAction(val) )

    unitDefinitions = [
        ("zero", 0), ("one", 1), ("two", 2), ("three", 3), ("four", 4),
        ("five", 5), ("six", 6), ("seven", 7), ("eight", 8), ("nine", 9),
        ("ten", 10), ("eleven", 11), ("twelve", 12), ("thirteen", 13),
        ("fourteen", 14), ("fifteen", 15), ("sixteen", 16), ("seventeen", 17),
        ("eighteen", 18), ("nineteen", 19),
        ]
    units = Or( [ makeLit(s,v) for s,v in unitDefinitions ] )

    tensDefinitions = [
        ("ten", 10), ("twenty", 20), ("thirty", 30), ("forty", 40), ("fifty", 50),
        ("sixty", 60), ("seventy", 70), ("eighty", 80), ("ninety", 90),
        ]
    tens = Or( [ makeLit(s,v) for s,v in tensDefinitions ] )

    hundreds = makeLit("hundred", 100)

    majorDefinitions = [
        ("thousand", int(1e3)), ("million", int(1e6)), ("billion", int(1e9)),
        ("trillion", int(1e12)), ("quadrillion", int(1e15)), ("quintillion", int(1e18)),
        ]
    mag = Or( [ makeLit(s,v) for s,v in majorDefinitions ] )

    def wordprod(s,l,t):
        ret = 1
        for v in t:
            ret *= v
        return ret

    def wordsum(s,l,t):
        return sum(t)

    and_ = Suppress(makeLit("and",0))

    numPart = (((( units + Optional(hundreds) ).setParseAction(wordprod) +
                  Optional(and_) +
                  Optional(tens)).setParseAction(wordsum) ^ tens ) +
               Optional(units) ).setParseAction(wordsum)
    numWords = OneOrMore( (numPart + Optional(mag)).setParseAction(wordprod) +
                          Optional(and_) ).setParseAction(wordsum)
    numWords.ignore("-")

    print numWords.parseString("one hundred twenty")
    print numWords.parseString("one hundred and twenty")
    print numWords.parseString("one hundred and three")
    print numWords.parseString("one hundred twenty-three")
    print numWords.parseString("one hundred and twenty three")
    print numWords.parseString("one hundred twenty three million")
    print numWords.parseString("one hundred and twenty three million")
    print numWords.parseString("one hundred twenty three million and three")
    print numWords.parseString("fifteen hundred and sixty five")
    print numWords.parseString("zero")
|
From: Paul M. <pa...@al...> - 2006-01-02 04:17:30
|
Timmy -

I don't think pyparsing will address this task any better than just straight Python. Now if you want to parse a word expression like "one hundred and twenty three dollars and sixty seven cents" and get back the value 123.67, then I think pyparsing might be of interest.

I just googled for such a number-to-words function in Python, and I didn't readily find one, but here is a URL for a Java version that looks fairly convertible to Python: http://www.sourcecodesworld.com/howto/java/java-0426.asp.

-- Paul
|
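A minimal sketch of the "straight Python" route (my own illustration, not code from the thread), handling amounts below one thousand dollars:

    UNITS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
             "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
             "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
    TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
            "eighty", "ninety"]

    def two_digits(n):
        # 0..99 in words
        if n < 20:
            return UNITS[n]
        tens, units = divmod(n, 10)
        if units:
            return TENS[tens] + " " + UNITS[units]
        return TENS[tens]

    def below_thousand(n):
        # 0..999 in words
        if n < 100:
            return two_digits(n)
        hundreds, rest = divmod(n, 100)
        words = UNITS[hundreds] + " hundred"
        if rest:
            words += " and " + two_digits(rest)
        return words

    def dollars_to_words(amount):
        dollars = int(amount)
        cents = int(round((amount - dollars) * 100))
        return "%s dollars and %s cents" % (below_thousand(dollars), two_digits(cents))

    print(dollars_to_words(123.67))
    # -> one hundred and twenty three dollars and sixty seven cents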
From: Timmy <ti...@ne...> - 2006-01-02 01:51:49
|
Hello,

I'm wondering whether pyparsing can do the job of a dollar-value-to-words converter. I'm writing a program to convert a dollar value such as $123.67 to a word description. That means converting it to "one hundred and twenty three dollars and sixty seven cents". Do you think it is an easy or hard task? Can anyone give me hints or a pointer on how to do so?
|
From: Paolo L. <p....@hy...> - 2005-10-04 11:22:56
|
Hi Paul,

first of all... thank you very much for your work. Excellent!

I'm using pyparsing for a project with ambiguous grammars. I realized that I need a "GLR" solution.

Problem Statement
-----------------
A very simple example of the shortcoming of the current pyparsing implementation is the following:

    AB = Literal('AB')
    node = AB ^ 'A' ^ 'BC'
    g = OneOrMore(node)
    x = g.parseString("ABC")

If you relax the requirement for Or to return the longest match you could get an overall better match: ['A', 'BC'] is better than ['AB']. The idea is that a local optimization (longest match for Or) does not imply a global optimization.

Solution
--------
As a proof of concept I modified pyparsing to get the result. The hack is really ugly (10 minutes work) but I think it shows the concept: http://www.enuan.com/glr.tgz

Some more ideas:

1. Parse element classes should not return a single ParseResults instance chosen with local optimization but, optionally, return the whole set of possible ParseResults (see ResultSet).
2. Expressions should evaluate recursively ALL solutions; the final result is a ResultSet.

I think it could be viable to modify parse and parseImpl in order to always return a ResultSet. In the example above the result set would be:

    ['A', 'BC']
    ['AB']
    ['A']

TO BE VERIFIED
--------------
- Actions: without big changes it would be possible to support actions that don't clobber globals... do you think this is a strong limitation?
- It would be nice to rate the solutions. The default criterion could be that of weighting all matches in this way: 1 (default char weight for a parse element) * num_chars_matched. That could be done in parseString and should be customizable...

CONCLUSION :-)
--------------
I'm going to invest a lot in pyparsing, but I definitely need this feature. I'd be more than happy to support you in any way... Do you think this is something that we can expect to be available in pyparsing?

Ciao!!
Paolo
|
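Running the posted example against stock pyparsing shows the behavior Paolo describes: Or ('^') keeps only the longest local match, so the parse commits to 'AB' and never reaches the ['A', 'BC'] split (output shown as printed by a recent pyparsing; the parseAll keyword is available in later releases):

    from pyparsing import Literal, OneOrMore

    AB = Literal('AB')
    node = AB ^ 'A' ^ 'BC'
    g = OneOrMore(node)

    print(g.parseString("ABC"))   # -> ['AB']; the trailing 'C' is left unconsumed
    # g.parseString("ABC", parseAll=True) would raise a ParseException instead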
From: Michel P. <mi...@di...> - 2005-08-31 05:01:17
|
On Tue, 2005-08-30 at 20:54 -0500, Paul McGuire wrote: > Do this right at the end of your initGrammar method, after assigning > self.block: > > self.block = ZeroOrMore( ...etc, etc. ) > self.block.leaveWhitespace() > > This will recursively set whitespace handling through the whole bnf, not > just for the root node, and your whitespace handling should be more > predictable. After adding this line, I reran your test, and I think I got > all the between-tag whitespace you were looking for. > > I'm also glad asXML() seems to be working adequately for you. As I > mentioned before, this method is a bit iffy, so it is fortunate that you are > getting such good results. Cool! I'll check this out. > > Congratulations on such a sophisticated parsing application! You should see rdflib sparql support: http://svn.rdflib.net/trunk/rdflib/sparql/grammar.py it parses 28 of the 64 standard sparql queries. -Michel > > -- Paul > |
From: Paul M. <pa...@al...> - 2005-08-31 01:54:53
|
Michel -

This part of pyparsing is not well-documented at all, since I typically discourage people from writing whitespace-sensitive parsers. Very often, people come from writing regexp's and try to figure out how to explicitly handle whitespace between tokens, and I have to explain that pyparsing doesn't require explicit whitespace handling, that whitespace is assumed to be a token delimiter, but that the whitespace itself is skipped/ignored by default.

However, your grammar is *by its nature* whitespace-sensitive. So you probably need to call the leaveWhitespace() method on your root parse object, self.block, as in:

    self.block.leaveWhitespace()

Do this right at the end of your initGrammar method, after assigning self.block:

    self.block = ZeroOrMore( ...etc, etc. )
    self.block.leaveWhitespace()

This will recursively set whitespace handling through the whole bnf, not just for the root node, and your whitespace handling should be more predictable. After adding this line, I reran your test, and I think I got all the between-tag whitespace you were looking for.

I'm also glad asXML() seems to be working adequately for you. As I mentioned before, this method is a bit iffy, so it is fortunate that you are getting such good results.

Congratulations on such a sophisticated parsing application!

-- Paul
|
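To see what leaveWhitespace() changes, here is a tiny standalone illustration (my own example, not from the thread): by default pyparsing skips whitespace between tokens, and leaveWhitespace() turns that off recursively for the expression it is called on:

    from pyparsing import Literal

    ab = Literal("a") + Literal("b")
    print(ab.parseString("a b"))        # whitespace between tokens is skipped -> ['a', 'b']

    ab_strict = (Literal("a") + Literal("b")).leaveWhitespace()
    print(ab_strict.parseString("ab"))  # still fine -> ['a', 'b']
    # ab_strict.parseString("a b") now raises a ParseException at the space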
From: Michel P. <mi...@di...> - 2005-08-31 00:38:17
|
On Thu, 2005-08-18 at 13:57 -0500, Paul McGuire wrote: > I have made a few attempts at indentation-based parsing in the past, but I > looked at them last night, and they are really not so good. I think the key > will be in a) using a parse action with col() to detect the indentation > level of the current line, and b) keeping a global stack of indentations > levels seen thus far, so that you can tell if your current line is part of > the current indent level, a deeper level or a higher level. Well I have made a bit more progress on this, as well as some great progress on the sparql parser with pyparsing. On the indentation problem, I have the following module. The relavent pyparsing code is down near the end: https://svn.cignex.com/public/slipr/slipr/slipr.py It's pretty self contained. When this module is run, it tries to parse the test file: https://svn.cignex.com/public/slipr/data/pyinrdf.slpr and I've got everything matching fine, except the whitespace. ;) For some reason I can't get the whitespace action to work right, it only matches about every other whitespace in the doc. Here's some of the output. Notice at the end how some of the whitespace is not matched between tags: <tag> <name> <identifier>RDF</identifier> </name> <attrs> <name> <identifier>python</identifier> </name> <string>"http://namespaces.zemantic.org/python#"</string> </attrs> </tag> [' '] [' '] <tag> <name> <identifier>Ontology</identifier> </name> <attrs> <name> <identifier>python</identifier> </name> <value> <identifier>bob</identifier> </value> </attrs> </tag> [' '] [' '] <tag> <name> <identifier>Class</identifier> </name> <attrs> <name> <identifier>Object</identifier> </name> </attrs> </tag> [' '] <tag> <name> <identifier>issubclass</identifier> </name> <attrs> <name> <identifier>Object</identifier> </name> </attrs> </tag> <tag> <name> <identifier>isinstance</identifier> </name> <attrs> <name> <identifier>Object</identifier> </name> </attrs> </tag> I'm not sure what's wrong, can anyone spot a simple error or suggest another way to handle the indentation issue? Thanks, -Michel |
From: Michel P. <mi...@di...> - 2005-08-18 20:58:24
|
On Thu, 2005-08-18 at 13:57 -0500, Paul McGuire wrote: > Michel - > > Not so much global data, as it is parsing state preserved inside the > pyparsing class instances (namely the cacheing of exception instances). I > am fairly certain that calling parseString is not thread-safe, and you > should interlock calls to it if you have multiple threads calling it. Oh I'm sorry, what I meant to say was different threads will be calling different instances, not the same instance. IE, every thread will have its own SPARQLGrammar.Query instance. sliplib used module vars and declared global vars and thus the _whole module_, and all of its features, cannot be used from different threads, but different instances of pyparsing classes should be fine. I think. ;) > I have made a few attempts at indentation-based parsing in the past, but I > looked at them last night, and they are really not so good. I think the key > will be in a) using a parse action with col() to detect the indentation > level of the current line, and b) keeping a global stack of indentations > levels seen thus far, so that you can tell if your current line is part of > the current indent level, a deeper level or a higher level. Sounds good. Something to think about would be encapsulating the indentation level in something other than a global var so that it is thread safe. Maybe the parse action can be a callable instance that keeps this level internal? class IndentationAction(object): level = 0 def __call__(self, *args): # ... indentation tracking logic indent = White().parseAction(IndentationAction()) or something like that. > When creating your test cases, be sure to add unfriendly tests, such as > nested levels that unwind to a higher nesting than just the immediate > parent. That is: > > A > A1 > A2 > A2a > A2aa > A2ab > A2b > A2ba > A3 > > Since there is no A2c entry (to be a peer of A2a and A2b), your parser will > end up doing a double pop from the indentation stack. > > Also, what would this data signify? > > A > A1 > A2 > A2a > A2aa > A2ab > A2b > A2ba > A2.5 > A3 > > Note that A2.5 is more indented than A2 and A3, but less indented than A2a > and A2b. I'm guessing this case should probably be an error (and if you > detect it in a parse action, you should raise ParseFatalException instead of > simple ParseException, to halt parsing immediately). Right, obviously we went good structured representation but not necessarily the exact semantics of Python, unless desired. I'll work some more on this over the weekend and let you know what my results are. -Michel |
From: Michel P. <mi...@ci...> - 2005-08-18 19:11:38
|
Thanks for the tips, Paul, on the sparql grammar. It can successfully parse about a dozen or so test queries. I've handed of that portion for the moment to another team member who will be wiring the grammar up to the query logic (which is already done, thanks to Ivan Herman from the W3C) and soon rdflib will have top to bottom sparql support! C ouple of things related to that before I get onto my question: 1) have you considered distributing pyparsing as an egg? 2) pypi seems to have a slightly out of data version, 3) any new pyparsing releases in the near future? Ok, onto a question. I've been working on a new syntax flavor of RDF/XML. It's basically SLiP (Something Like Python http://www.scottsweeney.com/projects/slip/) but a re-implementation for a couple of reasons, the current SLiP implementation: - uses a lot of module globals, we need concurrent parsing/generation - is based on an older expat style parsing - does nice xml->slip, but the slip->xml is garbage - uses an inefficient backtracking parse algorithm The new implementation will: - be class based using no module globals - use xml.sax for parsing xml->slip (done) - use pyparsing for parsing slip->xml (this email) In addition to reimplementing SLiP I plan to extend it to have some RDF eye candy, basically just some syntax short hands that save on typing lots of "rdf:ID=" and "rdf:resource=" and some other nice stuff. For example, here is some RDF/XML <?xml version="1.0" ?> <!DOCTYPE rdf:RDF > <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://namespaces.zemantic.org/python#" xml:base="http://namespaces.zemantic.org/python#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:python="http://namespaces.zemantic.org/python#"> <owl:Ontology rdf:about="http://namespaces.zemantic.org/python#"> <rdfs:comment> A Python RDF ontology. </rdfs:comment> </owl:Ontology> <!-- declaration properties--> <owl:ObjectProperty rdf:ID="of"> <rdfs:subPropertyOf rdf:resource="rdfs:domain"/> </owl:ObjectProperty> <owl:ObjectProperty rdf:ID="value"> <rdfs:subPropertyOf rdf:resource="rdfs:range"/> </owl:ObjectProperty> <owl:ObjectProperty rdf:ID="attribute"> <rdfs:subPropertyOf rdf:resource="owl:ObjectProperty"/> becomes the SLiP: rdf:RDF(xmlns:rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#, xmlns:xmlns=http://namespaces.zemantic.org/python#, xmlns:rdfs=http://www.w3.org/2000/01/rdf-schema#, xmlns:owl=http://www.w3.org/2002/07/owl#, xmlns:python=http://namespaces.zemantic.org/python#, xml:base="http://namespaces.zemantic.org/python#"): owl:Ontology(rdf:about="http://namespaces.zemantic.org/python#"): rdfs:comment(): "A python RDF ontology." owl:ObjectProperty(rdf:ID="of"): rdfs:subPropertyOf(rdf:resource="rdfs:domain"): owl:ObjectProperty(rdf:ID="value"): rdfs:subPropertyOf(rdf:resource="rdfs:range"): owl:ObjectProperty(rdf:ID="attribute"): rdfs:subPropertyOf(rdf:resource="owl:ObjectProperty"): which in turn becomes the SLiPR: RDF(xmlns:python="http://namespaces.zemantic.org/python#"): Ontology() """ A Python RDF ontology. """ # declaration properties ObjectProperty(of): subPropertyOf(domain) ObjectProperty(value): subPropertyOf(range) ObjectProperty(attribute): subPropertyOf(ObjectProperty) value(Object) I want to write a pyparsing grammar that parses SLiP and SLiPR. The tag and attribute matching parts are easy and I've made good progress on that, the hard part has been parsing and understanding the python-style indentation based syntax. 
So far the only idea I've had is to keep track of the current indentation level somewhere and register a parser action on indentation tokens that detects the indentation syntax changes. But I was hoping for a "pure" pyparsing solution that required no actions or state variables. Any pointers? Thanks! -Michel |
From: Paul M. <pa...@al...> - 2005-08-18 18:57:18
|
Michel -

Not so much global data, as it is parsing state preserved inside the pyparsing class instances (namely the cacheing of exception instances). I am fairly certain that calling parseString is not thread-safe, and you should interlock calls to it if you have multiple threads calling it.

I have made a few attempts at indentation-based parsing in the past, but I looked at them last night, and they are really not so good. I think the key will be in a) using a parse action with col() to detect the indentation level of the current line, and b) keeping a global stack of indentation levels seen thus far, so that you can tell if your current line is part of the current indent level, a deeper level or a higher level.

When creating your test cases, be sure to add unfriendly tests, such as nested levels that unwind to a higher nesting than just the immediate parent. That is:

    A
      A1
      A2
        A2a
          A2aa
          A2ab
        A2b
          A2ba
      A3

Since there is no A2c entry (to be a peer of A2a and A2b), your parser will end up doing a double pop from the indentation stack.

Also, what would this data signify?

    A
      A1
      A2
        A2a
          A2aa
          A2ab
        A2b
          A2ba
       A2.5
      A3

Note that A2.5 is more indented than A2 and A3, but less indented than A2a and A2b. I'm guessing this case should probably be an error (and if you detect it in a parse action, you should raise ParseFatalException instead of simple ParseException, to halt parsing immediately).

-- Paul
|
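A rough sketch of the approach described above (my own illustration, not from the thread): an Empty() element carrying a parse action that reads col() and maintains a stack of indent levels. It assumes the surrounding grammar still skips whitespace, so the action fires at the first non-blank character of each line:

    from pyparsing import Empty, ParseException, col

    indent_stack = [1]

    def check_indent(s, loc, toks):
        cur = col(loc, s)
        if cur > indent_stack[-1]:
            indent_stack.append(cur)          # deeper nesting level
            return
        while indent_stack and cur < indent_stack[-1]:
            indent_stack.pop()                # unwind one or more levels
        if not indent_stack or cur != indent_stack[-1]:
            # e.g. the "A2.5" case above: no previously seen level matches
            raise ParseException(s, loc, "inconsistent indentation")

    indentation = Empty().setParseAction(check_indent)
    # place 'indentation' at the start of each line-level rule in the grammar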
From: Michel P. <mi...@di...> - 2005-08-18 17:30:42
|
On Thu, 2005-08-18 at 06:35 -0500, Paul McGuire wrote: > Michel - > > Glad the sparql parsing is proceeding well. > > I'm not sure pyparsing is going to go much better than your current parser, > given the warts that you cite: > - pyparsing is not very good in multi-thread code, for the same reasons you > mention, mostly use of globals. I don't see any vars declared global in pyparsing unless you've added them recently. I don't see any of the other usual thread-killing warts either, like mutable default arguments or module level vars. I've no experience with any other kind of global state in python, by my eyes pyparsing should be pretty threadsafe, but hey, you're the author. ;) I'd be more than willing to try and fix and/or verify pyparsing with multiple threads. Really the thread issue is just a minor concern, I don't think they're be much concurrent parsing as much as generation, so I can get a way with locks for now if it's totally necessary. > - pyparsing's asXML() output for parsed results is somewhat hit-or-miss. I > really should remove that code for now, or at least label it as "shaky". I'm just using it for visual verification for now, so it's shakyness is ok for me. > > To do indentation-based parsing, you will need a parse action to do the > indentation work, and a stack to keep track of the current indentation > levels, so that you can unwind to previous indent levels. Here's one > suggestion if you haven't thought of it already: use pyparsing's > col(loc,strg) built-in inside the parse action, to compute the column of the > starting text. Great, that's what I imagined, but the col() trick will be usefull, thanks Paul! -Michel |
From: Michel P. <mi...@di...> - 2005-08-18 16:51:20
|
Thanks for the tips, Paul, on the sparql grammar. It can successfully parse about a dozen or so test queries. I've handed of that portion for the moment to another team member who will be wiring the grammar up to the query logic (which is already done, thanks to Ivan Herman from the W3C) and soon rdflib will have top to bottom sparql support! C ouple of things related to that before I get onto my question: 1) have you considered distributing pyparsing as an egg? 2) pypi seems to have a slightly out of data version, 3) any new pyparsing releases in the near future? Ok, onto a question. I've been working on a new syntax flavor of RDF/XML. It's basically SLiP (Something Like Python http://www.scottsweeney.com/projects/slip/) but a re-implementation for a couple of reasons, the current SLiP implementation: - uses a lot of module globals, we need concurrent parsing/generation - is based on an older expat style parsing - does nice xml->slip, but the slip->xml is garbage - uses an inefficient backtracking parse algorithm The new implementation will: - be class based using no module globals - use xml.sax for parsing xml->slip (done) - use pyparsing for parsing slip->xml (this email) In addition to reimplementing SLiP I plan to extend it to have some RDF eye candy, basically just some syntax short hands that save on typing lots of "rdf:ID=" and "rdf:resource=" and some other nice stuff. For example, here is some RDF/XML <?xml version="1.0" ?> <!DOCTYPE rdf:RDF > <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://namespaces.zemantic.org/python#" xml:base="http://namespaces.zemantic.org/python#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:python="http://namespaces.zemantic.org/python#"> <owl:Ontology rdf:about="http://namespaces.zemantic.org/python#"> <rdfs:comment> A Python RDF ontology. </rdfs:comment> </owl:Ontology> <!-- declaration properties--> <owl:ObjectProperty rdf:ID="of"> <rdfs:subPropertyOf rdf:resource="rdfs:domain"/> </owl:ObjectProperty> <owl:ObjectProperty rdf:ID="value"> <rdfs:subPropertyOf rdf:resource="rdfs:range"/> </owl:ObjectProperty> <owl:ObjectProperty rdf:ID="attribute"> <rdfs:subPropertyOf rdf:resource="owl:ObjectProperty"/> becomes the SLiP: rdf:RDF(xmlns:rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#, xmlns:xmlns=http://namespaces.zemantic.org/python#, xmlns:rdfs=http://www.w3.org/2000/01/rdf-schema#, xmlns:owl=http://www.w3.org/2002/07/owl#, xmlns:python=http://namespaces.zemantic.org/python#, xml:base="http://namespaces.zemantic.org/python#"): owl:Ontology(rdf:about="http://namespaces.zemantic.org/python#"): rdfs:comment(): "A python RDF ontology." owl:ObjectProperty(rdf:ID="of"): rdfs:subPropertyOf(rdf:resource="rdfs:domain"): owl:ObjectProperty(rdf:ID="value"): rdfs:subPropertyOf(rdf:resource="rdfs:range"): owl:ObjectProperty(rdf:ID="attribute"): rdfs:subPropertyOf(rdf:resource="owl:ObjectProperty"): which in turn becomes the SLiPR: RDF(xmlns:python="http://namespaces.zemantic.org/python#"): Ontology() """ A Python RDF ontology. """ # declaration properties ObjectProperty(of): subPropertyOf(domain) ObjectProperty(value): subPropertyOf(range) ObjectProperty(attribute): subPropertyOf(ObjectProperty) value(Object) for a more complete example of the SLiPR language see: https://svn.cignex.com/public/slipr/pyinrdf.slpr I want to write a pyparsing grammar that parses SLiP and SLiPR. 
The tag and attribute matching parts are easy and I've made good progress on that, the hard part has been parsing and understanding the python-style indentation based syntax. So far the only idea I've had is to keep track of the current indentation level somewhere and register a parser action on indentation tokens that detects the indentation syntax changes. But I was hoping for a "pure" pyparsing solution that required no actions or state variables. Any pointers? Thanks! -Michel |
From: Michel P. <mi...@di...> - 2005-08-15 03:33:23
|
On Sun, 2005-08-14 at 17:29 -0500, Paul McGuire wrote: > Michel - > > Wow! This is quite an ambitious "first project" for using pyparsing, but > you seem to have gotten pretty far. Thanks! And thanks for your note, as soon as I get a chance tonight/tommorow I'll go over it. I originally started with a reverse of your spec, but then I went the other way with all the forwards. I think you're right that they should be reversed. Thanks for pyparsing it works really great! > > There is a very simple bug in your parser. Line 372 is: > > _VAR_ = Word("?", alphanums+'_.-', min=2) > > It should be > > _VAR_ << Word("?", alphanums+'_.-', min=2) doh! I'm guessing this is probably the #1 first mistake with pyparsing? ;) -Michel |
From: Michel P. <mi...@ci...> - 2005-08-14 22:33:41
|
On Sun, 2005-08-14 at 12:43 -0700, Michel Pelletier wrote: > Hi, > > Recently Ivan Herman implemented the SPARQL query language logic for > rdflib but there was no parser. None of us are parser geeks but I > decided to try out a few different ones and I'm experimenting with > pyparsing. You an see my first draft at > > http://svn.rdflib.net/trunk/rdflib/sparql/grammar.py > > I'm having trouble with a part of the grammar and I was hoping someone > else was running into this. Here are my test queries so far: Hmm.. it apears to be my use of forward declared terminals. When I replace a terminal with the same expression it's declared with it works. Oh well, now I can make progress. ;) -Michel |
From: Paul M. <pa...@al...> - 2005-08-14 22:29:24
|
Michel - Wow! This is quite an ambitious "first project" for using pyparsing, but you seem to have gotten pretty far. There is a very simple bug in your parser. Line 372 is: _VAR_ = Word("?", alphanums+'_.-', min=2) It should be _VAR_ << Word("?", alphanums+'_.-', min=2) Since you predefine a Forward for _VAR_ and then used that Forward as part of the definition of a select command, reassigning a different expression to _VAR_ loses the previous object entirely. After making this change, all of your tests appear to pass. (You have a similar bug on line 440, in the definition of _NCNAME_.) Here are some other stylistic comments: 1. GraphPattern << PatternElement + ZeroOrMore(PatternElement) Can be more cleanly defined as: GraphPattern << OneOrMore(PatternElement) And similarly for: TriplePatternList << TriplePattern + ZeroOrMore(TriplePattern) 2. String << _STRING_LITERAL1_ | _STRING_LITERAL2_ Since _STRING_LITERAL1_ is just a sglQuotedString (and used nowhere else), and _STRING_LITERAL2_ is a dblQuotedString (and also used nowhere else), you might try using String << quotedString Since pyparsing defined quotedString as sglQuotedString | dblQuotedString. 3. _INTEGER_LITERAL_ << (Optional(oneOf("+ -")) + _DECIMAL_LITERAL_ + Optional(oneOf("l L")) | _HEX_LITERAL_ + Optional(oneOf("l L"))) _HEX_LITERAL_ << zero + oneOf("x X") + Word(nums + srange('[a-f]') + srange('[a-f]')) _FLOATING_POINT_LITERAL_ << (Optional(oneOf("+ -")) + Word(nums) + dot + Word(nums) + Optional(_EXPONENT_) | dot | OneOrMore(nums) + Optional(_EXPONENT_) | OneOrMore(nums) + _EXPONENT_) Don't forget that pyparsing accepts whitespace between elements of an And expression. I don't think you want to accept "0 x 001Ab8" as a hex literal. This expression would also return the results as a list of tokens, so "0x001Ab8" would be returned as ['0','x','001Ab8']. Fix this by wrapping these literal expressions in a Combine class, which does 2 things: requires that all elements be adjacent (although this can be overridden if desired); and concatenates all of the matched tokens into a single string. Also, a point about srange (I assume that one of your '[a-f]'s should be '[A-F]'). Srange can accept multiple ranges or single characters within its range argument, so you could just as easily define HEX_LITERAL as (adding in the Combine): _HEX_LITERAL_ << Combine( zero + oneOf("x X") + Word(nums + srange('[a-fA-F]')) ) Or even: _HEX_LITERAL_ << Combine( zero + oneOf("x X") + Word(srange('[0-9a-fA-F]')) ) (Hmmm, you are about the umpteenth person I've seen have to define the valid set of hex number characters, maybe I should add this as a pyparsing helper...). Lastly: You have chosen a very common EBNF->pyparsing technique of defining *many* Forward elements for your constructions, and the populating them later using the '<<' operator. This is needed since most EBNF's are top-down definitions, but Python requires variables to be defined before they can be referenced. I would propose reading your EBNF in reverse, so that you can simply define constructions in advance of when they are referenced, so there is no need to create empty Forward's to be defined later. 
I mean, these statements just don't seem to merit being Forwards (as well as probably requiring a Combine here and there): _LANG_ << at + _A2Z_ + Optional(dash + _A2Z_) _A2Z_ << Word(alphas) _DECIMAL_LITERAL_ << _DIGITS_ _EXPONENT_ << oneOf("e E") + Optional(oneOf("+ -")) + Word(nums) String << quotedString _QNAME_ << Optional(_NCNAME_ + colon) + _NCNAME_ _VAR_ << Word("?", alphanums+'_.-', min=2) _DIGITS_ << Word(nums) _NCNAME_ << Word(alphas+'_', alphanums+'_.-') Reserve Forward's for those expressions that are truly recursive, such that they must be referenced before they are fully defined (that is, they must be referenced within their own definition). For instance, arithmetic expressions usually fall into this category, plus your SELECT statements do when a select can be embedded within another select statement's WHERE clause. Forward's incur a fair bit of extra call overhead, and this always translates into poor performance. Instead, just start with a basic definition of _NCNAME_ and work backwards, and all should work much more cleanly (and don't have to remember to use '<<' instead of '=' :) ). _NCNAME_ = Word(alphas+'_', alphanums+'_.-') _DIGITS_ = Word(nums) _VAR_ = Word("?", alphanums+'_.-', min=2) _QNAME_ = Optional(_NCNAME_ + colon) + _NCNAME_ String = quotedString _EXPONENT_ = oneOf("e E") + Optional(oneOf("+ -")) + Word(nums) _DECIMAL_LITERAL_ = _DIGITS_ _FLOATING_POINT_LITERAL_ = Combine(Optional(oneOf("+ -")) + Word(nums) + dot + Word(nums) + Optional(_EXPONENT_) | dot | OneOrMore(nums) + Optional(_EXPONENT_) | OneOrMore(nums) + _EXPONENT_) _A2Z_ = Word(alphas) _LANG_ = Combine(at + _A2Z_ + Optional(dash + _A2Z_)) Of course, I would be the last to argue with demonstrated progress, and you have really gotten quite far without any of my help, and things do seem to be working so far. So take as much/little/none of this advice as you like, and best of luck using pyparsing! (and keep us all posted on how things are going) -- Paul McGuire |
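For reference, the compact hex-literal form Paul describes (with the Combine and the single srange) would look roughly like this; Literal("0") stands in for the grammar's own 'zero' token, and the output is shown as printed by a recent pyparsing:

    from pyparsing import Combine, Literal, Word, oneOf, srange

    hex_chars = srange("[0-9a-fA-F]")
    _HEX_LITERAL_ = Combine( Literal("0") + oneOf("x X") + Word(hex_chars) )

    print(_HEX_LITERAL_.parseString("0x001Ab8"))   # -> ['0x001Ab8']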
From: Michel P. <mi...@di...> - 2005-08-14 19:42:45
|
Hi, Recently Ivan Herman implemented the SPARQL query language logic for rdflib but there was no parser. None of us are parser geeks but I decided to try out a few different ones and I'm experimenting with pyparsing. You an see my first draft at http://svn.rdflib.net/trunk/rdflib/sparql/grammar.py I'm having trouble with a part of the grammar and I was hoping someone else was running into this. Here are my test queries so far: ts = ["SELECT *", "SELECT DISTINCT *", "SELECT ?title", "SELECT ?title, ?name", "SELECT * FROM <a> WHERE ( <book1> <title> ?title )", ] and this is the top level definition for "Query" and "ReportFormat" and their coresponding EBNF rules: # [1] Query ::= PrefixDecl* ReportFormat PrefixDecl* FromClause? WhereClause? Query << (ZeroOrMore(PrefixDecl) + ReportFormat + ZeroOrMore(PrefixDecl) + Optional(FromClause) + Optional(WhereClause)) # [2] ReportFormat ::= 'select' 'distinct'? <VAR> ( CommaOpt <VAR> )* # | 'select' 'distinct'? '*' # | 'construct' TriplePatternList # | 'construct' '*' # | 'describe' VarOrURI ( CommaOpt VarOrURI )* # | 'describe' '*' # | 'ask' ReportFormat << (select + Optional(distinct) + Group(delimitedList(_VAR_)) | select + Optional(distinct) + star | construct + TriplePatternList | construct + star | describe + delimitedList(VarOrURI) | describe + star | ask) My problem is with the third and fourth test queries. The first two and the last pass no problem. I can't seem to match Group(delimitedList(_VAR_)) properly in the grammar, but I can match it well enough from the command line: >>> s = SPARQLGrammar.select + Optional(SPARQLGrammar.distinct) + Group(delimitedList(SPARQLGrammar._VAR_)) >>> s.parseString('select ?title') (['select', (['?title'], {})], {}) >>> s.parseString('select ?title, ?bob') (['select', (['?title', '?bob'], {})], {}) but when I run the grammar on the same query above, ReportFormat doesn't match: >>> ## working on region in file /tmp/python-104181Id.py... tokens = ['select', '*'] tokens = ['select', 'distinct', '*'] SELECT ?title ^ (at char 7), (line:1, col:8) SELECT ?title, ?name ^ (at char 7), (line:1, col:8) tokens = ['select', '*', 'from', 'a'] >>> I'm not sure what's wrong here, the first expression in ReportFormat looks just like the one I did in the command line, but I can't get it to match. Anyone have any pointers? Thanks, -Michel |
From: <pt...@au...> - 2005-05-17 15:54:15
|
Sebastian -

First of all, welcome to the world of pyparsing! I hope it serves well for you.

Second, I must say that embarking on a grammar as complex as JavaScript is a pretty big job, and you may want to start with something less detailed until you are familiar with pyparsing.

But finally, to answer your question, yes, pyparsing has support for recursive grammars. The Forward() class is a placeholder that allows you to define a Python variable for a parse expression, but to define its parse sub-grammar at a later time. In general, the sequence is:

    var = Forward()             # define var using Forward()
    var2 = blah | blah2 | var   # define one or more other constructs using var
    var << '(' + var2 + ')'     # define var contents, load using '<<' operator

This last step is crucial - do NOT do:

    var = '(' + var2 + ')'      # WRONG!!!

This is a common error when defining Forward()'s, just be careful.

In setting up a grammar for JS, I expect you will have a number of Forward elements, not just for function definitions. You will have arithmetic functions to evaluate, grouped statements composed of simple or grouped statements, class definitions, etc. Look at the fourFn.py and idlParse.py examples that ship with pyparsing. They may give you some hints.

For a simple example, here is a list parser. In Python, if you print out a list to a string, this parser will reconstruct the list - it is a safe alternative to using eval(). Notice how the listDef is a Forward, is used as part of a listItem, then the listDef body is loaded using '<<'. This is similar to what you will need to do.

Good luck!
-- Paul

    # get pyparsing at http://pyparsing.sourceforge.net
    from pyparsing import quotedString, Forward, Literal, delimitedList, Group, removeQuotes

    quotedString.setParseAction(removeQuotes)

    testdata = """[["abc","def",["ghi","jkl"],"mno"],["pqr","stu", ["vwx","yz0"],["123",["456","789"]],"$$$"],"^^^"]"""

    lbrack = Literal("[").suppress()
    rbrack = Literal("]").suppress()
    listDef = Forward()
    # add more things to listItem, such as integers, etc. if your list has
    # other than quoted strings
    listItem = quotedString | listDef
    listDef << lbrack + Group( delimitedList(listItem) ) + rbrack

    results = listDef.parseString(testdata)

    def printList(l, ind=0):
        for item in l:
            if isinstance(item, list):
                print " "*ind, "+"
                printList(item, ind+2)
            else:
                print " "*ind, "-", item

    printList(results[0].asList())
|
From: Sebastian W. <seb...@1u...> - 2005-05-17 12:32:18
|
Hi!

I am trying to build parsers for CSS and JavaScript with pyparsing. Pyparsing seems to be really great for this job. I have a problem building recursive matches, for example a snippet that can parse the following:

    function foo() {
        function bar() {
        }
        somecode();
    }

The "function" could be used at any level. I tried something like the following, which doesn't work:

    js_function = Literal("function") + Word(alphanums) + Literal("(") + Literal(")") +
                  Literal("{") + OneOrMore( js_function | ... ) + Literal("}")

js_function is not yet defined at the point where the OneOrMore wants to use it. This is something like a recursion. Do you have any idea or trick to get this to work?

Thank you.

Best regards,
Sebastian

--
Sebastian Werner
Application-Development (Pustefix-Core)
1&1 Internet AG
seb...@1u...
Fon: ++49 721 91374 - 154
Mobile: ++49 179 4590730

The Web is a procrastination apparatus: It can absorb as much time as is required to ensure that you won't get any real work done. - Jakob Nielsen (Usability Guru)
There is no substitute for experience. - Darckness (Gentoo Kernel Hacker)
|
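Following the Forward() pattern from the reply above, a minimal sketch for exactly this nested-function shape might look like the following (my own illustration; 'js_statement' is a stand-in for a real statement rule, a real JavaScript grammar needs much more, and the output is shown as printed by a recent pyparsing):

    from pyparsing import Forward, Group, Suppress, Word, ZeroOrMore, alphanums

    js_function = Forward()
    js_statement = Word(alphanums + "_") + Suppress("(") + Suppress(")") + Suppress(";")
    js_function << Group(
        Suppress("function") + Word(alphanums + "_") + Suppress("(") + Suppress(")") +
        Suppress("{") + ZeroOrMore(js_function | js_statement) + Suppress("}")
    )

    sample = """
    function foo() {
        function bar() {
        }
        somecode();
    }
    """
    print(js_function.parseString(sample))   # -> [['foo', ['bar'], 'somecode']]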
From: Michele P. <mic...@un...> - 2005-04-28 14:39:14
|
pt...@au... wrote:

> Michele -
>
> First of all, are you sure this is a valid entry? The one example
> file I have would list this as:
>
>     controls {
>         inet 127.0.0.1 allow { any; }; keys { "key";};
>     };
>
> instead of your version:
>
>     controls {
>         inet 127.0.0.1 allow { any; } keys { "key";};
>     };

I racked my brain searching for an error in my parsing strings, and the problem was a ";" !!!

> Pyparsing includes some debugging capabilities so you can peek into
> the parsing logic process. Try changing the two lines:
>
>     simple = Group(value + ZeroOrMore(value) + ";")
>     statement = Group(value + ZeroOrMore(value) + "{" +
>                       Optional(toplevel) + "}" + ";")
>
> to
>
>     simple = Group(value + ZeroOrMore(value) + ";").setDebug()
>     statement = Group(value + ZeroOrMore(value) + "{" +
>                       Optional(toplevel) + "}" + ";").setDebug()
>
> You will now start to see messages during parsing when each of these
> expressions is tried, and either succeeds or throws an exception.
>
> After you get this running, let us know what you find.
>
> -- Paul

I saw this method and I find it very useful. I'll use it.

Thanks,
Michele
|