pyparsing-users Mailing List for Python parsing module (Page 6)
Brought to you by: ptmcg
Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec
---|---|---|---|---|---|---|---|---|---|---|---|---
2004 | | | 1 | | 1 | | | 2 | | | 2 |
2005 | 2 | | 2 | 12 | 2 | | | 12 | | 1 | |
2006 | 5 | 1 | 10 | 3 | 7 | 2 | 2 | 7 | 8 | 17 | | 3
2007 | 4 | | 10 | | 6 | 11 | 1 | | 19 | 8 | 32 | 8
2008 | 12 | 6 | 42 | 47 | 17 | 15 | 7 | 2 | 13 | 6 | 11 | 3
2009 | 2 | 3 | | | 11 | 13 | 19 | 17 | 8 | 3 | 7 | 1
2010 | 2 | | 19 | 6 | | 2 | | 1 | | 4 | 3 | 2
2011 | 4 | | 5 | 1 | 3 | 8 | 6 | 8 | 35 | 1 | 1 | 2
2012 | 2 | | 3 | 4 | | 1 | | 6 | 18 | | 1 |
2013 | 7 | 7 | 1 | 4 | | | 1 | 5 | 3 | 11 | 3 |
2014 | 3 | 1 | | 6 | 10 | 4 | | 5 | 2 | 4 | 1 |
2015 | | | | 13 | 1 | | 2 | | 9 | 2 | 11 | 2
2016 | | 3 | 2 | | | | 3 | | | 1 | 1 | 4
2017 | 2 | 2 | 2 | | | | 4 | | | 4 | 3 |
2018 | 10 | | 1 | | | 1 | | | | 2 | |
2019 | | | | | | 2 | | | | | |
2020 | | 1 | | | 1 | | | | | | |
2022 | | | | | | | | | | | | 1
2023 | | | | 1 | | | | | | | |
2024 | | 1 | | 1 | | | 1 | 3 | 1 | 1 | |
From: Paul M. <pt...@au...> - 2013-11-06 12:55:12
You might look at one of the variations on parsing that pyparsing expressions can do. The typical parser case is one in which the parser handles all of the input text. That requires the most work, because the parser has to handle everything in the input. You can also write a pyparsing parser that matches only part of the input file, and then scan or search for just those parts. I think this may be suitable for your case.

Look over the following code and see how searchString and scanString return the matching lines, and how, with scanString (which returns a Python generator; if you're not familiar with these, look them up), you can pull out the text between parses, since scanString returns not only the matching tokens but also their start and end locations.

-- Paul

from pyparsing import *

line_of_words = OneOrMore(Word(alphas))

inputText = """\
sldjf lskjflsja lasdfljsdf owiuerowue ndf
122
1203 080182 0123 1023021 013802
02108

aslkjweoiur olsuaperu lsfiwuer kfdsldf
293749237
029 927397 2979 29793732974
9237 82739

sjfdhhwl oewr lwkejrlj wlehrnmb
34982 9392
"""

# find all groups of words using searchString
for line in line_of_words.searchString(inputText):
    print(line)

# prints:
# ['sldjf', 'lskjflsja', 'lasdfljsdf', 'owiuerowue', 'ndf']
# ['aslkjweoiur', 'olsuaperu', 'lsfiwuer', 'kfdsldf']
# ['o']
# ['sjfdhhwl', 'oewr', 'lwkejrlj', 'wlehrnmb']

# find all groups and their start/end locations using scanString
for line, start, end in line_of_words.scanString(inputText):
    print(line)

# prints:
# ['sldjf', 'lskjflsja', 'lasdfljsdf', 'owiuerowue', 'ndf']
# ['aslkjweoiur', 'olsuaperu', 'lsfiwuer', 'kfdsldf']
# ['o']
# ['sjfdhhwl', 'oewr', 'lwkejrlj', 'wlehrnmb']

# use scanString to associate intervening text with matched line
parsedData = []
scanner = line_of_words.scanString(inputText)
lastLine, lastStart, lastEnd = next(scanner)
for line, start, end in scanner:
    parsedData.append((lastLine, inputText[lastEnd:start].splitlines()))
    lastLine, lastEnd = line, end
# add final group after last parsed line
parsedData.append((lastLine, inputText[lastEnd:].splitlines()))

for line, data in parsedData:
    print('-', ' '.join(line))
    for d in data:
        print(' ', d)

# prints
#- sldjf lskjflsja lasdfljsdf owiuerowue ndf
#
#   122
#   1203 080182 0123 1023021 013802
#   02108
#
#- aslkjweoiur olsuaperu lsfiwuer kfdsldf
#
#   293749237
#   029 927397 2979 29793732974
#   9237
#- o
#   82739
#
#- sjfdhhwl oewr lwkejrlj wlehrnmb
#
#   34982 9392
From: Mario R. O. <nim...@gm...> - 2013-11-06 03:32:34
> Can pyparsing do this?

Saying yes is pretty much an understatement, but that's what you want to hear, I guess, so: yes, it can.

Dtb/Gby
=======
Mario R. Osorio
"... Begin with the end in mind ..."
http://www.google.com/profiles/nimbiotics
From: Hanchel C. <han...@br...> - 2013-11-06 01:14:54
Hello!

I have a text file with a structure like this:

######start#######
[line1 matching grammar]
#[text]
#[text]
[text]

[line2 matching grammar]
#[text]
[etc.]
#######end#######

Under each line that matches the grammar there can be any number of lines, with or without a leading #.

I'm checking for the grammar, and then I would like to check all of the lines up to the next line that matches the grammar. Something like...

for line in text_file:
    if not(line matches grammar):
        do something

Can pyparsing do this? If not, any suggestions? I can give more info if necessary.

I really appreciate the help!

Kind regards,
Hanchel
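A minimal sketch of the line-by-line loop from the question, assuming a hypothetical header grammar (a line such as "SECTION alpha"; the real grammar was not posted). Each line is tested with parseString(..., parseAll=True), and lines that fail to parse are collected under the most recent header. The names header and group_by_header are illustrative only.

```python
from pyparsing import Keyword, ParseException, Word, alphanums

# Hypothetical stand-in for "the grammar": a line such as "SECTION alpha".
header = Keyword("SECTION") + Word(alphanums)

def group_by_header(lines):
    """Pair each header line with the lines that follow it, up to the next header.

    Returns a list of (header_tokens, body_lines). In this sketch, lines
    that appear before the first header are simply dropped.
    """
    groups = []
    for line in lines:
        try:
            tokens = header.parseString(line, parseAll=True)
        except ParseException:
            # "if not (line matches grammar): do something"; here we collect it
            if groups:
                groups[-1][1].append(line)
            continue
        groups.append((tokens.asList(), []))
    return groups

text = """\
SECTION alpha
#note one
plain text

SECTION beta
#note two
"""
for tokens, body in group_by_header(text.splitlines()):
    print(tokens, body)
# ['SECTION', 'alpha'] ['#note one', 'plain text', '']
# ['SECTION', 'beta'] ['#note two']
```

Testing one line at a time keeps the loop shape from the question; the scanString approach in Paul's reply instead works on the whole text and recovers the in-between text from the match locations.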
From: <pt...@au...> - 2013-10-29 18:50:13
This might be a good time to invoke some level of moderation on the discussion. There is already confusion over who said what, who assumed what, and so on.

The original post definitely asked a question in an area where pyparsing is a sledgehammer to swat a fly. In the case of pyparsing, some people do start with flies as simple test cases, or choose to publish those in online support forums as a distilled-down example of some larger problem. (If the latter, I *really* appreciate it when someone can post a short, specific issue instead of a 60-line parser with the issue buried somewhere in it.) As it happens, this particular example points up a bit of a pitfall in pyparsing, that of looking for TABs in your parser: the call to parseWithTabs is easily overlooked.

I don't think it is at all out of line to suggest non-pyparsing solutions to particular posts here, and I've found that investigating alternative solutions is always of value: sometimes I find a better way, and sometimes I get better insights into the originally selected option. But this *is* a pyparsing mailing list, not a general Python support list, so if there is a bias toward pyparsing-based solutions, I think that could be forgiven.

-- Paul

(I *would* prefer, though, that we avoid discourteous language. Nobody is being paid to respond to these emails; I think most are posted in good faith and with good intentions, and usually with some amount of personal investment. I think there is room to disagree and still maintain civility and respect.)
From: Mark L. <bre...@ya...> - 2013-10-29 16:24:07
What are you on about? The OP was Hanchel Cheng, not Diez B. Roggisch. The OP said "My question is pretty basic: I have a string 'name\tdate\t\tlocation'. What would I do to ensure that between the 'name' and 'date' there is exactly one tab and between 'date' and 'location' there are exactly two tabs?" What is there to assume about that?

Diez replied "I would forego pyparsing and use single string-functions" and later "yes, you can do it with pyparsing, but IMHO it's overkill". D* then wrote his piece, to which I replied, obviously very much agreeing with Diez. Have I missed something obvious?

--
Python is the second best programming language in the world.
But the best has yet to be invented. Christian Tismer

Mark Lawrence
From: Mario R. O. <nim...@gm...> - 2013-10-29 15:46:42
Mark,

Only two mistakes were made here:

1. In lieu of more detailed background, you *assumed* Roggisch is trying to use pyparsing just to look for "\t"'s, and
2. The rest of us stupids here (me included, of course) *assumed* Roggisch just asked a very specific question pertaining to a more complex issue, from which he might have wanted to keep us away, either because he doesn't need help on anything else or because this is the first such issue he finds where he needs help.

See the two mistakes? Neither do I... The one and only mistake here is that everyone assumed what each wanted to assume.

Now, in my particular case, I can give you a reason for that (which I do not pretend to use as an excuse): "I prefer to think everyone has a certain level of knowledge, though not necessarily of experience, otherwise you wouldn't be here but probably asking your teacher at school". That is, of course, another assumption but, hey! I'm just one more stupid human being trusting other human beings!!

The only *sheer unadulterated rubbish* here have been your comments. Please be more of a boy!

Dtb/Gby
=======
Mario R. Osorio
"... Begin with the end in mind ..."
http://www.google.com/profiles/nimbiotics
From: Mark L. <bre...@ya...> - 2013-10-29 15:18:51
IMHO sheer unadulterated rubbish. What you're saying is that when you want to crack a nut, you skip the sledgehammer stage and use a steamroller. Here string methods are perfectly adequate for the task, so use them.

Kindest regards.

Mark Lawrence.
From: d* <d*@y23.org> - 2013-10-29 15:07:58
Hi,

I see nothing wrong with using pyparsing. There are actually many ways to solve the problem here. If you expect to be using pyparsing more in the future, and expect to have multiple people maintaining the code, I'd keep it simple: stick to one paradigm and stay in the pyparsing realm.

I'm no expert at pyparsing yet, and I'm still learning it as I go. I've also become a fan of pair programming, where two of us are learning pyparsing at the same time.

There is nothing wrong with regex. I use it where it's needed, but in general I have found I don't mix it with the pyparsing code modules. By doing so, I've managed to get several different 'grammars' working robustly.

I don't see the overkill argument. Consider walking into a factory and looking at the machinery. Is it overkill that a machine the size of a truck cuts the same pattern 2000 times a day from an endless supply of steel? I doubt it. The work was probably done in the past by less refined machines, and more than likely you are looking at the latest production model for that type of work. I see it as: "Why do extra work, when you can use code that has been tested and end up doing less work?"

Good luck with whichever method(s) you choose to implement. And don't forget to have some fun while you are doing it :)

David
From: Diez B. R. <de...@we...> - 2013-10-29 07:09:15
On Oct 29, 2013, at 12:41 AM, Hanchel Cheng <han...@br...> wrote:

> Regardless of "all [I] have," I'd like to know if pyparsing can check for a specific number of tabs between alphanumeric strings. If there are not two tabs between the 2nd and 3rd word, I'd like to error out. Is pyparsing truly overkill for this task?

I think by now you have your answer: yes, you can do it with pyparsing, but IMHO it's overkill, if that's all you ask it to do. Probably even using a regex would be more opaque than necessary.

I've used pyparsing happily quite a few times, e.g. to parse CSS or small DSLs. But for this kind of thing, I'd use string methods.

Diez
From: Paul M. <pt...@au...> - 2013-10-29 04:49:19
One of pyparsing's default behaviors is to skip over whitespace, so when whitespace is part of your parser, you sometimes have to take some extra steps. In the case of <TAB>s, you need to override pyparsing's default of replacing tabs in the input string with spaces. Otherwise, the input string first gets run through str.expandtabs before being parsed, and the tab expressions in your grammar will never match. Here's how:

from pyparsing import Word, White, alphas

word = Word(alphas)
tab = White('\t', exact=1)
data = word + tab + word + tab + tab + word

# this step is important
data.parseWithTabs()

s = 'name\tdate\t\tlocation'
print(data.parseString(s).asList())
# prints ['name', '\t', 'date', '\t', '\t', 'location']

If you leave out the parseWithTabs() call, you'll get a ParseException, because your input string will have been expanded to 'name    date            location', with no tabs.

-- Paul
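A caveat on the grammar above, worth checking if the goal is to error out on the wrong tab count: White('\t', exact=1) consumes exactly one tab, but Word() still skips pyparsing's default leading whitespace, which includes tabs. So that grammar also accepts 'name\tdate\t\t\tlocation' and 'name\t\tdate\t\tlocation'. Below is a sketch of a stricter variant, assuming pyparsing's default whitespace set is in effect: calling leaveWhitespace() on every element stops all implicit skipping, so each tab must be matched by an explicit tab element, and parseAll=True rejects trailing text. The names loose, strict and has_exact_tabs are illustrative, not part of pyparsing.

```python
from pyparsing import ParseException, White, Word, alphas

# The grammar as posted: Word() keeps the default whitespace skipping,
# which includes tabs, so extra tabs before a word are silently swallowed.
word = Word(alphas)
tab = White("\t", exact=1)
loose = (word + tab + word + tab + tab + word).parseWithTabs()

# Stricter variant: no element skips leading whitespace, so every tab
# in the input must be consumed by one of the explicit tab elements.
strict_word = Word(alphas).leaveWhitespace()
strict_tab = White("\t", exact=1).leaveWhitespace()
strict = (strict_word + strict_tab + strict_word
          + strict_tab + strict_tab + strict_word).parseWithTabs()

def has_exact_tabs(line):
    """True only for word<TAB>word<TAB><TAB>word with nothing after it."""
    try:
        strict.parseString(line, parseAll=True)
    except ParseException:
        return False
    return True

# The loose grammar accepts three tabs before the last word:
print(loose.parseString("name\tdate\t\t\tlocation").asList())
# ['name', '\t', 'date', '\t', '\t', 'location']
print(has_exact_tabs("name\tdate\t\tlocation"))    # True
print(has_exact_tabs("name\tdate\t\t\tlocation"))  # False
```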
From: Mario R. O. <nim...@gm...> - 2013-10-29 01:05:58
I'm by no means an expert, but a couple of years ago I did finish what I like to call a medium-difficulty parser. Yes, you can definitely do that with pyparsing and regex.

Dtb/Gby
=======
Mario R. Osorio
"... Begin with the end in mind ..."
http://www.google.com/profiles/nimbiotics
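Since regex came up, here is a minimal sketch using only the standard re module, assuming each field is a run of ASCII letters and digits (the question speaks of "alphanumeric strings"). re.fullmatch anchors the pattern at both ends, so extra tabs, misplaced tabs, or trailing text all cause a mismatch. The names ROW and parse_row are illustrative.

```python
import re

# Exactly one tab between fields 1 and 2, exactly two between fields 2 and 3.
ROW = re.compile(r"([A-Za-z0-9]+)\t([A-Za-z0-9]+)\t\t([A-Za-z0-9]+)")

def parse_row(line):
    """Return the three fields, or raise ValueError if the layout is wrong."""
    match = ROW.fullmatch(line)
    if match is None:
        raise ValueError("unexpected tab layout: %r" % (line,))
    return match.groups()

print(parse_row("name\tdate\t\tlocation"))  # ('name', 'date', 'location')
```

Whether this reads more clearly than a split()-based check is exactly the judgment call Diez raises about regex being more opaque than necessary.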
From: Hanchel C. <han...@br...> - 2013-10-28 22:41:23
Regardless of "all [I] have," I'd like to know if pyparsing can check for a specific number of tabs between alphanumeric strings. If there are not two tabs between the 2nd and 3rd word, I'd like to error out. Is pyparsing truly overkill for this task?

Thanks,
Hanchel
From: Diez B. R. <de...@we...> - 2013-10-28 20:58:01
> I'm new to pyparsing and relatively new to Python as well. I'm really astounded by the multitude of functionality that pyparsing has.
> My question is pretty basic:
> I have a string 'name\tdate\t\tlocation'.
> What would I do to ensure that between the 'name' and 'date' there is exactly one tab and between 'date' and 'location' there are exactly two tabs?

If that's all you have, I would forego pyparsing and use single string-functions like

name, date, _, location = line.split("\t")

plus, of course, some error handling.

Diez
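Diez's one-liner leaves the error handling open. A minimal sketch of what it could look like, assuming the line must be exactly name<TAB>date<TAB><TAB>location with non-empty fields: splitting on single tabs turns the double tab into an empty field, so checking both the field count and that the third field is empty enforces the tab counts. The bare unpacking alone catches a wrong number of tabs but not tabs in the wrong place ('name\t\tdate\tlocation' also splits into four fields). The function name check_tabs is hypothetical.

```python
def check_tabs(line):
    """Return (name, date, location) from 'name<TAB>date<TAB><TAB>location'.

    Raises ValueError if the tab layout or the fields are wrong.
    """
    parts = line.split("\t")
    if len(parts) != 4:
        # three tabs produce four fields; any other count is an error
        raise ValueError("expected 3 tabs, found %d in %r" % (len(parts) - 1, line))
    name, date, empty, location = parts
    # the double tab shows up as an empty third field; the others must be non-empty
    if empty or not (name and date and location):
        raise ValueError("tabs in the wrong place in %r" % (line,))
    return name, date, location

print(check_tabs("name\tdate\t\tlocation"))  # ('name', 'date', 'location')
```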
From: Hanchel C. <han...@br...> - 2013-10-28 20:52:37
Hello!

I'm new to pyparsing and relatively new to Python as well. I'm really astounded by the multitude of functionality that pyparsing has.

My question is pretty basic: I have a string 'name\tdate\t\tlocation'. What would I do to ensure that between the 'name' and 'date' there is exactly one tab and between 'date' and 'location' there are exactly two tabs?

Thanks,
Hanchel
From: Hans-Peter J. <hp...@ur...> - 2013-09-23 23:28:40
Got it: it was a matter of excluding the right things. Sorry for the disturbance,
Pete
From: Hans-Peter J. <hp...@ur...> - 2013-09-23 13:04:41
|
Something removed the script. Hmm. Inlined below.. On Montag, 23. September 2013 14:19:40 Hans-Peter Jansen wrote: > Hi, > > after years of creating hand crafted parsers for many reasons, a new task > smelled like being a good candidate for starting with pyparsing. The first > steps look very promising, BTW. The fiddling with regexp can be very mind > boggling, while using such more or less simple python expressions is much > handier.. > > I have to process some machine generated PDF-content, where I don't have any > influence on the creating side. > > After extracting text with PDFMiner, I have to parse what you would some > people call an unholy mess.. The major point is, it is dependent on line > breaks, and empty lines. > > Attached is my starting point. Excuse some german labels please... > > The script tries to parse the address data in three different forms, but > address1 is the one that creates problems. The 4th address in the test data > contains such a biest. The problem here is, the line between "Herr Pumuckl" > and "Bibi Blocksbergstrasse" contains a blank. I try to detect an empty line > with: > > ParserElement.setDefaultWhitespaceChars(' \t\r') > NL = LineEnd().suppress() > empty = (NL + NL).suppress() > > Although, the blank is part of default whitespace chars, it seems to get in > the way for the empty expression test. Why? > > Let me know, if the script is still to complex, I can reduce it, but this > might help those, that tries to archive something similar.. 
> > Thanks in advance, > Pete # -*- coding: utf-8 -*- from pyparsing import * import logging log = logging.getLogger(__name__) ParserElement.setDefaultWhitespaceChars(' \t\r') NL = LineEnd().suppress() empty = (NL + NL).suppress() line = restOfLine + NL line.setParseAction(lambda t: [t[0].strip()]) name1 = line('name1') name2 = line('name2') strasse = line('strasse') plz = Word(alphanums).setResultsName('plz') ort = line('ort') land = line('land') bestimmt = Literal(u'Bestimmt für').suppress() address1 = Group(name1 + name2 + empty + strasse + plz + ort + land) + empty address2 = Group(name1 + name2 + strasse + plz + ort + land) + bestimmt address3 = Group(name1 + strasse + plz + ort + land) + bestimmt address = empty + Suppress(u'Warenempfänger') + empty + (address1 ^ address2 ^ address3) teststr = u""" Warenempfänger Metronom Tick-Tack 12, Zone Industrielle Schéleck 22 3225 Bettembourg Luxemburg Bestimmt für Warenempfänger Humfti-Bumfti AG Herr Wichtig Landwehrstr. 1 34454 Bad Arolsen-Mengeringhausen Deutschland Bestimmt für Warenempfänger Fa. Simsalabim Im Acker 88 76437 Rastatt Deutschland Bestimmt für Warenempfänger Hotzenplotz GmbH Herrn Pumuckl Bibi Blocksberggasse 1 66955 Pirmasens Deutschland Warenempfänger Uga Uga Am Nashorn 66 66424 Homburg / Saar Deutschland Bestimmt für """ for idx, (tok, sloc, eloc) in enumerate(address.scanString(teststr)): try: print 'page %s: (0x%x, 0x%x): \n%s' % (idx, sloc, eloc, tok[0].asDict()) except ParseException, err: log.error('page %s: %s' % (idx, err)) log.error(err.line) log.error(' ' * (err.column - 1) + '^') |
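On the loop at the end of the script: scanString() is a generator that yields a (tokens, start, end) triple for every match and silently skips text that does not match, so unmatched input does not raise ParseException inside that loop. A tiny stand-alone sketch; the expression and sample text are made up for illustration.

```python
from pyparsing import Word, nums

number = Word(nums)
text = "order 12 of 345 items"

# scanString() yields (tokens, start, end) for each match; start is the
# position after any skipped leading whitespace, end is one past the match.
# Non-matching text ("order", "of", "items") is simply stepped over.
found = [(tokens[0], start, end) for tokens, start, end in number.scanString(text)]
for value, start, end in found:
    print(value, start, end)
# 12 6 8
# 345 12 15
```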
From: Hans-Peter J. <hp...@ur...> - 2013-09-23 12:45:43
|
Hi, after years of creating hand-crafted parsers for many purposes, a new task smelled like a good candidate for starting with pyparsing. The first steps look very promising, BTW. Fiddling with regexps can be very mind-boggling, while more or less simple Python expressions are much handier. I have to process some machine-generated PDF content, where I don't have any influence on the creating side. After extracting the text with PDFMiner, I have to parse what some people would call an unholy mess. The main point is that it depends on line breaks and empty lines. Attached is my starting point. Please excuse some German labels. The script tries to parse the address data in three different forms, but address1 is the one that causes problems. The 4th address in the test data contains such a beast. The problem here is that the line between "Herr Pumuckl" and "Bibi Blocksbergstrasse" contains a blank. I try to detect an empty line with: ParserElement.setDefaultWhitespaceChars(' \t\r') NL = LineEnd().suppress() empty = (NL + NL).suppress() Although the blank is part of the default whitespace chars, it seems to get in the way of the empty expression test. Why? Let me know if the script is still too complex and I can reduce it, but this might help those who try to achieve something similar. Thanks in advance, Pete |
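A minimal sketch of line-oriented parsing with blank-line separators, assuming the input consists of blocks of non-blank lines separated by one or more blank lines; the sample text is made up. Because LineEnd skips the remaining default whitespace (" \t\r") before it looks for "\n", a separator line that holds only a blank is matched by a single NL, as long as the preceding text line has already consumed its own newline. The names text_line, block, blank_lines and document are illustrative.

```python
from pyparsing import (Group, LineEnd, OneOrMore, ParserElement, Regex,
                       StringEnd, ZeroOrMore)

# Newlines are significant, so "\n" is removed from the whitespace that
# elements skip. This must run before the elements below are created.
ParserElement.setDefaultWhitespaceChars(" \t\r")

# LineEnd skips " \t\r" before it looks for "\n", so a separator line that
# holds only blanks is matched by a single NL.
NL = LineEnd().suppress()

# A text line: after leading blanks, at least one visible character up to
# the end of the line; the line's own newline is consumed as well.
text_line = Regex(r"\S[^\n]*").setParseAction(lambda t: t[0].rstrip()) + NL

block = Group(OneOrMore(text_line))
blank_lines = ZeroOrMore(NL)
document = (blank_lines + block + ZeroOrMore(OneOrMore(NL) + block)
            + blank_lines + StringEnd())

sample = "Hotzenplotz GmbH\nHerrn Pumuckl\n \nBibi Blocksberggasse 1\n66955 Pirmasens\n"
print(document.parseString(sample).asList())
# [['Hotzenplotz GmbH', 'Herrn Pumuckl'], ['Bibi Blocksberggasse 1', '66955 Pirmasens']]
```

One thing worth checking in the original grammar: `line = restOfLine + NL` already consumes each line's own newline, so `empty = (NL + NL)` then requires two further newlines, which a single blank separator line does not provide.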
From: Thomas G. <th...@go...> - 2013-08-28 21:31:37
|
On Wed Aug 28 2013 10:07:14 PM CEST, Paul McGuire <pt...@au...> wrote: > Thomas - > > Sorry about this inconvenience, I tried my best to maintain cross-version > compatibility overall, but Python3 had just one incompatible syntax > feature too many, forcing me to cut the Gordian Knot. > > Assuming that the default Python version installed on Debian is at least > version 2.6, then you should be safe in adopting the version-unified > pyparsing 2.0.1 and beyond, as this version works with Python 2.6 and up > in the 2.x path, and Python 3.0 and up on the 3.x path. FYI Wheezy has v2.6 & 2.7, and Sid/Jessie only 2.7. Thomas |
From: Paul M. <pt...@au...> - 2013-08-28 20:07:28
|
Thomas - Sorry about this inconvenience, I tried my best to maintain cross-version compatibility overall, but Python3 had just one incompatible syntax feature too many, forcing me to cut the Gordian Knot. Assuming that the default Python version installed on Debian is at least version 2.6, then you should be safe in adopting the version-unified pyparsing 2.0.1 and beyond, as this version works with Python 2.6 and up in the 2.x path, and Python 3.0 and up on the 3.x path. I tried to convey this to Julian earlier this week, would you please inform the rest of the Debian team? Thanks, -- Paul -----Original Message----- From: Thomas Goirand [mailto:zi...@de...] Sent: Wednesday, August 28, 2013 2:37 PM To: Julian Taylor Cc: pyp...@li... Subject: Re: [Pyparsing] pyparsing python2 python3 tarball, future plans On 08/27/2013 07:20 PM, Julian Taylor wrote: > Hello, > In the 2.0.0 the splitting of the source in two tarballs for python2 > and > python3 was troublesome for some distributions, e.g. Debian had to add > a second source package in order to package it. Yes, it was. I had to go through Debian experimental for the upload, then migrate both packages to SID once the new package was done. > Now I discovered that in 2.0.1 this splitting was reverted again. This is really annoying. I'm not even sure how I will be able to deal with this, probably I will have to ping the Debian FTP masters, and go through Experimental once more. > So to avoid unnecessary work in Debian (and all its derivatives) I'd > like to inquire if this same source python2 and python3 tarball is > intended to stay for the foreseeable future or if it might be split > again in the next release. Likewise. I also would recommend that 2 packages are created (and by that, I mean with 2 different names!!!), as it is very confusing for everyone. Something like "pyparsing-oldpython" would do, for example. Anyway, thanks Julian for letting me know. Otherwise, I would have missed it. 
Thomas _______________________________________________ Pyparsing-users mailing list Pyp...@li... https://lists.sourceforge.net/lists/listinfo/pyparsing-users |
From: Thomas G. <zi...@de...> - 2013-08-28 19:56:37
|
On 08/27/2013 07:20 PM, Julian Taylor wrote: > Hello, > In the 2.0.0 the splitting of the source in two tarballs for python2 and > python3 was troublesome for some distributions, e.g. Debian had to add a > second source package in order to package it. Yes, it was. I had to go through Debian experimental for the upload, then migrate both packages to SID once the new package was done. > Now I discovered that in 2.0.1 this splitting was reverted again. This is really annoying. I'm not even sure how I will be able to deal with this, probably I will have to ping the Debian FTP masters, and go through Experimental once more. > So to avoid unnecessary work in Debian (and all its derivatives) I'd > like to inquire if this same source python2 and python3 tarball is > intended to stay for the foreseeable future or if it might be split > again in the next release. Likewise. I also would recommend that 2 packages are created (and by that, I mean with 2 different names!!!), as it is very confusing for everyone. Something like "pyparsing-oldpython" would do, for example. Anyway, thanks Julian for letting me know. Otherwise, I would have missed it. Thomas |
From: Paul M. <pt...@au...> - 2013-08-27 17:53:27
|
Julian - No, this is the way things will stay for the foreseeable future. This single version 2.0.1 and onward works for Python 2.6, 2.7, and 3.x. 1.5.7 is retained only for versions of Python 2.5 and older. Thanks for asking! -- Paul -----Original Message----- From: Julian Taylor [mailto:jta...@go...] Sent: Tuesday, August 27, 2013 12:20 PM To: pyp...@li... Cc: Thomas Goirand Subject: pyparsing python2 python3 tarball, future plans Hello, In the 2.0.0 the splitting of the source in two tarballs for python2 and python3 was troublesome for some distributions, e.g. Debian had to add a second source package in order to package it. Now I discovered that in 2.0.1 this splitting was reverted again. So to avoid unnecessary work in Debian (and all its derivatives) I'd like to inquire if this same source python2 and python3 tarball is intended to stay for the foreseeable future or if it might be split again in the next release. Best Regards, Julian Taylor |
From: Julian T. <jta...@go...> - 2013-08-27 17:20:21
|
Hello, In the 2.0.0 the splitting of the source in two tarballs for python2 and python3 was troublesome for some distributions, e.g. Debian had to add a second source package in order to package it. Now I discovered that in 2.0.1 this splitting was reverted again. So to avoid unnecessary work in Debian (and all its derivatives) I'd like to inquire if this same source python2 and python3 tarball is intended to stay for the foreseeable future or if it might be split again in the next release. Best Regards, Julian Taylor |
From: Peng Yu <pen...@gm...> - 2013-07-21 16:43:57
|
Hi, Things like fnumber in http://pyparsing.wikispaces.com/file/view/fourFn.py/30154950/fourFn.py are very convenient to use. I'm wondering whether this could be added to pyparsing itself, so that users don't have to copy and paste it from the pyparsing examples. Thanks. -- Regards, Peng |
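For reference, a float expression in the style of fourFn.py's fnumber takes only a few lines; this is a sketch written for illustration, not a verbatim copy of the example file. (Later pyparsing releases also ship a ready-made `pyparsing_common.fnumber`.) Note that CaselessLiteral returns its own defined string, so an exponent written "e" comes back as "E" in the combined token before conversion.

```python
from pyparsing import CaselessLiteral, Combine, Optional, Word, nums

# Sign and digits, optional decimal part, optional exponent. Combine requires
# the pieces to be adjacent (no whitespace) and joins them into one string,
# which the parse action then converts to a Python float.
e = CaselessLiteral("E")
fnumber = Combine(
    Word("+-" + nums, nums)
    + Optional("." + Optional(Word(nums)))
    + Optional(e + Word("+-" + nums, nums))
).setParseAction(lambda t: float(t[0]))

print(fnumber.parseString("-12.5e3")[0])  # -12500.0
print(fnumber.parseString("42")[0])       # 42.0
```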
From: Mika S. <mik...@gm...> - 2013-04-15 07:25:00
|
Hi, Thanks for the advice! I will certainly use the multiprocessing module. To clarify the memory consumption problem a bit more: I am using the latest pyparsing, 2.0.0, and Python 3.3.1. If I use my class as below (with scanString), I get the unwanted effect: psp = parser.parseBlock(script) for i in psp: print(i) If I don't iterate through the generator psp, memory usage does not rise. I have tried removing my custom token class, but still no luck. Thanks for the advice, -Mika On Fri, Apr 12, 2013 at 12:01 PM, Diez B. Roggisch <de...@we...> wrote: > Hi, > > > > I have a small application that uses pyparsing from multiple > > threads. The parser is a singleton, and the actual parseString call is > > inside Lock()s, so it should be thread-safe. (Excerpts from the script below.) > > > > The problem is that after parseBlock has returned the ParseResults > > to me and I have gone through the whole list, I cannot free the full > > ParseResults dictionaries. If parseBlock is called many times > > with different scripts, I end up with a huge number > > of dictionaries (len(objgraph.by_type('dict'))) and lose memory bit by > > bit. I have tried deleting the entries with del, but haven't fully figured > > out the correct way to clean up the ParseResults. How can I > > delete the returned ParseResults? > > > > I have tested both scanString and parseString for my case, but I > > think parseString would be more suitable. Both raise the memory usage. > > > > Thank you very much for any tips, and huge thanks for pyparsing. > > Can't comment on pyparsing's fitness for multithreading, but one thing I know for sure: > Python's multiprocessing module is a *blast*; it gives you proper > scaling and is easy to use.
> > So maybe you can hand the parsing off to a multiprocessing worker, > reaping the benefits of real parallelization, plus processes that can be > destroyed and re-created to deal with any memory issues whatsoever? > > Diez > > |
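One way to confirm that memory really keeps growing across repeated calls is the standard library's tracemalloc module. It needs Python 3.4 or later, so it is not available on the 3.3.1 mentioned above. This is a generic measuring sketch, not a pyparsing-specific fix; measure_growth, leaky_work and clean_work are hypothetical stand-ins for the parseBlock loop.

```python
import tracemalloc


def measure_growth(work, rounds=5):
    """Run work() several times; return traced bytes still allocated after each run."""
    tracemalloc.start()
    try:
        sizes = []
        for _ in range(rounds):
            work()
            current, _peak = tracemalloc.get_traced_memory()
            sizes.append(current)
        return sizes
    finally:
        tracemalloc.stop()  # also discards the collected traces


_kept = []


def leaky_work():
    # Hypothetical stand-in for a call whose results stay referenced somewhere.
    _kept.append([str(i) for i in range(10000)])


def clean_work():
    # Same allocation pattern, but nothing outlives the call.
    [str(i) for i in range(10000)]


print(measure_growth(leaky_work))  # numbers keep rising
print(measure_growth(clean_work))  # numbers stay roughly flat
```

If the real parse loop behaves like leaky_work, `tracemalloc.take_snapshot()` before and after a few rounds, compared with `Snapshot.compare_to()`, can point at the source lines doing the allocating.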
From: Diez B. R. <de...@we...> - 2013-04-12 09:01:42
|
Hi, > I have a small application that uses pyparsing from multiple > threads. The parser is a singleton, and the actual parseString call is > inside Lock()s, so it should be thread-safe. (Excerpts from the script below.) > > The problem is that after parseBlock has returned the ParseResults > to me and I have gone through the whole list, I cannot free the full > ParseResults dictionaries. If parseBlock is called many times > with different scripts, I end up with a huge number > of dictionaries (len(objgraph.by_type('dict'))) and lose memory bit by > bit. I have tried deleting the entries with del, but haven't fully figured > out the correct way to clean up the ParseResults. How can I > delete the returned ParseResults? > > I have tested both scanString and parseString for my case, but I > think parseString would be more suitable. Both raise the memory usage. > > Thank you very much for any tips, and huge thanks for pyparsing. Can't comment on pyparsing's fitness for multithreading, but one thing I know for sure: Python's multiprocessing module is a *blast*; it gives you proper scaling and is easy to use. So maybe you can hand the parsing off to a multiprocessing worker, reaping the benefits of real parallelization, plus processes that can be destroyed and re-created to deal with any memory issues whatsoever? Diez |
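A minimal sketch of the worker-process suggestion using only the standard library. multiprocessing.Pool's maxtasksperchild argument replaces each worker process after it has handled that many tasks, so whatever memory a worker has accumulated is released when the process exits. parse_block is a hypothetical stand-in for the real parser call; it returns plain lists rather than ParseResults. The sketch requests the "fork" start method through multiprocessing.get_context (Python 3.4+), which assumes a POSIX system; on Windows, use the default context and keep the pool code under an `if __name__ == "__main__":` guard.

```python
import multiprocessing


def parse_block(script):
    """Hypothetical stand-in for the real parse; returns plain Python data."""
    return script.split()


def parse_all(scripts, workers=2, tasks_per_worker=50):
    # maxtasksperchild: each worker process exits after this many tasks and is
    # replaced by a fresh one, releasing any memory it had accumulated.
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX system
    with ctx.Pool(processes=workers, maxtasksperchild=tasks_per_worker) as pool:
        return pool.map(parse_block, scripts)


if __name__ == "__main__":
    print(parse_all(["a b", "c d e"], tasks_per_worker=1))
    # [['a', 'b'], ['c', 'd', 'e']]
```

A small tasks_per_worker recycles processes aggressively at the cost of extra start-up overhead; a larger value trades some memory growth for throughput.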