pyparsing-users Mailing List for Python parsing module (Page 17)
Brought to you by: ptmcg
From: Boštjan J. <ml...@ja...> - 2008-12-03 07:59:39
|
Hello! I'd like to parse a page by searching whether it contains a given word; after that word, the text follows a known syntax. Let me explain:

<a lot of text of unknown length, with possible line breaks>
<searched word>
<known syntax to parse>
.....

If I try to use Word(alphas), it stops at line 2, character 15 (is there a limit)? Should I just use Python's index method instead? I hope the question/explanation was clear. Boštjan |
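[Editor's note -- not part of the archived thread: the usual pyparsing answer here is SkipTo, which scans over any amount of text, newlines included, until the target expression matches, so Word(alphas) is not needed for the free-form prefix at all. A minimal sketch follows; the "BEGIN" marker and the numeric payload are invented for illustration.]

from pyparsing import SkipTo, Suppress, Literal, Word, nums

marker = Literal("BEGIN")            # stands in for the "searched word"
payload = Word(nums)("value")        # stands in for the known syntax

# SkipTo consumes everything (line breaks included) up to the marker
pattern = Suppress(SkipTo(marker)) + Suppress(marker) + payload

text = """arbitrary text of unknown length
spread over several lines
BEGIN 12345"""

print(pattern.parseString(text)["value"])    # -> 12345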
From: spir <den...@fr...> - 2008-11-19 11:43:01
|
Here is how I imagine the right field of pattern lines -- as of now: Wed 19 Nov 2008, 12:41 here.

=== What fits in?

Post-match jobs: all actions that apply on, or use, results generated according to this precise pattern. This includes:

* mutation: a (post-)parse action that returns a transformed result.
* task: a parse action that performs an additional operation when a match is found -- but leaves the result unchanged.
* format: a kind of intermediate between mutation and task. It will not change the result's content, only its format. This includes packing (= Group-ing), concatenation (= Combine), maybe toList() and Dict().
* pattern name = result type: products, or relevant results, can get a 'nature' field, possibly implemented as an attribute or dict item. It says what kind of result it is, hence no need to reparse when the kinds of results are not predictable: appropriate actions can be launched directly. Grammar/parser directives may let relevant results be automatically typed, using the pattern name.
* 'star' (?) (new idea): product identification, a kind of flag. A special code, e.g. '*', used to identify relevant patterns, meaning the ones needed for further processing.

=== Order

There is a kind of natural order, maybe. It could be, for instance:

action : star? (mutation | task | format)* name?

=== Comments

separator: I am not fully happy with ':' being the separator between the expression and action fields. It is all right, especially because it is the same as the name/expression separator, but there may be better; it is not obvious enough to my taste, actually.

mutations: must obviously be defined elsewhere; this is outside the scope of the grammar itself. They can be built-in types/functions such as int(). Typically, I guess, they cope only with the result as argument.

mutations & formats: it is not clear to me whether the formatted result should then really be of the target type (e.g. list, int) or a ParseResults object holding this content. I am not sure I understand what really happens when using such functions on ParseResults itself. There are very heterogeneous cases:
* int() or whatever: parse action returning a new result object
* asList(): ParseResults method
* Dict(), Group(), Combine(): ParserElement subtypes

=== star: token vs product

Paul chose to write a single-pass parser generator. This is perfectly good, especially when the grammar is in pure code ;-). Still, for the programmer, results do not all have the same status: some are intermediate results (I call them tokens), some are the results one needs to cope with after parsing (products) -- even if only for output. I find it rather natural to make a kind of flagging possible. I have some use for that in mind, but probably there is more. Note: there is no absolute need to write down token patterns; they could be sub-expressions of products. But doing so highly enhances legibility and avoids repetition. This is even more relevant for a text grammar whose first purpose is clarity. Actually, I would support 'starring' (or anything with similar semantics) even if it had no real use: because it makes sense... It may also be stored on the product itself, together with nature/role. Precisely, as I see it, this flag is particularly meaningful in combination with nature and/or role: it shows what's important -- and what's not. The 'star' code would allow the programmer, after parsing, to sort out relevant results. This code would also allow, and filter, automatic actions such as naming, suppression, or whatever. These actions could be controlled with parser 'switch' directives (all default values False).

All of this mess can be written manually for the patterns to which it applies. But global control seems clearer to me and avoids useless grammar overload. Now, this is nothing specific to the text grammar; instead, global control could exist for pyParsing coded grammars, like the present whitespace control.

=== grammar/parser directives

Some rough ideas.

whitespace
_ respect_whitespace (default: ignore)
_ list of whitespace chars (default: same as pyParsing)

product
_ all_products (default: no) -- avoids starring all
_ product_list (default: []) -- alternative to starring in-line
_ type_dict (default: {}) --> automatic instantiation (*)

typing/naming
_ type_all (default: no)
_ type_products

actions
_ product_format (default: None) -- default format, e.g. Dict
_ product_action (default: None) -- default conversion, e.g. int
_ product_task (default: None) -- default side task, e.g. count, add, print
_ all_format (default: None) -- default format, e.g. Dict
_ all_action (default: None) -- default conversion, e.g. int
_ all_task (default: None) -- default side task, e.g. count, add, print

(*) This is a specific need of mine; no idea whether it matches a common requirement. Objects will be instantiated with a type defined by the product's nature, and init data taken from the product itself. It looks like:

type_dict[product.nature](product)

This could be a parse action applying to all products -- but products only.

denis |
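[Editor's note -- not part of the archived thread: in pyparsing as it exists, denis's 'mutation' vs 'task' distinction maps onto parse actions that return a value versus ones that return None. A minimal sketch, with all names invented here.]

from pyparsing import Word, nums, OneOrMore

seen = []

def mutate(tokens):          # "mutation": returns a transformed result
    return int(tokens[0])

def task(tokens):            # "task": side effect only, result unchanged
    seen.append(tokens[0])

integer = Word(nums).setParseAction(mutate).addParseAction(task)

print(OneOrMore(integer).parseString("1 22 333").asList())   # [1, 22, 333]
print(seen)                                                  # [1, 22, 333]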
From: spir <den...@fr...> - 2008-11-19 09:51:21
|
Paul McGuire wrote:
> Ralph -
>
> Actually, spir/Denis is proposing a specialized form of BNF, which would be
> compilable into pyparsing constructs (I think that's where we left this idea
> last), probably using a pyparsing parser.

Exactly. It would be highly useful for a present project of mine: I need to write the grammar at runtime because it depends on user choices. Then I realized that it may be worthwhile for other needs -- a kind of alternative, or design-time, format. That is the reason why I keep my explorations public: it could become a collaborative work. Now, just ignore it if you do not care.

> I do agree, though, that writing up an actual example would be very
> instructive, both in conveying the concepts, and in actually testing the
> syntax out. How about this for a test structure, a log message:

I do agree, too. Simply illustrating, testing and such things are not my cup of tea... to each one his/her favorite playing field. If you're interested, contribute with what you like to do or what you are good at.

> An integer message number
> Date/time timestamp
> Severity (Debug/Info/Warning/Error)
> Message everything up to the end of line
>
> Example:
> 10001 2008/11/18 12:34:56 Error A bad thing happened
>
> Use the syntax to name each field, each field in the date-timestamp, and
> convert the integers to actual integers and the date-timestamp into a
> datetime.datetime (that is a datetime object as imported from the Python
> datetime module).

Anyway... some notes:
* Nothing is fixed in the grammar format yet. All is open to criticism; just propose better alternatives, with rationale.
* There may be parsing directives to allow programmer control (like a preprocessor).
* The right side is a field for all post-parsing actions, including Group-ing, Combine-ing, and pattern naming (= result typing).
* All patterns could be automatically named (using the left-side name) with a "grammar directive": this feature does not fit here, as some patterns are not to be kept (they define tokens, as opposed to products).
* All "post-processed" patterns could be named: that would be all right here. No need to rename on the right side like I did below -- all products would then hold their type "naturally".
* Some fields can be specially typed to catch their special *role* in a specific situation. I wrote the 'text' pattern to illustrate this feature with 'severity': results would get 'word' & 'severity' respectively as .nature & .role attributes.

d : [0-9]
number : d+ : int "number"
date : d4 '/' d2 '/' d2
time : d2 ':' d2 ':' d2
date_time : date time : <> datetime.datetime "date_time"
word : [printable]+
text : word"severity" word* : <> str "text"
message : number date_time text : () "message"
NL : '\10' '\13' '\13\10'
log : message (NL message)+

comments:
~ d, NL: there may be constants for such things --> [digit] [new_line]
~ date, time: use of ints as quantifiers seems both useful and easy.
~ date_time, text: <> means concatenation (= Combine())
~ message: () or {} could mean packing (= Group()). Not to be confused with in-pattern expression grouping with (), too... {} for Dict() instead? See previous & next posts.

I realize that a whole lot of charset constants could be worthwhile, too. Imo, the separation of pattern expression and post-parsing action is very good for legibility. About right-side actions: I will send a separate post on the topic -- I now have a better image of what fits in, and how.

> You can also look back on the examples I've posted over the past few years
> on comp.lang.python and use one of them.
>
> -- Paul
>
> -----Original Message-----
> From: Ralph Corderoy [mailto:ra...@in...]
> Sent: Tuesday, November 18, 2008 2:40 PM
> To: spir
> Cc: [pyParsing]
> Subject: Re: [Pyparsing] text grammar -- first rough
>
> Hi spir,
>
> I don't wish to discourage your using the list as a sounding board, but I
> for one would have more inclination to read in detail if there was an
> example at the top that contrasts how I'd have to write something now in
> PyParsing compared to the better way I could write it with your suggestions.

nou spennn yer prrishuz taym n inerdji ! *I* will not supply samples only for advocating; I like design instead. I do this because I wish to do it, because it fulfills a need of mine and brings me pleasure. If *you* feel like having illustrations, if you think it would be worthwhile for others maybe: do it.

denis

> Cheers,
>
> Ralph.
|
From: Paul M. <pt...@au...> - 2008-11-19 00:50:57
|
Ralph -

Actually, spir/Denis is proposing a specialized form of BNF, which would be compilable into pyparsing constructs (I think that's where we left this idea last), probably using a pyparsing parser. I do agree, though, that writing up an actual example would be very instructive, both in conveying the concepts and in actually testing the syntax out. How about this for a test structure, a log message:

An integer message number
Date/time timestamp
Severity (Debug/Info/Warning/Error)
Message: everything up to the end of line

Example:

10001 2008/11/18 12:34:56 Error A bad thing happened

Use the syntax to name each field, each field in the date-timestamp, and convert the integers to actual integers and the date-timestamp into a datetime.datetime (that is, a datetime object as imported from the Python datetime module). You can also look back on the examples I've posted over the past few years on comp.lang.python and use one of them.

-- Paul

-----Original Message-----
From: Ralph Corderoy [mailto:ra...@in...]
Sent: Tuesday, November 18, 2008 2:40 PM
To: spir
Cc: [pyParsing]
Subject: Re: [Pyparsing] text grammar -- first rough

Hi spir,

I don't wish to discourage your using the list as a sounding board, but I for one would have more inclination to read in detail if there was an example at the top that contrasts how I'd have to write something now in PyParsing compared to the better way I could write it with your suggestions.

Cheers,

Ralph. |
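[Editor's note -- not part of the archived thread: for reference, Paul's test case can be written directly in pyparsing. This is only one possible rendering; the result names and the helper function are chosen here, not taken from the thread.]

from datetime import datetime
from pyparsing import Word, nums, oneOf, restOfLine, Group

integer = Word(nums).setParseAction(lambda t: int(t[0]))

date = Group(integer("year") + "/" + integer("month") + "/" + integer("day"))
clock = Group(integer("hour") + ":" + integer("minute") + ":" + integer("second"))

def to_datetime(tokens):
    # assemble the named sub-fields into a datetime.datetime
    d, c = tokens["date"], tokens["clock"]
    return datetime(d["year"], d["month"], d["day"],
                    c["hour"], c["minute"], c["second"])

timestamp = (date("date") + clock("clock")).setParseAction(to_datetime)

severity = oneOf("Debug Info Warning Error")
log_line = (integer("msgnum") + timestamp("timestamp")
            + severity("severity") + restOfLine("message"))

r = log_line.parseString("10001 2008/11/18 12:34:56 Error A bad thing happened")
print("%s | %s | %s" % (r["timestamp"], r["severity"], r["message"].strip()))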
From: Ralph C. <ra...@in...> - 2008-11-18 20:40:20
|
Hi spir,

I don't wish to discourage your using the list as a sounding board, but I for one would have more inclination to read in detail if there was an example at the top that contrasts how I'd have to write something now in PyParsing compared to the better way I could write it with your suggestions.

Cheers,

Ralph. |
From: spir <den...@fr...> - 2008-11-18 17:52:57
|
purpose
* lighter grammar to manipulate at design time
* standard & easily legible format to communicate
* conversion at design- or run-time -- adaptable

design goals
* clear, clear, clear
* adapted to pyParsing's way
* limited, simple, not full-featured

contents: vocabulary, organisation, expressions, actions, features

=== v o c a b u l a r y ==============================

This list is intended to help identify elements, factors, and views on the topic of a text grammar for pyParsing -- to help, and not mess up, things, notions, words. Feel free to change, add, redefine. Unsupported features are not listed here.

literal: a literal string to be taken as is. Usually in quotes or apostrophes.

set choice: a set as a choice list of character literals. Expressed like a (regexp-like) range.

alternative choice: expression of a choice between alternative possibilities. In pyParsing, MatchFirst = '|'. Question: support of longest_match = Or = '^'?

row: a sequence of items that must appear sequentially for a pattern to match. Expressed with ',' in BNF, with '+' in pyParsing code. Question: support of '&' = Each?

quantifier: an operator used to define the allowed number of (sub-)expressions. Usually '?', '*', '+'. Possible support of literal numbers for exact length.

item: any part of a pattern expression: literal, token, range, or a (sub-)group. May be quantified, named, grouped.

group: grouping of (sub-)expressions to allow use of quantity, naming, or any other overall action on the whole group. Usually expressed as '(...)'.

token: a (sub-)pattern result expressed for the sake of clarity or to avoid repetition; appears in the expression of a (super-)pattern. Can be opposed to 'product'.

product: a pattern result relevant for the application, either as a final result item or for further processing. Often contains tokens. Usually packed, concatenated, and/or mutated. May be automatically named/typed.

pack: ~ pyParsing Group(). Expression of a product as a sequence of tokens.

concatenation: ~ pyParsing Combine(). Expression of a product as a concatenation of tokens.

mutation: ~ pyParsing parseAction which returns a value. Expression of a product as a transformation of a raw parse result, using a built-in or custom function.

task: ~ pyParsing parseAction with no return value. A task is an action executed when a match is found, but it does not change the result.

id, name: identifier of a pattern. Will become attribute 'nature' of its products.

use, use case: use of a (sub-)pattern in a specific situation inside a super-pattern, defined with a custom name. Will become attribute 'role' of its products.

=== o r g a n i s a t i o n =============================

For the sake of clarity, useless symbols that do not actually express anything will be avoided: e.g. the standard (E)BNF ':==', the ',' separator, the final ';', and '[...]' for options will not be used. These characters may still be used in an expressive function.

We need:
* pattern name: left side
* pattern expression: middle side
* pattern action(s): right side

As the name will become a code identifier, it will hold no whitespace. There is thus no need for a (visible) separator with the expression; whitespace is enough. Still, using e.g. ':' may help legibility. We need a separator for the action part of a line. It can be the same as the first separator, if any, or another one. I would propose, as the simplest solution, to use ':' for both separations (id est, BNF ':==' reduced to ':'), which leads to

name : expression : action
---------------------------------------------
integer : [0-9]+ : int
decimal : integer '.' integer : float
number : decimal | integer : "num"
add : {number '+' number} : "add"
calc : add+

where:
* an action without quotes is a function call
* an action with quotes is pattern naming (--> result nature)
* {} may be packing = Group-ing

pattern : name ':' expression (':' action*)

=== e x p r e s s i o n =========================

Here the pattern expression elements are described -- and also expressed using the in-progress specification of pyParsing's text grammar itself ;-)

=== literal

A literal is expressed in quotes. We may choose only apostrophes (simple quotes) for this, which allows use of double quotes for another function -- such as identifying use cases, or concatenation. Backslash is used for escaping single chars which happen to be grammar codes -- first of all, backslash itself. It is also used to identify characters by ordinals (code numbers). I would rather support only decimal & hex (with 'x') ordinals, and have these ordinals be of fixed length, resp. 3 & 2 digits, left-padded with '0'. This is more legible & simpler, esp. when a number follows an ordinal. (Below, dd means decimal digit, hd means hex digit.)

char : char_literal | '\\' dd dd dd | '\\x' hd hd
literal : '\'' char+ '\''

=== set/choice

A set is written inside []; it may contain both characters and ranges. Unlike pyParsing's srange() function, it basically expresses a *choice*, with no further need of Or, MatchFirst, or oneOf. For instance, [A-Z] is more or less equivalent to MatchFirst(list(srange("[A-Z]"))) or Word(srange("[A-Z]"), exact=1). A set can be negated with a negation code: I would support the use of '!' for this role. See quantifiers below for the expression of Word(set). Ranges are expressed as usual with a pair of characters separated by '-'. A range can also be named using a pyParsing term (like 'nums' (*)). Characters can be identified either literally, or using their ordinal (code number) preceded by the escape char.

range : (char '-' char) | range_name
set : '!'? '[' (range | char)+ ']'

(*) Ranges could be renamed! E.g. nums --> digit or dec_digit. Also, no plural: a set, which holds ranges, expresses a choice.

=== token

A token inside another pattern is identified by its name, with no further character.

token : identifier

=== group

A group of tokens is built with (...). This allows use of quantities, role identifying, maybe more.

group : '(' expression ')'

=== item row

Items that must appear in a row are simply written one after another. A space is a sufficient separator; no use of ',' or '+' is needed. I would support that a space separate items in a row even when not necessary for parsing: this seems both simpler and clearer.

row : item (' ' item)+

=== quantifier

todo ******************

=== a c t i o n s ====================================

Actually naming, packing, and concatenation, as well as what are called parse actions, can all be seen as parse actions, in the sense that they are additional jobs done on, or with, a result -- once this result has been distinguished. Right? This applies both to actions on results of full patterns and to ones executed on sub-expressions. About the latter case, I wonder whether including this feature in a text grammar is really useful -- except for naming of *roles*. Reasons are:
~ It may rarely be used. (?)
~ It complicates the grammar.
~ The programmer can define a sub-pattern to apply the action on.
~ The action can be added once the text grammar is converted into code.

If needed, the syntax should use the same codes as the ones for whole-pattern actions, so that these codes allow wrapping a sub-expression. I will leave this aside for now, only keeping the requirement in mind, and concentrate on actions on whole patterns. As all these kinds of parsing actions apply on results, they all fit well in the right side of a line. This has the side advantage of avoiding expression overload. Examples:

integer : [0-9]+ : int "int"
	--> convert integer to a python int; set its 'nature' attribute to "int"
decimal : integer '.' integer : <> float
	--> concatenate decimal, convert it to a python float
mult : num '+' num : {}
	--> pack mult into a pyParsing Group (sequence)

I find the organisation really convenient, and the choice of codes rather easy to use. An alternative would be to use () for packing = Group-ing, which would leave {} free for Dict-ing. But then it would not be possible anymore to pack/Group() sub-expressions (which I personally do not find a serious drawback -- but others may). Naming of pattern use cases, i.e. result roles, happens rather naturally inside expressions:

decimal : integer "l_int" '.' integer "r_int" : <> float

Now, a possible issue remains: there are actually 2 kinds of parse actions:

* Tasks (think of Pascal procedures) execute an additional job when a matching string is found. Example: count the number of integers, add them, print them. We could call this kind of action 'tasks'. Naming can be seen as a kind of task.

* Mutations return a transformed result. Very different, I guess. Grouping and combining, even if they do not really change the result's content, should rather be seen as mutations, as they change at least its type.

The point is that both kinds of actions are implemented the same way -- the reason why they are, logically enough, both called 'actions'. But I guess they obviously do not serve the same kind of purpose; they have a very different sense, meaning a different semantic, from the point of view of the programmer. Tasks can be seen as parsing side effects. Mutations also mean that the potential results of the pattern are relevant for the application: they are products. As a consequence, I would prefer expressing tasks and mutations differently. This would also allow distinguishing relevant results. We may prefix mutations with e.g. '->', which is imo quite obvious:

integer : [0-9]+ : ->int "int"
decimal : integer '.' integer : <> ->float

Now, I have no clue which code would be sensible & meaningful for tasks, if any.

=== n a m i n g ,   t y p i n g ====================================

When a result is generated by a whole pattern, the pattern's name, if any, defines the result's type -- or more precisely its *nature*. When results are relevant, they should be able to hold their nature by default. When a (sub-)pattern is used inside one or more other (super-)patterns, it can express that its results have several possible *roles*. Below, 'integer' and 'decimal' define product natures. I used setName("id") for this.

integer = Word(nums).setName("integer")
decimal = Combine(integer("int_part") + '.' + integer("dec_part")).setName("decimal")
num = decimal | integer
mult = num("left-num") + '*' + num("right-num")

Depending on the situation, the integer pattern can be used as the int_part or the dec_part of a decimal. These terms thus define product roles instead. Moreover, both integers and decimals can be nums, and as such used as left- or right-operands of a multiplication, which are different roles again.

I find all of this relevant. I would really like both kinds of pattern naming / result typing to be possible. The present implementation of pyParsing is such that achieving this goal is not straightforward -- but still possible. We could do it so:

* The programmer selects which kinds of results are relevant: switches in the text grammar can define which results should then know about their type (nature and/or role).
* Patterns know 'who' they are: write the code grammar in a scope (class or separate module), and use the scope's dict to set an 'id' attribute on each pattern. There is no need to name whole patterns manually, as their name is anyway the variable name. Still, let it be possible to use another name? Rather call this 'id' than 'name', which is used for debugging.
* Patterns can define a use case: implemented with setResultsName("use") or simply the ("use") pseudo-call. We can copy it to a 'use' attribute to avoid confusion.
* Results, or only selected ones, hold a reference to their source pattern: either with a parse action that adds a dict item, or as an argument at the result's initialisation.
* Selected results read relevant info from the pattern: either with an additional parse action, or at initialisation when they have the proper reference. Storage can be done as an attribute or a dict item.

=== f e a t u r e s ===================================

A list of non-basic, yet unsupported, features to think about. Note that a few additional codes or syntactic forms may well compromise the grammar's overall readability. Furthermore, individual operations can easily be added by hand once the text grammar is translated to pyParsing code. Not to forget is the main goal of clarity, which fits well with decomposition of complex expressions. As a consequence, here is a list of criteria to consider for supporting a feature:
--> very common need (how to measure that?)
--> difficult to express by combining existing ones
--> obvious format that fits well in the grammar

operations
* '!' or '~' not (overall negation)
* '&' = Each() (unordered And)
* '^' = Or() (longest match)
* FollowedBy() (look ahead -- right-side condition)
* '~' = NotAny (!FollowedBy())
* SkipTo() (as the name says)
* Forward --> recursive patterns: useless in a text grammar?

tokens
* Keyword()
* White()
* QuotedString()
* CharsNotIn()
* overall CaseLess()
* Regexp

helpers
* nestedExpr()
* delimitedList()
* OnlyOnce()

actions
* Suppress()
* Dict()
* replaceWith()

switches / parameters
Switches are all False by default (easy to remember). Params have the same default values as in pyParsing (ditto). Switches/params can change (between lines).
* auto-naming of all patterns
* auto-naming of pattern class
* define important (product) / incidental (token) results
* respect/ignore whitespace
* set of whitespace chars |
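[Editor's note -- not part of the archived thread: the small calc grammar denis writes in his proposed notation under "organisation" above translates almost line for line into pyparsing. The sketch below mirrors his left-hand names and is an editorial illustration, not part of the proposal.]

from pyparsing import Word, nums, Combine, Group, OneOrMore

# integer : [0-9]+ : int
integer = Word(nums).setParseAction(lambda t: int(t[0]))
# decimal : integer '.' integer : float
decimal = Combine(Word(nums) + "." + Word(nums)).setParseAction(lambda t: float(t[0]))
# number : decimal | integer : "num"
number = (decimal | integer)("num")
# add : {number '+' number} : "add"
add = Group(number + "+" + number)("add")
# calc : add+
calc = OneOrMore(add)

print(calc.parseString("1+2 3.5+4").asList())   # [[1, '+', 2], [3.5, '+', 4]]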
From: Paul M. <pt...@au...> - 2008-11-16 18:52:29
|
Denis -

Thanks for your contributions on this list - please don't be discouraged if you don't get many replies to your messages. The readers here are mostly lurkers (not that there's anything wrong with that!), or folks who post with particular questions they need answered. I too look at the list as something of an archive of past pyparsing discussions. I've had a chance to look over your notes in brief, but I really want to give them more thought and consideration than I can spare just now.

In general, I would say that the best way to start prototyping the integration of your ideas with pyparsing as it exists is through parse actions (for embellishing parse results) and helper methods (to simplify the construction of expressions, or to link them to parse actions). Whatever you do, I *strongly* suggest that you make these enhancements under the control of the developer, and not automatically applied across the board. Pyparsing creates many intermediate parse expressions when building the grammar, and parse results when parsing the source text, many of which get either discarded or absorbed into larger expressions. Also, if your enhancements stay in the realm of extensions that users may add or not of their own choosing, then forward compatibility of existing code will be preserved, and it will be easier to add your ideas to the core pyparsing code.

Here is one idea that might address your question about linking results to the original pyparsing expression (untested):

from pyparsing import *

def linkToResults(expr):
    def parseAction(tokens):
        tokens["expr"] = expr
    expr.addParseAction(parseAction)

integer = Word(nums).setName("integer")
decimal = Combine(integer("int_part") + '.' + integer("dec_part"))
linkToResults(decimal)
print decimal.parseString("3.14159").dump()

Which prints:

['3.14159']
- dec_part: 14159
- expr: Combine:({integer "." integer})
- int_part: 3

Now the parse results that get created by expr will contain an added field named "expr" that points back to the original expression (as shown by the dump() output). If this works well as a prototype, then it may be just as easy to add it as a member function of ParserElement, so that any expression can be linked back to from its results.

-- Paul |
From: spir <den...@fr...> - 2008-11-16 18:27:59
|
Hello, pyParsing world!

[rough version -- I have not read over this message]

-0- intro ========

Now is the time for serious things! Below is a kind of study on a subject I have brought up & approached several times already: result typing. Here 'type' is primarily used in its non-technical sense. Probably there are whole fields of parsing use where result types are not that important: each time, in fact, that the types of results are predictable, they need not be explicitly defined. For instance, a file may contain, or one may extract, only data of a single type. Or a file format may define the types of data in a constant repeated order, such as x y color x y color... Still, the general situation, I guess, is that we cannot predict in which order the types of valuable data will occur in source texts, so that having them in the results would be highly helpful. This especially applies when parsing texts written in any kind of /language/. The type of a result is similar to that of any other kind of data item: it carries the sense of the result. Without it, we are unable to do anything with it; just as, without a type, we -- as well as the language "decoder" itself -- do not even know what kind of operation may apply to a bit of data. If the results do not hold their type, we are obliged to re-parse them only to determine what kind of thing they are. pyParsing provides two functions to give patterns names -- I will talk about them later.

-1- pattern idS, result typeS ========

What is the name of a pattern? What is the type of a result? What, actually, is the link between patterns & results? Patterns define or generate results. They are /classes/ of results, in a similar manner as (programming) types are classes of instances. Actually, patterns could be (programming) types -- but this wouldn't fit in pyParsing. Results are like pattern samples; they share characteristics which are specified in/by patterns. A pattern identifier (name, id) thus defines its potential results' type.

pattern object <--> result type
pattern object id <--> result type id

Now, there are actually several kinds of pattern IDs, matching several kinds of result types:

integer = Word(nums).setName("int")
decimal = integer("int_part") + '.' + integer("dec_part")

Basically, a pattern usually defines the /nature/ of results, as in the first line above. Now, a single pattern may have several use cases, as in the second line, which define several results' /roles/. I intentionally used setName to define pattern names and setResultsName (abbreviated as a call) to define use cases -- but obviously nothing forces us to do that. The example can be extended to show the difference between result nature and role more acutely:

integer = Word(nums).setName("integer")
decimal = Combine(integer("int_part") + '.' + integer("dec_part")).setName("decimal")
num = decimal | integer
mult = num("left-num") + '*' + num("right-num")

Both integers & decimals (nature) may be left-nums or right-nums (role).

pattern id <--> result nature
pattern use <--> result role

Depending on the application, a result's nature, role, or both may be relevant information.

-2- pyParsing =======================

As I have used pyParsing for a few weeks only, I may say stupid things. But I have tried hard to find friendly ways to get such info from parse results -- and I could not find any. Actually, I ended up with:
* additional data on patterns
* a custom result type
* changes in pyParsing code

First, patterns basically do not know anything about themselves. Especially, they do not know who they are, not even their (variable) name. If patterns were types, they would know it; but custom types do not have a __name__ attribute to receive their (variable) name. Pity. We can nevertheless give a pattern a name with setName, or setResultsName. The main problem anyway is that there is no interconnection between patterns and results. A result has no access to the pattern that yielded it, not even a simple reference. A pattern only passes the resultsName at result init time:

pattern --o--> results : resultsName only
pattern <--x-- results : nothing

An additional obstacle comes from the protection of result access by __slots__, for performance reasons, which prevents setting/reading custom attributes. Fortunately, patterns are not protected.

-3- letting patterns know ===============

We can use a simple trick to let patterns know a bit about themselves. If they are put in a scope (e.g. a separate module or class), we have access to a dict that holds names and objects together. With that information, we have all we need to tweak the patterns' guts. Assuming the grammar is in a class, we could even have a class method to do the job. [Note: the attribute can't be called '.name', as this name (!) is used by pyParsing to format pattern repr output, esp. for error display.] It may look like that:

class Grammar(object):
    ''' pyParsing grammar '''
    integer = Word(nums)
    decimal = Combine(integer("int_part") + '.' + integer("dec_part"))
    num = (decimal | integer).setName("num")
    mult = Group(num("left-num") + '*' + num("right-num"))
    calc = OneOrMore(mult)

    @classmethod
    def _setNames(Grammar):
        ''' give patterns their name '''
        # exclude '_*' names
        attribs = Grammar.__dict__.items()
        namedPatterns = filter(lambda (name,pattern): name[0]!='_', attribs)
        # set .id attributes
        for (name,pattern) in namedPatterns:
            pattern.id = name
        Grammar.patterns = [pattern for (name,pattern) in namedPatterns]

Grammar._setNames()
for pattern in Grammar.patterns:
    print "%s: %s" %(pattern.id,pattern)

===>>
num: num
integer: W:(0123...)
calc: {Group:({{num "*"} num})}...
decimal: Combine:({{W:(0123...) "."} W:(0123...)})
mult: Group:({{num "*"} num})

Now we have a proper tool to automatically name patterns. Manual setName is no longer necessary; it can serve more specific needs, such as delivering clearer info to users. We are ready to transmit to results information about their use; resultsName can set info about use cases.

-4- results structure ====================

I have posted a message displaying a type called 'Data'. [Still not really all right -- I have discovered a bug.] If ever results could magically receive the information that patterns now hold about their nature and role, we could use Data objects to properly hold and display typed results. Output may then look like that:

calc:[mult:[dec:1.1 <str>:* int:2] mult:[int:1 <str>:* dec:2.2]]

The types shown as prefixes would be taken from the most accurate information available:
* role = pattern use (e.g. left_num)
* nature = pattern id (e.g. integer)
* pattern format (as presently held in .name, e.g. W:(0123...))
* result type (e.g. <str> or <int>)

Now, we have to find a way to let the results know about all that.

-5- passing info to results =====================

As results have no access to patterns, we are presently blocked. If we just gave them a reference to the patterns, we would be unblocked. I did some explorations & trials, and it seems all right. Things to do:

* Add specific fields to patterns: id, nature.
* Add a reference to the pattern at the result's instantiation. This happens 3 times in the method _parseNobuffer of the class ParserElement; 'self' can be added there as a new argument for result initialisation. For instance:

retTokens = ParseResults(tokens, self.resultsName, asList=self.saveAsList, modal=self.modalResults, pattern=self)

This arg becomes a 'pattern' param in ParseResults' __new__ & __init__ methods.

* Add private attribs to ParseResults. In __init__:

self.__pattern = pattern
self.__nature = pattern.id
self.__role = pattern.use

And matching accessors (because access is protected), e.g.:

def pattern(self):
    return self.__pattern

denis |
From: spir <den...@fr...> - 2008-11-16 10:14:49
|
Hello,

[As I don't know if there is anybody else on this list, well... I use it like a log for ideas and trials using pyParsing, and an opportunity to express them clearly (?). denis]

Here is an implementation of a custom type used to give parse results an alternative structure, and an illustration of what it is intended for. Data (sic!) is primarily used to natively give (nested) parse results a /type/. I will come back to this point of view, that results should be typed, in a further message. So, Data allows results to have a type -- like ordinary data, hence the name -- and to show in a type:content format.

Actually, the implementation is uselessly complicated, because presently it is able to receive content from several kinds of sources: a final parse result, parse results created during the parsing process, ordinary data, or objects that already are of type Data. This makes its typing and content reading overly complex. (ToDo: implement __new__ for the case when the content is a Data object.) Additionally, it holds currently useless list-like operator overloading. For more specific use, it could be written in a dozen lines, as it was before. I also added a Seq type to avoid a problem with built-in lists. Additionally, Data is able to receive content from any kind of simple or sequential object.

The type property may be defined from several sources, here listed from the most specific to the least:
* arg passed at init [ParseResults object only]
* ResultsType retrieved from getName() [ditto]
* pattern's .use or .ResultsType
* pattern's .id or .name
* pattern's type_name
* result's own type_name

Some sources of info listed above for typing an object belong in fact to further explorations about pattern naming that I will present in another post. Here is the Data thing:

def typ(obj):
    return obj.__class__.__name__

class Seq(list):
    ''' specialized sequence type with improved str
        Overrides list's behaviour of calling repr instead of str on items. '''
    def __str__(self):
        if len(self) == 0:
            return '[]'
        text = str(self[0])
        for item in self[1:]:
            if isinstance(item, list):
                item = Seq(item)
            text += " ,%s" %item
        return "[%s]" %text

class Data(object):
    ''' nestable type:content object with built-in toolset '''
    def __init__(self, content, type=None, pattern=None):
        ''' store startup data '''
        self.type = type
        # read info from pattern, if available
        self.read_pattern(pattern)
        # case content is ParseResults: extract proper info
        if isinstance(content,ParseResults):
            content, self.type = self.from_result(content, type)
        # case (new) content is Data object: copy
        if isinstance(content,Data):
            self.type, self.pattern = content.type, content.pattern
            self.content, self.isSimple = content.content, content.isSimple
        # case content is ordinary data: record it
        else:
            self.content = self.recursive_record(content)
        # define type if not given by user, nor read from pattern
        if not self.type:
            self.type = "<%s>" %typ(self.content)
        #print "* new Data - %s" %self

    def read_pattern(self,pattern):
        ''' if available, read info from pattern about source of result '''
        self.pattern = pattern
        self.nature = self.role = self.pattern_type_name = None
        # get info about source of result
        if pattern:
            # pattern_type_name (e.g. Literal, MatchFirst, Group...)
            self.pattern_type_name = typ(pattern)
            # role <-- pattern.use: pattern use case
            try:
                self.role = pattern.use
            except AttributeError:
                self.role = pattern.resultsName
            # nature <-- pattern name/id : pattern naming
            try:
                self.nature = pattern.id
            except AttributeError:
                try:
                    self.nature = pattern.name
                except AttributeError:
                    pass
            # if not yet set, try and define type from this info
            if not self.type:
                if self.role:
                    self.type = self.role
                elif self.nature:
                    self.type = self.nature
                elif self.pattern_type_name:
                    self.type = "<%s>" %self.pattern_type_name

    def from_result(self,content,type):
        ''' define properties from result data '''
        # try & set type from user-defined info
        if (not type) and content.getName():
            type = content.getName()
        # jump inside Group if not sequence
        if len(content)==1:
            content = content[0]
        # take result as list
        if isinstance(content,ParseResults):
            content = content.asList()
        return content, type

    def recursive_record(self,content):
        ''' record content according to its structure '''
        # return if isSimple
        if not isinstance(content,list):
            self.isSimple = True
            return content
        # === case complex / nested
        # mutate each nested item to Data object
        # (may already be a Data object -- or not)
        content = Seq(content)
        self.isSimple = False
        seq = Seq()
        for item in content:
            if isinstance(item,Data):
                seq.append(item)
            else:
                seq.append(Data(item))
        return seq

    def treeView(self, noType=False, showGroup=False, level=0):
        ''' return full & legible tree view of object's data '''
        tree = ''
        # this level's line
        tree += level * '\t'
        if not noType:
            tree += "%s: " %self.type
        if self.isSimple or showGroup:
            tree += "%s" %self.content_text()
        tree += "\n"
        # recursion for nested results
        if not self.isSimple:
            for item in self.content:
                tree += item.treeView(noType, showGroup, level+1)
        # final result
        return tree

    def leaves(self, noType=False):
        ''' return a flat list of 'terminal', low-level object items -- actually called 'leaves' '''
        seq = Seq()
        # case simple result : add content to seq
        if self.isSimple:
            if noType:
                seq.append(self.content)
            else:
                seq.append(self)
        # case compound result: recursively explore nested result
        else:
            for item in self.content:
                seq.extend(item.leaves(noType))
        return seq

    def allFlat(self, noType=False):
        ''' return full flat list of object's items -- either compound or simple '''
        seq = Seq()
        # in all cases : add content to seq
        if noType:
            seq.append(self.content)
        else:
            seq.append(self)
        # case compound result: recursively explore nested result
        if not self.isSimple:
            for item in self.content:
                seq.extend(item.allFlat(noType))
        return seq

    def __len__(self):
        try:
            return len(self.content)
        except TypeError:
            return 0
    def __getitem__(self,index):
        return self.content[index]
    def __getslice__(self,i1,i2):
        return self.content[i1:i2]
    def __repr__(self):
        ''' type:content format '''
        return "%s:%s" %(self.type, self.content_text())
    def content_text(self):
        ''' content expression for either simple or sequential content '''
        # case simple content: just output as is
        if self.isSimple:
            return str(self.content)
        # case compound content: recursive text seq in []
        else:
            text = str(self.content[0])
            for item in self.content[1:]:
                text += " %s" %item
            return "[%s]" %text

Below are illustrations for two use cases:

-1- Parsing is done normally. The results feed a Data object. Both normal and Data results are printed, so that the difference is made clear. Additionally, the tree view, the 'leaves' & the flat list of all-level nested results are also shown -- see my previous post for more info about these latter things. Contained results are recursively converted into Data objects, so that all end up typed.

-2- The parser is cheated to make it 'natively' return Data objects instead of ParseResults ones. Actually, for the sake of illustration, only important (named) results are converted. But it makes no difference to convert all, for anyway nested results will be recursively converted into Data objects.

# === Data retrieved from final parse results ==================
class Grammar(object):
    # tokens
    integer = Word(nums)
    integer.setParseAction(lambda i: int(i[0]))
    point = Literal('.')
    decimal = Combine(integer + point + integer)
    decimal.setParseAction(lambda x: float(x[0]))
    #decimal = Group(decimal)("dec")
    add = Literal('+')
    mult = Literal('*')
    # symbols
    num = decimal | integer
    mult_op = Group(num + mult + num)("mult_op")
    add_op = Group((mult_op|num) + add + (mult_op|num))("add_op")
    #group = Group(l_paren + in_op + r_paren)("group")
    operation = (add_op|mult_op)
    calcs = OneOrMore(operation)("calcs")
calcs = Grammar.calcs

# source text
text = "1+2.2*3 4.4*5+6.6"
print text

# standard result
results = calcs.parseString(text)
print "=== standard results:", results

# custom use & output
data = Data(results)
print "\n=== data:\n", data
print "\n=== default treeview :\n", data.treeView()
print "\n=== treeview with group w/o lead type:\n", data.treeView(showGroup=True, noType=True)
print "\n=== show lowest-level flat sequence:\n", data.leaves()
print "\n=== show lowest-level flat sequence w/o type:\n", data.leaves(noType=True)
print "\n=== show flat sequence of items on all levels /lines :"
for item in data.allFlat():
    print item

# === Data 'natively' returned by parsing process ===============
class Grammar(object):
    # tokens
    integer = Word(nums)
    integer.setParseAction(lambda i: int(i[0]))
    point = Literal('.')
    decimal = Combine(integer + point + integer)
    decimal.setParseAction(lambda x: float(x[0]))
    #decimal = Group(decimal)("dec")
    add = Literal('+')
    mult = Literal('*')
    # symbols
    num = (decimal | integer)("num")
    mult_op = Group(num + mult + num)("mult_op")
    add_op = Group((mult_op|num) + add + (mult_op|num))("add_op")
    #group = Group(l_paren + in_op + r_paren)("group")
    operation = (add_op|mult_op)
    calcs = OneOrMore(operation)("calcs")
    #integer.addParseAction(toData)
    #decimal.addParseAction(toData)
    #mult_op.setParseAction(toData)
    #add_op.setParseAction(toData)
    #calcs.setParseAction(toData)

    @classmethod
    def _setToData(Grammar):
        patterns = filter(lambda(n,p): n[0]!='_', Grammar.__dict__.items())
        print "patterns: %s" %([name for (name,pattern) in patterns])
        named_patterns = filter(lambda(n,p): p.resultsName, patterns)
        print "named patterns: %s" %([name for (name,pattern) in named_patterns])
        for name, pattern in named_patterns:
            pattern.setParseAction(lambda result: Data(result))

print
print "\n========================================\n"
Grammar._setToData()
calcs = Grammar.calcs

# standard result
results = calcs.parseString(text)
print "=== standard results holding data: %s:\n%s" %(results.__class__, results)
data = Data(results)
print "\n=== data: %s:\n%s" %(data.__class__,data)
print "\n=== data treeview :\n", data.treeView()
print "\n=== data leaves:\n", data.leaves()
print "\n=== show flat sequence of items on all levels /lines :"
for item in data.allFlat():
    print item

======================================================
O U T P U T
======================================================

C:/prog/ACTIVE~1/pythonw.exe -u "D:/prog/parsing/Data.pyw"
1+2.2*3 4.4*5+6.6
=== standard results: [[1, '+', [2.2000000000000002, '*', 3]], [[4.4000000000000004, '*', 5], '+', 6.5999999999999996]]

=== data:
calcs:[<Seq>:[<int>:1 <str>:+ <Seq>:[<float>:2.2 <str>:* <int>:3]] <Seq>:[<Seq>:[<float>:4.4 <str>:* <int>:5] <str>:+ <float>:6.6]]

=== default treeview :
calcs:
	<Seq>:
		<int>: 1
		<str>: +
		<Seq>:
			<float>: 2.2
			<str>: *
			<int>: 3
	<Seq>:
		<Seq>:
			<float>: 4.4
			<str>: *
			<int>: 5
		<str>: +
		<float>: 6.6

=== treeview with group w/o lead type:
[<Seq>:[<int>:1 <str>:+ <Seq>:[<float>:2.2 <str>:* <int>:3]] <Seq>:[<Seq>:[<float>:4.4 <str>:* <int>:5] <str>:+ <float>:6.6]]
	[<int>:1 <str>:+ <Seq>:[<float>:2.2 <str>:* <int>:3]]
		1
		+
		[<float>:2.2 <str>:* <int>:3]
			2.2
			*
			3
	[<Seq>:[<float>:4.4 <str>:* <int>:5] <str>:+ <float>:6.6]
		[<float>:4.4 <str>:* <int>:5]
			4.4
			*
			5
		+
		6.6

=== show lowest-level flat sequence:
[<int>:1 ,<str>:+ ,<float>:2.2 ,<str>:* ,<int>:3 ,<float>:4.4 ,<str>:* ,<int>:5 ,<str>:+ ,<float>:6.6]

=== show lowest-level flat sequence w/o type:
[1 ,+ ,2.2 ,* ,3 ,4.4 ,* ,5 ,+ ,6.6]

=== show flat sequence of items on all levels /lines :
calcs:[<Seq>:[<int>:1 <str>:+ <Seq>:[<float>:2.2 <str>:* <int>:3]] <Seq>:[<Seq>:[<float>:4.4 <str>:* <int>:5] <str>:+ <float>:6.6]]
<Seq>:[<int>:1 <str>:+ <Seq>:[<float>:2.2 <str>:* <int>:3]]
<int>:1
<str>:+
<Seq>:[<float>:2.2 <str>:* <int>:3]
<float>:2.2
<str>:*
<int>:3
<Seq>:[<Seq>:[<float>:4.4 <str>:* <int>:5] <str>:+ <float>:6.6]
<Seq>:[<float>:4.4 <str>:* <int>:5]
<float>:4.4
<str>:*
<int>:5
<str>:+
<float>:6.6

========================================

patterns: ['mult_op', 'point', 'decimal', 'calcs', 'add', 'num', 'add_op', 'integer', 'operation', 'mult']
named patterns: ['mult_op', 'calcs', 'num', 'add_op']
=== standard results holding data: <class 'pyparsing.ParseResults'>:
[calcs:[add_op:[num:1 <str>:+ mult_op:[num:2.2 <str>:* num:3]] add_op:[mult_op:[num:4.4 <str>:* num:5] <str>:+ num:6.6]]]

=== data: <class '__main__.Data'>:
calcs:[add_op:[num:1 <str>:+ mult_op:[num:2.2 <str>:* num:3]] add_op:[mult_op:[num:4.4 <str>:* num:5] <str>:+ num:6.6]]

=== data treeview :
calcs:
	add_op:
		num: 1
		<str>: +
		mult_op:
			num: 2.2
			<str>: *
			num: 3
	add_op:
		mult_op:
			num: 4.4
			<str>: *
			num: 5
		<str>: +
		num: 6.6

=== data leaves:
[num:1 ,<str>:+ ,num:2.2 ,<str>:* ,num:3 ,num:4.4 ,<str>:* ,num:5 ,<str>:+ ,num:6.6]

=== show flat sequence of items on all levels /lines :
calcs:[add_op:[num:1 <str>:+ mult_op:[num:2.2 <str>:* num:3]] add_op:[mult_op:[num:4.4 <str>:* num:5] <str>:+ num:6.6]]
add_op:[num:1 <str>:+ mult_op:[num:2.2 <str>:* num:3]]
num:1
<str>:+
mult_op:[num:2.2 <str>:* num:3]
num:2.2
<str>:*
num:3
add_op:[mult_op:[num:4.4 <str>:* num:5] <str>:+ num:6.6]
mult_op:[num:4.4 <str>:* num:5]
num:4.4
<str>:*
num:5
<str>:+
num:6.6 |
From: spir <den...@fr...> - 2008-11-12 15:35:08
|
Hello,

If ever there is someone alive on the list, these days... I had some trouble understanding the output structure of parsing results; it is probably very different from what I was +/- unconsciously expecting. Below are some tools used to get more usable results, according to my very personal taste. Possibly some of you may find them useful -- or know better ways to obtain similar results. Comments welcome.

listAll() is used to get a flat list out of nested results, with compound and included results listed in sequence. It avoids the need to recursively walk through the nested structure when an action has to be performed on each single result. By default, listAll actually returns (type, content) tuples, else simple results. The type is given by typ(): either the type set at pattern definition (with ('type') or setResultsName('type')), or the 'real' type of the result. I use listAll e.g. for instantiating objects whose types are given by the result's type (mapping) and whose init data are taken from the content of the results. For instance:

for item in listAll(calc.parseString(text)):
    Type = typ(item[0])
    data = item[1]
    symbols.append(Type(data))

Well, in fact, I don't really use it like that anymore, because:
* listAll (and the other funcs below) now exist as ParseResults methods.
* I created a custom result type that natively holds type and content as fields, so that parseString now returns objects of that kind for all named results (through a dedicated parse action).

pickLeaves() is very similar to listAll, except that it skips all compound results to jump inside instead -- at any level -- and retains only 'terminal' ones: hence its name. Very nice to get a low-level overview of the results. If these build a complete representation of the source, then pickLeaves gives it back at the lowest *relevant* level, as defined by the various grouping patterns in the grammar.

treeView() builds a (python-like) hierarchical picture of the results. Compact and clear. Both are mainly intended for testing. Both also return types (as given by typ()) by default, in addition to the content.

showSeq() can be used to properly format pickLeaves screen output. This also applies to listAll.

denis

#=============================
def typ(result):
    try:
        #print "type --- %s:%s" %(result.getName(),result)
        return result.getName()
    except AttributeError:
        return "<%s>" %result.__class__.__name__

def listAll(tree, noType=False):
    seq = []
    for part in tree:
        isValue = not(isinstance(part,ParseResults))
        isSimple = (not isValue) and (len(part) == 1)
        # case simple result
        if isValue:
            if noType:
                seq.append(part)
            else:
                seq.append((typ(part),part))
        elif isSimple:
            if noType:
                seq.append(part[0])
            else:
                seq.append((typ(part),part[0]))
        # recursively explore nested result
        else:
            if noType:
                seq.append(part)
            else:
                seq.append((typ(part),part))
            seq.extend(listAll(part, noType))
    return seq

# o u t p u t   f u n c s
def showSeq(seq):
    if len(seq) == 0:
        return ''
    # define if seq holds types, or not
    noType = not isinstance(seq[0],tuple)
    # build return text
    text = seq[0] if noType else "%s:%s" %(seq[0][0],str(seq[0][1]))
    for item in seq[1:]:
        if noType:
            text += " , %s" %str(item)
        else:
            text += " , %s:%s" %(item[0],str(item[1]))
    # add [...]
    return "[%s]" %text

def pickLeaves(tree, noType=False):
    seq = []
    for part in tree:
        isValue = not(isinstance(part,ParseResults))   # str, int, float...
        isSimple = (not isValue) and (len(part) == 1)  # unique item inside
        # case value result : add value to seq
        if isValue:
            if noType:
                seq.append(part)
            else:
                seq.append((typ(part),part))
        # case simple result : add content to seq
        elif isSimple:
            if noType:
                seq.append(part[0])
            else:
                seq.append((typ(part),part[0]))
        # case compound result: recursively explore nested result
        else:
            seq.extend(pickLeaves(part, noType))
    return seq

def treeView(results, level=0, skipAnonymous=False, defaultType=None, TAB='\t'):
    NL = '\n'
    texte = ''
    for result in results:
        # case named result
        try:
            texte += level*TAB + result.getName() + ': ' + str(result) + NL
        # case anonymous result
        except AttributeError:
            if not skipAnonymous:
                if defaultType:
                    type = defaultType
                else:
                    type = "<%s>" %(result.__class__.__name__)
                texte += level*TAB + type + ': ' + str(result) + NL
        # case compound result: walk through recursive nesting
        if result.__class__ == ParseResults and len(result) > 1:
            texte += treeView(result, level+1)
    return texte

# =================================
# examples
# =================================

# === g r a m m a r
from pyparsing import *   # !!!

class Grammar(object):
    integer = Word(nums).setParseAction(lambda i: int(i[0]))
    point = Literal('.')
    decimal = Combine(integer + point + integer).setParseAction(lambda x: float(x[0]))
    num = Group(decimal | integer)("num")
    plus = Literal('+')("plus")
    op = Group(num + plus + num)("op")
    calc = OneOrMore(op)
calc = Grammar.calc

# === i n p u t   t e x t
text = "1+2 3.0+4 5.0+6.0"

# === standard results
results = calc.parseString(text)
print "=== standard results :"
print results

# === show leaves
print "=== lowest-level flat sequence :"
leaves = pickLeaves(results)
print showSeq(leaves)

# === show treeView
print "=== tree view :"
print treeView(results)

# =================================
# === g r a m m a r
class Grammar(object):
    # tokens
    add = Literal('+')
    mult = Literal('*')
    l_paren = Literal('(')
    r_paren = Literal(')')
    num = Group(Word(nums).setParseAction(lambda i: int(i[0])))("num")
    # symbols
    mult_op = Group(num + mult + num)("mult_op")
    add_op = Group((mult_op|num) + add + (mult_op|num))("add_op")
    #group = Group(l_paren + in_op + r_paren)("group")
    operation = (add_op|mult_op)
    calc = OneOrMore(operation)
calc = Grammar.calc

# === i n p u t   t e x t
text = " 1+2*3 4*5+6"

# === standard results
results = calc.parseString(text)
print;print "=== standard results :"
print results

# === show leaves
print "=== lowest-level flat sequence :"
leaves = pickLeaves(results)
print showSeq(leaves)

# === show treeView
print "=== tree view :"
print treeView(results) |
From: spir <den...@fr...> - 2008-11-09 19:25:54
|
Hello, New to the list. I'm trying to understand how things work with pattern types, result types, and naming through setResultsName() or (). Example:

a = Literal('a')
b = Literal('b')
c1 = (a|b)('| ')
c2 = MatchFirst([a,b])('first')
c3 = (a^b)('^ ')
c4 = Or([a,b])('or ')
c5 = Combine(a|b)('combi')
c0 = Group(a|b)('group')
patterns = [c1,c2,c3,c4,c5,c0]
t = "a"

# === use
for p in patterns:
    result = p.parseString(t)
    token = result[0]
    print p.resultsName, token.__class__.__name__, token,
    try:
        print token.getName()
    except AttributeError:
        print 'no_getName()'
print 'full result: %s' %(result)

# === output
| str a no_getName()
first str a no_getName()
^ str a no_getName()
or str a no_getName()
combi str a no_getName()
group ParseResults ['a'] group
full result: [['a']]

This leads me to the following questions ;-)
-1- Why aren't all results of type ParseResults?
-2- Which patterns, actually, generate ParseResults results, and which don't?
-3- Is there another way to access resultsName, from a result instance, than getName()?
-4- Is there another way to guess a result's "kind" (i.e. which pattern has generated it)?
-5- Do we need to systematically Group results to identify them? I wish to get rid of Group where useless, in order not to get results such as [['a']] (['a'] is already enough wrapping).

Denis
|
From: Ujjaval S. <usm...@gm...> - 2008-11-03 01:05:23
|
Hi Eike, Thanks for that. Actually, the reason my grammar was not working is that I had to put \r inside CharsNotIn(), where I only had '|'. That did the trick for me.

Cheers,

On Thu, Oct 30, 2008 at 3:17 AM, Eike Welk <eik...@gm...> wrote:
> Hello Ujjaval!
>
> On Tuesday 28 October 2008, you wrote:
> > Now to parse such a sentence, I changed your parser code to the
> > following: Here, I want to parse this string as a string that
> > starts with 'ABC' followed by '|' and ends with '\r'. I need
> > everything in between with '|' as delimiter in a list, including
> > 'XYZ' as last element in this case.
>
> Look at:
> LineEnd()
>
> Your parsers normally don't see '\n' because the whitespace is removed
> by the parsing machinery. If you want to use the end-of-line
> frequently as an element in your grammar, you could tell Pyparsing
> that '\n' should not be treated as whitespace:
> ParserElement.setDefaultWhitespaceChars('\t ')
>
> But you have to care for all the newlines yourself then, which might
> become tedious. Look at
> indentedBlock(...)
> as an example of how Paul (Pyparsing's author) does it. (I use
> indentedBlock myself.)
>
> Kind regards,
> Eike.
|
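Put together, the fix described above comes out to roughly the sketch below. This is not the poster's actual code; the grammar shape and names are taken from his message further down this thread, and the example data is made up:

from pyparsing import Keyword, Literal, CharsNotIn, Optional, delimitedList

start_kw = Keyword('ABC')
# adding '\r' to the exclusion set keeps a field from swallowing the line end
fieldContents = Optional(CharsNotIn('|\r'), '')
fields = delimitedList(fieldContents, '|')
fieldSep = Literal('|').suppress()
the_parser = start_kw + fieldSep + fields + Literal('\r').suppress()

print(the_parser.parseString('ABC | here | be | fields\r'))
# roughly: ['ABC', ' here ', ' be ', ' fields'] -- surrounding blanks may vary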
From: Eike W. <eik...@gm...> - 2008-10-29 16:18:14
|
Hello Ujjaval!

On Tuesday 28 October 2008, you wrote:
> Now to parse such a sentence, I changed your parser code to the
> following: Here, I want to parse this string as a string that
> starts with 'ABC' followed by '|' and ends with '\r'. I need
> everything in between with '|' as delimiter in a list, including
> 'XYZ' as last element in this case.

Look at:
LineEnd()

Your parsers normally don't see '\n' because the whitespace is removed by the parsing machinery. If you want to use the end-of-line frequently as an element in your grammar, you could tell Pyparsing that '\n' should not be treated as whitespace:
ParserElement.setDefaultWhitespaceChars('\t ')

But you have to care for all the newlines yourself then, which might become tedious. Look at
indentedBlock(...)
as an example of how Paul (Pyparsing's author) does it. (I use indentedBlock myself.)

Kind regards,
Eike.
|
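To make the two options above concrete, here is a small sketch with made-up data; it only illustrates the whitespace behaviour, not the original poster's grammar:

from pyparsing import Word, alphas, Literal, LineEnd, OneOrMore, ParserElement

# Option 1: '\n' is whitespace by default, so an explicit LineEnd() is the
# way to mark a line break in the grammar
word = Word(alphas)
line = OneOrMore(word) + LineEnd().suppress()
print(line.parseString("some words here\n"))    # -> ['some', 'words', 'here']

# Option 2: remove '\n' from the whitespace set *before* building the grammar
# (elements keep the default whitespace in effect when they were created);
# afterwards, every newline must be matched explicitly somewhere in the grammar
ParserElement.setDefaultWhitespaceChars('\t ')
word2 = Word(alphas)
line2 = OneOrMore(word2) + Literal('\n').suppress()
print(line2.parseString("some words here\n"))   # -> ['some', 'words', 'here']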
From: Paul M. <pt...@au...> - 2008-10-28 23:34:05
|
Sorry to not have been plugged in recently, and thanks Eike for taking up the slack!

Please look into the latest pyparsing release, with the helper method "originalTextFor". This method is intended to displace the usage of keepOriginalText, and does so without using the inspect module.

Instead of hacking _parseNoCache, you can do the same kind of parse profiling using debug actions to tally up the number of parse attempts, successes, and failures for one or more expressions in a grammar. I added this code to a grammar that contained the expressions named in the varnames string:

parseTally = {}
def updTally(v,i):
    if v not in parseTally:
        parseTally[v] = [0,0,0]
    parseTally[v][i] += 1

def tallyTry(n):
    def tally(*args):
        updTally(n,0)
    return tally
def tallyFail(n):
    def tally(*args):
        updTally(n,2)
    return tally
def tallyMatch(n):
    def tally(*args):
        updTally(n,1)
    return tally

varnames = "IDENTIFIER hexnumber decnumber realnumber boolvalue stringliteral label".split()
for v in varnames:
    vars()[v].setName(v).setDebugActions(tallyTry(v),tallyMatch(v),tallyFail(v))

After the parser was run, I could enumerate the values in the parseTally dict, and see if there were any expressions that were tested *many* times, but only matched a few. It would be possible that such an expression was part of a MatchFirst, and, as long as the grammar remained unambiguous, I could move that expression further to the right in the list of alternatives. (Nowadays, parseTally would benefit from being a defaultdict - this code is a bit old.)

Here is an optimization I added to the cStyleComment expression. Originally, this expression read:

cStyleComment = Combine( Literal("/*") +
                         ZeroOrMore( CharsNotIn("*") | ( "*" + ~Literal("/") ) ) +
                         Literal("*/") ).streamline().setName("cStyleComment enclosed in /* ... */")
restOfLine = Optional( CharsNotIn( "\n\r" ), default="" ).setName("rest of line up to \\n").leaveWhitespace()
dblSlashComment = "//" + restOfLine
cppStyleComment = ( dblSlashComment | cStyleComment )

Now if I was ignoring cppStyleComment, then this expression would get evaluated a *lot*. I realized that both alternatives start with a leading '/' character, so if I wasn't currently pointing at a '/', there was no point in checking either alternative. So I modified cppStyleComment to:

cppStyleComment = FollowedBy("/") + ( dblSlashComment | cStyleComment )

(This is all really old code - I have since replaced most of these internal expressions with Regexes.)

HTH,
-- Paul

-----Original Message-----
From: dav...@l-... [mailto:dav...@l-...]
Sent: Thursday, September 25, 2008 3:26 PM
To: Eike Welk; pyp...@li...
Subject: Re: [Pyparsing] Speeding up a parse

Well, it's something that I was actually looking into doing. I profiled my parse actions by hand (using a stopwatch class), and when the code gets handed off to me, it's actually a really small amount of time that is spent in my code (usually milliseconds).

I did try psyco, and it worked great, but I had to disable the "keepOriginalText" chunk of my code, because it imports inspect, which hoses psyco (a known psyco issue). As a matter of fact, it gave me almost a 100% speedup, which is fantastic, and doesn't require much code. But I lose the ability to keep the source.

I think some more investigation of this would be a bit more in my favor, as psyco is a "known" method of speeding up, and frankly did an amazing job when I got it working.

--dw

> -----Original Message-----
> From: Eike Welk [mailto:eik...@gm...]
> Sent: Thursday, September 25, 2008 3:19 PM
> To: pyp...@li...
> Subject: Re: [Pyparsing] Speeding up a parse
>
> Hi David!
>
> I've read that guessing which parts of a program need optimizing is
> usually impossible. You need to profile your program.
> [...]
|
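Since originalTextFor is only mentioned in passing above, here is a minimal sketch of how it can be used; the little grammar is made up for illustration:

from pyparsing import Word, alphas, nums, Suppress, originalTextFor

name = Word(alphas)
pair = name + Suppress('=') + Word(nums)

print(pair.parseString("answer = 42"))
# -> ['answer', '42']  -- the individual tokens

print(originalTextFor(pair).parseString("answer = 42"))
# -> ['answer = 42']   -- the exact slice of source text that matched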
From: Ujjaval S. <usm...@gm...> - 2008-10-28 02:44:36
|
Hi Eike, That's exactly what I wanted. Thanks for that. It worked for me. One more question following what I've done, which is probably really stupid... I wanted to end each text with a newline character. For example:

text6 = 'ABC | iöü | 应iöü | XYZ\r'

Now to parse such a sentence, I changed your parser code to the following. Here, I want to parse this string as a string that starts with 'ABC', is followed by '|', and ends with '\r'. I need everything in between, with '|' as delimiter, in a list -- including 'XYZ' as the last element in this case.

start_kw = Keyword('ABC')
fieldContents = Optional(CharsNotIn('|'), '')
fields = delimitedList(fieldContents, '|', False)
fieldSep = Literal('|').suppress()
the_parser = (start_kw + fieldSep + fields + Literal('\r').suppress())

I can't get it to work. Could you tell me what I am doing wrong?

Thanks,

On Tue, Oct 28, 2008 at 2:30 AM, Eike Welk <eik...@gm...> wrote:
> On Monday 27 October 2008, Eike Welk wrote:
> >
> > Here is an example for CharsNotIn:
> > http://pastebin.com/f7d6a3331
>
> I just noticed that pastebin can't correctly handle Asian characters.
> But I guess you understand how the example was meant anyway. Just
> paste some Asian characters into the example strings and replace
> these numbers (HTML entities?) with them. The original characters
> were taken from Chinese and Japanese iPod ads.
>
> Kind regards,
> Eike.
|
From: Eike W. <eik...@gm...> - 2008-10-27 15:30:25
|
On Monday 27 October 2008, Eike Welk wrote:
>
> Here is an example for CharsNotIn:
> http://pastebin.com/f7d6a3331

I just noticed that pastebin can't correctly handle Asian characters. But I guess you understand how the example was meant anyway. Just paste some Asian characters into the example strings and replace these numbers (HTML entities?) with them. The original characters were taken from Chinese and Japanese iPod ads.

Kind regards,
Eike.
|
From: Eike W. <eik...@gm...> - 2008-10-27 15:02:06
|
On Monday 27 October 2008, Ujjaval Suthar wrote:
> Hi everyone,
>
> I need to parse strings with a mix of English and any other unicode
> characters from any Asian or European languages.
>
> The format of the strings is like the following:
>
> ABC|[any unicode character]|[any unicode character]|XYZ

Hello Ujjaval!

If I understand your question right, CharsNotIn is the parser you are looking for. I don't see any general problem with Unicode. As you seem somewhat knowledgeable about the requirements of Asian languages, you could maybe propose a parser for words in Asian languages (or even post a patch).

Here is an example for CharsNotIn:
http://pastebin.com/f7d6a3331

I hope this helped you.
Kind regards,
Eike.
|
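The pastebin link above has long since expired; the example presumably looked something like this sketch (the data is made up, using arbitrary non-ASCII characters):

# -*- coding: utf-8 -*-
from pyparsing import Keyword, Literal, CharsNotIn, delimitedList

# fields may contain any characters at all -- ASCII or not -- except the delimiter
fieldContents = CharsNotIn('|')
record = (Keyword('ABC') + Literal('|').suppress()
          + delimitedList(fieldContents, '|'))

text = u'ABC|iöü|应用|XYZ'
print(record.parseString(text))
# roughly: ['ABC', u'iöü', u'应用', 'XYZ']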
From: Ujjaval S. <usm...@gm...> - 2008-10-27 04:39:56
|
Hi everyone,

I need to parse strings with a mix of English and any other unicode characters from any Asian or European languages. The format of the strings is like the following:

ABC|[any unicode character]|[any unicode character]|XYZ

In the above string, ABC and XYZ are literals which mark the start and the end of the string, while '|' is the delimiter for the content in between. How can I use pyparsing to parse this kind of string? The outcome should be a list of the unicode strings that appear between ABC and XYZ, separated by '|' in the input.

Thanks,
Ujjaval
|
From: <dav...@l-...> - 2008-09-25 20:26:32
|
Well, it's something that I was actually looking into doing. I profiled my parse actions by hand (using a stopwatch class), and when the code gets handed off to me, it's actually a really small amount of time that is spent in my code (usually milliseconds).

I did try psyco, and it worked great, but I had to disable the "keepOriginalText" chunk of my code, because it imports inspect, which hoses psyco (a known psyco issue). As a matter of fact, it gave me almost a 100% speedup, which is fantastic, and doesn't require much code. But I lose the ability to keep the source.

I think some more investigation of this would be a bit more in my favor, as psyco is a "known" method of speeding up, and frankly did an amazing job when I got it working.

--dw

> -----Original Message-----
> From: Eike Welk [mailto:eik...@gm...]
> Sent: Thursday, September 25, 2008 3:19 PM
> To: pyp...@li...
> Subject: Re: [Pyparsing] Speeding up a parse
>
> Hi David!
>
> I've read that guessing which parts of a program need optimizing is
> usually impossible. You need to profile your program. There is a
> profiler built into python:
> http://docs.python.org/lib/profile.html
>
> I have no experience with the profiler, but the basic usage is fairly
> simple.
>
> However, the profiler will probably show you that most of the time is
> spent in some function of the Pyparsing library. You expected it
> anyway, and it doesn't help you very much. You might catch a parse
> action that consumes much time this way, and you might spot parts of
> Pyparsing that need optimization.
>
> So maybe you should start to write a profiling extension for
> Pyparsing! I think it is feasible because the class ParserElement
> contains some high-level driver functions that are executed for each
> parser object (_parseNoCache, _parseCache). I think it could be done
> like this:
>
> You create a class variable:
>     ParserElement.profileStats = {}
> It maps:
>     <parser's name> : n_enter, n_success, n_fail, t_cumulative
>
> Then at the start of _parseNoCache or _parseCache you locate the
> matching entry,
>     ParserElement.profileStats[self.name]
> increment the enter counter, and store the current time.
>
> At the exit points you increase either the success or the failure
> counter, compute the time spent in the parser, and add it to the
> cumulative time value.
>
> At the end of the program you convert the dict to a list and store it
> in a text file. You should also add a sorting function.
>
> I think _parseCache looks pretty simple; adding something like my
> proposed profiling facility seems easy. I haven't looked carefully at
> anything, nor did I write any code. (I should have, maybe, instead of
> writing this lengthy email.) As packrat parsing gives you no big
> additional problems, I think you should just use _parseCache because
> it's easier.
>
> I hope this helps you at least somewhat.
> Kind regards,
> Eike.
|
From: Eike W. <eik...@gm...> - 2008-09-25 20:19:36
|
Hi David!

I've read that guessing which parts of a program need optimizing is usually impossible. You need to profile your program. There is a profiler built into python:
http://docs.python.org/lib/profile.html

I have no experience with the profiler, but the basic usage is fairly simple.

However, the profiler will probably show you that most of the time is spent in some function of the Pyparsing library. You expected it anyway, and it doesn't help you very much. You might catch a parse action that consumes much time this way, and you might spot parts of Pyparsing that need optimization.

So maybe you should start to write a profiling extension for Pyparsing! I think it is feasible because the class ParserElement contains some high-level driver functions that are executed for each parser object (_parseNoCache, _parseCache). I think it could be done like this:

You create a class variable:
    ParserElement.profileStats = {}
It maps:
    <parser's name> : n_enter, n_success, n_fail, t_cumulative

Then at the start of _parseNoCache or _parseCache you locate the matching entry,
    ParserElement.profileStats[self.name]
increment the enter counter, and store the current time.

At the exit points you increase either the success or the failure counter, compute the time spent in the parser, and add it to the cumulative time value.

At the end of the program you convert the dict to a list and store it in a text file. You should also add a sorting function.

I think _parseCache looks pretty simple; adding something like my proposed profiling facility seems easy. I haven't looked carefully at anything, nor did I write any code. (I should have, maybe, instead of writing this lengthy email.) As packrat parsing gives you no big additional problems, I think you should just use _parseCache because it's easier.

I hope this helps you at least somewhat.
Kind regards,
Eike.
|
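The proposal above stayed at the idea stage in this thread; a rough sketch of what it could look like follows. It wraps pyparsing's internal parse driver, so it depends on implementation details and may break between versions; profileStats and reportProfile are made-up names:

import time
from pyparsing import ParserElement

# name -> [n_enter, n_success, n_fail, t_cumulative]
ParserElement.profileStats = {}

_origParse = ParserElement._parseNoCache

def _profilingParse(self, instring, loc, doActions=True, callPreParse=True):
    stats = ParserElement.profileStats.setdefault(str(self), [0, 0, 0, 0.0])
    stats[0] += 1
    t0 = time.time()
    try:
        result = _origParse(self, instring, loc, doActions, callPreParse)
        stats[1] += 1
        return result
    except Exception:
        stats[2] += 1
        raise
    finally:
        # note: times are cumulative, so nested sub-expressions are also
        # counted inside their parents
        stats[3] += time.time() - t0

# replace both names: _parse is the entry point used without packrat,
# _parseNoCache is what _parseCache falls through to when packrat is enabled
ParserElement._parseNoCache = _profilingParse
ParserElement._parse = _profilingParse

def reportProfile():
    # print the most expensive expressions first
    items = sorted(ParserElement.profileStats.items(),
                   key=lambda kv: kv[1][3], reverse=True)
    for name, (enters, successes, fails, t) in items:
        print("%9.3fs %9d tries %9d matches   %s" % (t, enters, successes, name))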
From: <dav...@l-...> - 2008-09-24 23:28:14
|
One more thing: I also tried enablePackrat as well, but it had no discernible effect on the parse speed (though it did suck up a huge chunk of memory :) ).

> _____________________________________________
> From: Weber, David C @ Link
> Sent: Wednesday, September 24, 2008 5:55 PM
> To: pyp...@li...
> Subject: Speeding up a parse
>
> All,
>
> We've got a data file that we use for parsing "stuff". At present,
> this file is 80K lines long and takes about 3.3 minutes to parse,
> which is an awfully long time to wait for something like this. There
> are 122 rules for parsing this file, and unfortunately the syntax of
> the data within is not very strict. This leads to constructs such as:
>
> Interaction = \
>     Keyword("(Interaction") + \
>     INT_ID + \
>     INT_Name + \
>     INT_ISRType + \
>     OneOrMore(
>         INT_MOMInteraction |
>         INT_Description |
>         INT_DeliveryCategory |
>         INT_MessageOrdering |
>         INT_RoutingSpace
>     ) + \
>     ZeroOrMore(InteractionComponent) + \
>     ")"
>
> where the intent of the OneOrMore section is:
> 1.) All are optional
> 2.) They may appear in any order
>
> I've also tried Each([Optional(...), Optional(...)]) without much
> speedup success.
>
> I'm pretty sure that these constructs are causing a significant
> amount of backtracking, but I'm not sure of the best way to go about
> cleaning up the grammar.
>
> Also, I tried using psyco to speed up the parse, but I'm making use
> of the "keepOriginalText" option within the setParseAction() call, so
> that I can get a copy of the original text within my parse action.
> This seems to break psyco, based on one of the imports that is done.
>
> So two things:
>
> 1.) Any grammar speed-up rules for the above?
> 2.) Any ideas to get the original text, as well as make use of psyco?
>
> Thanks
>
> --dw
|
From: <dav...@l-...> - 2008-09-24 23:23:26
|
All,

We've got a data file that we use for parsing "stuff". At present, this file is 80K lines long and takes about 3.3 minutes to parse, which is an awfully long time to wait for something like this. There are 122 rules for parsing this file, and unfortunately the syntax of the data within is not very strict. This leads to constructs such as:

Interaction = \
    Keyword("(Interaction") + \
    INT_ID + \
    INT_Name + \
    INT_ISRType + \
    OneOrMore(
        INT_MOMInteraction |
        INT_Description |
        INT_DeliveryCategory |
        INT_MessageOrdering |
        INT_RoutingSpace
    ) + \
    ZeroOrMore(InteractionComponent) + \
    ")"

where the intent of the OneOrMore section is:
1.) All are optional
2.) They may appear in any order

I've also tried Each([Optional(...), Optional(...)]) without much speedup success.

I'm pretty sure that these constructs are causing a significant amount of backtracking, but I'm not sure of the best way to go about cleaning up the grammar.

Also, I tried using psyco to speed up the parse, but I'm making use of the "keepOriginalText" option within the setParseAction() call, so that I can get a copy of the original text within my parse action. This seems to break psyco, based on one of the imports that is done.

So two things:

1.) Any grammar speed-up rules for the above?
2.) Any ideas to get the original text, as well as make use of psyco?

Thanks

--dw
|
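For reference, the Each construct mentioned above looks roughly like this; the element names are shortened stand-ins, since the real sub-expressions aren't shown in the thread:

from pyparsing import Keyword, Word, alphanums, quotedString, Optional, Each

# stand-ins for the real sub-expressions of the data file
INT_Description      = Keyword("(Description") + quotedString + ")"
INT_DeliveryCategory = Keyword("(DeliveryCategory") + Word(alphanums) + ")"
INT_MessageOrdering  = Keyword("(MessageOrdering") + Word(alphanums) + ")"

# Each() matches all of the contained expressions, in any order; wrapping
# every one in Optional() makes them individually optional as well
optionalParts = Each([Optional(INT_Description),
                      Optional(INT_DeliveryCategory),
                      Optional(INT_MessageOrdering)])

print(optionalParts.parseString('(MessageOrdering receive) (Description "a test")'))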
From: Eike W. <eik...@gm...> - 2008-09-23 16:03:42
|
Packrat parsing seems to interfere with the indentedBlock parser. Some legitimate input is no longer recognized when packrat parsing is switched on. A script to demonstrate the problem is here:
http://pastebin.com/f5b006f4d

When line 7 is uncommented, an exception is raised while parsing the 3rd example 'program'. Maybe I'm using indentedBlock wrongly somehow?

Kind regards,
Eike.
|
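The pastebin script is gone, so the failing case can't be reproduced here; for readers unfamiliar with indentedBlock, this is only a generic sketch of the kind of setup involved, with a made-up micro-grammar:

from pyparsing import Keyword, Suppress, Word, alphas, indentedBlock, ParserElement

#ParserElement.enablePackrat()   # toggling a line like this changed the behaviour

indentStack = [1]
identifier = Word(alphas)
stmt = identifier   # block statements are bare identifiers in this toy grammar
funcDef = (Keyword("def") + identifier + Suppress("():")
           + indentedBlock(stmt, indentStack))

program = """\
def f():
    a
    b
"""
print(funcDef.parseString(program))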
From: Paul M. <pt...@au...> - 2008-09-07 15:15:47
|
Eike -

Well, this came together a bit faster than I'd thought. The solution begins with an understanding of operatorPrecedence. Here is the expression that is published in the online example simpleArith.py:

integer = Word(nums).setParseAction(lambda t:int(t[0]))
variable = Word(alphas,exact=1)
operand = integer | variable

expop = Literal('**')
signop = oneOf('+ -')
multop = oneOf('* /')
plusop = oneOf('+ -')
factop = Literal('!')

expr = operatorPrecedence( operand,
    [(factop, 1, opAssoc.LEFT),
     (expop, 2, opAssoc.RIGHT),
     (signop, 1, opAssoc.RIGHT),
     (multop, 2, opAssoc.LEFT),
     (plusop, 2, opAssoc.LEFT),]
    )

As I mentioned earlier, this will only recognize "-a**-b" if given as "-a**(-b)". The reason for this is that operatorPrecedence cannot look "down" the precedence chain unless the next term is enclosed in parentheses. My first attempt at fixing this was to change the definition of operand to include an optional leading sign, and this parsed successfully. But then it dawned on me that the same thing can be accomplished by inserting another signop operator *above* expop, as in:

expr = operatorPrecedence( operand,
    [(factop, 1, opAssoc.LEFT),
     (signop, 1, opAssoc.RIGHT),
     (expop, 2, opAssoc.RIGHT),
     (signop, 1, opAssoc.RIGHT),
     (multop, 2, opAssoc.LEFT),
     (plusop, 2, opAssoc.LEFT),]
    )

I think this is the correct solution, rather than mucking about with operatorPrecedence itself, or requiring strange embellishments to one's operand expression. I don't think this is a general issue with unary operators or right-associated operations; I think this is just a special case born out of the definition of exponentiation with respect to leading sign operators.

I'll fix simpleArith.py to correctly include the extra unary sign operator precedence level, and also include some examples. I'll also include simpleArith2.py, which actually evaluates the parsed expression.

-- Paul
|
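A quick check of the fixed table, runnable as-is (the factorial level is omitted for brevity); the comment shows roughly the nested result to expect:

from pyparsing import Word, Literal, nums, alphas, oneOf, operatorPrecedence, opAssoc

integer = Word(nums).setParseAction(lambda t: int(t[0]))
variable = Word(alphas, exact=1)
operand = integer | variable

expop = Literal('**')
signop = oneOf('+ -')
multop = oneOf('* /')
plusop = oneOf('+ -')

# the extra unary-sign level *above* ** lets "-a**-b" parse as -(a**(-b))
expr = operatorPrecedence(operand,
    [(signop, 1, opAssoc.RIGHT),
     (expop, 2, opAssoc.RIGHT),
     (signop, 1, opAssoc.RIGHT),
     (multop, 2, opAssoc.LEFT),
     (plusop, 2, opAssoc.LEFT)])

print(expr.parseString("-a**-b"))
# roughly: [['-', ['a', '**', ['-', 'b']]]]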
From: Paul M. <pt...@au...> - 2008-09-07 14:32:02
|
I worked on this problem a bit last night. The opAssoc.RIGHT parameter is there to address this problem, and in fact, this expression:

a**b**c

does get correctly evaluated as (a**(b**c)). But things get muddled when unary signs are added, and I think this points up a bug in operatorPrecedence. As your pastebin code comments state:

#Power and unary operations are intertwined to get correct operator precedence:
#   -a**-b == -(a ** (-b))

which should be supported using this form of operatorPrecedence:

u_expr = Word(nums) | Word(alphas)
expression = operatorPrecedence( u_expr,
    [('**', 2, opAssoc.RIGHT),
     (oneOf('+ -'), 1, opAssoc.RIGHT),
     (oneOf('* /'), 2, opAssoc.LEFT),
     (oneOf('+ -'), 2, opAssoc.LEFT),
    ])

For the moment, this expression correctly parses "a**b**c" as "(a**(b**c))", but can only parse "-a**-b" if it is given as "-a**(-b)".

Ideally, your request should be handled by the operatorPrecedence API as it exists today - I don't think any more specialized interface should be needed. I'll see if I can make any progress on this problem in the next day or so.

-- Paul
|