Joseph Reagle

What's happening?

  • Followup: RE: Performance on parsing BibTex

    Thanks so much for your response, Paul. 1. I see that you moved the nestedExpr from the outer structure to just the values; that makes sense. 2. I was using ignoreExpr because, even though the open/close chars I used were "{}", I believe the function still uses the default quotedString, which was causing me problems with a field like: title = {'The {Spinster} and the {Prophet'} by...

    2009-05-18 15:57:39 UTC in Python parsing module

  • Followup: RE: Performance on parsing BibTex

    For some reason, that version chokes on the first bibentry: test = '''@online{1722005u1, author = {172, User}, shorttitle = {{User:172} (Version 10702240)}, title = {{User:172}}, day = {2}, year = {2005}, urldate = {2005-03-24}, url = {http://en.wikipedia.org/w/index.php?title=User:172&oldid=10702240}, month = {3}, custom1 = {20050324}, organizatio.

    2009-05-18 12:31:03 UTC in Python parsing module

  • Performance on parsing BibTex

    I recently noticed that a script I use was getting kind of laggy. It was taking almost 2s to parse my very large bibtex file using bibstuff (which uses SimpleParse). I thought, why not write something in pyparsing and see how that fares? It took about 60s! (Though I'm sure I've done some clumsy stuff, including the need to flatten.) So I thought, what if I just wrote a simple regexp bibtex...

    2009-05-17 00:18:58 UTC in Python parsing module

  • unicode characters

    I want to define a word (a bibtex key) that might include u'ć'. The following doesn't seem to work: ident_chars = "-_'" + alphanums + alphas8bit + u'ć' I think the corresponding hex for that char is \xc4\x87 and pyparsing matches only the first byte. In any case, I'm confused, so how to refer to accented/unicode characters beyond alphas8bit? Or that, less other characters?...

    2009-05-16 23:35:57 UTC in Python parsing module

  • Unicode characters

    I want to define a word (a bibtex key) that might include u'ć'. The following doesn't seem to work: ident_chars = "-_'" + alphanums + alphas8bit + u'ć' I think the corresponding hex for that char is \xc4\x87 and pyparsing matches only the first byte. In any case, I'm confused, so how to refer to accented/unicode characters beyond alphas8bit? Or that...

    2009-05-16 23:34:53 UTC in Python parsing module

  • Followup: RE: Parsing biblatex commands

    So I still can't predict the structure of the parseResult based on the grammar, only through manual introspection. Also, in my example, it appears lower level structures with setResultsName also bubble up? (That is, multi_command also has the same dictionary entries as the lower level param.) Is there an example somewhere that shows the consequences of grammar construction on subsequent...

    2009-05-14 17:05:20 UTC in Python parsing module

  • Followup: RE: Parsing biblatex commands

    Perhaps my confusion results from following [1], where if I don't ask for asList(), I get each item with an empty dict: In [23]: formula.parseString( 'C6H5OH' ).asList() Out[23]: [['C', '6'], ['H', '5'], ['O'], ['H']] In [24]: formula.parseString( 'C6H5OH' ) Out[24]: ([(['C', '6'], {}), (['H', '5'], {}), (['O'], {}), (['H'], {})], {}) Is this the result of a change of...

    2009-05-13 17:06:45 UTC in Python parsing module

  • Followup: RE: Parsing biblatex commands

    OK! This bin actually works! http://pyparsing.pastebin.com/m19f5ddee Combine still confuses me, but I'm avoiding that. The one new question is why do I have to go so deep to access dictionary values? if 'citation' in tokens['param']: citation = param['citation'][0][0] I'm going to try looking at "group" to see if that makes a difference.

    2009-05-13 13:58:37 UTC in Python parsing module

  • Followup: RE: Parsing biblatex commands

    On the list gunk front, I've simplified my grammar with nested expressions, but seem to have lost the ability to validate their content, even using the content attribute. That page number of [201-202h] should throw an exception, I'd think. http://paste.pocoo.org/show/117050/ I can get rid of internal gunk with: prenote = nestedExpr("[", "]").setParseAction(lambda...

    2009-05-13 12:22:12 UTC in Python parsing module

  • Followup: RE: Parsing biblatex commands

    I need to get all my matches, do some tests, see if something is in my bibtex database, then construct the replacement with tokens reordered. (So I thought transform might not suffice.) I used scanString(text) and cut/splice my matches: http://pyparsing.pastebin.com/f689ec018 . Unfortunately, I can't iterate on matches over a piece of text because the matches don't accumulate. (Though...

    2009-05-12 20:53:22 UTC in Python parsing module
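
On the unicode question in the posts above, here is a minimal sketch in Python 3 using only the standard library (not pyparsing) of why a byte-oriented match could stop after the first byte of 'ć'. It assumes an 8-bit letter set that covers only the Latin-1 range 0xC0-0xFF, which may not match pyparsing's actual alphas8bit exactly. The names in_latin1_letter_range, KEY_RE and sample_key are hypothetical and introduced only for illustration.

```python
import re

C_ACUTE = "\u0107"  # the character 'ć' from the post

# Encoded as UTF-8 the character is two bytes, the hex quoted in the post.
utf8_bytes = C_ACUTE.encode("utf-8")


def in_latin1_letter_range(byte):
    """Assumption: an 8-bit letter set accepts only bytes 0xC0-0xFF."""
    return 0xC0 <= byte <= 0xFF


# The first byte (0xC4) falls inside that range, the second (0x87) does not,
# so a matcher working on raw bytes would accept one and reject the other.
first_ok, second_ok = (in_latin1_letter_range(b) for b in utf8_bytes)

# Matching on decoded text instead: for str patterns in Python 3, \w is
# Unicode-aware, so accented letters count as word characters.
KEY_RE = re.compile(r"[\w'-]+")

sample_key = "Mili" + C_ACUTE + "2005"  # hypothetical BibTeX key
matched = KEY_RE.match(sample_key).group(0)
```

Working on decoded text rather than raw bytes avoids splitting the character. Whether the same approach carries over to a particular pyparsing version is not something the posts confirm.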

About Me

  • 2004-10-18 (5 years ago)
  • 1141624
  • jreagle (My Site)
  • Joseph Reagle
