Thread: [Pyparsing] Parsing LaTeX and regression

Hello,

I'm a beginner in Python and pyparsing, and I'm trying to write a script that
parses LaTeX code so I can search and replace, and suppress, specific tags.

I've got this code:

# coding=latin1
from pyparsing import *

tag = Literal("\\")
tagname= Word( alphas )
openingbracket = Literal("{")
text = Word( alphas + "éèà" + " ")
closingbracket = Literal("}")

paragraph = Forward()
paragraphitem = Optional(text) + Optional(paragraph) + Optional(text)
paragraph << (tag + tagname + openingbracket + Group(paragraphitem)
              + closingbracket)

test = (" Starting text \\emph{This sentence is in \\textit{italics} in "
        "Bembo} \\emph{This sentence is in \\textit{italics} in bembo and in "
        "\\emph{Italian}} Middle filling \\emph{This second sentence is in Emphasis} "
        "End")

for foundparagraph in paragraph.scanString(test):
    print test, "-->", foundparagraph

I would like pyparsing to return :
1) \emph{This sentence is in \textit{italics} in Bembo}
2) \emph{This sentence is in \textit{italics} in bembo and in \emph{Italian}}
3) \emph{This second sentence is in Emphasis}

My script does not parse 2) correctly, and I'm puzzled.
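
Item 2) is the only one that contains two nested tags (\textit{italics} and \emph{Italian}), while the grammar above accepts at most one: paragraphitem is Optional(text) + Optional(paragraph) + Optional(text), so after " in bembo and in " it expects the closing "}" and finds "\emph" instead. The nested content needs a repetition, "zero or more of text or a nested tag"; in pyparsing that would presumably be written as ZeroOrMore(text | paragraph). The sketch below is a hand-written recursive-descent scanner in plain Python (not pyparsing) that implements that repeated grammar, so the expected matches can be checked without the library. The names parse_paragraph and scan_paragraphs are hypothetical, and unlike the text expression above it treats any character other than \, { and } as text.

```python
# Hypothetical plain-Python sketch (not pyparsing) of the grammar
#   paragraph := "\" name "{" item* "}"
#   item      := text | paragraph
# "item*" allows any number of text runs and nested tags, instead of the
# at-most-one nested tag allowed by Optional(text) + Optional(paragraph) + Optional(text).


def parse_paragraph(s, i):
    """Try to parse a tagged group starting at s[i].

    Returns the index just past the closing brace, or None if s[i:] does
    not start with a well-formed \\name{...} group."""
    if i >= len(s) or s[i] != "\\":
        return None
    j = i + 1
    while j < len(s) and s[j].isalpha():  # tag name: letters only
        j += 1
    if j == i + 1 or j >= len(s) or s[j] != "{":
        return None  # missing tag name or missing opening brace
    j += 1
    while j < len(s):
        if s[j] == "}":
            return j + 1  # closing brace of this group
        if s[j] == "\\":
            end = parse_paragraph(s, j)  # nested tag: recurse
            if end is None:
                return None
            j = end
        elif s[j] == "{":
            return None  # a bare brace is not allowed by this grammar
        else:
            j += 1  # ordinary text character
    return None  # reached the end without a closing brace


def scan_paragraphs(s):
    """Yield top-level tagged groups from left to right, skipping other text."""
    i = 0
    while i < len(s):
        end = parse_paragraph(s, i)
        if end is None:
            i += 1  # no group starts here; move on one character
        else:
            yield s[i:end]
            i = end  # resume after the whole group, nested tags included


sample = (" Starting text \\emph{This sentence is in \\textit{italics} in "
          "Bembo} \\emph{This sentence is in \\textit{italics} in bembo and in "
          "\\emph{Italian}} Middle filling \\emph{This second sentence is in Emphasis} "
          "End")

for match in scan_paragraphs(sample):
    print(match)
```

Because the scanner resumes after each complete top-level group, the nested \textit{...} and \emph{Italian} are reported only as part of their enclosing \emph{...}, which matches the three results listed above.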

Cheers,
Charles 

pyparsing-users