Could anyone assist a newbie please -
I have created a program based on Paul McGuire's example in 'Building Recursive Descent Parsers with Python':
import urllib
from pyparsing import *
# define basic text patterns for search
skipA = Literal('25px">')
searchString= Word(alphas ) + 'to:'
tdStart = Literal('"routeTitle">').suppress()
#tdEnd = Literal("</TBODY>").suppress()
tdEnd = Literal("<!--==").suppress()
aa = Literal('<TR>').suppress()
parama = tdStart + searchString.setResultsName("from")+ SkipTo(tdEnd).setResultsName("details") + tdEnd
# get list of Routes & prices (+ Loads of other stuff)
aUrl = "http://www.aerarann.ie/"
aPage = urllib.urlopen( aUrl )
aListHTML = aPage.read()
for srvrtokens,startloc,endloc in parama.scanString( aListHTML ):
    print 'DETAILS EXTRACTED : ',"%(from)-15s : %(details)20s" % srvrtokens
The code works but brings back all of the html tags (except those I search on), when all I want to bring back is the embedded data.
Also, the data occurs in several blocks, but if I set 'tdEnd' to a string that terminates each block, it only brings back the first block. So I had to set tdEnd to '</TBODY>', which only occurs at the end of the page (and so brings back all of the data).
Any assistance would be gratefully received.
The latest release contains the example htmlStripper.py, which might help you (although you have probably moved on with your life...)
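For what it's worth, here is a rough sketch of the tag-stripping idea, written from memory rather than copied from htmlStripper.py. The sample markup is invented (the real page will certainly differ), and SAMPLE_HTML, extract_routes and strip_tags are hypothetical names used only for illustration. It also assumes that each block ends with a comment starting '<!--==', and it names only the Word part of the route title ("origin") instead of the whole 'Xxx to:' expression.

```python
from pyparsing import (Literal, SkipTo, Word, alphas, replaceWith,
                       anyOpenTag, anyCloseTag, htmlComment)

# Invented sample standing in for the real page: two blocks,
# each terminated by a comment that starts with '<!--=='.
SAMPLE_HTML = '''
<TD class="routeTitle">Dublin to:</TD>
<TR><TD>Cork</TD><TD><B>29.99</B></TD></TR>
<!--== end of block ==-->
<TD class="routeTitle">Galway to:</TD>
<TR><TD>Luton</TD><TD><B>39.99</B></TD></TR>
<!--== end of block ==-->
'''

tdStart = Literal('"routeTitle">').suppress()
tdEnd = Literal("<!--==").suppress()
routeTitle = Word(alphas).setResultsName("origin") + Literal("to:").suppress()
parama = tdStart + routeTitle + SkipTo(tdEnd).setResultsName("details") + tdEnd

# Any comment, opening tag or closing tag is replaced by a single space,
# so that neighbouring cell values do not run together.
anyTag = (htmlComment | anyOpenTag | anyCloseTag).setParseAction(replaceWith(" "))


def strip_tags(html):
    """Return html with tags removed and runs of whitespace collapsed."""
    return " ".join(anyTag.transformString(html).split())


def extract_routes(html):
    """Return a list of (origin, details) pairs, one per block found."""
    routes = []
    for tokens, startloc, endloc in parama.scanString(html):
        routes.append((tokens["origin"], strip_tags(tokens["details"])))
    return routes


for origin, details in extract_routes(SAMPLE_HTML):
    print("DETAILS EXTRACTED : %-15s : %s" % (origin, details))
```

On this sample, scanString finds both blocks because the terminator appears once per block, and the details come back as plain text. If only the first block is found on the real page, it may be worth checking that the terminator string really does occur after every block in the raw HTML.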
Sorry for not being more responsive.