Hallo
Could anyone assist a newbie please -
I have created a program based on Paul McGuire's example in 'Building Recursive Descent Parsers with Python':
-
from pyparsing import *
import urllib
# define basic text patterns for search
skipA = Literal('25px">')            # not used below
searchString = Word(alphas) + 'to:'  # a single word followed by 'to:'
tdStart = Literal('"routeTitle">').suppress()
#tdEnd = Literal("</TBODY>").suppress()
tdEnd = Literal("<!--==").suppress()
aa = Literal('<TR>').suppress()      # not used below
parama = tdStart + searchString.setResultsName("from") + SkipTo(tdEnd).setResultsName("details") + tdEnd
# get list of Routes & prices (+ Loads of other stuff)
aUrl = "http://www.aerarann.ie/"
aPage = urllib.urlopen( aUrl )
aListHTML = aPage.read()
aPage.close()
for srvrtokens, startloc, endloc in parama.scanString(aListHTML):
    print 'DETAILS EXTRACTED : ', "%(from)-15s : %(details)20s" % srvrtokens
The code works, but it brings back all of the HTML tags (except the ones I search on), when all I want is the embedded data.
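A minimal sketch of keeping only the text from a captured `details` string, using the standard library's `html.parser.HTMLParser` in place of pyparsing (a swapped-in technique, not what the script above does). It assumes Python 3; the class `TextExtractor`, the function `strip_tags` and the sample fragment are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text between tags, discarding the tags themselves."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &euro; into characters
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only runs between tags
            self.chunks.append(text)


def strip_tags(fragment):
    """Return the non-whitespace text pieces found in an HTML fragment."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered after the last tag
    return parser.chunks


# Hypothetical fragment resembling what SkipTo(tdEnd) might capture.
details = '<TD class="x">Dublin</TD><TD>&euro;29.99</TD>\n<TD>Daily</TD>'
print(strip_tags(details))
```

Applied to each match, this would be `strip_tags(srvrtokens["details"])`, keeping in mind that the original script is Python 2.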
Also, the data occurs in several blocks, but if I set the tdEnd parameter to a marker that terminates each block, it only brings back the first block. So I had to set tdEnd to '</TBODY>', which only occurs at the end of the page
(and so brings back all of the data).
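To illustrate the behaviour one would expect when each block has its own terminator, here is a sketch using the standard `re` module (swapped in for pyparsing's `SkipTo`). The non-greedy `(.*?)` pairs each start marker with the nearest following end marker, so every block comes back separately. The markers are the ones from the script; `find_blocks` and the sample page are hypothetical.

```python
import re


def find_blocks(page, start, end):
    """Yield the text between each start marker and the next end marker.

    The non-greedy '.*?' stops at the FIRST end marker after each start,
    so every block on the page is returned on its own.
    """
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    for match in pattern.finditer(page):
        yield match.group(1)


# Hypothetical page with two route blocks, each closed by '<!--=='.
page = (
    '<span class="routeTitle">Dublin to: Kerry</span> ... <!--== end -->'
    '<span class="routeTitle">Galway to: Aran</span> ... <!--== end -->'
)
for block in find_blocks(page, '"routeTitle">', "<!--=="):
    print(repr(block))
```

pyparsing's `SkipTo` likewise stops at the first occurrence of its target, so with `<!--==` as tdEnd each match would be expected to cover one block; if only the first block is returned, it may be worth checking whether the later blocks actually match the `tdStart + searchString` prefix.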
Any assistance would be gratefully received.
Thanks
Mick
Mick -
The latest pyparsing release includes the example htmlStripper.py, which might help you (although you have probably moved on with your life...)
Sorry for not being more responsive,
-- Paul