Re: [Htmlparser-user] n lines before/after searchstring
From: Derrick O. <Der...@Ro...> - 2006-02-05 13:09:28
The parser doesn't really deal in lines of text, since most HTML disregards line breaks (the <pre> tag is the only exception I can think of). What you probably want is the subsequent nodes. For this, use the children of the parent of the node you have. Some methods were recently added to AbstractNode (which TextNode inherits from) to handle this: getPreviousSibling() and getNextSibling(). These are only available in the latest Integration Build.

If you really want lines of text, the Page object, available from the parser, can be asked to fetch a line with getLine(). This method has two overloads: one takes a Cursor argument, the other an integer position. The position is available from the node you have via getStartPosition() or getEndPosition(). That gets you the contents of the line in the HTML stream for the node you have.

Subsequent lines are a little tougher to get hold of. The line information is held in a PageIndex object, which the Page doesn't expose, but it could if you added a method. With access to it, you could step through the lines of the file.

Derrick

quanta veloce wrote:
> Hi,
>
> Can HTMLParser allow one to extract into an array lines before or
> after a search string?
>
> For instance:
>
> <CENTER>
> <TABLE ALIGN="CENTER" BORDER=5>
> <TR>
> <TD width=150 align=center><B>Area</B></TD>
> <TD width=120 align=center><B>Instantaneous Load</B></TD>
> </TR>
> <TR>
> <TD>PJM MID ATLANTIC REGION</TD>
> <TD align=right>33929</TD>
> </TR>
> <TR>
> <TD>PJM WESTERN REGION</TD>
> <TD align=right>39400</TD>
> </TR>
> <TR>
> <TD>PJM SOUTHERN REGION</TD>
> <TD align=right>9857</TD>
> </TR>
> <TR>
> <TD>PJM RTO</TD>
> <TD align=right>83186</TD>
> </TR>
> </TABLE>
> </CENTER>
> <P><CENTER>Loads are calculated from raw telemetry data and are
> approximate.</CENTER>
> <CENTER>The displayed values are NOT official PJM Loads.</CENTER>
> <BR><BR><BR>
> <P><CENTER><H2>Current PJM Transmission Limits</H2></CENTER>
> <P align=center>None
>
> </BODY>
> </HTML>
>
> In the following URL I matched the string "Current PJM Transmission
> Limits" and I want to obtain any and all lines after this match...or
> even the next 3 lines, etc.,
>
> Any help would be appreciated!
> Thanks,
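
To illustrate the last point in the reply (stepping through lines once you have a line index), here is a minimal sketch in plain Java. It does not use HTMLParser at all: the LineStepper class and its methods are hypothetical stand-ins for what an exposed line index might offer, and the sample string is just the tail of the quoted page. In real use, the position would presumably come from getStartPosition() on the matched node rather than from indexOf().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, NOT part of HTMLParser: indexes line starts in a string. */
final class LineStepper {
    private final String text;
    private final List<Integer> lineStarts = new ArrayList<>();

    LineStepper(String text) {
        this.text = text;
        lineStarts.add(0); // the first line always starts at offset 0
        for (int i = 0; i < text.length(); i++) {
            // a new line starts after every '\n' that is not the last character
            if (text.charAt(i) == '\n' && i + 1 < text.length()) {
                lineStarts.add(i + 1);
            }
        }
    }

    /** Returns the zero-based index of the line containing the character offset. */
    int lineOf(int position) {
        if (position < 0 || position > text.length()) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        int lo = 0;
        int hi = lineStarts.size() - 1;
        while (lo < hi) { // binary search for the last line start <= position
            int mid = (lo + hi + 1) >>> 1;
            if (lineStarts.get(mid) <= position) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Returns the line at the given index, without its line terminator. */
    String line(int index) {
        int start = lineStarts.get(index);
        int end = index + 1 < lineStarts.size() ? lineStarts.get(index + 1) : text.length();
        String s = text.substring(start, end);
        while (s.endsWith("\n") || s.endsWith("\r")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    /** Returns up to count lines following the line that contains position. */
    List<String> linesAfter(int position, int count) {
        List<String> out = new ArrayList<>();
        for (int i = lineOf(position) + 1; i < lineStarts.size() && out.size() < count; i++) {
            out.add(line(i));
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<P><CENTER><H2>Current PJM Transmission Limits</H2></CENTER>\n"
                + "<P align=center>None\n"
                + "\n"
                + "</BODY>\n"
                + "</HTML>\n";
        LineStepper stepper = new LineStepper(html);
        int match = html.indexOf("Current PJM Transmission Limits");
        // print the next 3 lines after the line containing the match
        for (String l : stepper.linesAfter(match, 3)) {
            System.out.println(l);
        }
    }
}
```

Note that this works on raw source lines, so it inherits the caveat from the reply: line breaks carry no meaning in most HTML, and walking sibling nodes is likely the more robust route when the page layout changes.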