Thread: [Htmlparser-user] Problem parsing a link


I have a simple document from which I am trying to parse a link.

Here is the HTML:

<html>
<body>
<DL>
<DT>YOUR QUERY WAS:
</DL>
Select one of the following documents to retrieve.
<P>
<HR>
<P><DL>
<DT><B>1:</B> <!-- hit  --><A
HREF="/cgi-bin/view_search?query_text=postdate>20020701&txt_clr=White&bg_clr=Red&url=http://localhost/Testing/Report
1.html">20020702 Report 1</A>
<DD><font size="-1">Score: 1000, Size: 7.4 kbytes, Type: URL file</font>

</DL>
</body>
</html>

The parser is getting confused by the '>' after the postdate.  Instead
of returning the whole link:

http://localhost/cgi-bin/view_search?query_text=postdate>20020701&txt_clr=White&bg_clr=Red&url=http://localhost/Testing/Report
1.html

only a portion of the link is returned:

http://localhost/cgi-bin/view_search?query_text

If 'postdate>' is replaced by 'postdate=' then it functions
properly.  It seems the parser ends the tag at the first '>' it finds,
without checking whether that '>' sits inside the double-quoted
attribute value.
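
To illustrate what I mean by looking at the quotes, here is a rough sketch in Python. It is not htmlparser's code, and the function names find_tag_end and naive_tag_end are made up for this example. It only shows the general idea of skipping any '>' that falls inside a quoted attribute value; the truncation point it produces differs from the one I reported above, so the real cause inside the parser may be somewhat different.

```python
def naive_tag_end(html: str, start: int) -> int:
    """Return the index of the first '>' after the tag opening at html[start].

    This ignores quoting, so a '>' inside an attribute value ends the tag early.
    """
    return html.find(">", start + 1)


def find_tag_end(html: str, start: int) -> int:
    """Return the index of the '>' that really closes the tag at html[start].

    A '>' inside a single- or double-quoted attribute value is skipped.
    Returns -1 if the tag is never closed.
    """
    quote = None  # the quote character we are currently inside, if any
    for i in range(start + 1, len(html)):
        ch = html[i]
        if quote is not None:
            if ch == quote:
                quote = None  # closing quote of the attribute value
        elif ch in ('"', "'"):
            quote = ch  # opening quote of an attribute value
        elif ch == ">":
            return i
    return -1


# A shortened version of the anchor tag from the document above.
tag = '<A HREF="/cgi-bin/view_search?query_text=postdate>20020701&txt_clr=White">'

naive = tag[: naive_tag_end(tag, 0) + 1]
aware = tag[: find_tag_end(tag, 0) + 1]

print(naive)  # cut off right after 'postdate>'
print(aware)  # the whole tag
```

With the quote-aware scan the full HREF value is still available when the attributes are parsed; with the naive scan everything after 'postdate>' is lost before attribute parsing even starts.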

I am using the latest integration build (1.2-2002_08_31).

Before digging into the source code and trying to fix the problem, I
thought I would ask whether anyone has run into it before.

Thanks,

--stephen
