RE: [Htmlparser-developer] Writing OPTION tag
From: <dha...@or...> - 2002-08-14 07:06:41
Hi guys,

I am still trying to solve my problem with the scanner for my OPTION tag. I would really appreciate any help from the developers of the parsing engine. I think a solution may lie in knowing certain internals of the parser. Let me explain my problem in detail.

Assume the following 2 OPTION tags:

    <OPTION value="AltaVista Search">AltaVista
    <OPTION value="Lycos Search"></OPTION>

The OPTION tag does not explicitly require an end tag. Hence the first line is valid.

My parsing logic in scan() is as follows:

1. Disable existing parsers.
2. Read elements from the Reader.
3. Check whether the element is an EndTag for OPTION or SELECT (since OPTION tags are always under SELECT). If so, create an OptionTag object with the necessary values.
4. If it is not an EndTag, check whether it is a StringNode (this would be the value between the <OPTION> and </OPTION> tags). If so, it is the text of the OPTION tag and is stored temporarily. (It is used later in the constructor.)
5. If it is neither, it could be an error or the beginning of another tag (possibly another <OPTION> tag as above), and hence the current loop must be terminated and the option object must be constructed.

The problem with my input is that <OPTION value="AltaVista Search"> is read as an OptionTag, AltaVista is read as the StringNode, and then <OPTION value="Lycos Search"> is read. Since it is neither a StringNode nor an EndTag, the loop ends and an OptionTag is created from the first two values. However, since the second <OPTION> tag has already been read from the Reader, it is not scanned again as a new OptionTag, and hence I am missing this tag in my parsing.

I hope I have been able to explain my problem clearly. If not, I would be glad to clarify any points that are not understood.
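The lost-tag behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the parser's actual code: Node, NodeReader, unread(), and OptionScanDemo are hypothetical names invented for illustration, and the sketch assumes the reader can take back a node it has already handed out. It simplifies the loop so that any node other than text or </OPTION> ends the current option. With pushBack set to false it mimics the logic in this post and drops the second OPTION; with pushBack set to true the terminating node is returned to the reader and scanned as a new OPTION.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: none of these types belong to the real parser library.
class OptionScanDemo {
    enum Kind { TAG, TEXT, END_TAG }

    static final class Node {
        final Kind kind;
        final String text;
        Node(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Hypothetical reader with one extra capability: unread() puts a node back.
    static final class NodeReader {
        private final Deque<Node> pending;
        NodeReader(List<Node> nodes) { pending = new ArrayDeque<>(nodes); }
        Node read() { return pending.pollFirst(); }      // null at end of input
        void unread(Node node) { pending.addFirst(node); }
    }

    // Called after an opening OPTION tag was consumed; collects its text.
    static String scanOptionText(NodeReader reader, boolean pushBack) {
        StringBuilder text = new StringBuilder();
        Node node;
        while ((node = reader.read()) != null) {
            if (node.kind == Kind.TEXT) {
                text.append(node.text);
            } else if (node.kind == Kind.END_TAG && node.text.equalsIgnoreCase("OPTION")) {
                break;                                   // explicit </OPTION>: consume it
            } else {
                // Another tag (or some other end tag) ends this option implicitly.
                // Without pushback, this node is silently lost.
                if (pushBack) {
                    reader.unread(node);
                }
                break;
            }
        }
        return text.toString();
    }

    // Returns the text of every opening OPTION tag that was recognised.
    static List<String> scanAll(List<Node> nodes, boolean pushBack) {
        NodeReader reader = new NodeReader(nodes);
        List<String> found = new ArrayList<>();
        Node node;
        while ((node = reader.read()) != null) {
            if (node.kind == Kind.TAG && node.text.toUpperCase().startsWith("OPTION")) {
                found.add(node.text);
                scanOptionText(reader, pushBack);
            }
        }
        return found;
    }

    // The two OPTION tags from this post, as a node sequence.
    static List<Node> sample() {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node(Kind.TAG, "OPTION value=\"AltaVista Search\""));
        nodes.add(new Node(Kind.TEXT, "AltaVista "));
        nodes.add(new Node(Kind.TAG, "OPTION value=\"Lycos Search\""));
        nodes.add(new Node(Kind.END_TAG, "OPTION"));
        return nodes;
    }

    public static void main(String[] args) {
        System.out.println("without pushback: " + scanAll(sample(), false));
        System.out.println("with pushback:    " + scanAll(sample(), true));
    }
}
```

Whether the real Reader offers anything like unread() is an open question here; if it does not, the same effect might be had by letting the scanner hand the already-read tag back to its caller instead of discarding it.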
A snippet of code from scan() of HTMLOptionTagScanner is given below:

    // Step 1: disable the existing scanners, keeping them so they can be restored.
    Vector lScannerVector = HTMLParserUtils.adjustScanners(pReader);
    do {
        // Step 2: read the next element from the reader.
        lNode = pReader.readElement();
        System.out.println(lNode.toHTML());
        if (lNode instanceof HTMLEndTag) {
            // Step 3: </OPTION> or </SELECT> ends the current option.
            lEndTag = (HTMLEndTag)lNode;
            String lEndTagString = lEndTag.getText().toUpperCase();
            if (lEndTagString.equals("OPTION") || lEndTagString.equals("SELECT")) {
                endTagFound = true;
            }
        } else if (lNode instanceof HTMLStringNode) {
            // Step 4: text between <OPTION> and </OPTION>.
            lText.append(lNode.toHTML());
        } else if (lNode instanceof HTMLTag) {
            // Step 5: any other tag (e.g. the next <OPTION>) ends this option.
            // That tag has already been consumed from the reader at this point.
            endTagFound = true;
        }
    } while (!endTagFound);
    HTMLOptionTag lOptionTag = new HTMLOptionTag(0, lNode.elementEnd(), pTag.getText(), lText.toString(), pCurrLine);
    HTMLParserUtils.restoreScanners(pReader, lScannerVector);

Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457