Hi,
I'm having trouble parsing the following HTML file:
<html>
<head><script language="javascript">alert("<body>something</body>");</script></head>
<body><h1>Hello World!</h1></body>
</html>
When I ask the parser for the next body element, it returns "<body>something</body>" instead of the correct one, "<body><h1>Hello World!</h1></body>".
I tried to tell the parser to skip the head element and continue from there:
int afterHEADPosition = source.findNextElement(0, HTMLElementName.HEAD).getEnd();
Element bodyElement = source.findNextElement(afterHEADPosition, HTMLElementName.BODY);
but the call to source.findNextElement(0, HTMLElementName.HEAD) returns "<head><script language="javascript">alert("", and a call to source.findNextElement(0, HTMLElementName.HEAD).getEndTag() returns null. The parsing of the head element is cut short when the parser finds the <body> token inside the JavaScript string.
Does anybody know of a workaround for this? I'm guessing the JavaScript is not being parsed as part of the HTML, but in that case I'm wondering whether the parser should ignore everything inside <script> tags.
Thanks in advance for your help.
- Diego
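
One possible workaround, sketched here with plain Java string handling rather than any parser API, is to blank out the contents of every <script> element before searching for tags. Each masked character is replaced by a space, so positions in the masked text still line up with the original file. The class name ScriptMasker and the method maskScriptContent are hypothetical names chosen for this illustration, and the regular expression assumes that script start tags contain no ">" inside attribute values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScriptMasker {
    // Matches one whole script element: start tag, content, end tag.
    // Assumption: no '>' appears inside the start tag's attribute values.
    private static final Pattern SCRIPT = Pattern.compile(
            "(<script\\b[^>]*>)(.*?)(</script\\s*>)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns a same-length copy of html with script contents replaced by spaces. */
    public static String maskScriptContent(String html) {
        StringBuilder out = new StringBuilder(html);
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            // Group 2 is the text between <script ...> and </script>.
            for (int i = m.start(2); i < m.end(2); i++) {
                out.setCharAt(i, ' ');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<html>\n"
                + "<head><script language=\"javascript\">"
                + "alert(\"<body>something</body>\");</script></head>\n"
                + "<body><h1>Hello World!</h1></body>\n"
                + "</html>";
        String masked = maskScriptContent(html);
        // Search the masked text, then cut the same range out of the original.
        int start = masked.indexOf("<body");
        int end = masked.indexOf("</body>", start) + "</body>".length();
        System.out.println(html.substring(start, end));
        // prints: <body><h1>Hello World!</h1></body>
    }
}
```

Because the offsets are preserved, the masked text could presumably also be handed to the parser in place of the original, with the resulting positions then applied to the unmasked file. That step is an assumption and is not tested here.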