[Htmlparser-developer] minor code change to StringParser...

Hello,
I found a bug in the current CVS release of the HTML Parser; the code
below is the fix.

minor code change...

// line 86 of StringParser, in method find(...
if (ignoreStateMode && (ch=='\'' || ch=='"')) {
    if (state==PARSE_IGNORE_STATE)
        state=PARSE_HAS_BEGUN_STATE;
    else {
        //-----> make sure we're not testing outside the length of input.
        if (i+1 < input.length() && input.charAt(i+1)=='<')
            state = PARSE_IGNORE_STATE;
    }
}

-------------------------------------------------------------------
When parsing the HTML below, an index out of bounds exception is thrown
when the parser reaches the closing single quote of the word 'hello'.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<script language="JavaScript" type="text/JavaScript">
// if this fails, output a 'hello'
if (true)
{
         //something good...
}
</script>
</body>
</html>
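
The failure and the guard can be tried in isolation with a small sketch
like the one below. It is not the parser's own code: the class and method
names are hypothetical, and it assumes the parser is handed the script
comment as a single line, so that the closing quote is the last character
of the input.

```java
class LookaheadGuardDemo {

    // Hypothetical helper (not part of the HTML Parser): true only when a
    // character exists after position i and that character is '<'.
    static boolean nextCharIsTagOpen(String input, int i) {
        return i + 1 < input.length() && input.charAt(i + 1) == '<';
    }

    // The same lookahead without the length check, for comparison.
    static boolean nextCharIsTagOpenUnchecked(String input, int i) {
        return input.charAt(i + 1) == '<';
    }

    public static void main(String[] args) {
        // Assumption: the quote that closes 'hello' ends the input line.
        String line = "// if this fails, output a 'hello'";
        int lastQuote = line.length() - 1;

        // Guarded lookahead: nothing follows the quote, so this prints false.
        System.out.println(nextCharIsTagOpen(line, lastQuote));

        // Unguarded lookahead: charAt(i+1) is past the end of the string.
        try {
            nextCharIsTagOpenUnchecked(line, lastQuote);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unchecked lookahead threw "
                    + e.getClass().getSimpleName());
        }
    }
}
```

Because && short-circuits, charAt(i+1) is never evaluated once the length
check fails, which is why the one-line guard is enough.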

James Moliere