[Htmlparser-developer] Re: Daily bugs ... and one little fix:)

When I parse this url:
www.revues.org/calenda/articles/1083.html
Parsing this file takes more than 40 seconds, so I looked for the problem
that might be reducing performance.

First, I tried to fix this problem by preventing it from occurring.

In HTMLReader.java:
------------------------------
protected boolean readNextLine()
{
   boolean skipLine = true;
   if (posInLine != -1 && !(line != null && node.elementEnd()+1 >= line.length()))
   {
     for (int i = 0; i < line.length(); i++)
     {
       if (line.charAt(i) != ' ')
       {
         skipLine = false;
         break;
       }
     }
   }
   return skipLine;
}
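
For reference, the whitespace test at the heart of that loop can be sketched on
its own. This is only an illustration: the class and method names
(BlankLineSketch, isOnlySpaces) are hypothetical and not part of HTMLReader,
and treating a null line as blank is an assumption made here.

```java
final class BlankLineSketch {
    // Hypothetical helper, not HTMLReader code: reports whether a line
    // contains nothing but space characters, so that it can be skipped.
    static boolean isOnlySpaces(String line) {
        if (line == null) {
            return true; // assumption: a missing line counts as blank
        }
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) != ' ') {
                return false; // found real content, the line must be parsed
            }
        }
        return true;
    }
}
```

Like the loop above, it only looks at the space character; tabs count as
content.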

Then I read the surrounding sources and noticed that it would be a better
idea to patch HTMLStringNode.java.
The solution is to go to state 1 when you reach the end of a string of
spaces.

if (state==1)
{
   text+=input.charAt(i);
}
// patch beginning here
if (state==0 && i==input.length()-1)
   state=1;
// patch ending here
if (state==1 && i==input.length()-1)
{
   input = reader.getNextLine();
///.....
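
To show the effect of the patch in isolation, here is a toy two-state scanner.
It is a sketch under assumptions, not the HTMLStringNode code: the class
SpaceLineSketch, the method scan and the patched flag are hypothetical, and an
Iterator over a list of lines stands in for reader.getNextLine(). State 0 means
"still skipping spaces" and state 1 means "collecting text".

```java
import java.util.Iterator;
import java.util.List;

final class SpaceLineSketch {
    // Toy scanner (hypothetical): returns the text collected across lines.
    static String scan(List<String> lines, boolean patched) {
        Iterator<String> reader = lines.iterator(); // stand-in for the real reader
        StringBuilder text = new StringBuilder();
        int state = 0;
        String input = reader.hasNext() ? reader.next() : null;
        int i = 0;
        while (input != null && i < input.length()) {
            char c = input.charAt(i);
            if (state == 0 && c != ' ') {
                state = 1; // first non-space character starts the text
            }
            if (state == 1) {
                text.append(c);
            }
            boolean atEnd = (i == input.length() - 1);
            // The patch: a line made only of spaces still moves to state 1
            if (patched && state == 0 && atEnd) {
                state = 1;
            }
            // Only state 1 continues on to the next line
            if (state == 1 && atEnd) {
                input = reader.hasNext() ? reader.next() : null;
                i = 0;
                continue;
            }
            i++;
        }
        return text.toString();
    }
}
```

With the lines "   " and "hello", the unpatched scanner stops at the end of the
space-only line and collects nothing, while the patched one moves on to the
next line and collects "hello".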

I think the second solution is better. I hope this fix will help you, Somik,
to patch the code in the next integration release.

This fix is incorporated. Thanks. I've written a test case to trap this bug.

Regards,
Somik