[Htmlparser-developer] Re: Daily bugs ... and one little fix:)
From: Somik R. <so...@ya...> - 2002-07-19 08:44:48
When I parse this URL:

    www.revues.org/calenda/articles/1083.html

parsing the file takes more than 40 seconds, so I looked for what might be
hurting performance.

First, I tried to fix the problem by preventing it from occurring.

In HTMLReader.java:
------------------------------
    protected boolean readNextLine()
    {
        boolean skipLine = true;
        if (posInLine != -1 && !(line != null && node.elementEnd() + 1 >= line.length()))
        {
            for (int i = 0; i < line.length(); i++)
            {
                if (line.charAt(i) != ' ')
                {
                    skipLine = false;
                    break;
                }
            }
        }
        return skipLine;
    }

Then I read the surrounding source and noticed that it would be a better idea
to patch HTMLStringNode.java.

The solution is to go to state 1 when you reach the end of a string of spaces.

    if (state == 1)
    {
        text += input.charAt(i);
    }
    // patch beginning here
    if (state == 0 && i == input.length() - 1) state = 1;
    // patch ending here
    if (state == 1 && i == input.length() - 1)
    {
        input = reader.getNextLine();
        ///.....

I think the second solution is better. I hope this fix will help you, Somik,
to patch the code in the next integration release.

This fix is incorporated. Thanks. I've written a test case to trap this bug.

Regards,
Somik
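
For readers following the thread, here is a minimal sketch of the state change described in the second patch. It is a toy model, not the actual HTMLStringNode code: the class name `SpaceLineSketch`, the method `collectText`, the `patched` flag, and the use of an `Iterator` in place of the reader are all assumptions made for illustration. It only shows why a line made up entirely of spaces leaves the scanner in state 0 unless the patch forces state 1 at the end of the line.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical toy model of the scanner loop; not the library's real code.
public class SpaceLineSketch {

    // state 0: only spaces seen so far; state 1: collecting text.
    // 'patched' switches the extra end-of-line check from the post on or off.
    public static String collectText(List<String> lines, boolean patched) {
        Iterator<String> reader = lines.iterator(); // stand-in for the real reader
        String input = reader.hasNext() ? reader.next() : null;
        StringBuilder text = new StringBuilder();
        int state = 0;
        int i = 0;
        while (input != null && i < input.length()) {
            char c = input.charAt(i);
            if (state == 0 && c != ' ') {
                state = 1; // first non-space character starts the text
            }
            if (state == 1) {
                text.append(c);
            }
            // patch: a line of only spaces still has to move on to the next line
            if (patched && state == 0 && i == input.length() - 1) {
                state = 1;
            }
            if (state == 1 && i == input.length() - 1) {
                input = reader.hasNext() ? reader.next() : null;
                i = -1; // restart at the beginning of the new line
            }
            i++;
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("   ", "hello");
        System.out.println("[" + collectText(lines, false) + "]"); // prints "[]"
        System.out.println("[" + collectText(lines, true) + "]");  // prints "[hello]"
    }
}
```

In this model the unpatched scanner stops on the space-only line and returns nothing, so a caller would have to come back for the rest. With the patch it continues to the next line in the same pass.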