You could have a helper thread which pulls in files from the disk and
provides them to a second thread which does the HTML processing.
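
A minimal sketch of that two-thread arrangement, using only the JDK. The class name PrefetchPipeline, the method processAll, the queue capacity of 16 and the UTF-8 charset are all assumptions made for illustration; extractText is a placeholder for whatever HTML processing you actually do with each page.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Hypothetical helper: one thread reads files, the calling thread processes them.
final class PrefetchPipeline {
    // Unique sentinel object; compared by identity to mark the end of input.
    private static final String END = new String("END");

    static List<String> processAll(List<Path> files, Function<String, String> extractText) {
        // Bounded queue so the reader cannot run arbitrarily far ahead (capacity is a guess).
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        AtomicReference<Exception> failure = new AtomicReference<>();

        Thread reader = new Thread(() -> {
            try {
                for (Path file : files) {
                    // Assumes the pages are UTF-8 encoded; adjust for your data.
                    queue.put(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
                }
            } catch (IOException | InterruptedException e) {
                failure.set(e);
            } finally {
                try {
                    queue.put(END); // always tell the consumer to stop
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "page-reader");
        reader.setDaemon(true); // do not keep the JVM alive if the consumer gives up
        reader.start();

        List<String> results = new ArrayList<>();
        try {
            // Process pages in the order they were read until the sentinel arrives.
            for (String html = queue.take(); html != END; html = queue.take()) {
                results.add(extractText.apply(html));
            }
            reader.join();
        } catch (InterruptedException e) {
            reader.interrupt();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while processing pages", e);
        }
        if (failure.get() != null) {
            throw new IllegalStateException("reading pages failed", failure.get());
        }
        return results;
    }
}
```

This only helps if disk reads and processing each take a meaningful share of the time, since the gain comes from overlapping the two.
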
This doesn't really look like an HTML Parser issue, unless there is a
bug in HTML Parser that makes it slow at pulling in files from the
disk. You could check this by reading the file into a String first,
then giving that String to a parser with Parser.setInputHTML and
calling Parser.parse(null). If there's a noticeable difference in
speed doing it this way, it might be worth looking into the code of
the HTML Parser constructor you are using to see if there are any
inefficiencies in it.
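
To see where the time actually goes, something like the following could time the disk read and the parse separately. This is only a sketch: ReadVersusParseTimer, Timing and time are hypothetical names, UTF-8 is an assumed charset, and extractText stands in for the parsing step (with HTML Parser on the classpath it would wrap the setInputHTML call and the StringBean visit).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

// Hypothetical helper that times reading a file and parsing it as two separate steps.
final class ReadVersusParseTimer {

    /** Result of one measurement: elapsed nanoseconds per step plus the extracted text. */
    static final class Timing {
        final long readNanos;
        final long parseNanos;
        final String text;

        Timing(long readNanos, long parseNanos, String text) {
            this.readNanos = readNanos;
            this.parseNanos = parseNanos;
            this.text = text;
        }
    }

    static Timing time(Path file, Function<String, String> extractText) {
        long start = System.nanoTime();
        String html;
        try {
            // Step 1: pull the whole file into a String (assumes UTF-8 content).
            html = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        long afterRead = System.nanoTime();

        // Step 2: run the parser on the in-memory String only, so no disk access is included.
        String text = extractText.apply(html);
        long afterParse = System.nanoTime();

        return new Timing(afterRead - start, afterParse - afterRead, text);
    }
}
```

Run it over a few hundred files and sum the two numbers. A single measurement is noisy, and the first few calls include JIT warm-up and a cold disk cache.
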
Ian
On 2/26/07, sajid khan <ass...@gm...> wrote:
> Hi,
> I am using HTMLParser to extract the content of HTML pages. I
> have noticed that the bulk of the time is spent extracting the
> information rather than processing the data.
> The code looks like this:
>
> // inputStream is of type InputStream. It carries the page source of
> // an HTML page.
> Page page = new Page(inputStream, null);
> Lexer lexer = new Lexer(page);
> Parser parser = new Parser(lexer);
> StringBean sb = new StringBean();
> parser.visitAllNodesWith(sb);
> String text = sb.getStrings();
> // Doing something with text.
>
> Note that I have already crawled a few pages with the help of a
> crawler, so the HTML pages are on my hard disk.
>
> Can anybody please help me improve the speed of my program?
>
> regards
> Sajid Khan.