Re: [Htmlparser-user] How can I improve the speed of extracting the information of a html page thro
From: Ian M. <ian...@gm...> - 2007-02-28 16:41:29
You could have a helper thread that pulls in files from the disk and hands them to a second thread that does the HTML processing.

This doesn't really seem like an HTML Parser issue, unless there is a bug in HTML Parser that makes it slow at pulling in files from the disk. You could check this by reading the file into a String first, then giving that String to a parser with Parser.setInputHTML and calling Parser.parse(null). If there's a noticeable difference in speed doing it this way, it might be worth looking into the code of the HTML Parser constructor you are using to see if there are any inefficiencies in it.

Ian

On 2/26/07, sajid khan <ass...@gm...> wrote:
> Hi,
> I am using HTMLParser to extract the content of an HTML page. I have
> noticed that the bulk of the time is spent extracting the information
> rather than processing the data.
> The code looks like this:
>
> // inputStream is of type InputStream. It carries the page source of an
> // HTML page.
> Page page = new Page(inputStream, null);
> Lexer lexer = new Lexer(page);
> Parser parser = new Parser(lexer);
> StringBean sb = new StringBean();
> parser.visitAllNodesWith(sb);
> String text = sb.getStrings();
> // Doing something with text.
>
> I should mention that I have crawled a few pages with the help of a
> crawler, so the HTML pages are on my hard disk.
>
> Can anybody please help me improve the speed of my program?
>
> regards
> Sajid Khan.
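
The helper-thread suggestion in Ian's reply could be sketched roughly as below, using only the Java standard library. This is a minimal sketch under stated assumptions, not code from the thread: the class name `PrefetchPipeline`, the method `processAll`, the queue capacity of 16, and the UTF-8 decoding are all hypothetical choices made for illustration. The `extractor` function is a placeholder for whatever HTML processing is done per page; in the setup discussed above, that is where the String would be handed to the parser.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

public class PrefetchPipeline {
    // Sentinel that marks the end of input; compared by identity, never by content.
    private static final String END = new String("END");

    /**
     * Reads each file on a helper thread while the calling thread applies
     * `extractor` to the pages already read. Results keep the input order.
     * `extractor` is a hypothetical stand-in for the real HTML processing.
     */
    public static List<String> processAll(List<Path> files,
                                          Function<String, String> extractor)
            throws InterruptedException {
        // Bounded queue (capacity is an assumption) so the reader cannot
        // run arbitrarily far ahead of the processing thread.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        AtomicReference<IOException> failure = new AtomicReference<>();

        Thread reader = new Thread(() -> {
            try {
                for (Path file : files) {
                    byte[] bytes = Files.readAllBytes(file);
                    // Assumes the pages are UTF-8 encoded.
                    queue.put(new String(bytes, StandardCharsets.UTF_8));
                }
            } catch (IOException e) {
                failure.set(e); // reported to the caller after the loop below
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                try {
                    queue.put(END); // always signal completion to the consumer
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        reader.start();

        // Consumer: process pages as they arrive until the sentinel shows up.
        List<String> results = new ArrayList<>();
        for (String html = queue.take(); html != END; html = queue.take()) {
            results.add(extractor.apply(html));
        }
        reader.join();

        if (failure.get() != null) {
            throw new UncheckedIOException(failure.get());
        }
        return results;
    }
}
```

Whether this helps depends on where the time actually goes. If disk reads dominate, overlapping them with processing should reduce wall-clock time; if parsing dominates, the read-into-a-String comparison described in the reply is the more useful first measurement.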