Re: [Htmlparser-developer] Re: HTMLReader design needs to be modified (dev opinion solicited)
From: Leslie R. <le...@op...> - 2002-12-09 18:18:40
Stephen J. Harrington wrote:

> I already created a work around, so it doesn't kill me.
>
> I just hated to have to spend the time to make a new connection to the
> source I am scraping since the pipe I am using is small.

Don't make a new connection. Just do a mark(10000) right after the Reader is opened, call parse, and do a Reader.reset() before calling parse again. The connection will remain, and the BufferedReader will hold onto the html string between calls to parse. The # 10000 is an example only -- you'll have to provide a value large enough to accommodate whatever stream length you expect or the subsequent reset will fail.

> I would be fine with it the way it is, provided the docs are updated.
>
> Thanks for looking into this.
>
> --stephen
>
> Somik Raha wrote:
>
>> Hi Folks,
>>
>> We've come up with an interesting problem - there was a
>> request by Steve Harrington recently that we support
>> multiple-sequential parsing, i.e. use the same parser object multiple
>> times to parse instead of creating a new one each time.
>> Unfortunately this has caused us to play around with the reader and
>> try to mark and reset streams. This is not such a good idea as for
>> large streams there is no guarantee that a reset will work. Leslie
>> suggests that we note this in the javadoc, and roll back this
>> feature. Our complete bug report and discussion is at
>> https://sourceforge.net/tracker/index.php?func=detail&aid=649133&group_id=24399&atid=381399
>> The bug id is #649133. A discussion of this bug is in order, and it
>> would be good if developers can participate with their views.
>>
>> Steve --> It will be good to hear your views on this.
>>
>> Regards,
>> Somik

--
Leslie Rohde
mailto:le...@op...
http://www.optitext.com
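
For anyone reading this thread later, here is a minimal sketch of the mark/reset approach described above, using only java.io. It is not the parser's own code: the class MarkResetSketch and the helpers consume() and readTwice() are hypothetical names, consume() merely stands in for a call to parse, and a StringReader stands in for the real connection. The read-ahead limit of 10000 is the example value from the message, not a recommendation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class MarkResetSketch {

    // Hypothetical stand-in for one parse pass: drains the reader
    // and returns everything it saw.
    public static String consume(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[256];
        int count;
        while ((count = reader.read(buffer)) != -1) {
            text.append(buffer, 0, count);
        }
        return text.toString();
    }

    // Reads the same source twice without reopening it.
    // readAheadLimit must be at least as large as the content expected
    // between mark() and reset(); otherwise reset() may throw IOException.
    public static String[] readTwice(Reader source, int readAheadLimit) throws IOException {
        BufferedReader in = new BufferedReader(source);
        in.mark(readAheadLimit);      // remember the start of the stream
        String first = consume(in);   // first "parse"
        in.reset();                   // rewind to the mark; source is not reopened
        String second = consume(in);  // second "parse" is served from the buffer
        return new String[] { first, second };
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body>hello</body></html>";
        String[] passes = readTwice(new StringReader(html), 10000);
        System.out.println(passes[0].equals(passes[1])); // prints "true"
    }
}
```

The trade-off is the one raised in the quoted message: the BufferedReader has to keep everything read since the mark in memory, so for large or unbounded streams a suitable limit may not be known in advance and the reset cannot be relied on.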