Re: [Htmlparser-user] decouple parser from URLConnection
From: Antony S. <ant...@gm...> - 2006-03-07 04:08:14
Thank you. I will use your suggested approach if my current approach does not work out. For now I have come up with a way to provide a URLConnection backed by a byte array (instead of a TCP connection) and to use that connection to construct the parser object. I have attached the code file. It is ugly and very specific to my current experiments. I use it like this:

    URL urlob = ByteBufferURL.fromByteArray(new URL("http://original url string so relative links get resolved right"), bytearray, bytecontentlength);
    Parser parser = new Parser(urlob.openConnection());

As desired, this does not cause any network activity (name resolution, connecting, etc.), at least in my limited testing. The advantage, in my opinion, is that it keeps the rest of the code simple (hopefully).

I am replying on the list since this may be useful to Luís Gomes. I have other, unrelated questions that I'll ask in a separate thread.

Thanks for the pointers.

-Antony

On 3/4/06, Derrick Oswald <Der...@ro...> wrote:
> Luís,
>
> I believe what you want to do is possible with the current API.
>
>     Page page = new Page (new InputStreamSource (input, charset));
>     page.setUrl (url);
>     Parser parser = new Parser (new Lexer (page));
>
> You would use the HTTP headers to figure out if it's gzipped (and use a
> GZIPInputStream) and determine the charset yourself.
>
> Derrick
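Antony's attached ByteBufferURL class is not reproduced in the archive. What follows is a hypothetical sketch of the same idea in plain java.net, assuming a custom URLStreamHandler is an acceptable way to obtain a URL whose openConnection() serves bytes already in memory. The names ByteArrayUrl, ByteArrayConnection and fromByteArray are illustrative only; they are not his code and not part of HTML Parser.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

/** Hypothetical sketch: a URL whose connection reads an in-memory byte array. */
final class ByteArrayUrl {

    private ByteArrayUrl() {}

    /** URLConnection that serves the first 'length' bytes of 'data'; no sockets involved. */
    static final class ByteArrayConnection extends URLConnection {
        private final byte[] data;
        private final int length;

        ByteArrayConnection(URL url, byte[] data, int length) {
            super(url);
            this.data = data;
            this.length = length;
        }

        @Override
        public void connect() {
            connected = true; // nothing to open: the content is already in memory
        }

        @Override
        public InputStream getInputStream() {
            connect();
            return new ByteArrayInputStream(data, 0, length);
        }

        @Override
        public int getContentLength() {
            return length;
        }

        @Override
        public String getContentType() {
            return "text/html"; // assumption: the buffer holds HTML
        }
    }

    /**
     * Returns a URL spelled exactly like 'original' (so relative links resolve against it)
     * whose openConnection() returns a ByteArrayConnection over 'data'.
     */
    static URL fromByteArray(URL original, byte[] data, int length)
            throws MalformedURLException {
        URLStreamHandler handler = new URLStreamHandler() {
            @Override
            protected URLConnection openConnection(URL u) {
                return new ByteArrayConnection(u, data, length);
            }
        };
        // URL(URL context, String spec, URLStreamHandler handler) lets us attach our handler.
        return new URL((URL) null, original.toExternalForm(), handler);
    }
}
```

With such a URL, the call from the message becomes `new Parser(url.openConnection())`. Connection methods not overridden here fall back to URLConnection's defaults (header lookups return null), so it may be worth checking which ones the parser actually queries.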
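Derrick's suggestion leaves the header handling to the caller. A minimal sketch of that step in plain Java might look like the following; the class name HeaderDecoding, the simplified Content-Type parsing, and the ISO-8859-1 fallback (the HTTP/1.1 default for text types) are assumptions, not anything HTML Parser provides.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper: turn HTTP header values into a decoded stream and a charset name. */
final class HeaderDecoding {

    private HeaderDecoding() {}

    /** Extracts the charset parameter from a Content-Type value, or returns 'fallback'. */
    static String charsetFrom(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String cs = p.substring("charset=".length()).trim();
                // Strip optional quotes, e.g. charset="UTF-8".
                if (cs.length() >= 2 && cs.startsWith("\"") && cs.endsWith("\"")) {
                    cs = cs.substring(1, cs.length() - 1);
                }
                return cs.isEmpty() ? fallback : cs;
            }
        }
        return fallback;
    }

    /** Wraps the raw body in a GZIPInputStream when Content-Encoding says gzip. */
    static InputStream decodedBody(byte[] body, String contentEncoding) throws IOException {
        InputStream in = new ByteArrayInputStream(body);
        if (contentEncoding != null) {
            String enc = contentEncoding.trim();
            if (enc.equalsIgnoreCase("gzip") || enc.equalsIgnoreCase("x-gzip")) {
                in = new GZIPInputStream(in);
            }
        }
        return in;
    }
}
```

The resulting stream and charset name are the `input` and `charset` that Derrick's snippet passes to `new InputStreamSource (input, charset)`.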