From: Ahmed A. <asa...@ya...> - 2014-03-14 22:01:15
Hi David,

The only known issue is 1577, where HtmlUnit makes a new socket connection per call to getPage(); this does not happen in 2.13.

I suggest two things: revert to 2.13, and call HttpClient or plain java.net directly with very minimal logic, so that we know whether the problem is in HtmlUnit or in Java.

Ahmed

> On Mar 14, 2014, at 11:05 PM, David Noel <dav...@gm...> wrote:
>
> I've encountered an issue while scaling a Java project that I'm not
> sure how to resolve. Any thoughts would be appreciated.
>
> The code is a crawler that uses HtmlUnit's getPage method. I'm running
> 100 threads per instance. When I have 1 instance up and running
> everything is fine. When I scale it to a second machine, though, I start
> having trouble. Calls to getPage keep throwing UnknownHostExceptions.
> Roughly 1 out of every 20 calls throws this exception. For some reason
> it's unable to resolve domain names, and it's not just the crawlers:
> my entire network starts to fail on DNS queries. On different systems
> on the same network I get 'unable to resolve host' errors in my web
> browser periodically when loading URLs. Usually when I retry it goes
> through, but it keeps happening sporadically as long as the crawlers
> are running.
>
> So many things could be going wrong here. Thinking maybe it was my
> provider throttling DNS queries, I've tried changing DNS servers, but
> that's done nothing. Thinking it might be a bandwidth issue, I checked
> systat, but the cumulative load is well under what my line can handle.
> What else could be causing this? My network is pretty simple: Provider
> <--> modem <--> 2 routers running pfSense <--> Servers and
> workstations. The servers are running FreeBSD, and the workstations
> run FreeBSD, Windows, and OS X.
>
> Has anyone encountered this before? Does anyone have any thoughts on
> what might be causing it?
>
> My only other thought is that maybe pfSense is doing something
> strange, so if I can't come up with any better ideas I'll try plugging
> the servers directly into the modem. I'd rather have them behind the
> routers though, so this would be a less-than-ideal solution.
>
> -David
>
> _______________________________________________
> Htmlunit-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlunit-user
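
As a rough illustration of the "very minimal logic" check suggested in the reply, the sketch below resolves host names from many threads using only java.net.InetAddress, with no HtmlUnit involved. It is only a sketch under assumptions: the class name DnsProbe, the Resolver hook, the host list, and the thread and round counts are all hypothetical and not taken from David's crawler. If lookups fail here at a similar rate, the cause is more likely the resolver or the network than HtmlUnit.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DnsProbe {

    /** Hypothetical hook so the lookup can be swapped out, for example in a test. */
    public interface Resolver {
        void resolve(String host) throws UnknownHostException;
    }

    /**
     * Resolves every host `rounds` times on a pool of `threads` workers
     * and returns how many lookups threw UnknownHostException.
     */
    public static int countFailures(List<String> hosts, int threads, int rounds,
                                    Resolver resolver) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        final AtomicInteger failures = new AtomicInteger();
        for (int r = 0; r < rounds; r++) {
            for (final String host : hosts) {
                pool.execute(() -> {
                    try {
                        resolver.resolve(host);
                    } catch (UnknownHostException e) {
                        failures.incrementAndGet();
                    }
                });
            }
        }
        // Stop accepting work, then wait for the queued lookups to finish.
        pool.shutdown();
        if (!pool.awaitTermination(10, TimeUnit.MINUTES)) {
            pool.shutdownNow();
        }
        return failures.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder hosts (assumption); substitute the domains the crawler visits.
        List<String> hosts = Arrays.asList("example.com", "example.org", "example.net");
        int rounds = 20;
        // 100 threads mirrors the per-instance thread count mentioned in the question.
        int failures = countFailures(hosts, 100, rounds, InetAddress::getByName);
        System.out.println(failures + " of " + (hosts.size() * rounds) + " lookups failed");
    }
}
```

Note that the JVM normally caches successful lookups for a while (the networkaddress.cache.ttl security property controls this), so a long list of distinct host names exercises the resolver far more than repeating a few.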