From: David N. <dav...@gm...> - 2014-03-18 13:32:01
It seems that pfSense is the culprit. I loaded the crawlers on a few
servers with 100 threads per instance, waited for UnknownHostExceptions
to be thrown, then plugged a laptop directly into my modem, bypassing
my 2 pfSense routers. All DNS queries have gone through, no problem.

I'm contacting the pfSense lists to see if anyone there knows what
might be going on. It's probably a misconfiguration on my part... or
maybe it just needs to be tuned differently. I'd be surprised if
something were wrong with pfSense itself.

Thanks for your input, Ahmed.

-David

On 3/14/14, Ahmed Ashour <asa...@ya...> wrote:
> Hi David,
>
> The only known issue is 1577, where HtmlUnit makes a new socket connection per
> call to getPage(), which is not happening in 2.13.
>
> I suggest two things: revert to 2.13, and directly use HttpClient or java.net
> with very minimal logic, so that we know if it's HtmlUnit or Java.
>
> Ahmed
>
>> On Mar 14, 2014, at 11:05 PM, David Noel <dav...@gm...> wrote:
>>
>> I've encountered an issue while scaling a Java project that I'm not
>> sure how to resolve. Any thoughts would be appreciated.
>>
>> The code is a crawler that uses HtmlUnit's getPage method. I'm running
>> 100 threads per instance. When I have 1 instance up and running
>> everything is fine. When I scale it to a second machine, though, I start
>> having trouble. Calls to getPage keep throwing UnknownHostExceptions.
>> Roughly 1 out of every 20 calls throws this exception. For some reason
>> it's unable to resolve domain names... and it's not just the crawlers,
>> my entire network starts to bug on DNS queries. On different systems
>> on the same network I get 'unable to resolve host' errors in my web
>> browser periodically when loading URLs. Usually when I retry it goes
>> through, but it keeps happening sporadically as long as the crawlers
>> are running.
>>
>> So many things could be going wrong here. Thinking maybe it was my
>> provider throttling DNS queries, I've tried changing DNS servers, but
>> that's done nothing. Thinking it might be a bandwidth issue, I checked
>> systat, but the cumulative load is well under what my line can handle.
>> What else could be causing this? My network is pretty simple: Provider
>> <--> modem <--> 2 routers running pfSense <--> Servers and
>> workstations. The servers are running FreeBSD, and the workstations
>> run FreeBSD, Windows, and OSX.
>>
>> Has anyone encountered this before? Does anyone have any thoughts on
>> what might be causing it?
>>
>> My only other thought is that maybe pfSense is doing something
>> strange, so if I can't come up with any better ideas I'll try plugging
>> the servers directly into the modem. I'd rather have them behind the
>> routers though, so this would be a less-than-ideal solution.
>>
>> -David
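
For anyone who wants to try the "very minimal logic" test suggested above, here is a rough sketch that takes HtmlUnit out of the picture and only counts failed DNS lookups from a pool of threads. It uses plain java.net (InetAddress.getByName); the DnsProbe class, the Resolver interface, and the countFailures method are names made up for this sketch, and the default hostnames are placeholders. It is a diagnostic starting point, not a reproduction of the crawler.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class DnsProbe {

    /** Hypothetical hook so the lookup can be swapped out; not an HtmlUnit API. */
    @FunctionalInterface
    interface Resolver {
        void resolve(String host) throws UnknownHostException;
    }

    /** Resolves every host on a fixed-size thread pool and returns how many lookups failed. */
    static int countFailures(List<String> hosts, int threads, Resolver resolver)
            throws InterruptedException {
        AtomicInteger failures = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String host : hosts) {
            pool.execute(() -> {
                try {
                    resolver.resolve(host);
                } catch (UnknownHostException e) {
                    // Same exception type the crawler reported from getPage().
                    failures.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        // Generous upper bound (an assumption); lookups normally finish far sooner.
        pool.awaitTermination(10, TimeUnit.MINUTES);
        return failures.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Pass real hostnames as arguments; the defaults are placeholders.
        List<String> hosts = args.length > 0
                ? Arrays.asList(args)
                : Arrays.asList("example.com", "example.org");
        // 100 threads mirrors the per-instance thread count described in the thread.
        int failed = countFailures(hosts, 100, InetAddress::getByName);
        System.out.println(failed + " of " + hosts.size() + " lookups failed");
    }
}
```

Note that the JVM caches successful lookups for a while by default, so feeding it many distinct hostnames is more likely to exercise the network than repeating one name. If this sketch also shows failures behind the routers but none when connected directly to the modem, that would point away from HtmlUnit.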