From: David N. <dav...@gm...> - 2014-03-19 14:48:10
Well, it may not be pfSense after all. I connected the servers directly
to the modem, ran the crawlers, and I'm still getting
UnknownHostExceptions. It must be either my modem, rate limiting on my
provider's DNS servers, or the line I'm on itself.

On 3/18/14, David Noel <dav...@gm...> wrote:
> It seems that pfSense is the culprit. I loaded the crawlers on a few
> servers with 100 threads per instance, waited for
> UnknownHostExceptions to be thrown, then plugged a laptop directly
> into my modem, bypassing my 2 pfSense routers. All DNS queries have gone
> through, no problem. I'm contacting the pfSense lists to see if anyone
> there knows what might be going on. It's probably a misconfiguration
> on my part... or maybe it just needs to be tuned differently. I'd be
> surprised if something were wrong with pfSense itself.
>
> Thanks for your input, Ahmed.
>
> -David
>
> On 3/14/14, Ahmed Ashour <asa...@ya...> wrote:
>> Hi David,
>>
>> The only known issue is 1577, where HtmlUnit makes a new socket
>> connection per call to getPage(), which is not happening in 2.13.
>>
>> I suggest two things: revert to 2.13, and directly use HttpClient or Java
>> net with very minimal logic, so that we know if it's HtmlUnit or Java.
>>
>> Ahmed
>>
>>> On Mar 14, 2014, at 11:05 PM, David Noel <dav...@gm...> wrote:
>>>
>>> I've encountered an issue while scaling a Java project that I'm not
>>> sure how to resolve. Any thoughts would be appreciated.
>>>
>>> The code is a crawler that uses HtmlUnit's getPage method. I'm running
>>> 100 threads per instance. When I have 1 instance up and running
>>> everything is fine. When I scale it to a second machine, though, I start
>>> having trouble. Calls to getPage keep throwing UnknownHostExceptions.
>>> Roughly 1 out of every 20 calls throws this exception. For some reason
>>> it's unable to resolve domain names, and it's not just the crawlers:
>>> DNS queries start failing across my entire network. On different systems
>>> on the same network I get 'unable to resolve host' errors in my web
>>> browser periodically when loading URLs. Usually when I retry it goes
>>> through, but it keeps happening sporadically as long as the crawlers
>>> are running.
>>>
>>> So many things could be going wrong here. Thinking maybe it was my
>>> provider throttling DNS queries, I've tried changing DNS servers, but
>>> that's done nothing. Thinking it might be a bandwidth issue, I checked
>>> systat, but the cumulative load is well under what my line can handle.
>>> What else could be causing this? My network is pretty simple: Provider
>>> <--> modem <--> 2 routers running pfSense <--> Servers and
>>> workstations. The servers are running FreeBSD, and the workstations
>>> run FreeBSD, Windows, and OSX.
>>>
>>> Has anyone encountered this before? Does anyone have any thoughts on
>>> what might be causing it?
>>>
>>> My only other thought is that maybe pfSense is doing something
>>> strange, so if I can't come up with any better ideas I'll try plugging
>>> the servers directly into the modem. I'd rather have them behind the
>>> routers though, so this would be a less-than-ideal solution.
>>>
>>> -David
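
For reference, the "Java net with very minimal logic" test suggested in the quoted thread could look roughly like the sketch below. It is not code from the thread: the class name DnsProbe, the Resolver interface, the thread and lookup counts, and the example.* host names are all placeholders chosen for illustration. It only resolves names from many threads with InetAddress.getByName and counts UnknownHostExceptions, so failures here would point away from HtmlUnit and toward the JVM, OS, or network.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of HtmlUnit: resolves host names from many
// threads using only java.net and counts how many lookups fail.
class DnsProbe {

    /** Lookup strategy; injectable so the probe can be exercised without a network. */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    /** Returns {successes, failures} after threads * lookupsPerThread lookups. */
    static int[] probe(List<String> hosts, int threads, int lookupsPerThread,
                       Resolver resolver) throws InterruptedException {
        AtomicInteger ok = new AtomicInteger();
        AtomicInteger failed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int offset = t; // stagger the starting host per thread
            pool.execute(() -> {
                for (int i = 0; i < lookupsPerThread; i++) {
                    String host = hosts.get((offset + i) % hosts.size());
                    try {
                        resolver.resolve(host);
                        ok.incrementAndGet();
                    } catch (UnknownHostException e) {
                        failed.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
        return new int[] { ok.get(), failed.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        // Turn off the JVM's positive DNS cache so repeated lookups are not
        // answered from memory; set this before the first InetAddress lookup.
        Security.setProperty("networkaddress.cache.ttl", "0");
        // Placeholder host names; substitute domains the crawler actually visits.
        List<String> hosts = List.of("example.com", "example.org", "example.net");
        int[] result = probe(hosts, 100, 20, InetAddress::getByName);
        System.out.printf("resolved=%d failed=%d%n", result[0], result[1]);
    }
}
```

Running this on one machine and then on two at once, with the same 100 threads per instance, would show whether the failure rate rises without HtmlUnit involved. The OS or a local resolver may still cache answers, so a long and varied host list is likely to give a more realistic load than the three placeholders above.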