Originally created by: Mathias....@googlemail.com
Originally owned by: Mathias....@googlemail.com
My testcase:
Start from LOWI and fly west, Then I get frequent hangs every one or two minute for about one to two seconds.
The commit
commit [r3c9f4ad73efa801cea29f3c6ab60e189a143ee48]
Author: James Turner <zakalawe@mac.com>
Date: Mon Aug 8 18:16:49 2011 +0100
Switch RealWx to use simgear::HTTP class. Also tweak metar-proxy to send a HTTP status response.
introduces this hangs.
Backing out this change or putting --disable-real-weather-fetch on the command line works around this problem.
View and moderate all "codetickets Discussion" comments posted by this user
Mark all as spam, and block user from posting to "Tickets"
Originally posted by: bre... (code.google.com)@gmail.com
Seems the switch to the new HTTP class also dropped the separate METAR loader thread, so the new code runs in the main loop(?). Not sure if that's a good idea... :).
Summary: METAR real-weather loading results in frequent lags (FG2.5.0).
View and moderate all "codetickets Discussion" comments posted by this user
Mark all as spam, and block user from posting to "Tickets"
Originally posted by: zakalawe@mac.com
Dropping the threading is intentional, since the new HTTP code is all (supposedly) non-blocking - the threading existed because the original HTTP support in SimGear was synchronous and blocking. I've added some timing code to measure how long we're spending in the RealWx code with my changes, and not seeing any pauses here, but I may be measuring the wrong thing.
Could you adjust the RealWx check interval (_positionTimeToLive = 60.0) and confirm this changes the interval between the hangs? I'm pretty sure it will, but would like to be 100% sure.
Also, is the metar successfully retrieved? The check interval is 60 seconds, but assuming your metar station doesn't change, the lifetime is 15 minutes.
View and moderate all "codetickets Discussion" comments posted by this user
Mark all as spam, and block user from posting to "Tickets"
Originally posted by: zakalawe@mac.com
Bit more info - starting the METAR request is taking longer that I'd like - eg, 50 or 100msec in a debug build (of SG & FG) - my guess is this is the hostname lookup happening synchronously, but I need to investigate further.
HOWEVER - this is only when an actual metar fetch happens - every 15 minutes (900 seconds) *and* it's still much smaller that the hangs you reported - enough to cause a visually noticeable glitch, but shouldn't be anywhere near a second.
View and moderate all "codetickets Discussion" comments posted by this user
Mark all as spam, and block user from posting to "Tickets"
Originally posted by: zakalawe@mac.com
Further testing; gethostbyname is blocking for 'some time' - sometimes 40 msec, sometimes 300msec, for me. (Still a bit surprised it's so long for Mathias) Adding caching of resolved host-names should fix the issue for the moment, though ideally we'd change the socket code to do asynchronous resolves.
Status: Accepted
View and moderate all "codetickets Discussion" comments posted by this user
Mark all as spam, and block user from posting to "Tickets"
Originally posted by: Mathias....@googlemail.com
Thanks for investigating!
Yes, gethostbyname is syncronous. May be my dsl connect has too much latency and my router is too slow, which adds then some latency? Also I have the multiplayer packets also plugging my slow dsl connect.
But, in any case you cannot assume that gethostbyname/getaddrinfo/getservicebyname et al are sufficiently fast. They all do may be blocking and even lengthy stuff. There can be an ldap/AD/yellowpage/dns/... query sequence in any order configured there. All of them could do network queries to may be busy servers. So, I think we should just do this once for a fgfs run and then just reuse the result.
Also one hint in this area: The getaddrinfo call is way easier to use then all the gethostby* with hton[sl] and getserviceby* stuff. Also you will get ipv6 compatibility without additional code for free.
Greetings
Mathias
View and moderate all "codetickets Discussion" comments posted by this user
Mark all as spam, and block user from posting to "Tickets"
Originally posted by: zakalawe@mac.com
Agreed on all the points about gethostbyname - but there's still one major thing - you reported the issue every *minute*, but the metar fetch should only be happening every 15 minutes - so I wonder if your problem is something else.
I'll do the resolve-name caching anyway, of course, but it would be good if you can confirm that you see the delay every minute, and if changing _positionTimeToLive affects it.
View and moderate all "codetickets Discussion" comments posted by this user
Mark all as spam, and block user from posting to "Tickets"
Originally posted by: Mathias....@googlemail.com
And one more:
The connect call is blocking and may take significant time if you switch the socket to non blocking past the connect. But you need to be careful with sending if you do an asyncronous connect call. You really need to wait until select/poll tells you that you can write. At this time the connections is established and you can send something.
Greetings
Mathias
View and moderate all "codetickets Discussion" comments posted by this user
Mark all as spam, and block user from posting to "Tickets"
Originally posted by: bre... (code.google.com)@gmail.com
gethostbyname is known to suck :). I've once seen it blocking for a whole minute in our corporate network - happened whenever the network connection to the ISP went down - while the systems still thought the DNS should be available. Also, when METAR loading fails (i.e. system is offline), I think METAR loading is retried every 10 or 20 seconds. I see frequent METAR error messages in my log window (metar fetching is enabled by default). What happens when a user starts fgfs but stays offline? Retry gethostbyname every now and then only?
Well, even if calls aren't officially blocking, they still may not have a bounded execution time. Sometimes it's good to keep such stuff out of the "real-time" context where execution time really matters. But I'm a real-time guy, so probably too picky with these things :). I'm sure you'll find a way to fix it...
View and moderate all "codetickets Discussion" comments posted by this user
Mark all as spam, and block user from posting to "Tickets"
Originally posted by: zakalawe@mac.com
Regarding non-blocking connect, fortunately the socket code already correctly supports a non-blocking connect, and using select to wait until tells us we can write - so no problems there. The remaining piece, calling getaddrinfo on a helper thread, I'm working on at the moment.
(To avoid breaking existing code, I'm going to make 'non blocking' SimGear sockets look up the host name asynchronously, but blocking ones can continue using the current code - otherwise I risk breaking lots of existing, socket based code in FG, I think, by changing the connect() semantics)
View and moderate all "codetickets Discussion" comments posted by this user
Mark all as spam, and block user from posting to "Tickets"
Originally posted by: Mathias....@googlemail.com
Ok, I have tried to reproduce the hangs today evening.
But currently it works without such frequent and long hickups.
So I expect that my upstream dns server had load problem when I experienced these problems.
I instrumented the real wether fetch's update routine with a timer. The longest time it took today evening was 0.2 sec. Which is too long but way less than I felt at tuesday evening. This matches the felt experience with 'no such hickups today' from above.
So, if I still experience again such hickups when your next change went in, I will open a new ticket :)
Thanks for looking into this
Mathias
View and moderate all "codetickets Discussion" comments posted by this user
Mark all as spam, and block user from posting to "Tickets"
Originally posted by: Mathias....@googlemail.com
Ok, again weekend, again long dns lookup times.
I could verify that the effect I observed is the one we think it is:
Today I had times up to 3.5 seconds spent in NoaaMetarRealWxController::update.
The lasst loop in this method is the offending one.
I assume that the name lookup happens on every 'new MetarGetRequest'?
So, the easiest fix would be to do that dns lookup only once when we setup this class, then later use the the already known struct sockaddr*.
Greetings
Mathias
View and moderate all "codetickets Discussion" comments posted by this user
Mark all as spam, and block user from posting to "Tickets"
Originally posted by: zakalawe@mac.com
SimGear changes to use a host cache and lookup thread are committed now, and hopefully working everywhere, including Windows.
Status: Fixed