Menu

#401 METAR real-weather loading results in frequent lags (FG2.5.0).

Fixed
nobody
None
2011-09-15
2011-08-09
Anonymous
No

Originally created by: Mathias....@googlemail.com
Originally owned by: Mathias....@googlemail.com

My testcase:

Start from LOWI and fly west, Then I get frequent hangs every one or two minute for about one to two seconds.

The commit

commit [r3c9f4ad73efa801cea29f3c6ab60e189a143ee48]
Author: James Turner <zakalawe@mac.com>
Date:   Mon Aug 8 18:16:49 2011 +0100

    Switch RealWx to use simgear::HTTP class. Also tweak metar-proxy to send a HTTP status response.

introduces this hangs.

Backing out this change or putting --disable-real-weather-fetch on the command line works around this problem.

Related

Tickets: #397

Discussion

  • Anonymous

    Anonymous - 2011-08-09

    Originally posted by: bre... (code.google.com)@gmail.com

    Seems the switch to the new HTTP class also dropped the separate METAR loader thread, so the new code runs in the main loop(?). Not sure if that's a good idea... :).

    Summary: METAR real-weather loading results in frequent lags (FG2.5.0).

     
  • Anonymous

    Anonymous - 2011-08-10

    Originally posted by: zakalawe@mac.com

    Dropping the threading is intentional, since the new HTTP code is all (supposedly) non-blocking - the threading existed because the original HTTP support in SimGear was synchronous and blocking. I've added some timing code to measure how long we're spending in the RealWx code with my changes, and not seeing any pauses here, but I may be measuring the wrong thing.

    Could you adjust the RealWx check interval (_positionTimeToLive = 60.0) and confirm this changes the interval between the hangs? I'm pretty sure it will, but would like to be 100% sure.

    Also, is the metar successfully retrieved? The check interval is 60 seconds, but assuming your metar station doesn't change, the lifetime is 15 minutes.

     
  • Anonymous

    Anonymous - 2011-08-10

    Originally posted by: zakalawe@mac.com

    Bit more info - starting the METAR request is taking longer that I'd like - eg, 50 or 100msec in a debug build (of SG & FG) - my guess is this is the hostname lookup happening synchronously, but I need to investigate further.

    HOWEVER - this is only when an actual metar fetch happens - every 15 minutes (900 seconds) *and* it's still much smaller that the hangs you reported - enough to cause a visually noticeable glitch, but shouldn't be anywhere near a second.

     
  • Anonymous

    Anonymous - 2011-08-10

    Originally posted by: zakalawe@mac.com

    Further testing; gethostbyname is blocking for 'some time' - sometimes 40 msec, sometimes 300msec, for me. (Still a bit surprised it's so long for Mathias) Adding caching of resolved host-names should fix the issue for the moment, though ideally we'd change the socket code to do asynchronous resolves.

    Status: Accepted

     
  • Anonymous

    Anonymous - 2011-08-10

    Originally posted by: Mathias....@googlemail.com

    Thanks for investigating!

    Yes, gethostbyname is syncronous. May be my dsl connect has too much latency and my router is too slow, which adds then some latency? Also I have the multiplayer packets also plugging my slow dsl connect.

    But, in any case you cannot assume that gethostbyname/getaddrinfo/getservicebyname et al are sufficiently fast. They all do may be blocking and even lengthy stuff. There can be an ldap/AD/yellowpage/dns/... query sequence in any order configured there. All of them could do network queries to may be busy servers. So, I think we should just do this once for a fgfs run and then just reuse the result.

    Also one hint in this area: The getaddrinfo call is way easier to use then all the gethostby* with hton[sl] and getserviceby* stuff. Also you will get ipv6 compatibility without additional code for free.

    Greetings

    Mathias

     
  • Anonymous

    Anonymous - 2011-08-10

    Originally posted by: zakalawe@mac.com

    Agreed on all the points about gethostbyname - but there's still one major thing - you reported the issue every *minute*, but the metar fetch should only be happening every 15 minutes - so I wonder if your problem is something else.

    I'll do the resolve-name caching anyway, of course, but it would be good if you can confirm that you see the delay every minute, and if changing _positionTimeToLive affects it.

     
  • Anonymous

    Anonymous - 2011-08-10

    Originally posted by: Mathias....@googlemail.com

    And one more:

    The connect call is blocking and may take significant time if you switch the socket to non blocking past the connect. But you need to be careful with sending if you do an asyncronous connect call. You really need to wait until select/poll tells you that you can write. At this time the connections is established and you can send something.

    Greetings

    Mathias

     
  • Anonymous

    Anonymous - 2011-08-10

    Originally posted by: bre... (code.google.com)@gmail.com

    gethostbyname is known to suck :). I've once seen it blocking for a whole minute in our corporate network - happened whenever the network connection to the ISP went down - while the systems still thought the DNS should be available. Also, when METAR loading fails (i.e. system is offline), I think METAR loading is retried every 10 or 20 seconds. I see frequent METAR error messages in my log window (metar fetching is enabled by default). What happens when a user starts fgfs but stays offline? Retry gethostbyname every now and then only?

    Well, even if calls aren't officially blocking, they still may not have a bounded execution time. Sometimes it's good to keep such stuff out of the "real-time" context where execution time really matters. But I'm a real-time guy, so probably too picky with these things :). I'm sure you'll find a way to fix it...

     
  • Anonymous

    Anonymous - 2011-08-11

    Originally posted by: zakalawe@mac.com

    Regarding non-blocking connect, fortunately the socket code already correctly supports a non-blocking connect, and using select to wait until tells us we can write - so no problems there. The remaining piece, calling getaddrinfo on a helper thread, I'm working on at the moment.

    (To avoid breaking existing code, I'm going to make 'non blocking' SimGear sockets look up the host name asynchronously, but blocking ones can continue using the current code - otherwise I risk breaking lots of existing, socket based code in FG, I think, by changing the connect() semantics)

     
  • Anonymous

    Anonymous - 2011-08-11

    Originally posted by: Mathias....@googlemail.com

    Ok, I have tried to reproduce the hangs today evening.
    But currently it works without such frequent and long hickups.
    So I expect that my upstream dns server had load problem when I experienced these problems.

    I instrumented the real wether fetch's update routine with a timer. The longest time it took today evening was 0.2 sec. Which is too long but way less than I felt at tuesday evening. This matches the felt experience with 'no such hickups today' from above.

    So, if I still experience again such hickups when your next change went in, I will open a new ticket :)

    Thanks for looking into this

    Mathias

     
  • Anonymous

    Anonymous - 2011-08-14

    Originally posted by: Mathias....@googlemail.com

    Ok, again weekend, again long dns lookup times.

    I could verify that the effect I observed is the one we think it is:
    Today I had times up to 3.5 seconds spent in NoaaMetarRealWxController::update.

    The lasst loop in this method is the offending one.
    I assume that the name lookup happens on every 'new MetarGetRequest'?

    So, the easiest fix would be to do that dns lookup only once when we setup this class, then later use the the already known struct sockaddr*.

    Greetings

    Mathias

     
  • Anonymous

    Anonymous - 2011-09-15

    Originally posted by: zakalawe@mac.com

    SimGear changes to use a host cache and lookup thread are committed now, and hopefully working everywhere, including Windows.

    Status: Fixed

     

Log in to post a comment.

MongoDB Logo MongoDB