After a few hours of flight, I am getting a regular segfault there (AI deactivated):
Thread 1 "fgfs" received signal SIGSEGV, Segmentation fault.
0x00007efd09196ce2 in SGTime::updateLocal(SGGeod const&, SGPath const&) ()
from /lib64/libSimGearCore.so.2020.4.0
(gdb) bt
#0 0x00007efd09196ce2 in SGTime::updateLocal(SGGeod const&, SGPath const&) ()
from /lib64/libSimGearCore.so.2020.4.0
#1 0x0000000000b02122 in TimeManager::updateLocalTime() ()
#2 0x0000000000b041c8 in TimeManager::update(double) ()
#3 0x00007efd09182e46 in SGSubsystemGroup::Member::update(double) ()
from /lib64/libSimGearCore.so.2020.4.0
#4 0x00007efd09182ef9 in SGSubsystemGroup::updateMembers(int, double) ()
from /lib64/libSimGearCore.so.2020.4.0
#5 0x00007efd0917ecaa in SGSubsystemMgr::update(double) ()
from /lib64/libSimGearCore.so.2020.4.0
#6 0x0000000000ce45fd in fgMainLoop() ()
#7 0x0000000000c4863c in fgOSMainLoop() ()
#8 0x0000000000ce999f in fgMainInit(int, char**) ()
#9 0x0000000000545d6e in main ()
Another run (2020.4.0, forgot to say), the same segfault with slightly more info:
Mmmm, if SGGeod is returning junk, that could trigger the problem.
Might be related to? https://sourceforge.net/p/flightgear/codetickets/2764/
Last edit: eatdirt 2022-11-20
In math/SGGeod.hxx, a comment says that a new SGGeod() object is created invalid by default, for historical reasons. So these lines suggest that there is a problem when aLocation is invalid, but if the problem is being invalid, creating a new "location" object is not going to help!
Edit: That seems to be fine: the constructor sets invalid to true but resets lon/lat/elev to 0. However, isValid does not check for NaN on elevation, which might be the culprit.
I am hitting this bug in space, so maybe elevation is what makes aLocation look valid when it is not. But I never had this on 2018.x.y, so why now?
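To make the NaN hypothesis concrete, here is a minimal sketch of a validity check that also covers elevation. GeoPosition and isFullyValid are hypothetical names for illustration only; this is not SimGear's SGGeod or its isValid.

```cpp
#include <cmath>

// Hypothetical stand-in for a geodetic position (NOT SimGear's SGGeod).
struct GeoPosition {
    double lonDeg;  // longitude in degrees
    double latDeg;  // latitude in degrees
    double elevM;   // elevation in metres
};

// Returns true only if all three components are finite (no NaN, no infinity)
// and the angles lie in their conventional ranges.
inline bool isFullyValid(const GeoPosition& p)
{
    if (!std::isfinite(p.lonDeg) || !std::isfinite(p.latDeg) || !std::isfinite(p.elevM))
        return false;
    return p.lonDeg >= -180.0 && p.lonDeg <= 180.0
        && p.latDeg >= -90.0 && p.latDeg <= 90.0;
}
```

With a check like this, a position whose elevation is NaN is rejected even when longitude and latitude look sane.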
Last edit: eatdirt 2022-11-20
Line 241 is:
description = nearestTz->getDescription();
I'm 99% sure the problem is 'nearestTz' being null or invalid. Can you add a check and log message before line 241, something like if (nearestTz == nullptr) { SG_LOG(SG_GENERAL, SG_ALERT, "No timezone"); return; }
... and see if this fixes the crash, and if the log message is printed?
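The suggested guard can be tried in isolation with a sketch like the one below. Zone and updateDescription are hypothetical stand-ins, and std::cerr replaces SG_LOG so that the snippet builds without SimGear; it only illustrates the "check, log, return early" pattern, not the actual updateLocal code.

```cpp
#include <iostream>
#include <string>

// Hypothetical stand-in for a timezone record (NOT SimGear's class).
struct Zone {
    std::string description;
    const std::string& getDescription() const { return description; }
};

// Copies the zone description into `description` when a zone was found.
// When `nearestTz` is null, logs a message, leaves `description` untouched
// and returns false instead of dereferencing the null pointer.
inline bool updateDescription(const Zone* nearestTz, std::string& description)
{
    if (nearestTz == nullptr) {
        std::cerr << "No timezone found for current position\n";
        return false;
    }
    description = nearestTz->getDescription();
    return true;
}
```

Because the early return leaves the previous description in place, the caller keeps whatever value it had from the last successful lookup.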
You're right, as usual :)
I have it under gdb, and location and aLocation are indeed perfectly fine.
And indeed, nearestTz is null, as it is assigned from this:
It's interesting, there is a conditional in the SGTime::init() before trying to getDescription(), but not in updateLocal().
Last edit: eatdirt 2022-11-21
If we bail out of updateLocal, the sim local time is going to get weird: should we maybe default to UTC instead? Do you see any problems in your testing if you just early-return from updateLocal?
I've checked a hard bailout:
and that happens very rarely. So, if you want to skip the whole updateLocal above a certain altitude, UTC would be the choice (notice however that all the other functions, like sidereal time etc., are required for space flight). But if the idea is to bail out only when that pointer is null, I would keep the local time: it recovers on the next iteration and the time zone remains fine.
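The two options discussed here (keep the last known zone, or fall back to UTC) can be compared with a small sketch. MissingZonePolicy and resolveZone are hypothetical names; a null pointer stands for a failed lookup.

```cpp
#include <string>

// What to do when the timezone lookup returns nothing.
enum class MissingZonePolicy { KeepPrevious, UseUtc };

// Hypothetical helper: `lookup` is null when no zone was found.
// Returns the zone name to use for this update step.
inline std::string resolveZone(const std::string* lookup,
                               const std::string& previous,
                               MissingZonePolicy policy)
{
    if (lookup != nullptr)
        return *lookup;  // normal case: use the freshly found zone
    // Failed lookup: either keep the last good zone or fall back to UTC.
    return policy == MissingZonePolicy::KeepPrevious ? previous : std::string("UTC");
}
```

With KeepPrevious, a single failed lookup is invisible to the user and the zone is refreshed on the next successful update, which matches the behaviour reported above.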
Now, why that pointer is becoming null seems to be a bug elsewhere?
PS: This bug seems to be triggered by a very peculiar situation for the Space Shuttle, it happens soon after we simulate a second orbiting target (here the Hubble Space Telescope), I haven't figured out why, but we stress things by doing this :)
NB: Unrelated to this bug, space flights are also stressing the tile manager, I have megabytes of this in the logs (the terrain is not displayed as clipped and replaced by earthview). Maybe there could be a way to stop this above a certain altitude as well.
There's definitely an underlying bug that we ever get a null tz value, but fixing that may be trickier so let's just bail out for now, if the situation is so obscure.
About the other message, maybe start a separate discussion on disabling tile loading when earthview is active? I agree that probably makes sense.
Pushed the workaround to next now. It logs the problem SGGeod, so I'm curious to see what values occur. Leaving this as NeedInfo for now, to fix the real issue of the nearest-TZ lookup failing.
Here we go. Maybe the Shuttle, being fast, just has a higher probability of crossing a region where the timezone data has a bug? It is really only this: no error after passing this zone, no error before!
Just checking, this is a location in the Caribbean, off the north-west tip of Curaçao: I'm wondering if our timezone data has an edge case there?
I added a unit-test for this: (-69.5, 12.0) works (gets a TZ of America/Caracas), while (-69.0, 12.0) does not find a matching time zone and trips the error. I suspect this is a bug in the zone-detect input data we use, but I'm not sure how we get a fix there.
Importantly, this has nothing to do with the Shuttle: if you fly the UFO or anything else in this area, you would get the same crash.
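A regression check of this kind can be sketched generically: probe a list of coordinates and collect those for which no zone is found. ZoneLookup and findUncovered are hypothetical; the real test would call the actual zone-detect lookup, which is not reproduced here.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

using LonLat = std::pair<double, double>;  // (longitude, latitude) in degrees

// Hypothetical lookup signature: returns an empty string when no zone matches.
using ZoneLookup = std::function<std::string(double lonDeg, double latDeg)>;

// Returns every probe point for which the lookup found no timezone.
inline std::vector<LonLat> findUncovered(const ZoneLookup& lookup,
                                         const std::vector<LonLat>& probes)
{
    std::vector<LonLat> missing;
    for (const auto& p : probes) {
        if (lookup(p.first, p.second).empty())
            missing.push_back(p);
    }
    return missing;
}
```

Feeding it the two points from this thread, (-69.5, 12.0) and (-69.0, 12.0), should return only the second one with the old data and an empty list once the data is fixed.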
Well found! I remember some loading of a binary file in the code, maybe it misses data (timezone16.bin), but that would require some work to check all this (FGDATA/Timezone has some info though).
I've pushed an updated timezone16.bin to FGData, which fixes the problem for me (in this location). I will backport that to 2020.3.