Running server, maps and archetypes 1.71.dev-r21037
In the server logs, a message about "skipping time" frequently appears.
I am not sure how long this message has been showing up in the server logs or since what revision it started showing up.
The issue does not appear to impact game play on the server, server stability, etc.
Example:
19/11/28 23:29:03 [II] skipping time
19/11/28 23:39:29 [II] skipping time
19/11/29 14:36:17 [II] skipping time
19/11/29 14:43:46 [II] skipping time
19/11/29 14:44:25 [II] skipping time
19/11/29 14:44:41 [II] skipping time
19/11/29 14:45:00 [II] skipping time
19/11/29 14:45:01 [II] skipping time
19/11/29 14:48:07 [II] skipping time
19/11/29 14:48:19 [II] skipping time
19/11/29 14:49:48 [II] skipping time
19/11/29 14:50:00 [II] skipping time
This message was added when I rewrote the main loop for lower latency. This message means that the server took more real time to handle a unit of server time ("tick") than the game time elapsed. A tick is by default about 1/8 of a second; this means that the server took more than 1/8 of a second to handle 1/8 of time in-game, so the loop skipped that time step.
Do you run on a heavily loaded server, or are playing very big maps?
This occasionally happens on Invidious, but only once or twice every few days or so.
Single character in a virtual box VM running Fedora 31. I did see the log file alerts in a Debian 10 VM again with a single user logged in. Both times the player was just wandering around Scorn and Beginners House.
Allocated resources for the VM is 2 cores of Intel(R) Core(TM) i7-2720QM CPU @ 2.20GHz and 4GB of RAM. Six cores and ~14GB RAM remain for the host machine.
I'll continue to test and see if I can replicate the issue or find a possible cause.
That should be more than plenty -- Invidious is relatively under-powered. Crossfire can only utilize one core, anyway.
Been fiddling around with a private server on a Raspberry Pi 2B, and I get the message pretty frequently. Observed that it tends to stop occurring at the autosave point (every 10 minutes) if no players are logged in.
Using valgrind to add overhead to the server (and track where execution time is occurring) also seems to make it easy to recreate on reasonably powerful machines.
Presumably something gets cleaned up at that autosave point and then it stops causing delays.
Also, I added detail to the log message to indicate how much over the server tick was. I'm usually seeing < 300ms overages when they occur. Every once in a while I get really bad ones > 1 second.
Last edit: SilverNexus 2019-12-02
I wonder if this is due to I/O? On the systems you tested, is /tmp mounted on a RAM disk or a physically-backed filesystem?
I am fairly confident at least part of mine is due to I/O.
/tmp is on the SD Card-driven filesystem. So all I/O is painfully slow. The initial load tick complains about being like 4 seconds over due to all the I/O from loading the game data.
Though since dropping Valgrind on top of my laptop's version of the server (which otherwise does not trigger the overage often) also makes it go over the server tick time regularly, I assume there could be non-I/O sources as well.
Would you be able to test this theory by remounting tmp as a tmpfs?
I think I managed it more simply with just some extra attention to when the overages occur -- under regular use, the vast majority of skipping time messages occurred when one or more maps were saved or loaded in the same second as (and prior to) the message. When not saving player-specific maps, it usually took at least two maps saving at the same time to trigger the message.
In particular, running around the overworld triggers a lot of saves and loads, and seems to give it grief.
It would still be helpful it you could test with tmpfs.
This is somewhat of an issue; whenever the server reports "skipping time," all the clients see what amounts to a "lag," i.e. game time pausing.
Running around the overworld causes a lot of saves and loads because of a bug in how tiled maps are swapped (socket/request.c, setting "m->timeout" overrides the previous timeout). This should reduce some unnecessary swapping.
Tried it out with /tmp mounted on tmpfs. It performed marginally better, but still had overages corresponding to when two or more maps saved on the same tick.
It seemed to be more limited to the world maps than before, so it did create some improvement in performance.
Last edit: SilverNexus 2019-12-04
I think it's safe to say that most of the "skipping time" we see is due to I/O. Since this creates user-visible lag, I think it's appropriate that this warning stays.
In the long term we should think about how to best reduce these pauses, and a way to effectively test our changes. I use tmpfs/ZFS, so I don't see many of these pauses due to I/O.
At the moment, all evidence points to this being a local hardware issue.
I can no longer duplicate or encounter the issue with the Fedora 31 VM. But, it does come up and is worse with a brand new VM and then disappears after a day or two.
Working theory with the informatin provided earlier in this ticket:
The computer has a hybrid SSD. Every new VM resides on the SSD partition of the HD, which is why the problems comes up on the new VMs and not the old ones.
Added a note about "skipping time" on http://wiki.cross-fire.org/dokuwiki/doku.php/server.