Hi,
I'm experiencing a recurrence of an issue with the system monitor, where it "gets stuck" and every 5 mins I get an email from CRON, as below. I have narrowed it down to the FTP module, but the FTP server being tracked is always up. It happens sporadically every few hours and stays stuck until I kill monitor.pl. The monitor is configured to just log in and quit. I say recurrence because historically it has happened from time to time and then gone away, but this time it has been consistent for a couple of weeks.
The strange thing is that the FTP monitor status history stops updating when the problem occurs, yet no error or timeout is logged. I have a large number of monitors (50+), but if I run monitor.pl "manually", the process finishes in about 3-4s, so it's not "overrunning". Please let me know what further debug output you need.
Error: Failed to lock file /var/webmin/modules/status/oldstatus after 5 minutes. Last error was :
Do you have a custom timeout set for your FTP monitor? By default Webmin should wait only 10 seconds for each FTP check to complete or fail, rather than being blocked forever.
No. I have had a custom timeout in the past, but I removed it and went back to the default in case it was the cause of the issue. It made no difference. Just occasionally, when I click on the monitor (so it runs immediately from the web interface), it freezes forever. It's as if the timeout isn't being applied to some phase of the transaction.
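For anyone following the thread, here is a minimal Python sketch (not Webmin's Perl code) of a login-and-quit check where a single timeout value is applied to the connect, banner, login and quit steps. The name `check_ftp` and its defaults are made up for illustration; only the standard `ftplib` module is used. Two caveats: the socket timeout bounds each individual network wait rather than the check as a whole, and it does not cover the DNS lookup that happens before the connection is opened.

```python
import ftplib


def check_ftp(host, port=21, user="anonymous", password="", timeout=10.0):
    """Hypothetical helper: return (ok, detail) for a login-and-quit FTP check."""
    # The timeout given here is reused by connect() and stays set on the
    # control socket, so later reads (banner, login replies, quit) are bounded too.
    ftp = ftplib.FTP(timeout=timeout)
    try:
        ftp.connect(host, port)    # TCP connect, then wait for the welcome banner
        ftp.login(user, password)  # USER / PASS exchange
        ftp.quit()                 # polite QUIT, closes the connection
        return True, "login ok"
    except (OSError, EOFError, ftplib.Error) as exc:
        # Timeouts surface as OSError subclasses; FTP error replies as ftplib.Error.
        return False, f"{type(exc).__name__}: {exc}"
    finally:
        ftp.close()                # safe to call even if connect() never succeeded
```

Calling `check_ftp("ftp.example.com", timeout=10)` (hostname is a placeholder) should come back with a failure tuple rather than hang if the server accepts the connection but never sends a banner.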
OK I see the cause of this - it will be fixed in the next Webmin release.
Great - appreciate your work as always
Were the changes just commits
95022b57d813318d64887903f86eff3f357f31e5 and a2e984175e9e8839174a171a2376b5eaa306decb ?
If so, I replaced web-lib-funcs.pl and made the other change but I'm sorry to say the issue persists.
You can see the missing events in the status history in the early hours (nothing between 02:55 and 04:00), but no Webmin event is logged:
03/Sep/2018 04:10
03/Sep/2018 04:05
03/Sep/2018 04:00
03/Sep/2018 02:55
03/Sep/2018 02:50
03/Sep/2018 02:45
email every 5 mins:
Failed to lock file /var/webmin/modules/status/oldstatus after 5 minutes.
Perhaps I missed something?
If you applied those patches, you may need to restart Webmin and delete that /var/webmin/modules/status/oldstatus.lock file one last time.
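Before deleting the lock file, it can be useful to check whether the process that took the lock is still running. The sketch below assumes the .lock file holds the holder's PID as plain text, which is what the cat and lsof -p steps in this thread suggest, but treat that as an assumption. `lock_holder` is a hypothetical helper, not part of Webmin.

```python
import os


def lock_holder(lock_path):
    """Hypothetical helper: return (pid, alive) for the PID stored in lock_path.

    Returns (None, False) if the file is missing or does not contain a PID.
    """
    try:
        with open(lock_path) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return None, False
    if pid <= 0:
        return None, False
    try:
        os.kill(pid, 0)            # signal 0 only checks existence, nothing is sent
    except ProcessLookupError:
        return pid, False          # no such process: the lock is stale
    except PermissionError:
        return pid, True           # process exists but belongs to another user
    return pid, True
```

For example, `lock_holder("/var/webmin/modules/status/oldstatus.lock")` returning a PID with `alive` set to true would point at a monitor.pl run that is still hanging, while `alive` false would suggest a leftover lock that is safe to remove.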
Yes, I did all of that. It hasn't made a difference to the particular scenario I'm experiencing; it seems about the same. I tried different and default timeouts too. What other debug output can I get you?
When this happens, can you try running :
and post the output here?
output of strace -p
cat /var/webmin/modules/status/oldstatus.lock
It then gets stuck here and doesn't exit without CTRL-C.
With -d debug:
Let me know what more you need
Thx
What output do you get if you run
lsof -p 25962