#5185 Status monitor gets stuck "Failed to lock file..." FTP

Milestone: 1.890
Status: closed-fixed
Owner: nobody
Labels: FTP monitor (1)
Priority: 5
Updated: 2018-09-12
Created: 2018-08-28
Private: No

Hi,
I'm experiencing a recurrence of an issue with the system monitor, where it "gets stuck" and every 5 mins I get an email as below from CRON. I have narrowed it down to the FTP module, but the FTP server being tracked is always up. It happens sporadically every few hours and stays stuck until I kill monitor.pl. The monitor is configured to just log in and quit. I say recurrence because historically it has happened from time to time and then gone away, but it has now been consistent for a couple of weeks.

The strange thing is that the FTP monitor status history stops updating when the problem occurs, yet no error or timeout is logged. I have a large number of monitors (50+), but if I run monitor.pl manually, the process completes in about 3-4s, so it's not "overrunning". Please let me know what further debug output you need.

Error: Failed to lock file /var/webmin/modules/status/oldstatus after 5 minutes. Last error was :

Discussion

  • Jamie Cameron

    Jamie Cameron - 2018-08-29

    Do you have a custom timeout set for your FTP monitor? By default Webmin should wait only 10 seconds for each FTP check to complete or fail, rather than being blocked forever.

     
  • Richard Farthing

    No. I have had a custom timeout in the past; I removed it and went back to the default in case it was the cause of the issue, but it made no difference. Just occasionally, when I click on the monitor (so it runs immediately from the web interface), it freezes (forever). It's as if the timeout isn't being applied to some phase of the transaction.
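
To illustrate the idea of a deadline that covers every phase of a check, here is a minimal shell sketch that wraps an arbitrary command in a hard wall-clock limit. It assumes GNU coreutils `timeout` is installed; the helper name `check_with_deadline` is hypothetical and this is not how Webmin implements its own FTP timeout.

```shell
#!/bin/sh
# Hypothetical helper (not part of Webmin): run a check command under a
# hard wall-clock limit so that no single phase can block forever.
# Usage: check_with_deadline SECONDS COMMAND [ARGS...]
check_with_deadline() {
    limit=$1
    shift
    # GNU timeout kills the command and exits with status 124
    # once the limit has elapsed.
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timeout"
    elif [ "$status" -eq 0 ]; then
        echo "up"
    else
        echo "down"
    fi
}
```

For example, `check_with_deadline 10 sleep 60` prints `timeout` after about ten seconds instead of waiting the full minute. Because the limit is applied to the whole process rather than to individual reads or connects, it still fires if one step inside the command has no timeout of its own.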

     
  • Jamie Cameron

    Jamie Cameron - 2018-08-29
    • status: open --> closed-fixed
     
  • Jamie Cameron

    Jamie Cameron - 2018-08-29

    OK I see the cause of this - it will be fixed in the next Webmin release.

     
  • Richard Farthing

    Great - appreciate your work as always

     
  • Richard Farthing

    Were the changes just commits
    95022b57d813318d64887903f86eff3f357f31e5 and a2e984175e9e8839174a171a2376b5eaa306decb ?

    If so, I replaced web-lib-funcs.pl and made the other change but I'm sorry to say the issue persists.
    You can see the gap in the status history in the early hours (nothing between 02:55 and 04:00), but no Webmin event was logged.

    03/Sep/2018 04:10
    03/Sep/2018 04:05
    03/Sep/2018 04:00
    03/Sep/2018 02:55
    03/Sep/2018 02:50
    03/Sep/2018 02:45

    email every 5 mins:
    Failed to lock file /var/webmin/modules/status/oldstatus after 5 minutes.

    Perhaps I missed something?

     
  • Jamie Cameron

    Jamie Cameron - 2018-09-05

    If you applied those patches, you may need to restart Webmin and delete that /var/webmin/modules/status/oldstatus.lock file one last time.
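
Before deleting the lock file by hand, it can help to check whether the process recorded in it is still alive. The sketch below assumes the lock file contains only a PID (as the `cat` usage later in this thread suggests) and that it is run as a user allowed to signal that process, typically root; the helper name `lock_holder_state` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Webmin): report whether the process
# named in a lock file is still running.
# Assumption: the lock file contains only a PID.
lock_holder_state() {
    lockfile=$1
    if [ ! -f "$lockfile" ]; then
        echo "no-lock"
        return 0
    fi
    pid=$(cat "$lockfile")
    # Treat empty or non-numeric content as a stale lock.
    case $pid in
        ''|*[!0-9]*) echo "stale"; return 0 ;;
    esac
    # kill -0 delivers no signal; it only tests whether the PID exists
    # and can be signalled by the current user.
    if kill -0 "$pid" 2>/dev/null; then
        echo "held-by-$pid"
    else
        echo "stale"
    fi
}
```

Running `lock_holder_state /var/webmin/modules/status/oldstatus.lock` prints `held-by-<pid>` while a monitor process still owns the lock, which indicates a hung process to investigate rather than a leftover file to delete.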

     
  • Richard Farthing

    Yes, I did all of that. It hasn't made a difference to the particular scenario I'm experiencing; it seems about the same. I tried different/default timeouts too. What other debug output can I get you?

     
  • Jamie Cameron

    Jamie Cameron - 2018-09-05

    When this happens, can you try running :

    strace -p `cat /var/webmin/modules/status/oldstatus.lock`
    

    and post the output here?

     
  • Richard Farthing

    Output of strace -p `cat /var/webmin/modules/status/oldstatus.lock`:

    Process 25962 attached
    read(3, 
    

    It then gets stuck here and doesn't exit without CTRL-C.

    With -d debug:

    new tcb for pid 25962, active tcbs:1
    ptrace_setoptions = 0x11
    attach to pid 25962 (main) succeeded
    Process 25962 attached
     [wait(0x80057f) = 25962] WIFSTOPPED,sig=SIGTRAP,EVENT_STOP (128)
    pid 25962 has TCB_STARTUP, initializing it
     [wait(0x00857f) = 25962] WIFSTOPPED,sig=133
    read(3, 
    

    Let me know what more you need
    Thx

     
  • Jamie Cameron

    Jamie Cameron - 2018-09-12

    What output do you get if you run lsof -p 25962 ?
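
The strace output shows the process blocked in read() on file descriptor 3, so the open question is what that descriptor refers to. `lsof -p` answers that; on Linux the same information can be read directly from `/proc`, as in this minimal sketch. The helper name `fd_target` is hypothetical, and the sketch assumes a Linux `/proc` filesystem and permission to inspect the target process.

```shell
#!/bin/sh
# Hypothetical helper: print what a file descriptor of a process points to.
# Assumption: Linux, where /proc/PID/fd/N is a symlink to the open object.
# Usage: fd_target PID FD
fd_target() {
    # A regular file shows up as its path; sockets and pipes appear
    # as "socket:[inode]" or "pipe:[inode]".
    readlink "/proc/$1/fd/$2"
}
```

For the process above this would be `fd_target 25962 3`. A result of the form `socket:[...]` would point to a network read, whereas a file path would point to something local; which of the two applies here is not established in this thread.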

     
