
#3222 base: write() from socketpair in exec module causes amfnd hanging

5.20.11
fixed
None
defect
base
lib
major
False
2020-10-28
2020-09-22
No

Reproduction:
- Load amf_demo app with a modification that adds 800 components per SU
- Unlock-in/Unlock both SUs
- pkill amf_demo. Note: this terminates 800 components at once

Observation:
- amfnd gets stuck, and the node eventually reboots due to amfwd
- strace shows that all of amfnd's threads are stuck in write()

root@SC-1:~# ps -ef | grep osafamfnd
root 329 1 0 22:59 ? 00:00:08 /usr/local/lib/opensaf/osafamfnd --tracemask=0xffffffff
root 18743 11375 0 23:00 pts/4 00:00:00 grep --color=auto osafamfnd
root@SC-1:~# strace -ffp 329
strace: Process 329 attached with 5 threads
[pid 334] write(32, "\0\0\0\0\0\0\0\0\0\0\0\0", 12 <unfinished ...>
[pid 333] write(32, "\0\0\0\0\0\0\0\0\0\0\0\0", 12 <unfinished ...>
[pid 332] write(32, "\0\0\0\0\0\0\0\0\0\0\0\0", 12 <unfinished ...>
[pid 331] write(32, "\0\0\0\0\0\0\0\0\0\0\0\0", 12 <unfinished ...>
[pid 329] write(32, "\0\0\0\0\0\0\0\0\0\0\0\0", 12

(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7f78af005780 (LWP 322) 0x00007f78adb364bd in write ()
at ../sysdeps/unix/syscall-template.S:84
2 Thread 0x7f78abfb3700 (LWP 326) 0x00007f78adb364bd in write ()
at ../sysdeps/unix/syscall-template.S:84
3 Thread 0x7f78aefc2b00 (LWP 327) 0x00007f78adb364bd in write ()
at ../sysdeps/unix/syscall-template.S:84
4 Thread 0x7f78af002b00 (LWP 324) 0x00007f78adb364bd in write ()
at ../sysdeps/unix/syscall-template.S:84
5 Thread 0x7f78aefe2b00 (LWP 325) 0x00007f78adb364bd in write ()
at ../sysdeps/unix/syscall-template.S:84
(gdb) bt
0 0x00007f78adb364bd in write () at ../sysdeps/unix/syscall-template.S:84
1 0x00007f78ae31c742 in ncs_exec_module_signal_hdlr (signal=<optimised out>)
at src/base/sysf_exc_scr.c:111
2 <signal handler called>
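The backtrace shows write() being called from ncs_exec_module_signal_hdlr. A write() on a blocking socket sleeps once the send buffer is full, and if the thread that should drain the other end is presumably itself stuck in the same handler, nothing ever frees space. The minimal sketch below illustrates that capacity limit, assuming an AF_UNIX SOCK_STREAM pair and 12-byte records like the ones in the strace output. fill_until_full is a hypothetical helper, not OpenSAF code. It uses SOCK_NONBLOCK only so the demonstration terminates: the first write() that finds the buffer full fails with EAGAIN, which is the point where a blocking socket would hang.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Assumed record size: the strace output above shows 12-byte writes. */
enum { RECORD_SIZE = 12 };

/*
 * Write fixed-size records into a socketpair whose read end is never
 * drained. Returns the number of bytes accepted before the send buffer
 * filled up, or -1 if socketpair() fails. *last_errno receives errno
 * from the write() that failed. With a blocking socket, that write()
 * would never return.
 */
long fill_until_full(int *last_errno)
{
    int sv[2];
    char record[RECORD_SIZE] = {0};
    long total = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0, sv) != 0)
        return -1;

    for (;;) {
        ssize_t n = write(sv[0], record, sizeof(record));
        if (n < 0) {
            if (errno == EINTR)
                continue;
            *last_errno = errno; /* EAGAIN/EWOULDBLOCK: buffer is full */
            break;
        }
        total += n; /* a stream socket may accept a partial record */
    }

    close(sv[0]);
    close(sv[1]);
    return total;
}
```

The exact byte count depends on the socket's send buffer size and per-message overhead, so it is not a fixed number; what matters is that it is finite.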

Related

Wiki: ChangeLog-5.20.11

Discussion

  • Minh Hon Chau

    Minh Hon Chau - 2020-09-22
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -31,7 +31,7 @@
       5    Thread 0x7f78aefe2b00 (LWP 325) 0x00007f78adb364bd in write ()
         at ../sysdeps/unix/syscall-template.S:84
     (gdb) bt 
    -#0  0x00007f78adb364bd in write () at ../sysdeps/unix/syscall-template.S:84
    -#1  0x00007f78ae31c742 in ncs_exec_module_signal_hdlr (signal=<optimised out>)
    +0  0x00007f78adb364bd in write () at ../sysdeps/unix/syscall-template.S:84
    +1  0x00007f78ae31c742 in ncs_exec_module_signal_hdlr (signal=<optimised out>)
         at src/base/sysf_exc_scr.c:111
    -#2  <signal handler called>
    +2  <signal handler called>
    
     
  • Minh Hon Chau

    Minh Hon Chau - 2020-09-22

    One possible fix is to create the socketpair() with SOCK_NONBLOCK.
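    A sketch of the non-blocking approach, under stated assumptions: the socketpair() is created with SOCK_NONBLOCK (which applies to both ends), the SIGCHLD handler writes a one-byte wake-up and ignores EAGAIN, and the reader drains all pending bytes and then reaps every exited child with waitpid(WNOHANG). The names notify_fds, sigchld_handler, notify_init and notify_drain_and_reap are hypothetical. This is not the code from commit 8758c96e, and reaping with waitpid(-1, ...) assumes no other part of the process waits for its own children.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical notification pair: [0] is read by the worker, [1] written by the handler. */
static int notify_fds[2] = {-1, -1};

/* Only write() is called, errno is preserved, and because the socket is
 * non-blocking a full buffer makes write() fail with EAGAIN instead of
 * blocking the interrupted thread. */
static void sigchld_handler(int signo)
{
    int saved_errno = errno;
    char byte = 0;
    (void)signo;
    /* EAGAIN means wake-ups are already queued, so dropping this one is safe. */
    ssize_t r = write(notify_fds[1], &byte, 1);
    (void)r;
    errno = saved_errno;
}

int notify_init(void)
{
    struct sigaction sa;

    /* SOCK_NONBLOCK is the key flag; SOCK_CLOEXEC keeps the fds out of exec'd children. */
    if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0,
                   notify_fds) != 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Called by the reader when notify_fds[0] is readable (for example after
 * poll()). Drains every pending wake-up, then reaps all exited children,
 * so a dropped notification cannot leave a child unreaped.
 * Returns the number of children reaped. */
int notify_drain_and_reap(void)
{
    char buf[256];
    int reaped = 0;

    while (read(notify_fds[0], buf, sizeof(buf)) > 0)
        ; /* the read end is non-blocking too, so this stops at EAGAIN */

    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;

    return reaped;
}
```

    Dropping a wake-up is harmless in this sketch because one pending byte is enough to make the reader run, and the reader reaps all exited children rather than one per notification. That only holds if the reader needs no per-signal payload; this sketch does not model whatever the 12-byte records in the real module carry.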

     
  • Minh Hon Chau

    Minh Hon Chau - 2020-09-22
    • status: unassigned --> assigned
    • assigned_to: Minh Hon Chau
     
  • Minh Hon Chau

    Minh Hon Chau - 2020-10-28
    • status: assigned --> fixed
     
  • Minh Hon Chau

    Minh Hon Chau - 2020-10-28

    commit 8758c96eaf3d62ec99b99a7ae8d3ebf6884793c1
    Author: Minh Chau <minh.chau@dektech.com.au>
    Date: Mon Oct 26 13:12:07 2020 +1100

    base: Use non-blocking socketpair in sysf_exc module V3 [#3222]
    
     
