#76 100% CPU usage when using TLS

1.4.x
closed-fixed
core (110)
2
2009-03-21
2009-01-13
Anonymous
No

Suddenly I got some opensips process using 100% of CPU power when using TLS.
"ltrace -p pid" shows that opensips is calling indefinidely the functions: SSL_connect(), SSL_get_error() and pool().

BT using opensips-1.4.3:
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb7d547db in poll () from /lib/libc.so.6
#2 0x0810a0df in tls_blocking_write (c=0xb35f40f0, fd=31,
buf=0x81c8a80 "NOTIFY sip:554133413090@172.16.10.102:5060;transport=tls SIP/2.0\r\nVia: SIP/2.0/TLS 172.20.25.1:5061;branch=z9hG4bK2564375960aa47a90;i=b\r\nVia: SIP/2.0/TLS 172.20.3.90:5061;branch=z9hG4bK2564375960aa47a"..., len=1141) at tls/tls_server.c:746
#3 0x080adfe2 in tcp_send (send_sock=0x8182a20, type=3,
buf=0x81c8a80 "NOTIFY sip:554133413090@172.16.10.102:5060;transport=tls SIP/2.0\r\nVia: SIP/2.0/TLS 172.20.25.1:5061;branch=z9hG4bK2564375960aa47a90;i=b\r\nVia: SIP/2.0/TLS 172.20.3.90:5061;branch=z9hG4bK2564375960aa47a"..., len=1141, to=0xbfeca438, id=0) at tcp_main.c:765
#4 0x0806313d in msg_send (send_sock=0x1, proto=<value optimized out>, to=0xbfeca438, id=0,
buf=0x81c8a80 "NOTIFY sip:554133413090@172.16.10.102:5060;transport=tls SIP/2.0\r\nVia: SIP/2.0/TLS 172.20.25.1:5061;branch=z9hG4bK2564375960aa47a90;i=b\r\nVia: SIP/2.0/TLS 172.20.3.90:5061;branch=z9hG4bK2564375960aa47a"..., len=1141) at forward.h:127
#5 0x08064284 in forward_request (msg=0x81c77e8, p=0x81c80f8) at forward.c:382
#6 0x08054c44 in do_action (a=0x818ba38, msg=0x81c77e8) at action.c:336
#7 0x080585df in run_action_list (a=0x818ba38, msg=0x81c77e8) at action.c:138
#8 0x08057944 in do_action (a=0x818bb08, msg=0x81c77e8) at action.c:717
#9 0x080585df in run_action_list (a=0x818bb08, msg=0x81c77e8) at action.c:138
#10 0x08058232 in do_action (a=0x818bb70, msg=0x81c77e8) at action.c:723
#11 0x080585df in run_action_list (a=0x818bb70, msg=0x81c77e8) at action.c:138
#12 0x08057944 in do_action (a=0x818bbd8, msg=0x81c77e8) at action.c:717
#13 0x080585df in run_action_list (a=0x8187a10, msg=0x81c77e8) at action.c:138
#14 0x080589ca in run_top_route (a=0x8187a10, msg=0x81c77e8) at action.c:118
#15 0x0808aadc in receive_msg (
buf=0xb359095c "NOTIFY sip:554133413090@172.16.10.102:5060;transport=tls SIP/2.0\r\nVia: SIP/2.0/TLS 172.20.3.90:5061;branch=z9hG4bK2564375960aa47a90\r\nRoute: <sip:172.20.25.1:5061;transport=tls;r2=on;lr=on;ftag=snl_t2a"..., len=1070, rcv_info=0xb35908f8) at receive.c:165
#16 0x080b1cdc in tcp_read_req (con=0xb35908e8, bytes_read=0xbfecb038) at tcp_read.c:544
#17 0x080b3f8b in handle_io (fm=0x81bb514, idx=-1) at tcp_read.c:812
#18 0x080b57b7 in tcp_receive_loop (unix_sock=25) at io_wait.h:727
#19 0x080a6fdf in tcp_init_children (chd_rank=0x81490a0) at tcp_main.c:1706
#20 0x0806b49c in main (argc=7, argv=0xbfecb374) at main.c:832
(gdb)

Usign GDB I see that tls_blocking_write() function never returns. The poll() system call returns instantly. SSL_connect() returns error, SSL_geterror shows SSL_ERROR_WANT_READ...

Seems that is missing some work in the tls_blocking_write() function in order to prevent such behavior - some kind of timeout or max retries...

The same problem occurs with openser-1.3.0.

Also, I saw some strange comments in the source code:
/*
* fixme: probably does not work correctly
*/
size_t
tls_blocking_write(struct tcp_connection *c, int fd, const char *buf, size_t len)
fixme?

Discussion

    • assigned_to: nobody --> bogdan_iancu
     
  • Investigating....

     
    • priority: 5 --> 2
    • status: open --> open-fixed
     
  • Ok, a fix is available on trunk - if you could test it and report back, it will be great. I will wait a bit for feedback before backporting to 1.4.

    Thanks and regards,
    Bogdan

     
  • backport done

     
    • status: open-fixed --> closed-fixed