Menu

#607 Mds : tcp assert in MDS on cumulated unsent messages

4.5.0
fixed
None
enhancement
mds
-
4.5
minor
2015-07-14
2013-10-30
No

Using the OpenSAF test program "logtest" and the latest opensaf configured with MDS/TCP crashes the log server in the assert in mds_mdtm_queue_add_unsent_msg():

======================================================================================
++tcp_cb->mdtm_tcp_unsent_counter; / Increment the counter to keep a tab on number of messages /
if (tcp_cb->mdtm_tcp_unsent_counter <= DTM_INTRANODE_UNSENT_MSG) {
if (NULL == hdr && NULL == tail) {
tcp_cb->mds_mdtm_msg_unsent_hdr = tmp;
tcp_cb->mds_mdtm_msg_unsent_tail = tmp;
} else {
tail->next = tmp;
tcp_cb->mds_mdtm_msg_unsent_tail = tmp;

         /* Change the poll from POLLIN to POLLOUT */
         pfd[0].events = pfd[0].events | POLLOUT;
     }
 } else {
     syslog(LOG_ERR, " MDTM unsent message is more!=%d", DTM_INTRANODE_UNSENT_MSG);
     assert(0);
     return NCSCC_RC_FAILURE;
 }

======================================================================================

$ grep DTM_INTRANODE_UNSENT_MSG include/*
include/mds_dt_tcp_disc.h:#define DTM_INTRANODE_UNSENT_MSG 200

mds_mdtm_unsent_queue_add_send() is the only place mds_mdtm_queue_add_unsent_msg() is called.

mds_mdtm_unsent_queue_add_send() can return an error code, none of its callers check the return code! I guess it should return void then and abort internally.

Related

Tickets: #607

Discussion

  • A V Mahesh (AVM)

    • summary: Mds : tcp assert in MDS on cumulated unset messages --> Mds : tcp assert in MDS on cumulated unsent messages
     
  • Hans Feldt

    Hans Feldt - 2013-10-30

    I don't understand the purpose of the send queue. TCP socket have a send buffer. It's default size on my system is:
    $ cat /proc/sys/net/core/wmem_default
    229376

    So if a user wants more buffering it should be able to increase this value. The internal queue in MDS does not make sense. It just extens the already existing socket send buffer. I think we should get rid of the queue.

    The user should have a possibility to tune the socket send buffer size. Right now it is hardcoded to:

    define MDS_SND_RCV_SIZE 64000

    which is lower than the default on modern systems! MDS should not set the socket buffer length unless asked by the client. Just use the default. The client could specify an environment variable that the MDS library could pick up and use.

    Keep it simple otherwise the system characteristics will be very hard to understand and predict...

     
  • A V Mahesh (AVM)

    • Type: defect --> enhancement
     
  • Hans Feldt

    Hans Feldt - 2014-10-07

    I don't understand why this was changed to an enhancement. The following test case shows the problem:
    $ saflogtest -k -c 1000 -y hello

    and by simply not setting the socket as non blocking it works! patch:
    diff --git a/osaf/libs/core/mds/mds_dt_tcp.c b/osaf/libs/core/mds/mds_dt_tcp.c
    --- a/osaf/libs/core/mds/mds_dt_tcp.c
    +++ b/osaf/libs/core/mds/mds_dt_tcp.c
    @@ -255,7 +255,7 @@ uint32_t mds_mdtm_init_tcp(NODE_ID nodei
    close(tcp_cb->DBSRsock);
    return NCSCC_RC_FAILURE;
    }
    -
    +#if 0
    / Convert the socket to a Non-blocking socket /
    if ((flags = fcntl(tcp_cb->DBSRsock, F_GETFL, NULL)) < 0) {
    syslog(LOG_ERR, "MDS:MDTM:TCP Unable to set the F_GETFL O_NONBLOCK Flag on DBSRsock err :%s", strerror(errno));
    @@ -268,6 +268,7 @@ uint32_t mds_mdtm_init_tcp(NODE_ID nodei
    close(tcp_cb->DBSRsock);
    return NCSCC_RC_FAILURE;
    }
    +#endif

        /* Code for Tmr Mailbox Creation used for Tmr Msg Retrival */
    
     
    • A V Mahesh (AVM)

      If we are blocking mds_mdtm_queue_add_unsent_msg() functionality is may not be required we have to clean up the code.

      I am also thinking of giving Configurable option , and erro handling also need to be changed, that is why planing to do it is enhancement

       
      • Hans Feldt

        Hans Feldt - 2014-10-08

        If you block the caller in mds_mdtm_queue_add_unsent_msg() you have basically implemented a blocking send which is pointless. Just don't set the socket non-blocking will have the same effect.

        MDS send can fail, it is OK to return FAILURE but it is also OK to block in the send call. We do it in the TIPC case. In my opinion the send queue should be removed. No configurability is needed.

         
        • A V Mahesh (AVM)

          If you block the caller in mds_mdtm_queue_add_unsent_msg() you have basically
          implemented a blocking send which is pointless.

          No, after non-blocking send() fail , the message is added in unsent queue
          . Before sending a new message, the unsent queue is checked and if present, the new message will be added to the queue. When the pollout comes on the fd(write fd), the messages in the queue are processed and sent.

          so we are postponing the send().

           
          • Hans Feldt

            Hans Feldt - 2014-10-08

            Gaining what?
            If we agree non-blocking is bad then the queue should not be used.

             
  • Adrian Szwej

    Adrian Szwej - 2014-10-07

    With the patch above I can pass the 45 containers up to 67 containers before next problem appear.

     
  • A V Mahesh (AVM)

    • status: assigned --> review
     
  • A V Mahesh (AVM)

    • status: review --> fixed
    • Version: 4.4 --> 4.5
     
  • A V Mahesh (AVM)

    Following are done to make MDS TCP transport sockets as BLOCKIN,which is similar to
    TIPC transport :

    -Removed Non-blocking socket option on sockets (TCP transport) in MDS library
    -Removed UNSENT_MSGS & it queue related code & logic

    changeset: 6048:eb4147a21b4f
    parent: 6045:fa8f3226802f
    user: A V Mahesh mahesh.valla@oracle.com
    date: Tue Oct 21 11:40:15 2014 +0530
    summary: mds: remove O_NONBLOCK option on MDS TCP transport sockets [#607]

    changeset: 6046:c312cb12ee04
    branch: opensaf-4.5.x
    parent: 6042:a7766e9cb310
    user: A V Mahesh mahesh.valla@oracle.com
    date: Tue Oct 21 11:38:27 2014 +0530
    summary: mds: remove O_NONBLOCK option on MDS TCP transport sockets [#607]

    Following are done to make DTM sockets as BLOCKIN :

    -Removed Non-blocking socket option on sockets (TCP transport) in DTM

    changeset: 6049:c89dd077038f
    user: A V Mahesh mahesh.valla@oracle.com
    date: Tue Oct 21 11:40:26 2014 +0530
    summary: dtm: remove O_NONBLOCK option on DTM TCP transport sockets [#607]

    changeset: 6047:5bf70b96105f
    branch: opensaf-4.5.x
    user: A V Mahesh mahesh.valla@oracle.com
    date: Tue Oct 21 11:38:56 2014 +0530
    summary: dtm: remove O_NONBLOCK option on DTM TCP transport sockets [#607]

     

    Related

    Tickets: #607

  • Nagendra Kumar

    Nagendra Kumar - 2015-07-14
    • Milestone: future --> 4.5.0
     

Log in to post a comment.

MongoDB Logo MongoDB