Menu

#1 Deadlock when context is destroyed while sending a Pdu

open-accepted
None
5
2018-12-04
2009-03-17
No

Calling AbstractSnmpContext.destroy() while a Transmitter is sending a Pdu can cause a deadlock:

Thread 1:

Name: 192.168.70.164_161_null_v0_Trans0
State: BLOCKED on uk.co.westhawk.snmp.stack.SnmpContext@259478 owned by: SnmpCollector
Total blocked: 1 Total waited: 0

Stack trace:
uk.co.westhawk.snmp.stack.AbstractSnmpContext.sendPacket(AbstractSnmpContext.java:612)
uk.co.westhawk.snmp.stack.Pdu.sendme(Pdu.java:256)
uk.co.westhawk.snmp.stack.Pdu.transmit(Pdu.java:681)
uk.co.westhawk.snmp.stack.Pdu.transmit(Pdu.java:669)
uk.co.westhawk.snmp.stack.Transmitter.run(Transmitter.java:80)
- locked uk.co.westhawk.snmp.stack.Transmitter@ff9a49
java.lang.Thread.run(Unknown Source)

Thread 2:

Name: SnmpCollector
State: BLOCKED on uk.co.westhawk.snmp.stack.Transmitter@ff9a49 owned by: 192.168.70.164_161_null_v0_Trans0
Total blocked: 25.307 Total waited: 24.732

Stack trace:
uk.co.westhawk.snmp.stack.Transmitter.stand(Transmitter.java:144)
uk.co.westhawk.snmp.stack.Transmitter.destroy(Transmitter.java:164)
uk.co.westhawk.snmp.stack.AbstractSnmpContext.freeTransmitters(AbstractSnmpContext.java:922)
uk.co.westhawk.snmp.stack.AbstractSnmpContext.destroy(AbstractSnmpContext.java:457)
- locked uk.co.westhawk.snmp.stack.SnmpContext@259478

Related

Bugs: #1

Discussion

  • Birgit Arkesteijn

    • assigned_to: nobody --> birgita
     
  • Birgit Arkesteijn

    • status: open --> open-accepted
     
  • Zur13

    Zur13 - 2018-12-04

    If someone interested in the fix I have made one for myself (sorry I can't make the proper patch) and I'm still testing it.

    There are changes in 2 methods in class:

    uk.co.westhawk.snmp.stack.Transmitter

     public void run()
        {
            while (me != null)
            {
                Pdu cachedPdu = sit();
    
                if (cachedPdu != null)
                {
                    cachedPdu.transmit();
                    synchronized (this) 
                    {
                        if( pdu == cachedPdu ) 
                        {
                            // I will say this only once....
                            pdu = null; 
                        }
                    }
                }
            }
        }
    
        synchronized Pdu sit()
        {
            while ((me != null) && (pdu == null))
            {
                try
                {
                    wait();
                }
                catch (InterruptedException iw)
                {
                    ;
                }
    
            }
            return pdu;
        }
    
     

    Last edit: Zur13 2018-12-04
    • Tim Panton

      Tim Panton - 2018-12-04

      Thanks for this, if you could supply a diff, that would be great.

      Tim.

      On 4 Dec 2018, at 17:10, Zur13 zur13@users.sourceforge.net wrote:

      If someone interested in the fix I have made one for myself (sorry I can't make the proper patch) and I'm still testing it.

      There are changes in 2 methods in class:

      uk.co.westhawk.snmp.stack.Transmitter

      public void run()
      {
      while (me != null)
      {
      Pdu cachedPdu = sit();

              if (cachedPdu != null)
              {
                  cachedPdu.transmit();
                  synchronized (this) 
                  {
                      if( pdu == cachedPdu ) 
                      {
                          // I will say this only once....
                          pdu = null; 
                      }
                  }
              }
          }
      
      synchronized Pdu sit()
      {
          while ((me != null) && (pdu == null))
          {
              try
              {
                  wait();
              }
              catch (InterruptedException iw)
              {
                  ;
              }
      
          }
          return pdu;
      }
      

      [bugs:#1] https://sourceforge.net/p/westhawksnmp/bugs/1/ Deadlock when context is destroyed while sending a Pdu

      Status: open-accepted
      Group:
      Created: Tue Mar 17, 2009 09:02 AM UTC by Christoph Hohmann
      Last Updated: Mon Oct 19, 2009 03:15 PM UTC
      Owner: Birgit Arkesteijn

      Calling AbstractSnmpContext.destroy() while a Transmitter is sending a Pdu can cause a deadlock:

      Thread 1:

      Name: 192.168.70.164_161_null_v0_Trans0
      State: BLOCKED on uk.co.westhawk.snmp.stack.SnmpContext@259478 owned by: SnmpCollector
      Total blocked: 1 Total waited: 0

      Stack trace:
      uk.co.westhawk.snmp.stack.AbstractSnmpContext.sendPacket(AbstractSnmpContext.java:612)
      uk.co.westhawk.snmp.stack.Pdu.sendme(Pdu.java:256)
      uk.co.westhawk.snmp.stack.Pdu.transmit(Pdu.java:681)
      uk.co.westhawk.snmp.stack.Pdu.transmit(Pdu.java:669)
      uk.co.westhawk.snmp.stack.Transmitter.run(Transmitter.java:80)
      - locked uk.co.westhawk.snmp.stack.Transmitter@ff9a49
      java.lang.Thread.run(Unknown Source)

      Thread 2:

      Name: SnmpCollector
      State: BLOCKED on uk.co.westhawk.snmp.stack.Transmitter@ff9a49 owned by: 192.168.70.164_161_null_v0_Trans0
      Total blocked: 25.307 Total waited: 24.732

      Stack trace:
      uk.co.westhawk.snmp.stack.Transmitter.stand(Transmitter.java:144)
      uk.co.westhawk.snmp.stack.Transmitter.destroy(Transmitter.java:164)
      uk.co.westhawk.snmp.stack.AbstractSnmpContext.freeTransmitters(AbstractSnmpContext.java:922)
      uk.co.westhawk.snmp.stack.AbstractSnmpContext.destroy(AbstractSnmpContext.java:457)
      - locked uk.co.westhawk.snmp.stack.SnmpContext@259478

      Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/westhawksnmp/bugs/1/ https://sourceforge.net/p/westhawksnmp/bugs/1/
      To unsubscribe from further messages, please visit https://sourceforge.net/auth/subscriptions/ https://sourceforge.net/auth/subscriptions/

       

      Related

      Bugs: #1

      • Zur13

        Zur13 - 2018-12-04

        I'm still running the tests but while this fixes the deadlock there is the hidden issue with the lib architecture in there related to this deadlock. This deadlock happens in the case when the context was closed before the PDU was sent so if the deadlock will be fixed there will still be the issue: in some cases some PDUs will not be sent to devices due to race condition because the required context is destroyed before the actual data send happens.

         

        Last edit: Zur13 2018-12-04
        • Tim Panton

          Tim Panton - 2018-12-04

          I suppose you could delay the destruction of the context until all pending PDUs are sent.
          That however might leave you with a problem on messages that expect an ack - you could be
          waiting a while for destroy to happen.

          You could have a destroyWhenIdle() type method - which fires a callback when complete

          • or set a ’closing’ flag and accept no new requests...

          T.

          On 4 Dec 2018, at 18:25, Zur13 zur13@users.sourceforge.net wrote:

          I'm still running the tests but while this fixes the deadlock there is the hidden issue with the lib architecture in there related to this deadlock. This deadlock happens in the case when the context was closed before the PDU was sent so if the deadlock will be fixed there will still be the issue: in some cases some PDUs will not be sent to devices due to race condition where the required context is destroyed before the actual data send happens.

          [bugs:#1] https://sourceforge.net/p/westhawksnmp/bugs/1/ Deadlock when context is destroyed while sending a Pdu

          Status: open-accepted
          Group:
          Created: Tue Mar 17, 2009 09:02 AM UTC by Christoph Hohmann
          Last Updated: Tue Dec 04, 2018 04:10 PM UTC
          Owner: Birgit Arkesteijn

          Calling AbstractSnmpContext.destroy() while a Transmitter is sending a Pdu can cause a deadlock:

          Thread 1:

          Name: 192.168.70.164_161_null_v0_Trans0
          State: BLOCKED on uk.co.westhawk.snmp.stack.SnmpContext@259478 owned by: SnmpCollector
          Total blocked: 1 Total waited: 0

          Stack trace:
          uk.co.westhawk.snmp.stack.AbstractSnmpContext.sendPacket(AbstractSnmpContext.java:612)
          uk.co.westhawk.snmp.stack.Pdu.sendme(Pdu.java:256)
          uk.co.westhawk.snmp.stack.Pdu.transmit(Pdu.java:681)
          uk.co.westhawk.snmp.stack.Pdu.transmit(Pdu.java:669)
          uk.co.westhawk.snmp.stack.Transmitter.run(Transmitter.java:80)
          - locked uk.co.westhawk.snmp.stack.Transmitter@ff9a49
          java.lang.Thread.run(Unknown Source)

          Thread 2:

          Name: SnmpCollector
          State: BLOCKED on uk.co.westhawk.snmp.stack.Transmitter@ff9a49 owned by: 192.168.70.164_161_null_v0_Trans0
          Total blocked: 25.307 Total waited: 24.732

          Stack trace:
          uk.co.westhawk.snmp.stack.Transmitter.stand(Transmitter.java:144)
          uk.co.westhawk.snmp.stack.Transmitter.destroy(Transmitter.java:164)
          uk.co.westhawk.snmp.stack.AbstractSnmpContext.freeTransmitters(AbstractSnmpContext.java:922)
          uk.co.westhawk.snmp.stack.AbstractSnmpContext.destroy(AbstractSnmpContext.java:457)
          - locked uk.co.westhawk.snmp.stack.SnmpContext@259478

          Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/westhawksnmp/bugs/1/ https://sourceforge.net/p/westhawksnmp/bugs/1/
          To unsubscribe from further messages, please visit https://sourceforge.net/auth/subscriptions/ https://sourceforge.net/auth/subscriptions/

           

          Related

          Bugs: #1

  • Zur13

    Zur13 - 2018-12-04

    Ok, here is possible patch but I did't test it at all. Transmitter is changed so it calls pdu.transmit() outside of synchronize block so the transmit sequence don't capture the Transmitter instance monitor. The AbstractSnmpContext.destroy() is changed so it uses new flag which prevents new PDUs to be added tothe context after the destroy is called and awaits for known PDUs to get answers.

     

Log in to post a comment.

MongoDB Logo MongoDB