how to recover from intermittent TCP timeout exceptions

2017-02-27
2022-09-20
  • Tyson Colby

    Tyson Colby - 2017-02-27

    I am using the Python wrapper. Please see a description of my challenge below. I did not want to post this in the bug section in case I was missing something in the implementation. Any feedback would be appreciated. Thanks, Tyson

    https://github.com/gijzelaerr/python-snap7/issues/70
    What is the proper way to recover from the snap7 exception "ISO : An error occurred during recv TCP : Connection timed out"?

    If I place a db_read() into a try/except clause to keep the application from aborting on a TCP timeout, subsequent db_read() calls become out of sync. It seems the calls are queued somewhere, probably in the native snap7 thread.

    For example: If I have three locations I am reading from a PLC
    3,260,4 contains value 100
    3,272,4 contains value 200
    1,104,4 contains value 300

    -the first db_read(3,260,4) call successfully returns a value 100
    -second db_read(3,272,4) call times out with TCP timeout exception
    -third db_read(1,104,4) call will return value 200 (Not 300 like you would expect)
    ..
    -a fourth call of db_read(3,260,4) will return the value from the third call above (300)

    It seems the method calls get spooled into a queue and for every call that times out, the data returned will be delayed that many calls in the future.
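The shifted results described above can be reproduced with a toy model. This is only a sketch of the symptom, not of snap7 internals: `LaggingLink` and its `timed_out` flag are hypothetical names invented for illustration, and the model simply assumes that the reply to a timed-out request stays pending and is handed to the next caller.

```python
from collections import deque


class LaggingLink:
    """Toy model: replies arrive in request order, and a timed-out
    request still leaves its reply waiting in line."""

    def __init__(self, plc_values):
        self.plc_values = plc_values   # {(db, start, size): value}
        self.pending = deque()         # replies not yet consumed

    def db_read(self, db, start, size, timed_out=False):
        # Every request eventually produces a reply.
        self.pending.append(self.plc_values[(db, start, size)])
        if timed_out:
            # The caller gives up, but the reply stays pending.
            raise TimeoutError("TCP : Connection timed out")
        # The caller gets the oldest pending reply, not necessarily its own.
        return self.pending.popleft()


def replay():
    """Replay the four calls from the example above."""
    link = LaggingLink({(3, 260, 4): 100, (3, 272, 4): 200, (1, 104, 4): 300})
    results = [link.db_read(3, 260, 4)]
    try:
        link.db_read(3, 272, 4, timed_out=True)
    except TimeoutError:
        results.append("timeout")
    results.append(link.db_read(1, 104, 4))
    results.append(link.db_read(3, 260, 4))
    return results


print(replay())  # [100, 'timeout', 200, 300]
```

In this model every timeout leaves one extra reply pending, so each later read is shifted by one more call.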

    Is there a way to throw away the buffer on a TCP timeout? Or is there some other philosophical approach I should be taking? It seems very dangerous to call a function with specific arguments and get the wrong data back. Thanks for any suggestions.

     
  • Davide Nardella

    Davide Nardella - 2017-02-28

    Hi Tyson,
    Unfortunately I don't know Python at all, but I can offer some considerations.

    1. There are no exceptions raised by Snap7. Snap7 is a binary library whose functions always return an error code, 0 : OK, !=0 : something went wrong.

    2. You are using synchronous functions, i.e. a function exits only when it finishes its task; there are no queues and there is no memory across the functions. Here you can find a better explanation of this: http://snap7.sourceforge.net/snap7_client.html#asyncdatatransfer

    That said, I don't understand the need to check an error code, then raise an exception, and finally trap it in the user code. Your try/except shields the check_error() function, not the db_read() function.

    I don't know what happens in Python when an exception is raised, the stack status and so on..

    As I said, there are no queues. Are you sharing a client across different threads? This is not allowed; a transaction must be atomic.
    Also, don't insert delays between the instructions; they are not needed.

    Let me know.

    Davide

     
  • Davide Nardella

    Davide Nardella - 2017-02-28

    Did you check the execution time of the data transfer? How many ms are needed for a transaction? In a clean, closed LAN it should not exceed 5 ms (internal CP) or 12 ms (external CP).

    Are you sure you are not exceeding the number of S7 connection resources allowed?

     
  • Tyson Colby

    Tyson Colby - 2017-02-28

    Hi, Davide,
    Thank you for your reply. You have given me some suggestions that I will dig into.
    I originally was sharing the client across different threads, however, I was using a lock to restrict access to the client. The threads were more of a convenience for polling certain tags at different rates. I did test in a single thread and the behavior is the same, so I know my locking was effective.
    I am definitely not exceeding the number of S7 connection resources.
    I will report back on what I find out. Thank you for the direction.
    Regards,
    Tyson
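The locking approach mentioned in this post could look roughly like the sketch below. `LockedClient` is a hypothetical wrapper written for illustration. It assumes the wrapped object exposes a `db_read(db, start, size)` method, as the python-snap7 client does in the first post, and it only serializes that one call.

```python
import threading


class LockedClient:
    """Serialize db_read() calls so only one thread uses the client at a time."""

    def __init__(self, client):
        self._client = client
        self._lock = threading.Lock()

    def db_read(self, db, start, size):
        # Hold the lock for the whole request/response exchange.
        with self._lock:
            return self._client.db_read(db, start, size)
```

Each polling thread would then call `db_read()` on the shared `LockedClient` instead of on the raw client.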

     
  • Tyson Colby

    Tyson Colby - 2017-02-28

    Hi, Davide,
    I dug around in the Python wrapper and could not find the root of the problem. I quickly hacked (really, really dirty) your working client example to see if I could reproduce the problem and rule out the Python code.

    I picked 3 locations in the PLC (all three are REALs). I read them sequentially in a loop. While the loop is running, I interrupt the network connection and restore it shortly after. You can see where the network timeouts are: the status returned is 655470. You can also see in the output that, after the network connection is restored, the returned values are shifted as I described in the original post.

    Here is main:

    int main(int argc, char* argv[])
    {
    // // Get Program args (we need the client address and optionally Rack and Slot)
    //     if (argc!=2 && argc!=4)
    //     {
    //         Usage();
    //         return 1;
    //     }
    //     Address=argv[1];
    //     if (argc==4)
    //     {
    //         Rack=atoi(argv[2]);
    //         Slot=atoi(argv[3]);
    //     }
    
    // Client Creation
        Client= new TS7Client();
        int size;
        int status;
        // Client->SetAsCallback(CliCompletion,NULL);
        union floatchar
        {
            unsigned char c[4];
            float f;
        };
        union floatchar asFloat;
    // Connection
        if (CliConnect())
        {
            while(1)
            {
                printf("DBRead(3,272,4)\n");
                status = Client->DBRead(3,272,4,&Buffer);
                printf("status: %d\n", status);
                if(!status)
                {
                    printf("buffer: ");
                    for(int i=0; i<4; i++)
                    {
                        printf("%x ", (unsigned char)Buffer[i]);
                        asFloat.c[3-i] = Buffer[i];
                    }
                    printf("\nfloat: %f\n\n", asFloat.f);
                }
    
                printf("DBRead(1,104,4)\n");
                status = Client->DBRead(1,104,4,&Buffer);
                printf("status: %d\n", status);
                if(!status)
                {
                    printf("buffer: ");
                    for(int i=0; i<4; i++)
                    {
                        printf("%x ", (unsigned char)Buffer[i]);
                        asFloat.c[3-i] = Buffer[i];
                    }
                    printf("\nfloat: %f\n\n", asFloat.f);
                }
    
                printf("DBRead(3,260,4)\n");
                status = Client->DBRead(3,260,4,&Buffer);
                printf("status: %d\n", status);
                if(!status)
                {
                    printf("buffer: ");
                    for(int i=0; i<4; i++)
                    {
                        printf("%x ", (unsigned char)Buffer[i]);
                        asFloat.c[3-i] = Buffer[i];
                    }
                    printf("\nfloat: %f\n\n", asFloat.f);
                }
    
            printf("---------------------------------\n");
            SysSleep(1000);
            }
            Client->Disconnect();
            //PerformTests();
            //CliDisconnect();
        };
    
    // // Deletion
    //     delete Client;
    //     Summary();
        // Cli_Destroy(&client);
    
        return 0;
    }
    

    Here is the output:

    +-----------------------------------------------------
    | UNIT Connection
    +-----------------------------------------------------
    | Result         : OK
    | Execution time : 144 ms
    +-----------------------------------------------------
      Connected to   : 10.2.60.10 (Rack=0, Slot=1)
      PDU Requested  : 480 bytes
      PDU Negotiated : 480 bytes
    DBRead(3,272,4)
    status: 0
    buffer: 42 6d 83 12 
    float: 59.377998
    
    DBRead(1,104,4)
    status: 0
    buffer: 45 ac cc cd 
    float: 5529.600098
    
    DBRead(3,260,4)
    status: 0
    buffer: 3e 4c cc cd 
    float: 0.200000
    
    ---------------------------------
    DBRead(3,272,4)
    status: 0
    buffer: 42 6d 83 12 
    float: 59.377998
    
    DBRead(1,104,4)
    status: 0
    buffer: 45 ac cc cd 
    float: 5529.600098
    
    DBRead(3,260,4)
    status: 0
    buffer: 3e 4c cc cd 
    float: 0.200000
    
    ---------------------------------
    DBRead(3,272,4)
    status: 655470
    DBRead(1,104,4)
    status: 655470
    DBRead(3,260,4)
    status: 0
    buffer: 42 6d 83 12 
    float: 59.377998
    
    ---------------------------------
    DBRead(3,272,4)
    status: 0
    buffer: 45 ac cc cd 
    float: 5529.600098
    
    DBRead(1,104,4)
    status: 0
    buffer: 3e 4c cc cd 
    float: 0.200000
    
    DBRead(3,260,4)
    status: 0
    buffer: 42 6d 83 12 
    float: 59.377998
    
    ---------------------------------
    ^C
    tyson@salmonslayer:~/dev/snap7/mysnap7app/c$
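For readers following along in Python, the byte reversal done in the C loop above (`asFloat.c[3-i] = Buffer[i]`) corresponds to decoding a big-endian 32-bit float. A minimal sketch, where `s7_real` is a hypothetical helper name and the input bytes are copied from the output above:

```python
import struct


def s7_real(raw: bytes) -> float:
    """Decode a 4-byte REAL stored big-endian into a Python float."""
    return struct.unpack(">f", raw)[0]


# Buffers printed in the output above.
for hex_bytes in ("42 6d 83 12", "45 ac cc cd", "3e 4c cc cd"):
    print(hex_bytes, "->", s7_real(bytes.fromhex(hex_bytes)))
```

Decoding the buffers this way makes the shift easy to spot: after the two timeouts, `42 6d 83 12` (the value at 3,272) comes back for the read of 3,260.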
    
     
  • Davide Nardella

    Davide Nardella - 2017-03-01

    Hi Tyson,
    I see only one connection at the beginning of the main.
    Every time a (severe) TCP error occurs you must re-establish the connection, because it is no longer valid.
    You cannot recover from such errors.
    I tested your software, but I get "connection reset by peer" as the error, and from that point on there are no more data transfers.

    I often use two (similar) useful functions in my threads:

    void CheckConnection()
    {
        while(!ChannelStatus)
        {
            ChannelStatus=CliConnect();
            if (!ChannelStatus)
                SysSleep(1000);         
        }
    }
    
    bool TcpError(int Error)
    {
        return (Error & 0x0000FFFF)!=0;
    }
    

    WinCC and some OPC servers have the same approach.
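The `TcpError()` check above carries over directly to Python. In the sketch below, `TCP_MASK` is the mask from that function. The `ISO_MASK` and `S7_MASK` values are an assumption based on the error-code layout documented for snap7 (TCP part in the low 16 bits, ISO part in the next 4 bits, S7 part above that) and are worth verifying against your library version. The function names are made up for illustration.

```python
TCP_MASK = 0x0000FFFF  # same mask as TcpError() above
ISO_MASK = 0x000F0000  # assumption: ISO layer field
S7_MASK = 0xFFF00000   # assumption: S7 layer field


def split_error(code: int) -> dict:
    """Split a snap7 result code into its S7, ISO and TCP parts."""
    return {"s7": code & S7_MASK, "iso": code & ISO_MASK, "tcp": code & TCP_MASK}


def tcp_error(code: int) -> bool:
    """True when the TCP part is non-zero, i.e. the connection must be rebuilt."""
    return (code & TCP_MASK) != 0


parts = split_error(655470)  # status seen during the network interruption
print({name: hex(value) for name, value in parts.items()})
print(tcp_error(655470))
```

For the status 655470 from the earlier output, the TCP part is 0x6e (110, which is ETIMEDOUT on Linux) and the ISO part is 0xa0000, which appears to match the "ISO : An error occurred during recv TCP : Connection timed out" message quoted in the first post.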

    I adapted your code, check it and let me know.
    I changed IP address and some offsets in DB1 and DB3 to meet my PLC conf.

    Regards
    Davide

     
  • Tyson Colby

    Tyson Colby - 2017-03-01

    That works.
    I will adapt this philosophy to my program.

    I really appreciate the support, Davide. Take care.

    Best,
    Tyson

     
  • Ben

    Ben - 2022-09-20

    Hi Gents,
    I would appreciate guidance on a Python solution for the 'TCP timeout' issue when reading a DB: the code stops and has to be relaunched. I know you answered this question in C, but if you have a sample of recovering from this exception in Python, please share it.

     

    Last edit: Ben 2022-09-20
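A minimal Python sketch of the reconnect-on-error approach described earlier in the thread. It is not taken from python-snap7. `read_with_reconnect` is a hypothetical helper, and it assumes the wrapper raises an exception when snap7 returns a non-zero code. The exception class differs between python-snap7 versions, so the sketch catches `Exception` broadly. The client only needs `db_read()` and `disconnect()` methods, and `connect` is any zero-argument callable that re-establishes the connection.

```python
import time


def read_with_reconnect(client, connect, db, start, size, retries=3, delay=1.0):
    """Read from a DB; after any failure, drop the connection and rebuild it."""
    last_error = None
    for _ in range(retries):
        try:
            return client.db_read(db, start, size)
        except Exception as exc:  # assumption: the wrapper raises on error
            last_error = exc
        # The connection is no longer valid: discard it before retrying.
        try:
            client.disconnect()
        except Exception:
            pass  # the socket may already be dead
        time.sleep(delay)
        try:
            connect()
        except Exception as exc:
            last_error = exc  # still unreachable; the next attempt retries
    raise ConnectionError(f"db_read({db}, {start}, {size}) failed") from last_error


# Possible usage with python-snap7 (address, rack and slot from the output above):
#   import snap7
#   client = snap7.client.Client()
#   connect = lambda: client.connect("10.2.60.10", 0, 1)
#   connect()
#   raw = read_with_reconnect(client, connect, 3, 272, 4)
```

Because every retry runs on a fresh connection, a late reply to the failed request cannot be handed to the next read. A narrower `except` clause is preferable once you know which exception your python-snap7 version raises.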
