When S7ReadArea returns with a timeout error on waiting for the response from the PLC, it returns WSAETIMEDOUT in the lower word of the return code. In the meanwhile, the PLC sends its answer (or it arrives retarded).
The next call to S7ReadArea will most likely return with no error. BUT it will return the data from the previous request. If the second call reads from another location, this will most probably create desaster.
You can easily reproduce this behavior by modifying the server side and putting a Sleep(3050) before returning the answer. Make this happen after some random time and the problem should show up.
1) This inserts the sequence number check in the iso/rfc1006 part of the stack, but the sequence number is on the S7 layer. This may work for this lib as it only exchanges S7 packets (as far as I found), but conceptionally the fix is misplaced, IMHO.
2) Should by some reason, the first call to S7ReadArea read only part of the reponse and then for example time out, the code proposed will probably not be able to recover, as it tries to read complete packets and throw them away if the sequence number doesn't match, but here it will find no header or a header by hazzard, which would be even worse.
3) After a quick read of rfc1006 I came to the conclusion, that they expect the protocol to run on a reliable network layer - (class 0 in terms of ISO 8073), and that no procedures are included to reliably recover from an error on the underlying transport layer (TCP). There is even no means on the rfc1006 level to check if a response is to corresponding to the request just send, as they assume, I think, that packet reordering etc. is handled by the lower level.
4) This lead me to the conclusion, that the only right way to recover from TCP errors is to close the connection and reopen a new one. So I check the return code of the S7* functions and look for any bit set in the lower word (as this contains an TCP error). For me I decided to even treat errors on the iso level alike and reopen the connection. So my check is effectivly (RETURNCODE & 0x000FFFFF) != 0 then reopen.
I propose the following changes to the lib:
a) On any TCP error (or all but those a can't do any harm), put the whole connection in an unrecoverable error state to effectively prevent calling any data exchange and probably mixing up responses. Maybe you should think about the possible iso Errors, if they should be added, too.
b) To be a correct handler of S7 packets, the sequence number on S7 side should probably be checked, even if they most probably won't be wrong any more. But being a good citizen isn't evil. This would require an s7ExchangeBuffers on the micro client level, IMHO, to handle this on the right level.
David, how do you think about this?
-- Joja
PS: I am wondering that there are not far more complaints about mixing up response data.... seems that everybody has very reliable network connections.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi,
first of all, big thanks for creating this easy to use library!
I stumpled over a behavior likely already reported in https://sourceforge.net/p/snap7/discussion/bugfix/thread/de352e4cc2/ or https://sourceforge.net/p/snap7/discussion/bugfix/thread/927e97c11c/
When S7ReadArea returns with a timeout error on waiting for the response from the PLC, it returns WSAETIMEDOUT in the lower word of the return code. In the meanwhile, the PLC sends its answer (or it arrives retarded).
The next call to S7ReadArea will most likely return with no error. BUT it will return the data from the previous request. If the second call reads from another location, this will most probably create desaster.
You can easily reproduce this behavior by modifying the server side and putting a Sleep(3050) before returning the answer. Make this happen after some random time and the problem should show up.
I initially thought to fix this with the proposal in https://sourceforge.net/p/snap7/discussion/bugfix/thread/de352e4cc2/ . But I have some concerns with this:
1) This inserts the sequence number check in the iso/rfc1006 part of the stack, but the sequence number is on the S7 layer. This may work for this lib as it only exchanges S7 packets (as far as I found), but conceptionally the fix is misplaced, IMHO.
2) Should by some reason, the first call to S7ReadArea read only part of the reponse and then for example time out, the code proposed will probably not be able to recover, as it tries to read complete packets and throw them away if the sequence number doesn't match, but here it will find no header or a header by hazzard, which would be even worse.
3) After a quick read of rfc1006 I came to the conclusion, that they expect the protocol to run on a reliable network layer - (class 0 in terms of ISO 8073), and that no procedures are included to reliably recover from an error on the underlying transport layer (TCP). There is even no means on the rfc1006 level to check if a response is to corresponding to the request just send, as they assume, I think, that packet reordering etc. is handled by the lower level.
4) This lead me to the conclusion, that the only right way to recover from TCP errors is to close the connection and reopen a new one. So I check the return code of the S7* functions and look for any bit set in the lower word (as this contains an TCP error). For me I decided to even treat errors on the iso level alike and reopen the connection. So my check is effectivly (RETURNCODE & 0x000FFFFF) != 0 then reopen.
I propose the following changes to the lib:
a) On any TCP error (or all but those a can't do any harm), put the whole connection in an unrecoverable error state to effectively prevent calling any data exchange and probably mixing up responses. Maybe you should think about the possible iso Errors, if they should be added, too.
b) To be a correct handler of S7 packets, the sequence number on S7 side should probably be checked, even if they most probably won't be wrong any more. But being a good citizen isn't evil. This would require an s7ExchangeBuffers on the micro client level, IMHO, to handle this on the right level.
David, how do you think about this?
-- Joja
PS: I am wondering that there are not far more complaints about mixing up response data.... seems that everybody has very reliable network connections.