I am considering changing the network communication protocol slightly in the Gadgetron. I would like to hear if anybody has comments/concerns/suggestions.
First a bit of background. At the moment each message (which may consist of multiple components such as an AcquisitionHeader and some arrays with data) is preceeded by a message id (an unsigned integeter). This message ID is used to select an appropriate reader from a list of registered readers and this reader takes control of the network socket for a while and reads the message (all components) from the socket stream. After the reader has processed the message, the next message ID is read and so on.
While this protocol is simple, it is vulnerable to messages that are not properly formed. As an example, if the AcquisitionHeader indicates that there should be 256 samples of data, but there (due to some error/bug) is only 200, then the reader would still read 256 samples worth of data, which would shift the read frame on the the socket. While these errors are rare, they do cause some rather spectacular crashes that are impossibly to predict and also typically impossible to recover from.
There are other nasty side effects of this protocol. As an example, the base class for a reader looks like this:
Basically the underlying library choices for socket communication are exposed and if we were to switch to something else (e.g. boost::asio, zeromq, etc.), all this reader code needs to change.
I would like to propose the following change. All messages (which may have multiple parts) should be preceeded by an unsigned integer (uint64_t) which indicates the total length of the message including the message id, but not including the length integer itself. More specifically the message would look like:
And length would be equal to the number of bytes + the 16 bit message id.
The GadgetStreamController should then be modified to read the entire message into a buffer and hand it off to a Reader for deserialization. The GadgetMessagereader class would change interface to:
That still leaves some nasty bits of the ACE interface exposed in form of the ACE_Message_Block return type, but that could eventually be changed to GadgetContainerMessageBase* to allow flexible change of that underlying interface at some point, but baby steps.
The main advantage of this change would be the robustness that it would allow us to build into the GadgetMessageReaders. For example, if you get a message with an id corresponding to say an ISMRMRD Acquisition, you would be able to check, is the message at least as long as an AcquisitionHeader and if there are samples and/or trajectory arrays, does the total length of the message make sense or is this a malformed message that should either cause a graceful shutdown with appropriate messages in the log or maybe it should simply be ignored.
One possible variation is the length only correspond to the number of bytes excluding both the length indicator itself and the message id. Another potential variation is the option to include a bitfield with some basic flags although I see no real use for this at the moment.
I would like to hear thoughts/comments and we can discuss this on Friday. The change is rather small, but all clients would have to change, so the impact is huge.
Michael
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Thanks for this clear notice and explanation.
Very good to improve robustness, I certainly support it - for what that's worth.
I apologise if any of my comments expose how little I understand the system you guys have created.
Once we've installed the new Gadgetron software, how big a change is this for people who are just programming a few Gadgets? Is it simply a recompile, so we will hardly notice - no, I think I am missing something: it will rely on each invividual Gadget to fill in the "length" field correctly on each message it sends? and as you wrote, we do the checks on the received message length in gadgets - so, not a big change in each gadget. I think it might even help more disorganised programmers like me, as we can use these checks to make crystal clear to ourselves just what each message is intended to contain.
Does the ACE interface include any kind of CRC or redundancy error-check against a limited number of bad bits? Perhaps I am going too far as that kind of failure shouldn't occur internal to a PC without much more drastic consequences. clearly we don't want to do that in gadgets
Re: other header info - whenever Siemens has had spare bit-fields or FreeParams etc, they've eventually found very good uses for them! I would support quite a lot of spare space there. Is there any use for a high-precision timestamp also?
On an associated topic, can ACE support "high-priority" messages which might allow Gadgetron to get more involved with real-time feedback "navigator" work. Can this change incorporate anything like that? I guess I mean, that such a message is allowed to jump the queue or is directed to its Gadgets for high-priority processing ( typically, a separate chain of Gadgets) immediately while other chains of Gadgets are paused.
Re: the length to exclude or include length indicator and message ID (and any other header info) - would be neater/simpler programming in gadgets if the length is exclusive of those.
If it's going to happen, it has to be just one change. or chaos!
Would it be possible that the new length-and-extra-field messages have a different category or some high-value ID bit set...i.e. so the system could support the old message format too - or is that simply too likely to cause trouble.
Peter
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Let me see if I can answer some of your questions.
First of all, this does not affect the code in the Gadgets. The proposed changes only affect the "on the wire" communication protocol between client and server and it would only be the Reader and Writer code (+ some low level stream controller code) that would change. Since people don't normally change these, most people would not notice. In fact, if you have a Gadget library, it would probably work without any changes, i.e without recompilation.
Also, you would never actually have to set this length field or deal with it in any way. It would be handled by the readers and writers.
On checksums, this would only be relevant to detect data tampering during the on the wire transfer. This could be implemented and it would not be dependent of ACE, but would take CPU time on both client and server side and all other things equal, it would reduce performance. We have discussed it a few times and decided that at the moment, it would have limited or no value and we will not pursue it.
We already have a not of user space in the header. Adding any more would be saying that we really don't know what we are doing, so we will just leave it up to the users to put their data in random fields. I think we should add fields when they are needed and we will address adding more user space in the headers if and when somebody actually fills up the space that is there.
In high priority processing, this could be achieved right now by a) creating a Gadget that would have enough capacity on its queue that it would be able to accommodate all the data in the sequence, b) create a Gadget upstream of this that would be able to pick out the navigator data, c) have the navigation data selection gadget send the navigation data directly to a navigation data processing gadget (or it could process it itself). Any Gadget can already send data to any other gadget using the controller_->find_gadget("GadgetName")->putq(mb) syntax (or something like that). So nothing new needs to be created, it is already there.
And don't worry, we will try to avoid chaos, I think we have it under control;)
Thanks for your comments,
Michael
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi All,
I am considering changing the network communication protocol slightly in the Gadgetron. I would like to hear if anybody has comments/concerns/suggestions.
First a bit of background. At the moment each message (which may consist of multiple components such as an AcquisitionHeader and some arrays with data) is preceeded by a message id (an unsigned integeter). This message ID is used to select an appropriate reader from a list of registered readers and this reader takes control of the network socket for a while and reads the message (all components) from the socket stream. After the reader has processed the message, the next message ID is read and so on.
While this protocol is simple, it is vulnerable to messages that are not properly formed. As an example, if the AcquisitionHeader indicates that there should be 256 samples of data, but there (due to some error/bug) is only 200, then the reader would still read 256 samples worth of data, which would shift the read frame on the the socket. While these errors are rare, they do cause some rather spectacular crashes that are impossibly to predict and also typically impossible to recover from.
There are other nasty side effects of this protocol. As an example, the base class for a reader looks like this:
Basically the underlying library choices for socket communication are exposed and if we were to switch to something else (e.g. boost::asio, zeromq, etc.), all this reader code needs to change.
I would like to propose the following change. All messages (which may have multiple parts) should be preceeded by an unsigned integer (uint64_t) which indicates the total length of the message including the message id, but not including the length integer itself. More specifically the message would look like:
This correspond to:
And length would be equal to the number of bytes + the 16 bit message id.
The GadgetStreamController should then be modified to read the entire message into a buffer and hand it off to a Reader for deserialization. The GadgetMessagereader class would change interface to:
That still leaves some nasty bits of the ACE interface exposed in form of the ACE_Message_Block return type, but that could eventually be changed to GadgetContainerMessageBase* to allow flexible change of that underlying interface at some point, but baby steps.
The main advantage of this change would be the robustness that it would allow us to build into the GadgetMessageReaders. For example, if you get a message with an id corresponding to say an ISMRMRD Acquisition, you would be able to check, is the message at least as long as an AcquisitionHeader and if there are samples and/or trajectory arrays, does the total length of the message make sense or is this a malformed message that should either cause a graceful shutdown with appropriate messages in the log or maybe it should simply be ignored.
One possible variation is the length only correspond to the number of bytes excluding both the length indicator itself and the message id. Another potential variation is the option to include a bitfield with some basic flags although I see no real use for this at the moment.
I would like to hear thoughts/comments and we can discuss this on Friday. The change is rather small, but all clients would have to change, so the impact is huge.
Michael
Thanks for this clear notice and explanation.
Very good to improve robustness, I certainly support it - for what that's worth.
I apologise if any of my comments expose how little I understand the system you guys have created.
Once we've installed the new Gadgetron software, how big a change is this for people who are just programming a few Gadgets? Is it simply a recompile, so we will hardly notice - no, I think I am missing something: it will rely on each invividual Gadget to fill in the "length" field correctly on each message it sends? and as you wrote, we do the checks on the received message length in gadgets - so, not a big change in each gadget. I think it might even help more disorganised programmers like me, as we can use these checks to make crystal clear to ourselves just what each message is intended to contain.
Does the ACE interface include any kind of CRC or redundancy error-check against a limited number of bad bits? Perhaps I am going too far as that kind of failure shouldn't occur internal to a PC without much more drastic consequences. clearly we don't want to do that in gadgets
Re: other header info - whenever Siemens has had spare bit-fields or FreeParams etc, they've eventually found very good uses for them! I would support quite a lot of spare space there. Is there any use for a high-precision timestamp also?
On an associated topic, can ACE support "high-priority" messages which might allow Gadgetron to get more involved with real-time feedback "navigator" work. Can this change incorporate anything like that? I guess I mean, that such a message is allowed to jump the queue or is directed to its Gadgets for high-priority processing ( typically, a separate chain of Gadgets) immediately while other chains of Gadgets are paused.
Re: the length to exclude or include length indicator and message ID (and any other header info) - would be neater/simpler programming in gadgets if the length is exclusive of those.
If it's going to happen, it has to be just one change. or chaos!
Would it be possible that the new length-and-extra-field messages have a different category or some high-value ID bit set...i.e. so the system could support the old message format too - or is that simply too likely to cause trouble.
Peter
Hi Peter,
Let me see if I can answer some of your questions.
First of all, this does not affect the code in the Gadgets. The proposed changes only affect the "on the wire" communication protocol between client and server and it would only be the Reader and Writer code (+ some low level stream controller code) that would change. Since people don't normally change these, most people would not notice. In fact, if you have a Gadget library, it would probably work without any changes, i.e without recompilation.
Also, you would never actually have to set this length field or deal with it in any way. It would be handled by the readers and writers.
On checksums, this would only be relevant to detect data tampering during the on the wire transfer. This could be implemented and it would not be dependent of ACE, but would take CPU time on both client and server side and all other things equal, it would reduce performance. We have discussed it a few times and decided that at the moment, it would have limited or no value and we will not pursue it.
We already have a not of user space in the header. Adding any more would be saying that we really don't know what we are doing, so we will just leave it up to the users to put their data in random fields. I think we should add fields when they are needed and we will address adding more user space in the headers if and when somebody actually fills up the space that is there.
In high priority processing, this could be achieved right now by a) creating a Gadget that would have enough capacity on its queue that it would be able to accommodate all the data in the sequence, b) create a Gadget upstream of this that would be able to pick out the navigator data, c) have the navigation data selection gadget send the navigation data directly to a navigation data processing gadget (or it could process it itself). Any Gadget can already send data to any other gadget using the controller_->find_gadget("GadgetName")->putq(mb) syntax (or something like that). So nothing new needs to be created, it is already there.
And don't worry, we will try to avoid chaos, I think we have it under control;)
Thanks for your comments,
Michael