IMHO, ExtendedMinLowerLayerProtocol should be the default in all HAPI contexts except 'HL7 over HTTP', where the encoding is read from the HTTP header fields.
The ExtendedMinLLPReader probably needs to reuse more code from MinLLPReader; there's too much code duplication...
Is all this BOM byte / Little-Endian magic in ExtendedMinLLPReader necessary?
AFAIK Java streams cannot handle little-endian data anyway without further effort (there are implementations like http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/io/LittleEndianDataInputStream.html), and there is no way to dynamically detect whether an incoming stream is little-endian or big-endian except when it comes with a BOM header. I haven't seen BOM-prefixed HL7 MLLP streams so far, and if something like this occurred I would consider it an MLLP protocol error (wrong start byte).
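To illustrate the "wrong start byte" point, here is a minimal sketch (not HAPI's actual code; class and method names are made up) of what a strict reader could do - since MLLP mandates that a frame starts with 0x0B, any BOM bytes in first position are simply a framing violation, and no endianness sniffing is needed:

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Illustrative sketch only. An MLLP frame must begin with the start
 * block byte 0x0B, so a BOM (0xEF 0xBB 0xBF for UTF-8, or
 * 0xFF 0xFE / 0xFE 0xFF for UTF-16) in first position can be rejected
 * outright as a protocol error.
 */
class MllpFrameCheck {

    private static final int START_BLOCK = 0x0B;

    static void expectStartBlock(InputStream in) throws IOException {
        int first = in.read();
        if (first != START_BLOCK) {
            // A BOM (or anything else) instead of 0x0B is a framing violation
            throw new IOException("MLLP protocol error: expected start block 0x0B, got 0x"
                    + Integer.toHexString(first));
        }
    }
}
```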
Any opinions?
BTW I refactored the MLLP readers and writers on my machine some time ago, so that they only differ in how they obtain their encoding (dynamically vs. statically), but this excludes the BOM/endian issue mentioned above.
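Roughly, the refactoring looks like this (a hypothetical sketch with made-up names, not the actual HAPI class hierarchy) - the shared decode logic sits in a base class, and subclasses only supply the charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical base class: all shared reading logic lives here. */
abstract class AbstractMllpReader {

    /** Subclasses decide the charset - statically or from the payload. */
    protected abstract Charset resolveCharset(byte[] messageBytes);

    String decode(byte[] messageBytes) {
        return new String(messageBytes, resolveCharset(messageBytes));
    }
}

/** Static variant: charset is fixed at construction time. */
class StaticCharsetReader extends AbstractMllpReader {
    private final Charset charset;

    StaticCharsetReader(Charset charset) {
        this.charset = charset;
    }

    @Override
    protected Charset resolveCharset(byte[] messageBytes) {
        return charset;
    }
}

/** Dynamic variant: charset would be taken from MSH-18 when present. */
class Msh18CharsetReader extends AbstractMllpReader {
    @Override
    protected Charset resolveCharset(byte[] messageBytes) {
        // Placeholder: a real implementation would pre-parse MSH-18 here
        return StandardCharsets.ISO_8859_1;
    }
}
```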
I'll take credit for tacking the endian checking onto the new LLP. I'm not sure what led me to believe that was needed, other than perhaps my tendency to over-engineer things for no good reason. :)
I'm fine with that being refactored out. Christian, if you have a simpler refactor of that code I'd say roll with it! I recently committed a fix for ExtendedMinLlpWriter to prevent it from flushing the buffer between the start block and the message payload, so hopefully that doesn't conflict with your work.
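For context, the gist of that flushing fix is something like this (an illustrative sketch, not the committed code): everything is written to the buffer first, and flush() is called exactly once per frame.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Illustrative sketch: the start block, payload and end block are all
 * buffered before a single flush(), so the frame leaves the socket in
 * one piece rather than as fragments.
 */
class MllpFrameWriter {

    private static final byte START_BLOCK = 0x0B;
    private static final byte END_BLOCK = 0x1C;
    private static final byte CARRIAGE_RETURN = 0x0D;

    static void writeFrame(OutputStream rawOut, byte[] payload) throws IOException {
        OutputStream out = new BufferedOutputStream(rawOut);
        out.write(START_BLOCK);
        out.write(payload);
        out.write(END_BLOCK);
        out.write(CARRIAGE_RETURN);
        out.flush(); // single flush after the complete frame
    }
}
```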
In terms of the actual request in this ticket, I don't know for sure that I agree with it. The problem is that MSH-18 is very often misused and misunderstood. Possibly the situation is better in Europe, where people are used to different character encodings, but at least here in NA people get it wrong all the time. The problem then is that this could potentially be a breaking change in lots of cases when systems suddenly start respecting MSH-18 where they previously did not.
Thoughts?
One alternative I might propose would be to beef up the documentation and examples to make it more obvious how to "turn on respect for MSH-18". I suppose another alternative might be to merge the two implementations and make "MSH-18 respect" configurable?
Yeah, I haven't seen MSH-18 used a lot, even in Europe...
In fact, the only difference between the two implementations is that, once the message bytes are read, ExtendedMinLLPReader tries to override the statically configured charset with the content from MSH-18, but only if it's not empty and matches one of the charsets defined by HL7.
In this respect, ExtendedMinLLPReader is already backwards compatible with MinLLPReader, but there's some overhead to extract data from MSH-18 (PreParser stuff), which could in turn be skipped by disabling a "respectMSH18" flag (correspondingly for ExtendedMinLLPWriter).
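In rough Java, the override boils down to something like this (an illustrative sketch using HAPI's PreParser; the HL7 table 0211 charset mapping is truncated to two entries for brevity):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import ca.uhn.hl7v2.HL7Exception;
import ca.uhn.hl7v2.preparser.PreParser;

/**
 * Illustrative sketch of the MSH-18 override (the real
 * ExtendedMinLLPReader logic differs in details). The message is first
 * decoded with the statically configured charset, then MSH-18 is
 * pre-parsed; if it holds a known HL7 charset name, the bytes are
 * re-decoded with that charset instead.
 */
class Msh18CharsetOverride {

    // Partial mapping of HL7 table 0211 names to Java charsets
    private static final Map<String, Charset> HL7_CHARSETS = new HashMap<>();
    static {
        HL7_CHARSETS.put("8859/1", StandardCharsets.ISO_8859_1);
        HL7_CHARSETS.put("UNICODE UTF-8", StandardCharsets.UTF_8);
    }

    static String decode(byte[] messageBytes, Charset defaultCharset) throws HL7Exception {
        String message = new String(messageBytes, defaultCharset);
        String msh18 = PreParser.getFields(message, "MSH-18")[0];
        if (msh18 != null && HL7_CHARSETS.containsKey(msh18)) {
            // Override the static charset with the one declared in the message
            return new String(messageBytes, HL7_CHARSETS.get(msh18));
        }
        return message; // fall back: behaves exactly like MinLLPReader
    }
}
```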
I guess I will actually merge the two into one and deprecate the other.
Rest assured - MSH-18 is used a lot - maybe not in NA, but in Denmark it is. If I had a penny for each time I've integrated with a system where the only thing that did not work was ÆØÅ (the Danish letters), well - let's just say I would be rich. I agree with the approach: deprecating the MinLLPReader as it is today would be the right thing to do, and by default use the MinLLPReader that does respect MSH-18.
And just for the record - people who do not use MSH-18 correctly do not write good enough unit tests.
Deprecated ExtendedLowerLayerProtocol by adding a respectMSH18 flag to MinLowerLayerProtocol. The contained MLLP readers/writers/encoders/decoders have been refactored to share common code.
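Usage then looks roughly like this (hypothetical snippet - the exact way the respectMSH18 flag is exposed, e.g. constructor argument vs. setter, is assumed here):

```java
import ca.uhn.hl7v2.llp.LowerLayerProtocol;
import ca.uhn.hl7v2.llp.MinLowerLayerProtocol;

public class LlpConfigExample {
    public static void main(String[] args) {
        // Assumption: the flag is passed at construction time
        LowerLayerProtocol respecting = new MinLowerLayerProtocol(true);  // honor MSH-18
        LowerLayerProtocol legacy = new MinLowerLayerProtocol(false);     // old MinLLP behavior
    }
}
```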
Great news ;)