From: Ian V. <Ian...@he...> - 2013-09-04 00:14:00
I have sent mails to the general list about this issue before, and the advice has helped me progress. Then along comes another system with slightly different behaviour. In this particular case the system correctly escapes the HL7 delimiters EXCEPT the escape delimiter itself. This allows it to send field content like this (from an address):

    1 \ 24 Smith \T\ Wesson Road

I was hopeful that, since the single escape on its own doesn't form part of an escape sequence, it might be preserved through the parse. This is not the case. The lone backslash is consumed in the process and disappears. I don't know how valid an argument it is to say it should be preserved, but if it isn't, I can't subsequently escape it properly to send to a downstream system.

Given that I had been dealing with HL7 for some time before I found HAPI, I had previously done some work on an encode/unencode routine. My own code couldn't cope with this one either. I decided it was time to be brave and dive into the HAPI code. Somewhere there had to be low-level encode/unencode routines. Up until I looked in the source, I had been creating a new ST object and using its parse and encode methods. Once I looked into the source I found the Escape class.

This updated version of Escape does the following:

- Preserves escape characters that do not form part of an escape sequence
- Permits the exceptional escape sequence case of \X000d\ to work when the escape character has been changed to something other than \
- Adds the extra hex escape codes \X0D\ and \X0A\, because we see them here occasionally

Test case code is also included at the bottom, including my now infamous "HATER" example :-). The test cases with lots of > < are there because we often do transforms between HL7 and XML, so we also check these strings in additional test cases against the XML output produced.

What are my chances of this being adopted? If not, how can I get my version to override the existing one?
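To make the first point concrete, here is a minimal standalone sketch of the behaviour I am after. It is not the HAPI code and not the full class further down: the class name LoneEscapeSketch is made up for illustration, it only handles the five delimiter escapes, and it leaves hex and formatting sequences untouched. It assumes the encoding characters are given in the order field, component, repetition, escape, sub-component.

```java
public class LoneEscapeSketch {

    /**
     * Unescapes the delimiter sequences (F, S, R, E, T wrapped in the escape
     * character) and copies everything else through unchanged, including an
     * escape character that is not part of such a sequence.
     */
    public static String unescape(String text, String encChars) {
        char esc = encChars.charAt(3);
        // Codes line up with encChars: field, component, repetition, escape, sub-component
        String codes = "FSRET";
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            // A delimiter escape is exactly three characters: <esc><code><esc>
            if (c == esc && i + 2 < text.length() && text.charAt(i + 2) == esc) {
                int k = codes.indexOf(text.charAt(i + 1));
                if (k >= 0) {
                    out.append(encChars.charAt(k));
                    i += 3;
                    continue;
                }
            }
            // Anything else, including a lone escape character, is preserved
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: 1 \ 24 Smith & Wesson Road
        System.out.println(unescape("1 \\ 24 Smith \\T\\ Wesson Road", "|^~\\&"));
    }
}
```

The lone backslash survives because it is only ever consumed as part of a complete three-character sequence; the same holds when the escape character is changed, since it is read from the encoding characters rather than hard-coded.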
Thanks
Ian

----------

/**
The contents of this file are subject to the Mozilla Public License Version 1.1
(the "License"); you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.mozilla.org/MPL/
Software distributed under the License is distributed on an "AS IS" basis,
WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the
specific language governing rights and limitations under the License.

The Original Code is "Escape.java". Description:
"Handles "escaping" and "unescaping" of text according to the HL7 escape sequence rules
defined in section 2.10 of the standard (version 2.4)"

The Initial Developer of the Original Code is University Health Network. Copyright (C)
2001. All Rights Reserved.

Contributor(s): Mark Lee (Skeva Technologies); Elmar Hinz

Alternatively, the contents of this file may be used under the terms of the
GNU General Public License (the "GPL"), in which case the provisions of the GPL are
applicable instead of those above. If you wish to allow use of your version of this
file only under the terms of the GPL and not to allow others to use your version
of this file under the MPL, indicate your decision by deleting the provisions above
and replace them with the notice and other provisions required by the GPL License.
If you do not delete the provisions above, a recipient may use your version of
this file under either the MPL or the GPL.
*/
package ca.uhn.hl7v2.parser;

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Handles "escaping" and "unescaping" of text according to the HL7 escape
 * sequence rules defined in section 2.10 of the standard (version 2.4).
 * Currently, escape sequences for multiple character sets are unsupported. The
 * highlighting and locally defined escape sequences are also
 * unsupported.
 * The only hexadecimal escapes supported are X000d, X0D, X0A
 *
 * @author Bryan Tripp
 * @author Mark Lee (Skeva Technologies)
 * @author Elmar Hinz
 * @author Christian Ohr
 */
public class Hl7Escape {

    /** Creates a new instance of Escape */
    public Hl7Escape() {
    }

    /**
     * @param text string to be escaped
     * @return the escaped string
     * <p>Defaults the escape characters to the conventional values |^~\&
     */
    public static String escape(String text) {
        return escape(text, "|^~\\&");
    }

    /**
     * @param text string to be escaped
     * @param encChars encoding characters to be used in the order
     * <br>Field, Component, Repetition, Escape, Sub-component
     * @return the escaped string
     */
    public static String escape(String text, String encChars) {
        EncLookup esc = getEscapeSequences(encChars);
        int textLength = text.length();

        StringBuilder result = new StringBuilder(textLength);
        for (int i = 0; i < textLength; i++) {
            boolean charReplaced = false;
            char c = text.charAt(i);

            // Only the first six entries (the five delimiters plus '\r') are escaped
            FORENCCHARS: for (int j = 0; j < 6; j++) {
                if (text.charAt(i) == esc.characters[j]) {

                    // Formatting escape sequences such as \.br\ should be left alone.
                    // The escape character sits at index 3 in the encChars order above.
                    if (j == 3) {
                        if (i + 1 < textLength) {
                            // Check for \.br\
                            char nextChar = text.charAt(i + 1);
                            switch (nextChar) {
                            case '.':
                            case 'C':
                            case 'M':
                            case 'X':
                            case 'Z': {
                                int nextEscapeIndex = text.indexOf(esc.characters[j], i + 1);
                                if (nextEscapeIndex > 0) {
                                    result.append(text.substring(i, nextEscapeIndex + 1));
                                    charReplaced = true;
                                    i = nextEscapeIndex;
                                    break FORENCCHARS;
                                }
                                break;
                            }
                            case 'H':
                            case 'N': {
                                // Compare against the configured escape character, not a literal backslash
                                if (i + 2 < textLength && text.charAt(i + 2) == esc.characters[j]) {
                                    int nextEscapeIndex = i + 2;
                                    if (nextEscapeIndex > 0) {
                                        result.append(text.substring(i, nextEscapeIndex + 1));
                                        charReplaced = true;
                                        i = nextEscapeIndex;
                                        break FORENCCHARS;
                                    }
                                }
                                break;
                            }
                            }
                        }
                    }

                    result.append(esc.encodings[j]);
                    charReplaced = true;
                    break;
                }
            }
            if (!charReplaced) {
                result.append(c);
            }
        }
        return result.toString();
    }

    /**
     * @param text string to be unescaped
     * @return the unescaped string
     * <p>Defaults the escape characters to
     * the conventional values |^~\&
     */
    public static String unescape(String text) {
        return unescape(text, "|^~\\&");
    }

    /**
     * @param text string to be unescaped
     * @param encChars encoding characters to be used in the order
     * <br>Field, Component, Repetition, Escape, Sub-component
     * @return the unescaped string
     */
    public static String unescape(String text, String encChars) {

        // If the escape char isn't found, we don't need to look for escape sequences
        char escapeChar = encChars.charAt(3);
        boolean foundEscapeChar = false;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == escapeChar) {
                foundEscapeChar = true;
                break;
            }
        }
        if (!foundEscapeChar) {
            return text;
        }

        int textLength = text.length();
        StringBuilder result = new StringBuilder(textLength + 20);
        EncLookup esc = getEscapeSequences(encChars);
        char escape = esc.characters[3];
        int encodingsCount = esc.characters.length;
        int i = 0;
        while (i < textLength) {
            char c = text.charAt(i);
            if (c != escape) {
                result.append(c);
                i++;
            } else {
                boolean foundEncoding = false;

                // Test against the standard encodings
                for (int j = 0; j < encodingsCount; j++) {
                    String encoding = esc.encodings[j];
                    int encodingLength = encoding.length();
                    if ((i + encodingLength <= textLength)
                            && text.substring(i, i + encodingLength).equals(encoding)) {
                        result.append(esc.characters[j]);
                        i += encodingLength;
                        foundEncoding = true;
                        break;
                    }
                }

                if (!foundEncoding) {

                    // If we haven't found this, there is one more option. Escape sequences of /.XXXXX/ are
                    // formatting codes.
                    // They should be left intact
                    if (i + 1 < textLength) {
                        char nextChar = text.charAt(i + 1);
                        switch (nextChar) {
                        case '.':
                        case 'C':
                        case 'M':
                        case 'X':
                        case 'Z': {
                            int closingEscape = text.indexOf(escape, i + 1);
                            if (closingEscape > 0) {
                                String substring = text.substring(i, closingEscape + 1);
                                result.append(substring);
                                i += substring.length();
                            } else {
                                i++;
                            }
                            break;
                        }
                        case 'H':
                        case 'N': {
                            int closingEscape = text.indexOf(escape, i + 1);
                            if (closingEscape == i + 2) {
                                String substring = text.substring(i, closingEscape + 1);
                                result.append(substring);
                                i += substring.length();
                            } else {
                                i++;
                            }
                            break;
                        }
                        default: {
                            // Preserve unescaped escape delimiter
                            result.append(c);
                            i++;
                        }
                        }
                    } else {
                        // Preserve unescaped escape delimiter
                        result.append(c);
                        i++;
                    }
                }
            }
        }
        return result.toString();
    }

    /**
     * Returns an EncLookup that maps the special characters to their
     * corresponding escape sequences.
     * @param encChars encoding characters in the order
     * Field, Component, Repetition, Escape, Sub-component
     * @return the lookup built from the given encoding characters
     */
    private static EncLookup getEscapeSequences(String encChars) {
        EncLookup escapeSequences = new EncLookup(encChars);
        return escapeSequences;
    }

    /**
     * A performance-optimized replacement for using when
     * mapping from HL7 special characters to their respective
     * encodings
     *
     * @author Christian Ohr
     */
    private static class EncLookup {

        char[] characters = new char[8];
        String[] encodings = new String[8];

        EncLookup(String ec) {
            characters[0] = ec.charAt(0);
            characters[1] = ec.charAt(1);
            characters[2] = ec.charAt(2);
            characters[3] = ec.charAt(3);
            characters[4] = ec.charAt(4);
            characters[5] = '\r';
            characters[6] = '\r';
            characters[7] = '\n';
            char escapeChar = ec.charAt(3);
            char[] codes = {'F', 'S', 'R', 'E', 'T'};
            for (int i = 0; i < codes.length; i++) {
                StringBuilder seq = new StringBuilder();
                seq.append(escapeChar);
                seq.append(codes[i]);
                seq.append(escapeChar);
                encodings[i] = seq.toString();
            }
            // encodings[5] = "\\X000d\\";
            encodings[5] = escapeChar + "X000d" + escapeChar;
            encodings[6] = escapeChar + "X0D" + escapeChar;
            encodings[7] = escapeChar + "X0A" +
                    escapeChar;
        }
    }
}

-----

Test case:

/*
 * To change this template, choose Tools | Templates
 * and open the template in the editor.
 */
package ca.uhn.hl7v2.parser;

import org.junit.After;
import org.junit.AfterClass;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;
import static org.junit.Assert.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 *
 * @author vowlesi
 */
public class SingleBackslashV3Test {

    private static final Logger log = LoggerFactory.getLogger(SingleBackslashV3Test.class);
    private String encChars = "|^~\\&";

    public SingleBackslashV3Test() {
    }

    @BeforeClass
    public static void setUpClass() {
    }

    @AfterClass
    public static void tearDownClass() {
    }

    @Before
    public void setUp() {
    }

    @After
    public void tearDown() {
    }

    /**
     * Test of unescape method, of class Escape.
     */
    @Test
    public void testUnescapeSingleBackslash() {
        log.debug("unescape with single backslash");

        // Lone backslash must survive; \T\ becomes the sub-component delimiter
        String text = "1 \\ 24 Smith \\T\\ Wesson Road";
        String expResult = "1 \\ 24 Smith & Wesson Road";
        String result = Hl7Escape.unescape(text);
        log.debug("Input : " + text);
        log.debug("Result : " + result);
        log.debug("Expected : " + expResult);
        assertEquals(expResult, result);

        // The "HATER" example, with an unsupported hex sequence left intact
        text = "\"\\E\\''\\F\\\\H\\A\\T\\E\\R\\\\N\\<<\\S\\>>\"\\E\\''\\F\\Special test '\\XFFFFFFFFFFFFFFFFFFFF\\'";
        expResult = "\"\\\\H\\A&E~\\N\\<<^>>\"\\''|Special test '\\XFFFFFFFFFFFFFFFFFFFF\\'";
        result = Hl7Escape.unescape(text);
        log.debug("Input : " + text);
        log.debug("Result : " + result);
        log.debug("Expected : " + expResult);
        assertEquals(expResult, result);

        // The same example, with the supported \X000d\ sequence
        text = "\"\\E\\''\\F\\\\H\\A\\T\\E\\R\\\\N\\<<\\S\\>>\"\\E\\''\\F\\Special test '\\X000d\\'";
        expResult = "\"\\\\H\\A&E~\\N\\<<^>>\"\\''|Special test '\r\'";
        result = Hl7Escape.unescape(text);
        log.debug("Input : " + text);
        log.debug("Result : " + result);
        log.debug("Expected : " + expResult);
        assertEquals(expResult, result);

        // A run of bare escape characters should pass through unchanged
        text = "\\\\\\\\\\\\\\\\\\\\";
        expResult = "\\\\\\\\\\\\\\\\\\\\";
        result = Hl7Escape.unescape(text);
        log.debug("Input : " + text);
        log.debug("Result : " + result);
        log.debug("Expected : " + expResult);
        assertEquals(expResult, result);

        // Round trip: unescape, then escape again
        text = "Ken\\n\\F\\edy";
        expResult = "Ken\\E\\n\\F\\edy";
        result = Hl7Escape.unescape(text);
        result = Hl7Escape.escape(result);
        log.debug("Input : " + text);
        log.debug("Result : " + result);
        log.debug("Expected : " + expResult);
        assertEquals(expResult, result);
    }
}