From: Ian V. <Ian...@he...> - 2014-01-16 22:00:51
Some time ago I noticed a problem with messages we were handling that contained fields with single escape characters in them. HAPI consumes these characters during parsing. I found the code responsible and suggested a change that leaves them alone. I believe the standard is unclear in its definition of what to do when a single escape character is present but does not form part of a valid escape sequence.

Below is a copy of the change to the Escape object that I suggested at the time (the code may have changed slightly in later versions of HAPI, as the suggestion was made some time ago). The class comment also says that the hexadecimal escape is not supported, but the code does include handling of \X000d\ and could easily be extended to cover the most common hexadecimal escapes we have seen, \X0D\ and \X0A\.

Hope this helps.

Code follows.

Ian

/**
The contents of this file are subject to the Mozilla Public License Version 1.1
(the "License"); you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.mozilla.org/MPL/
Software distributed under the License is distributed on an "AS IS" basis,
WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the
specific language governing rights and limitations under the License.

The Original Code is "Escape.java". Description:
"Handles "escaping" and "unescaping" of text according to the HL7 escape sequence rules
defined in section 2.10 of the standard (version 2.4)"

The Initial Developer of the Original Code is University Health Network. Copyright (C)
2001. All Rights Reserved.

Contributor(s): Mark Lee (Skeva Technologies); Elmar Hinz

Alternatively, the contents of this file may be used under the terms of the
GNU General Public License (the "GPL"), in which case the provisions of the GPL are
applicable instead of those above.
If you wish to allow use of your version of this file only under the terms of the
GPL and not to allow others to use your version of this file under the MPL, indicate
your decision by deleting the provisions above and replace them with the notice and
other provisions required by the GPL License. If you do not delete the provisions
above, a recipient may use your version of this file under either the MPL or the GPL.
*/

package ca.uhn.hl7v2.parser;

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Handles "escaping" and "unescaping" of text according to the HL7 escape
 * sequence rules defined in section 2.10 of the standard (version 2.4).
 * Currently, escape sequences for multiple character sets are unsupported. The
 * highlighting, hexadecimal, and locally defined escape sequences are also
 * unsupported.
 *
 * @author Bryan Tripp
 * @author Mark Lee (Skeva Technologies)
 * @author Elmar Hinz
 * @author Christian Ohr
 */
public class EscapeV2 {

    /**
     * limits the size of variousEncChars to 1000, can be overridden by system property.
     */
    private static Map<EncodingCharacters, EncLookup> variousEncChars = Collections.synchronizedMap(
            new LinkedHashMap<EncodingCharacters, EncLookup>(5, 0.75f, true) {

        private static final long serialVersionUID = 1L;

        // Integer.parseInt replaces the deprecated new Integer(String); same result
        final int maxSize = Integer.parseInt(System.getProperty(Escape.class.getName() + ".maxSize", "1000"));

        @Override
        protected boolean removeEldestEntry(Map.Entry<EncodingCharacters, EncLookup> eldest) {
            return this.size() > maxSize;
        }
    });

    /** Creates a new instance of Escape */
    public EscapeV2() {
    }

    /**
     * @param text string to be escaped
     * @param encChars encoding characters to be used
     * @return the escaped string
     */
    public static String escape(String text, EncodingCharacters encChars) {
        EncLookup esc = getEscapeSequences(encChars);
        int textLength = text.length();
        StringBuilder result = new StringBuilder(textLength);
        for (int i = 0; i < textLength; i++) {
            boolean charReplaced = false;
            char c = text.charAt(i);

            FORENCCHARS:
            for (int j = 0; j < 6; j++) {
                if (text.charAt(i) == esc.characters[j]) {

                    // Formatting escape sequences such as \.br\ should be left alone
                    if (j == 4) {
                        if (i+1 < textLength) {
                            // Check for \.br\
                            char nextChar = text.charAt(i + 1);
                            switch (nextChar) {
                            case '.':
                            case 'C':
                            case 'M':
                            case 'X':
                            case 'Z': {
                                int nextEscapeIndex = text.indexOf(esc.characters[j], i + 1);
                                if (nextEscapeIndex > 0) {
                                    result.append(text.substring(i, nextEscapeIndex + 1));
                                    charReplaced = true;
                                    i = nextEscapeIndex;
                                    break FORENCCHARS;
                                }
                                break;
                            }
                            case 'H':
                            case 'N': {
                                if (i+2 < textLength && text.charAt(i+2) == '\\') {
                                    int nextEscapeIndex = i + 2;
                                    if (nextEscapeIndex > 0) {
                                        result.append(text.substring(i, nextEscapeIndex + 1));
                                        charReplaced = true;
                                        i = nextEscapeIndex;
                                        break FORENCCHARS;
                                    }
                                }
                                break;
                            }
                            }
                        }
                    }

                    result.append(esc.encodings[j]);
                    charReplaced = true;
                    break;
                }
            }
            if (!charReplaced) {
                result.append(c);
            }
        }
        return result.toString();
    }

    /**
     * @param text string to be unescaped
     * @param encChars encoding characters to be used
     * @return the unescaped string
     */
    public static
            String unescape(String text, EncodingCharacters encChars) {

        // If the escape char isn't found, we don't need to look for escape sequences
        char escapeChar = encChars.getEscapeCharacter();
        boolean foundEscapeChar = false;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == escapeChar) {
                foundEscapeChar = true;
                break;
            }
        }
        if (!foundEscapeChar) {
            return text;
        }

        int textLength = text.length();
        StringBuilder result = new StringBuilder(textLength + 20);
        EncLookup esc = getEscapeSequences(encChars);
        char escape = esc.characters[4];
        int encodingsCount = esc.characters.length;
        int i = 0;
        while (i < textLength) {
            char c = text.charAt(i);
            if (c != escape) {
                result.append(c);
                i++;
            } else {
                boolean foundEncoding = false;

                // Test against the standard encodings
                for (int j = 0; j < encodingsCount; j++) {
                    String encoding = esc.encodings[j];
                    int encodingLength = encoding.length();
                    if ((i + encodingLength <= textLength)
                            && text.substring(i, i + encodingLength).equals(encoding)) {
                        result.append(esc.characters[j]);
                        i += encodingLength;
                        foundEncoding = true;
                        break;
                    }
                }

                if (!foundEncoding) {

                    // If we haven't found this, there is one more option. Escape sequences of /.XXXXX/ are
                    // formatting codes.
                    // They should be left intact
                    if (i + 1 < textLength) {
                        char nextChar = text.charAt(i + 1);
                        switch (nextChar) {
                        case '.':
                        case 'C':
                        case 'M':
                        case 'X':
                        case 'Z': {
                            int closingEscape = text.indexOf(escape, i + 1);
                            if (closingEscape > 0) {
                                String substring = text.substring(i, closingEscape + 1);
                                result.append(substring);
                                i += substring.length();
                            } else {
                                i++;
                            }
                            break;
                        }
                        case 'H':
                        case 'N': {
                            int closingEscape = text.indexOf(escape, i + 1);
                            if (closingEscape == i + 2) {
                                String substring = text.substring(i, closingEscape + 1);
                                result.append(substring);
                                i += substring.length();
                            } else {
                                i++;
                            }
                            break;
                        }
                        default: {
                            // Preserve unescaped escape delimiter
                            result.append(c);
                            i++;
                        }
                        }
                    } else {
                        // Preserve unescaped escape delimiter
                        result.append(c);
                        i++;
                    }
                }
            }
        }
        return result.toString();
    }

    /**
     * Returns a HashTable with escape sequences as keys, and corresponding
     * Strings as values.
     */
    private static EncLookup getEscapeSequences(EncodingCharacters encChars) {
        EncLookup escapeSequences = variousEncChars.get(encChars);
        if (escapeSequences == null) {
            // this means we haven't got the sequences for these encoding
            // characters yet - let's make them
            escapeSequences = new EncLookup(encChars);
            variousEncChars.put(encChars, escapeSequences);
        }
        return escapeSequences;
    }

    /**
     * A performance-optimized replacement for using when
     * mapping from HL7 special characters to their respective
     * encodings
     *
     * @author Christian Ohr
     */
    private static class EncLookup {

        char[] characters = new char[6];
        String[] encodings = new String[6];

        EncLookup(EncodingCharacters ec) {
            characters[0] = ec.getFieldSeparator();
            characters[1] = ec.getComponentSeparator();
            characters[2] = ec.getSubcomponentSeparator();
            characters[3] = ec.getRepetitionSeparator();
            characters[4] = ec.getEscapeCharacter();
            characters[5] = '\r';
            char[] codes = {'F', 'S', 'T', 'R', 'E'};
            for (int i = 0; i < codes.length; i++) {
                StringBuilder seq = new StringBuilder();
                seq.append(ec.getEscapeCharacter());
                seq.append(codes[i]);
                seq.append(ec.getEscapeCharacter());
                encodings[i] = seq.toString();
            }
            // encodings[5] = "\\X000d\\";
            encodings[5] = ec.getEscapeCharacter() + "X000d" + ec.getEscapeCharacter();
        }
    }
}

Test case code

/*
 * To change this template, choose Tools | Templates
 * and open the template in the editor.
 */
package ca.uhn.hl7v2.parser;

import org.junit.After;
import org.junit.AfterClass;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;
import static org.junit.Assert.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 *
 * @author vowlesi
 */
public class SingleBackslashV2Test {

    // Logger is named after this test class
    private static final Logger log = LoggerFactory.getLogger(SingleBackslashV2Test.class);
    private EncodingCharacters encChars = EncodingCharacters.defaultInstance();

    public SingleBackslashV2Test() {
    }

    @BeforeClass
    public static void setUpClass() {
    }

    @AfterClass
    public static void tearDownClass() {
    }

    @Before
    public void setUp() {
    }

    @After
    public void tearDown() {
    }

    /**
     * Test of unescape method, of class Escape.
     */
    @Test
    public void testUnescapeSingleBackslash() {
        log.debug("unescape with single backslash");

        // A lone backslash is preserved; \T\ is still unescaped to '&'
        String text = "1 \\ 24 Smith \\T\\ Wesson Road";
        String expResult = "1 \\ 24 Smith & Wesson Road";
        String result = EscapeV2.unescape(text, encChars);
        log.debug(result);
        log.debug(expResult);
        assertEquals(expResult, result);

        text = "\"\\E\\''\\F\\\\H\\A\\T\\E\\R\\\\N\\<<\\S\\>>\"\\E\\''\\F\\Special test '\\XFFFFFFFFFFFFFFFFFFFF\\'";
        expResult = "\"\\\\H\\A&E~\\N\\<<^>>\"\\''|Special test '\\XFFFFFFFFFFFFFFFFFFFF\\'";
        result = EscapeV2.unescape(text, encChars);
        log.debug(result);
        log.debug(expResult);
        assertEquals(expResult, result);

        text = "\"\\E\\''\\F\\\\H\\A\\T\\E\\R\\\\N\\<<\\S\\>>\"\\E\\''\\F\\Special test '\\X000d\\'";
        expResult = "\"\\\\H\\A&E~\\N\\<<^>>\"\\''|Special test '\r\'";
        result = EscapeV2.unescape(text, encChars);
        log.debug(result);
        log.debug(expResult);
        assertEquals(expResult, result);

        // A run of bare backslashes passes through unchanged
        text = "\\\\\\\\\\\\\\\\\\\\";
        expResult = "\\\\\\\\\\\\\\\\\\\\";
        result = EscapeV2.unescape(text, encChars);
        log.debug(result);
        log.debug(expResult);
        assertEquals(expResult, result);

        // Round trip: unescape, then escape again
        text = "Ken\\n\\F\\edy";
        expResult = "Ken\\E\\n\\F\\edy";
        result = EscapeV2.unescape(text, encChars);
        result = EscapeV2.escape(result, encChars);
        log.debug(result);
        log.debug(expResult);
        assertEquals(expResult, result);
    }
}

>>> g3949 <g3...@ya...> 16/01/14 20:03 >>>

Sorry... I'm wrong...

The problem seems to be in the parser. When parsing the HL7 text file, the "\" gets lost...

Code:

Parser p = context.getPipeParser();
Message msg = iter.next();
try {
    log.info(p.encode(msg));
} catch (HL7Exception e2) {
    // TODO Auto-generated catch block
    e2.printStackTrace();
}

ZPD|1|PDF|14627^20675^begin 644 pdf1.pdfx0Dx0A\M)5!$1BTQ+C,-"B7BX\E_3#0H-"C$

Still needing help...
FP

On Thursday, January 16, 2014 10:02 AM, g3949 <g3...@ya...> wrote:

Hi,

in fact... that's the situation. In my system, I get UUEncoded and plain PDF documents within the zpd-3.3 segment.
Now I still have the problem of replacing the \x0D\\x0A\, because when I fetch the zpd-3.3 segment using the Terser, the "\" gets lost...

Reading directly from the file gives a string WITH "\":

2014-01-16 09:51:17 [INFO ] (Hl7FromFile):39 - begin 644 pdf1.pdf\x0D\\x0A\M)5!$1BTQ+C,-"B7BX\E\_

Getting zpd-3.3 from the Terser and saving it in a String variable called zpd gives a string WITHOUT "\":

2014-01-16 09:56:46 [INFO ] (Hl7FromFile):84 - useing terser: begin 644 pdf1.pdfx0Dx0A\M)5!$1BTQ+C,-"B7BX

Code:

String zpd = null;
Terser t = new Terser(msg);
try {
    zpd = t.get("/.ZPD-3-3");
    log.info("useing terser: "+zpd);
} catch (HL7Exception e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

Why does the Terser cut out the backslashes, and which workarounds are possible?

Thanks a lot!
FP

On Monday, January 13, 2014 4:33 PM, James Agnew <ja...@ja...> wrote:

Hi GGK,

I've never seen anyone use UUEncoding inside an HL7 message (Base64 is the way I've generally seen people solve this problem), but it should be possible.

Your problem is definitely that the first line of a UUEncoded string needs to be in the form

begin <mode> <filename><newline>

You have all of that in your string except the newline. That may be what the string "x0Dx" is representing. You would need to convert that to a newline, but also be careful, since that string could also appear in the UUEncoded text.

James

On Mon, Jan 13, 2014 at 3:14 AM, g3949 <g3...@ya...> wrote:

Now I have the problem of decoding a ZPD-3.3 segment which is UUEncoded.

Example:

ZPD|1|PDF|14627^20675^begin 644 pdf1.pdfx0Dx0A\M)5!$1BTQ+C,-"B7BX\E_3#0H-"C.... end

When decoding the segment, I always get the error:

sun.misc.CEFormatException: UUDecoder: No begin line.

Does anybody have some ideas?

GGK
_______________________________________________
Hl7api-devel mailing list
Hl7...@li...
https://lists.sourceforge.net/lists/listinfo/hl7api-devel
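
Both Ian's note about \X0D\ and \X0A\ and James's advice to turn the escaped newline back into a real one come down to the same step: decoding HL7 hexadecimal escapes while leaving every other backslash alone. The following is only a rough sketch of that step in plain Java. It does not use or change any HAPI API, and the class name HexEscapeSketch and the method decodeHexEscapes are made up for illustration. It assumes each pair of hex digits stands for one character, so \X000d\ would come out as a NUL followed by a CR, which differs from the \X000d\ to '\r' mapping in the Escape code posted above. It also accepts a lowercase x, because the sample data in this thread uses \x0D\.

```java
final class HexEscapeSketch {

    private HexEscapeSketch() {
    }

    /**
     * Replaces sequences of the form <escape>X<hex pairs><escape> with the
     * characters the hex pairs stand for. Anything else, including a lone
     * escape character, is copied through unchanged.
     */
    static String decodeHexEscapes(String text, char escape) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            // Only look for a closing escape when we are standing on an opening one
            int close = (c == escape) ? text.indexOf(escape, i + 1) : -1;
            if (close > i + 2
                    && (text.charAt(i + 1) == 'X' || text.charAt(i + 1) == 'x')
                    && isHexPairs(text, i + 2, close)) {
                // Assumption: two hex digits encode one character
                for (int p = i + 2; p < close; p += 2) {
                    out.append((char) Integer.parseInt(text.substring(p, p + 2), 16));
                }
                i = close + 1;
            } else {
                // Not a hex escape: keep the character, even if it is a lone escape
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** True if s[from, to) is a non-empty, even-length run of hex digits. */
    private static boolean isHexPairs(String s, int from, int to) {
        int length = to - from;
        if (length == 0 || length % 2 != 0) {
            return false;
        }
        for (int p = from; p < to; p++) {
            if (Character.digit(s.charAt(p), 16) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

Calling HexEscapeSketch.decodeHexEscapes(value, '\\') on a value that still contains its backslashes would turn the begin line's \x0D\\x0A\ into a real CR LF before the text is handed to a UU decoder. As James points out, the same character sequence could in principle also occur inside the UUEncoded body, so a scan like this may need to be limited to the begin line.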