continuation symbols and delimiters

Help
Gabriele
2009-02-24
2013-05-30
  • Gabriele

    Gabriele - 2009-02-24

    I have edifact flat files to map to xml and validate.
    Records terminate with ' character, fields with + , and subfields with :
    The character ? is used as an escape character. For example:

    NAD+SU+++COMPANY?: XXXX?+YYYY+VIA ROMA, 1+CITTA?' DI GENOVA+GE+16163+IT'

    will be interpreted as:
    ...
    COMPANY: XXXX+YYYY
    ...
    CITTA' DI GENOVA
    ...
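
For illustration only, the escape rule described above can be sketched in Python. This is not ServingXML code; the helper name `split_unescaped` is hypothetical, and the sketch assumes that `?` always protects exactly the one character that follows it.

```python
def split_unescaped(text, delimiter, escape="?"):
    """Split text on delimiter, ignoring delimiters preceded by escape.

    Escape sequences are kept as-is in the returned parts, so the same
    function can be applied again with a different delimiter.
    """
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            # Keep the escape and the escaped character together as data.
            current.append(text[i:i + 2])
            i += 2
        elif ch == delimiter:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


line = "NAD+SU+++COMPANY?: XXXX?+YYYY+VIA ROMA, 1+CITTA?' DI GENOVA+GE+16163+IT'"
segment = split_unescaped(line, "'")[0]   # text before the real terminator
fields = split_unescaped(segment, "+")
print(fields[4])  # COMPANY?: XXXX?+YYYY
print(fields[6])  # CITTA?' DI GENOVA
```

The escape sequences are deliberately left in place at this stage, so that a later split on `:` still sees `?:` as data rather than as a subfield delimiter.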

    I can use
    <sx:recordDelimiter value="'" continuation="?"/>
    for the record delimiter. The ' character after the ? will be missing from the xml output, but tha's not a very big deal (it would be better if both ?' remained).
    The problem is with
    <sx:fieldDelimiter value="+"/>
    <sx:fieldDelimiter value=":"/>
    because I cannot find anything similar to tell servingXML to ignore the ?+ and ?: character sequences.

    I found in the on-line help:
    "If you have a field value that includes a character that is the same as the delimiter, you must define the field to have quoted field values and enclose the value of the field in quotes."
    But this solution is not an option for our workflow.
    Any help or suggestion is appreciated.
    Thanks.

    Gabriele

     
    • Gabriele

      Gabriele - 2009-02-24

      Actually the best solution would be to have the ? symbol removed and the following character maintained in the resulting xml text, like this:

      flat file:
      COMPANY?: XXXX?+YYYY+CITTA?' DI GENOVA

      resulting xml field content:
      COMPANY: XXXX+YYYY
      CITTA' DI GENOVA
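
The desired result above (drop the `?`, keep the character after it) can be sketched as a small Python function. The name `unescape` is hypothetical and this is not part of ServingXML; the sketch assumes a lone `?` at the very end of the text is left unchanged.

```python
def unescape(text, escape="?"):
    """Remove each escape character and keep the character it protects."""
    out, i = [], 0
    while i < len(text):
        if text[i] == escape and i + 1 < len(text):
            out.append(text[i + 1])  # the escaped character is plain data
            i += 2
        else:
            out.append(text[i])      # ordinary character (or trailing escape)
            i += 1
    return "".join(out)


print(unescape("COMPANY?: XXXX?+YYYY"))  # COMPANY: XXXX+YYYY
print(unescape("CITTA?' DI GENOVA"))     # CITTA' DI GENOVA
```

Unescaping is best applied last, after the text has already been split on record, field and subfield delimiters; otherwise the released `+`, `:` and `'` characters would be indistinguishable from real delimiters.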

      Gabriele

       
    • Gabriele

      Gabriele - 2009-02-24

      Another similar question is about newline characters (\n \r).
      Can I tell servingXML to ignore these characters while reading the flat file and building the xml?
      For example to replace "\n" with "" for each occurrence?
      Because I could have an edifact file with records delimited with the ' character, but with lines limited to 80 columns, so the same record could continue on a new line.
      Newline characters, if interpreted as delimiters for records or fields, would generate mapping errors. Or, if maintained inside fields, would generate validation errors.

      In the following text the mapped value IN\nVOIC would be invalid:

      UNB+UNOA:3+0000008033253:14+01534730278:ZZ+080130:0000+04000020++++++1'UNH+1+IN
      VOIC:D:96A:UN+refcode+'BGM+381+0000322+9'DTM+137:20080130:102'PAI+35::30'NAD+SU
      +++COMPANY XXXX+VIA ROMA, 1+CITTA?'+GE+16163+IT'

      I see I could use something like:
      <sx:findAndReplace searchFor ="\n" replaceWith="">
                <sx:toString value="*"/>   
      </sx:findAndReplace>
      but I can't understand exactly how it works. That is, where it can be used inside the workflow.
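
As an illustration of the intended effect (outside ServingXML), removing the line breaks before any record splitting can be sketched in Python. The function name `join_wrapped_lines` is hypothetical, and the sketch assumes CR and LF never carry data in the file.

```python
def join_wrapped_lines(raw):
    """Rejoin records that were wrapped at a fixed column width.

    Assumes CR and LF are layout only, so every occurrence is removed.
    """
    return raw.replace("\r", "").replace("\n", "")


wrapped = "1'UNH+1+IN\nVOIC:D:96A:UN+refcode+'BGM+381+0000322+9'"
print(join_wrapped_lines(wrapped))
# 1'UNH+1+INVOIC:D:96A:UN+refcode+'BGM+381+0000322+9'
```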

       
    • Gabriele

      Gabriele - 2009-02-24

      It would be interesting if something like this were available:

      1)
      <sx:recordDelimiter value="'" continuation="?" preserveDelimiter="true" preserveContinuation="false"/>

      that is ?' => '

      2)
      <sx:recordDelimiter value="\n" continuation="*" preserveDelimiter="true" preserveContinuation="true"/>

      that is {anything}\n{anything_else} => {anything}{anything_else}
      and every newline \n would be trimmed. And my IN\nVOIC would become INVOIC. Wouldn't it?

      I don't know if this could be a solution for a future release, or if a workaround is already available.
      Anyway I still think that something similar would be needed for
      <sx:fieldDelimiter value="+"/>
      <sx:fieldDelimiter value=":"/>
      Please let me know your opinion.
      Thanks.

      Gabriele

       
    • Daniel Parker

      Daniel Parker - 2009-02-24

      Gabriele,

      I'll have a look at

      <sx:recordDelimiter value="'" continuation="?" preserveDelimiter="true" preserveContinuation="false"/>

      to see if it's possible to add that as a minor enhancement.  Another option might be

      <sx:recordDelimiter value="'" escapeCharacter="?"/>

      which would be consistent with the use of escapeCharacter in other elements (so continuation discards the delimiter, but escapeCharacter preserves it.)

      I'll also look at field delimiter.

      I'm not so sure about

      <sx:recordDelimiter value="\n" continuation="*" preserveDelimiter="true" preserveContinuation="true"/>

      (but shouldn't preserveDelimiter and preserveContinuation both be false?)

      For this I think it would be simpler to add another XML filter sx:removeCharacters, similar to sx:removeEmptyElements, which would allow you to remove characters satisfying regular expression patterns for specified elements.

      -- Daniel

       
    • Gabriele

      Gabriele - 2009-03-13

      Hi Daniel,

      please let me know if you can introduce in a new release the enhancements regarding the previous issue, like:
      <sx:recordDelimiter value="'" escapeCharacter="?"/>
      and (if needed)
      <sx:fieldDelimiter value=":" escapeCharacter="?"/> 
      <sx:fieldDelimiter value="+" escapeCharacter="?"/>
      so I can introduce the management for the escape characters in my edifact validators.

      Thanks,

      Gabriele

       
    • Daniel Parker

      Daniel Parker - 2009-03-13

      Yes, but it won't be released until around the end of May, sorry.

      -- Daniel

       
    • Daniel Parker

      Daniel Parker - 2009-03-14

      If in the next couple of days you send me a complete example with input file as above, resources script, and expected output, I may be able to get the delimiter "escaped by" feature uploaded before I leave on vacation at the beginning of April.

      -- Daniel

       
    • Daniel Parker

      Daniel Parker - 2009-03-19

      I've uploaded a version 1.0.2 which is able to handle your escaped delimiters; see the Invoic96A example in the Examples.  This version allows an escapeCharacter attribute on sx:recordDelimiter, sx:fieldDelimiter, sx:subfieldDelimiter, sx:repeatDelimiter, and sx:segmentDelimiter.  A new element sx:nonrepeatingGroup simplifies the definition of a non repeating group.

      -- Daniel

       
    • Daniel Parker

      Daniel Parker - 2009-03-21

      I've uploaded a new version 1.0.3 which handles your requirement to "tell servingXML to ignore these characters [\r \n] while reading the flat file".  In particular, it produces the expected output for your "80columns" example; the files for this example are now included with the edi samples.  The resources script uses the instructions

          <sx:recordDelimiter value="\r\n" continuationSequence="\r\n"/>
          <sx:recordDelimiter value="\n" continuationSequence="\n"/>

      which have the effect of removing the CRLF and LF occurrences.

      -- Daniel

       
    • Gabriele

      Gabriele - 2009-03-23

      Thanks a lot for the update Daniel ... and good vacation!

      Gabriele

       
  • Gabriele

    Gabriele - 2009-10-27

    Hello Daniel,

    recently I have encountered a peculiar case in which the escape-delimiter handling seems not to work. The case is with "80columns" edi files in which the escape character "?" is the last character of the line and the record terminator character "'" is the first character of the following line (so it should be escaped and not interpreted as a record delimiter). The problem is that in this particular case, the instructions

    <sx:recordDelimiter value="\n" continuationSequence="\n"/>
    <sx:recordDelimiter value="\r" continuationSequence="\r"/>
    <sx:recordDelimiter value="'" escapeCharacter="?"/>

    do not work correctly: they correctly strip out the \n and \r characters, but the escape sequence is not recognized, and the record is truncated.

    Example lines:

    0'QTY+59:1:UN'NAD+DP+0110:92++INTNL. GROUP S.R.L. - Campoverde+Localita?
    ' Camposanto, 7+SANNICO+TR+33012+IT'RFF+AAM+8870/1999+090618'UNS+S'TMA+2

    I should obtain a record containing "Localita' Camposanto, 7"; instead, the record ends with "Localita?". I tried different combinations of record delimiters, continuation sequence and escape character, but I can't tell whether I'm missing something or whether it is just a very particular issue with these functions.
    When you have time, can you have a look?
    Thanks.

    Gabriele
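
The wrapped-escape case above can be reproduced in a small Python sketch. This is an illustration of why the order of the two steps matters, not a diagnosis of ServingXML's internals; the function name `split_segments` is hypothetical.

```python
def split_segments(text, terminator="'", escape="?"):
    """Split text into segments, keeping escape sequences untouched."""
    segments, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            current.append(text[i:i + 2])  # escape + escaped char are data
            i += 2
        elif ch == terminator:
            segments.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    if current:
        segments.append("".join(current))  # unterminated tail
    return segments


raw = ("0'QTY+59:1:UN'NAD+DP+0110:92++INTNL. GROUP S.R.L. - Campoverde+Localita?\n"
       "' Camposanto, 7+SANNICO+TR+33012+IT'RFF+AAM+8870/1999+090618'UNS+S'TMA+2")

# Removing the line break first puts ? and ' next to each other again.
joined = raw.replace("\r", "").replace("\n", "")
print(split_segments(joined)[2])

# Splitting without joining: the ? escapes the newline, so the ' terminates.
print(repr(split_segments(raw)[2]))
```

With the line break removed first, the third segment contains `Localita?' Camposanto, 7` intact. Without that step, the `?` is paired with the newline and the segment is cut after `Localita?`, which matches the truncation described above.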

     
  • Daniel Parker

    Daniel Parker - 2009-11-02

    Apologies for the delay, I'll find some time this week to look at this.

    -- Daniel
     
  • Gabriele

    Gabriele - 2012-12-19

    Hello Daniel,

    I recently encountered a special case with an EDI file in which two or more escape sequences appear in a row, and it looks like this is not handled correctly.

    For example if I have the following escape sequence ?' repeated twice one after another:

    IMD+B++:::MBC DOP CAMPANA 50g.?'?'VASETTO?'?' 500'

    The text should read MBC DOP CAMPANA 50g.''VASETTO'' 500. Instead, the second escape sequence of each pair is not recognized: it appears to be ignored and copied into the text as is. The problem is that the ' character is the record termination character, so the record is split incorrectly, like this:

    IMD+B++:::MBC DOP CAMPANA 50g.'?'
    VASETTO?'?' 500'

    So I obtain a partial record followed by an invalid record:

    <IMD>
                <DE7077>B</DE7077>
                <DE7081/>
                <C273>
                   <DE7009/>
                   <DE1131/>
                   <DE3055/>
                   <DE7008-1>MBC DOP CAMPANA 50g.'?</DE7008-1>
                   <DE7008-2/>
                   <DE3453/>
                </C273>
                <DE7383/>
             </IMD>

    If a single (e.g.: space) character is inserted between the escape sequences, everything seems to work fine if the escape sequences are inside the record. If the escape sequence is just before the record terminator, another space is needed for them to work correctly:

    OK: (spaces even before record terminator character)

    IMD+B++:::MBC DOP CAMPANA 50g.?' ?'VASETTO 500 ?' ?' '

    <IMD>
                <DE7077>B</DE7077>
                <DE7081/>
                <C273>
                   <DE7009/>
                   <DE1131/>
                   <DE3055/>
                   <DE7008-1>MBC DOP CAMPANA 50g.' 'VASETTO 500' '</DE7008-1>
                   <DE7008-2/>
                   <DE3453/>
                </C273>
                <DE7383/>
             </IMD>

    KO (no space before final record terminator character, the record terminator character is not recognized as a terminator character, is escaped and the subsequent record is "fused" together):

    IMD+B++:::MBC DOP CAMPANA 50g.?' ?'VASETTO 500?' ?''
    QTY+47:18:50'

    <IMD>
                <DE7077>B</DE7077>
                <DE7081/>
                <C273>
                   <DE7009/>
                   <DE1131/>
                   <DE3055/>
                   <DE7008-1>MBC DOP CAMPANA 50g.' 'VASETTO 500' ''QTY</DE7008-1>
                   <DE7008-2/>
                   <DE3453/>
                </C273>
                <DE7383>47:18:50</DE7383>
             </IMD>

    If you have some time, please have a look.
    Thanks.
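
For reference, the expected handling of back-to-back escape sequences can be sketched in Python as a left-to-right scan that always consumes the escape character together with the character after it. This only illustrates the expected behaviour and is not ServingXML code; the names `tokenize` and `segments` are hypothetical.

```python
def tokenize(text, terminator="'", escape="?"):
    """Yield (kind, char) pairs, where kind is "data" or "end".

    A lone escape at the very end of the text is dropped.
    """
    pending_escape = False
    for ch in text:
        if pending_escape:
            yield ("data", ch)     # whatever follows the escape is literal
            pending_escape = False
        elif ch == escape:
            pending_escape = True  # swallow the escape, wait for next char
        elif ch == terminator:
            yield ("end", ch)
        else:
            yield ("data", ch)


def segments(text):
    """Group tokens into unescaped segment strings."""
    current = []
    for kind, ch in tokenize(text):
        if kind == "end":
            yield "".join(current)
            current = []
        else:
            current.append(ch)
    if current:
        yield "".join(current)


print(list(segments("IMD+B++:::MBC DOP CAMPANA 50g.?'?'VASETTO?'?' 500'")))
print(list(segments("IMD+B++:::MBC DOP CAMPANA 50g.?' ?'VASETTO 500?' ?''QTY+47:18:50'")))
```

Because each `?` is paired only with the character immediately after it, `?'?'` yields two literal apostrophes, and in `?''` the first apostrophe is data while the second ends the record. The sketch unescapes at segment level only to keep the example short; a full parser would split on `+` and `:` before removing the escape characters.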

     
