ServingXML / Discussion / Help: continuation symbols and delimiters

Gabriele - 2009-02-24

I have edifact flat files to map to xml and validate.
Records terminate with the ' character, fields with +, and subfields with :
The character ? is used as an escape character. For example:

NAD+SU+++COMPANY?: XXXX?+YYYY+VIA ROMA, 1+CITTA?' DI GENOVA+GE+16163+IT'

will be interpreted as:
...
COMPANY: XXXX+YYYY
...
CITTA' DI GENOVA
...

I can use
<sx:recordDelimiter value="'" continuation="?"/>
for the record delimiter. The ' character after the ? will be missing from the xml output, but that's not a very big deal (it would be better if both characters of ?' remained).
The problem is with
<sx:fieldDelimiter value="+"/>
<sx:fieldDelimiter value=":"/>
because I cannot find anything similar to tell servingXML to ignore the ?+ and ?: character sequences.

I found in the on-line help:
"If you have a field value that includes a character that is the same as the delimiter, you must define the field to have quoted field values and enclose the value of the field in quotes."
But this solution is not an option for our workflow.
Any help or suggestion is appreciated.
Thanks.

Gabriele
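
The interpretation described above (the ? releases the character that follows it, so ?+, ?: and ?' are data rather than delimiters) can be sketched outside ServingXML in a few lines of Python. This is only an illustration of the expected result, not how ServingXML parses the file; split_escaped and unescape are hypothetical helper names.

```python
def split_escaped(text, delimiter, escape="?"):
    """Split text on delimiter, skipping delimiters preceded by the escape character.

    Escape pairs are kept verbatim so each piece can be split again on a
    lower-level delimiter; call unescape() on the final values.
    """
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep "?x" together, untouched
            i += 2
        elif ch == delimiter:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


def unescape(value, escape="?"):
    """Drop each escape character and keep the character that follows it."""
    out, i = [], 0
    while i < len(value):
        if value[i] == escape and i + 1 < len(value):
            out.append(value[i + 1])
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


segment = "NAD+SU+++COMPANY?: XXXX?+YYYY+VIA ROMA, 1+CITTA?' DI GENOVA+GE+16163+IT'"
records = split_escaped(segment, "'")    # record level
fields = split_escaped(records[0], "+")  # field level
values = [[unescape(s) for s in split_escaped(f, ":")] for f in fields]  # subfields

print(values[4])  # ['COMPANY: XXXX+YYYY']
print(values[6])  # ["CITTA' DI GENOVA"]
```

Keeping the escape pairs intact until the last level matters: unescaping at the record level would turn ?+ into a bare +, which the field-level split would then treat as a delimiter.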

- Gabriele - 2009-02-24
  
  Actually the best solution would be to have the ? symbol removed and the following character maintained in the resulting xml text, like this:
  
  flat file:
  COMPANY?: XXXX?+YYYY+CITTA?' DI GENOVA
  
  resulting xml field content:
  COMPANY: XXXX+YYYY
  CITTA' DI GENOVA
  
  Gabriele
  
- Gabriele - 2009-02-24
  
  Another similar question is about newline characters (\n \r).
  Can I tell servingXML to ignore these characters while reading the flat file and building the xml?
  For example to replace "\n" with "" for each occurrence?
  I ask because I could have an edifact file whose records are delimited with the ' character but whose lines are limited to 80 columns, so a single record can continue on a new line.
  Newline characters would generate mapping errors if interpreted as record or field delimiters, and validation errors if kept inside fields.
  
  In the following text the mapped value IN\nVOIC would be invalid:
  
  UNB+UNOA:3+0000008033253:14+01534730278:ZZ+080130:0000+04000020++++++1'UNH+1+IN
  VOIC:D:96A:UN+refcode+'BGM+381+0000322+9'DTM+137:20080130:102'PAI+35::30'NAD+SU
  +++COMPANY XXXX+VIA ROMA, 1+CITTA?'+GE+16163+IT'
  
  I see I could use something like:
  <sx:findAndReplace searchFor ="\n" replaceWith="">
  <sx:toString value="*"/>
  </sx:findAndReplace>
  but I can't understand exactly how it works. That is, where it can be used inside the workflow.
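
As a rough illustration of the 80-column problem, and independent of where sx:findAndReplace can be placed in a ServingXML script, the sketch below removes the line breaks in a plain Python preprocessing step before the records are split. join_wrapped_lines is a hypothetical helper, and the naive split on ' deliberately ignores the ? escape character.

```python
def join_wrapped_lines(raw):
    """Remove CR and LF so records wrapped at 80 columns become contiguous."""
    return raw.replace("\r", "").replace("\n", "")


# The wrapped example from the post; each physical line ends with a newline.
wrapped = (
    "UNB+UNOA:3+0000008033253:14+01534730278:ZZ+080130:0000+04000020++++++1'UNH+1+IN\n"
    "VOIC:D:96A:UN+refcode+'BGM+381+0000322+9'DTM+137:20080130:102'PAI+35::30'NAD+SU\n"
    "+++COMPANY XXXX+VIA ROMA, 1+CITTA?'+GE+16163+IT'\n"
)

joined = join_wrapped_lines(wrapped)

# Naive split on the record terminator (escape handling is ignored here).
segments = joined.split("'")
print(segments[1])  # UNH+1+INVOIC:D:96A:UN+refcode+
```

With the line breaks gone, IN\nVOIC reads INVOIC again and the message type can be validated.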
  
- Gabriele - 2009-02-24
  
  It would be interesting if something like this were available:
  
  1)
  <sx:recordDelimiter value="'" continuation="?" preserveDelimiter="true" preserveContinuation="false"/>
  
  that is ?' => '
  
  2)
  <sx:recordDelimiter value="\n" continuation="*" preserveDelimiter="true" preserveContinuation="true"/>
  
  that is {anything}\n{anything_else} => {anything}{anything_else}
  and every newline \n would be trimmed. And my IN\nVOIC would become INVOIC. Wouldn't it?
  
  I don't know if this could be a solution for a future release, or if a workaround is already available.
  Anyway I still think that something similar would be needed for
  <sx:fieldDelimiter value="+"/>
  <sx:fieldDelimiter value=":"/>
  Please let me know your opinion.
  Thanks.
  
  Gabriele
  
- Daniel Parker - 2009-02-24
  
  Gabriele,
  
  I'll have a look at
  
  <sx:recordDelimiter value="'" continuation="?" preserveDelimiter="true" preserveContinuation="false"/>
  
  to see if it's possible to add that as a minor enhancement. Another option might be
  
  <sx:recordDelimiter value="'" escapeCharacter="?"/>
  
  which would be consistent with the use of escapeCharacter in other elements (so continuation discards the delimiter, but escapeCharacter preserves it.)
  
  I'll also look at field delimiter.
  
  I'm not so sure about
  
  <sx:recordDelimiter value="\n" continuation="*" preserveDelimiter="true" preserveContinuation="true"/>
  
  (but shouldn't preserveDelimiter and preserveContinuation both be false?)
  
  For this I think it would be simpler to add another XML filter sx:removeCharacters, similar to sx:removeEmptyElements, which would allow you to remove characters satisfying regular expression patterns for specified elements.
  
  -- Daniel
  
- Gabriele - 2009-03-13
  
  Hi Daniel,
  
  please let me know if you can introduce in a new release the enhancements regarding the previous issue, like:
  <sx:recordDelimiter value="'" escapeCharacter="?"/>
  and (if needed)
  <sx:fieldDelimiter value=":" escapeCharacter="?"/>
  <sx:fieldDelimiter value="+" escapeCharacter="?"/>
  so I can introduce the management for the escape characters in my edifact validators.
  
  Thanks,
  
  Gabriele
  
- Daniel Parker - 2009-03-13
  
  Yes, but it won't be released until around the end of May, sorry.
  
  -- Daniel
  
- Daniel Parker - 2009-03-14
  
  If in the next couple of days you send me a complete example with an input file as above, a resources script, and the expected output, I may be able to get the delimiter "escaped by" feature uploaded before I leave on vacation at the beginning of April.
  
  -- Daniel
  
- Daniel Parker - 2009-03-19
  
  I've uploaded a version 1.0.2 which is able to handle your escaped delimiters; see the Invoic96A example in the Examples. This version allows an escapeCharacter attribute on sx:recordDelimiter, sx:fieldDelimiter, sx:subfieldDelimiter, sx:repeatDelimiter, and sx:segmentDelimiter. A new element sx:nonrepeatingGroup simplifies the definition of a non-repeating group.
  
  -- Daniel
  
- Daniel Parker - 2009-03-21
  
  I've uploaded a new version 1.0.3 which handles your requirement to "tell servingXML to ignore these characters [\r \n] while reading the flat file". In particular, it produces the expected output for your "80columns" example; the files for this example are now included with the edi samples. The resources script uses the instructions
  
  <sx:recordDelimiter value="\r\n" continuationSequence="\r\n"/>
  <sx:recordDelimiter value="\n" continuationSequence="\n"/>
  
  which have the effect of removing the CRLF and LF occurrences.
  
  -- Daniel
  
- Gabriele - 2009-03-23
  
  Thanks a lot for the update Daniel ... and good vacation!
  
  Gabriele
  

Gabriele - 2009-10-27

Hello Daniel,

recently I have encountered a peculiar case in which the escape-delimiter thing seems not to work. The case is with "80columns" edi files in which the escape character "?" is the last character of the line and the record terminator character "'" is the first character of the following line (so, it should be escaped and not interpreted as record delimiter). The problem is that in this particular case, the instructions

<sx:recordDelimiter value="\n" continuationSequence="\n"/>
<sx:recordDelimiter value="\r" continuationSequence="\r"/>
<sx:recordDelimiter value="'" escapeCharacter="?"/>

do not work correctly: they strip out the \n and \r characters as expected, but the escape sequence is not recognized, and the record is truncated.

Example lines:

0'QTY+59:1:UN'NAD+DP+0110:92++INTNL. GROUP S.R.L. - Campoverde+Localita?
' Camposanto, 7+SANNICO+TR+33012+IT'RFF+AAM+8870/1999+090618'UNS+S'TMA+2

I should obtain a record containing "Localita' Camposanto, 7"; instead, the record ends with "Localita?". I tried different combinations of record delimiters, continuation sequences and escape characters, but I can't tell whether I'm missing something or whether it is just a very particular issue with these functions.
When you have time, can you have a look?
Thanks.

Gabriele
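
To make the expected result for this edge case concrete, here is a minimal Python sketch that joins the wrapped lines first and only then looks for unescaped ' terminators. It says nothing about the order in which ServingXML applies continuationSequence and escapeCharacter; read_records and RECORD are hypothetical names used only for this illustration.

```python
import re

# Zero or more "release char + any char" pairs or ordinary chars, then the terminator.
RECORD = re.compile(r"((?:\?.|[^'?])*)'")


def read_records(raw):
    """Join wrapped lines, split on unescaped ', then drop the release characters.

    Unescaping whole records is for display only; a real parser would split
    fields and subfields before removing the release characters.
    """
    joined = raw.replace("\r", "").replace("\n", "")
    return [re.sub(r"\?(.)", r"\1", rec) for rec in RECORD.findall(joined)]


# The two example lines from the post: "?" ends line 1, "'" starts line 2.
wrapped = (
    "0'QTY+59:1:UN'NAD+DP+0110:92++INTNL. GROUP S.R.L. - Campoverde+Localita?\n"
    "' Camposanto, 7+SANNICO+TR+33012+IT'RFF+AAM+8870/1999+090618'UNS+S'TMA+2\n"
)

records = read_records(wrapped)
print(records[2])
```

Because the line break is removed before the escape is evaluated, the ? and the ' become adjacent again and the NAD record keeps "Localita' Camposanto, 7". The trailing TMA+2 has no terminator in this excerpt, so it is not returned as a record.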


Daniel Parker - 2009-11-02

Apologies for the delay, I'll find some time this week to look at this.

Daniel

Gabriele - 2012-12-19

Hello Daniel,

I recently encountered a special case with an EDI file in which two or more escape sequences are repeated one after another, and it looks like it's not handled correctly.

For example if I have the following escape sequence ?' repeated twice one after another:

IMD+B++:::MBC DOP CAMPANA 50g.?'?'VASETTO?'?' 500'

The text should read MBC DOP CAMPANA 50g.''VASETTO'' 500. Instead, the second escape sequence of each pair is not recognized: it appears to be ignored and copied into the text as is. Because the ' character is the record terminator, the record is then split incorrectly, like this:

IMD+B++:::MBC DOP CAMPANA 50g.'?'
VASETTO?'?' 500'

So I obtain a partial record followed by an invalid record:

<IMD>
   <DE7077>B</DE7077>
   <DE7081/>
   <C273>
      <DE7009/>
      <DE1131/>
      <DE3055/>
      <DE7008-1>MBC DOP CAMPANA 50g.'?</DE7008-1>
      <DE7008-2/>
      <DE3453/>
   </C273>
   <DE7383/>
</IMD>

If a single character (e.g. a space) is inserted between the escape sequences, everything seems to work as long as the escape sequences are inside the record. If an escape sequence comes just before the record terminator, another space is needed between them for it to work correctly:

OK: (spaces even before record terminator character)

IMD+B++:::MBC DOP CAMPANA 50g.?' ?'VASETTO 500 ?' ?' '

<IMD>
   <DE7077>B</DE7077>
   <DE7081/>
   <C273>
      <DE7009/>
      <DE1131/>
      <DE3055/>
      <DE7008-1>MBC DOP CAMPANA 50g.' 'VASETTO 500' '</DE7008-1>
      <DE7008-2/>
      <DE3453/>
   </C273>
   <DE7383/>
</IMD>

KO: (no space before the final record terminator character; the terminator is not recognized as such, it is treated as escaped, and the subsequent record is "fused" with this one)

IMD+B++:::MBC DOP CAMPANA 50g.?' ?'VASETTO 500?' ?''
QTY+47:18:50'

<IMD>
   <DE7077>B</DE7077>
   <DE7081/>
   <C273>
      <DE7009/>
      <DE1131/>
      <DE3055/>
      <DE7008-1>MBC DOP CAMPANA 50g.' 'VASETTO 500' ''QTY</DE7008-1>
      <DE7008-2/>
      <DE3453/>
   </C273>
   <DE7383>47:18:50</DE7383>
</IMD>

If you have some time, please have a look.
Thanks.
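
For reference, the behaviour expected for back-to-back escape sequences can be written as a small state machine in which each ? consumes exactly one following character and the parser then returns to its normal state. This is a sketch of the expected outcome under that assumption, not a description of the ServingXML implementation; split_records is a hypothetical name.

```python
def split_records(text, terminator="'", release="?"):
    """Split text into records, treating release+char as a literal pair.

    Release pairs are kept in the output so that fields and subfields can
    still be parsed afterwards.
    """
    records, current, escaped = [], [], False
    for ch in text:
        if escaped:
            current.append(ch)  # whatever follows the release char is data
            escaped = False
        elif ch == release:
            current.append(ch)  # keep the release char for later parsing
            escaped = True
        elif ch == terminator:
            records.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        records.append("".join(current))  # unterminated trailing data
    return records


# Two escape sequences directly after one another: still a single record.
doubled = "IMD+B++:::MBC DOP CAMPANA 50g.?'?'VASETTO?'?' 500'"
print(split_records(doubled))

# An escape sequence directly before the real terminator: two records, not one.
fused = "IMD+B++:::MBC DOP CAMPANA 50g.?' ?'VASETTO 500?' ?''" + "QTY+47:18:50'"
print(split_records(fused))
```

Since the escaped flag is cleared after a single character, the ' that follows ?' in the second input is seen in the normal state and ends the IMD record, leaving QTY+47:18:50 as a record of its own.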
