
#102 unesc doesn't always work correctly

all
closed-fixed
nobody
main code (54)
5
2013-09-03
2013-03-08
No

The attached file contains a number of XML-encoded single- and double-quote characters. I have been trying to use the unesc feature of XMLStarlet to unescape them so I can compare the data. Sometimes, instead of replacing a full "&quot;" or "&apos;" with " or ' respectively, the tool simply strips the ampersand. When I run the attached file through "xml unesc", the last sentence starts with "Kerryapos;s confirmation" instead of "Kerry's confirmation".
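For reference, the intended result for the two quote entities can be reproduced with Python's standard library. This is a minimal sketch using `xml.sax.saxutils.unescape`, not XMLStarlet's own code; `unescape_quotes` and `QUOTE_ENTITIES` are hypothetical names chosen for illustration.

```python
from xml.sax.saxutils import unescape

# unescape() decodes &amp;, &lt; and &gt; by default; the quote entities
# (&quot; and &apos;) must be supplied explicitly via the entities mapping.
QUOTE_ENTITIES = {"&quot;": '"', "&apos;": "'"}


def unescape_quotes(text: str) -> str:
    """Decode the predefined XML entities, including both quote entities."""
    return unescape(text, QUOTE_ENTITIES)


print(unescape_quotes("Kerry&apos;s confirmation"))  # Kerry's confirmation
```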

Discussion

  • Dale Newfield

    Dale Newfield - 2013-03-08

    POL-Kerry-Secretary-Of-State-Confirmation-5

     
  • Dale Newfield

    Dale Newfield - 2013-03-08

    I recognize that the output of sending this document through unesc may well be invalid XML. Don't worry; I'm not assuming it will be valid XML. I'm just trying to remove one class of differences (XML entity encodings) between a set of files I'm comparing, in order to expose other differences.

     
  • Noam Postavsky

    Noam Postavsky - 2013-03-09

    unesc reads lines into a 4096-byte buffer, and it wasn't handling the case where an entity started at the 4095th byte of the line.

    Fixed in commit a9f8ec60a3510082bb8807d928805d17ce89222a.
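The buffer-boundary problem described in this comment can be illustrated with a minimal Python sketch. It is not XMLStarlet's C implementation: the chunked reader, `CHUNK_SIZE`, `ENTITIES`, and both function names are hypothetical, and the sketch slices characters rather than bytes (equivalent for ASCII input). The naive version decodes each 4096-character chunk on its own, so an entity whose `&` lands at the end of a chunk is never recognized. The fixed version holds back an unfinished entity (an `&` with no `;` after it) and prepends it to the next chunk before decoding.

```python
from xml.sax.saxutils import unescape

# Hypothetical chunk size, mirroring the 4096-byte line buffer mentioned above.
CHUNK_SIZE = 4096
ENTITIES = {"&quot;": '"', "&apos;": "'"}


def unescape_chunks_naive(text: str, size: int = CHUNK_SIZE) -> str:
    """Decode each chunk independently; an entity split across a chunk
    boundary is left undecoded."""
    return "".join(unescape(text[i:i + size], ENTITIES)
                   for i in range(0, len(text), size))


def unescape_chunks(text: str, size: int = CHUNK_SIZE) -> str:
    """Carry an unfinished entity over to the next chunk before decoding."""
    out = []
    pending = ""
    for i in range(0, len(text), size):
        buf = pending + text[i:i + size]
        amp = buf.rfind("&")
        if amp != -1 and ";" not in buf[amp:]:
            # A possible entity is cut off at the end of the buffer:
            # decode everything before it and keep the tail for later.
            buf, pending = buf[:amp], buf[amp:]
        else:
            pending = ""
        out.append(unescape(buf, ENTITIES))
    # Flush whatever is left (e.g. a stray '&' at the very end of the input).
    out.append(unescape(pending, ENTITIES))
    return "".join(out)


# Place the '&' of "&apos;" at the 4095th character, as in the ticket.
sample = "a" * 4089 + "Kerry&apos;s confirmation"
print(unescape_chunks(sample)[-20:])  # Kerry's confirmation
```

Holding back only from the last `&` keeps every entity in the decoded prefix complete. A stray literal `&` with no following `;` simply makes the held-back tail grow until a `;` appears or the input ends, so the output stays correct.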

     
  • Noam Postavsky

    Noam Postavsky - 2013-03-09
    • status: open --> open-fixed
     
  • Noam Postavsky

    Noam Postavsky - 2013-07-07

    Fixed in 1.5.0

     
  • Noam Postavsky

    Noam Postavsky - 2013-09-03
    • status: open-fixed --> closed-fixed
     
