XMLStarlet command line XML toolkit / Bugs / #102 unesc doesn't always work correctly

Milestone: all

Status: closed-fixed

Owner: nobody

Labels: main code (54)

Priority: 5

Updated: 2013-09-03

Created: 2013-03-08

Creator: Dale Newfield

Private: No

The attached file contains a number of XML-encoded single and double quote characters. I have been trying to use the unesc feature of XMLStarlet to unescape them in order to facilitate a data comparison. Sometimes, instead of replacing a full "&quot;" or "&apos;" with " or ' respectively, the tool simply strips the ampersand. When running the attached file through "xml unesc", the last sentence starts with "Kerryapos;s confirmation" instead of "Kerry's confirmation".

Discussion

Dale Newfield - 2013-03-08

POL-Kerry-Secretary-Of-State-Confirmation-5

badinput

Dale Newfield - 2013-03-08

I recognize that the output of sending this document through unesc may well be invalid XML. Don't worry: I'm not assuming it will be valid XML. I'm just trying to remove a class of differences (XML entity encodings) between a set of files I'm comparing, in order to unmask other differences.
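
For readers with the same comparison use case, the normalization step can be sketched outside of XMLStarlet. This is only an illustration using Python's standard library, not a description of how unesc works; the helper names `normalized_lines` and `diff_ignoring_entities` are made up for this example, and only the five predefined XML entities are handled.

```python
import difflib
from xml.sax.saxutils import unescape

# saxutils.unescape handles &amp;, &lt; and &gt; by default;
# the quote entities have to be passed in explicitly.
EXTRA_ENTITIES = {"&quot;": '"', "&apos;": "'"}


def normalized_lines(text):
    """Unescape the predefined XML entities and split the text into lines."""
    return unescape(text, EXTRA_ENTITIES).splitlines()


def diff_ignoring_entities(a, b):
    """Return unified-diff lines for two texts after entity normalization."""
    return list(
        difflib.unified_diff(normalized_lines(a), normalized_lines(b), lineterm="")
    )
```

Two inputs that differ only in how quotes are encoded then produce an empty diff, so any remaining diff lines point at real content differences.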

Noam Postavsky - 2013-03-09

unesc reads lines into a 4096-byte buffer, and it did not handle the case where an entity starts at the 4095th byte of the line.

Fixed in commit a9f8ec60a3510082bb8807d928805d17ce89222a.
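
The class of bug described above can be illustrated with a small sketch. This is not XMLStarlet's C code and not the content of the fix commit. It is a Python model that assumes text is processed in fixed-size chunks, with hypothetical function names. The naive version leaves a split entity untouched rather than dropping the ampersand as in the report, since the exact failure path of the original code is not reproduced here.

```python
import re

# Only the five predefined XML entities; numeric references are omitted.
ENTITIES = {"&lt;": "<", "&gt;": ">", "&amp;": "&", "&quot;": '"', "&apos;": "'"}
ENTITY_RE = re.compile(r"&(?:lt|gt|amp|quot|apos);")
MAX_ENTITY_LEN = 6  # length of "&quot;" and "&apos;"


def unescape(text):
    """Replace every complete predefined entity in text."""
    return ENTITY_RE.sub(lambda m: ENTITIES[m.group(0)], text)


def chunks(text, size):
    """Yield pieces of at most `size` characters, like fixed-size buffer reads."""
    for start in range(0, len(text), size):
        yield text[start:start + size]


def unescape_naive(text, size):
    """Unescape each chunk on its own; an entity split across chunks is missed."""
    return "".join(unescape(chunk) for chunk in chunks(text, size))


def unescape_with_carry(text, size):
    """Hold back a trailing partial entity and prepend it to the next chunk."""
    out = []
    carry = ""
    for chunk in chunks(text, size):
        data = carry + chunk
        amp = data.rfind("&")
        # A short trailing '&...' with no ';' yet may be the start of a split entity.
        if amp != -1 and ";" not in data[amp:] and len(data) - amp < MAX_ENTITY_LEN:
            data, carry = data[:amp], data[amp:]
        else:
            carry = ""
        out.append(unescape(data))
    out.append(carry)  # leftover text that never became a complete entity
    return "".join(out)


# A 4096-byte buffer holds 4095 bytes plus a terminator, so the '&' below
# is the last character of the first chunk when size is 4095.
line = "x" * 4094 + "&apos;s confirmation"
```

With `size=4095`, `unescape_naive(line, 4095)` returns the line unchanged because neither chunk contains a complete entity, while `unescape_with_carry(line, 4095)` ends in `'s confirmation`.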

Noam Postavsky - 2013-03-09

status: open --> open-fixed

Noam Postavsky - 2013-07-07

Fixed in 1.5.0

Noam Postavsky - 2013-09-03

status: open-fixed --> closed-fixed