From: Ming C. <cim...@ya...> - 2011-09-26 10:22:15
Hi,

I recently ran the XmlStarlet tool on Windows 7 against XML files in several Unicode encodings and found some strange behavior:

1. UTF-16BE (no BOM, declared encoding UTF-16BE): with a command line like "xml ed -d //d UTF-16BE.xml > d-UTF-16BE.xml", the output file is UTF-16BE, but the line endings become 00 0D 0A whatever the original line endings are (00 0D 00 0A, 00 0A or 00 0D).

2. UTF-16LE (no BOM, declared encoding UTF-16LE): with a command line like "xml ed -d //d UTF-16LE.xml > d-UTF-16LE.xml", the output file is UTF-16LE, but the line endings become 0D 0A 00 whatever the original line endings are (0D 00 0A 00, 0A 00 or 0D 00).

3. UTF-16LE-BOM (BOM FF FE, declared encoding UTF-16): with a command line like "xml ed -d //d UTF-16LE-BOM.xml > d-UTF-16LE-BOM.xml", the output file is UTF-16LE-BOM, but the line endings become 0D 0A 00 whatever the original line endings are (0D 00 0A 00, 0A 00 or 0D 00).

4. UTF-16BE-BOM (BOM FE FF, declared encoding UTF-16): with a command line like "xml ed -d //d UTF-16BE-BOM.xml > d-UTF-16BE-BOM.xml", the output file is UTF-16LE-BOM, and the line endings become 0D 0A 00 whatever the original line endings are (00 0D 00 0A, 00 0A or 00 0D).

Please note case #4, in which the output file has a reversed byte order compared to the input.

In all output files the line endings are malformed: each one is three bytes long, which is not a whole number of UTF-16 code units. I think they should be 00 0D 00 0A (big-endian) or 0D 00 0A 00 (little-endian).

I am using the latest Windows version (1.2.1). My OS is Windows 7 64-bit.

Could someone help check this?

Thanks,
Ming
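
The byte patterns reported above are consistent with one possible cause. It is stated here as an assumption, not a confirmed diagnosis: the XML parser normalizes every line ending to a single LF, and the redirected output stream is in Windows text mode, which inserts a 0D byte before every 0A byte without regard to the encoding. The Python sketch below only simulates that byte-wise translation; the helper names `text_mode_translate` and `hexdump` are made up for illustration and are not part of XmlStarlet.

```python
def text_mode_translate(data: bytes) -> bytes:
    """Simulate a byte-wise LF -> CRLF translation.

    Assumption: this mimics what a text-mode output stream on Windows
    does to raw bytes. It is not taken from the XmlStarlet source.
    """
    return data.replace(b"\n", b"\r\n")


def hexdump(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


# A single LF line ending, encoded in each byte order (no BOM).
lf_be = "\n".encode("utf-16-be")  # bytes 00 0A
lf_le = "\n".encode("utf-16-le")  # bytes 0A 00

print(hexdump(text_mode_translate(lf_be)))  # 00 0D 0A
print(hexdump(text_mode_translate(lf_le)))  # 0D 0A 00
```

Under this assumption, the three-byte sequences match what is seen in cases 1 to 4. It does not explain the byte-order change in case 4.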