Hello. I am forwarding a feature request from a Debian user of dos2unix. The request and a bit of discussion can be found in Debian bug #1053983. It concerns the manpage (version 7.5.1) description of ascii mode.
In the section OPTIONS:
    -ascii
        Convert only line breaks. This is the default conversion mode.
And then in the section CONVERSION MODES:
    ascii
        In mode "ascii" only line breaks are converted. This is the
        default conversion mode.

        Although the name of this mode is ASCII, which is a 7 bit standard,
        the actual mode is 8 bit. Use always this mode when converting
        Unicode UTF-8 files.
The requester notes that when in ascii mode, UTF-16 input will be converted to UTF-8 and the BOM is removed. So technically, more than line breaks may be affected.
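To make the reported behavior concrete, here is a minimal Python sketch of what the requester describes. It is not dos2unix's actual implementation: the function name `ascii_mode_sketch` is hypothetical, and it assumes the UTF-16 input starts with a BOM and that the output encoding is UTF-8.

```python
import codecs


def ascii_mode_sketch(data: bytes) -> bytes:
    """Illustrative only: mimics the behavior described in the bug report."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec uses the BOM to pick the byte order and drops it,
        # so the result is UTF-8 without a BOM: more than line breaks change.
        text = data.decode("utf-16")
        return text.replace("\r\n", "\n").encode("utf-8")
    # Any other input: only the line breaks change, all other bytes pass through.
    return data.replace(b"\r\n", b"\n")
```

For a UTF-16 file the output differs from the input in encoding and BOM as well as in line breaks, which is the point the requester raises.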
As noted in the bug report, the behavior is correctly described in other options in the manpage, so the suggestion is to make sure users find those descriptions and are aware of the behavior.
Perhaps something like the following?
    -ascii
        Convert only line breaks unless the input is UTF-16, in which case
        the input is transformed to UTF-8 and no BOM is written. This is the
        default conversion mode.

        Also see options -u, -r, and -b for more about UTF-16 and the BOM.
Thank you for considering the request.
Anonymous
Hi Tony,
Thanks for the request. This text does indeed need an update. I will make the change.
best regards,
Erwin
The text stems from the time when dos2unix had no Unicode conversion. Back then ascii was simply the default mode, the one that did no ISO character set conversion (option -iso).
I have updated the text to the following (commit [24279d]):
Hello Erwin,
The updated wording looks good. This should make it clearer to users that they need to review the various modes of operation and other sections of the documentation if those apply to their use case.
Thank you for considering the suggestion!