
#8 manpage clarification request for ascii mode regarding UTF-16 input

closed
None
5
2024-01-16
2023-10-17
No

Hello. I am forwarding a feature request from a Debian user of dos2unix. The request and a bit of discussion can be found in Debian bug #1053983. It concerns the manpage (version 7.5.1) description of ascii mode.

In the section OPTIONS:

 -ascii
   Convert only line breaks. This is the default conversion mode.

And then in the section CONVERSION MODES:

ascii
   In mode "ascii" only line breaks are converted. This is the
   default conversion mode.

   Although the name of this mode is ASCII, which is a 7 bit standard,
   the actual mode is 8  bit.  Use always this mode when converting
   Unicode UTF-8 files.

The requester notes that in ascii mode, UTF-16 input is converted to UTF-8 and the BOM is removed. So technically, more than line breaks may be affected.
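To make the reported behavior concrete, here is a minimal Python sketch of the transformation the requester describes. It is not dos2unix's implementation: the function name `convert_like_ascii_mode` is hypothetical, and detecting UTF-16 by its BOM is an assumption made for illustration.

```python
import codecs


def convert_like_ascii_mode(data: bytes) -> bytes:
    """Hypothetical helper: mimic the behavior reported in the ticket.

    Assumption: UTF-16 input is recognized by its BOM. Such input is
    re-encoded as UTF-8 without a BOM; everything else is left alone
    apart from the line-break conversion.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        # Drop the BOM, decode little-endian UTF-16, re-encode as UTF-8.
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
        data = text.encode("utf-8")
    elif data.startswith(codecs.BOM_UTF16_BE):
        # Same for big-endian UTF-16.
        text = data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
        data = text.encode("utf-8")
    # The line-break conversion itself: DOS (CRLF) to Unix (LF).
    return data.replace(b"\r\n", b"\n")
```

For plain ASCII or UTF-8 input only the line breaks change. For UTF-16 input with a BOM, the byte content changes well beyond the line breaks, which is the point of the request.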

As noted in the bug report, the behavior is correctly described in other options in the manpage, so the suggestion is to make sure users find those descriptions and are aware of the behavior.

Perhaps something like the following?

-ascii
   Convert only line breaks unless the input is UTF-16, in which case
   the input is transformed to UTF-8 and no BOM is written. This is the
   default conversion mode.

   Also see options -u, -r, and -b for more about UTF-16 and the BOM.

Thank you for considering the request.

Discussion

  • Erwin Waterlander

    • status: open --> accepted
    • assigned_to: Erwin Waterlander
     
  • Erwin Waterlander

    Hi Tony,
    Thanks for the request. This text does indeed need an update. I will make the change.
    best regards,
    Erwin

     
  • Erwin Waterlander

    The text stems from the time when dos2unix had no Unicode conversion. Back then, ascii was simply the default mode, meaning no ISO character set conversion (option -iso) was done.

     
  • Erwin Waterlander

    I have updated the text to the following (commit [24279d]):

    CONVERSION MODES
        ascii
            This is the default conversion mode. This mode is for converting
            ASCII and ASCII-compatible encoded files, like UTF-8. Enabling ascii
            mode disables 7bit and iso mode.
    
            If dos2unix has UTF-16 support, UTF-16 encoded files are converted
            to the current locale character encoding on POSIX systems and to
            UTF-8 on Windows. Enabling ascii mode disables the option to keep
            UTF-16 encoding ("-u") and the options to assume UTF-16 input ("-ul"
            and "-ub"). To see if dos2unix has UTF-16 support type "dos2unix
            -V". See also section UNICODE.
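
The rule in the second paragraph of the updated text (locale encoding on POSIX, UTF-8 on Windows) can be sketched in Python as follows. This is only an illustration of the wording above, not how dos2unix itself chooses the encoding. The function name `utf16_target_encoding` and its parameters are hypothetical.

```python
import locale
import sys
from typing import Optional


def utf16_target_encoding(platform: str = sys.platform,
                          locale_encoding: Optional[str] = None) -> str:
    """Hypothetical helper: which encoding UTF-16 input would be converted to.

    Follows the updated manpage wording: UTF-8 on Windows, the current
    locale's character encoding on POSIX systems.
    """
    if platform.startswith("win"):
        return "utf-8"
    # On POSIX, fall back to the process locale when none is passed in.
    return locale_encoding or locale.getpreferredencoding(False)
```

Under this reading, a user in a UTF-8 locale on POSIX gets UTF-8 output, which matches the behavior originally reported, while a user in another locale may get a different encoding.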
    
     

    Related

    Commit: [24279d]

  • tony mancill

    tony mancill - 2023-11-05

    Hello Erwin,

    The updated wording looks good. This should make it clearer to users that they need to review the various modes of operation and other sections of the documentation if those apply to their use case.

    Thank you for considering the suggestion!

     
  • Erwin Waterlander

    • status: accepted --> closed
     
