Uncrustify Code Beautifier / Bugs / #387 special chars in utf-8 are replaced by a space

Status: closed

Owner: Ben Gardner

Labels: Build (23)

Language:

Priority: 5

Updated: 2012-11-10

Created: 2011-04-13

Creator: Theo Thustrup

Private: No

The special characters in question take 2 bytes in UTF-8: a leading byte followed by a second byte that selects the character. Some Danish characters are encoded this way.
The leading byte is #C3. Some, but not all, Danish characters are changed to a space (#20).
The behavior differs depending on whether the UTF-8 file has a BOM (#EFBBBF). The UTF-8 standard, as far as I know, allows a UTF-8 document both to have a BOM and not to have one. It is good practice to have a BOM, but some editors remove it, leaving us with both situations.
Example 1: if the file has NO BOM, the 2-byte chars #C3A5, #C3A6, #C3B8 are changed to #C320, #C320, #C3B8.
Example 2: if the file has a BOM, the 2-byte chars #C3A5, #C3A6, #C3B8 are changed to #20, #20, #C3B8.
As I see it, Uncrustify should treat every byte that follows #C3 as non-space, regardless of whether the file has a BOM.
An example file using the special characters is attached.
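
For readers who want to check the byte values quoted above, here is a minimal Python sketch. It is not taken from Uncrustify; the helper names `utf8_hex`, `has_utf8_bom` and `is_valid_utf8` are made up for this illustration. It prints the UTF-8 encoding of the three Danish letters, detects a BOM, and shows that the #C320 sequence from Example 1 is no longer valid UTF-8.

```python
import codecs

# The three Danish letters from the report, written as escapes so the
# example does not depend on the encoding of this file itself.
LETTERS = {"a-ring": "\u00e5", "ae": "\u00e6", "o-slash": "\u00f8"}


def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as uppercase hex, e.g. 'C3A5'."""
    return text.encode("utf-8").hex().upper()


def has_utf8_bom(data: bytes) -> bool:
    """Return True if data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for name, ch in LETTERS.items():
    print(name, utf8_hex(ch))

print(has_utf8_bom(codecs.BOM_UTF8 + b"class A {}"))  # file with a BOM
print(has_utf8_bom(b"class A {}"))                    # file without a BOM
print(is_valid_utf8(b"\xc3\xa5"))  # intact 2-byte sequence
print(is_valid_utf8(b"\xc3\x20"))  # sequence reported in Example 1
```

Running a check like `is_valid_utf8` on a file before and after formatting is one quick way to tell whether a tool has damaged multi-byte characters.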

Discussion

Theo Thustrup - 2011-04-13

SkrCensurAllokDBRead C77465.cs

Ben Gardner - 2011-05-05

Fixed in commit 6ed81d3.
However, there is currently no support for UTF-8 multi-byte characters, so using them may cause alignment problems.
This is because Uncrustify assumes that the byte-length of a chunk is the same as the display-length.
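
The byte-length versus display-length point can be illustrated with a small Python sketch. This is not Uncrustify code; `pad_to_column` and the two length functions are hypothetical names, and counting code points is itself only an approximation of display width (it ignores combining and double-width characters).

```python
def byte_length(text: str) -> int:
    """Length of text in UTF-8 bytes (what a byte-oriented tool sees)."""
    return len(text.encode("utf-8"))


def char_length(text: str) -> int:
    """Length of text in code points (closer to what is displayed)."""
    return len(text)


def pad_to_column(text: str, column: int, length_of) -> str:
    """Pad text with spaces so that it reaches `column`, as measured by length_of."""
    return text + " " * max(0, column - length_of(text))


names = ["navn", "st\u00f8rrelse"]  # the second name contains one 2-byte letter

for name in names:
    by_bytes = pad_to_column(name, 12, byte_length)
    by_chars = pad_to_column(name, 12, char_length)
    # With byte-based padding the second name ends up one column short.
    print(repr(by_bytes), len(by_bytes), repr(by_chars), len(by_chars))
```

Each 2-byte character makes byte-based padding fall one column short, which is the kind of alignment drift the comment above warns about.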
