Add Encoding of python source code files to comment lines.

Brought to you by: ahasuerus_isfdb, alvonruff, mkupper

#205 Add Encoding of python source code files to comment lines.

Milestone: Approved

Status: open

Owner: nobody

Labels: None

Priority: 5

Updated: 2022-04-27

Created: 2022-04-18

Creator: Klaus Elsbernd

Private: No

Well, not a real issue, but it may be needed in the future.
I (not accidentally) checked out Subversion files via svn update using the non-US locale de_DE.UTF-8.
That is not a problem in itself, but files changed in March get a revision timestamp in the comment lines of the Python code via the Subversion keyword $Date. The date added to the code contains "März" (German for March).
This confuses Python, which fails to execute the file, complaining about illegal characters ('ä') in the comment lines.

Python ... SyntaxError: Non-ASCII character '\xc3' in file ...

Subversion internally stores all files in UTF-8 (to my knowledge).
PEP 263 (https://peps.python.org/pep-0263/) suggests adding a comment line to the header comments of the file that declares the file's encoding:

# coding: UTF-8

or similar.
This works with Python 2.7.18 and resolves the problem.
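To see which files would need such a line, one could check whether a declaration is already present. The sketch below is a minimal, hypothetical helper (the names `declared_encoding` and `CODING_RE` are illustrative, not part of ISFDB). It applies the regular expression given in PEP 263 to the first two lines of a file, since PEP 263 says the magic comment must be on the first or second line. It is written for Python 3 and is only an approximation of the interpreter's own check.

```python
import re
import sys

# Regular expression from PEP 263 for recognising an encoding declaration,
# applied to raw bytes so that undecodable files can still be checked.
CODING_RE = re.compile(br"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source):
    """Return the encoding declared in the first two lines of `source` (bytes), or None."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None


if __name__ == "__main__":
    # Print every file named on the command line that has no declaration.
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            if declared_encoding(handle.read()) is None:
                print(path)
```

For example, `declared_encoding(b"#!/usr/bin/env python\n# coding: UTF-8\n")` returns `"UTF-8"`, while a declaration on the third line is ignored, matching the placement rule in PEP 263.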

Discussion

Ahasuerus - 2022-04-18

Ticket moved from /p/isfdb/feature-requests/1498/


Ahasuerus - 2022-04-27

At one point a couple of Python files contained encoding information, which allowed developers to use non-ASCII characters in comments. However, it also allowed non-ASCII characters to be added to the Python code itself, which could cause complications when invisible characters were copy-pasted into the body of a module. Conversely, as long as there is no encoding statement at the top of a source code file, Python won't accept non-ASCII characters in it and will tell you where the problem is.
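Python's error points at the first offending character only. A small scanner can list every non-ASCII character in a file along with its Unicode name, which makes invisible characters visible before the file is ever run. This is a minimal sketch (Python 3; `find_non_ascii` and `scan_file` are hypothetical names, and the file is assumed to be readable as UTF-8). It splits on `"\n"` rather than using `splitlines()`, because `splitlines()` also treats some non-ASCII characters such as U+2028 as line breaks and would silently swallow them.

```python
import io
import unicodedata


def find_non_ascii(text):
    """Yield (line, column, character, Unicode name) for each non-ASCII character.

    Line and column numbers are 1-based.
    """
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield lineno, col, ch, unicodedata.name(ch, "<unnamed>")


def scan_file(path, encoding="utf-8"):
    """Print every non-ASCII character found in the file at `path`."""
    with io.open(path, encoding=encoding) as handle:
        for lineno, col, ch, name in find_non_ascii(handle.read()):
            print("%s:%d:%d: U+%04X %s" % (path, lineno, col, ord(ch), name))
```

A line such as `b = 'März'` on line 2 would be reported as line 2, column 7, `U+00E4 LATIN SMALL LETTER A WITH DIAERESIS`.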


Klaus Elsbernd - 2022-04-27

I readily agree with these points.
Nevertheless, as far as I know, it is the svn update command that inserts the non-ASCII characters into the file header as the $Date data. If I use the command in the future, I'll change the locale for the checkout to en_US.
Remark: But why shouldn't UTF-8 characters be allowed in the text strings of the Python code? The SVN repository is based on UTF-8.
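If the checkout is scripted, the locale change can be confined to the svn process itself instead of the whole shell session. The sketch below is an assumption-laden illustration (Python 3; `english_locale_env` and `svn_update` are hypothetical helpers). It assumes that svn takes its locale from the environment, as the de_DE.UTF-8 behaviour described above suggests, and that the `en_US.UTF-8` locale is installed; `LC_ALL` is used because it overrides `LANG` and the individual `LC_*` variables.

```python
import os
import subprocess


def english_locale_env(base=None, locale_name="en_US.UTF-8"):
    """Return a copy of `base` (default: os.environ) with LC_ALL forced to `locale_name`."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = locale_name
    return env


def svn_update(path="."):
    """Run `svn update` on `path` under an English locale, so that (assuming
    svn honours LC_ALL) $Date expansions use English month and day names."""
    subprocess.check_call(["svn", "update", path], env=english_locale_env())
```

Only the child process sees the changed locale; the caller's own environment is left untouched.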


Ahasuerus - 2022-04-27

Many Unicode characters look the same but have different numeric values. For example, there are a lot of different "space" characters in Unicode, multiple characters for different currency signs and even digits. When copy-pasting code from another file or online, it's easy to miss the fact that you are getting the wrong character. Disallowing Unicode characters in the source code means that the developer will need to provide their numeric equivalents, which are less likely to be in error. (And if a numeric value is mistyped, a different, unexpected, character will be displayed, which will make it easier to debug.)
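To illustrate the point about look-alike characters: a regular space and a no-break space render identically but compare unequal, and printing code points with their Unicode names exposes the difference. The sketch also shows one way, assuming Python string literals, to supply a numeric equivalent: a `\u` escape keeps the source file pure ASCII. It is written for Python 3; `MONTH` and `describe` are hypothetical names used only for illustration.

```python
import unicodedata

# "März" written with an ASCII-only escape sequence instead of a literal
# non-ASCII character: \u00e4 is LATIN SMALL LETTER A WITH DIAERESIS.
MONTH = u"M\u00e4rz"


def describe(text):
    """Return 'U+XXXX NAME' for each character, making look-alikes visible."""
    return ["U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>")) for ch in text]


# A regular space and a no-break space look identical on screen,
# but they are different characters and compare unequal.
print(u" " == u"\u00a0")  # False
print(describe(u"1\u00a02"))
# ['U+0031 DIGIT ONE', 'U+00A0 NO-BREAK SPACE', 'U+0032 DIGIT TWO']
```

A mistyped escape such as `\u00e5` would produce "å" instead of "ä", which is visibly wrong on the page and therefore easy to track down.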
