Well, not a real issue, but it may be needed in the future.
I (not accidentally) checked out the Subversion files via svn update using the non-US locale de_DE.UTF-8.
That in itself is no problem, but files changed in March get a revision timestamp, via the Subversion keyword $Date, in the comment lines of the Python code. The date added to the code contains "März" (German for March).
This confuses Python, which refuses to execute the file and complains about an illegal character ('ä') in the comment lines.
Python ... SyntaxError: Non-ASCII character '\xc3' in file ...
Subversion internally stores all files in UTF-8 (to my knowledge).
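The '\xc3' in the error message is consistent with the date being written in UTF-8: the letter 'ä' (U+00E4) is encoded as the two bytes C3 A4, and Python 2 reports the first non-ASCII byte it meets. A small Python 3 check, assuming the expanded keyword really is UTF-8 as stated above, shows the byte sequence; a Latin-1 encoded file would have produced '\xe4' instead.

```python
# The month name from the report, written with an escape so that this
# snippet itself contains only ASCII characters.
MONTH = "M\u00e4rz"

utf8_bytes = MONTH.encode("utf-8")
latin1_bytes = MONTH.encode("latin-1")

# Iterating over bytes yields ints; pick the first one outside 0..127.
# In UTF-8 this is 0xC3, the lead byte of the two-byte sequence for 'ä'.
first_non_ascii = next(b for b in utf8_bytes if b > 0x7F)

print(utf8_bytes)            # b'M\xc3\xa4rz'
print(latin1_bytes)          # b'M\xe4rz'
print(hex(first_non_ascii))  # 0xc3
```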
PEP 263 (https://peps.python.org/pep-0263/) suggests adding a comment line to the header comments of the file that declares the file's encoding (it must be on the first or second line):
# coding: UTF-8
or similar.
This works with Python 2.7.18 and resolves the problem.
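For illustration, Python 3's standard tokenize.detect_encoding() performs the PEP 263 lookup: it only examines the first two lines, and without a declaration it falls back to UTF-8 (Python 2 fell back to ASCII, which is why the undeclared file failed there). The helper name declared_encoding is hypothetical; the sample sources are made up for the sketch.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding Python 3 infers for `source`.

    Only a declaration on line 1 or 2 counts; otherwise the
    Python 3 default (UTF-8) is returned.
    """
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


with_cookie = b"# coding: UTF-8\nx = 1\n"
shebang_then_cookie = b"#!/usr/bin/env python\n# -*- coding: latin-1 -*-\n"
cookie_too_late = b"#!/usr/bin/env python\n# comment\n# coding: latin-1\n"

print(declared_encoding(with_cookie))          # utf-8
print(declared_encoding(shebang_then_cookie))  # iso-8859-1
print(declared_encoding(cookie_too_late))      # utf-8 (line 3 is ignored)
```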
Anonymous
Ticket moved from /p/isfdb/feature-requests/1498/
At one point a couple of Python files contained encoding information, which allowed developers to use non-ASCII characters in comments. However, it also allowed non-ASCII characters to be added to the Python code itself, which could cause complications when invisible characters were copy-pasted into the body of a module. Conversely, as long as there is no encoding statement at the top of a source code file, Python won't let you use non-ASCII characters and will tell you where the problem is.
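Note that this safeguard is specific to Python 2: Python 3 assumes UTF-8 source by default (PEP 3120), so a missing encoding statement no longer rejects non-ASCII characters. If the ASCII-only policy should survive a move to Python 3, a separate check could report the offending positions. The sketch below assumes such a check runs over the raw file bytes; find_non_ascii is a hypothetical helper, not part of the project.

```python
def find_non_ascii(source: bytes):
    """Yield (line, column, byte) for every non-ASCII byte in `source`.

    Lines and columns are 1-based; columns count bytes, not characters,
    so a two-byte UTF-8 character is reported twice.
    """
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 0x7F:
                yield lineno, col, byte


# Hypothetical file whose second line holds the UTF-8 encoded month name.
sample = b"x = 1\n# M\xc3\xa4rz\ny = 2\n"
for lineno, col, byte in find_non_ascii(sample):
    print(f"line {lineno}, column {col}: non-ASCII byte 0x{byte:02x}")
```

A commit hook or test could simply fail whenever the generator yields anything.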
I readily agree with these points.
Nevertheless, as far as I know, it is the
svn update
command that inserts the non-ASCII characters into the file header as the $Date data. In the future I'll switch the locale to en_US for the checkout whenever I use that command.
Rem.: But why shouldn't UTF-8 characters be allowed in the text strings of the Python code? The SVN repository is based on UTF-8.
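The locale switch can be scripted. This is a sketch assuming, as the report suggests, that svn takes the month name in the expanded $Date keyword from the process locale, and that the en_US.UTF-8 locale is installed (the always-available "C" locale is an alternative). In a shell the equivalent is prefixing the command with LC_ALL=en_US.UTF-8. The function names are hypothetical.

```python
import os
import subprocess


def svn_update_env(base_env=None, locale_name="en_US.UTF-8"):
    """Return a copy of the environment with LC_ALL forced to `locale_name`.

    Assumption: forcing an English locale makes svn write ASCII month
    names when it expands $Date. The input mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LC_ALL"] = locale_name  # LC_ALL overrides LANG and the other LC_* variables
    return env


def svn_update(working_copy="."):
    """Run `svn update` on `working_copy` under the English locale."""
    subprocess.run(["svn", "update", working_copy], env=svn_update_env(), check=True)
```

Calling svn_update() from inside the checkout would then replace a plain svn update.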
Many Unicode characters look the same but have different numeric values. For example, there are a lot of different "space" characters in Unicode, multiple characters for different currency signs and even digits. When copy-pasting code from another file or online, it's easy to miss the fact that you are getting the wrong character. Disallowing Unicode characters in the source code means that the developer will need to provide their numeric equivalents, which are less likely to be in error. (And if a numeric value is mistyped, a different, unexpected, character will be displayed, which will make it easier to debug.)
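As an illustration of the "numeric equivalents" approach, Python's backslashreplace error handler turns any non-ASCII character into an escape sequence, and the escaped literal denotes the same string while keeping the source file ASCII. The helper to_ascii_escapes is hypothetical. In Python 2, \u escapes only work inside unicode literals (u"...").

```python
import unicodedata


def to_ascii_escapes(text: str) -> str:
    """Rewrite non-ASCII characters as Python escape sequences.

    Characters below U+0100 become \\xNN; higher ones become \\uNNNN
    (or \\UNNNNNNNN outside the Basic Multilingual Plane).
    """
    return text.encode("ascii", "backslashreplace").decode("ascii")


# Four characters that all render as some kind of blank in most fonts;
# their numeric values tell them apart at a glance.
for ch in [" ", "\u00a0", "\u2009", "\u3000"]:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

print(to_ascii_escapes("M\u00e4rz"))  # M\xe4rz
```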