PythonQt / Discussion / Open Discussion: evalScript and Encoding

Joerg Kreuzberger - 2017-10-16

If I use evalScript with a QString that contains special characters (e.g. German umlauts), I always get Unicode errors. I am using PythonQt on Linux with the default locale de and UTF-8 encoding.

This disappears if I use toUtf8() instead of toLatin1().
Would it be OK to add an overload of evalScript that takes a QByteArray, to let the caller handle the encoding?
Or am I doing something wrong in general?
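
For illustration (not part of the original post), here is a minimal sketch of why the Latin-1 bytes of an umlaut can trigger Unicode errors when the receiver expects UTF-8. It uses only the C++ standard library; `encodeLatin1`, `encodeUtf8` and `looksLikeUtf8` are hypothetical stand-ins for what QString::toLatin1(), QString::toUtf8() and a UTF-8 decoder do, not PythonQt or Qt code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for QString::toLatin1(): one byte per code point,
// '?' for anything above U+00FF.
std::string encodeLatin1(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text)
        out.push_back(cp <= 0xFF ? static_cast<char>(cp) : '?');
    return out;
}

// Stand-in for QString::toUtf8(): plain UTF-8 encoding
// (surrogates and values above U+10FFFF are not checked in this sketch).
std::string encodeUtf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// Structural check only: each lead byte must be followed by the right
// number of continuation bytes. Overlong forms are not rejected.
bool looksLikeUtf8(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;
        if (b < 0x80) extra = 0;
        else if ((b & 0xE0) == 0xC0) extra = 1;
        else if ((b & 0xF0) == 0xE0) extra = 2;
        else if ((b & 0xF8) == 0xF0) extra = 3;
        else return false;                      // stray continuation byte
        if (i + extra >= bytes.size()) return false;  // sequence cut off
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// Usage sketch: the same script text, encoded both ways.
void demoUmlaut() {
    const std::u32string script = U"print('\u00E4')";
    assert(!looksLikeUtf8(encodeLatin1(script)));  // 0xE4 alone is not valid UTF-8
    assert(looksLikeUtf8(encodeUtf8(script)));     // 0xC3 0xA4 is
}
```

The Latin-1 encoding turns "ä" into the single byte 0xE4, which a UTF-8 decoder reads as the start of a three-byte sequence and then rejects.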

Joerg Kreuzberger - 2017-10-18

A QByteArray overload cannot sit alongside the QString one, because both classes have constructors from const char*. This could be a problem for existing code.
So it would work to have just a const char* overload, or to make a hard cut (replace QString with QByteArray).

I am wondering why I seem to be the only one with this problem. I found no clear solution in the other threads.

The scenario: we use a QTextEdit to store the strings. We take the strings out of the QTextEdit, and if they contain special characters (like German umlauts), the conversion with toLatin1() causes the issue.

Another option would be to use toUtf8() for all code conversions (because Python assumes UTF-8 for all operations, as far as I know) and toLocal8Bit() for all file access conversions.

Florian Link - 2017-10-18

According to https://www.python.org/dev/peps/pep-3120/ this has changed in Python 3, where the default encoding for source code is now UTF-8, while it used to be Latin-1 and then ASCII.

So I think we should change the toLatin1() to toUtf8() for Python 3.x. For Python 2.x this would indeed require an extra method that takes QByteArray. This is only a problem if the source is passed via the API, not when it is read from the filesystem, I think, because we use QByteArray there.

Joerg Kreuzberger - 2017-10-18

To avoid #ifdefs between Python 2.x and Python 3.x, we could use QByteArray in evalScript for both. Or should we have different interfaces for Python 2 and Python 3?

The major problem with file access is not the content but the file system name.
The file names are also converted with toLatin1() before opening. In our experience it would be better to use toLocal8Bit(), because it takes the encoding of the environment into account.

As I mentioned above, you cannot have overloads for both QByteArray and QString without compilation errors for calls such as
evalScript("print(3)"), because the const char* argument makes the overload ambiguous.
Or we could accept such errors so that users have to be explicit (constructing a QString or QByteArray explicitly).
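
The ambiguity described above can be reproduced without Qt. The following is a minimal sketch under stated assumptions: `FakeString` and `FakeByteArray` are hypothetical stand-ins that share only one property with QString and QByteArray, namely an implicit constructor from const char* (in Qt this holds for QString unless QT_NO_CAST_FROM_ASCII is defined). The free function `evalScript` here is likewise illustrative and is not the PythonQt signature.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins: both are implicitly constructible from const char*.
struct FakeString {
    FakeString(const char* s) : text(s) {}
    std::string text;
};
struct FakeByteArray {
    FakeByteArray(const char* s) : bytes(s) {}
    std::string bytes;
};

// Each overload returns a label so the selected one can be observed.
const char* evalScript(const FakeString&)    { return "string"; }
const char* evalScript(const FakeByteArray&) { return "bytes"; }

// With only the two overloads above, evalScript("print(3)") does not compile:
// each candidate needs one user-defined conversion, so neither is better.
// An overload taking const char* is an exact match and removes the ambiguity.
const char* evalScript(const char*) { return "literal"; }

// Usage sketch: which overload each kind of call selects.
void demoOverloads() {
    assert(std::string(evalScript("print(3)")) == "literal");
    assert(std::string(evalScript(FakeString("print(3)"))) == "string");
    assert(std::string(evalScript(FakeByteArray("print(3)"))) == "bytes");
}
```

Removing the const char* overload from this sketch makes the first call in `demoOverloads` fail to compile, which is the breakage for existing callers mentioned in the post.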

Last edit: Joerg Kreuzberger 2017-10-18

Florian Link - 2017-10-18

I don't want to change the PythonQt interface to use QByteArray, this would break everybody's code.
I suggest that we do toUtf8() on Py3k inside of PythonQt and this will fix the problem for Python 3.x.
I don't see a good solution for Python 2.
Regarding files, this has probably changed with Python 3 as well. I have never worked with Unicode paths, so it might be wrong to use toLatin1() in several places in the PythonQt importer. I will have to read up on whether that has changed in Python 3 too.

Joerg Kreuzberger - 2017-10-18

    PyObject *
    PythonQtImport::compileSource(const QString& path, const QByteArray& data)
    {
      PyObject *code;
      QByteArray data1 = data;
      // in qt4, data is null terminated
      // data1.resize(data.size()+1);
      // data1.data()[data.size()-1] = 0;
      code = Py_CompileString(data.data(), path.toLatin1().constData(),
                              Py_file_input);
      return code;
    }

The question is what to use here. Normally I would intentionally use path.toLocal8Bit(), but I am not sure how Py_CompileString would handle this.

Regarding the interface for Python 2:

The only solution I see would be a const char* interface for the Python code instead of QString.
This is compatible with Python 2 and 3 and, since QString's cast to ASCII is enabled, also compatible with the rest of the existing code :-)
Then the Unicode handling is not your problem any more.
Handling this inside PythonQt is also wrong in general for Python 3, because the default encoding is only assumed to be UTF-8; the user can change it in the script with the # -*- coding: latin-1 -*-
syntax (other codings are possible), or via PYTHONIOENCODING and other ugly things.
So if someone sends such a script to PythonQt and PythonQt converts it to UTF-8, the encoding is broken as well :-). So if you want to get rid of the problem forever, use const char* in all QString interfaces :-))
If we do it with toUtf8(), the documentation must state in BIG CAPITAL LETTERS that we only support UTF-8 encoding.
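
To make the point about the coding line concrete, here is a minimal sketch (not PythonQt code) that looks for a PEP 263 style encoding declaration in the first two lines of a script. `declaredSourceEncoding` is a hypothetical helper; the pattern is the one given in PEP 263, and the sketch simplifies the rule by ignoring a byte order mark and by not checking that the first line is a comment or blank before looking at the second.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>

// Returns the encoding named by a PEP 263 declaration in the first two
// lines of `source`, or "utf-8" (the Python 3 default) if there is none.
std::string declaredSourceEncoding(const std::string& source) {
    // Pattern from PEP 263; \t and \f are real tab / form feed characters here.
    static const std::regex cookie(
        "^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)");
    std::istringstream lines(source);
    std::string line;
    for (int i = 0; i < 2 && std::getline(lines, line); ++i) {
        std::smatch match;
        if (std::regex_search(line, match, cookie))
            return match[1].str();
    }
    return "utf-8";
}

// Usage sketch: a script that declares Latin-1 versus one that declares nothing.
void demoCookie() {
    assert(declaredSourceEncoding("# -*- coding: latin-1 -*-\nprint(3)\n") == "latin-1");
    assert(declaredSourceEncoding("print(3)\n") == "utf-8");
}
```

A caller that always hands UTF-8 bytes to the interpreter would be at odds with the first script, since its bytes would then be decoded as Latin-1. That is why a UTF-8-only conversion needs to be documented clearly.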

Joerg Kreuzberger - 2018-02-21

I have integrated r465 with your utf8 changes. Worked out of the box for me. Thank you for implementing, this reduced workload and patching for me.

Florian Link - 2018-02-21

We finally switched MeVisLab to Python 3, so more effort has gone/will go into polishing/fixing stuff for Python 3.
  