evalScript and Encoding

2017-10-16
2018-02-21
  • Joerg Kreuzberger

    If I use evalScript with a QString and the string contains special characters (e.g. German umlauts), I always get Unicode errors. I am using PythonQt on Linux with the default locale de and UTF-8 encoding.

    This disappears if I use toUtf8() instead of toLatin1().
    Would it be OK to add an overload of evalScript that takes a QByteArray, to let the caller handle the encoding?
    Or am I doing something wrong in general?

     
  • Joerg Kreuzberger

    A QByteArray overload next to the QString one would not work, because both have constructors from const char*. This could be a problem for existing code.
    So it would work to add just a const char* overload, or to make a hard cut (QString replaced with QByteArray).

    I am wondering why I am the only one with this problem. In the other threads I found no clear solution, so I do not know why it affects only me.

    The scenario: we use a QTextEdit to hold the strings. We take the strings out of the QTextEdit, and if they contain special characters (like German umlauts), the conversion with toLatin1() causes the issue.

    Another option would be to use toUtf8() for all code conversions (because Python assumes UTF-8 for all operations, as far as I know) and toLocal8Bit() for all file access conversions.

     
  • Florian Link

    Florian Link - 2017-10-18

    According to https://www.python.org/dev/peps/pep-3120/ this has changed in Python 3, where the default encoding for source is now UTF8, while it used to be Latin-1 and then ASCII.

    So I think we should change the toLatin1() to toUtf8() for Python 3.x. For Python 2.x this would indeed require an extra method that takes QByteArray. This is only a problem if the source is passed via the API, not when it is read from the filesystem, I think, because we use QByteArray there.
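
As a rough illustration of the PEP 3120 behaviour described above, the sketch below feeds the same script text to Python 3's built-in compile() as UTF-8 bytes and as Latin-1 bytes. The script text and the helper name try_compile are made up for this example, and str.encode() only stands in for QString::toUtf8() and QString::toLatin1(); it is not PythonQt code.

```python
# Hypothetical script text containing German umlauts (written with escapes
# so the example does not depend on the encoding of this file).
SCRIPT = "x = 'Gr\u00fc\u00dfe'\n"


def try_compile(source_bytes):
    """Compile and run raw source bytes; return (ok, value_or_error_name)."""
    try:
        code = compile(source_bytes, "<script>", "exec")
    except (SyntaxError, ValueError) as exc:
        # Python 3 assumes UTF-8 when no coding cookie is present, so
        # bytes that are not valid UTF-8 are rejected here.
        return False, type(exc).__name__
    namespace = {}
    exec(code, namespace)
    return True, namespace["x"]


# Stand-in for toUtf8(): accepted, the umlauts survive.
utf8_result = try_compile(SCRIPT.encode("utf-8"))

# Stand-in for toLatin1(): the bytes 0xFC and 0xDF are not valid UTF-8.
latin1_result = try_compile(SCRIPT.encode("latin-1"))

# ascii() keeps the output printable on any console encoding.
print(ascii(utf8_result))
print(ascii(latin1_result))
```

Under these assumptions only the UTF-8 bytes compile on Python 3, which matches the errors reported at the top of the thread.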

     
  • Joerg Kreuzberger

    To avoid ifdefs between Python 2.x and Python 3.x we could use QByteArray in evalScript for both. Or should we have different interfaces for Python 2 and Python 3?

    The major problem with file access is not the content but the file system name.
    The file names are also converted with toLatin1() when the files are opened. In our experience it would be better to use toLocal8Bit(), because it takes the encoding of the environment into account.
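
To make the file name concern concrete, here is a small sketch comparing the bytes of one file name under Latin-1 and UTF-8. The file name is invented, and str.encode() is used only as a stand-in for toLatin1() and for what toLocal8Bit() would be expected to produce in a UTF-8 locale; no file system is touched.

```python
# Hypothetical module file name containing an umlaut.
name = "m\u00fcller.py"

# Stand-in for path.toLatin1(): one byte (0xFC) for the umlaut.
latin1_name = name.encode("latin-1")

# Stand-in for path.toLocal8Bit() in a UTF-8 locale: two bytes (0xC3 0xBC).
utf8_name = name.encode("utf-8")

# The two byte strings differ, so a file stored under the UTF-8 name
# would not be found when it is looked up with the Latin-1 bytes.
names_match = latin1_name == utf8_name

# The Latin-1 bytes are not even valid UTF-8.
try:
    latin1_name.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(latin1_name, utf8_name, names_match, latin1_is_valid_utf8)
```

For pure ASCII names both conversions give identical bytes, which may be why the problem only shows up with special characters.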

    As I mentioned above, you cannot have overloads for both QByteArray and QString without compilation errors for calls like
    evalScript("print(3)"), because the const char* argument makes the overload ambiguous.
    Or we could accept such errors so that users have to be more explicit (constructing a QString or QByteArray explicitly).

     

    Last edit: Joerg Kreuzberger 2017-10-18
  • Florian Link

    Florian Link - 2017-10-18

    I don't want to change the PythonQt interface to use QByteArray, this would break everybody's code.
    I suggest that we do toUtf8() on Py3k inside of PythonQt and this will fix the problem for Python 3.x.
    I don't see a good solution for Python 2.
    Regarding files, this has probably changed with Python 3 as well. I have never worked with Unicode paths, so it might be wrong to use toLatin1() in several places in the PythonQt importer. I will have to read up on whether that has changed in Python 3 as well.

     
  • Joerg Kreuzberger

    PyObject *
    PythonQtImport::compileSource(const QString& path, const QByteArray& data)
    {
      PyObject *code;
      QByteArray data1 = data;
      // in qt4, data is null terminated
      // data1.resize(data.size()+1);
      // data1.data()[data.size()-1] = 0;
      code = Py_CompileString(data.data(), path.toLatin1().constData(),
                              Py_file_input);
      return code;
    }

    The question is what to use here. Normally I would instinctively use path.toLocal8Bit(), but I am not sure how Py_CompileString would handle this.

    Regarding the interface for Python 2:

    The only solution I see is a const char* interface for the Python code instead of QString.
    This is compatible with Python 2 and 3 and, with QString's cast to ASCII enabled, also compatible with the rest of the existing code :-)
    Then the Unicode handling is not your problem any more.
    Converting inside PythonQt is also not right in every case for Python 3: the default encoding is assumed to be UTF-8, but the user can change it in the input with the # -*- coding: latin-1 -*- syntax (other codings are possible), or with PYTHONIOENCODING and other ugly things.
    So if someone sends such a script to PythonQt and PythonQt converts it with toUtf8(), the encoding is broken as well :-). So if you want to get rid of the problem forever, use const char* in all QString interfaces :-))
    If we do it with toUtf8(), it must be mentioned in BIG CAPITAL LETTERS in the docs that we only support UTF-8 encoding.
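
The coding cookie concern can be sketched in plain Python 3 as follows. The script text and the helper run_bytes are invented for illustration, and str.encode("utf-8") again only stands in for a toUtf8() conversion; this is not how PythonQt itself is implemented.

```python
# Hypothetical script that declares Latin-1 as its source encoding.
COOKIE_SCRIPT = "# -*- coding: latin-1 -*-\nx = '\u00e4'\n"


def run_bytes(source_bytes):
    """Compile and execute raw source bytes, returning the value of x."""
    namespace = {}
    exec(compile(source_bytes, "<script>", "exec"), namespace)
    return namespace["x"]


# Bytes match the declared encoding: x is the single character a-umlaut.
as_declared = run_bytes(COOKIE_SCRIPT.encode("latin-1"))

# Bytes forced to UTF-8 while the cookie still says latin-1: the two
# UTF-8 bytes 0xC3 0xA4 are decoded as two separate Latin-1 characters.
forced_utf8 = run_bytes(COOKIE_SCRIPT.encode("utf-8"))

# ascii() keeps the output printable on any console encoding.
print(ascii(as_declared), ascii(forced_utf8))
```

In this sketch the second call does not fail; it silently yields a two-character string, which is the kind of breakage a fixed UTF-8 conversion would need to be documented against.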

     
  • Joerg Kreuzberger

    I have integrated r465 with your UTF-8 changes. It worked out of the box for me. Thank you for implementing this; it reduced the workload and patching for me.

     
