Xnedit without UTF8

L.K.P.
2019-03-27
2019-04-03
  • L.K.P.

    L.K.P. - 2019-03-27

    Hi,

    great, finally an nedit with smooth fonts and UTF8 support. But I'm
    still using a locale with a simple ISO character encoding. Here's a
    small suggestion to be able to use xnedit in a non-UTF8 locale on files
    without extended attributes:

    --- file.c.org 2019-02-24 14:24:48.000000000 +0100
    +++ file.c 2019-03-27 23:15:22.053279131 +0100
    @@ -57,6 +57,7 @@
    #include <unistd.h>
    #include <ctype.h>
    #include <iconv.h>
    +#include <langinfo.h>

    #ifdef VMS
    #include "../util/VMSparam.h"
    @@ -567,6 +568,9 @@
    free(enc_attr);
    enc_attr = NULL;
    }
    + } else {
    + /* file has no extended attributes, use locale charset */
    + encoding = nl_langinfo(CODESET);
    }
    }

    Best regards
    Lothar
    --
    Lothar Paltins lptmp10@arcor.de
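
The fallback in the patch above can be sketched in isolation. This is only an illustration under stated assumptions: `pick_open_encoding` is a hypothetical helper, not a function in XNEdit's file.c, and the caller is assumed to have already read the `user.charset` attribute (passing NULL when the file has none). `nl_langinfo(CODESET)` reports the codeset of the current LC_CTYPE locale, so `setlocale()` must have been called before.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (not part of XNEdit): choose the encoding for a
 * file that is being opened. xattr_charset is the value of the
 * user.charset extended attribute, or NULL if the file has none. */
static const char *pick_open_encoding(const char *xattr_charset)
{
    if (xattr_charset != NULL && xattr_charset[0] != '\0') {
        /* the file carries its own encoding, use it */
        return xattr_charset;
    }

    /* no extended attribute: fall back to the charset of the locale */
    const char *codeset = nl_langinfo(CODESET);
    assert(codeset != NULL); /* an unknown item yields "", never NULL */
    return codeset;
}
```

A caller would run `setlocale(LC_ALL, "")` once at startup and then call `pick_open_encoding(NULL)` for files without attributes. The string returned by `nl_langinfo` may be overwritten by later calls, so it should be copied if it is kept.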

     
  • Pyrphoros

    Pyrphoros - 2019-03-28

    Hello and thanks for the patch. This looks like a good idea and I have applied the changes with some modifications.

    I think there are still problems when not using a UTF8 locale. If you create a new file and save it, the default encoding is still UTF8. The next time you start XNEdit and want to open that file, you have to manually select the UTF8 encoding, because "detect" is not that powerful.

    Should I just implement that the default charset on Save is the locale charset?

     

    Last edit: Pyrphoros 2019-03-28
    • L.K.P.

      L.K.P. - 2019-03-28

      Hi,

      I think there are still problems when not using a UTF8 locale. If you
      create a new file and save it, the default encoding is still UTF8. The
      next time you start XNEdit and want to open that file, you have to
      manually select the UTF8 encoding, because "detect" is not that powerful.

      what I have noticed is that xnedit sets the extended file attribute
      user.charset even if this wasn't specified in the "Save as" dialog.

      Should I just implement that the default charset on Save is the locale
      charset?

      That's already the case with my patch and I think, that's the correct
      behavior. This is also the case if a file is opened using the open
      dialog. The question is, what to do with new windows or files specified
      on the command line? I think, here the default should also be the locale
      charset but until now there's no way to specify a different encoding here.

      But I've found another issue in the implementation of doSave in file.c.
      A return value of -1 of iconv was always considered a conversion error.
      But iconv also returns -1 if only the output buffer is full, therefore
      every 2048th character was skipped. I've stumbled on it while I edited
      an Xdefaults file and xnedit changed "shiftLeft" to "shitLeft" in the
      saved file. :-)

      I've attached another patch for file.c. Please take a look at it, maybe
      I took into account too many error conditions. But I think, it's better
      to consider errors that could never occur than to forget a possible
      error. I think, it's also better to give the user the chance to abort
      the saving of a file in case of conversion errors than to show only a
      warning message.
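
The E2BIG case described above can be handled roughly as follows. This is a minimal sketch, not the attached patch and not XNEdit's doSave: `convert_and_write` and its 2048-byte buffer are assumptions made for illustration. The point is that a return value of -1 with `errno == E2BIG` only means the output buffer is full, so the converted bytes are written out and the loop continues, while EILSEQ and EINVAL are reported to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: convert inlen bytes of text with cd and write the
 * result to fp. Returns 0 on success, -1 on a conversion or write error. */
static int convert_and_write(iconv_t cd, const char *text, size_t inlen, FILE *fp)
{
    assert(text != NULL || inlen == 0);

    char outbuf[2048];
    char *in = (char *)text;   /* iconv() takes a non-const char ** */
    size_t inleft = inlen;
    int flushing = 0;          /* set once all input has been consumed */

    for (;;) {
        char *out = outbuf;
        size_t outleft = sizeof(outbuf);

        /* a NULL inbuf asks iconv to emit any pending shift sequence */
        size_t rc = iconv(cd, flushing ? NULL : &in, &inleft, &out, &outleft);
        int err = errno;       /* save before fwrite can change it */

        size_t produced = sizeof(outbuf) - outleft;
        if (produced > 0 && fwrite(outbuf, 1, produced, fp) != produced) {
            return -1;         /* write error */
        }

        if (rc == (size_t)-1) {
            if (err == E2BIG && produced > 0) {
                continue;      /* output buffer was full: not an error */
            }
            return -1;         /* EILSEQ, EINVAL, or no progress at all */
        }
        if (flushing) {
            return 0;          /* input converted and state flushed */
        }
        flushing = 1;          /* success means all input was consumed */
    }
}
```

Whether a conversion error should abort the save or only trigger a warning is left to the caller here; the sketch just returns -1 so that the user can be asked.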

      Best regards
      Lothar
      --
      Lothar Paltins lptmp10@arcor.de

       
      • Pyrphoros

        Pyrphoros - 2019-03-29

        Thanks a lot. Not only for finding this bug, but also for the patch. The error handling is really nice. Based on that I also changed the encoding conversion in doOpen.

        what I have noticed is that xnedit sets the extended file attribute user.charset even if this wasn't specified in the "Save as" dialog.

        I can't reproduce this, but maybe the GUI just tricked you. If you select a non-unicode encoding, the checkbox is automatically enabled. The intention was that a user can select any encoding he wants and the next time he opens the file, the right encoding is chosen. Maybe I should make this behavior configurable for people who don't like extended attributes as much as I do.

        Should I just implement that the default charset on Save is the locale charset?

        That's already the case with my patch

        Not completely. When you open a file and then do Save As, the correct encoding is pre-selected. But when you just open xnedit and save the file, UTF-8 is selected. In this regard, your patch only changes doOpen.

        Best regards
        Olaf

         
        • L.K.P.

          L.K.P. - 2019-03-30

          Hi Olaf,

          I sent a reply by mail to the list, but it didn't appear here, so I'm entering it again through the web interface. Please ignore it if the same message appears again later.

          Thanks a lot. Not only for finding this bug, but also for the patch. The
          error handling is really nice. Based on that I also changed the encoding
          conversion in doOpen.

          what you also should check is the behavior with UTF-8 encoding, when
          strconv = copyBytes. The last call to iconv() should be one with inbuf
          or *inbuf equal to NULL, in order to flush out any partially converted
          input (original comment from the man page). Therefore I'm calling
          strconv again after inleft is 0 with in set to NULL. I didn't test it,
          but if strconv == copyBytes, then memcpy is called with *inbuf == NULL.
          It may work correctly, but the man page doesn't specify what happens in
          this case, and it would be safer if copyBytes did not call memcpy
          when *inbuf is NULL.
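
A pass-through converter with that guard could look like the sketch below. It is not XNEdit's copyBytes: `copy_bytes` is a hypothetical stand-in that assumes the same argument list as `iconv()`, so it can be called through the same function pointer. A NULL `inbuf` or `*inbuf` is treated as a flush request and returns before `memcpy` is reached.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical iconv()-compatible pass-through for text that needs no
 * conversion. Copies as many bytes as fit into the output buffer. */
static size_t copy_bytes(iconv_t cd, char **inbuf, size_t *inleft,
                         char **outbuf, size_t *outleft)
{
    (void)cd; /* unused: there is no conversion state */

    /* flush request: nothing is buffered, so do nothing and
     * never hand a NULL source pointer to memcpy */
    if (inbuf == NULL || *inbuf == NULL) {
        return 0;
    }
    assert(inleft != NULL && outbuf != NULL && *outbuf != NULL && outleft != NULL);

    size_t n = *inleft < *outleft ? *inleft : *outleft;
    memcpy(*outbuf, *inbuf, n);
    *inbuf += n;
    *inleft -= n;
    *outbuf += n;
    *outleft -= n;

    if (*inleft > 0) {
        errno = E2BIG; /* same signal iconv gives for a full output buffer */
        return (size_t)-1;
    }
    return 0;
}
```

Reporting a full output buffer as E2BIG keeps the caller's loop identical for both the iconv and the pass-through case.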

          what I have noticed is that xnedit sets the extended file attribute
          user.charset even if this wasn't specified in the "Save as" dialog.
          

          I can't reproduce this, but maybe the GUI just tricked you. If you
          select a non-unicode encoding, the checkbox is automatically enabled.

          Yes, I didn't notice that "Store encoding in extended attribute" was
          checked automatically after I selected ISO8859-15.

          The intention was that a user can select any encoding he wants and the
          next time he opens the file, the right encoding is chosen. Maybe I
          should make this behavior configurable for people who don't like
          extended attributes as much as I do.

          Yes, it's a nice feature, but only if user_xattr is enabled.

              Should I just implement that the default charset on Save is the
              locale charset?
          
          That's already the case with my patch
          

          Not completely. When you open a file and then do Save As, the correct
          encoding is pre-selected. But when you just open xnedit and save the
          file, UTF-8 is selected. In this regard, your patch only changes doOpen.

          Yes, of course, I patched only doOpen. But I think, the encoding of the
          locale should be the default for all operations. This is by far the most
          important use case. Users will most likely have text files only in the
          encoding of the locale. Converting to another encoding should be optional.

          But yet another problem. All dialogs seem to work correctly only with
          UTF-8 coded text. All text after a non-UTF-8 character is swallowed. For
          example, I'm using the ISO8859-15 encoding and there's a directory named
          "Geräte". But in the Open File dialog it's shown as "Ger". It's also not
          possible to enter non-ASCII ISO8859 characters in the Replace dialog
          window. I didn't look deeper into it, but this could be a Motif issue.

          Lothar

           
          • L.K.P.

            L.K.P. - 2019-03-31

            The problem with the dialog boxes is really a Motif issue. It occurs with Motif-2.3.4, but not with the older Motif-2.1.32 that doesn't have scalable font support. But dialog boxes with bitmapped X11 fonts are better than boxes with nice looking but broken text.

            Two ideas relating to the character encoding:

            You are storing the encoding internally as a property of the window. Why not handle it in this way also externally for the user as an option of a window resp. tab without the selection boxes in the open and save dialogs? You could set the "external" encoding of a window and this would determine the codeset conversion for an open into and a save out of this window.

            Or even simpler, you could follow the classic Unix philosophy "one task, one tool". The job of an editor is to edit text and not to convert encodings between text files. That's the job of the iconv program. I could live with an editor that assumes that all text files are using the encoding of the locale.

            Lothar

             
            • Pyrphoros

              Pyrphoros - 2019-03-31

              I wanted to handle encodings like other editors do it, therefore selecting encodings on open/save is really necessary.

              However, I have changed the default encoding for windows. It is now initialized with the locale encoding. Therefore everything should be fine with non-UTF8 locales.

               
  • Pyrphoros

    Pyrphoros - 2019-03-31

    But yet another problem. All dialogs seem to work correctly only with UTF-8 coded text. All text after a non-UTF-8 character is swallowed. For example, I'm using the ISO8859-15 encoding and there's a directory named "Geräte". But in the Open File dialog it's shown as "Ger". It's also not possible to enter non-ASCII ISO8859 characters in the Replace dialog window. I didn't look deeper into it, but this could be a Motif issue.

    Same on my system. The problem also exists in nedit and other motif applications. Turns out XSupportsLocale returns false. After that the locale is set to C and only ASCII input works.

    I have not figured out yet why the locale is unsupported.

     
    • L.K.P.

      L.K.P. - 2019-03-31

      That's not the case here. XSupportsLocale returns true also with Motif-2.3.4 but nevertheless it doesn't work. I think it's because the scalable font handling in newer Motif versions supports only UTF-8 coded text.

       
  • Dusan Peterc

    Dusan Peterc - 2019-03-31

    I can confirm your observation that XFT fonts in Motif only work with UTF-8 encoding.
    In my opinion the program should only use UTF-8 internally, and do the conversion on loading and saving to/from other encodings, if necessary. This is what I did in my other Motif programs and it significantly reduced code complexity. But it is probably more difficult for xnedit, if it wishes to keep compatibility with older platforms, which do not support XFT.

     
  • Pyrphoros

    Pyrphoros - 2019-04-03

    I'm using UTF-8 internally in the text widget, because everything else is practically impossible. However, that led to another problem when using non-UTF8 locales. Input in the search dialog is not UTF-8 encoded, therefore it was impossible to search for non-ASCII characters. I have fixed that now, so all search related text input will be converted to UTF-8.
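
Converting dialog input to UTF-8 before searching can be illustrated with plain iconv. The following is only a sketch and not the code in XNEdit: `to_utf8` is a hypothetical helper, and the output buffer of four bytes per input byte is an assumed upper bound (if it turns out too small, the conversion is treated as failed).

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd UTF-8 copy of text, which is
 * encoded in from_codeset. Returns NULL on any failure. */
static char *to_utf8(const char *text, const char *from_codeset)
{
    assert(text != NULL && from_codeset != NULL);

    iconv_t cd = iconv_open("UTF-8", from_codeset);
    if (cd == (iconv_t)-1) {
        return NULL;               /* unknown or unsupported codeset */
    }

    size_t inleft = strlen(text);
    size_t cap = inleft * 4 + 1;   /* assumed bound, plus terminator */
    char *result = malloc(cap);
    if (result == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in = (char *)text;       /* iconv() takes a non-const char ** */
    char *out = result;
    size_t outleft = cap - 1;

    size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
    if (rc != (size_t)-1) {
        /* flush any pending state of a stateful source encoding */
        rc = iconv(cd, NULL, NULL, &out, &outleft);
    }
    iconv_close(cd);

    if (rc == (size_t)-1) {
        free(result);              /* invalid input or buffer too small */
        return NULL;
    }
    *out = '\0';
    return result;
}
```

In a non-UTF8 locale the source codeset would typically be the value of `nl_langinfo(CODESET)`, for example `to_utf8(input, nl_langinfo(CODESET))`.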

    Forcing UTF8 for everything by switching the locale for XNEdit would be the easiest option, however then you would have problems with differently encoded file names.

    I think with disabled XFT for Motif widgets everything should work now. I'll do some more testing and then release an update.

     
