From: Paul K <pau...@ya...> - 2012-06-11 19:53:22
Hi All,

I've been working on a wxLua-based IDE (https://github.com/pkulchenko/ZeroBraneStudio/) and have a bug opened about dealing with UTF-8 encoding (https://github.com/pkulchenko/ZeroBraneStudio/issues/7). I've captured the relevant information in the ticket. To summarize: I seem to be correctly setting both the codepage (SetCodePage(wxstc.wxSTC_CP_UTF8)) and the encoding (wxFONTENCODING_UNICODE), but the text is still shown as garbled single-byte content (there are some examples in the ticket), even though the editor seems to recognize that these are two-byte characters (it doesn't allow positioning the cursor in the middle of any Unicode character).

Also, when I set wxFONTENCODING_UTF8, I get "No font for displaying text in encoding 'Unicode 8 bit (UTF-8)' found.", and even after I select "Lucida Sans Unicode", which is reported to be a Unicode font, I still have the issue above.

I also found this message from John in this thread (http://comments.gmane.org/gmane.comp.lib.wxwidgets.wxlua.user/2636): "They probably compiled it in ANSI mode and not Unicode. Note however that Lua is strictly ANSI only." Does this mean that the binaries I'm using need to be compiled with some other configuration (Unicode mode)? Note that in my case I'm not manipulating UTF-8 strings; I just want them to be displayed correctly in the editor. Selecting File | Encoding | UTF-8 in SciTE from Lua4Windows displays the same text correctly, which confirms that this is not a font issue.

What am I doing wrong here? Is there *any* wxLua-based application working on Windows with UTF-8 encoded text (with or without BOM)? I've tested with both 2.8.7 and 2.8.10 (on Windows Vista) with the same result. wxLua and wxLuaEdit also show the same behavior.

Paul.
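For illustration, a minimal plain-Lua sketch (no wxLua calls; the helper name hex_bytes is hypothetical) of what a one-byte-per-character view of UTF-8 text actually works with. The UTF-8 encoding of "é" (U+00E9) is the two bytes C3 A9; in Windows-1252 those bytes display as "Ã" and "©", which is the kind of garbled single-byte output described above.

```lua
-- Hypothetical helper: render each byte of a string as two hex digits,
-- i.e. the units a one-byte-per-character (ANSI) view of the text sees.
local function hex_bytes(s)
  local parts = {}
  for i = 1, #s do
    parts[#parts + 1] = string.format("%02X", s:byte(i))
  end
  return table.concat(parts, " ")
end

-- "é" (U+00E9) encoded as UTF-8, written with decimal escapes so the
-- example does not depend on the encoding of this source file.
local e_acute = "\195\169"

print(#e_acute)            -- 2
print(hex_bytes(e_acute))  -- C3 A9
```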
From: John L. <jla...@gm...> - 2012-06-12 04:22:46
On Mon, Jun 11, 2012 at 3:26 PM, Paul K <pau...@ya...> wrote:
> Hi All,
>
> I've been working on a wxLua-based IDE
> (https://github.com/pkulchenko/ZeroBraneStudio/) and have a bug opened
> about dealing with utf-8 encoding
> (https://github.com/pkulchenko/ZeroBraneStudio/issues/7). I've
> captured relevant information in the ticket; to summarize, I seem to
> be setting correctly both the the codepage
> (SetCodePage(wxstc.wxSTC_CP_UTF8)) and the encoding
> (wxFONTENCODING_UNICODE), but the text is still shown as single-byte
> garbled content (there are some examples in the ticket), even though
> the editor seems to recognize that these are two-byte characters (as
> it doesn't allow to position the cursor in the middle of any unicode
> character).

Can you paste this into the editor from Firefox? It works fine in
Linux with the Unicode build.

Sanskrit: काचं शक्नोम्यत्तुम् । नोपहिनस्ति माम् ॥

> Also, when I set wxFONTENCODING_UTF8, I get "No font for displaying
> text in encoding 'Unicode 8 bit (UTF-8)' found." and even after I
> select "Lucida Sans Unicode", which is reported to be a unicode font,
> I still have the issue above.
>
> I also found this message from John "They probably compiled it in ANSI
> mode and not Unicode. Note however that Lua is strictly ANSI only." in this thread
> (http://comments.gmane.org/gmane.comp.lib.wxwidgets.wxlua.user/2636).
> Does this mean that the binaries I'm using need to be compiled with
> some other configuration (Unicode mode)? Note that in my case I'm not
> manipulating UTF strings, I just want them to be displayed correctly
> in the editor.

Where did you get your binaries? The Windows 2.8.10 ones at
wxlua.sf.net are compiled in ANSI and not Unicode. The idea was that,
since Lua is ANSI, wxLua might as well be too, because multibyte chars
will not be handled properly by the Lua string.XXX functions. However,
if you are careful, it can be made to work.

> What am I doing wrong here? Is there *any* WxLua based application
> working on windows with UTF8 encoded text (with or without BOM)? I've
> tested with both 2.8.7 and 2.8.10 (on windows Vista) with the same
> result. WxLua and wxLuaEdit also show the same behavior.

You have to recompile wxLua for Unicode, linking against a Unicode
wxWidgets build. When compiled in ANSI mode, strings are treated as
one char per byte, and that's it.

I plan on providing binaries for Windows compiled in Unicode for the
next release.

Regards,
John
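A minimal sketch of the point about the Lua string functions, in plain Lua (utf8_length is a hypothetical helper, not part of Lua 5.1 or wxLua): the # operator and string.sub work on bytes, so the first word of the Sanskrit sample above is 12 bytes long but only 4 code points, and slicing by byte index can cut a character in half.

```lua
-- Hypothetical helper: count UTF-8 code points instead of bytes. Every
-- byte that is not a continuation byte (0x80-0xBF, i.e. \128-\191)
-- starts a new character. Assumes well-formed UTF-8 input.
local function utf8_length(s)
  local _, count = s:gsub("[^\128-\191]", "")
  return count
end

-- "काचं" (U+0915 U+093E U+091A U+0902), written with decimal escapes so
-- the example does not depend on the encoding of this source file.
local word = "\224\164\149\224\164\190\224\164\154\224\164\130"

print(#word)              -- 12: # and string.len count bytes
print(utf8_length(word))  -- 4: code points

-- string.sub also indexes bytes: asking for the "first two characters"
-- returns two bytes, an incomplete piece of the first 3-byte character.
local head = word:sub(1, 2)
print(head == "\224\164")  -- true (not valid UTF-8 on its own)
```

Lua 5.3 and later ship a utf8 library (utf8.len, utf8.codes) that does this counting natively; the helper above avoids it so that it also runs on Lua 5.1.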
From: Paul K <pau...@ya...> - 2012-06-12 05:23:51
Hi John,

On Mon, Jun 11, 2012 at 9:22 PM, John Labenski <jla...@gm...> wrote:
> Can you paste this into the editor from Firefox? It works fine in
> Linux with the Unicode build.
> Sanskrit: काचं शक्नोम्यत्तुम् । नोपहिनस्ति माम् ॥

I don't have access to a Linux machine, but I tried on MacOS (with a build I compiled from sources, so I can't vouch for its quality) and got only question marks and some special characters. On Windows I just get question marks.

> Where did you get your binaries? The Windows 2.8.10 ones at
> wxlua.sf.net are compiled in ANSI and not Unicode. The idea was that
> since Lua is ANSI wxLua might as well be too since multibyte chars
> will not be handled properly in the Lua string.XXX functions. However,
> if you are careful it can be made to work.

I think mine are from the official wxLua site, so they are probably compiled in ANSI mode.

> You have to recompile wxLua for Unicode, linking against a Unicode
> wxWidgets build. When compiled in ANSI mode, strings are considered
> one char per byte and that's it.
>
> I plan on providing binaries for Windows compiled in Unicode for the
> next release.

OK, I'll wait for it then. Thank you.

Paul.

> _______________________________________________
> wxlua-users mailing list
> wxl...@li...
> https://lists.sourceforge.net/lists/listinfo/wxlua-users
From: John L. <jla...@gm...> - 2012-06-12 16:11:19
On Tue, Jun 12, 2012 at 12:58 AM, Paul K <pau...@ya...> wrote:
> Hi John,
>
>> Can you paste this into the editor from Firefox? It works fine in
>> Linux with the Unicode build.
>> Sanskrit: काचं शक्नोम्यत्तुम् । नोपहिनस्ति माम् ॥
>
> I don't have access to a Linux machine, but tried on MacOS (with the
> build I built from sources, so can't vouch for its quality) and I only
> got question marks and some special characters. On Windows I just get
> question marks.

I just tried it in MSW with Unicode and ANSI builds: running the wxLua editor, the Sanskrit characters show up fine in the Unicode build and as ??? in the ANSI build. So I will have to provide Unicode binaries then.

Regards,
John
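The ??? result can be illustrated with a sketch. This is an assumption about the mechanism, not something stated in the thread: when Unicode text reaches code that only handles a single-byte code page, characters the code page cannot represent are typically replaced with "?". Below, Latin-1 (code points 0-255) is a hypothetical stand-in for the Windows ANSI code page, and utf8_codepoints and to_single_byte are illustrative helpers that assume well-formed UTF-8.

```lua
-- Hypothetical helper: decode UTF-8 into a list of code points (no
-- validation). Uses arithmetic instead of bit operators so it also
-- runs on Lua 5.1.
local function utf8_codepoints(s)
  local cps = {}
  for ch in s:gmatch("[\1-\127\194-\244][\128-\191]*") do
    local b1 = ch:byte(1)
    local cp
    if b1 < 128 then          -- 1 byte: ASCII
      cp = b1
    elseif b1 < 224 then      -- 2 bytes
      cp = (b1 - 192) * 64 + (ch:byte(2) - 128)
    elseif b1 < 240 then      -- 3 bytes
      cp = (b1 - 224) * 4096 + (ch:byte(2) - 128) * 64 + (ch:byte(3) - 128)
    else                      -- 4 bytes
      cp = (b1 - 240) * 262144 + (ch:byte(2) - 128) * 4096
         + (ch:byte(3) - 128) * 64 + (ch:byte(4) - 128)
    end
    cps[#cps + 1] = cp
  end
  return cps
end

-- Hypothetical conversion to a single-byte code page (Latin-1 here as a
-- stand-in for the ANSI code page): unrepresentable characters become "?".
local function to_single_byte(s)
  local out = {}
  for _, cp in ipairs(utf8_codepoints(s)) do
    out[#out + 1] = cp < 256 and string.char(cp) or "?"
  end
  return table.concat(out)
end

local sanskrit = "\224\164\149\224\164\190\224\164\154\224\164\130" -- "काचं"
print(to_single_byte(sanskrit))  -- ????
print(to_single_byte("abc"))     -- abc
```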