I have been giving thought to how best to handle text in Hermes given the desire to produce versions for Linux and Mac. Broadly there are three options, as follows:
(1) Handle all text as UTF-8 and, on Windows, convert to UTF-16 at a low level when dealing with the Windows API.
(2) Handle all text as UTF-16 and, on Linux, convert to UTF-8 at a low level when calling Linux APIs.
(3) Use MSVC's TCHAR mechanism, which lets the character type be either char or wchar_t depending on a compilation option. For Windows we would use wchar_t holding UTF-16 and for Linux char holding UTF-8. I am unclear which would be better for Mac.
I dislike (1) mostly because it would require the most extensive changes for Windows, and I think we should give priority to getting a Windows version completed with as few changes as practical. I dislike (2) because, while there can be no problem storing text as UTF-16 on all platforms, facilities for manipulating text as UTF-16 may be limited on Linux (and possibly Mac). The bottom line is that I find myself favouring (3). It makes for great consistency: everywhere text is stored in chars it is assumed to be UTF-8 (except of course in contexts where it is being converted from other single- and multi-byte character sets to UTF-8), and everywhere text is stored in wchar_ts it is assumed to be UTF-16.
Do we have any consensus on this? Before making a final decision, it would be useful to know what wxWidgets works with -- I hope either. Soren, could you answer this for us, please?
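For illustration only, option (3) could be sketched in portable C++ roughly as follows. MSVC's own mechanism uses TCHAR, _T() and the _UNICODE macro from <tchar.h>; the names HERMES_WIDE_TEXT, hchar, hstring and HT below are hypothetical stand-ins so that the sketch also compiles on Linux, and none of them come from the Hermes source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// All names below (HERMES_WIDE_TEXT, hchar, hstring, HT) are hypothetical,
// modelled on MSVC's _UNICODE, TCHAR and _T(); they are not Hermes code.
#if defined(HERMES_WIDE_TEXT)
typedef wchar_t hchar;           // wide build: UTF-16 code units on Windows
#define HT(s) L##s               // turns "abc" into L"abc"
#else
typedef char hchar;              // narrow build: bytes, treated as UTF-8
#define HT(s) s                  // leaves "abc" unchanged
#endif

// One string type whose character width follows the build option.
typedef std::basic_string<hchar> hstring;

// Build a string from a literal that was written once with HT().
hstring make_text(const hchar* literal) {
    assert(literal != nullptr);  // precondition: a valid C string
    return hstring(literal);
}

// Bytes of storage used by the text (code units times code-unit size).
std::size_t storage_bytes(const hstring& s) {
    return s.size() * sizeof(hchar);
}
```

Building with -DHERMES_WIDE_TEXT (or /DHERMES_WIDE_TEXT under MSVC) would switch every literal wrapped in HT() and every hstring to wchar_t; without it they stay as char.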
(AFK)
wxWidgets works internally with wxString, but we can use std::string and
std::wstring, converting only when calling wxWidgets functions.
Also, wxString was designed to work almost exactly like the std:: versions. I
think std::wstring is UTF-8, but let me get home to be sure; I'm a little
pressed here...
In the meantime:
https://docs.wxwidgets.org/3.1/classwx_string.html
https://en.cppreference.com/w/cpp/string
https://stackoverflow.com/questions/4588302/why-isnt-wchar-t-widely-used-in-code-for-linux-related-platforms
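As a rough idea of what "converting only at the boundary" involves, here is a minimal sketch of a UTF-8 to UTF-16 conversion in standard C++. The function name utf8_to_utf16 is made up for this example; it is not a wxWidgets or Hermes API, and it deliberately skips some validation (overlong encodings are not rejected). In practice a library routine such as wxString::FromUTF8, or MultiByteToWideChar with CP_UTF8 on Windows, would normally do this job.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes and re-encode them as UTF-16.
// Simplified on purpose: overlong encodings are accepted.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;    // code point being assembled
        std::size_t extra = 0;   // continuation bytes expected after the lead
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("code point is not a Unicode scalar value");

        if (cp < 0x10000) {
            // Basic Multilingual Plane: a single UTF-16 code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary planes: split 20 bits into a surrogate pair.
            cp -= 0x10000;
            assert(cp <= 0xFFFFF);
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The reverse direction follows the same pattern, which is why the conversion can stay in one small, low-level layer whichever of the three options is chosen.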
Regards
On Tuesday, September 18, 2018, Pete Maclean petemaclean@users.sourceforge.net wrote:
--
Søren Bro Thygesen
(AFK)
But if you ask me whether I agree with your deductions, I do.
Regards
On Tuesday, September 18, 2018, Pete Maclean petemaclean@users.sourceforge.net wrote:
--
Søren Bro Thygesen
(AFK)
I'm talking Linux only here right now. I assumed, perhaps wrongly,
that's what you asked.
Regards
On Tuesday, September 18, 2018, sbrothy@gmail.com wrote:
--
Søren Bro Thygesen
(AFK)
...and I meant the size/type of std::wstring (wchar_t).
Regards
On Tuesday, September 18, 2018, Soren Bro sbrothy@users.sourceforge.net
wrote:
--
Søren Bro Thygesen
(AFK)
No wait. I'm on 64-bit Linux. I'm not thinking straight. Let me get home
and I'll check to be absolutely sure....
Regards
On Tuesday, September 18, 2018, Soren Bro sbrothy@users.sourceforge.net
wrote:
--
Søren Bro Thygesen
(AFK)
But that doesn't change the fact that I agree with your suggestion no. 3.
That'll do fine.
Regards
On Tuesday, September 18, 2018, sbrothy@gmail.com wrote:
--
Søren Bro Thygesen
(AFK)
Again, with the reservation that Mac is the joker here. I have zero
experience with that. There are, however, examples of Mac builds in the
samples, which BTW don't compile "out of the box". The configure and
makefile are regular nightmares.
I may just start with CodeLite after all....
Regards
On Tuesday, September 18, 2018, sbrothy@gmail.com wrote:
--
Søren Bro Thygesen
(destination home)
I'm also a little stressed out by the fact that I have installed and uninstalled
wxWidgets so many times now on Debian that I'm considering reinstalling
Linux. That wouldn't be a big deal, if it weren't for all the stuff I'll
have to back up.....
Regards
On Tuesday, September 18, 2018, Soren Bro sbrothy@users.sourceforge.net
wrote:
--
Søren Bro Thygesen
sizeof(wchar_t) on Debian Linux: 4
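That figure can be reproduced with a couple of lines of C++. The sketch below uses hypothetical helper names; it reports the width of wchar_t and counts how many wchar_t units one character outside the Basic Multilingual Plane (U+1F600) occupies. With a 4-byte wchar_t, as on Debian, a std::wstring conventionally holds UTF-32 rather than UTF-16, so the "wchar_t means UTF-16" assumption would hold only for the Windows build.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration; not part of Hermes or wxWidgets.

// Width of one wchar_t in bytes: 4 with GCC on Linux, 2 with MSVC on Windows.
std::size_t wide_unit_bytes() {
    return sizeof(wchar_t);
}

// Number of wchar_t units the compiler uses for U+1F600 in a wide literal:
// 1 where wchar_t is 4 bytes wide, 2 (a surrogate pair) where it is 2 bytes.
std::size_t units_for_non_bmp() {
    const std::wstring s = L"\U0001F600";
    assert(s.size() == 1 || s.size() == 2);
    return s.size();
}
```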
Regards.
On Tue, Sep 18, 2018 at 6:05 PM Soren Bro sbrothy@users.sourceforge.net
wrote: