As I said, you seem to know what you're talking about and, for that reason
alone I'd like to keep you around. :)
Remember though, that we're not experts in HUNSPELL. We (I?) just started
experimenting with it very recently. I also have a lot in my plate, as you
may have deducted.
I have searched for lists and IRC channels to ask questions but came up
short. I'm guessing that's how you found your way here, to the Hermes mail
list.
I'm of the opinion that all SourceForge projects should have an IRC channel
on FREENODE. No such luck with HUNSPELL so far, but we have one. It's on
the FREENODE IRC server and it's called #hermesmail. It's not in any
danger of going malthusian anytime soon, but at least I'm usually there.
If not AFK. I use my phone as hotspot, so when I'm not home I'm not online.
Historical reasons.
I'm perfectly comfortable with our exchanges being public on the list. It's
just Gmail that messes with my recipients from time to time.
Regards.
On Fri, Oct 26, 2018 at 9:35 PM Sharon Correll sharon_correll@sil.org
wrote:
This question is really independent of UTF-8. You can represent both NFC
and NFD in UTF-8 (just like you can represent both in UTF-16).
For instance, according to my experiments, if I define my .dic file to
include the word "blasé" where the é is represented by one character
(U+00E9 = C3 A9 in UTF-8 - this is NFC), and then try to use it to check
the data "blasé" with the é represented by two characters (U+0065 & U+0301
= 65 CC 81 in UTF-8 - this is NFD), the word will be marked as misspelled.
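The two byte sequences described above can be reproduced with a minimal Python sketch. It uses only the standard `unicodedata` module; the variable names are illustrative, and Hunspell itself is not involved.

```python
import unicodedata

# "blasé" with é as a single code point (U+00E9): this is NFC.
nfc = "blas\u00e9"
# "blasé" with e (U+0065) followed by a combining acute (U+0301): this is NFD.
nfd = "blase\u0301"

# Both are valid UTF-8, but the byte sequences differ.
print(nfc.encode("utf-8").hex(" "))  # 62 6c 61 73 c3 a9
print(nfd.encode("utf-8").hex(" "))  # 62 6c 61 73 65 cc 81

# A plain string comparison treats them as different words...
print(nfc == nfd)  # False

# ...until both are brought to the same normalization form.
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
print(unicodedata.normalize("NFD", nfc) == nfd)  # True
```

Any lookup that compares code points or bytes directly, with no normalization step, would behave like the `nfc == nfd` comparison here.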
At least this is what my tests show. Is there something I could put in my
Hunspell files to handle this? If not, then a spell-checker will only
handle data in the normalization form that it is specifically defined for.
This is unfortunate, since applications are ideally supposed to handle NFC
and NFD as if they are equivalent. Defining a spell-checker to handle both
could be a big pain and make the string list huge.
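One way a client application might avoid doubling the word list is to normalize every word to the dictionary's form just before lookup. The sketch below is only an illustration of that idea: `NormalizingChecker` is a hypothetical class, and a plain Python set stands in for the .dic word list. It does not call Hunspell or describe how Hunspell works internally.

```python
import unicodedata


class NormalizingChecker:
    """Toy stand-in for a spell-checker (hypothetical, not the Hunspell API)."""

    def __init__(self, words, form="NFC"):
        # Store every dictionary word in one agreed normalization form.
        self.form = form
        self.words = {unicodedata.normalize(form, w) for w in words}

    def check(self, word):
        # Normalize incoming text to the same form before the lookup.
        return unicodedata.normalize(self.form, word) in self.words


checker = NormalizingChecker(["blas\u00e9"])  # dictionary entry given in NFC

print(checker.check("blas\u00e9"))    # True  (NFC input)
print(checker.check("blase\u0301"))   # True  (NFD input, normalized first)
print(checker.check("blase"))         # False (no accent at all)
```

With this approach the word list stays the same size, at the cost of one normalization call per checked word.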
Maybe the assumption has been that a Hunspell spell-checker is only
written to handle NFC, but there's nothing to enforce that, and it would
surprise me if all applications that want to use Hunspell work that way. In
fact I'm sure they don't.
I see that the MAP mechanism can be used to define closely related
sequences. But NFC and NFD are supposed to be equivalent, not just similar.
On 10/26/2018 1:48 PM, sbrothy@gmail.com wrote:
I'm quite sure diacritics and such are covered by UTF-8. After all, Arabic
and Hebrew are. You're concerned about the transition though, if I read you
correctly?
We're sorta committed to UTF-8 as it is. Unless someone shows me a
language more obscure than "Modern Greek (Polytonic)" or "Friulian", or the
RTL-ones like Arabic or Hebrew I am not concerned. I'm quite sure they'll
survive any potential shift. In fact I think it can only become better.
Regardless of method.
But that's just my completely unbacked optimism shining through. :)
As always, you're welcome to a second opinion. Anyone?
Regards,
Soren
On Friday, October 26, 2018, sbrothy@gmail.com wrote:
Oh, you're talking about normalisation. I got confused by all the German
links I ran into. Egg on my face. Let me get back to you on this one.
Regards
On Thursday, October 25, 2018, sbrothy@gmail.com wrote:
What I think I forgot to mention, is that I'm trying to replace the
spellchecking and it's a spaghetti-code nightmare.
Regards.
On Thu, Oct 25, 2018 at 11:00 PM sbrothy@gmail.com wrote:
On Thu, Oct 25, 2018 at 10:51 PM sbrothy@gmail.com wrote:
To be brutally honest, I considered replacing the GUI with wxWidgets
to be portable and all, but only MFC has the "sexiness" expected by its
users. Dockable toolbars and tabbed dockable windows etc. You can tell me
all you want that as long as the functionality is the same it won't matter,
but I'm not convinced.
Which makes me kinda curious and worried about the Mac users.
Regards.
On Thu, Oct 25, 2018 at 10:31 PM sbrothy@gmail.com wrote:
I guess this one's for me. If you mean whether HUNSPELL supports
Korean or similar, the answer is yes. It even supports Hebrew. An RTL
language. I don't know whether Korean is RTL, but replacing SPELL32.DLL
isn't easy. To say the least....
Regards,
Soren
On Thu, Oct 25, 2018 at 9:33 PM Sharon Correll sharon_correll@sil.org
wrote:
I apologize if I have overlooked something, but... is there any kind of
NFC/NFD support in Hunspell currently? If not, it appears that a
spell-checker designed for NFC data will not work if the client app
sends it NFD, and vice versa.
If there is no such support, it might be something that I would
consider adding. It surprises me to think that this is not a
significant need.
On Fri, Oct 26, 2018 at 9:35 PM Sharon Correll sharon_correll@sil.org
wrote:
"on my plate", "deduced"?! And I'm the spellchecker?! Heh.
Regards
On Sat, Oct 27, 2018 at 6:18 PM Soren Bro sbrothy@users.sourceforge.net
wrote: