On 8/23/07, Reini Urban <rurban@x-ray.at> wrote:
Sabri LABBENE wrote:
> Hi Reini and all,
> I have some issues with characters that the wysiwyg charset does not
> support. Users sometimes copy text from other text editors and paste
> it into the wysiwyg editing area. The copied text can contain
> characters that are not part of the ISO 8859-1 charset. When a page
> containing such characters is opened in wysiwyg, the text after the
> unknown character is not shown.
> I chose ISO 8859-1 as the charset for wikiwyg conversion because it is
> understood by all web browsers and is a standard in web applications.
> - Should we support additional charsets?

We should detect/choose it (optionally) and convert it.
Or tell the user to use our native charset and how to convert it.
(iconv or use some sort of save as...)
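
For illustration, the conversion step could look roughly like this in Python. This is only a sketch: the function name `convert_charset` and the choice of Windows-1252 as the source charset are assumptions for the example, not part of the wiki code.

```python
def convert_charset(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Re-encode bytes from the source charset into the target charset."""
    # Decoding raises UnicodeDecodeError if the bytes are not valid in `source`,
    # so a wrong guess about the source charset is reported instead of hidden.
    text = data.decode(source)
    return text.encode(target)


# Roughly what `iconv -f CP1252 -t UTF-8` does on the command line.
pasted = "café".encode("cp1252")
converted = convert_charset(pasted, "cp1252")
```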
 
Are you thinking of popups?
I like it. We can ask the user whether he wants to convert to our native charset.
This has to happen only if we detect that the user's content is in a different charset. I don't know how to do that yet.
I also didn't understand the 'save as' solution. However, I don't think it is a good idea to force the user to save his wiki page at a particular moment during editing. He should keep the freedom to do it whenever he wants.
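
One possible way to do the detection, sketched in Python under the assumption that the page text is already available as a Unicode string: ISO 8859-1 covers exactly the code points U+0000 to U+00FF, so anything above that range is a character the wysiwyg charset cannot represent. The helper name `find_non_latin1` is made up for this example.

```python
def find_non_latin1(text: str) -> list[tuple[int, str]]:
    """Return (index, character) pairs for characters outside ISO 8859-1."""
    # ISO 8859-1 maps one-to-one onto code points U+0000..U+00FF.
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0xFF]


page = "Price: 10 € for “quoted” text"
offenders = find_non_latin1(page)
needs_conversion = bool(offenders)  # True here: the euro sign and curly quotes
```

The returned positions could be used to tell the user which characters are affected before offering a conversion.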

> - Maybe skip the unknown characters and continue the display? Not
> really good because some data is lost!
>
> What do you think?
> Cheers,

There are mostly two formats for pasted content in the clipboard: rtf
and ansi (text/plain). From notepad you get ansi (the windows version
of latin1: almost the same, but not identical).
From word, wordpad, outlook et al. you'll get rtf, with formatting.
This is usually what the user wants.
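
To make the "almost the same, but different" point concrete, here is a small Python sketch. It assumes that "ansi" means Windows-1252 (Python codec `cp1252`), which is the usual case on Western European Windows systems. The two charsets differ only in the byte range 0x80 to 0x9F.

```python
# Bytes as a Windows-1252 editor would produce them (example data).
raw = b"\x93quoted\x94 \x80 5"

as_cp1252 = raw.decode("cp1252")   # curly quotes and the euro sign
as_latin1 = raw.decode("latin-1")  # same bytes become C1 control characters

# Bytes 0xA0..0xFF (accented letters etc.) decode identically in both.
same_high = all(
    bytes([b]).decode("cp1252") == bytes([b]).decode("latin-1")
    for b in range(0xA0, 0x100)
)
```

So accented characters survive when ansi text is treated as latin1, but typographic quotes, dashes and the euro sign do not.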

I have to investigate whether it's possible to get the source charset
from rtf and convert it automatically. I doubt it.
We'd need some dropdown to choose the source charset within the editor,
which converts it into the wiki charset, which is usually latin1 or utf-8.
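
As a starting point for that investigation: RTF headers often carry an `\ansicpgN` control word naming the ANSI code page, for example `{\rtf1\ansi\ansicpg1252...`. The Python sketch below only looks for that control word. It is an assumption that the pasted RTF contains it, not every writer emits it, and RTF text can additionally contain `\uN` Unicode escapes that this does not handle. The function name `rtf_codepage` is hypothetical.

```python
import re
from typing import Optional


def rtf_codepage(rtf: bytes) -> Optional[str]:
    """Return a codec name like 'cp1252' from an RTF header, or None."""
    # Look for the \ansicpgN control word; N is the Windows code page number.
    match = re.search(rb"\\ansicpg(\d+)", rtf)
    if match is None:
        return None  # no declared code page; the dropdown would be needed
    return "cp" + match.group(1).decode("ascii")


sample = rb"{\rtf1\ansi\ansicpg1252\deff0 Hello}"
codepage = rtf_codepage(sample)
```

When nothing is declared, the manual dropdown remains the fallback.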
 
The problem I described in my previous message is that:
- Some characters in wiki pages are part of the wiki charset (utf-8).
- Those characters are not part of the ISO 8859-1 charset, which is used in wysiwyg editing mode.
- When you open the wiki page in wysiwyg editing mode, the display stops at the unknown character (unknown to ISO 8859-1).
 
These characters are usually introduced into the wiki page by pasting from the clipboard. The classic edit textarea accepts those chars since they are in utf-8. If you save the page, they become part of it and are displayed without problem when you view the wiki page.
The problem only appears when you edit the page with the wysiwyg editor.
 
I just want to remind everyone that ISO 8859-1 was adopted some time ago to solve the accented characters issue, and I believe it is suitable for the wysiwyg editor's HTML content.
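
Since the editor content is HTML, one option that avoids both the truncation and the data loss from skipping characters is to write unsupported characters as numeric character references. The Python sketch below contrasts the two behaviours using the standard codec error handlers. Whether the wysiwyg editor round-trips such references correctly is an assumption that would need testing.

```python
page = "Total: 10 € “approx.”"

# Skipping unknown characters: the display continues, but data is lost.
dropped = page.encode("iso-8859-1", errors="ignore")

# Replacing them with &#NNNN; references: still pure ISO 8859-1 bytes,
# and a browser renders the original characters.
preserved = page.encode("iso-8859-1", errors="xmlcharrefreplace")
```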

There's a libtextcat library which can detect the language and/or
charset of the source automatically.
I maintain it for cygwin.
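
This is not libtextcat's API, just a much cruder heuristic sketched in Python to show the idea of guessing a charset from the bytes: text that decodes cleanly as UTF-8 is very likely UTF-8, and anything else is assumed to be Windows-1252. Both the function name `guess_charset` and the fallback choice are assumptions for illustration.

```python
def guess_charset(data: bytes) -> str:
    """Guess 'utf-8' if the bytes are valid UTF-8, else assume 'cp1252'."""
    try:
        data.decode("utf-8")  # strict decoding: any invalid sequence raises
        return "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback; a real detector would weigh several candidates.
        return "cp1252"
```

Pure ASCII input is reported as `utf-8`, which is harmless because ASCII is a subset of both charsets.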
 
Good.
Let's assume that we have guessed which charset the user's data is in. This is what I propose:
- Tell the user that his data is not compatible with our native charset.
- Then offer to convert his content to our native wysiwyg charset (ISO 8859-1). Otherwise, his page will not be displayed correctly.
 
What do you think?
 
-- Sabri.