This bug was ported from fwcontrib bug
http://sourceforge.net/tracker/index.php?func=detail&aid=1027056&group_id=104906&atid=639667
When wiki topic files contain umlaut characters like äöüő
and the file is generated (I use XSL, output text UTF-8):
1. The generated file displays correctly in flexwiki
2. After fwsync in the web interface all umlaut
characters are replaced by ?
3. If I edit the file in flexwikipad (insert space, delete
space) and commit the edited file, the umlaut characters
appear correctly.
Date: 2004-09-30 10:36
Sender: candera
Logged In: YES
user_id=879696
D'oh! Just remembered you're talking about fwsync, not
flexwikipad. So ignore the editor comment. I'll have to
figure out where this problem is coming from.
Date: 2004-09-30 10:36
Sender: candera
Logged In: YES
user_id=879696
FYI: I'm in the middle of trying to change to a new editor
component that deals with non-ASCII data *much* better, so I
hope support for stuff like this will get a ton better. But
I'll be sure to test for it when I make the switch.
Date: 2004-09-21 10:32
Sender: indoraq
Logged In: YES
user_id=682657
Well, it is quite important as text like
p??aike p??letab p????sukesepesi
is not very understandable. The number of generated
WikiTopics is increasing, and it is not so pleasant to
manually save them from flexwikipad, as my idea is to
streamline the Wiki in my documentation process. So I raised
the priority level a bit...
Date: 2004-09-13 10:59
Sender: candera
Logged In: YES
user_id=879696
OK, thanks, that helps.
Can you give me an idea of how big a problem this is for you
so I can prioritize a fix? If it's really important, I can
try to get to it this weekend. Otherwise, it's going to be a
while.
Date: 2004-09-13 08:26
Sender: indoraq
Logged In: YES
user_id=682657
Hi
No --- copy-paste to flexwikipad does not cause problems.
The funny thing is that just overwriting the file with
flexwikipad seems to cure the issue.
However, *sometimes* flexwikipad also turns the umlaut
characters into ?. When I reopen the wiki site in
flexwikipad, I have been able to restore the correct
display.
Fragment from the generated file
START---
Summary: Mőistete seletused
Terminid
*Töötaja* isik, kellel on kehtiv tööleping.
Töötajad jagunevad aruandluse seisukohalt
*Turvatöötaja _security guard_* tehakse vahet
juhtimiskeskuse, avalike teenuste ja mehitatud valve
turvatöötajatel.
---END
Date: 2004-09-13 08:00
Sender: candera
Logged In: YES
user_id=879696
Interesting! Since fwsync doesn't really process the
contents of the file (just streams it to disk), I'm
wondering what the issue is. I'll try to repro when I get a
chance.
If you open up FlexWikiPad and just type or paste the
characters in, do you have the same problem?
Logged In: YES
user_id=682657
Did a text comparison of two local files
A. wiki file generated and uploaded, umlaut characters being
replaced by '?'
B. the same file opened and saved by flexwikipad and
behaving normally (umlaut characters display normally)
Comparison says that files are identical.
Logged In: YES
user_id=879696
OK. A few more questions, then.
1) How are you comparing the files? Have you done a
byte-by-byte comparison? It's possible that the absence of a
byte order mark (BOM) at the beginning of one of the files
could cause various programs to display the characters
incorrectly, even though the binary data is the same.
2) Is it actually running fwsync that seems to cause the
problem? In other words, have you verified that the files
are okay, and then aren't okay after running fwsync?
3) Does fwsync tell you that it updated the files that
you're dealing with? Or just that it committed them? It
should only write to a file when it updates.
I'm thinking this will be a pretty easy fix once we track it
down.
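
For anyone wanting to answer question 1 directly, here is a minimal sketch of a byte-by-byte comparison that also reports whether each file starts with a BOM. It is written in Python purely for illustration and is not part of fwsync or FlexWiki; the function names are made up for this example.

```python
import codecs
from pathlib import Path

# Known BOM signatures. UTF-32 is listed before UTF-16 because the
# UTF-32 LE BOM begins with the same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return the encoding name implied by a leading BOM, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if equal."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a prefix of the other.
        return min(len(a), len(b))
    return None


def compare_files(path_a, path_b):
    """Compare two files as raw bytes and report BOM status for each."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    return {
        "identical": a == b,
        "first_difference": first_difference(a, b),
        "bom_a": detect_bom(a),
        "bom_b": detect_bom(b),
    }
```

Calling `compare_files` on the generated topic file and on the copy saved by flexwikipad would show whether they differ only in a leading BOM, which a text-mode comparison can easily hide.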
Logged In: YES
user_id=682657
Hi.
Well, at first I compared the files in text mode with a text
editor, assuming utf-8 encoding for both.
Now, binary comparison gave quite different results. There
seem to be a lot of differences, starting from the very first
bytes. I attached the test files:
1. AndmeKontrollA --- the generated file, output from the xsl
processor (I have used msxsl and saxon)
2. AndmeKontroll.wiki (the same file as above, opened and
saved by flexwikipad)
fwsync updates OK in both cases, with the message
Committed FalckPTA.AndmeKontroll: was LocallyModified, is
now UpToDate
but when I do not open and save the topic file with
flexwikipad I get umlaut characters replaced by ? in flexwiki.
When I open the generated topic with flexwikipad and
overwrite it, flexwiki displays the contents correctly.
Logged In: YES
user_id=682657
See the attached files at
http://sourceforge.net/tracker/index.php?func=detail&aid=1027056&group_id=104906&atid=639667
indrek
Logged In: YES
user_id=682657
Hi,
Yes, BOM is the source of evil. XSL processors do not mark
UTF-8 encoded files with the BOM. FlexWiki expects the BOM
to be there. Binary comparison also shows that the correctly
displaying file has a BOM; the incorrectly displaying one has none.
Logged In: YES
user_id=879696
OK, that's good to know. Now I just need to figure out what
to do about it. I think I might have a fix - there's a way
to tell .NET to assume UTF-8 if no BOM is found. Since UTF-8
is identical to ASCII for ASCII characters, this is probably
pretty safe. I need to think about it and/or test it,
though. At least it seems like FWP is doing the "right" thing.
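
To make the "assume UTF-8 if no BOM is found" idea concrete, here is a rough sketch of the decision logic. It is in Python rather than .NET and is only an illustration of the approach, not the actual FlexWiki code or fix; `decode_topic` is a hypothetical name, and UTF-32 input is deliberately not handled.

```python
import codecs


def decode_topic(data: bytes) -> str:
    """Decode topic bytes, falling back to UTF-8 when no BOM is present."""
    if data.startswith(codecs.BOM_UTF8):
        # Strip the three-byte UTF-8 BOM before decoding.
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        return data.decode("utf-16")
    # No BOM: assume UTF-8. Pure ASCII content decodes identically either way.
    return data.decode("utf-8")
```

With this fallback, a BOM-less UTF-8 file from an XSLT processor and the same file saved with a BOM decode to the same text.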
One other thing you could try would be to change your XSLT to
emit UTF-16 or some other encoding. That might force a BOM
to be emitted. There might be other ways to get a BOM in the
document as well, like some sort of preprocessing step. How
much control do you have over the transform?
BTW, sorry I haven't been moving faster on this one. Things
have been pretty crazy since my daughter was born. I do
intend to get this fixed, though!
Logged In: YES
user_id=682657
Hi,
Well, I have full control over the xsl, but I can't figure out
how to add the BOM. It seems that the xslt processors (saxon and
MS's own msxsl) both follow the utf-8 standard, which
allows omitting the BOM for utf-8 encoding.
utf-16 seems to work OK. Bad luck that I did not try this
first.... So from my viewpoint, the case can be closed.
Logged In: YES
user_id=879696
OK. Glad you got it working.
I'm going to leave the bug open for now to remind me to
think about adding UTF-8 as the default.
If you really want to use UTF-8, you could perhaps try
postprocessing the file with a separate step to add the BOM
after the XSLT is complete. Not sure if that would work for
you.
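
A postprocessing step like that could be as small as the following sketch. It is Python, chosen only for illustration; the function names are hypothetical, and it assumes the XSLT output is already valid UTF-8.

```python
import codecs
from pathlib import Path


def add_utf8_bom(data: bytes) -> bytes:
    """Return data with a UTF-8 BOM prepended, unless one is already there."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


def add_bom_to_file(path) -> None:
    """Rewrite the file in place so that it starts with a UTF-8 BOM."""
    p = Path(path)
    p.write_bytes(add_utf8_bom(p.read_bytes()))
```

Running `add_bom_to_file` over each generated topic file after the transform, and before fwsync, would give the files the same leading bytes that a save from flexwikipad produces. Because the check is idempotent, running it twice does no harm.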
Thanks for your feedback!