
#61 umlaut characters replaced by ?

FlexWiki
open
5
2004-10-13
2004-10-13
No

This bug was ported from fwcontrib bug:
http://sourceforge.net/tracker/index.php?func=detail&aid=1027056&group_id=104906&atid=639667

When wiki topic files containing umlaut characters like äöüő
are generated (I use XSL, with text output in UTF-8):

1. The generated file displays correctly in flexwiki

2. After running fwsync, all umlaut characters are replaced
by ? in the web interface.

3. If I edit the file in flexwikipad (insert space, delete
space) and commit the edited file, the umlaut characters
appear correctly.

Comments:

Date: 2004-09-30 10:36
Sender: candera
Logged In: YES
user_id=879696

D'oh! Just remembered you're talking about fwsync, not
flexwikipad. So ignore the editor comment. I'll have to
figure out where this problem is coming from.

Date: 2004-09-30 10:36
Sender: candera
Logged In: YES
user_id=879696

FYI: I'm in the middle of trying to change to a new editor
component that deals with non-ASCII data *much* better, so I
hope support for stuff like this will get a ton better. But
I'll be sure to test for it when I make the switch.

Date: 2004-09-21 10:32
Sender: indoraq
Logged In: YES
user_id=682657

Well, it is quite important, as text like
p??aike p??letab p????sukesepesi

is not very understandable. The number of generated
WikiTopics is increasing, and it is not pleasant to save
them all manually from FlexWikiPad, since my idea is to
streamline the wiki in my documentation process. So I raised
the priority level a bit...

Date: 2004-09-13 10:59
Sender: candera
Logged In: YES
user_id=879696

OK, thanks, that helps.

Can you give me an idea of how big a problem this is for you
so I can prioritize a fix? If it's really important, I can
try to get to it this weekend. Otherwise, it's going to be a
while.

Date: 2004-09-13 08:26
Sender: indoraq
Logged In: YES
user_id=682657

Hi

No, copy-pasting into FlexWikiPad does not cause problems.
The funny thing is that just overwriting the file with
FlexWikiPad seems to cure the issue.

However, *sometimes* FlexWikiPad also turns the umlaut
characters into ?. When I reopen the wiki site in
FlexWikiPad, I have been able to restore the correct
picture.

Fragment from the generated file
START---

Summary: Mőistete seletused
Terminid

*Töötaja* isik, kellel on kehtiv tööleping.
Töötajad jagunevad aruandluse seisukohalt

*Turvatöötaja _security guard_* tehakse vahet
juhtimiskeskuse, avalike teenuste ja mehitatud valve
turvatöötajatel.

---END

Date: 2004-09-13 08:00
Sender: candera
Logged In: YES
user_id=879696

Interesting! Since fwsync doesn't really process the
contents of the file (just streams it to disk), I'm
wondering what the issue is. I'll try to repro when I get a
chance.

If you open up FlexWikiPad and just type or paste the
characters in, do you have the same problem?

Discussion

  • Indrek Pehk

    Indrek Pehk - 2004-11-02

    Logged In: YES
    user_id=682657

    I did a text comparison of two local files:

    A. the wiki file generated and uploaded, with umlaut
    characters replaced by '?'

    B. the same file opened and saved by FlexWikiPad and
    behaving normally (umlaut characters display correctly)

    The comparison says that the files are identical.

     
  • Craig Andera

    Craig Andera - 2004-11-02

    Logged In: YES
    user_id=879696

    OK. A few more questions, then.

    1) How are you comparing the files? Have you done a
    byte-by-byte comparison? It's possible that the absence of a
    byte order mark (BOM) at the beginning of one of the files
    could cause various programs to display the characters
    incorrectly, even though the binary data is the same.
    2) Is it actually running fwsync that seems to cause the
    problem? In other words, have you verified that the files
    are okay, and then aren't okay after running fwsync?
    3) Does fwsync tell you that it updated the files that
    you're dealing with? Or just that it committed them? It
    should only write to a file when it updates.

    I'm thinking this will be a pretty easy fix once we track it
    down.
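
    For question 1, a byte-by-byte check is easy to script. The
    sketch below is illustrative Python, not part of fwsync or
    FlexWiki, and `first_difference` and `compare_files` are
    hypothetical helpers. It reports the first offset where two
    files' raw bytes differ; a 3-byte BOM at the start of only one
    file shows up at offset 0 and shifts every byte after it.

```python
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the first offset where a and b differ, or None if they are identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # No mismatch in the common prefix: they differ only if one is longer.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def compare_files(path_a: str, path_b: str) -> Optional[int]:
    """Read both files as raw bytes (no text decoding) and compare them."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return first_difference(fa.read(), fb.read())
```

    A text editor that silently skips a BOM would report the two
    files as identical, while this comparison would not.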

     
  • Indrek Pehk

    Indrek Pehk - 2004-11-04

    Logged In: YES
    user_id=682657

    Hi.

    Well, at first I compared the files in text mode with a text
    editor, assuming UTF-8 encoding for both.

    A binary comparison, however, gave quite different results. There
    seem to be a lot of differences, starting from the very first
    bytes. I attached the test files:
    1. AndmeKontrollA: the generated file, output from the XSL
    processor (I have used both msxsl and Saxon)

    2. AndmeKontroll.wiki: the same file as above, opened and
    saved by FlexWikiPad

    fwsync updates both cases fine, with the message

    Committed FalckPTA.AndmeKontroll: was LocallyModified, is
    now UpToDate

    but when I do not open and save the topic file with
    FlexWikiPad, umlaut characters are replaced by ? in FlexWiki.
    When I open the generated topic with FlexWikiPad and
    overwrite it, FlexWiki displays the contents correctly.

     
  • Indrek Pehk

    Indrek Pehk - 2004-11-04

    Logged In: YES
    user_id=682657

    See the attached files at
    http://sourceforge.net/tracker/index.php?func=detail&aid=1027056&group_id=104906&atid=639667

    indrek

     
  • Indrek Pehk

    Indrek Pehk - 2004-11-18

    Logged In: YES
    user_id=682657

    Hi,

    Yes, BOM is the source of evil. XSL processors do not mark
    UTF-8 encoded files with the BOM. FlexWiki expects the BOM
    to be there. Binary comparison also shows that the correctly
    displaying file has a BOM and the incorrectly displaying one has none.
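
    The finding is easy to reproduce with a small Python sketch
    (illustrative only; `has_utf8_bom` and `naive_ascii_read` are
    hypothetical names). It assumes the two files differ by the
    leading UTF-8 BOM bytes EF BB BF. `naive_ascii_read` imitates a
    reader that keeps ASCII bytes and substitutes ? for everything
    else; that is one plausible mechanism for the symptom, since each
    umlaut takes two bytes in UTF-8 and so turns into ??.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def naive_ascii_read(data: bytes) -> str:
    """Hypothetical reader: keep ASCII bytes, replace every other byte with '?'."""
    return "".join(chr(b) if b < 0x80 else "?" for b in data)


generated = "Töötaja".encode("utf-8")  # as written by the XSL processor (no BOM)
saved = codecs.BOM_UTF8 + generated    # as written by FlexWikiPad (with BOM)

print(has_utf8_bom(generated), has_utf8_bom(saved))  # False True
print(naive_ascii_read(generated))                   # T????taja
```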

     
  • Craig Andera

    Craig Andera - 2004-11-18

    Logged In: YES
    user_id=879696

    OK, that's good to know. Now I just need to figure out what
    to do about it. I think I might have a fix - there's a way
    to tell .NET to assume UTF-8 if no BOM is found. Since UTF-8
    is identical to ASCII for ASCII characters, this is probably
    pretty safe. I need to think about it and/or test it,
    though. At least it seems like FWP is doing the "right" thing.
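
    For reference, in .NET this kind of fallback is available by
    constructing a `StreamReader` with `Encoding.UTF8` and BOM
    detection enabled. The same logic, sketched in Python (not
    FlexWiki's code; `decode_topic` is a hypothetical name), honors a
    UTF-8 or UTF-16 BOM when present and otherwise assumes UTF-8. Pure
    ASCII files decode identically either way, because UTF-8 encodes
    code points below 128 as the same single bytes.

```python
import codecs

# Known byte order marks and the codec to use for the bytes that follow them.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def decode_topic(data: bytes) -> str:
    """Decode topic bytes: honor a BOM if present, otherwise assume UTF-8."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    # No BOM found: fall back to UTF-8 instead of a legacy code page.
    return data.decode("utf-8")
```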

    One other thing you could try would be to change your XSLT to
    emit UTF-16 or some other encoding. That might force a BOM
    to be emitted. There might be other ways to get a BOM in the
    document as well, like some sort of preprocessing step. How
    much control do you have over the transform?
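
    Whether a particular XSLT processor writes a BOM for UTF-16 output
    is up to that processor, but the general behavior is easy to check
    with Python's codecs (an illustration, not the processors
    themselves): the generic `utf-16` codec prepends a BOM, while
    `utf-16-le` and `utf-8` do not.

```python
import codecs

text = "Mőistete seletused"

utf16 = text.encode("utf-16")        # BOM in native byte order, then the text
utf16_le = text.encode("utf-16-le")  # explicit byte order, so no BOM
utf8 = text.encode("utf-8")          # UTF-8 without a BOM

print(utf16[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True
print(len(utf16) - len(utf16_le))                               # 2 (the BOM)
print(utf8.startswith(codecs.BOM_UTF8))                         # False
```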

    BTW, sorry I haven't been moving faster on this one. Things
    have been pretty crazy since my daughter was born. I do
    intend to get this fixed, though!

     
  • Indrek Pehk

    Indrek Pehk - 2004-11-18

    Logged In: YES
    user_id=682657

    Hi,

    Well, I have full control over the XSL, but I can't figure out
    how to add the BOM. It seems that the XSLT processors (Saxon and
    MS's own msxsl) both follow the UTF-8 standard, which does not
    call for a BOM with UTF-8 encoding.

    UTF-16 seems to work OK. Bad luck that I did not try this in
    the first place... So from my viewpoint, the case can be closed.

     
  • Craig Andera

    Craig Andera - 2004-11-18

    Logged In: YES
    user_id=879696

    OK. Glad you got it working.

    I'm going to leave the bug open for now to remind me to
    think about adding UTF-8 as the default.

    If you really want to use UTF-8, you could perhaps try
    postprocessing the file with a separate step to add the BOM
    after the XSLT is complete. Not sure if that would work for
    you.
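
    A minimal sketch of such a postprocessing step in Python, assuming
    the generated topic is already valid UTF-8 without a BOM; the
    function names are hypothetical. It prepends the UTF-8 BOM bytes
    only when they are missing, so running it twice is harmless.

```python
import codecs
from pathlib import Path


def with_utf8_bom(data: bytes) -> bytes:
    """Return data prefixed with the UTF-8 BOM unless it already starts with one."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


def add_bom_in_place(path: Path) -> bool:
    """Rewrite the file at path with a leading UTF-8 BOM; return True if it changed."""
    data = path.read_bytes()
    fixed = with_utf8_bom(data)
    if fixed == data:
        return False
    path.write_bytes(fixed)
    return True
```

    It would run over each generated `.wiki` file after the XSLT step
    and before fwsync.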

    Thanks for your feedback!

     
