Re: [Kdiff3-user] Decoding a cp1251 file saved as utf8?

Hi Angel,

No good news: looking at the output in hex
(e.g. via  od -t x cp1251.saved.as.utf8.txt )
you can see that no useful information is left.

0000000 efbdbfef bfefbdbf bdbfefbd efbdbfef
0000020 bfefbdbf bdbfefbd efbdbfef bfefbdbf
0000040 bdbfefbd efbdbfef bfefbdbf bdbfefbd
0000060 efbdbfef bfefbdbf bdbfefbd efbdbfef
0000100 bfefbdbf bdbfefbd efbdbfef bfefbdbf
0000120 bdbfefbd efbdbfef 000abdbf

So in your concrete situation I can't do much for you.
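
For anyone who wants to reproduce the dump without KDiff3, here is a minimal Python sketch. It assumes the decoder substitutes U+FFFD (the Unicode replacement character) for every byte that is not valid UTF-8. Python's errors="replace" does this; I have not checked that Qt's codec behaves identically, but the result matches the dump above. The names text, original, decoded and saved are only illustrative. Note that od -t x prints 4-byte words in host byte order, so on a little-endian machine the word efbdbfef stands for the bytes ef bf bd ef.

```python
# Lowercase Bulgarian alphabet: U+0430..U+044F without U+044B and U+044D
# (two letters that the Russian alphabet has and the Bulgarian one lacks).
letters = [chr(c) for c in range(0x430, 0x450) if c not in (0x44B, 0x44D)]
text = "".join(letters) + "\n"

# The file as it exists on disk: one cp1251 byte per letter, plus LF.
original = text.encode("cp1251")

# Opening it "as UTF-8": none of the bytes 0xE0..0xFF forms a valid
# sequence here, so each one is replaced by U+FFFD.
decoded = original.decode("utf-8", errors="replace")

# Saving as UTF-8: every U+FFFD becomes the three bytes EF BF BD.
saved = decoded.encode("utf-8")

print(len(original), len(saved))
print(saved.hex(" "))
```

Every letter ends up as the same three bytes, so the saved file no longer records which letter was where.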

But I must admit that I was not aware of this problem. As I mentioned before,
I expected no irreversible conversion loss, but now I think that Qt internally
converts to 16 bit characters, although UTF-8 allows 32 bit characters. So most
random byte combinations will result in an "invalid" character.

I will try to detect this and display a warning in KDiff3 for such situations.
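
One possible way to detect the situation, sketched in Python rather than in KDiff3's own C++/Qt code: decode strictly before accepting the chosen encoding, and warn if that fails. The helper name would_lose_data is hypothetical and this is not the KDiff3 implementation, only an illustration of the idea.

```python
def would_lose_data(raw: bytes, encoding: str = "utf-8") -> bool:
    """Return True if raw cannot be decoded with encoding without errors.

    In that case a decode-with-replacement followed by a save would
    overwrite the offending bytes with replacement characters.
    """
    try:
        raw.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return True
    return False


# cp1251 bytes for three Cyrillic letters plus LF: not valid UTF-8.
print(would_lose_data(bytes([0xE0, 0xE1, 0xE2, 0x0A])))

# The same three letters properly encoded as UTF-8: safe to round-trip.
print(would_lose_data("\u0430\u0431\u0432\n".encode("utf-8")))
```

A check like this only catches bytes that are invalid in the chosen encoding. A file whose bytes happen to be valid in the wrong encoding would still be displayed incorrectly, though it could be saved without loss.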

Thanks for telling me! I really do hope you find a backup.
Joachim

> On 02/15/11 22:41, Joachim Eibl wrote:
> > Hi Angel,
> > 
> > I'm not quite sure that I understand the problem correctly.
> > 
> > When reading as UTF8 and writing the same data as UTF8 then I would not
> > expect many changes, because except for a few places everything should
> > stay the same, regardless of what codec is really used as input.
> 
> I forgot to mention that the original file contains Cyrillic characters.
> 
> [...]
> 
> > Could you repeat this with a test file and send the original and modified
> > versions?
> 
> See attached archive.  It contains 2 files: the first one lists the
> lowercase letters (30 in total) of the Bulgarian alphabet plus a LF
> character and the second one was generated by merging the first one with
> a copy of itself (both opened as UTF8 files) and saving the output as
> UTF8.  A binary editor shows that the output file contains the same
> character duplicated 30 times followed by a LF character.
> 
> 
> Regards,
> Angel Tsankov
