From: Joachim E. <joa...@gm...> - 2011-02-16 19:53:54
Hi Angel,

No good news: when looking at the output in hex (e.g. via
od -t x cp1251.saved.as.utf8.txt) you see that there is no useful
information left anymore.

0000000 efbdbfef bfefbdbf bdbfefbd efbdbfef
0000020 bfefbdbf bdbfefbd efbdbfef bfefbdbf
0000040 bdbfefbd efbdbfef bfefbdbf bdbfefbd
0000060 efbdbfef bfefbdbf bdbfefbd efbdbfef
0000100 bfefbdbf bdbfefbd efbdbfef bfefbdbf
0000120 bdbfefbd efbdbfef 000abdbf

So in your concrete situation I can't do much for you.

But I must admit that I was not aware of this problem. As I mentioned
before, I expected no irreversible conversion loss, but now I think that
Qt internally converts to 16 bit, although UTF8 allows 32-bit characters.
So most random combinations will result in an "invalid" character.

I will try to detect this and display a warning in KDiff3 for such
situations.

Thanks for telling! I really do hope you find some backup.

Joachim

> On 02/15/11 22:41, Joachim Eibl wrote:
> > Hi Angel,
> >
> > I'm not quite sure that I understand the problem correctly.
> >
> > When reading as UTF8 and writing the same data as UTF8 then I would not
> > expect many changes, because except for a few places everything should
> > stay the same, regardless of what codec is really used as input.
>
> I forgot to mention that the original file contains Cyrillic characters.
>
> [...]
>
> > Could you repeat this with a test file and send the original and modified
> > versions?
>
> See attached archive. It contains 2 files: the first one lists the
> lowercase letters (30 in total) of the Bulgarian alphabet plus a LF
> character and the second one was generated by merging the first one with
> a copy of itself (both opened as UTF8 files) and saving the output as
> UTF8. A binary editor shows that the output file contains the same
> character duplicated 30 times followed by a LF character.
>
>
> Regards,
> Angel Tsankov
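
The byte pattern in the hex dump (ef bf bd repeated, then 0a) is the UTF-8 encoding of U+FFFD, the Unicode replacement character. A minimal Python sketch that reproduces it follows. It assumes a decoder that substitutes U+FFFD for every byte that is not valid UTF-8; whether Qt's codec takes exactly this path inside KDiff3 is an assumption and is not confirmed in this thread. The names ALPHABET and roundtrip_as_utf8 are made up for the illustration.

```python
# Lowercase Bulgarian alphabet (30 letters) plus LF, as described for the test file.
ALPHABET = "абвгдежзийклмнопрстуфхцчшщъьюя\n"


def roundtrip_as_utf8(raw: bytes) -> bytes:
    """Read bytes as UTF-8 and write them back as UTF-8.

    Invalid input bytes are replaced with U+FFFD on decoding
    (assumed behaviour, mirroring a lenient text codec).
    """
    text = raw.decode("utf-8", errors="replace")
    return text.encode("utf-8")


# The original file: one cp1251 byte per letter (0xE0..0xFF range) plus LF.
original = ALPHABET.encode("cp1251")

# What is left after opening it as UTF-8 and saving it as UTF-8.
saved = roundtrip_as_utf8(original)

print(len(original), len(saved))  # 31 91
print(saved[:6].hex())            # efbfbdefbfbd
```

Every cp1251 letter byte is an invalid UTF-8 sequence on its own, so all 30 letters collapse into the same replacement character and the original bytes cannot be recovered from the saved file.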