From: Angel T. <fn...@fm...> - 2011-02-15 21:19:01
On 02/15/11 22:41, Joachim Eibl wrote:
> Hi Angel,
>
> I'm not quite sure that I understand the problem correctly.
>
> When reading as UTF8 and writing the same data as UTF8 then I would not expect
> many changes, because except for a few places everything should stay the same,
> regardless of what codec is really used as input.

I forgot to mention that the original file contains Cyrillic characters.

[...]

> Could you repeat this with a test file and send the original and modified
> versions?

See attached archive. It contains 2 files. The first one lists the 30 lowercase letters of the Bulgarian alphabet followed by an LF character. The second one was generated by merging the first one with a copy of itself (both opened as UTF8 files) and saving the output as UTF8. A binary editor shows that the output file contains the same character duplicated 30 times, followed by an LF character.

Regards,
Angel Tsankov
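For anyone who would rather check the two files without a binary editor, here is a minimal sketch in Python. It is not taken from the attached archive: the helper name `summarize` is hypothetical, the alphabet string is a reconstruction of what the first file should contain, and the "collapsed" data only simulates the symptom with an arbitrarily chosen letter, since the post does not say which character is actually repeated.

```python
# Reconstruction of the test input: the 30 lowercase Bulgarian letters.
BULGARIAN_LOWERCASE = "абвгдежзийклмнопрстуфхцчшщъьюя"


def summarize(data: bytes):
    """Return (letter_count, distinct_letter_count, ends_with_lf) for UTF-8 bytes."""
    text = data.decode("utf-8")
    ends_with_lf = text.endswith("\n")
    letters = text[:-1] if ends_with_lf else text
    return len(letters), len(set(letters)), ends_with_lf


# What the original file is expected to hold: 30 letters plus LF.
original = (BULGARIAN_LOWERCASE + "\n").encode("utf-8")

# Simulated symptom: one letter repeated 30 times, then LF.
# The letter used here is an arbitrary placeholder.
collapsed = ("а" * 30 + "\n").encode("utf-8")

print(summarize(original))   # (30, 30, True)
print(summarize(collapsed))  # (30, 1, True)
```

To run it against real files, replace the two byte strings with the result of `open(path, "rb").read()`. A distinct-letter count of 1 in the merged output would match the behaviour described above.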