Re: [Kdiff3-user] Symbols Are Stipped Out During Merge

Joachim Eibl wrote:
> On Monday, 17 October 2005 19:54, Matus Lipka wrote:
> 
>>Joachim,
>>
>>Are you using MBCS? From my humble knowledge of multibyte character
>>encodings, I was under the impression that UNICODE is always 2 bytes per
>>character. So once a file is detected as being in UNICODE format, *all*
>>characters are interpreted as 2 bytes. If not, then all characters are 1
>>byte, and the ° and similar characters are never interpreted as multibyte.
>>
>>These kinds of encodings shouldn't be mixed together in a single file,
>>unless something weird like MBCS is used (which could be a non-default
>>option in KDiff).
>>
>>Does this make sense?
>>
>>Cheers,
>>
>>Matus
> 
> 
> Hi Matus,
> 
> The term "Unicode" covers both. You might want to read 
> http://en.wikipedia.org/wiki/Unicode
> 
> Since the name "Unicode" doesn't stand for any specific encoding, the names 
> UTF-8 or UCS-2 are used to be more precise.
> 
> In any case UTF-8 (which is an 8-bit, variable-width encoding) is becoming 
> very popular and is often the default (especially on Linux machines).
> 
> But KDiff3 should try to honor the default setting for every individual 
> machine.
> 
> Cheers,
> Joachim
> 

FYI,

Microsoft's nomenclature has confused a lot of people. Unicode is 
NOT an encoding, as Joachim said above. I wouldn't blame Microsoft, 
though, as the encoding standards were relatively slow to be standardized.

Windows uses UTF-16 internally. It used UCS-2 in NT 3.x, but that is 
now deprecated.
http://www.faqs.org/rfcs/rfc2279.html
http://www.faqs.org/rfcs/rfc2781.html
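
To make the difference concrete, here is a small Python sketch (my own 
illustration, nothing to do with KDiff3's code; the helper name 
encoded_sizes is made up) that counts the bytes a character takes in 
UTF-8 and in UTF-16. It tries an ASCII letter, the ° from this thread, 
and a character outside the Basic Multilingual Plane, which UTF-16 
stores as a surrogate pair and UCS-2 cannot represent at all.

```python
def encoded_sizes(text):
    """Return the byte length of text in UTF-8 and in UTF-16.

    "utf-16-le" is used so that Python does not prepend a BOM,
    which would otherwise add 2 bytes to the count.
    """
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


print(encoded_sizes("A"))            # ASCII letter
print(encoded_sizes("\u00b0"))       # degree sign
print(encoded_sizes("\U0001D11E"))   # musical G clef, outside the BMP
```

So neither encoding is strictly "2 bytes per character": UTF-8 uses 
1 to 4 bytes, and UTF-16 uses 2 or 4.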

Windows is the only OS I know of that uses UTF-16; every other one uses 
UTF-8 because it makes backward compatibility simpler.

MBCS is a generic term because, in the past, there were encodings 
other than UTF-X, which shouldn't be used anymore.

A UTF-8 or UTF-16 file can be detected by looking for the BOM (byte 
order mark). UTF-8 can nevertheless be the default encoding when there 
is no BOM.
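
A rough sketch of that kind of BOM check in Python, assuming the first 
bytes of the file are already in memory. The function name detect_bom 
and the choice of UTF-8 as the fallback are my assumptions, not how 
KDiff3 does it. Note that a UTF-32 little-endian BOM also starts with 
FF FE, so this simple version would misreport such a file as UTF-16.

```python
import codecs

# BOM byte sequences paired with the encoding they indicate.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def detect_bom(data, default="utf-8"):
    """Return (encoding, bom_length) for the given bytes.

    If no BOM is found, fall back to the default encoding
    and report a BOM length of 0.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return default, 0


encoding, skip = detect_bom(b"\xef\xbb\xbfhello")
print(encoding, skip)
```

The caller would then decode data[skip:] with the returned encoding.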

By the way, if an invalid character is read, it is probably normal that 
it is removed by the UTF-8 decoder. But it would be better if KDiff3 
detected this and warned the user. Just a thought.
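
To illustrate what I mean (in Python, not in KDiff3's actual code; 
decode_with_warning is a name I made up): a ° saved as a single 
Latin-1 byte (0xB0) is not valid UTF-8. A lenient decoder silently 
drops it, while a strict decode attempt lets the program notice the 
problem and tell the user.

```python
def decode_with_warning(data, encoding="utf-8"):
    """Decode data and report whether any bytes were invalid.

    Returns (text, had_errors). On invalid input the bad bytes are
    replaced with U+FFFD instead of being dropped, so the damage
    stays visible in the text.
    """
    try:
        return data.decode(encoding), False
    except UnicodeDecodeError:
        return data.decode(encoding, errors="replace"), True


raw = b"25\xb0C"  # degree sign as one Latin-1 byte, invalid in UTF-8

# Lenient decoding: the symbol is stripped without any notice.
print(raw.decode("utf-8", errors="ignore"))

# Strict first, then fall back: the caller learns something was wrong.
text, had_errors = decode_with_warning(raw)
if had_errors:
    print("warning: input contains bytes that are not valid UTF-8")
```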

M-A
