I'm trying to get a kdiff3 preprocessor working on Windows. The preprocessor works fine when it filters an ASCII file, but something goes awry with the input stream when processing a UTF-16 file.
I am working in plain C, and I can step from my preprocessor into the Microsoft C library code, up to the point where it calls ReadFile, which is an operating system routine. What I'm seeing is that after multiple calls to fgets (actually its wide-character counterpart, _fgetws), a library-internal 4096-byte buffer has been filled four times, and inexplicably a byte is missing at the very end of the fourth bufferful of data.
That is, byte 16383 of the file is missing (the first byte of the file being byte 0) and byte 16384 (which should be the first byte in the next bufferful) appears in its place. The last wide character of that buffer displays as a Chinese glyph. Further bufferfuls of data are shifted by one byte, and also display for the most part as Chinese glyphs. It is as if a single byte has been discarded from the input stream.
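To make the symptom concrete, here is a minimal sketch of why one lost byte garbles everything after it. It assumes the file is UTF-16 little-endian, which the post does not state. The helper names drop_byte and utf16le_unit are hypothetical and are not part of kdiff3 or the C library. Four bufferfuls of 4096 bytes end at offset 4 * 4096 - 1 = 16383, an odd offset, so the lost byte is the high byte of a code unit. The sketch reproduces that at a small scale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy src to dst while omitting the byte at
   index drop_at. dst must have room for len - 1 bytes.
   Returns the number of bytes written. */
size_t drop_byte(const uint8_t *src, size_t len, size_t drop_at, uint8_t *dst)
{
    assert(drop_at < len);
    memcpy(dst, src, drop_at);
    memcpy(dst + drop_at, src + drop_at + 1, len - drop_at - 1);
    return len - 1;
}

/* Hypothetical helper: decode the UTF-16LE code unit that starts at
   byte offset off (assumes little-endian input). */
uint16_t utf16le_unit(const uint8_t *buf, size_t off)
{
    return (uint16_t)(buf[off] | (buf[off + 1] << 8));
}
```

For example, "ABCD" in UTF-16LE is the bytes 41 00 42 00 43 00 44 00. Dropping byte 3 (the high byte of 'B') leaves 41 00 42 43 00 44 00, which decodes as U+0041, U+4342, U+4400: the 'A' survives, and every later code unit pairs a low byte with the wrong high byte and lands in the CJK range.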
I am assuming that kdiff3 reads in the file and pipes it to the preprocessor. The problem manifests itself in the preprocessor immediately after a call to ReadFile (which, being part of the OS, is pretty much fail-proof) so it would appear that the problem is somewhere in kdiff3's input handling.
I downloaded the kdiff3 source code, and I see that kdiff3 calls "convertFileEncoding" to filter the input before passing it on to the preprocessor. I'm not sure why kdiff3 would need to change the encoding of the input for a preprocessor, or how it would know what encoding the preprocessor is expecting. In any case, "convertFileEncoding" only calls what appear to be Qt 4 routines, so Qt 4 would appear to be the culprit, unless the problem actually occurs outside of "convertFileEncoding".
To verify my assertions, I have run the preprocessor on the identical input file, piping the input to the program via the DOS shell. I have again broken into the debugger immediately after each call to ReadFile, and no corruption of the input stream occurs.
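One possible way to narrow this down further is to have the preprocessor save the raw bytes it receives in each run (once under kdiff3, once from the shell pipe) and then find the first offset where the two captures differ. The sketch below shows only the comparison step. The name first_mismatch is hypothetical, and how the two captures are produced is left as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the offset of the first byte at which
   a and b differ. If one buffer is a prefix of the other, return the
   shorter length. If the buffers are identical, return (size_t)-1. */
size_t first_mismatch(const unsigned char *a, size_t alen,
                      const unsigned char *b, size_t blen)
{
    assert(a != NULL && b != NULL);
    size_t n = alen < blen ? alen : blen;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return i;
    }
    return alen == blen ? (size_t)-1 : n;
}
```

If the byte really is lost before it reaches the preprocessor, the first mismatch should fall at or near offset 16383. For the captures to be comparable, both have to be taken from a stream in binary mode. With the Microsoft C runtime that would mean calling _setmode(_fileno(stdin), _O_BINARY) before the first read, since a text-mode stream translates the data it reads.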
Can anyone suggest how to proceed?
Elle