Comparing binaries with quick content compare is faster.
We also have to discuss whether it makes sense to have all compare options disabled for binaries, just for this one run.
Switch to quick content for binaries
diff_2_files() in Src\diffutils\src\analyze.c already switches to byte-per-byte compare for binary files. And that is faster than we can do since it uses a single buffer.
We could refactor this, but it is hard to do without losing the speed, and as such it is hard to get any speed advantage from it.
No, normally it's slower.
a) read_files (IO.c) reads the files into a buffer and then into an array.
b) detection is done a second time
After read_files, yes, WM also switches to byte level.
There are two advantages: checking the length and the timestamp. That can make it faster.
But we can also add that to bytecompare without a problem.
m_pByteCompare->CompareFiles()
Just read the buffers and compare them, no more detection etc.
Where is the second time if we don't do your check? Reading from memory is a lot faster than reading from disk. Especially on folder compare, which has another thread accessing the disk too.
In IO.c:
isbinary = binary_file_p (current->buffer, current->buffered_chars);
isbinary &= !isunicode(current->buffer, current->buffered_chars);
where the unicode check is not working! It seems the leading BOM is missing. It may be that the file is already open and we are not at read position 0, so it is detecting the bytes after the BOM.
I have to check where it's done.
Speed is not a problem: up to about 100 MB the original is faster, above that bytecompare is faster, as Windows doesn't have to swap as much memory, especially if you run several instances at the same time.
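To illustrate the BOM suspicion above, here is a minimal sketch in C. It is not the isunicode() code from IO.c; detect_bom() and the bom_kind enum are hypothetical names made up for this example. It only shows why a BOM check gives a different answer when the buffer does not start at file offset 0.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type, not taken from WinMerge or diffutils. */
enum bom_kind { BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE };

/*
 * Looks for a byte order mark at the very start of buf.
 * The check is only meaningful if buf[0] is the first byte of the file.
 * Simplification: a UTF-32 LE BOM (FF FE 00 00) is reported as UTF-16 LE.
 */
static enum bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    if (len >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0)
        return BOM_UTF8;
    if (len >= 2 && memcmp(buf, "\xFF\xFE", 2) == 0)
        return BOM_UTF16LE;
    if (len >= 2 && memcmp(buf, "\xFE\xFF", 2) == 0)
        return BOM_UTF16BE;
    return BOM_NONE;
}
```

If the buffer handed to such a check starts three bytes into a UTF-8 file, or the BOM was stripped earlier, the result is BOM_NONE even though the file on disk has a BOM.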
Now it's clear why isunicode() is not working correctly.
Transform2FilesToUTF8()... creates UTF-8 without BOM
result:
CheckForInvalidUtf8() is missing
since, with and without my patch, UTF-8 files without BOM that contain extended characters are detected as binary files.
I don't understand.
> Transform2FilesToUTF8()... creates UTF-8 without BOM
That is fine, as the resulting data must be sent to diffutils, which doesn't know about BOM bytes.
> CheckForInvalidUtf8() is missing
How would adding it help? It would only tell us that the file is probably an 8-bit ASCII file.
What would help with binary file compare is adding a new binary file compare engine. It would just do a byte-per-byte compare of memory blocks in a tight loop. Basically what the quick compare engine does, but without handling the EOL and whitespace ignore options.
Doing that would be a much better solution than trying to tweak the current compare engines. Then we could tweak that engine to compare binary files as fast as it can, with no need to care about text files at all.
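A rough sketch of what the core of such an engine could look like in C, assuming both files (or chunks of them) are already in memory. blocks_equal() is a hypothetical name and not an existing WinMerge function. There is no EOL or whitespace handling at all: equal blocks are handled by one memcmp() call, and the byte loop only runs when the blocks are already known to differ.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the two blocks are identical, 0 otherwise.
 * When they differ and first_diff is not NULL, *first_diff receives the
 * offset of the first differing byte (or the shorter length if one block
 * is a prefix of the other). *first_diff is left untouched on equality.
 */
static int blocks_equal(const unsigned char *a, size_t len_a,
                        const unsigned char *b, size_t len_b,
                        size_t *first_diff)
{
    size_t n = len_a < len_b ? len_a : len_b;
    size_t i;

    /* Fast path: same length and same bytes. */
    if (len_a == len_b && memcmp(a, b, n) == 0)
        return 1;

    /* Only reached when the blocks differ: locate the first difference. */
    for (i = 0; i < n && a[i] == b[i]; i++)
        ;
    if (first_diff != NULL)
        *first_diff = i;
    return 0;
}
```

For a folder compare, which only needs same/different, the caller can pass NULL for first_diff.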
Of course we must then have a routine to detect binary files first. It can be a simple check for zero bytes in the first/last 4 KB of the file data. But we do this check only for files larger than 100 KB (or some proper size) so we don't lose the speed advantage to the additional binary file check.
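The detection idea could be sketched roughly like this in C. looks_binary() and the two constants are hypothetical, and the 4 KB / 100 KB values are just the numbers suggested above, not tuned. It assumes the whole file data is available in one memory block.

```c
#include <stddef.h>
#include <string.h>

#define PROBE_SIZE     (4u * 1024u)    /* bytes checked at each end */
#define MIN_CHECK_SIZE (100u * 1024u)  /* no check at or below this size */

/* Returns 1 if a zero byte occurs in buf[0..len), else 0. */
static int has_zero_byte(const unsigned char *buf, size_t len)
{
    return memchr(buf, 0, len) != NULL;
}

/*
 * Hypothetical helper: guesses "binary" from the first and last 4 KB.
 * Small files are never flagged, so they stay on the normal compare path.
 */
static int looks_binary(const unsigned char *data, size_t len)
{
    if (len <= MIN_CHECK_SIZE)
        return 0;
    if (has_zero_byte(data, PROBE_SIZE))
        return 1;
    return has_zero_byte(data + len - PROBE_SIZE, PROBE_SIZE);
}
```

One caveat with a pure zero-byte test: UTF-16 text containing ASCII characters has zero bytes too, so a unicode check would still have to run before or together with this one.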
I started a new thread in the developers forum about this:
Binary file compare engine?
https://apps.sourceforge.net/phpbb/winmerge/viewtopic.php?f=6&t=115
>How adding it would help? It would only tell the file is probably 8-bit
>ASCII file.
UTF-8 with extended characters will be detected as binary!
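For reference, a minimal sketch in C of what a UTF-8 validity check can look like. This is not WinMerge's CheckForInvalidUtf8(), whose exact behaviour is not shown in this thread; is_valid_utf8() is a hypothetical stand-in. It accepts BOM-less UTF-8 with extended characters and rejects plain 8-bit data such as Latin-1, which is the distinction being argued about here.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf[0..len) is well-formed UTF-8. */
static int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;       /* continuation bytes expected after c */
        size_t k;
        unsigned long cp;   /* decoded code point */
        unsigned long min;  /* smallest code point allowed for this length */

        if (c < 0x80) {                     /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;       /* stray continuation byte or invalid lead byte */
        }

        if (len - i <= extra)
            return 0;       /* sequence is cut off at the end of the buffer */

        for (k = 1; k <= extra; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }

        /* Reject overlong forms, surrogates and values above U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;

        i += extra + 1;
    }
    return 1;
}
```

With a check like this, the bytes 63 61 66 C3 A9 ("café" in UTF-8) pass, while 63 61 66 E9 (the same word in Latin-1) fail.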