#2864 Switch to quick content for binaries

Milestone: Not Scheduled
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2013-03-02
Created: 2009-06-07
Creator: Matthias
Private: No

Comparing binaries with quick content is faster.
We also have to discuss whether it makes sense to disable all compare options for binaries, just for this one run.

Discussion

  • Matthias

    Matthias - 2009-06-07

    switch to quick content for binaries

     
  • Kimmo Varis

    Kimmo Varis - 2009-06-07
    • milestone: 438015 --> Trunk
     
  • Kimmo Varis

    Kimmo Varis - 2009-06-07

    diff_2_files() in Src\diffutils\src\analyze.c already switches to a byte-by-byte compare for binary files, and that is faster than what we can do, since it uses a single buffer.

    We could refactor this, but it is hard to do without losing the speed, so it is hard to get any speed advantage from it.

     
  • Matthias

    Matthias - 2009-06-07

    No, normally it's slower.
    a) read_files (IO.c) reads the files into a buffer and then into an array.
    b) Detection is done a second time.
    After read_files, yes, WM also switches to byte level.
    There are two advantages: checking the length and the timestamp. That can make it faster.
    But we can also add that to the byte compare without a problem.

    m_pByteCompare->CompareFiles()
    just reads the buffers and compares them, with no further detection etc.
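
    A minimal sketch of the length and timestamp shortcut mentioned above, followed by a plain buffer compare with no text/binary detection. The names FileMeta, Precheck, PrecheckByMetadata and BuffersEqual are hypothetical and are not WinMerge's actual API. Treating an equal size and timestamp as "identical" is a heuristic assumed here as an opt-in; the thread does not say how the timestamp is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical metadata record; in practice this would come from the
// directory scan, before any file content is read.
struct FileMeta {
    std::uint64_t size;   // file size in bytes
    std::int64_t  mtime;  // last-modified time, in any consistent unit
};

enum class Precheck { Different, AssumeIdentical, NeedContentCompare };

// Decide as much as possible from metadata alone.
Precheck PrecheckByMetadata(const FileMeta& a, const FileMeta& b, bool trustTimestamp)
{
    if (a.size != b.size)
        return Precheck::Different;            // binaries of different length cannot match
    if (trustTimestamp && a.mtime == b.mtime)
        return Precheck::AssumeIdentical;      // heuristic only (assumption, opt-in)
    return Precheck::NeedContentCompare;
}

// Compare two already-loaded buffers of equal length, no detection step.
bool BuffersEqual(const unsigned char* a, const unsigned char* b, std::size_t len)
{
    assert(len == 0 || (a != nullptr && b != nullptr));
    return len == 0 || std::memcmp(a, b, len) == 0;
}
```

    Only files for which the precheck returns NeedContentCompare would have to be read at all.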

     
  • Kimmo Varis

    Kimmo Varis - 2009-06-07

    Where does the second detection happen if we don't do your check? Reading from memory is a lot faster than reading from disk, especially in a folder compare, which has another thread accessing the disk too.

     
  • Matthias

    Matthias - 2009-06-08

    in IO.c
    isbinary = binary_file_p (current->buffer, current->buffered_chars);
    isbinary &= !isunicode(current->buffer, current->buffered_chars);

    where isunicode() is not working! It seems the leading BOM bytes are missing. It may be that the file is already open and we are not at read position 0, so it is examining the bytes after the BOM.
    I have to check where that is done.

    Speed is not a problem: up to about 100 MB the original is faster; above that, the byte compare is faster, as Windows doesn't have to swap as much memory, especially if you run several instances at the same time.
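
    To illustrate why the read position matters for BOM-based detection, here is a minimal sketch of a BOM check. DetectBom and the Bom enum are hypothetical names, not the isunicode() implementation from the WinMerge sources, and the sketch deliberately ignores UTF-32 (whose little-endian BOM starts with the same two bytes as UTF-16LE).

```cpp
#include <cassert>
#include <cstddef>

enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Look for a byte order mark at the very start of the buffer.
// The buffer must begin at file offset 0; a buffer that starts
// after the BOM will always report Bom::None.
Bom DetectBom(const unsigned char* buf, std::size_t len)
{
    assert(len == 0 || buf != nullptr);
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return Bom::Utf8;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return Bom::Utf16LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return Bom::Utf16BE;
    return Bom::None;
}
```

    If the same bytes are handed over starting three bytes later, the check has nothing left to find, which would match the behaviour suspected above.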

     
  • Matthias

    Matthias - 2009-06-08

    Now it's clear why isunicode() is not working correctly.
    Transform2FilesToUTF8()... creates UTF8 without BOM
    Result:
    CheckForInvalidUtf8() is missing,
    as with and without my patch, UTF8 files without a BOM that contain extended chars are detected as binary files.

     
  • Kimmo Varis

    Kimmo Varis - 2009-06-09

    I don't understand.
    > Transform2FilesToUTF8()... creates UTF8 without BOM
    That is fine, as the resulting data must be sent to diffutils, which doesn't know about BOM bytes.
    > CheckForInvalidUtf8() is missing
    How would adding it help? It would only tell us that the file is probably an 8-bit ASCII file.

    What would help with binary file compare is adding a new binary file compare engine. It would just do a byte-by-byte compare of memory blocks in a tight loop: basically what the quick compare engine does, but without handling the EOL and whitespace ignore options.

    Doing that would be a much better solution than trying to tweak the current compare engines. We could then tune that engine to compare binary files as fast as it can, with no need to care about text files at all.

    Of course, we would then need a routine to detect binary files first. It can be a simple check for zero bytes in the first/last 4 KB of the file data. But we would do this check only for files larger than 100 KB (or some other suitable size), so that the additional binary file check doesn't cost us the speed advantage.
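
    The proposal above could be sketched roughly as follows. All names here (ShouldUseBinaryEngine, FirstMismatch, kNoMismatch) are hypothetical, and the 100 KB and 4 KB defaults are simply the figures floated in this comment, not tuned values. The sketch works on in-memory buffers to stay self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

using Bytes = std::vector<unsigned char>;

static bool HasZeroByte(const unsigned char* p, std::size_t n)
{
    return n != 0 && std::memchr(p, 0, n) != nullptr;
}

// Route a file to the binary engine only if it is larger than minSize
// and a zero byte shows up in its first or last `window` bytes.
bool ShouldUseBinaryEngine(const Bytes& data,
                           std::size_t minSize = 100 * 1024,
                           std::size_t window = 4 * 1024)
{
    assert(window > 0);
    if (data.size() <= minSize)
        return false;                      // small file: skip the extra check
    const std::size_t span = std::min(window, data.size());
    return HasZeroByte(data.data(), span) ||
           HasZeroByte(data.data() + (data.size() - span), span);
}

const std::size_t kNoMismatch = static_cast<std::size_t>(-1);

// Tight byte-by-byte loop with no EOL or whitespace handling.
// Returns the offset of the first difference, or kNoMismatch if equal.
// If one buffer is a prefix of the other, the shorter length is returned.
std::size_t FirstMismatch(const Bytes& a, const Bytes& b)
{
    const std::size_t common = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < common; ++i)
        if (a[i] != b[i])
            return i;
    return a.size() == b.size() ? kNoMismatch : common;
}
```

    Note that a zero byte in the middle of a large file is not seen by this check; that is the price of sampling only the two ends.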

     
  • Matthias

    Matthias - 2009-06-09

    >How would adding it help? It would only tell us that the file is probably an 8-bit
    >ASCII file.
    UTF8 with extended chars will be detected as binary!
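
    For reference, a UTF-8 validity check of the kind being asked for might look like the sketch below. IsValidUtf8 is a hypothetical name and is not the CheckForInvalidUtf8() routine from the WinMerge sources. The idea is that a BOM-less buffer that passes this check could be treated as text rather than binary. A zero byte is itself valid UTF-8, so a check like this would complement a zero-byte test, not replace it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if the buffer is well-formed UTF-8: no stray or missing
// continuation bytes, no overlong forms, no surrogates, nothing above U+10FFFF.
bool IsValidUtf8(const unsigned char* s, std::size_t n)
{
    assert(n == 0 || s != nullptr);
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = s[i];
        if (lead < 0x80) { ++i; continue; }        // plain ASCII

        std::size_t extra;     // continuation bytes expected
        std::uint32_t cp;      // code point being decoded
        std::uint32_t minCp;   // smallest code point allowed for this length
        if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; minCp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; minCp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; minCp = 0x10000; }
        else return false;                         // stray continuation or invalid lead byte

        if (n - i - 1 < extra) return false;       // sequence cut off at end of buffer
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3Fu);
        }
        if (cp < minCp) return false;                     // overlong encoding
        if (cp > 0x10FFFF) return false;                  // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // UTF-16 surrogate
        i += extra + 1;
    }
    return true;
}
```

    An 8-bit file with extended characters usually fails this check, because a lone byte such as 0xE9 is not followed by the continuation bytes UTF-8 requires.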

     
  • Christian List

    Christian List - 2013-03-02
    • milestone: Trunk --> Not Scheduled
     
