Menu

#1266 Pyro is unnecessary slow

v3.0
closed
pmarcu
pyro (39)
2012-09-15
2008-07-08
zett42
No

Pyro is unnecessary slow when comparing files for patch creation.
We have an installer with ~9000 files (~600 MiB), where comparing the old and the updated version takes roughly an hour.

This can be speed up considerably by optimizing the BinderFileManager::CompareFiles() method.
I tried this by writing a binder extension by which I could reduce the patch creation time to just a few minutes!

This is the relevant part of the code. It should be as accurate as the original WiX code, but way faster:

--- SNIP ---

public override bool CompareFiles( string targetFile, string updatedFile )
{
// Speed-up by checking file size before actually opening files.

FileInfo fiTarget = new FileInfo( targetFile );
FileInfo fiUpdated = new FileInfo( updatedFile );

if( fiTarget.Length != fiUpdated.Length )
    return false;

/
// Trade comparison speed over accuracy: assume the files are identical, if size and last write time are same.
if( fiTarget.LastWriteTime == fiUpdated.LastWriteTime)
return true;
/
// Check if the files are binary different.
using( FileStream targetStream = File.OpenRead( targetFile ) )
{
using( FileStream updatedStream = File.OpenRead( updatedFile ) )
{
// Speed-up by using buffered read instead of Stream.ReadByte()
byte[] targetBuf = new byte[ 16 * 1024 ];
byte[] updatedBuf = new byte[ 16 * 1024 ];
int targetReadLen;
do
{
targetReadLen = targetStream.Read( targetBuf, 0, targetBuf.Length );
int updatedReadLen = updatedStream.Read( updatedBuf, 0, updatedBuf.Length );
if( targetReadLen != updatedReadLen )
return false;
for( int i = 0; i < targetReadLen; ++i )
if( targetBuf[ i ] != updatedBuf[ i ] )
return false;
}
while( targetReadLen > 0 );
}
}

return true;

}

--- SNAP ---

I even suggest that pyro gets a command line parameter to optionally speed-up the comparison process further by using the out-commented code to trade comparison speed over accuracy.
This trade-off has already been discussed on the WiX mailing list some time ago, but apparently it hasn't found its way into the WiX code.

Discussion

  • zett42

    zett42 - 2008-07-08

    BinderFileManager::CompareFiles optimization

     
  • zett42

    zett42 - 2008-07-08

    Logged In: YES
    user_id=1779395
    Originator: YES

    Argh, SourceForge killed the tabs.
    I'm attaching the source file for your convenience :)
    File Added: MxWixExtension.cs

     
  • pmarcu

    pmarcu - 2008-07-18

    Logged In: YES
    user_id=1612676
    Originator: NO

    In the current code, we already do the filesize check and FileStream does a buffered read behind the scenes for you already using a buffer of 4096 by default. I'm not comfortable with sacraficing accuracy for speed and with the ability to provide your own extension, you are able to implement the code that way. I think this type of thing is better handled by creating an extension than to have native support through a command line switch.

     
  • zett42

    zett42 - 2008-07-21

    Logged In: YES
    user_id=1779395
    Originator: YES

    Have you tried out the code on a reasonable large set of files?
    Even without sacrificing accuracy for speed the code actually runs significantly faster and takes less CPU time than the current code.
    The main performance boost seems to come from NOT using Stream.ReadByte() but instead doing a buffered read explicitly by using Stream.Read().
    I don't know excactly what's wrong with Stream.ReadByte() but if you enable "Show kernel times" in Task Manager you will see that during patch building, pyro spends much of the time in the kernel which may hint at some overhead involved with Stream.ReadByte().
    This doesn't happen with Stream.Read().

    Regarding file size comparison - I know the current code already does this, but only after opening the files. You can reduce some overhead by just reading the file size with FileInfo, which doesn't need to open the file. Though it should not be of that significiance compared to the Stream.ReadByte() optimization.

     
  • pmarcu

    pmarcu - 2008-08-22

    Logged In: YES
    user_id=1612676
    Originator: NO

    How did you arrive at "16 * 1024"? The CLR uses "4 * 1024"

    I submitted a change similar to what you suggested but dont have a large dataset to test this on. Please let me know.

     
  • SourceForge Robot

    This Tracker item was closed automatically by the system. It was
    previously set to a Pending status, and the original submitter
    did not respond within 15 days (the time period specified by
    the administrator of this Tracker).

     
MongoDB Logo MongoDB