Data Performance Tester (for data.zip)

Andrei
2019-03-23
2019-03-24
  • Andrei

    Andrei - 2019-03-23

    Ever since TR 0.6.4, the game data has been distributed in a zero-compression .ZIP archive because I "felt" it improved loading performance. This was not an original idea of mine: I have seen other games use the same technique, concatenating all their files into one big file for a performance benefit.

    That said, I never liked that this idea wasn't properly verified. So today I wrote a utility to actually verify it: DataPerfTest.

    Attached to this post is the source code (C++) and its build scripts. You need the PhysFS v.3+ dev library installed. (If you can already build TR 0.6.6.1 then all should be OK.)

    Build and usage example for Linux:

    $ ./build-on-linux.sh
    $ ./dataperftest-debug data.zip
    $ time ./dataperftest data.zip
    $ time ./dataperftest data-extracted/
    

    After a successful build, two executables will be available: the regular version, whose execution is meant to be timed, and the debug version, which prints a lot of diagnostic data.

    What DataPerfTest does is simple:

    1. Open the directory or the archive provided by the user.
    2. Recursively read all files into memory, one-by-one.
    3. For each file read, take its first and last byte and XOR them into a total sum.
    4. Print the total sum (and optionally the debug info).

    No files are modified in step 3, or in any other step. The XOR operation is meaningless and is only there to prove that work was done. Originally I considered doing CRC32 but decided against it, thinking it would drown out the time differences.
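
    The steps above can be sketched roughly as follows. This is not the attached dataperftest.cpp: it assumes std::filesystem (C++17) in place of PhysFS, so it only handles a plain directory and not a .ZIP archive, and the names read_whole_file and digest_directory are made up for illustration. Skipping empty files is also an assumption.

```cpp
// Sketch only: std::filesystem stands in for PhysFS, so this covers
// plain directories, not .ZIP archives. All names are hypothetical.
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <vector>

namespace fs = std::filesystem;

// Read one whole file into memory (step 2).
std::vector<char> read_whole_file(const fs::path& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// XOR the first and last byte of every regular file under root (step 3).
std::uint8_t digest_directory(const fs::path& root) {
    std::uint8_t digest = 0;
    for (const auto& entry : fs::recursive_directory_iterator(root)) {
        if (!entry.is_regular_file()) continue;
        const std::vector<char> bytes = read_whole_file(entry.path());
        if (bytes.empty()) continue;  // assumption: empty files contribute nothing
        digest ^= static_cast<std::uint8_t>(bytes.front());
        digest ^= static_cast<std::uint8_t>(bytes.back());
    }
    return digest;
}
```

    Calling digest_directory("data-extracted/") and printing the result as hex would play the role of the "Digest:" line, though this sketch is not guaranteed to produce the same value as the real tool.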

    Having run this test on Windows 7 64-bit (NTFS), I can confirm that indeed putting the game files into a zero-compression (store) .ZIP improves file access performance over bare disk... on Windows at least. Unsurprisingly, putting the same game files into a maximum-compression (9) .ZIP degrades performance.

    Now I'd like to see what the situation is on Linux (or other systems).

    For a fair test, take the data.zip file from a TR release and extract it, so that the directory and the archive contain exactly the same files.

    Finally, I'm interested in whether anyone has better ideas about what the test should do instead of steps 1-4 above to better test read performance, or whether there are objections to this method.

    $ ls
    build-on-linux.sh  data.zip          dataperftest.exe
    build-on-msys2.sh  data-0661         dataperftest-debug.exe
    data.md5           dataperftest.cpp  legalcode.txt
    
    $ time ./dataperftest.exe data.zip
    Digest: 4A
    
    real    0m0.351s
    user    0m0.015s
    sys     0m0.000s
    
    $ time ./dataperftest.exe data-0661/
    Digest: 4A
    
    real    0m0.624s
    user    0m0.000s
    sys     0m0.015s
    
     
  • Onsemeliot

    Onsemeliot - 2019-03-24

    I don't know if the feedback of the first command is a serious problem, but I get these results. So on my system, reading the zipped file seems to be faster too:

    $ ./build-on-linux.sh
    dataperftest.cpp: In function ‘unsigned char {anonymous}::digest_all()’:
    dataperftest.cpp:113:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
                     if (PHYSFS_readBytes(file, buffer.data(), buffer.size())
                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                         == buffer.size())
                         ^~~~~~~~~~~~~~~~
    dataperftest.cpp: In function ‘unsigned char {anonymous}::digest_all()’:
    dataperftest.cpp:113:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
                     if (PHYSFS_readBytes(file, buffer.data(), buffer.size())
                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                         == buffer.size())
                         ^~~~~~~~~~~~~~~~
    
    $ ./dataperftest-debug data.zip
    /events
    ...
    /textures/CodriverSigns/white/SquareRight.png
    Digest: 4A
    Number of files: 1891
    Number of dirs: 204
    Total space: 122838084 bytes (117 MiB)
    
    $ time ./dataperftest data.zip
    Digest: 4A
    
    real    0m0.146s
    user    0m0.048s
    sys     0m0.092s
    
    $ time ./dataperftest data-extracted/
    Digest: 4A
    
    real    0m0.855s
    user    0m0.056s
    sys     0m0.136s
    
     
  • Andrei

    Andrei - 2019-03-24

    I don't know if the feedback of the first command is a serious problem

    No, it's not a serious problem. I'm a bit surprised g++ 7.4.0 didn't report it to me.
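
    For context, the warning appears because PHYSFS_readBytes returns a signed 64-bit count (negative on failure), while buffer.size() is unsigned. One possible way to make the comparison explicit is sketched below. The helper name read_was_complete is hypothetical and this is not necessarily how dataperftest.cpp was or will be changed; std::int64_t stands in for the PhysFS return type so that the snippet builds without PhysFS.

```cpp
// Sketch only: read_was_complete is a hypothetical helper.
#include <cstddef>
#include <cstdint>

// `bytes_read` models a signed 64-bit read result where a negative
// value means failure; `wanted` is an unsigned size such as buffer.size().
bool read_was_complete(std::int64_t bytes_read, std::size_t wanted) {
    if (bytes_read < 0) return false;  // a failed read is never complete
    // Safe to convert now that the value is known to be non-negative.
    return static_cast<std::uint64_t>(bytes_read) == wanted;
}
```

    With a helper like this, the condition at line 113 could pass the read result and buffer.size() to it instead of comparing them directly.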

    So on my system reading the zipped file seems to be faster too:

    Looking good. Could you run the tests three times each please (3 with data.zip, then 3 with the extracted directory)? The results shouldn't change a lot but I'd like to see.

    Also, out of curiosity, what filesystem are you using? Ext4?
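
    Repeating a measurement shows how much the times vary between runs. A first run can be slower for reasons unrelated to the archive format, for example if the files are not yet in the operating system's cache, although whether that applies here is only a guess. The sketch below times a piece of work several times in-process. The helper time_runs is hypothetical and not part of DataPerfTest, which in this thread is timed externally with the shell's time command.

```cpp
// Sketch only: time_runs is a hypothetical helper, not part of DataPerfTest.
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Run `work` `repeats` times and return each run's wall-clock time in seconds.
std::vector<double> time_runs(const std::function<void()>& work,
                              std::size_t repeats) {
    std::vector<double> seconds;
    seconds.reserve(repeats);
    for (std::size_t i = 0; i < repeats; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return seconds;
}
```

    Comparing the individual times, rather than a single figure, makes it easier to tell a one-off slow run from a consistent difference.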

     
  • Onsemeliot

    Onsemeliot - 2019-03-24
    $ time ./dataperftest data.zip
    Digest: 4A
    
    real    0m0.143s
    user    0m0.044s
    sys     0m0.096s
    
    real    0m0.144s
    user    0m0.036s
    sys     0m0.108s
    
    $ time ./dataperftest data-extracted/
    Digest: 4A
    
    real    0m0.157s
    user    0m0.032s
    sys     0m0.116s
    
    real    0m0.153s
    user    0m0.052s
    sys     0m0.092s
    

    This time the results differ much less.

    And yes, I do use ext4.

     

    Last edit: Onsemeliot 2019-03-24
