Menu

#23 Mat_VarReadNext() is slow

master
closed-invalid
tbeu
None
5
2017-12-19
2014-01-09
No

mode(2)

In Scilab, the included test case is much slower in V5.4.1 than in V5.5-beta.

Digging into the functions shows that 99% of the time is spent in Mat_VarReadNext().

The Scilab gateway has not changed much between the two versions, so I think Mat_VarReadNext has been modified and slowed between matio V1.3.4 and V1.5.1 (at least for this test).

// Test case:
// Generate string data
Date=[];
Time=[];
for x=1:1000
Date=[Date;"123"];
Time=[Time;"456"];
end
// save as matlab file
savematfile("test.mat","Date","Time","-v6")
// load the matlab file and measure execution time
tic()
loadmatfile("test.mat")
toc()

Siclab related bug: http://bugzilla.scilab.org/show_bug.cgi?id=13143

Thank you for your time,
Paul

1 Attachments

Discussion

  • cch

    cch - 2014-01-09

    I can't recall off hand what the old code did and can't look at it right now. The issue here is that character data is stored as uint16. When reading back it converts to char. The current implementation is reading a single character at a time which is probably where the performance issue is. A better implementation could read many uint16 at once into a buffer and then loop the conversion.

    Actually, this looks like an issue that would occur with many data types and not just char. Double is the only one that does it well (if not converting types).

    I will see if I can look at this soon, but can't promise anything.

     
  • cch

    cch - 2014-01-11
     
  • cch

    cch - 2014-01-11

    Can you test the attached patch to see how that improves your performance? If the performance is still not good, please post some profile information that I can look at.

     
  • Paul Bignier

    Paul Bignier - 2014-01-13

    Could you please provide me with a libmatio.so / libmatio.so.2 / libmatio.so.0.1 file? I use a compiled version of matio for Scilab

     
  • cch

    cch - 2014-01-13

    Unfortunately not. I do not have access to a Linux system. You should easily be able to clone the repository apply the patch and compile.

     
  • tbeu

    tbeu - 2016-01-10
    • Group: 1.5.0 --> master
     
  • tbeu

    tbeu - 2016-01-11

    I created the MAT files in MATLAB and attached them for your convenience.

    Now, when I run

    test_mat readvar test_bug23_uncompressed.mat Time

    or

    test_mat readvar test_bug23_compressed.mat Time

    on current master (and w/o the read_char.patch applied) it is as quick as it should be. Therefore I am afraid I cannot reproduce this issue.

     

    Last edit: tbeu 2016-01-11
  • tbeu

    tbeu - 2016-01-11
    • status: open --> pending
     
  • Chad Smith

    Chad Smith - 2016-12-21

    I am encountering this issue too with the current head of master, 935e4b67df, when writing Matlab file v5 with zlib compression turned on.

    I ran gprof and valgrind to determine where things got slower between the two, and it's in the call to inflate. It takes the majority of the program's execution time (see attachments), when it used to take none.

    I do not see this issue in matio v1.5.2, and in fact v1.5.2 never calls inflate. I put a breakpoint on zlib's inflate function in gdb and it only gets hit on the latest version of matio, not v1.5.2.

    This is the stack when inflate is called:

    addr    file    fullname    func    level   line
    zlib_compression/inflate.c  zlib_compression/inflate.c  inflate 0   608
    matio/src/inflate.c matio/src/inflate.c InflateVarTag   1   258
    matio/src/mat5.c    matio/src/mat5.c    Mat_VarReadNextInfo5    2   6771
    matio/src/mat.c matio/src/mat.c Mat_VarReadNextInfo 3   1937
    matio/src/mat.c matio/src/mat.c Mat_VarReadInfo 4   1993
    matio/src/mat.c matio/src/mat.c Mat_VarWrite    5   2205
    
     

    Last edit: Chad Smith 2016-12-21
  • tbeu

    tbeu - 2016-12-21

    Thank you for your analyses. Can you please provide the MAT-file and the call (like test_mat readvar test_bug23_uncompressed.mat Time) to read the file and do the profiling.

     
  • Chad Smith

    Chad Smith - 2016-12-21

    I am trying to create a mat file, not read one.

    I found the source of the slowness. It was introduced in issue #35 in github, commit 1e00807968cc. I included a (very simple) patch that made it run at the same speed as before.

    For each write, the file is read to check if the variable name already exists in the file. I noticed my application gets slower each time it writes a new struct, which makes sense because it must load the entire file as it grows.

    Perhaps we could add a new option to the Mat_VarWrite API, or create a new function that does not check for a duplicate name?

     

    Last edit: Chad Smith 2017-01-03
  • tbeu

    tbeu - 2017-01-02

    Thanks for figuring it out. Since, it seems, you call Mat_VarWrite more than once, I could internally (in private mat_t structure) hold and sync a list of all variable names of the MAT file. This would make the call to Mat_VarReadInfo just to check if the variable already exists superfluous. What do you think?

     
  • Chad Smith

    Chad Smith - 2017-01-03

    I guess the assumption is that when matio reads the list of variable names for the first time, it will be the only process adding/deleting variables to/from that file, so its cached list of variable names is always accurate. It seems like a valid assumption, so I think that's a great idea.

     
  • tbeu

    tbeu - 2017-01-24

    The poor performance for repeated calls to Mat_VarWrite is fixed by 7f55a3d.

     
  • tbeu

    tbeu - 2017-12-19
    • status: pending --> closed-invalid
    • assigned_to: tbeu
     
  • tbeu

    tbeu - 2017-12-19

    As reported by http://bugzilla.scilab.org/show_bug.cgi?id=13143#c5 the bug can no longer be reproduced in Scilab >= 5.5.1.

     

Log in to post a comment.

MongoDB Logo MongoDB