Menu

#325 Data is copied when using pass by reference

SVN version
open
Tom
5
2009-08-26
2007-10-06
Tom
No

Data seems to be copied when using pass by reference with imported functions. I will attach a sample import function and m-file that demonstrate the behavior (at least in terms of runtime). The copying seems to take place because the input[].getReadWriteDataPointer() calls in ImportedFunctionDef::evaluateFunction() cause a detach() call in the QSharedDataPointer class containing the Array data. The attached patch uses const void ** instead of void ** for the values passed to avcall and uses input[].getDataPointer() to get a const void * when passing by reference. I tested it a bit, but it could break things horribly.

Discussion

  • Tom

    Tom - 2007-10-06

    Logged In: YES
    user_id=1619785
    Originator: YES

    File Added: fmtest.m

     
  • Tom

    Tom - 2007-10-06

    Logged In: YES
    user_id=1619785
    Originator: YES

    The reftest.c and fmtest.m contain code to call a function using pass-by-value and pass-by-reference. In each case, the call is timed and the effect on the local variable is noted. The data passed to the import function is large (~256MB) so that the copying times are substantial. The following is the output before and after the patch:

    Before patch

    Passing by value.
    Before call, data(12345) = 1.000000
    Call took 0.852000 seconds.
    After call, data(12345) = 1.000000

    Passing by reference.
    Before call, data(12345) = 1.000000
    Call took 0.858000 seconds.
    After call, data(12345) = 100.000000

    After patch

    Passing by value.
    Before call, data(12345) = 1.000000
    Call took 0.837000 seconds.
    After call, data(12345) = 1.000000

    Passing by reference.
    Before call, data(12345) = 1.000000
    Call took 0.000000 seconds.
    After call, data(12345) = 100.000000

    Finally, I think that it would be good to add Robot Chicken references and quotes to the source code. It's good for karma.
    File Added: reftest.c

     
  • Tom

    Tom - 2007-10-21

    Logged In: YES
    user_id=1619785
    Originator: YES

    The initial patch has a problem: when two variables have the same data pointer, but are unique variables in FreeMat (no copy-on-write has yet occurred), then there will be no copy-on-write done if they are both passed by reference. For example, the following

    a = zeros(100);
    b = a;
    importedfunc(a,b);

    would be the same data pointer for both a and b even though the user clearly expected b to be a copy of a. Thus, any changes to either a or b will change the values for both a and b. I am attaching a patch that corrects this problem by forcing a copy-on-write when more than one parameter shares a data pointer address. The patch fixes the above case, but is still problematic for the following case:

    a = zeros(100);
    importedfunc(a,a);

    Here, assuming that both the first and second parameters are passed by reference, a copy will be made for the second a. When the assignment is made back to the a in the FreeMat scope, only the changes made to the second a (the copy) will be retained. I'm not sure how to tell if the same variable is passed multiple times (it looks look I would need access to the expression tree, but evaluateFunction does not have that currently). This would be a strange use of imported functions, but it seems that it would not behave as expected. Note that the current version of FunctionDef.cpp (without this patch) exhibits the same behavior (it makes copies for both the first and second a, but ultimately only the changes made to the second a are visible).

    I will attach new fmtest.m and reftest.c files that show the behaviors before and after the patch (reftest sets data1(12345) to 100 and data2(12346) to 200). Here is the output:

    Before:

    Passing by value.
    Before call, data1(12345) = 1.000000, data2(12346) = 1.000000
    Call took 1.748000 seconds.
    After call, data1(12345) = 1.000000, data1(12346) = 1.000000
    After call, data2(12345) = 1.000000, data2(12346) = 1.000000

    Passing two unique variables by reference.
    Before call, data1(12345) = 1.000000, data2(12346) = 1.000000
    Call took 1.369000 seconds.
    After call, data1(12345) = 100.000000, data1(12346) = 1.000000
    After call, data2(12345) = 1.000000, data2(12346) = 200.000000

    Passing two variables with shared data by reference.
    Before call, data1(12345) = 1.000000, data2(12346) = 1.000000
    Call took 1.321000 seconds.
    After call, data1(12345) = 100.000000, data1(12346) = 1.000000
    After call, data2(12345) = 1.000000, data2(12346) = 200.000000

    Passing a single variable by reference as two parameters.
    Before call, data1(12345) = 1.000000, data1(12345) = 1.000000
    Call took 1.373000 seconds.
    After call, data1(12345) = 1.000000, data1(12346) = 200.000000
    After call, data2(12345) = 1.000000, data2(12346) = 200.000000

    After:

    Passing by value.
    Before call, data1(12345) = 1.000000, data2(12346) = 1.000000
    Call took 1.519000 seconds.
    After call, data1(12345) = 1.000000, data1(12346) = 1.000000
    After call, data2(12345) = 1.000000, data2(12346) = 1.000000

    Passing two unique variables by reference.
    Before call, data1(12345) = 1.000000, data2(12346) = 1.000000
    Call took 0.000000 seconds.
    After call, data1(12345) = 100.000000, data1(12346) = 1.000000
    After call, data2(12345) = 1.000000, data2(12346) = 200.000000

    Passing two variables with shared data by reference.
    Before call, data1(12345) = 1.000000, data2(12346) = 1.000000
    Call took 0.736000 seconds.
    After call, data1(12345) = 100.000000, data1(12346) = 1.000000
    After call, data2(12345) = 1.000000, data2(12346) = 200.000000

    Passing a single variable by reference as two parameters.
    Before call, data1(12345) = 1.000000, data1(12345) = 1.000000
    Call took 0.760000 seconds.
    After call, data1(12345) = 1.000000, data1(12346) = 200.000000
    After call, data2(12345) = 1.000000, data2(12346) = 200.000000

     
  • Tom

    Tom - 2007-10-21

    Logged In: YES
    user_id=1619785
    Originator: YES

    File Added: passbyref.patch

     
  • Tom

    Tom - 2007-10-21

    FunctionDef.cpp patch

     
  • Tom

    Tom - 2007-10-21
     
  • Tom

    Tom - 2007-10-21

    Logged In: YES
    user_id=1619785
    Originator: YES

    File Added: fmtest.m

     
  • Tom

    Tom - 2007-10-21
     
  • Tom

    Tom - 2007-10-21

    Logged In: YES
    user_id=1619785
    Originator: YES

    File Added: reftest.c

     
  • Samit Basu

    Samit Basu - 2008-02-22

    Logged In: YES
    user_id=878533
    Originator: NO

    See? Now you can fix it directly in SVN... =)

    Samit

     
  • Samit Basu

    Samit Basu - 2008-02-22
    • assigned_to: nobody --> thomasbenson
     
  • Samit Basu

    Samit Basu - 2009-08-25

    Hi Thomas,

    I'd like to do something with this patch/bug... Suggestions?

    Samit

     
  • Samit Basu

    Samit Basu - 2009-08-25
    • status: open --> pending
     
  • Tom

    Tom - 2009-08-26

    I never came up with a robust enough solution for this. The final version that I had explored was to check the ref count on the QSharedDataPointer for the workspace variable and only pass-by-reference if the count was one. That would avoid the aliasing problems and still give good performance as long as there weren't several workspace variables backed by the same data. However, as it turned out, the reference count was almost always at least two or three in the imported function call, even if the data only had one workspace variable referencing it. The ref count was getting bumped in several places internally during method calls, etc. That was with the 3.x code base though and may not be the case with the 4.x code base, so I would have to check it again.

     
  • Tom

    Tom - 2009-08-26
    • status: pending --> open
     

Log in to post a comment.