
#2 performance slightly low

need-info
performance (1)
1.0.0
2014-09-24
2014-09-22
John Hunt
No

As reported by "user1129812" in the first comment to
http://stackoverflow.com/a/25959122/3565696:
using "ndpar_arrayfun" reduces the running time to about 65% of that of "cellfun". It seems slightly slower than "parcellfun".

Discussion

  • John Hunt

    John Hunt - 2014-09-22

    This is strange, as the speedups are usually higher than 3 on a quad-core machine (under Linux). What was the operating system?

    The comparison with parcellfun (reduction to 60%, see the 4th comment to http://stackoverflow.com/a/25945953/3565696) is also strange, since ndpar_arrayfun was intended to be slightly faster. Did the 60% reduction include everything, including the mat2cell and cell2mat lines?

     
  • John Hunt

    John Hunt - 2014-09-22
    • status: open --> need-info
     
  • John Hunt

    John Hunt - 2014-09-22

    Reproduced here, even worse for my sample function. With such a fast F, the overhead for parallelization is too high. From another comment of "user1129812" it seems that fast functions were used here also. To be confirmed.

     
  • Lawrence Tsang

    Lawrence Tsang - 2014-09-23

    I am "user1129812".

    My operating system is "Ubuntu 12.04 amd64". I am using Octave 3.8.1 on an "intel i5-2500" quadcore desktop.

    The running time reduction to 60%/65% of that of "cellfun" does include everything including the mat2cell and cell2mat lines (for "cellfun" and "parcellfun" only), and several lines of matrix multiplication/addition (I think not time critical).

    In fact, F = @(a,B) sum(bsxfun(@times, a, B),2)'; and B has several hundred rows and several tens of thousands of columns.

    But I am not sure whether "F" is too fast for effective parallelization.

     

    Last edit: Lawrence Tsang 2014-09-23
  • John Hunt

    John Hunt - 2014-09-23

    With
    k = 300; #rows A
    m = 20000; # columns A and B
    n = 300; # rows B

    F(A(1,:), B); takes about 40 ms, and the speedup is about 3.3 (parcellfun calculation) or 2.0 (ndpar), compared to a serial (for) version. There is too much bookkeeping in ndpar for such fast functions; I'll investigate that.
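
The single-call timing mentioned above can be reproduced with a sketch along these lines. It assumes random test matrices (the actual data is not given in the thread), and `nrows` is a hypothetical subset size chosen only to keep the serial reference loop short.

```octave
k = 300;     % rows of A
m = 20000;   % columns of A and B
n = 300;     % rows of B
A = rand(k, m);   % assumption: random data stands in for the real matrices
B = rand(n, m);
F = @(a, B) sum(bsxfun(@times, a, B), 2)';

% Time one call of F on a single row of A.
tic;
r = F(A(1, :), B);
t_single = toc;
printf("single call: %.1f ms\n", 1000 * t_single);

% Serial reference: apply F row by row (only nrows rows, hypothetical subset).
nrows = 10;
result = zeros(nrows, n);
tic;
for i = 1:nrows
  result(i, :) = F(A(i, :), B);
end
t_serial = toc;
printf("serial loop over %d rows: %.2f s\n", nrows, t_serial);
```

Timings depend on the machine and on memory pressure, so the numbers printed here are only comparable with other runs on the same system.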

    But your results are different. Increasing m and n too much increases memory usage (as it runs more F executions at once) and can ruin the parallelization advantage. Could that be the case?

     
  • Lawrence Tsang

    Lawrence Tsang - 2014-09-24

    I don't know how you define "speedup". Here I write down my cases for your comparison.

    k = 10;
    m = 60000;
    n = 785;
    F = @(a,B) sum(bsxfun(@times, a, B),2)';
    f = @(a) F(a,B);   % F with B bound, used by the cell-based variants
    %----------
    switch (method_choice)
      case 1   % ndpar_arrayfun, working on the matrices directly
        pkg load ndpar;
        nproc = 3;
        result = ndpar_arrayfun(nproc, F, A, B, "IdxDimensions", [1, 0], "CatDimensions", [1], "VerboseLevel", 0);
      case 2   % parcellfun, one cell per row of A
        A_cell = mat2cell(A, ones(1,size(A,1)));
        pkg load parallel;
        nproc = 3;
        result_cell = parcellfun(nproc, f, A_cell, "UniformOutput", false, "VerboseLevel", 0);
        result = cell2mat(result_cell);
      case 3   % serial cellfun, one cell per row of A
        A_cell = mat2cell(A, ones(1,size(A,1)));
        result_cell = cellfun(f, A_cell, "UniformOutput", false);
        result = cell2mat(result_cell);
    endswitch
    %----------

    The "switch" statement is executed 100 times. The running times are:

    1. When method_choice = 1, running time = about 130 seconds,
    2. When method_choice = 2, running time = about 120 seconds,
    3. When method_choice = 3, running time = about 200 seconds.

    Hope it helps.
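
The relation between these running times, the "reduction to 65%/60%" figures and a speedup factor can be worked out as in the sketch below. It assumes speedup means serial time divided by parallel time, which is one common definition; the variable names are illustrative.

```octave
% Running times reported above, in seconds.
t_serial     = 200;   % cellfun (method_choice = 3)
t_ndpar      = 130;   % ndpar_arrayfun (method_choice = 1)
t_parcellfun = 120;   % parcellfun (method_choice = 2)

% Assumed definition: speedup = serial time / parallel time.
speedup_ndpar      = t_serial / t_ndpar;
speedup_parcellfun = t_serial / t_parcellfun;

% The "reduction to X%" figure is the inverse ratio.
percent_ndpar      = 100 * t_ndpar / t_serial;
percent_parcellfun = 100 * t_parcellfun / t_serial;

printf("ndpar_arrayfun: speedup %.2f, running time %.0f%% of cellfun\n", ...
       speedup_ndpar, percent_ndpar);
printf("parcellfun:     speedup %.2f, running time %.0f%% of cellfun\n", ...
       speedup_parcellfun, percent_parcellfun);
```

With three worker processes, an ideal speedup would be close to 3, so both figures are well below what the hardware could deliver.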

     
  • John Hunt

    John Hunt - 2014-09-24

    So your speedups (with respect to the serial version) are 200/130 ≈ 1.5 for ndpar and 200/120 ≈ 1.7 for parcellfun. These speedups are too low. Memory usage might be the culprit. To check that, you can run "free -h" on the command line. Comparing its output while parcellfun is executing with its output while cellfun is executing would be interesting.
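
To get a feel for the memory argument before measuring with "free -h", the size of the temporary created by each call of F can be estimated. This is only a rough sketch: it assumes double-precision (8-byte) elements, that bsxfun(@times, a, B) materializes a full n-by-m array, and that each of the nproc workers holds one such temporary at the same time. It ignores B itself and any copies made when data is passed to the workers.

```octave
m = 60000;   % columns of A and B (from the test case above)
n = 785;     % rows of B
nproc = 3;   % number of worker processes used in the test case
bytes_per_double = 8;

% One n-by-m temporary per call of F (assumption, see lead-in).
temp_bytes  = n * m * bytes_per_double;
% Worst case if every worker holds its temporary simultaneously.
total_bytes = nproc * temp_bytes;

printf("temporary per call of F: %.1f MB\n", temp_bytes / 1e6);
printf("temporaries for %d workers: %.1f MB\n", nproc, total_bytes / 1e6);
```

If the estimate is a large fraction of the free memory reported by "free -h", swapping or cache pressure becomes a plausible explanation for the low speedups.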

     
