Hey,

The nasty bug in the strided GEMV is now fixed.

I'm available on Wednesday for the code uniformization session. We should also be on IRC at the same time, in case we run into a situation we haven't discussed. I have a couple of questions about a standardized way of naming the numeric type of a matrix/vector: sometimes it's NumericT, sometimes T, sometimes TYPE... What about NumericType everywhere? Similar questions are likely to come up, so it's probably best to be able to chat in real time while making the code style uniform.
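
For what it's worth, a tiny sketch of what that convention could look like; the helper below is made up and only the template parameter name matters:

    #include "viennacl/vector.hpp"

    // Hypothetical helper, only to illustrate the proposed naming:
    // the same spelling (NumericType) everywhere instead of T / TYPE / NumericT.
    template<typename NumericType>
    void scale_in_place(viennacl::vector<NumericType> & x, NumericType alpha)
    {
      x *= alpha;  // scale by a host scalar of the same numeric type
    }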

We must also remember to sort out
https://github.com/viennacl/viennacl-dev/issues/71
https://github.com/viennacl/viennacl-dev/issues/77
https://github.com/viennacl/viennacl-dev/issues/66
https://github.com/viennacl/viennacl-dev/issues/2

Philippe


2014-08-17 19:36 GMT+02:00 Philippe Tillet <phil.tillet@gmail.com>:
The same template using macros can be used for any benchmark. It's pretty concise and maintainable!
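
For illustration, a minimal sketch of the idea (hypothetical macro and names, not the actual benchmark code); it assumes viennacl::backend::finish() is the right call for synchronizing with asynchronous kernels:

    #include <chrono>
    #include <iostream>
    #include "viennacl/backend/memory.hpp"

    // Hypothetical macro, for illustration only: runs an expression, waits
    // for the backend to finish, and prints the elapsed wall-clock time.
    #define BENCHMARK_OP(LABEL, OP)                                        \
      do {                                                                 \
        auto benchmark_start = std::chrono::high_resolution_clock::now();  \
        OP;                                                                \
        viennacl::backend::finish();  /* wait for asynchronous kernels */  \
        std::chrono::duration<double> benchmark_elapsed =                  \
          std::chrono::high_resolution_clock::now() - benchmark_start;     \
        std::cout << LABEL << ": " << benchmark_elapsed.count()            \
                  << " s" << std::endl;                                    \
      } while (0)

    // Usage (y, A, x being ViennaCL objects):
    //   BENCHMARK_OP("gemv", y = viennacl::linalg::prod(A, x));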

Philippe


2014-08-17 13:50 GMT+02:00 Karl Rupp <rupp@iue.tuwien.ac.at>:

Hi,


>     * nmf only implements matrix<T>, but in principle matrix_base<T> should
>     work (since no custom kernel is called, I believe)


    NMF uses a custom kernel and thus only works with OpenCL. A
    generalization to matrix_base should be straight-forward, yes. I
    should be able to do it for the release.


The kernel it uses is:

        template <typename StringType>
        void generate_nmf_el_wise_mul_div(StringType & source, std::string const & numeric_string)
        {
          source.append("__kernel void el_wise_mul_div( \n");
          source.append("          __global "); source.append(numeric_string); source.append(" * matrix1, \n");
          source.append("          __global const "); source.append(numeric_string); source.append(" * matrix2, \n");
          source.append("          __global const "); source.append(numeric_string); source.append(" * matrix3, \n");
          source.append("          unsigned int size) \n");
          source.append("{ \n");
          source.append("  for (unsigned int i = get_global_id(0); i < size; i += get_global_size(0)) \n");
          source.append("  { \n");
          source.append("    "); source.append(numeric_string); source.append(" val = matrix1[i] * matrix2[i]; \n");
          source.append("    "); source.append(numeric_string); source.append(" divisor = matrix3[i]; \n");
          source.append("    matrix1[i] = (divisor > ("); source.append(numeric_string);
          source.append(")0.00001) ? (val / divisor) : ("); source.append(numeric_string); source.append(")0; \n");
          source.append("  } \n");
          source.append("} \n");
        }

So, the layout of the matrix shouldn't matter, indeed. It would also be
pretty easy to have this kernel generated by the generator, as it can be
represented by the expression tree
matrix1 = select(matrix3 > 0.00001, element_div(element_prod(matrix1, matrix2), matrix3), cast<T>(0)).
However, we're running out of time, so I won't port it now. But we should
keep in mind that it would be a trivial thing to do.
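
For reference, the element-wise update performed by that kernel corresponds to the following host-side loop (a plain C++ sketch of the same arithmetic with hypothetical names; the real operation of course runs as the OpenCL kernel above):

    #include <cstddef>
    #include <vector>

    // Host-side sketch of what el_wise_mul_div computes per element
    // (illustration only, not part of ViennaCL).
    template<typename NumericType>
    void el_wise_mul_div_host(std::vector<NumericType>       & matrix1,
                              std::vector<NumericType> const & matrix2,
                              std::vector<NumericType> const & matrix3)
    {
      for (std::size_t i = 0; i < matrix1.size(); ++i)
      {
        NumericType val     = matrix1[i] * matrix2[i];
        NumericType divisor = matrix3[i];
        // guard against division by (near-)zero, exactly as the kernel does
        matrix1[i] = (divisor > NumericType(0.00001)) ? (val / divisor) : NumericType(0);
      }
    }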

The same student who ported the FFT code to multiple backends will also take care of porting NMF. He's pretty quick, so it should be done by the release.

However, I'd refrain from integrating this into the generator for now, because it is totally non-critical in terms of overall performance. We can port it within the OpenCL backend later, under full control, once we have more confidence in the stability of the generator (no pun intended).



        - We should definitely have a discussion on matrix padding, which
        is no longer required anywhere in ViennaCL, as far as I know. I am
        in favor of making size()==internal_size() by default. That's not
        the point of the e-mail, but we should have a discussion on what
        we should do with it!


    Getting rid of the padding would certainly remove the traps of using
    fast_copy() on a matrix. Other than that, I don't think it has a
    substantial influence on the code because internal_size() is still
    needed for dealing with ranges.

    There may be an influence on certain bandwidth-limited operations,
    though, as for example a matrix addition may lead to bank conflicts
    (or channel conflicts, whatever...) when accessing GPU RAM for
    certain matrix sizes. Before making a decision on the padding issue,
    we should run some benchmarks to see whether there is an impact.


Well, one thing I'm sure of is that we should give users the possibility
to disable padding if needed (for memory constraints), or (probably even
better) to choose the padding size.

Apparently picking the default is not an easy choice for us, given the many things to consider. Thus, making this user-customizable is most likely the way to go, so that we only have to worry about choosing the 'best' default layout :-)


I completely agree that removing
padding will have a harmful influence for ldsize=some_weird_number.

which makes things so complicated... ;-)



However, we certainly don't need to pad both size1() and size2(). Padding
size2() for row-major matrices, and size1() for column-major matrices,
will not cause any performance regression.

Indeed.
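
To make the padding discussion concrete: the logical and the allocated (padded) sizes are exposed separately, and for row-major storage only the leading dimension, internal_size2(), enters the index computation. A minimal sketch (assuming the usual size1()/internal_size1() accessors; the printed values depend on the padding policy in use):

    #include <iostream>
    #include "viennacl/matrix.hpp"

    int main()
    {
      viennacl::matrix<float> A(100, 100);   // row-major by default

      // logical size vs. allocated (possibly padded) size:
      std::cout << "size:          " << A.size1()          << " x " << A.size2()          << std::endl;
      std::cout << "internal_size: " << A.internal_size1() << " x " << A.internal_size2() << std::endl;

      // For row-major storage, entry (i, j) sits at i * internal_size2() + j,
      // so only the leading dimension internal_size2() matters for alignment.
      // This is also why fast_copy() on the raw buffer sees the padded layout.
      return 0;
    }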



    There are a couple of more things to be completed for the release.
    They are essentially all listed in the issue tracker and have the
    1.6.0 milestone assigned to them, except for the unification of the
    coding style. When are you available for tackling that together? I'm
    available after Monday.


I'm available from today until Friday. Afterwards, I'll be unavailable
for any significant work for quite some time. I will still be available
for critical work such as fixing correctness issues in the generated
code, but overall I'll be busy designing my PhD course/research plans.
What I plan to do before leaving:
- Fix the GEMM performance regression of the fallback kernel
- Refurbish the benchmark code for dense operations
- Rewrite the matrix-vector tests

Not much more. This is my last week in France, so I want to spend some
time with my family. I've also been having a really hard time lately
when adding support for vector types, ranges and strides inside the
generated kernels, so I feel like taking a short break before my PhD
begins...

Sure, make sure you get to the US sufficiently relaxed; who knows when you'll have the next opportunity to relax again ;-) Shall we schedule the coding style unification for Wednesday? We should be done within a few hours, I guess.

Best regards,
Karli