Hi Karl Rupp,
I am using ViennaCL as part of a CFD research code. I am hoping to keep everything on the GPU.
The 3D vector cross product (V1 x V2) and the mixed product, i.e. the scalar triple product (V3 . (V1 x V2)), are currently processed on the CPU; I would like these to be processed on the GPU as well.
Could you please add these methods to the vector class?
Hi ngagewj,
the problem with such tiny vectors is that in almost all cases you do not want to process them on the GPU as-is (i.e. one after another). The issue is the time it takes to launch a compute kernel, which is on the order of a few microseconds on GPUs. To really get good performance, you need to collect as many vectors as possible and compute these products concurrently within the same compute kernel. As this is typically a very problem-specific operation, it is virtually impossible for us to provide all possible operations.
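To make the batching idea concrete: a single OpenCL compute kernel that processes many cross products per launch could look roughly like the sketch below. The packed layout (three consecutive doubles per vector) and all names are assumptions for illustration only, not existing ViennaCL functionality.

#pragma OPENCL EXTENSION cl_khr_fp64 : enable  // double precision needs this on most devices

// c[i] = a[i] x b[i] for N three-component vectors, all in one kernel launch
__kernel void batched_cross(__global const double * a,  // packed as x0,y0,z0, x1,y1,z1, ...
                            __global const double * b,
                            __global       double * c,
                            unsigned int N)
{
  for (unsigned int i = get_global_id(0); i < N; i += get_global_size(0))
  {
    double ax = a[3*i+0], ay = a[3*i+1], az = a[3*i+2];
    double bx = b[3*i+0], by = b[3*i+1], bz = b[3*i+2];
    c[3*i+0] = ay*bz - az*by;
    c[3*i+1] = az*bx - ax*bz;
    c[3*i+2] = ax*by - ay*bx;
  }
}

A scalar triple product V3 . (V1 x V2) could be handled the same way by accumulating a dot product inside the same loop.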
If you have suggestions on how to provide this and related functionality through library routines, please let us know.
Best regards,
Karli
Thanks. Here is the kind of operation I mean:
class Vec3D {
  public:
    double comp[3];
    ...
    Vec3D cross(const Vec3D &right) const;
};

Vec3D Vec3D::cross(const Vec3D &right) const {
  // temp = (*this) x right
  Vec3D temp;
  temp.comp[0] =  comp[1]*right.comp[2] - comp[2]*right.comp[1];
  temp.comp[1] = -comp[0]*right.comp[2] + comp[2]*right.comp[0];
  temp.comp[2] =  comp[0]*right.comp[1] - comp[1]*right.comp[0];
  return temp;
}

boost::numeric::ublas::matrix<Vec3D> A(1000, 1000);
Vec3D x;
for (int i = 0; i < 1000; i++)
  for (int j = 0; j < 1000; j++)
    A(i, j) = A(i, j).cross(x);
There are lots of these operations in my CFD codes.
Hey,
okay, so the operation is an element-wise cross product over an array of 3D vectors. Currently we are limited to vector<T> and matrix<T> with T being a primitive scalar type (float, double, int, etc.), so all I can advise for the time being is to use a custom kernel for this. Examples of how to do that can be found in examples/tutorial/custom-kernels.cpp
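As a rough host-side sketch of that approach (written from memory of the tutorial, so the exact includes and calls should be checked against custom-kernels.cpp; the flattened-array layout and all names are illustrative assumptions, and the build must be OpenCL-enabled): the Vec3D components are copied into flat arrays, the kernel source sketched earlier in this thread is compiled once, and a single kernel launch then handles all cross products.

#include <vector>
#include "viennacl/vector.hpp"
#include "viennacl/ocl/backend.hpp"

// the batched_cross kernel source sketched earlier in this thread, as a string
static const char * cross_program_source = "...";

void cross_all(std::vector<double> const & a,   // 3*N entries, packed x,y,z per vector
               std::vector<double> const & b,
               std::vector<double>       & c)   // result, 3*N entries
{
  cl_uint N = static_cast<cl_uint>(a.size() / 3);

  // transfer the flattened data to the GPU
  viennacl::vector<double> gpu_a(a.size()), gpu_b(b.size()), gpu_c(c.size());
  viennacl::copy(a.begin(), a.end(), gpu_a.begin());
  viennacl::copy(b.begin(), b.end(), gpu_b.begin());

  // compile the program (once) and fetch the kernel
  viennacl::ocl::program & prog = viennacl::ocl::current_context().add_program(cross_program_source, "cross_program");
  viennacl::ocl::kernel  & k    = prog.get_kernel("batched_cross");

  // one kernel launch handles all N cross products
  viennacl::ocl::enqueue(k(gpu_a, gpu_b, gpu_c, N));

  // copy the result back to the host
  viennacl::copy(gpu_c.begin(), gpu_c.end(), c.begin());
}

Whether this pays off depends on keeping the flattened data on the GPU across many such operations rather than copying it back and forth around each kernel launch.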
Best regards,
Karli
Thanks. I hope ViennaCL can one day support vector<T> and matrix<T> with T being a tiny vector or matrix type.