I'm just learning to work with ViennaCL. My first tries on the CPU worked fine; now I am trying to use OpenCL. However, I can't manage to get data onto the GPU: while the matrices seem to be created, they don't get any contents:
After this, […] is 0 but I believe it should be 1. Also, if I copy […] back to […], […] is incorrect as well.
I'm compiling this with […]. I am using OS X with the provided OpenCL driver.
What am I doing wrong here?
Edit: Removing […] fixes the problem, but is not what I want.
Whoops - sorry for messing up the formatting. I thought that inline codes would work.
Hi,
I reran your example on my machine, everything runs correctly.
Maybe the values aren't yet copied to the GPU? Could you please add […] after the call to copy() and let me know whether the error remains?
Thanks and best regards,
Karli
Hi Karli,
Thank you for your reply. Unfortunately, this didn't change anything.
Are there any known pitfalls when using OS X? I will try this on another machine later today and see if that makes any difference.
Thanks,
Markus
Hi Markus,
hmm, the only known issue with OS X to date is that on older versions the OpenCL SDK has some problems when handles are stored in static variables, but we have a workaround for this. Another user reported issues in some examples when using a Retina display, the reason being too little GPU RAM left. However, this should not affect your simple snippet.
Do the examples run correctly? Particularly those ending in "-opencl"?
Best regards,
Karli
noname:benchmarks Markus$ ./openclbench-opencl
----------------------------------------------
Device Info
----------------------------------------------
CL Device Vendor ID: 16918016
CL Device Name: GeForce GT 650M
CL Driver Version: CLH 1.0
--------------------------------
CL Device Max Compute Units: 2
CL Device Max Work Group Size: 1024
CL Device Global Mem Size: 1073741824
CL Device Local Mem Size: 49152
----------------------------------------------
----------------------------------------------
## Benchmark :: OpenCL performance
----------------------------------------------
-------------------------------
# benchmarking single-precision
-------------------------------
Time for building scalar kernels: 0
Time for building vector kernels: 0.000444
Time for building matrix kernels: 0.003909
Time for building compressed_matrix kernels: 1.6e-05
Time for 100000 entry accesses on host: 0.000466
Time per entry: 4.66e-09
Result of operation on host: 104839
Time for 100000 entry accesses via OpenCL: 2.50028
Time per entry: 2.50028e-05
Result of operation via OpenCL: 4.59163e-36
-------------------------------
# benchmarking double-precision
-------------------------------
Time for building scalar kernels: 1e-06
Time for building vector kernels: 0.000706
Time for building matrix kernels: 0.005705
Time for building compressed_matrix kernels: 4.3e-05
Time for 100000 entry accesses on host: 0.000467
Time per entry: 4.67e-09
Result of operation on host: 105171
Time for 100000 entry accesses via OpenCL: 5.92087
Time per entry: 5.92087e-05
Result of operation via OpenCL: 6.95322e-305
I assume this means they didn't:
Result of operation on host: 105171
Result of operation via OpenCL: 6.95322e-305
Hmm, this looks really broken. Did you run any sample OpenCL examples outside ViennaCL? It seems to me that your OpenCL installation is somehow broken, as no data gets written to the device.
Nope. Looks like my OpenCL is broken. I didn't think about that possibility since I didn't do anything to Apple's OpenCL installation. Thanks for your help.
Thanks, Markus, for letting us know. This will certainly help us if similar troubles show up again for other users.
Best regards,
Karli