From: mashilamani s. <kan...@gm...> - 2020-07-06 18:16:44
|
Hi, I am new to OpenCL. I am trying to port a large amount of C code to run on a GPU with minimal modification to the original C code. To that end, I am trying to call ordinary C functions from my kernel functions. I want to know whether that is possible with OpenCL 1.2, which you have implemented. Assume that I take care that each kernel has its own address space, its own stack, etc., which don't clash with other kernels. I have an NVIDIA GTX 1060 6GB. Thanks much, mani PS: I have tried OpenMP and failed. In OpenMP, it is impossible to call ordinary functions from a function which is executing on the "target" device. |
From: Pekka J. <pek...@tu...> - 2020-04-03 10:04:06
|
PoCL is a portable open source (MIT-licensed) implementation of the OpenCL standard (1.2 with some 2.0 features supported). In addition to being an easily portable multi-device open-source OpenCL implementation, a major goal of this project is improving the interoperability of diverse OpenCL-capable devices by integrating them into a single centrally orchestrated platform.

Upstream PoCL currently supports various CPUs, NVIDIA GPUs via libcuda, HSA-supported GPUs and TCE ASIPs (experimental, see: http://openasip.org). It is also known to have multiple (private) ports.

Release Highlights
------------------

* Support for Clang/LLVM 10.0
* An easy to use kernel profiling feature
* Plenty of fixes and performance improvements

Please note that there's an official PoCL "maintenance policy" in place. This text describes the policy and how you can get your favourite project that uses OpenCL to remain regression free in future PoCL releases:

http://portablecl.org/docs/html/maintainer-policy.html

Acknowledgements
----------------

Most of the code that landed in the PoCL code base during this release cycle was produced for the needs of research projects funded by various sources. The Customized Parallel Computing research group of Tampere University, Finland would like to thank the ECSEL JU project FitOptiVis (project number 783162) and HSA Foundation for funding most of the development work in this release. Much appreciated!

Links
-----

Home page: http://portablecl.org/
This announcement: http://portablecl.org/downloads/ANNOUNCEMENT
Change log: http://portablecl.org/downloads/CHANGES
Download: http://portablecl.org/download.html

--
Pekka |
From: Michal B. (TAU) <mic...@tu...> - 2020-04-01 18:36:26
|
Hi,

Pocl's CPU driver does not touch the affinity by default; in other words, it leaves it to the kernel. By default it creates as many threads as there are logical CPU cores; this can be limited with the POCL_MAX_PTHREAD_COUNT environment variable. If you want to force Pocl to set the affinity of threads, set the POCL_AFFINITY environment variable to 1.

As for:

> When subdevices are created using CL_DEVICE_PARTITION_EQUALLY, the number of utilised cores seems to be one less than actually specified

This has already been discussed in https://github.com/pocl/pocl/issues/716. The short version is that while kernel commands are split and processed on all driver threads, the other commands (e.g. clEnqueueReadBuffer) are never split and are processed in a single thread.

Regards,
-- mb

________________________________
From: Srijeeta Maity <sri...@gm...>
Sent: Wednesday, April 1, 2020 5:49 PM
To: poc...@li... <poc...@li...>
Subject: [pocl-devel] Subdevices and CPU affinity mismatch

Hello,

I made some observations while checking the CPU core affinity after creating OpenCL subdevices using PoCL.

1. I created 8 subdevices on a platform with 8 CPU cores (Odroid XU4) and assigned some tasks to different subdevices. I checked with htop that the device affinity of those subdevices is not constant for the whole runtime (i.e. it is reset from time to time).
2. Some of those subdevices seem to be set to the same affinity. This could also be a side effect of the refresh rate in htop, so that I am not able to see when the affinity has changed.
3. When subdevices are created using CL_DEVICE_PARTITION_EQUALLY, the number of utilised cores seems to be one less than actually specified (i.e. partitioning equally into subdevices each having 4 compute units will actually use only 3 CPU cores).

While looking for a solution I found that a similar bug was reported for OpenCL in 2013: https://software.intel.com/en-us/forums/opencl/topic/377784

Can you please share some pointers to solve this issue?

Thanks and regards,
Srijeeta |
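The two environment variables named in the reply above can be set per process rather than globally. The following is a minimal Python sketch of that idea, not part of PoCL: the helper name `pocl_cpu_env` and its parameters are hypothetical, and only the variable names POCL_MAX_PTHREAD_COUNT and POCL_AFFINITY come from the message.

```python
import os


def pocl_cpu_env(max_threads=None, pin_threads=False, base_env=None):
    """Return a copy of the environment with PoCL CPU-driver settings applied.

    Hypothetical helper: only the two variable names come from the thread.
    """
    env = dict(os.environ if base_env is None else base_env)
    if max_threads is not None:
        if max_threads < 1:
            raise ValueError("max_threads must be at least 1")
        # Limits how many worker threads the CPU driver creates.
        env["POCL_MAX_PTHREAD_COUNT"] = str(max_threads)
    if pin_threads:
        # Asks PoCL to set thread affinity itself instead of leaving it to the OS.
        env["POCL_AFFINITY"] = "1"
    return env
```

A launcher could then pass the result to a child process, for example `subprocess.run(["./my_opencl_app"], env=pocl_cpu_env(max_threads=4, pin_threads=True))`, where `./my_opencl_app` stands in for whatever OpenCL program is being tested.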
From: Srijeeta M. <sri...@gm...> - 2020-04-01 14:49:48
|
Hello,

I made some observations while checking the CPU core affinity after creating OpenCL subdevices using PoCL.

1. I created 8 subdevices on a platform with 8 CPU cores (Odroid XU4) and assigned some tasks to different subdevices. I checked with htop that the device affinity of those subdevices is not constant for the whole runtime (i.e. it is reset from time to time).
2. Some of those subdevices seem to be set to the same affinity. This could also be a side effect of the refresh rate in htop, so that I am not able to see when the affinity has changed.
3. When subdevices are created using CL_DEVICE_PARTITION_EQUALLY, the number of utilised cores seems to be one less than actually specified (i.e. partitioning equally into subdevices each having 4 compute units will actually use only 3 CPU cores).

While looking for a solution I found that a similar bug was reported for OpenCL in 2013: https://software.intel.com/en-us/forums/opencl/topic/377784

Can you please share some pointers to solve this issue?

Thanks and regards,
Srijeeta |
From: Michal B. (TAU) <mic...@tu...> - 2020-04-01 13:57:40
|
Hello, RC2 with a few fixes is now available: https://github.com/pocl/pocl/releases/tag/v1.5-RC2 Regards, -- mb ________________________________ From: Michal Babej (TAU) Sent: Monday, March 30, 2020 3:22 PM To: poc...@li... <poc...@li...> Subject: Pocl 1.5 Hi all, LLVM 10 is out, so Pocl is preparing a new release. Please help us by testing and reporting your results as instructed here: https://github.com/pocl/pocl/wiki/Release-testing-of-pocl-1.5 Thanks, -- mb |
From: Michal B. (TAU) <mic...@tu...> - 2020-03-30 15:01:42
|
Hi all, LLVM 10 is out, so Pocl is preparing a new release. Please help us by testing and reporting your results as instructed here: https://github.com/pocl/pocl/wiki/Release-testing-of-pocl-1.5 Thanks, -- mb |
From: Michal B. (TAU) <mic...@tu...> - 2019-10-31 11:32:21
|
Hello,

> It seems to me that there is no tool for (cross) compiling the kernel (cl file to binary).

There is no tool, because every OpenCL implementation (pocl, AMD SDK etc.) uses its own binary format. There is no single format which every implementation accepts. Pocl has a tool for compiling an OpenCL source to a binary, poclcc, which simply calls clCreateProgramWithSource/clBuildProgram/clGetProgramInfo(...CL_PROGRAM_BINARIES...) and stores the result in a file. That file is only loadable by pocl.

> clang should be able to compile kernels, however, in practice it doesn't

Clang can compile OpenCL to LLVM bitcode. However, it cannot output binaries which OpenCL implementations can load, because (again) every implementation uses its own binary format. That's why you must always compile through the implementation, not through Clang.

There is one somewhat common binary format, SPIR-V, but it currently works with only a few OpenCL implementations. And it probably won't help you, because a SPIR-V binary needs to be compiled for the target device by the implementation, so the target OpenCL implementation will need a compiler anyway.

> I compiled pocl with the host_cpu='riscv64' (this flag apparently specifies the target CPU)

Sounds like you tried to compile some version of pocl with autotools, which is ancient and we don't provide any support for it anymore.

> The following post seems to be outdated

It is outdated, and wrong. RISC-V is supported by upstream LLVM (from my quick googling), and pocl hasn't used autotools for a few years now.

So to clear up the confusion around cross-compiling here:

1) Cross compiling the implementation (pocl) itself for another device: this may or may not work with pocl, I haven't tried. The reason is that pocl can now be compiled natively on devices as weak as a Raspberry Pi with 1GB RAM. If you have a RISC-V device with at least 1GB RAM, you can probably compile pocl with LLVM and the CPU driver natively on it.

2) "Cross compiling the OpenCL sources": this doesn't exist. There is something similar, offline compilation, which is compiling OpenCL sources for a device which is not present in the system. However, this has to be supported by the implementation. Pocl doesn't support this at the moment.

What you can do with pocl is:

1) on a sufficiently powerful RISC-V machine, compile a native build with LLVM and use it to compile sources to pocl binaries; and
2) on weaker RISC-V machines (<1GB RAM), compile a compiler-less version of pocl that only supports loading from binaries, and use the binaries from 1) here.

Regards,
-- mb

________________________________
From: Raad Bahmani via pocl-devel <poc...@li...>
Sent: Thursday, October 31, 2019 11:36 AM
To: poc...@li... <poc...@li...>
Cc: Raad Bahmani <r.b...@ya...>
Subject: [pocl-devel] Cross-compiling for RISC-V

Dear all,

I'm trying to cross-compile OpenCL (on x86 for RISC-V). The kernel (cl file) should be loaded from a binary, and this is the problem. It seems to me that there is no tool for (cross) compiling the kernel (cl file to binary). I compiled pocl with the host_cpu='riscv64' (this flag apparently specifies the target CPU) but the generated binary cannot be loaded on RISC-V.

According to its documentation (https://clang.llvm.org/docs/UsersManual.html#opencl-features) clang should be able to compile kernels, however, in practice it doesn't. The following post seems to be outdated (and it has also not been accepted as an answer): https://stackoverflow.com/questions/41204224/configuring-pocl-for-riscv/42800248#42800248

It would be a great help if you could give some information (best case would be a simple working example :) ) about how binaries of OpenCL kernels can be generated for RISC-V.

Best,
Raad |
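The "compile through the implementation, store the binaries, load them later" workflow described above can be sketched from Python. This is only an illustration under stated assumptions: it uses the PyOpenCL bindings, which the thread does not mention, and it is not how poclcc itself is written. The function names and the `kernel.<n>.bin` file naming are hypothetical. The platform name "Portable Computing Language" is the one PoCL reports in the clpeak log elsewhere in this archive.

```python
from pathlib import Path

POCL_PLATFORM_NAME = "Portable Computing Language"


def _pocl_context(platform_name=POCL_PLATFORM_NAME):
    """Create a context over all devices of the named platform (needs PyOpenCL)."""
    import pyopencl as cl  # imported lazily so the file helpers work without it

    # Raises StopIteration if no platform with that name is installed.
    platform = next(p for p in cl.get_platforms() if p.name == platform_name)
    devices = platform.get_devices()
    return cl, cl.Context(devices), devices


def build_binaries(source):
    """Build OpenCL C source through the implementation; return one binary per device."""
    cl, ctx, _devices = _pocl_context()
    program = cl.Program(ctx, source).build()
    return program.get_info(cl.program_info.BINARIES)


def program_from_binaries(binaries):
    """Recreate a program from binaries that the same implementation produced."""
    cl, ctx, devices = _pocl_context()
    return cl.Program(ctx, devices, binaries).build()


def save_binaries(binaries, directory, stem="kernel"):
    """Write each binary to <directory>/<stem>.<index>.bin and return the paths."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, blob in enumerate(binaries):
        path = directory / f"{stem}.{index}.bin"
        path.write_bytes(blob)
        paths.append(path)
    return paths


def load_binaries(directory, stem="kernel"):
    """Read back the binaries written by save_binaries, in device order."""
    binaries = []
    index = 0
    while (Path(directory) / f"{stem}.{index}.bin").exists():
        binaries.append((Path(directory) / f"{stem}.{index}.bin").read_bytes())
        index += 1
    return binaries
```

As the reply stresses, files produced this way are implementation-specific: binaries built by pocl are only expected to load in pocl, and only for a matching device.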
From: Raad B. <r.b...@ya...> - 2019-10-31 09:36:18
|
Dear all,

I'm trying to cross-compile OpenCL (on x86 for RISC-V). The kernel (cl file) should be loaded from a binary, and this is the problem. It seems to me that there is no tool for (cross) compiling the kernel (cl file to binary). I compiled pocl with the host_cpu='riscv64' (this flag apparently specifies the target CPU) but the generated binary cannot be loaded on RISC-V.

According to its documentation (https://clang.llvm.org/docs/UsersManual.html#opencl-features) clang should be able to compile kernels, however, in practice it doesn't. The following post seems to be outdated (and it has also not been accepted as an answer): https://stackoverflow.com/questions/41204224/configuring-pocl-for-riscv/42800248#42800248

It would be a great help if you could give some information (best case would be a simple working example :) ) about how binaries of OpenCL kernels can be generated for RISC-V.

Best,
Raad |
From: Hartmann, O. <o.h...@wa...> - 2019-10-26 13:20:30
|
POCL 1.4 on FreeBSD seems to have introduced a bug which is not present in POCL 1.3. Running either FreeBSD 12.1-PRE, 12-STABLE (recent version) or CURRENT (recent version), POCL 1.4 (compiled with CLANG/LLVM 9) fails when running devel/clpeak with:

    [...]
    Platform: Portable Computing Language
      Device: Intel(R) Core(TM) i5-4200M CPU @ 2.50GHz
        Driver version  : 1.4 (FreeBSD)
        Compute units   : 4
        Clock frequency : 2501 MHz

        Global memory bandwidth (GBPS)
          float   : clEnqueueNDRangeKernel (-63)
          Tests skipped

        Single-precision compute (GFLOPS)
          clCreateBuffer (-61)
          Tests skipped

        No half precision support! Skipped

        Double-precision compute (GFLOPS)
          clCreateBuffer (-61)
          Tests skipped

        Integer compute (GIOPS)
          clCreateBuffer (-61)
          Tests skipped

        Transfer bandwidth (GBPS)
          enqueueWriteBuffer         : 0.00
          enqueueReadBuffer          : 0.00
          enqueueMapBuffer(for read) : 0.00
            memcpy from mapped ptr   : inf
          enqueueUnmap(after write)  : 0.00
            memcpy to mapped ptr     : inf

        Kernel launch latency : dlopen("/root/.cache/pocl/kcache/EK/MAFMMCDPPMBCLMDMPBFICNBJFELHOLLNFBIDP/global_bandwidth_v1_local_offset/256-1-1-goffs0-smallgrid/global_bandwidth_v1_local_offset.so") failed with 'Undefined symbol "clGetPlatformInfo"'.
    note: missing symbols in the kernel binary might be reported as 'file not found' errors.
    Abort (core dumped)

The problem doesn't reveal itself using POCL 1.3. I need some advice to track the problem down.

Kind regards,
oh |
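The numbers in parentheses in the log above are OpenCL error codes. As a reading aid, here is a small Python sketch that maps them to their names from the Khronos `CL/cl.h` header; the table is only a subset, and the function name `decode_cl_errors` is hypothetical. It says what each code means, not why pocl returned it here.

```python
import re

# Subset of the error constants defined in the Khronos OpenCL header (CL/cl.h).
CL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -61: "CL_INVALID_BUFFER_SIZE",
    -63: "CL_INVALID_GLOBAL_WORK_SIZE",
}

# Matches fragments such as "clCreateBuffer (-61)" in clpeak-style output.
_CALL_WITH_CODE = re.compile(r"\b(cl[A-Z]\w*)\s*\((-?\d+)\)")


def decode_cl_errors(log_text):
    """Return (api_call, code, symbolic_name) for each 'clXxx (code)' in the log."""
    return [
        (call, int(code), CL_ERROR_NAMES.get(int(code), "unknown"))
        for call, code in _CALL_WITH_CODE.findall(log_text)
    ]
```

Applied to the log, this reads the `clEnqueueNDRangeKernel (-63)` line as CL_INVALID_GLOBAL_WORK_SIZE and the `clCreateBuffer (-61)` lines as CL_INVALID_BUFFER_SIZE.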
From: Michal B. (TAU) <mic...@tu...> - 2019-10-14 10:25:26
|
Hello,

Pocl 1.4 has just been released.

Feature Highlights
------------------

* Support for Clang/LLVM 9.0
* Support for LLVM older than 6.0 has been removed (6.0 is still supported).
* It's possible to build relocatable Pocl
* Improved SPIR and SPIR-V support for CPU device

Links
-----

Home page: http://portablecl.org/
Change log: http://portablecl.org/downloads/CHANGES
Download: http://portablecl.org/download.html

Regards,
-- mb |
From: Enrique G. <enr...@gm...> - 2019-09-30 15:20:30
|
Hi,

I'm doing the build in a virtual machine on VirtualBox (Ubuntu 18.04). My hardware:

GPU: GeForce GTX 1060
CPU: i7-8750H
RAM: 16 GB

The host OS is Windows 10; the VM uses 6 cores and 10 GB of RAM, and I cannot give the VM more resources.

--
Regards,
Enrique González |
From: Benson M. <ben...@em...> - 2019-09-30 15:12:35
|
Hi,

You may want to indicate what platform (operating system and chip) you are building on, as well as the error messages you get.

OpenCL optimizations are not always portable. Most people have focused on portability across different GPUs, but portability of CPU performance using POCL is also of interest.

Benson |
From: Enrique G. <enr...@gm...> - 2019-09-30 14:46:02
|
Hello everyone,

I am posting this message to the list because I thought that developers or other users could reply if they read it. I am sorry for also sending a private email.

I am doing my final degree project, in which I have to find ways to optimize OpenCL code through iterative optimizations. I thought that POCL (because of the tool description) could help me with this work, but after reading the documentation [1] on the official website, I have not found how to use the tool. I only see the usual usage of gcc, or at least so it seems to me.

Also, when building the project from GitHub [2], it gives me the error I mentioned in my last message, which says that it does not find the clang executables, even though I have installed Clang and LLVM from the LLVM Project [3]. With the newest versions of these tools I do not have the "clang" command, but I do have the "clang-10" command, the newest one (version 10.0). I do not know why the build does not detect clang on my PC; when I install other things they detect the 10.0 clang version, but this build does not.

If someone can give me some help, I would appreciate it.
Thanks in advance.

References:
[1] http://portablecl.org/docs/html/using.html
[2] https://github.com/pocl <https://github.com/pocl/pocl>
[3] https://github.com/llvm/llvm-project

--
Regards,
Enrique González |
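When a build reports that it cannot find clang although a versioned binary such as clang-10 is installed, it can help to list which Clang executables are actually on PATH before filing a report. The following is a minimal Python sketch for that check; the function name is hypothetical, and it does not reproduce pocl's own detection logic, which the thread does not describe.

```python
import os
import re
import shutil

# Matches "clang" and versioned names such as "clang-10" or "clang-9.0",
# but not related tools such as "clang-format" or "clang++".
_CLANG_NAME = re.compile(r"^clang(-\d+(\.\d+)*)?$")


def find_clang_executables(path=None):
    """Return {name: full_path} for every clang or clang-<version> on PATH."""
    search_path = path if path is not None else os.environ.get("PATH", "")
    found = {}
    for directory in search_path.split(os.pathsep):
        if not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            if name in found or not _CLANG_NAME.match(name):
                continue
            # shutil.which also verifies that the file is executable.
            full = shutil.which(name, path=directory)
            if full:
                found[name] = full
    return found
```

On a machine like the one described, the result would presumably contain a `clang-10` entry but no plain `clang`, which is useful detail to include alongside the exact error message.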
From: Pekka J. (T. <pek...@tu...> - 2019-09-30 12:43:49
|
Hi Enrique, It's not possible to help you based on this description of your problem. Our group is rather busy at the moment so we cannot generally answer to private support requests with short latency, so please send your questions to this list instead of private emails. If you add adequate description of your problem and post it to this mailing list, also other users/developers of POCL might be able to help you. Thanks, Pekka On 30.9.2019 12.22, Enrique González wrote: > Dear all, > > I installed POCL (and an old LLVM version, so it requires one between > 4.0 and 8.0). I have read the documentation at their page and it does > not seems to say anything about how you can use this tool for optimize > code, my boss neither has find anything, and we are looking for some > site where we can learn how to use it and which flags we can prove for > diferente results, because POCL looks like the tool we need for OpenCL > pipeline optimizations. > > Could you help us with this problem? > > Thanks in advance. > > > _______________________________________________ > pocl-devel mailing list > poc...@li... > https://lists.sourceforge.net/lists/listinfo/pocl-devel > -- Pekka |
From: Enrique G. <enr...@gm...> - 2019-09-30 09:23:15
|
Dear all,

I installed POCL (and an old LLVM version, since it requires one between 4.0 and 8.0). I have read the documentation on the project's page and it does not seem to say anything about how you can use this tool to optimize code. My boss has not found anything either, and we are looking for some site where we can learn how to use it and which flags we can try for different results, because POCL looks like the tool we need for OpenCL pipeline optimizations.

Could you help us with this problem?

Thanks in advance. |
From: Michal B. (TAU) <mic...@tu...> - 2019-09-18 11:31:10
|
Hi Andreas,

> Thanks for following up on the memory leaks. I've done a run on our end to investigate, and I've found that 1.4 still consumes considerably more memory than the Intel ICD, but it seems there has been some measure of improvement from 1.2 to 1.4:

That's quite possible, for multiple reasons. It could still be a memleak in some code path my tests don't cover. But it could also be just memory consumption from pocl's design, or memory fragmentation from pocl's use of malloc/free. I have put some effort into keeping pocl free of memory *leaks*, but AFAIK nobody has optimized pocl for memory *usage*.

But if you're interested in trying something out, PR 768 has some larger changes WRT memory, that could improve memory usage in certain scenarios.

-- mb |
From: Andreas K. <li...@in...> - 2019-09-18 03:00:09
|
Hi Michal,

"Michal Babej (TAU)" <mic...@tu...> writes:
>> valgrind. Were you able to reproduce that on your end?
>
> Yes; I commented in the issue.

Thanks very much to you and Pekka for taking a look. We'll continue to pursue this and will share what we find.

>> Possible memory leak in POCL:
>
> https://gitlab.tiker.net/inducer/pytential/issues/131
>
> We have a buildbot which tracks memory leaks by building pocl with
> Address Sanitizer (cmake -DENABLE_ASAN=ON). I have discovered that due
> to some faulty logic in cmake/ctest, some memleaks went unnoticed;
> this has now been fixed in release_1_4 & master. Ofc it's possible
> that it's a different leak; the test coverage is not full.
>
> Unfortunately i don't have any good advice for finding memory leaks
> when using pocl in Python. ASan may be impossible to use, and valgrind
> will probably report a lot of false positives (and be slow). Also i'm
> not sure it's possible to force Python to release an object, which
> makes it even more problematic. With C/C++ it's much simpler - if you
> call clUnloadPlatformCompiler() at the end of your program, and you've
> properly released all cl_* objects, pocl will additionally release all
> LLVM related data, including long-lived static data structures; after
> this there should only be a single memory leak, from LLVM signal
> handlers.

Thanks for following up on the memory leaks. I've done a run on our end to investigate, and I've found that 1.4 still consumes considerably more memory than the Intel ICD, but it seems there has been some measure of improvement from 1.2 to 1.4:

https://gitlab.tiker.net/inducer/pytential/issues/131

Best,
Andreas |
From: Michal B. (TAU) <mic...@tu...> - 2019-09-11 11:49:50
|
Hi Andreas,

> valgrind. Were you able to reproduce that on your end?

Yes; I commented in the issue.

> Possible memory leak in POCL: https://gitlab.tiker.net/inducer/pytential/issues/131

We have a buildbot which tracks memory leaks by building pocl with Address Sanitizer (cmake -DENABLE_ASAN=ON). I have discovered that, due to some faulty logic in cmake/ctest, some memleaks went unnoticed; this has now been fixed in release_1_4 & master. Of course it's possible that it's a different leak; the test coverage is not complete.

Unfortunately I don't have any good advice for finding memory leaks when using pocl in Python. ASan may be impossible to use, and valgrind will probably report a lot of false positives (and be slow). Also, I'm not sure it's possible to force Python to release an object, which makes it even more problematic. With C/C++ it's much simpler: if you call clUnloadPlatformCompiler() at the end of your program, and you've properly released all cl_* objects, pocl will additionally release all LLVM related data, including long-lived static data structures; after this there should only be a single memory leak, from LLVM signal handlers.

Regards,
-- mb |
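Since ASan and valgrind are awkward to use from Python, one coarse alternative (a suggestion of this sketch, not something proposed in the thread) is to repeat the suspect workload and watch whether the process's peak resident set size keeps climbing. The function name is hypothetical, `step` stands for whatever workload is being examined, and the standard-library `resource` module used here is available on Unix only.

```python
import resource


def peak_rss_samples(step, iterations):
    """Call `step` repeatedly and record the process's peak RSS after each call.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS, so
    only compare samples taken on the same machine.
    """
    samples = []
    for _ in range(iterations):
        step()
        samples.append(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    return samples
```

Peak RSS never decreases, so a flat tail in the samples suggests that memory use has settled, while steady growth across many iterations hints at a leak or at fragmentation. It cannot tell those two apart, nor say whether the memory is held by Python, pocl or LLVM.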
From: Andreas K. <li...@in...> - 2019-09-10 15:06:24
|
Hi Michal, all,

"Michal Babej (TAU)" <mic...@tu...> writes:
> LLVM 9.0 is nearing release, so it's also time for a new pocl release.
> I've created RC1 on friday but already a few bugfixes appeared, so it's RC2:
>
> https://github.com/pocl/pocl/releases/tag/v1.4-RC2
>
> Please test and report your results here:
> https://github.com/pocl/pocl/wiki/Release-testing-of-pocl-1.4

Thank you for all the work you are putting into pocl!

I wanted to ask about this issue which I reported a while ago: https://github.com/pocl/pocl/issues/757

It seems indicative of a possible miscompilation, with a simple reproducer (essentially a vector copy) that gets flagged by valgrind. Were you able to reproduce that on your end?

Another potential issue that we're tracking on our end is a possible memory leak in POCL: https://gitlab.tiker.net/inducer/pytential/issues/131

Thanks again,
Andreas |
From: Michal B. (TAU) <mic...@tu...> - 2019-09-09 17:01:35
|
Hello everyone,

LLVM 9.0 is nearing release, so it's also time for a new pocl release. I created RC1 on Friday, but a few bugfixes have already appeared, so it's now RC2:

https://github.com/pocl/pocl/releases/tag/v1.4-RC2

Please test and report your results here:
https://github.com/pocl/pocl/wiki/Release-testing-of-pocl-1.4

Regards,
-- mb |
From: Pekka J. (T. <pek...@tu...> - 2019-04-04 14:45:35
|
Pocl is a portable open source (MIT-licensed) implementation of the OpenCL standard (1.2 with some 2.0 features supported). In addition to being an easily portable multi-device open-source OpenCL implementation, another major goal of this project is improving performance portability of OpenCL programs with the kernel compiler and the task runtime, reducing the need for target-dependent manual optimizations.

Upstream pocl currently supports various CPU devices, NVIDIA GPUs via libcuda, HSA-supported GPUs and TCE ASIPs (experimental, see http://openasip.org), with various known private adopters as well.

Release Highlights
------------------

* Support for Clang/LLVM 8.0.
* Support ICD on OSX.
* A lot of major and minor bug fixes and internal code cleanups.

Notes
-----

* Support code for LLVM versions older than 6.0 will be removed at the beginning of the next release cycle to clean up the code base. If you need support for older LLVM versions in future pocl releases and wish to maintain it (run a buildbot and fix issues), let us know!
* Support for Vecmathlib has been removed.

Please note that there's an official pocl "maintenance policy" in place. This text describes the policy and how you can get your favourite project that uses OpenCL to remain regression free in future pocl releases:

http://portablecl.org/docs/html/maintainer-policy.html

Acknowledgements
----------------

Most of the code that landed in the pocl code base during this release cycle was produced for the needs of research projects funded by various sources. The Customized Parallel Computing research group of Tampere University, Finland would like to thank the Academy of Finland (funding decision 297548), Business Finland (FiDiPro project StreamPro, 1846/31/2014), ECSEL JU project FitOptiVis (project number 783162) and HSA Foundation for funding most of the development work in this release. Much appreciated!

Links
-----

Home page: http://portablecl.org/
This announcement: http://portablecl.org/downloads/ANNOUNCEMENT
Change log: http://portablecl.org/downloads/CHANGES
Download: http://portablecl.org/download.html

-- Pekka |
From: Pekka J. (T. <pek...@tu...> - 2019-04-02 11:39:42
|
Hi all, Time to test a new pocl release! Please help us shake out any remaining regressions by testing and reporting your results as instructed here: https://github.com/pocl/pocl/wiki/Release-testing-of-pocl-1.3 BR, -- Pekka |
From: Timo B. <tim...@gm...> - 2019-03-14 10:53:45
|
Hi, I have opened an issue and summarised the discussion so far in https://github.com/pocl/pocl/issues/701 Best wishes Timo -- Timo Betcke Professor of Computational Mathematics University College London Department of Mathematics E-Mail: t.b...@uc... Tel.: +44 (0) 20-3108-4068 |
From: Pekka J. (T. <pek...@tu...> - 2019-03-14 10:20:10
|
Hi Timo, Can you please open an issue of this, it's easier to track in Github? Thanks, Pekka |
From: Timo B. <tim...@gm...> - 2019-03-14 00:50:09
|
Hi,

I have pinned down the next failed test. It still seems related to the multi-indexing, even with your bugfixed version. The corresponding gist is here:

https://gist.github.com/tbetcke/0bf7e12a2f3ab8032339cc38b8441b6e

At the end of the kernel all entries in shapeIntegral should have the value 1.0. However, while shapeIntegral[0][0] is correct, shapeIntegral[1][0] is not. If I move the second print statement for shapeIntegral[1][0] into the for loop, the variables are correctly updated.

Just something for context: the actual kernel from which this example is derived computes a finite element integral on a triangle. The test values are from the test space and the trial values from the domain space. Via C macros I am adapting the dimensions of the arrays to the actual number of test and trial functions. The crash happens for trial dimension 1 and test dimension 3.

Thanks again for your help. I am excited about getting Pocl to work with our software.

Best wishes

Timo

On Wed, 13 Mar 2019 at 23:23, Timo Betcke <tim...@gm...> wrote:

> Hi Michal,
>
> thanks for the bugfix. The crashes have now disappeared and more tests are
> passing with your bugfix version. However, several unit tests still fail
> that work with AMD and Intel. Briefly looking at the results I see lots of
> nan entries in the pocl output. I will try to pin this down more and then
> report back to you.
>
> Best wishes
>
> Timo
>
> On Mon, 11 Mar 2019 at 10:50, Michal Babej (TAU) <mic...@tu...> wrote:
>
>> Hello,
>>
>> I remember trying to fix this bug last year, but then I got sidetracked
>> by other things. (BTW it would be preferable if you reported bugs as GitHub
>> issues in the future.)
>>
>> Anyway, I've hopefully fixed it. Can you test your program with the master
>> branch from https://github.com/franz/pocl?
>>
>> Regards,
>>
>> -- mb
>> ------------------------------
>> *From:* Timo Betcke <tim...@gm...>
>> *Sent:* Friday, March 8, 2019 3:48:34 AM
>> *To:* Portable Computing Language development discussion
>> *Subject:* Re: [pocl-devel] POCL Crash in vmovaps operation
>>
>> Dear Pekka,
>>
>> I have now cooked up a small example that crashes in vmovaps. The gist is
>> available here (uses PyOpenCL to run):
>>
>> https://gist.github.com/tbetcke/b4da01465b587e85cc88801aafdced0a
>>
>> The example is fairly nonsensical and was derived by reducing a crashing
>> kernel as far as possible while retaining the crash.
>> It runs fine under Intel CPU OpenCL on a Xeon and Rocm OpenCL on an AMD
>> GPU. My platform is Ubuntu 18.04 with llvm 6. If necessary
>> I can create an environment with updated llvm, but would like to avoid it
>> (unless it is llvm 6 related). Pocl is the most recent git master.
>>
>> The code crashes at the following assembler instructions:
>>
>>    0x00007fffe02575e3 <+195>: xor r9d,r9d
>>    0x00007fffe02575e6 <+198>: xor r10d,r10d
>>    0x00007fffe02575e9 <+201>: nop DWORD PTR [rax+0x0]
>>    0x00007fffe02575f0 <+208>: mov QWORD PTR [rdx+r9*1],0x0
>> => 0x00007fffe02575f8 <+216>: vmovaps XMMWORD PTR [rdi+r9*1-0x10],xmm0
>>    0x00007fffe02575ff <+223>: mov QWORD PTR [rdi+r9*1],0x0
>>    0x00007fffe0257607 <+231>: vmovaps XMMWORD PTR [rdx+r9*1-0x10],xmm0
>>    0x00007fffe025760e <+238>: vmovupd xmm1,XMMWORD PTR [rdi+r9*1-0x8]
>>    0x00007fffe0257615 <+245>: vaddpd xmm1,xmm1,XMMWORD PTR [rdx+r9*1-0x8]
>>    0x00007fffe025761c <+252>: vmovupd XMMWORD PTR [rdx+r9*1-0x8],xmm1
>>    0x00007fffe0257623 <+259>: mov r8,r11
>>    0x00007fffe0257626 <+262>: sar r8,0x20
>>    0x00007fffe025762a <+266>: lea rsi,[r8+r8*2]
>>
>> Removing any of the for loops or the localResult variable (or removing
>> its __local attribute) leads to the kernel working on Pocl.
>> It would be great to get to the source of this. Please let me know if you
>> need more information from me.
>>
>> Best wishes
>>
>> Timo
>>
>> On Wed, 6 Mar 2019 at 21:21, Timo Betcke <tim...@gm...> wrote:
>>
>> Hi Pekka,
>>
>> thanks for your hints and the link. I had one buffer in the kernel call
>> that had a cast from a float type to a vector type. I have fixed this. But
>> the segfault remains. In the next few days I will try to cook up a simple
>> example that produces the segfault. Fortunately, the kernel itself is not
>> too complicated, so I should be able to reduce it.
>>
>> Best wishes
>>
>> Timo
>>
>> On Wed, 6 Mar 2019 at 10:20, Pekka Jääskeläinen (TAU) <pek...@tu...> wrote:
>>
>> Yes, now that I look at it more closely,
>> your stack trace looks _very_ much like the common data alignment
>> issues people have. I think this might be worth a FAQ item somewhere.
>>
>> https://stackoverflow.com/questions/5983389/how-to-align-stack-at-32-byte-boundary-in-gcc
>>
>> On 6.3.2019 8.45, Pekka Jääskeläinen (TAU) wrote:
>> > Hi Timo,
>> >
>> > Shooting in the dark here, but just yesterday I debugged a similar
>> > looking issue which was caused by an illegal cast in the source code
>> > from float* to float4*. It trusted that the alignment was still fine,
>> > which it wasn't after vectorization. A very target specific programming
>> > error which many ocl targets can easily hide.
>> >
>> > If this is something else, we need a test case, the smaller the better,
>> > to help you here.
>> > Before opening an issue though, please test with the latest master and
>> > LLVM 8.
>> >
>> > Pekka
>> >
>> > ------------------------------------------------------------------------
>> > *From:* Timo Betcke <tim...@gm...>
>> > *Sent:* Tuesday, March 5, 2019 11:27:12 PM
>> > *To:* Portable Computing Language development discussion
>> > *Subject:* [pocl-devel] POCL Crash in vmovaps operation
>> >
>> > Dear Pocl community,
>> >
>> > I was just testing the newest Pocl Version (github master branch) with
>> > our software. During execution of one of our kernels Pocl crashed.
>> > Disassembling the crash shows the following operations during the crash:
>> >
>> > ------------------
>> >    0x00007fffb81efdd8 <+664>: vmulpd xmm2,xmm2,xmm6
>> >    0x00007fffb81efddc <+668>: vsubpd xmm2,xmm5,xmm2
>> >    0x00007fffb81efde0 <+672>: vpermilpd xmm5,xmm4,0x1
>> >    0x00007fffb81efde6 <+678>: vmulsd xmm3,xmm3,xmm5
>> >    0x00007fffb81efdea <+682>: vmulsd xmm4,xmm15,xmm4
>> >    0x00007fffb81efdee <+686>: vsubsd xmm3,xmm3,xmm4
>> >    0x00007fffb81efdf2 <+690>: vpermilpd xmm1,xmm1,0x1
>> >    0x00007fffb81efdf8 <+696>: vmulpd xmm0,xmm0,xmm1
>> >    0x00007fffb81efdfc <+700>: vpermilpd xmm1,xmm0,0x1
>> >    0x00007fffb81efe02 <+706>: vsubsd xmm0,xmm0,xmm1
>> >    0x00007fffb81efe06 <+710>: lea rsi,[rdx+rdx*2]
>> >    0x00007fffb81efe0a <+714>: mov rdx,QWORD PTR [rbx+0x38]
>> > => 0x00007fffb81efe0e <+718>: vmovaps XMMWORD PTR [rdx+rsi*8],xmm12
>> >    0x00007fffb81efe13 <+723>: mov QWORD PTR [rbx+0x40],rsi
>> >    0x00007fffb81efe17 <+727>: mov QWORD PTR [rdx+rsi*8+0x10],0x0
>> >    0x00007fffb81efe20 <+736>: vinsertf32x4 ymm1,ymm16,xmm0,0x1
>> > -----------------------------
>> >
>> > This seems to be a similar bug that I discussed a year ago on the
>> > mailing list. See the thread here:
>> >
>> > https://www.mail-archive.com/poc...@li.../msg01087.html.
>> >
>> > In summary, the issue was related to us using arrays of arrays within
>> > our kernels and pocl creating wrong code for it.
>> >
>> > During that time a gist was suggested for Pocl, which I tested but did
>> > not improve things. Afterwards I let it drop for a while as we were in
>> > early development and had loads of building sites. But our software is
>> > now close to release ready and it would be great to get it working with
>> > pocl.
>> >
>> > Any help would be greatly appreciated.
>> >
>> > Best wishes
>> >
>> > Timo
>> >
>> > --
>> > Timo Betcke
>> > Professor of Computational Mathematics
>> > University College London
>> > Department of Mathematics
>> > E-Mail: t.b...@uc... <mailto:t.b...@uc...>
>> > Tel.: +44 (0) 20-3108-4068
>> >
>> > _______________________________________________
>> > pocl-devel mailing list
>> > poc...@li...
>> > https://lists.sourceforge.net/lists/listinfo/pocl-devel
>>
>> --
>> Pekka

--
Timo Betcke
Professor of Computational Mathematics
University College London
Department of Mathematics
E-Mail: t.b...@uc...
Tel.: +44 (0) 20-3108-4068
|
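The alignment point raised in this thread can be checked with plain address arithmetic. A `vmovaps` store with an XMMWORD operand needs a 16-byte aligned address, and an OpenCL `float4` is 16-byte aligned, whereas a `float` only needs 4 bytes. The sketch below uses hypothetical helper names (`is_aligned`, `misaligned_indices`) and shows which element addresses of a float buffer would be valid `float4*` targets. It then applies the same test to a 24-byte row stride (rows of three doubles), which is one possible reading of the `lea rsi,[rdx+rdx*2]` followed by the store to `[rdx+rsi*8]` in the disassembly from the original report. That reading is an assumption, not a confirmed diagnosis.

```python
import ctypes

FLOAT_SIZE = ctypes.sizeof(ctypes.c_float)  # 4 bytes on common platforms
DOUBLE_SIZE = ctypes.sizeof(ctypes.c_double)  # 8 bytes on common platforms
XMM_ALIGNMENT = 16  # bytes required by an aligned 128-bit load or store


def is_aligned(address, alignment=XMM_ALIGNMENT):
    """True if address is a multiple of alignment."""
    return address % alignment == 0


def misaligned_indices(base_address, stride, count, alignment=XMM_ALIGNMENT):
    """Indices i for which base_address + i * stride is not aligned."""
    return [i for i in range(count)
            if not is_aligned(base_address + i * stride, alignment)]


# Over-allocate raw bytes and round up, so that `base` is known to be
# 16-byte aligned regardless of what the allocator returned.
raw = ctypes.create_string_buffer(256 + XMM_ALIGNMENT)
start = ctypes.addressof(raw)
base = (start + XMM_ALIGNMENT - 1) // XMM_ALIGNMENT * XMM_ALIGNMENT

# Casting &buf[i] from float* to float4* is only safe where the address is
# 16-byte aligned, i.e. at every fourth float.
float_offsets = misaligned_indices(base, FLOAT_SIZE, 8)
print(float_offsets)  # [1, 2, 3, 5, 6, 7]

# Rows of three doubles are 24 bytes apart, so every odd row starts 8 bytes
# off a 16-byte boundary even when the array itself is aligned.
row_offsets = misaligned_indices(base, 3 * DOUBLE_SIZE, 6)
print(row_offsets)  # [1, 3, 5]
```

Under these assumptions, an aligned 16-byte store to the start of an odd-numbered row would fault, while an unaligned store (`vmovups`/`vmovupd`) to the same address would not.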