pocl-devel Mailing List for pocl (Page 3)

pocl-devel — Portable OpenCL development discussion

[pocl-devel] calling ordinary C functions from kernel functions in opencl

From: mashilamani s. <kan...@gm...> - 2020-07-06 18:16:44

Hi,
  I am new to OpenCL. I am trying to port a large amount of C code to run on
a GPU with minimal modification to the original C code.

To that end, I am trying to call ordinary C functions from my kernel
functions.

I want to know whether that is possible with the OpenCL 1.2 that you have
implemented.

Assume that I take care that each kernel has its own address space,
its own stack etc., which don't clash with those of other kernels.

I have an NVIDIA GTX 1060 6GB.

Thanks much,
mani

PS: I have tried with OpenMP and failed. In OpenMP, it is impossible to call
ordinary functions from a function that is executing on the "target" device.

[pocl-devel] Portable Computing Language (pocl) v1.5 released

From: Pekka J. <pek...@tu...> - 2020-04-03 10:04:06

PoCL is a portable open source (MIT-licensed) implementation of the
OpenCL standard (1.2 with some 2.0 features supported). In addition to
being an easily portable multi-device open-source OpenCL implementation,
a major goal of this project is improving interoperability of diverse
OpenCL-capable devices by integrating them into a single centrally
orchestrated platform.

Upstream PoCL currently supports various CPUs, NVIDIA GPUs via libcuda,
HSA-supported GPUs and TCE ASIPs (experimental, see
http://openasip.org). It is also known to have multiple (private) ports.

Release Highlights
------------------

* Support for Clang/LLVM 10.0
* An easy to use kernel profiling feature
* Plenty of fixes and performance improvements

Please note that there's an official PoCL "maintenance policy" in place.
This text describes the policy and how you can get your favourite
project that uses OpenCL to remain regression free in the future PoCL
releases:

http://portablecl.org/docs/html/maintainer-policy.html

Acknowledgements
----------------

Most of the code that landed in the PoCL code base during this release
cycle was produced for the needs of research projects funded by various
sources. The Customized Parallel Computing research group of Tampere
University, Finland would like to thank the ECSEL JU project FitOptiVis
(project number 783162) and the HSA Foundation for funding most of the
development work in this release. Much appreciated!

Links
-----
Home page:         http://portablecl.org/
This announcement: http://portablecl.org/downloads/ANNOUNCEMENT
Change log:        http://portablecl.org/downloads/CHANGES
Download:          http://portablecl.org/download.html

-- 
Pekka

Re: [pocl-devel] Subdevices and CPU affinity mismatch

From: Michal B. (TAU) <mic...@tu...> - 2020-04-01 18:36:26

Hi,

Pocl's CPU driver does not touch the affinity by default; IOW it leaves it to the OS kernel. By default it creates as many threads as there are logical CPU cores; this can be limited with the
POCL_MAX_PTHREAD_COUNT env variable.

If you want to force Pocl to set the affinity of threads, set POCL_AFFINITY env var to 1.
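
The two variables above can also be set from inside the host program. The following is a minimal sketch in plain C, under the assumption that pocl reads these variables when its CPU driver is initialised, so the call has to happen before the first OpenCL API call. `configure_pocl_cpu_driver` is a hypothetical helper name, not part of pocl or OpenCL; only the variable names and the value 1 for POCL_AFFINITY come from the message.

```c
#define _POSIX_C_SOURCE 200112L /* exposes setenv() on POSIX systems */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: sets the pocl CPU-driver variables named above.
 * max_threads <= 0 leaves POCL_MAX_PTHREAD_COUNT untouched.
 * pin_threads != 0 sets POCL_AFFINITY to 1.
 * Returns 0 on success, -1 if the environment could not be modified.
 * Assumption: must run before the first OpenCL call in the process. */
int configure_pocl_cpu_driver(int max_threads, int pin_threads)
{
    if (max_threads > 0) {
        char value[16];
        snprintf(value, sizeof value, "%d", max_threads);
        if (setenv("POCL_MAX_PTHREAD_COUNT", value, 1) != 0)
            return -1;
    }
    if (pin_threads && setenv("POCL_AFFINITY", "1", 1) != 0)
        return -1;
    return 0;
}
```

Exporting the same variables in the shell before starting the application has the same effect; the helper is only convenient when the thread count is decided at run time.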

As for:

"When subdevices are created using CL_DEVICE_PARTITION_EQUALLY, the number of utilised cores seems to be one less than actually specified"

This has been discussed already in https://github.com/pocl/pocl/issues/716.
The short version is that while kernel commands are split and processed on all driver threads, the other commands (e.g. clEnqueueReadBuffer) are never split, and are processed in a single thread.

Regards,
-- mb

[pocl-devel] Subdevices and CPU affinity mismatch

From: Srijeeta M. <sri...@gm...> - 2020-04-01 14:49:48

Hello,

I made some observations while checking the CPU core affinity after
creating OpenCL subdevices using PoCL.

1. I created 8 subdevices on a platform with 8 CPU cores (Odroid XU4) and
assigned some tasks to different subdevices. I checked with htop that
the device affinity of those subdevices is not constant for the whole
runtime (i.e. it is reset from time to time).
2. Some of those subdevices seem to be set to the same affinity. This can
also be a side effect of the refresh rate in htop, so that I am not able to
see when the affinity has changed.
3. When subdevices are created using CL_DEVICE_PARTITION_EQUALLY, the
number of utilised cores seems to be one less than actually specified (i.e.
partitioning equally into subdevices each having 4 compute units will
actually only use 3 CPU cores).

While looking for a solution I found a similar bug reported for
OpenCL in 2013: https://software.intel.com/en-us/forums/opencl/topic/377784

Can you please share some pointers for solving this issue?

Thanks and regards,
Srijeeta

Re: [pocl-devel] Pocl 1.5

From: Michal B. (TAU) <mic...@tu...> - 2020-04-01 13:57:40

Hello,

RC2 with a few fixes is now available:

https://github.com/pocl/pocl/releases/tag/v1.5-RC2

Regards,
-- mb

[pocl-devel] Pocl 1.5

From: Michal B. (TAU) <mic...@tu...> - 2020-03-30 15:01:42

Hi all,

LLVM 10 is out, so Pocl is preparing a new release.

Please help us by testing and reporting your results as instructed here:

https://github.com/pocl/pocl/wiki/Release-testing-of-pocl-1.5

Thanks,
-- mb

Re: [pocl-devel] Cross-compiling for RISC-V

From: Michal B. (TAU) <mic...@tu...> - 2019-10-31 11:32:21

Hello,

> It seems to me that there is no tool for (cross) compiling the kernel (cl file to binary).

There is no tool, because every OpenCL implementation (pocl, AMD SDK etc.) uses its own binary format. There is no single format which every implementation accepts.

Pocl has a tool for compiling an OpenCL source to a binary, poclcc, which simply calls clCreateProgramWithSource/clBuildProgram/clGetProgramInfo(...CL_PROGRAM_BINARIES...) and stores the result in a file. That file is only loadable by pocl.

> clang should be able to compile kernels, however, in practice it doesn't

Clang can compile OpenCL to LLVM bitcode. However, it cannot output binaries which OpenCL implementations can load, because (again) every implementation uses its own binary format. That's why you must always compile through the implementation, not through Clang.

There is one somewhat common binary format, SPIR-V, but it currently works with only a few OpenCL implementations. And it probably won't help you, because a SPIR-V binary needs to be compiled for the target device by the implementation, so the target OpenCL implementation will need a compiler anyway.

> I compiled pocl with the host_cpu='riscv64' (this flag apparently specifies the target CPU)

Sounds like you tried to compile some version of pocl with autotools, which is ancient and we don't provide any support for it anymore.

> The following post seems to be outdated

It is outdated, and wrong. RISC-V is supported by upstream LLVM (from my quick googling), and pocl hasn't used autotools for a few years now.

So to clear up the confusion around cross-compiling here:

1) cross compiling the implementation (pocl) itself for another device - this may or may not work with pocl, i haven't tried. The reason is that pocl can now be compiled natively on devices as weak as a Raspberry Pi with 1GB ram. If you have a RISC-V device with at least 1GB ram, you can probably compile pocl with LLVM and the CPU driver natively on it.

2) "cross compiling the OpenCL sources" - this doesn't exist. There is something similar: offline compilation, which is compiling OpenCL sources for a device which is not present in the system. However this has to be supported by the implementation. Pocl doesn't support this ATM.

What you can do with pocl is: 1) on a sufficiently powerful RISC-V machine, compile a native build with LLVM and use it to compile sources to pocl-binaries; and 2) on weaker RISC-V machines (<1G ram), you can compile a compiler-less version of pocl that only supports loading from binaries, and you can use binaries from 1) here.
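
For step 2) above, the application on the weaker machine first has to read the pocl binary produced in step 1) from disk; the resulting buffer and its length are what the standard OpenCL call clCreateProgramWithBinary expects. A minimal sketch of only that file-reading part in plain C follows. `read_binary_file` is a hypothetical helper, not part of pocl or OpenCL, and the OpenCL call itself is left out so the sketch compiles without the OpenCL headers.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at `path` into a freshly
 * allocated buffer. On success returns the buffer (the caller frees it)
 * and stores its length in *size_out; on any failure returns NULL. */
unsigned char *read_binary_file(const char *path, size_t *size_out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    unsigned char *data = NULL;
    size_t used = 0, capacity = 0;
    for (;;) {
        if (used == capacity) {
            /* Grow the buffer geometrically, starting at 4 KiB. */
            size_t new_capacity = capacity ? capacity * 2 : 4096;
            unsigned char *grown = realloc(data, new_capacity);
            if (grown == NULL) {
                free(data);
                fclose(f);
                return NULL;
            }
            data = grown;
            capacity = new_capacity;
        }
        size_t n = fread(data + used, 1, capacity - used, f);
        used += n;
        if (n == 0) /* end of file or read error */
            break;
    }
    if (ferror(f)) {
        free(data);
        fclose(f);
        return NULL;
    }
    fclose(f);
    *size_out = used;
    return data;
}
```

The returned pointer and size would then be passed as the `binaries` and `lengths` arguments of clCreateProgramWithBinary, followed by clBuildProgram as usual.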

Regards,
-- mb

[pocl-devel] Cross-compiling for RISC-V

From: Raad B. <r.b...@ya...> - 2019-10-31 09:36:18

Dear all,

I’m trying to cross-compile OpenCL (on X86 for RISC-V). Thereby, the kernel (cl file) should be loaded from binary and this is the problem.
It seems to me that there is no tool for (cross) compiling the kernel (cl file to binary).

I compiled pocl with the host_cpu='riscv64' (this flag apparently specifies the target CPU) but the generated binary can not be loaded on RISC-V.
According to its documentation (https://clang.llvm.org/docs/UsersManual.html#opencl-features), clang should be able to compile kernels; in practice, however, it doesn't.
The following post seems to be outdated (and it has also not been accepted as an answer). https://stackoverflow.com/questions/41204224/configuring-pocl-for-riscv/42800248#42800248

It would be a great help if you could give some information (best case would be a simple working example:) ) about how binaries of OpenCL-Kernels can be generated for RISC-V.

Best,
Raad

[pocl-devel] POCL 1.4: failed with 'Undefined symbol "clGetPlatformInfo"'

From: Hartmann, O. <o.h...@wa...> - 2019-10-26 13:20:30

POCL 1.4 on FreeBSD seems to have introduced a bug that is not
present in POCL 1.3.

Running either FreeBSD 12.1-PRE, 12-STABLE (recent version) or CURRENT
(recent version), POCL 1.4 (compiled with CLANG/LLVM 9) fails using
devel/clpeak with:

[...]
Platform: Portable Computing Language
  Device: Intel(R) Core(TM) i5-4200M CPU @ 2.50GHz
    Driver version  : 1.4 (FreeBSD)
    Compute units   : 4
    Clock frequency : 2501 MHz

    Global memory bandwidth (GBPS)
      float   : clEnqueueNDRangeKernel (-63)
      Tests skipped

    Single-precision compute (GFLOPS)
clCreateBuffer (-61)
      Tests skipped

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
clCreateBuffer (-61)
      Tests skipped

    Integer compute (GIOPS)
clCreateBuffer (-61)
      Tests skipped

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 0.00
      enqueueReadBuffer          : 0.00
      enqueueMapBuffer(for read) : 0.00
        memcpy from mapped ptr   : inf
      enqueueUnmap(after write)  : 0.00
        memcpy to mapped ptr     : inf

    Kernel launch latency :
    dlopen("/root/.cache/pocl/kcache/EK/MAFMMCDPPMBCLMDMPBFICNBJFELHOLLNFBIDP/global_bandwidth_v1_local_offset/256-1-1-goffs0-smallgrid/global_bandwidth_v1_local_offset.so")
    failed with 'Undefined symbol "clGetPlatformInfo"'. note: missing
    symbols in the kernel binary might be reported as 'file not found'
    errors. Abort (core dumped)

The problem doesn't reveal itself using POCL 1.3.

Need some advice to track the problem down.

Kind regards,

oh
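
The numeric codes in the clpeak log above are standard OpenCL error values. As a reading aid, the following small C sketch maps the two codes that appear in the log to the symbolic names defined in the Khronos CL/cl.h header. `cl_error_name` is a hypothetical helper written for this illustration, and it deliberately covers only the codes seen here.

```c
#include <stddef.h>

/* Hypothetical helper: maps the OpenCL error codes that appear in the
 * clpeak log to their symbolic names from CL/cl.h. Returns NULL for any
 * code this sketch does not cover. */
const char *cl_error_name(int code)
{
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -61: return "CL_INVALID_BUFFER_SIZE";      /* clCreateBuffer (-61) */
    case -63: return "CL_INVALID_GLOBAL_WORK_SIZE"; /* clEnqueueNDRangeKernel (-63) */
    default:  return NULL;
    }
}
```

Read this way, the log says that clCreateBuffer rejected the requested buffer size and clEnqueueNDRangeKernel rejected the global work size; the later dlopen failure on the cached kernel binary is a separate symptom.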

[pocl-devel] Pocl 1.4

From: Michal B. (TAU) <mic...@tu...> - 2019-10-14 10:25:26

Hello,

Pocl 1.4 has just been released.

Feature Highlights
------------------
  * Support for Clang/LLVM 9.0
  * Support for LLVM older than 6.0 has been removed (6.0 is still supported).
  * It's possible to build relocatable Pocl
  * Improved SPIR and SPIR-V support for CPU device

Links
-----
Home page: http://portablecl.org/
Change log: http://portablecl.org/downloads/CHANGES
Download: http://portablecl.org/download.html


Regards,

-- mb

Re: [pocl-devel] Pocl usage

From: Enrique G. <enr...@gm...> - 2019-09-30 15:20:30

Hi,

I'm doing the build in a virtual machine on VirtualBox (Ubuntu 18.04).

My hardware:
GPU: GeForce GTX 1060
CPU: i7-8750H
RAM: 16 GB

The host OS is Windows 10. The VM uses 6 cores and 10 GB of RAM; I cannot
give the VM more resources.

-- 
Regards,

Enrique González

Re: [pocl-devel] Pocl usage

From: Benson M. <ben...@em...> - 2019-09-30 15:12:35

Hi,

May want to indicate what platform (operating system and chip) you are 
building on as well as the error messages you get.

OpenCL optimizations are not always portable.  Most people have focused 
on portability on different GPUs, but portability for CPU performance 
using POCL is also of interest.

Benson

Re: [pocl-devel] Pocl usage

From: Enrique G. <enr...@gm...> - 2019-09-30 14:46:02

Hello everyone,

I am posting this message to the list because I thought that developers or
other users could reply if they read it. I am sorry for also sending a
private email.

I am doing a final degree project in which I have to find ways to
optimize OpenCL code through iterative optimizations. I thought that
POCL (judging by the tool description) could help me with this work, but
after reading the documentation[1] on the official website, I could not
find out how to use the tool. I only see the usual usage of gcc, or at
least it seems so to me.

Also, when building the project from GitHub[2], I get the error I
mentioned in my last message, which says that it cannot find the clang
executables, even though I have installed Clang and LLVM from the LLVM
Project[3]. With the newest versions of these tools I do not have the
"clang" command, but I do have the "clang-10" command (version 10.0). I do
not know why the build does not detect clang on my PC; when I install other
things they detect clang 10.0, but this build does not.

If someone can give me some help, I would appreciate it.
Thanks in advance.

References:
[1]  http://portablecl.org/docs/html/using.html
[2] https://github.com/pocl <https://github.com/pocl/pocl>
[3] https://github.com/llvm/llvm-project

-- 
Regards,

Enrique González

Re: [pocl-devel] Pocl usage

From: Pekka J. (T. <pek...@tu...> - 2019-09-30 12:43:49

Hi Enrique,

It's not possible to help you based on this description of your
problem.

Our group is rather busy at the moment so we cannot generally
answer to private support requests with short latency, so please send
your questions to this list instead of private emails.

If you add adequate description of your problem and post it to
this mailing list, also other users/developers of POCL might be able to 
help you.

Thanks,
Pekka

-- 
Pekka

[pocl-devel] Pocl usage

From: Enrique G. <enr...@gm...> - 2019-09-30 09:23:15

Dear all,

I installed POCL (and an old LLVM version, since it requires one between 4.0
and 8.0). I have read the documentation on the project's page, and it does
not seem to say anything about how to use this tool to optimize code. My
boss has not found anything either, and we are looking for a site where we
can learn how to use it and which flags we can try for different results,
because POCL looks like the tool we need for OpenCL pipeline optimizations.

Could you help us with this problem?

Thanks in advance.

Re: [pocl-devel] pocl 1.4

From: Michal B. (TAU) <mic...@tu...> - 2019-09-18 11:31:10

Hi Andreas,

> Thanks for following up on the memory leaks. I've done a run on our end
> to investigate, and I've found that 1.4 still consumes considerably more
> memory than the Intel ICD, but it seems there has been some measure of
> improvement from 1.2 to 1.4:


That's quite possible, for multiple reasons. It could still be a memleak in some code path my tests don't cover. But it could also be just memory consumption from pocl's design, or memory fragmentation from pocl's use of malloc/free. I have put some effort into keeping pocl free of memory *leaks*, but AFAIK nobody has optimized pocl for memory *usage*.

But if you're interested in trying something out, PR 768 has some larger changes WRT memory, that could improve memory usage in certain scenarios.

 -- mb

Re: [pocl-devel] pocl 1.4

From: Andreas K. <li...@in...> - 2019-09-18 03:00:09

Hi Michal,

"Michal Babej (TAU)" <mic...@tu...> writes:
>> valgrind. Were you able to reproduce that on your end?
>
> Yes; I commented in the issue.

Thanks very much to you and Pekka for taking a look. We'll continue to
pursue this and will share what we find.

>> Possible memory leak in POCL:
>
> https://gitlab.tiker.net/inducer/pytential/issues/131
>
> We have a buildbot which tracks memory leaks by building pocl with
> Address Sanitizer (cmake -DENABLE_ASAN=ON). I have discovered that due
> to some faulty logic in cmake/ctest, some memleaks went unnoticed;
> this has now been fixed in release_1_4 & master. Ofc it's possible
> that it's a different leak; the test coverage is not full.
>
> Unfortunately i don't have any good advice for finding memory leaks
> when using pocl in Python. ASan may be impossible to use, and valgrind
> will probably report a lot of false positives (and be slow). Also i'm
> not sure it's possible to force Python to release an object, which
> makes it even more problematic. With C/C++ it's much simpler - if you
> call clUnloadPlatformCompiler() at the end of your program, and you've
> properly released all cl_* objects, pocl will additionally release all
> LLVM related data, including long-lived static data structures; after
> this there should only be a single memory leak, from LLVM signal
> handlers.

Thanks for following up on the memory leaks. I've done a run on our end
to investigate, and I've found that 1.4 still consumes considerably more
memory than the Intel ICD, but it seems there has been some measure of
improvement from 1.2 to 1.4:

https://gitlab.tiker.net/inducer/pytential/issues/131

Best,
Andreas

Re: [pocl-devel] pocl 1.4

From: Michal B. (TAU) <mic...@tu...> - 2019-09-11 11:49:50

Hi Andreas,

> valgrind. Were you able to reproduce that on your end?

Yes; I commented in the issue.

> Possible memory leak in POCL:

https://gitlab.tiker.net/inducer/pytential/issues/131

We have a buildbot which tracks memory leaks by building pocl with Address Sanitizer (cmake  -DENABLE_ASAN=ON). I have discovered that due to some faulty logic in cmake/ctest, some memleaks went unnoticed; this has now been fixed in release_1_4 & master. Ofc it's possible that it's a different leak; the test coverage is not full.

Unfortunately i don't have any good advice for finding memory leaks when using pocl in Python. ASan may be impossible to use, and valgrind will probably report a lot of false positives (and be slow). Also i'm not sure it's possible to force Python to release an object, which makes it even more problematic. With C/C++ it's much simpler - if you call clUnloadPlatformCompiler() at the end of your program, and you've properly released all cl_* objects, pocl will additionally release all LLVM related data, including long-lived static data structures; after this there should only be a single memory leak, from LLVM signal handlers.

Regards,
-- mb

Re: [pocl-devel] pocl 1.4

From: Andreas K. <li...@in...> - 2019-09-10 15:06:24

Hi Michal, all,

"Michal Babej (TAU)" <mic...@tu...> writes:
> LLVM 9.0 is nearing release, so it's also time for a new pocl release.
> I've created RC1 on friday but already a few bugfixes appeared, so it's RC2:
>
> https://github.com/pocl/pocl/releases/tag/v1.4-RC2
>
> Please test and report your results here:
> https://github.com/pocl/pocl/wiki/Release-testing-of-pocl-1.4

Thank you for all the work you are putting into pocl! I wanted to ask about
this issue which I reported a while ago:

https://github.com/pocl/pocl/issues/757

It seems indicative of a possible miscompilation, with a simple
reproducer (essentially a vector copy) that gets flagged by
valgrind. Were you able to reproduce that on your end?

Another potential issue that we're tracking on our end is a possible
memory leak in POCL:

https://gitlab.tiker.net/inducer/pytential/issues/131

Thanks again,
Andreas

[pocl-devel] pocl 1.4

From: Michal B. (TAU) <mic...@tu...> - 2019-09-09 17:01:35

Hello everyone,

LLVM 9.0 is nearing release, so it's also time for a new pocl release.
I've created RC1 on friday but already a few bugfixes appeared, so it's RC2:

https://github.com/pocl/pocl/releases/tag/v1.4-RC2

Please test and report your results here:
https://github.com/pocl/pocl/wiki/Release-testing-of-pocl-1.4

Regards,
-- mb

[pocl-devel] Portable Computing Language (pocl) v1.3 released

From: Pekka J. (T. <pek...@tu...> - 2019-04-04 14:45:35

Pocl is a portable open source (MIT-licensed) implementation of the 
OpenCL standard (1.2 with some 2.0 features supported). In addition to
being an easily portable multi-device open-source OpenCL implementation,
another major goal of this project is improving performance portability
of OpenCL programs with the kernel compiler and the task runtime,
reducing the need for target-dependent manual optimizations.

Upstream pocl currently supports various CPU devices, NVIDIA GPUs via
libcuda, HSA-supported GPUs and TCE ASIPs (experimental, see 
http://openasip.org), with various known private adopters as well.

Release Highlights
------------------

* Support for Clang/LLVM 8.0.
* Support ICD on OSX.
* A lot of major and minor bug fixes and internal code cleanups.

Notes
-----

* Support code for LLVM versions older than 6.0 will be removed at the
beginning of the next release cycle to clean up the code base. If you
need support for older LLVM versions in future pocl releases and wish to
maintain it (run a buildbot and fix issues), let us know!
* Support for Vecmathlib has been removed.

Please note that there's an official pocl "maintenance policy" in place.
This text describes the policy and how you can get your favourite
project that uses OpenCL to remain regression free in the future pocl 
releases:

   http://portablecl.org/docs/html/maintainer-policy.html

Acknowledgements
----------------

Most of the code that landed in the pocl code base during this release
cycle was produced for the needs of research projects funded by various
sources. The Customized Parallel Computing research group of Tampere
University, Finland would like to thank the Academy of Finland (funding
decision 297548), Business Finland (FiDiPro project StreamPro,
1846/31/2014), ECSEL JU project FitOptiVis (project number 783162) and
HSA Foundation for funding most of the development work in this release.
Much appreciated!

Links
-----
Home page:         http://portablecl.org/
This announcement: http://portablecl.org/downloads/ANNOUNCEMENT
Change log:        http://portablecl.org/downloads/CHANGES
Download:          http://portablecl.org/download.html

-- 
Pekka

[pocl-devel] pocl 1.3 rc1

From: Pekka J. (T. <pek...@tu...> - 2019-04-02 11:39:42

Hi all,

Time to test a new pocl release!

Please help us shake out any remaining regressions by testing and
reporting your results as instructed here:

https://github.com/pocl/pocl/wiki/Release-testing-of-pocl-1.3

BR,
-- 
Pekka

Re: [pocl-devel] POCL Crash in vmovaps operation

From: Timo B. <tim...@gm...> - 2019-03-14 10:53:45

Hi,

I have opened an issue and summarised the discussion so far in

https://github.com/pocl/pocl/issues/701

Best wishes

Timo


On Thu, 14 Mar 2019 at 10:20, Pekka Jääskeläinen (TAU) <
pek...@tu...> wrote:

> Hi Timo,
>
> Can you please open an issue of this, it's easier to track
> in Github?
>
> Thanks,
> Pekka
>


-- 
Timo Betcke
Professor of Computational Mathematics
University College London
Department of Mathematics
E-Mail: t.b...@uc...
Tel.: +44 (0) 20-3108-4068

Re: [pocl-devel] POCL Crash in vmovaps operation

From: Pekka J. (T. <pek...@tu...> - 2019-03-14 10:20:10

Hi Timo,

Can you please open an issue for this? It's easier to track
on GitHub.

Thanks,
Pekka

On 14.3.2019 1.49, Timo Betcke wrote:
> Hi,
> 
> I have pinned down the next failed test. It still seems related to the 
> multi-indexing even with your bugfixed version. The corresponding gist 
> is here:
> 
> https://gist.github.com/tbetcke/0bf7e12a2f3ab8032339cc38b8441b6e
> 
> At the end of the kernel all entries in shapeIntegral should have the 
> value 1.0. However, while shapeIntegral[0][0] is correct, 
> shapeIntegral[1][0] is not.
> If I move the second print statement for shapeIntegral[1][0] into the 
> for loop the variables are correctly updated.
> 
> Just something for context. The actual kernel from which this example is 
> derived, is doing a finite element integral on a triangle. The test 
> values are from the test space and the trial values from the domain 
> space. Via C Macros I am adapting the dimensions of the arrays to the 
> actual number of test and trial functions. The crash happens for trial 
> dimension 1 and test dimension 3.
> 
> Thanks again for your help. I am excited about getting Pocl to work with 
> our software.
> 
> Best wishes
> 
> Timo
> 
> 
> On Wed, 13 Mar 2019 at 23:23, Timo Betcke <tim...@gm... 
> <mailto:tim...@gm...>> wrote:
> 
>     Hi Michal,
> 
>     thanks for the bugfix. The crashes have now disappeared and more
>     tests are passing with your bugfix version. However, several unit
>     tests still fail that work with AMD and Intel. Briefly looking at
>     the results I see lots of nan entries in the pocl output. I will try
>     to pin this down more and then report back to you.
> 
>     Best wishes
> 
>     Timo
> 
>     On Mon, 11 Mar 2019 at 10:50, Michal Babej (TAU)
>     <mic...@tu... <mailto:mic...@tu...>> wrote:
> 
>         Hello,
> 
> 
>         I remember trying to fix this bug last year, but then i got
>         sidetracked by other things. (BTW it would be preferable if you
>         reported bugs as github issues in the future)
> 
> 
>         Anyway, i've hopefully fixed it. Can you test your program with
>         master branch from https://github.com/franz/pocl
> 
> 
>         Regards,
> 
>         -- mb
> 
>         ------------------------------------------------------------------------
>         *From:* Timo Betcke <tim...@gm...
>         <mailto:tim...@gm...>>
>         *Sent:* Friday, March 8, 2019 3:48:34 AM
>         *To:* Portable Computing Language development discussion
>         *Subject:* Re: [pocl-devel] POCL Crash in vmovaps operation
>         Dear Pekka,
> 
>         I have now cooked up a small example that crashes in vmovaps.
>         The gist is available here (uses PyOpenCL to run):
> 
>         https://gist.github.com/tbetcke/b4da01465b587e85cc88801aafdced0a
> 
>         The example is fairly nonsensical and was derived by reducing a
>         crashing kernel as far as possible while retaining the crash.
>         It runs fine under Intel CPU OpenCL on a Xeon and Rocm OpenCL on
>         an AMD GPU. My platform is Ubuntu 18.04 with llvm 6. If necessary
>         I can create an environment with updated llvm, but would like to
>         avoid it (unless it is llvm 6 related). Pocl is the most recent
>         git master.
> 
>         The code crashes at the following assembler instructions:
> 
>             0x00007fffe02575e3 <+195>:   xor    r9d,r9d
>             0x00007fffe02575e6 <+198>:   xor    r10d,r10d
>             0x00007fffe02575e9 <+201>:   nop    DWORD PTR [rax+0x0]
>             0x00007fffe02575f0 <+208>:   mov    QWORD PTR [rdx+r9*1],0x0
>         => 0x00007fffe02575f8 <+216>:   vmovaps XMMWORD PTR
>         [rdi+r9*1-0x10],xmm0
>             0x00007fffe02575ff <+223>:   mov    QWORD PTR [rdi+r9*1],0x0
>             0x00007fffe0257607 <+231>:   vmovaps XMMWORD PTR
>         [rdx+r9*1-0x10],xmm0
>             0x00007fffe025760e <+238>:   vmovupd xmm1,XMMWORD PTR
>         [rdi+r9*1-0x8]
>             0x00007fffe0257615 <+245>:   vaddpd xmm1,xmm1,XMMWORD PTR
>         [rdx+r9*1-0x8]
>             0x00007fffe025761c <+252>:   vmovupd XMMWORD PTR
>         [rdx+r9*1-0x8],xmm1
>             0x00007fffe0257623 <+259>:   mov    r8,r11
>             0x00007fffe0257626 <+262>:   sar    r8,0x20
>             0x00007fffe025762a <+266>:   lea    rsi,[r8+r8*2]
> 
>         Removing any of the for loops or the localResult variable (or
>         removing its __local attribute) leads to the kernel working on Pocl.
>         It would be great to get to the source of this. Please let me
>         know if you need more information from me.
> 
>         Best wishes
> 
>         Timo
> 
> 
>         On Wed, 6 Mar 2019 at 21:21, Timo Betcke <tim...@gm...
>         <mailto:tim...@gm...>> wrote:
> 
>             Hi Pekka,
> 
>             thanks for your hints and the link. I had one buffer in the
>             kernel call that had a cast from a float type to a vector
>             type. I have fixed this. But the segfault remains. In the
>             next few days I will try to cook up a simple example that
>             produces the segfault. Fortunately, the kernel itself is not
>             too complicated, so should be able to reduce it.
> 
>             Best wishes
> 
>             Timo
> 
>             On Wed, 6 Mar 2019 at 10:20, Pekka Jääskeläinen (TAU)
>             <pek...@tu...
>             <mailto:pek...@tu...>> wrote:
> 
>                 Yes, now that I look at it more closely,
>                 your stack trace looks _very_ much to the common data
>                 alignment
>                 issues people have. I think this might be worth a FAQ
>                 item somewhere.
> 
>                 https://stackoverflow.com/questions/5983389/how-to-align-stack-at-32-byte-boundary-in-gcc
> 
>                 On 6.3.2019 8.45, Pekka Jääskeläinen (TAU) wrote:
>                  > Hi Timo,
>                  >
>                  > Shooting in the dark here, but since just yesterday I
>                 debugged a similar
>                  > looking issue
>                  > which was caused by an illegal cast in the source
>                 code from float* to
>                  > float4*. It trusted
>                  > the alignment is still fine, which it wasn't after
>                 vectorization. A very
>                  > target specific programming
>                  > error which many ocl targets can easily hide.
>                  >
>                  > If this is something else, we need a test case,
>                 smaller the better, to
>                  > help you here.
>                  > Before opening an issue though, please with the
>                 latest master and LLVM 8.
>                  >
>                  > Pekka
>                  >
>                  >
>                 ------------------------------------------------------------------------
>                  > *From:* Timo Betcke <tim...@gm...
>                 <mailto:tim...@gm...>>
>                  > *Sent:* Tuesday, March 5, 2019 11:27:12 PM
>                  > *To:* Portable Computing Language development discussion
>                  > *Subject:* [pocl-devel] POCL Crash in vmovaps operation
>                  > Dear Pocl community,
>                  >
>                  > I was just testing the newest Pocl Version (github
>                 master branch) with
>                  > our software. During execution of one of our kernels
>                 Pocl crashed.
>                  > Disassembling the crash shows the following
>                 operations during the crash:
>                  >
>                  > ------------------
>                  >     0x00007fffb81efdd8 <+664>:   vmulpd xmm2,xmm2,xmm6
>                  >     0x00007fffb81efddc <+668>:   vsubpd xmm2,xmm5,xmm2
>                  >     0x00007fffb81efde0 <+672>:   vpermilpd xmm5,xmm4,0x1
>                  >     0x00007fffb81efde6 <+678>:   vmulsd xmm3,xmm3,xmm5
>                  >     0x00007fffb81efdea <+682>:   vmulsd xmm4,xmm15,xmm4
>                  >     0x00007fffb81efdee <+686>:   vsubsd xmm3,xmm3,xmm4
>                  >     0x00007fffb81efdf2 <+690>:   vpermilpd xmm1,xmm1,0x1
>                  >     0x00007fffb81efdf8 <+696>:   vmulpd xmm0,xmm0,xmm1
>                  >     0x00007fffb81efdfc <+700>:   vpermilpd xmm1,xmm0,0x1
>                  >     0x00007fffb81efe02 <+706>:   vsubsd xmm0,xmm0,xmm1
>                  >     0x00007fffb81efe06 <+710>:   lea    rsi,[rdx+rdx*2]
>                  >     0x00007fffb81efe0a <+714>:   mov    rdx,QWORD PTR
>                 [rbx+0x38]
>                  > => 0x00007fffb81efe0e <+718>:   vmovaps XMMWORD PTR
>                 [rdx+rsi*8],xmm12
>                  > ---Type <return> to continue, or q <return> to quit---
>                  >     0x00007fffb81efe13 <+723>:   mov    QWORD PTR
>                 [rbx+0x40],rsi
>                  >     0x00007fffb81efe17 <+727>:   mov    QWORD PTR
>                 [rdx+rsi*8+0x10],0x0
>                  >     0x00007fffb81efe20 <+736>:   vinsertf32x4
>                 ymm1,ymm16,xmm0,0x1
>                  > -----------------------------
>                  > This seems to be a similar bug that I discussed a
>                 year ago on the
>                  > mailing list. See the thread here:
>                  >
>                 https://www.mail-archive.com/poc...@li.../msg01087.html.
> 
>                  > In summary, the issue was related to us using arrays
>                 of arrays within
>                  > our kernels and pocl creating wrong code for it.
>                  >
>                  > During that time a gist was suggested for Pocl, which
>                 I tested but did
>                  > not improve things. Afterwards I let it drop for a
>                 while as we were in
>                  > early development and had loads of building sites.
>                 But our software is
>                  > now close to release ready and it would be great to
>                 get it working with
>                  > pocl.
>                  >
>                  > Any help would be greatly appreciated.
>                  > Best wishes
>                  >
>                  > Timo
>                  >
>                  > --
>                  > Timo Betcke
>                  > Professor of Computational Mathematics
>                  > University College London
>                  > Department of Mathematics
>                  > E-Mail: t.b...@uc...
>                 <mailto:t.b...@uc...> <mailto:t.b...@uc...
>                 <mailto:t.b...@uc...>>
>                  > Tel.: +44 (0) 20-3108-4068
>                  >
>                  >
>                  > _______________________________________________
>                  > pocl-devel mailing list
>                  > poc...@li...
>                 <mailto:poc...@li...>
>                  > https://lists.sourceforge.net/lists/listinfo/pocl-devel
>                  >
> 
>                 -- 
>                 Pekka
> 
> 
>                 _______________________________________________
>                 pocl-devel mailing list
>                 poc...@li...
>                 <mailto:poc...@li...>
>                 https://lists.sourceforge.net/lists/listinfo/pocl-devel
> 
> 
> 
>             -- 
>             Timo Betcke
>             Professor of Computational Mathematics
>             University College London
>             Department of Mathematics
>             E-Mail: t.b...@uc... <mailto:t.b...@uc...>
>             Tel.: +44 (0) 20-3108-4068
> 
> 
> 
>         -- 
>         Timo Betcke
>         Professor of Computational Mathematics
>         University College London
>         Department of Mathematics
>         E-Mail: t.b...@uc... <mailto:t.b...@uc...>
>         Tel.: +44 (0) 20-3108-4068
>         _______________________________________________
>         pocl-devel mailing list
>         poc...@li...
>         <mailto:poc...@li...>
>         https://lists.sourceforge.net/lists/listinfo/pocl-devel
> 
> 
> 
>     -- 
>     Timo Betcke
>     Professor of Computational Mathematics
>     University College London
>     Department of Mathematics
>     E-Mail: t.b...@uc... <mailto:t.b...@uc...>
>     Tel.: +44 (0) 20-3108-4068
> 
> 
> 
> -- 
> Timo Betcke
> Professor of Computational Mathematics
> University College London
> Department of Mathematics
> E-Mail: t.b...@uc... <mailto:t.b...@uc...>
> Tel.: +44 (0) 20-3108-4068
> 
> 
> _______________________________________________
> pocl-devel mailing list
> poc...@li...
> https://lists.sourceforge.net/lists/listinfo/pocl-devel
> 

-- 
Pekka

Re: [pocl-devel] POCL Crash in vmovaps operation

From: Timo B. <tim...@gm...> - 2019-03-14 00:50:09

Hi,

I have pinned down the next failed test. It still seems related to the
multi-indexing even with your bugfixed version. The corresponding gist is
here:

https://gist.github.com/tbetcke/0bf7e12a2f3ab8032339cc38b8441b6e

At the end of the kernel all entries in shapeIntegral should have the value
1.0. However, while shapeIntegral[0][0] is correct, shapeIntegral[1][0] is
not.
If I move the second print statement for shapeIntegral[1][0] into the for
loop the variables are correctly updated.

Some context: the actual kernel from which this example is derived
computes a finite element integral on a triangle. The test values
are from the test space and the trial values from the domain space. Via C
macros I am adapting the dimensions of the arrays to the actual number of
test and trial functions. The failure happens for trial dimension 1 and test
dimension 3.

Thanks again for your help. I am excited about getting Pocl to work with
our software.

Best wishes

Timo


On Wed, 13 Mar 2019 at 23:23, Timo Betcke <tim...@gm...> wrote:

> Hi Michal,
>
> thanks for the bugfix. The crashes have now disappeared and more tests are
> passing with your bugfix version. However, several unit tests still fail
> that work with AMD and Intel. Briefly looking at the results I see lots of
> nan entries in the pocl output. I will try to pin this down more and then
> report back to you.
>
> Best wishes
>
> Timo
>
> On Mon, 11 Mar 2019 at 10:50, Michal Babej (TAU) <mic...@tu...>
> wrote:
>
>> Hello,
>>
>>
>> I remember trying to fix this bug last year, but then I got sidetracked
>> by other things. (BTW, it would be preferable if you reported bugs as GitHub
>> issues in the future.)
>>
>>
>> Anyway, I've hopefully fixed it. Could you test your program with the master
>> branch from the following repository?
>> https://github.com/franz/pocl
>>
>>
>> Regards,
>>
>> -- mb
>> ------------------------------
>> *From:* Timo Betcke <tim...@gm...>
>> *Sent:* Friday, March 8, 2019 3:48:34 AM
>> *To:* Portable Computing Language development discussion
>> *Subject:* Re: [pocl-devel] POCL Crash in vmovaps operation
>>
>> Dear Pekka,
>>
>> I have now cooked up a small example that crashes in vmovaps. The gist is
>> available here (uses PyOpenCL to run):
>>
>> https://gist.github.com/tbetcke/b4da01465b587e85cc88801aafdced0a
>>
>> The example is fairly nonsensical and was derived by reducing a crashing
>> kernel as far as possible while retaining the crash.
>> It runs fine under Intel CPU OpenCL on a Xeon and ROCm OpenCL on an AMD
>> GPU. My platform is Ubuntu 18.04 with LLVM 6. If necessary
>> I can create an environment with an updated LLVM, but would like to avoid it
>> (unless it is LLVM 6 related). Pocl is the most recent git master.
>>
>> The code crashes at the following assembler instructions:
>>
>>    0x00007fffe02575e3 <+195>:   xor    r9d,r9d
>>    0x00007fffe02575e6 <+198>:   xor    r10d,r10d
>>    0x00007fffe02575e9 <+201>:   nop    DWORD PTR [rax+0x0]
>>    0x00007fffe02575f0 <+208>:   mov    QWORD PTR [rdx+r9*1],0x0
>> => 0x00007fffe02575f8 <+216>:   vmovaps XMMWORD PTR [rdi+r9*1-0x10],xmm0
>>    0x00007fffe02575ff <+223>:   mov    QWORD PTR [rdi+r9*1],0x0
>>    0x00007fffe0257607 <+231>:   vmovaps XMMWORD PTR [rdx+r9*1-0x10],xmm0
>>    0x00007fffe025760e <+238>:   vmovupd xmm1,XMMWORD PTR [rdi+r9*1-0x8]
>>    0x00007fffe0257615 <+245>:   vaddpd xmm1,xmm1,XMMWORD PTR [rdx+r9*1-0x8]
>>    0x00007fffe025761c <+252>:   vmovupd XMMWORD PTR [rdx+r9*1-0x8],xmm1
>>    0x00007fffe0257623 <+259>:   mov    r8,r11
>>    0x00007fffe0257626 <+262>:   sar    r8,0x20
>>    0x00007fffe025762a <+266>:   lea    rsi,[r8+r8*2]
>>
>> Removing any of the for loops or the localResult variable (or removing
>> its __local attribute) leads to the kernel working on Pocl.
>> It would be great to get to the source of this. Please let me know if you
>> need more information from me.
>>
>> Best wishes
>>
>> Timo
>>
>>
>> On Wed, 6 Mar 2019 at 21:21, Timo Betcke <tim...@gm...> wrote:
>>
>> Hi Pekka,
>>
>> thanks for your hints and the link. I had one buffer in the kernel call
>> that had a cast from a float type to a vector type. I have fixed this. But
>> the segfault remains. In the next few days I will try to cook up a simple
>> example that produces the segfault. Fortunately, the kernel itself is not
>> too complicated, so I should be able to reduce it.
>>
>> Best wishes
>>
>> Timo
>>
>> On Wed, 6 Mar 2019 at 10:20, Pekka Jääskeläinen (TAU) <
>> pek...@tu...> wrote:
>>
>> Yes, now that I look at it more closely,
>> your stack trace looks _very_ much like the common data alignment
>> issues people have. I think this might be worth a FAQ item somewhere.
>>
>>
>> https://stackoverflow.com/questions/5983389/how-to-align-stack-at-32-byte-boundary-in-gcc
>>
>> On 6.3.2019 8.45, Pekka Jääskeläinen (TAU) wrote:
>> > Hi Timo,
>> >
>> > Shooting in the dark here, but just yesterday I debugged a
>> > similar-looking issue which was caused by an illegal cast in the
>> > source code from float* to float4*. The code trusted that the
>> > alignment was still fine, which it wasn't after vectorization. This is
>> > a very target-specific programming error which many OpenCL targets can
>> > easily hide.
>> >
>> > If this is something else, we need a test case, the smaller the better,
>> > to help you here.
>> > Before opening an issue though, please test with the latest master and
>> > LLVM 8.
>> >
>> > Pekka
>> >
>> > ------------------------------------------------------------------------
>> > *From:* Timo Betcke <tim...@gm...>
>> > *Sent:* Tuesday, March 5, 2019 11:27:12 PM
>> > *To:* Portable Computing Language development discussion
>> > *Subject:* [pocl-devel] POCL Crash in vmovaps operation
>> > Dear Pocl community,
>> >
>> > I was just testing the newest Pocl version (github master branch) with
>> > our software. During execution of one of our kernels Pocl crashed.
>> > Disassembling at the crash site shows the following instructions:
>> >
>> > ------------------
>> >     0x00007fffb81efdd8 <+664>:   vmulpd xmm2,xmm2,xmm6
>> >     0x00007fffb81efddc <+668>:   vsubpd xmm2,xmm5,xmm2
>> >     0x00007fffb81efde0 <+672>:   vpermilpd xmm5,xmm4,0x1
>> >     0x00007fffb81efde6 <+678>:   vmulsd xmm3,xmm3,xmm5
>> >     0x00007fffb81efdea <+682>:   vmulsd xmm4,xmm15,xmm4
>> >     0x00007fffb81efdee <+686>:   vsubsd xmm3,xmm3,xmm4
>> >     0x00007fffb81efdf2 <+690>:   vpermilpd xmm1,xmm1,0x1
>> >     0x00007fffb81efdf8 <+696>:   vmulpd xmm0,xmm0,xmm1
>> >     0x00007fffb81efdfc <+700>:   vpermilpd xmm1,xmm0,0x1
>> >     0x00007fffb81efe02 <+706>:   vsubsd xmm0,xmm0,xmm1
>> >     0x00007fffb81efe06 <+710>:   lea    rsi,[rdx+rdx*2]
>> >     0x00007fffb81efe0a <+714>:   mov    rdx,QWORD PTR [rbx+0x38]
>> > => 0x00007fffb81efe0e <+718>:   vmovaps XMMWORD PTR [rdx+rsi*8],xmm12
>> >     0x00007fffb81efe13 <+723>:   mov    QWORD PTR [rbx+0x40],rsi
>> >     0x00007fffb81efe17 <+727>:   mov    QWORD PTR [rdx+rsi*8+0x10],0x0
>> >     0x00007fffb81efe20 <+736>:   vinsertf32x4 ymm1,ymm16,xmm0,0x1
>> > -----------------------------
>> > This seems to be a similar bug that I discussed a year ago on the
>> > mailing list. See the thread here:
>> >
>> > https://www.mail-archive.com/poc...@li.../msg01087.html.
>> >
>> > In summary, the issue was related to us using arrays of arrays within
>> > our kernels and pocl creating wrong code for it.
>> >
>> > At that time a gist was suggested for Pocl, which I tested but which
>> > did not improve things. Afterwards I let it drop for a while, as we were
>> > in early development and had plenty of other unfinished work. But our
>> > software is now close to release-ready and it would be great to get it
>> > working with pocl.
>> >
>> > Any help would be greatly appreciated.
>> > Best wishes
>> >
>> > Timo
>> >
>> > --
>> > Timo Betcke
>> > Professor of Computational Mathematics
>> > University College London
>> > Department of Mathematics
>> > E-Mail: t.b...@uc...
>> > Tel.: +44 (0) 20-3108-4068
>> >
>> >
>>
>> --
>> Pekka
>>
>>
>>
>>
>> --
>> Timo Betcke
>> Professor of Computational Mathematics
>> University College London
>> Department of Mathematics
>> E-Mail: t.b...@uc...
>> Tel.: +44 (0) 20-3108-4068
>>
>>
>
>
> --
> Timo Betcke
> Professor of Computational Mathematics
> University College London
> Department of Mathematics
> E-Mail: t.b...@uc...
> Tel.: +44 (0) 20-3108-4068
>


-- 
Timo Betcke
Professor of Computational Mathematics
University College London
Department of Mathematics
E-Mail: t.b...@uc...
Tel.: +44 (0) 20-3108-4068
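
Note on the alignment point raised in the quoted thread: the faulting instruction in both disassemblies is a vmovaps with a 128-bit memory operand, which requires the effective address to be 16-byte aligned. The sketch below is only an illustration of that arithmetic, not code from pocl or from the gists. The helper names is_aligned and effective_address are hypothetical, and the buffer is a stand-in for a float array that someone later views as float4.

```python
import ctypes

FLOAT_SIZE = ctypes.sizeof(ctypes.c_float)  # 4 bytes
FLOAT4_ALIGN = 16                           # OpenCL float4 is 16-byte aligned


def is_aligned(address: int, alignment: int) -> bool:
    """Return True if address is a multiple of alignment."""
    return address % alignment == 0


def effective_address(base: int, index: int, scale: int, disp: int) -> int:
    """Address of an x86 memory operand of the form [base + index*scale + disp]."""
    return base + index * scale + disp


# Over-allocate so that a 16-byte-aligned start can be picked inside the buffer.
raw = (ctypes.c_char * (8 * FLOAT_SIZE + FLOAT4_ALIGN))()
start = ctypes.addressof(raw)
aligned_base = (start + FLOAT4_ALIGN - 1) & ~(FLOAT4_ALIGN - 1)

# Addresses of float element 0 and float element 1 in the aligned region.
elem0 = aligned_base
elem1 = aligned_base + 1 * FLOAT_SIZE

# A float4 view starting at element 0 is fine; one starting at element 1 is
# valid as a float* but not as a float4*, so an aligned store there can fault.
print(is_aligned(elem0, FLOAT4_ALIGN))
print(is_aligned(elem1, FLOAT4_ALIGN))
```

To apply the same check to a crash like the one at <+216>, one could read the registers in gdb (for example with "info registers rdi r9") and evaluate is_aligned(effective_address(rdi, r9, 1, -0x10), 16). A False result would point to a misaligned destination rather than an out-of-bounds one.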

5 messages have been excluded from this view by a project administrator.
