From: Timo B. <tim...@gm...> - 2019-03-13 23:23:23
Hi Michal,

thanks for the bugfix. The crashes have now disappeared and more tests are passing with your bugfix version. However, several unit tests still fail that work with AMD and Intel. Briefly looking at the results I see lots of NaN entries in the pocl output. I will try to pin this down more and then report back to you.

Best wishes

Timo

On Mon, 11 Mar 2019 at 10:50, Michal Babej (TAU) <mic...@tu...> wrote:
> [earlier messages of this thread quoted in full; trimmed here, as each is archived separately below]
From: Michal B. (TAU) <mic...@tu...> - 2019-03-11 10:50:35
Hello,

I remember trying to fix this bug last year, but then I got sidetracked by other things. (BTW, it would be preferable if you reported bugs as GitHub issues in the future.)

Anyway, I've hopefully fixed it. Can you test your program with the master branch from https://github.com/franz/pocl ?

Regards,

-- mb

________________________________
From: Timo Betcke <tim...@gm...>
Sent: Friday, March 8, 2019 3:48:34 AM
To: Portable Computing Language development discussion
Subject: Re: [pocl-devel] POCL Crash in vmovaps operation
> [quoted message trimmed; archived in full below]
From: Timo B. <tim...@gm...> - 2019-03-08 01:48:55
Dear Pekka,

I have now cooked up a small example that crashes in vmovaps. The gist is available here (it uses PyOpenCL to run):

https://gist.github.com/tbetcke/b4da01465b587e85cc88801aafdced0a

The example is fairly nonsensical and was derived by reducing a crashing kernel as far as possible while retaining the crash. It runs fine under Intel CPU OpenCL on a Xeon and ROCm OpenCL on an AMD GPU. My platform is Ubuntu 18.04 with LLVM 6. If necessary I can create an environment with a newer LLVM, but I would like to avoid that (unless it is LLVM 6 related). Pocl is the most recent git master.

The code crashes at the following assembler instructions:

   0x00007fffe02575e3 <+195>: xor    r9d,r9d
   0x00007fffe02575e6 <+198>: xor    r10d,r10d
   0x00007fffe02575e9 <+201>: nop    DWORD PTR [rax+0x0]
   0x00007fffe02575f0 <+208>: mov    QWORD PTR [rdx+r9*1],0x0
=> 0x00007fffe02575f8 <+216>: vmovaps XMMWORD PTR [rdi+r9*1-0x10],xmm0
   0x00007fffe02575ff <+223>: mov    QWORD PTR [rdi+r9*1],0x0
   0x00007fffe0257607 <+231>: vmovaps XMMWORD PTR [rdx+r9*1-0x10],xmm0
   0x00007fffe025760e <+238>: vmovupd xmm1,XMMWORD PTR [rdi+r9*1-0x8]
   0x00007fffe0257615 <+245>: vaddpd xmm1,xmm1,XMMWORD PTR [rdx+r9*1-0x8]
   0x00007fffe025761c <+252>: vmovupd XMMWORD PTR [rdx+r9*1-0x8],xmm1
   0x00007fffe0257623 <+259>: mov    r8,r11
   0x00007fffe0257626 <+262>: sar    r8,0x20
   0x00007fffe025762a <+266>: lea    rsi,[r8+r8*2]

Removing any of the for loops or the localResult variable (or removing its __local attribute) leads to the kernel working on Pocl. It would be great to get to the source of this. Please let me know if you need more information from me.

Best wishes

Timo

On Wed, 6 Mar 2019 at 21:21, Timo Betcke <tim...@gm...> wrote:
> [earlier messages of this thread quoted in full; trimmed here, as each is archived separately below]
From: Timo B. <tim...@gm...> - 2019-03-06 21:22:16
Hi Pekka,

thanks for your hints and the link. I had one buffer in the kernel call that had a cast from a float type to a vector type. I have fixed this, but the segfault remains. In the next few days I will try to cook up a simple example that produces the segfault. Fortunately, the kernel itself is not too complicated, so I should be able to reduce it.

Best wishes

Timo

On Wed, 6 Mar 2019 at 10:20, Pekka Jääskeläinen (TAU) <pek...@tu...> wrote:
> [quoted messages trimmed; archived in full below]
From: Pekka J. (TAU) <pek...@tu...> - 2019-03-06 10:20:32
Yes, now that I look at it more closely, your stack trace looks _very_ much like the common data alignment issues people run into. I think this might be worth a FAQ item somewhere.

https://stackoverflow.com/questions/5983389/how-to-align-stack-at-32-byte-boundary-in-gcc

On 6.3.2019 8.45, Pekka Jääskeläinen (TAU) wrote:
> [quoted message trimmed; archived in full below]

--
Pekka
From: Pekka J. (TAU) <pek...@tu...> - 2019-03-06 08:00:43
|
Hi Timo,

Shooting in the dark here, but just yesterday I debugged a similar-looking issue which was caused by an illegal cast in the source code from float* to float4*. It trusted that the alignment is still fine, which it wasn't after vectorization. A very target-specific programming error which many OpenCL targets can easily hide.

If this is something else, we need a test case, the smaller the better, to help you here. Before opening an issue though, please test with the latest master and LLVM 8.

Pekka

________________________________
From: Timo Betcke <tim...@gm...>
Sent: Tuesday, March 5, 2019 11:27:12 PM
To: Portable Computing Language development discussion
Subject: [pocl-devel] POCL Crash in vmovaps operation

Dear Pocl community,

I was just testing the newest Pocl version (github master branch) with our software. During execution of one of our kernels Pocl crashed. Disassembling the crash shows the following operations during the crash:

------------------
0x00007fffb81efdd8 <+664>: vmulpd xmm2,xmm2,xmm6
0x00007fffb81efddc <+668>: vsubpd xmm2,xmm5,xmm2
0x00007fffb81efde0 <+672>: vpermilpd xmm5,xmm4,0x1
0x00007fffb81efde6 <+678>: vmulsd xmm3,xmm3,xmm5
0x00007fffb81efdea <+682>: vmulsd xmm4,xmm15,xmm4
0x00007fffb81efdee <+686>: vsubsd xmm3,xmm3,xmm4
0x00007fffb81efdf2 <+690>: vpermilpd xmm1,xmm1,0x1
0x00007fffb81efdf8 <+696>: vmulpd xmm0,xmm0,xmm1
0x00007fffb81efdfc <+700>: vpermilpd xmm1,xmm0,0x1
0x00007fffb81efe02 <+706>: vsubsd xmm0,xmm0,xmm1
0x00007fffb81efe06 <+710>: lea rsi,[rdx+rdx*2]
0x00007fffb81efe0a <+714>: mov rdx,QWORD PTR [rbx+0x38]
=> 0x00007fffb81efe0e <+718>: vmovaps XMMWORD PTR [rdx+rsi*8],xmm12
0x00007fffb81efe13 <+723>: mov QWORD PTR [rbx+0x40],rsi
0x00007fffb81efe17 <+727>: mov QWORD PTR [rdx+rsi*8+0x10],0x0
0x00007fffb81efe20 <+736>: vinsertf32x4 ymm1,ymm16,xmm0,0x1
-----------------------------

This seems to be a similar bug to one that I discussed a year ago on the mailing list. See the thread here: https://www.mail-archive.com/poc...@li.../msg01087.html. In summary, the issue was related to us using arrays of arrays within our kernels and pocl creating wrong code for it. At the time a gist was suggested for Pocl, which I tested but which did not improve things. Afterwards I let it drop for a while as we were in early development and had loads of building sites. But our software is now close to release-ready and it would be great to get it working with pocl. Any help would be greatly appreciated.

Best wishes

Timo

--
Timo Betcke
Professor of Computational Mathematics
University College London
Department of Mathematics
E-Mail: t.b...@uc...
Tel.: +44 (0) 20-3108-4068 |
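The faulting instruction in the disassembly, vmovaps, is an aligned 16-byte store: it traps when the address is not 16-byte aligned. The cause Pekka suspects, a cast from float* to float4*, makes exactly that alignment promise on behalf of the pointer. The C sketch below states the rule for illustration only (the helper names are invented, not pocl code): a float4-sized access at element offset e from a 16-byte-aligned float base is safe only when e is a multiple of 4.

```c
#include <stdint.h>

/* 1 if p satisfies the alignment an aligned 16-byte vector access
 * (e.g. x86 vmovaps, or a dereference through an OpenCL float4*)
 * requires; 0 otherwise. */
int is_float4_aligned(const float *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* A float4-sized access at 'base + elem' keeps 16-byte alignment
 * only when base is 16-byte aligned and the float element offset
 * is a multiple of 4 (4 floats x 4 bytes = 16 bytes). Casting
 * (float4 *)(base + elem) with elem % 4 != 0 is the bug class
 * described above: the compiler may then emit a faulting vmovaps. */
int float4_cast_is_safe(const float *base, long elem) {
    return is_float4_aligned(base) && (elem % 4) == 0;
}
```

On hardware with lenient unaligned access the same code can appear to work, which is why the error surfaces only on some targets and only after vectorization.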
From: Timo B. <tim...@gm...> - 2019-03-05 22:27:33
|
Dear Pocl community,

I was just testing the newest Pocl version (github master branch) with our software. During execution of one of our kernels Pocl crashed. Disassembling the crash shows the following operations during the crash:

------------------
0x00007fffb81efdd8 <+664>: vmulpd xmm2,xmm2,xmm6
0x00007fffb81efddc <+668>: vsubpd xmm2,xmm5,xmm2
0x00007fffb81efde0 <+672>: vpermilpd xmm5,xmm4,0x1
0x00007fffb81efde6 <+678>: vmulsd xmm3,xmm3,xmm5
0x00007fffb81efdea <+682>: vmulsd xmm4,xmm15,xmm4
0x00007fffb81efdee <+686>: vsubsd xmm3,xmm3,xmm4
0x00007fffb81efdf2 <+690>: vpermilpd xmm1,xmm1,0x1
0x00007fffb81efdf8 <+696>: vmulpd xmm0,xmm0,xmm1
0x00007fffb81efdfc <+700>: vpermilpd xmm1,xmm0,0x1
0x00007fffb81efe02 <+706>: vsubsd xmm0,xmm0,xmm1
0x00007fffb81efe06 <+710>: lea rsi,[rdx+rdx*2]
0x00007fffb81efe0a <+714>: mov rdx,QWORD PTR [rbx+0x38]
=> 0x00007fffb81efe0e <+718>: vmovaps XMMWORD PTR [rdx+rsi*8],xmm12
0x00007fffb81efe13 <+723>: mov QWORD PTR [rbx+0x40],rsi
0x00007fffb81efe17 <+727>: mov QWORD PTR [rdx+rsi*8+0x10],0x0
0x00007fffb81efe20 <+736>: vinsertf32x4 ymm1,ymm16,xmm0,0x1
-----------------------------

This seems to be a similar bug that I discussed a year ago on the mailing list. See the thread here: https://www.mail-archive.com/poc...@li.../msg01087.html. In summary, the issue was related to us using arrays of arrays within our kernels and pocl creating wrong code for it. During that time a gist was suggested for Pocl, which I tested but did not improve things. Afterwards I let it drop for a while as we were in early development and had loads of building sites. But our software is now close to release ready and it would be great to get it working with pocl. Any help would be greatly appreciated.

Best wishes

Timo

--
Timo Betcke
Professor of Computational Mathematics
University College London
Department of Mathematics
E-Mail: t.b...@uc...
Tel.: +44 (0) 20-3108-4068 |
From: Savonichev, A. <and...@in...> - 2019-02-18 13:08:17
|
Hi Pekka, Pekka Jääskeläinen wrote: > Savonichev, Andrew kirjoitti 14.2.2019 klo 18.44: >> As far as I know, POCL does not support several features that we >> support in our proprietary CPU runtime, so if we decide to switch >> our development to POCL (with your agreement, of course), we'll need >> to port all these features. This includes the features required for >> the OpenCL 2.1 Conformance (we have 100% pass rate now), as well as >> various performance improvements for Intel CPU. >> >> What is your opinion on this? > > I'm glad you are considering this. Open source helps best when it > reduces duplicated work and the collaboration via a shared code base > leads to improved results for everyone. > >> OpenCL runtimes, by contrast, are all independent, and have nothing >> shared (except maybe for the ICD loader). This situation seems bad >> for developers because of effort duplication, but it is also bad from >> a user perspective: anyone who tries to make an OpenCL program >> portable between different OpenCL implementations, will inevitably >> deal with quirks and bugs of each runtime. > > Right, I personally don't see the benefit in users having to deal with > different quirks especially if the different runtimes all conform to a > standard and if it's possible to use a fully open option which everyone > can improve. > >> This makes me wonder whether this situation will improve if we have >> an OpenCL runtime under the LLVM.org umbrella. LLVM community values >> portability a lot, so POCL seems to be a good choice for this >> purpose. Have you ever considered upstreaming POCL to LLVM.org? > > My longer term hope with pocl has been to clean up the generic de-SPMD > kernel compiler passes gradually and try to contribute them to the main > LLVM project as IR passes. Too bad I don't have enough time for pocl > open source maintenance these days so this doesn't seem to realize very > quickly without others stepping in to help. 
We have our own set of OpenCL-specific LLVM passes, which we are going to upstream to LLVM.org at some point. Some of our passes are really similar to LLVM passes implemented in POCL[1], so I guess it makes sense for us to collaborate on this.

> About moving rest of pocl under the LLVM.org umbrella as
> an LLVM project, I'm not sure if it's the logical thing to do, or what
> the benefits would be outside more visibility.
>
> pocl code base is designed such that it does not require LLVM; one can
> implement device drivers that support only offline compilation and other
> exotic setups we actively work on in our lab. Of course majority of the
> devices and use cases rely on Clang/LLVM for online compilation.
>
> Having said that, the visibility as such would not hurt, so I'm
> open for this.

Visibility is very important, but there are also some technical reasons to do this. Compiler engineers who contribute OpenCL-specific changes to LLVM or Clang will be able to test their changes on a "reference" OpenCL runtime. Since the runtime is a subproject, it is always in sync with LLVM and buildbots can enforce this. In addition to that, if we assume that all passes are moved to LLVM, vendors who do not use the opencl subproject will still be able to use these passes.

>> Anyway, our plans are not finalized yet, and I will be happy to hear
>> any feedback.
>
> If you decide to move on with merging your code towards pocl code base,
> let's have a technical telco or even a face-to-face on how to proceed.
> In any case, don't hesitate to ask further questions.

Good idea. I will follow up on this off-list.

[1]: https://github.com/pocl/pocl/tree/master/lib/llvmopencl

-- Andrew |
From: Savonichev, A. <and...@in...> - 2019-02-15 08:04:30
|
Benson Muite wrote: > From a user perspective, some choice is good, though one also wants to > minimize reduplicated work. Choices allow finding errors when they > exist and possibly testing of new features that support a particular > hardware advance. I agree that there is some benefit from having a choice between runtimes, but given the fragmentation of the OpenCL ecosystem, it may do more harm than good. > How would runtime support for specific features on different hardware > be flexibly incorporated, ie. which runtime features should be shared? If you look at the available open source OpenCL runtimes, you'll find many repetitive patterns: OpenCL object system (with refcounting), error handling, bindings with LLVM and Clang for JIT-capable runtimes, and many others. Design of many of these features is implicitly enforced by the OpenCL specification, so such similarity is expected. In addition to that, co-existence of different runtimes in the same codebase opens up a possibility of having a "shared context", where buffers or other objects can be shared between devices. See [1] for an example of this. > What will happen to beignet? According to its website[2], starting in Q1'2018, Beignet has been deprecated in favor of NEO OpenCL driver[3]. Beignet still supports Intel platforms up to Haswell, while NEO supports recent platforms starting from Broadwell. [1]: https://software.intel.com/en-us/node/540471 [2]: https://01.org/beignet [3]: https://01.org/compute-runtime > On 2/14/19 6:44 PM, Savonichev, Andrew wrote: >> Hello POCL developers, >> >> I work at Intel Compiler team, and we develop the Intel OpenCL >> Compiler and Runtime for CPU. Our code is based on LLVM and Clang, and >> it remains proprietary since the early days of development. >> >> We are now investigating a possibility to make an open source product >> by either opening our current codebase, or contributing our LLVM >> passes and runtime features into an existing open source project. 
>> POCL is a great example of such project: it does a very good job being >> portable, and supports different devices such as CPU, GPU, DSP, etc. >> >> As far as I know, POCL does not support several features that we >> support in our proprietary CPU runtime, so if we decide to switch our >> development to POCL (with your agreement, of course), we'll need to >> port all these features. This includes the features required for the >> OpenCL 2.1 Conformance (we have 100% pass rate now), as well as various >> performance improvements for Intel CPU. >> >> What is your opinion on this? >> >> Another question is about collaboration between OpenCL implementers. >> As you probably know, LLVM and Clang are used by majority of OpenCL >> implementations, and at least the frontend (Clang) is more or less the >> same in all these implementations. This is really good for end users, >> because they get a consistent experience from every platform. >> >> OpenCL runtimes, by contrast, are all independent, and have nothing >> shared (except maybe for the ICD loader). This situation seems bad for >> developers because of effort duplication, but it is also bad from a >> user perspective: anyone who tries to make an OpenCL program portable >> between different OpenCL implementations, will inevitably deal with >> quirks and bugs of each runtime. >> >> This makes me wonder whether this situation will improve if we have >> an OpenCL runtime under the LLVM.org umbrella. LLVM community values >> portability a lot, so POCL seems to be a good choice for this >> purpose. Have you ever considered upstreaming POCL to LLVM.org? >> >> Anyway, our plans are not finalized yet, and I will be happy to hear >> any feedback. >> -- Andrew |
From: Pekka J. (T. <pek...@tu...> - 2019-02-15 06:05:49
|
Hi Andrew, Savonichev, Andrew kirjoitti 14.2.2019 klo 18.44: > As far as I know, POCL does not support several features that we > support in our proprietary CPU runtime, so if we decide to switch > our development to POCL (with your agreement, of course), we'll need > to port all these features. This includes the features required for > the OpenCL 2.1 Conformance (we have 100% pass rate now), as well as > various performance improvements for Intel CPU. > > What is your opinion on this? I'm glad you are considering this. Open source helps best when it reduces duplicated work and the collaboration via a shared code base leads to improved results for everyone. > OpenCL runtimes, by contrast, are all independent, and have nothing > shared (except maybe for the ICD loader). This situation seems bad > for developers because of effort duplication, but it is also bad from > a user perspective: anyone who tries to make an OpenCL program > portable between different OpenCL implementations, will inevitably > deal with quirks and bugs of each runtime. Right, I personally don't see the benefit in users having to deal with different quirks especially if the different runtimes all conform to a standard and if it's possible to use a fully open option which everyone can improve. > This makes me wonder whether this situation will improve if we have > an OpenCL runtime under the LLVM.org umbrella. LLVM community values > portability a lot, so POCL seems to be a good choice for this > purpose. Have you ever considered upstreaming POCL to LLVM.org? My longer term hope with pocl has been to clean up the generic de-SPMD kernel compiler passes gradually and try to contribute them to the main LLVM project as IR passes. Too bad I don't have enough time for pocl open source maintenance these days so this doesn't seem to realize very quickly without others stepping in to help. 
About moving the rest of pocl under the LLVM.org umbrella as an LLVM project, I'm not sure if it's the logical thing to do, or what the benefits would be outside of more visibility.

The pocl code base is designed such that it does not require LLVM; one can implement device drivers that support only offline compilation and other exotic setups we actively work on in our lab. Of course the majority of the devices and use cases rely on Clang/LLVM for online compilation.

Having said that, the visibility as such would not hurt, so I'm open for this.

> Anyway, our plans are not finalized yet, and I will be happy to hear
> any feedback.

If you decide to move on with merging your code towards the pocl code base, let's have a technical telco or even a face-to-face on how to proceed. In any case, don't hesitate to ask further questions.

Regards,
-- Pekka |
From: Benson M. <ben...@em...> - 2019-02-14 20:37:06
|
Hi, From a user perspective, some choice is good, though one also wants to minimize reduplicated work. Choices allow finding errors when they exist and possibly testing of new features that support a particular hardware advance. How would runtime support for specific features on different hardware be flexibly incorporated, ie. which runtime features should be shared? What will happen to beignet? Regards, Benson On 2/14/19 6:44 PM, Savonichev, Andrew wrote: > Hello POCL developers, > > I work at Intel Compiler team, and we develop the Intel OpenCL > Compiler and Runtime for CPU. Our code is based on LLVM and Clang, and > it remains proprietary since the early days of development. > > We are now investigating a possibility to make an open source product > by either opening our current codebase, or contributing our LLVM > passes and runtime features into an existing open source project. > POCL is a great example of such project: it does a very good job being > portable, and supports different devices such as CPU, GPU, DSP, etc. > > As far as I know, POCL does not support several features that we > support in our proprietary CPU runtime, so if we decide to switch our > development to POCL (with your agreement, of course), we'll need to > port all these features. This includes the features required for the > OpenCL 2.1 Conformance (we have 100% pass rate now), as well as various > performance improvements for Intel CPU. > > What is your opinion on this? > > Another question is about collaboration between OpenCL implementers. > As you probably know, LLVM and Clang are used by majority of OpenCL > implementations, and at least the frontend (Clang) is more or less the > same in all these implementations. This is really good for end users, > because they get a consistent experience from every platform. > > OpenCL runtimes, by contrast, are all independent, and have nothing > shared (except maybe for the ICD loader). 
This situation seems bad for > developers because of effort duplication, but it is also bad from a > user perspective: anyone who tries to make an OpenCL program portable > between different OpenCL implementations, will inevitably deal with > quirks and bugs of each runtime. > > This makes me wonder whether this situation will improve if we have > an OpenCL runtime under the LLVM.org umbrella. LLVM community values > portability a lot, so POCL seems to be a good choice for this > purpose. Have you ever considered upstreaming POCL to LLVM.org? > > Anyway, our plans are not finalized yet, and I will be happy to hear > any feedback. > |
From: Savonichev, A. <and...@in...> - 2019-02-14 16:44:45
|
Hello POCL developers, I work at Intel Compiler team, and we develop the Intel OpenCL Compiler and Runtime for CPU. Our code is based on LLVM and Clang, and it remains proprietary since the early days of development. We are now investigating a possibility to make an open source product by either opening our current codebase, or contributing our LLVM passes and runtime features into an existing open source project. POCL is a great example of such project: it does a very good job being portable, and supports different devices such as CPU, GPU, DSP, etc. As far as I know, POCL does not support several features that we support in our proprietary CPU runtime, so if we decide to switch our development to POCL (with your agreement, of course), we'll need to port all these features. This includes the features required for the OpenCL 2.1 Conformance (we have 100% pass rate now), as well as various performance improvements for Intel CPU. What is your opinion on this? Another question is about collaboration between OpenCL implementers. As you probably know, LLVM and Clang are used by majority of OpenCL implementations, and at least the frontend (Clang) is more or less the same in all these implementations. This is really good for end users, because they get a consistent experience from every platform. OpenCL runtimes, by contrast, are all independent, and have nothing shared (except maybe for the ICD loader). This situation seems bad for developers because of effort duplication, but it is also bad from a user perspective: anyone who tries to make an OpenCL program portable between different OpenCL implementations, will inevitably deal with quirks and bugs of each runtime. This makes me wonder whether this situation will improve if we have an OpenCL runtime under the LLVM.org umbrella. LLVM community values portability a lot, so POCL seems to be a good choice for this purpose. Have you ever considered upstreaming POCL to LLVM.org? 
Anyway, our plans are not finalized yet, and I will be happy to hear any feedback. -- Andrew |
From: Michal B. <mic...@tu...> - 2018-12-31 11:08:12
|
Hello, This seems to be related to how you built LLVM. It does indeed segfault reliably when pocl is built against my distribution's LLVM; OTOH it works fine when using my own build of LLVM. I think this comes down to CMake "build mode" when building LLVM; i usually use RelWithDebInfo while distribution LLVMs usually build in Release mode. We could probably #ifdef this out in pocl, though it could also be a bug in LLVM. Sidenote: if you're looking into LLVM's autovectorizer, in my experience it refuses to vectorize almost all floating-point code unless you provide it with options like -cl-unsafe-math (with obvious drawbacks). It might be worth a try though. OTOH, most of the math functions in pocl's kernel library (https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/mathFunctions.html) have explicitly vectorized sources and will use even AVX512 if it's available. But i'm afraid your OpenCL sources would have to be explicitly vectorized to take advantage of these. Sidenote 2: if you run into something that's obviously a bug, please file an issue in https://github.com/pocl/pocl/issues - the mailing list is more useful for discussing ideas or asking for help. Regards, -- mb ________________________________ From: Noah Reddell <noa...@gm...> Sent: Friday, December 28, 2018 7:42:27 PM To: Portable Computing Language development discussion Subject: [pocl-devel] POCL_DEBUG_LLVM_PASSES=1 causes SEGFAULT This one is 100% repeatable and independent of POCL_CACHE_DIR, POCL_DEBUG, and POCL_VECTORIZER_REMARKS. 
export POCL_DEBUG_LLVM_PASSES=1

In first call to clBuildProgram:

WmResidentPatchProcessor::WmResidentPatchProcessor(WmComputeProgram*, boost::shared_ptr<WmComputeAssignment const>, std::vector<boost::shared_ptr<WmSubDomain const>, std::allocator<boost::shared_ptr<WmSubDomain const> > > const&, WmComputeMachine&)@wmresidentpatchprocessor.cc:358
POclBuildProgram@clBuildProgram.c:37
compile_and_link_program@pocl_build.c:624
pocl_llvm_build_program@pocl_llvm_build.cc:195
InitializeLLVM()@pocl_llvm_utils.cc:371

ATP Stack walkback for Rank 46 done
Process died with signal 11: 'Segmentation fault'
Forcing core dumps of ranks 46, 0

Any ideas to try? |
From: Michal B. <mic...@tu...> - 2018-12-31 10:07:45
|
Hi,

> Since the default behavior is to remove build products,

Actually the default behavior is to remove the intermediate build products, but keep the final product (the .so dynamic library).

> $HOME is generally going to be slower and farther away than $TMPDIR.

That's true, but unless $HOME is on the other side of the planet, compilation itself is likely going to take much longer than any slowdown from whatever filesystem $HOME is on.

> A general user wouldn't know to adjust the variable upon encountering this SEGFAULT unless discovering record of this discussion.

You're right. I will make a note of this issue in the documentation. I'm reluctant to change the current behavior though, as $HOME on a network filesystem seems to be the exception rather than the rule.

> it seems most likely related to false success of open(O_CREAT | O_EXCL)

From quick research (AKA googling), that seems to be the case indeed, at least for some versions of NFS. If someone with a networked setup comes up with a patch for an open(create-exclusive) replacement that works on NFS and does not break on local filesystems, we'll be happy to accept it.

Regards,
-- mb

________________________________
From: Noah Reddell <noa...@gm...>
Sent: Friday, December 28, 2018 7:30:57 PM
To: Portable Computing Language development discussion
Subject: Re: [pocl-devel] intermittent clang ComputeLineNumbers SegFault

Having it on /tmp on many systems makes the cache non-persistent, which kind of defeats the purpose of having a cache in the first place... perhaps there is a more suitable place, but i'm not aware of it.

There's a complex set of factors to balance for sure. Since the default behavior is to remove build products, I don't think the default POCL_CACHE_DIR needs to be persistent storage. $HOME is generally going to be slower and farther away than $TMPDIR. Most importantly the behavior is already customizable through the POCL_CACHE_DIR variable. I have a work-around.

A general user wouldn't know to adjust the variable upon encountering this SEGFAULT unless discovering record of this discussion. The lingering problem is that we don't understand what is driving the clang SEGFAULT but it seems most likely related to false success of open(O_CREAT | O_EXCL) on this DVS filesystem. (speculating this encounters same issue as older NFS filesystem)

In addition to the working local /tmp for POCL_CACHE_DIR, I tried a Lustre parallel filesystem path (common to all compute nodes). This works as well, presumably because this more sophisticated filesystem is correctly supporting O_EXCL.

Side question: when I export POCL_VECTORIZER_REMARKS=1, where should the output go? I'm not seeing anything in the stdout/stderr streams or ${POCL_CACHE_DIR}/*/*/build.log |
From: Pekka J. <pek...@tu...> - 2018-12-30 11:55:48
|
Hi, I'm afraid the vec remarks feature got broken with the latest LLVMs and no one has had spare time to fix it. Should not be too difficult to fix though if you want to give it a try. https://github.com/pocl/pocl/issues/613 BR, Pekka Pekka Jääskeläinen ________________________________ From: Noah Reddell <noa...@gm...> Sent: Friday, December 28, 2018 7:30:57 PM To: Portable Computing Language development discussion Subject: Re: [pocl-devel] intermittent clang ComputeLineNumbers SegFault Having it on /tmp on many systems makes the cache non-persistent, which kind of defeats the purpose of having a cache in the first place... perhaps there is a more suitable place, but i'm not aware of it. There's a complex set of factors to balance for sure. Since the default behavior is to remove build products, I don't think the default POCL_CACHE_DIR needs to be persistent storage. $HOME is generally going to be slower and farther away than $TMPDIR. Most importantly the behavior is already customizable through POCL_CACHE_DIR variable. I have a work-around. A general user wouldn't know to adjust the variable upon encountering this SEGFAULT unless discovering record of this discussion. The lingering problem is that we don't understand what is driving the clang SEGFAULT but it seems most likely related to false success of open(O_CREAT | O_EXCL) on this DVS filesystem. (speculating this encounters same issue as older NFS filesystem) In addition to the working local /tmp for POCL_CACHE_DIR, I tried a Lustre parallel filesystem path (common to all compute nodes). This works as well, presumably because this more sophisticated filesystem is correctly supporting O_EXCL. Side question: when I export POCL_VECTORIZER_REMARKS=1, where should the output go? I'm not seeing anything in the stdout/stderr streams or ${POCL_CACHE_DIR}/*/*/build.log |
From: Andreas K. <li...@in...> - 2018-12-29 13:53:26
|
Noah,

Noah Reddell <noa...@gm...> writes:
>> Having it on /tmp on many systems makes the cache non-persistent, which
>> kind of defeats the purpose of having a cache in the first place... perhaps
>> there is a more suitable place, but i'm not aware of it.
>
> There's a complex set of factors to balance for sure. Since the default
> behavior is to remove build products, I don't think the default
> POCL_CACHE_DIR needs to be persistent storage. $HOME is generally going to
> be slower and farther away than $TMPDIR.
> Most importantly the behavior is already customizable through
> POCL_CACHE_DIR variable. I have a work-around. A general user wouldn't
> know to adjust the variable upon encountering this SEGFAULT unless
> discovering record of this discussion.

~/.cache (or, really, whatever $XDG_CACHE_HOME points to) is the default location for "user-specific non-essential data files" under the XDG Base Directory Specification [1]. While that's a desktop-focused spec, it establishes a convention that is independent of the desktop use case per se. In particular, all parts of the spec are applicable (in a technical sense) even in a command-line context.

Arguably, the machine you are using should be configured to put $XDG_CACHE_HOME someplace sensible (ideally, on a per-compute-node FS). IMO, this would be a much preferable outcome compared to inventing yet another convention or reverting to someplace in $TMPDIR, which is insecure in a multi-user workstation/desktop setting.

[1] https://specifications.freedesktop.org/basedir-spec/latest/

Andreas |
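The lookup order Andreas suggests can be sketched as a small resolver: an explicit POCL_CACHE_DIR wins, then $XDG_CACHE_HOME/pocl, then the $HOME/.cache/pocl fallback the XDG spec prescribes. This is an illustrative sketch only, not pocl's actual implementation; the function name and error handling are invented.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache-directory resolver following the XDG Base
 * Directory convention discussed above.  Writes the chosen path
 * into 'out' (capacity n); returns 0 on success, -1 on failure
 * or truncation. */
int resolve_cache_dir(char *out, size_t n) {
    const char *v;
    /* Explicit override always wins. */
    if ((v = getenv("POCL_CACHE_DIR")) && *v)
        return snprintf(out, n, "%s", v) < (int)n ? 0 : -1;
    /* XDG-configured cache root, e.g. a per-compute-node FS. */
    if ((v = getenv("XDG_CACHE_HOME")) && *v)
        return snprintf(out, n, "%s/pocl", v) < (int)n ? 0 : -1;
    /* Spec-mandated default when XDG_CACHE_HOME is unset. */
    if ((v = getenv("HOME")) && *v)
        return snprintf(out, n, "%s/.cache/pocl", v) < (int)n ? 0 : -1;
    return -1;
}
```

With such a resolver, a cluster admin can point $XDG_CACHE_HOME at node-local storage without the application inventing another convention.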
From: Noah R. <noa...@gm...> - 2018-12-28 17:42:54
|
This one is 100% repeatable and independent of POCL_CACHE_DIR, POCL_DEBUG, and POCL_VECTORIZER_REMARKS.

export POCL_DEBUG_LLVM_PASSES=1

In first call to clBuildProgram:

WmResidentPatchProcessor::WmResidentPatchProcessor(WmComputeProgram*, boost::shared_ptr<WmComputeAssignment const>, std::vector<boost::shared_ptr<WmSubDomain const>, std::allocator<boost::shared_ptr<WmSubDomain const> > > const&, WmComputeMachine&)@wmresidentpatchprocessor.cc:358
POclBuildProgram@clBuildProgram.c:37
compile_and_link_program@pocl_build.c:624
pocl_llvm_build_program@pocl_llvm_build.cc:195
InitializeLLVM()@pocl_llvm_utils.cc:371

ATP Stack walkback for Rank 46 done
Process died with signal 11: 'Segmentation fault'
Forcing core dumps of ranks 46, 0

Any ideas to try? |
From: Noah R. <noa...@gm...> - 2018-12-28 17:31:27
|
> Having it on /tmp on many systems makes the cache non-persistent, which
> kind of defeats the purpose of having a cache in the first place... perhaps
> there is a more suitable place, but i'm not aware of it.

There's a complex set of factors to balance for sure. Since the default behavior is to remove build products, I don't think the default POCL_CACHE_DIR needs to be persistent storage. $HOME is generally going to be slower and farther away than $TMPDIR. Most importantly the behavior is already customizable through the POCL_CACHE_DIR variable. I have a work-around. A general user wouldn't know to adjust the variable upon encountering this SEGFAULT unless discovering record of this discussion.

The lingering problem is that we don't understand what is driving the clang SEGFAULT but it seems most likely related to false success of open(O_CREAT | O_EXCL) on this DVS filesystem. (speculating this encounters same issue as older NFS filesystem)

In addition to the working local /tmp for POCL_CACHE_DIR, I tried a Lustre parallel filesystem path (common to all compute nodes). This works as well, presumably because this more sophisticated filesystem is correctly supporting O_EXCL.

Side question: when I export POCL_VECTORIZER_REMARKS=1, where should the output go? I'm not seeing anything in the stdout/stderr streams or ${POCL_CACHE_DIR}/*/*/build.log |
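The create-exclusive pattern under suspicion in this thread is easy to demonstrate. LLVM's sys::fs::createUniqueFile() ultimately relies on open(2) with O_CREAT | O_EXCL failing with EEXIST when the path already exists; a filesystem that falsely reports success in that case breaks the uniqueness guarantee. A minimal sketch of the pattern (the wrapper function name is invented for illustration):

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Atomically create 'path', failing if it already exists.
 * Returns the open file descriptor on success, or -errno on
 * failure (-EEXIST when another process won the race).  POSIX
 * guarantees the check-and-create is atomic on local filesystems;
 * older NFS (and, per this thread, Cray DVS) may not honor it. */
int create_exclusive(const char *path) {
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return -errno;
    return fd;
}
```

On a conforming filesystem a second call on the same path must return -EEXIST; the suspected failure mode here is that it instead appears to succeed, letting two processes scribble on one temporary file.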
From: Michal B. <mic...@tu...> - 2018-12-28 09:10:05
|
Hello Noah, > I would think a better default location for the pocl cache (linux) would be derived from $TMPDIR rather than $HOME. Having it on /tmp on many systems makes the cache non-persistent, which kind of defeats the purpose of having a cache in the first place... perhaps there is a more suitable place, but i'm not aware of it. > I wonder if sys::fs::createUniqueFile() is not so unique after-all at this scale? Could this lead to a sort of race between the create and open(exclusive)...? I'm 99.9% sure it's unique. I'm not sure what race you have in mind, but IIRC LLVM just appends a random string to a template filename, then tries open(O_CREAT | O_EXCL), and repeats if that fails. Pocl then closes the descriptor and hands over the filename to Clang's preprocessor. It's possible Clang removes the file before re-opening to write into it, or there is something else going on which triggers a bug. Regards, -- mb ________________________________ From: Noah Reddell <noa...@gm...> Sent: Friday, December 28, 2018 12:30:01 AM To: Portable Computing Language development discussion Subject: Re: [pocl-devel] intermittent clang ComputeLineNumbers SegFault Hi Michal, Thank you for the suggestion of POCL_CACHE_DIR. Setting this to a tmps unique to each compute node immediately worked around the issue. I can now reliably run my application. On most Cray systems, $HOME is a DFS mount when mounted on compute nodes. I'm sure there are many similarities from DFS to NFS. I would think a better default location for the pocl cache (linux) would be derived from $TMPDIR rather than $HOME. I wonder if sys::fs::createUniqueFile() is not so unique after-all at this scale? Could this lead to a sort of race between the create and open(exclusive)...? Cheers, Noah On Thu, Dec 27, 2018 at 11:14 AM Michal Babej <mic...@tu...<mailto:mic...@tu...>> wrote: Hello, > Is pocl or clang trying to write anything to the working directory? 
In my restricted case, /tmp is private to each compute node and thus each process. Not to the working directory (AFAIK, i haven't inspected the entire Clang codebase), but pocl writes to its own cache directory, which by default is $HOME/.cache/pocl/kcache; you can change it to a different directory by setting the POCL_CACHE_DIR env variable. IIRC there have been some issues before, when people had the cache dir located on NFS shares; is that your case (is your $HOME shared) ? You could try pointing POCL_CACHE_DIR to /tmp/pocl_cache and see if it makes the problem go away. It's possible pocl / Clang makes some assumption about filesystem which does not hold for NFS. In the backtrace you pasted, it seems it's crashing in the preprocessing phase. Here pocl writes to a temporary file created by LLVM's sys::fs::createUniqueFile() which in turn uses open() with exclusive flag on a randomized path. Regards, -- mb ________________________________ From: Noah Reddell <noa...@gm...<mailto:noah.reddell%2B...@gm...>> Sent: Saturday, December 22, 2018 12:09:55 AM To: poc...@li...<mailto:poc...@li...> Subject: [pocl-devel] intermittent clang ComputeLineNumbers SegFault Hi, I figured it is about time I give pocl a try with my physics simulation code. I've been using Intel's OpenCL library for computing on Cray systems with Xeon CPU. Today I built pocl (today's git master ) on a Cray XC40 using clang+llvm-7.0.0-x86_64-linux-sles12.3 I was able to run a simple Hello World kernel as well as clinfo. When running my physics application at necessary scale, I'm seeing about 0.2% of clBuildProgram fail by SEGFAULT, all with a common stack signature. (pasted below) I'm not sure why this would be so intermittent. I've tried reducing to one process per compute node, so only one clBuildProgram would be executing on that node at a time. In this testing, that leaves 90 processes doing the same program compile simultaneously in the same working directory. 
Is pocl or clang trying to write anything to the working directory? In my restricted case, /tmp is private to each compute node and thus each process. Google-ing for similar stack language, I find one mention that may well be the same bug: https://www.mail-archive.com/llv...@li.../msg28677.html https://bugs.llvm.org/show_bug.cgi?id=39833 "poclcc" is successful with the same OpenCL kernel source. I assume I'd need to run it hundreds of times, perhaps in parallel to potentially trigger the same bug. Any advice would be appreciated. Now that I've thought through the situation, I think I should probably create an account and contribute to the LLVM bug 39833 discussion with a me-too. Cheers, Noah Reddell WmResidentPatchProcessor::WmResidentPatchProcessor(WmComputeProgram*, boost::shared_ptr<WmComputeAssignment const>, std::vector<boost::shared_ptr<WmSubDomain const>, std::allocator<boost::shared_ptr<WmSubDomain const> > > const&, WmComputeMachine&)@wmresidentpatchprocessor.cc:358 POclBuildProgram@clBuildProgram.c:37 compile_and_link_program@pocl_build.c:624 pocl_llvm_build_program@pocl_llvm_build.cc:489 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&)@0x2aaaabebfd07 clang::FrontendAction::Execute()@0x2aaaabf1c106 clang::PrintPreprocessedAction::ExecuteAction()@0x2aaaabf22328 clang::DoPrintPreprocessedInput(clang::Preprocessor&, llvm::raw_ostream*, clang::PreprocessorOutputOptions const&)@0x2aaaabf51226 clang::Preprocessor::EnterMainSourceFile()@0x2aaaacc1cabc clang::Preprocessor::EnterSourceFile(clang::FileID, clang::DirectoryLookup const*, clang::SourceLocation)@0x2aaaacbf7407 (anonymous namespace)::PrintPPOutputPPCallbacks::FileChanged(clang::SourceLocation, clang::PPCallbacks::FileChangeReason, clang::SrcMgr::CharacteristicKind, clang::FileID)@0x2aaaabf5212d clang::SourceManager::getPresumedLoc(clang::SourceLocation, bool) const@0x2aaaacc4e00e clang::SourceManager::getLineNumber(clang::FileID, unsigned int, bool*) const@0x2aaaacc4e43a 
ComputeLineNumbers(clang::DiagnosticsEngine&, clang::SrcMgr::ContentCache*, llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul>&, clang::SourceManager const&, bool&)@0x2aaaacc4e683 _______________________________________________ pocl-devel mailing list poc...@li...<mailto:poc...@li...> https://lists.sourceforge.net/lists/listinfo/pocl-devel |
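Michal's description above of how LLVM's sys::fs::createUniqueFile() works — append a random string to a template filename, try open(O_CREAT | O_EXCL), repeat on failure — can be sketched in plain C. This is an illustrative reimplementation of the pattern, not LLVM's or pocl's actual code; the function name `create_unique_file` is invented here:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Sketch of the exclusive-create pattern described above (NOT LLVM's
 * or pocl's actual code): replace the trailing "XXXXXX" of the
 * template with random characters, then open with O_CREAT|O_EXCL so
 * two processes can never both win the same name; on EEXIST,
 * re-randomize and retry. Returns an open fd on success, -1 on
 * failure. */
static int create_unique_file(char *tmpl)
{
    static const char chars[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    size_t len = strlen(tmpl);
    if (len < 6 || strcmp(tmpl + len - 6, "XXXXXX") != 0)
        return -1;                    /* template must end in XXXXXX */
    for (int attempt = 0; attempt < 100; ++attempt) {
        for (size_t i = len - 6; i < len; ++i)
            tmpl[i] = chars[rand() % (int)(sizeof chars - 1)];
        int fd = open(tmpl, O_CREAT | O_EXCL | O_RDWR, 0600);
        if (fd >= 0)
            return fd;                /* exclusive create succeeded */
        if (errno != EEXIST)
            return -1;                /* real I/O error: give up */
        /* EEXIST: another process created this name first; retry */
    }
    return -1;
}
```

Note that the uniqueness guarantee comes from the kernel's atomic O_CREAT|O_EXCL check, not from the randomness alone — a name collision only costs a retry, not a race. Any breakage would therefore have to come from what happens to the file after the descriptor is handed over, as Michal suggests.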
From: Pekka J. <pek...@tu...> - 2018-12-28 08:47:58
|
Hi Noah, > I would think a better default location for the pocl cache (linux) would > be derived from $TMPDIR rather than $HOME. It used to be under /tmp, but then someone had an issue with a multi-node NFS-mounted system with CPUs with incompatible ISAs getting the same binaries, IIRC. I'm really not sure what would be the best overall default for it. A /tmp/XXX dir that is unique per node? This might be related: https://github.com/pocl/pocl/issues/430 BR, -- Pekka |
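Pekka's "/tmp/XXX dir that is unique per node" idea can be approximated today with a few shell lines. This is only a sketch of the workaround, not a pocl feature; the `pocl-cache-$(hostname)` naming is invented here, not a pocl convention:

```shell
# Derive a node-local cache dir from $TMPDIR (falling back to /tmp)
# plus the node's hostname, then point pocl's cache at it.
# The directory name below is our invention, not a pocl convention.
: "${TMPDIR:=/tmp}"
export POCL_CACHE_DIR="$TMPDIR/pocl-cache-$(hostname)"
mkdir -p "$POCL_CACHE_DIR"
echo "pocl cache: $POCL_CACHE_DIR"
```

Because the hostname is part of the path, nodes with incompatible ISAs can no longer pick up each other's cached binaries even if $TMPDIR happens to be shared.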
From: Noah R. <noa...@gm...> - 2018-12-27 22:30:29
|
Hi Michal, Thank you for the suggestion of POCL_CACHE_DIR. Setting this to a tmpfs unique to each compute node immediately worked around the issue. I can now reliably run my application. On most Cray systems, $HOME is a DFS mount when mounted on compute nodes. I'm sure there are many similarities from DFS to NFS. I would think a better default location for the pocl cache (linux) would be derived from $TMPDIR rather than $HOME. I wonder if sys::fs::createUniqueFile() is not so unique after all at this scale? Could this lead to a sort of race between the create and open(exclusive)...? Cheers, Noah On Thu, Dec 27, 2018 at 11:14 AM Michal Babej <mic...@tu...> wrote: > Hello, > > > > Is pocl or clang trying to write anything to the working directory? In > my restricted case, /tmp is private to each compute node and thus each > process. > > > Not to the working directory (AFAIK, i haven't inspected the entire Clang > codebase), but pocl writes to its own cache directory, which by default is > $HOME/.cache/pocl/kcache; you can change it to a different directory by > setting the POCL_CACHE_DIR env variable. > > > IIRC there have been some issues before, when people had the cache dir > located on NFS shares; is that your case (is your $HOME shared) ? You could > try pointing POCL_CACHE_DIR to /tmp/pocl_cache and see if it makes the > problem go away. It's possible pocl / Clang makes some assumption about > filesystem which does not hold for NFS. > > > In the backtrace you pasted, it seems it's crashing in the preprocessing > phase. Here pocl writes to a temporary file created by LLVM's sys::fs::createUniqueFile() > which in turn uses open() with exclusive flag on a randomized path. > > > Regards, > > -- mb > ------------------------------ > *From:* Noah Reddell <noa...@gm...> > *Sent:* Saturday, December 22, 2018 12:09:55 AM > *To:* poc...@li... 
> *Subject:* [pocl-devel] intermittent clang ComputeLineNumbers SegFault > > Hi, > > I figured it is about time I give pocl a try with my physics > simulation code. I've been using Intel's OpenCL library for computing on > Cray systems with Xeon CPU. > Today I built pocl (today's git master ) on a Cray XC40 > using clang+llvm-7.0.0-x86_64-linux-sles12.3 > I was able to run a simple Hello World kernel as well as clinfo. > When running my physics application at necessary scale, I'm seeing about > 0.2% of clBuildProgram fail by SEGFAULT, all with a common stack signature. > (pasted below) > I'm not sure why this would be so intermittent. I've tried > reducing to one process per compute node, so only one clBuildProgram would > be executing on that node at a time. In this testing, that leaves 90 > processes doing the same program compile simultaneously in the same working > directory. Is pocl or clang trying to write anything to the working > directory? In my restricted case, /tmp is private to each compute node and > thus each process. > Google-ing for similar stack language, I find one mention that may > well be the same bug: > https://www.mail-archive.com/llv...@li.../msg28677.html > https://bugs.llvm.org/show_bug.cgi?id=39833 > > "poclcc" is successful with the same OpenCL kernel source. I assume > I'd need to run it hundreds of times, perhaps in parallel to potentially > trigger the same bug. > > Any advice would be appreciated. Now that I've thought through the > situation, I think I should probably create an account and contribute to > the LLVM bug 39833 discussion with a me-too. 
> > Cheers, > > Noah Reddell > > > WmResidentPatchProcessor::WmResidentPatchProcessor(WmComputeProgram*, > boost::shared_ptr<WmComputeAssignment const>, > std::vector<boost::shared_ptr<WmSubDomain const>, > std::allocator<boost::shared_ptr<WmSubDomain const> > > const&, > WmComputeMachine&)@wmresidentpatchprocessor.cc:358 > POclBuildProgram@clBuildProgram.c:37 > compile_and_link_program@pocl_build.c:624 > pocl_llvm_build_program@pocl_llvm_build.cc:489 > > clang::CompilerInstance::ExecuteAction(clang::FrontendAction&)@0x2aaaabebfd07 > clang::FrontendAction::Execute()@0x2aaaabf1c106 > clang::PrintPreprocessedAction::ExecuteAction()@0x2aaaabf22328 > clang::DoPrintPreprocessedInput(clang::Preprocessor&, > llvm::raw_ostream*, clang::PreprocessorOutputOptions const&)@0x2aaaabf51226 > clang::Preprocessor::EnterMainSourceFile()@0x2aaaacc1cabc > clang::Preprocessor::EnterSourceFile(clang::FileID, > clang::DirectoryLookup const*, clang::SourceLocation)@0x2aaaacbf7407 > (anonymous > namespace)::PrintPPOutputPPCallbacks::FileChanged(clang::SourceLocation, > clang::PPCallbacks::FileChangeReason, clang::SrcMgr::CharacteristicKind, > clang::FileID)@0x2aaaabf5212d > clang::SourceManager::getPresumedLoc(clang::SourceLocation, bool) > const@0x2aaaacc4e00e > clang::SourceManager::getLineNumber(clang::FileID, unsigned int, bool*) > const@0x2aaaacc4e43a > *ComputeLineNumbers*(clang::DiagnosticsEngine&, > clang::SrcMgr::ContentCache*, > llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul>&, > clang::SourceManager const&, bool&)@0x2aaaacc4e683 > > > > _______________________________________________ > pocl-devel mailing list > poc...@li... > https://lists.sourceforge.net/lists/listinfo/pocl-devel > |
From: Michal B. <mic...@tu...> - 2018-12-27 19:13:51
|
Hello, > Is pocl or clang trying to write anything to the working directory? In my restricted case, /tmp is private to each compute node and thus each process. Not to the working directory (AFAIK, i haven't inspected the entire Clang codebase), but pocl writes to its own cache directory, which by default is $HOME/.cache/pocl/kcache; you can change it to a different directory by setting the POCL_CACHE_DIR env variable. IIRC there have been some issues before, when people had the cache dir located on NFS shares; is that your case (is your $HOME shared)? You could try pointing POCL_CACHE_DIR to /tmp/pocl_cache and see if it makes the problem go away. It's possible pocl / Clang makes some assumptions about the filesystem which do not hold for NFS. In the backtrace you pasted, it seems it's crashing in the preprocessing phase. Here pocl writes to a temporary file created by LLVM's sys::fs::createUniqueFile() which in turn uses open() with the exclusive flag on a randomized path. Regards, -- mb ________________________________ From: Noah Reddell <noa...@gm...> Sent: Saturday, December 22, 2018 12:09:55 AM To: poc...@li... Subject: [pocl-devel] intermittent clang ComputeLineNumbers SegFault Hi, I figured it is about time I give pocl a try with my physics simulation code. I've been using Intel's OpenCL library for computing on Cray systems with Xeon CPU. Today I built pocl (today's git master ) on a Cray XC40 using clang+llvm-7.0.0-x86_64-linux-sles12.3 I was able to run a simple Hello World kernel as well as clinfo. When running my physics application at necessary scale, I'm seeing about 0.2% of clBuildProgram fail by SEGFAULT, all with a common stack signature. (pasted below) I'm not sure why this would be so intermittent. I've tried reducing to one process per compute node, so only one clBuildProgram would be executing on that node at a time. In this testing, that leaves 90 processes doing the same program compile simultaneously in the same working directory. 
Is pocl or clang trying to write anything to the working directory? In my restricted case, /tmp is private to each compute node and thus each process. Google-ing for similar stack language, I find one mention that may well be the same bug: https://www.mail-archive.com/llv...@li.../msg28677.html https://bugs.llvm.org/show_bug.cgi?id=39833 "poclcc" is successful with the same OpenCL kernel source. I assume I'd need to run it hundreds of times, perhaps in parallel to potentially trigger the same bug. Any advice would be appreciated. Now that I've thought through the situation, I think I should probably create an account and contribute to the LLVM bug 39833 discussion with a me-too. Cheers, Noah Reddell WmResidentPatchProcessor::WmResidentPatchProcessor(WmComputeProgram*, boost::shared_ptr<WmComputeAssignment const>, std::vector<boost::shared_ptr<WmSubDomain const>, std::allocator<boost::shared_ptr<WmSubDomain const> > > const&, WmComputeMachine&)@wmresidentpatchprocessor.cc:358 POclBuildProgram@clBuildProgram.c:37 compile_and_link_program@pocl_build.c:624 pocl_llvm_build_program@pocl_llvm_build.cc:489 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&)@0x2aaaabebfd07 clang::FrontendAction::Execute()@0x2aaaabf1c106 clang::PrintPreprocessedAction::ExecuteAction()@0x2aaaabf22328 clang::DoPrintPreprocessedInput(clang::Preprocessor&, llvm::raw_ostream*, clang::PreprocessorOutputOptions const&)@0x2aaaabf51226 clang::Preprocessor::EnterMainSourceFile()@0x2aaaacc1cabc clang::Preprocessor::EnterSourceFile(clang::FileID, clang::DirectoryLookup const*, clang::SourceLocation)@0x2aaaacbf7407 (anonymous namespace)::PrintPPOutputPPCallbacks::FileChanged(clang::SourceLocation, clang::PPCallbacks::FileChangeReason, clang::SrcMgr::CharacteristicKind, clang::FileID)@0x2aaaabf5212d clang::SourceManager::getPresumedLoc(clang::SourceLocation, bool) const@0x2aaaacc4e00e clang::SourceManager::getLineNumber(clang::FileID, unsigned int, bool*) const@0x2aaaacc4e43a 
ComputeLineNumbers(clang::DiagnosticsEngine&, clang::SrcMgr::ContentCache*, llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul>&, clang::SourceManager const&, bool&)@0x2aaaacc4e683 |
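Michal's suggested workaround from the message above — pointing POCL_CACHE_DIR away from an NFS-mounted $HOME to node-local /tmp — amounts to two shell lines. A minimal sketch; the application invocation below is a placeholder, not a real binary:

```shell
# Redirect pocl's kernel cache from the default
# $HOME/.cache/pocl/kcache to node-local /tmp before running.
export POCL_CACHE_DIR=/tmp/pocl_cache
mkdir -p "$POCL_CACHE_DIR"
# ./my_physics_app   # placeholder for the actual application binary
```

This must be set in the environment of every process that calls clBuildProgram, e.g. via the job launcher's environment forwarding on a Cray system.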
From: Noah R. <noa...@gm...> - 2018-12-21 22:10:28
|
Hi, I figured it is about time I give pocl a try with my physics simulation code. I've been using Intel's OpenCL library for computing on Cray systems with Xeon CPU. Today I built pocl (today's git master) on a Cray XC40 using clang+llvm-7.0.0-x86_64-linux-sles12.3 I was able to run a simple Hello World kernel as well as clinfo. When running my physics application at necessary scale, I'm seeing about 0.2% of clBuildProgram fail by SEGFAULT, all with a common stack signature. (pasted below) I'm not sure why this would be so intermittent. I've tried reducing to one process per compute node, so only one clBuildProgram would be executing on that node at a time. In this testing, that leaves 90 processes doing the same program compile simultaneously in the same working directory. Is pocl or clang trying to write anything to the working directory? In my restricted case, /tmp is private to each compute node and thus each process. Googling for similar stack language, I find one mention that may well be the same bug: https://www.mail-archive.com/llv...@li.../msg28677.html https://bugs.llvm.org/show_bug.cgi?id=39833 "poclcc" is successful with the same OpenCL kernel source. I assume I'd need to run it hundreds of times, perhaps in parallel to potentially trigger the same bug. Any advice would be appreciated. Now that I've thought through the situation, I think I should probably create an account and contribute to the LLVM bug 39833 discussion with a me-too. 
Cheers, Noah Reddell WmResidentPatchProcessor::WmResidentPatchProcessor(WmComputeProgram*, boost::shared_ptr<WmComputeAssignment const>, std::vector<boost::shared_ptr<WmSubDomain const>, std::allocator<boost::shared_ptr<WmSubDomain const> > > const&, WmComputeMachine&)@wmresidentpatchprocessor.cc:358 POclBuildProgram@clBuildProgram.c:37 compile_and_link_program@pocl_build.c:624 pocl_llvm_build_program@pocl_llvm_build.cc:489 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&)@0x2aaaabebfd07 clang::FrontendAction::Execute()@0x2aaaabf1c106 clang::PrintPreprocessedAction::ExecuteAction()@0x2aaaabf22328 clang::DoPrintPreprocessedInput(clang::Preprocessor&, llvm::raw_ostream*, clang::PreprocessorOutputOptions const&)@0x2aaaabf51226 clang::Preprocessor::EnterMainSourceFile()@0x2aaaacc1cabc clang::Preprocessor::EnterSourceFile(clang::FileID, clang::DirectoryLookup const*, clang::SourceLocation)@0x2aaaacbf7407 (anonymous namespace)::PrintPPOutputPPCallbacks::FileChanged(clang::SourceLocation, clang::PPCallbacks::FileChangeReason, clang::SrcMgr::CharacteristicKind, clang::FileID)@0x2aaaabf5212d clang::SourceManager::getPresumedLoc(clang::SourceLocation, bool) const@0x2aaaacc4e00e clang::SourceManager::getLineNumber(clang::FileID, unsigned int, bool*) const@0x2aaaacc4e43a *ComputeLineNumbers*(clang::DiagnosticsEngine&, clang::SrcMgr::ContentCache*, llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul>&, clang::SourceManager const&, bool&)@0x2aaaacc4e683 |
From: Isuru F. <is...@gm...> - 2018-10-05 20:39:49
|
Hi, First one is easy to solve. PR on the way. Second one is because of a lock being destroyed twice. https://github.com/pocl/pocl/blob/370c1e400d551453f49dbd3a75054c5d039400eb/lib/CL/devices/pthread/pthread_scheduler.c#L579 which is called when the thread exits in https://github.com/pocl/pocl/blob/370c1e400d551453f49dbd3a75054c5d039400eb/lib/CL/devices/pthread/pthread_scheduler.c#L145 and at https://github.com/pocl/pocl/blob/370c1e400d551453f49dbd3a75054c5d039400eb/lib/CL/devices/pthread/pthread_scheduler.c#L146 Isuru On Fri, Oct 5, 2018 at 3:07 AM Michal Babej <mic...@tu...> wrote: > Hi, > > > Anyone knows what's up with Travis buildbots ? One of them has been giving > this error for a while: > > > dyld: Library not loaded: @rpath/libxml2.2.dylib > Referenced from: > /Users/travis/build/franz/pocl/build/tests/kernel/test_shuffle > Reason: Incompatible library version: test_shuffle requires version > 12.0.0 or later, but libxml2.2.dylib provides version 10.0.0 > > the other one started giving this error: > > > Assertion failed: (r == 0), function void pthread_scheduler_uninit(), > file > > /Users/travis/build/franz/pocl/lib/CL/devices/pthread/pthread_scheduler.c, > line 146. > > Any ideas ? > > > Thanks, > > -- mb > _______________________________________________ > pocl-devel mailing list > poc...@li... > https://lists.sourceforge.net/lists/listinfo/pocl-devel > |
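The double-destroy that Isuru points at can be made harmless by pairing the lock with a flag. The sketch below is generic pthread code for illustration only — it is not pocl's actual scheduler code, and the `guarded_lock` type and function names are invented here:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical guard (names invented, not pocl's actual code): pair
 * the lock with an 'initialized' flag so a destroy reached from two
 * code paths -- e.g. once from the exiting scheduler thread and once
 * from the uninit routine -- only tears the lock down once. Calling
 * pthread_mutex_destroy twice on the same mutex is undefined
 * behavior, which is what the assertion failure above suggests. */
typedef struct {
    pthread_mutex_t mutex;
    bool initialized;
} guarded_lock;

static int guarded_lock_init(guarded_lock *l)
{
    int r = pthread_mutex_init(&l->mutex, NULL);
    l->initialized = (r == 0);
    return r;
}

/* Idempotent: only the first call actually destroys the mutex;
 * subsequent calls are no-ops that report success. */
static int guarded_lock_destroy(guarded_lock *l)
{
    if (!l->initialized)
        return 0;
    l->initialized = false;
    return pthread_mutex_destroy(&l->mutex);
}
```

In real teardown code the flag itself would need synchronization, or better, the two call sites would be restructured so that exactly one of them owns the destroy; the guard merely makes the failure mode visible instead of undefined.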
From: Michal B. <mic...@tu...> - 2018-10-05 08:06:56
|
Hi, Does anyone know what's up with the Travis buildbots? One of them has been giving this error for a while: dyld: Library not loaded: @rpath/libxml2.2.dylib Referenced from: /Users/travis/build/franz/pocl/build/tests/kernel/test_shuffle Reason: Incompatible library version: test_shuffle requires version 12.0.0 or later, but libxml2.2.dylib provides version 10.0.0 The other one started giving this error: Assertion failed: (r == 0), function void pthread_scheduler_uninit(), file /Users/travis/build/franz/pocl/lib/CL/devices/pthread/pthread_scheduler.c, line 146. Any ideas? Thanks, -- mb |