polybench-discussion Mailing List for polybench
Brought to you by: pouchet, tomofumi-yuki
Archive: 2012 Feb (3), 2012 Aug (7), 2016 Feb (2)
From: Tomofumi Y. <tom...@in...> - 2016-02-09 13:45:51
Dear Willy,

Thanks for letting us know. We will fix this before the next release.

Thanks,
Tomofumi Yuki

----- Original Message -----
> From: "Willy WOLFF" <wil...@gm...>
> To: pol...@li...
> Sent: Tuesday, February 9, 2016 12:46:43 PM
> Subject: [Polybench-discussion] missing math symbol
>
> Hi PolyBench team,
>
> In the script utilities/makefile-gen.pl, you forgot to add
> "\${EXTRA_FLAGS}" at the end of line 65.
> Some compilers complain or won't compile because of the missing sqrt symbol.
>
> Best Regards,
>
> --
> Willy WOLFF, MSc
> PhD student
> School of Computing and Communications
> Lancaster University LA1 4WA, UK
From: Willy W. <wil...@gm...> - 2016-02-09 11:46:46
Hi PolyBench team,

In the script utilities/makefile-gen.pl, you forgot to add "\${EXTRA_FLAGS}" at the end of line 65. Some compilers complain or won't compile because of the missing sqrt symbol.

Best Regards,

--
Willy WOLFF, MSc
PhD student
School of Computing and Communications
Lancaster University LA1 4WA, UK
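The "missing sqrt symbol" Willy reports is the usual link-time failure when libm is not on the link line. A minimal sketch of the failure mode, with a hypothetical file name, assuming ${EXTRA_FLAGS} is where -lm would be passed in the generated Makefiles:

/* sqrt_demo.c -- hypothetical example, not part of PolyBench.
 * A kernel that calls sqrt(), as cholesky and gramschmidt do.
 * Linking without the math library typically fails with
 * "undefined reference to `sqrt'", which is why the generated
 * Makefile needs ${EXTRA_FLAGS} (e.g. -lm) on its link line. */
#include <math.h>
#include <stdio.h>

int main(void)
{
  volatile double x = 2.0;   /* volatile so the call is not constant-folded away */
  printf("%f\n", sqrt(x));
  return 0;
}

Compiled with plain "gcc sqrt_demo.c", this can stop at link time on toolchains where libm is a separate library; "gcc sqrt_demo.c -lm" links cleanly, which is what appending ${EXTRA_FLAGS} to the generated rule restores.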
From: Louis-Noel P. <po...@cs...> - 2012-08-13 01:06:39
Sven & Tobias: thanks for raising the problem. I think the properties of the input matrices have been largely overlooked so far in PolyBench. I have to go through the algorithms one by one to ensure the input is well formed. Obviously that does not seem to be the case for the benchmarks you mention.

Regarding cholesky, a quick look on the web leaves me unsure whether the problem is well conditioned (I'm unclear whether the input matrix must be symmetric or not; I have to read further. For the moment the input matrix is _not_ symmetric).

Thanks,
++

On Aug 12, 2012, at 2:30 PM, Sven Verdoolaege wrote:

> On Sun, Aug 12, 2012 at 10:37:25PM +0200, Tobias Grosser wrote:
>> gramschmidt, lu and ludcmp also have this problem. They cause test
>> case failures in the LLVM test suite as nan values are formatted
>> differently on different platforms, which makes correctness checking
>> across platforms more difficult.
>
> Well, I believe that technically a NaN is never equal to any other NaN,
> so it's not actually that bad that they get printed differently.
>
> In those cases where you get numbers, do you get exactly the same
> values across different platforms? I'm still somewhat surprised
> that I can't get exact results from a CUDA device. I briefly
> scanned their paper and they point out that you can even get
> different results on the same CPU depending on whether you compile
> for 32 bits or 64 bits because of differences in libraries, but
> AFAIK, the polybench benchmarks only use basic arithmetic operations.
>
>> Louis-Noel, if you see how to fix this, that would be great.
>
> I would also appreciate that.
> (I hadn't noticed those other cases, because I broke off the tests
> after the first problem.)
>
> skimo

--
Louis-Noel Pouchet
po...@cs...
From: Sven V. <sk...@ko...> - 2012-08-12 21:30:46
On Sun, Aug 12, 2012 at 10:37:25PM +0200, Tobias Grosser wrote:
> gramschmidt, lu and ludcmp also have this problem. They cause test
> case failures in the LLVM test suite as nan values are formatted
> differently on different platforms, which makes correctness checking
> across platforms more difficult.

Well, I believe that technically a NaN is never equal to any other NaN, so it's not actually that bad that they get printed differently.

In those cases where you get numbers, do you get exactly the same values across different platforms? I'm still somewhat surprised that I can't get exact results from a CUDA device. I briefly scanned their paper and they point out that you can even get different results on the same CPU depending on whether you compile for 32 bits or 64 bits because of differences in libraries, but AFAIK, the polybench benchmarks only use basic arithmetic operations.

> Louis-Noel, if you see how to fix this, that would be great.

I would also appreciate that. (I hadn't noticed those other cases, because I broke off the tests after the first problem.)

skimo
From: Tobias G. <to...@gr...> - 2012-08-12 20:56:27
On 08/11/2012 07:59 PM, Sven Verdoolaege wrote:
> On Fri, Aug 10, 2012 at 06:18:51PM -0700, Louis-Noel Pouchet wrote:
>> I've checked in a fix, now the matrix is SDP and you should not experience NaNs anymore. Initialization takes much longer, I went for the naive solution and get a SDP matrix from a "random" one with matmult...
>
> It seems trisolv has a similar problem. In fact, the output contains only NaNs.
> Can you fix that too?

gramschmidt, lu and ludcmp also have this problem. They cause test case failures in the LLVM test suite as nan values are formatted differently on different platforms, which makes correctness checking across platforms more difficult.

Louis-Noel, if you see how to fix this, that would be great.

Tobi
From: Sven V. <sk...@ko...> - 2012-08-11 17:59:40
On Fri, Aug 10, 2012 at 06:18:51PM -0700, Louis-Noel Pouchet wrote:
> I've checked in a fix, now the matrix is SDP and you should not experience NaNs anymore. Initialization takes much longer, I went for the naive solution and get a SDP matrix from a "random" one with matmult...

It seems trisolv has a similar problem. In fact, the output contains only NaNs.
Can you fix that too?

skimo
From: Sven V. <sk...@ko...> - 2012-08-11 16:19:45
On Fri, Aug 10, 2012 at 06:18:51PM -0700, Louis-Noel Pouchet wrote:
> Hi Sven,
>
> No it's not expected in the sense the output should not contain NaN, and I can reproduce. This implies the way the array is initialized is incorrect (the current system cannot be solved with Cholesky algo).
> I've checked in a fix, now the matrix is SDP and you should not experience NaNs anymore. Initialization takes much longer, I went for the naive solution and get a SDP matrix from a "random" one with matmult...

Thanks. Is the resulting problem well-conditioned? I get reasonably accurate results on MINI_DATASET and SMALL_DATASET, but not on STANDARD_DATASET. (I can't set N on the command line to narrow it down because N appears in some cuda header.)

It doesn't appear to be a problem in the generated code because PPCG apparently doesn't find any parallelism on this benchmark and generates very inefficient code, but code that doesn't essentially depend on the problem size, so if it is correct for small values, it should be correct for bigger values.

Incidentally, here are some other changes I've had to apply.

diff --git a/trunk/linear-algebra/kernels/doitgen/doitgen.c b/trunk/linear-algebra/kernels/doitgen/doitgen.c
index bdb571b..0dae5bf 100644
--- a/trunk/linear-algebra/kernels/doitgen/doitgen.c
+++ b/trunk/linear-algebra/kernels/doitgen/doitgen.c
@@ -48,7 +48,7 @@ void print_array(int nr, int nq, int np,
     for (j = 0; j < nq; j++)
       for (k = 0; k < np; k++) {
         fprintf (stderr, DATA_PRINTF_MODIFIER, A[i][j][k]);
-        if (i % 20 == 0) fprintf (stderr, "\n");
+        if (k % 20 == 0) fprintf (stderr, "\n");
       }
   fprintf (stderr, "\n");
 }

Without this, the output lines get too long, making it difficult to compare.

diff --git a/trunk/utilities/polybench.c b/trunk/utilities/polybench.c
index bb0d17a..3504dfa 100644
--- a/trunk/utilities/polybench.c
+++ b/trunk/utilities/polybench.c
@@ -380,14 +380,14 @@ static
 void *
 xmalloc (size_t num)
 {
-  void* new = NULL;
-  int ret = posix_memalign (&new, 32, num);
-  if (! new || ret)
+  void* cur = NULL;
+  int ret = posix_memalign (&cur, 32, num);
+  if (! cur || ret)
     {
       fprintf (stderr, "[PolyBench] posix_memalign: cannot allocate memory");
       exit (1);
     }
-  return new;
+  return cur;
 }

Apparently nvcc is based on a C++ compiler, which doesn't like variables called "new".

skimo
From: Louis-Noel P. <po...@cs...> - 2012-08-11 01:38:37
Hi Sven,

No it's not expected in the sense the output should not contain NaN, and I can reproduce. This implies the way the array is initialized is incorrect (the current system cannot be solved with Cholesky algo).

I've checked in a fix, now the matrix is SDP and you should not experience NaNs anymore. Initialization takes much longer, I went for the naive solution and get a SDP matrix from a "random" one with matmult...

Thanks,
++

On Aug 10, 2012, at 10:50 AM, Sven Verdoolaege wrote:

> When I compile the cholesky benchmark with -O3 -DPOLYBENCH_DUMP_ARRAYS,
> the output contains lots of NaNs (even for -DMINI_DATASET).
> Is that expected?
>
> It doesn't look like it's a meaningful result.
> It also makes it difficult to check the correctness of a mapping
> to a CUDA device. Apparently, the floating point operations on those
> devices are slightly different. If the results are floating point numbers,
> then you can still compare the results with CPU results by allowing
> some relative error, but some of the NaNs have different signs on
> CUDA vs CPU and so the difference doesn't satisfy any tolerance.
>
> Could the array be initialized with something that is expected
> to produce meaningful results?
>
> (This is with revision 31 of https://polybench.svn.sourceforge.net/svnroot/polybench)
>
> Thanks,
>
> skimo

--
Louis-Noel Pouchet
po...@cs...
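The "get an SDP matrix from a 'random' one with matmult" fix described above amounts to forming R * R^T, optionally with a diagonal shift to keep the matrix comfortably positive definite. A minimal sketch of that naive approach, not the actual PolyBench init_array code, with made-up names and a made-up size:

/* Sketch only: build a symmetric positive-definite matrix A from a
 * "random" matrix R, i.e. A = R * R^T + N * I.  The extra O(N^3)
 * matrix product is also why initialization now takes much longer. */
#include <stdlib.h>

#define N 128   /* hypothetical problem size */

static double R[N][N], A[N][N];

void init_spd(void)
{
  int i, j, k;

  for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
      R[i][j] = (double) rand() / RAND_MAX;

  for (i = 0; i < N; i++)
    for (j = 0; j < N; j++) {
      A[i][j] = 0.0;
      for (k = 0; k < N; k++)
        A[i][j] += R[i][k] * R[j][k];   /* (R * R^T)[i][j], symmetric by construction */
    }

  for (i = 0; i < N; i++)
    A[i][i] += N;   /* diagonal shift keeps the matrix safely positive definite */
}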
From: Sven V. <sk...@ko...> - 2012-08-10 18:08:28
When I compile the cholesky benchmark with -O3 -DPOLYBENCH_DUMP_ARRAYS, the output contains lots of NaNs (even for -DMINI_DATASET). Is that expected?

It doesn't look like it's a meaningful result. It also makes it difficult to check the correctness of a mapping to a CUDA device. Apparently, the floating point operations on those devices are slightly different. If the results are floating point numbers, then you can still compare the results with CPU results by allowing some relative error, but some of the NaNs have different signs on CUDA vs CPU and so the difference doesn't satisfy any tolerance.

Could the array be initialized with something that is expected to produce meaningful results?

(This is with revision 31 of https://polybench.svn.sourceforge.net/svnroot/polybench)

Thanks,

skimo
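The comparison strategy described in this message (accept a small relative error, but note that NaNs defeat any tolerance, since NaN compares unequal to everything) could look roughly like the following. This is an illustrative helper, not code from PolyBench or PPCG, and the threshold is arbitrary:

/* Sketch: compare a reference (CPU) result against a test (GPU) result
 * with a relative tolerance.  Any NaN on either side is counted as a
 * mismatch, because |NaN - x| never satisfies a tolerance. */
#include <math.h>

int compare_arrays(const double *ref, const double *test, int n)
{
  const double rel_tol = 1e-6;   /* arbitrary threshold for this sketch */
  int i, mismatches = 0;

  for (i = 0; i < n; i++) {
    if (isnan(ref[i]) || isnan(test[i])) {
      mismatches++;              /* NaNs can never be "close enough" */
      continue;
    }
    double denom = fmax(fabs(ref[i]), fabs(test[i]));
    double err = (denom == 0.0) ? 0.0 : fabs(ref[i] - test[i]) / denom;
    if (err > rel_tol)
      mismatches++;
  }
  return mismatches;
}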
From: Sven V. <sk...@ko...> - 2012-02-03 22:10:59
On Fri, Feb 03, 2012 at 02:37:44PM -0500, Louis-Noel Pouchet wrote:
> I am very reluctant to say yes. The problem is the following: using this kind of syntax makes the task of simplistic source-code analyzers much more complex, because of the "function call" in the loop bounds. Indeed, unless using a CPP, the macro is interpreted as a function here by tools such as Clan (at least its current version embedded in PoCC).
>
> I see one possibility:
> - in the benchmark header file:
> #define paramN POLYBENCH_SCALARBOUNDS_SELECT(N,n)
>
> - in the source file:
>
> #pragma scop
> -  for (i = 0; i < n; i++)
> -    for (j = 0; j < n; j++)
> +  for (i = 0; i < paramN; i++)
> +    for (j = 0; j < paramN; j++)
>        x1[i] = x1[i] + A[i][j] * y_1[j];
>
> ...
>
> Would that work for you?

Sure. I just didn't know what name to pick.

Could you make these changes, or do you want me to send in a patch?
(I could ask the student, but it would probably take me more time to explain it to him than to just do it myself.)

skimo
From: Louis-Noel P. <po...@cs...> - 2012-02-03 20:03:14
On Feb 1, 2012, at 5:50 AM, Sven Verdoolaege wrote:

> Hi,
>
> The latest PolyBench has proper support for parametric benchmarks,
> i.e., if you define POLYBENCH_USE_C99_PROTO, both iteration domains
> and array sizes are parametric.
>
> However, there does not seem to be any way of obtaining instances
> of the benchmarks for specific values of the parameters.
> If you don't define POLYBENCH_USE_C99_PROTO the arrays have
> a fixed size, but the iteration domains are still parametric.
>
> When generating CUDA code, you typically want to generate code
> for specific values of the parameters. If the iteration domains
> are still parametric, you just get a bunch of useless extra
> conditions on the parameters, slowing down the execution.
>
> Would it be possible to apply something like this to
> all benchmarks?

I am very reluctant to say yes. The problem is the following: using this kind of syntax makes the task of simplistic source-code analyzers much more complex, because of the "function call" in the loop bounds. Indeed, unless using a CPP, the macro is interpreted as a function here by tools such as Clan (at least its current version embedded in PoCC).

I see one possibility:

- in the benchmark header file:

#define paramN POLYBENCH_SCALARBOUNDS_SELECT(N,n)

- in the source file:

#pragma scop
-  for (i = 0; i < n; i++)
-    for (j = 0; j < n; j++)
+  for (i = 0; i < paramN; i++)
+    for (j = 0; j < paramN; j++)
       x1[i] = x1[i] + A[i][j] * y_1[j];

...

Would that work for you? A solution like that allows "real" compilers to end up with the actual scalar value associated to N in the loop nest, after some CPP, and "toy" parsers to still see a simple expression for the loop nest.

I'm interested in any other suggestion that:
- preserves simple expressions for the loop bounds in between the #pragma scop/endscop
- defaults to a variable symbol in the code (e.g., 'n')
- allows, with some flag, to have the scalar value for the loop bound

Thanks,
++

> diff --git a/trunk/linear-algebra/kernels/mvt/mvt.c b/trunk/linear-algebra/kernels/mvt/mvt.c
> index 619599d..b2a6f7d 100644
> --- a/trunk/linear-algebra/kernels/mvt/mvt.c
> +++ b/trunk/linear-algebra/kernels/mvt/mvt.c
> @@ -72,11 +72,11 @@ void kernel_mvt(int n,
>    int i, j;
>
>  #pragma scop
> -  for (i = 0; i < n; i++)
> -    for (j = 0; j < n; j++)
> +  for (i = 0; i < POLYBENCH_C99_SELECT(N,n); i++)
> +    for (j = 0; j < POLYBENCH_C99_SELECT(N,n); j++)
>        x1[i] = x1[i] + A[i][j] * y_1[j];
> -  for (i = 0; i < n; i++)
> -    for (j = 0; j < n; j++)
> +  for (i = 0; i < POLYBENCH_C99_SELECT(N,n); i++)
> +    for (j = 0; j < POLYBENCH_C99_SELECT(N,n); j++)
>        x2[i] = x2[i] + A[j][i] * y_2[j];
>  #pragma endscop
>
> Or do you have any other suggestions for obtaining fixed size instances?
>
> (Of course, we can always plug in the value of n after extracting a model,
> but this is error-prone since we need to ensure that the values of n and N
> are the same.)
>
> Thanks,
>
> skimo
>
> Btw, http://www.cse.ohio-state.edu/~pouchet/software/polybench/
> doesn't seem to mention the mailing list.

--
Louis-Noel Pouchet
po...@cs...
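For readers following along, the paramN indirection proposed above relies on a selection macro that expands either to the literal dataset size or to the runtime variable, so the scop itself only ever contains a plain identifier. A minimal sketch of how such a macro could be wired up; the guard flag name here is illustrative and not necessarily what PolyBench actually adopted:

/* Sketch only: one possible definition of the selection macro
 * discussed in this thread.  USE_SCALAR_LOOP_BOUNDS is a made-up flag. */
#include <stdio.h>

#define N 4000   /* scalar dataset size, normally set via the *_DATASET macros */

#ifdef USE_SCALAR_LOOP_BOUNDS
/* Real compilers / CUDA generators see the literal constant. */
# define POLYBENCH_SCALARBOUNDS_SELECT(scalar, symbol) scalar
#else
/* Toy parsers such as Clan still see a plain symbolic bound. */
# define POLYBENCH_SCALARBOUNDS_SELECT(scalar, symbol) symbol
#endif

/* As proposed above: a single identifier in the loop bound, so nothing
 * looks like a function call inside the scop. */
#define paramN POLYBENCH_SCALARBOUNDS_SELECT(N, n)

void kernel_example(int n, double *x)
{
  int i;
#pragma scop
  for (i = 0; i < paramN; i++)   /* expands to N or n after the CPP */
    x[i] = x[i] + 1.0;
#pragma endscop
}

int main(void)
{
  static double x[N];
  kernel_example(N, x);
  printf("%f\n", x[0]);
  return 0;
}

With the guard undefined, paramN expands to n and tools like Clan see a simple symbolic bound; with it defined, a real compiler or a CUDA code generator sees the constant 4000 after preprocessing.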
From: Sven V. <sk...@ko...> - 2012-02-01 11:22:11
Hi,

The latest PolyBench has proper support for parametric benchmarks, i.e., if you define POLYBENCH_USE_C99_PROTO, both iteration domains and array sizes are parametric.

However, there does not seem to be any way of obtaining instances of the benchmarks for specific values of the parameters. If you don't define POLYBENCH_USE_C99_PROTO the arrays have a fixed size, but the iteration domains are still parametric.

When generating CUDA code, you typically want to generate code for specific values of the parameters. If the iteration domains are still parametric, you just get a bunch of useless extra conditions on the parameters, slowing down the execution.

Would it be possible to apply something like this to all benchmarks?

diff --git a/trunk/linear-algebra/kernels/mvt/mvt.c b/trunk/linear-algebra/kernels/mvt/mvt.c
index 619599d..b2a6f7d 100644
--- a/trunk/linear-algebra/kernels/mvt/mvt.c
+++ b/trunk/linear-algebra/kernels/mvt/mvt.c
@@ -72,11 +72,11 @@ void kernel_mvt(int n,
   int i, j;

 #pragma scop
-  for (i = 0; i < n; i++)
-    for (j = 0; j < n; j++)
+  for (i = 0; i < POLYBENCH_C99_SELECT(N,n); i++)
+    for (j = 0; j < POLYBENCH_C99_SELECT(N,n); j++)
      x1[i] = x1[i] + A[i][j] * y_1[j];
-  for (i = 0; i < n; i++)
-    for (j = 0; j < n; j++)
+  for (i = 0; i < POLYBENCH_C99_SELECT(N,n); i++)
+    for (j = 0; j < POLYBENCH_C99_SELECT(N,n); j++)
      x2[i] = x2[i] + A[j][i] * y_2[j];
 #pragma endscop

Or do you have any other suggestions for obtaining fixed size instances?

(Of course, we can always plug in the value of n after extracting a model, but this is error-prone since we need to ensure that the values of n and N are the same.)

Thanks,

skimo

Btw, http://www.cse.ohio-state.edu/~pouchet/software/polybench/ doesn't seem to mention the mailing list.