From: Aaron J. <aja...@re...> - 2014-04-26 06:07:08
I honestly have no idea what the optimizer is doing. However, I have isolated the behavior to a simple change that eliminates the problem at the -O2 optimization level.
for (index = 0; index < arrayP->numProcs; index++)
{
if (arrayP->pgprocnos[index] == proc->pgprocno)
{
/* Keep the PGPROC array sorted. See notes above */
memmove(&arrayP->pgprocnos[index], &arrayP->pgprocnos[i$
(arrayP->numProcs - index - 1) * sizeof$
arrayP->pgprocnos[arrayP->numProcs - 1] = -1; $
arrayP->numProcs--;
LWLockRelease(ProcArrayLock);
return;
}
}
/* Ooops */
LWLockRelease(ProcArrayLock);
elog(LOG, "ProcArrayRemove(post-test): %p", &index);
elog(LOG, "failed to find proc %p in ProcArray", proc);
}
The *only* change I made is to log the pointer to index after the loop. I tried many things, but the fix required an operation that forces the evaluation of index's address.
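A standalone sketch of the same remove-and-shift loop may make the workaround easier to reproduce outside the server. It is not the Postgres-XC code: ToyProcArray, toy_remove and address_sink are hypothetical names, and the store to a volatile pointer stands in for the elog("%p", &index) call above. Both make the address of index escape, so the compiler has to treat index as address-taken.

```c
#include <string.h>

/* Hypothetical stand-in for ProcArrayStruct: only the fields the loop touches. */
typedef struct
{
    int numProcs;
    int pgprocnos[16];
} ToyProcArray;

/* Hypothetical sink. Stores to a volatile object cannot be optimized away,
 * so writing &index here forces the compiler to take index's address. */
static const void *volatile address_sink;

/* Returns 1 if procno was found and removed, 0 if the loop fell through. */
int
toy_remove(ToyProcArray *arrayP, int procno)
{
    int index;

    for (index = 0; index < arrayP->numProcs; index++)
    {
        if (arrayP->pgprocnos[index] == procno)
        {
            /* Keep the array packed: shift the tail down by one slot. */
            memmove(&arrayP->pgprocnos[index], &arrayP->pgprocnos[index + 1],
                    (arrayP->numProcs - index - 1) * sizeof(int));
            arrayP->pgprocnos[arrayP->numProcs - 1] = -1;   /* for debugging */
            arrayP->numProcs--;
            return 1;
        }
    }

    /* Plays the role of logging &index with %p in the message above. */
    address_sink = &index;
    address_sink = NULL;    /* do not keep a pointer to a dead local around */
    return 0;
}
```

Whether this changes the generated code depends on the compiler and flags; the sketch only shows the shape of the change, not a confirmed fix.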
Hope this helps,
Aaron
________________________________
From: Aaron Jackson [aja...@re...]
Sent: Friday, April 25, 2014 4:26 PM
To: pos...@li...
Subject: Re: [Postgres-xc-general] failed to find proc - increasing numProcs
It's quite possible I'm missing something obvious, but here is how I've modified procarray.c. The idea was to capture the values at the point of failure to understand why the test was failing.
void
ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
{
ProcArrayStruct *arrayP = procArray;
int index;
int _xNumProcs;
int _xIndex;
...
for (index = 0; (_xIndex = index) < (_xNumProcs = arrayP->numProcs); index++)
{
if (arrayP->pgprocnos[index] == proc->pgprocno)
{
/* Keep the PGPROC array sorted. See notes above */
memmove(&arrayP->pgprocnos[index], &arrayP->pgprocnos[index + 1],
(arrayP->numProcs - index - 1) * sizeof(int));
arrayP->pgprocnos[arrayP->numProcs - 1] = -1; /* for debugging */
arrayP->numProcs--;
LWLockRelease(ProcArrayLock);
return;
}
}
/* Ooops */
LWLockRelease(ProcArrayLock);
elog(LOG, "ProcArrayRemove(post-test): %d | %d | %d | %d", _xIndex, _xNumProcs, arrayP->numProcs, _xIndex < _xNumProcs);
elog(LOG, "failed to find proc %p in ProcArray", proc);
}
With CFLAGS="" this works as expected. Once I set CFLAGS="-O2" (or anything similar) it falls apart. For example, the fall-through case triggered and logged the following:
ProcArrayRemove(post-test): 1 | 9 | 9 | 1
This means the loop test should have succeeded. I could take this one step further and cache the result of the for loop test; however, I can tell you from prior experience that _xIndex < _xNumProcs evaluated as FALSE. I'm really not sure what the compiler is doing to draw that conclusion from 1 < 9.
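The instrumentation trick above can be tried in isolation. The sketch below is a reduced, hypothetical version (LoopProbe and probe_find are not names from procarray.c): it records both operands of the loop test each time the test runs. When a correctly compiled loop falls through, the last recorded index must equal the last recorded bound, which is exactly the invariant that the logged "1 | 9" violates.

```c
/* Hypothetical probe: the operands seen by the most recent loop test. */
typedef struct
{
    int lastIndex;
    int lastNumProcs;
} LoopProbe;

/*
 * Linear search that captures both sides of the loop condition, in the same
 * style as the (_xIndex = index) < (_xNumProcs = arrayP->numProcs) test.
 * Returns the matching position, or -1 if the loop falls through.
 */
int
probe_find(const int *procnos, const int *numProcs, int target,
           LoopProbe *probe)
{
    int index;

    for (index = 0;
         (probe->lastIndex = index) < (probe->lastNumProcs = *numProcs);
         index++)
    {
        if (procnos[index] == target)
            return index;
    }

    /* On fall-through, lastIndex < lastNumProcs must be false. */
    return -1;
}
```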
Aaron
________________________________
From: Aaron Jackson [aja...@re...]
Sent: Friday, April 25, 2014 3:05 PM
To: 鈴木 幸市
Cc: pos...@li...
Subject: Re: [Postgres-xc-general] failed to find proc - increasing numProcs
CFLAGS="-O2"
gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu9)
The failed evaluation occurs on line 421 of backend/storage/ipc/procarray.c
The test portion of the clause fails. I'm not entirely sure why gcc specifically fails, but if I were to make an educated guess, it would be that arrayP->numProcs was volatile and the resultant value of the test was optimized and cached. I've used several techniques (none of which I like) to fool gcc into believing the value is volatile and discarding the cached value of arrayP->numProcs. It concerns me more because the ProcArrayLock should be held during this sequence.
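The thread does not say which techniques were tried. One common way to make gcc reload a value on every use is to read it through a volatile-qualified pointer, the same idea as the Linux kernel's ACCESS_ONCE macro. The sketch below assumes a toy structure; READ_INT_ONCE, VolatileDemoArray and find_with_fresh_bound are hypothetical names, not part of Postgres-XC.

```c
/*
 * Read an int lvalue through a volatile-qualified pointer so the compiler
 * must emit a real load each time instead of reusing a cached value.
 */
#define READ_INT_ONCE(lvalue) (*(volatile int *) &(lvalue))

/* Hypothetical stand-in for the fields of ProcArrayStruct used by the loop. */
typedef struct
{
    int numProcs;
    int pgprocnos[16];
} VolatileDemoArray;

/* Returns the position of procno, or -1 if it is not in the array. */
int
find_with_fresh_bound(VolatileDemoArray *arrayP, int procno)
{
    int index;

    /* The loop bound is re-read from memory on every iteration. */
    for (index = 0; index < READ_INT_ONCE(arrayP->numProcs); index++)
    {
        if (arrayP->pgprocnos[index] == procno)
            return index;
    }
    return -1;
}
```

This only constrains how the bound is loaded. With ProcArrayLock held, numProcs should not be changing underneath the loop, so a read like this is a workaround to experiment with rather than an explanation of the failure.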
Aaron
________________________________
From: 鈴木 幸市 [ko...@in...]
Sent: Sunday, April 13, 2014 7:55 PM
To: Aaron Jackson
Cc: pos...@li...
Subject: Re: [Postgres-xc-general] failed to find proc - increasing numProcs
Thank you, Aaron, for the detailed analysis. As long as the issue is specific to XC, we need a fix so that it works correctly regardless of the compiler optimization.
Did you locate where the incorrect evaluation takes place? And what compilation options did you use?
That information would be very helpful.
Best;
---
Koichi Suzuki
On 2014/04/12 11:40, Aaron Jackson <aja...@re...> wrote:
It appears that the problem is a compiler optimization issue. I narrowed the issue down to the loop at the end of the ProcArrayRemove function. I'm not entirely sure why, but the compiler generated code that evaluates the test expression of the loop improperly. Since I changed the compiler options, the problem has been resolved.
Aaron
________________________________
From: Aaron Jackson [aja...@re...]
Sent: Friday, April 11, 2014 1:07 AM
To: pos...@li...
Subject: Re: [Postgres-xc-general] failed to find proc - increasing numProcs
I forgot to mention that if I injected a context switch (sleep(0) did the trick, as did an elog statement) during the test in ProcArrayRemove, it no longer failed. Hopefully that will help in understanding why that change caused ProcArrayRemove to succeed.
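One possible reading, offered as an assumption rather than a diagnosis: a call to a function the compiler cannot see into, such as sleep() or elog(), also acts as a compiler barrier, so values read from memory before the call have to be reloaded after it. The same effect can be tested without a call by using an empty asm statement with a "memory" clobber, which is a GCC/Clang extension. COMPILER_BARRIER and barrier_find below are hypothetical names for illustration.

```c
/*
 * GCC/Clang extension: an empty asm statement with a "memory" clobber.
 * It emits no instructions but tells the compiler that memory may have
 * changed, so cached loads must be redone after it.
 */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Returns the position of target, or -1 if the loop falls through. */
int
barrier_find(const int *procnos, const int *numProcs, int target)
{
    int index;

    for (index = 0; index < *numProcs; index++)
    {
        /* Forces *numProcs and procnos[] to be re-read from memory. */
        COMPILER_BARRIER();

        if (procnos[index] == target)
            return index;
    }
    return -1;
}
```

If a barrier like this makes the failure disappear in the same way sleep(0) did, that would point toward the generated code rather than scheduling.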