It was reported that the datanode autovacuum process crashes occasionally. The crash is a SEGV inside malloc(), which is most likely the result of an earlier buffer overflow or underflow corrupting the heap.
Tracing with e-fence pointed to one buffer overflow and identified the code section responsible. I am testing the improved code to confirm that no further buffer overflow is observed, and will report the result together with a patch.
The previous work fixed the DBT-2 long-term test and the autovacuum problem. However, the patch turned out to crash many regression tests, so the fix was incomplete. I will re-open this issue.
On careful review of procarray.c, I found much more code that does not handle XC specifics: the size of the xip member of SnapshotData should not be fixed at maxProcs, because it can be even larger.
Will submit a patch to fix it.
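To illustrate the kind of change described above, here is a minimal sketch of an xip array that grows on demand instead of assuming a fixed capacity. It is not the actual patch and does not use the real PostgreSQL/XC definitions: DemoSnapshot, demo_snapshot_reserve and demo_snapshot_set_xids are hypothetical names, the struct is a stripped-down stand-in for SnapshotData, and plain realloc() is used where the backend would presumably use its own allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t TransactionId;     /* stand-in for the PostgreSQL typedef */

/* Hypothetical, simplified snapshot: only the fields needed for this sketch. */
typedef struct DemoSnapshot
{
    TransactionId *xip;         /* in-progress transaction IDs */
    uint32_t       xcnt;        /* number of valid entries in xip */
    uint32_t       max_xcnt;    /* allocated capacity of xip, in entries */
} DemoSnapshot;

/*
 * Make sure xip can hold at least `needed` entries, growing it if required.
 * Returns 0 on success and -1 if the allocation fails.
 */
int
demo_snapshot_reserve(DemoSnapshot *snap, uint32_t needed)
{
    uint32_t       new_cap;
    TransactionId *p;

    assert(snap != NULL);

    if (needed <= snap->max_xcnt)
        return 0;               /* already large enough */

    /* Double the capacity until it fits, guarding against overflow. */
    new_cap = snap->max_xcnt ? snap->max_xcnt : 16;
    while (new_cap < needed)
    {
        if (new_cap > UINT32_MAX / 2)
        {
            new_cap = needed;
            break;
        }
        new_cap *= 2;
    }

    if ((size_t) new_cap > SIZE_MAX / sizeof(TransactionId))
        return -1;

    p = realloc(snap->xip, (size_t) new_cap * sizeof(TransactionId));
    if (p == NULL)
        return -1;              /* old buffer is still valid and unchanged */

    snap->xip = p;
    snap->max_xcnt = new_cap;
    return 0;
}

/*
 * Copy `count` transaction IDs into the snapshot. The capacity check runs
 * before the copy, so the write can never run past the end of xip.
 */
int
demo_snapshot_set_xids(DemoSnapshot *snap, const TransactionId *xids,
                       uint32_t count)
{
    if (demo_snapshot_reserve(snap, count) != 0)
        return -1;

    if (count > 0)
        memcpy(snap->xip, xids, (size_t) count * sizeof(TransactionId));
    snap->xcnt = count;
    return 0;
}
```

A caller would start from a zero-initialized DemoSnapshot and call demo_snapshot_set_xids() with however many IDs it received. The point of the sketch is only that the capacity is tracked next to the array and checked before every write, rather than assumed to equal maxProcs.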
768402786001578ff4d938cf69d5ebc7bafcb266 fixes it for REL1_0_STABLE.
6457e8d7b2bda1081c44e0242aea00a05925f8a1 fixes it for the master.
This is a patch to fix the autovacuum crash on the REL1_0_STABLE branch. Regression tested. DBT-2 and e-fence checks should be done before commit.
This is similar to procarray_c_20121206_02.patch, but for the current master. Regression tested. DBT-2 and e-fence checks should be done before commit.
I have uploaded two patches for the fix, for REL1_0_STABLE and master respectively. Both passed the regression tests. Will run DBT-2 with and without e-fence.
The patches procarray_c_20121206_02.patch and procarray_c_20121207_01_master.patch are the fixes for this bug on REL1_0_STABLE and master respectively.
Commit ID for REL1_0_STABLE: 742ec027820a1cb4a72e7f30eefd8d3c895ef947
for master: 07e15efd88e29673064585cb5aeaf5043273ddcd