From: Steve L. (JIRA) <ji...@sm...> - 2010-09-10 13:12:23
|
Race condition still in TestCompoundImpl
----------------------------------------

Key: SFOS-1526
URL: http://jira.smartfrog.org/jira/browse/SFOS-1526
Project: SmartFrog
Issue Type: Bug
Components: .sfCore
Affects Versions: 3.17.x
Environment: OS/X
Reporter: Steve Loughran
Assignee: Steve Loughran
Fix For: 3.17.x

Getting an intermittent failure on a test: it failed on the big run, but not standalone. That, and the failure message, imply there's still a race condition in TestCompoundImpl, in which it gets confused if a child test compound fails before TestCompoundImpl has even noticed that the child has started.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.smartfrog.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
|
From: Steve L. (JIRA) <ji...@sm...> - 2010-09-10 13:14:23
|
[ http://jira.smartfrog.org/jira/browse/SFOS-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12275#action_12275 ]

Steve Loughran commented on SFOS-1526:
--------------------------------------

The test was org.smartfrog.test.system.passwords.PasswordsTest:

Test failed: (unknown) -TestCompletedEvent at Fri Sep 10 13:51:21 BST 2010
alive: true
status: Termination Record: HOST homemac:rootProcess:testPropertyPassword:tests, type: normal, description: null; A child that was neither an action or a test failed
Create a system property and check that the property password provider can read it
succeeded:false forcedTimeout:false skipped:false
Termination Record: HOST homemac:rootProcess:testPropertyPassword:tests, type: normal, description: null; A child that was neither an action or a test failed

org.smartfrog.test.TerminationRecordException: Test failed: (unknown) -TestCompletedEvent at Fri Sep 10 13:51:21 BST 2010
alive: true
status: Termination Record: HOST homemac:rootProcess:testPropertyPassword:tests, type: normal, description: null; A child that was neither an action or a test failed
Create a system property and check that the property password provider can read it
succeeded:false forcedTimeout:false skipped:false
Termination Record: HOST homemac:rootProcess:testPropertyPassword:tests, type: normal, description: null; A child that was neither an action or a test failed
Termination Record: HOST homemac:rootProcess:testPropertyPassword:tests, type: normal, description: null; A child that was neither an action or a test failed
	at org.smartfrog.test.DeployingTestBase.completeTestDeployment(DeployingTestBase.java:317)
	at org.smartfrog.test.DeployingTestBase.runTestsToCompletion(DeployingTestBase.java:340)
	at org.smartfrog.test.DeployingTestBase.expectSuccessfulTestRun(DeployingTestBase.java:426)
	at org.smartfrog.test.system.passwords.PasswordsTest.testPropertyPassword(PasswordsTest.java:47)
|
From: Steve L. (JIRA) <ji...@sm...> - 2010-09-10 13:16:33
|
[ http://jira.smartfrog.org/jira/browse/SFOS-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12276#action_12276 ]

Steve Loughran commented on SFOS-1526:
--------------------------------------

Note that the child ID, "tests", is that of the tests child, and it was a normal termination. So the test finished normally, but TestCompoundImpl didn't recognise it as the expected child, because it didn't match the compound's reference to the tests child.
|
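The check described in the comment above can be sketched as follows. This is a hypothetical illustration, not the TestCompoundImpl source: the class ChildClassifier, the Outcome enum, and the plain Object references standing in for SmartFrog Prim stubs are all assumptions. It shows how a tests child that terminates while the compound's reference to it is still unset (or not yet visible to the calling thread) falls through to the "neither an action nor a test" branch, even though it terminated normally.

```java
// Hypothetical sketch: classify a terminating child by comparing it
// against the references the compound holds for its action and its tests.
enum Outcome { ACTION, TEST, UNKNOWN_CHILD }

final class ChildClassifier {
    private Object actionPrim; // stand-in for the deployed action's Prim
    private Object testPrim;   // stand-in for the deployed tests Prim; null until assigned

    void setActionPrim(Object p) { actionPrim = p; }

    void setTestPrim(Object p) { testPrim = p; }

    // Compares by reference identity (an assumption; the real class may use equals()).
    Outcome classify(Object terminatingChild) {
        if (terminatingChild == actionPrim) {
            return Outcome.ACTION;
        }
        if (terminatingChild == testPrim) {
            return Outcome.TEST;
        }
        // Reached if testPrim is still null (or stale) when the tests child
        // terminates: "a child that was neither an action nor a test".
        return Outcome.UNKNOWN_CHILD;
    }
}
```

The later log in this thread, which reports "testPrim = null" alongside a non-null actionPrim, is consistent with this fall-through path.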
From: Steve L. (JIRA) <ji...@sm...> - 2010-09-10 13:23:23
|
[ http://jira.smartfrog.org/jira/browse/SFOS-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12277#action_12277 ]

Steve Loughran commented on SFOS-1526:
--------------------------------------

This is a follow-on from SFOS-1510; interesting that the same machine is playing up in both cases. The OS/JVM/CPU options on that machine may be what triggers it.
|
From: Steve L. (JIRA) <ji...@sm...> - 2010-12-10 14:16:40
|
[ http://jira.smartfrog.org/jira/browse/SFOS-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12321#comment-12321 ]

Steve Loughran commented on SFOS-1526:
--------------------------------------

Again, this surfaces repeatedly on the Mac at home, never on the work JVMs. Assumption: platform or networking problems trigger a race condition.

org.smartfrog.test.TerminationRecordException: Test failed: (unknown) -TestCompletedEvent at Fri Dec 10 14:10:59 GMT 2010
alive: true
status: Termination Record: HOST homemac:rootProcess:testFilesDuplicatesDropped:tests, type: normal, description: Empty Sequence; A child that was neither an action nor a test terminated ( testPrim = null actionPrim=FilesCompoundImpl_Stub[UnicastRef [liveRef: [endpoint:[192.168.1.88:53196](local),objID:[b61ffca:12cd09ea6d4:-7f89, 4515414531947255261]]]] terminatingChild=Sequence_Stub[UnicastRef [liveRef: [endpoint:[192.168.1.88:53196](local),objID:[b61ffca:12cd09ea6d4:-7f7e, 4013794299856267286]]]])
test that the files component finds files and that duplicates are merged
succeeded:false forcedTimeout:false skipped:false
Termination Record: HOST homemac:rootProcess:testFilesDuplicatesDropped:tests, type: normal, description: Empty Sequence; A child that was neither an action nor a test terminated ( testPrim = null actionPrim=FilesCompoundImpl_Stub[UnicastRef [liveRef: [endpoint:[192.168.1.88:53196](local),objID:[b61ffca:12cd09ea6d4:-7f89, 4515414531947255261]]]] terminatingChild=Sequence_Stub[UnicastRef [liveRef: [endpoint:[192.168.1.88:53196](local),objID:[b61ffca:12cd09ea6d4:-7f7e, 4013794299856267286]]]])
Termination Record: HOST homemac:rootProcess:testFilesDuplicatesDropped:tests, type: normal, description: Empty Sequence; A child that was neither an action nor a test terminated ( testPrim = null actionPrim=FilesCompoundImpl_Stub[UnicastRef [liveRef: [endpoint:[192.168.1.88:53196](local),objID:[b61ffca:12cd09ea6d4:-7f89, 4515414531947255261]]]] terminatingChild=Sequence_Stub[UnicastRef [liveRef: [endpoint:[192.168.1.88:53196](local),objID:[b61ffca:12cd09ea6d4:-7f7e, 4013794299856267286]]]])
	at org.smartfrog.test.DeployingTestBase.completeTestDeployment(DeployingTestBase.java:317)
	at org.smartfrog.test.DeployingTestBase.runTestsToCompletion(DeployingTestBase.java:340)
	at org.smartfrog.test.DeployingTestBase.expectSuccessfulTestRun(DeployingTestBase.java:426)
	at org.smartfrog.test.system.filesystem.files.FilesCompoundTest.testFilesDuplicatesDropped(FilesCompoundTest.java:57)

> Race condition still in TestCompoundImpl
> ----------------------------------------
>
> Key: SFOS-1526
> URL: http://jira.smartfrog.org/jira/browse/SFOS-1526
> Project: SmartFrog
> Issue Type: Bug
> Components: .sfCore
> Affects Versions: 3.18.x
> Environment: OS/X
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Fix For: 3.18.x
>
> Getting an intermittent failure on a test -it failed on the big run, but not standalone. That and the message implies there's still a race condition in TestCompoundImpl, in which it gets confused if a child test compound fails before it even notices that it has started
|
From: Steve L. (JIRA) <ji...@sm...> - 2010-12-10 14:19:39
|
[ http://jira.smartfrog.org/jira/browse/SFOS-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12322#comment-12322 ]

Steve Loughran commented on SFOS-1526:
--------------------------------------

Incidentally, it is the test prim that is terminating. The test here is "tests extends Sequence { }", so what we are seeing terminate is an empty test. So the new testsPrim value isn't being seen between the child deploying and (clearly) terminating immediately. Yet:

1. the tests prim is set in a synchronized block: assignment, then deploy;
2. the tests prim is queried in a non-synchronized method.

Hypothesis: the tests prim (and other shared prims) must be marked as volatile to prevent their values being cached.
|
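The hypothesis above follows from the Java Memory Model: a write made inside a synchronized block is only guaranteed visible to a thread that later acquires the same monitor. A plain, unsynchronized read has no happens-before relationship with that write and may legally observe a stale value such as null. Declaring the field volatile makes each write happen-before every subsequent read of it. The sketch below is a hypothetical reduction of the pattern; TestsHolder, startTests(Object, Runnable) and isTestsChild are illustrative names, not the SmartFrog API.

```java
// Hypothetical sketch of the volatile hypothesis; names are illustrative only.
final class TestsHolder {
    private final Object lock = new Object();

    // Written under 'lock' but read without it. Without 'volatile', a reader
    // thread has no happens-before edge with the writer and may see null.
    private volatile Object testsPrim;

    // Mirrors "assignment, then deploy" inside a synchronized block.
    void startTests(Object child, Runnable deploy) {
        synchronized (lock) {
            testsPrim = child;
            deploy.run();
        }
    }

    // Unsynchronized query, e.g. from a termination callback on another thread.
    // The volatile read observes the most recent write to testsPrim.
    boolean isTestsChild(Object terminatingChild) {
        return terminatingChild == testsPrim;
    }
}
```

A data race like the one in this issue cannot be reproduced deterministically in a small example, so this only shows the corrected declaration; removing the volatile keyword would still usually "work", which is what makes the bug intermittent.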
From: Steve L. (JIRA) <ji...@sm...> - 2010-12-10 14:45:40
|
[ http://jira.smartfrog.org/jira/browse/SFOS-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved SFOS-1526.
----------------------------------

Resolution: Fixed
Compatibility: backwards compatible (was: unknown)

The cause here was that the cached values weren't marked as volatile, so when they were accessed in a non-synchronized method the new values weren't picked up. Fixed by (a) marking them as volatile, and then, to make really sure, (b) copying the values into local variables inside a synchronized block. That's overkill, but it means that the test runs would catch any re-entrancy bug between the startTests() method and onChildTerminated(), that being the core problem. Not only does it work, we'd know when things went wrong.
|
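A minimal sketch of the two-part fix described in the resolution, assuming simplified types: CompoundSketch, startAction and the String return values are hypothetical, and only the method names startTests() and onChildTerminated() come from the comment. The fields are volatile, and onChildTerminated() additionally takes a synchronized snapshot into locals, so one call classifies against one consistent pair of references. The diagnostic string imitates the "testPrim = ... actionPrim=... terminatingChild=..." text seen in the log above.

```java
// Hypothetical sketch of the fix: volatile fields plus a synchronized snapshot.
final class CompoundSketch {
    private final Object lock = new Object();
    private volatile Object actionPrim; // volatile: part (a) of the fix
    private volatile Object testsPrim;

    void startAction(Object action) {
        synchronized (lock) {
            actionPrim = action;
        }
    }

    void startTests(Object tests, Runnable deploy) {
        synchronized (lock) {
            testsPrim = tests; // assign first...
            deploy.run();      // ...then deploy, so an immediate termination can match
        }
    }

    String onChildTerminated(Object terminatingChild) {
        final Object action;
        final Object tests;
        // Part (b): copy both references under the lock (overkill alongside volatile).
        synchronized (lock) {
            action = actionPrim;
            tests = testsPrim;
        }
        if (terminatingChild == tests) {
            return "tests terminated";
        }
        if (terminatingChild == action) {
            return "action terminated";
        }
        // Report the snapshot so a mismatch is visible rather than silent.
        return "A child that was neither an action nor a test terminated ( testPrim = "
                + tests + " actionPrim=" + action
                + " terminatingChild=" + terminatingChild + ")";
    }
}
```

Because startTests() holds the lock while deploying, a termination callback arriving on another thread waits until deploy returns; a re-entrant callback on the same thread proceeds (Java monitors are reentrant) and already sees the assigned reference.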