From: Geoff B. <geo...@gm...> - 2023-09-19 06:34:27
Hi Karl!

This seems a plausible fix. I don't have any access to SGE myself any more, so it's difficult for me to reproduce errors like this. Have you run the self-tests at all? (If you're working at my former employer Jeppesen - the only people I know of using TextTest and SGE - there are a few people who could help you with that.)

It would be best if you could also submit the change as a pull request on GitHub.

Regards,
Geoff

On Wed, Sep 13, 2023 at 4:44 PM Karl Koehler <ka...@ac...> wrote:
> Hi,
>
> We are using texttest with SGE, and have found that there is a problem
> when qacct is not fast enough.
> What happens:
> (1) the job completes
>
> (2) via a message from the slave, the status of the test is updated
> (masterprocess.py, line 1025, in handleRequestFromHost). This is an
> independent thread.
>
> (3) masterprocess.py, updateJobStatus sees that the job is not in the
> qstat any longer.
> But in masterprocess.py:198, updateJobStatus, the jobComplete is not yet
> true.
> (4) setSlaveFailed will wait a long time for qacct to finally get the info
> on the job, at which time step (2) has happened.
>
> Result: failures that are not quite real, and incorrect error messages in
> test.state.freeText and test.state.briefText.
>
> So, there are questions:
> * Should there be a lock around Test.ChangeState ?
> * And what do you think of the following work-around/solution for
> the problem that SGE is too late, regardless of locking ?
>
> -bash-4.2$ diff -du ~/texttest_latest/texttest-master/texttestlib/queuesystem/masterprocess.py texttestlib/queuesystem/masterprocess.py
> --- /home/karlkoehler/texttest_latest/texttest-master/texttestlib/queuesystem/masterprocess.py 2023-08-28 10:04:33.000000000 -0700
> +++ texttestlib/queuesystem/masterprocess.py 2023-09-12 17:00:14.795473685 -0700
> @@ -646,8 +646,11 @@
>          return system
>
>      def changeState(self, test, newState, previouslySubmitted=True):
> -        test.changeState(newState)
> -        self.handleLocalError(test, previouslySubmitted)
> +        # this has to check the test because otherwise slowness in sge qacct will
> +        # set the state to failed and with the wrong message.
> +        if not test.state.isComplete():
> +            test.changeState(newState)
> +            self.handleLocalError(test, previouslySubmitted)
>
> Thanks,
> Karl Koehler
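
For reference, here is a minimal sketch of how the two questions in the quoted message (a lock around the state change, and the isComplete() guard from the diff) might fit together. It is not TextTest code: FakeState, FakeTest, changeStateUnlessComplete and the state names are hypothetical stand-ins, and only isComplete() and freeText mirror names used in the thread. The point is that a guard on its own is a check-then-act sequence, so doing the check and the assignment under one lock would keep a late failure report from overwriting a result that has already arrived.

```python
import threading


class FakeState:
    """Hypothetical stand-in for a test state; not TextTest's real class."""

    def __init__(self, category, freeText="", complete=False):
        self.category = category
        self.freeText = freeText
        self.complete = complete

    def isComplete(self):
        return self.complete


class FakeTest:
    """Hypothetical stand-in for a test whose state is set from several threads."""

    def __init__(self):
        self.state = FakeState("running")
        self.lock = threading.Lock()

    def changeStateUnlessComplete(self, newState):
        # The check and the assignment happen under one lock, so no other
        # thread can change the state between the two.
        with self.lock:
            if self.state.isComplete():
                return False  # a final result is already recorded; keep it
            self.state = newState
            return True


test = FakeTest()

# Step (2) in the quoted message: the slave reports the real result first.
accepted_result = test.changeStateUnlessComplete(
    FakeState("success", "all files matched", complete=True))

# Step (4): the slow accounting lookup returns later and tries to mark the
# test as failed; the guard rejects it.
accepted_failure = test.changeStateUnlessComplete(
    FakeState("killed", "no report from slave", complete=True))

print(test.state.category, accepted_result, accepted_failure)
```

In this sketch the late failure is dropped silently; whether it should be logged somewhere instead is a separate decision.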