Re: [Texttest-users] Texttest-error with multiple queuesystems


Hi again Karl,

My observations are much the same as for your previous message. (I have no
way to test this myself, so please run the self-tests and submit a pull
request, preferably with a new self-test.)

I would also add that this setup, with some tests running locally and
others via a queue system in the same run, is not a scenario I have ever
run myself, and I would not be surprised if it had other problems beyond
the ones you have found.
A simple workaround is to run TextTest separately on the test suites /
applications that need to run sequentially.

Regards,
Geoff

On Fri, Sep 15, 2023 at 5:59 PM Karl Koehler <ka...@ac...> wrote:

> Hi everyone,
>
> as you know, each test suite can be configured with its own queue system.
> This is useful if you have some tests that run fast and some that run
> slowly: you want the fast jobs to run on the local machine, because the
> SGE queuing time would likely be longer than the test execution time.
> Thus there are testsuites with:
>   config_module:queuesystem
>   queue_system_module:SGE
> and testsuites with
>   queue_system_module:local
>
> Now here's the bug in TextTest: when checking whether all jobs are
> complete, we only query the queue system of the first test (test[0]). If
> that is "local" and there are also SGE tests, the SGE job IDs are missing
> from the status information, so we attribute error states to all the other
> tests and then quit early.
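To make the failure mode concrete, here is a minimal sketch (not TextTest code). `StubQueueSystem`, `jobs_treated_as_failed` and the job IDs are hypothetical; it assumes, as the patch below suggests, that `getStatusForAllJobs()` returns a dict keyed by job ID that covers only that queue system's own jobs, and it leaves out the `jobCompleted` check for brevity:

```python
class StubQueueSystem:
    """Hypothetical stand-in for a TextTest queue system object."""

    def __init__(self, active_job_ids):
        self.active_job_ids = active_job_ids

    def getStatusForAllJobs(self):
        # Each queue system only reports the jobs it is running itself.
        return {job_id: "RUNNING" for job_id in self.active_job_ids}


def jobs_treated_as_failed(jobs, queue_system_for):
    """Mimic the original logic: ask only the first test's queue system."""
    first_test = list(jobs.keys())[0]
    status_info = queue_system_for[first_test].getStatusForAllJobs()
    failed = []
    for job_list in jobs.values():
        for job_id, _job_name in job_list:
            if not status_info.get(job_id):
                # Absent from the status map, so the job looks as if it died.
                failed.append(job_id)
    return failed


local = StubQueueSystem(["L1"])
sge = StubQueueSystem(["S1", "S2"])
jobs = {"fast_test": [("L1", "fast")], "slow_test": [("S1", "slow1"), ("S2", "slow2")]}
queue_system_for = {"fast_test": local, "slow_test": sge}
print(jobs_treated_as_failed(jobs, queue_system_for))  # ['S1', 'S2']
```

Both SGE jobs are still running, yet they are reported as failed, because the "local" queue system knows nothing about them.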
> Thus:
>
> --- ~/texttest_latest/texttest-master/texttestlib/queuesystem/masterprocess.py        2023-08-28 10:04:33.000000000 -0700
> +++ queuesystem/masterprocess.py        2023-09-15 08:30:15.505148514 -0700
> @@ -183,21 +183,52 @@
>          return queueSystem.supportsPolling()
>
>      def updateJobStatus(self):
> -        queueSystem = self.getQueueSystem(list(self.jobs.keys())[0])
> -        statusInfo = queueSystem.getStatusForAllJobs()
> +        ##
> +        # set of queueSystems
> +        statusInfo = dict()
> +        qsSet = set()
> +        for test in self.jobs.keys():
> +            qsSet.add(self.getQueueSystem(test))
> +        for queueSystem in qsSet :
> +            statusInfo.update(queueSystem.getStatusForAllJobs())
> +
>          self.diag.info("Got status for all jobs : " + repr(statusInfo))
>          if statusInfo is not None:  # queue system not available for some reason
>
>
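One caveat with the patch as quoted: `statusInfo` now starts as an empty dict, so the `if statusInfo is not None` guard is always true, and if `getStatusForAllJobs()` returns `None` for an unavailable queue system (as that comment implies), `statusInfo.update(None)` raises a `TypeError`. Below is a minimal sketch of a None-safe variant. The function names and the stub class are hypothetical, and it assumes `getQueueSystem` hands back the same object for every test that uses a given queue system, so that de-duplication by identity works:

```python
def distinct_queue_systems(tests, get_queue_system):
    """Return each queue system object once, in first-seen order."""
    # dict.fromkeys keeps insertion order and drops repeats (by hash/equality,
    # which for plain objects means identity).
    return list(dict.fromkeys(get_queue_system(test) for test in tests))


def collect_status(queue_systems):
    """Merge the job status maps of several queue systems.

    Returns None if any queue system is unavailable, so the caller skips this
    polling cycle instead of treating that system's jobs as vanished.
    """
    status_info = {}
    for queue_system in queue_systems:
        partial = queue_system.getStatusForAllJobs()
        if partial is None:
            return None
        status_info.update(partial)
    return status_info


class StubQueueSystem:
    """Hypothetical stand-in returning a fixed status map (or None)."""

    def __init__(self, statuses):
        self.statuses = statuses

    def getStatusForAllJobs(self):
        return self.statuses


local = StubQueueSystem({"L1": "RUNNING"})
sge = StubQueueSystem({"S1": "PENDING"})
owners = {"fast": local, "slow_a": sge, "slow_b": sge}
systems = distinct_queue_systems(owners, owners.get)
print(len(systems))  # 2
print(collect_status(systems))  # {'L1': 'RUNNING', 'S1': 'PENDING'}
print(collect_status([local, StubQueueSystem(None)]))  # None
```

Returning `None` when any queue system is unavailable is a conservative choice; merging only the systems that answered would also work, but the jobs of the silent system would then be missing from the map and could be marked as failed.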
> I'd also suggest that turning the following loop into a two-pass loop is
> probably better: it waits for all jobs to finish before we call "qacct".
> There is a delay between an SGE job completing and that job appearing in
> qacct, so if we wait for the queue to be empty, we are less likely to spend
> time waiting for qacct to be ready for individual failed jobs, and we can
> update the status of jobs that are actually running or passing more promptly:
>
>
> --- ~/texttest_latest/texttest-master/texttestlib/queuesystem/masterprocess.py        2023-08-28 10:04:33.000000000 -0700
> +++ queuesystem/masterprocess.py        2023-09-15 08:51:15.828557223 -0700
> @@ -183,21 +183,38 @@
>          return queueSystem.supportsPolling()
>
>      def updateJobStatus(self):
> -        queueSystem = self.getQueueSystem(list(self.jobs.keys())[0])
> -        statusInfo = queueSystem.getStatusForAllJobs()
> +        ##
> +        # set of queueSystems
> +        statusInfo = dict()
> +        qsSet = set()
> +        for test in self.jobs.keys():
> +            qsSet.add(self.getQueueSystem(test))
> +        for queueSystem in qsSet :
> +            statusInfo.update(queueSystem.getStatusForAllJobs())
> +
>          self.diag.info("Got status for all jobs : " + repr(statusInfo))
>          if statusInfo is not None:  # queue system not available for some reason
> +            ##
> +            # setSlaveFailed only if there are no more jobs running.
> +            activejobs = 0
>              for test, jobs in list(self.jobs.items()):
>                  if not test.state.isComplete():
>                      for jobId, jobName in jobs:
>                          status = statusInfo.get(jobId)
>                          if status:
> +                            activejobs += 1
>                              # Only do this to test jobs (might make a difference for derived configurations)
>                              # Ignore filtering states for now, which have empty 'briefText'.
>                              self.updateRunStatus(test, status)
> -                        elif not status and not self.jobCompleted(test, jobName):
> -                            # Do this to any jobs
> -                            self.setSlaveFailed(test, self.jobStarted(test, jobName), True, jobId)
> +            if activejobs == 0:
> +                for test, jobs in list(self.jobs.items()):
> +                    if not test.state.isComplete():
> +                        for jobId, jobName in jobs:
> +                            status = statusInfo.get(jobId)
> +                            if not status and not self.jobCompleted(test, jobName):
> +                                print("state of %s : %s" % (str(test), test.state.category))
> +                                # Do this to any jobs
> +                                self.setSlaveFailed(test, self.jobStarted(test, jobName), True, jobId)
>
>
>
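The two-pass idea can be sketched independently of TextTest. This is a hypothetical illustration, not the TextTest implementation: the callables stand in for `test.state.isComplete()`, `self.jobCompleted`, `self.updateRunStatus` and `self.setSlaveFailed`, and `status_info` is assumed to be the merged job-ID-to-status map from all queue systems:

```python
def two_pass_update(jobs, status_info, is_complete, job_completed,
                    update_run_status, mark_failed):
    """Two-pass job status update in the spirit of the patch above.

    Pass 1 updates every job the queue systems still report and counts them.
    Pass 2 only runs once no job is active, and marks jobs that disappeared
    without completing as failed. Returns the number of active jobs.
    """
    active_jobs = 0
    for test, job_list in jobs.items():
        if is_complete(test):
            continue
        for job_id, _job_name in job_list:
            status = status_info.get(job_id)
            if status:
                active_jobs += 1
                update_run_status(test, status)
    if active_jobs == 0:
        for test, job_list in jobs.items():
            if is_complete(test):
                continue
            for job_id, job_name in job_list:
                if not status_info.get(job_id) and not job_completed(test, job_name):
                    mark_failed(test, job_id)
    return active_jobs


jobs = {"t1": [("1", "job1")], "t2": [("2", "job2")], "t3": [("3", "job3")]}
finished_normally = {"job1", "job3"}
updated, failed = [], []


def run(status_info):
    return two_pass_update(
        jobs, status_info,
        is_complete=lambda test: False,
        job_completed=lambda test, name: name in finished_normally,
        update_run_status=lambda test, status: updated.append(test),
        mark_failed=lambda test, job_id: failed.append(job_id),
    )


# Job 3 is still queued, so job 2's disappearance is not acted on yet.
print(run({"3": "RUNNING"}), updated, failed)  # 1 ['t3'] []
# Once the queues are empty, job 2 (which never completed) is marked failed.
print(run({}), failed)  # 0 ['2']
```

The trade-off is latency: a job that genuinely dies while others are still queued is only reported as failed once every queue has drained.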
> The same applies to the cleanup function:
>
> @@ -391,8 +408,16 @@
>      def cleanup(self, final=False):
>          cleanupComplete = True
>          if self.jobs:
> -            queueSystem = self.getQueueSystem(list(self.jobs.keys())[0])
> -            cleanupComplete &= queueSystem.cleanup(final)
> +            ## multi-queue-system
> +            #
> +            qsSet = set()
> +            for test in self.jobs.keys():
> +                qsSet.add(self.getQueueSystem(test))
> +            for queueSystem in qsSet :
> +                cleanupComplete &= queueSystem.cleanup(final)
> +            #
> +            ##
>
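The cleanup loop follows the same pattern. One detail worth keeping is the `&=` operator: unlike `and`, it does not short-circuit, so every queue system still gets its `cleanup()` call even after one reports that it is not finished. A minimal sketch with a hypothetical stub, assuming `cleanup()` returns a truth value as the quoted code implies:

```python
def cleanup_all(queue_systems, final=False):
    """Clean up every queue system; report True only if all of them finished."""
    cleanup_complete = True
    for queue_system in queue_systems:
        # "&=" evaluates the call every time; "cleanup_complete and ..." would
        # short-circuit and skip the remaining queue systems after a False.
        cleanup_complete &= bool(queue_system.cleanup(final))
    return cleanup_complete


class StubQueueSystem:
    """Hypothetical queue system whose cleanup() result is fixed."""

    def __init__(self, name, finished, log):
        self.name = name
        self.finished = finished
        self.log = log

    def cleanup(self, final):
        self.log.append(self.name)
        return self.finished


log = []
systems = [StubQueueSystem("local", False, log), StubQueueSystem("SGE", True, log)]
print(cleanup_all(systems, final=True))  # False
print(log)  # ['local', 'SGE']
```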
> Thanks,
>
>  - Karl Koehler
>
>
>
> _______________________________________________
> Texttest-users mailing list
> Tex...@li...
> https://lists.sourceforge.net/lists/listinfo/texttest-users
>