From: Kumar, S. <shr...@hp...> - 2010-07-07 15:25:53
Hi Peter,

Thanks for this investigation, and for the information about SCANCEL_VERBOSE. That looks like a good way to keep the code independent of explicit version requirements. If you have any SSM crash logs, they would be useful for understanding any other issues in the code.

Regards
-- Shree

-----Original Message-----
From: Pet...@cs... [mailto:Pet...@cs...]
Sent: Wednesday, July 07, 2010 12:24 PM
To: viz...@li...
Subject: Re: [vizstack-users] Vizstack and SLURM issue

Hi Shree,

It appears that the segfault is due to a syntax change in scancel in the current version of SLURM. VizStack uses the "scancel -q <jobid>" syntax, which would kill a job in quiet mode in SLURM 2.0.9 and earlier, but -q now relates to QOS; -Q is the current parameter for quiet mode. Both situations could be covered by using the SCANCEL_VERBOSE environment variable.

I think the SSM was crashing due to an excess of jobs in the queue. I'll see how it goes with jobs being cancelled as they should be.

Regards,
Peter

________________________________________
From: Kumar, Shree [shr...@hp...]
Sent: Tuesday, 6 July 2010 2:37 PM
To: viz...@li...
Subject: Re: [vizstack-users] Vizstack and SLURM issue

Hi Peter,

That's the first time I have seen a SLURM segfault with VizStack. As part of the cleanup sequence, we issue a single scancel command that cancels the job. Is the error reproducible using the same steps that you have mentioned?

Can you also check the following?

- Start the SSM
- Start the viz-tvnc script
- Let the session start up (you may connect to the session to verify)
- Look up the SLURM queue using "sinfo"
- Cancel the SLURM job using "scancel"
- Do you see a similar segfault?

I am trying to simulate here the things the vsapi does. Also, can you send me the SSM log when it terminates? (/var/log/vs-ssm.log) I don't like SSM crashes, since the SSM is a single point of failure!

Regards
-- Shree

-----Original Message-----
From: Pet...@cs... [mailto:Pet...@cs...]
Sent: Monday, July 05, 2010 2:09 PM
To: viz...@li...
Subject: [vizstack-users] Vizstack and SLURM issue

Hello,

I'm seeing an issue with VizStack 1.1-2 and SLURM running under Ubuntu 10.04. I currently have two nodes running with the distro-provided SLURM (2.1.0) configured as per the VizStack manual. I can start viz-tvnc fine, and an X server will be started on a node/GPU as expected, but when terminating the session, the job remains in the SLURM queue and a message such as the following appears in the syslog:

scancel[5701]: segfault at 0 ip 00007f6c22edc376 sp 00007fff1341ca08 error 4 in libc-2.11.1.so[7f6c22db6000+178000]

I can manually clear the jobs with the scancel command. The SSM seems prone to terminating when there are such jobs left in the queue. Any ideas?

Regards,
Peter

Peter Tyson
CSIRO IM&T - Advanced Scientific Computing
Gate 5 Normanby Road Clayton Vic 3168
Ph +61 3 9545 2021

------------------------------------------------------------------------------
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone? Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
_______________________________________________
vizstack-users mailing list
viz...@li...
https://lists.sourceforge.net/lists/listinfo/vizstack-users
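
As a rough illustration of the workaround Peter describes, the cancellation call could avoid the version-sensitive -q/-Q flag entirely and steer quiet behaviour through the SCANCEL_VERBOSE environment variable instead. This is a sketch, not VizStack's actual code: the function names are hypothetical, and it assumes that an unset SCANCEL_VERBOSE leaves scancel non-verbose (check scancel(1) for your SLURM version).

```python
# Hypothetical sketch of a flag-free, version-independent job cancellation.
# The -q option meant "quiet" in SLURM 2.0.9 and earlier but selects a QOS
# in 2.1.0+, so the command line below uses neither -q nor -Q.
import os
import subprocess


def build_scancel_cmd(job_id):
    """Build a scancel command line that avoids the -q/-Q flag entirely."""
    return ["scancel", str(job_id)]


def cancel_job(job_id):
    """Cancel a SLURM job; returns scancel's exit status."""
    env = dict(os.environ)
    # Ensure SCANCEL_VERBOSE is not set, keeping output quiet on both old
    # and new scancel versions (assumption: unset means non-verbose).
    env.pop("SCANCEL_VERBOSE", None)
    return subprocess.call(build_scancel_cmd(job_id), env=env)
```

Keeping the command line free of the disputed flag means the same code runs unchanged against SLURM 2.0.9 and 2.1.0, which matches the "no explicit version requirements" goal discussed above.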