Re: [Osgmm-discuss] Setting low rank on <site> because we have not heard from condor_q
From: Mats R. <ry...@IS...> - 2009-12-30 03:49:03
Peter Doherty wrote:
> I'm getting a lot of warnings in the osgmm.log file like this:
>
> Setting low rank on LIGO_UWM_NEMO_osg-nemo-ce.phys.uwm.edu because we
> have not heard from condor_q
>
> What is causing this? Is there a timeout built into OSGMM around
> condor_q?
> Our system has a lot of jobs in the queue right now, and just issuing
> condor_q takes about 10-15 seconds to return.

Yes, the large number of jobs in the queue is causing condor_q to time out. I have made some improvements in later OSGMM versions, but the issue has not been fully solved. Seeing this message occasionally is fine, but if you see it all the time, that is a problem.

> Is there any way to increase the logging of OSGMM so that I can try
> and figure out what parameters the OSGMM is using to set ranks on
> sites? It doesn't seem clear to me why certain sites got the rank
> they did sometimes.

Not currently. I have been thinking about putting an explanation in the classad so that it could be seen with condor_grid_overview. Would you prefer that over having more information in the logs?

--
Mats Rynge
USC/ISI <http://www.isi.edu>
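
For readers unfamiliar with the pattern behind this warning, here is a minimal Python sketch of "run a queue query with a timeout, and fall back to a low rank when no answer arrives". It is not OSGMM's actual code: the function names, the timeout value, and the rank numbers are all hypothetical, and a sleeping Python subprocess stands in for a slow condor_q.

```python
import subprocess
import sys


def query_with_timeout(cmd, timeout_s):
    """Run cmd and return its stdout, or None if it does not finish in time."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before the exception is raised.
        return None
    return result.stdout


def rank_for_site(base_rank, queue_output, low_rank=1):
    """Hypothetical policy: use a low rank when the queue query gave no answer."""
    return low_rank if queue_output is None else base_rank


if __name__ == "__main__":
    # Stand-in for a condor_q call that takes longer than the allowed time.
    slow_query = [sys.executable, "-c", "import time; time.sleep(5)"]
    output = query_with_timeout(slow_query, timeout_s=0.5)
    print(rank_for_site(base_rank=100, queue_output=output))
```

Under this kind of policy, a query that regularly takes 10-15 seconds would trip any shorter timeout on most polls, which would match seeing the warning all the time rather than occasionally.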