Re: [Osgmm-discuss] Condor Negotiator Crashing
From: Peter D. <do...@cr...> - 2009-06-23 21:26:34
On Jun 22, 2009, at 3:18 PM, Mats Rynge wrote:

>
> I haven't seen the negotiator crash before, but I have seen the
> hostname problem recently. Please try this preview of 0.7:
>
> http://www.renci.org/~rynge/osgmm-0.6.jar
>
> Replace the one you have in lib

It turns out this version breaks the verification runs for me. Are there updated scripts to go along with it? For example, the fork.condor file in ~osgmm/var/verification-runs/SITE-NAME listed the executable as "fork.script.123591332490", but that executable didn't exist anywhere. I've reverted to the older version for the time being.

I cleared out everything in ~osgmm/var/final-ads, verification-runs, and maintenance-runs, restarted the MatchMaker, and then restarted Condor. This helped, but I'm having trouble figuring out why the verification tests aren't working right anymore. The Ranks for all the sites are low (1 or 3) even though the Success score is 100%, and several sites aren't being tested at all.

It would really help me to get more logging information showing why a site was dropped from the list, and why a test can complete with TEST SUCCESSFUL while the site's Rank is still 1. For example, our site SBGrid-Harvard-East is no longer in my list from condor_grid_overview, and since it doesn't have a directory under verification-runs, I can't see the output from its tests.

Restarting the MatchMaker seems to clear out the osgmm.log file without rolling it over. So after a few restarts this afternoon I now have a huge gap in the log files, and perhaps that gap holds the answer to why the East site was dropped.

--Peter
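The missing-executable symptom can be checked across all sites at once. Below is a minimal Python sketch, not part of OSGMM, built on a few assumptions: each site's submit file lives at ~osgmm/var/verification-runs/SITE-NAME/fork.condor, it names its program on a line of the form `executable = ...`, and a relative executable is resolved against the site directory. That last point may not match where condor_submit is actually run, since Condor resolves a relative executable against the directory condor_submit is invoked from. The function names `executable_from_submit` and `missing_executables` are hypothetical.

```python
import os
import re
from pathlib import Path

# Matches a submit-file line such as "executable = fork.script.123591332490"
# (keyword is case-insensitive; spaces/tabs around "=" are optional).
_EXEC_RE = re.compile(
    r"^[ \t]*executable[ \t]*=[ \t]*(\S+)[ \t]*$", re.IGNORECASE | re.MULTILINE
)


def executable_from_submit(text):
    """Return the first executable named in a Condor submit description, or None."""
    match = _EXEC_RE.search(text)
    return match.group(1) if match else None


def missing_executables(runs_dir):
    """List (site, executable) pairs whose fork.condor names a file that does not exist.

    Assumes the layout <runs_dir>/<SITE-NAME>/fork.condor and resolves a relative
    executable against that site directory (an assumption, see the note above).
    """
    missing = []
    for submit_file in sorted(Path(runs_dir).glob("*/fork.condor")):
        exe = executable_from_submit(submit_file.read_text())
        if exe is None:
            continue  # no executable line; nothing to check
        exe_path = Path(exe)
        if not exe_path.is_absolute():
            exe_path = submit_file.parent / exe_path
        if not exe_path.exists():
            missing.append((submit_file.parent.name, exe))
    return missing


if __name__ == "__main__":
    # os.path.expanduser leaves the path unchanged if the osgmm user is unknown,
    # in which case the glob simply finds nothing.
    runs = os.path.expanduser("~osgmm/var/verification-runs")
    for site, exe in missing_executables(runs):
        print(f"{site}: executable {exe!r} not found")
```

Sites that have no directory under verification-runs at all (like SBGrid-Harvard-East above) will not show up in this check, because it only inspects submit files that exist.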