Re: [Osgmm-discuss] Condor Negotiator Crashing
From: Mats R. <ry...@re...> - 2009-06-22 19:18:37
Peter Doherty wrote:
> I don't know what's going on here, but my jobs submitted to the
> MatchMaker aren't being matched, and I found out the condor negotiator
> keeps crashing. If I shut down osgmm, the negotiator keeps running,
> but then if I start up osgmm, the negotiator crashes when it starts to
> match one of my jobs.
> Here are some of the errors I'm getting.
> I'm not sure where to start with this.

I haven't seen the negotiator crash before, but I have seen the hostname
problem recently. Please try this preview of 0.7:

http://www.renci.org/~rynge/osgmm-0.6.jar

Replace the one you have in lib/

> NegotiatorLog
>
> 6/22 14:41:25 ******************************************************
> 6/22 14:41:25 ** condor_negotiator (CONDOR_NEGOTIATOR) STARTING UP
> 6/22 14:41:25 ** /opt/osg-shared/se/app/site/condor-7.2.1/sbin/condor_negotiator
> 6/22 14:41:25 ** SubsystemInfo: name=NEGOTIATOR type=NEGOTIATOR(4) class=DAEMON(1)
> 6/22 14:41:25 ** Configuration: subsystem:NEGOTIATOR local:<NONE> class:DAEMON
> 6/22 14:41:25 ** $CondorVersion: 7.2.1 Feb 18 2009 BuildID: 133382 $
> 6/22 14:41:25 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
> 6/22 14:41:25 ** PID = 4322
> 6/22 14:41:25 ** Log last touched 6/22 14:36:34
> 6/22 14:41:25 ******************************************************
> 6/22 14:41:25 Using config source: /opt/osg-shared/se/app/site/condor/etc/condor_config
> 6/22 14:41:25 Using local config sources:
> 6/22 14:41:25    /opt/osg-local/condor/condor_config.local
> 6/22 14:41:25 DaemonCore: Command Socket at <10.0.10.39:51423>
> 6/22 14:41:25 About to rotate ClassAd log /opt/osg-local/condor/spool/Accountantnew.log
> 6/22 14:41:25 NEGOTIATOR_SOCKET_CACHE_SIZE = 16
> 6/22 14:41:25 PREEMPTION_REQUIREMENTS = ( (CurrentTime - EnteredCurrentState) > (1 * (60 * 60)) && RemoteUserPrio > SubmittorPrio * 1.2 ) || (MY.NiceUser == True)
> 6/22 14:41:25 ACCOUNTANT_HOST = None (local)
> 6/22 14:41:25 NEGOTIATOR_INTERVAL = 25 sec
> 6/22 14:41:25 NEGOTIATOR_TIMEOUT = 30 sec
> 6/22 14:41:25 MAX_TIME_PER_SUBMITTER = 31536000 sec
> 6/22 14:41:25 MAX_TIME_PER_PIESPIN = 31536000 sec
> 6/22 14:41:25 PREEMPTION_RANK = (RemoteUserPrio * 1000000) - TARGET.ImageSize
> 6/22 14:41:25 NEGOTIATOR_PRE_JOB_RANK = RemoteOwner =?= UNDEFINED
> 6/22 14:41:25 NEGOTIATOR_POST_JOB_RANK = None
> 6/22 14:41:25 ---------- Started Negotiation Cycle ----------
> 6/22 14:41:25 Phase 1: Obtaining ads from collector ...
> 6/22 14:41:25 Getting all public ads ...
> 6/22 14:41:25 Sorting 175 ads ...
> 6/22 14:41:25 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
> 6/22 14:41:25 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
> 6/22 14:41:25 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
> 6/22 14:41:25 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
> 6/22 14:41:25 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
> 6/22 14:41:25 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
> 6/22 14:41:25 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
> 6/22 14:41:25 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
> 6/22 14:41:25 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
> 6/22 14:41:25 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
> 6/22 14:41:25 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
> 6/22 14:41:25 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
> 6/22 14:41:25 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
> 6/22 14:41:25 Getting startd private ads ...
> 6/22 14:41:25 Got ads: 175 public and 123 private
> 6/22 14:41:25 Public ads include 6 submitter, 137 startd
> 6/22 14:41:25 Phase 2: Performing accounting ...
> 6/22 14:41:25 ERROR "Assertion ERROR on (resource_hash.insert( ResourceName, ResourceAd ) == 0)" at line 785 in file Accountant.cpp
>
>
> after starting up osgmm:
>
> [root@abitibi condor]# /etc/init.d/osgmm start
> Starting up OSGMM
> [root@abitibi condor]# Exception in thread "Thread-1" java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>         at java.lang.String.substring(String.java:1768)
>         at org.renci.osgmm.Site.getHostName(Site.java:141)
>         at org.renci.osgmm.Sites.addSite(Sites.java:106)
>         at org.renci.osgmm.ReSS.processReSSAd(ReSS.java:228)
>         at org.renci.osgmm.ReSS.pullReSS(ReSS.java:178)
>         at org.renci.osgmm.ReSS.run(ReSS.java:102)
>
> _______________________________________________
> Osgmm-discuss mailing list
> Osg...@li...
> https://lists.sourceforge.net/lists/listinfo/osgmm-discuss

--
Mats Rynge
Renaissance Computing Institute <http://www.renci.org>
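
For readers wondering how a hostname lookup can end in "String index out of range: -1": the trace above only shows that String.substring was called from Site.getHostName with a bad index. A common way to get there is to pass the result of indexOf straight into substring when the separator is missing, since indexOf returns -1 in that case. The sketch below illustrates that pattern and a defensive alternative. It is not the OSGMM source; the class name, both method names, and the example contact strings are assumptions made up for illustration.

```java
public class HostNameSketch {

    // Hypothetical fragile version (NOT the actual OSGMM code): assumes the
    // contact string always contains ':'. When it does not, indexOf returns
    // -1 and substring(0, -1) throws StringIndexOutOfBoundsException.
    static String fragileHostName(String contact) {
        return contact.substring(0, contact.indexOf(':'));
    }

    // Defensive variant: if there is no ':' the whole (trimmed) string is
    // treated as the host name instead of throwing.
    static String safeHostName(String contact) {
        if (contact == null) {
            return null;
        }
        String trimmed = contact.trim();
        int colon = trimmed.indexOf(':');
        return colon < 0 ? trimmed : trimmed.substring(0, colon);
    }

    public static void main(String[] args) {
        // Made-up contact strings; example.org is a placeholder host.
        String withPort = "gk.example.org:2119/jobmanager-condor";
        String withoutPort = "gk.example.org";

        System.out.println(safeHostName(withPort));
        System.out.println(safeHostName(withoutPort));

        try {
            fragileHostName(withoutPort);
        } catch (StringIndexOutOfBoundsException e) {
            // The exact message text differs between Java versions.
            System.out.println("fragile version failed: " + e.getMessage());
        }
    }
}
```

Whether falling back to the whole string is the right behavior depends on what the caller expects; skipping the malformed ad and logging it would be another reasonable choice.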