From: <er...@he...> - 2002-10-10 17:22:40
On Wed, Oct 09, 2002 at 07:47:30PM -0600, Wilton Wong wrote:
> We have been trying to integrate this for some time without much success. There
> seems to be a deadlock in the kernel: somewhere someone is locking using the
> wrong lock or in the wrong context or something. When we run more than one
> process per node, in this case "bpsh <node> yes", eventually (within a matter
> of seconds) the kernel is too busy to handle any requests, such as responding
> to the bproc heartbeat.
>
> A forced kernel stack dump using lcrash reveals:
>
> ....
> dc61c000 0 1462 1418 0x01 0x00000000 0:0 yes
> dc64c000 0 1463 1415 0x00 0x00000040 402:127 yes
> dc5e6000 0 1464 1418 0x02 0x00000000 25:6 yes
> dc5c2000 0 1465 1415 0x00 0x00000040 402:127 yes
> >> trace dc5e6000
> ================================================================
> STACK TRACE FOR TASK: 0xdc5e6000(yes)
>
>  0 schedule+901 [0xc01197d5]
>  1 schedule_timeout+18 [0xc0126582]
>  2 [bproc]bproc_response_wait+115 [0xe08c3697]
>  3 [bproc]send_process+163 [0xe08c20e3]
>  4 [bproc]do_execmove+126 [0xe08c6eee]
>  5 [bproc]do_bproc+980 [0xe08c7744]
>  6 system_call+44 [0xc0108f94]
>    ebx: 00000000 ecx: 00000000 edx: 00000000 esi: 00000000
>    edi: 00000000 ebp: 00000000 eax: 00000000 ds: 002b
>    es: 002b eip: 40000b50 cs: 0023 eflags: 00000216
>    esp: bffffb50 ss: 002b
> ================================================================
>
> And of course if we remove the O(1) scheduler everything works fine. Any help
> in where to look for this problem would be appreciated.

If you're eventually falling down on a ping timeout, it sounds like
something is making a bad scheduling decision. Try commenting out
this snippet from the slave daemon and see if things get better.

    p.sched_priority = 1;
    if (sched_setscheduler(0, SCHED_FIFO, &p))
        syslog(LOG_NOTICE, "Failed to set real-time scheduling for"
               " slave daemon.\n");

That's the only even vaguely odd scheduling thing BProc does.
For the rest it's just very uninteresting wait queue and task status
(running, interruptible, etc.) stuff.

- Erik
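
As an illustration of the experiment suggested above, here is a minimal sketch of how a daemon's real-time scheduling request could be made switchable instead of being commented out by hand. The helper names `policy_name` and `maybe_set_realtime`, and the `use_realtime` switch, are assumptions made for this example; they are not part of BProc. Only `sched_setscheduler`, `sched_getscheduler`, and the `SCHED_*` constants come from the standard `<sched.h>` interface.

```c
#include <assert.h>
#include <sched.h>
#include <string.h>

/* Hypothetical helper (not part of BProc): name a scheduling policy
 * so that it can be logged when comparing runs. */
const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_OTHER: return "SCHED_OTHER";
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    default:          return "unknown";
    }
}

/* Hypothetical helper (not part of BProc): request SCHED_FIFO for the
 * calling process only when use_realtime is nonzero.
 * Returns 0 on success or when the request is skipped, and -1 on
 * failure, with errno left as set by sched_setscheduler(). */
int maybe_set_realtime(int use_realtime, int priority)
{
    struct sched_param p;

    if (!use_realtime)
        return 0;               /* stay on the default policy */

    memset(&p, 0, sizeof(p));
    p.sched_priority = priority;
    if (sched_setscheduler(0, SCHED_FIFO, &p) != 0)
        return -1;              /* typically EPERM without privileges */

    /* Sanity check: the kernel should now report SCHED_FIFO. */
    assert(sched_getscheduler(0) == SCHED_FIFO);
    return 0;
}
```

A caller might invoke `maybe_set_realtime(use_rt, 1)` at startup, where `use_rt` comes from a (hypothetical) command-line flag, and log a notice on a -1 return as the original snippet does. Logging `policy_name(sched_getscheduler(0))` before and after makes it easy to confirm which policy was in effect in each test run.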