Re: [SSI-users] OpenSSI 1.0.0-rc1 for RH9!!!
From: John B. <joh...@hp...> - 2004-01-22 02:20:52
Craig Pitout wrote:
> Hi Guys,
>
> Almost there - I promise ;-)
>
> Could someone please advise me on how to solve the following issue (or at
> least help me understand it better)? Below is the difference in output
> between two straces of two separate commands. It is a bit difficult to
> explain as no one here has used Progress, but basically they are console
> database access tools. The database is called 'sports' and, although there
> is a broker involved, they access the db via the disk. (I also found that
> having the DB in a directory of the / filesystem gave worse results than
> having it in /var. I have an ext3/cfs on one machine and no drive in the
> other.)
>
> This is the first tool. I have compared the differences of 2 straces. (One
> works, the other fails.) This is the line after the failure. (In other
> words, the failed strace doesn't show this; it just hangs the console.)
>
> open("/u/databases/sports.db", O_RDWR) = 6
> fcntl64(6, F_SETFD, FD_CLOEXEC) = 0
>
> This is the second program:
> semop(67469312, 0xbfffdbac, 1) = 0
> brk(0) = 0x84dc000
> brk(0x84dd000) = 0x84dd000
> statfs("/var/databases/sports.db", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096,
> f_blocks=2520130, f_bfree=2497392, f_files=1281696, f_ffree=1280361,
> f_namelen=255}) = 0
> open("/var/databases/sports.db", O_RDWR) = 7
> fcntl64(7, F_SETFD, FD_CLOEXEC) = 0
> statfs("/var/databases/sports.db", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096,
> f_blocks=2520130, f_bfree=2497392, f_files=1281696, f_ffree=1280361,
> f_namelen=255}) = 0
> open("/var/databases/sports.db", O_RDWR|O_SYNC) = 8
> fcntl64(8, F_SETFD, FD_CLOEXEC) = 0
> statfs("/var/databases/sports.b1", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096,
> f_blocks=2520130, f_bfree=2497392, f_files=1281696, f_ffree=1280361,
> f_namelen=255}) = 0
>
> Has anyone any hints about what could be causing this type of problem? It
> is closed source and Progress wouldn't support this so I can't ask them...
>
> The cluster is working great otherwise. Really awesome development guys -
> you should all be very proud!
>
> Thanks again
>
> Cheers
>
> Craig Pitout

Even assuming I am reading the straces correctly, I don't understand what
is going on from them alone.

When you say the console hangs, is the system totally hung or just the
console?

If the system is dead, then there is probably a locking deadlock
somewhere. You might be able to find out what it is by adding
"nmi_watchdog=1" to your boot arguments. If the system deadlocks with
interrupts blocked, this will break into the debugger. The "bt" command at
that point may get us some useful information.

If the console hangs but the system is still active, about the only thing
I can suggest is that you look into setting up a netdump server and
generating a netdump. If we can get the rather large resulting file
somehow, we might be able to find something out from that.

John Byrne
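
After rebooting with the suggested "nmi_watchdog=1" argument, it may be worth confirming that the kernel actually received it before trying to reproduce the hang. The following is only a minimal sketch, assuming a Linux system where /proc/cmdline exposes the boot arguments; the helper names parse_cmdline and nmi_watchdog_enabled are made up for illustration, and the simple whitespace split does not handle quoted arguments that contain spaces.

```python
from pathlib import Path


def parse_cmdline(text):
    """Split a kernel command line into a {name: value} dict.

    Bare flags such as "ro" map to None. Quoted values containing
    spaces are not handled (assumption: none are present).
    """
    args = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        args[name] = value if sep else None
    return args


def nmi_watchdog_enabled(text):
    """Return True if the command line sets nmi_watchdog to a value other than 0."""
    value = parse_cmdline(text).get("nmi_watchdog")
    return value is not None and value != "0"


if __name__ == "__main__":
    # /proc/cmdline holds the arguments the running kernel was booted with.
    path = Path("/proc/cmdline")
    if not path.exists():
        print("/proc/cmdline not found; is this a Linux system?")
    elif nmi_watchdog_enabled(path.read_text()):
        print("nmi_watchdog is set on the kernel command line")
    else:
        print("nmi_watchdog is not set; add nmi_watchdog=1 to the boot arguments")
```

This only shows that the argument reached the kernel; it does not prove the watchdog is firing on a given machine.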