Re: [Aoetools-discuss] one array keeps disconnecting
From: Scott R. <ska...@gm...> - 2007-06-19 19:05:19
OK, with that in mind, we've decided to upgrade our controller system
(the one that mounts the four aoe arrays). The kernel is
2.6.15-1-686-smp on Debian 3.1.

What I need to know is:

- What kernel should I upgrade to?
- What version of aoetools/vblade should I be using?
- What version of xfsprogs should I be using?

We're going to arrange for a maintenance window to bring everything
down, upgrade, run repairs on all of the arrays, and bring them back
online.

One final concern. The arrays are all 5TB in size. I understand that
xfs_repair had an issue with devices larger than 2TB. Does the later
version correct this?

On 6/18/07, Ed L. Cashin <ec...@co...> wrote:
> On Fri, Jun 15, 2007 at 11:44:28PM -0700, Scott Raymond wrote:
> > Actually, I did paste the output of df from my ssh session in my
> > follow-up to Smart.
> >
> > I suppose I could have been more clear - I'm not a *nix n00b, just an
> > aoetools/vblade n00b. :)
>
> Ah. Sorry for the confusion. More comments follow inline below.
>
> > We didn't see any unusual output in dmesg.
> > However, I did find something in /var/log/messages that might hold a
> > clue. Sorry for the large paste:
> >
> ...
> > Jun 8 03:09:05 netstoragecontrol2 kernel: [pg0+950020086/1070175232]
> > xfs_free_ag_extent+0xc4/0x604 [xfs]
> ...
> > Jun 8 03:09:05 netstoragecontrol2 kernel: [sys_ftruncate64+220/250]
> > sys_ftruncate64+0xdc/0xfa
> > Jun 8 03:09:05 netstoragecontrol2 kernel: [syscall_call+7/11]
> > syscall_call+0x7/0xb
> > Jun 8 03:09:05 netstoragecontrol2 kernel:
> > xfs_force_shutdown(etherd/e4.0,0x8) called from line 4125 of file
> > fs/xfs/xfs_bmap.c. Return address = 0xf8d78068
> > Jun 8 03:22:17 netstoragecontrol2 kernel:
> > xfs_force_shutdown(etherd/e4.0,0x1) called from line 339 of file
> > fs/xfs/xfs_rw.c. Return address = 0xf8db4c71
> >
> > All I can tell from that is that xfs didn't like something on the
> > array and shut down the connection. But I'll be damned if I know what
> > it was xfs didn't like.
>
> Again, there's no connection. XFS shut down the filesystem. Usually
> when that happens it is because there's a problem with the
> filesystem. Some kernels have bugs in the xfs driver, and so a course
> of action might be to ...
>
>   * look in the changelogs for your kernel to see whether there is a
>     new version that has XFS bugfixes, upgrading if so,
>
>   * look for a new version of the xfstools, upgrading if a better one
>     is available,
>
>   * use xfs_check and/or xfs_repair to make sure there's nothing wrong
>     with the filesystem, or (even better) ...
>
>   * really make sure there's nothing wrong with the filesystem by
>     rebuilding it with "mkfs -t xfs ...".
>
> Good luck with your endeavors. XFS can be a good filesystem but if
> you catch a bad release, it can result in some frustrating stumbling
> blocks like this.
>
> --
>   Ed L Cashin <ec...@co...>
>

--
Scott
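
For anyone scanning a large /var/log/messages for the same symptom, the
xfs_force_shutdown lines quoted above can be pulled out with a small
script. What follows is only a sketch in Python. The function name
find_xfs_shutdowns, the field names, and the regular expression are
assumptions made for illustration; the pattern is modelled solely on the
two log lines in this thread (with the pasted line wraps undone), so other
kernel versions may format the message differently. The flag values are
reported as-is and not interpreted.

```python
import re

# Pattern modelled on the two xfs_force_shutdown lines quoted in this
# thread; this is an assumption, not a documented kernel log format.
SHUTDOWN_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"xfs_force_shutdown\((?P<device>[^,]+),(?P<flags>0x[0-9a-fA-F]+)\)\s+"
    r"called from line (?P<line>\d+) of file (?P<file>\S+?)\.\s+"
    r"Return address = (?P<address>0x[0-9a-fA-F]+)"
)


def find_xfs_shutdowns(lines):
    """Return one dict per xfs_force_shutdown message found in `lines`."""
    events = []
    for text in lines:
        match = SHUTDOWN_RE.match(text)
        if match is None:
            continue  # not a shutdown message, skip it
        event = match.groupdict()
        event["line"] = int(event["line"])  # source line number as an int
        events.append(event)
    return events


# Sample input: the log lines from this thread, one message per line.
SAMPLE_LOG = """\
Jun 8 03:09:05 netstoragecontrol2 kernel: [syscall_call+7/11] syscall_call+0x7/0xb
Jun 8 03:09:05 netstoragecontrol2 kernel: xfs_force_shutdown(etherd/e4.0,0x8) called from line 4125 of file fs/xfs/xfs_bmap.c. Return address = 0xf8d78068
Jun 8 03:22:17 netstoragecontrol2 kernel: xfs_force_shutdown(etherd/e4.0,0x1) called from line 339 of file fs/xfs/xfs_rw.c. Return address = 0xf8db4c71
"""

if __name__ == "__main__":
    for ev in find_xfs_shutdowns(SAMPLE_LOG.splitlines()):
        print(ev["timestamp"], ev["device"], ev["flags"], ev["file"], ev["line"])
```

To run it against a real log, pass an open file object instead of the
sample text, e.g. find_xfs_shutdowns(open("/var/log/messages")). Grouping
the results by device shows at a glance which of the arrays was shut down
and when.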