#266 Bus Error when running fsck.jfs on sparc

bug
open
nobody
fsck (38)
5
2014-08-18
2008-09-16
No

The following problem was reported by Ivan Jager to the Debian BTS as #499078 (http://bugs.debian.org/499078/):

I lost power on my server a while back. Fortunately, I am running a journaling
filesystem, JFS, so it shouldn't be a big deal. Unfortunately, fsck.jfs
dies with a bus error when trying to replay the journal on my root filesystem.

I got my system running again, by copying the root FS to an amd64 box (which
doesn't enforce alignment requirements), running fsck.jfs on it, and copying
it back. (I kept a copy of the original filesystem so I could reproduce the
bug.)

I have a patch which fixes the first unaligned access, but there is at least
one more.

Here is the original backtrace (I did apt-get source and compiled with -g):
aij@ypane:~$ gdb ~/deb/jfsutils-1.1.11/fsck/jfs_fsck
GNU gdb 6.4.90-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-linux-gnu"...Using host libthread_db library "/lib/v9/libthread_db.so.1".

(gdb) run md0.orig
Starting program: /home/aij/deb/jfsutils-1.1.11/fsck/jfs_fsck md0.orig
/home/aij/deb/jfsutils-1.1.11/fsck/jfs_fsck version 1.1.11, 05-Jun-2006
processing started: 9/15/2008 18.43.40
Using default parameter: -p
The current device is: md0.orig
Block size in bytes: 4096
Filesystem size in blocks: 1464816
**Phase 0 - Replay Journal Log

Program received signal SIGBUS, Bus error.
0x0005f5f0 in ujfs_swap_superblock (sblk=0xffff232c) at jfs_endian.c:554
554 sblk->s_size = __le64_to_cpu(sblk->s_size);
(gdb) bt
#0 0x0005f5f0 in ujfs_swap_superblock (sblk=0xffff232c) at jfs_endian.c:554
#1 0x00059df0 in ujfs_put_superblk (fp=0x1786c8, sb=0xb4568, is_primary=1)
at super.c:172
#2 0x0005562c in phase0_processing () at xchkdsk.c:1877
#3 0x00050ef0 in main (argc=2, argv=0xffff34f4) at xchkdsk.c:333
(gdb)

The patch to fix that (buf needs proper alignment):
----- cut here -----
--- libfs/super.c.orig 2005-11-22 15:43:55.000000000 -0500
+++ libfs/super.c 2008-09-15 21:03:27.936428396 -0400
@@ -162,14 +162,14 @@
  */
 int ujfs_put_superblk(FILE *fp, struct superblock *sb, int16_t is_primary)
 {
-	char buf[SIZE_OF_SUPER];
+	struct superblock buf[(SIZE_OF_SUPER+sizeof(*sb)-1)/sizeof(*sb)];
 	int rc;
 
 	memset(buf, 0, SIZE_OF_SUPER);
 	memcpy(buf, sb, sizeof (*sb));
 
 	/* swap if on big endian machine */
-	ujfs_swap_superblock((struct superblock *) buf);
+	ujfs_swap_superblock(buf);
 
 	rc = ujfs_rw_diskblocks(fp, (is_primary ? SUPER1_OFF : SUPER2_OFF),
 				SIZE_OF_SUPER, buf, PUT);
------ cut here -----
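
For readers unfamiliar with why the declaration change matters: a char array only guarantees 1-byte alignment, so casting it to a struct pointer and touching a 64-bit member can trap on strict-alignment CPUs. The sketch below illustrates the same sizing trick as the patch. The names demo_super, DEMO_SIZE_OF_SUPER, DEMO_NELEM, is_aligned_for_super and roundtrip_size are hypothetical stand-ins, not jfsutils code, and the struct layout is an assumption chosen only because it contains a 64-bit field.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for SIZE_OF_SUPER. */
#define DEMO_SIZE_OF_SUPER 4096

/* Hypothetical stand-in for struct superblock: it contains a 64-bit
 * field, so the compiler gives the whole struct that field's alignment. */
struct demo_super {
    char     magic[4];
    uint32_t version;
    int64_t  size;
};

/* Round up to whole structs, as the patch does, so the buffer holds at
 * least DEMO_SIZE_OF_SUPER bytes and inherits the struct's alignment. */
#define DEMO_NELEM \
    ((DEMO_SIZE_OF_SUPER + sizeof(struct demo_super) - 1) / sizeof(struct demo_super))

/* Returns nonzero if p satisfies the alignment of struct demo_super. */
static int is_aligned_for_super(const void *p)
{
    return ((uintptr_t)p % _Alignof(struct demo_super)) == 0;
}

/* Copy a superblock into a correctly aligned scratch buffer and read the
 * 64-bit field back through the struct type. */
static int64_t roundtrip_size(const struct demo_super *sb)
{
    struct demo_super buf[DEMO_NELEM];

    memset(buf, 0, DEMO_SIZE_OF_SUPER);
    memcpy(buf, sb, sizeof(*sb));

    /* Holds by construction: buf is declared with the struct type. */
    assert(is_aligned_for_super(buf));
    return buf[0].size;
}
```

Calling roundtrip_size() with a struct whose size field is 1464816 should return 1464816 on any architecture, because the 64-bit access goes through a buffer the compiler was told to align.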

And a new backtrace:
aij@ypane:~$ gdb ~/deb/jfsutils-1.1.11/fsck/jfs_fsck
GNU gdb 6.4.90-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-linux-gnu"...Using host libthread_db library "/usr/lib/debug/libthread_db.so.1".

(gdb) run md0.orig
Starting program: /home/aij/deb/jfsutils-1.1.11/fsck/jfs_fsck md0.orig
/home/aij/deb/jfsutils-1.1.11/fsck/jfs_fsck version 1.1.11, 05-Jun-2006
processing started: 9/15/2008 21.9.54
Using default parameter: -p
The current device is: md0.orig
Block size in bytes: 4096
Filesystem size in blocks: 1464816
**Phase 0 - Replay Journal Log

Program received signal SIGBUS, Bus error.
0x0006a6e8 in updatePage (ld=0xffa6f2c0, logaddr=1572600) at log_work.c:2809
2809 rc = markImap(&vopen[vol].fsimap_lst,
(gdb) bt
#0 0x0006a6e8 in updatePage (ld=0xffa6f2c0, logaddr=1572600)
at log_work.c:2809
#1 0x00065ae0 in doAfter (ld=0xffa6f2c0, logaddr=1572600) at log_work.c:633
#2 0x00061a98 in jfs_logredo (pathname=0xffa6f5f8 "md0.orig", fp=0x1786c8,
use_2nd_aggSuper=0) at logredo.c:691
#3 0x00055664 in phase0_processing () at xchkdsk.c:1880
#4 0x00050ef0 in main (argc=2, argv=0xffa6f4d4) at xchkdsk.c:333
(gdb) print dip
$1 = (struct dinode *) 0xbe794
(gdb) print &dip->di_ixpxd
$2 = (pxd_t *) 0xbe7a4
(gdb) print sizeof(pxd_t)
$3 = 8
(gdb) x/i 0x0006a6e8
0x6a6e8 <updatePage+4300>: ldd [ %g1 + 0x10 ], %g2
(gdb) print/x $g1
$5 = 0xbe794
(gdb)

Clearly the problem here is that dip (0xbe794) is only 4-byte aligned, but it has an
8-byte field, di_ixpxd, which is loaded with ldd from %g1 + 0x10 = 0xbe7a4, and ldd
requires 8-byte alignment.
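
The address arithmetic from the gdb session can be checked with a few lines of C. This is only a worked example of the numbers printed above; the helper names effective_addr and addr_is_aligned are made up for illustration.

```c
#include <stdint.h>

/* Effective address of a "ldd [base + offset]" style access. */
static uintptr_t effective_addr(uintptr_t base, uintptr_t offset)
{
    return base + offset;
}

/* Nonzero if addr is a multiple of align (align must be a power of two). */
static int addr_is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}
```

With base 0xbe794 (the value of %g1) and offset 0x10, effective_addr() gives 0xbe7a4, which matches &dip->di_ixpxd. That address passes addr_is_aligned(addr, 4) but fails addr_is_aligned(addr, 8), which is consistent with the SIGBUS on the doubleword load.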

Fixing that may be as simple as 8-byte aligning afterdata, but I'm not yet sure
about all the other variables that dip depends on. I'll be happy to test any
patches for this. ;)
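
Two general techniques apply to this kind of problem, sketched below under stated assumptions. Neither is taken from jfsutils: demo_afterdata and load_u64_unaligned are hypothetical names, and the 4096-byte size is a placeholder. The union forces a byte buffer onto the alignment of uint64_t. The memcpy helper reads a 64-bit value from any address, which helps when the offset into the buffer cannot be guaranteed to be a multiple of 8.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical byte buffer forced to the alignment of uint64_t.
 * The size is a placeholder, not the real size of afterdata. */
union demo_afterdata {
    uint64_t      force_align;   /* never read; present only for alignment */
    unsigned char bytes[4096];
};

/* Read a 64-bit value from a possibly misaligned address. memcpy has no
 * alignment requirement, so this avoids a trap on strict-alignment CPUs.
 * No byte swapping is done here; the value is returned in host order. */
static uint64_t load_u64_unaligned(const void *p)
{
    uint64_t v;

    memcpy(&v, p, sizeof v);
    return v;
}
```

Aligning the buffer only helps if every structure placed in it also starts at a suitably aligned offset, which is the open question in the report. The memcpy approach does not depend on that, at the cost of touching each access site.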

Discussion
