#288 system hangs after several fsck_ufs tasks

Milestone: v0.686bx
Status: open
Owner: Volker
Priority: 9
Updated: 2013-12-20
Created: 2007-12-19
Creator: Wanninger
Private: No

Hi,

After a fresh install of 686b3, the system detects that
all disks installed in the system need an fsck.

After boot, the fsck_ufs commands for the four disks
go into the background, and a few minutes later
the system hangs. It is not a hard hang, but the system is
no longer accessible via ssh or the web interface.

The last time, I waited 10 hours, but the system state
did not change, so I power-cycled the machine.

The system now acts as if it is in an endless loop.

See the ps ax output below:

nas0:~# ps ax
PID TT STAT TIME COMMAND
0 ?? WLs 0:00.01 [swapper]
1 ?? SLs 0:00.01 /sbin/init --
2 ?? DL 0:00.02 [g_event]
3 ?? DL 0:00.30 [g_up]
4 ?? DL 0:00.31 [g_down]
5 ?? DL 0:00.00 [crypto]
6 ?? DL 0:00.00 [crypto returns]
7 ?? DL 0:00.00 [kqueue taskq]
8 ?? DL 0:00.00 [thread taskq]
9 ?? DL 0:00.00 [acpi_task_0]
10 ?? RL 1:43.35 [idle: cpu1]
11 ?? RL 1:38.77 [idle: cpu0]
12 ?? WL 0:00.24 [swi4: clock sio]
13 ?? WL 0:00.00 [swi3: vm]
14 ?? WL 0:00.01 [swi1: net]
15 ?? DL 0:00.07 [yarrow]
16 ?? WL 0:00.00 [swi5: +]
17 ?? WL 0:00.00 [swi6: Giant taskq]
18 ?? WL 0:00.00 [swi6: task queue]
19 ?? DL 0:00.00 [acpi_task_1]
20 ?? DL 0:00.00 [acpi_task_2]
21 ?? WL 0:00.16 [swi2: cambio]
22 ?? WL 0:00.00 [irq9: acpi0]
23 ?? WL 0:00.00 [irq20: fxp0]
24 ?? WL 0:00.08 [irq17: atapci0]
25 ?? WL 0:00.01 [irq18: atapci1]
26 ?? WL 0:00.01 [irq19: atapci2]
27 ?? WL 0:04.69 [irq14: ata0]
28 ?? WL 0:00.00 [irq15: ata1]
29 ?? WL 0:00.01 [irq21: em0]
30 ?? WL 0:00.00 [irq31: em1]
31 ?? WL 0:00.00 [irq22: em2]
32 ?? RL 0:00.83 [irq30: em3 ehci0]
33 ?? WL 0:00.00 [irq23: uhci0]
34 ?? DL 0:00.00 [usb0]
35 ?? DL 0:00.00 [usbtask]
36 ?? WL 0:00.00 [irq29: uhci1]
37 ?? DL 0:00.00 [usb1]
38 ?? DL 0:00.00 [usb2]
39 ?? WL 0:00.00 [irq24: sym0]
40 ?? WL 0:00.00 [irq25: sym1]
41 ?? WL 0:00.00 [irq1: atkbd0]
42 ?? DL 0:00.00 [fdc0]
43 ?? WL 0:00.00 [swi0: sio]
44 ?? DL 0:00.00 [pagedaemon]
45 ?? DL 0:04.54 [pagezero]
46 ?? DL 0:00.00 [idlepoll]
47 ?? DL 0:00.28 [bufdaemon]
48 ?? DL 0:00.01 [vnlru]
49 ?? DL 0:00.01 [syncer]
50 ?? DL 0:00.01 [softdepflush]
51 ?? DL 0:00.01 [schedcpu]
812 ?? Ss 0:00.02 /usr/sbin/syslogd -ss -f /var/etc/syslogd.conf
819 ?? Ss 0:00.02 /usr/sbin/rpcbind
890 ?? Is 0:00.00 /usr/sbin/mountd -r -r /var/etc/exports
898 ?? Is 0:00.07 nfsd: master (nfsd)
899 ?? I 0:00.00 nfsd: server (nfsd)
900 ?? I 0:00.00 nfsd: server (nfsd)
902 ?? I 0:00.00 nfsd: server (nfsd)
903 ?? I 0:00.00 nfsd: server (nfsd)
908 ?? Ss 0:00.00 /usr/sbin/rpc.statd
913 ?? Ss 0:00.01 rpc.lockd: server (rpc.lockd)
924 ?? I 0:00.00 rpc.lockd: client (rpc.lockd)
945 ?? Ss 0:00.00 /usr/sbin/sshd -f /var/etc/ssh/sshd_config -h /var/etc/ssh/ssh_host_dsa_key
981 ?? I 0:00.01 /usr/local/sbin/smartd --pidfile=/var/run/smartd.pid --logfacility=local5
1058 ?? S 0:00.01 /usr/local/sbin/lighttpd -f /var/etc/lighttpd.conf -m /usr/local/lib/lighttpd
1123 ?? Ss 0:00.00 /usr/sbin/cron -s
1186 ?? DN 0:00.45 fsck_ufs -p -B /dev/ad4p1
1188 ?? DN 0:00.43 fsck_ufs -p -B /dev/ad5p1
1191 ?? DN 0:00.39 fsck_ufs -p -B /dev/ad6p1
1192 ?? DN 0:01.59 fsck_ufs -p -B /dev/da0p1
1195 ?? Ss 0:00.06 sshd: root@ttyp0 (sshd)
1169 v0 Is 0:00.03 login [pam] (login)
1171 v0 I 0:00.02 -tcsh (csh)
1177 v0 I+ 0:00.01 /bin/sh /etc/rc.initial
1170 v1 Is+ 0:00.01 /usr/libexec/getty Pc ttyv1
1022 con- I 0:00.01 /usr/local/bin/msntp -r -P no -l /var/run/msntp.pid -x 6 192.168.100.1
1140 con- I 0:00.00 sh /etc/rc autoboot
1141 con- I 0:00.00 logger -p daemon.notice -t fsck
1143 con- IN 0:00.01 fsck -B -p -t ufs /dev/ad4p1
1146 con- I 0:00.00 sh /etc/rc autoboot
1147 con- I 0:00.00 logger -p daemon.notice -t fsck
1149 con- IN 0:00.01 fsck -B -p -t ufs /dev/ad5p1
1152 con- I 0:00.00 sh /etc/rc autoboot
1153 con- I 0:00.00 logger -p daemon.notice -t fsck
1155 con- IN 0:00.01 fsck -B -p -t ufs /dev/da0p1
1158 con- I 0:00.00 sh /etc/rc autoboot
1159 con- I 0:00.00 logger -p daemon.notice -t fsck
1161 con- IN 0:00.01 fsck -B -p -t ufs /dev/ad6p1
1197 p0 Ss 0:00.04 -tcsh (csh)
1198 p0 R+ 0:00.00 ps ax
nas0:~#
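
To pick the fsck-related entries out of a listing like the one above, a small filter can help. The following is a minimal sketch and not part of the original report: the function name fsck_summary is hypothetical, and it assumes the five-column layout shown above (PID, TT, STAT, TIME, COMMAND) with the command name printed without a leading path. It prints the PID, state and last argument (the device) of each fsck or fsck_ufs process and counts those whose STAT begins with D, which ps uses for an uninterruptible (typically disk) wait.

```shell
# Summarise fsck-related processes from `ps ax` output read on stdin.
# Assumed column layout: PID TT STAT TIME COMMAND (as in the listing above).
# fsck_summary is a hypothetical helper name, not an existing tool.
fsck_summary() {
    awk '
        # Match only lines whose command name is exactly fsck or fsck_ufs,
        # so "logger ... -t fsck" is not counted.
        $5 ~ /^fsck(_ufs)?$/ {
            total++
            # A STAT starting with "D" means uninterruptible wait.
            if ($3 ~ /^D/) blocked++
            printf "%s %s %s\n", $1, $3, $NF
        }
        END { printf "total=%d blocked=%d\n", total, blocked }
    '
}

# Usage on a live system:
#   ps ax | fsck_summary
```

Fed the listing above, this reports eight fsck-related processes, four of which (the fsck_ufs workers) are in state D.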

Discussion

  • Dan Merschi
    2007-12-19

    Logged In: YES
    user_id=1512153
    Originator: NO

    That's a good point.
    Starting a background check on all filesystems simultaneously can cause a memory problem (fsck requires RAM).

    Volker, can you make "at boot background fsck" optional per filesystem (mount)?

    Regards,
    Dan

     
  • Wanninger
    2007-12-19

    Logged In: YES
    user_id=1441560
    Originator: YES

    Hi,

    I don't think RAM is the whole story. My current system has 1 GB of RAM, and memory usage rises to no more than 15%.

    Then I renamed /sbin/fsck so it could not be started automatically and
    entered the following command to see what happens:

    nas0:/sbin# /sbin/fsck.sav -B -p -v -t ufs /dev/ad6p1
    start /mnt/RESERVE wait fsck_ufs -p -F /dev/ad6p1
    start /mnt/RESERVE wait fsck_ufs -p -B /dev/ad6p1
    /dev/ad6p1: CANNOT CREATE SNAPSHOT /mnt/RESERVE/.snap/fsck_snapshot: Device busy

    /dev/ad6p1: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

    Running fsck manually works fine (with the device unmounted, of course).

    rgds

    --Wanninger

     
  • Jerome Warnier
    2008-02-25

    Logged In: YES
    user_id=149431
    Originator: NO

    I noticed a huge slowdown during background fsck on FreeNAS 0.686 last Friday.
    There are two volumes on my machine: one of 400 GB, and the other on an Areca 1120 with 1.4 TB attached (5 x 400 GB, RAID 5).
    The machine was almost unusable for about 5 minutes, although it had booted and was running.

     
  • Jerome Warnier
    2008-04-29

    Logged In: YES
    user_id=149431
    Originator: NO

    Background fsck on a UFS filesystem involves creating a snapshot of that filesystem, which is really slow on UFS.
    It also causes other problems, such as snapshots not being deleted afterwards when the check fails for whatever reason.
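
The leftover snapshots mentioned in the last comment can be looked for with a small check. This is only a sketch under stated assumptions: the function name stale_fsck_snapshots is hypothetical, and the path <mount point>/.snap/fsck_snapshot is taken from the error message quoted earlier in the thread. It prints the snapshot path for every given mount point where such a file exists.

```shell
# Report leftover background-fsck snapshot files under the given mount points.
# Assumption: the snapshot lives at <mount point>/.snap/fsck_snapshot, as in
# the "CANNOT CREATE SNAPSHOT" message quoted in the thread.
# stale_fsck_snapshots is a hypothetical helper name.
stale_fsck_snapshots() {
    for mp in "$@"; do
        snap="$mp/.snap/fsck_snapshot"
        # Print the path only if a snapshot file is actually present.
        if [ -e "$snap" ]; then
            echo "$snap"
        fi
    done
}

# Usage (mount point taken from the thread):
#   stale_fsck_snapshots /mnt/RESERVE
```

A reported file should only be removed once no fsck_ufs process is still working on that filesystem; whether a stale snapshot is what produced the "Device busy" error above is not confirmed anywhere in the thread.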