
#288 system hangs after several fsck_ufs tasks

Milestone: v0.686bx
Status: open
Owner: Volker
Priority: 9
Updated: 2014-11-23
Created: 2007-12-19
Creator: Wanninger
Private: No

Hi,

after a fresh install of 686b3, the system detects
that fsck is needed on all disks installed in the system.

After system boot, the fsck_ufs commands for the four disks
go into the background, and a few minutes later
the system hangs. It is not a hard hang, but the system is no longer
accessible via ssh or the web interface.

The last time I waited 10 hours, but the system state
did not change, so I power-cycled it.

The system now behaves as if it were stuck in an endless loop.

see ps ax below:

nas0:~# ps ax
PID TT STAT TIME COMMAND
0 ?? WLs 0:00.01 [swapper]
1 ?? SLs 0:00.01 /sbin/init --
2 ?? DL 0:00.02 [g_event]
3 ?? DL 0:00.30 [g_up]
4 ?? DL 0:00.31 [g_down]
5 ?? DL 0:00.00 [crypto]
6 ?? DL 0:00.00 [crypto returns]
7 ?? DL 0:00.00 [kqueue taskq]
8 ?? DL 0:00.00 [thread taskq]
9 ?? DL 0:00.00 [acpi_task_0]
10 ?? RL 1:43.35 [idle: cpu1]
11 ?? RL 1:38.77 [idle: cpu0]
12 ?? WL 0:00.24 [swi4: clock sio]
13 ?? WL 0:00.00 [swi3: vm]
14 ?? WL 0:00.01 [swi1: net]
15 ?? DL 0:00.07 [yarrow]
16 ?? WL 0:00.00 [swi5: +]
17 ?? WL 0:00.00 [swi6: Giant taskq]
18 ?? WL 0:00.00 [swi6: task queue]
19 ?? DL 0:00.00 [acpi_task_1]
20 ?? DL 0:00.00 [acpi_task_2]
21 ?? WL 0:00.16 [swi2: cambio]
22 ?? WL 0:00.00 [irq9: acpi0]
23 ?? WL 0:00.00 [irq20: fxp0]
24 ?? WL 0:00.08 [irq17: atapci0]
25 ?? WL 0:00.01 [irq18: atapci1]
26 ?? WL 0:00.01 [irq19: atapci2]
27 ?? WL 0:04.69 [irq14: ata0]
28 ?? WL 0:00.00 [irq15: ata1]
29 ?? WL 0:00.01 [irq21: em0]
30 ?? WL 0:00.00 [irq31: em1]
31 ?? WL 0:00.00 [irq22: em2]
32 ?? RL 0:00.83 [irq30: em3 ehci0]
33 ?? WL 0:00.00 [irq23: uhci0]
34 ?? DL 0:00.00 [usb0]
35 ?? DL 0:00.00 [usbtask]
36 ?? WL 0:00.00 [irq29: uhci1]
37 ?? DL 0:00.00 [usb1]
38 ?? DL 0:00.00 [usb2]
39 ?? WL 0:00.00 [irq24: sym0]
40 ?? WL 0:00.00 [irq25: sym1]
41 ?? WL 0:00.00 [irq1: atkbd0]
42 ?? DL 0:00.00 [fdc0]
43 ?? WL 0:00.00 [swi0: sio]
44 ?? DL 0:00.00 [pagedaemon]
45 ?? DL 0:04.54 [pagezero]
46 ?? DL 0:00.00 [idlepoll]
47 ?? DL 0:00.28 [bufdaemon]
48 ?? DL 0:00.01 [vnlru]
49 ?? DL 0:00.01 [syncer]
50 ?? DL 0:00.01 [softdepflush]
51 ?? DL 0:00.01 [schedcpu]
812 ?? Ss 0:00.02 /usr/sbin/syslogd -ss -f /var/etc/syslogd.conf
819 ?? Ss 0:00.02 /usr/sbin/rpcbind
890 ?? Is 0:00.00 /usr/sbin/mountd -r -r /var/etc/exports
898 ?? Is 0:00.07 nfsd: master (nfsd)
899 ?? I 0:00.00 nfsd: server (nfsd)
900 ?? I 0:00.00 nfsd: server (nfsd)
902 ?? I 0:00.00 nfsd: server (nfsd)
903 ?? I 0:00.00 nfsd: server (nfsd)
908 ?? Ss 0:00.00 /usr/sbin/rpc.statd
913 ?? Ss 0:00.01 rpc.lockd: server (rpc.lockd)
924 ?? I 0:00.00 rpc.lockd: client (rpc.lockd)
945 ?? Ss 0:00.00 /usr/sbin/sshd -f /var/etc/ssh/sshd_config -h /var/etc/ssh/ssh_host_dsa_key
981 ?? I 0:00.01 /usr/local/sbin/smartd --pidfile=/var/run/smartd.pid --logfacility=local5
1058 ?? S 0:00.01 /usr/local/sbin/lighttpd -f /var/etc/lighttpd.conf -m /usr/local/lib/lighttpd
1123 ?? Ss 0:00.00 /usr/sbin/cron -s
1186 ?? DN 0:00.45 fsck_ufs -p -B /dev/ad4p1
1188 ?? DN 0:00.43 fsck_ufs -p -B /dev/ad5p1
1191 ?? DN 0:00.39 fsck_ufs -p -B /dev/ad6p1
1192 ?? DN 0:01.59 fsck_ufs -p -B /dev/da0p1
1195 ?? Ss 0:00.06 sshd: root@ttyp0 (sshd)
1169 v0 Is 0:00.03 login [pam] (login)
1171 v0 I 0:00.02 -tcsh (csh)
1177 v0 I+ 0:00.01 /bin/sh /etc/rc.initial
1170 v1 Is+ 0:00.01 /usr/libexec/getty Pc ttyv1
1022 con- I 0:00.01 /usr/local/bin/msntp -r -P no -l /var/run/msntp.pid -x 6 192.168.100.1
1140 con- I 0:00.00 sh /etc/rc autoboot
1141 con- I 0:00.00 logger -p daemon.notice -t fsck
1143 con- IN 0:00.01 fsck -B -p -t ufs /dev/ad4p1
1146 con- I 0:00.00 sh /etc/rc autoboot
1147 con- I 0:00.00 logger -p daemon.notice -t fsck
1149 con- IN 0:00.01 fsck -B -p -t ufs /dev/ad5p1
1152 con- I 0:00.00 sh /etc/rc autoboot
1153 con- I 0:00.00 logger -p daemon.notice -t fsck
1155 con- IN 0:00.01 fsck -B -p -t ufs /dev/da0p1
1158 con- I 0:00.00 sh /etc/rc autoboot
1159 con- I 0:00.00 logger -p daemon.notice -t fsck
1161 con- IN 0:00.01 fsck -B -p -t ufs /dev/ad6p1
1197 p0 Ss 0:00.04 -tcsh (csh)
1198 p0 R+ 0:00.00 ps ax
nas0:~#

Discussion

  • Dan Merschi

    Dan Merschi - 2007-12-19

    Logged In: YES
    user_id=1512153
    Originator: NO

    That's a good point.
    Starting a background check on all filesystems simultaneously can cause memory problems (fsck requires RAM).

    Volker,
    can you make "at boot background fsck" optional per filesystem (mount)?

    Regards,
    Dan
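
One way to picture the alternative to starting every check at once is to run them strictly one after another. The sketch below is only an illustration, not how FreeNAS or /etc/rc schedules fsck: the helper name check_serially and the FSCK_CMD variable are hypothetical, and by default the script is a dry run that only prints the commands it would execute.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeNAS): check devices one at a time
# instead of launching all background checks simultaneously.
# FSCK_CMD is an assumed knob; the default is a dry run that only prints
# the command. Set it to e.g. "fsck_ufs -p -B" to run the checks for real.
FSCK_CMD="${FSCK_CMD:-echo fsck_ufs -p -B}"

check_serially() {
    for dev in "$@"; do
        # Each invocation finishes before the next one starts, so at most
        # one check is in flight at any time.
        $FSCK_CMD "$dev" || echo "check failed for $dev" >&2
    done
}

# Device names taken from the ps output in the report.
check_serially /dev/ad4p1 /dev/ad5p1 /dev/ad6p1 /dev/da0p1
```

Serializing trades total check time for lower peak load; whether it would avoid the hang described in this ticket is not established here.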

     
  • Wanninger

    Wanninger - 2007-12-19

    Logged In: YES
    user_id=1441560
    Originator: YES

    Hi,

    I think RAM is not the whole story. My current system has 1 GB of RAM, and memory usage peaks at only about 15%.

    Then I renamed /sbin/fsck so it could not be started automatically and
    entered the following command to see what happens:

    nas0:/sbin# /sbin/fsck.sav -B -p -v -t ufs /dev/ad6p1
    start /mnt/RESERVE wait fsck_ufs -p -F /dev/ad6p1
    start /mnt/RESERVE wait fsck_ufs -p -B /dev/ad6p1
    /dev/ad6p1: CANNOT CREATE SNAPSHOT /mnt/RESERVE/.snap/fsck_snapshot: Device busy

    /dev/ad6p1: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

    Running fsck manually works fine. (device unmounted of course)

    rgds

    --Wanninger
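
The manual check mentioned above (filesystem unmounted, then a foreground fsck) can be sketched as follows. The exact commands Wanninger ran are not given in the ticket, so the sequence, the helper name manual_check and the RUN variable are assumptions; the mount point and device are the ones from the log output. By default the script only prints the commands.

```shell
#!/bin/sh
# Assumed sequence for a manual foreground check; the ticket only says that
# fsck was run manually with the device unmounted.
# RUN is a hypothetical dry-run switch: it defaults to "echo" so nothing is
# executed. Set RUN="" (empty) to actually run the commands as root.
RUN="${RUN-echo}"

manual_check() {
    mnt="$1"
    dev="$2"
    # Unmount first so no snapshot is needed, check in the foreground
    # (no -B), then mount again only if the check succeeded.
    $RUN umount "$mnt" &&
    $RUN fsck_ufs -p "$dev" &&
    $RUN mount "$dev" "$mnt"
}

manual_check /mnt/RESERVE /dev/ad6p1
```

Chaining with && stops the sequence at the first failing step, so a filesystem that fails the check is left unmounted for further inspection.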

     
  • Jerome Warnier

    Jerome Warnier - 2008-02-25

    Logged In: YES
    user_id=149431
    Originator: NO

    I noticed a huge slowdown during background fsck on FreeNAS 0.686 last Friday.
    There are two volumes on my machine: one of 400G, and the other on an Areca 1120 with 1.4TB attached (5 x 400G RAID5).
    The machine was almost unusable for about 5 minutes, while booted and running.

     
  • Jerome Warnier

    Jerome Warnier - 2008-04-29

    Logged In: YES
    user_id=149431
    Originator: NO

    Background fsck on a UFS filesystem involves creating a snapshot of it, which is really slow on UFS.
    It also causes other problems, such as snapshots not being deleted when the check fails for whatever reason.
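
To look for snapshots left behind by a failed check, a small sketch like the one below could be used. It assumes the snapshot lives at .snap/fsck_snapshot under the mount point, which is the path shown in the error message earlier in this thread; the helper name find_fsck_snapshots is hypothetical. It only lists files and deletes nothing, since a snapshot that exists while a check is still running is not stale.

```shell
#!/bin/sh
# Hypothetical helper: report fsck snapshot files under the given mount
# points. The .snap/fsck_snapshot location is taken from the error message
# quoted in this ticket; nothing is removed.
find_fsck_snapshots() {
    for mnt in "$@"; do
        snap="$mnt/.snap/fsck_snapshot"
        if [ -e "$snap" ]; then
            echo "$snap"
        fi
    done
}

find_fsck_snapshots /mnt/RESERVE
```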

     
