#311 pgld stops working, but process keeps running

PeerGuardian_Linux
closed
jre-phoenix
None
5
2012-11-11
2011-08-01
jre-phoenix
No

In the last time about once a week pgld (current git code) stops working here. I /guess/ this happens since I enabled dbus, Nothing gets added to the logfile any more and (I think) all traffic that is sent to it, simply gets lost.
I get successful returns on sending it a "kill -0" (so it responds). Also sending it a HUP, USR1 and USR2 signal gets a successful "return 0", but nothing seems to actually happen. After stopping and starting it again, everything is ok.

--- System information. ---
Debian wheezy (testing/unstable)
Architecture: amd64
Kernel: Linux 2.6.32-5-amd64

--- Package information. ---
Depends (Version) | Installed
====================================-+-=============
libc6 (>= 2.7) | 2.13-10
libdbus-1-3 (>= 1.0.2) | 1.4.12-5
libnetfilter-queue1 (>= 0.0.15) | 0.0.17-1
libnfnetlink0 (>= 1.0.0) | 1.0.0-1
zlib1g (>= 1:1.1.4) | 1:1.2.3.4.dfsg-3
iptables | 1.4.11.1-3

Below you'll find a backtrace, I've also attached it and posted it at http://pastebin.com/LzMzVBb9. I first hooked into the process (line 1-49), then did a "bt" (line 51-69), and finally pressed enter (line 71-74).

Any ideas?

$ sudo gdb pgld 2639
GNU gdb (GDB) 7.2-debian
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /usr/sbin/pgld...Reading symbols from /usr/lib/debug/usr/sbin/pgld...done.
done.
Attaching to program: /usr/sbin/pgld, process 2639
Reading symbols from /lib/x86_64-linux-gnu/libdbus-1.so.3...Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libdbus-1.so.3.5.7...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libdbus-1.so.3
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libpthread-2.13.so...done.
[Thread debugging using libthread_db enabled]
done.
Loaded symbols for /lib/x86_64-linux-gnu/libpthread.so.0
Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/librt-2.13.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/librt.so.1
Reading symbols from /usr/lib/libnetfilter_queue.so.1...Reading symbols from /usr/lib/debug/usr/lib/libnetfilter_queue.so.1.1.0...done.
done.
Loaded symbols for /usr/lib/libnetfilter_queue.so.1
Reading symbols from /usr/lib/libnfnetlink.so.0...Reading symbols from /usr/lib/debug/usr/lib/libnfnetlink.so.0.2.0...done.
done.
Loaded symbols for /usr/lib/libnfnetlink.so.0
Reading symbols from /usr/lib/libz.so.1...Reading symbols from /usr/lib/debug/usr/lib/libz.so.1.2.3.4...done.
done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libdl-2.13.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libdl.so.2
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.13.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib/x86_64-linux-gnu/gconv/ISO8859-1.so...Reading symbols from /usr/lib/debug/usr/lib/x86_64-linux-gnu/gconv/ISO8859-1.so...done.
done.
Loaded symbols for /usr/lib/x86_64-linux-gnu/gconv/ISO8859-1.so
Reading symbols from /usr/lib/pgl/dbus.so...Reading symbols from /usr/lib/debug/usr/lib/pgl/dbus.so...done.
done.
Loaded symbols for /usr/lib/pgl/dbus.so
__lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
97 ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
in ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
(gdb) bt
#0 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1 0x00007f06498dbf48 in _L_lock_9627 () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f06498da331 in __libc_free (mem=0x7f0649be2e60) at malloc.c:3736
#3 0x00007f06498fb53d in tzset_internal (always=<value optimized="" out="">, explicit=<value optimized="" out="">) at tzset.c:435
#4 0x00007f06498fb679 in __tz_convert (timer=0x7fffff80cec0, use_localtime=1, tp=0x7f0649be63a0) at tzset.c:624
#5 0x0000000000402111 in do_log (priority=6, format=0x406f38 "STATS: Blocked hit statistics:") at src/pgld.c:73
#6 0x000000000040541a in blocklist_stats (clearhits=1) at src/blocklist.c:193
#7 0x000000000040279a in sighandler (sig=10, info=0x7fffff80d170, context=0x7fffff80d040) at src/pgld.c:234
#8 <signal handler="" called="">
#9 malloc_consolidate (av=0x7f0649be2e60) at malloc.c:5150
#10 0x00007f06498d7d64 in _int_malloc (av=0x7f0649be2e60, bytes=32640) at malloc.c:4373
#11 0x00007f06498da420 in __libc_malloc (bytes=32640) at malloc.c:3660
#12 0x00007f0649882b6d in __gconv_open (toset=<value optimized="" out="">, fromset=<value optimized="" out="">, handle=<value optimized="" out="">,
flags=<value optimized="" out="">) at gconv_open.c:284
#13 0x00007f0649882551 in iconv_open (tocode=<value optimized="" out="">, fromcode=0x407083 "ISO8859-1") at iconv_open.c:72
#14 0x000000000040577c in loadlist_ascii (filename=0x7fffff80fe94 "/var/lib/pgl/master_blocklist.p2p", charset=0x407083 "ISO8859-1") at src/parser.c:59
#15 0x000000000040612e in load_list (filename=0x7fffff80fe94 "/var/lib/pgl/master_blocklist.p2p", charset=0x0) at src/parser.c:290
#16 0x0000000000402615 in load_all_lists () at src/pgld.c:203
#17 0x0000000000402884 in sighandler (sig=1, info=0x7fffff80e470, context=0x7fffff80e340) at src/pgld.c:253
---Type <return> to continue, or q <return> to quit---
#18 <signal handler="" called="">
#19 0x00007f064a620470 in __libc_recv (fd=<value optimized="" out="">, buf=0x7fffff80e7d0, n=1500, flags=-1) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:30
#20 0x0000000000403744 in nfqueue_loop () at src/pgld.c:518
#21 0x0000000000404112 in main (argc=14, argv=0x7fffff80eee8) at src/pgld.c:690

Discussion

  • jre-phoenix
    jre-phoenix
    2011-08-01

    pgld backtrace

     
    Attachments
  • jre-phoenix
    jre-phoenix
    2011-08-06

    And here we are again, same stuff happening again :-/

    This seems to happen when I just turn on the computer, and just start
    using it after an hour or so.
    So
    1.) pgl starts up
    2.) the logrotate gets rotated (including a SIGHUP to the daemon), and
    3.) the blocklist update cron job fires up (at least today it started
    after logrotation).

    The problem arises somewhere between 2.) and 3.). Exactly this setup
    worked for years.

    The last entry in pgld's logfile shows the reopening after the logroation:
    #####################################################################
    Aug 6 14:02:29 INFO: Reopened logfile: /var/log/pgl/pgld.log
    #####################################################################

    But I miss some logging as it gets there normally after the
    blocklist-update+reload, like this
    #####################################################################
    Aug 6 15:57:09 INFO: Reopened logfile: /var/log/pgl/pgld.log
    Aug 6 15:57:10 INFO: ASCII: 275900 entries loaded from
    "/var/lib/pgl/master_blocklist.p2p"
    Aug 6 15:57:10 INFO: Blocklist has 275900 IP ranges (1723812739 IPs)
    Aug 6 15:57:10 INFO: Blocklist(s) reloaded.
    #####################################################################

    The last entry in pglcmd's logfile (Note the WARN/INFO lines here result
    from a separate pgld process which is used to merge the single
    blocklists to the master blocklist. This master blocklist is then
    "reloaded" by the actual daemon pgld process (return code 0 here, but
    nothing in pgld logfile, see above!):
    #####################################################################
    2011-08-06 14:03:25 CEST Begin: pglcmd update
    [...]
    Blocklists updated.
    Building blocklist ...
    WARN: Invalid ASCII line:
    INFO: ASCII: 753571 entries loaded from "STDIN"
    INFO: Merged 477671 of 753571 entries.
    INFO: Blocklist has 275900 IP ranges (1723812739 IPs)
    Blocklist built.
    Reloading pgld.
    2011-08-06 14:05:37 CEST End: pglcmd update
    #####################################################################

    Backtrace of a working process looks like this:
    #####################################################################
    0x00007fd87e5c9472 in __libc_recv (fd=<value optimized="" out="">,
    buf=0x7fff004d2710, n=1500, flags=<value optimized="" out="">)
    at ../sysdeps/unix/sysv/linux/x86_64/recv.c:30
    30 ../sysdeps/unix/sysv/linux/x86_64/recv.c: No such file or directory.
    in ../sysdeps/unix/sysv/linux/x86_64/recv.c
    #####################################################################

    Backtraces of the problematic process always has this (see last bt):
    #####################################################################
    __lll_lock_wait_private () at
    ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
    97 ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such
    file or directory.
    in ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
    #####################################################################

    I have no idea if the "No such file or directory." stuff in these
    backtraces is relevant.

     
  • jre-phoenix
    jre-phoenix
    2011-08-10

    It seems this happens whenever I turn on the computer the first time a day (and probably only when I don't start working immediately). On a later reboot I could never observe this.

    TEST IF THIS PROBLEM OCCURS:
    "pglcmd test" results in this output
    ----------------------------------------------------------------------------------
    [...]
    Trying to ping 2.49.109.51 from /var/lib/pgl/master_blocklist.p2p ...
    pgld did not mark the IP to be blocked.
    2.49.109.51 did not answer the ping.

    No clear test result! Trying "tcptraceroute -n -m 2 2.49.109.51 12345" now:
    Selected device wlan0, address 192.168.178.21, port 54332 for outgoing packets
    Tracing the path to 2.49.109.51 on TCP port 12345, 2 hops max
    1 * * *
    2 * * *
    Destination not reached

    Probably 2.49.109.51 is down/doesn't answer to pings.
    This still means that pgld is not working!
    ----------------------------------------------------------------------------------

    WORKAROUND: "pglcmd restart"

    For future reference: this happens with these dependencies:
    linux-image-[...]-amd64 2.6.32-31 and 3.0.0-1
    libdbus-1-3 1.4.12-5 and 1.4.14-1
    iptables 1.4.11.1-3 and 1.4.12-1
    libnetfilter-queue1 0.0.17-1
    libnfnetlink0 1.0.0-1
    libc6 2.13-10
    zlib1g 1:1.2.3.4.dfsg-3
    lsb-base 3.2-27
    pgl: git code since at least July 2011, dbus enabled.