Pgl 2.2.4 and Raspberry Pi

2014-06-04
2014-07-16
  • Hi,

    I've compiled pgl 2.2.4 on my Raspberry Pi (running Raspbian) and it is running mostly fine, except that on list refresh, pgld fails with an error:

    Unbinding from queue '23552', recv returned No buffer space available

    I've read that this is known to happen under load and have seen possible workarounds: pgl is already compiled with the LOWMEM option, but I haven't tried nicing it to a higher priority yet.

    Is there anything else that can be done to refresh pgld reliably?

    Pgld was configured like this:

    ./configure --prefix=/usr --mandir=/usr/share/man --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var --with-lsb=/lib/lsb/init-functions --enable-cron --disable-dbus --enable-logrotate --enable-networkmanager --enable-zlib --without-qt4 --enable-lowmem

    Thank you,
    Marko

     
  • jre-phoenix
    2014-06-04

    First some general, perhaps unrelated, questions:

    How much RAM does your Raspberry Pi have?
    Did you change the NFQUEUE number to 23552? The default is 92.
    Do the lines in your /var/lib/pgl/master_blocklist.p2p also contain descriptions or only IP ranges?

    I didn't write that part of the code, but as far as I understand it:
    You get an error message that comes from handling the packets (normal operation of pgld). So it is not a problem of reloading the new blocklist itself, but of less memory being available for normal operation while the blocklist is loading.

    Setting a lower nice level (in extreme NICE_LEVEL="-20") might help, check it out.

    A workaround would be to stop pgl, then update and then start again. Of course this is not the best solution and leaves you some time unprotected.

    But probably the best tip (original info was found here by user dogg):
    Increase the default receive/send window:
    sysctl -w net.core.rmem_default=8388608
    sysctl -w net.core.wmem_default=8388608
    If this helps, make it permanent by running the above commands somewhere during system start, or e.g. by adding them to /etc/pgl/insert.sh.

    Hope this helps, please report back.

     
  • Hi,

    Pi is a Model B, so 512M RAM.

    I didn't change the NFQUEUE number (wouldn't know how :).

    Master blocklist contains only IPs (I believe that's the --enable-lowmem config option)

    I've set the nice level to -10, we'll see how that goes.

    I'll also try setting receive/send windows and report back...

    Thank you,
    Marko

     
  • Oh, one more thing... Here's a sample of pgld's log when it fails:

    Jun 4 06:25:53 INFO: Reopened logfile: /var/log/pgl/pgld.log
    Jun 4 06:26:11 INFO: ASCII: 227555 entries loaded from "/var/lib/pgl/master_blocklist.p2p"
    Jun 4 06:26:12 INFO: Blocking 227555 IP ranges (2896894064 IPs).
    Jun 4 06:26:12 INFO: Blocklist(s) reloaded.
    Jun 4 06:26:12 ERROR: Unbinding from queue '23552', recv returned No buffer space available
    Jun 4 06:30:26 INFO: Started.
    Jun 4 06:30:37 INFO: ASCII: 218625 entries loaded from "/var/lib/pgl/master_blocklist.p2p"
    Jun 4 06:30:37 INFO: Blocking 218625 IP ranges (2963431412 IPs).
    Jun 4 06:30:37 INFO: NFQUEUE: binding to queue 92
    Jun 4 06:30:37 INFO: ACCEPT mark: 20
    Jun 4 06:30:37 INFO: REJECT mark: 10
    ...

    I checked yesterday's log, it's essentially the same, the error message mentions queue 23552 and when pgld gets restarted, it binds to queue 92.

    I just ran pglcmd update with pgld's nice level set to -10 and it worked. I'll let it run now and check it tomorrow morning.

     
  • Ok, pgld updated two days in a row without problems with nice level set to -10. Today it failed. I have increased send and receive windows, we'll see how that goes.

    Marko

     
  • Cader
    2014-06-09

    I updated pgld to make the error less confusing (I hope)
    For some reason (probably due to big/little endian issues or the way the nfq bind worked) I was converting the input value of the queue number from host to network byte order. Then whenever I used queue_num I converted it back from network to host order. I'm not really sure why I did that.
    I cleaned up the log messages so the numbers should now be right. So the 23552 is really 92 just in the wrong byte order.
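
The relationship between 92 and 23552 can be checked with a small sketch. It assumes a little-endian host (as on Raspbian), where a host-to-network conversion of a 16-bit value amounts to swapping its two bytes. `swap16` is a hypothetical helper for illustration, not a function from pgld.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. On a little-endian host this is
   the effect of htons()/ntohs(); on a big-endian host those calls would
   leave the value unchanged. Hypothetical helper, not part of pgld. */
uint16_t swap16(uint16_t v) {
    return (uint16_t)((v << 8) | (v >> 8));
}
```

92 is 0x005C and 23552 is 0x5C00, so `swap16(92)` gives 23552 and `swap16(23552)` gives 92 again.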

    As far as the no buffer space - I am unsure.

     
  • jre-phoenix
    2014-06-10

    I think this is what happens:

    While the master blocklist gets reloaded pgld obviously can't handle new traffic. Therefore the traffic sent to pgld/NFQUEUE accumulates in the buffer during the reload.
    Now the raspberry pi is probably slow and needs some time to reload the master blocklist. So the buffer fills up before the new master blocklist is fully loaded.

    Therefore we currently have probably two working solutions/workarounds:
    - Increase the priority of pgld, so that the blocklist gets loaded quicker (and pgld can kick in again before the buffer is full)
    - Increase the buffer size (so that we have more time to load the master blocklist before the buffer is full)

    Further things that should be done:
    - Improve efficiency of the blocklist loading code (probably hard to do, don't know)
    - If the buffer is full simply flush it/reject new incoming traffic instead of having pgld unbind from nfqueue. (Currently the traffic waiting in the buffer for pgld is lost anyway if this happens.)
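
The reasoning above can be turned into a rough back-of-the-envelope check. The sketch below only divides a queue capacity by a packet rate; both the function name and any numbers fed into it are illustrative assumptions, and it ignores that the kernel may also run out of socket buffer bytes before the packet count is reached.

```c
/* Seconds until a queue that holds max_packets fills up when packets
   arrive at packets_per_second and nothing is dequeued (as during a
   blocklist reload). Returns -1.0 if no packets arrive at all.
   Hypothetical helper for illustration only. */
double seconds_until_full(unsigned int max_packets, double packets_per_second) {
    if (packets_per_second <= 0.0) {
        return -1.0; /* the queue never fills */
    }
    return (double)max_packets / packets_per_second;
}
```

For example, with an assumed capacity of 1024 packets and an assumed 100 packets per second, the queue would be full after about 10 seconds. The log excerpt earlier in the thread shows roughly 18 seconds between "Reopened logfile" and the blocklist being loaded, so a reload of that length would overflow such a queue at that rate.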

    @Cader:
    What do you think? Can you implement the latter thing (flush/reject)?

    @Marko Bozikovic:
    What's your experience with the increased windows? (Do you apply this permanently, by putting the command in a file like I proposed?)

     
  • Cader
    2014-06-10

    That sounds like it could very well be an issue.
    I was poking around the netfilter site again the other day and saw a queue length option.
    I didn't see what the default is, so I have to find that, and then maybe increasing it would help. One thing I also have to verify: I think this requires kernel 2.6.25+ or so to be able to modify the kernel queue from user space. So that may be an issue.

    int nfq_set_queue_maxlen(struct nfq_q_handle *qh, u_int32_t queuelen)
    nfq_set_queue_maxlen - Set kernel queue maximum length parameter

    Parameters:
    qh Netfilter queue handle obtained by call to nfq_create_queue().
    queuelen the length of the queue
    Sets the size of the queue in kernel. This fixes the maximum number of packets the kernel will store before internally before dropping upcoming packets.

    Returns:
    -1 on error; >=0 otherwise.
    Definition at line 610 of file libnetfilter_queue.c.

     
  • Cader
    2014-06-10

    I am thinking of adding a -Q option that, if set, will set the queue length.
    That way you can tune it to the needs of your machine rather than hard code something.
    Still looking for default and if there is a /proc or /sys file that shows current size.

    Thoughts?

     
  • Cader
    2014-06-10

    I added a -Q option to pgld to allow for tuning the kernel packet queue in the latest git.
    @JRE do you want to add that as an option to pglcmd?

    I have no clue (yet) what the default is, so if a value is set, pass it with -Q; otherwise don't pass -Q to pgld and it will use whatever the default is.

    Again this requires a 2.6.20+ kernel but guessing that should be an issue.

     
  • jre-phoenix
    2014-06-11

    Sorry, I answered by email but that seems not to work, so I'm reposting 2 older posts and a current one:

    Will that be an issue to require 2.6.20? That is fairly old...

    No problem at all, 2.6.20 is from 2007. We already have a requirement of
    Linux kernel >= 2.6.13 for NFQUEUE support, so we'll just increase that
    (edit the INSTALL file in the requirements section).

    Besides that currently I don't fully understand you. remember "I'm not a
    programmer"
    Just go, we'll test afterwards.

     
  • jre-phoenix
    2014-06-11

    On 06/10/2014 04:38 PM, Cader wrote:

    I am thinking of adding a -Q option that, if set, will set the queue length.
    That way you can tune it to the needs of your machine rather than hard code something.
    Still looking for default and if there is a /proc or /sys file that shows current size.

    Thoughts?

    -Q option sounds good (perhaps add a hint to it in the relevant error
    message).

    A quick grep for some keywords in /proc and /sys didn't show me any
    results (but for other kernel modules I always found something in
    /proc/net).

     
  • jre-phoenix
    2014-06-11

    Hi

    On 06/10/2014 09:10 PM, Cader wrote:

    I added a -Q option to pgld to allow for tuning the kernel packet queue in the latest git.
    @JRE do you want to add that as an option to pglcmd?

    Great, for sure I will do this (today or tomorrow).

    I have no clue (yet) what the default is, so if a value is set, pass it with -Q; otherwise don't pass -Q to pgld and it will use whatever the default is.

    Yes, that's how I'll implement that. (But anyway I'm still searching the
    web to better understand this stuff)

    Again this requires a 2.6.20+ kernel but guessing that should be an issue.

    NOT an issue

    Now the big question is, whether this is really the right solution for
    the original problem. And if yes, what values should be used - I assume
    the valid range (because of uint32) is 0 - 4294967295, right?

    In a previous message I recommended these commands:

    sysctl -w net.core.rmem_default=8388608
    sysctl -w net.core.wmem_default=8388608
    

    Do they do the same thing? Can we recommend "pgld -Q 8388608"?

    @Marko Bozikovic: Did you try the sysctl commands?

    Further thoughts to improve this:

    I found this:

    "Too slow verdict [from the user space application] will result in a
    full queue. Kernel will then drop incoming packets instead of en-queuing
    them."
    (http://home.regit.org/netfilter-en/using-nfqueue-and-libnetfilter_queue/)

    So if the queue buffer is full, new packets will simply be dropped, right!?

    So in pgld.c in the nfqueue_loop function in that part (line 550):

        int err=errno;
        do_log(LOG_ERR, "ERROR: Unbinding from queue '%hu', recv returned %s", queue_num, strerror(err));
        if ( err == ENOBUFS ) {
            /* close and return, nfq_destroy_queue() won't work as we've no buffers */
            nfq_close(nfqueue_h);
            exit(1);
        } else {
            nfqueue_unbind();
            exit(0);
        }
    

    ... I don't understand why to close the queue and exit. Instead pgld
    might simply continue working and the kernel would just drop packets if
    the buffer is full because pgld is too slow.

    So I propose to remove the close & exit part.

    Independently I propose to add a log message here like "The queue
    buffer is full, consider increasing its size with pgld -Q ..."

     
  • Cader
    2014-06-11

    sysctl -w net.core.rmem_default=8388608
    sysctl -w net.core.wmem_default=8388608

    These are for the interface buffers in bytes. If the issue is the netfilter buffer these sysctl commands won't help

    I made the change to not exit on ENOBUFS and log it with the "use -Q option" so we can see if we are on the right track here.
    I also added a log message for anything other than ENOBUFS

    while ((rv = recv(fd, buf, sizeof(buf), 0)) >= 0) {
        nfq_handle_packet(nfqueue_h, buf, rv);
    }
    int err=errno;
    if ( err == ENOBUFS ) {
        do_log(LOG_ERR, "ERROR: ENOBUFS error on queue '%hu'. Use -Q to increase buffers, recv returned %s", queue_num, strerror(err));
    } else {
        do_log(LOG_ERR, "ERROR: Error on queue '%hu', recv returned %s", queue_num, strerror(err));
        nfqueue_unbind();
        exit(0);
    }
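
The "log and keep going" behaviour discussed here can be sketched without libnetfilter_queue. The sketch assumes the receive loop is re-entered after an ENOBUFS error rather than falling through. `fake_recv`, `fake_source`, `loop_stats` and `run_loop` are hypothetical stand-ins for illustration and are not taken from pgld.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for recv(): replays a scripted list of results.
   A non-negative entry is a byte count, a negative entry is an errno
   value to fail with. When the script runs out it fails with EBADF. */
struct fake_source {
    const int *script;
    size_t len;
    size_t pos;
};

static ssize_t fake_recv(struct fake_source *src) {
    if (src->pos >= src->len) {
        errno = EBADF;
        return -1;
    }
    int v = src->script[src->pos++];
    if (v < 0) {
        errno = -v;
        return -1;
    }
    return v;
}

struct loop_stats {
    int handled;    /* packets passed on for a verdict */
    int overflows;  /* ENOBUFS errors that were logged and skipped */
    int last_errno; /* the error that finally ended the loop */
};

/* Keep reading after ENOBUFS; stop on any other error. */
struct loop_stats run_loop(struct fake_source *src) {
    struct loop_stats st = {0, 0, 0};
    for (;;) {
        ssize_t rv = fake_recv(src);
        if (rv >= 0) {
            st.handled++;   /* real code would call nfq_handle_packet() */
            continue;
        }
        if (errno == ENOBUFS) {
            st.overflows++; /* real code would log the "use -Q" hint */
            continue;
        }
        st.last_errno = errno;
        return st;          /* real code would unbind and exit */
    }
}
```

With a script such as `{100, 60, -ENOBUFS, 1500, -ENOBUFS, 40}`, the loop handles four packets, skips two overflows, and only stops on the final non-ENOBUFS error.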
    
     
    Last edit: Cader 2014-06-11
  • jre-phoenix
    2014-06-12

    1.)

    On 06/11/2014 04:14 PM, Cader wrote:

    sysctl -w net.core.rmem_default=8388608
    sysctl -w net.core.wmem_default=8388608

    These are for the interface buffers in bytes. If the issue is the netfilter buffer these sysctl commands won't help

    I /try/ to understand that stuff. But for me the following two sound quite similar (of course this doesn't mean they are for the same thing):

    sysctl man page:
    sysctl is used to modify kernel parameters at runtime. The parameters available are those listed under /proc/sys/.

    http://www.netfilter.org/projects/libnetfilter_queue/doxygen/group__Queue.html:
    nfq_set_queue_maxlen - Set kernel queue maximum length parameter
    Sets the size of the queue in kernel. This fixes the maximum number of packets the kernel will store before internally before dropping upcoming packets.

    2.)

    I implemented the stuff in pglcmd. Set it with NFQUEUE_MAXLEN="value" in pglcmd.conf.
    For now I chose as valid values 0 - 2147483647 (2^31 - 1). For higher values I got stuff like this in the pgld.log:

    Jun 12 02:29:17 INFO: Kernel queue maximum length: 4294967295
    Jun 12 02:29:17 INFO: ACCEPT mark: 20
    Jun 12 02:29:17 INFO: REJECT mark: 10
    Jun 12 02:29:17 INFO: Set netfilter queue length to -1 packets
    

    Is this correct or doesn't pgld show the correct number with %d?

    3.)

    I started pgld with -Q 0. But this seems to be a noop, at least nothing in pgld.log. But "pglcmd status" shows the option was passed correctly:

    PID: 12343    CMD: /usr/sbin/pgld -l /var/log/pgl/pgld.log -d -p /var/run/pgld.pid -q 92 -Q 0 -r 10 -a 20 /var/lib/pgl/master_blocklist.p2p
    

    4.)

    I started pgld with -Q 1. But even if I reload a bunch of websites at the same time I just get hundreds of blocks in a few seconds shown in pgld.log. But no errors.

    Shouldn't such a small value trigger your new error messages?

     
    • Cader
      2014-06-12

      1) Yes, sysctl controls many kernel params, but the rmem and wmem buffers are for the interface. This kernel queue should be the netfilter queue, so it is a different queue.

      2) the queue_length is an unsigned int so you can go to 4,294,967,295 packets.
      %d is for signed int. Use %u for unsigned int with printf

      3) if (queue_length) is what is doing it - if queue_length is 0 that means false therefore wouldn't print or even go into the nfq_set_queue_maxlen block since I have that as "if ( queue_length > 0) {".

      4) Yeah I tried to set to 1 as well and was hoping to get the error too but didn't. I don't have a Pi or anything really slow to test with so I am just stabbing at the error. I wonder if there is some sanity in the function to not go below a certain packet threshold as a buffer of 1 wouldn't be very sane.

      Just did a quick search and see the error in other stuff and the explanation is what we think is the issue - no kernel buffer left. So I think we are on the right track at least. Reproducing it is the hard part now.
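
The %d versus %u point from item 2 can be reproduced in isolation. The sketch below assumes a platform with 32-bit two's-complement int (the common case, including the Pi). `format_signed` and `format_unsigned` are hypothetical helpers, not functions from pgld.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a queue length through a signed int and %d, as the original log
   line effectively did. The cast makes the reinterpretation explicit; on
   the assumed platforms 4294967295 comes out as -1. */
int format_signed(char *out, size_t n, uint32_t queue_length) {
    return snprintf(out, n, "%d", (int)queue_length);
}

/* Format the same value with an unsigned conversion, as in the fix. */
int format_unsigned(char *out, size_t n, uint32_t queue_length) {
    return snprintf(out, n, "%" PRIu32, queue_length);
}
```

Called with 4294967295, the first helper writes "-1" and the second writes "4294967295", which matches the two log lines quoted above.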

       
  • Cader
    2014-06-12

    ahh cripes that log line was my mistake - not sure why I did %d and not %u.
    fixed it so it should log right
    I removed the debug set lines too since it should log correctly now.

     
  • Hi guys,

    Sorry for a late answer... I've been running pgld on RPi with increased buffers, and had no errors in the past 5 days, even with high network traffic.

    @jre-phoenix I haven't applied the change permanently yet. I'll try to test this over the weekend. Also, I can try playing with the git version, but I can't promise exactly when.

    FWIW, I like the idea of including a command line option, or even a config option. Just like the NICE option, it will probably only be used on low-end hardware, like RPi.

    Cheers,
    Marko

     
  • jre-phoenix
    2014-06-12

    I adapted pglcmd to the correct range values. So from my point of view we just need some testing now in order to give correct advice for people with

    @Marko: When you find time to test the current version, please reset the nice setting and go with the default sysctl buffers. Just use the new option for now. I suggest starting with the values that already seem to work with the sysctl commands.
    So just set in pglcmd.conf NFQUEUE_MAXLEN="8388608". Then restart your machine so that everything else goes back to its defaults.
    If you experience errors then you may increase the value up to 4,294,967,295 (although I think that this is too high on your machine with 512 MB RAM).

    Generally I assume all these values are in bytes and are limited first by the unsigned int (4,294,967,295) and second by the available RAM (in this case something below 512,000,000).

    I'd still like to know the current default for maxlen and where to find it in /proc

     
  • Cader
    2014-06-12

    I would say that 8388608 is way too high.
    This number is in packets not bytes.
    "Sets the size of the queue in kernel. This fixes the maximum number of packets the kernel will store before internally before dropping upcoming packets"
    Since each packet could be up to 1500 bytes you would be looking at 12GB of mem.

    I would start with 2048 or 4096.
    The default might be 1000

    I am looking more to find a /proc or /sys file with the value
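
The memory estimate above is a plain multiplication of queue length by packet size. A minimal sketch, assuming 1500 bytes per packet as in the post and ignoring any per-packet kernel overhead; `worst_case_queue_bytes` is a hypothetical helper name.

```c
#include <stdint.h>

/* Upper bound on the payload bytes a full queue could hold, assuming
   every queued packet has bytes_per_packet bytes. Uses 64-bit math so
   large queue lengths do not overflow. Hypothetical helper. */
uint64_t worst_case_queue_bytes(uint32_t max_packets, uint32_t bytes_per_packet) {
    return (uint64_t)max_packets * bytes_per_packet;
}
```

For 8388608 packets at 1500 bytes this gives 12,582,912,000 bytes (about 12 GB), while 4096 packets come to 6,144,000 bytes (about 6 MB).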

     
  • @jre-phoenix are the changes on master, or another branch?

     
  • Cader
    2014-06-13

    The changes should all be in master.

     
  • Ok, I ran autogen.sh and configure successfully, I get the following error on make:
    /home/pi/smece/peerguardian-code/pgl/pgld/src/pgld.c:418: undefined reference to `nfq_set_verdict2'
    /home/pi/smece/peerguardian-code/pgl/pgld/src/pgld.c:396: undefined reference to `nfq_set_verdict2'
    /home/pi/smece/peerguardian-code/pgl/pgld/src/pgld.c:454: undefined reference to `nfq_set_verdict2'
    /home/pi/smece/peerguardian-code/pgl/pgld/src/pgld.c:443: undefined reference to `nfq_set_verdict2'
    /home/pi/smece/peerguardian-code/pgl/pgld/src/pgld.c:465: undefined reference to `nfq_set_verdict2'

    I have libnetfilter-queue-dev package installed, I think nfq_set_verdict2 function should be declared there...

     
  • jre-phoenix
    2014-06-14

    1.) @Marko Bozikovic
    I just reverted the change that introduced nfq_set_verdict2 and pushed it to a new branch "pgl_nfq_set_verdict". I hope I did it right since I had changed something else in the same code area in the meantime. At least "pglcmd test" worked here, though.

    2.) @Marko Bozikovic
    nfq_set_verdict_mark() was deprecated in favour of nfq_set_verdict2() on 2010-05-09. So I guess at least since libnetfilter-queue 1.0.0 this function was available. Marko, which version of this library do you have installed?

    I never had a problem compiling with the new version here.

    3.)
    I found a page about the ENOBUFS:
    http://www.netfilter.org/projects/libnetfilter_queue/doxygen/index.html

    There are some tips to avoid them (e.g. set NICE to -20), but also:

    • increase the default socket buffer size by means of nfnl_rcvbufsiz()
      --> I guess that's something similar to what we try with maxlen, but it's not the same

    • set NETLINK_NO_ENOBUFS socket option to avoid receiving ENOBUFS errors (requires Linux kernel >= 2.6.30).
      --> Sounds interesting, but I got no real clue what that means exactly and how to implement
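
As a rough idea of what the NETLINK_NO_ENOBUFS tip might look like in code, the sketch below sets that option on a netlink socket descriptor with setsockopt(). It assumes Linux with kernel headers that define NETLINK_NO_ENOBUFS. `disable_enobufs` is a hypothetical helper, and whether this is the right fix for pgld is untested.

```c
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270 /* fallback for older C library headers */
#endif

/* Ask the kernel not to report ENOBUFS on this netlink socket.
   Returns 0 on success and -1 on error (errno set), like setsockopt().
   Hypothetical helper, not part of pgld. */
int disable_enobufs(int fd) {
    int on = 1;
    return setsockopt(fd, SOL_NETLINK, NETLINK_NO_ENOBUFS, &on, sizeof(on));
}
```

In pgld the descriptor would presumably come from `nfq_fd(nfqueue_h)`, the same descriptor that is passed to recv(). With this option set, overflowing packets are dropped silently, so a separate way of noticing drops would be needed.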

     