
wrong number of concurrent connections in combination with QS_ClientPrefer

2018-02-05
2018-02-12
  • Armin Abfalterer

    Hi

    We have a problem with an httpd proxy server (httpd 2.4.25 with mod_qos 11.39): under heavy load it blocks all incoming connections as soon as the QS_ClientPrefer limit is reached. It seems the "concurrent connections" counter is not decremented correctly.

    When the first mod_qos(066) event was triggered, the number of concurrent connections was 959. From that point on the counter only went up (to as much as 13431), even though MaxRequestWorkers is 1024.

    [Tue Jan 23 13:42:02 2018] [error] mod_qos(066): access denied, QS_ClientPrefer rule (penalty=4 0x00): max=819, concurrent connections=959, c=xxx #77683(2642242416)
    [Tue Jan 23 13:42:06 2018] [error] mod_qos(066): access denied, QS_ClientPrefer rule (penalty=4 0x00): max=819, concurrent connections=959, c=xxx #78814(2768120688)
    [Tue Jan 23 13:42:11 2018] [error] mod_qos(066): access denied, QS_ClientPrefer rule (penalty=4 0x00): max=819, concurrent connections=983, c=xxx #79360(2820569968)
    
    [Wed Jan 24 00:55:40 2018] [error] mod_qos(066): access denied, QS_ClientPrefer rule (penalty=6 0x04): max=819, concurrent connections=13420, c=xxx #78814(2631752560)
    [Wed Jan 24 00:58:40 2018] [error] mod_qos(066): access denied, QS_ClientPrefer rule (penalty=6 0x04): max=819, concurrent connections=13431, c=xxx #78814(2579303280)
    [Wed Jan 24 00:59:40 2018] [error] mod_qos(066): access denied, QS_ClientPrefer rule (penalty=6 0x04): max=819, concurrent connections=13431, c=xxx #78814(2862529392)
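
A minimal sketch of how the plausibility check described above could be automated: it extracts the "concurrent connections=" value from a mod_qos(066) log line and compares it with MaxRequestWorkers. The function names `qos_logged_connections` and `qos_counter_implausible` are hypothetical helpers, not part of mod_qos, and the check assumes (as the post does) that the counter should never exceed the worker limit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the number that follows
 * "concurrent connections=" in a log line, or -1 if it is missing. */
static long qos_logged_connections(const char *line) {
    static const char key[] = "concurrent connections=";
    const char *p;
    long value;

    assert(line != NULL);
    p = strstr(line, key);
    if (p == NULL) {
        return -1;
    }
    /* sizeof(key) - 1 is the key length without the trailing NUL. */
    if (sscanf(p + sizeof(key) - 1, "%ld", &value) != 1) {
        return -1;
    }
    return value;
}

/* Hypothetical helper: returns 1 if the logged counter exceeds the
 * configured worker limit, which would suggest a leaked counter. */
static int qos_counter_implausible(const char *line, long max_request_workers) {
    return qos_logged_connections(line) > max_request_workers;
}
```

Applied to the excerpt above with a limit of 1024, the first line (959) would pass the check and the last line (13431) would be flagged.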
    

    Regards, Armin

     
  • Pascal Buchbinder

    Yes, Apache 2.4 (especially when using the MPM event module) sometimes waits a few seconds before calling the connection cleanup method (which frees the allocated memory and decrements the counter). This may be why you see a higher number of connections when many connections are opened and closed within a very short period of time.

     

    Last edit: Pascal Buchbinder 2018-02-05
    • Armin Abfalterer

      Hi Pascal, thanks for your reply!

      Ok, but on this system the "concurrent connections" counter didn't go down any more, so that all further connections were blocked. As you can see in the log excerpt the time span was over 10 hours.

      The only place in the code that decrements the counter is in qos_cleanup_conn():

          if((m_generation != u->qos_cc->generation_locked) && u->qos_cc->connections > 0) {
            u->qos_cc->connections--;
          }
      

      I could imagine something went wrong with the generation counter m_generation. What do you think?

      BTW, the proxy-server uses worker MPM.

       
      • Pascal Buchbinder

        The generation id is set when starting or restarting (usr1) the server, and each forked worker process uses the id to verify that the shared memory belongs to the same version (a new shared memory structure is created at usr1 before forking new child processes). The cause might be that this check is only present when decrementing the counter but not when incrementing it (I would be very surprised if a dying process still got new connections, but maybe this changed at some point in an Apache 2.4 version). The other possibility is process crashes, but I assume you have already checked the log to rule that out, haven't you?
        I'm going to write some test cases (particularly for Apache 2.4) and will introduce the generation check at counter increment - just to be sure....

         
        • Pascal Buchbinder

          This patch could be worth trying to see whether it fixes the problem:

          Index: httpd_src/modules/qos/mod_qos.c
          ===================================================================
          --- httpd_src/modules/qos/mod_qos.c (revision 2363)
          +++ httpd_src/modules/qos/mod_qos.c (working copy)
          @@ -2714,7 +2714,8 @@
               apr_global_mutex_lock(u->qos_cc->lock);          /* @CRT37 */
               u->qos_cc->connections = 0;
               if(m_generation > 0) {
          -      u->qos_cc->generation_locked = m_generation; // this process generation must not dec. anymore
          +      // this process generation must not update the connections counter anymore
          +      u->qos_cc->generation_locked = m_generation;
               }
               entry = u->qos_cc->ipd;
               for(i = 0; i < u->qos_cc->max; i++) {
          @@ -6351,7 +6352,9 @@
          
               /* max connections */
               if(cconf->sconf->has_qos_cc && cconf->sconf->qos_cc_prefer) {
          -      u->qos_cc->connections++;
          +      if(m_generation != u->qos_cc->generation_locked) {
          +        u->qos_cc->connections++;
          +      }
                 if((*e)->lowrate) {
                   if(c->notes) {
                     char *flags = apr_psprintf(c->pool, "0x%02x", (*e)->lowratestatus);
          
           
          • Armin Abfalterer

            OK, with that patch the counter might no longer be incremented. But the problem that it isn't counted down might still exist, or how do you see it?

             
            • Pascal Buchbinder

              If we don't increment, we don't have to decrement.
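
The argument in this sub-thread can be illustrated with a toy model, given here only as a sketch. It assumes a process whose generation has been locked still receives connections after the counter was reset; the type `cc_model_t` and the functions `cc_reset_for_restart`, `cc_open`, `cc_close`, and `cc_simulate` are hypothetical stand-ins, not mod_qos code. Only the two guard conditions mirror the snippets quoted in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared counter (not the real mod_qos type). */
typedef struct {
    int connections;        /* models u->qos_cc->connections */
    int generation_locked;  /* models u->qos_cc->generation_locked */
} cc_model_t;

/* Models the reset shown in the patch context: the counter is zeroed and
 * the given generation is locked out of further counter updates. */
static void cc_reset_for_restart(cc_model_t *cc, int generation) {
    assert(cc != NULL);
    cc->connections = 0;
    if (generation > 0) {
        cc->generation_locked = generation;
    }
}

/* New connection. With guard_increment == false this models the code
 * before the patch (unconditional ++), with true the patched code. */
static void cc_open(cc_model_t *cc, int generation, bool guard_increment) {
    if (!guard_increment || generation != cc->generation_locked) {
        cc->connections++;
    }
}

/* Connection cleanup, guarded as in the quoted qos_cleanup_conn() snippet. */
static void cc_close(cc_model_t *cc, int generation) {
    if (generation != cc->generation_locked && cc->connections > 0) {
        cc->connections--;
    }
}

/* Assumption: a process of the locked generation still serves
 * late_connections connections. Returns the counter left behind. */
static int cc_simulate(bool guard_increment, int late_connections) {
    cc_model_t cc = {0, 0};
    const int old_generation = 1;
    int i;

    cc_reset_for_restart(&cc, old_generation);
    for (i = 0; i < late_connections; i++) {
        cc_open(&cc, old_generation, guard_increment);
        cc_close(&cc, old_generation);
    }
    return cc.connections;
}
```

In this model, `cc_simulate(false, 5)` leaves the counter at 5 because every increment is counted but every decrement is skipped, while `cc_simulate(true, 5)` leaves it at 0 because neither side touches the counter. Whether a locked generation really receives new connections on the affected server is not established in the thread.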

               
      • Pascal Buchbinder

        Questions:
        1. you don't see child processes terminating unexpectedly (child pid ... exit signal)?
        2. are you doing graceful restarts when this happens (kill -usr1)?
        3. what's your MaxConnectionsPerChild setting?

         

        Last edit: Pascal Buchbinder 2018-02-06
        • Armin Abfalterer

          1. no, there were no unexpected child exits
          2. yes, the system was up about 30 days and there were some graceful restarts in between (due to configuration changes)
          3. MaxConnectionsPerChild is set to 1000000

          btw, I did some testing with graceful restarts and forced child exits but couldn't reproduce the misbehavior. I think massive load plays an important role here...

           

          Last edit: Armin Abfalterer 2018-02-08
          • Pascal Buchbinder

            I suggest you try applying the patch shown above. I've tested and committed the change but have not yet built a new release. I could not reproduce the issue either, but I assume the change does not make anything worse.

            https://sourceforge.net/p/mod-qos/source/2367/tree//trunk/httpd_src/modules/qos/mod_qos.c?diff=2359

            I also propose to run the server in the QS_LogOnly mode until you are sure the problem is solved by this change.

             

            Last edit: Pascal Buchbinder 2018-02-08
  • Armin Abfalterer

    Pascal, yes - the proxy uses mod_http2.

     
  • Pascal Buchbinder

    I've now released mod_qos 11.49 with the intention of improving h2 compatibility.

     

    Last edit: Pascal Buchbinder 2018-02-10
    • Armin Abfalterer

      I did some tests with HTTP/2 and could see that the "connection counter" didn't go back to zero when there were no more connections on the proxy. However, I still couldn't reproduce that the counter is constantly growing, even with HTTP/2.

      Anyway, thanks for the improvements... I'll give 11.49 a try.

       
