
QS_EventRequestLimit by envvar value rather than pattern / limiting distributed download accelerators

2022-07-05
2022-07-07
  • Matt Blissett

    Matt Blissett - 2022-07-05

    Hi,

    I have an Apache server containing about 300TB of files, ranging in size
    from 500kB or so to 800GB. It's public data, so there are no restrictions
    on downloading.

    Some users use download accelerators and so on to download these files.
    I've used a simple "QS_ClientEventRequestLimit 6" to limit this.

    Recently, I've seen users in China use some sort of distributed downloading
    thing -- yesterday a 600GB file received over 20,000 requests with Range
    headers, but the requests came from 240 IPs.

    All these requests have an HTTP Range header, so limiting the number of
    concurrent with-Range requests to each location would be good. Something
    like this:

    # Set only for Range requests
    SetEnvIf Range .+ QS_Range=1
    # All requests
    SetEnvIf Request_URI "(.*)$" QS_Location=$1
    # Only for Range requests
    QS_SetEnv QS_RangeRequest "${QS_Range} ${QS_Location}"
    QS_EventRequestLimit QS_RangeRequest 6
    

    But this would set a global limit of 6 Range requests at a time, even if
    they are on different files (i.e. probably different people).

    Instead of QS_EventRequestLimit counting requests having the same
    environment variable, or a pattern on that variable, I'd need it to use the
    value of that variable.
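
The difference between counting by variable presence and counting by variable value can be sketched in a few lines of Python. This is only an illustration of the requested behaviour, not how mod_qos works internally; `PerValueLimiter` and `range_key` are hypothetical names, and the key format simply mirrors the "${QS_Range} ${QS_Location}" string from the configuration above.

```python
from collections import Counter


class PerValueLimiter:
    """Hypothetical limiter: one concurrency counter per distinct variable value."""

    def __init__(self, limit):
        self.limit = limit
        self.active = Counter()

    def acquire(self, value):
        # A request without the variable is not counted at all.
        if value is None:
            return True
        if self.active[value] >= self.limit:
            return False
        self.active[value] += 1
        return True

    def release(self, value):
        # Called when a counted request finishes.
        if value is not None and self.active[value] > 0:
            self.active[value] -= 1
            if self.active[value] == 0:
                del self.active[value]


def range_key(uri, headers):
    """Build the key only when a Range header is present (assumed format)."""
    return f"1 {uri}" if headers.get("Range") else None


limiter = PerValueLimiter(limit=6)
key_a = range_key("/big/a.zip", {"Range": "bytes=0-99"})
key_b = range_key("/big/b.zip", {"Range": "bytes=0-99"})

# Seven parallel Range requests for the same file: the seventh is refused.
results_a = [limiter.acquire(key_a) for _ in range(7)]
# A Range request for a different file has its own counter.
first_b = limiter.acquire(key_b)
# A request without a Range header is never limited.
plain = limiter.acquire(range_key("/big/a.zip", {}))
```

With a single global counter, the request for `/big/b.zip` would have been refused as well, which is the problem described above.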

    Please consider this a feature request, or else a request for another way
    to accomplish this.

    Thanks,

    Matt

     

    Last edit: Matt Blissett 2022-07-05
    • Pascal Buchbinder

      Thank you for sharing your observation and ideas.

      20'000 requests by 240 clients means an average of about 83 requests each to download the 600GB, each request loading about 7GB. This does not look wrong to me. I could also imagine even more but smaller requests.
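
The estimate above can be reproduced with a short calculation. The sketch assumes, as the 7GB figure implies, that every client fetches the whole file; the last line shows the alternative assumption that the 240 clients share a single copy. Which of the two applies is not known from the thread.

```python
total_requests = 20_000
client_ips = 240
file_size_gb = 600

# Average number of requests per client IP (about 83).
requests_per_client = total_requests / client_ips

# If every client downloads the complete file, each request covers about 7.2GB.
gb_per_request_full_copy = file_size_gb / requests_per_client

# If all clients together download one copy, each request covers about 30MB
# (using 1GB = 1000MB).
mb_per_request_shared = file_size_gb * 1000 / total_requests
```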

      Why do you intend to handle those clients differently than the others?

       
      • Matt Blissett

        Matt Blissett - 2022-07-06

        I don't know what tool was used, but the requests came in parallel -- some multiple (4?) of 240 simultaneous requests. The filesystem with the huge files has poor performance with this.

         
  • Pascal Buchbinder

    Having a rule for every possible RequestURI match would be difficult to implement (I guess we would need a pool of rules that is dynamically reassigned to new requests).

    If you want to count the requests per RequestURI for each client, you could generate a dummy IP address consisting of the client's IP address and the RequestURI using SetHashHeaderPlus and QS_ClientIpFromHeader.

     
    • Pascal Buchbinder

      Sample configuration:

         SetEnvIfPlus               Request_URI (.*) PerLocation=$1
         SetEnvIfPlus               Remote_Addr (.*) PerLocationClient=${PerLocation}$1
         SetHashHeaderPlus          PerLocationClient PerLocationClient
         QS_ClientIpFromHeader      PerLocationClient
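
The idea behind this configuration can be sketched in Python: the request URI and the client address are concatenated, hashed, and the hash is then used as the client identity for all per-client counters. The function names are hypothetical, and SHA-256 is used purely for illustration; the thread does not say which hash SetHashHeaderPlus applies.

```python
import hashlib
from collections import Counter


def pseudo_client_id(request_uri, remote_addr):
    """Hash of URI followed by client address, as in ${PerLocation}$1 (hash choice assumed)."""
    composite = f"{request_uri}{remote_addr}"
    return hashlib.sha256(composite.encode("utf-8")).hexdigest()


def concurrent_per_pseudo_client(active_requests):
    """Count active requests per (URI, client) pair, keyed by the hashed identity."""
    return Counter(pseudo_client_id(uri, ip) for uri, ip in active_requests)


active = [
    ("/data/a.zip", "192.0.2.1"),
    ("/data/a.zip", "192.0.2.1"),
    ("/data/b.zip", "192.0.2.1"),
    ("/data/a.zip", "192.0.2.2"),
]
counts = concurrent_per_pseudo_client(active)
```

In this sketch the same client downloading two different files is counted under two identities, and two clients downloading the same file are also counted separately, so any per-client limit applies to each client and file combination rather than to the file as a whole.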
      
       
