From: Gustaf N. <ne...@wu...> - 2012-12-04 17:21:32
Am 29.11.12 19:51, schrieb Gustaf Neumann:
> I will reduce the writersize further, but still a slow delivery can
> even slow down the delivery of the headers, which still happens in the
> connection thread. Most likely, i won't address this now. however, i
> will check the effects of the fastpath cache, which was deactivated so far...

It turned out that the interactions between the various delivery methods of NaviServer are more differentiated than I expected. As one can see from the call graph below, the content-returning infrastructure of NaviServer is quite differentiated.

* The boxed nodes are the Tcl commands; the pink nodes denote handling by the writer thread.
* The ns_writer command can, of course, be used at the application layer and always attempts delivery via the writer thread.
* The implicit calls (e.g. ns_returnfile) go through FastReturn(), which calls Ns_ConnReturnOpenFd() only when neither caching nor mmap is activated. As soon as one of these is activated, ns_return etc. will never use the writer thread.
* All error messages and redirects take the path via Ns_ConnWriteChars().
* Only content sent via Ns_ConnWriteVChars() has a chance to get compressed.
* Similarly, range requests are handled only when the data is sent via ReturnOpen to the writer queue.
* The nodes with a gray background are marked as deprecated.
* The nodes with a gray font are not called at all.
* The nodes with a blue font are in the driver and were not analyzed further.

To make it short:

* Caching and mmap deactivate the implicit delivery via the writer.
* The non-200 return messages are always delivered via the connection thread; reducing the writersize does not make any difference there.
* There is quite some potential to simplify and orthogonalize the server's infrastructure.
* Improving this structure has nothing to do with naviserver-connthreadqueue and should happen at some point in the main tip.

-gustaf neumann
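To make the application-layer path mentioned above concrete: a handler that wants writer-thread delivery even when caching or mmap would otherwise bypass it can submit the file explicitly. A minimal sketch, assuming a version providing the ns_writer submitfile subcommand; the file path and the fallback delivery are illustrative:

    # Try to hand the file to a writer thread explicitly; the path is
    # illustrative.  ns_writer submitfile is assumed to return 1 when
    # delivery was taken over by a writer thread, and 0 when the
    # caller still has to deliver the content itself.
    set file /var/www/data/large-download.bin
    if {[ns_writer submitfile $file]} {
        # delivered asynchronously; the connection thread is free again
        return
    }
    # fallback: ordinary delivery via the connection thread
    ns_returnfile 200 application/octet-stream $file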
From: Gustaf N. <ne...@wu...> - 2012-12-03 10:39:20
Am 29.11.12 19:51, schrieb Gustaf Neumann:
> However, i am still in the process to clean up and address some
> strange interactions (e.g. for nsssl some socket closing interactions
> between driver and connection threads seem too complex to me), so i am
> still for a while busy with that

The problem in the bad interaction was that the driver poll reported that there should be data to read from the socket, while the nsssl receive said there is none, and vice versa. Since nsssl was defined as sync, the socket processing happens in the connection threads via NsGetRequest(), which frequently returned empty, meaning a useless wakeup and ConnRun etc. in the connection thread. The reason for this was that nsssl sets the socket to non-blocking without defining the driver as ASYNC, so it was possible that a receive returned 0 bytes. Simply setting the driver to ASYNC did not help either, since the driver used its own event loop via Ns_SockTimedWait(), which interacts badly with the driver's poll loop (on Linux it did not realize that a socket became readable, while this worked on Mac OS X). So I changed the receive operation to simply report the read state back to the main driver, which now apparently works fine (tested on Mac OS X and Linux). To be on the safe side, I have tagged the previous version of nsssl on Bitbucket and incremented the version number. This change led to a simplification in the main driver as well.

The new version has been running since Saturday on next-scripting.org and shows even better results. The average queue time is again lower by a factor of 4, and the standard deviation is much better as well.

            accept    queue      run    total
  avg       0.0599   0.0002   0.0183   0.0196
  stdev     0.3382   0.0039   0.0413   0.0530
  min       0.0000   0.0001   0.0011   0.0012
  max      12.1002   0.2324   0.8958   2.0112
  median    0.0088   0.0001   0.0118   0.0120

The accept times now show up since nsssl is async, but they are often not meaningful for performance (due to closewait, the accept might belong to a previous accept operation). All performance measurement now happens relative to queue time (similar to NaviServer in the main tip). Note that this data is based not on a synthetic test but on a real-world web site with potentially different traffic patterns from day to day, so don't draw too many conclusions from it.

By moving from sync to async, some potentially time-consuming processing (waiting for more data) moves from the connection thread to the driver's event loop, so the time span in the connection thread is reduced (for nsssl the same way as before for nssock). Therefore the "runtime" has less randomness and is shorter, leading to the improvements.

The behavior of the async log writer is now configurable (turned off by default); log-rolling and shutdown should now work reliably with it as well. All changes are on Bitbucket (nsssl and naviserver-connthreadqueue). The server now also measures the filter time separately from the runtime, but I am still collecting data on that...

-gustaf neumann
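For readers following along with the driver settings discussed in this thread: the writer parameters are configured per driver section. A sketch of such a section; the parameter names follow the standard NaviServer driver parameters of this era, while the values and the certificate path are examples, not recommendations:

    # Illustrative nsssl driver section; ${server} is the server name
    # defined earlier in the config file.  Values are examples only.
    ns_section "ns/server/${server}/module/nsssl"
    ns_param port           443
    ns_param certificate    /etc/ssl/server.pem   ;# hypothetical path
    ns_param writerthreads  1      ;# number of writer threads for this driver
    ns_param writersize     1000   ;# deliveries above this size go to a writer thread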
From: Gustaf N. <ne...@wu...> - 2012-11-30 13:51:38
Am 29.11.12 20:24, schrieb Jeff Rogers:
> Hi Gustaf,
>
> One quick idea on the writer thread is, regardless of size, to always
> make one write attempt in the conn thread, and if less than the
> complete buffer was written, then pass the remainder off to the writer
> thread. This would get the best of both worlds - fast/small requests
> don't incur the overhead of duplicating the buffers, while large/slow
> requests wouldn't block the conn thread.

Yes, that might be interesting to investigate as well. One has to differentiate between synchronous and asynchronous drivers and between file deliveries and string deliveries, etc., so in the general case it might be some more work.

However, I am still in the process of cleaning up and addressing some strange interactions (e.g. for nsssl, some socket-closing interactions between driver and connection threads seem too complex to me), so I am still busy with that for a while. I hope that I don't create too much collateral damage, and will let you know when the new stuff has stabilized...

-gustaf neumann

--
Univ.Prof. Dr. Gustaf Neumann
Institute of Information Systems and New Media
WU Vienna
Augasse 2-6, A-1090 Vienna, AUSTRIA
From: Maurizio M. <Mau...@sp...> - 2012-11-29 19:30:30
This is good engineering work. I raise my hat to that.

Congrats,
Maurizio

From: Gustaf Neumann [mailto:ne...@wu...]
Sent: 29 November 2012 19:51
To: nav...@li...
Subject: Re: [naviserver-devel] naviserver with connection thread queue
From: Jeff R. <dv...@di...> - 2012-11-29 19:25:17
Gustaf Neumann wrote:
> Funny enough, the delivery of the error message blocked the connection
> thread longer than the delivery of the image when it is above the
> writersize.
>
> I will reduce the writersize further, but still a slow delivery can even
> slow down the delivery of the headers, which happens still in the
> connection thread. Most likely, i won't address this now. however, i
> will check the effects of the fastpath cache, which was deactivated so far...

Hi Gustaf,

This has been some really interesting bits you've been sharing. I keep meaning to add in my thoughts but I haven't had the time.

One quick idea on the writer thread is, regardless of size, to always make one write attempt in the conn thread, and if less than the complete buffer was written, pass the remainder off to the writer thread. This would get the best of both worlds - fast/small requests don't incur the overhead of duplicating the buffers, while large/slow requests wouldn't block the conn thread.

-J
From: Gustaf N. <ne...@wu...> - 2012-11-29 18:51:44
Dear all,

one more update. Here is now the data of a full day with the async-writer thread. The avg queueing time dropped from 0.0547 to 0.0009, the standard deviation from 0.7481 to 0.0211. This is a noticeable improvement.

            accept    queue      run    total
  avg       0.0000   0.0009   0.1304   0.1313
  stdev     0.0000   0.0211   0.9022   0.9027
  min       0.0000   0.0000   0.0032   0.0033
  max       0.0000   0.9772  28.1699  28.1700
  median    0.0000   0.0001   0.0208   0.0209

But still, the sometimes large values are worth investigating. It turned out that the large queueing times came from requests from Taipei, which contained several 404 errors. The size of the 404 response is 727 bytes and therefore under the writersize, which was configured as 1000. The delivery of an error message to this site takes more than a second. Funny enough, the delivery of the error message blocked the connection thread longer than the delivery of the image when it is above the writersize.

I will reduce the writersize further, but still a slow delivery can even slow down the delivery of the headers, which still happens in the connection thread. Most likely, I won't address this now; however, I will check the effects of the fastpath cache, which was deactivated so far...

best regards
-gustaf neumann
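For experimenting with the threshold, the config file is not the only option; some versions expose the writer threshold at runtime. A sketch, assuming the ns_writer size subcommand is available in the running version (it may not be in all releases), with an illustrative value:

    # Query and lower the writer threshold at runtime, so that small
    # responses such as the 727-byte 404s above are also handed to the
    # writer thread.  Assumes "ns_writer size" exists in this version.
    set old [ns_writer size]
    ns_log notice "writersize was $old"
    ns_writer size 500   ;# illustrative value below the 727-byte 404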
From: Gustaf N. <ne...@wu...> - 2012-11-28 10:38:33
Dear all,

here is a short update of the findings and changes since last week.

One can now activate for nslog "logpartialtimes", which adds an entry to the access log containing the partial request times (accept time, queuing time, and run time). The sample site now runs with minthreads == 5, maxthreads == 10.

If one analyzes the data of one day of our sample site, one can see that in this case the accept time is always 0 (this is caused by nsssl, which is a sync driver; header parsing happens solely in the connection thread). The queue time is on average 54 ms (milliseconds), which is huge, while the median is 0.1 ms. The reason is that we can see a maximum queue time of 21 secs! As a consequence, the standard deviation is huge as well. When one looks more closely at the log file, one can see that at times when the queuing time is huge, the runtime is often huge as well, leading to cascading effects as described below. Cause and reason are hard to determine, since we saw large runtimes even on delivery of rather small files.

            accept    queue      run    total
  avg       0.0000   0.0547   0.2656   0.3203
  stdev     0.0000   0.7481   0.9740   1.4681
  min       0.0000   0.0000   0.0134   0.0135
  max       0.0000  21.9594  16.6905  29.1668
  median    0.0000   0.0001   0.0329   0.0330

The causes for the "random" slow cases come from at least two sources: slow delivery (some clients are slow in retrieving data; especially some bots seem to be bad) and system-specific latencies (e.g. backup running, other servers becoming active, etc.). To address this problem, I moved more file deliveries to the writer thread (reducing writersize) and activated TCP_DEFER_ACCEPT. One sees significant improvements, the average queuing time is 4.6 ms, but the standard deviation is still huge, caused by some "erratic" large times.

            accept    queue      run    total
  avg       0.0000   0.0046   0.1538   0.1583
  stdev     0.0000   0.0934   0.6868   0.6967
  min       0.0000   0.0000   0.0137   0.0138
  max       0.0000   4.6041  22.4101  22.4102
  median    0.0000   0.0001   0.0216   0.0217

There are still some huge values, and the queuing time was "lucky" not to be influenced by the still large runtimes of requests. Aside from the top values, one sometimes sees unexplained delays of a few seconds, which seem to block everything.

A large source of latencies is the file system. On our production system we already invested some effort in tuning the file system in the past, so we don't see effects as large as on this sample site. Since NaviServer should work well also on non-tuned file systems, I implemented an async writer thread that decouples writing to files (access.log and error.log) from the connection threads. The interface is the same as unix write(), but it does not block.

I have had this running since yesterday evening on the sample site, and so far it seems to help reduce the random latencies significantly. I'll wait with posting data until I have a similar time range to compare; maybe the good times I am seeing are accidental.

I have not yet looked into the details of handling the async writer within logrotate and shutdown.

On another front, I was experimenting with jemalloc and tcmalloc. Previously we had the following figures, using Tcl as distributed with zippy malloc:

  with minthreads=2, maxthreads=2
  requests 10307 spools 1280 queued 2704 connthreads 11 rss 425

When changing minthreads to 5, maxthreads to 10, but using jemalloc, one sees the following figures:

  requests  3933 spools 188 queued  67 connthreads  7 rss 488
  requests  6325 spools 256 queued  90 connthreads  9 rss 528
  requests 10021 spools 378 queued 114 connthreads 14 rss 530

It is interesting to see that with always 5 connection threads running and using jemalloc, we see an rss consumption only slightly larger than with plain Tcl and zippy malloc having maxthreads == 2, while having fewer requests queued.

Similarly, with tcmalloc we see with minthreads 5, maxthreads 10:

  requests 2062 spools  49 queued   3 connthreads  6 rss 376
  requests 7743 spools 429 queued 359 connthreads 11 rss 466
  requests 8389 spools 451 queued 366 connthreads 12 rss 466

which is even better.

For more information on malloc tests see https://next-scripting.org/xowiki/docs/misc/thread-mallocs or the tcl-core mailing list.

That's all for now

-gustaf neumann
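To reproduce the avg/stdev/min/max/median rows above from one's own logpartialtimes data, a small self-contained helper is enough; the input is assumed to be a Tcl list of times in seconds, already extracted from the access log (the log parsing itself is omitted):

    # Compute the summary statistics used in the tables above from a
    # list of request times in seconds; assumes at least two values.
    proc stats {values} {
        set n [llength $values]
        set sum 0.0
        foreach v $values { set sum [expr {$sum + $v}] }
        set mean [expr {$sum / $n}]
        set ss 0.0
        foreach v $values { set ss [expr {$ss + ($v - $mean)**2}] }
        set sorted [lsort -real $values]
        return [list avg $mean \
                     stdev [expr {sqrt($ss / ($n - 1))}] \
                     min [lindex $sorted 0] \
                     max [lindex $sorted end] \
                     median [lindex $sorted [expr {$n / 2}]]]
    }
    # example usage with a few total times
    puts [stats {0.0135 0.0330 0.3203 29.1668}]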
From: Gustaf N. <ne...@wu...> - 2012-11-20 19:07:40
Dear all,

> The idea of controlling the number of running threads via queuing
> latency is interesting, but i have to look into the details before i
> can comment on this.

Before one can consider controlling the number of running threads via queuing latency, one has to improve the awareness in NaviServer about the various phases in a request's lifetime.

In the experimental version, we now have the following time stamps recorded:

- acceptTime (the time a socket was accepted)
- requestQueueTime (the time the request was queued; was startTime)
- requestDequeueTime (the time the request was dequeued)

The difference between requestQueueTime and acceptTime is the setup cost and depends on the amount of work the driver does. For instance, nssock of NaviServer performs read-ahead, while nsssl does not and passes the connection on right away. So the previously used startTime (which is actually the time the request was queued) was not correct for drivers with read-ahead. In the experimental version, [ns_conn start] now always returns the accept time.

The next paragraph uses the term endTime, which is the time when a connection thread is done with a request (either the content was delivered, or the content was handed over to a writer thread).

The difference between requestDequeueTime and requestQueueTime is the time spent in the queue. The difference between endTime and requestDequeueTime is the pure runtime; the difference between endTime and acceptTime is the totalTime. As a rough approximation, the time between requestDequeueTime and acceptTime is pretty much influenced by the server setup, and the runtime by the application. I used the term "approximation" since the runtime of certain other requests influences the queuing time, as we see in the following.

Consider a server with two running connection threads receiving 6 requests, where requests 2-5 are received within a very short time. The first three requests are directly assigned to connection threads and have fromQueue == 0. These have queuing times between 88 and 110 microseconds, which includes signal sending/receiving, thread change, and initial setup in the connection thread. The runtimes for these requests are pretty bad, in the range of 0.24 to 3.8 seconds elapsed time.

  [1] waiting 0 current 2 idle 1 ncons 999 fromQueue 0 accept 0.000000 queue 0.000110 run 0.637781 total 0.637891
  [2] waiting 3 current 2 idle 0 ncons 998 fromQueue 0 accept 0.000000 queue 0.000090 run 0.245030 total 0.245120
  [3] waiting 2 current 2 idle 0 ncons 987 fromQueue 0 accept 0.000000 queue 0.000088 run 0.432421 total 0.432509
  [4] waiting 1 current 2 idle 0 ncons 997 fromQueue 1 accept 0.000000 queue 0.244246 run 0.249208 total 0.493454
  [5] waiting 0 current 2 idle 0 ncons 986 fromQueue 1 accept 0.000000 queue 0.431545 run 3.713331 total 4.144876
  [6] waiting 0 current 2 idle 0 ncons 996 fromQueue 1 accept 0.000000 queue 0.480382 run 3.799818 total 4.280200

Requests [4, 5, 6] are queued and have queuing times between 0.2 and 0.5 seconds. The queuing times are pretty much the runtimes of [2, 3, 4]; therefore the runtime determines the queuing time. For example, the totalTime of request [4] was 0.493454 secs, half of which it spent waiting in the queue. Request [4] can consider itself happy that it was not scheduled after [5] or [6], where its totalTime would likely have been in the range of 4 secs (10 times slower). Low waiting times are essential for good performance.

This example shows pretty well the importance of async delivery mechanisms like the writer thread or bgdelivery in OpenACS. A file being delivered by the connection thread over a slow internet connection might block later requests substantially (as in the cases above). This is even more important for today's websites, where a single view might entail 60+ embedded requests for js, css, images, ..., where it is not feasible to define hundreds of connection threads.

Before going further into detail, I'll provide further introspection mechanisms in the experimental version:

- [ns_server stats] adding total waiting time
- [ns_conn queuetime] ... time spent in queue
- [ns_conn dequeue] ... time stamp when the request actually starts to run (similar to [ns_conn start])

The first can be used for server monitoring, the next two for single-connection introspection. The queuetime can be useful for better awareness and for optional output in the access log, and the dequeue time-stamp for application-level profiling as the base for a difference with the current time.

Further wishes, suggestions, comments?

-gustaf neumann
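As a sketch of how the proposed introspection could be used from the application layer once available: a trace filter that flags requests with long queue waits in the error log. [ns_conn queuetime] is the command proposed in this mail and is hypothetical until released; the 0.1 s threshold is illustrative:

    # Trace filter logging requests that waited long in the queue.
    # [ns_conn queuetime] follows the proposal above and may differ
    # in the released API; the threshold is illustrative.
    proc log_queuetime {args} {
        set qt [ns_conn queuetime]
        if {$qt > 0.1} {
            ns_log notice "slow queueing: [ns_conn url] waited ${qt}s"
        }
        return filter_ok
    }
    ns_register_filter trace GET /* log_queuetime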
From: Gustaf N. <ne...@wu...> - 2012-11-19 11:30:28
On 18.11.12 20:34, Stephen Deasey wrote:
> On Sun, Nov 18, 2012 at 1:22 PM, Gustaf Neumann <ne...@wu...> wrote:
>> Here are some actual figures with a comparable number of requests:
>>
>> with minthreads==maxthreads==2
>> requests 10182 queued 2695 connthreads 11 cpu 00:05:27 rss 415
>>
>> below are the previous values, completed by the number of queuing
>> operations and the rss size in MB
>>
>> with minthreads=2, create when queue >= 2
>> requests 10104 queued 1584 connthreads 27 cpu 00:06:14 rss 466
>>
>> as anticipated, thread creations and cpu consumption went down, but the
>> number of queued requests (requests that could not be executed
>> immediately) increased significantly.
>
> I was thinking of the opposite: make min/max threads equal by
> increasing min threads. Requests would never stall in the queue,
> unlike the experiment you ran with max threads reduced to min threads.

on the site, we have maxthreads 10, so setting minthreads to 10 as well
has the consequence of a larger memsize (and a substantially reduced
number of queued requests).

> But there's another benefit: unlike the dynamic scenario, requests
> would also never stall in the queue when a new thread had to be
> started when min < max threads.

you are talking about naviserver before 4.99.4. Both the version at the
tip of the naviserver repository and the forked version already provide
warmed-up threads. The version on the main tip starts to listen to the
wakeup signals only once it is warmed up; the version in the fork adds a
thread to the conn queue only after its startup is complete. So, in both
cases there is no stall. In earlier versions, this was as you describe.

> What is the down side to increasing min threads up to max threads?

higher memory consumption, maybe more open database connections,
allocating resources which are not needed. The degree of wastefulness
certainly depends on maxthreads. I would assume that for an admin
carefully watching the server needs, setting minthreads==maxthreads to
the "right value" can lead to slight improvements as long as the load is
rather constant over time.

>> Maybe the most significant benefit of a low maxthreads value is the
>> reduced memory consumption. On this machine we are using plain Tcl
>> with its "zippy malloc", which does not release memory (once allocated
>> to its pool) back to the OS. So, the measured memsize depends on the
>> max number of threads with tcl interps, especially with large
>> blueprints (as in the case of OpenACS).
>
> Right: the max number of threads *ever*, not just currently. So by
> killing threads you don't reduce memory usage, but you do increase
> latency for some requests which have to wait for a thread+interp to be
> created.

not really, given the warm-up feature.

> Is it convenient to measure latency distribution (not just average)? I
> guess not: we record conn.startTime when a connection is taken out of
> the queue and passed to a conn thread, but we don't record the time
> when a socket was accepted.

we could record the socket accept time and measure the difference until
the start of the connection runtime; when we output this to the
accesslog (like logreqtime), we could run whatever statistics we want.

> Actually, managing request latency is another area we don't handle so
> well. You can influence it by adjusting the OS listen socket accept
> queue length, you can adjust the length of the naviserver queue, and
> with the proposed change here you can change how aggressively new
> threads are created to process requests in the queue. But queue-depth
> is a roundabout way of specifying milliseconds of latency. And not
> just round-about but inherently imprecise, as different URLs are going
> to require different amounts of time to complete, and which URLs are
> requested is a function of current traffic. If instead of queue size
> you could specify a target latency then we could maybe do smarter
> things with the queue, such as pull requests off the back of the queue
> which have been waiting longer than the target latency, making room
> for fresh requests on the front of the queue.

The idea of controlling the number of running threads via queuing
latency is interesting, but i have to look into the details before i can
comment on this.

-gustaf
|
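For reference, the fixed-pool setup discussed above maps to two pool
parameters in the config file. A minimal sketch, assuming the standard
server section; the server name and values are placeholders:

    ns_section "ns/server/server1"
    ns_param minthreads 10   ;# == maxthreads: a fixed-size thread pool,
    ns_param maxthreads 10   ;# no dynamic thread creation or teardown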
From: Stephen D. <sd...@gm...> - 2012-11-18 19:35:19
|
On Sun, Nov 18, 2012 at 1:22 PM, Gustaf Neumann <ne...@wu...> wrote:
> On 14.11.12 09:51, Gustaf Neumann wrote:
>> On 13.11.12 15:02, Stephen Deasey wrote:
>>> On Tue, Nov 13, 2012 at 11:18 AM, Gustaf Neumann <ne...@wu...> wrote:
>>>> minthreads = 2
>>>>
>>>> creating threads, when idle == 0
>>>> 10468 requests, connthreads 267
>>>> total cputime 00:10:32
>>>>
>>>> creating threads, when queue >= 5
>>>> requests 10104 connthreads 27
>>>> total cputime 00:06:14
>>>
>>> What if you set minthreads == maxthreads?
>>
>> The number of thread create operations will go further down.
>
> Here are some actual figures with a comparable number of requests:
>
> with minthreads==maxthreads==2
> requests 10182 queued 2695 connthreads 11 cpu 00:05:27 rss 415
>
> below are the previous values, completed by the number of queuing
> operations and the rss size in MB
>
> with minthreads=2, create when queue >= 2
> requests 10104 queued 1584 connthreads 27 cpu 00:06:14 rss 466
>
> as anticipated, thread creations and cpu consumption went down, but the
> number of queued requests (requests that could not be executed
> immediately) increased significantly.

I was thinking of the opposite: make min/max threads equal by
increasing min threads. Requests would never stall in the queue, unlike
the experiment you ran with max threads reduced to min threads.

But there's another benefit: unlike the dynamic scenario, requests would
also never stall in the queue when a new thread had to be started when
min < max threads.

What is the down side to increasing min threads up to max threads?

> Maybe the most significant benefit of a low maxthreads value is the
> reduced memory consumption. On this machine we are using plain Tcl with
> its "zippy malloc", which does not release memory (once allocated to
> its pool) back to the OS. So, the measured memsize depends on the max
> number of threads with tcl interps, especially with large blueprints
> (as in the case of OpenACS).

Right: the max number of threads *ever*, not just currently. So by
killing threads you don't reduce memory usage, but you do increase
latency for some requests which have to wait for a thread+interp to be
created.

Is it convenient to measure latency distribution (not just average)? I
guess not: we record conn.startTime when a connection is taken out of
the queue and passed to a conn thread, but we don't record the time when
a socket was accepted.

Actually, managing request latency is another area we don't handle so
well. You can influence it by adjusting the OS listen socket accept
queue length, you can adjust the length of the naviserver queue, and
with the proposed change here you can change how aggressively new
threads are created to process requests in the queue. But queue-depth is
a roundabout way of specifying milliseconds of latency. And not just
round-about but inherently imprecise, as different URLs are going to
require different amounts of time to complete, and which URLs are
requested is a function of current traffic. If instead of queue size you
could specify a target latency then we could maybe do smarter things
with the queue, such as pull requests off the back of the queue which
have been waiting longer than the target latency, making room for fresh
requests on the front of the queue.
|
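To make the target-latency idea above concrete, here is a minimal sketch
in Tcl. This is illustrative pseudocode only, not NaviServer code; the
queue is assumed to be a Tcl list of dicts carrying a hypothetical
"accepted" timestamp in milliseconds:

    proc dequeue_fresh {queueVar targetMs now} {
        upvar 1 $queueVar queue
        while {[llength $queue] > 0} {
            set req [lindex $queue 0]
            set queue [lrange $queue 1 end]
            if {($now - [dict get $req accepted]) <= $targetMs} {
                return $req   ;# still within the latency target: dispatch it
            }
            # waited longer than the target: reject this request (e.g.
            # with a 503) instead of processing stale work, making room
            # for fresh requests at the front of the queue
        }
        return {}
    }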
From: Agustin L. <Agu...@uv...> - 2012-11-18 15:32:29
|
Hello all! I have been using Naviserver with our installation for a
couple of weeks. All OK. The memory consumption has been reduced from 8
to 5 GB on each node, and there are almost no crashes or reboots due to
memory problems any more. :-)

I am having some problems with WebDav. Should WebDav work equally with
both (Aolserver / Naviserver) with the same code?

Regards, Agustin
|
From: Gustaf N. <ne...@wu...> - 2012-11-18 13:22:12
|
On 14.11.12 09:51, Gustaf Neumann wrote:
> On 13.11.12 15:02, Stephen Deasey wrote:
>> On Tue, Nov 13, 2012 at 11:18 AM, Gustaf Neumann <ne...@wu...> wrote:
>>> minthreads = 2
>>>
>>> creating threads, when idle == 0
>>> 10468 requests, connthreads 267
>>> total cputime 00:10:32
>>>
>>> creating threads, when queue >= 5
>>> requests 10104 connthreads 27
>>> total cputime 00:06:14
>> What if you set minthreads == maxthreads?
> The number of thread create operations will go further down.

Here are some actual figures with a comparable number of requests:

with minthreads==maxthreads==2
requests 10182 queued 2695 connthreads 11 cpu 00:05:27 rss 415

below are the previous values, completed by the number of queuing
operations and the rss size in MB

with minthreads=2, create when queue >= 2
requests 10104 queued 1584 connthreads 27 cpu 00:06:14 rss 466

as anticipated, thread creations and cpu consumption went down, but the
number of queued requests (requests that could not be executed
immediately) increased significantly.

Maybe the most significant benefit of a low maxthreads value is the
reduced memory consumption. On this machine we are using plain Tcl with
its "zippy malloc", which does not release memory (once allocated to its
pool) back to the OS. So, the measured memsize depends on the max number
of threads with tcl interps, especially with large blueprints (as in the
case of OpenACS). This situation can be improved with e.g. jemalloc
(which we are using in production; it requires a modified tcl), but
after about 2 or 3 days of running a server the rss sizes are very
similar (most likely due to fragmentation).

-gustaf neumann

> When running already at minthreads, the connection thread timeout is
> ignored (otherwise there would be a high number of thread create
> operations just after the timeout expires, to ensure minthreads running
> connection threads). With connsperthread == 1000, there will be about
> 10 thread create operations for 10000 requests (not counting the 2
> initial create operations during startup for minthreads == 2). So, the
> cpu consumption will be lower, but the server would not scale when the
> request frequency requires more connection threads. Furthermore, most
> likely more requests will be put into the queue instead of being served
> immediately.
>
> When we assume that with minthreads == maxthreads == 2 there won't be
> more than, say, 20 requests queued, a similar effect could be achieved
> by allowing additional thread creations when more than 20 requests are
> in the queue. Or, even more conservatively, allowing thread creations
> only when the request queue is completely full (setting the low water
> mark to 100%) would still be better than minthreads == maxthreads,
> since the server will at least start to create additional threads in
> this rather hopeless situation, whereas with minthreads == maxthreads,
> it won't.
|
From: Gustaf N. <ne...@wu...> - 2012-11-14 08:52:04
|
On 13.11.12 15:02, Stephen Deasey wrote:
> On Tue, Nov 13, 2012 at 11:18 AM, Gustaf Neumann <ne...@wu...> wrote:
>> minthreads = 2
>>
>> creating threads, when idle == 0
>> 10468 requests, connthreads 267
>> total cputime 00:10:32
>>
>> creating threads, when queue >= 5
>> requests 10104 connthreads 27
>> total cputime 00:06:14
> What if you set minthreads == maxthreads?

The number of thread create operations will go further down. When
running already at minthreads, the connection thread timeout is ignored
(otherwise there would be a high number of thread create operations just
after the timeout expires, to ensure minthreads running connection
threads). With connsperthread == 1000, there will be about 10 thread
create operations for 10000 requests (not counting the 2 initial create
operations during startup for minthreads == 2). So, the cpu consumption
will be lower, but the server would not scale when the request frequency
requires more connection threads. Furthermore, most likely more requests
will be put into the queue instead of being served immediately.

When we assume that with minthreads == maxthreads == 2 there won't be
more than, say, 20 requests queued, a similar effect could be achieved
by allowing additional thread creations when more than 20 requests are
in the queue. Or, even more conservatively, allowing thread creations
only when the request queue is completely full (setting the low water
mark to 100%) would still be better than minthreads == maxthreads, since
the server will at least start to create additional threads in this
rather hopeless situation, whereas with minthreads == maxthreads, it
won't.
|
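The low-water-mark policy described here boils down to a simple decision
rule. A sketch in Tcl, for illustration only (the real logic lives in C
inside the server; all names here are made up, and queuesize is assumed
to be positive):

    proc want_new_thread {queued queuesize lowwaterPercent running maxthreads} {
        # never exceed maxthreads; otherwise create a thread only when
        # the waiting queue is filled beyond the low water mark (percent)
        if {$running >= $maxthreads} {
            return 0
        }
        return [expr {100.0 * $queued / $queuesize >= $lowwaterPercent}]
    }

Setting lowwaterPercent to 100 gives the conservative "only when the
queue is completely full" behavior; setting it to 0 gives the aggressive
"whenever something is queued" behavior.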
From: Stephen D. <sd...@gm...> - 2012-11-13 14:03:10
|
On Tue, Nov 13, 2012 at 11:18 AM, Gustaf Neumann <ne...@wu...> wrote:
>
> minthreads = 2
>
> creating threads, when idle == 0
> 10468 requests, connthreads 267
> total cputime 00:10:32
>
> creating threads, when queue >= 5
> requests 10104 connthreads 27
> total cputime 00:06:14

What if you set minthreads == maxthreads?
|
From: Gustaf N. <ne...@wu...> - 2012-11-13 11:18:47
|
Dear all, again some update: the mechanism sketched below now works in
the regression test as well. There is now a backchannel in place that
lets conn threads notify the driver to check the liveliness of the
server. This backchannel also makes the timeout-based liveliness
checking obsolete. By using the lowwatermark parameter to control thread
creation, the resource consumption went down significantly without
sacrificing speed for this setup.

Here is some data from next-scripting.org, which is a rather idle site
with real world traffic (including bots etc.). The server has defined
minthreads = 2, is running 2 drivers (nssock + nsssl) and uses a writer
thread.

before (creating threads, when idle == 0, running server for 2 days)
10468 requests, connthreads 267
total cputime 00:10:32

new (creating threads, when queue >= 5, running server for 2 days)
requests 10104 connthreads 27
total cputime 00:06:14

One can see that the number of create operations for connection threads
went down by a factor of 10 (from 267 to 27), and that the cpu
consumption was reduced by about 40% (thread initialization costs 0.64
secs in this configuration). One can get a behavior similar to idle==0
by setting the low water mark to 0.

The shutdown mechanism is now adjusted to the new infrastructure
(connection threads have their own condition variable, so one cannot use
the old broadcast to all conn threads anymore).

-gustaf neumann

On 07.11.12 02:54, Gustaf Neumann wrote:
> Some update: after some more testing with the new code, i still think
> the version is promising, but it needs a few tweaks. I have started to
> address the thread creation.
>
> To sum up the thread creation behavior/configuration of naviserver-tip:
>
> - minthreads (try to keep at least minthreads threads idle)
> - spread (fight against thread mass extinction due to round robin)
> - threadtimeout (useless due to round robin)
> - connsperthread (the only parameter effectively controlling the
>   lifespan of a conn thread)
> - maxconnections (controls maximum number of connections in the
>   waiting queue, including running threads)
> - concurrentcreatethreshold (percentage of the waiting queue full at
>   which threads are created concurrently)
>
> Due to the policy of keeping at least minthreads idle, threads are
> preallocated when the load is high; the number of threads never falls
> under minthreads by construct. Threads stop mostly due to
> connsperthread.
>
> Naviserver with thread queue (fork):
>
> - minthreads (try to keep at least minthreads threads idle)
> - threadtimeout (works effectively, default 120 secs)
> - connsperthread (as before, just not varied via spread)
> - maxconnections (as before; use maybe "queuesize" instead)
> - lowwatermark (new)
> - highwatermark (was concurrentcreatethreshold)
>
> The parameter "spread" is already deleted, since the enqueueing takes
> care of a certain distribution, at least when several threads are
> created. Threads are often deleted before connsperthread due to the
> timeout. Experiments show furthermore that the rather aggressive
> preallocation policy with minthreads idle threads now causes many more
> thread destroy and thread create operations than before. With OpenACS,
> thread creation is compute-intensive (about 1 sec).
>
> In the experimental version, connections are only queued when no
> connection thread is available (the tip version places every
> connection into the queue). Queueing happens with "bulky" requests,
> when e.g. a view causes a bunch (on average 5, often 10+, sometimes
> 50+) of requests for embedded resources (style files, javascript,
> images). It seems that permitting a few queued requests is often a
> good idea, since the connection threads will typically pick these up
> very quickly.
>
> To make the aggressiveness of the thread creation policy better
> configurable, the experimental version uses for this purpose solely
> the number of queued requests, based on two parameters:
>
> - lowwatermark (if the actual queue size is below this value, don't
>   try to create threads; default 5%)
> - highwatermark (if the actual queue size is above this value, allow
>   parallel thread creates; default 80%)
>
> To increase the aggressiveness, one could set lowwatermark to e.g. 0,
> causing thread creates whenever a connection is queued. Increasing the
> lowwatermark reduces the willingness to create new threads. The
> highwatermark might be useful for benchmark situations, where the
> queue fills up quickly.
>
> The default values seem to work quite well; they are used currently on
> http://next-scripting.org. However, we still need some more
> experiments on different sites to get a better understanding.
>
> hmm, final comment: for the regression test, i had to add the policy
> of creating threads when all connection threads are busy. The config
> file of the regression test uses connsperthread 0 (which is the
> default, but not very good as such), causing every connection thread
> to exit after every request. So, when the request comes in, we have a
> thread busy, but nothing queued, so there would not be the need to
> create a new thread. However, when the conn thread exits, the single
> request would not be processed.
>
> So, much more testing is needed.
> -gustaf neumann
>
> On 01.11.12 20:17, Gustaf Neumann wrote:
>> Dear all,
>>
>> There is now a version on bitbucket, which works quite nice and
>> stable, as far as i can tell. I have split up the rather coarse lock
>> of all pools and introduced finer locks for the waiting queue
>> (wqueue) and thread queue (tqueue) per pool. The changes lead to
>> significantly finer lock granularity and improve scalability.
>>
>> I have tested this new version with a synthetic load of 120 requests
>> per second, some slower requests and some faster ones, and it appears
>> to be pretty stable. This load keeps about 20 connection threads
>> quite busy on my home machine. The contention of the new locks is
>> very little: in this test we saw 12 busy locks out of 217.000 locks
>> on the waiting queue, and 9 busy locks out of 83.000 locks on the
>> thread queue. These measures are much better than in current
>> naviserver, which has in the same test 248.000 locks on the queue
>> with 190 busy ones. The total waiting time for locks is reduced by a
>> factor of 10. One has to add that it was not so bad before either.
>> The benefit will be larger when multiple pools are used.
>>
>> Finally i think the code is clearer than before, where the lock
>> duration was quite tricky to determine.
>>
>> opinions?
>> -gustaf neumann
>>
>> PS: For the changes, see:
>> https://bitbucket.org/gustafn/naviserver-connthreadqueue/changesets
>>
>> PS2: have not addressed the server exit signaling yet.
>>
>> On 29.10.12 13:41, Gustaf Neumann wrote:
>>> A version of this is in the following fork:
>>>
>>> https://bitbucket.org/gustafn/naviserver-connthreadqueue/changesets
>>>
>>> So far, the competition on the pool mutex is quite high, but i
>>> think it can be improved. Currently, primarily the pool mutex is
>>> used for conn thread life-cycle management, and it is needed by the
>>> main/drivers/spoolers as well as by the connection threads to
>>> update the idle/running/... counters needed for controlling thread
>>> creation etc. Differentiating these mutexes should help.
>>>
>>> i have not addressed the termination signaling, but that's rather
>>> simple.
>>>
>>> -gustaf neumann
>>>
>>> On 28.10.12 03:08, Gustaf Neumann wrote:
>>>> i've just implemented a lightweight version of the above (just a
>>>> few lines of code) by extending the connThread Arg structure; ....
|
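For reference, the parameters discussed in this thread would be set
roughly as follows. This is a sketch following the parameter names used
in the fork; the server name and the concrete values are placeholders:

    ns_section "ns/server/server1"
    ns_param minthreads     2
    ns_param maxthreads     10
    ns_param connsperthread 1000
    ns_param threadtimeout  120   ;# seconds
    ns_param lowwatermark   5     ;# queue fill (%) below which no thread is created
    ns_param highwatermark  80    ;# queue fill (%) above which threads are created concurrently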
From: Gustaf N. <ne...@wu...> - 2012-11-07 01:55:44
|
Some update: after some more testing with the new code, i still think
the version is promising, but it needs a few tweaks. I have started to
address the thread creation.

To sum up the thread creation behavior/configuration of naviserver-tip:

- minthreads (try to keep at least minthreads threads idle)
- spread (fight against thread mass extinction due to round robin)
- threadtimeout (useless due to round robin)
- connsperthread (the only parameter effectively controlling the
  lifespan of a conn thread)
- maxconnections (controls maximum number of connections in the waiting
  queue, including running threads)
- concurrentcreatethreshold (percentage of the waiting queue full at
  which threads are created concurrently)

Due to the policy of keeping at least minthreads idle, threads are
preallocated when the load is high; the number of threads never falls
under minthreads by construct. Threads stop mostly due to
connsperthread.

Naviserver with thread queue (fork):

- minthreads (try to keep at least minthreads threads idle)
- threadtimeout (works effectively, default 120 secs)
- connsperthread (as before, just not varied via spread)
- maxconnections (as before; use maybe "queuesize" instead)
- lowwatermark (new)
- highwatermark (was concurrentcreatethreshold)

The parameter "spread" is already deleted, since the enqueueing takes
care of a certain distribution, at least when several threads are
created. Threads are often deleted before connsperthread due to the
timeout. Experiments show furthermore that the rather aggressive
preallocation policy with minthreads idle threads now causes many more
thread destroy and thread create operations than before. With OpenACS,
thread creation is compute-intensive (about 1 sec).

In the experimental version, connections are only queued when no
connection thread is available (the tip version places every connection
into the queue). Queueing happens with "bulky" requests, when e.g. a
view causes a bunch (on average 5, often 10+, sometimes 50+) of requests
for embedded resources (style files, javascript, images). It seems that
permitting a few queued requests is often a good idea, since the
connection threads will typically pick these up very quickly.

To make the aggressiveness of the thread creation policy better
configurable, the experimental version uses for this purpose solely the
number of queued requests, based on two parameters:

- lowwatermark (if the actual queue size is below this value, don't try
  to create threads; default 5%)
- highwatermark (if the actual queue size is above this value, allow
  parallel thread creates; default 80%)

To increase the aggressiveness, one could set lowwatermark to e.g. 0,
causing thread creates whenever a connection is queued. Increasing the
lowwatermark reduces the willingness to create new threads. The
highwatermark might be useful for benchmark situations, where the queue
fills up quickly.

The default values seem to work quite well; they are used currently on
http://next-scripting.org. However, we still need some more experiments
on different sites to get a better understanding.

hmm, final comment: for the regression test, i had to add the policy of
creating threads when all connection threads are busy. The config file
of the regression test uses connsperthread 0 (which is the default, but
not very good as such), causing every connection thread to exit after
every request. So, when the request comes in, we have a thread busy, but
nothing queued, so there would not be the need to create a new thread.
However, when the conn thread exits, the single request would not be
processed.

So, much more testing is needed.
-gustaf neumann

On 01.11.12 20:17, Gustaf Neumann wrote:
> Dear all,
>
> There is now a version on bitbucket, which works quite nice and
> stable, as far as i can tell. I have split up the rather coarse lock
> of all pools and introduced finer locks for the waiting queue (wqueue)
> and thread queue (tqueue) per pool. The changes lead to significantly
> finer lock granularity and improve scalability.
>
> I have tested this new version with a synthetic load of 120 requests
> per second, some slower requests and some faster ones, and it appears
> to be pretty stable. This load keeps about 20 connection threads quite
> busy on my home machine. The contention of the new locks is very
> little: in this test we saw 12 busy locks out of 217.000 locks on the
> waiting queue, and 9 busy locks out of 83.000 locks on the thread
> queue. These measures are much better than in current naviserver,
> which has in the same test 248.000 locks on the queue with 190 busy
> ones. The total waiting time for locks is reduced by a factor of 10.
> One has to add that it was not so bad before either. The benefit will
> be larger when multiple pools are used.
>
> Finally i think the code is clearer than before, where the lock
> duration was quite tricky to determine.
>
> opinions?
> -gustaf neumann
>
> PS: For the changes, see:
> https://bitbucket.org/gustafn/naviserver-connthreadqueue/changesets
>
> PS2: have not addressed the server exit signaling yet.
>
> On 29.10.12 13:41, Gustaf Neumann wrote:
>> A version of this is in the following fork:
>>
>> https://bitbucket.org/gustafn/naviserver-connthreadqueue/changesets
>>
>> So far, the competition on the pool mutex is quite high, but i think
>> it can be improved. Currently, primarily the pool mutex is used for
>> conn thread life-cycle management, and it is needed by the
>> main/drivers/spoolers as well as by the connection threads to update
>> the idle/running/... counters needed for controlling thread creation
>> etc. Differentiating these mutexes should help.
>>
>> i have not addressed the termination signaling, but that's rather
>> simple.
>>
>> -gustaf neumann
>>
>> On 28.10.12 03:08, Gustaf Neumann wrote:
>>> i've just implemented a lightweight version of the above (just a
>>> few lines of code) by extending the connThread Arg structure; ....

--
Univ.Prof. Dr. Gustaf Neumann
Institute of Information Systems and New Media
WU Vienna
Augasse 2-6, A-1090 Vienna, AUSTRIA
|
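As a side note on the lock statistics quoted above: such figures can be
inspected from a running server at the Tcl level. A hedged sketch,
assuming that "ns_info locks" returns one record per mutex whose fields
include the name plus total and busy lock counters (check the exact
field order on your build):

    # report only mutexes that have seen contention
    foreach entry [ns_info locks] {
        lassign $entry name owner id nlock nbusy
        if {$nbusy > 0} {
            ns_log notice "contended mutex $name: $nbusy busy out of $nlock locks"
        }
    }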
From: Jeff R. <dv...@di...> - 2012-11-07 01:25:46
|
Ok, this must be something weird about my build. I'll ignore it for now.
Thanks

-J

Gustaf Neumann wrote:
> Jeff, are you sure you are setting the parameter correctly in your
> config file? It seems to work for me (Linux FC 17).
>
> clone(child_stack=0x7f912f2b6fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f912f2b79d0, tls=0x7f912f2b7700, child_tidptr=0x7f912f2b79d0) = 11862
> mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f9116212000
> mprotect(0x7f9116212000, 4096, PROT_NONE) = 0
> clone(child_stack=0x7f9116231fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f91162329d0, tls=0x7f9116232700, child_tidptr=0x7f91162329d0) = 11863
> mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f9115f94000
> mprotect(0x7f9115f94000, 4096, PROT_NONE) = 0
> clone(child_stack=0x7f9115fb3fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f9115fb49d0, tls=0x7f9115fb4700, child_tidptr=0x7f9115fb49d0) = 11864
>
> -gustaf
>
> On 05.11.12 22:56, Jeff Rogers wrote:
>> Hi all,
>>
>> I've been trying to track down why my naviserver processes have such
>> a large memory footprint compared to similarly configured aolserver
>> processes. One culprit I already found is the pre-allocation of
>> per-conn compression streams (the fixes to this have already been
>> mentioned, tho not yet committed).
>>
>> The other culprit is strange. It appears that naviserver is
>> completely ignoring the stacksize parameter, and starting all threads
>> with the default size, which on linux is the maximum stack size at
>> process start time, 8Mb by default. This isn't a case of the wrong
>> parameter being set; it appears to be a behavior of the nsthread
>> library. nsthreadtest shows the same behavior, even after it calls
>>
>> Ns_ThreadStackSize(81920);
>>
>> I've added a debug printf to pthread.c to double-check what is being
>> passed as the stacksize parameter, and it's the expected value in all
>> cases; it's just not used for some reason. I've been using strace to
>> peek at what's going on, and nsthreadtest in naviserver looks like
>>
>> mmap2(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0xb4bfa000
>> mprotect(0xb4bfa000, 4096, PROT_NONE) = 0
>> clone(child_stack=0xb53fa494, ...
>>
>> while in aolserver it's
>>
>> mmap2(NULL, 90112, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0xb748b000
>> mprotect(0xb748b000, 4096, PROT_NONE) = 0
>> clone(child_stack=0xb74a0494, ...
>>
>> As you can see it's an ~8M vs 90k (81920 + guard pages) allocation.
>>
>> Does anyone else on linux see the same behavior, or should I be
>> looking at local differences?
>>
>> Second, the naviserver and aolserver nsthread libraries aren't
>> identical, but they aren't very different. Any ideas what would be
>> causing this behavior?
>>
>> -J
|
From: Gustaf N. <ne...@wu...> - 2012-11-06 20:05:03
|
Jeff, are you sure you are setting the parameter correctly in your
config file? It seems to work for me (Linux FC 17).

clone(child_stack=0x7f912f2b6fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f912f2b79d0, tls=0x7f912f2b7700, child_tidptr=0x7f912f2b79d0) = 11862
mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f9116212000
mprotect(0x7f9116212000, 4096, PROT_NONE) = 0
clone(child_stack=0x7f9116231fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f91162329d0, tls=0x7f9116232700, child_tidptr=0x7f91162329d0) = 11863
mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f9115f94000
mprotect(0x7f9115f94000, 4096, PROT_NONE) = 0
clone(child_stack=0x7f9115fb3fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f9115fb49d0, tls=0x7f9115fb4700, child_tidptr=0x7f9115fb49d0) = 11864

-gustaf

On 05.11.12 22:56, Jeff Rogers wrote:
> Hi all,
>
> I've been trying to track down why my naviserver processes have such a
> large memory footprint compared to similarly configured aolserver
> processes. One culprit I already found is the pre-allocation of
> per-conn compression streams (the fixes to this have already been
> mentioned, tho not yet committed).
>
> The other culprit is strange. It appears that naviserver is completely
> ignoring the stacksize parameter, and starting all threads with the
> default size, which on linux is the maximum stack size at process
> start time, 8Mb by default. This isn't a case of the wrong parameter
> being set; it appears to be a behavior of the nsthread library.
> nsthreadtest shows the same behavior, even after it calls
>
> Ns_ThreadStackSize(81920);
>
> I've added a debug printf to pthread.c to double-check what is being
> passed as the stacksize parameter, and it's the expected value in all
> cases; it's just not used for some reason. I've been using strace to
> peek at what's going on, and nsthreadtest in naviserver looks like
>
> mmap2(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0xb4bfa000
> mprotect(0xb4bfa000, 4096, PROT_NONE) = 0
> clone(child_stack=0xb53fa494, ...
>
> while in aolserver it's
>
> mmap2(NULL, 90112, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0xb748b000
> mprotect(0xb748b000, 4096, PROT_NONE) = 0
> clone(child_stack=0xb74a0494, ...
>
> As you can see it's an ~8M vs 90k (81920 + guard pages) allocation.
>
> Does anyone else on linux see the same behavior, or should I be
> looking at local differences?
>
> Second, the naviserver and aolserver nsthread libraries aren't
> identical, but they aren't very different. Any ideas what would be
> causing this behavior?
>
> -J

--
Univ.Prof. Dr. Gustaf Neumann
Institute of Information Systems and New Media
WU Vienna
Augasse 2-6, A-1090 Vienna, AUSTRIA
|
From: Jeff R. <dv...@di...> - 2012-11-05 21:56:45
|
Hi all,

I've been trying to track down why my naviserver processes have such a
large memory footprint compared to similarly configured aolserver
processes. One culprit I already found is the pre-allocation of per-conn
compression streams (the fixes to this have already been mentioned, tho
not yet committed).

The other culprit is strange. It appears that naviserver is completely
ignoring the stacksize parameter, and starting all threads with the
default size, which on linux is the maximum stack size at process start
time, 8Mb by default. This isn't a case of the wrong parameter being
set; it appears to be a behavior of the nsthread library. nsthreadtest
shows the same behavior, even after it calls

Ns_ThreadStackSize(81920);

I've added a debug printf to pthread.c to double-check what is being
passed as the stacksize parameter, and it's the expected value in all
cases; it's just not used for some reason. I've been using strace to
peek at what's going on, and nsthreadtest in naviserver looks like

mmap2(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0xb4bfa000
mprotect(0xb4bfa000, 4096, PROT_NONE) = 0
clone(child_stack=0xb53fa494, ...

while in aolserver it's

mmap2(NULL, 90112, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0xb748b000
mprotect(0xb748b000, 4096, PROT_NONE) = 0
clone(child_stack=0xb74a0494, ...

As you can see it's an ~8M vs 90k (81920 + guard pages) allocation.

Does anyone else on linux see the same behavior, or should I be looking
at local differences?

Second, the naviserver and aolserver nsthread libraries aren't
identical, but they aren't very different. Any ideas what would be
causing this behavior?

-J
|
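For anyone reproducing this: the parameter in question normally lives in
the threads section of the config file. A minimal sketch, assuming the
usual section and parameter names; the value is just an example:

    ns_section "ns/threads"
    ns_param stacksize [expr {128 * 1024}]   ;# requested per-thread stack, in bytes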
From: Gustaf N. <ne...@wu...> - 2012-11-02 12:11:19
|
On 02.11.12 12:43, Stephen Deasey wrote:
> On Thu, Nov 1, 2012 at 12:13 PM, Gustaf Neumann <ne...@wu...> wrote:
>> i think, that adding a command reference would be a nice addition,
>> and is most likely possible with a reasonable amount of work. The
>> other alternative would be to split up the documentation pages with
>> multiple commands into single pages...
>
> The idea was to head in the other direction: consolidate similar
> commands into one page.

i fully agree that a contextual display of related commands is better.

> For example for the ns_return page there's a couple of paragraphs of
> explanation at the top that applies to all the commands, and by
> collecting them together it's easy to see how they're similar and how
> they differ. (maybe the 3 redirect commands should be split into their
> own page?)
>
> On the other hand, sometimes you just want a big list of all
> commands... Hopefully dtplite can be coaxed into generating it.

dtplite is a tcl script that packages common behavior of the tcl
doctools packages. One can certainly try to extend dtplite, or we can
roll our own doc generator using these packages, but this requires some
deeper diving into the doctools universe. Recent tcllib releases
contain, besides the classic doctools packages, the doctools v2
packages, which does not make the best path more obvious, since dtplite
does not use v2 (to my shallow understanding).

-gustaf neumann
|
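For context, rendering a single doctools manpage with dtplite looks
roughly like this (the file name is just an example; check the dtplite
documentation for the exact options):

    dtplite -o . html ns_return.man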
From: Stephen D. <sd...@gm...> - 2012-11-02 11:44:11
|
On Thu, Nov 1, 2012 at 12:13 PM, Gustaf Neumann <ne...@wu...> wrote:
>
> i think, that adding a command reference would be a nice addition, and
> is most likely possible with a reasonable amount of work. The other
> alternative would be to split up the documentation pages with multiple
> commands into single pages...

The idea was to head in the other direction: consolidate similar
commands into one page.

For example for the ns_return page there's a couple of paragraphs of
explanation at the top that applies to all the commands, and by
collecting them together it's easy to see how they're similar and how
they differ. (maybe the 3 redirect commands should be split into their
own page?)

On the other hand, sometimes you just want a big list of all commands...
Hopefully dtplite can be coaxed into generating it.
|
From: Stephen D. <sd...@gm...> - 2012-11-02 11:32:53
|
On Wed, Oct 31, 2012 at 12:54 PM, Stefan Sobernig <ste...@wu...> wrote:
>
> I'd also appreciate a heads-up on a previous thread:
>
> http://www.mail-archive.com/nav...@li.../msg01811.html
>
> Was the idea of a "default" method ("*" or the like) for the
> ns_register_* cmd family ever implemented?

No. Looks like some surgery to nsd/urlspace.c might be required to alter
the current order of lookup for the url "GET /foo/bar" from (method,
path, path) to (path, path, method), so that if the path matches and the
method does not, a default method handler can be returned.
|
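Until such a default method exists, the effect can be approximated at
the Tcl level by registering the same handler for each method
explicitly. A minimal sketch; the handler name and URL are placeholders:

    # register one handler per method today; a hypothetical "*" default
    # in the ns_register_* family would replace this loop
    foreach method {GET POST PUT DELETE HEAD} {
        ns_register_proc $method /foo/bar ::my::default_handler
    }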
From: Stephen D. <sd...@gm...> - 2012-11-02 10:44:18
|
On Thu, Nov 1, 2012 at 4:03 PM, David Osborne <da...@qc...> wrote:
>
> Trying to work out if this is a config error or a code difference...
> any pointers?

In general, add this to your config:

ns_section "ns/parameters"
ns_param logdev true

And the log file will tell you exactly which config parameters are being
accessed, with which values and types, what the defaults are, etc. It's
worthwhile doing this just because config files tend to accumulate cruft
after a while.
|
From: Gustaf N. <ne...@wu...> - 2012-11-02 10:04:12
|
Hi David,

i think, this is intentional. All config options for serving static
files are under

ns/server/${servername}/fastpath

see:
https://bitbucket.org/naviserver/naviserver/src/d34fc2681cf1fb4dec89b7172d09a187c37efe32/sample-config.tcl.in?at=default#cl-301

-gustaf

On 02.11.12 10:22, David Osborne wrote:
> Thanks Jeff and Gustaf.
>
> I found that setting pagedir had no effect under the server section
> (as is in the newly changed tests/http-test-config.tcl file), e.g.:
>
> ns_section "ns/server/${server_name}"
> ns_param pagedir $pageroot
>
> But if I added a subsequent fastpath section and added it there
> instead, then it worked. Is this what would be expected?
>
> ns_section "ns/server/${server_name}/fastpath"
> ns_param pagedir $pageroot
>
> On 1 November 2012 19:51, Gustaf Neumann <ne...@wu...
> <mailto:ne...@wu...>> wrote:
>
>> hmm, i found two occurrences of "ns_param pageroot", and changed
>> these.
>>
>> % egrep -R "ns_param *pageroot" *
>> openacs-config.tcl: ns_param pageroot $pageroot
>> tests/http-test-config.tcl: ns_param pageroot $pageroot
>>
>> i have added as well "ns_info pagedir" as an alias for "ns_info
>> pageroot". One should mark the latter as deprecated.
>>
>> -gustaf
|
From: David O. <da...@qc...> - 2012-11-02 09:49:37
|
Thanks Jeff and Gustaf.

I found that setting pagedir had no effect under the server section (as
is in the newly changed tests/http-test-config.tcl file), e.g.:

ns_section "ns/server/${server_name}"
ns_param pagedir $pageroot

But if I added a subsequent fastpath section and added it there instead,
then it worked. Is this what would be expected?

ns_section "ns/server/${server_name}/fastpath"
ns_param pagedir $pageroot

On 1 November 2012 19:51, Gustaf Neumann <ne...@wu...> wrote:
>
> hmm, i found two occurrences of "ns_param pageroot", and changed
> these.
>
> % egrep -R "ns_param *pageroot" *
> openacs-config.tcl: ns_param pageroot $pageroot
> tests/http-test-config.tcl: ns_param pageroot $pageroot
>
> i have added as well "ns_info pagedir" as an alias for "ns_info
> pageroot". One should mark the latter as deprecated.
>
> -gustaf