Thread: [curl-loader-devel] [PROF RFC 1/2] profiling of libcurl by curl-loader, steady state
From: Robert I. <cor...@gm...> - 2007-04-16 09:35:06
CURL VERSION: snapshot 7.16.2-20070326 with a recently submitted patch for
optimization of curl_multi_socket() by passing a bitmask, named here
curl_multi_socket_noselect() (renamed to curl_multi_socket_action()).

HOST MACHINE AND OPERATING SYSTEM: Linux, Debian with vanilla kernel 2.6.20.6

COMPILER: gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)

CURL CONFIGURATION:
configure --prefix $(CURL_BUILD) \
  --enable-thread \
  --enable-ipv6 \
  --with-random=/dev/urandom \
  --with-ssl=/usr/include/openssl \
  --enable-shared=no \
  CFLAGS="-pg -g -O3 -ffast-math -finline-functions -funroll-all-loops \
  -finline-limit=1000 -mmmx -msse -foptimize-sibling-calls -mtune=pentium4 \
  -mcpu=pentium4"

COMPILATION OPTIONS: the same for the application, libcurl and libevent:
-pg -g -O3 -ffast-math -finline-functions -funroll-all-loops \
-finline-limit=1000 -mmmx -msse -foptimize-sibling-calls -mtune=pentium4 \
-mcpu=pentium4

CURL_EASY_SETOPT: the curl_easy_setopt() options used are:
CURLOPT_IPRESOLVE
CURLOPT_INTERFACE
CURLOPT_NOSIGNAL
CURLOPT_URL with http://localhost/ACE-INSTALL.html (104K file)
CURLOPT_CONNECTTIMEOUT 5
CURLOPT_FRESH_CONNECT 0
CURLOPT_DNS_CACHE_TIMEOUT with -1
CURLOPT_VERBOSE with 1
CURLOPT_DEBUGFUNCTION with client_tracing_function
CURLOPT_DEBUGDATA with cctx
CURLOPT_WRITEFUNCTION with do_nothing_write_function
CURLOPT_SSL_VERIFYPEER 0    /* not used, http url */
CURLOPT_SSL_VERIFYHOST 0    /* not used, http url */
CURLOPT_PRIVATE, cctx
CURLOPT_WRITEDATA, cctx - for hyper mode
CURLOPT_ERRORBUFFER, bctx->error_buffer
CURLOPT_FOLLOWLOCATION, 1   /* not used, no 3xx */
CURLOPT_UNRESTRICTED_AUTH, 1 /* not used, no auth */
CURLOPT_MAXREDIRS, -1       /* not used, no 3xx */
CURLOPT_USERAGENT, bctx->user_agent /* MSIE-6-like string */
CURLOPT_COOKIEFILE, ""      /* not used, no cookies */

SERVER: lighttpd-1.4.13 (ssl) from Debian, the stock configuration file with
these lines added:
server.event-handler = "linux-sysepoll"
server.max-fds = 32000

FILE FETCHED: 104K static file

CLIENT APPLICATION: curl-loader
CLIENT LOADING MODES:
Smooth mode - using curl_multi_perform() with poll()-based demultiplexing;
Hyper mode - based on the hipev.c example, using epoll()-based demultiplexing
with, further, curl_multi_socket_noselect().

CLIENT LOAD: for each loading mode (smooth and hyper) two loads have been run,
each for 200 seconds:
- 200 libcurl clients loading the localhost server with 0 msec between the
  fetched file and the new request;
- 200 libcurl clients loading the localhost server with 1000 msec between the
  fetched file and the new request.
The clients were started simultaneously without any gradual increase; however,
due to the 200-second run time, the impact of connection establishment is
rather low.

PROFILING METHODS:
gprof - provides application-level information; collected to files suffixed
*.prof;
oprofile - whole-system information, including glibc and kernel functions,
filtered for the relevant curl-loader application to files suffixed *.op.

RESULTING FILES:
Smooth mode experiments (smooth-mode.profiling.tar.bz2):
smooth-0.prof - gprof results of smooth mode, 0 msec delay between requests;
smooth-0.op - oprofile results of smooth mode, 0 msec delay between requests;
smooth-1000.prof - gprof results of smooth mode, 1000 msec delay between requests;
smooth-1000.op - oprofile results of smooth mode, 1000 msec delay between requests.
Hyper mode experiments (hyper-mode.profiling.tar.bz2):
hyper-0.prof - gprof results of hyper mode, 0 msec delay between requests;
hyper-0.op - oprofile results of hyper mode, 0 msec delay between requests;
hyper-1000.prof - gprof results of hyper mode, 1000 msec delay between requests;
hyper-1000.op - oprofile results of hyper mode, 1000 msec delay between requests.

The recommendations and analyses of curl gurus would be very much appreciated.
Any feedback would be highly valued. Profiling of loads with a high number of
connection establishments / new clients per second is out of the scope of this
mail and will be the subject of some next e-mail (hopefully).
In the meanwhile, some very premature results (without commitment): 50 new
clients per second totally burn out my CPU, with Curl_connect() being the
main CPU eater.

Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...................................................................
Navigare necesse est, vivere non est necesse
...................................................................
http://curl-loader.sourceforge.net
An open-source HTTP/S, FTP/S traffic generating, and web testing tool.
From: Daniel S. <da...@ha...> - 2007-04-16 21:36:57
On Mon, 16 Apr 2007, Robert Iakobashvili wrote:

> hyper-mode.profiling.tar.bz2:
> hyper-0.prof - gprof res of hyper mode, 0 msec delay between requests;
> hyper-0.op - oprofile res of hyper mode, 0 msec delay between requests;
> hyper-1000.prof - gprof res of hyper mode, 1000 msec delay b requests;
> hyper-1000.op - oprofile res of hyper mode, 1000 msec delay b requests;

Thanks for doing this profile work!

I had a first look at these just now and three things struck me:

1 - gettimeofday() certainly is called a lot and indeed takes a significant
    time. We should address this (somehow).

2 - dprintf_formatf() seems to take quite a lot of time as well and I find it
    strange. What kind of *printf() operations are made with your tests? How
    come this function is used so much?

3 - I figure we need more precision on the 5th and 6th columns (how long each
    specific time takes by itself) to better identify functions that seem to
    take too long.

--
Commercial curl and libcurl Technical Support: http://haxx.se/curl.html
From: Robert I. <cor...@gm...> - 2007-04-17 08:30:16
Attachments:
lprof.tar.bz2
On 4/17/07, Daniel Stenberg <da...@ha...> wrote:
>
> I had a first look at these just now and three things struck me:
>
> 1 - gettimeofday() certainly is called a lot and indeed takes a significant
>     time. We should address this (somehow).

It is called in cycles in many cases. A pattern to solve it is something like
this:

gettimeofday ();
int cycle_counter = 0;

for (i....) {

    DO_THE_JOB

    if (! (++cycle_counter % TIME_RECALCULATION_CYCLES_NUM)) {
        gettimeofday ();
    }
}

An example from curl-loader is here; look for get_tick_count(), which calls
gettimeofday():
http://curl-loader.svn.sourceforge.net/viewvc/curl-loader/trunk/curl-loader/loader_smooth.c?view=markup

Better that somebody with a deep understanding of curl does the job.

> 2 - dprintf_formatf() seems to take quite a lot of time as well and I find
>     it strange. What kind of *printf() operations are made with your tests?
>     How come this function is used so much?

This is the question that I was going to ask you. :)
You can dig into the call stacks in the *.prof files and look where it is
called. What I see is, e.g.:

-----------------------------------------------
                0.00    0.00        1/239402     curl_msnprintf [67]
                0.04    0.00    37300/239402     curl_mvaprintf [27]
                0.08    0.01    79201/239402     curl_maprintf [21]
                0.12    0.02   122900/239402     curl_mvsnprintf [16]
[12]    19.9    0.24    0.03   239402            dprintf_formatf [12]
                0.02    0.00  5696319/5696319    addbyter [34]
                0.01    0.00  7782971/7782971    alloc_addbyter [45]

Just a hypothesis below: we are using (as posted) the options:
CURLOPT_VERBOSE with 1
CURLOPT_DEBUGFUNCTION with client_tracing_function

All these CURLINFO_* messages are somehow printed to the delivered buffers.
Maybe there is something there that can be improved?

More details: the profiling was done in non-verbose mode. Still,
client_tracing_function receives all messages but prints only errors
(CURLINFO_ERROR) to the logfile. There were no errors during profiling.
You can see here the patch that we use to filter errors and to enable a user
to segregate between really important errors and mere infos:
http://curl-loader.svn.sourceforge.net/viewvc/curl-loader/trunk/curl-loader/patches/curl-trace-info-error.patch?view=markup

For our client_tracing_function you can look here:
http://curl-loader.svn.sourceforge.net/viewvc/curl-loader/trunk/curl-loader/loader.c?view=markup

> 3 - I figure we need more precision on the 5th and 6th columns (how long
>     each specific time takes by itself) to better identify functions that
>     seem to take too long.

You can get more understanding from the lower (call-stack) sections of the
*.prof outputs. For even more detailed information, please find attached
code-type gprof output, with more details, for the same experiments.

--
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...................................................................
Navigare necesse est, vivere non est necesse
...................................................................
http://curl-loader.sourceforge.net
An open-source HTTP/S, FTP/S traffic generating, and web testing tool.
From: Daniel S. <da...@ha...> - 2007-04-17 12:08:47
On Tue, 17 Apr 2007, Robert Iakobashvili wrote:

>> 1 - gettimeofday() certainly is called a lot and indeed takes a
>> significant time. We should address this (somehow).
>
> It is called in cycles in many cases.
> A pattern to solve it something like this :
>
> gettimeofday ();
> int cycle_counter = 0;
>
> for (i....) {
>
>     DO_THE_JOB
>
>     if (! (++cycle_counter % TIME_RECALCULATION_CYCLES_NUM)) {
>         gettimeofday ();
>     }
> }

Yes, but that relies on the fact that the condition equals true often enough
to keep the date value accurate enough. We just recently added code to libcurl
that allows it to timeout with a millisecond precision, and that doesn't give
us much room to speculate on.

We need pretty accurate time info in many places in the code, and it isn't
easy to come up with a generic approach for skipping system calls for this.

Possibly we should simply detect that no millisecond timeout option is used,
and then switch to a more "sloppy" approach. That could perhaps involve
getting the time immediately when curl_multi_socket*() is called and then
simply not getting it again (i.e. keep returning the same) unless one of the
functions is called that we know is time consuming.

>> 2 - dprintf_formatf() seems to take quite a lot of time as well
>> How come this function is used so much?
>
> This is the question, that I was going to ask you. :)
> Y can dig inside into the callstasks for .prof files and look, where
> it is called.
> What I see is, e.g.:
> -----------------------------------------------
>                 0.00    0.00        1/239402     curl_msnprintf [67]
>                 0.04    0.00    37300/239402     curl_mvaprintf [27]
>                 0.08    0.01    79201/239402     curl_maprintf [21]
>                 0.12    0.02   122900/239402     curl_mvsnprintf [16]
> [12]    19.9    0.24    0.03   239402            dprintf_formatf [12]
>                 0.02    0.00  5696319/5696319    addbyter [34]
>                 0.01    0.00  7782971/7782971    alloc_addbyter [45]

This is the snprintf() and similar calls within libcurl.
> Just a hypotheses below:
> We are using (as posted) options:
> CURLOPT_VERBOSE with 1
> CURLOPT_DEBUGFUNCTION with client_tracing_function
>
> All these CURL_INFO, etc messages are somehow printed to the buffers
> delivered. May be there is something there, that may be improved?

Well, VERBOSE and DEBUGFUNCTION are for debugging, and then we should be
prepared to get extra function calls like this.

> The profiling was done in a non-verbose mode.

Then we should use these functions only for "core" creation of strings and not
for debug/verbose messages. There are several things we can do to improve the
situation:

1 - optimize buffer sizes for the typical sizes (for the aprintf() function)

2 - make sure we don't use *printf() functions when there's no need. Possibly
    use memcpy() etc instead.

3 - make sure that the *_formatf() is written to perform well, especially for
    the typical input/flags that we use (which I guess means %s).

--
Commercial curl and libcurl Technical Support: http://haxx.se/curl.html
From: Jamie L. <ja...@sh...> - 2007-04-17 22:54:07
Daniel Stenberg wrote:
> Possibly we should simply detect that no millisecond timeout option is
> used, and then switch to a more "sloppy" approach. That could perhaps
> involve getting the time immediately when curl_multi_socket*() is called
> and then simply not get it again (i.e keep returning the same) unless one
> of the functions is called that we know is time consuming.

As far as I know, the only way to remove lots of gettimeofday() or equivalent
calls from code like this, which must detect timeouts rapidly, is to use some
kind of signal (SIGALRM etc.) to set a flag, or a thread which sleeps for the
appropriate time, then wakes and sets a flag, or other equivalent things (like
POSIX timer events).

It's a noticeable performance improvement in event-driven code with an
unpredictable mixture of short and long processing in each event. (That's why
a counting method doesn't work, as you note.)

Unfortunately, they're all difficult to use in a library which is used in lots
of different kinds of programs and mustn't use process-global resources. But
those techniques might work fine with the HiPer enhancements, as that kind of
time optimisation can be part of the application, just as the select() call
is, and can integrate with the application's use of signals and threads.

-- Jamie
From: Robert I. <cor...@gm...> - 2007-04-17 13:23:32
On 4/17/07, Daniel Stenberg <da...@ha...> wrote:
> >> 1 - gettimeofday() certainly is called a lot and indeed takes a
> >> significant time. We should address this (somehow).
> >
> > It is called in cycles in many cases.
> > A pattern to solve it something like this :
> >
> > gettimeofday ();
> > int cycle_counter = 0;
> >
> > for (i....) {
> >
> >     DO_THE_JOB
> >
> >     if (! (++cycle_counter % TIME_RECALCULATION_CYCLES_NUM)) {
> >         gettimeofday ();
> >     }
> > }
>
> Yes, but ... We need pretty accurate time info at many places

Yes, that is why a really good knowledge of curl is required to detect the
right places for the trick.

> Possibly we should simply detect that no millisecond timeout option is used,
> and then switch to a more "sloppy" approach. That could perhaps involve
> getting the time immediately when curl_multi_socket*() is called and then
> simply not get it again (i.e keep returning the same) unless one of the
> functions is called that we know is time consuming.

Indeed. This is similar to what we are doing, plus correcting the timestamp in
the cases where looping takes long cycles. I would call it the sloppy++
approach :). If you look through the code and indicate the appropriate places
besides curl_multi_socket*(), I could try to cook up some patch proposal.

> >> 2 - dprintf_formatf() seems to take quite a lot of time as well
> >> How come this function is used so much?
> > -----------------------------------------------
> >                 0.00    0.00        1/239402     curl_msnprintf [67]
> >                 0.04    0.00    37300/239402     curl_mvaprintf [27]
> >                 0.08    0.01    79201/239402     curl_maprintf [21]
> >                 0.12    0.02   122900/239402     curl_mvsnprintf [16]
> > [12]    19.9    0.24    0.03   239402            dprintf_formatf [12]
> >                 0.02    0.00  5696319/5696319    addbyter [34]
> >                 0.01    0.00  7782971/7782971    alloc_addbyter [45]
>
> This is the snprintf() and similar calls within libcurl.
> > Just a hypotheses below:
> > We are using (as posted) options:
> > CURLOPT_VERBOSE with 1
> > CURLOPT_DEBUGFUNCTION with client_tracing_function
>
> Well, VERBOSE and DEBUGFUNCTION are for debugging and then we should be
> prepared to get extra function calls like this.
>
> > The profiling was done in a non-verbose mode.

Sorry for the poor explanation. The libcurl library verbose mode was always on
during these profiling experiments. It was curl-loader's verbose mode that was
off, which has another meaning, namely: don't print to the logfile any
messages below the CURLINFO_ERROR level (our patch splits CURLINFO_TEXT into
errors and indeed-info text).

The motivation for the patch:
- curl-loader verbose mode prints tons of libcurl messages, and it is
  important to grep fast for errors when treating GB-sized files;
- curl-loader non-verbose mode is normally used for heavy loads, when it is
  important to save CPU by filtering everything out of the logfile output and
  leaving only the really important errors.

By the way, do you think that this patch may be of interest to somebody else
besides curl-loader?
http://curl-loader.svn.sourceforge.net/viewvc/curl-loader/trunk/curl-loader/patches/curl-trace-info-error.patch?view=markup

Hopefully, this misunderstanding does not impact the analysis below:

> Then we should use these functions for only "core" creation of strings and
> not for debug/verbose messages. There are several things we can do to
> improve the situation:
> 1 - optimize buffer sizes for the typical sizes (for the aprintf() function)
> 2 - make sure we don't use *printf() functions when there's no need.
>     Possibly use memcpy() etc instead.
> 3 - make sure that the *_formatf() is written to perform well, especially
>     for the typical input/flags that we use (which I guess means %s).

Thank you.

Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...................................................................
Navigare necesse est, vivere non est necesse
...................................................................
http://curl-loader.sourceforge.net
An open-source HTTP/S, FTP/S traffic generating, and web testing tool.
From: Daniel S. <da...@ha...> - 2007-04-17 20:57:08
On Tue, 17 Apr 2007, Robert Iakobashvili wrote:

(Replying only to a part of the previous message - for now.)

> Sorry, for the poor explanation. libcurl library verbose mode was always on
> during these profiling experiments.

Ah, ok. That explains the extensive use of *printf(). In my view, it would be
more interesting to do the profiling without verbose mode.

> It was curl-loader's verbose mode was off which has another meaning, namely:
> - don't print to the logfile file any messages besides having CURLINFO_ERROR
> level (our patch to split CURLINFO_TEXT to errors and indeed info text).

I don't see the need for that patch/CURLINFO_ERROR. It simply makes the output
that would otherwise be stored in the CURLOPT_ERRORBUFFER get sent to the
debug callback instead, and there you filter out only that. I would rather
recommend you use CURLOPT_ERRORBUFFER instead, as it would then also reduce
the amount of *printf()s etc within libcurl.

--
Commercial curl and libcurl Technical Support: http://haxx.se/curl.html
From: Robert I. <cor...@gm...> - 2007-04-19 13:11:54
Daniel,
Thank you for the detailed explanations. We will look into it deeper.

On 4/19/07, Daniel Stenberg <da...@ha...> wrote:
> On Thu, 19 Apr 2007, Robert Iakobashvili wrote:
>
> > Could you recommend a way (sharing?) to decrease memory consumption per
> > client in libcurl?
>
> Using the multi interface it already shares connections and DNS cache
> automatically, and then there's not much more to share. The easy struct has
> a bunch of variables for internal house-keeping and then there are the
> buffers:
>
> There are no less than three buffers allocated at CURL_MAX_WRITE_SIZE (16K),
> so 16*3=48K is "wasted" there already. You can of course:
>
> 1 - change the code to alloc the required buffer(s) once they're needed, as
>     the upload buffer is rather pointless without uploads and I believe the
>     master_buffer can be skipped if not using pipelining...
>
> 2 - experiment with lowering the CURL_MAX_WRITE_SIZE value, but that might
>     of course also risk getting a slower throughput.
>
> --
> Commercial curl and libcurl Technical Support: http://haxx.se/curl.html

--
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...................................................................
Navigare necesse est, vivere non est necesse
...................................................................
http://curl-loader.sourceforge.net
An open-source HTTP/S, FTP/S traffic generating, and web testing tool.