Re: curl-loader https: low performance due to DNS query?
From: Robert I. <cor...@gm...> - 2012-11-24 15:06:43
Hi Fred,

I think you could take this suggestion and your observations to the c-ares
and curl development mailing lists.

Take care, and thanks,
Robert

On Sat, Nov 24, 2012 at 5:00 PM, Fred Huang <di...@gm...> wrote:
> Hi all,
>
> Is it possible to use adns (http://www.chiark.greenend.org.uk/~ian/adns/)
> to do DNS resolving instead of libc-ares? All my test results show that
> libc-ares can do no more than 300 queries per second. However, using
> 'dig +short -f hostnames.list &>/dev/null' against the same DNS server, I
> can get more than 7500 queries per second. I think libc-ares is also
> limited by the system API of gethostbyname.
>
> About ADNS lib:
> Many clients for DNS resolution are coded poorly. Most UNIX systems
> provide an implementation of gethostbyname (the DNS client API, or
> application program interface), which cannot concurrently handle multiple
> outstanding requests. Therefore, the crawler cannot issue many resolution
> requests together and poll at a later time for completion of individual
> requests, which is critical for acceptable performance. Furthermore, if
> the system-provided client is used, there is no way to distribute load
> among a number of DNS servers. For all these reasons, many crawlers
> choose to include their own custom client for DNS name resolution. The
> Mercator crawler from Compaq System Research Center reduced the time
> spent in DNS from as high as 87% to a modest 25% by implementing a custom
> client. The ADNS asynchronous DNS client library is ideal for use in
> crawlers.
>
> In spite of these optimizations, a large-scale crawler will spend a
> substantial fraction of its network time waiting not for HTTP data
> transfer but for address resolution. For every hostname that has not
> been resolved before (which happens frequently with crawlers), the local
> DNS may have to go across many network hops to fill its cache for the
> first time. To overlap this unavoidable delay with useful work,
> prefetching can be used.
> When a page that has just been fetched is parsed, a stream of HREFs is
> extracted. Right at this time, that is, even before any of the
> corresponding URLs are fetched, hostnames are extracted from the HREF
> targets, and DNS resolution requests are made to the caching server. The
> prefetching client is usually implemented using UDP instead of TCP, and
> it does not wait for resolution to be completed. The request serves only
> to fill the DNS cache so that resolution will be fast when the page is
> actually needed later on.
>
> ===end
>
>
> 2012/3/5 Fred Huang <di...@gm...>
>>
>> test 1:
>>
>> DNS server: dnsmasq@127.0.0.1, 2,000,000-entry DNS cache, resolving *.com
>> to one IP address
>> number of domain names in all HTTPS URLs: 780,000
>> number of clients: 1000
>> CPU usage: 78%
>>   cpu%   irq%   sirq%   sys%   iowt%   mem_used    buf&cached
>>   78.8    0.0     0.6    3.8     0.0   3640.9Mb    282.1Mb
>> SSL TPS: 300
>> SSL throughput: 200Mbps
>>
>> # gprof /usr/bin/curl-loader gmon.out -p | head -50
>> Flat profile:
>>
>> Each sample counts as 0.01 seconds.
>>      %  cumulative      self                  self     total
>>   time     seconds   seconds       calls   s/call   s/call  name
>>  38.55     127.10    127.10      130975     0.00     0.00  Curl_hash_clean_with_criterium
>>  24.63     208.32     81.22  1247528407     0.00     0.00  hostcache_timestamp_remove
>>  13.08     251.44     43.12     2791988     0.00     0.00  Curl_hash_pick
>>   7.30     275.51     24.07      352490     0.00     0.00  Curl_hash_add
>>   6.01     295.31     19.80   410760770     0.00     0.00  Curl_str_key_compare
>>   2.95     305.04      9.73      131015     0.00     0.00  create_conn
>>   0.88     307.94      2.90     4841390     0.00     0.00  dprintf_formatf
>>   0.74     310.38      2.44      136036     0.00     0.00  ConnectionStore
>>   0.48     311.96      1.58   814604911     0.00     0.00  ares__is_list_empty
>>   0.36     313.16      1.20                                locking_function
>>   0.27     314.06      0.90      261608     0.00     0.00  ares_cancel
>>   0.22     314.79      0.73      127417     0.00     0.00  curl_multi_remove_handle
>>   0.19     315.40      0.62     8906301     0.00     0.00  client_tracing_function
>>   0.17     315.97      0.57      135305     0.00     0.00  Curl_num_addresses
>>   0.17     316.54      0.57   181937907     0.00     0.00  addbyter
>>   0.17     317.09      0.55     2374363     0.00     0.00  multi_runsingle
>>   0.16     317.62      0.53      541954     0.00     0.00  ares__init_list_node
>>   0.16     318.14      0.52      494412     0.00     0.00  curl_multi_socket_action
>>   0.14     318.60      0.46    33578897     0.00     0.00  Curl_socket_check
>>   0.13     319.04      0.44   102951052     0.00     0.00  Curl_raw_toupper
>>   0.13     319.48      0.44      787558     0.00     0.00  Curl_readwrite
>>   0.13     319.91      0.44                                id_function
>>   0.12     320.29      0.38      391390     0.00     0.00  Curl_hash_str
>>   0.09     320.58      0.29        9216     0.00     0.00  curl_multi_perform
>>   0.08     320.86      0.28     2350908     0.00     0.00  Curl_raw_equal
>>   0.08     321.14      0.28     2019415     0.00     0.00  Curl_splay
>>   0.08     321.40      0.26      520625     0.00     0.00  ossl_connect_common
>>   0.08     321.65      0.25     3532992     0.00     0.00  Curl_pgrsUpdate
>>   0.07     321.87      0.22   131778537     0.00     0.00  curl_strequal
>>   0.05     322.05      0.18     8954118     0.00     0.00  scan_response
>>   0.05     322.23      0.18      533644     0.00     0.00  ares_expand_name
>>   0.05     322.40      0.17     2477345     0.00     0.00  fd_key_compare
>>   0.05     322.57      0.17     2429135     0.00     0.00  Curl_infof
>>   0.05     322.74      0.17      130966     0.00     0.00  singleipconnect
>>   0.05     322.90      0.16        9292     0.00     0.00  curl_multi_socket_all
>>   0.05     323.06      0.16      270137     0.00     0.00  ares__get_hostent
>>   0.05     323.21      0.15     8943657     0.00     0.00  Curl_debug
>>   0.04     323.35      0.14      262493     0.00     0.00  Curl_ssl_getsessionid
>>   0.04     323.48      0.13      566006     0.00     0.00  socket_callback
>>   0.04     323.61      0.13    33983864     0.00     0.00  curlx_tvdiff
>>   0.04     323.74      0.13      132853     0.00     0.00  Curl_http_readwrite_headers
>>   0.04     323.86      0.12      344032     0.00     0.00  epoll_del
>>   0.04     323.98      0.12      223477     0.00     0.00  Curl_poll
>>   0.04     324.10      0.12     8526280     0.00     0.00  Curl_raw_nequal
>>   0.03     324.21      0.11      348839     0.00     0.00  event_del
>>
>>
>> test 2:
>>
>> DNS server: dnsmasq@127.0.0.1, 2,000,000-entry DNS cache, resolving *.com
>> to one IP address
>> number of domain names in all HTTPS URLs: 1
>> number of clients: 1000
>> CPU usage: 75%
>> SSL TPS: 1300
>> SSL throughput: 700Mbps
>>
>> # gprof /usr/bin/curl-loader gmon.out -p | head -50
>> Flat profile:
>>
>> Each sample counts as 0.01 seconds.
>>      %  cumulative      self                  self     total
>>   time     seconds   seconds       calls   s/call   s/call  name
>>   8.98       2.45      2.45     5127312     0.00     0.00  dprintf_formatf
>>   8.90       4.88      2.43  1292796503     0.00     0.00  ares__is_list_empty
>>   6.12       6.55      1.67      322461     0.00     0.00  create_conn
>>   4.58       7.80      1.25      419872     0.00     0.00  ares_cancel
>>   4.10       8.92      1.12     2100162     0.00     0.00  Curl_readwrite
>>   3.92       9.99      1.07     4527388     0.00     0.00  Curl_hash_pick
>>   3.43      10.93      0.94                                locking_function
>>   3.04      11.76      0.83      909759     0.00     0.00  curl_multi_socket_action
>>   3.00      12.58      0.82       97252     0.00     0.00  Curl_hash_clean_with_criterium
>>   2.71      13.32      0.74   179704016     0.00     0.00  Curl_raw_toupper
>>   2.68      14.05      0.73     4646524     0.00     0.00  multi_runsingle
>>   2.44      14.71      0.67    13862992     0.00     0.00  client_tracing_function
>>   2.42      15.37      0.66      319735     0.00     0.00  curl_multi_remove_handle
>>   2.42      16.03      0.66          22     0.03     0.03  ares__init_list_node
>>   1.94      16.56      0.53     9930244     0.00     0.00  hostcache_timestamp_remove
>>   1.94      17.09      0.53   169320088     0.00     0.00  addbyter
>>   1.80      17.58      0.49                                id_function
>>   1.36      17.95      0.37     7057146     0.00     0.00  Curl_pgrsUpdate
>>   1.36      18.32      0.37       16835     0.00     0.00  curl_multi_socket_all
>>   1.25      18.66      0.34       16719     0.00     0.00  curl_multi_perform
>>   1.21      18.99      0.33      329403     0.00     0.00  Curl_http_readwrite_headers
>>   1.17      19.31      0.32     4036624     0.00     0.00  Curl_splay
>>   1.14      19.62      0.31    20838902     0.00     0.00  Curl_raw_nequal
>>   0.82      19.85      0.23     2764123     0.00     0.00  Curl_infof
>>   0.81      20.07      0.22     5933857     0.00     0.00  ossl_recv
>>   0.77      20.28      0.21      323112     0.00     0.00  Curl_splayremovebyaddr
>>   0.75      20.48      0.21     5943991     0.00     0.00  Curl_read
>>   0.70      20.67      0.19      425174     0.00     0.00  event_del
>>   0.70      20.86      0.19       98652     0.00     0.00  Curl_if_is_interface_name
>>   0.70      21.05      0.19     8677733     0.00     0.00  Curl_socket_check
>>   0.59      21.21      0.16     4129554     0.00     0.00  fd_key_compare
>>   0.57      21.37      0.16    14017019     0.00     0.00  Curl_debug
>>   0.55      21.52      0.15     9376861     0.00     0.00  stat_data_in_add
>>   0.55      21.67      0.15     6840468     0.00     0.00  Curl_timeleft
>>   0.55      21.82      0.15     2054602     0.00     0.00  Curl_raw_equal
>>   0.55      21.97      0.15     1291611     0.00     0.00  Curl_expire
>>   0.51      22.11      0.14    16570063     0.00     0.00  curlx_tvnow
>>   0.51      22.25      0.14    13863284     0.00     0.00  scan_response
>>   0.51      22.39      0.14     5498864     0.00     0.00  Curl_setopt
>>   0.48      22.52      0.13    97490291     0.00     0.00  curl_strequal
>>   0.44      22.64      0.12      324253     0.00     0.00  Curl_http
>>   0.44      22.76      0.12      394386     0.00     0.00  ossl_connect_common
>>   0.40      22.87      0.11     3581335     0.00     0.00  Curl_getinfo
>>   0.38      22.97      0.11    26983092     0.00     0.00  alloc_addbyter
>>   0.37      23.07      0.10     8729044     0.00     0.00  Curl_client_write
>>
>

-- 
Regards,
Robert Iakobashvili, Ph.D.
Home: http://www.ghotit.com
Ghotit Dyslexia -> Das Ist Real Writer
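The prefetching idea in the quoted ADNS passage works like this: send a UDP query for each newly seen hostname and never wait for the answer, so the caching server's cache is already warm when the real lookup happens. It can be sketched in Python as below. This is a minimal sketch under stated assumptions, not adns, c-ares, or curl-loader code. `build_dns_query` and `prefetch` are hypothetical helpers. The query is a bare RFC 1035 A-record question with only the recursion-desired flag set. The resolver address 127.0.0.1:53 is an assumed default chosen to match the local dnsmasq used in the tests.

```python
import socket
import struct


# Hypothetical helper: encode a minimal RFC 1035 query asking for the A record
# of `hostname`. Only the RD (recursion desired) flag is set in the header.
def build_dns_query(hostname: str, query_id: int) -> bytes:
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id & 0xFFFF, 0x0100, 1, 0, 0, 0)
    # QNAME: every dot-separated label is length-prefixed; a zero byte ends it.
    qname = b""
    for label in hostname.rstrip(".").split("."):
        encoded = label.encode("ascii")  # non-ASCII names raise UnicodeEncodeError
        if not 0 < len(encoded) < 64:
            raise ValueError(f"invalid DNS label in {hostname!r}")
        qname += bytes([len(encoded)]) + encoded
    qname += b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


# Hypothetical helper: fire-and-forget prefetch. One UDP datagram per hostname
# is sent to the caching server (assumed to be 127.0.0.1:53); replies are never
# read, because the only goal is to warm the server's cache.
def prefetch(hostnames, server=("127.0.0.1", 53)) -> int:
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setblocking(False)  # never stall the caller on a full send buffer
        for query_id, name in enumerate(hostnames):
            try:
                sock.sendto(build_dns_query(name, query_id), server)
                sent += 1
            except (OSError, ValueError):
                # Best effort: skip names that are malformed or cannot be sent now.
                continue
    return sent
```

For example, `prefetch(["example.com", "example.org"])` hands two datagrams to the kernel and returns how many were actually sent. Because no reply is awaited, the cost per hostname is one `sendto` call. A real prefetcher would also need rate limiting so it does not flood the caching server.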
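A little arithmetic helps compare the two flat profiles quoted above. This rests on an assumption: that, as in the libcurl sources of that period, `hostcache_timestamp_remove` is the per-entry callback that `Curl_hash_clean_with_criterium` runs when libcurl prunes its DNS cache. If so, the ratio of their call counts approximates how many cache entries each pruning pass visits. The Python sketch below uses hypothetical helpers (`parse_flat_profile`, `prune_summary`, and the `TEST1`/`TEST2` excerpts are not part of curl-loader or gprof). It parses rows in the gprof flat-profile layout and computes that ratio, together with the combined share of samples for the two functions.

```python
from typing import Dict, NamedTuple, Optional


class Row(NamedTuple):
    pct_time: float        # "% time" column
    self_seconds: float    # "self seconds" column
    calls: Optional[int]   # "calls" column; None for rows printed without it


def parse_flat_profile(text: str) -> Dict[str, Row]:
    """Parse rows of a gprof flat profile (gprof -p) into {function: Row}."""
    rows: Dict[str, Row] = {}
    for line in text.splitlines():
        fields = line.split()
        try:
            if len(fields) == 7:    # time, cumulative, self, calls, self/call, total/call, name
                rows[fields[6]] = Row(float(fields[0]), float(fields[2]), int(fields[3]))
            elif len(fields) == 4:  # time, cumulative, self, name (no call count)
                rows[fields[3]] = Row(float(fields[0]), float(fields[2]), None)
        except ValueError:
            continue  # header or other non-numeric lines
    return rows


def prune_summary(text: str) -> Dict[str, float]:
    """Combined time share and average entries visited per DNS-cache prune."""
    rows = parse_flat_profile(text)
    clean = rows["Curl_hash_clean_with_criterium"]
    visit = rows["hostcache_timestamp_remove"]
    return {
        "pct_time": round(clean.pct_time + visit.pct_time, 2),
        "entries_per_prune": round(visit.calls / clean.calls, 1),
    }


# Rows copied from the two profiles quoted in this thread.
TEST1 = """
 38.55     127.10    127.10      130975     0.00     0.00  Curl_hash_clean_with_criterium
 24.63     208.32     81.22  1247528407     0.00     0.00  hostcache_timestamp_remove
"""
TEST2 = """
  3.00      12.58      0.82       97252     0.00     0.00  Curl_hash_clean_with_criterium
  1.94      16.56      0.53     9930244     0.00     0.00  hostcache_timestamp_remove
"""

for label, excerpt in (("test 1", TEST1), ("test 2", TEST2)):
    print(label, prune_summary(excerpt))
```

Applied to these rows, test 1 (780,000 domain names) gives about 9,525 entries per prune and a combined 63.18% of samples. Test 2 (a single domain name) gives about 102 entries and 4.94%. One possible reading is that much of the test 1 CPU time goes to walking libcurl's in-process DNS cache. If so, that cost would be worth checking alongside the resolver throughput discussed in this thread.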