Re: [And-httpd-devel] and-server HTTP 1.1 features
From: James A. <ja...@an...> - 2006-10-04 07:19:26
"Adam Zell" <zel...@gm...> writes:

> Greetings,
>
> On 9/21/06, James Antill <ja...@an...> wrote:
>> "Adam Zell" <zel...@gm...> writes:
>>
>> > Hello,
>> >
>> > I had a quick look at And-HTTPD, and am curious if the following
>> > features are supported:
>> >
>> > * HTTP/1.1 pipelining (reduce TCP cold start)
>>
>> Yes, Vstr makes this much less work.
>
> Hmmm...I am not sure if this means it *is* implemented, or if it
> *could* be with a minimum of effort.

 Both. It is implemented, and was easy to do efficiently due to Vstr.

> To ensure we are referring to the same feature, pipelining allows a
> client to send multiple requests over the same connection without
> waiting for the initial response. The server must send back the
> responses in the same order as the requests.

 Yes, see httpd.c:httpd_serv_recv() for the start. Currently and-httpd
will not parse the next request if there is data to go out on a
previous one (as there seems little point) ... but it will continue to
read requests from the network (up to a configurable amount of data).
And if you have a pipeline of, say, OPTIONS requests (or HEAD, or are
using mmap() IO, etc.) then the code path looks like:

  httpd_serv_recv()      -> (network recv)
  http_parse_req()       -> http_req_op_get() / _opts() / _trace()
                         -> [ http_fin_fd_req() ]
  http_fin_req()         -> httpd_serv_send() -> (network send)
  httpd__serv_fin_send() -> http_parse_req() -> ...

...Ie. and-httpd will actually parse multiple requests from a single
network recv event. Note that and-httpd uses TCP_CORK, so while
scatter/gather is at the heart of Vstr ... it isn't required to have as
much data as possible in one writev() call.

> The feature is useful in that it allows a client to send multiple
> packets before blocking on a response, giving the TCP stack time to
> measure latency/window size/etc.
>
>> > * FastCGI, SCGI, LSAPI, etc. (external application support)
>>
>> Not at the moment; this is probably the next big thing that's needed.
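 The "parse multiple requests from a single recv event" behaviour can
be sketched roughly like this (this is a simplified illustration, not
and-httpd's actual parser; the function name and the "\r\n\r\n" head
delimiter scan are assumptions for the sketch):

```c
/* Sketch: count complete pipelined request heads in one recv() buffer,
 * consuming "\r\n\r\n"-terminated heads until only a partial one is
 * left.  Assumes a NUL-terminated buffer with no embedded NULs, which
 * lets us use strstr() instead of memmem(). */
#include <stddef.h>
#include <string.h>

/* Returns the number of complete request heads in buf; *left is set to
 * the number of trailing bytes belonging to a partial request. */
static size_t parse_pipelined(const char *buf, size_t len, size_t *left)
{
    size_t done = 0;
    size_t pos  = 0;

    while (pos < len)
    {
        const char *end = strstr(buf + pos, "\r\n\r\n");

        if (!end)
            break; /* partial request head: wait for more data */

        pos = (size_t)(end - buf) + 4; /* skip past the blank line */
        ++done;
    }

    *left = len - pos;
    return done;
}
```

So two pipelined GET/HEAD heads plus a partial third would be parsed as
two requests in one pass, with the trailing bytes kept for the next
recv.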
>> I mostly have a design, but haven't got around to implementing it.
>> Also I think that most people want less FastCGI etc. and more
>> controlled HTTP proxying type behaviour.
>> It doesn't help that I don't have anything that needs dynamic content
>> generation right now (at least anything that can't be done using the
>> and-ssi, scons, etc. tools).
>
> I think that there is still a large market for PHP/Ruby/Python
> connectors without a separate web server (note the momentum which
> lighttpd and LiteSpeed have gained by supporting RoR). The proxy
> approach makes sense when there are multiple back-ends, but that would
> imply some sort of load-balancing logic on the HTTP server. Have you
> given any thought to cluster management similar to a layer 5/7 switch?

 I've certainly given some thought to how to distribute dynamic
requests over multiple backends, but the first dynamic implementation
probably won't care a lot about how to sync multiple requests from a
single Cookied/etc. user to a single server.

>> > Have you done any comparisons or benchmarks to
>> > Cherokee/Lighttpd/LiteSpeed?
>>
>> I've done some personal benchmarks, but they were mostly against
>> itself (Ie. strace etc.), although I've done a few against thttpd and
>> some against gatling. While I read the lighttpd mailing list, I'm
>> reticent to do comparisons against even that, due to unfamiliarity
>> (and even if it won, it has a horrible design).
>
> The code is also somewhat bloated and difficult to follow. I would not
> use it on any production servers, but I would use thttpd. Cherokee
> seems advanced, but I haven't had a chance to play with it much. From
> what I can tell from the mailing list, stability is an issue.

 Right, I've personally not found lighttpd that great from a QoI point
of view. But then one of my first experiences with it was responding to
someone else's statement that it was "secure", upon which I grep'd the
source for 15 minutes and found a buffer overflow.
 But, then again, I've found a lot of conformance problems on edge
cases in Apache-httpd ... so I just think most people probably have
different values of "good" (and they do have more features and users,
so I'm probably also not with the majority of users either).

>> Also there's a big problem of a lack of usable benchmarking software,
>> and a lot of people have fallen back on using "ab", even though it's
>> a horribly inefficient program.
>
> Here are a couple which may be decent alternatives to ab:
>
> http://jakarta.apache.org/jmeter/
> http://grinder.sourceforge.net/

 I hadn't seen these, I'll have a look.

> http://www.hpl.hp.com/research/linux/httperf/

 This I have seen though, and while it's better than ab ... I wouldn't
call it efficient. The really hacky client the dietlibc guy did for
gatling (although horrible code) was very fast, so I've also used that.

> I believe most, if not all, of the above packages support coordinating
> load generation from multiple machines.

 Testing via multiple machines isn't worth my doing atm. as I don't
have a decent multi-GB network.

>> One thing I've thought about doing is a http-client type API, which I
>> could use to write a decent benchmark client, http-fuzz program and
>> dynamic content stuff. But there's a lot of things I'd like to do,
>> and I don't have time for all of them :)
>>
>> A lot of the speed comes from:
>>
>> 1) Pipelining (thttpd is probably one of the few that doesn't have
>> that though).
>
> Pipelining or persistent connections?

 There is little difference, with static content, if you are using
TCP_CORK (and good reason not to do things in parallel). In theory with
dynamic content, if you have "req1, req2, req3" and both "req1 and
req2" take 5 seconds, then you want to run them in parallel on the
server. However, a few things suggest to me that this isn't a good
idea:

1. Each req. shouldn't be taking that long in a "normal" website
design, IMO. You'll have stylesheets/images/favicon.ico requests in
there too.
2. The client could just as easily open multiple network connections.

3. You have inherent parallelism due to all the other network
connections, from other users (it being very rare for a server whose
efficiency you'd care about that much to only have a small number of
connections to the outside world).

4. There are a lot of clients that don't understand HTTP well, and
having rare failures for almost no gain is bad.

> Oh, I also forgot to ask about SSL support. I am not sure if/how
> OpenSSL fits into the secure design, given its crufty API.

 It's kind of planned, but it won't be in the And-httpd server itself
(mainly due to the fact I don't trust the OpenSSL or GNUTLS code). The
obvious choice is putting something like stunnel on the connection, in
a sandbox (so it can only read and write on the sockets).

>> 2) TCP_DEFER_ACCEPT (if you have a lot of dead connections, which
>> some benchmarks tend to have, you do no work at all).
>> 3) TCP_CORK for certain workloads (although I think the latest
>> lighttpd beta has that too).
>> 4) epoll/sendfile (although most decent webservers have had both of
>> those for a while now too).
>
> What systems do you plan to support? Solaris has a couple of different
> APIs for multiplexing (/dev/poll and Solaris 10 events) while *BSD has
> kqueue. I would guess Windows is not a priority.

 I plan on supporting them all (the POSIX ones), and it should be
close. However, for obvious reasons, everything but Linux is kind of
second tier ... so optimizations tend to get done there first
(FreeBSD's crappy sendfile API still isn't patched in, and that should
be simple) and the Linux API can affect the design where others don't
(splice/tee in Linux is likely to push me to getting NFS async
zero-copy IO into and-httpd, but all the kqueue extras don't make an
appearance).
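 For reference, the two Linux socket options mentioned above look
roughly like this in use (a minimal Linux-only sketch; the function
names and the 8-second defer value are made up for the example, and a
real server would check errno and handle retries):

```c
/* Sketch of TCP_DEFER_ACCEPT (don't wake the accepting process until
 * the client has actually sent data, so dead connections cost nothing)
 * and TCP_CORK (coalesce partial writes into full segments until
 * uncorked).  Both are Linux-specific TCP options. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Set before listen(); returns 0 on success, -1 on error. */
static int tune_listen_sock(int fd)
{
    int secs = 8; /* example: how long the kernel may defer an accept */

    return setsockopt(fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
                      &secs, sizeof(secs));
}

/* Cork (on_off = 1) before queuing header + body writes, uncork
 * (on_off = 0) to flush any held partial segment. */
static int cork(int fd, int on_off)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK,
                      &on_off, sizeof(on_off));
}
```

With TCP_CORK held across the header and body writes, the kernel sends
full segments even when the data arrives in several write calls, which
is why scatter/gather isn't strictly required to fill packets.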
 Saying that, adding the /dev/poll and kqueue event frameworks should
be simple patches to evnt.c

>> 5) Memory scalability due to Vstr, although this is mostly
>> unconfirmed by data ... it should be true, and is the hardest for
>> anything else to copy :).
>
> + scatter/gather IO. Have you thought of using custom memory
> allocators to reduce fragmentation, similar to memcached?

 A lot of the Vstr design is based around IO vectors, and it has a
custom memory allocator (and not just for efficiency reasons). Although
I wrote Vstr, I have no problem saying it is brilliant :). There is a
significant amount of documentation that I wrote for it.

-- 
James Antill -- ja...@an...
Need an efficient and powerful string library for C?
http://www.and.org/vstr/
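 The IO-vector idea behind Vstr's design can be illustrated with plain
writev(): header and body live in separate buffers but go out in one
syscall, with no copy into a contiguous buffer (a simplified sketch,
not Vstr's API; the function name is made up, and a real server would
handle short writes by advancing the iovec array and retrying):

```c
/* Sketch: scatter/gather output of an HTTP response whose header and
 * body are in separate buffers, via a single writev() call. */
#include <string.h>
#include <sys/uio.h>

/* Returns bytes written, or -1 on error. */
static ssize_t send_response(int fd, const char *head, const char *body)
{
    struct iovec vec[2];

    vec[0].iov_base = (void *)head; vec[0].iov_len = strlen(head);
    vec[1].iov_base = (void *)body; vec[1].iov_len = strlen(body);

    return writev(fd, vec, 2);
}
```

A string library built on such vectors can append, prepend, and splice
data by adjusting the vector list instead of memmove()ing bytes, which
is where the memory-scalability claim comes from.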