Re: [cpp-netlib-devel] Adding files to the sandbox

Hi Jose,

you are right, there is a Spirit-based URI parser in mini-httpd.
Unfortunately, it is incomplete insofar as it understands HTTP
URLs only and doesn't recognize literal IPv6 host names. In
addition, the code is a bit messy because it was written several
years ago, at a time when Spirit didn't have the sophisticated
actor infrastructure it has these days.

In my experience, the greatest challenge when parsing a URI is
not the parser, it is the resulting data structure. The URI class
mini-httpd uses is fine for mini-httpd, but it certainly is far
from generic.

RFC 2396 comes with a regular expression for parsing URIs (in
Appendix B), by the way, and it's pretty wild:

 |    ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 |     12            3  4          5       6  7        8 9

The relevant sub-match states are:

 |    scheme    = $2
 |    authority = $4
 |    path      = $5
 |    query     = $7
 |    fragment  = $9
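
To make the mapping concrete, here is a minimal sketch of how that
expression could fill a simple result structure. It uses std::regex
rather than Boost.Regex only so that it compiles without linking
anything extra; the pattern itself is the one quoted above. The names
uri_parts and split_uri are made up for this example and are not
mini-httpd's URI class.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical result type for illustration only.
struct uri_parts {
    std::string scheme;
    std::string authority;
    std::string path;
    std::string query;
    std::string fragment;
};

// Splits a URI reference into its five components using the
// regular expression from RFC 2396, Appendix B.
// Returns false if the input does not match (e.g. it contains a newline).
bool split_uri(const std::string& input, uri_parts& out) {
    // Compiled once; the regex object is never modified afterwards.
    static const std::regex re(
        R"re(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)re");

    std::smatch m;
    if (!std::regex_match(input, m, re)) {
        return false;
    }
    assert(m.size() == 10);  // whole match plus nine sub-matches

    out.scheme    = m[2].str();
    out.authority = m[4].str();
    out.path      = m[5].str();
    out.query     = m[7].str();
    out.fragment  = m[9].str();
    return true;
}
```

For the RFC's own example, http://www.ics.uci.edu/pub/ietf/uri/#Related,
this yields scheme "http", authority "www.ics.uci.edu", path
"/pub/ietf/uri/", an empty query, and fragment "Related". Note that the
sketch does not distinguish an absent component from an empty one; the
matched flag of sub-matches 1, 3, 6 and 8 would be needed for that.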

In terms of performance, I doubt that there is much of a
difference between Spirit and Boost.Regex in this context. The
main difference is that Boost.Regex must be linked whereas Spirit
is a header-only library. That may or may not matter to our
users; it's hard to tell.

One disadvantage of Spirit is that compile times go through the
roof even for trivial grammars. Another problem is that Spirit
relies on rather sophisticated magic to be thread-safe. Compiled
regular expressions, however, are immutable and can be used by
any number of threads concurrently without synchronization.

A hand-written parser might be slightly faster than either Spirit
or Boost.Regex. It's definitely harder to get right, though. :-)

Best regards,
Peter