From: Peter S. <si...@cr...> - 2007-06-03 21:18:03
Hi Jose,

you are right, there is a Spirit-based URI parser in mini-httpd. Unfortunately, it is incomplete in that it understands only HTTP URLs and doesn't recognize literal IPv6 host names. In addition, the code is a bit messy because it was written several years ago, at a time when Spirit didn't have the sophisticated actor infrastructure it has these days.

In my experience, the greatest challenge when parsing a URI is not the parser, it is the resulting data structure. The URI class mini-httpd uses is fine for mini-httpd, but it certainly is far from generic.

RFC 2396 comes with a regular expression for parsing URIs (Appendix B), by the way, and it's pretty wild:

  | ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
  |  12            3  4          5       6  7        8 9

The digits in the second line mark where each sub-match begins. The relevant sub-matches are:

  | scheme    = $2
  | authority = $4
  | path      = $5
  | query     = $7
  | fragment  = $9

In terms of performance, I doubt that there is much of a difference between Spirit and Boost.Regex in this context. The main difference is that Boost.Regex must be linked whereas Spirit is a header-only library. That may or may not matter to our users; it's hard to tell.

One disadvantage of Spirit is that compile times go through the roof even for trivial grammars. Another problem is that Spirit relies on rather sophisticated magic to be thread-safe. Compiled regular expressions, however, are immutable and can be used by any number of threads concurrently without synchronization.

A hand-written parser might be slightly faster than either Spirit or Boost.Regex. It's definitely harder to get right, though. :-)

Best regards,
Peter
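
P.S. To make the regex approach a bit more concrete, here is a minimal sketch of how the RFC 2396 expression could be turned into a small result structure. It is not the mini-httpd code. The names uri_parts and split_uri are made up for illustration, and it uses std::regex with std::optional (so it assumes C++17) rather than Boost.Regex, simply to keep the example free of link dependencies. The optional fields are there to distinguish "component absent" from "component present but empty", which is one of the data-structure questions mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <regex>
#include <string>

// Hypothetical result type; the field names follow the RFC's component names.
struct uri_parts {
    std::optional<std::string> scheme;     // absent for relative references
    std::optional<std::string> authority;  // absent if there is no "//" part
    std::string path;                      // always present, possibly empty
    std::optional<std::string> query;      // absent if there is no "?"
    std::optional<std::string> fragment;   // absent if there is no "#"
};

// Hypothetical helper: splits a URI reference into its components using the
// regular expression from RFC 2396, Appendix B. It only splits; it does not
// validate or decode the individual components.
inline std::optional<uri_parts> split_uri(const std::string& input)
{
    // Compiled once on first use and only read afterwards.
    static const std::regex re(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");

    std::smatch m;
    if (!std::regex_match(input, m, re))
        return std::nullopt;   // e.g. a line break inside the fragment

    assert(m.size() == 10);    // whole match plus nine sub-matches

    // The outer group says whether the component is present at all,
    // the inner group holds its text without the delimiter.
    auto component = [&m](std::size_t outer, std::size_t inner)
        -> std::optional<std::string> {
        if (!m[outer].matched)
            return std::nullopt;
        return m[inner].str();
    };

    uri_parts parts;
    parts.scheme    = component(1, 2);
    parts.authority = component(3, 4);
    parts.path      = m[5].str();
    parts.query     = component(6, 7);
    parts.fragment  = component(8, 9);
    return parts;
}
```

Calling split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related") should yield the scheme "http", the authority "www.ics.uci.edu", the path "/pub/ietf/uri/", no query, and the fragment "Related". Anything beyond that (splitting the authority into user, host and port, IPv6 literals, percent-decoding) is left out on purpose, and that is exactly where a generic URI class gets hard.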