Re: [cpp-netlib-devel] Help! Grammar for parsing HTTP URLs

Hi Divye,

On Sun, Aug 16, 2009 at 5:12 AM, Divye Kapoor<div...@gm...> wrote:
>
>     I went through your code and some of the documentation. However, while
> tracing the code flow, i was able to determine that the string
> "http://www.boost.org" was being passed using the range represented by
> (start_, end_). I couldn't find where the string "http:"  was being struck
> off from that range.

Actually, there are two places that do the parsing:

  - boost/network/uri/detail/url_parser.hpp -- function parse_url<>(...)
  - boost/network/uri/http_url.hpp -- function parse_special<>(...)

The 'http' is parsed by the function parse_url, which splits the scheme
('http') from the scheme-specific part ('//www.boost.org') and then
delegates the special parsing of the scheme-specific part to
parse_special. So by the time the range [start_,end_) is passed to
parse_special, it is just '//www.boost.org'.
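
A minimal sketch of that first split, for illustration only: split_scheme
is a hypothetical helper written with plain std::string, not the actual
parse_url<> from url_parser.hpp, and it assumes the scheme simply ends at
the first ':'.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (not the cpp-netlib parse_url<>): splits a URL at the
// first ':' into {scheme, scheme-specific part}. If there is no ':', the
// scheme is left empty and the whole input is returned as the second part.
std::pair<std::string, std::string> split_scheme(const std::string& url) {
    const std::string::size_type colon = url.find(':');
    if (colon == std::string::npos) {
        return std::make_pair(std::string(), url);
    }
    // The ':' itself belongs to neither part.
    return std::make_pair(url.substr(0, colon), url.substr(colon + 1));
}
```

Calling split_scheme("http://www.boost.org") gives "http" and
"//www.boost.org"; the second string is what the scheme-specific stage
would then see.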

The problem is that, with the grammar I already have in there,
www.boost.org seems to be parsed as the user instead of the host.
Basically I need something regex-like:

  //~([user]:[password]@)[host]~(:[port])

(where '~' denotes optional).
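
One way to read that pattern, sketched by hand with std::string rather
than Spirit: look for the '@' first, and only treat the text before it as
user info when an '@' is actually there. The names authority_parts and
parse_authority are hypothetical, this is not the grammar in
http_url.hpp, and bracketed IPv6 hosts are not handled.

```cpp
#include <string>

// Hypothetical result type for the pieces of "//[user[:password]@]host[:port]".
struct authority_parts {
    std::string user, password, host, port;
};

// Hypothetical hand-written parser for the scheme-specific part of an HTTP URL.
// Returns false if the leading "//" is missing or the host is empty.
bool parse_authority(const std::string& input, authority_parts& out) {
    const std::string::size_type npos = std::string::npos;
    out = authority_parts();
    if (input.compare(0, 2, "//") != 0) return false;

    // The authority runs up to the first '/', '?' or '#' after the "//".
    const std::string::size_type end = input.find_first_of("/?#", 2);
    const std::string auth = input.substr(2, end == npos ? npos : end - 2);

    // User info is present only if there is an '@'; otherwise everything
    // is host (and optional port), so the host is never taken as the user.
    std::string hostport = auth;
    const std::string::size_type at = auth.find('@');
    if (at != npos) {
        const std::string userinfo = auth.substr(0, at);
        hostport = auth.substr(at + 1);
        const std::string::size_type sep = userinfo.find(':');
        out.user = userinfo.substr(0, sep);
        if (sep != npos) out.password = userinfo.substr(sep + 1);
    }

    // An optional ":port" follows the host.
    const std::string::size_type colon = hostport.rfind(':');
    out.host = hostport.substr(0, colon);
    if (colon != npos) out.port = hostport.substr(colon + 1);
    return !out.host.empty();
}
```

With '//www.boost.org' as input this leaves user, password and port
empty and puts www.boost.org in host, which is the behaviour the grammar
needs to reproduce.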

Right now I'm trying a lot of things with a "longest-match" kind of
parser, maybe having lexemes of lexemes.

[snip]
>
> As there is nothing before the lit("//"). Probably, the first lexeme is
> picking up the "http" and the userinfo is getting all the rest of the URL
> ://www.boost.org (as there is no @ around). Unfortunately, I don't have an
> updated boost installation to test this out just yet (no Spirit 2 just yet).
> Why the grammar is ignoring the lit("//") is a mystery to me.
> Hope this helps somewhat (or I might be completely off track on this).

You might want to check out the latest boost trunk and let me know if
you get any farther with testing things out. :)

-- 
Dean Michael Berris
blog.cplusplus-soup.com | twitter.com/mikhailberis
linkedin.com/in/mikhailberis | facebook.com/dean.berris | deanberris.com