From: Dean M. B. <mik...@gm...> - 2009-08-14 19:46:54
Hi guys,

I've run into a little conundrum. Here's the problem: I want to be able to parse an optional username and password in an HTTP URL, as in 'http://user:password@host/', with Boost.Spirit2x (the one in Boost trunk). So far every attempt has either given me compile-time errors or, once I get past the compile errors, put the host in the user field. Here is my grammar so far:

    bool ok = phrase_parse(
        start_, end_,
        (
            lit("//")
            >> -lexeme[*(char_ - ':')]
            >> -lexeme[':' >> *(char_ - '@')]
            >> -lexeme['@']
            >> +(char_ - '/')
            >> -lexeme['/' >> *(char_ - '?')]
            >> -lexeme['?' >> *(char_ - '#')]
            >> -lexeme['#' >> *char_]
        ),
        space,
        result
    );

I have committed the failing tests and the grammar to the repository (revision 149, in branches/urllib-dean).

Any Spirit2x users out there willing to lend a hand? Thanks in advance.

--
Dean Michael Berris
blog.cplusplus-soup.com | twitter.com/mikhailberis
linkedin.com/in/mikhailberis | facebook.com/dean.berris | deanberris.com
From: Kim G. <kim...@gm...> - 2009-08-14 20:21:58
Hi Dean,

I know I struggled with this when I did it with Spirit + Phoenix; I don't know how Spirit 2 is different. The only way I could find was to keep a buffer variable containing the possible user info, and then commit to it once an @ was found. See my confusion in action here:

http://cpp-netlib.svn.sourceforge.net/viewvc/cpp-netlib/branches/uri/boost/network/uri.hpp?revision=143&view=markup

I don't know if that helps at all, but maybe you can find inspiration somehow... It looks like the Spirit 2 grammar has an entirely different form, so I don't really see how it ties into the Spirit 1 model.

Cheers,
- Kim
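Kim's buffer-and-commit idea does not actually depend on Spirit or Phoenix. A rough plain C++ illustration, assuming the input is the scheme-specific part (starting with `//`) and contains at most one `'@'`; `authority_split` and `split_authority` are hypothetical names, not taken from uri.hpp:

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type used only in this sketch.
struct authority_split {
    bool has_user_info = false;
    std::string user_info;   // "user" or "user:password"
    std::string host_port;   // "host" or "host:port"
    std::string rest;        // path, query and fragment
};

// Buffer-and-commit: scan the authority once, collecting characters into a
// tentative buffer. If an '@' turns up, the buffer is committed as user info
// and a fresh buffer starts; otherwise the buffer is the host[:port].
// Assumes the input starts with "//" and contains at most one '@'.
bool split_authority(const std::string& s, authority_split& out) {
    if (s.compare(0, 2, "//") != 0) return false;
    std::string buffer;
    std::size_t i = 2;
    for (; i < s.size() && s[i] != '/' && s[i] != '?' && s[i] != '#'; ++i) {
        if (s[i] == '@') {
            out.has_user_info = true;
            out.user_info = buffer;  // commit what we have seen so far
            buffer.clear();
        } else {
            buffer += s[i];
        }
    }
    out.host_port = buffer;
    out.rest = s.substr(i);
    return !out.host_port.empty();
}
```

Splitting `user_info` on `':'` and `host_port` on `':'` is then a second, unambiguous step.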
From: Dean M. B. <mik...@gm...> - 2009-08-15 15:24:11
Hi Kim!

On Sat, Aug 15, 2009 at 4:21 AM, Kim Gräsman <kim...@gm...> wrote:
>
> I know I struggled with this when I did it with Spirit + Phoenix, I
> don't know how Spirit 2 is different. The only way I could find was to
> keep a buffer variable containing the possible user info, and then
> commit to it once an @ was found.
>

Yeah, that's one way. The other way I was struggling with was a "longest match", à la regex, where if you find an @ character, what you've seen before it is something you deal with differently. I tried doing something with the 'lexeme' parser with multiple nested lexemes -- this seemed to work, except that I can't seem to get the parsing grammar right. I've avoided creating my own parser type and tried to do everything inline to keep it simple, but that is proving to be pretty hard.

> See my confusion in action here:
> http://cpp-netlib.svn.sourceforge.net/viewvc/cpp-netlib/branches/uri/boost/network/uri.hpp?revision=143&view=markup
>
> I don't know if that helps at all, but maybe you can find inspiration
> somehow... It looks like the Spirit 2 grammar has an entirely
> different form, so I don't really see how it ties into the Spirit 1
> model.
>

Thanks for the link; yes, I see the approach that seems to work. However, I'm not very keen on using Spirit 1 anymore, having seen that the performance and expressiveness of Spirit 2x seem to be better. For instance, it's more efficient not having to use Phoenix and just have direct storage for assigning resulting values.

Maybe you want to try your hand at Spirit 2x, translating the logic you have there without having to use Phoenix explicitly? Maybe you can express it as a normal "longest match" parser? :D

Thanks again Kim. :)
From: Divye K. <div...@gm...> - 2009-08-15 21:12:30
Hi Dean,

I went through your code and some of the documentation. While tracing the code flow, I was able to determine that the string "http://www.boost.org" was being passed using the range represented by (start_, end_). I couldn't find where the string "http:" was being struck off from that range.

>
> bool ok = phrase_parse(
>     start_, end_,
>     (
>         lit("//")
>         >> -lexeme[*(char_ - ':')]
>         >> -lexeme[':' >> *(char_ - '@')]
>         >> -lexeme['@']
>         >> +(char_ - '/')
>         >> -lexeme['/' >> *(char_ - '?')]
>         >> -lexeme['?' >> *(char_ - '#')]
>         >> -lexeme['#' >> *char_]
>     ),
>     space,
>     result
> );
>

As there is nothing before the lit("//"), the first lexeme is probably picking up "http" and the userinfo is getting all the rest of the URL, "://www.boost.org" (as there is no @ around). Unfortunately, I don't have an updated Boost installation to test this out (no Spirit 2 just yet). Why the grammar is ignoring the lit("//") is a mystery to me.

Hope this helps somewhat (or I might be completely off track on this).

Sincerely,
Divye
From: Dean M. B. <mik...@gm...> - 2009-08-16 04:27:28
Hi Divye,

On Sun, Aug 16, 2009 at 5:12 AM, Divye Kapoor <div...@gm...> wrote:
>
> I went through your code and some of the documentation. However, while
> tracing the code flow, i was able to determine that the string
> "http://www.boost.org" was being passed using the range represented by
> (start_, end_). I couldn't find where the string "http:" was being struck
> off from that range.

Actually, there are two places that do the parsing:

- boost/network/uri/detail/url_parser.hpp -- function parse_url<>(...)
- boost/network/uri/http_url.hpp -- function parse_special<>(...)

The 'http' is parsed by parse_url, which separates the scheme ('http') from the scheme-specific part ('//www.boost.org') and then delegates parsing of the scheme-specific part to parse_special. So by the time the range [start_, end_) reaches parse_special, it is just '//www.boost.org'.

The problem is that, with the grammar I have in there now, www.boost.org gets parsed as the user instead of the host. Basically I need something regex-like:

    //~([user]:[password]@)[host]~(:[port])

(where '~' denotes optional). Right now I'm trying a lot of things with a "longest-match" kind of parser, maybe having lexemes of lexemes.

[snip]

>
> As there is nothing before the lit("//"). Probably, the first lexeme is
> picking up the "http" and the userinfo is getting all the rest of the URL
> ://www.boost.org (as there is no @ around). Unfortunately, I don't have an
> updated boost installation to test this out just yet (no Spirit 2 just yet).
> Why the grammar is ignoring the lit("//") is a mystery to me.
> Hope this helps somewhat (or I might be completely off track on this).

You might want to check out the latest Boost trunk and let me know if you get any further with testing things out. :)
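Dean's regex-like description maps fairly directly onto an ECMAScript regular expression, where the optional userinfo group only matches if its `'@'` is found. A minimal sketch using C++11 `std::regex`; `match_authority` is a hypothetical helper, and the character classes are simplified rather than RFC 1738-exact:

```cpp
#include <regex>
#include <string>

// Hypothetical helper for the pattern  //~([user]:[password]@)[host]~(:[port])
// Capture groups: 1 = user, 2 = password, 3 = host, 4 = port, 5 = rest.
// The smatch points into `s`, so `s` must outlive `m`.
bool match_authority(const std::string& s, std::smatch& m) {
    static const std::regex pattern(
        R"(//(?:([^:@/]+)(?::([^@/]*))?@)?([^:/?#]+)(?::([0-9]+))?(.*))");
    return std::regex_match(s, m, pattern);
}
```

Because `regex_match` backtracks, a URL without `'@'` simply skips group 1 and 2 instead of putting the host in the user field, which is the behaviour the Spirit grammar needs to reproduce.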
From: Dean M. B. <mik...@gm...> - 2009-08-16 05:27:08
Update: I think I got it! :) Please check out the source in branches/urllib-dean -- I think it just has something to do with understanding how to use the new primitive parsers in Spirit 2x.

Thanks to those who responded and gave me an idea of how to go about things. :)
From: John P. F. <jf...@ov...> - 2009-08-16 14:40:10
Sorry to chime in late, as always. I just dumped the in-progress work that I was doing into the http_integration_jf branch. This was done a number of months ago, prior to my life-forcing break from this project. The only thing that is functional is the URI parser. This Spirit implementation is inspired by Braden McDaniel's uri-grammar. The gist of the design is that there was a main class which did structural grammar checking of the URL, and then a family of reusable grammar classes which corresponded to the HTTP components. To see this in action, build the tests under libs/uri/test. For now, I hope this lends some insight. I should get around to cleaning this branch up in the near future.

John
From: Dean M. B. <mik...@gm...> - 2009-08-16 16:00:48
Hey John!

On Sun, Aug 16, 2009 at 6:26 PM, John P. Feltz <jf...@ov...> wrote:
> Sorry to chime in late as always,

No worries, better late than never. ;-)

>
> I just dumped the in-progress work that I was doing into
> http_integration_jf. This was done a number of months ago, prior to my
> life-forcing break from this project. The only thing that is functional
> -is the uri parser. This spirit implementation is inspired from Braden
> McDaniel's uri-grammar. The gist of the design is that there was a main
> class which did structural grammar checking of the url, and then a
> family of re-usable grammar classes which corresponded to the http
> components. To see this in action: build the tests under
> libs/uri/test.For now I hope this lends some insight. I should get
> around to cleaning this branch up in the near future.
>

Cool! Are you using Boost.Spirit 2x? I haven't been looking at these changes closely.

What I've already started doing is having a base URL class from which all specific URL families (HTTP, FTP, etc.) derive. I've based my implementation on RFC 1738. I use a two-step parsing approach: first a generic parse that separates the scheme from the scheme-specific part, then a call to a 'parse_special' function that parses the scheme-specific part.

The basic_url<tags::default_> implementation is a bare basic_url<> that just supports the protocol(...) and rest(...) functions. The specialization of basic_url<...> for HTTP URLs is basic_url<tags::http>, and the parsing specific to HTTP URLs is encapsulated in parse_special<traits::string<tags::http>::type, tags::http>(...). This allows anyone to create a specialization of basic_url<...> for the special parsing of FTP, "mailto", etc.

Maybe we can merge the work together in a branch just for the URL parsing, then make the http_message implementation use the new URL library instead of the ad hoc implementation it's using at the moment?
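The two-step design described above (a generic scheme split, then tag-selected scheme-specific parsing) can be sketched in plain C++. This is a simplified, hypothetical illustration: the tag names and the `protocol(...)`/`rest(...)` accessors mirror the message, but `url_parts<Tag>` and the `parse_special(rest, parts)` signature are assumptions rather than cpp-netlib's actual API, and userinfo handling is left out for brevity.

```cpp
#include <string>

namespace tags {
struct default_ {};
struct http {};
}

// Scheme-specific results; the default family stores nothing extra.
template <class Tag> struct url_parts {};
template <> struct url_parts<tags::http> { std::string host, port, path; };

// Step 2: scheme-specific parsing, selected by overloading on the tag's parts.
inline bool parse_special(const std::string&, url_parts<tags::default_>&) {
    return true;  // nothing to check for a bare URL
}

inline bool parse_special(const std::string& rest, url_parts<tags::http>& p) {
    if (rest.compare(0, 2, "//") != 0) return false;
    std::string::size_type path_start = rest.find('/', 2);
    std::string authority = rest.substr(
        2, path_start == std::string::npos ? std::string::npos : path_start - 2);
    p.path = path_start == std::string::npos ? std::string() : rest.substr(path_start);
    std::string::size_type colon = authority.find(':');
    p.host = authority.substr(0, colon);
    p.port = colon == std::string::npos ? std::string() : authority.substr(colon + 1);
    return !p.host.empty();
}

template <class Tag>
class basic_url {
public:
    explicit basic_url(const std::string& url) {
        // Step 1: generic split into scheme and scheme-specific part.
        std::string::size_type colon = url.find(':');
        if (colon == std::string::npos) return;
        protocol_ = url.substr(0, colon);
        rest_ = url.substr(colon + 1);
        valid_ = parse_special(rest_, parts_);
    }
    bool valid() const { return valid_; }
    const url_parts<Tag>& parts() const { return parts_; }

    // Free accessors found through argument-dependent lookup.
    friend std::string protocol(const basic_url& u) { return u.protocol_; }
    friend std::string rest(const basic_url& u) { return u.rest_; }

private:
    std::string protocol_, rest_;
    url_parts<Tag> parts_;
    bool valid_ = false;
};
```

Since the tag is resolved at compile time, supporting another family (FTP, "mailto") would mean adding a tag, a `url_parts` specialization and a `parse_special` overload, without touching `basic_url` itself.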
Personally, I really want to use Spirit 2x because I also intend to use Karma for HTTP message generation/encoding of MIME messages. Of course that's a lot of work down the road, but the current (not-so-well-tested) implementation seems to be able to distinguish between HTTP and HTTPS ports. From there we should be able to write the code that lets the HTTP client create its own connections based on protocol(http_message.url()): if it's "https", use the ssl::socket, and if it's just "http", use the normal tcp::socket. That needs to be ironed out and refactored into separate connection-handling logic.
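The scheme-based connection choice can be sketched independently of Asio. In this hypothetical example, `transport` stands in for the two socket types (in Boost.Asio these would be `ip::tcp::socket` and `ssl::stream<ip::tcp::socket>`), the default ports are the standard 80 and 443, and `params_for` is not a cpp-netlib function:

```cpp
#include <stdexcept>
#include <string>

// Stand-in for the two socket flavours; real code would construct an
// asio ip::tcp::socket or ssl::stream<ip::tcp::socket> instead.
enum class transport { plain_tcp, ssl };

struct connection_params {
    transport kind;
    unsigned short default_port;
};

// Picks the transport and default port from the URL scheme.
inline connection_params params_for(const std::string& scheme) {
    if (scheme == "http") return {transport::plain_tcp, 80};
    if (scheme == "https") return {transport::ssl, 443};
    throw std::invalid_argument("unsupported scheme: " + scheme);
}
```

Keeping the mapping in one place is one way to get the separate connection-handling logic mentioned above: the client would only need the scheme, for example from protocol(http_message.url()).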
From: John P. F. <jf...@ov...> - 2009-08-16 17:22:55
Dean Michael Berris wrote:
> Cool! Are you using Boost.Spirit 2x? I haven't been looking at these
> changes closely.
>

To be truthful, I haven't even bothered to determine that. Spirit 2 was, and still is, ambiguous to me. I simply chose to base that work on the boost_139 Spirit docs.

> Of course that's a lot of work down the road, but the current
> (not-so-well-tested) implementation seems to be able to identify
> between HTTP and HTTPS ports. From there we should be able to write
> the stuff that allows the HTTP client to create its own connections
> based on the protocol(http_message.url()) -- if it's "https" then use
> the ssl::socket and if it's just "http" use the normal tcp::socket.
> That needs to be ironed out and refactored into a separate logic for
> connection handling.
>

That seems rational. Actually, after a stint of researching some Java- and Python-based networking libraries, I have come to the conclusion that presenting the user with a configurable connection object for a particular protocol is preferable, in addition to a client facade for common use cases.
As a side note, I have also come to the conclusion that a mailing list is not my preferred forum for this sort of discussion, which is better suited to collaborative specifications and conferencing. I'm curious what the other developers think about this.

John
From: Dean M. B. <mik...@gm...> - 2009-08-17 12:26:55
On Sun, Aug 16, 2009 at 10:22 PM, John P. Feltz <jf...@ov...> wrote:
>
> Dean Michael Berris wrote:
>>
>> Cool! Are you using Boost.Spirit 2x? I haven't been looking at these
>> changes closely.
>>
>
> To be truthful I haven't even bothered to determine that. Spirit 2 was
> and still is for me- ambiguous. I simply chose to base that work off the
> boost_139 spirit docs.
>

Ah, okay. Well, it should be alright -- it should be Spirit 2 if it's Boost 1.39, although I may be wrong.

>> Of course that's a lot of work down the road, but the current
>> (not-so-well-tested) implementation seems to be able to identify
>> between HTTP and HTTPS ports. From there we should be able to write
>> the stuff that allows the HTTP client to create its own connections
>> based on the protocol(http_message.url()) -- if it's "https" then use
>> the ssl::socket and if it's just "http" use the normal tcp::socket.
>> That needs to be ironed out and refactored into a separate logic for
>> connection handling.
>>
>>
> That seems rational. Actually, after a stint of researching some Java
> and Python based networking libraries, I myself have come the conclusion
> that presenting the user with a configurable connection object for a
> particular protocol is preferred, in addition to a client facade for
> common use-cases.

Right. My only reservation is that it's too much work for the user. I want to be able to do something like:

    http::request normal("http://www.boost.org");
    http::request https("https://www.boost.org");
    http::client c;
    http::response normal_response = c.get(normal);
    http::response https_response = c.get(https);

And it should "just work".

> As a side note, I have also come to the conclusion
> that a mailing list is not my preferred forum for this sort of
> discussion, which is better suited by collaborative specifications and
> conferencing. I'm curious as to what the opinions of the other
> developers are on this.
While we're on the subject: for one, I don't like writing documents, which explains why I can't get myself to put together a roadmap document. ;) Nor do I like writing specification documents; I feel that's a waste of my time. I'd rather show client code that works and hide the plumbing, so that I (and everyone else working on the project) can just "make it work" without burdening the client (or the person reading the documentation) with too many details.

That said, for our own sake I think we need a coherent place to put this information, so that the details don't just end up in mailing list archives. However, I am not the best person to write that document, although I feel like I should be the one doing it. :|

At any rate, I agree that mailing lists aren't the best means for ironing out specifications or design documents; however, I feel discussions about the approach are best held here. It's (for me) the medium of least resistance as far as collaboration goes. I don't mind a Wiki page that says what we mean to say in one place, but before we put anything up on a Wiki, I think there should be some discussion that we keep going on a mailing list; later we can lift the results of the discussion into the Wiki. This has worked for me in my time as a developer because it kills two birds with one stone: the rationale is ironed out on the mailing list while the outcome is put in the Wiki.

I hope this makes sense. :-)
From: Allister L. S. <all...@gm...> - 2009-08-17 08:26:21
Hi everyone,

On Sun, Aug 16, 2009 at 4:22 PM, John P. Feltz <jf...@ov...> wrote:

> As a side note, I have also come to the conclusion
> that a mailing list is not my preferred forum for this sort of
> discussion, which is better suited by collaborative specifications and
> conferencing. I'm curious as to what the opinions of the other
> developers are on this.
>

Do we all have Google Wave accounts? You might find it very useful for collaborating on specs :-)

Cheers,
Allister
From: Dean M. B. <mik...@gm...> - 2009-08-17 12:28:01
On Mon, Aug 17, 2009 at 4:26 PM, Allister Levi Sanchez <all...@gm...> wrote:
>
> Do we all have Google Wave accounts? You might find it very useful for
> collaborating on specs :-)
>

Oh, do you have one? How do you get one? I'd like to try it out firsthand too. Maybe in the meantime we could use Google Documents?