From: Laurent H. <lau...@gm...> - 2004-03-02 02:10:46
Hi all,

I discovered the cool libxml++ yesterday on my quest for the best C++ XML parser (bindings, since libxml2 seems to be the best C parser anyway ;). libxml is not new to me though: I have used it extensively in Perl, thanks to the very complete XML::LibXML CPAN module.

One of my main motivations is to parse HTML files into a DOM tree from which I can extract nodes with XPath. In Perl that was easy, because the HTML parser is included. So after a thorough search of the API I was a bit disappointed to find no HTML parser support in libxml++... But thanks to the clean APIs of libxml(++), and after a little reading, I had no difficulty building my own subclass (based on domparser.cc), apart from a few quirks (such as the extra encoding parameter in some HTML parser functions) :)

In fact libxml2 has a really tolerant HTML parser (I used it in Perl for mirroring/parsing whole dynamic websites :D). It even returns a good document when it has hit parser errors, but to get a document back in that case one has to turn off the well-formedness check, which I did in my temporary htmlparser implementation. Unfortunately, when I ignore '!context_->wellFormed', there is always a segfault at the end of a run of my HTML parsing example app (an edited 'dom_xpath/main.cc'). The experimenting was done in the 'HtmlParser::parse_context' method.

I hope HTML parsing can be included in the main distribution (maybe better with the wellFormed check on)...

To compile the whole library with my htmlparser class, I added the class to all the files that mention 'domparser' (the Makefile.am files, libxml++.h...). Included are the C++ and header files of the htmlparser class (or should I have sent diffs against the original domparser.cc/h?) plus my HTML parsing example, which shows all the //a[@href] links with their attribute contents.

Hopefully the segfault can be solved easily with the knowledge of the lead developers (which I don't have yet ;). I guess it's just something I'm missing; otherwise I'll try to track down the memory problem using a debugger (or is there a better way?).

Thanks,
Laurent