From: Raymond C. R. <ra...@ba...> - 2009-01-29 23:18:41
Mark Hellegers wrote:
> Hi all,
>
> I just checked in a complete rewrite of the HTML parser.
> It is still very unstable, but it can already parse some pages correctly,
> for example the google homepage.
> I don't think it is very useful for anyone to test it, if you don't know
> where to find the problem in the code when a page doesn't parse correctly.
> I know it still gets confused on a lot of pages, causing error messages,
> or worse infinite loops.
>
> That said, if you are working on another part of Themis (hint hint ;) and
> you need a particular page parsed correctly, I'll be happy to have a look
> at what the problem is.
>
> Mark
>
>

:) I noticed the devcvs messages earlier. :)

I'm just finishing up a project for work that's had me tied up for the last
two months, so I'll be getting back to work myself [again] soon.

When I was last working with the code in November, I noticed a few bugs in
processing set-cookie headers on certain sites. (Oddly enough, only on
Microsoft owned sites.) So while it might not be directly related to HTML
parsing, I'll be sure to keep an eye on what happens.

Raymond