From: Neal R. <ne...@ri...> - 2004-08-23 19:06:24
I'm finding that several things are weird in the htdig/HTML.cc class:

1) If a page has an ill-formed comment tag like this:

   <!-- hennerik CVSweb $Revision: 1.64 0->

   everything after the start of the comment is eaten (the entire page), since the comment end is bad. A browser handles this fine.

2) In the HTML::parse function:

   223  unsigned char *text = (unsigned char *)new char[contents->length()+1];

   This variable seems intended to store the document contents. However, both times it's used, it appears on the right-hand side of an assignment statement:

   224  unsigned char *ptext = text;
   [snip]
   380  position = text;
   381  start = position;
   382
   383  while (*position)
   384  {

   Note that the while statement (lines 384 to 545) is likely never entered, since on Linux with gcc the buffer seems to come back filled with zeros, so *position is '\0' right away. The behavior could be platform dependent: new char[] does not initialize the memory, so who knows what's in it.

Any feedback?

Thanks
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
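For illustration, here is a minimal C++ sketch of both points. It is not htdig's code: the helper names make_text_buffer and skip_comment are hypothetical, and the comment fallback (end an unterminated comment at the first '>') is an assumed heuristic, not a description of how any particular browser recovers. The first helper shows the idea behind item 2: a buffer from plain new char[n] has indeterminate contents, so a `while (*position)` scan only makes sense after the document has actually been copied in and NUL-terminated.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not htdig's API): copy the document into a
// NUL-terminated buffer so that a scan loop of the form
// `while (*position)` sees the real contents instead of whatever
// happened to be in freshly allocated memory.
std::vector<unsigned char> make_text_buffer(const std::string &contents)
{
    // Filled with zeros, so the extra final byte is already '\0'.
    std::vector<unsigned char> text(contents.size() + 1, 0);
    std::memcpy(text.data(), contents.data(), contents.size());
    return text;
}

// Hypothetical recovery heuristic for a malformed comment.
// `start` points just past "<!--". Returns a pointer just past the end of
// the comment. A proper "-->" is used if one exists; otherwise the comment
// is assumed to end at the first '>', so a comment such as "<!-- ... 0->"
// no longer swallows the rest of the page.
const unsigned char *skip_comment(const unsigned char *start)
{
    const char *s = reinterpret_cast<const char *>(start);
    if (const char *end = std::strstr(s, "-->"))
        return start + (end - s) + 3;      // skip over "-->"
    if (const char *gt = std::strchr(s, '>'))
        return start + (gt - s) + 1;       // fallback: skip over the first '>'
    return start + std::strlen(s);         // no '>' at all: comment runs to the end
}
```

The fallback only kicks in when no "-->" appears anywhere after the comment start. If a well-formed comment follows later in the page, skip_comment will still run to that comment's end, so a real fix would probably need a tighter rule.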