From: Gilles D. <gr...@sc...> - 2002-02-01 15:33:25
According to Franck Horlaville:
> My 3.1.6 try on www.aicha.com (13 pages) suceeded, whereas a
> 3.2.0b4-123001 failed on that same page. Should I try the latest
> snapshot ?

Yes, there haven't been a huge number of changes in the past month, but
enough that it could affect HTTP request handling.

And also off-list Franck said:
> OK, it's been stuck since 1am now and it's 1pm.
>
> Here are the last lines of the log file:
>
> Connection closed. Try to get it again.
> Header line: HTTP/1.1 200 OK
> Header line: Date: Fri, 01 Feb 2002 01:05:23 GMT
> Header line: Server: Apache/1.3.6 (Unix)
> Header line: Page-Completion-Status: Normal
> Discarded header line: Page-Completion-Status: Normal
> Header line: Page-Completion-Status: Normal
> Discarded header line: Page-Completion-Status: Normal
> Header line: Set-Cookie: CFID=767378; expires=Sun, 27-Sep-2037
> 00:00:00 GMT; path=/;
> Header line: Set-Cookie: CFTOKEN=17634972; expires=Sun, 27-Sep-2037
> 00:00:00 GMT; path=/;
> Header line: Transfer-Encoding: chunked
> Header line: Content-Type: text/html
> No modification time returned: assuming now
> Retrieving document
> /wanadoo2/fr/thematiques/dominanteservices/sport/tennis/articles_tennis.cfm?id_art=514
> on host: www.wanadoo.ma:80
> Http version : HTTP/1.1
> Server : HTTP/1.1
> Status Code : 200
> Reason : OK
> Access Time : Fri, 01 Feb 2002 01:06:17 GMT
> Modification Time : Fri, 01 Feb 2002 01:06:20 GMT
> Content-type : text/html
> Transfer-encoding : chunked

OK, this is informative! We've had problems with chunked input before,
so there may still be problems to resolve. The HtHTTP.cc code was
developed by Gabriele, so it would be a good thing for him to take a
look at this. (All the more reason to keep replies on the list, right?)

Interestingly, this is a different site than what you reported before
(www.aicha.com). Did htdig follow a link from aicha.com to wanadoo.ma,
or is this a different test altogether?
If it's different, does it hang at the same spot when indexing aicha.com?
You had mentioned previously that it seemed to hang on the URL
http://www.aicha.com/lacollec/c/montres_010430_2.htm. Does it also read
this using chunked input, or is this yet another problem? Chunked input
is usually used by HTTP 1.1 for server-parsed content, not for static pages.

> ===
> here's the gdb stack trace:
>
> [Switching to thread 1 (process 9071 thread 0x1707)]
> ^C
> Program received signal SIGINT, Interrupt.
> 0x700252fc in select ()
> (gdb) bt
> #0 0x700252fc in select ()
> #1 0x00018b70 in Connection::Read_Partial (this=0x15785f0,
> buffer=0x15785f4 "1000\r\nredi soir une conf?nce de presse par
> satellite en direct de Melbourne organis?par son sponsor m?tel, a
> soulign?ue sa victoire au tournoi de Doha n'est pas vol?mais elle
> constitue le fr"..., maxlength=8192) at Connection.cc:652
> #2 0x00018698 in Connection::Read_Char (this=0x15785f0) at Connection.cc:419
> #3 0x00018764 in Connection::Read_Line (this=0x15785f0,
> s=@0xbfffced8, terminator=0xad860 "\r\n") at Connection.cc:466
> #4 0x00014078 in HtHTTP::ReadChunkedBody (this=0x1578060) at HtHTTP.cc:1049
> #5 0x00012b6c in HtHTTP::HTTPRequest (this=0x1578060) at HtHTTP.cc:411
> #6 0x000121d0 in HtHTTP::Request (this=0x1578060) at HtHTTP.cc:182
> #7 0x0000390c in Document::Retrieve (this=0x499ab0,
> server=0x1a69a30, date={Ht_t = 0, local_time = true, static days =
> 0xd1790, _vptr$ = 0xf9584}) at Document.cc:487
> #8 0x00008f04 in Retriever::parse_url (this=0xbffff378, urlRef=@0x0)
> at Retriever.cc:567
> #9 0x00008934 in Retriever::Start (this=0xbffff378) at Retriever.cc:427
> #10 0x0000ff4c in main (ac=5, av=0xbffffb30) at htdig.cc:338
> #11 0x000026f8 in _start ()
> #12 0x00002528 in start ()
>
> anything else I can do ?
> This is 3.2.0b4-123001 running (or trying to
> ;-) ) on a PowerMac G4 with 512 Mb RAM, MacOS X 10.1.2 (a.k.a Darwin
> 5.2)

This backtrace seems consistent with an earlier bug report about
Read_Partial. If the problem persists with the htdig-3.2.0b4-012702
snapshot, e-mail back to the list about it, and we'll try to get
Gabriele on the case.

> (answers to your questions lower down)
>
> >According to Franck Horlaville:
...
> >> by the way when you send a kill statement, it would be great to have
> >> on the stdout something like "kill received, dumping progress log to
> >> /path/file.log"
> >
> >That's a good idea. At least, if you have one or more -v options it ought
> >to do this. It probably should be added to the log dumping section of
> >Retriever::Start(). Should it say this even if no -v options are given,
> >though, or should it run silently in that case? My feeling is the latter,
> >for consistency. Any other opinions?
>
> Not sure - I'd say kill is an important enough operation to always
> give some feedback ... Also add a "dump done" message. Maybe send it
> to stderr if no "v" options are set ?

Sounds reasonable to me. It's too late for 3.1.6 for this change, but we
can put it in 3.2.0b4.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax:    (204)789-3930
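
For anyone following the chunked-input discussion above, here is a minimal Python sketch of how an HTTP/1.1 chunked body is framed. It is not the htdig code: `decode_chunked` is a hypothetical name, and the sketch only illustrates the wire format that a routine like HtHTTP::ReadChunkedBody has to parse. The buffer in frame #1 of the backtrace begins with "1000\r\n", which looks like a chunk-size line (hex 1000 = 4096 bytes). The point of the sketch is that a reader should report a short read as an error rather than keep waiting for bytes that never arrive.

```python
import io


def decode_chunked(stream):
    """Decode an HTTP/1.1 chunked body from a binary file-like object.

    Hypothetical helper for illustration only; not taken from htdig.
    """
    body = bytearray()
    while True:
        # Each chunk starts with its size in hexadecimal, ended by CRLF.
        size_line = stream.readline()
        if not size_line.endswith(b"\r\n"):
            raise EOFError("stream ended inside a chunk-size line")
        # Chunk extensions (anything after ';') are legal and ignored here.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # a zero-size chunk marks the end of the body
        data = stream.read(size)
        if len(data) != size:
            raise EOFError("stream ended inside chunk data")
        if stream.read(2) != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        body += data
    # Skip optional trailer headers up to the blank line that ends the body.
    while True:
        line = stream.readline()
        if line in (b"\r\n", b""):
            break
    return bytes(body)


sample = io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")
decoded = decode_chunked(sample)
```

With an in-memory stream, a truncated body such as `b"1000\r\nshort"` raises `EOFError` immediately. On a real socket the equivalent safeguard would be a read timeout, which is an assumption about how one might avoid the indefinite wait in select() shown in the trace, not a description of what Connection::Read_Partial currently does.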