From: johnnyb@wolfram.com
To: htdig3-bugs@htdig.org
Subject: Endless loop in "htdig"
Full_Name: Jonathan Bartlett
Version: 3.2b2
OS: RH Linux 6.1 (Kern 2.2.14)
Submission from: brickwall.wolfram.com (140.177.203.26)
While indexing one of our sites (http://mathworld.wolfram.com), htdig hangs on
random pages. This is probably due to Apache processes randomly dying in the
middle of requests.
strace output shows a continual
select
read
select
read
where select returns 1 and read returns 0. This repeats indefinitely.
gdb usually ends up somewhere in Read_Partial.
The output of ./htdig -vvvvv looks like:
173:10010:74:http://mathworld.wolfram.com/ScrewTheorem.html:
Making HTTP request on http://mathworld.wolfram.com/ScrewTheorem.html
Try to get through to host mathworld.wolfram.com (port 80) via HTTP
10 - Connection already open. No need to re-open.
Header line: HTTP/1.1 200 OK
Header line: Date: Wed, 12 Apr 2000 18:24:53 GMT
Header line: Server: Apache/1.3.11 (Unix)
Header line: Transfer-Encoding: chunked
Header line: Content-Type: text/html
No modification time returned: assuming now
Retrieving document /ScrewTheorem.html on host: mathworld.wolfram.com:80
Http version : HTTP/1.1
Server : HTTP/1.1
Status Code : 200
Reason : OK
Access Time : Wed, 12 Apr 2000 18:24:53 GMT
Modification Time : Wed, 12 Apr 2000 18:24:50 GMT
Content-type : text/html
Transfer-encoding : chunked
Persistent connection: would be accepted
Reading the body of the response
Initial chunk-size: 3648
Chunk-size: 670
Chunk-size: 670
Chunk-size: 670
Chunk-size: 670
Chunk-size: 670
Chunk-size: 670
Chunk-size: 670
Chunk-size: 670
Chunk-size: 670
Chunk-size: 670
Chunk-size: 670
and the Chunk-size messages continue indefinitely. Feel free to contact me.
I have tried to look through the source myself, but I couldn't find any
problems.
Thanks for your time and a great program.
Logged In: YES
user_id=21420
This case (select returns 1, read returns 0) will still
create an infinite loop--this needs to be addressed somehow,
even if this case is unlikely to occur based on other fixes
in the connection code.
A zero result from read indicates EOF, so shouldn't the
loop end on a 0 return from read?
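To illustrate the point, here is a minimal sketch of a select/read loop that stops when read returns 0. It is not htdig's Read_Partial; the function name read_until_eof, its parameters, and the choice to treat a timeout as an error are all assumptions made for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not htdig code): read up to `want` bytes from `fd`,
 * waiting with select() before each read(). Returns the number of bytes
 * actually read, which may be short if the peer closed the connection,
 * or -1 on error or timeout. */
ssize_t read_until_eof(int fd, char *buf, size_t want, int timeout_secs)
{
    size_t got = 0;

    while (got < want) {
        fd_set readfds;
        struct timeval tv;

        /* select() may modify both arguments, so rebuild them each pass. */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = timeout_secs;
        tv.tv_usec = 0;

        int ready = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (ready < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        if (ready == 0)
            return -1;              /* timed out (assumed to be an error) */

        ssize_t n = read(fd, buf + got, want - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;                  /* EOF: select said "readable", read said 0 */

        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

A caller that gets fewer bytes than the chunk size it asked for can then treat the response as truncated instead of asking for the rest of the chunk again.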
Logged In: YES
user_id=21420
AFAIK, this is fixed in current code.
-Geoff