#8 Endless loop in Read_Partial

need info
closed-fixed
htnet (8)
6
2002-07-28
2001-03-01
Geoff Hutchison
No

From: johnnyb@wolfram.com
To: htdig3-bugs@htdig.org
Subject: Endless loop in "htdig"

Full_Name: Jonathan Bartlett
Version: 3.2b2
OS: RH Linux 6.1 (Kern 2.2.14)
Submission from: brickwall.wolfram.com (140.177.203.26)

While indexing one of our sites (http://mathworld.wolfram.com), htdig hangs on
random pages. This is probably due to Apache processes randomly dying in the
middle of requests. strace output shows a continual

select
read
select
read

where select will return 1 and read will return 0. This will happen
indefinitely.

gdb usually ends up somewhere in Read_Partial
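
The select/read pattern above is what a closed connection looks like from the reading side: once the peer has gone away, the socket stays readable and every read returns zero bytes. The following is a minimal Python sketch of that behaviour and of a read loop that stops on it. It is not htdig's Read_Partial code; `read_until_eof` and `demo` are hypothetical names, and a local socket pair stands in for the htdig/Apache connection.

```python
import select
import socket


def read_until_eof(sock, timeout=5.0, bufsize=4096):
    """Hypothetical helper: read from sock until the peer closes it."""
    chunks = []
    while True:
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            raise TimeoutError("no data within timeout")
        data = sock.recv(bufsize)
        if not data:
            # Zero bytes on a readable socket means EOF. Stop here,
            # otherwise the select/recv pair would repeat forever.
            break
        chunks.append(data)
    return b"".join(chunks)


def demo():
    # Assumption: a local socket pair stands in for the real connection.
    client, server = socket.socketpair()
    server.sendall(b"partial body")
    server.close()  # simulates the server process dying mid-response
    try:
        body = read_until_eof(client)
        # After EOF the socket still polls as readable and recv
        # keeps returning an empty result.
        readable, _, _ = select.select([client], [], [], 0)
        still_readable = bool(readable)
        eof_again = client.recv(1)
    finally:
        client.close()
    return body, still_readable, eof_again
```

Calling `demo()` returns the bytes received before the close, plus the two observations that match the strace output: select still reports the socket as readable, and the next read is empty.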

./htdig -vvvvv looks like

173:10010:74:http://mathworld.wolfram.com/ScrewTheorem.html:
Making HTTP request on http://mathworld.wolfram.com/ScrewTheorem.html
Try to get through to host mathworld.wolfram.com (port 80) via HTTP
10 - Connection already open. No need to re-open.
Header line: HTTP/1.1 200 OK
Header line: Date: Wed, 12 Apr 2000 18:24:53 GMT
Header line: Server: Apache/1.3.11 (Unix)
Header line: Transfer-Encoding: chunked
Header line: Content-Type: text/html
No modification time returned: assuming now
Retrieving document /ScrewTheorem.html on host: mathworld.wolfram.com:80
Http version : HTTP/1.1
Server : HTTP/1.1
Status Code : 200
Reason : OK
Access Time : Wed, 12 Apr 2000 18:24:53 GMT
Modification Time : Wed, 12 Apr 2000 18:24:50 GMT
Content-type : text/html
Transfer-encoding : chunked
Persistent connection: would be accepted
Reading the body of the response
Initial chunk-size: 3648
Chunk-size: 670
Chunk-size: 670
Chunk-size: 670
Chunk-size: 670
Chunk-size: 670
Chunk-size: 670
Chunk-size: 670
Chunk-size: 670
Chunk-size: 670
Chunk-size: 670
Chunk-size: 670

and the Chunk-size messages will continue indefinitely. Feel free to contact
me. I have tried to look through the source myself, but I couldn't find any
problems.
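
For illustration, a chunked-body reader can avoid repeating the same chunk forever by treating a zero-byte read as a premature end of stream rather than as "no data yet". The sketch below is a generic Python example under that assumption, not htdig's implementation; `PrematureEOF`, `read_exact` and `read_chunked` are hypothetical names, and trailers after the final chunk are not handled.

```python
import io


class PrematureEOF(Exception):
    """Raised when the stream ends before the chunked body is complete."""


def read_exact(stream, n):
    """Read exactly n bytes, failing instead of looping when EOF arrives early."""
    buf = bytearray()
    while len(buf) < n:
        data = stream.read(n - len(buf))
        if not data:  # zero bytes means EOF, not "try again"
            raise PrematureEOF(f"wanted {n} bytes, got {len(buf)}")
        buf.extend(data)
    return bytes(buf)


def read_chunked(stream):
    """Decode a chunked body from a binary file-like object."""
    body = bytearray()
    while True:
        line = stream.readline()
        if not line:
            raise PrematureEOF("EOF while reading chunk-size line")
        # The chunk size is hexadecimal; drop any ";extension" suffix.
        size = int(line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # last chunk; trailers are ignored in this sketch
        body.extend(read_exact(stream, size))
        read_exact(stream, 2)  # CRLF that terminates each chunk
    return bytes(body)
```

With a complete body such as `io.BytesIO(b"5\r\nhello\r\n0\r\n\r\n")` the function returns the decoded bytes; with a body cut off partway through a chunk it raises `PrematureEOF` instead of spinning.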

Thanks for your time and a great program.

Discussion

    • assigned_to: nobody --> angusgb
     
  • Logged In: YES
    user_id=21420

    This case (select returns 1, read returns 0) will still
    create an infinite loop--this needs to be addressed somehow,
    even if this case is unlikely to occur based on other fixes
    in the connection code.

    A zero result from read indicates EOF, so shouldn't the
    loop end on a 0 return from read?

     
    • priority: 5 --> 6
     
    • status: open --> closed-fixed
     
  • Logged In: YES
    user_id=21420

    AFAIK, this is fixed in current code.

    -Geoff