
Heritrix: Internet Archive Web Crawler

Tracker: Bugs

Terminating paused crawl leaves zombie threads - ID: 1002319
Last Update: Comment added ( karl-ia )

Via the UI: if you pause a crawl and then terminate it,
all the ToeThreads are still waiting in their
shouldPause holding pen; they stick around instead of exiting.

(Noted in eclipse debugger.)
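A minimal sketch of how the zombie can arise. It assumes the holding pen is a "while (shouldPause) wait();" loop nested inside the crawl loop, and that terminating a crawl reaches each thread through stopAfterCurrent(). BuggyWorker, pausedWorkerSurvivesStop() and the loop structure are hypothetical, not Heritrix source. Terminating clears shouldCrawl and notifies. The woken thread rechecks shouldPause, finds it still true, and waits again.

```java
// Hypothetical reduction of the pause-then-terminate bug; not Heritrix source.
public class ZombieDemo {

    // Stand-in for a pre-fix ToeThread.
    static class BuggyWorker implements Runnable {
        private volatile boolean shouldCrawl = true;
        private volatile boolean shouldPause = false;

        synchronized void setShouldPause(boolean b) {
            shouldPause = b;
            notifyAll();
        }

        // Mirrors the pre-fix stopAfterCurrent(): only shouldCrawl is cleared.
        synchronized void stopAfterCurrent() {
            shouldCrawl = false;
            notify();
        }

        @Override
        public void run() {
            while (shouldCrawl) {
                synchronized (this) {
                    // The "holding pen": park for as long as a pause is requested.
                    while (shouldPause) {
                        try {
                            wait();
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                }
                Thread.yield(); // placeholder for processing one URI
            }
        }
    }

    /** Pauses a worker, "terminates" it, and reports whether its thread is still alive. */
    static boolean pausedWorkerSurvivesStop() {
        try {
            BuggyWorker worker = new BuggyWorker();
            worker.setShouldPause(true);          // pause before the thread even starts
            Thread t = new Thread(worker, "toe-0");
            t.setDaemon(true);                    // a leaked worker must not keep the JVM alive
            t.start();
            while (t.getState() != Thread.State.WAITING) {
                Thread.sleep(5);                  // until it is parked in the pen
            }
            worker.stopAfterCurrent();            // terminate the paused crawl
            t.join(500);                          // plenty of time to exit, if it could
            return t.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("zombie left behind: " + pausedWorkerSurvivesStop());
    }
}
```

Running main prints "zombie left behind: true". The worker runs on a daemon thread only so that the demo's JVM can still exit despite the leaked thread.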


Gordon Mohr ( gojomo ) - 2004-08-02 23:08

Priority: 7
Status: Closed
Resolution: Fixed
Assigned To: Michael Stack
Category: None
Group: 1.0.1
Visibility: Public


Comments (4)

Date: 2007-03-14 00:15
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-217 -- please add further
comments at that location.


Date: 2004-09-08 20:04
Sender: stack-sf (Project Admin)

user_id=924942

Closing. Here is the commit message (committed to heritrix_1_0;
HEAD has changed how pausing works).

Fix for "[ 1002319 ] Terminating paused crawl leaves zombie
threads".
* src/java/org/archive/crawler/framework/ToeThread.java
(stopAfterCurrent): Need to set the shouldPause flag to false in
case the thread is currently paused. Otherwise, the subsequent
notify will wake up the waiting thread, it'll see shouldPause
still true, and it'll just go back to waiting.
(setShouldPause): Refactoring. Early exit if we're already in the
state we're being asked to go into.
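The stopAfterCurrent() change in the commit message can be sketched the same way. This is a hypothetical reduction, not the actual ToeThread code (Worker and pausedWorkerExitsAfterStop() are illustrative names). Clearing shouldPause alongside shouldCrawl before notifying lets the woken thread fall out of both loops.

```java
// Hypothetical sketch of the fixed stopAfterCurrent(); not Heritrix source.
public class FixedStopDemo {

    static class Worker implements Runnable {
        private volatile boolean shouldCrawl = true;
        private volatile boolean shouldPause = false;

        synchronized void setShouldPause(boolean b) {
            shouldPause = b;
            notifyAll();
        }

        // Post-fix version: clear the pause flag too, so the woken thread
        // does not find shouldPause still true and go back to waiting.
        synchronized void stopAfterCurrent() {
            this.shouldCrawl = false;
            this.shouldPause = false;
            notifyAll();
        }

        @Override
        public void run() {
            while (shouldCrawl) {
                synchronized (this) {
                    while (shouldPause) {
                        try {
                            wait();
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                }
                Thread.yield(); // placeholder for processing one URI
            }
        }
    }

    /** Returns true if a paused worker exits once stopAfterCurrent() is called. */
    static boolean pausedWorkerExitsAfterStop() {
        try {
            Worker w = new Worker();
            w.setShouldPause(true);               // pause before the thread starts
            Thread t = new Thread(w, "toe-0");
            t.setDaemon(true);
            t.start();
            while (t.getState() != Thread.State.WAITING) {
                Thread.sleep(5);                  // until it is parked in the pen
            }
            w.stopAfterCurrent();
            t.join(5000);
            return !t.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("worker exited: " + pausedWorkerExitsAfterStop());
    }
}
```

Here main prints "worker exited: true". The notifyAll() mirrors the patch. In this reduction only one thread ever waits on the worker's monitor, so notify() would also have sufficed.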


Index: ToeThread.java
===================================================================
RCS file: /cvsroot/archive-crawler/ArchiveOpenCrawler/src/java/org/archive/crawler/framework/ToeThread.java,v
retrieving revision 1.35
retrieving revision 1.35.2.1
diff -C2 -d -r1.35 -r1.35.2.1
*** ToeThread.java 6 Aug 2004 02:15:43 -0000 1.35
--- ToeThread.java 8 Sep 2004 18:45:03 -0000 1.35.2.1
***************
*** 330,335 ****
          logger.info("ToeThread " + this.serialNumber +
              " has been told to stopAfterCurrent()");
!         shouldCrawl = false;
!         notify();
      }

--- 330,336 ----
          logger.info("ToeThread " + this.serialNumber +
              " has been told to stopAfterCurrent()");
!         this.shouldCrawl = false;
!         this.shouldPause = false;
!         notifyAll();
      }

***************
*** 437,452 ****
       */
      public void setShouldPause(boolean b) {
          // Updating this field outside of a synchronized block should be ok
          // as its volatile -- the value will be read right through to
          // memory (If the JVM acts on the volatile keyword at all).
!         shouldPause = b;
!         // Don't synchronize if we don't have to.
!         if (!shouldPause) {
!             synchronized (this) {
!                 // Recheck in case changed after we got the lock.
!                 if(!shouldPause) {
!                     notifyAll();
!                 }
!             }
          }
      }
--- 438,452 ----
       */
      public void setShouldPause(boolean b) {
+         if (b == this.shouldPause) {
+             return;
+         }
          // Updating this field outside of a synchronized block should be ok
          // as its volatile -- the value will be read right through to
          // memory (If the JVM acts on the volatile keyword at all).
!         // We're doing it like this to narrow synchronization; we're having
!         // probs. 'cos takes long time for sync block to get attention.
!         this.shouldPause = b;
!         synchronized (this) {
!             notifyAll();
          }
      }
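The new setShouldPause() pattern (unsynchronized early exit, volatile write, then notifyAll() under the lock) can be exercised in isolation. This is a hypothetical harness, not Heritrix code. Worker, unitsDone, wakeups and pauseThenResume() are illustrative names, and a fixed number of work units stands in for crawling.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical harness around the post-fix setShouldPause(); not Heritrix source.
public class PauseToggleDemo {

    static class Worker implements Runnable {
        private volatile boolean shouldPause = false;
        private final int totalUnits;
        final AtomicInteger unitsDone = new AtomicInteger();
        final AtomicInteger wakeups = new AtomicInteger(); // counts notifyAll() calls

        Worker(int totalUnits) {
            this.totalUnits = totalUnits;
        }

        public void setShouldPause(boolean b) {
            // Early exit: already in the requested state, so skip the lock entirely.
            if (b == this.shouldPause) {
                return;
            }
            this.shouldPause = b;           // volatile write, made outside the lock
            synchronized (this) {
                wakeups.incrementAndGet();
                notifyAll();                // wake any thread parked in run()
            }
        }

        @Override
        public void run() {
            while (unitsDone.get() < totalUnits) {
                synchronized (this) {
                    while (shouldPause) {
                        try {
                            wait();
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                }
                unitsDone.incrementAndGet(); // stand-in for processing one URI
            }
        }
    }

    /** Pauses a worker before it starts, resumes it, and returns {unitsDone, wakeups}. */
    static int[] pauseThenResume() {
        try {
            Worker w = new Worker(3);
            w.setShouldPause(true);
            w.setShouldPause(true);         // redundant: returns early, no notifyAll()
            Thread t = new Thread(w, "toe-1");
            t.setDaemon(true);
            t.start();
            while (t.getState() != Thread.State.WAITING) {
                Thread.sleep(5);            // until it is parked in the pen
            }
            w.setShouldPause(false);        // resume: wakes the parked worker
            t.join(5000);
            return new int[] { w.unitsDone.get(), w.wakeups.get() };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = pauseThenResume();
        System.out.println("units done: " + r[0] + ", notifyAll calls: " + r[1]);
    }
}
```

Running main prints "units done: 3, notifyAll calls: 2". The second setShouldPause(true) returns early without taking the lock. No wakeup is lost even though the flag is written outside the lock, because the worker tests shouldPause and calls wait() while holding the monitor, and notifyAll() is issued under that same monitor. The early-exit check is not atomic with the write, so two racing callers could both get past it. In this sketch the only cost of that race is a redundant notifyAll().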






Date: 2004-09-04 01:01
Sender: stack-sf (Project Admin)

user_id=924942

With a feral crawl (250 threads or so), hitting terminate
leaves the threads sticking around. They look hung on
frontier.next and on TextUtils.get (Hashtable gets are
synchronized). Happens each time. It's like the old problem
on the hyperthreaded NPTL box (ToePool having trouble getting
its ToeThread stop message through; at least, it takes a long
time to complete). Reproducible. Will look some more.


Date: 2004-09-01 21:12
Sender: gojomo (Project Admin)

user_id=144912

The pause-then-terminate situation has been fixed in HEAD via
major revisions to the ToeThread-frontier relationship. It may
not be necessary to fix it in the 1.0.x branch.



Changes (6)

Field              Old Value  Date              By
status_id          Open       2004-09-08 20:04  stack-sf
resolution_id      None       2004-09-08 20:04  stack-sf
close_date         -          2004-09-08 20:04  stack-sf
assigned_to        nobody     2004-09-04 01:01  stack-sf
artifact_group_id  None       2004-09-01 21:08  gojomo
priority           5          2004-09-01 21:08  gojomo