Heritrix: Internet Archive Web Crawler

Tracker: Feature Requests

6 [contrib] UI stacktrace dump (Depends on JDK150) - ID: 1180630
Last Update: Comment added ( karl-ia )

Hi,

some improvements for the admin webapp:

1. Submitting a new job with a huge seed list and
recovering large old jobs currently "hang" the web UI
until the operation finishes. I have modified
Heritrix so that new jobs are prepared for crawling
in a separate thread. This way, the WUI responds
immediately to the user's request.
Along with this, I have introduced a new job status,
"PREPARING", which signals that state, mainly in
order to prevent multiple job submissions across
multiple Heritrix instances.

Some JSP files had to be modified so that they no longer
throw NullPointerExceptions and instead show as much
information as possible about the job being prepared.
You can now access report information (such as queue
state) while a job is still being recovered.

2. In addition, a new "Stacktraces" report is provided
which dumps the stack traces of all running Java threads.
In most cases, this is handier than sending a
"kill -SIGQUIT" or using JMX, isn't it?
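
To illustrate item 1, here is a minimal sketch of preparing a job in a background thread while exposing a "PREPARING" status. It is written in current Java for brevity and is not the code from the patch: the class JobPreparationSketch, its Status enum, and the methods startNextJob, getStatus and awaitStatus are hypothetical names. The guard shown here only rejects a second submission inside one JVM; it does not coordinate several Heritrix instances.

```java
// Hypothetical sketch; none of these names are taken from Heritrix.
final class JobPreparationSketch {
    enum Status { PENDING, PREPARING, RUNNING }

    private Status status = Status.PENDING;

    synchronized Status getStatus() {
        return status;
    }

    private synchronized void setStatus(Status newStatus) {
        status = newStatus;
        notifyAll(); // wake up anyone blocked in awaitStatus()
    }

    /**
     * Returns immediately. The slow preparation work runs in its own thread,
     * so the caller (for example a web UI request) is not blocked.
     * Returns false if a job is already being prepared or is running.
     */
    synchronized boolean startNextJob(final Runnable slowPreparation) {
        if (status != Status.PENDING) {
            return false;
        }
        status = Status.PREPARING;
        Thread worker = new Thread(() -> {
            Status next = Status.PENDING; // fall back if preparation fails
            try {
                slowPreparation.run();
                next = Status.RUNNING;
            } finally {
                setStatus(next);
            }
        }, "prepare-job");
        worker.setDaemon(true);
        worker.start();
        return true;
    }

    /** Waits up to timeoutMillis for the given status; false on timeout or interrupt. */
    synchronized boolean awaitStatus(Status wanted, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            while (status != wanted) {
                long left = deadline - System.currentTimeMillis();
                if (left <= 0) {
                    return false;
                }
                wait(left);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A caller would invoke startNextJob with the expensive work (seed loading, recovery) and then poll getStatus to decide what to render while the job is still "PREPARING".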


Best regards,

Christian


Christian Kohlschütter ( ck-heritrix ) - 2005-04-11 11:02

Priority: 6
Status: Closed
Resolution: None
Assigned: Michael Stack
Category: Usability/UI
Group: 1.6.0
Visibility: Public


Comments ( 7 )

Date: 2007-03-14 01:40
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-916 -- please add further
comments at that location.


Date: 2005-06-01 18:00
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Closing.

Christian got stacktraces in via '[ 1208510 ] [rfe-contrib]
Add Stacktrace dump to ToeThread.report()'.


Date: 2005-05-05 19:49
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Changed the summary because we've applied the bulk of this
patch -- just the bit that depends on jdk150 hasn't been
applied. (Added this RFE as a minor reason why we should
move to jdk150 exclusively:
http://crawler.archive.org/cgi-bin/wiki.pl?WhyJdk150)


Date: 2005-04-12 15:52
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

The hard part about switching on presence of jdk5.0 is that
we precompile the jsp. Perhaps we could compile all but
jdk5.0 pages? Would take a little work. Will have to wait
till after 1.4.0.

It's a sweet trick though, Christian. A nice feature. I
liked your bolding of threadgroup names.

Upped priority.


Date: 2005-04-12 08:01
Sender: ck-heritrix

Logged In: YES
user_id=1220421

Arrgh, you're right :)

However, it might be feasible to provide the stacktrace feature
only if a JDK 5.0 JVM is detected.
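
One way such a runtime check could look is sketched below: the method Thread.getAllStackTraces(), which was added in Java 5.0, is looked up by reflection, so the calling code has no compile-time link to it. This is only an assumption about how detection might be done, not what the patch does. StackTraceFeatureSketch, isAvailable and report are hypothetical names, and the sketch itself uses current Java syntax for brevity.

```java
import java.lang.reflect.Method;
import java.util.Map;

// Hypothetical sketch; not part of Heritrix.
final class StackTraceFeatureSketch {

    /** True if the running JVM offers Thread.getAllStackTraces() (Java 5.0 or later). */
    static boolean isAvailable() {
        try {
            Thread.class.getMethod("getAllStackTraces");
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Returns a plain-text dump, or a short notice if the feature is unavailable. */
    static String report() {
        if (!isAvailable()) {
            return "Stack traces require a Java 5.0 or later JVM.";
        }
        try {
            // Reflective call: static method, so the target object is null.
            Method method = Thread.class.getMethod("getAllStackTraces");
            Map<?, ?> traces = (Map<?, ?>) method.invoke(null);
            StringBuilder out = new StringBuilder();
            for (Map.Entry<?, ?> entry : traces.entrySet()) {
                Thread thread = (Thread) entry.getKey();
                out.append(thread.getName()).append('\n');
                for (Object frame : (Object[]) entry.getValue()) {
                    out.append("    at ").append(frame).append('\n');
                }
            }
            return out.toString();
        } catch (ReflectiveOperationException e) {
            return "Stack traces unavailable: " + e;
        }
    }
}
```

A report page could call isAvailable() to decide whether to show the "Stacktraces" link at all, and report() to fill the page.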



Date: 2005-04-11 21:54
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Reopening.

Backed out the stacktrace feature. It uses a 1.5.0ism. We
still want to support jdk 1.4.




Date: 2005-04-11 20:06
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Nice patch Christian. I'll add it because it adds a nice
debugging aid -- the stacktrace in the UI.

Made a minor change 'cos threadgroup came back null in
stacktrace.jsp testing but otherwise all looks good.
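
For illustration, a null-safe dump grouped by thread group might look like the sketch below. It is not the contents of stacktrace.jsp; ThreadGroupDumpSketch, groupName and dump are hypothetical names. One documented case in which Thread.getThreadGroup() returns null is a thread that has already terminated, so the guard substitutes a placeholder label.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the actual stacktrace.jsp code.
final class ThreadGroupDumpSketch {

    /** Name of the thread's group, or a placeholder when the group is null. */
    static String groupName(Thread thread) {
        ThreadGroup group = thread.getThreadGroup(); // may be null
        return group == null ? "(no thread group)" : group.getName();
    }

    /** Plain-text dump of all live threads, grouped by thread group name. */
    static String dump() {
        Map<String, StringBuilder> byGroup = new TreeMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            StringBuilder section = byGroup.computeIfAbsent(
                    groupName(entry.getKey()), key -> new StringBuilder());
            section.append("  ").append(entry.getKey().getName()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                section.append("      at ").append(frame).append('\n');
            }
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, StringBuilder> section : byGroup.entrySet()) {
            out.append("== ").append(section.getKey()).append(" ==\n");
            out.append(section.getValue());
        }
        return out.toString();
    }
}
```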

Below was the commit message.

[ 1180630 ] [contrib] UI: Immediate job submission +
stacktrace dump
Contributed by Christian Kohlschutter. Tested by St.Ack.
Also partially implements: '[ 1061288 ] recovering crawl
shows as neither pending or in-progress'
Here are Christian's notes on the patch:

1. Submitting a new job with a huge seed list and
recovering large old jobs currently "hang" the web UI
until the operation terminates. I have modified
Heritrix to make new jobs being prepared for crawling
in a separate thread. This way, the WUI immediately
responds to the user's request.
Along with this, I have introduced a new job status
"PREPARING", which signalizes that state, especially in
order to disallow multiple job submissions over
multiple Heritrix instances.

Some JSP files had to be modified to not throw
NullPointerExceptions but to show as much information
as possible about the job being prepared. Now, you can
access report information (such as queue state) already
during job recovery.

2. In addition, a new "Stacktraces" report is provided
which dumps stack traces of all running Java threads.
In most cases, this is more handy than sending a "kill
-SIGQUIT" or using JMX, isn't it?

* src/java/org/archive/crawler/admin/CrawlJob.java
Add new state 'preparing' job.
* src/java/org/archive/crawler/admin/CrawlJobHandler.java
Startup jobs in a dedicated thread to free up UI.
(startNextJob): Made final.
(startNextJobInternal): Added. Run in prepare-job thread.
* src/java/org/archive/crawler/framework/CrawlController.java
Added new state 'PREPARING'.
* src/webapps/admin/index.jsp
* src/webapps/admin/include/head.jsp
Allow for preparing state.
* src/webapps/admin/reports.jsp
Add new stacktrace report.
Allow for seeds and crawl reports not being available
(when preparing).
* src/webapps/admin/reports/stacktraces.jsp
Added.



Attached File ( 1 )

Filename | Description
prepare-job-1.patch | Patch: Prepare job in a separate thread + stacktrace report

Changes ( 11 )

Field | Old Value | Date | By
artifact_group_id | None | 2005-09-23 21:08 | gojomo
close_date | 2005-04-11 20:06 | 2005-06-01 18:00 | stack-sf
status_id | Open | 2005-06-01 18:00 | stack-sf
category_id | None | 2005-05-05 19:49 | stack-sf
assigned_to | nobody | 2005-05-05 19:49 | stack-sf
summary | [contrib] UI: Immediate job submission + stacktrace dump | 2005-05-05 19:49 | stack-sf
priority | 5 | 2005-04-12 15:52 | stack-sf
status_id | Closed | 2005-04-11 21:54 | stack-sf
close_date | - | 2005-04-11 20:06 | stack-sf
status_id | Open | 2005-04-11 20:06 | stack-sf
File Added | 129379: prepare-job-1.patch | 2005-04-11 11:02 | ck-heritrix