Heritrix: Internet Archive Web Crawler

Tracker: Bugs

8 NPE when viewing crawl report - ID: 1204931
Last Update: Comment added ( karl-ia )

Using the latest HEAD, I've been getting NPEs when
trying to view the crawl report.

---
java.lang.NullPointerException

java.lang.NullPointerException
    at org.archive.crawler.jspc.admin.reports.crawljob_jsp._jspService(Unknown Source)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:137)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:358)
    at org.mortbay.jetty.servlet.WebApplicationHandler$Chain.doFilter(WebApplicationHandler.java:342)
    at org.archive.crawler.admin.ui.RootFilter.doFilter(RootFilter.java:67)
    at org.mortbay.jetty.servlet.WebApplicationHandler$Chain.doFilter(WebApplicationHandler.java:334)
    at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:286)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1807)
    at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:525)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1757)
    at org.mortbay.http.HttpServer.service(HttpServer.java:879)
    at org.mortbay.http.HttpConnection.service(HttpConnection.java:789)
    at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:960)
    at org.mortbay.http.HttpConnection.handle(HttpConnection.java:806)
    at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:218)
    at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:300)
    at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:511)
---

The error usually occurs partway through the domains list
(20-30 domains might show up before it). This was not
happening before 1.4.0, so a change made in the last few weeks
leading up to 1.4.0, or after it, must be the cause.


Submitted: Kristinn Sigurdsson ( kristinn_sig ) - 2005-05-19 12:43
Priority: 8
Status: Closed
Resolution: None
Assigned: Karl Thiessen
Category: None
Group: 1.6.0
Visibility: Public


Comments ( 3 )

Date: 2007-03-14 00:52
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-413 -- please add further
comments at that location.


Date: 2005-06-20 05:48
Sender: gojomo (Project Admin)

Logged In: YES
user_id=144912

Fixed by clearing up synchronization problems in
StatisticsTracker. Commit comment:

Fix for [ 1204931 ] NPE when viewing crawl report
Partial fix for [ 1221570 ] reports (web ui and to disk)
don't scale
* StatisticsTracker.java
synchronization cleanup
updated for writer-based Reporter interface
avoid composing giant Strings for disk-based report writing

Assigning to QA for verification.
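
For readers unfamiliar with this class of bug, here is a minimal sketch of
the two ideas named in the commit comment: guarding shared per-host
statistics with a lock, and streaming a report to a writer line by line
instead of composing one giant String. It is not the actual
StatisticsTracker code. The class HostStatsSketch and its methods
recordFetch, snapshot and writeHostsReport are hypothetical names invented
for illustration, and copying the map under the lock before iterating is
an assumed technique, not necessarily what the fix did.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a statistics tracker; NOT Heritrix's StatisticsTracker.
final class HostStatsSketch {
    // Per-host document counts, guarded by the monitor of 'this'.
    private final Map<String, Long> docsPerHost = new TreeMap<>();

    // Called by crawl worker threads each time a document is fetched.
    synchronized void recordFetch(String host) {
        docsPerHost.merge(host, 1L, Long::sum);
    }

    // Copy the map while holding the lock, so the report iterates a
    // stable snapshot that worker threads cannot modify underneath it.
    private synchronized Map<String, Long> snapshot() {
        return new TreeMap<>(docsPerHost);
    }

    // Streams one line per host to the writer rather than building
    // the whole report as a single String first.
    void writeHostsReport(PrintWriter out) {
        for (Map.Entry<String, Long> e : snapshot().entrySet()) {
            out.print(e.getKey());
            out.print(' ');
            out.print(e.getValue());
            out.print('\n');
        }
        out.flush();
    }

    public static void main(String[] args) throws Exception {
        HostStatsSketch stats = new HostStatsSketch();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    stats.recordFetch("host" + (n % 5) + ".example");
                }
            });
            workers[i].start();
        }

        // Render a report while the workers may still be updating counts.
        stats.writeHostsReport(new PrintWriter(new StringWriter()));

        for (Thread t : workers) {
            t.join();
        }

        // Render the final report to standard output.
        stats.writeHostsReport(new PrintWriter(System.out));
    }
}
```

The trade-off in this sketch is that each report pays for one copy of the
map, in exchange for never iterating a collection that another thread is
changing.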


Date: 2005-05-19 15:10
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Tom is having the same issue. I upped the priority.

Tom Emerson wrote:

> Monitoring an ongoing crawl I was looking at the crawl job
> report. Part way through the "Hosts" section I get an NPE:
>
> [...]
> www.alanwar.com     1208   1184 KB   24s144ms
> www.bentqatar.com   1205   31 MB     15ms
> www.alwaraq.com     1194   6 MB      2m26s581ms
> An error occured
>
> java.lang.NullPointerException
>
> java.lang.NullPointerException
>     at org.archive.crawler.jspc.admin.reports.crawljob_jsp._jspService(Unknown Source)
>     at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:137)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
>     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:358)
>     at org.mortbay.jetty.servlet.WebApplicationHandler$Chain.doFilter(WebApplicationHandler.java:342)
>     at org.archive.crawler.admin.ui.RootFilter.doFilter(RootFilter.java:67)
>     at org.mortbay.jetty.servlet.WebApplicationHandler$Chain.doFilter(WebApplicationHandler.java:334)
>     at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:286)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
>     at org.mortbay.http.HttpContext.handle(HttpContext.java:1807)
>     at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:525)
>     at org.mortbay.http.HttpContext.handle(HttpContext.java:1757)
>     at org.mortbay.http.HttpServer.service(HttpServer.java:879)
>     at org.mortbay.http.HttpConnection.service(HttpConnection.java:789)
>     at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:960)
>     at org.mortbay.http.HttpConnection.handle(HttpConnection.java:806)
>     at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:218)
>     at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:300)
>     at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:511)
>
> You may be able to recover by going back
>
>
> I've gotten this on other admin pages too. It seems to be spurious,
> insofar as I can go back and return to the page, only to get a bit
> further in the list before it happens again.
>
> Details:
>
> Mac OS X 10.4.1
> (0) tree% java -version
> java version "1.4.2_07"
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_07-215)
> Java HotSpot(TM) Client VM (build 1.4.2-50, mixed mode)
>
> Heritrix 1.4 binary distro.
>
> Order contains some 2300 seeds, SURT prefix scope, inferring prefixes
> from seeds (i.e., no external SURT prefix file), 100 toe-threads, 15
> ARC writers.
>
> Filters:
>
> 1) URI regexp exclude filter, with my big "HTML-only" regexp
>
> 2) Midfetch filter to only grab content that's text/html.


Changes ( 6 )

Field              Old Value   Date               By
status_id          Open        2005-12-02 17:14   stack-sf
close_date         -           2005-12-02 17:14   stack-sf
artifact_group_id  None        2005-09-23 18:29   gojomo
assigned_to        gojomo      2005-06-20 05:48   gojomo
assigned_to        nobody      2005-06-16 00:37   gojomo
priority           6           2005-05-19 15:10   stack-sf