
Heritrix: Internet Archive Web Crawler

Tracker: Bugs

5 NPE reading override - ID: 1031525
Last Update: Comment added ( karl-ia )

Igor wrestled with an NPE when reading in the config for
the E04 crawl. He eventually traced it to the culprit
fragment. Nothing is wrong with the way it's written, but
it was throwing an NPE at line 611 in
CrawlSettingsSAXHandler, where the code assumes a
container and there is none.

It turns out the problem arises because a filter that
once existed in global scope was removed, but the
fragment still held a reference to it; that's why the
container on line 611 is null.

Rather than throwing an NPE, we'll log it as a severe
error; the crawl can keep going, but the operator should
probably be aware of each override that still references
a since-removed global filter.
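
The fix described above can be sketched roughly as follows. This is a minimal illustration, not the actual Heritrix change: the class name `OverrideReferenceSketch`, the methods `defineGlobal`, `applyOverride`, and `skippedCount`, and the map used as a stand-in for global-scope containers are all hypothetical. The point is only the pattern: check for a missing container, log at severe level, and carry on instead of dereferencing null.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical sketch; none of these names come from the Heritrix code base.
class OverrideReferenceSketch {
    private static final Logger LOGGER =
        Logger.getLogger(OverrideReferenceSketch.class.getName());

    // Stand-in for containers (for example, filters) defined in global scope.
    private final Map<String, Map<String, String>> globalContainers = new HashMap<>();

    // Number of override entries skipped because their container was missing.
    private int skipped = 0;

    void defineGlobal(String name) {
        globalContainers.put(name, new HashMap<>());
    }

    // Applies one override setting. Returns false, after logging a severe
    // error, when the override refers to a container that no longer exists.
    boolean applyOverride(String containerName, String key, String value) {
        Map<String, String> container = globalContainers.get(containerName);
        if (container == null) {
            // Previously this case would have surfaced as a NullPointerException.
            LOGGER.severe("Override references '" + containerName
                + "', which is not defined in global scope; skipping '" + key + "'");
            skipped++;
            return false;
        }
        container.put(key, value);
        return true;
    }

    int skippedCount() {
        return skipped;
    }

    public static void main(String[] args) {
        OverrideReferenceSketch settings = new OverrideReferenceSketch();
        settings.defineGlobal("keptFilter");
        settings.applyOverride("keptFilter", "enabled", "false");
        // "removedFilter" was never defined, mimicking a since-removed global filter.
        settings.applyOverride("removedFilter", "enabled", "false");
        System.out.println("skipped overrides: " + settings.skippedCount());
    }
}
```

Counting the skipped entries is one way to give the operator a summary of stale overrides; whether the real handler tracks anything beyond the log message is not stated in this report.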


Michael Stack ( stack-sf ) - 2004-09-20 20:49

Priority: 5
Status: Closed
Resolution: Fixed
Assigned To: Michael Stack
Category: configuration
Group: None
Visibility: Public


Comments ( 3 )

Date: 2007-03-14 00:16
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-247 -- please add further
comments at that location.


Date: 2004-09-20 21:51
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Here is the stack trace Igor saw:

Stacktrace:
java.lang.NullPointerException
    at org.archive.crawler.settings.CrawlSettingsSAXHandler$SimpleElementHandler.endElement(CrawlSettingsSAXHandler.java:611)
    at org.archive.crawler.settings.CrawlSettingsSAXHandler.endElement(CrawlSettingsSAXHandler.java:250)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.archive.crawler.settings.XMLSettingsHandler.readSettingsObject(XMLSettingsHandler.java:275)
    at org.archive.crawler.settings.XMLSettingsHandler.readSettingsObject(XMLSettingsHandler.java:302)
    at org.archive.crawler.settings.SettingsHandler.getSettingsObject(SettingsHandler.java(Compiled Code))
    at org.archive.crawler.settings.SettingsHandler.getSettingsObject(SettingsHandler.java(Inlined Compiled Code))
    at org.archive.crawler.settings.SettingsHandler.getSettingsForHost(SettingsHandler.java(Compiled Code))
    at org.archive.crawler.settings.SettingsHandler.getSettings(SettingsHandler.java(Inlined Compiled Code))
    at org.archive.crawler.settings.ComplexType.getSettingsFromObject(ComplexType.java(Compiled Code))
    at org.archive.crawler.settings.ComplexType.getSettingsFromObject(ComplexType.java(Inlined Compiled Code))
    at org.archive.crawler.settings.ComplexType.getAttribute(ComplexType.java(Compiled Code))
    at org.archive.crawler.settings.ComplexType.getUncheckedAttribute(ComplexType.java:493)
    at org.archive.crawler.frontier.Frontier.innerSchedule(Frontier.java:425)
    at org.archive.crawler.frontier.Frontier.loadSeeds(Frontier.java:337)
    at org.archive.crawler.frontier.Frontier.initialize(Frontier.java:297)
    at org.archive.crawler.framework.CrawlController.setupCrawlModules(CrawlController.java:484)
    at org.archive.crawler.framework.CrawlController.initialize(CrawlController.java:298)
    at org.archive.crawler.admin.CrawlJobHandler.startNextJob(CrawlJobHandler.java:891)
    at org.archive.crawler.admin.CrawlJobHandler.startCrawler(CrawlJobHandler.java:846)
    at org.archive.crawler.jspc.admin.console.action_jsp._jspService(Unknown Source)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:137)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:358)
    at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:294)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1807)
    at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:525)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1757)
    at org.mortbay.http.HttpServer.service(HttpServer.java:879)
    at org.mortbay.http.HttpConnection.service(HttpConnection.java:790)
    at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:961)
    at org.mortbay.http.HttpConnection.handle(HttpConnection.java:807)
    at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:197)
    at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:276)
    at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:511)



Date: 2004-09-20 20:57
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Fixed.



Changes ( 3 )

Field          Old Value  Date              By
status_id      Open       2004-09-20 20:57  stack-sf
resolution_id  None       2004-09-20 20:57  stack-sf
close_date     -          2004-09-20 20:57  stack-sf