Heritrix: Internet Archive Web Crawler

Tracker: Bugs

8 resurrectOneQueueState has keys for items not in allqueues - ID: 1325961
Last Update: Comment added ( karl-ia )

Recovering a fairly large crawl on crawling013, I get the
following exception:

10/13/2005 01:16:57 +0000 FINE org.archive.crawler.frontier.BdbFrontier put Restoring: 146.145.120.3
10/13/2005 01:16:57 +0000 FINE org.archive.crawler.frontier.BdbFrontier put Restoring: 168.8.216.16
java.lang.NullPointerException
at java.util.TreeMap.compare(TreeMap.java:1093)
at java.util.TreeMap.put(TreeMap.java:465)
at java.util.TreeSet.add(TreeSet.java:210)
at java.util.Collections$SynchronizedCollection.add(Collections.java:1581)
at org.archive.crawler.frontier.BdbFrontier$4.put(BdbFrontier.java:326)
at org.archive.crawler.frontier.BdbFrontier.put(BdbFrontier.java:375)
at org.archive.crawler.frontier.BdbFrontier.resurrectOneQueueState(BdbFrontier.java:359)
at org.archive.crawler.frontier.BdbFrontier.resurrectQueueState(BdbFrontier.java:329)
at org.archive.crawler.frontier.BdbFrontier.initQueue(BdbFrontier.java:272)
at org.archive.crawler.frontier.WorkQueueFrontier.initialize(WorkQueueFrontier.java:269)
at org.archive.crawler.frontier.BdbFrontier.initialize(BdbFrontier.java:467)
at org.archive.crawler.framework.CrawlController.setupCrawlModules(CrawlController.java:652)
at org.archive.crawler.framework.CrawlController.initialize(CrawlController.java:378)
at org.archive.crawler.admin.CrawlJob.startCrawling(CrawlJob.java:777)
at org.archive.crawler.admin.CrawlJobHandler.startNextJobInternal(CrawlJobHandler.java:1120)
at org.archive.crawler.admin.CrawlJobHandler$2.run(CrawlJobHandler.java:1103)
at java.lang.Thread.run(Thread.java:595)

After adding asserts, I see that for the key
'146.145.120.3' there is no item in allqueues:

Exception in thread "StartNextJob" java.lang.AssertionError: null is null: 146.145.120.3
at org.archive.crawler.frontier.BdbFrontier.put(BdbFrontier.java:374)
at org.archive.crawler.frontier.BdbFrontier.resurrectOneQueueState(BdbFrontier.java:354)
at org.archive.crawler.frontier.BdbFrontier.resurrectQueueState(BdbFrontier.java:329)
at org.archive.crawler.frontier.BdbFrontier.initQueue(BdbFrontier.java:272)
at org.archive.crawler.frontier.WorkQueueFrontier.initialize(WorkQueueFrontier.java:269)
at org.archive.crawler.frontier.BdbFrontier.initialize(BdbFrontier.java:467)
at org.archive.crawler.framework.CrawlController.setupCrawlModules(CrawlController.java:652)
at org.archive.crawler.framework.CrawlController.initialize(CrawlController.java:378)
at org.archive.crawler.admin.CrawlJob.startCrawling(CrawlJob.java:777)
at org.archive.crawler.admin.CrawlJobHandler.startNextJobInternal(CrawlJobHandler.java:1120)
at org.archive.crawler.admin.CrawlJobHandler$2.run(CrawlJobHandler.java:1103)
at java.lang.Thread.run(Thread.java:595)
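
The two traces above are consistent with a lookup in allqueues returning null and that null then being added to a sorted, synchronized collection. The sketch below reproduces only that general failure mode with plain JDK collections. It is not the Heritrix code: the class name, method names, and the use of String values as stand-ins for work queues are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the restore step; not the actual BdbFrontier code.
class QueueRestoreSketch {

    // Unguarded restore: a key missing from allQueues yields null, and a
    // naturally ordered TreeSet rejects null with a NullPointerException.
    static void restoreUnchecked(Map<String, String> allQueues,
                                 Set<String> ready, String key) {
        ready.add(allQueues.get(key));
    }

    // Guarded restore: report the missing key instead of failing inside the set.
    static boolean restoreChecked(Map<String, String> allQueues,
                                  Set<String> ready, String key) {
        String queue = allQueues.get(key);
        if (queue == null) {
            return false; // saved state names a queue that allQueues does not hold
        }
        ready.add(queue);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> allQueues = new HashMap<String, String>();
        allQueues.put("168.8.216.16", "queue-168.8.216.16");
        Set<String> ready =
            Collections.synchronizedSet(new TreeSet<String>());

        System.out.println(restoreChecked(allQueues, ready, "168.8.216.16"));
        System.out.println(restoreChecked(allQueues, ready, "146.145.120.3"));
        try {
            restoreUnchecked(allQueues, ready, "146.145.120.3");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException for missing key");
        }
    }
}
```

A guard of this kind only moves the failure to a clearer place. It does not explain why the key is missing from allqueues in the first place.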



Submitted: Michael Stack ( stack-sf ) - 2005-10-13 16:14
Priority: 8
Status: Closed
Resolution: Invalid
Assigned: Michael Stack
Category: None
Group: None
Visibility: Public


Comments ( 2 )

Date: 2007-03-14 01:01
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-503 -- please add further
comments at that location.


Date: 2005-10-13 21:09
Sender: stack-sf (Project Admin)

The problem was that the list of files in the checkpoint's
bdbje-logs subdirectory didn't match the list in the
bdbje-logs-manifest.txt file. I had included one file too
many: the bdbje log file that recorded the
clear-bdbje-database-on-terminate action, so allqueues
appeared empty on recovery.

Closing. Operator error. Added a note on expert mode
checkpointing to the manual that includes a reference to
this particular error.
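
A mismatch like this can be caught before attempting recovery by comparing the directory listing against the manifest. The following is a minimal sketch, not part of Heritrix. It assumes the manifest lists one log file name per line, and it takes both paths as arguments because the manifest's location is not stated here. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical pre-recovery check; assumes one file name per manifest line.
class ManifestCheckSketch {

    // Names present in 'a' but absent from 'b', sorted.
    static SortedSet<String> difference(Collection<String> a,
                                        Collection<String> b) {
        SortedSet<String> d = new TreeSet<String>(a);
        d.removeAll(b);
        return d;
    }

    // Non-blank, trimmed lines of the manifest file.
    static List<String> readManifest(Path manifest) throws IOException {
        List<String> names = new ArrayList<String>();
        for (String line : Files.readAllLines(manifest)) {
            String t = line.trim();
            if (!t.isEmpty()) {
                names.add(t);
            }
        }
        return names;
    }

    // Names of the regular files directly inside the log directory.
    static List<String> listFiles(Path dir) throws IOException {
        List<String> names = new ArrayList<String>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    names.add(p.getFileName().toString());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ManifestCheckSketch <bdbje-logs dir> <manifest file>");
            return;
        }
        List<String> onDisk = listFiles(Paths.get(args[0]));
        List<String> listed = readManifest(Paths.get(args[1]));
        System.out.println("On disk but not in manifest: " + difference(onDisk, listed));
        System.out.println("In manifest but not on disk: " + difference(listed, onDisk));
    }
}
```

Both differences should be empty before recovery is attempted. A non-empty first set corresponds to the "one file too many" case described above.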


Changes ( 3 )

Field          Old Value  Date              By
status_id      Open       2005-10-13 21:09  stack-sf
resolution_id  None       2005-10-13 21:09  stack-sf
close_date     -          2005-10-13 21:09  stack-sf