Recovering a crawl puts the path of the recovery log
into the 'via' of many URIs, which produces many
errors like this one in heritrix_out:
org.apache.commons.httpclient.URIException: Relative URI but no base: /1/webcrawl-test/results/heritrix-1.5.0-200506061400-20050606T203234/tAU/begin/logs/recover.gz
    at org.archive.crawler.datamodel.UURIFactory.fixup(UURIFactory.java:438)
    at org.archive.crawler.datamodel.UURIFactory.create(UURIFactory.java:296)
    at org.archive.crawler.datamodel.UURIFactory.create(UURIFactory.java:285)
    at org.archive.crawler.datamodel.UURIFactory.getInstance(UURIFactory.java:240)
    at org.archive.crawler.frontier.RecoveryJournal.importRecoverLog(RecoveryJournal.java:221)
    at org.archive.crawler.frontier.AbstractFrontier.importRecoverLog(AbstractFrontier.java:799)
    at org.archive.crawler.framework.CrawlController.setupCrawlModules(CrawlController.java:582)
    at org.archive.crawler.framework.CrawlController.initialize(CrawlController.java:336)
    at org.archive.crawler.admin.CrawlJobHandler.startNextJobInternal(CrawlJobHandler.java:1066)
    at org.archive.crawler.admin.CrawlJobHandler$2.run(CrawlJobHandler.java:1032)
    at java.lang.Thread.run(Thread.java:595)
This is harmless, but it pollutes heritrix_out with
expected, uninteresting output. Some adjustment to the
current practice should prevent this.
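One possible adjustment is sketched below. It uses plain java.net.URI rather than Heritrix's own UURIFactory, and the class and method names (ViaSketch, hasScheme, viaForRecoverLog) are hypothetical and not part of Heritrix. The idea is to check whether the recovery-log path is already an absolute URI before using it as a 'via', and to convert a bare filesystem path to a file: URI so that it is no longer a relative URI with no base.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

public class ViaSketch {

    /** Returns true if the string parses as a URI that has a scheme (is absolute). */
    public static boolean hasScheme(String candidate) {
        try {
            return new URI(candidate).isAbsolute();
        } catch (URISyntaxException e) {
            // Not parseable as a URI at all, so certainly not an absolute one.
            return false;
        }
    }

    /**
     * Hypothetical helper: produce a 'via' value for URIs imported from a
     * recovery log. A bare filesystem path is converted to a file: URI;
     * anything that already has a scheme is returned unchanged.
     */
    public static String viaForRecoverLog(String pathOrUri) {
        if (hasScheme(pathOrUri)) {
            return pathOrUri;
        }
        return new File(pathOrUri).toURI().toString();
    }

    public static void main(String[] args) {
        // Illustrative path only; not the path from the report above.
        String path = "/tmp/logs/recover.gz";
        System.out.println("has scheme: " + hasScheme(path));
        System.out.println("via: " + viaForRecoverLog(path));
    }
}
```

Whether a file: URI is acceptable as a 'via' in Heritrix, or whether the 'via' should simply be left empty for recovered URIs, is a design decision this sketch does not settle.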
Gordon Mohr
None
1.6.0
Public
Date: 2007-03-14 00:53
Date: 2005-07-22 02:53 Logged In: YES
| Field | Old Value | Date | By |
|---|---|---|---|
| artifact_group_id | None | 2005-09-23 18:02 | gojomo |
| status_id | Open | 2005-07-22 02:53 | gojomo |
| resolution_id | None | 2005-07-22 02:53 | gojomo |
| close_date | - | 2005-07-22 02:53 | gojomo |
| assigned_to | nobody | 2005-06-22 19:24 | gojomo |