
Heritrix: Internet Archive Web Crawler

Tracker: Bugs

8 IllegalArgumentException adding to source host report - ID: 1462407
Last Update: Comment added ( karl-ia )

Igor reports an issue in the 1.8 candidate:

I started using the 1.8 candidate for the production crawls
and ran into errors like this one:

-----------------
Time: Mar. 30, 2006 05:22:47 GMT
Level: SEVERE
Message:

Fatal exception in ToeThread #28:
http://www.scorecard.org/graphics/leftNav_commCenter-over.gif

Exception:

java.lang.IllegalArgumentException: Data object class (class java.util.Hashtable) not an instance of binding's base class (class java.util.HashMap)
Stacktrace: java.lang.IllegalArgumentException: Data object class (class java.util.Hashtable) not an instance of binding's base class (class java.util.HashMap)
    at com.sleepycat.bind.serial.SerialBinding.objectToEntry(SerialBinding.java:152)
    at com.sleepycat.collections.DataView.useValue(DataView.java:502)
    at com.sleepycat.collections.DataCursor.initForPut(DataCursor.java:625)
    at com.sleepycat.collections.DataCursor.put(DataCursor.java:559)
    at com.sleepycat.collections.StoredContainer.put(StoredContainer.java:311)
    at com.sleepycat.collections.StoredMap.put(StoredMap.java:258)
    at org.archive.util.CachedBdbMap.expungeStaleEntry(CachedBdbMap.java:552)
    at org.archive.util.CachedBdbMap.expungeStaleEntries(CachedBdbMap.java:525)
    at org.archive.util.CachedBdbMap.get(CachedBdbMap.java:363)
    at org.archive.crawler.admin.StatisticsTracker.saveSourceStats(StatisticsTracker.java:746)
    at org.archive.crawler.admin.StatisticsTracker.crawledURISuccessful(StatisticsTracker.java:738)
    at org.archive.crawler.framework.CrawlController.fireCrawledURISuccessfulEvent(CrawlController.java:554)
    at org.archive.crawler.frontier.WorkQueueFrontier.finished(WorkQueueFrontier.java:838)
    at org.archive.crawler.framework.ToeThread.run(ToeThread.java:159)
-------------------

I am running 4 crawls on crawling022; one has
finished without problems. The other three all have
the same problem and are not making any progress.

Active crawls are:
http://crawling022.archive.org:8080/
http://crawling022.archive.org:8081/
http://crawling022.archive.org:8082/

Could you please take a look at these problems? I would
like to recover these crawls today if possible.

Machine seems to be OK.

i.


Submitted By: Michael Stack ( stack-sf ) - 2006-03-31 21:05

Priority: 8

Status: Closed

Resolution: Fixed

Assigned To: Michael Stack

Category: Disk I/O

Group: 1.8.0

Visibility: Public


Comments ( 8 )

Date: 2007-03-14 01:05
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-546 -- please add further
comments at that location.


Date: 2006-04-11 22:50
Sender: ia_igor (Project Admin)

Logged In: YES
user_id=715474

I never checked whether the report created at the end of the test
crawl had any content. I just saw that the report was created
and did not notice this exception in heritrix_out:

Exception in thread "ToeThread #71: " java.lang.ClassCastException: java.util.HashMap
    at org.archive.crawler.admin.StatisticsTracker.writeSourceReportTo(StatisticsTracker.java:882)
    at org.archive.crawler.admin.StatisticsTracker.writeReportTo(StatisticsTracker.java:1038)
    at org.archive.crawler.admin.StatisticsTracker.writeReportFile(StatisticsTracker.java:998)
    at org.archive.crawler.admin.StatisticsTracker.dumpReports(StatisticsTracker.java:1070)
    at org.archive.crawler.framework.AbstractTracker.crawlEnded(AbstractTracker.java:314)
    at org.archive.crawler.admin.StatisticsTracker.crawlEnded(StatisticsTracker.java:841)
    at org.archive.crawler.framework.CrawlController.sendCrawlStateChangeEvent(CrawlController.java:956)
    at org.archive.crawler.framework.CrawlController.completeStop(CrawlController.java:1028)
    at org.archive.crawler.admin.CrawlJob$MBeanCrawlController.completeStop(CrawlJob.java:793)
    at org.archive.crawler.framework.CrawlController.toeEnded(CrawlController.java:1810)
    at org.archive.crawler.framework.ToeThread.run(ToeThread.java:190)

---------------------
I ran a simple test crawl again and committed the fix.
The issue should remain closed/fixed.
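The follow-up `ClassCastException: java.util.HashMap` means a `HashMap` value reached code in `writeSourceReportTo` that cast it to some other concrete type. The trace does not show the cast target, so the sketch below assumes, hypothetically, that the report code still cast stored values to `Hashtable`. It shows that such a cast fails on a `HashMap`, while reading the values through the `Map` interface works for either class. `ReportCastDemo`, `totalViaHashtableCast` and `totalViaMapInterface` are invented for illustration and are not Heritrix methods.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

/**
 * Hypothetical illustration of the report-time ClassCastException.
 * Names are invented for this sketch; they are not Heritrix APIs.
 */
public class ReportCastDemo {

    /** Brittle reader: assumes every stored value is a concrete Hashtable. */
    static long totalViaHashtableCast(Object storedValue) {
        Hashtable<?, ?> counts = (Hashtable<?, ?>) storedValue; // throws ClassCastException for a HashMap
        return sum(counts);
    }

    /** Sturdier reader: depends only on the Map interface that HashMap and Hashtable share. */
    static long totalViaMapInterface(Object storedValue) {
        Map<?, ?> counts = (Map<?, ?>) storedValue;
        return sum(counts);
    }

    /** Adds up numeric per-host counts. */
    private static long sum(Map<?, ?> counts) {
        long total = 0;
        for (Object v : counts.values()) {
            total += ((Number) v).longValue();
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative data only: per-host counts stored as a HashMap.
        Map<String, Long> perHost = new HashMap<>();
        perHost.put("www.scorecard.org", 3L);
        perHost.put("example.com", 2L);

        System.out.println(totalViaMapInterface(perHost)); // prints 5

        try {
            totalViaHashtableCast(perHost);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in the report trace");
        }
    }
}
```

Coding the reader against `Map` means the reporting side no longer has to change whenever the concrete value class declared for the map changes.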


Date: 2006-04-05 23:42
Sender: gojomo (Project Admin)

Logged In: YES
user_id=144912

Fine by me; let's leave it assigned to Stack as a record of who
fixed it.


Date: 2006-04-05 22:40
Sender: ia_igor (Project Admin)

Logged In: YES
user_id=715474

I used Stack's patch and ran the crawl for 5 days without
seeing the error. Closing the issue.



Date: 2006-04-05 22:05
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Already did. See the StatisticsTracker.java.patch attachment
below. (Assigned back to G.)


Date: 2006-04-05 21:37
Sender: gojomo (Project Admin)

Logged In: YES
user_id=144912

With SF CVS back, could you post the fix as a patch for review?


Date: 2006-04-03 23:39
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Igor reports trying the patch and the exception no longer shows.

CVS is still out and looks like it will be until Tuesday or
even Wednesday. See below:

( 2006-04-03 14:04:56 - Project CVS Service ) As an
update to the 2006-03-30 CVS outage, our current estimate is
that CVS services will be back online (developer access)
late Tuesday or early Wednesday (Pacific Timezone).

I've attached a better patch. Assigning to Gordon to
review. Assign it back afterwards and I'll commit it (when
sf.net comes back).


Date: 2006-03-31 21:19
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

It's as the exception says: we tell BigMap we're going to be
serializing HashMaps, but in actuality we're serializing
Hashtables. Weird that we haven't seen this issue until now.

CVS seems to be down, so I am having difficulty generating a
jar to test. Meanwhile, the fix is attached. Will try again in
an hour; perhaps CVS will be back then.
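The mismatch Stack describes can be reproduced outside Heritrix with a small model of the check. This is a hypothetical simplification, not the Berkeley DB JE `SerialBinding` source: `BaseClassCheckDemo`, `CheckedBinding` and `checkValue` are invented names, and the only assumption, suggested by the logged message, is that the binding rejects any value that is not an instance of the value class declared when the map was created. `Hashtable` and `HashMap` both implement `Map` but neither extends the other, so a `Hashtable` fails a `HashMap` check.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

/**
 * Hypothetical, simplified model of the base-class check behind the logged
 * IllegalArgumentException. This is not the Berkeley DB JE source.
 */
public class BaseClassCheckDemo {

    /** Invented stand-in for a serial binding that remembers its declared value class. */
    static final class CheckedBinding {
        private final Class<?> baseClass;

        CheckedBinding(Class<?> baseClass) {
            this.baseClass = baseClass;
        }

        /** Rejects values that are not instances of the declared class, using the logged message format. */
        void checkValue(Object value) {
            if (!baseClass.isInstance(value)) {
                throw new IllegalArgumentException(
                        "Data object class (" + value.getClass()
                        + ") not an instance of binding's base class (" + baseClass + ")");
            }
        }
    }

    public static void main(String[] args) {
        // Illustrative data: what the tracker actually stored was a Hashtable.
        Map<String, Long> perHost = new Hashtable<>();
        perHost.put("www.scorecard.org", 1L);

        // Declared HashMap but stored Hashtable: the check fails.
        try {
            new CheckedBinding(HashMap.class).checkValue(perHost);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // In this model, declaring the class actually stored, or the shared Map interface, passes.
        new CheckedBinding(Hashtable.class).checkValue(perHost);
        new CheckedBinding(Map.class).checkValue(perHost);
        System.out.println("aligned declarations accepted");
    }
}
```

The first line printed matches the message in the crawl log. Aligning the declared type with the stored type is the kind of fix the attached StatisticsTracker.java describes; the patch itself is not reproduced here.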


Attached Files ( 2 )

Filename                      Description
StatisticsTracker.java        StatisticsTracker that aligns type declared to BigMap with what we actually use in the BigMap value
StatisticsTracker.java.patch  Better version of patch.

Changes ( 10 )

Field          Old Value                             Date              By
assigned_to    nobody                                2006-04-05 23:42  gojomo
status_id      Open                                  2006-04-05 22:40  ia_igor
resolution_id  None                                  2006-04-05 22:40  ia_igor
assigned_to    gojomo                                2006-04-05 22:40  ia_igor
close_date     -                                     2006-04-05 22:40  ia_igor
assigned_to    stack-sf                              2006-04-05 22:05  stack-sf
assigned_to    gojomo                                2006-04-05 21:37  gojomo
File Added     173310: StatisticsTracker.java.patch  2006-04-03 23:39  stack-sf
assigned_to    stack-sf                              2006-04-03 23:39  stack-sf
File Added     172978: StatisticsTracker.java        2006-03-31 21:19  stack-sf