Heritrix: Internet Archive Web Crawler

Tracker: Bugs

hosts-report.txt is empty - ID: 1176788
Last Update: Comment added ( karl-ia )

After a successfully completed shadow crawl, the hosts-report.txt
contains only the header line:

[host][#urls] [#bytes]

...and nothing else. Reports are available for review at
crawling005:/2/crawldata/UKGOV-WEEKLY-SHADOW-085


Submitted: Dan Avery ( danavery ) - 2005-04-05 03:52
Priority: 8
Status: Closed
Resolution: Fixed
Assigned: Michael Stack
Category: Logging
Group: None
Visibility: Public


Comments ( 3 )

Date: 2007-03-14 00:22
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-384 -- please add further
comments at that location.


Date: 2005-04-13 20:58
Sender: stack-sf (Project Admin)

Fixed.

Below is the commit message.

* src/java/org/archive/crawler/admin/StatisticsTrackerTest.java
Added test of reverse sort.
* src/java/org/archive/crawler/admin/StatisticsTracker.java
(getSortedByValue): Renamed as
(getReverseSortCopy): Added. Returns TreeMap instead of TreeSet.
Added handling of case where entrySet is not supported in the
passed map.
(getLongestMapEntryKey): Changed implementation so it operates on
an iterator of keys rather than an iterator of Map.Entry.
(writeHostReport): Updated to match the changes to getLongest...
and getSortedByValue.
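For readers unfamiliar with the pattern, here is a minimal sketch of the kind of change the commit message describes: building a reverse-sorted copy of a map while tolerating a map whose entrySet is unsupported, and finding the longest key from an iterator of keys alone. This is not the Heritrix source. The class name ReverseSortSketch and the methods reverseSortCopy and longestKey are hypothetical, the sketch assumes an unsupported entrySet signals itself with UnsupportedOperationException, and it returns a LinkedHashMap rather than the TreeMap mentioned in the commit.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; names are hypothetical, not Heritrix APIs.
final class ReverseSortSketch {

    // Returns a copy ordered by descending value, ties broken by key.
    // Assumes non-null values.
    static Map<String, Long> reverseSortCopy(Map<String, Long> source) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>();
        try {
            for (Map.Entry<String, Long> e : source.entrySet()) {
                entries.add(new AbstractMap.SimpleImmutableEntry<String, Long>(
                        e.getKey(), e.getValue()));
            }
        } catch (UnsupportedOperationException ex) {
            // Assumed failure mode: fall back to keys plus get().
            entries.clear();
            for (String key : source.keySet()) {
                entries.add(new AbstractMap.SimpleImmutableEntry<String, Long>(
                        key, source.get(key)));
            }
        }
        entries.sort((a, b) -> {
            int byValue = Long.compare(b.getValue(), a.getValue());
            return byValue != 0 ? byValue : a.getKey().compareTo(b.getKey());
        });
        // LinkedHashMap preserves the sorted insertion order.
        Map<String, Long> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : entries) {
            sorted.put(e.getKey(), e.getValue());
        }
        return sorted;
    }

    // Finds the longest key using only a key iterator, never Map.Entry.
    // Returns the empty string when there are no keys.
    static String longestKey(Iterator<String> keys) {
        String longest = "";
        while (keys.hasNext()) {
            String key = keys.next();
            if (key.length() > longest.length()) {
                longest = key;
            }
        }
        return longest;
    }

    public static void main(String[] args) {
        Map<String, Long> urlsPerHost = new HashMap<>();
        urlsPerHost.put("a.example", 3L);
        urlsPerHost.put("longer.example.org", 10L);
        System.out.println(reverseSortCopy(urlsPerHost));
        System.out.println(longestKey(urlsPerHost.keySet().iterator()));
    }
}
```

Copying into a list before sorting keeps the sort independent of how the backing map stores its data, which matters if the map is disk-backed and exposes only keys and lookups.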



Date: 2005-04-13 02:01
Sender: stack-sf (Project Admin)

Taking this issue. StatisticsTracker relies on entrySet.
This doesn't work so well when going against BigMap
implementations that are disk-backed.


Attached File

No Files Currently Attached

Changes ( 5 )

Field          Old Value  Date              By
status_id      Open       2005-04-13 20:58  stack-sf
resolution_id  None       2005-04-13 20:58  stack-sf
close_date     -          2005-04-13 20:58  stack-sf
assigned_to    nobody     2005-04-13 02:01  stack-sf
priority       5          2005-04-05 03:52  danavery