
Heritrix: Internet Archive Web Crawler

Tracker: Bugs

crawl.log timestamps out-of-order - ID: 1077924 (priority 7)
Last Update: Comment added (karl-ia)

Because the crawl.log timestamps are the 'fetch
completed time' for a completed CrawlURI, and different
CrawlURIs take different amounts of time to finish
post-fetch steps, the timestamps in the crawl.log are
not in strict order.

This is by design, but confusing. An end-of-processing
timestamp could be used in the crawl.log instead, which
would give the expected ever-increasing timestamps.
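A small simulation may make the ordering problem concrete. This is a sketch, not Heritrix code: the `Uri` record and its `postFetchMillis` field are hypothetical stand-ins for a CrawlURI, and it assumes a line is written once post-fetch processing finishes. Ordering lines by write time shows that end-of-fetch values can go backwards while end-of-processing (logging) times cannot.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LogOrderDemo {
    /**
     * Hypothetical stand-in for a CrawlURI: the time its fetch completed and
     * how long its post-fetch processing took, both in milliseconds.
     */
    public record Uri(String name, long fetchEnd, long postFetchMillis) {
        /** Assumption: a log line is written once post-fetch processing finishes. */
        public long loggedAt() {
            return fetchEnd + postFetchMillis;
        }
    }

    /** Fetch-end timestamps, listed in the order their log lines are written. */
    public static List<Long> fetchEndsInLogOrder(List<Uri> uris) {
        List<Uri> byLogTime = new ArrayList<>(uris);
        byLogTime.sort(Comparator.comparingLong(Uri::loggedAt));
        List<Long> result = new ArrayList<>();
        for (Uri u : byLogTime) {
            result.add(u.fetchEnd());
        }
        return result;
    }

    /** True if each value is >= the one before it. */
    public static boolean isNonDecreasing(List<Long> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i) < values.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Uri> uris = List.of(
                new Uri("a", 1000, 500),  // slow post-processing: logged at 1500
                new Uri("b", 1100, 50),   // logged at 1150
                new Uri("c", 1200, 10));  // logged at 1210
        List<Long> fetchEnds = fetchEndsInLogOrder(uris);
        System.out.println(fetchEnds);                  // [1100, 1200, 1000]
        System.out.println(isNonDecreasing(fetchEnds)); // false
    }
}
```

Here "a" finished fetching first but is logged last, so its end-of-fetch timestamp (1000) appears after 1200. The logging times (1150, 1210, 1500) increase by construction.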

Unless Igor and Dan find the end-of-fetch time more
useful than the end-of-processing time, we should
switch to end-of-processing time.

Since the crawl.log timestamp already differs from the
ARC record timestamp (crawl.log uses end-of-fetch; the
ARC record uses start-of-fetch), switching should not
cost any further consistency or cross-referenceability.


Gordon Mohr (gojomo) - 2004-12-02 21:36

Priority: 7
Status: Closed
Resolution: Fixed
Assigned to: Gordon Mohr
Category: Usability/UI
Group: None
Visibility: Public


Comments (5)

Date: 2007-03-14 00:18
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-300 -- please add further
comments at that location.


Date: 2005-03-03 07:46
Sender: gojomo (Project Admin)

Docs updated, and WUI logs view-by-timestamp updated to work
with new-format timestamps. Commit comment:

Completion of work for [ 1077924 ] crawl.log timestamps
out-of-order
* logs.jsp
  update to reflect new timestamp format
* LogReader.java
  methods to find first line beginning with prefix string
  (instead of using regexp that was fouled by new timestamp
  separators)
* src/articles/user_manual.xml
  update sections about logs with new format info and the
  updated fetch-begun+duration column
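The LogReader change can be pictured with a plain prefix scan. This is a minimal sketch, not the actual LogReader code: the class and `findFirstLineBeginning` are hypothetical names, and the sample lines are abbreviated illustrations rather than real crawl.log output. Because the first field is now a monotonically increasing, fixed-width timestamp, lines sharing a prefix such as `2005-03-03T04:09` are contiguous, so the first match marks where that time range begins.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class PrefixLineFinder {
    /**
     * Hypothetical helper: returns the 1-based number of the first line that
     * begins with prefix, or -1 if no line does. A plain startsWith check is
     * used, so the '-', ':', 'T' and '.' characters in an ISO8601 timestamp
     * are compared literally rather than interpreted as pattern syntax.
     */
    public static int findFirstLineBeginning(Reader in, String prefix) {
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (line.startsWith(prefix)) {
                    return lineNumber;
                }
            }
            return -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Illustrative, abbreviated log lines (not real crawl.log output).
        String log = "2005-03-03T04:08:59.950Z 200 http://example.com/a\n"
                + "2005-03-03T04:09:00.010Z 200 http://example.com/b\n"
                + "2005-03-03T04:09:01.500Z 404 http://example.com/c\n";
        System.out.println(findFirstLineBeginning(new StringReader(log), "2005-03-03T04:09")); // 2
        System.out.println(findFirstLineBeginning(new StringReader(log), "2005-03-03T05"));    // -1
    }
}
```

A linear scan keeps the sketch short; since the timestamps now sort lexicographically in time order, a seek-based binary search over a large file would be a possible refinement.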


Date: 2005-03-03 04:09
Sender: gojomo (Project Admin)

Code changed. Commit comment:

Fix for [ 1077924 ] crawl.log timestamps out-of-order
Implementation for [ 1055766 ] Dates in logs are unreadable.
* ArchiveUtils.java
  Add convenience SimpleDateFormat instances and static
  methods for ISO8601 14-digit and 17-digit timestamps
* UriProcessingFormatter.java
  Use log-time (ISO8601) for first field. Use fetch-begin
  (RFC2550 aka ARC format) '+' fetch-duration in field 9,
  which was previously just fetch-duration
* StatisticsTracker.java
  Use log-time (ISO8601) for first field.
* AbstractTracker.java
  Adjust headers slightly to better align with new longer
  date.

Doc changes forthcoming.
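The two timestamp fields described in this commit can be sketched as follows. The exact layouts are assumptions, not taken from the Heritrix source: log-time as UTC ISO8601 with milliseconds (`yyyy-MM-dd'T'HH:mm:ss.SSS'Z'`), fetch-begin as a 17-digit `yyyyMMddHHmmssSSS` value, and the duration in milliseconds. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class CrawlLogTimestamps {
    // Assumed log-time layout: ISO8601, UTC, millisecond precision.
    static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").withZone(ZoneOffset.UTC);

    // Assumed 17-digit ARC-style layout: yyyyMMddHHmmssSSS, UTC.
    static final DateTimeFormatter DIGITS_17 =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS").withZone(ZoneOffset.UTC);

    /** First field: the time the line is logged (hypothetical helper). */
    public static String logTime(long epochMillis) {
        return LOG_TIME.format(Instant.ofEpochMilli(epochMillis));
    }

    /** Field 9: fetch-begin timestamp, '+', then fetch duration in ms (assumed unit). */
    public static String fetchBeginPlusDuration(long fetchBeginMillis, long fetchEndMillis) {
        return DIGITS_17.format(Instant.ofEpochMilli(fetchBeginMillis))
                + "+" + (fetchEndMillis - fetchBeginMillis);
    }

    public static void main(String[] args) {
        long begin = Instant.parse("2005-03-03T04:09:12.345Z").toEpochMilli();
        long end = begin + 1234;   // fetch took 1234 ms
        long logged = end + 50;    // post-fetch processing took 50 ms
        System.out.println(logTime(logged));                    // 2005-03-03T04:09:13.629Z
        System.out.println(fetchBeginPlusDuration(begin, end)); // 20050303040912345+1234
    }
}
```

This uses `java.time.DateTimeFormatter` in place of the `SimpleDateFormat` instances mentioned in the commit, because `DateTimeFormatter` is immutable and safe to share across threads, while a shared `SimpleDateFormat` is not. Fetch-begin plus duration still lets a reader recover the end-of-fetch time, so the change loses no information.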


Date: 2005-03-02 19:40
Sender: gojomo (Project Admin)

Per triage: make the initial log timestamp the time of
logging (and thus monotonically increasing), and add another
timestamp column at the end for the ARC timestamp.
Doc impact: list of crawl.log fields.


Date: 2005-02-03 23:05
Sender: stack-sf (Project Admin)

Upping priority so it gets addressed.


Changes (6)

Field          Old Value  Date              By
status_id      Open       2005-03-03 07:46  gojomo
resolution_id  None       2005-03-03 07:46  gojomo
close_date     -          2005-03-03 07:46  gojomo
assigned_to    nobody     2005-03-02 19:40  gojomo
priority       6          2005-02-10 00:55  stack-sf
priority       5          2005-02-03 23:05  stack-sf