
Heritrix: Internet Archive Web Crawler

Tracker: Feature Requests

7 Dates in logs are unreadable. - ID: 1055766
Last Update: Comment added ( karl-ia )

Looking at two log entries, I cannot easily tell the
difference in time between the two. Let's change the
log entries to be more readable. Example:
2004-10-27-22:17:26:817.

Apache logs the time entry as:

13/Sep/2004:11:50:17 -0700

Tomcat logs like this:

2004-10-27 10:29:39


Submitted by: Michael Stack ( stack-sf ) - 2004-10-27 23:44
Priority: 7
Status: Closed
Resolution: None
Assigned to: Gordon Mohr
Category: i/o
Group: None
Visibility: Public


Comments ( 6 )

Date: 2007-03-14 01:35
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-844 -- please add further
comments at that location.


Date: 2005-03-03 21:02
Sender: gojomo (Project Admin)

Logged In: YES
user_id=144912

Completion of work for [ 1077924 ] crawl.log timestamps out-of-order
* logs.jsp
    Update to reflect new timestamp format.
* LogReader.java
    Methods to find first line beginning with prefix string
    (instead of using regexp that was fouled by new timestamp
    separators).
* src/articles/user_manual.xml
    Update sections about logs with new format info, updated
    fetch-begun+duration column.
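
The prefix-based lookup mentioned for LogReader.java can be sketched roughly as follows. This is not the actual LogReader code: the class name PrefixLineFinder, the method firstLineStartingWith, and the sample log lines are all hypothetical. The point is only that String.startsWith treats '-', ':' and '.' literally, whereas a regexp treats '.' as a metacharacter.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch, not the real LogReader from Heritrix.
class PrefixLineFinder {
    // Returns the 1-based number of the first line that begins with
    // the given prefix, or -1 if no line matches.
    static int firstLineStartingWith(Reader source, String prefix) {
        BufferedReader reader = new BufferedReader(source);
        try {
            int lineNumber = 1;
            String line;
            while ((line = reader.readLine()) != null) {
                // Literal comparison: no regexp escaping is needed.
                if (line.startsWith(prefix)) {
                    return lineNumber;
                }
                lineNumber++;
            }
            return -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample lines, for illustration only.
        String log = "2005-03-02T20:22:00.000 first\n"
                   + "2005-03-03T04:09:00.000 second\n"
                   + "2005-03-03T21:02:00.000 third\n";
        System.out.println(
            firstLineStartingWith(new StringReader(log), "2005-03-03T21"));
    }
}
```

Because the timestamps are fixed-width and sortable, a prefix such as "2005-03-03" selects the first entry of a given day without any pattern syntax.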


Date: 2005-03-03 04:09
Sender: gojomo (Project Admin)

Logged In: YES
user_id=144912

Code changed. Commit comment:

Fix for [ 1077924 ] crawl.log timestamps out-of-order
Implementation for [ 1055766 ] Dates in logs are unreadable.
* ArchiveUtils.java
    Add convenience SimpleDateFormat instances and static
    methods for ISO8601 14-digit and 17-digit timestamps.
* UriProcessingFormatter.java
    Use log-time (ISO8601) for first field. Use fetch-begin
    (RFC2550 aka ARC format) '+' fetch-duration in field 9,
    which was previously just fetch-duration.
* StatisticsTracker.java
    Use log-time (ISO8601) for first field.
* AbstractTracker.java
    Adjust headers slightly to better align with new longer
    date.

Doc changes forthcoming.
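
As a rough illustration of the timestamp helpers and the new field 9 described in the commit comment above, here is a minimal sketch. It is not the actual ArchiveUtils code: the class and method names are hypothetical, and it assumes that the 14-digit form is yyyyMMddHHmmss, the 17-digit form adds milliseconds, all times are UTC, and the duration in field 9 is in milliseconds appended to a 17-digit fetch-begin.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch, not the real ArchiveUtils from Heritrix.
class TimestampSketch {
    // SimpleDateFormat is not thread-safe, so this sketch builds a
    // fresh UTC formatter per call instead of sharing static instances.
    private static String format(String pattern, long epochMillis) {
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(new Date(epochMillis));
    }

    // 14-digit ARC-style timestamp: yyyyMMddHHmmss.
    static String get14DigitDate(long epochMillis) {
        return format("yyyyMMddHHmmss", epochMillis);
    }

    // 17-digit timestamp: the 14-digit form plus milliseconds.
    static String get17DigitDate(long epochMillis) {
        return format("yyyyMMddHHmmssSSS", epochMillis);
    }

    // Assumed layout for field 9: fetch-begin '+' fetch-duration (ms).
    static String fetchBeginPlusDuration(long fetchBeginMillis,
                                         long durationMillis) {
        return get17DigitDate(fetchBeginMillis) + "+" + durationMillis;
    }

    public static void main(String[] args) {
        long begin = 1098915446817L; // 2004-10-27 22:17:26.817 UTC
        System.out.println(get14DigitDate(begin));
        System.out.println(get17DigitDate(begin));
        System.out.println(fetchBeginPlusDuration(begin, 153L));
    }
}
```

Keeping the fetch-begin instant alongside the duration is what lets a log reader recover when each fetch started, independently of the log-time in the first field.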


Date: 2005-03-02 20:22
Sender: gojomo (Project Admin)

Logged In: YES
user_id=144912

Per triage meeting: change initial timestamps (first field)
to W3C/ISO8601 timestamps, with implied UTC timezone.




Date: 2005-03-01 01:50
Sender: gojomo (Project Admin)

Logged In: YES
user_id=144912

The benefits of the current format include:
(1) it's a standard (rfc2550)
(2) it matches ARCs
(3) it sorts reasonably, even when mixing items of different
precision

I would only want to change the log format if we could
change the ARC format too -- I think consistency between the
two is valuable, especially if we could match the 'instants'
written in the ARCs with the logs.

About the only format I could see displacing the current one
is ISO8601, with additional constraints: e.g. always assume
(and drop) the 'Z' UTC indicator, which improves its sorting
behavior. Such a timestamp would look like:

2004-10-27T22:17:26.817
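
A minimal sketch of producing such a timestamp in Java, assuming UTC and a fixed-width pattern with milliseconds (the class name IsoLogTimestamp and its format method are hypothetical, not existing Heritrix code). Because every field is zero-padded and ordered from most to least significant, a plain string sort also sorts the entries chronologically:

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch of the constrained ISO8601 format discussed above.
class IsoLogTimestamp {
    // Fixed-width ISO8601 pattern with milliseconds and no trailing 'Z'.
    static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss.SSS";

    // Formats an instant in UTC. A new SimpleDateFormat is created per
    // call because the class is not thread-safe.
    static String format(long epochMillis) {
        SimpleDateFormat f = new SimpleDateFormat(PATTERN, Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long base = 1098915446817L; // 2004-10-27T22:17:26.817 UTC
        List<String> stamps = new ArrayList<>();
        stamps.add(format(base + 60_000L)); // one minute later
        stamps.add(format(base));
        stamps.add(format(base + 1L));      // one millisecond later
        Collections.sort(stamps);           // plain string comparison
        for (String s : stamps) {
            System.out.println(s);
        }
    }
}
```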



Date: 2005-02-03 22:45
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Upped priority.


Attached File

No Files Currently Attached

Changes ( 5 )

Field        Old Value  Date              By
status_id    Open       2005-03-03 21:02  gojomo
close_date   -          2005-03-03 21:02  gojomo
assigned_to  nobody     2005-03-02 20:22  gojomo
priority     6          2005-02-10 00:34  stack-sf
priority     5          2005-02-03 22:45  stack-sf