Heritrix: Internet Archive Web Crawler

Tracker: Bugs

5 Can't open logs of old jobs post-restart in UI - ID: 1060589
Last Update: Comment added ( karl-ia )

OK by me. Could deserve a 'known issue' note as well, with the
workaround to look on disk directly. I think your initial fix would
have still worked with old jobs.

- Gordon

stack wrote:

> Gordon Mohr (Internet Archive) wrote:
>
>> Problem with this fix:
>>
>> Michael Stack wrote:
>>
>>> --- 49,64 ----
>>> }
>>> ! if(theJob != null) {
>>> settingsHandler = theJob.getSettingsHandler();
>>> ! String logsPath = (String)settingsHandler.getOrder().
>>> ! getAttribute(CrawlOrder.ATTR_LOGS_PATH);
>>> ! File f = new File(logsPath);
>>> ! if (f.isAbsolute()) {
>>> ! fileName = (new File(f, fileName)).getAbsolutePath();
>>> ! } else {
>>> ! f = settingsHandler.getOrder().getController().
>>> ! getSettingsDir(CrawlOrder.ATTR_LOGS_PATH);
>>> ! fileName = (new File(f, fileName)).getAbsolutePath();
>>> ! }
>>
>> For old jobs, there's no controller attached, so this
>> getController().getSettingsDirectory()
>> will NPE. To trigger, choose the 'logs' link from the list of
>> old completed jobs.
>>
>> This makes fixing this for all situations a bit of a mess unless some
>> of the path-knowledge in controller gets duplicated elsewhere...
>>
> I think it's a rare-enough case that, unless you think otherwise, I'll
> leave it as is and make an issue for this case.
> St.Ack


Submitted: Michael Stack ( stack-sf ) - 2004-11-04 22:54
Priority: 5
Status: Closed
Resolution: Fixed
Assigned To: Nobody/Anonymous
Category: Usability/UI
Group: None
Visibility: Public


Comments ( 2 )

Date: 2007-03-14 00:18
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-283 -- please add further
comments at that location.


Date: 2004-11-05 21:37
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Kris submitted a patch. I tested it. Here is the commit.
Closing.

Here is the commit message:


Fix for [ 1060589 ] Can't open logs of old jobs post-restart
in UI
Patch submitted by Kris. Here is his email.

A couple of things.

... (See below).

* src/java/org/archive/crawler/admin/CrawlJob.java
(getLogPath): Added.
* src/webapps/admin/logs.jsp
Use new getLogPath method.
* src/articles/releasenotes.xml
Removed logs not working on old crawls as known problem.


>> -----Original Message-----
>> From: archive-crawler-cvs-admin@lists.sourceforge.net
>> [mailto:archive-crawler-cvs-admin@lists.sourceforge.net] On
>> Behalf Of Gordon Mohr (Internet Archive)
>> Sent: 4 November 2004 22:22
>> To: Michael Stack
>> Cc: archive-crawler-cvs@lists.sourceforge.net
>> Subject: Re: [Archive-crawler-cvs]
>> ArchiveOpenCrawler/src/webapps/admin logs.jsp,1.25,1.26
>>
>>
>> Problem with this fix:
>>
>> Michael Stack wrote:
>>
>>> > --- 49,64 ----
>>> > }
>>> >
>>> > ! if(theJob != null) {
>>> > settingsHandler = theJob.getSettingsHandler();
>>> > ! String logsPath = (String)settingsHandler.getOrder().
>>> > ! getAttribute(CrawlOrder.ATTR_LOGS_PATH);
>>> > ! File f = new File(logsPath);
>>> > ! if (f.isAbsolute()) {
>>> > ! fileName = (new File(f, fileName)).getAbsolutePath();
>>> > ! } else {
>>> > ! f = settingsHandler.getOrder().getController().
>>> > ! getSettingsDir(CrawlOrder.ATTR_LOGS_PATH);
>>> > ! fileName = (new File(f, fileName)).getAbsolutePath();
>>> > ! }
>>
>> For old jobs, there's no controller attached, so this
>> getController().getSettingsDirectory()
>> will NPE. To trigger, choose the 'logs' link from the list of
>> old completed jobs.
>>
>> This makes fixing this for all situations a bit of a mess unless some
>> of the path-knowledge in controller gets duplicated elsewhere...
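
The failure Gordon describes can be reproduced in isolation. The following is a minimal sketch, not Heritrix code: the Controller and Job classes, the logsDirFromSettings field, and both method names are hypothetical stand-ins, and the only assumption taken from the thread is that an old, completed job has no controller attached.

```java
import java.io.File;

public class OldJobLogsSketch {
    /** Hypothetical stand-in for a controller that only exists while a crawl is loaded. */
    static class Controller {
        private final File logsDir;
        Controller(File logsDir) { this.logsDir = logsDir; }
        File getLogsDir() { return logsDir; }
    }

    /** Hypothetical stand-in for a job; controller is null for an old, completed job. */
    static class Job {
        final Controller controller;
        final File logsDirFromSettings;
        Job(Controller controller, File logsDirFromSettings) {
            this.controller = controller;
            this.logsDirFromSettings = logsDirFromSettings;
        }
    }

    /** Unsafe: dereferences the controller unconditionally, like the quoted else-branch. */
    static File logsDirUnsafe(Job job) {
        return job.controller.getLogsDir();
    }

    /** Safe: falls back to a directory derived from the job's own settings. */
    static File logsDirSafe(Job job) {
        return job.controller != null
                ? job.controller.getLogsDir()
                : job.logsDirFromSettings;
    }

    public static void main(String[] args) {
        Job oldJob = new Job(null, new File("jobs/old-job/logs"));
        System.out.println(logsDirSafe(oldJob));
        try {
            logsDirUnsafe(oldJob);
        } catch (NullPointerException e) {
            System.out.println("NPE for a job with no controller");
        }
    }
}
```

The safe variant only works if the job itself can work out where its logs live, which is the "path-knowledge" duplication mentioned above.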


A couple of things.

1. Michael's change causes some problems. By rewriting fileName (the
variable) to also include the path, other things get broken. Namely the
'having the current log selected (black text, no underscore)' and the
display of the current log's name (if we want to display the path, it
should be done somewhere at the bottom of the page, not in the 'log
display header').

2. To fix all of this, how about adding a 'getLogPath()' to CrawlJob. The
code in getSettingsDirectory() could be replicated (it's just reading from
the CrawlOrder) minus the 'create dir if it doesn't exist already'.

That would have the additional benefit of removing a bit more 'code' from
the JSP pages (usually a good thing).
So the above section in the logs.jsp file would just be:

So whenever you need the absolute path of a log you use
theJob.getLogPath(fileName);

See patches below. (As far as I can tell it works under all the
circumstances Michael cited)

- Kris
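
The two points in the email above (keep fileName as the bare display name, and resolve the absolute path separately without creating directories) can be sketched on their own. This is a minimal sketch under stated assumptions, not the committed code: the class LogPathSketch, the method resolveLogPath, and the parameters diskDir and logsPath are hypothetical stand-ins for values that Heritrix reads from the CrawlOrder.

```java
import java.io.File;

public class LogPathSketch {
    /**
     * Resolves a log file name to an absolute path without touching the disk.
     * diskDir and logsPath are hypothetical stand-ins for the job's disk path
     * and logs path settings.
     */
    static String resolveLogPath(File diskDir, String logsPath, String logName) {
        File logsDir = new File(logsPath);
        if (!logsDir.isAbsolute()) {
            // A relative logs path is interpreted relative to the job's disk directory.
            logsDir = new File(diskDir, logsPath);
        }
        // No mkdirs() here: the directory is only named, never created.
        return new File(logsDir, logName).getAbsolutePath();
    }

    public static void main(String[] args) {
        File disk = new File("jobs", "example-job");
        String fileName = "crawl.log"; // stays a bare name for display purposes
        System.out.println(fileName + " -> " + resolveLogPath(disk, "logs", fileName));
    }
}
```

Because the caller's fileName is never overwritten, anything that compares or displays the bare log name keeps working, and the absolute path is computed only at the point where a file is actually read.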

Index: logs.jsp
===================================================================
RCS file: /cvsroot/archive-crawler/ArchiveOpenCrawler/src/webapps/admin/logs.jsp,v
retrieving revision 1.26
diff -u -r1.26 logs.jsp
--- logs.jsp 3 Nov 2004 23:29:23 -0000 1.26
+++ logs.jsp 5 Nov 2004 09:39:41 -0000
@@ -49,17 +49,6 @@
}

if(theJob != null) {
- settingsHandler = theJob.getSettingsHandler();
- String logsPath = (String)settingsHandler.getOrder().
- getAttribute(CrawlOrder.ATTR_LOGS_PATH);
- File f = new File(logsPath);
- if (f.isAbsolute()) {
- fileName = (new File(f, fileName)).getAbsolutePath();
- } else {
- f = settingsHandler.getOrder().getController().
- getSettingsDir(CrawlOrder.ATTR_LOGS_PATH);
- fileName = (new File(f, fileName)).getAbsolutePath();
- }

// Got a valid crawl order, find it's logs
if(mode != null && mode.equalsIgnoreCase("number"))
@@ -72,7 +61,7 @@
}
catch(Exception e){/*Ignore*/}

- log = LogReader.getFromSeries(fileName, linenumber, linesToShow);
+ log = LogReader.getFromSeries(theJob.getLogPath(fileName), linenumber, linesToShow);
}
else if(mode != null && mode.equalsIgnoreCase("time"))
{
@@ -87,9 +76,9 @@
else
{
int timestampLinenumber = LogReader.
- findFirstLineContainingFromSeries(fileName,
+ findFirstLineContainingFromSeries(theJob.getLogPath(fileName),
timestamp + ".*");
- log = LogReader.getFromSeries(fileName,
+ log = LogReader.getFromSeries(theJob.getLogPath(fileName),
timestampLinenumber, linesToShow);
}
}
@@ -120,13 +109,13 @@

if(indent)
{
- log = LogReader.getByRegExprFromSeries(fileName,
- regexpr, " ", ln,linesToSkip-1,linesToShow);
+ log = LogReader.getByRegExprFromSeries(theJob.getLogPath(fileName),
+ regexpr, " ", ln,linesToSkip-1, linesToShow);
}
else
{
- log = LogReader.getByRegExprFromSeries(fileName, regexpr,
- 0, ln,linesToSkip-1,linesToShow);
+ log = LogReader.getByRegExprFromSeries(theJob.getLogPath(fileName),
+ regexpr, 0, ln,linesToSkip-1, linesToShow);
}
}
}
@@ -141,7 +130,7 @@
}
catch(Exception e){/* Ignore - default value will do */}

- log = LogReader.tail(fileName, linesToShow);
+ log = LogReader.tail(theJob.getLogPath(fileName), linesToShow);
}
}
else

Index: CrawlJob.java
===================================================================
RCS file: /cvsroot/archive-crawler/ArchiveOpenCrawler/src/java/org/archive/crawler/admin/CrawlJob.java,v
retrieving revision 1.21
diff -u -r1.21 CrawlJob.java
--- CrawlJob.java 29 Oct 2004 21:25:44 -0000 1.21
+++ CrawlJob.java 5 Nov 2004 09:38:39 -0000
@@ -30,10 +30,14 @@
import java.util.Iterator;
import java.util.logging.Level;

+import javax.management.AttributeNotFoundException;
import javax.management.InvalidAttributeValueException;
+import javax.management.MBeanException;
+import javax.management.ReflectionException;

import org.archive.crawler.Heritrix;
import org.archive.crawler.checkpoint.Checkpoint;
+import org.archive.crawler.datamodel.CrawlOrder;
import org.archive.crawler.framework.StatisticsTracking;
import org.archive.crawler.settings.XMLSettingsHandler;

@@ -548,6 +552,34 @@
return jobDir;
}
}
+
+ /**
+ * Returns the absolute path of the specified log.
+ * <p>
+ * Note: If crawl has not begun, this file may not exist.
+ *
+ * @return the absolute path for the specified log.
+ *
+ * @throws AttributeNotFoundException
+ * @throws ReflectionException
+ * @throws MBeanException
+ */
+ public String getLogPath(String log) throws AttributeNotFoundException,
+ MBeanException, ReflectionException {
+ String logsPath = (String)settingsHandler.getOrder().
+ getAttribute(CrawlOrder.ATTR_LOGS_PATH);
+
+ CrawlOrder order = settingsHandler.getOrder();
+ String diskPath =
+ (String) order.getAttribute(null, CrawlOrder.ATTR_DISK_PATH);
+ File disk = settingsHandler.getPathRelativeToWorkingDirectory(diskPath);
+
+ File f = new File(logsPath + File.separator + log);
+ if (!f.isAbsolute()) {
+ f = new File(disk.getPath(), logsPath + File.separator + log);
+ }
+ return f.getAbsolutePath();
+ }

/**
* Get the error message associated with this job. Will return null if there


Attached File

No Files Currently Attached

Changes ( 3 )

Field          Old Value  Date              By
status_id      Open       2004-11-05 21:37  stack-sf
resolution_id  None       2004-11-05 21:37  stack-sf
close_date     -          2004-11-05 21:37  stack-sf