From: John H. L. <jl...@ar...> - 2008-02-05 20:29:34
Hi Miguel. To use distributed search, you need to plan ahead a bit and generate multiple indices. I don't know of a way to partition an existing large index into smaller chunks. For example, if you're indexing 100,000 ARCs and want to deploy on 10 machines, you should split your list of ARCs into 10 chunks of 10,000, invoke ImportArcs for each chunk, and invoke NutchwaxIndexer for each chunk. This will produce 10 segment/index pairs, each of which can be deployed on one of your 10 machines. For large jobs, I usually split the ARCs into groups of 1000; this produces segment/index pairs that are small enough to be manageable and flexible when it comes to deployment layout. Hope this helps. -J

On Feb 5, 2008, at 5:12 AM, Miguel Costa wrote:

> Hi to all,
>
> After reading the nutchwax + nutch documentation I can index ARC files
> and search them using nutchwax + the wayback machine. However, I would
> like to perform a distributed search, but I can't find any documentation
> on how to partition the index into n parts/segments for n machines. On
> the other hand, there is information explaining how to distribute search
> using the search-servers.txt file, but I need to partition the index
> first. Can anyone explain or give me a clue on how to partition an index
> for n machines?
>
> Regards,
>
> Miguel Costa

_______________________________________________
Archive-access-discuss mailing list
Arc...@li...
https://lists.sourceforge.net/lists/listinfo/archive-access-discuss
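The chunk-and-index recipe above can be sketched in shell. The `split` usage is standard; the ImportArcs/NutchwaxIndexer steps are echoed as placeholders, since the exact NutchWax command-line syntax is an assumption that varies by release. Check your NutchWax documentation before running this for real.

```shell
# Stand-in list of ARC paths (a real list might have 100,000 entries)
for i in $(seq 1 100); do echo "arc-$i.arc.gz"; done > arclist.txt

# Partition into chunks of 25 ARCs each -> arclist-00, arclist-01, ...
split -d -l 25 arclist.txt arclist-

# One ImportArcs + NutchwaxIndexer run per chunk yields one segment/index
# pair per chunk. Echoed as a dry run; the invocations below are
# illustrative names from this thread, not verbatim commands.
for chunk in arclist-??; do
  echo "ImportArcs      $chunk -> outputs/$chunk"
  echo "NutchwaxIndexer outputs/$chunk"
done
```

Each resulting segment/index pair can then be assigned to one search machine and listed in search-servers.txt.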
From: Chris V. <cv...@gm...> - 2008-02-05 17:08:55
Just a little update. I was able to get this to work in Wayback 0.8.0 (which we are still using in a production app). The ResultURIConverter class was extended and the makeReplayUri(SearchResult) method was overridden to use the RESULT_URL_KEY (which contains port information) to form the replay URI, instead of the RESULT_URL. The JSReplayRenderer class was also extended so that the <base href...> tag and the JavaScript sResourceUrl variable also use the RESULT_URL_KEY value. After minimal testing, this seems to work without breaking any existing functionality. Does anyone know of a scenario where this will not work? Eventually, once we move to Wayback 1.0.1, similar changes will need to be made there. -Chris

On Feb 4, 2008 3:19 PM, Chris Vicary <cv...@gm...> wrote:

> Hi all,
>
> I am having a problem retrieving harvested resources whose URLs include
> port numbers using Wayback 1.0.1. We have a seed that includes a port
> number that was harvested using Heritrix. The resulting ARC files were
> indexed using Wayback, and the URLs stored in the index include the port
> number. Using the Wayback web address search interface, I am able to find
> the URLs by including the port number in the search string (if the port
> number is not included, no results are found, which is expected). The
> link for the search result does not include the port number, however,
> and clicking it does not retrieve the harvested resource. If the port
> number is inserted into the search result link, retrieval works fine.
> Even so, rewritten links on the retrieved page do not include a port
> number where applicable. So my question is, how do I ensure that port
> numbers are preserved in Wayback search results and in rewritten links?
>
> Thanks,
>
> Chris
From: Miguel C. <mig...@fc...> - 2008-02-05 13:13:05
Hi to all,

After reading the nutchwax + nutch documentation I can index ARC files and search them using nutchwax + the wayback machine. However, I would like to perform a distributed search, but I can't find any documentation on how to partition the index into n parts/segments for n machines. On the other hand, there is information explaining how to distribute search using the search-servers.txt file, but I need to partition the index first. Can anyone explain or give me a clue on how to partition an index for n machines?

Regards,

Miguel Costa
From: Chris V. <cv...@gm...> - 2008-02-04 20:19:10
Hi all,

I am having a problem retrieving harvested resources whose URLs include port numbers using Wayback 1.0.1. We have a seed that includes a port number that was harvested using Heritrix. The resulting ARC files were indexed using Wayback, and the URLs stored in the index include the port number. Using the Wayback web address search interface, I am able to find the URLs by including the port number in the search string (if the port number is not included, no results are found, which is expected). The link for the search result does not include the port number, however, and clicking it does not retrieve the harvested resource. If the port number is inserted into the search result link, retrieval works fine. Even so, rewritten links on the retrieved page do not include a port number where applicable. So my question is, how do I ensure that port numbers are preserved in Wayback search results and in rewritten links?

Thanks,

Chris
From: John H. L. <jl...@ar...> - 2008-02-04 18:31:55
Hi Jack. It sounds like you need to increase the number of open files that a single process can have on your system. The more indexes you're searching, the more file handles the searcher process needs. On Unix-like systems, "ulimit -n" will tell you the current setting, and "ulimit -n N" will set a new value for your current shell. If you're using Linux, you can usually set these values permanently in /etc/security/limits.conf. For our systems, we use an arbitrary but high ulimit -n 32768. Hope this helps. -J

On Feb 4, 2008, at 7:06 AM, Pope, Jackson wrote:

> Hiya all,
>
> I'm trying to run NutchWax for searching. It's all set up ok, doing
> incremental indexing, and works fine. I'm not merging the indexes,
> however (I've a separate directory for each under /indexes, with an
> index.done file in each). This works fine for small numbers of indexes,
> but with large numbers of indexes I get an error when browsing to the
> NutchWax search page (before entering any search criteria): Too many
> files open.
>
> Do I have to merge all the indexes together to get around this, or is
> there another solution?
>
> Cheers,
>
> Jack
>
> Jackson Pope
> Technical Lead
> Web Archiving Team
> The British Library
> +44 (0)1937 54 6942
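The ulimit advice above, as a quick shell check. The limits.conf lines are shown as comments using the standard pam_limits format; the "tomcat" user name and the 32768 value (the figure mentioned in the reply) are illustrative, so adjust them for your deployment.

```shell
# Show the current per-process open-file limit for this shell
ulimit -n

# Raise it for the current shell session (exceeding the hard limit
# typically requires root):
#   ulimit -n 32768
#
# To make it permanent on Linux, add lines like these to
# /etc/security/limits.conf ("tomcat" is a hypothetical service user):
#   tomcat  soft  nofile  32768
#   tomcat  hard  nofile  32768
```

Remember that the searcher process (e.g. the Tomcat JVM) must be restarted from a shell or service that actually inherits the raised limit.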
From: Pope, J. <Jac...@bl...> - 2008-02-04 15:06:57
Hiya all,

I'm trying to run NutchWax for searching. It's all set up ok, doing incremental indexing, and works fine. I'm not merging the indexes, however (I've a separate directory for each under /indexes, with an index.done file in each). This works fine for small numbers of indexes, but with large numbers of indexes I get an error when browsing to the NutchWax search page (before entering any search criteria): Too many files open.

Do I have to merge all the indexes together to get around this, or is there another solution?

Cheers,

Jack

Jackson Pope
Technical Lead
Web Archiving Team
The British Library
+44 (0)1937 54 6942

**************************************************************************

Experience the British Library online at www.bl.uk

The British Library's new interactive Annual Report and Accounts 2006/07: www.bl.uk/mylibrary

Help the British Library conserve the world's knowledge. Adopt a Book. www.bl.uk/adoptabook

The Library's St Pancras site is WiFi-enabled

*************************************************************************

The information contained in this e-mail is confidential and may be legally privileged. It is intended for the addressee(s) only. If you are not the intended recipient, please delete this e-mail and notify the pos...@bl...: The contents of this e-mail must not be disclosed or copied without the sender's consent.

The statements and opinions expressed in this message are those of the author and do not necessarily reflect those of the British Library. The British Library does not take any responsibility for the views of the author.

*************************************************************************
From: Miguel C. <mig...@fc...> - 2008-02-04 10:36:14
Thank you Brad. This fix solved the problem.

Best regards,

Miguel Costa

-----Original Message-----
From: Brad Tofel [mailto:br...@ar...]
Sent: Friday, 1 February 2008 20:16
To: Miguel Costa
Cc: arc...@li...
Subject: Re: [Archive-access-discuss] org.archive.io.NoGzipMagicException

Hey Miguel,

I think I just found the problem: I hadn't checked in a small but crucial change to the wayback-code pom.xml which increases the dependency on archive-commons from 2.0.0 to 2.0.1. I'm betting this makes all the difference. Please try updating to the latest HEAD and let me know if that works for you.

Brad

Miguel Costa wrote:
> Hello,
>
> I installed wayback 1.1.0-SNAPSHOT from svn. When I query the wayback
> with a URL I get a:
>
> org.archive.io.NoGzipMagicException
>   org.archive.io.GzipHeader.readHeader(GzipHeader.java:122)
>   org.archive.io.GzipHeader.<init>(GzipHeader.java:107)
>   org.archive.io.GzippedInputStream.readHeader(GzippedInputStream.java:335)
>   org.archive.io.GzippedInputStream.gzipMemberSeek(GzippedInputStream.java:370)
>   org.archive.io.arc.ARCReaderFactory$CompressedARCReader.get(ARCReaderFactory.java:383)
>   org.archive.io.arc.ARCReaderFactory$CompressedARCReader.get(ARCReaderFactory.java:326)
>   org.archive.wayback.resourcestore.LocalARCResourceStore.retrieveResource(LocalARCResourceStore.java:108)
>   org.archive.wayback.webapp.AccessPoint.handleReplay(AccessPoint.java:312)
>   org.archive.wayback.webapp.AccessPoint.handleRequest(AccessPoint.java:280)
>   org.archive.wayback.webapp.RequestFilter.handle(RequestFilter.java:106)
>   org.archive.wayback.webapp.RequestFilter.doFilter(RequestFilter.java:90)
>
> The wayback finds the file and then checks if it is OK. This check
> throws a NoGzipMagicException because it doesn't find a "magic" number.
> The code used is in commons-2.0.0-SNAPSHOT-sources.jar (from Heritrix)
> for both projects, nutchwax and wayback.
>
> I also installed nutchwax 0.11.0-SNAPSHOT from svn (both projects from
> trunk) and indexed the same ARC files. The query results are presented
> OK. Other files present the same symptoms. Does anyone have a clue about
> this problem? Does anyone use this version of wayback without problems?
>
> Thanks
> --
> Miguel Costa
From: Kaisa K. <kau...@cs...> - 2008-02-04 08:58:02
Hi,

are there any plans to add full-text indexing and search to wayback? I know nutchwax makes full-text indexes, but:

- the latest nutchwax release (0.10.0) dates from January 2007
- it's awkward to use when you want to keep duplicate versions of harvested pages
- it seems to be difficult to integrate with the wayback user interface

Best,

kk
From: Brad T. <br...@ar...> - 2008-02-01 20:13:43
Hey Miguel,

I think I just found the problem: I hadn't checked in a small but crucial change to the wayback-code pom.xml which increases the dependency on archive-commons from 2.0.0 to 2.0.1. I'm betting this makes all the difference. Please try updating to the latest HEAD and let me know if that works for you.

Brad

Miguel Costa wrote:
> Hello,
>
> I installed wayback 1.1.0-SNAPSHOT from svn. When I query the wayback
> with a URL I get a:
>
> org.archive.io.NoGzipMagicException
>   org.archive.io.GzipHeader.readHeader(GzipHeader.java:122)
>   org.archive.io.GzipHeader.<init>(GzipHeader.java:107)
>   org.archive.io.GzippedInputStream.readHeader(GzippedInputStream.java:335)
>   org.archive.io.GzippedInputStream.gzipMemberSeek(GzippedInputStream.java:370)
>   org.archive.io.arc.ARCReaderFactory$CompressedARCReader.get(ARCReaderFactory.java:383)
>   org.archive.io.arc.ARCReaderFactory$CompressedARCReader.get(ARCReaderFactory.java:326)
>   org.archive.wayback.resourcestore.LocalARCResourceStore.retrieveResource(LocalARCResourceStore.java:108)
>   org.archive.wayback.webapp.AccessPoint.handleReplay(AccessPoint.java:312)
>   org.archive.wayback.webapp.AccessPoint.handleRequest(AccessPoint.java:280)
>   org.archive.wayback.webapp.RequestFilter.handle(RequestFilter.java:106)
>   org.archive.wayback.webapp.RequestFilter.doFilter(RequestFilter.java:90)
>
> The wayback finds the file and then checks if it is OK. This check
> throws a NoGzipMagicException because it doesn't find a "magic" number.
> The code used is in commons-2.0.0-SNAPSHOT-sources.jar (from Heritrix)
> for both projects, nutchwax and wayback.
>
> I also installed nutchwax 0.11.0-SNAPSHOT from svn (both projects from
> trunk) and indexed the same ARC files. The query results are presented
> OK. Other files present the same symptoms. Does anyone have a clue about
> this problem? Does anyone use this version of wayback without problems?
>
> Thanks
> --
> Miguel Costa
From: Brad T. <br...@ar...> - 2008-01-30 04:12:31
Hi Miguel,

The SVN code was not quite coherent: the wayback.xml configuration file, specifically, was not referencing the new implementation classes for the ResourceStore. I'm hoping that this is the issue, but see below for some more notes if you're still having problems. The major changes you'll need to make in the wayback.xml are in the ResourceStore and Replay configurations.

I'm not convinced this will solve the problem though, since you were able to index the documents OK. With what version of the wayback code did you first index them?

One last question is how the ARCs were compressed. Were they written compressed by Heritrix, or compressed later?

If the new wayback.xml (using different implementations) does not fix the problem, one thing that may help me figure out what's going wrong would be a fragment of one of your ARC files. Can you post part of one of your ARC files somewhere, for example, just the first few hundred KB? (head -c 200000 foo.arc.gz > sample.arc.gz -- understanding that the last record in the ARC fragment will probably be truncated.)

Brad

Miguel Costa wrote:
> Hello,
>
> I installed wayback 1.1.0-SNAPSHOT from svn. When I query the wayback
> with a URL I get a:
>
> org.archive.io.NoGzipMagicException
>   org.archive.io.GzipHeader.readHeader(GzipHeader.java:122)
>   org.archive.io.GzipHeader.<init>(GzipHeader.java:107)
>   org.archive.io.GzippedInputStream.readHeader(GzippedInputStream.java:335)
>   org.archive.io.GzippedInputStream.gzipMemberSeek(GzippedInputStream.java:370)
>   org.archive.io.arc.ARCReaderFactory$CompressedARCReader.get(ARCReaderFactory.java:383)
>   org.archive.io.arc.ARCReaderFactory$CompressedARCReader.get(ARCReaderFactory.java:326)
>   org.archive.wayback.resourcestore.LocalARCResourceStore.retrieveResource(LocalARCResourceStore.java:108)
>   org.archive.wayback.webapp.AccessPoint.handleReplay(AccessPoint.java:312)
>   org.archive.wayback.webapp.AccessPoint.handleRequest(AccessPoint.java:280)
>   org.archive.wayback.webapp.RequestFilter.handle(RequestFilter.java:106)
>   org.archive.wayback.webapp.RequestFilter.doFilter(RequestFilter.java:90)
>
> The wayback finds the file and then checks if it is OK. This check
> throws a NoGzipMagicException because it doesn't find a "magic" number.
> The code used is in commons-2.0.0-SNAPSHOT-sources.jar (from Heritrix)
> for both projects, nutchwax and wayback.
>
> I also installed nutchwax 0.11.0-SNAPSHOT from svn (both projects from
> trunk) and indexed the same ARC files. The query results are presented
> OK. Other files present the same symptoms. Does anyone have a clue about
> this problem? Does anyone use this version of wayback without problems?
>
> Thanks
> --
> Miguel Costa
From: Brad T. <br...@ar...> - 2008-01-30 03:53:50
Hi Natalia,

Sorry about the delayed response. I'm not sure I understand your question/problem. Are you trying to run multiple tomcat instances (processes) using the same BDB index? In this case, I don't think it will work: the BDBJE code wants only a single process to write to an environment, and currently the wayback always tries to open the BDB read-write. If this is something that really needs to happen, we might be able to change the way the software opens the databases, but I'm not sure if BDBJE supports this.

You may be able to solve this problem using multiple AccessPoints, using wayback 1.0 or later. Can you elaborate on how you're trying to set up the wayback in this deployment?

Brad

Natalia Torres wrote:
> Hello
>
> I installed wayback 0.8 following the instructions on the web page,
> customizing it to use the wayback machine in timeline access mode.
>
> When I upgraded to use 2 tomcat v5.5 instances in a cluster, wayback
> doesn't work. I get this message from the log file:
>
> INFO: new org.archive.wayback.resourceindex.LocalResourceIndex created.
> org.archive.wayback.exception.ConfigurationException: A je.lck file
> exists in /mywaybackdata/index. The environment cannot be locked for
> single writer access.
>
> When I search in the UI I get this message:
>
> Configuration Error
>
> A je.lck file exists in /paditest/wayback/index. The environment cannot
> be locked for single writer access.
>
> Does each tomcat need its own indexes? Can't they share them? Is there
> any setting for this in the configuration files?
>
> Thanks
>
> N.
From: Miguel C. <mig...@fc...> - 2008-01-29 16:06:13
Hello,

I installed wayback 1.1.0-SNAPSHOT from svn. When I query the wayback with a URL I get a:

org.archive.io.NoGzipMagicException
  org.archive.io.GzipHeader.readHeader(GzipHeader.java:122)
  org.archive.io.GzipHeader.<init>(GzipHeader.java:107)
  org.archive.io.GzippedInputStream.readHeader(GzippedInputStream.java:335)
  org.archive.io.GzippedInputStream.gzipMemberSeek(GzippedInputStream.java:370)
  org.archive.io.arc.ARCReaderFactory$CompressedARCReader.get(ARCReaderFactory.java:383)
  org.archive.io.arc.ARCReaderFactory$CompressedARCReader.get(ARCReaderFactory.java:326)
  org.archive.wayback.resourcestore.LocalARCResourceStore.retrieveResource(LocalARCResourceStore.java:108)
  org.archive.wayback.webapp.AccessPoint.handleReplay(AccessPoint.java:312)
  org.archive.wayback.webapp.AccessPoint.handleRequest(AccessPoint.java:280)
  org.archive.wayback.webapp.RequestFilter.handle(RequestFilter.java:106)
  org.archive.wayback.webapp.RequestFilter.doFilter(RequestFilter.java:90)

The wayback finds the file and then checks if it is OK. This check throws a NoGzipMagicException because it doesn't find a "magic" number. The code used is in commons-2.0.0-SNAPSHOT-sources.jar (from Heritrix) for both projects, nutchwax and wayback.

I also installed nutchwax 0.11.0-SNAPSHOT from svn (both projects from trunk) and indexed the same ARC files. The query results are presented OK. Other files present the same symptoms. Does anyone have a clue about this problem? Does anyone use this version of wayback without problems?

Thanks
--
Miguel Costa
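For context: a NoGzipMagicException means the ARC reader did not see the gzip magic bytes (0x1f 0x8b) where a compressed record should start. A quick sanity check on a suspect file, sketched here against a freshly made gzip file rather than a real ARC (substitute one of your .arc.gz files):

```shell
# Make a tiny gzip file to demonstrate with
echo "hello" | gzip > sample.gz

# Every gzip member begins with the two magic bytes 1f 8b
od -An -tx1 -N2 sample.gz

# gzip -t walks the compressed stream and reports structural damage
gzip -t sample.gz && echo "gzip structure OK"
```

Note that a multi-member .arc.gz that is damaged mid-file will still pass the first-bytes check but should fail `gzip -t` at the broken member.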
From: Natalia T. <nt...@ce...> - 2008-01-18 09:11:12
Hello

I installed wayback 0.8 following the instructions on the web page, customizing it to use the wayback machine in timeline access mode.

When I upgraded to use 2 tomcat v5.5 instances in a cluster, wayback doesn't work. I get this message from the log file:

INFO: new org.archive.wayback.resourceindex.LocalResourceIndex created.
org.archive.wayback.exception.ConfigurationException: A je.lck file exists in /mywaybackdata/index. The environment cannot be locked for single writer access.

When I search in the UI I get this message:

Configuration Error

A je.lck file exists in /paditest/wayback/index. The environment cannot be locked for single writer access.

Does each tomcat need its own indexes? Can't they share them? Is there any setting for this in the configuration files?

Thanks

N.
From: Erik H. <eri...@uc...> - 2008-01-11 16:09:46
At Fri, 11 Jan 2008 16:05:58 -0000, "Pope, Jackson" <Jac...@bl...> wrote:
>
> Hiya Erik,
>
> Thanks for all your help. I'd already got it working - I needed to make
> the arc files accessible via Apache - that fixed all the problems.

Best of luck with your project!

;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3
From: Pope, J. <Jac...@bl...> - 2008-01-11 16:06:02
Hiya Erik,

Thanks for all your help. I'd already got it working - I needed to make the arc files accessible via Apache - that fixed all the problems.

Cheers,

Jack

Jackson Pope
Technical Lead
Web Archiving Team
The British Library
+44 (0)1937 54 6942

-----Original Message-----
From: Erik Hetzner [mailto:eri...@uc...]
Sent: 11 January 2008 16:02
To: arc...@li...
Cc: Pope, Jackson
Subject: Re: [Archive-access-discuss] FW: Wayback 0.8.0 and ArcProxy

At Fri, 11 Jan 2008 11:10:41 -0000, "Pope, Jackson" <Jac...@bl...> wrote:
>
> Hiya Erik,
>
> Thanks for your help. I've now got an ArcProxy running (by copying
> wayback.war to arc-proxy.war, deploying the proxy and changing its
> config so only the ArcProxy/LocationDB section is not commented out),
> but I still can't see the files.
>
> I've run curl as you suggested and it appears to work (urls munged below):
>
> curl http://www.example.com:8080/arc-proxy/locationDB -d operation=add
>   -d name=/wap/filestore/2523139/arcs/IAH-20070920101741-00000-wap300.bl.uk.arc.gz
>   -d url=http://www.example.com:8080/arc-proxy/arcs/IAH-20070920101741-00000-wap300.bl.uk.arc.gz
>
> OK added url http://www.example.com:8080/arc-proxy/arcs/IAH-20070920101741-00000-wap300.bl.uk.arc.gz
> for /wap/filestore/2523139/arcs/IAH-20070920101741-00000-wap300.bl.uk.arc.gz
>
> Yet when I try to browse the wayback for a URL in that arc I get an
> error saying the resource is unavailable, with the following error in
> catalina.out:
>
> INFO: initialized org.archive.wayback.resourcestore.http.FileLocationDB
> com.sleepycat.je.DatabaseException: Unable to
> locate(IAH-20070920101741-00000-wap300.bl.uk.arc.gz)
> [...]
>
> It's not a problem with the arc file, as this was fine when I was using
> a local ARC store.
>
> Any ideas?

Hi Jack. I think that the problem might be that you are using the full path as the name parameter. Can you try using just the ARC name as the name parameter instead? For example:

curl http://www.example.com:8080/arc-proxy/locationDB -d operation=add -d name=IAH-20070920101741-00000-wap300.bl.uk.arc.gz -d url=http://www.example.com:8080/arc-proxy/arcs/IAH-20070920101741-00000-wap300.bl.uk.arc.gz

I believe that name is what arc-proxy uses as the lookup key, so it makes sense that the proxy would be unable to locate 'IAH-20070920101741-00000-wap300.bl.uk.arc.gz': it knows that ARC as /wap/filestore/...

best, Erik Hetzner

;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3
From: Erik H. <eri...@uc...> - 2008-01-11 15:59:43
At Fri, 11 Jan 2008 11:10:41 -0000, "Pope, Jackson" <Jac...@bl...> wrote:
>
> Hiya Erik,
>
> Thanks for your help. I've now got an ArcProxy running (by copying
> wayback.war to arc-proxy.war, deploying the proxy and changing its
> config so only the ArcProxy/LocationDB section is not commented out),
> but I still can't see the files.
>
> I've run curl as you suggested and it appears to work (urls munged below):
>
> curl http://www.example.com:8080/arc-proxy/locationDB -d operation=add
>   -d name=/wap/filestore/2523139/arcs/IAH-20070920101741-00000-wap300.bl.uk.arc.gz
>   -d url=http://www.example.com:8080/arc-proxy/arcs/IAH-20070920101741-00000-wap300.bl.uk.arc.gz
>
> OK added url http://www.example.com:8080/arc-proxy/arcs/IAH-20070920101741-00000-wap300.bl.uk.arc.gz
> for /wap/filestore/2523139/arcs/IAH-20070920101741-00000-wap300.bl.uk.arc.gz
>
> Yet when I try to browse the wayback for a URL in that arc I get an
> error saying the resource is unavailable, with the following error in
> catalina.out:
>
> INFO: initialized org.archive.wayback.resourcestore.http.FileLocationDB
> com.sleepycat.je.DatabaseException: Unable to
> locate(IAH-20070920101741-00000-wap300.bl.uk.arc.gz)
> [...]
>
> It's not a problem with the arc file, as this was fine when I was using
> a local ARC store.
>
> Any ideas?

Hi Jack. I think that the problem might be that you are using the full path as the name parameter. Can you try using just the ARC name as the name parameter instead? For example:

curl http://www.example.com:8080/arc-proxy/locationDB -d operation=add -d name=IAH-20070920101741-00000-wap300.bl.uk.arc.gz -d url=http://www.example.com:8080/arc-proxy/arcs/IAH-20070920101741-00000-wap300.bl.uk.arc.gz

I believe that name is what arc-proxy uses as the lookup key, so it makes sense that the proxy would be unable to locate 'IAH-20070920101741-00000-wap300.bl.uk.arc.gz': it knows that ARC as /wap/filestore/...

best, Erik Hetzner

;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3
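The correction above, restated as a shell sketch: register the ARC with the arc-proxy locationDB under its basename, not its full filesystem path. The hostname and paths are the illustrative ones from this thread, and the curl command is echoed rather than executed.

```shell
# Full path where the ARC lives on disk (illustrative, from this thread)
ARC=/wap/filestore/2523139/arcs/IAH-20070920101741-00000-wap300.bl.uk.arc.gz

# The locationDB lookup key should be the bare ARC file name
NAME=$(basename "$ARC")

# Dry run: print the registration request instead of sending it
echo curl http://www.example.com:8080/arc-proxy/locationDB \
  -d operation=add \
  -d "name=$NAME" \
  -d "url=http://www.example.com:8080/arc-proxy/arcs/$NAME"
```

Scripting the registration this way over a list of ARC paths keeps the key and the served URL consistent, which is exactly where the "Unable to locate" lookup failure came from.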
From: Daniel G. <dan...@fc...> - 2008-01-11 12:46:16
|
Alex Osborne wrote: > Hi Daniel, > > Daniel Gomes <dan...@fc...> writes: > > Hi Alex. Thank you very much for your comments. Check my answers bellow. >> The main idea is to enable Internet users to provide storage space >> from their computers to replicate a relatively small part of the >> archived data. >> > > Have you considered any incentives for users to contribute to the > system? While distributed computation projects (SETI@home, Folding@home > etc) offer some sort of bragging rights about how much data you've > processed, they make use of an idle resource which can never "run out" > (forgetting the electricity bill). With backup storage, once you've > filled up a disk, that's it, you can't contribute any more. Also > "curing cancer" and "advancing science" sound a lot more charitable than > "storing backup copies of old websites". ;-) > > That's a very good point. How to market the project? For now we are more concerned on having the system working properly but it is a question that we will definitely have to address in the future. We have some ideas. We hope that our web archive site will become popular, at least in Portugal, and we intend to have a list of the contributers that provide disk space for the project, presenting highlight links to the sites of the top contributers on the home page of the site. Companies may have commercial interest in having a link to their sites coming from a popular site, national institutions may have interest in showing that they are contributing to preserve national historical contents (not old sites :-)) at the worst scenario we hope that individuals can brag from being contributing for the project. We hope that competition for the top links will motivate users to provide more disk space. We also hope that within the web archiving community, institutions will provide disk space to replicate other web archives. 
For instance, on our project machines in Portugal we could install an rARC client for the Pandora web archive and provide space to replicate Australian web contents. Other European web archiving initiatives could do the same, and this way at least some Australian web contents would be replicated at different geographical locations and preserved even in case of a catastrophe that damaged the Pandora servers (we hope this will never happen). In return you can do the same for us (or not), and keep copies of our Portuguese contents. We believe that countries that share the same language will feel more encouraged to replicate each other's archives.

We could also randomly present on the site information about a contributor to the project, something like "Contributor of the week", so that even small contributors could brag. The contributors would be notified by email that this week they were elected to be presented on the site. People can choose not to be elected. Anyway, I agree that it is harder to convince people to give disk space than CPU time. Any more ideas would be most welcome.

> An alternative model which is a bit fairer to users is a peer-to-peer
> distributed backup system, where users trade their local storage space
> in return for having their own files backed up by the community. Thus a
> web archiving institution would be just another user.
>
> However, while there's plenty of discussion and academic papers on
> peer-to-peer backup I wasn't able to find any projects that have really
> taken off outside the traditional realms of file-sharing (Bittorrent,
> Gnutella etc) and anonymity (Freenet). There seems to be just a couple
> of research projects and a hobbyist one in early stages of development,
> which is a pity.
>
> http://flud.org/
> http://myriadstore.sics.se/
> http://oceanstore.cs.berkeley.edu/

I will take a deeper look at these projects. I only knew Oceanstore. 
At first sight, our project has similarities with Lockss (http://www.lockss.org/lockss/How_It_Works). However, Lockss is targeted at having libraries as storage nodes, not arbitrary Internet users, and at preserving web publications, not general pages.

General P2P systems are built on several assumptions/requirements that are not applicable in the web archive context. These are general remarks; individual P2P systems differ among themselves.

1. Contributors want to be anonymous (most of the contents shared are illegal).
2. Contributors want to have access to the contents.
3. The systems are designed to quickly share information, not to preserve it.
4. There is not a single source of the information to replicate, as there is in a web archive.
5. There is no need to retrieve information from all the storage nodes into a single location, as there is if we want to rebuild a web archive.
6. It is assumed that storage nodes provide small amounts of disk space. We hope that some contributors will provide a considerable amount of disk space.
7. Most P2P systems were developed some time ago, when people used them mostly to share small files such as MP3s. With videos it is different; that is one reason why Bittorrent gained popularity over existing P2P systems. Web archive files such as ARC files are relatively big.

Nonetheless, P2P systems could be adapted to provide replication across a controlled set of nodes composed of web archives. Other web archivists, please feel free to join this discussion. Your comments will be most welcome.

Best regards,
/Daniel Gomes

> Cheers,
>
> Alex

-- /Daniel Gomes FCCN Av. do Brasil, n.º 101 1700-066 Lisboa Tel.: +351 21 8440190 Fax: +351 218472167 www.fccn.pt

This message is intended exclusively for its addressee. It may contain CONFIDENTIAL information protected by law. If this message has been received by error, please notify us via e-mail or by telephone +351 218440100 and delete it immediately. |
|
From: Pope, J. <Jac...@bl...> - 2008-01-11 11:10:43
|
Hiya Erik,

Thanks for your help. I've now got an ArcProxy running (by copying wayback.war to arc-proxy.war, deploying the proxy and changing its config so only the ArcProxy/LocationDB section is not commented out), but I still can't see the files. I've run curl as you suggested and it appears to work (urls munged below):

curl http://www.example.com:8080/arc-proxy/locationDB -d operation=add -d name=/wap/filestore/2523139/arcs/IAH-20070920101741-00000-wap300.bl.uk.arc.gz -d url=http://www.example.com:8080/arc-proxy/arcs/IAH-20070920101741-00000-wap300.bl.uk.arc.gz

OK added url http://www.example.com:8080/arc-proxy/arcs/IAH-20070920101741-00000-wap300.bl.uk.arc.gz for /wap/filestore/2523139/arcs/IAH-20070920101741-00000-wap300.bl.uk.arc.gz

Yet when I try to browse the wayback for a URL in that arc I get an error saying the resource is unavailable, with the following error in catalina.out:

INFO: initialized org.archive.wayback.resourcestore.http.FileLocationDB
com.sleepycat.je.DatabaseException: Unable to locate(IAH-20070920101741-00000-wap300.bl.uk.arc.gz)
        at org.archive.wayback.resourcestore.http.ArcProxyServlet.doGet(ArcProxyServlet.java:90)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:690)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:619)

It's not a problem with the arc file, as this was fine when I was using a local ARC store. Any ideas?

Cheers,

Jack

Jackson Pope
Technical Lead Web Archiving Team
The British Library
+44 (0)1937 54 6942

-----Original Message-----
From: Erik Hetzner [mailto:eri...@uc...]
Sent: 11 January 2008 01:43
To: arc...@li...
Cc: Pope, Jackson
Subject: Re: [Archive-access-discuss] FW: Wayback 0.8.0 and ArcProxy

Hi Jackson,

At Thu, 10 Jan 2008 13:29:05 -0000, "Pope, Jackson" <Jac...@bl...> wrote:
> Hiya,
>
> Are there any instructions on how to setup an ArcProxy for use with
> Wayback 0.8.0?

Unfortunately the docs on the web site are for the 1.0 series of Wayback. It's been a while since I set up an arc proxy for the 0.8 series but I'll do my best.

> I've got wayback installed and working with Nutchwax, and
> now I'm trying to get the arcs proxied rather than use the
> LocalRestoreStore. I want the ArcProxy setup on the same machine, and
> the files presented via HTTP on the same machine too (though they are
> stored on an NFS server).

If you only have one set of files served via HTTP, I am not sure you need the proxy. You should be able to get away with just an HTTP resource store; the proxy is only necessary if you have many HTTP servers serving ARC content, and you need a central location to keep track of where ARCs are located.

> I've uncommented the appropriate section of the web.xml, and tried
> running location-client, but I'm not sure I've got an ArcProxy running
> (is it a separate download or part of Wayback?), and I don't know what
> setting to use in the location-client calls or the Remote HTTP1.1
> Resource Store, to get this setup correctly. Is there a document kicking
> around that explains this?

If you have got an arc proxy working correctly, and you have added to it with the location client, you should be able to do a GET on http://example.org/proxy-prefix/IAH-20070705232355-00000-example.org.arc.gz for a known ARC to get it back.

I find it easier to use the following curl command to add ARCs to the arc proxy than using the location-client:

curl ${LOCATIONDB_URL} -d operation=add -d name=${F} -d url=${BASE_URL}${F}

where LOCATIONDB_URL is the arc proxy URL, F is the name of the arc file, and BASE_URL is the base url of the HTTP server where you are serving arc files from. Hope that helps.

best, Erik Hetzner ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3

**************************************************************************
Experience the British Library online at www.bl.uk
The British Library's new interactive Annual Report and Accounts 2006/07: www.bl.uk/mylibrary
Help the British Library conserve the world's knowledge. Adopt a Book. www.bl.uk/adoptabook
The Library's St Pancras site is WiFi-enabled
**************************************************************************
The information contained in this e-mail is confidential and may be legally privileged. It is intended for the addressee(s) only. If you are not the intended recipient, please delete this e-mail and notify the pos...@bl...: The contents of this e-mail must not be disclosed or copied without the sender's consent.
The statements and opinions expressed in this message are those of the author and do not necessarily reflect those of the British Library. The British Library does not take any responsibility for the views of the author.
**************************************************************************
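[Editor's note] The curl registration step discussed in this thread can be scripted over a whole list of ARCs. A minimal sketch follows; the host, port, and file names are placeholders rather than values from the thread, and the commands are only echoed so they can be reviewed before being run against a live arc-proxy.

```shell
# Batch registration sketch for the arc-proxy locationDB, following the
# curl form used in this thread. All URLs and file names are placeholders.
# Commands are echoed rather than executed, for review before use.
LOCATIONDB_URL="http://localhost:8080/arc-proxy/locationDB"
BASE_URL="http://localhost:8080/arc-proxy/arcs/"
for F in IAH-20070705232355-00000-example.org.arc.gz \
         IAH-20070705232356-00001-example.org.arc.gz; do
  echo curl "$LOCATIONDB_URL" -d operation=add -d "name=$F" -d "url=${BASE_URL}${F}"
done
```

Dropping the `echo` would perform the actual registrations.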
|
From: Erik H. <eri...@uc...> - 2008-01-11 01:41:45
|
Hi Jackson,

At Thu, 10 Jan 2008 13:29:05 -0000, "Pope, Jackson" <Jac...@bl...> wrote:
> Hiya,
>
> Are there any instructions on how to setup an ArcProxy for use with
> Wayback 0.8.0?

Unfortunately the docs on the web site are for the 1.0 series of Wayback. It's been a while since I set up an arc proxy for the 0.8 series but I'll do my best.

> I've got wayback installed and working with Nutchwax, and
> now I'm trying to get the arcs proxied rather than use the
> LocalRestoreStore. I want the ArcProxy setup on the same machine, and
> the files presented via HTTP on the same machine too (though they are
> stored on an NFS server).

If you only have one set of files served via HTTP, I am not sure you need the proxy. You should be able to get away with just an HTTP resource store; the proxy is only necessary if you have many HTTP servers serving ARC content, and you need a central location to keep track of where ARCs are located.

> I've uncommented the appropriate section of the web.xml, and tried
> running location-client, but I'm not sure I've got an ArcProxy running
> (is it a separate download or part of Wayback?), and I don't know what
> setting to use in the location-client calls or the Remote HTTP1.1
> Resource Store, to get this setup correctly. Is there a document kicking
> around that explains this?

If you have got an arc proxy working correctly, and you have added to it with the location client, you should be able to do a GET on http://example.org/proxy-prefix/IAH-20070705232355-00000-example.org.arc.gz for a known ARC to get it back.

I find it easier to use the following curl command to add ARCs to the arc proxy than using the location-client:

curl ${LOCATIONDB_URL} -d operation=add -d name=${F} -d url=${BASE_URL}${F}

where LOCATIONDB_URL is the arc proxy URL, F is the name of the arc file, and BASE_URL is the base url of the HTTP server where you are serving arc files from. Hope that helps. 
best, Erik Hetzner ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 |
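[Editor's note] Erik's "do a GET on the proxy" check can be wrapped in a tiny helper that reports only the HTTP status code. This is a sketch: the example host and file name are placeholders, it assumes `curl` is available, and it returns a real status only against a running arc-proxy (a dead host yields `000`).

```shell
# Minimal retrieval check: a registered ARC should come back from the
# proxy with a plain GET. Prints the HTTP status code only.
check_arc() {
  # $1 = proxy base URL, $2 = ARC file name (placeholders in the example)
  curl -s -o /dev/null -w '%{http_code}' "$1/$2"
}
# Example invocation (placeholder host; needs a live arc-proxy):
#   check_arc http://localhost:8080/arc-proxy/arcs IAH-20070705232355-00000-example.org.arc.gz
```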
|
From: Pope, J. <Jac...@bl...> - 2008-01-10 13:29:03
|
Hiya,

Are there any instructions on how to setup an ArcProxy for use with Wayback 0.8.0? I've got wayback installed and working with Nutchwax, and now I'm trying to get the arcs proxied rather than use the LocalRestoreStore. I want the ArcProxy setup on the same machine, and the files presented via HTTP on the same machine too (though they are stored on an NFS server).

I've uncommented the appropriate section of the web.xml, and tried running location-client, but I'm not sure I've got an ArcProxy running (is it a separate download or part of Wayback?), and I don't know what setting to use in the location-client calls or the Remote HTTP1.1 Resource Store, to get this setup correctly. Is there a document kicking around that explains this?

Cheers,

Jack

Jackson Pope
Technical Lead Web Archiving Team
The British Library
+44 (0)1937 54 6942
|
From: Gina J. <gj...@lo...> - 2008-01-09 21:26:21
|
We are putting together specifications for installation here at the Library. On the Wayback page, it says it was tested with 5.5.

Current tomcat version is 6.0. Do we need to specify tomcat version 5.5, or can we specify tomcat version 5.5 or higher?

thanks, gina

Gina Jones, gj...@lo...
Digital Media Project Coordinator
Office of Strategic Initiatives
Library of Congress
http://www.loc.gov/webcapture
1-202-707-6604
|
From: Alex O. <aos...@nl...> - 2008-01-09 03:43:52
|
Hi Daniel,

Daniel Gomes <dan...@fc...> writes:
> The main idea is to enable Internet users to provide storage space
> from their computers to replicate a relatively small part of the
> archived data.

Have you considered any incentives for users to contribute to the system? While distributed computation projects (SETI@home, Folding@home etc) offer some sort of bragging rights about how much data you've processed, they make use of an idle resource which can never "run out" (forgetting the electricity bill). With backup storage, once you've filled up a disk, that's it, you can't contribute any more. Also "curing cancer" and "advancing science" sound a lot more charitable than "storing backup copies of old websites". ;-)

An alternative model which is a bit fairer to users is a peer-to-peer distributed backup system, where users trade their local storage space in return for having their own files backed up by the community. Thus a web archiving institution would be just another user.

However, while there's plenty of discussion and academic papers on peer-to-peer backup, I wasn't able to find any projects that have really taken off outside the traditional realms of file-sharing (Bittorrent, Gnutella etc) and anonymity (Freenet). There seem to be just a couple of research projects and a hobbyist one in early stages of development, which is a pity.

http://flud.org/
http://myriadstore.sics.se/
http://oceanstore.cs.berkeley.edu/

Cheers,

Alex |
|
From: Daniel G. <dan...@fc...> - 2008-01-07 11:56:35
|
Dear web archivers,

Portugal is now beginning its national web archiving initiative with project Tomba at FCCN (National Foundation for Scientific Computing). Tomba aims to create a national web archive system using Archive-access tools and to contribute to this project with enhancements and new tools.

The first contribution we intend to make to the Archive-access project is the development of a distributed system which enables the replication of the archive files kept in a repository (ARC files) across several storage nodes on the Internet. The main idea is to enable Internet users to provide storage space from their computers to replicate a relatively small part of the archived data. Ideally, every ARC file kept in the central repository would have several replicas stored across the Internet. This system was named rARC (ARC replicator).

I am sending a short description of the project as an attachment. We would deeply appreciate your comments.

Best regards,

-- /Daniel Gomes FCCN Av. do Brasil, n.º 101 1700-066 Lisboa Tel.: +351 21 8440190 Fax: +351 218472167 www.fccn.pt |
|
From: Natalia T. <nt...@ce...> - 2008-01-07 09:52:34
|
Hi Mathew,
I reviewed web.xml and both paths point to the same directory...
The incoming path in resourceIndex
<param-name>resourceindex.incomingpath</param-name>
<param-value>/wayback/index-data/incoming</param-value>
and the target in resourceStore
<param-name>resourcestore.indextarget</param-name>
<param-value>/wayback/index-data/incoming</param-value>
Any other ideas?
Here is the full web.xml
<?xml version="1.0"?>
<!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application
2.3//EN"
"http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
<!-- General Installation information
-->
<context-param>
<param-name>installationname</param-name>
<param-value>General Configuration</param-value>
<description>
This text will appear on the Wayback Configuration and
Status page
and may assist in determining which installation users are
viewing
via their web browser in environments with multiple Wayback
installations.
</description>
</context-param>
<listener>
<listener-class>org.archive.wayback.core.WaybackContextListener</listener-class>
</listener>
<!-- START OF Timeline UI OPTIONS
This section contains configuration for using the wayback machine in
timeline
access mode, similar to the WERA application.
These options are not used by default.
-->
<servlet>
<servlet-name>XMLQueryServlet</servlet-name>
<servlet-class>org.archive.wayback.query.QueryServlet</servlet-class>
<init-param>
<param-name>queryui.jsppath</param-name>
<param-value>jsp/QueryXMLUI</param-value>
</init-param>
</servlet>
<servlet-mapping>
<servlet-name>XMLQueryServlet</servlet-name>
<url-pattern>/xmlquery</url-pattern>
</servlet-mapping>
<servlet>
<servlet-name>QueryServlet</servlet-name>
<servlet-class>org.archive.wayback.query.QueryServlet</servlet-class>
<init-param>
<param-name>queryui.jsppath</param-name>
<param-value>jsp/QueryUI</param-value>
</init-param>
</servlet>
<servlet-mapping>
<servlet-name>QueryServlet</servlet-name>
<url-pattern>/query</url-pattern>
</servlet-mapping>
<servlet>
<servlet-name>TimelineQueryServlet</servlet-name>
<servlet-class>org.archive.wayback.query.QueryServlet</servlet-class>
<init-param>
<param-name>queryui.jsppath</param-name>
<param-value>jsp/TimelineUI</param-value>
</init-param>
</servlet>
<servlet-mapping>
<servlet-name>TimelineQueryServlet</servlet-name>
<url-pattern>/timeline</url-pattern>
</servlet-mapping>
<servlet>
<servlet-name>FramesetReplayServlet</servlet-name>
<servlet-class>org.archive.wayback.replay.ReplayServlet</servlet-class>
<init-param>
<param-name>replayrenderer.classname</param-name>
<param-value>org.archive.wayback.timeline.FramesetReplayRenderer</param-value>
<description>Implementation responsible for drawing
replayed resources and replay error messages</description>
</init-param>
</servlet>
<servlet-mapping>
<servlet-name>FramesetReplayServlet</servlet-name>
<url-pattern>/frameset</url-pattern>
</servlet-mapping>
<servlet>
<servlet-name>InlineReplayServlet</servlet-name>
<servlet-class>org.archive.wayback.replay.ReplayServlet</servlet-class>
<init-param>
<param-name>replayrenderer.classname</param-name>
<param-value>org.archive.wayback.timeline.TimelineReplayRenderer</param-value>
<description>Implementation responsible for drawing
replayed resources and replay error messages</description>
</init-param>
</servlet>
<servlet-mapping>
<servlet-name>InlineReplayServlet</servlet-name>
<url-pattern>/replay</url-pattern>
</servlet-mapping>
<servlet>
<servlet-name>MetaReplayServlet</servlet-name>
<servlet-class>org.archive.wayback.replay.ReplayServlet</servlet-class>
<init-param>
<param-name>replayrenderer.classname</param-name>
<param-value>org.archive.wayback.timeline.MetaReplayRenderer</param-value>
<description>Implementation responsible for drawing
replayed resources and replay error messages</description>
</init-param>
</servlet>
<servlet-mapping>
<servlet-name>MetaReplayServlet</servlet-name>
<url-pattern>/meta</url-pattern>
</servlet-mapping>
<context-param>
<param-name>replayui.jsppath</param-name>
<param-value>jsp/ReplayUI</param-value>
<description>ReplayUI specific path to jsp pages. relative to
webapp/</description>
</context-param>
<context-param>
<param-name>queryrenderer.classname</param-name>
<param-value>org.archive.wayback.timeline.TimelineQueryRenderer</param-value>
<description>Implementation responsible for drawing Index Query
results</description>
</context-param>
<context-param>
<param-name>replayuriconverter.classname</param-name>
<param-value>org.archive.wayback.timeline.TimelineReplayResultURIConverter</param-value>
<description>Class that implements translation of index results
to Replayable URIs for this Wayback</description>
</context-param>
<context-param>
<param-name>jsuri</param-name>
<param-value>http://recercat.test.cesca.es/wayback/jsp/TimelineUI/wm-timeline-text.js,http://recercat.test.cesca.es/wayback/jsp/TimelineUI/wm-timeline.js</param-value>
<description>HTTP URI to javascript files</description>
</context-param>
<context-param>
<param-name>replayuriprefix</param-name>
<param-value>http://recercat.test.cesca.es/wayback/replay</param-value>
<description>HTTP URI prefix for the replay servlet</description>
</context-param>
<context-param>
<param-name>metauriprefix</param-name>
<param-value>http://recercat.test.cesca.es/wayback/meta</param-value>
<description>HTTP URI prefix for the meta replay
servlet</description>
</context-param>
<context-param>
<param-name>timelineuriprefix</param-name>
<param-value>http://recercat.test.cesca.es/wayback/timeline</param-value>
<description>HTTP URI prefix for the timeline servlet</description>
</context-param>
<context-param>
<param-name>frameseturiprefix</param-name>
<param-value>http://recercat.test.cesca.es/wayback/frameset</param-value>
<description>HTTP URI prefix for the frameset servlet</description>
</context-param>
<!-- END OF Timeline UI OPTIONS -->
<!-- START OF Local-ARC ResourceStore OPTIONS
This section contains configuration for accessing ARC files from a single
directory on a local filesystem. If ARC files are spread across multiple
local directories, a single directory be created, and populated with
symbolic
links to the various locations of the ARC files. This configuration
section also
contains specific configuration for an indexing thread, which can optionally
notice new ARC files, generate CDX flat files for new ARCs, and hand off
these
CDX files to a BDB resource index for merging.
-->
<context-param>
<param-name>resourcestore.classname</param-name>
<param-value>org.archive.wayback.resourcestore.LocalARCResourceStore</param-value>
<description>Class that implements ResourceStore for this
Wayback</description>
</context-param>
<context-param>
<param-name>resourcestore.arcpath</param-name>
<param-value>/dades/arcs</param-value>
<description>
Directory where ARC files are found (possibly where
Heritrix writes them.)
This directory must exist.
</description>
</context-param>
<context-param>
<param-name>resourcestore.autoindex</param-name>
<param-value>1</param-value>
<description>
If this is set to '1', then a background thread is launched
that
detects new ARC files appearing in arcpath. New ARCs are
indexed,
and a CDX flat file, with one line per ARC Record is
created, one
CDX file per ARC. These CDX files are then handed off to
the index
for incorporation into the index.
</description>
</context-param>
<context-param>
<param-name>resourcestore.tmppath</param-name>
<param-value>/wayback/arc-indexer/tmp</param-value>
<description>
Directory where CDX files are created temporarily. This is a
scratch space directory, which must exist.
</description>
</context-param>
<context-param>
<param-name>resourcestore.workpath</param-name>
<param-value>/wayback/arc-indexer/work</param-value>
<description>
Directory which holds empty flag files indicating that ARC
files
are waiting to be indexed.
This directory must exist.
</description>
</context-param>
<context-param>
<param-name>resourcestore.queuedpath</param-name>
<param-value>/wayback/arc-indexer/queued</param-value>
<description>
Directory which holds empty flag files indicating that ARC
files
have already been seen and queued for indexing.
This directory must exist.
</description>
</context-param>
<context-param>
<param-name>resourcestore.indextarget</param-name>
<param-value>/wayback/index-data/incoming</param-value>
<description>
Directory or URL where CDX files are sent after they are
created. If
the value of this parameter begins with http://, then the
value is
assumed to be a URL where CDX files are PUT, on a possibly
remote
resourceindex node. If the value does not begin with
http://, then
the value is assumed to be a local directory, which must
exist,
where completed CDX files are moved for incorporation into the
index.
</description>
</context-param>
<context-param>
<param-name>resourcestore.indexinterval</param-name>
<param-value>1000</param-value>
<description>
Millisecond interval between checks for new ARCs that need
to be
processed. This is only the initial time slept when first
starting
up, and after any new files are found. Each interval that
no new
ARCs are detected, the duration slept increases by this amount.
</description>
</context-param>
<!-- END OF Local-ARC ResourceStore OPTIONS -->
<!-- START OF Local-BDB ResourceIndex OPTIONS
This section contains configuration for using a BDB JE to hold the document
index on the local filesystem. This section also contains configuration for
an optional index update thread, which will scan a directory for new
index data,
in CDX format, and will automatically add new index records to the
index.This
is the default index storage implementation.
-->
<filter>
<filter-name>RemoteSubmitFilter</filter-name>
<filter-class>org.archive.wayback.resourceindex.indexer.RemoteSubmitFilter</filter-class>
<init-param>
<param-name>pipeline.statusjsp</param-name>
<param-value>jsp/PipelineUI/PipelineStatus.jsp</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>RemoteSubmitFilter</filter-name>
<url-pattern>/index-incoming/*</url-pattern>
</filter-mapping>
<context-param>
<param-name>resourceindex.classname</param-name>
<param-value>org.archive.wayback.resourceindex.LocalResourceIndex</param-value>
<description>Class that implements ResourceIndex for this
Wayback</description>
</context-param>
<context-param>
<param-name>resourceindex.sourceclass</param-name>
<param-value>BDB</param-value>
<description>Class that implements ResultSource for this Wayback,
currently: BDB|CDX</description>
</context-param>
<context-param>
<param-name>resourceindex.indexpath</param-name>
<param-value>/wayback/index</param-value>
<description>
LocalBDBResourceIndex specific directory to store the BDB
files.
Directory must exist.
</description>
</context-param>
<context-param>
<param-name>resourceindex.dbname</param-name>
<param-value>DB1</param-value>
<description>
LocalBDBResourceIndex specific name for BDB database
</description>
</context-param>
<context-param>
<param-name>resourceindex.incomingpath</param-name>
<param-value>/wayback/index-data/incoming</param-value>
<description>
BDB index-specific configuration that indicates new CDX
format flat
files will appear in the directory named in the value of
this param.
If this configuration is present and non-empty, a
background thread
will be started that monitors this directory, and adds CDX
records
in files found in this directory to the index.
</description>
</context-param>
<context-param>
<param-name>resourceindex.mergedpath</param-name>
<param-value>/wayback/index-data/merged</param-value>
<description>
If this value is present and non-empty, then CDX files that are
successfully processed from incoming are moved to this
directory
after merging. If this option is missing or blank, CDX
files are
deleted after merging.
</description>
</context-param>
<context-param>
<param-name>resourceindex.failedpath</param-name>
<param-value>/wayback/index-data/failed</param-value>
<description>
If this value is present and non-empty, then CDX files that
fail to
parse successfully are moved to this directory after a single
attempt. If this option is missing or blank, malformed CDX
files are
left in the incoming directory and repeatedly re-attempted
until
some other process moves them out of the way or fixes them.
</description>
</context-param>
<context-param>
<param-name>resourceindex.mergeinterval</param-name>
<param-value>10000</param-value>
<description>
Millisecond interval between checks for new files in the
incoming
directory. This is only the starting number, when no new
files are
found in the directory. Each subsequent interval will
increase by
this number of ms, until a file is found, at which point the
interval will revert to the initial level.
</description>
</context-param>
<context-param>
<param-name>maxresults</param-name>
<param-value>1000</param-value>
<description>
Maximum number of results to return from the ResourceIndex.
</description>
</context-param>
<!-- END OF Local-BDB ResourceIndex OPTIONS -->
</web-app>
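[Editor's note] The point of this thread, that `resourcestore.indextarget` must match `resourceindex.incomingpath`, can be checked mechanically. The sketch below copies the two relevant param lines from the web.xml above into a fixture; in practice one would point the grep at the real web.xml instead.

```shell
# Sanity check: the directory the resourceStore writes CDX files to must
# be the directory the resourceIndex watches. The fixture reproduces the
# two param blocks from the web.xml in this thread.
cat > /tmp/wayback-paths.xml <<'EOF'
<param-name>resourcestore.indextarget</param-name>
<param-value>/wayback/index-data/incoming</param-value>
<param-name>resourceindex.incomingpath</param-name>
<param-value>/wayback/index-data/incoming</param-value>
EOF
extract_param() {
  # Print the <param-value> that follows the named <param-name>.
  grep -A1 "<param-name>$1</param-name>" /tmp/wayback-paths.xml \
    | sed -n 's:.*<param-value>\(.*\)</param-value>.*:\1:p'
}
target=$(extract_param resourcestore.indextarget)
incoming=$(extract_param resourceindex.incomingpath)
if [ "$target" = "$incoming" ]; then
  echo "paths match: $target"
else
  echo "paths differ: store=$target index=$incoming"
fi
```

This assumes each `<param-value>` sits on the line directly after its `<param-name>`, as in the file above.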
Thanks,
Natalia
|
|
From: Mathew B. <Mat...@na...> - 2008-01-06 22:24:57
|
Hi Natalia,

Make sure that the "incoming" directory in your resourceIndex configuration is the same directory as the "target" directory from your resourceStore configuration.

Hope this helps
m.

>>> Natalia Torres <nt...@ce...> 5/01/2008 12:18 a.m. >>>
Hello

I installed wayback 0.8 following the instructions on the web page: placing the .war file in the appropriate location, waiting for Tomcat to unpack the .war file, customizing the base wayback.xml file, and restarting tomcat.

I customized web.xml to use the wayback machine in timeline access mode, setting:

- Local-ARC ResourceStore OPTIONS: resourcestore.autoindex = 1, resourcestore.indexinterval = 1000, and all the paths
- Local-BDB ResourceIndex OPTIONS: resourceindex.mergeinterval = 10000, maxresults = 1000

Three days after the tomcat restart I can't search many of the urls in those arcs. The arc directory contains 6548 arc.gz files; listing the information generated by wayback, the index-data/merged dir has only 14 files and the arc-indexer/work dir has the other 6534 files. The log file has no error messages and is reading arc files.

How much time is needed to index the arcs? Are more changes to the configuration needed?

Thanks
N.

This e-mail is intended for the addressee only and may contain information which is subject to legal privilege. The contents are not necessarily the official view or communication of the National Library of New Zealand. If you are not the intended recipient you must not use, disclose, copy or distribute this e-mail or any information in, or attached to it. If you have received this e-mail in error, please contact the sender immediately or return the original message to the National Library by e-mail, and destroy any copies. The National Library does not accept any liability for changes made to this e-mail or attachments after sending. All e-mails have been scanned for viruses and content by security software. The National Library reserves the right to monitor all e-mail communications through its network. |
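[Editor's note] For the "how long will indexing take" question in this thread, a rough progress probe is to compare the ARC count against the queued flag files and the merged CDX files. The directory names below follow Natalia's configuration but are assumptions about the local layout; missing directories simply count as zero.

```shell
# Rough progress probe for the Wayback 0.8 auto-indexer: count ARCs,
# queued flag files, and merged CDX files. Directories are from the
# configuration discussed in this thread; adjust to the local layout.
arcs=$(ls /dades/arcs/*.arc.gz 2>/dev/null | wc -l | tr -d ' ')
queued=$(ls /wayback/arc-indexer/queued 2>/dev/null | wc -l | tr -d ' ')
merged=$(ls /wayback/index-data/merged 2>/dev/null | wc -l | tr -d ' ')
echo "ARCs: $arcs  queued flags: $queued  merged CDX files: $merged"
```

When the merged count stops growing while queued flags remain, the merge thread, rather than the ARC indexer, is the place to look.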