From: Michael S. <st...@ar...> - 2006-07-03 22:59:08
João Cláudio Luzio wrote:
> Oops.. forgot to say that the arcs were on
> /var/local/webarchive/heritrix/jobs/bn_18_test-20060619172505727/
> directory but with .arc.gz instead of .arc.

This should be fine.

> João Cláudio Luzio wrote:
>> Hi,
>> I've been trying to get the pair up and running for a while now but
>> had some problems..
>> Using nutchwax 0.4.3 and the wera (0.4.2RC1 & 0.5.0) I managed to
>> get it running but some of the related files (images)
>> aren't displayed. Those get:
>> <retrievermessage>
>>   <head>
>>     <errorcode>4</errorcode>
>>     <errormessage>Unable to parse Archive Identifier</errormessage>
>>   </head>
>> </retrievermessage>
>> Using wera debug I found that the "[archiveidentifier] =>
>> 2770/IAH-20060619172903-00000-webarchive1" for a specific search I made.
>> (Starting tomcat from the nutchwax indexed data)

So, it generally works but some of the images don't show sometimes?

>> Using wayback I don't have the same problems (I don't use nutchwax
>> with wayback..).
>>
>> I've tried to get nutchwax 0.6.1 and wera running but the opensearch
>> servlet for the rss from nutchwax gives an exception..

Do you still have the exception?

>> So I tried nutchwax 0.7.0 (with latest hadoop - standalone), but now
>> the arcretriever gives an exception when trying to get the document.
>> Using wera debug I found that the "[archiveidentifier] =>
>> 2234331/filedesc://IAH-20060619172903-00000-webarchive1.arc" for the
>> same search I made.
>> (Starting tomcat from anywhere)
>>
>> The exception:
>> 7 Bad function argument Cause: java.io.FileNotFoundException:
>> /var/local/webarchive/heritrix/jobs/bn_18_test-20060619172505727/filedesc:/IAH-20060619172903-00000-webarchive1.arc
>> does not exist. Stack trace:
>> org.archive.io.arc.ARCUtils.isReadable(ARCUtils.java:171)
>> org.archive.io.arc.ARCUtils.testCompressedARCFile(ARCUtils.java:94)
>> org.archive.io.arc.ARCReaderFactory.get(ARCReaderFactory.java:200)
>> org.archive.io.arc.ARCReaderFactory.get(ARCReaderFactory.java:194)
>> no.nb.nwa.retriever.ARCRetriever.getDocument(ARCRetriever.java:410)
>> no.nb.nwa.retriever.ARCRetriever.doGet(ARCRetriever.java:131)

Looks like we shouldn't be putting the 'filedesc:' in front of the ARC
name? Does ARCRetriever work if you make a request with
IAH-20060619172903-00000-webarchive1.arc instead of
filedesc:/IAH-20060619172903-00000-webarchive1.arc?
St.Ack
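A minimal sketch of the cleanup St.Ack suggests (a hypothetical helper,
not ARCRetriever's actual code): strip any "filedesc:" scheme off the
archive identifier before it is resolved against the ARC directory.

    // Hypothetical helper illustrating the fix discussed above:
    // normalize an archive identifier before building a file path.
    public class ArcNameCleaner {
        static String stripFiledesc(String name) {
            // "filedesc://IAH-x.arc" or "filedesc:/IAH-x.arc" -> "IAH-x.arc"
            return name.replaceFirst("^filedesc:/*", "");
        }

        public static void main(String[] args) {
            System.out.println(stripFiledesc(
                    "filedesc://IAH-20060619172903-00000-webarchive1.arc"));
            // prints: IAH-20060619172903-00000-webarchive1.arc
        }
    }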
From: Natalia T. <nt...@ce...> - 2006-07-03 11:43:55
Hello
I tried to add the new job by moving the indexes directory before
starting the index process, and it works fine. Thanks!!
So, every time I want to index a new job I need to move the indexes
directory? If I move this directory, will the NutchWAX search still
work? This process takes many hours ...
Natalia
From: <jl...@ex...> - 2006-06-29 10:51:11
Oops.. forgot to say that the arcs were on
/var/local/webarchive/heritrix/jobs/bn_18_test-20060619172505727/
directory but with .arc.gz instead of .arc.

João Cláudio Luzio wrote:
> Hi,
> I've been trying to get the pair up and running for a while now but
> had some problems..
> [...]
> Thanks in advance,
> João Luzio
From: <jl...@ex...> - 2006-06-28 17:29:18
Hi,
I've been trying to get the pair up and running for a while now but
had some problems..

Using nutchwax 0.4.3 and the wera (0.4.2RC1 & 0.5.0) I managed to
get it running but some of the related files (images)
aren't displayed. Those get:

<retrievermessage>
  <head>
    <errorcode>4</errorcode>
    <errormessage>Unable to parse Archive Identifier</errormessage>
  </head>
</retrievermessage>

Using wera debug I found that the "[archiveidentifier] =>
2770/IAH-20060619172903-00000-webarchive1" for a specific search I made.
(Starting tomcat from the nutchwax indexed data)

Using wayback I don't have the same problems (I don't use nutchwax with
wayback..).

I've tried to get nutchwax 0.6.1 and wera running but the opensearch
servlet for the rss from nutchwax gives an exception..

So I tried nutchwax 0.7.0 (with latest hadoop - standalone), but now
the arcretriever gives an exception when trying to get the document.
Using wera debug I found that the "[archiveidentifier] =>
2234331/filedesc://IAH-20060619172903-00000-webarchive1.arc" for the
same search I made.
(Starting tomcat from anywhere)

The exception:

7 Bad function argument Cause: java.io.FileNotFoundException:
/var/local/webarchive/heritrix/jobs/bn_18_test-20060619172505727/filedesc:/IAH-20060619172903-00000-webarchive1.arc
does not exist. Stack trace:
org.archive.io.arc.ARCUtils.isReadable(ARCUtils.java:171)
org.archive.io.arc.ARCUtils.testCompressedARCFile(ARCUtils.java:94)
org.archive.io.arc.ARCReaderFactory.get(ARCReaderFactory.java:200)
org.archive.io.arc.ARCReaderFactory.get(ARCReaderFactory.java:194)
no.nb.nwa.retriever.ARCRetriever.getDocument(ARCRetriever.java:410)
no.nb.nwa.retriever.ARCRetriever.doGet(ARCRetriever.java:131)
....

Also using,
JDK 1.5.0_05
Tomcat 5.5.16
Heritrix 1.6.0

I have tried to figure it out but I'm not having any luck.. I'm a
newbie with these tools so I appreciate all the help I can get in
getting the latest nutchwax+wera setting going.

Thanks in advance,
João Luzio
From: Natalia T. <nt...@ce...> - 2006-06-26 16:46:13
Hello
Using this now it's running and I can index!! I can read the explanation
and "more from the site" links but I can't access the title.
I have a doubt about the "collectionsHost" variable. It points to the
server that will return the content of ARCs. I put the arc files (or
arc.gz files) directly on this server but the links on "title" and
"other versions" in the nutchwax search results don't work. Which
information does this server offer??
Thanks.
Natalia
From: Michael S. <st...@ar...> - 2006-06-23 20:27:29
Looks like it will be a while before I can get to a release. I'm out
next week. Meantime I just took this build for a test run:
http://crawltools.archive.org:8080/cruisecontrol/artifacts/HEAD-archive-access/20060623113807.
Fixes at least your collection issue. Requires hadoop 0.3.2.
Yours,
St.Ack

Natalia Torres wrote:
> when I index my jobs with nutchwax 0.6.1 I use this command
>
> hadoop jar /usr/local/nutchwax-0.6.1/nutchwax-0.6.1.jar all
> /data/inputs/ /data/outputs ciencia
>
> help explains that input, output, collection are required
> I put "ciencia" as collection name, not null, but listing search
> results this name is not included in the path ...
>
> If I edit the search.jsp page and add my collection name then search
> doesn't work (it doesn't recognize this collection).
>
> Natalia
From: Michael S. <st...@ar...> - 2006-06-23 15:33:46
Natalia Torres wrote:
> when I index my jobs with nutchwax 0.6.1 I use this command
>
> hadoop jar /usr/local/nutchwax-0.6.1/nutchwax-0.6.1.jar all
> /data/inputs/ /data/outputs ciencia
>
> help explains that input, output, collection are required
> I put "ciencia" as collection name, not null, but listing search
> results this name is not included in the path ...
>
> If I edit the search.jsp page and add my collection name then search
> doesn't work (it doesn't recognize this collection).

Just add it to the path that gets made as part of the unrolling of
search results. Put in place 'ciencia' instead of the value of
collection at that point -- around line 196 where we assign to the
archiveCollection value.

Collection name not being passed to the index is a bug. Looks like the
fix is not in 0.6.1. It was fixed 2006/05/12. I'll make a 0.6.2 --
hopefully today.
St.Ack

P.S. Regarding why the archives are not in place, from SF support:
Per the site status page:
( 2006-06-20 12:41:07 - Mailing List Service ) On 2006-06-20 the Mailing
List Archives were taken down for preventative maintenance that occurs
about once every two years. We expect the duration of this downtime to
last between 1 to 3 days.
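A runnable sketch of the substitution being described (the variable
name archiveCollection comes from the message above; everything else,
including the path layout, is illustrative rather than the actual
search.jsp code):

    public class CollectionPathFix {
        public static void main(String[] args) {
            // The value unrolled from the search hit; null is the bug
            // discussed in this thread.
            String collectionFromHit = null;
            // The workaround: put the collection name used at index
            // time in its place.
            String archiveCollection =
                    (collectionFromHit != null) ? collectionFromHit
                                                : "ciencia";
            System.out.println("http://www.myurl.com/" + archiveCollection
                    + "/*/http://www.urlcrawled.com");
            // prints: http://www.myurl.com/ciencia/*/http://www.urlcrawled.com
        }
    }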
From: Natalia T. <nt...@ce...> - 2006-06-23 11:56:27
when I index my jobs with nutchwax 0.6.1 I use this command

hadoop jar /usr/local/nutchwax-0.6.1/nutchwax-0.6.1.jar all
/data/inputs/ /data/outputs ciencia

help explains that input, output, collection are required
I put "ciencia" as collection name, not null, but listing search
results this name is not included in the path ...

If I edit the search.jsp page and add my collection name then search
doesn't work (it doesn't recognize this collection).

Natalia
From: Michael S. <st...@ar...> - 2006-06-22 21:50:06
Natalia Torres wrote:
> Hello
>
> I have a problem indexing new jobs with hadoop and nutchwax. The forum
> archive of this list doesn't work so I can't find information about it.

I wrote sourceforge asking where's our archive!

> I indexed a couple of jobs crawled with Heritrix to try NutchWax
> search and it seems to work.
>
> I search a word in Nutchwax Search and the results are shown. But when
> I click the title or "other versions" the url is wrong. It's something
> like http://www.myurl.com/null/*/http//www.urlcrawled.com.

The host 'myurl.com' is a server that will return the content of ARCs?

> Surfing examples on the internet archive web I think that "null" in
> the path may be the collection name used at index time, am I right?
> Why null?

Looks like your collection name is 'null'. If you do an explain of your
search result, is there a 'collection' field, and if so, is its value
null?

You used 0.6.2 Nutchwax? With that version it was not possible to do an
indexing without supplying a collection name -- supposedly. You can
edit the search.jsp and add in a collection name.

> Is there any way to list the collections used at indexing?

See the explain above. Otherwise, use nutch tools to read the content
of metadata in your segments -- let me know and I'll supply more detail
-- or you can look at the index produced using tools like luke
(http://www.getopt.org/luke/) or some quick lucene code that iterates
over each document printing out the content of the content field
(Sounds like yours is null though).

> After trying it I decided to add new jobs. When I try to index new
> jobs using the same command an error appears because the indexes
> directory in the output dir exists.

Is this at the merge indices step? Try moving aside the old merged
index -- i.e. DATA_DIR/index -- and retry running the single merge
step. The 'all' command for nutchwax is for running through a complete
indexing -- from start to finish. Adding increments needs work in
nutchwax. Adding doc on howto, with my experience running a few here,
will be the focus of the next nutchwax release.
St.Ack

> How can I add jobs to this index?
>
> Thanks
>
> Natalia
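Along the lines of the "quick lucene code" suggestion above, a minimal
sketch (assuming the Lucene API of that era, and adapting the printout
to the 'collection' field at issue in this thread):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;

    public class DumpCollectionField {
        public static void main(String[] args) throws Exception {
            // args[0]: path to the merged index directory.
            IndexReader reader = IndexReader.open(args[0]);
            try {
                for (int i = 0; i < reader.maxDoc(); i++) {
                    if (reader.isDeleted(i)) {
                        continue;
                    }
                    Document doc = reader.document(i);
                    // Prints 'null' for documents indexed without a
                    // collection name.
                    System.out.println(i + ": " + doc.get("collection"));
                }
            } finally {
                reader.close();
            }
        }
    }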
From: Natalia T. <nt...@ce...> - 2006-06-22 10:02:28
Hello

I have a problem indexing new jobs with hadoop and nutchwax. The forum
archive of this list doesn't work so I can't find information about it.
I indexed a couple of jobs crawled with Heritrix to try NutchWax search
and it seems to work.

I search a word in Nutchwax Search and the results are shown. But when
I click the title or "other versions" the url is wrong. It's something
like http://www.myurl.com/null/*/http//www.urlcrawled.com. Surfing
examples on the internet archive web I think that "null" in the path
may be the collection name used at index time, am I right? Why null?
Is there any way to list the collections used at indexing?

After trying it I decided to add new jobs. When I try to index new jobs
using the same command an error appears because the indexes directory
in the output dir exists. How can I add jobs to this index?

Thanks

Natalia
From: Natalia T. <nt...@ce...> - 2006-06-16 08:11:40
You're right!! Now it works fine!!
Thanks
Natalia
From: Michael S. <st...@ar...> - 2006-06-15 16:30:43
You need hadoop 0.2.0 at least (Get 0.2.1). HADOOP-189 --
http://issues.apache.org/jira/browse/HADOOP-189 -- is needed to run
NutchWAX in 'non-distributed, /Standalone/ mode', which I'm guessing
you're trying to do since that's what the documentation suggests.

The documentation used to describe 'Pseudo-distributed configuration'
-- see the hadoop doc. for a definition of what this means -- but I
redid the doc. after HADOOP-189 was fixed, only I forgot to update the
required hadoop version. I've fixed the doc.

Hopefully this fixes your problem (Looked like it couldn't find stuff
on the CLASSPATH). Let me know. Sorry for any inconvenience.
St.Ack

Natalia Torres wrote:
> Hi
> I'm using:
>
> Hadoop 0.1.1
> Nutchwax 0.6.1
> OS Debian
> Java 1.4.1
>
> N.
From: Natalia T. <nt...@ce...> - 2006-06-15 08:50:55
Hi
I'm using:

Hadoop 0.1.1
Nutchwax 0.6.1
OS Debian
Java 1.4.1

N.
From: Michael S. <st...@ar...> - 2006-06-14 15:50:45
Tell us more Natalia. It looks like you are doing the right thing but
the below exception is odd: We're not finding items on CLASSPATH. What
hadoop are you using? I just tried nutchwax-0.6.1 locally and didn't
get the below. Have you made any config. in hadoop or is it set to all
defaults? What operating system and what JVM?
St.Ack
Natalia Torres wrote:
> Hi
> this is my first experience with NutchWax after crawling with heritrix.
>
> I've installed all software (hadoop, nutch ...) to run nutchwax+wera
> with jobs crawled.
>
> When I run all of the indexing steps in one go by passing the 'all'
> directive to NutchWAX using this command
>
> % ${HADOOP_HOME}/bin/hadoop jar ${NUTCHWAX_HOME}/nutchwax-0.6.1.jar
> all /tmp/inputs /tmp/outputs test
>
> I get this error
>
> java.lang.RuntimeException: org.apache.nutch.net.URLFilter not found.
> at org.apache.nutch.net.URLFilters.<init>(URLFilters.java:47)
> at
> org.apache.nutch.parse.ParseOutputFormat.getRecordWriter(ParseOutputFormat.java:47)
> at
> org.apache.nutch.fetcher.FetcherOutputFormat$1.<init>(FetcherOutputFormat.java:69)
> at
> org.apache.nutch.fetcher.FetcherOutputFormat.getRecordWriter(FetcherOutputFormat.java:58)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:265)
> at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:124)
>
> Can anyone tell me what the problem is?
>
> Thanks
>
> Natalia
From: Natalia T. <nt...@ce...> - 2006-06-14 08:01:22
Hi
this is my first experience with NutchWax after crawling with heritrix.
I've installed all software (hadoop, nutch ...) to run nutchwax+wera
with jobs crawled.
When I run all of the indexing steps in one go by passing the 'all'
directive to NutchWAX using this command
% ${HADOOP_HOME}/bin/hadoop jar ${NUTCHWAX_HOME}/nutchwax-0.6.1.jar
all /tmp/inputs /tmp/outputs test
I get this error
java.lang.RuntimeException: org.apache.nutch.net.URLFilter not found.
at org.apache.nutch.net.URLFilters.<init>(URLFilters.java:47)
at
org.apache.nutch.parse.ParseOutputFormat.getRecordWriter(ParseOutputFormat.java:47)
at
org.apache.nutch.fetcher.FetcherOutputFormat$1.<init>(FetcherOutputFormat.java:69)
at
org.apache.nutch.fetcher.FetcherOutputFormat.getRecordWriter(FetcherOutputFormat.java:58)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:265)
at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:124)
Can anyone tell me what the problem is?
Thanks
Natalia
|
From: Gordon M. <go...@ar...> - 2006-05-03 21:50:17
The Internet Archive, home of the Heritrix web crawler, Wayback archive
browser, and NutchWAX archive search engine projects, has current
opportunities for both student and career open source software
developers.

We are looking for a full-time Java Software Engineer to complement our
core team in San Francisco. See details at:
http://www.archive.org/about/webjobs.php#JavaSoftwareEngineer

We are also participating in the Google 2006 "Summer of Code" program
which awards students a stipend for mentored work on open source
projects. (Monday May 8th is the deadline to apply for this program.)

More info for student applicants:
http://code.google.com/soc/studentfaq.html
Ideas page for Internet Archive projects:
http://webteam.archive.org/confluence/display/SOC06/Ideas
(other ideas are welcome!)
Student application entry page:
http://code.google.com/soc/student_step1.html

- Gordon @ IA
From: Michael S. <st...@ar...> - 2006-05-01 22:36:26
With this release, NutchWAX uses mapreduce Nutch at its base: i.e.
Nutch 0.8-dev+ (Previous NutchWAX releases were based on Nutch 0.7.x).
This allows NutchWAX to scale to index even larger collections while at
the same time requiring less user intervention than was previously
necessary. A recent indexing, using a rack of ~33 dual-core 2Ghz
Athlons, each with 4Gigs of RAM, took 3.8 days to index end-to-end 50k
ARCs of 141 million documents (We'll post more stats to the list as we
knock off collection indexings).

Be aware that 0.6.0 bears little resemblance to previous releases, both
in how it goes about its work and in how it's run by the user. Be
prepared to leave aside all old NutchWAX assumptions. For an
introduction, see
http://archive-access.sourceforge.net/projects/nutch/apidocs/overview-summary.html#getting_started

Release notes are available here:
http://archive-access.sourceforge.net/projects/nutch/articles/releasenotes.html

Note that indices made with earlier versions of NutchWAX will not be
compatible with 0.6.0.

Yours,
St.Ack
From: Oskar G. <osk...@kb...> - 2006-04-27 11:35:27
Hi everybody! (insert voice of Dr. Nick Riviera here)

WAXToolbar is a firefox extension, to aid browsing a web archive, that
works tightly together with the new open source Wayback Machine. The
first official release -- 0.2.0 -- is now available at:
http://archive-access.sourceforge.net/projects/waxtoolbar/

Some basic information on how to install and use it is also there. A
few minor changes have to be made to the configuration of the Wayback
as well, but those are also covered in the documentation.

**Please note that for the toolbar to work you have to get the latest
Wayback Machine from CVS HEAD, since changes have been added since the
0.4.1 release.**

Regards,
Oskar Grenholm
National Library of Sweden
From: Michael S. <st...@ar...> - 2006-04-27 05:13:47
I'd like to make the 0.6.0 NutchWAX release this weekend. It's ready to
go. If any are so inspired, it'd be great if ye could take the
application for a spin before saturday/sunday to verify it basically
works for you. Comments on the documentation or problems you ran into
would be much appreciated.

Beware. NutchWAX 0.6.0 is very different from 0.4.x. Be sure to read
the 'Getting Started' documentation linked off the home page --
http://archive-access.sourceforge.net/projects/nutch/ -- so you can get
a handle on the new mapreduce mode of operation.

To download candidate builds, grab the latest NutchWAX bundle from
under the "Build Artifacts" on this page on our continuous build
server:
http://crawltools.archive.org:8080/cruisecontrol/buildresults/HEAD-archive-access.
Builds are still labeled nutchwax-0.5.0-xxxx. We haven't yet revved the
build to be 0.6.0.

Thanks all,
St.Ack
From: Lukas M. <mat...@ce...> - 2006-04-24 20:27:41
On Monday 24 April 2006 18:38, Michael Stack wrote:
> Andrea Goethals wrote:
>> Hello,
>
> Hello Andrea.
>
>> I have been reading documentation for nutchwax, nutch and lucene
>> trying to figure out if there's a way to do what I need to do:
>> basically to allow curators to "tag" particular archived web sites
>> as belonging to a collection for the purposes of restricting
>> searches to that collection and for generating web pages related to
>> that collection.
>>
>> The trick is that these collections can be defined at any time,
>> post-harvest (heritrix), post-index (nutchwax). And web sites can
>> belong to multiple collections. Sometimes the collections are
>> hierarchical, sometimes they are not.
>>
>> If I restrict their collections to be defined by a set of seed URIs
>> (rather than all archived URIs) I think it's more manageable.
>> Picture a database (call it myDB) that manages a list of seed URIs,
>> each associated with a unique seedId. I can perform separate
>> heritrix crawls per seed URI. When I index that set of ARC files
>> (associated with a single seed URI) I can set the command-line
>> "collection" field to the seed ID. Then all URIs associated with a
>> seed URI will have the same indexed "collection" value. This posting
>> might be hard to read because the word collection is overused - the
>> nutchwax collection field would be used in this case to group
>> together sites that came from the same seed.

We had to solve a similar situation. In a separate database we defined
special metadata (e.g. collection, contract with author) for each URI,
and then we'd like to feed nutchwax with a tagged set of records from
the ARCs (arc name + offset). The database's extra metadata is also
used for accessing documents through the wayback machine.

> [...]
From: Michael S. <st...@ar...> - 2006-04-24 17:13:32
Andrea Goethals wrote:
.....
>> Not at the moment but I've been playing and we could add a new step
>> that did nothing but read from a data source and add metadata from
>> the data source to the Nutch(WAX) segment (In particular, rewrite
>> the parse_data file in the segment, the file that holds the
>> 'metadata' such as fetch time, etc.).
>>
>> So, you wouldn't have to touch the ARCs, just the product of the
>> Nutch(WAX) parse.
>>
>> You could tag a page as being of multiple collections: E.g. of
>> collection 1, 7 and 8.
>>
>> After adding the metadata, you'd have to reindex.
>>
>> Would that work for you?
>
> That would be great! There is a need in general (at least for us -
> probably for others?) to be able to add metadata to already-harvested
> content. We have another situation like this where the curators would
> want to add subjects to particular URLs. So we could come up with a
> generic solution to this - add any field (e.g. collection, subject),
> tell it which ? to apply this to. Would the new fields be associated
> at the ARC-level or URI level?

At URI level. I've added an RFE:
http://sourceforge.net/tracker/index.php?func=detail&aid=1475667&group_id=118427&atid=681140
I'll start in on it after the 0.6.0 release of nutchwax (Should be any
week soon -- but I've been saying that for a while now...).

...

> I think that that could work. To reduce the query length problem you
> could hide the actual query syntax from the user. Like the
> archiveit.org way, you could use the collection IDs in the query to
> keep the query shorter:
> 'collection:1,7,8 cats'
> by either translating the user's selection of asia, europe and
> australia from a drop-down list, or translating the user's typed-in
> collection:asia,europe,australia to collection:1,7,8 before the query
> is executed.

Yes. That sounds right.
St.Ack
From: Andrea G. <and...@ha...> - 2006-04-24 17:00:26
Hi Michael,

>> I have been reading documentation for nutchwax, nutch and lucene
>> trying to figure out if there's a way to do what I need to do:
>> basically to allow curators to "tag" particular archived web sites
>> as belonging to a collection for the purposes of restricting
>> searches to that collection and for generating web pages related to
>> that collection.
>>
>> The trick is that these collections can be defined at any time,
>> post-harvest (heritrix), post-index (nutchwax). And web sites can
>> belong to multiple collections. Sometimes the collections are
>> hierarchical, sometimes they are not.
>>
>> This is my thinking on it so far. I'm hoping that someone will step
>> in with a better or more elegant way to do this.
>>
>> If I restrict their collections to be defined by a set of seed URIs
>> (rather than all archived URIs) I think it's more manageable.
>> Picture a database (call it myDB) that manages a list of seed URIs,
>> each associated with a unique seedId. I can perform separate
>> heritrix crawls per seed URI. When I index that set of ARC files
>> (associated with a single seed URI) I can set the command-line
>> "collection" field to the seed ID. Then all URIs associated with a
>> seed URI will have the same indexed "collection" value. This posting
>> might be hard to read because the word collection is overused - the
>> nutchwax collection field would be used in this case to group
>> together sites that came from the same seed.
>>
>> Then in that separate database (myDB) I can manage associations
>> defined at any time between curator-defined collections and the
>> seedIds. I wouldn't want to add these collection "tags" to the index
>> because to do this I'd probably have to add these new collection
>> values to the content in the ARC files, then write a parse, index
>> and query filter to handle the new field, right? Or is there a way
>> to just add a field directly to the index for a set of seedIds?
>
> Not at the moment but I've been playing and we could add a new step
> that did nothing but read from a data source and add metadata from
> the data source to the Nutch(WAX) segment (In particular, rewrite the
> parse_data file in the segment, the file that holds the 'metadata'
> such as fetch time, etc.).
>
> So, you wouldn't have to touch the ARCs, just the product of the
> Nutch(WAX) parse.
>
> You could tag a page as being of multiple collections: E.g. of
> collection 1, 7 and 8.
>
> After adding the metadata, you'd have to reindex.
>
> Would that work for you?

That would be great! There is a need in general (at least for us -
probably for others?) to be able to add metadata to already-harvested
content. We have another situation like this where the curators would
want to add subjects to particular URLs. So we could come up with a
generic solution to this - add any field (e.g. collection, subject),
tell it which ? to apply this to. Would the new fields be associated at
the ARC-level or URI level?

>> So the idea is to translate a user's search query into indexed
>> fields using the associations in myDB. Say the user searches with
>> (and myCollection is the field name for the curator's collection,
>> which isn't a lucene field):
>> myCollection:Asia cats
>> then this could be translated behind-the-scenes to a lucene query:
>> collection:seed1 OR collection:seed7 OR collection:seed8 cats
>> (assuming the crawls associated with seeds 1, 7 and 8 were mapped to
>> the Asia collection.)
>>
>> I don't think nutchWAX can support the OR queries yet, is that right?
>
> That's right. No OR yet.
>
> But, we're sort of having a similar problem to you here at the
> archive (archiveit.org in particular).
>
> They have done similar to your idea in that they have tried to add in
> a little indirection, naming collections by ID instead of explicitly.
> Querying one collection works now, or querying all collections, but
> awkward is querying a couple of collections. One thought is to amend
> the collection query-time plugin so it can take a list of
> collections: E.g. 'collection:asia,europe,australia cats'. This would
> find instances of cats in all three listed collections. Would break
> if the list of collections was in the hundreds I'd imagine. And it's
> not what you want.

I think that that could work. To reduce the query length problem you
could hide the actual query syntax from the user. Like the
archiveit.org way, you could use the collection IDs in the query to
keep the query shorter:
'collection:1,7,8 cats'
by either translating the user's selection of asia, europe and
australia from a drop-down list, or translating the user's typed-in
collection:asia,europe,australia to collection:1,7,8 before the query
is executed.

> I suppose you could do 3 separate queries, aggregating the results?
> Would that be onerous?

I'll probably try the single query in a list first to not have to deal
with ordering the results.

thanks,
Andrea

> St.Ack
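A sketch of the pre-query translation Andrea describes (the class, the
name-to-ID map, and the query syntax handling are all illustrative
assumptions, not NutchWAX code; in practice the map would come from
myDB):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class CollectionQueryRewriter {
        // Illustrative mapping of curator collection names to IDs.
        private static final Map<String, String> NAME_TO_ID =
                new HashMap<String, String>();
        static {
            NAME_TO_ID.put("asia", "1");
            NAME_TO_ID.put("europe", "7");
            NAME_TO_ID.put("australia", "8");
        }

        // "collection:asia,europe,australia cats" -> "collection:1,7,8 cats"
        static String rewrite(String query) {
            Matcher m = Pattern.compile("collection:(\\S+)").matcher(query);
            if (!m.find()) {
                return query;
            }
            StringBuilder ids = new StringBuilder();
            for (String name : m.group(1).split(",")) {
                if (ids.length() > 0) {
                    ids.append(',');
                }
                String id = NAME_TO_ID.get(name.toLowerCase());
                ids.append(id != null ? id : name); // pass unknowns through
            }
            return m.replaceFirst("collection:" + ids);
        }

        public static void main(String[] args) {
            System.out.println(rewrite("collection:asia,europe,australia cats"));
            // prints: collection:1,7,8 cats
        }
    }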
From: Michael S. <st...@ar...> - 2006-04-24 16:38:33
Andrea Goethals wrote:
> Hello,

Hello Andrea.

> I have been reading documentation for nutchwax, nutch and lucene
> trying to figure out if there's a way to do what I need to do:
> basically to allow curators to "tag" particular archived web sites as
> belonging to a collection for the purposes of restricting searches to
> that collection and for generating web pages related to that
> collection.
>
> The trick is that these collections can be defined at any time,
> post-harvest (heritrix), post-index (nutchwax). And web sites can
> belong to multiple collections. Sometimes the collections are
> hierarchical, sometimes they are not.
>
> This is my thinking on it so far. I'm hoping that someone will step in
> with a better or more elegant way to do this.
>
> If I restrict their collections to be defined by a set of seed URIs
> (rather than all archived URIs) I think it's more manageable. Picture
> a database (call it myDB) that manages a list of seed URIs, each
> associated with a unique seedId. I can perform separate heritrix
> crawls per seed URI. When I index that set of ARC files (associated
> with a single seed URI) I can set the command-line "collection" field
> to the seed ID. Then all URIs associated with a seed URI will have the
> same indexed "collection" value. This posting might be hard to read
> because the word collection is overused - the nutchwax collection
> field would be used in this case to group together sites that came
> from the same seed.
>
> Then in that separate database (myDB) I can manage associations
> defined at any time between curator-defined collections and the
> seedIds. I wouldn't want to add these collection "tags" to the index
> because to do this I'd probably have to add these new collection
> values to the content in the ARC files, then write a parse, index and
> query filter to handle the new field, right? Or is there a way to just
> add a field directly to the index for a set of seedIds?

Not at the moment but I've been playing and we could add a new step
that did nothing but read from a data source and add metadata from the
data source to the Nutch(WAX) segment (In particular, rewrite the
parse_data file in the segment, the file that holds the 'metadata' such
as fetch time, etc.).

So, you wouldn't have to touch the ARCs, just the product of the
Nutch(WAX) parse.

You could tag a page as being of multiple collections: E.g. of
collection 1, 7 and 8.

After adding the metadata, you'd have to reindex.

Would that work for you?

> So the idea is to translate a user's search query into indexed fields
> using the associations in myDB. Say the user searches with (and
> myCollection is the field name for the curator's collection, which
> isn't a lucene field):
> myCollection:Asia cats
> then this could be translated behind-the-scenes to a lucene query:
> collection:seed1 OR collection:seed7 OR collection:seed8 cats
> (assuming the crawls associated with seeds 1, 7 and 8 were mapped to
> the Asia collection.)
>
> I don't think nutchWAX can support the OR queries yet, is that right?

That's right. No OR yet.

But, we're sort of having a similar problem to you here at the archive
(archiveit.org in particular).

They have done similar to your idea in that they have tried to add in a
little indirection, naming collections by ID instead of explicitly.
Querying one collection works now, or querying all collections, but
awkward is querying a couple of collections. One thought is to amend
the collection query-time plugin so it can take a list of collections:
E.g. 'collection:asia,europe,australia cats'. This would find instances
of cats in all three listed collections. Would break if the list of
collections was in the hundreds, I'd imagine. And it's not what you
want.

I suppose you could do 3 separate queries, aggregating the results?
Would that be onerous?

St.Ack
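A pure-Lucene sketch of the query shape such a list-taking plugin would
produce (this is not the actual Nutch query-filter API; the class name
is made up, and a Lucene version with BooleanClause.Occur is assumed):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class CollectionListClause {
        // Expand "asia,europe,australia" into an OR across the
        // collection field; the caller would AND this with the rest
        // of the translated query.
        static Query collectionClause(String csv) {
            BooleanQuery or = new BooleanQuery();
            for (String name : csv.split(",")) {
                or.add(new TermQuery(new Term("collection", name)),
                        BooleanClause.Occur.SHOULD);
            }
            return or;
        }

        public static void main(String[] args) {
            System.out.println(collectionClause("asia,europe,australia"));
            // prints: collection:asia collection:europe collection:australia
        }
    }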
From: Andrea G. <and...@ha...> - 2006-04-21 18:54:17
Hello,

I have been reading documentation for nutchwax, nutch and lucene trying
to figure out if there's a way to do what I need to do: basically to
allow curators to "tag" particular archived web sites as belonging to a
collection for the purposes of restricting searches to that collection
and for generating web pages related to that collection.

The trick is that these collections can be defined at any time,
post-harvest (heritrix), post-index (nutchwax). And web sites can
belong to multiple collections. Sometimes the collections are
hierarchical, sometimes they are not.

This is my thinking on it so far. I'm hoping that someone will step in
with a better or more elegant way to do this.

If I restrict their collections to be defined by a set of seed URIs
(rather than all archived URIs) I think it's more manageable. Picture a
database (call it myDB) that manages a list of seed URIs, each
associated with a unique seedId. I can perform separate heritrix crawls
per seed URI. When I index that set of ARC files (associated with a
single seed URI) I can set the command-line "collection" field to the
seed ID. Then all URIs associated with a seed URI will have the same
indexed "collection" value. This posting might be hard to read because
the word collection is overused - the nutchwax collection field would
be used in this case to group together sites that came from the same
seed.

Then in that separate database (myDB) I can manage associations defined
at any time between curator-defined collections and the seedIds. I
wouldn't want to add these collection "tags" to the index because to do
this I'd probably have to add these new collection values to the
content in the ARC files, then write a parse, index and query filter to
handle the new field, right? Or is there a way to just add a field
directly to the index for a set of seedIds?

So the idea is to translate a user's search query into indexed fields
using the associations in myDB. Say the user searches with (and
myCollection is the field name for the curator's collection, which
isn't a lucene field):
myCollection:Asia cats
then this could be translated behind-the-scenes to a lucene query:
collection:seed1 OR collection:seed7 OR collection:seed8 cats
(assuming the crawls associated with seeds 1, 7 and 8 were mapped to
the Asia collection.)

I don't think nutchWAX can support the OR queries yet, is that right?
Has anyone else figured out a different way to do this or have a
different idea?

thanks,
Andrea
From: <st...@ar...> - 2006-04-06 21:59:08
Excellent Oskar!
Do you want us to host your firefox extension at archive-access? If so,
we can set up a subproject for it and give you access.
St.Ack
Oskar Grenholm wrote:
> Hi everyone!
>
> Let me first introduce me to those of you who don't know me already.
> My name is Oskar Grenholm and I work as a programmer at The National Library
> of Sweden. I mainly work with things related to our web archive here.
>
> Lately I have made some minor improvements to the way the proxy-mode works in
> the Open Wayback Machine. Those changes have made it possible to surf not
> only the most recent copy of a page in the web archive, but instead any copy
> available.
> This can be done with just the Wayback Machine, but to aid (and perhaps
> simplify) the surfing I have also started working on a Firefox extension that
> will help the user with common tasks often encountered when surfing a web
> archive. Among the things this WAX Toolbar does is providing a search field
> for searching the Wayback Machine for different URL:s OR do a full-text
> search from a NutchWAX index (if one is available of course). You can also
> use the toolbar to switch between proxy-mode and the regular Internet, and
> when in proxy-mode easily go back and forth in time.
>
> The changes made to the Wayback are not many. The main idea is that you have a
> BDB index that holds mappings between id:s (a unique id if the toolbar was
> used, otherwise the ip-address the request was made from) and a preferred
> time to surf at. This timestamp is set either when you choose a page to visit
> from the search interface in the WB or by the WAX Toolbar.
> Then for each request made to the proxy the WB will look up this timestamp and
> return the page that is the closest in time.
>
> Patches for these changes are attached to this e-mail. Four of the files are
> earlier existing files that have been modified somewhat and two of them are
> new (BDBMap.java and Redirect.jsp).
>
> Attached is also a tar-file containing the source for the Firefox extension.
> If you untar this and enter the directory you can just run 'ant' and a file
> named WaxToolbar.xpi will be built. That is the actual Firefox extension and
> it can be installed as any other extension (i.e., double-clicking it from
> within Firefox).
> When the extension is installed (and after a re-start of Firefox) a new
> toolbar will be there. In the Tools menu there will also be a WAX Toolbar
> Configuration option. Using this you can set the proxy to use (the WB) and a
> server running NutchWAX.
>
> Finally I have attached an example of a web.xml that can be used when running
> the WB with these new changes and the WAX Toolbar. In it some new stuff has
> been added, namely a parameter specifying the redirect path (the Redirect.jsp
> mentioned above) and a servlet called xmlquery that runs in parallell with
> the normal query interface and is used by the extension to find the times a
> page has been archived.
>
> So, let the feedback begin!
>
> Regards, Oskar.
> ------------------------------------------------------------------------
>
> Index: BDBMap.java
> ===================================================================
> RCS file: BDBMap.java
> diff -N BDBMap.java
> --- /dev/null 1 Jan 1970 00:00:00 -0000
> +++ BDBMap.java 1 Jan 1970 00:00:00 -0000
> @@ -0,0 +1,94 @@
> +/*
> + * Created on 2006-apr-05
> + *
> + * Copyright (C) 2006 Royal Library of Sweden.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> + */
> +package org.archive.wayback.core;
> +
> +import java.io.File;
> +import java.io.UnsupportedEncodingException;
> +
> +import com.sleepycat.je.Database;
> +import com.sleepycat.je.DatabaseConfig;
> +import com.sleepycat.je.DatabaseEntry;
> +import com.sleepycat.je.DatabaseException;
> +import com.sleepycat.je.Environment;
> +import com.sleepycat.je.EnvironmentConfig;
> +import com.sleepycat.je.LockMode;
> +import com.sleepycat.je.OperationStatus;
> +
> +public class BDBMap {
> +
> + protected Environment env = null;
> + protected Database db = null;
> + protected String name;
> + protected String dir;
> +
> + public BDBMap(String name, String dir) {
> + this.name = name;
> + this.dir = dir;
> + init();
> + }
> +
> + protected void init() {
> + try {
> + EnvironmentConfig envConf = new EnvironmentConfig();
> + envConf.setAllowCreate(true);
> + File envDir = new File(dir);
> + if (!envDir.exists())
> + envDir.mkdirs();
> + env = new Environment(envDir, envConf);
> +
> + DatabaseConfig dbConf = new DatabaseConfig();
> + dbConf.setAllowCreate(true);
> + dbConf.setSortedDuplicates(false);
> + db = env.openDatabase(null, name, dbConf);
> + } catch (DatabaseException e) {
> + e.printStackTrace();
> + }
> + }
> +
> + public void put(String keyStr, String valueStr) {
> + try {
> + DatabaseEntry key = new DatabaseEntry(keyStr.getBytes("UTF-8"));
> + DatabaseEntry data = new DatabaseEntry(valueStr.getBytes("UTF-8"));
> + db.put(null, key, data);
> + } catch (DatabaseException e) {
> + e.printStackTrace();
> + } catch (UnsupportedEncodingException e) {
> + e.printStackTrace();
> + }
> + }
> +
> + public String get(String keyStr) {
> + String result = null;
> + try {
> + DatabaseEntry key = new DatabaseEntry(keyStr.getBytes("UTF-8"));
> + DatabaseEntry data = new DatabaseEntry();
> + if (db.get(null, key, data, LockMode.DEFAULT) == OperationStatus.SUCCESS) {
> + byte[] bytes = data.getData();
> + result = new String(bytes, "UTF-8");
> + }
> + } catch (DatabaseException e) {
> + e.printStackTrace();
> + } catch (UnsupportedEncodingException e) {
> + e.printStackTrace();
> + }
> + return result;
> + }
> +
> +}
> ------------------------------------------------------------------------
>
> Index: ResultURIConverter.java
> ===================================================================
> RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/proxy/ResultURIConverter.java,v
> retrieving revision 1.3
> diff -u -r1.3 ResultURIConverter.java
> --- ResultURIConverter.java 1 Dec 2005 02:08:34 -0000 1.3
> +++ ResultURIConverter.java 6 Apr 2006 11:36:25 -0000
> @@ -41,10 +41,19 @@
> * @version $Date: 2005/12/01 02:08:34 $, $Revision: 1.3 $
> */
> public class ResultURIConverter implements ReplayResultURIConverter {
> - /* (non-Javadoc)
> +
> + private static final String REDIRECT_PATH_PROPERTY = "proxy.redirectpath";
> +
> + private String redirectPath;
> +
> + /* (non-Javadoc)
> * @see org.archive.wayback.ReplayResultURIConverter#init(java.util.Properties)
> */
> public void init(Properties p) throws ConfigurationException {
> + redirectPath = (String) p.get(REDIRECT_PATH_PROPERTY);
> + if (redirectPath == null || redirectPath.length() <= 0) {
> + throw new ConfigurationException("Failed to find " + REDIRECT_PATH_PROPERTY);
> + }
> }
>
> /* (non-Javadoc)
> @@ -52,10 +61,12 @@
> */
> public String makeReplayURI(SearchResult result) {
> String finalUrl = result.get(WaybackConstants.RESULT_URL);
> + String finalTime = result.get(WaybackConstants.RESULT_CAPTURE_DATE);
> if(!finalUrl.startsWith(WaybackConstants.HTTP_URL_PREFIX)) {
> finalUrl = WaybackConstants.HTTP_URL_PREFIX + finalUrl;
> }
> - return finalUrl;
> + //return finalUrl;
> + return redirectPath + "?url=" + finalUrl + "&time=" + finalTime;
> }
>
> /**
> @@ -70,6 +81,7 @@
> */
> public String makeRedirectReplayURI(SearchResult result, String url) {
> String finalUrl = url;
> + String finalTime = result.get(WaybackConstants.RESULT_CAPTURE_DATE);
> try {
>
> UURI origURI = UURIFactory.getInstance(url);
> @@ -86,6 +98,7 @@
> if(!finalUrl.startsWith(WaybackConstants.HTTP_URL_PREFIX)) {
> finalUrl = WaybackConstants.HTTP_URL_PREFIX + finalUrl;
> }
> - return finalUrl;
> + //return finalUrl;
> + return redirectPath + "?url=" + finalUrl + "&time=" + finalTime;
> }
> }
> ------------------------------------------------------------------------
>
> Index: Timestamp.java
> ===================================================================
> RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/core/Timestamp.java,v
> retrieving revision 1.7
> diff -u -r1.7 Timestamp.java
> --- Timestamp.java 16 Feb 2006 03:14:42 -0000 1.7
> +++ Timestamp.java 6 Apr 2006 11:34:06 -0000
> @@ -56,6 +56,11 @@
>
> private final static String[] months = { "Jan", "Feb", "Mar", "Apr", "May",
> "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };
> +
> + // Acts as a mapping between an ID and a timestamp to surf at.
> + // The dir should probably be configurable somehow.
> + private static String BDB_DIR = System.getProperty("java.io.tmpdir") + "/wayback/bdb";
> + private static BDBMap idToTimestamp = new BDBMap("IdToTimestamp", BDB_DIR);
>
> private String dateStr = null;
> private Date date = null;
> @@ -430,6 +435,7 @@
> public static Timestamp currentTimestamp() {
> return new Timestamp(new Date());
> }
> +
> /**
> * @return Timestamp object representing the latest possible date.
> */
> @@ -437,12 +443,20 @@
> return currentTimestamp();
> }
>
> -
> /**
> * @return Timestamp object representing the earliest possible date.
> */
> public static Timestamp earliestTimestamp() {
> return new Timestamp(SSE_1996);
> }
> +
> + public static String getTimestampForId(String ip) {
> + String dateStr = idToTimestamp.get(ip);
> + return (dateStr != null) ? dateStr : currentTimestamp().getDateStr();
> + }
> +
> + public static void addTimestampForId(String ip, String time) {
> + idToTimestamp.put(ip, time);
> + }
>
> }
> ------------------------------------------------------------------------
>
> Index: Redirect.jsp
> ===================================================================
> RCS file: Redirect.jsp
> diff -N Redirect.jsp
> --- /dev/null 1 Jan 1970 00:00:00 -0000
> +++ Redirect.jsp 1 Jan 1970 00:00:00 -0000
> @@ -0,0 +1,14 @@
> +<%@ page import="org.archive.wayback.core.Timestamp" %>
> +
> +<%
> + String url = request.getParameter("url");
> + String time = request.getParameter("time");
> +
> + // Put time-mapping for this id, or if no id, the ip-addr.
> + String id = request.getHeader("Proxy-Id");
> + if(id == null) id = request.getRemoteAddr();
> + Timestamp.addTimestampForId(id, time);
> +
> + // Now redirect to the page the user wanted.
> + response.sendRedirect(url);
> +%>
> ------------------------------------------------------------------------
>
> Index: ReplayFilter.java
> ===================================================================
> RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/proxy/ReplayFilter.java,v
> retrieving revision 1.4
> diff -u -r1.4 ReplayFilter.java
> --- ReplayFilter.java 18 Jan 2006 02:04:12 -0000 1.4
> +++ ReplayFilter.java 6 Apr 2006 11:36:02 -0000
> @@ -84,10 +84,15 @@
> referer = "";
> }
> wbRequest.put(WaybackConstants.REQUEST_REFERER_URL,referer);
> -
> - wbRequest.put(WaybackConstants.REQUEST_EXACT_DATE,
> - Timestamp.currentTimestamp().getDateStr());
> -
> +
> + // Original
> + //wbRequest.put(WaybackConstants.REQUEST_EXACT_DATE, Timestamp.currentTimestamp().getDateStr());
> +
> + // Get the id from the request. If no id, use the ip-address instead.
> + // Then get the timestamp (or rather datestr) matching this id.
> + String id = httpRequest.getHeader("Proxy-Id");
> + if(id == null) id = httpRequest.getRemoteAddr();
> + wbRequest.put(WaybackConstants.REQUEST_EXACT_DATE, Timestamp.getTimestampForId(id));
>
> return wbRequest;
> }
> ------------------------------------------------------------------------
>
> Index: QueryServlet.java
> ===================================================================
> RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/query/QueryServlet.java,v
> retrieving revision 1.5
> diff -u -r1.5 QueryServlet.java
> --- QueryServlet.java 7 Mar 2006 23:22:20 -0000 1.5
> +++ QueryServlet.java 6 Apr 2006 11:38:30 -0000
> @@ -25,7 +25,9 @@
> package org.archive.wayback.query;
>
> import java.io.IOException;
> +import java.text.ParseException;
> import java.util.Enumeration;
> +import java.util.Iterator;
> import java.util.Properties;
>
> import javax.servlet.ServletConfig;
> @@ -39,7 +41,9 @@
> import org.archive.wayback.QueryRenderer;
> import org.archive.wayback.ReplayResultURIConverter;
> import org.archive.wayback.ResourceIndex;
> +import org.archive.wayback.core.SearchResult;
> import org.archive.wayback.core.SearchResults;
> +import org.archive.wayback.core.Timestamp;
> import org.archive.wayback.core.WaybackLogic;
> import org.archive.wayback.core.WaybackRequest;
> import org.archive.wayback.exception.BadQueryException;
> @@ -119,6 +123,14 @@
> if (wbRequest.get(WaybackConstants.REQUEST_TYPE).equals(
> WaybackConstants.REQUEST_URL_QUERY)) {
>
> + // Annotate the closest matching hit so that it can
> + // be retrieved later from the xml.
> + try {
> + annotateClosest(results, wbRequest, httpRequest);
> + } catch (ParseException e) {
> + e.printStackTrace();
> + }
> +
> renderer.renderUrlResults(httpRequest, httpResponse,
> wbRequest, results, uriConverter);
>
> @@ -144,4 +156,34 @@
>
> }
> }
> +
> + // Annotates the search result closest in time to the timestamp
> + // registered for this request's id (Proxy-Id header or remote address).
> + private void annotateClosest(SearchResults results,
> + WaybackRequest wbRequest, HttpServletRequest request) throws ParseException {
> +
> + SearchResult closest = null;
> + long closestDistance = 0;
> + SearchResult cur = null;
> + String id = request.getHeader("Proxy-Id");
> + if(id == null) id = request.getRemoteAddr();
> + String requestsDate = Timestamp.getTimestampForId(id);
> + Timestamp wantTimestamp;
> + wantTimestamp = Timestamp.parseBefore(requestsDate);
> +
> + Iterator itr = results.iterator();
> + while (itr.hasNext()) {
> + cur = (SearchResult) itr.next();
> + long curDistance;
> + Timestamp curTimestamp = Timestamp.parseBefore(cur
> + .get(WaybackConstants.RESULT_CAPTURE_DATE));
> + curDistance = curTimestamp.absDistanceFromTimestamp(wantTimestamp);
> +
> + if ((closest == null) || (curDistance < closestDistance)) {
> + closest = cur;
> + closestDistance = curDistance;
> + }
> + }
> + if (closest != null) closest.put("closest", "true");
> + }
> }
> ------------------------------------------------------------------------
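
The selection loop in annotateClosest() is a standard nearest-value scan; a
standalone sketch of the same shape follows. Comparing the 14-digit date
strings as longs is only a rough stand-in for
Timestamp.absDistanceFromTimestamp(), which I assume measures real time
distance, but the structure is the point:

    // Standalone sketch of the closest-capture scan over 14-digit date
    // strings. Returns null for an empty input, which is why the servlet
    // version needs the null guard before annotating.
    public class ClosestCaptureSketch {
        public static String closest(String wantDateStr, String[] captureDates) {
            long want = Long.parseLong(wantDateStr);
            String closest = null;
            long closestDistance = Long.MAX_VALUE;
            for (String capture : captureDates) {
                long distance = Math.abs(Long.parseLong(capture) - want);
                if (distance < closestDistance) {
                    closest = capture;
                    closestDistance = distance;
                }
            }
            return closest;
        }
    }
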
>
> <?xml version="1.0"?>
> <!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
> "http://java.sun.com/dtd/web-app_2_3.dtd">
> <web-app>
>
> <!-- General Installation information
> -->
>
> <context-param>
> <param-name>installationname</param-name>
> <param-value>Local Proxy Installation</param-value>
> <description>
> This text will appear on the Wayback Configuration and Status page
> and may assist in determining which installation users are viewing
> via their web browser in environments with multiple Wayback
> installations.
> </description>
> </context-param>
>
>
> <!-- Local Arc Path Configuration:
> used by both indexpipeline and LocalARCResourceStore
> -->
>
> <context-param>
> <param-name>arcpath</param-name>
> <param-value>/tmp/wayback/arcs</param-value>
> <description>
> Directory where ARC files are found (possibly where Heritrix writes them).
> This directory must exist.
> </description>
> </context-param>
>
>
>
> <!-- ResourceStore Configuration -->
>
> <context-param>
> <param-name>resourcestore.classname</param-name>
> <param-value>org.archive.wayback.localresourcestore.LocalARCResourceStore</param-value>
> <description>Class that implements ResourceStore for this Wayback</description>
> </context-param>
>
>
>
> <!-- ResourceIndex Configuration -->
>
> <context-param>
> <param-name>resourceindex.classname</param-name>
> <param-value>org.archive.wayback.cdx.LocalBDBResourceIndex</param-value>
> <description>Class that implements ResourceIndex for this Wayback</description>
> </context-param>
>
> <context-param>
> <param-name>resourceindex.indexpath</param-name>
> <param-value>/tmp/wayback/index</param-value>
> <description>
> LocalBDBResourceIndex specific directory to store the BDB files.
> This directory must exist.
> </description>
> </context-param>
>
> <context-param>
> <param-name>resourceindex.dbname</param-name>
> <param-value>DB1</param-value>
> <description>
> LocalBDBResourceIndex specific name for BDB database
> </description>
> </context-param>
>
>
> <!-- ResourceIndex Pipeline Configuration -->
>
> <context-param>
> <param-name>indexpipeline.workpath</param-name>
> <param-value>/tmp/wayback/pipeline</param-value>
> <description>
> LocalBDBResourceIndex specific directory to store flag files and
> temporary index data. This directory must exist.
> </description>
> </context-param>
>
> <context-param>
> <param-name>indexpipeline.runpipeline</param-name>
> <param-value>1</param-value>
> <description>
> if set to '1' then a background indexing thread will automatically
> update the BDB index when new ARC files are noticed in the 'arcpath'
> directory.
> </description>
> </context-param>
>
> <!-- Pipeline Filter Configuration
> this enables a trivial (and still in-progress) UI for viewing the
> pipeline status.
> -->
>
> <filter>
> <filter-name>PipelineFilter</filter-name>
> <filter-class>org.archive.wayback.cdx.indexer.PipelineFilter</filter-class>
> <init-param>
> <param-name>pipeline.statusjsp</param-name>
> <param-value>jsp/PipelineUI/PipelineStatus.jsp</param-value>
> </init-param>
> </filter>
> <filter-mapping>
> <filter-name>PipelineFilter</filter-name>
> <url-pattern>/pipeline</url-pattern>
> </filter-mapping>
>
>
>
>
> <!-- Query Servlet Configuration -->
>
> <servlet>
> <servlet-name>QueryServlet</servlet-name>
> <servlet-class>org.archive.wayback.query.QueryServlet</servlet-class>
> <init-param>
> <param-name>queryui.jsppath</param-name>
> <param-value>jsp/QueryUI</param-value>
> </init-param>
> </servlet>
> <servlet-mapping>
> <servlet-name>QueryServlet</servlet-name>
> <url-pattern>/query</url-pattern>
> </servlet-mapping>
>
> <!-- XMLQuery Servlet Configuration -->
>
> <servlet>
> <servlet-name>XMLQueryServlet</servlet-name>
> <servlet-class>org.archive.wayback.query.QueryServlet</servlet-class>
> <init-param>
> <param-name>queryui.jsppath</param-name>
> <param-value>jsp/QueryXMLUI</param-value>
> </init-param>
> </servlet>
> <servlet-mapping>
> <servlet-name>XMLQueryServlet</servlet-name>
> <url-pattern>/xmlquery</url-pattern>
> </servlet-mapping>
>
> <!-- QueryUI Configuration -->
>
> <context-param>
> <param-name>queryrenderer.classname</param-name>
> <param-value>org.archive.wayback.query.Renderer</param-value>
> <description>Implementation responsible for drawing Index Query results</description>
> </context-param>
>
> <context-param>
> <param-name>proxy.redirectpath</param-name>
> <param-value>/jsp/QueryUI/Redirect.jsp</param-value>
> </context-param>
>
>
>
> <!-- Replay Servlet Configuration -->
>
> <servlet>
> <servlet-name>ReplayServlet</servlet-name>
> <servlet-class>org.archive.wayback.replay.ReplayServlet</servlet-class>
> </servlet>
> <servlet-mapping>
> <servlet-name>ReplayServlet</servlet-name>
> <url-pattern>/replay</url-pattern>
> </servlet-mapping>
>
>
>
> <!-- Proxy RawReplayUI Configuration -->
>
> <context-param>
> <param-name>replayrenderer.classname</param-name>
> <param-value>org.archive.wayback.proxy.RawReplayRenderer</param-value>
> <description>Implementation responsible for drawing replayed resources and replay error messages</description>
> </context-param>
>
> <context-param>
> <param-name>replayui.jsppath</param-name>
> <param-value>jsp/ReplayUI</param-value>
> <description>
> RawReplayUI specific path to jsp pages. relative to webapp/
> </description>
> </context-param>
>
> <!-- Proxy URI Conversion Configuration -->
>
> <context-param>
> <param-name>replayuriconverter.classname</param-name>
> <param-value>org.archive.wayback.proxy.ResultURIConverter</param-value>
> <description>Class that implements translation of index results to Replayable URIs for this Wayback</description>
> </context-param>
>
> <!-- Proxy ReplayFilter Configuration -->
>
> <filter>
> <filter-name>ReplayFilter</filter-name>
> <filter-class>org.archive.wayback.proxy.ReplayFilter</filter-class>
>
> <init-param>
> <param-name>handler.url</param-name>
> <param-value>/replay</param-value>
> </init-param>
> </filter>
> <filter-mapping>
> <filter-name>ReplayFilter</filter-name>
> <url-pattern>/*</url-pattern>
> </filter-mapping>
>
> </web-app>
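
With this web.xml deployed, proxy mode should be testable by pointing an HTTP
client's proxy setting at the Tomcat host and port and fetching any archived
URL; until Redirect.jsp records a mapping, ReplayFilter falls back to the
current timestamp. A hedged smoke test, where the localhost:8080 deployment,
the target URL and the id value are all assumptions:

    // Hypothetical smoke test: fetch a page through the Wayback proxy with an
    // explicit Proxy-Id header and print the response code.
    import java.net.HttpURLConnection;
    import java.net.InetSocketAddress;
    import java.net.Proxy;
    import java.net.URL;

    public class ProxySmokeTest {
        public static void main(String[] args) throws Exception {
            Proxy proxy = new Proxy(Proxy.Type.HTTP,
                    new InetSocketAddress("localhost", 8080));
            HttpURLConnection conn = (HttpURLConnection)
                    new URL("http://example.com/").openConnection(proxy);
            conn.setRequestProperty("Proxy-Id", "user42");
            System.out.println("HTTP " + conn.getResponseCode());
            conn.disconnect();
        }
    }
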