I downloaded the new version of nutch from CVS, and I think the
indexarcs.sh script still doesn't work well.
(In the previous version I had to use absolute paths and no links in
directories.)
With relative paths I get the same result.
The archive dir contains symlinks to the ARCs.
./bin/indexarcs.sh -s /home/nwa/nutchwax/archive -d /home/nwa/nutchwax/data -c test
St zář 14 23:12:36 CEST 2005 Checking environment variables.
St zář 14 23:12:36 CEST 2005 Cleaning up all /home/nwa/nutchwax/data content.
St zář 14 23:12:36 CEST 2005 Creating new queue, and segments.
St zář 14 23:12:36 CEST 2005 Started segmenting.
St zář 14 23:12:36 CEST 2005 Started build of link database.
050914 231237 parsing file:/home/nwa/nutchwax/conf/nutch-default.xml
050914 231238 parsing file:/home/nwa/nutchwax/conf/nutch-site.xml
050914 231238 No FS indicated, using default:local
050914 231238 Created webdb at LocalFS,/home/nwa/nutchwax/data/db
050914 231239 parsing file:/home/nwa/nutchwax/conf/nutch-default.xml
050914 231240 parsing file:/home/nwa/nutchwax/conf/nutch-site.xml
050914 231240 No FS indicated, using default:local
050914 231240 Updating /home/nwa/nutchwax/data/db
050914 231240 Updating for /home/nwa/nutchwax/data/segments/*
Exception in thread "main" java.io.FileNotFoundException: /home/nwa/nutchwax/data/segments/*/fetcher/data
    at org.apache.nutch.fs.LocalFileSystem.open(LocalFileSystem.java:93)
    at org.apache.nutch.io.SequenceFile$Reader.<init>(SequenceFile.java:194)
    at org.apache.nutch.io.SequenceFile$Reader.<init>(SequenceFile.java:187)
    at org.apache.nutch.io.MapFile$Reader.<init>(MapFile.java:190)
    at org.apache.nutch.io.MapFile$Reader.<init>(MapFile.java:179)
    at org.apache.nutch.io.ArrayFile$Reader.<init>(ArrayFile.java:50)
    at org.apache.nutch.tools.UpdateDatabaseTool.updateForSegment(UpdateDatabaseTool.java:92)
    at org.apache.nutch.tools.UpdateDatabaseTool.main(UpdateDatabaseTool.java:366)
050914 231242 parsing file:/home/nwa/nutchwax/conf/nutch-default.xml
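
The literal `*` in the FileNotFoundException path suggests that the segments/* pattern matched nothing, so it reached UpdateDatabaseTool unexpanded. That is only a guess from the log above, not something confirmed in the script. A minimal sketch of a check that could be run before the update step, assuming the data dir layout shown in the log; `count_segments` is a hypothetical helper and not part of nutchwax:

```shell
#!/bin/sh
# Hypothetical helper (not part of nutchwax): count the segment
# directories under <data_dir>/segments.
count_segments() {
    data_dir=$1
    n=0
    for seg in "$data_dir"/segments/*/; do
        # If nothing matches, the shell leaves the pattern as a literal
        # string, so confirm each entry is a real directory.
        if [ -d "$seg" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Running `count_segments /home/nwa/nutchwax/data` after the "Started segmenting" step and getting 0 would mean no segment was written, which would point at the segmenting step (possibly the symlinked ARCs) rather than at the database update itself.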