From: Martin B. <xb...@fi...> - 2007-10-27 20:53:16
Hi,

I have managed to index documents using NutchWAX in distributed mode several times. But now there is a problem I cannot get past. This time not all the computers are under the same domain: the machine which hosts the namenode and the jobtracker is under webarchiv.cz, while all the datanodes are under fi.muni.cz (by the way, all the computers are in the same building).

When the 5th job starts (dedup 1: urls by time), 'INFO' messages are interleaved with 'WARN' ones in the logs, like these:

jobtracker:

  INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_0005_m_000003_3:
  java.lang.ArrayIndexOutOfBoundsException: -1
          at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:109)
          at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:177)
          at org.apache.hadoop.mapred.MapTask$3.next(MapTask.java:203)
          at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:215)
          at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1388)

datanodes:

  WARN org.apache.hadoop.dfs.DataNode: DataXCeiver
  java.io.IOException: Block blk_-7402203219236206647 has already been started (though not completed), and thus cannot be created.
          at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:437)
          at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:721)
          at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:550)
          at java.lang.Thread.run(Thread.java:619)

  WARN org.apache.hadoop.dfs.DataNode: Failed to transfer blk_-7402203219236206647 to nymfe01/147.251.53.11:50010
  java.net.SocketException: Broken pipe
          at java.net.SocketOutputStream.socketWrite0(Native Method)
          at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
          at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
          at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
          at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
          at java.io.DataOutputStream.write(DataOutputStream.java:90)
          at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:974)
          at java.lang.Thread.run(Thread.java:619)

  WARN org.apache.hadoop.dfs.DataNode: Failed to transfer blk_-5576786832054029538 to nymfe05/147.251.53.15:50010
  java.net.SocketException: Connection reset
          at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
          at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
          at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
          at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
          at java.io.DataOutputStream.write(DataOutputStream.java:90)
          at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:974)
          at java.lang.Thread.run(Thread.java:619)

Then it crashes and my terminal says:

  07/10/27 22:20:40 INFO indexer.DeleteDuplicates: Dedup: adding indexes in: output/indexes
  07/10/27 22:20:43 INFO mapred.JobClient: Running job: job_0005
  07/10/27 22:20:44 INFO mapred.JobClient:  map 0% reduce 0%
  07/10/27 22:21:05 INFO mapred.JobClient:  map 100% reduce 100%
  Exception in thread "main" java.io.IOException: Job failed!
          at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
          at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:433)
          at org.archive.access.nutch.Nutchwax.doDedup(Nutchwax.java:257)
          at org.archive.access.nutch.Nutchwax.doAll(Nutchwax.java:156)
          at org.archive.access.nutch.Nutchwax.doJob(Nutchwax.java:389)
          at org.archive.access.nutch.Nutchwax.main(Nutchwax.java:674)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:585)
          at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

Can anybody help?

Thanks,
Martin Bella
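
P.S. In case the cross-domain layout is relevant: every node addresses the master by its fully qualified hostname. Below is a rough sketch of the relevant part of my hadoop-site.xml; the hostname master.webarchiv.cz is a placeholder, not the real machine name, and my actual file has more properties:

  <?xml version="1.0"?>
  <configuration>
    <!-- namenode, in the webarchiv.cz domain -->
    <property>
      <name>fs.default.name</name>
      <value>master.webarchiv.cz:9000</value>
    </property>
    <!-- jobtracker, on the same machine -->
    <property>
      <name>mapred.job.tracker</name>
      <value>master.webarchiv.cz:9001</value>
    </property>
  </configuration>

The conf/slaves file lists the fi.muni.cz datanodes (nymfe01, nymfe05, and so on) by their fully qualified names.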