From: alexis a. <alx...@ya...> - 2007-06-28 05:17:55
Hi,

We are encountering a new set of errors aside from the socket timeout. Subsequent runs produce the following errors, which cause the job to fail. We hope you can guide us on this issue.

2007-06-27 08:48:04,001 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_0003_m_000115_0:
java.io.IOException: Could not obtain block: blk_-8188170094415436519 file=/user/outputs/segments/20070626172746-test/parse_data/part-00023/data offset=33845248
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:563)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:675)
        at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:170)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
        at java.io.DataInputStream.readFully(DataInputStream.java:176)
        at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:55)
        at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:89)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:404)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:330)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:371)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:58)
        at org.apache.hadoop.mapred.MapTask$3.next(MapTask.java:183)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:49)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:195)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1075)

2007-06-27 08:48:31,758 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0003_m_000154_0' has been lost.
2007-06-27 08:48:31,794 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0003_m_000154_1' to tip tip_0003_m_000154, for tracker 'tracker_orange.com:50050'
2007-06-27 08:48:32,544 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_0003_m_000146_0:
java.lang.RuntimeException: Summer buffer overflow b.len=4096, off=0, summed=512, read=4096, bytesPerSum=1, inSum=512
        at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:100)
        at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:170)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
        at java.io.DataInputStream.readFully(DataInputStream.java:176)
        at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:55)
        at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:89)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:404)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:330)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:371)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:58)
        at org.apache.hadoop.mapred.MapTask$3.next(MapTask.java:183)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:49)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:195)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1075)
Caused by: java.lang.ArrayIndexOutOfBoundsException
        at java.util.zip.CRC32.update(Unknown Source)
        at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:98)
        ... 15 more
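A note on the first trace: "Could not obtain block" generally means the DFSClient could not reach any datanode holding a live replica of that block, and with dfs.replication set to 1 (see the configuration quoted below) the loss of a single datanode makes its blocks unreadable. A minimal probe along these lines, assuming the stock org.apache.hadoop.fs API of this Hadoop generation (BlockProbe is a hypothetical helper, not part of Hadoop or NutchWAX; check class and method names against your release), streams the failing part file and reports the offset at which the read breaks:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Hypothetical probe: read a DFS file end to end and report where the
    // read fails, e.g. for the parse_data/part-00023/data file named in the
    // stack trace above.
    public class BlockProbe {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration(); // picks up hadoop-site.xml
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path(args[0]);
            byte[] buf = new byte[64 * 1024];
            long pos = 0;
            FSDataInputStream in = fs.open(file);
            try {
                int n;
                while ((n = in.read(buf, 0, buf.length)) > 0) {
                    pos += n; // advance past each successfully read chunk
                }
                System.out.println(file + ": read " + pos + " bytes cleanly");
            } catch (IOException e) {
                // Failing here, away from the map task, points at a missing
                // or corrupt block in DFS rather than at the job itself.
                System.err.println(file + ": read failed at offset " + pos + ": " + e);
            } finally {
                in.close();
            }
        }
    }

If such a probe fails at the same offset on every attempt, the block itself is likely gone from DFS and the map task is only the messenger.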
alexis artes <alx...@ya...> wrote:

Hi,

We are having problems doing incremental indexing. We initially indexed 3000 arc files, and we encountered the following error while trying to index 3000 more:

2007-06-19 02:49:25,135 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_0001_r_000035_0:
java.net.SocketTimeoutException: timed out waiting for rpc response
        at org.apache.hadoop.ipc.Client.call(Client.java:312)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:161)
        at org.apache.hadoop.dfs.$Proxy1.complete(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1126)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at org.apache.hadoop.fs.FSDataOutputStream$Summer.close(FSDataOutputStream.java:97)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:160)
        at org.apache.hadoop.io.MapFile$Writer.close(MapFile.java:118)
        at org.archive.access.nutch.ImportArcs$WaxFetcherOutputFormat$1.close(ImportArcs.java:687)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:281)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1075)

We are using 28 nodes. Our configuration in hadoop-site.xml is as follows:

<property>
  <name>fs.default.name</name>
  <value>apple001:9000</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>apple001:9001</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/opt/hadoop-0.5.0/filesystem/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/opt/hadoop-0.5.0/filesystem/data</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/opt/hadoop-0.5.0/filesystem/mapreduce/local</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>/opt/hadoop-0.5.0/temp/hadoop/mapred/system</value>
  <description>The shared directory where MapReduce stores control files.</description>
</property>
<property>
  <name>mapred.temp.dir</name>
  <value>/opt/hadoop-0.5.0/temp/hadoop/mapred/temp</value>
  <description>A shared directory for temporary files.</description>
</property>
<property>
  <name>mapred.map.tasks</name>
  <value>89</value>
  <description>Define mapred.map.tasks to be the number of slave hosts.</description>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>53</value>
  <description>Define mapred.reduce.tasks to be the number of slave hosts.</description>
</property>
<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of tasks that will be run simultaneously by a task tracker.</description>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

Moreover, what is the maximum number of arc files that can be indexed in the same batch? We tried 6000 but encountered errors.

Our system configuration: Scientific Linux CERN 2.4.21-32.0.1.EL.cernsmp, JDK 1.5, Hadoop 0.5, NutchWAX 0.8.

Best Regards,
Alex
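A note on the quoted configuration: both symptoms in this thread are consistent with its two riskiest settings. A dfs.replication of 1 leaves no fallback replica when a datanode drops out, and the RPC wait that surfaces as SocketTimeoutException is governed by the client timeout property (ipc.client.timeout, 60000 ms by default in Hadoop releases of this era; confirm the name against your hadoop-default.xml). A minimal sketch of overriding both per job rather than cluster-wide, with illustrative rather than tuned values:

    import org.apache.hadoop.mapred.JobConf;

    // Hypothetical helper: the property names ipc.client.timeout and
    // dfs.replication are assumed to apply in this Hadoop release; verify
    // them before relying on this.
    public class SafetyDefaults {
        public static JobConf apply(JobConf job) {
            job.set("ipc.client.timeout", "120000"); // double the default RPC wait (ms)
            job.set("dfs.replication", "2");         // keep a second copy of each new block
            return job;
        }
    }

Used, for example, as apply(new JobConf(ImportArcs.class)) before submitting the import job. Note that raising the replication factor only affects files written afterwards; blocks already stored at replication 1 stay single-copy.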