From: alexis a. <alx...@ya...> - 2007-06-20 11:20:06
Hi,
We are having problems with incremental indexing. We initially indexed 3000 arc files and were trying to index 3000 more when we encountered the following error:
2007-06-19 02:49:25,135 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_0001_r_000035_0: java.net.SocketTimeoutException: timed out waiting for rpc response
at org.apache.hadoop.ipc.Client.call(Client.java:312)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:161)
at org.apache.hadoop.dfs.$Proxy1.complete(Unknown Source)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1126)
at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
at org.apache.hadoop.fs.FSDataOutputStream$Summer.close(FSDataOutputStream.java:97)
at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:160)
at org.apache.hadoop.io.MapFile$Writer.close(MapFile.java:118)
at org.archive.access.nutch.ImportArcs$WaxFetcherOutputFormat$1.close(ImportArcs.java:687)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:281)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1075)
We are using 28 nodes. Our configuration in hadoop-site.xml is as follows:
<property>
<name>fs.default.name</name>
<value>apple001:9000</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>apple001:9001</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/opt/hadoop-0.5.0/filesystem/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/opt/hadoop-0.5.0/filesystem/data</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/opt/hadoop-0.5.0/filesystem/mapreduce/local</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/opt/hadoop-0.5.0/temp/hadoop/mapred/system</value>
<description>The shared directory where MapReduce stores control files.
</description>
</property>
<property>
<name>mapred.temp.dir</name>
<value>/opt/hadoop-0.5.0/temp/hadoop/mapred/temp</value>
<description>A shared directory for temporary files.
</description>
</property>
<property>
<name>mapred.map.tasks</name>
<value>89</value>
<description>
define mapred.map tasks to be number of slave hosts
</description>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>53</value>
<description>
define mapred.reduce tasks to be number of slave hosts
</description>
</property>
<property>
<name>mapred.tasktracker.tasks.maximum</name>
<value>2</value>
<description>The maximum number of tasks that will be run
simultaneously by a task tracker.
</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
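For reference, here is a rough sketch of the task-slot arithmetic implied by the settings above. It assumes that all 28 nodes run a task tracker (the master may not) and that mapred.tasktracker.tasks.maximum caps the combined number of tasks per tracker. The variable names are illustrative only.

```python
import math

# Values taken from the post and the hadoop-site.xml above.
nodes = 28               # assumption: every node runs a task tracker
tasks_per_tracker = 2    # mapred.tasktracker.tasks.maximum
map_tasks = 89           # mapred.map.tasks
reduce_tasks = 53        # mapred.reduce.tasks

# Total number of tasks the cluster can run at the same time.
slots = nodes * tasks_per_tracker

# Number of scheduling "waves" needed to run all tasks of each kind.
map_waves = math.ceil(map_tasks / slots)
reduce_waves = math.ceil(reduce_tasks / slots)

print(f"concurrent task slots: {slots}")
print(f"map waves: {map_waves}, reduce waves: {reduce_waves}")
```

Under these assumptions the cluster has 56 task slots, so the 53 reduce tasks can all run concurrently and all of them may close their output files against the namenode at around the same time.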
Moreover, what is the maximum number of arc files that can be indexed in the same batch? We tried 6000 but we encountered errors.
Best Regards,
Alex