From: alexis a. <alx...@ya...> - 2007-04-27 10:43:43
Hi,
We are encountering a "Too many open files" error while doing incremental indexing. We followed the procedure outlined in the FAQ; the commands we used are below.
-----------------------------------------
Import
>bin/hadoop jar /opt/nutchwax-0.8.0/nutchwax-0.8.0.jar import inputs outputs test2
Update
>bin/hadoop jar /opt/nutchwax-0.8.0/nutchwax-0.8.0.jar update outputs outputs/segments/20070425125008-test2
Invert
>bin/hadoop jar /opt/nutchwax-0.8.0/nutchwax-0.8.0.jar invert outputs outputs/segments/20070425125008-test2
Dedup
-> We did not run the dedup command (a sketch of this step is included after the Merge command below).
Index
>bin/hadoop jar /opt/nutchwax-0.8.0/nutchwax-0.8.0.jar class org.archive.access.nutch.NutchwaxIndexer outputs/indexes2 outputs/crawldb outputs/linkdb outputs/segments/20070425125008-test2
Merge
>bin/hadoop jar /opt/nutchwax-0.8.0/nutchwax-0.8.0.jar class org.apache.nutch.indexer.IndexMerger outputs/index2 outputs/indexes outputs/indexes2
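For completeness, the dedup step we skipped would look roughly like the command below. This is only a sketch, assuming the NutchWAX dedup job takes the directory of part indexes as its argument; check the NutchWAX usage output or documentation for the exact syntax on your version.
>bin/hadoop jar /opt/nutchwax-0.8.0/nutchwax-0.8.0.jar dedup outputs/indexes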
Our System Configuration:
Scientific Linux CERN
2.4.21-32.0.1.EL.cernsmp
JDK 1.5
Hadoop 0.5
NutchWAX 0.8
We also tried running NutchWAX 0.10 on Hadoop 0.12.3 and Hadoop 0.9.2, but we still get the same kind of error, shown below.
---------------------------------------------
07/04/26 15:49:50 INFO conf.Configuration: parsing file:/opt/hadoop-0.9.2/conf/hadoop-default.xml
07/04/26 15:49:50 INFO conf.Configuration: parsing file:/tmp/hadoop-unjar23572/nutch-default.xml
07/04/26 15:49:50 INFO ipc.Client: org.apache.hadoop.io.ObjectWritableConnection culler maxidletime= 1000ms
07/04/26 15:49:50 INFO ipc.Client: org.apache.hadoop.io.ObjectWritable Connection Culler: starting
07/04/26 15:49:50 INFO indexer.IndexMerger: merging indexes to: outputs/index2
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00000
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00001
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00002
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00003
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00004
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00005
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00006
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00007
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00008
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00009
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00010
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00011
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00012
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00013
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00014
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00015
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00016
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00017
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00018
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00019
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00000
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00001
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00002
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00003
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00004
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00005
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00006
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00007
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00008
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00009
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00010
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00011
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00012
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00013
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00014
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00015
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00016
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00017
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00018
07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00019
07/04/26 15:50:02 INFO fs.DFSClient: Could not obtain block from any node: java.io.IOException: No live nodes contain current block
07/04/26 15:50:05 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 1 time(s).
07/04/26 15:50:06 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 2 time(s).
07/04/26 15:50:07 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 3 time(s).
07/04/26 15:50:08 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 4 time(s).
07/04/26 15:50:09 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 5 time(s).
07/04/26 15:50:10 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 6 time(s).
07/04/26 15:50:11 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 7 time(s).
07/04/26 15:50:12 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 8 time(s).
07/04/26 15:50:13 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 9 time(s).
07/04/26 15:50:14 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 10 time(s).
07/04/26 15:50:15 WARN fs.DFSClient: DFS Read: java.net.SocketException: Too many open files
at java.net.Socket.createImpl(Socket.java:388)
at java.net.Socket.connect(Socket.java:514)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:145)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:525)
at org.apache.hadoop.ipc.Client.call(Client.java:452)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:512)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:732)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:577)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:686)
at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:91)
at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:189)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
at java.io.DataInputStream.read(DataInputStream.java:134)
at org.apache.nutch.indexer.FsDirectory$DfsIndexInput.readInternal(FsDirectory.java:183)
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:64)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:33)
at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:41)
at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:507)
at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:406)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:90)
at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:681)
at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:658)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:517)
at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:553)
at org.apache.nutch.indexer.IndexMerger.merge(IndexMerger.java:98)
at org.apache.nutch.indexer.IndexMerger.run(IndexMerger.java:150)
at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
at org.apache.nutch.indexer.IndexMerger.main(IndexMerger.java:113)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.archive.access.nutch.Nutchwax.doClass(Nutchwax.java:284)
at org.archive.access.nutch.Nutchwax.doJob(Nutchwax.java:394)
at org.archive.access.nutch.Nutchwax.main(Nutchwax.java:674)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
07/04/26 15:50:15 INFO fs.DFSClient: Could not obtain block from any node: java.io.IOException: No live nodes contain current block
07/04/26 15:50:18 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 1 time(s).
07/04/26 15:50:19 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 2 time(s).
07/04/26 15:50:20 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 3 time(s).
07/04/26 15:50:21 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 4 time(s).
07/04/26 15:50:22 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 5 time(s).
07/04/26 15:50:23 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 6 time(s).
07/04/26 15:50:24 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 7 time(s).
07/04/26 15:50:25 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 8 time(s).
07/04/26 15:50:26 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 9 time(s).
07/04/26 15:50:27 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 10 time(s).
07/04/26 15:50:28 WARN fs.DFSClient: DFS Read: java.net.SocketException: Too many open files
at java.net.Socket.createImpl(Socket.java:388)
at java.net.Socket.connect(Socket.java:514)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:145)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:525)
at org.apache.hadoop.ipc.Client.call(Client.java:452)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:512)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:732)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:577)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:686)
at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:91)
at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:189)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
at java.io.DataInputStream.read(DataInputStream.java:134)
at org.apache.nutch.indexer.FsDirectory$DfsIndexInput.readInternal(FsDirectory.java:183)
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:64)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:33)
at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:41)
at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:507)
at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:406)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:90)
at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:681)
at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:658)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:517)
at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:553)
at org.apache.nutch.indexer.IndexMerger.merge(IndexMerger.java:98)
at org.apache.nutch.indexer.IndexMerger.run(IndexMerger.java:150)
at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
at org.apache.nutch.indexer.IndexMerger.main(IndexMerger.java:113)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.archive.access.nutch.Nutchwax.doClass(Nutchwax.java:284)
at org.archive.access.nutch.Nutchwax.doJob(Nutchwax.java:394)
at org.archive.access.nutch.Nutchwax.main(Nutchwax.java:674)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
07/04/26 15:50:28 FATAL indexer.IndexMerger: IndexMerger: java.net.SocketException: Too many open files
at java.net.Socket.createImpl(Socket.java:388)
at java.net.Socket.connect(Socket.java:514)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:145)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:525)
at org.apache.hadoop.ipc.Client.call(Client.java:452)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:512)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:732)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:577)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:686)
at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:91)
at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:189)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
at java.io.DataInputStream.read(DataInputStream.java:134)
at org.apache.nutch.indexer.FsDirectory$DfsIndexInput.readInternal(FsDirectory.java:183)
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:64)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:33)
at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:41)
at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:507)
at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:406)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:90)
at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:681)
at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:658)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:517)
at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:553)
at org.apache.nutch.indexer.IndexMerger.merge(IndexMerger.java:98)
at org.apache.nutch.indexer.IndexMerger.run(IndexMerger.java:150)
at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
at org.apache.nutch.indexer.IndexMerger.main(IndexMerger.java:113)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.archive.access.nutch.Nutchwax.doClass(Nutchwax.java:284)
at org.archive.access.nutch.Nutchwax.doJob(Nutchwax.java:394)
at org.archive.access.nutch.Nutchwax.main(Nutchwax.java:674)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
I hope you can help us solve the issue.
Best Regards,
Alexis
From: Michael S. <st...@du...> - 2007-04-27 17:21:56
See this old FAQ from Heritrix: http://crawler.archive.org/faq.html#toomanyopenfiles

St.Ack
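For anyone hitting the same error, that FAQ entry boils down to raising the per-process open-file limit on the machines running the merge and the DFS daemons. A minimal sketch, assuming a bash shell and root access; the value 32768 and the choice of where to set it (an ulimit call in conf/hadoop-env.sh, or /etc/security/limits.conf) are site-specific, not requirements:

>ulimit -n          # show the current limit (commonly 1024)
>ulimit -n 32768    # raise it in the shell that launches the job and daemons

To make the higher limit persist across logins, entries like the following can be added to /etc/security/limits.conf, after which the Hadoop daemons need restarting so they inherit it:

root soft nofile 32768
root hard nofile 32768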
From: alexis a. <alx...@ya...> - 2007-06-20 11:20:06
Hi,
We are having problems with incremental indexing. We initially indexed 3,000 ARC files and were trying to index 3,000 more when we encountered the following error.
2007-06-19 02:49:25,135 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_0001_r_000035_0: java.net.SocketTimeoutException: timed out waiting for rpc response
at org.apache.hadoop.ipc.Client.call(Client.java:312)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:161)
at org.apache.hadoop.dfs.$Proxy1.complete(Unknown Source)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1126)
at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
at org.apache.hadoop.fs.FSDataOutputStream$Summer.close(FSDataOutputStream.java:97)
at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:160)
at org.apache.hadoop.io.MapFile$Writer.close(MapFile.java:118)
at org.archive.access.nutch.ImportArcs$WaxFetcherOutputFormat$1.close(ImportArcs.java:687)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:281)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1075)
We are using 28 nodes. Our configuration in hadoop-site.xml is as follows:
<property>
<name>fs.default.name</name>
<value>apple001:9000</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>apple001:9001</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/opt/hadoop-0.5.0/filesystem/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/opt/hadoop-0.5.0/filesystem/data</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/opt/hadoop-0.5.0/filesystem/mapreduce/local</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/opt/hadoop-0.5.0/temp/hadoop/mapred/system</value>
<description>The shared directory where MapReduce stores control files.
</description>
</property>
<property>
<name>mapred.temp.dir</name>
<value>/opt/hadoop-0.5.0/temp/hadoop/mapred/temp</value>
<description>A shared directory for temporary files.
</description>
</property>
<property>
<name>mapred.map.tasks</name>
<value>89</value>
<description>
define mapred.map tasks to be number of slave hosts
</description>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>53</value>
<description>
define mapred.reduce tasks to be number of slave hosts
</description>
</property>
<property>
<name>mapred.tasktracker.tasks.maximum</name>
<value>2</value>
<description>The maximum number of tasks that will be run
simultaneously by a task tracker.
</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
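One setting not overridden above that is worth checking for the "timed out waiting for rpc response" failure is the IPC client timeout, which in Hadoop releases of this vintage defaults to 60000 ms (ipc.client.timeout in hadoop-default.xml). A sketch of an override for hadoop-site.xml, assuming the namenode is simply slow to answer under load; the 180000 value is illustrative only, not a recommendation:

<property>
  <name>ipc.client.timeout</name>
  <value>180000</value>
  <description>Timeout for IPC calls, in milliseconds (default 60000).</description>
</property>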
Moreover, what is the maximum number of ARC files that can be indexed in a single batch? We tried 6,000 but encountered errors.
Best Regards,
Alex
From: John H. L. <jl...@ar...> - 2007-06-20 14:54:10
Hi Alexis.

NutchWAX 0.10.0 has lots of bug fixes and improvements over 0.8.0, so you may want to start by upgrading your installation.

Does your job complete any tasks before you see this error? Do you see any other errors in the logs? Specifically, do you see a BindException when you run start-all.sh?

The more ARCs you index in a single job, the larger the heap space you'll need, both during indexing and during deployment. This depends, of course, on how much text is contained in the documents within the ARCs. I've been able to index and deploy batches of 12,000 ARCs with heap spaces around 3200m on 4GB machines.

Hope this helps.
-J
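A note on the heap-space point above: in Hadoop releases of this era, the heap for the daemons and for locally launched tools (such as the index merger) is typically set through HADOOP_HEAPSIZE in conf/hadoop-env.sh (a value in MB, default 1000), while each map/reduce child JVM takes whatever -Xmx is given in mapred.child.java.opts. A sketch for the 4 GB machines John describes; the 3200 figure simply mirrors what he reports using and is not a general recommendation:

# conf/hadoop-env.sh
export HADOOP_HEAPSIZE=3200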