From: Michael S. <st...@du...> - 2007-04-27 17:21:56

See this old FAQ from Heritrix: http://crawler.archive.org/faq.html#toomanyopenfiles

St.Ack

alexis artes wrote:
> Hi,
>
> We are encountering a "Too many open files" error while doing an
> incremental indexing. We followed the procedure outlined in the FAQ;
> the commands we used are below.
> -----------------------------------------
> Import
> >bin/hadoop jar /opt/nutchwax-0.8.0/nutchwax-0.8.0.jar import inputs outputs test2
>
> Update
> >bin/hadoop jar /opt/nutchwax-0.8.0/nutchwax-0.8.0.jar update outputs outputs/segments/20070425125008-test2
>
> Invert
> >bin/hadoop jar /opt/nutchwax-0.8.0/nutchwax-0.8.0.jar invert outputs outputs/segments/20070425125008-test2
>
> Dedup
> -> We did not run the dedup command.
>
> Index
> >bin/hadoop jar /opt/nutchwax-0.8.0/nutchwax-0.8.0.jar class org.archive.access.nutch.NutchwaxIndexer outputs/indexes2 outputs/crawldb outputs/linkdb outputs/segments/20070425125008-test2
>
> Merge
> >bin/hadoop jar /opt/nutchwax-0.8.0/nutchwax-0.8.0.jar class org.apache.nutch.indexer.IndexMerger outputs/index2 outputs/indexes outputs/indexes2
>
> Our system configuration:
> Scientific Linux CERN, kernel 2.4.21-32.0.1.EL.cernsmp
> JDK 1.5
> Hadoop 0.5
> NutchWAX 0.8
>
> We also tried running NutchWAX 0.10 on Hadoop 0.12.3 and on Hadoop 0.9.2,
> but still get the same kind of error as below.
> ---------------------------------------------
> 07/04/26 15:49:50 INFO conf.Configuration: parsing file:/opt/hadoop-0.9.2/conf/hadoop-default.xml
> 07/04/26 15:49:50 INFO conf.Configuration: parsing file:/tmp/hadoop-unjar23572/nutch-default.xml
> 07/04/26 15:49:50 INFO ipc.Client: org.apache.hadoop.io.ObjectWritable Connection Culler maxidletime= 1000ms
> 07/04/26 15:49:50 INFO ipc.Client: org.apache.hadoop.io.ObjectWritable Connection Culler: starting
> 07/04/26 15:49:50 INFO indexer.IndexMerger: merging indexes to: outputs/index2
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00000
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00001
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00002
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00003
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00004
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00005
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00006
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00007
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00008
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00009
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00010
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00011
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00012
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00013
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00014
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00015
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00016
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00017
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00018
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes/part-00019
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00000
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00001
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00002
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00003
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00004
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00005
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00006
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00007
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00008
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00009
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00010
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00011
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00012
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00013
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00014
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00015
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00016
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00017
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00018
> 07/04/26 15:49:50 INFO indexer.IndexMerger: Adding /user/root/outputs/indexes2/part-00019
> 07/04/26 15:50:02 INFO fs.DFSClient: Could not obtain block from any node: java.io.IOException: No live nodes contain current block
> 07/04/26 15:50:05 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 1 time(s).
> 07/04/26 15:50:06 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 2 time(s).
> 07/04/26 15:50:07 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 3 time(s).
> 07/04/26 15:50:08 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 4 time(s).
> 07/04/26 15:50:09 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 5 time(s).
> 07/04/26 15:50:10 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 6 time(s).
> 07/04/26 15:50:11 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 7 time(s).
> 07/04/26 15:50:12 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 8 time(s).
> 07/04/26 15:50:13 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 9 time(s).
> 07/04/26 15:50:14 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 10 time(s).
> 07/04/26 15:50:15 WARN fs.DFSClient: DFS Read: java.net.SocketException: Too many open files
>     at java.net.Socket.createImpl(Socket.java:388)
>     at java.net.Socket.connect(Socket.java:514)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:145)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:525)
>     at org.apache.hadoop.ipc.Client.call(Client.java:452)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
>     at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:512)
>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:732)
>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:577)
>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:686)
>     at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:91)
>     at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:189)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
>     at java.io.DataInputStream.read(DataInputStream.java:134)
>     at org.apache.nutch.indexer.FsDirectory$DfsIndexInput.readInternal(FsDirectory.java:183)
>     at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:64)
>     at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:33)
>     at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:41)
>     at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:507)
>     at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:406)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:90)
>     at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:681)
>     at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:658)
>     at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:517)
>     at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:553)
>     at org.apache.nutch.indexer.IndexMerger.merge(IndexMerger.java:98)
>     at org.apache.nutch.indexer.IndexMerger.run(IndexMerger.java:150)
>     at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
>     at org.apache.nutch.indexer.IndexMerger.main(IndexMerger.java:113)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:585)
>     at org.archive.access.nutch.Nutchwax.doClass(Nutchwax.java:284)
>     at org.archive.access.nutch.Nutchwax.doJob(Nutchwax.java:394)
>     at org.archive.access.nutch.Nutchwax.main(Nutchwax.java:674)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:585)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>
> 07/04/26 15:50:15 INFO fs.DFSClient: Could not obtain block from any node: java.io.IOException: No live nodes contain current block
> 07/04/26 15:50:18 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 1 time(s).
> 07/04/26 15:50:19 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 2 time(s).
> 07/04/26 15:50:20 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 3 time(s).
> 07/04/26 15:50:21 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 4 time(s).
> 07/04/26 15:50:22 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 5 time(s).
> 07/04/26 15:50:23 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 6 time(s).
> 07/04/26 15:50:24 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 7 time(s).
> 07/04/26 15:50:25 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 8 time(s).
> 07/04/26 15:50:26 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 9 time(s).
> 07/04/26 15:50:27 INFO ipc.Client: Retrying connect to server: mon034/x.x.x.x:9000. Already tried 10 time(s).
> 07/04/26 15:50:28 WARN fs.DFSClient: DFS Read: java.net.SocketException: Too many open files
>     at java.net.Socket.createImpl(Socket.java:388)
>     at java.net.Socket.connect(Socket.java:514)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:145)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:525)
>     at org.apache.hadoop.ipc.Client.call(Client.java:452)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
>     at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:512)
>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:732)
>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:577)
>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:686)
>     at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:91)
>     at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:189)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
>     at java.io.DataInputStream.read(DataInputStream.java:134)
>     at org.apache.nutch.indexer.FsDirectory$DfsIndexInput.readInternal(FsDirectory.java:183)
>     at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:64)
>     at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:33)
>     at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:41)
>     at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:507)
>     at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:406)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:90)
>     at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:681)
>     at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:658)
>     at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:517)
>     at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:553)
>     at org.apache.nutch.indexer.IndexMerger.merge(IndexMerger.java:98)
>     at org.apache.nutch.indexer.IndexMerger.run(IndexMerger.java:150)
>     at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
>     at org.apache.nutch.indexer.IndexMerger.main(IndexMerger.java:113)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:585)
>     at org.archive.access.nutch.Nutchwax.doClass(Nutchwax.java:284)
>     at org.archive.access.nutch.Nutchwax.doJob(Nutchwax.java:394)
>     at org.archive.access.nutch.Nutchwax.main(Nutchwax.java:674)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:585)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>
> 07/04/26 15:50:28 FATAL indexer.IndexMerger: IndexMerger: java.net.SocketException: Too many open files
>     at java.net.Socket.createImpl(Socket.java:388)
>     at java.net.Socket.connect(Socket.java:514)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:145)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:525)
>     at org.apache.hadoop.ipc.Client.call(Client.java:452)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
>     at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:512)
>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:732)
>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:577)
>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:686)
>     at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:91)
>     at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:189)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
>     at java.io.DataInputStream.read(DataInputStream.java:134)
>     at org.apache.nutch.indexer.FsDirectory$DfsIndexInput.readInternal(FsDirectory.java:183)
>     at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:64)
>     at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:33)
>     at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:41)
>     at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:507)
>     at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:406)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:90)
>     at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:681)
>     at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:658)
>     at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:517)
>     at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:553)
>     at org.apache.nutch.indexer.IndexMerger.merge(IndexMerger.java:98)
>     at org.apache.nutch.indexer.IndexMerger.run(IndexMerger.java:150)
>     at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
>     at org.apache.nutch.indexer.IndexMerger.main(IndexMerger.java:113)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:585)
>     at org.archive.access.nutch.Nutchwax.doClass(Nutchwax.java:284)
>     at org.archive.access.nutch.Nutchwax.doJob(Nutchwax.java:394)
>     at org.archive.access.nutch.Nutchwax.main(Nutchwax.java:674)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:585)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>
> I hope you can help us solve the issue.
>
> Best Regards,
> Alexis
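
For anyone landing on this thread: the remedy that Heritrix FAQ entry points at is raising the OS per-process open-file limit for the account running the Hadoop/NutchWAX processes, on every node involved in the merge. A minimal shell sketch follows; the 4096/16384 values and the <pid> placeholder are illustrative, not values prescribed by the FAQ:

  # Show the current per-process open-file limit (commonly 1024 by default).
  ulimit -n

  # Count descriptors actually held by the merging JVM; <pid> stands in
  # for the IndexMerger process id (requires lsof).
  lsof -p <pid> | wc -l

  # Raise the limit in the shell that launches the job. Merging the 40
  # part indexes above over DFS opens many index files and sockets at once.
  ulimit -n 4096

  # To make the change survive new logins for the user running the
  # daemons, add entries like these to /etc/security/limits.conf:
  #   root  soft  nofile  4096
  #   root  hard  nofile  16384

Note that the shell-level change only affects processes started from that shell, so the datanodes and the job need to be restarted from a session where the higher limit is in force.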