From: <sta...@us...> - 2007-02-20 21:15:58
Revision: 1505
          http://archive-access.svn.sourceforge.net/archive-access/?rev=1505&view=rev
Author:   stack-sf
Date:     2007-02-20 13:13:09 -0800 (Tue, 20 Feb 2007)

Log Message:
-----------
Use subversion svn:externals feature to maintain the nutchwax nutch dependency.
See http://svnbook.red-bean.com/en/1.1/svn-book.html#svn-ch-7-sect-3.
Suggested a while back by Doug Cutting. No more need of independent nutch checkout.

* .
Added third-party/nutch -r 492357 http://svn.apache.org/repos/asf/lucene/nutch/trunk
* src/java/overview.html
Amend how to build from src instruction.
* src/plugin/build-plugin.xml
Point at nutch over in its new third-party subdirectory
* README.txt
Remove hadoop checksum error patch reference and the lease patch (its
been fixed in recent hadoops).
* build.xml
Add targets to build our nutch dependency.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/README.txt
    trunk/archive-access/projects/nutchwax/build.xml
    trunk/archive-access/projects/nutchwax/src/java/overview.html
    trunk/archive-access/projects/nutchwax/src/plugin/build-plugin.xml

Property Changed:
----------------
    trunk/archive-access/projects/nutchwax/

Property changes on: trunk/archive-access/projects/nutchwax
___________________________________________________________________
Name: svn:externals
   + third-party/nutch -r 492357 http://svn.apache.org/repos/asf/lucene/nutch/trunk

Modified: trunk/archive-access/projects/nutchwax/README.txt
===================================================================
--- trunk/archive-access/projects/nutchwax/README.txt 2007-02-17 01:06:39 UTC (rev 1504)
+++ trunk/archive-access/projects/nutchwax/README.txt 2007-02-20 21:13:09 UTC (rev 1505)
@@ -3,10 +3,7 @@
 See associated docs directory for requirements, installation and build instruction or visit http://archive-access.sourceforge.net/projects/nutch/.
-The rest of the README is taken up with versions of hadoop and nutch that
-nutchwax depends on including patches made to hadoop and nutch to releases.
-
 HADOOP VERSION AND PATCHES
 Hadoop release version is 0.9.2. 0.9.1 fails when you try to use local
@@ -14,81 +11,9 @@
 hadoop 0.9.1. has it set to true in bundled hadoop-default.xml. See HADOOP-827.
-Here is single patch we make against it (TODO: TEST still works and
-still needed):
-http://issues.apache.org/jira/browse/HADOOP-145
-
-Index: src/java/org/apache/hadoop/fs/LocalFileSystem.java
-===================================================================
---- src/java/org/apache/hadoop/fs/LocalFileSystem.java (revision 393675)
-+++ src/java/org/apache/hadoop/fs/LocalFileSystem.java (working copy)
-@@ -362,6 +362,11 @@
-   public void reportChecksumFailure(File f, FSInputStream in,
-                                     long start, long length, int crc) {
-     try {
-+      if (getConf().getBoolean("io.skip.checksum.errors", false)) {
-+        // If this flag is set, do not move aside the file.
-+        LOG.warn("DEBUG: Not moving file " + p.toString());
-+        return;
-+      }
-       // canonicalize f
-       f = makeAbsolute(f).getCanonicalFile();
-
-
-If you are seeing jobs fail because of complaints about DFS lease expiration,
-try the below patch with an ipc.client.timeout setting of 20 or 30 seconds:
-
-Index: src/java/org/apache/hadoop/dfs/DFSClient.java
-===================================================================
---- src/java/org/apache/hadoop/dfs/DFSClient.java (revision 409788)
-+++ src/java/org/apache/hadoop/dfs/DFSClient.java (working copy)
-@@ -403,18 +434,23 @@
-     public void run() {
-       long lastRenewed = 0;
-       while (running) {
--        if (System.currentTimeMillis() - lastRenewed > (LEASE_PERIOD / 2)) {
-+        // Divide by 3 instead of by 2 so we start renewing earlier
-+        // and set down "ipc.client.timeout" from its 60 to 20 or 30.
-+        // See this note for why:
-+        // http://mail-archives.apache.org/mod_mbox/lucene-hadoop-dev/200607.mbox/%3C3...@ya...%3E
-+        if (System.currentTimeMillis() - lastRenewed > (LEASE_PERIOD / 3)) {
-           try {
-             namenode.renewLease(clientName);
-             lastRenewed = System.currentTimeMillis();
-           } catch (IOException ie) {
-             String err = StringUtils.stringifyException(ie);
--            LOG.warning("Problem renewing lease for " + clientName +
-+            LOG.warn("Problem renewing lease for " + clientName +
-                      ": " + err);
-           }
-         }
-         try {
--          Thread.sleep(1000);
-+          // Renew every 3 seconds, not every 1 second.
-+          Thread.sleep(1000 * 3);
-         } catch (InterruptedException ie) {
-         }
-       }
-
-
 NUTCH VERSION AND PATCHES
-Version of nutch on builds.archive.org NutchWAX is built against.
-
-stack@bregeon:~/workspace/nutch$ svn info
-Path: .
-URL: http://svn.apache.org/repos/asf/lucene/nutch/trunk
-Repository Root: http://svn.apache.org/repos/asf
-Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
-Revision: 492357
-Node Kind: directory
-Schedule: normal
-Last Changed Author: ab
-Last Changed Rev: 491291
-Last Changed Date: 2006-12-30 11:13:06 -0800 (Sat, 30 Dec 2006)
-Properties Last Updated: 2007-01-03 11:34:45 -0800 (Wed, 03 Jan 2007)
-
 Below are patches made against the nutch thats built into nutchwax. You may be able to do without them. Apply if you you are OOME'ing because too many links found building crawldb or merging segments.

Modified: trunk/archive-access/projects/nutchwax/build.xml
===================================================================
--- trunk/archive-access/projects/nutchwax/build.xml 2007-02-17 01:06:39 UTC (rev 1504)
+++ trunk/archive-access/projects/nutchwax/build.xml 2007-02-20 21:13:09 UTC (rev 1505)
@@ -1,13 +1,13 @@
 <?xml version="1.0"?>
-<project name="nutchwax" default="war">
+<project name="nutchwax" default="all">
   <property name="name" value="${ant.project.name}"/>
   <property name="root" value="${basedir}"/>
   <!--'nutch.root' is pointer at core nutch. Expect to find it in
-   'basedir' named 'nutch'.
+   '${basedir}/third-party' named 'nutch'.
    -->
-  <property name="nutch.root" location="${root}/nutch"/>
+  <property name="nutch.root" location="${root}/third-party/nutch"/>
   <property file="${user.home}/.$(name}.build.properties" />
@@ -57,6 +57,19 @@
     <path refid="classpath"/>
   </path>
+  <target name="third.party.jar">
+    <echo message="Building nutch third-party dependency (jar)" />
+    <ant dir="third-party/nutch" target="jar" inheritAll="false"/>
+  </target>
+  <target name="third.party.war">
+    <echo message="Building nutch third-party dependency (war)" />
+    <ant dir="third-party/nutch" target="war" inheritAll="false"/>
+  </target>
+  <target name="third.party.clean">
+    <echo message="Cleaning nutch third-party dependency" />
+    <ant dir="third-party/nutch" target="clean" inheritAll="false"/>
+  </target>
+
   <!-- ====================================================== -->
   <!-- Stuff needed by all targets -->
   <!-- ====================================================== -->
@@ -167,6 +180,12 @@
     </zip>
   </target>
+  <!-- ================================================================== -->
+  <!-- Build all including third-party dependencies (i.e. nutch) -->
+  <!-- ================================================================== -->
+  <!-- -->
+  <!-- ================================================================== -->
+  <target name="all" depends="third.party.jar,third.party.war,jar,compile,war" />
   <!-- ================================================================== -->
   <!-- Compile test code -->
@@ -290,4 +309,12 @@
     <delete dir="${build.dir}"/>
   </target>
+  <!-- ================================================================== -->
+  <!-- Clean all. Delete the build files including third-party builds -->
+  <!-- and their directories -->
+  <!-- ================================================================== -->
+  <target name="clean-all"
+    depends="clean,third.party.clean"
+    description="Clean up all built including third-party dependencies" />
+
 </project>

Modified: trunk/archive-access/projects/nutchwax/src/java/overview.html
===================================================================
--- trunk/archive-access/projects/nutchwax/src/java/overview.html 2007-02-17 01:06:39 UTC (rev 1504)
+++ trunk/archive-access/projects/nutchwax/src/java/overview.html 2007-02-20 21:13:09 UTC (rev 1505)
@@ -383,26 +383,17 @@
 </a></li> (See the NutchWAX README for details).
 </ol>
 <p>Checkout NutchWAX [See <a href="http://sourceforge.net/svn/?group_id=118427">Source Repository</a> for how].
+As the checkout runs, subversion will fetch the version of nutch the NutchWAX trunk is pegged against into
+the <code>${NUTCHWAX_HOME}/third-party</code> directory using
+the <a href="http://svnbook.red-bean.com/en/1.1/svn-book.html#svn-ch-7-sect-3">svn:externals</a> mechanism.
 </p>
-<p>Make a symbolic link under NutchWAX to your Nutch checkout:
-<pre>
- % ln -s ${NUTCH_HOME} ${NUTCHWAX_HOME}/nutch
-orojects/nutch/project.xml.r1445: <connection>scm:cvs:pserver:ano...@ar...:/cvsroot/archive-access:archive-access/projects/nutch</connection>
-</pre>
 </p>
-<p>Build Nutch:
-orojects/nutch/project.xml.mine: http://sourceforge.net/mailarchive/forum.php?forum=archive-access-cvs
+<p>To build NutchWAX and its nutch dependency, run the default 'all' target:
 <pre>
- % cd ${NUTCH_HOME}
- % ant jar war
-</pre>
-</p>
-<p>To build NutchWAX, do the same:
-<pre>
  % cd ${NUTCHWAX_HOME}
- % ant jar war
- % cd ${NUTCHWAX_HOME}
+ % ant all
 </pre>
+This will generate the NutchWAX jar and war.
 </p>
 <p>To build the NutchWAX site or distribution, run maven:
 <pre>

Modified: trunk/archive-access/projects/nutchwax/src/plugin/build-plugin.xml
===================================================================
--- trunk/archive-access/projects/nutchwax/src/plugin/build-plugin.xml 2007-02-17 01:06:39 UTC (rev 1504)
+++ trunk/archive-access/projects/nutchwax/src/plugin/build-plugin.xml 2007-02-20 21:13:09 UTC (rev 1505)
@@ -16,7 +16,12 @@
   <property file="${user.home}/$(name}.build.properties" />
   <property file="${root}/build.properties" />
+  <!--Point at nutchwax home instead of at nutch.
+   -->
   <property name="nutch.root" location="${root}/../../../"/>
+  <!--Point at nutch under third-party subdir.
+   -->
+  <property name="real.nutch.root" location="${nutch.root}/third-party/nutch"/>
   <property name="src.dir" location="${root}/src/java"/>
   <property name="src.test" location="${root}/src/test"/>
@@ -50,11 +55,11 @@
       <include name="*.jar" />
     </fileset>
     <!--IA: Add the nutch jars.-->
-    <fileset dir="${nutch.root}/nutch/lib">
+    <fileset dir="${real.nutch.root}/lib">
       <include name="*.jar" />
     </fileset>
     <!--IA: Add nutch classes.-->
-    <pathelement location="${nutch.root}/nutch/build/classes"/>
+    <pathelement location="${real.nutch.root}/build/classes"/>
   </path>
   <!-- the unit test classpath -->