From: <sta...@us...> - 2007-02-14 02:33:12
Revision: 1489
          http://archive-access.svn.sourceforge.net/archive-access/?rev=1489&view=rev
Author:   stack-sf
Date:     2007-02-13 18:33:05 -0800 (Tue, 13 Feb 2007)

Log Message:
-----------
Part of '[ 1637951 ] [nutchwax] Redo reporting scripts as mapreduce jobs'

* conf/wax-default.xml
Override default hadoop log formatter. Turn off the purging of logs
and keep them around longer than 12 hours.
* src/java/org/archive/access/nutch/ImportArcs.java
Pass on empty split inputs.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/conf/wax-default.xml
    trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/ImportArcs.java

Modified: trunk/archive-access/projects/nutchwax/conf/wax-default.xml
===================================================================
--- trunk/archive-access/projects/nutchwax/conf/wax-default.xml 2007-02-12 22:00:59 UTC (rev 1488)
+++ trunk/archive-access/projects/nutchwax/conf/wax-default.xml 2007-02-14 02:33:05 UTC (rev 1489)
@@ -154,7 +154,41 @@
 </description>
 </property>

+<!-- The below mapred.userlog configs. override defaults which purge
+anything beyond a 100k and anything over 12 hours old. Of note, if mapred
+is restarted, logs for tasks of same name are overwritten.
+-->
 <property>
+  <name>mapred.userlog.limit.kb</name>
+  <value>400</value>
+  <description>The maximum size of user-logs of each task.
+
+  We're using default split of 4 so 400 instead of 100 makes
+  for files of 100k each.
+  </description>
+</property>
+
+<property>
+  <name>mapred.userlog.purgesplits</name>
+  <value>false</value>
+  <description>Should the splits be purged disregarding the user-log size limit.
+
+  For now, don't purge logs. Default purges.
+  </description>
+</property>
+
+<property>
+  <name>mapred.userlog.retain.hours</name>
+  <value>168</value>
+  <description>The maximum time, in hours, for which the user-logs are to be
+  retained.
+
+
+  Keep them for a week rather than for 12 hours only, the default.
+  </description>
+</property>
+
+<property>
   <name>fetcher.store.content</name>
   <value>false</value>
   <description>If true, fetcher will store content.

Modified: trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/ImportArcs.java
===================================================================
--- trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/ImportArcs.java 2007-02-12 22:00:59 UTC (rev 1488)
+++ trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/ImportArcs.java 2007-02-14 02:33:05 UTC (rev 1489)
@@ -323,6 +323,10 @@
 }

 public void run() {
+  if (this.arcLocation == null || this.arcLocation.length() <= 0) {
+    return;
+  }
+
   ArchiveReader arc = null;
   // Need a thread that will keep updating TaskTracker during long
   // downloads else tasktracker will kill us.


This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
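The guard added to ImportArcs.run() in revision 1489 can be read as an early return on a null or empty input location. Below is a minimal standalone sketch of that idea. The class EmptySplitGuard and its methods are hypothetical names made up for illustration; they are not part of NutchWAX, and the real run() goes on to open the ARC rather than return a boolean.

```java
public class EmptySplitGuard {
    /** True when the location is null or empty, i.e. there is nothing to import. */
    static boolean shouldSkip(String arcLocation) {
        return arcLocation == null || arcLocation.length() <= 0;
    }

    /**
     * Hypothetical stand-in for the import task. Returns false when the
     * input was skipped and true when work would have been attempted.
     */
    static boolean run(String arcLocation) {
        if (shouldSkip(arcLocation)) {
            // Same shape as the guard in the diff: pass on empty split inputs.
            return false;
        }
        // The real task would open and parse the ARC here (omitted).
        return true;
    }

    public static void main(String[] args) {
        System.out.println("empty input attempted: " + run(""));
        System.out.println("non-empty input attempted: " + run("sample.arc.gz"));
    }
}
```

Returning early keeps an empty split from reaching code that assumes a usable location.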
From: <sta...@us...> - 2007-02-15 18:31:44
Revision: 1496
          http://archive-access.svn.sourceforge.net/archive-access/?rev=1496&view=rev
Author:   stack-sf
Date:     2007-02-15 10:14:32 -0800 (Thu, 15 Feb 2007)

Log Message:
-----------
Implement '[ 1660808 ] As we parse ARCs, output cdx line.'

* project.properties
* project.xml
Add in wayback jar.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/project.properties
    trunk/archive-access/projects/nutchwax/project.xml

Modified: trunk/archive-access/projects/nutchwax/project.properties
===================================================================
--- trunk/archive-access/projects/nutchwax/project.properties 2007-02-15 17:58:57 UTC (rev 1495)
+++ trunk/archive-access/projects/nutchwax/project.properties 2007-02-15 18:14:32 UTC (rev 1496)
@@ -21,6 +21,7 @@
 maven.jar.corenutch = ${basedir}/nutch/build/nutch-0.9-dev.jar
 maven.jar.hadoop = ${basedir}/nutch/lib/hadoop-0.9.2.jar
 maven.jar.archive-commons = ${basedir}/lib/archive-commons-1.11.0-200612262257.jar
+maven.jar.wayback = ${basedir}/lib/wayback-0.9.0-200702150450.jar
 maven.jar.servlet-api = ${basedir}/nutch/lib/servlet-api.jar
 maven.jar.commons-codec = ${basedir}/lib/commons-codec-1.3.jar
 maven.jar.commons-httpclient-local = ${basedir}/lib/commons-httpclient-3.0-rc3.jar

Modified: trunk/archive-access/projects/nutchwax/project.xml
===================================================================
--- trunk/archive-access/projects/nutchwax/project.xml 2007-02-15 17:58:57 UTC (rev 1495)
+++ trunk/archive-access/projects/nutchwax/project.xml 2007-02-15 18:14:32 UTC (rev 1496)
@@ -273,6 +273,18 @@
       </properties>
     </dependency>
     <dependency>
+      <id>wayback</id>
+      <version>0.9.0</version>
+      <url>http://builds.archive.org:8080/cruisecontrol/buildresults/HEAD-archive-access</url>
+      <properties>
+        <war.bundle>true</war.bundle>
+        <description>Wayback machine. Used for its ARCRecord to
+        CDX line function.
+        </description>
+        <license>LGPL</license>
+      </properties>
+    </dependency>
+    <dependency>
       <id>s3</id>
       <version>1.0.0</version>
       <url>http://builds.archive.org:8080/cruisecontrol/buildresults/HEAD-heritrix</url>

From: <sta...@us...> - 2007-02-16 06:04:33
Revision: 1497 http://archive-access.svn.sourceforge.net/archive-access/?rev=1497&view=rev Author: stack-sf Date: 2007-02-15 22:04:32 -0800 (Thu, 15 Feb 2007) Log Message: ----------- Use new s3 URL handler. * .classpath * project.properties * project.xml Replace s3 jar with jets3t. * src/plugin/index-wax/plugin.xml * lib/archive-commons-1.11.0-200612262257.jar * src/plugin/index-wax/lib/archive-commons-1.11.0-200612262257.jar * src/plugin/index-wax/lib/archive-commons-1.11.0-200702160009.jar Update archive-commons to get the new version of s3 handler. * lib/jets3t-0.5.0.jar Added. * lib/s3-20061030.jar Removed. Modified Paths: -------------- trunk/archive-access/projects/nutchwax/.classpath trunk/archive-access/projects/nutchwax/project.properties trunk/archive-access/projects/nutchwax/project.xml trunk/archive-access/projects/nutchwax/src/plugin/index-wax/plugin.xml Added Paths: ----------- trunk/archive-access/projects/nutchwax/lib/archive-commons-1.11.0-200702160009.jar trunk/archive-access/projects/nutchwax/lib/jets3t-0.5.0.jar trunk/archive-access/projects/nutchwax/src/plugin/index-wax/lib/archive-commons-1.11.0-200702160009.jar Removed Paths: ------------- trunk/archive-access/projects/nutchwax/lib/archive-commons-1.11.0-200612262257.jar trunk/archive-access/projects/nutchwax/lib/s3-20061030.jar trunk/archive-access/projects/nutchwax/src/plugin/index-wax/lib/archive-commons-1.11.0-200612262257.jar Modified: trunk/archive-access/projects/nutchwax/.classpath =================================================================== --- trunk/archive-access/projects/nutchwax/.classpath 2007-02-15 18:14:32 UTC (rev 1496) +++ trunk/archive-access/projects/nutchwax/.classpath 2007-02-16 06:04:32 UTC (rev 1497) @@ -5,7 +5,7 @@ <classpathentry kind="lib" path="lib/commons-codec-1.3.jar"/> <classpathentry kind="lib" path="lib/commons-httpclient-3.0-rc3.jar"/> <classpathentry kind="lib" path="lib/dsi.unimi.it-1.2.0.jar"/> - <classpathentry kind="lib" 
path="lib/s3-20061030.jar"/> + <classpathentry kind="lib" path="lib/jets3t-0.5.0.jar"/> <classpathentry kind="lib" path="conf"/> <classpathentry kind="lib" path="build"/> <classpathentry combineaccessrules="false" kind="src" path="/heritrix"/> Deleted: trunk/archive-access/projects/nutchwax/lib/archive-commons-1.11.0-200612262257.jar =================================================================== (Binary files differ) Added: trunk/archive-access/projects/nutchwax/lib/archive-commons-1.11.0-200702160009.jar =================================================================== (Binary files differ) Property changes on: trunk/archive-access/projects/nutchwax/lib/archive-commons-1.11.0-200702160009.jar ___________________________________________________________________ Name: svn:mime-type + application/octet-stream Added: trunk/archive-access/projects/nutchwax/lib/jets3t-0.5.0.jar =================================================================== (Binary files differ) Property changes on: trunk/archive-access/projects/nutchwax/lib/jets3t-0.5.0.jar ___________________________________________________________________ Name: svn:mime-type + application/octet-stream Deleted: trunk/archive-access/projects/nutchwax/lib/s3-20061030.jar =================================================================== (Binary files differ) Modified: trunk/archive-access/projects/nutchwax/project.properties =================================================================== --- trunk/archive-access/projects/nutchwax/project.properties 2007-02-15 18:14:32 UTC (rev 1496) +++ trunk/archive-access/projects/nutchwax/project.properties 2007-02-16 06:04:32 UTC (rev 1497) @@ -1,5 +1,4 @@ -maven.xdoc.version=${pom.currentVersion} -maven.docs.outputencoding=UTF-8 +maven.xdoc.version=${pom.currentVersion} maven.docs.outputencoding=UTF-8 # maven.xdoc.theme=classic maven.xdoc.date=left # maven.ui.section.background=#fff @@ -25,7 +24,7 @@ maven.jar.servlet-api = ${basedir}/nutch/lib/servlet-api.jar 
maven.jar.commons-codec = ${basedir}/lib/commons-codec-1.3.jar maven.jar.commons-httpclient-local = ${basedir}/lib/commons-httpclient-3.0-rc3.jar -maven.jar.s3 = ${basedir}/lib/s3-20061030.jar +maven.jar.jets3t = ${basedir}/lib/jets3t-0.5.0.jar maven.jar.local-commons-logging = ${basedir}/nutch/lib/commons-logging-1.0.4.jar maven.jar.lucene = ${basedir}/nutch/lib/lucene-core-2.0.0.jar Modified: trunk/archive-access/projects/nutchwax/project.xml =================================================================== --- trunk/archive-access/projects/nutchwax/project.xml 2007-02-15 18:14:32 UTC (rev 1496) +++ trunk/archive-access/projects/nutchwax/project.xml 2007-02-16 06:04:32 UTC (rev 1497) @@ -285,20 +285,16 @@ </properties> </dependency> <dependency> - <id>s3</id> - <version>1.0.0</version> - <url>http://builds.archive.org:8080/cruisecontrol/buildresults/HEAD-heritrix</url> + <id>jets3t</id> + <version>0.5.0</version> + <url>http://jets3t.s3.amazonaws.com/</url> <properties> <war.bundle>true</war.bundle> <description> - This jar contains code for accessing S3. Its a subset of code - obtained at URL given above. Here's how I made this jar - (after changing some of the statics to have public rather than - default access): - 1131 javac com/amazon/thirdparty/Base64.java com/amazon/s3/Utils.java com/amazon/s3/AWSAuthConnection.java - 1132 jar -cf s3-20061030.jar `find com -name '*.class'` + Use same S3 lib as hadoop. 
</description> - <license /> + <license>Apache 2.0 + http://www.apache.org/licenses/LICENSE-2.0</license> </properties> </dependency> </dependencies> Deleted: trunk/archive-access/projects/nutchwax/src/plugin/index-wax/lib/archive-commons-1.11.0-200612262257.jar =================================================================== (Binary files differ) Added: trunk/archive-access/projects/nutchwax/src/plugin/index-wax/lib/archive-commons-1.11.0-200702160009.jar =================================================================== (Binary files differ) Property changes on: trunk/archive-access/projects/nutchwax/src/plugin/index-wax/lib/archive-commons-1.11.0-200702160009.jar ___________________________________________________________________ Name: svn:mime-type + application/octet-stream Modified: trunk/archive-access/projects/nutchwax/src/plugin/index-wax/plugin.xml =================================================================== --- trunk/archive-access/projects/nutchwax/src/plugin/index-wax/plugin.xml 2007-02-15 18:14:32 UTC (rev 1496) +++ trunk/archive-access/projects/nutchwax/src/plugin/index-wax/plugin.xml 2007-02-16 06:04:32 UTC (rev 1497) @@ -12,7 +12,7 @@ <!--Alternative is to change the nutch script so that it includes libs from other than its local directory. Without that, need to have lib local to plugin.--> - <library name="archive-commons-1.11.0-200612262257.jar" /> + <library name="archive-commons-1.11.0-200702160009.jar" /> </runtime> <extension id="org.archive.access.nutch.indexer" This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <sta...@us...> - 2007-02-16 06:08:20
Revision: 1498
          http://archive-access.svn.sourceforge.net/archive-access/?rev=1498&view=rev
Author:   stack-sf
Date:     2007-02-15 22:08:15 -0800 (Thu, 15 Feb 2007)

Log Message:
-----------
* .classpath
* project.properties
* project.xml
* lib/libidn-0.5.9.jar
Add in the libidn jar. Needed by wayback cdx'ing.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/.classpath
    trunk/archive-access/projects/nutchwax/project.properties
    trunk/archive-access/projects/nutchwax/project.xml

Added Paths:
-----------
    trunk/archive-access/projects/nutchwax/lib/libidn-0.5.9.jar

Modified: trunk/archive-access/projects/nutchwax/.classpath
===================================================================
--- trunk/archive-access/projects/nutchwax/.classpath 2007-02-16 06:04:32 UTC (rev 1497)
+++ trunk/archive-access/projects/nutchwax/.classpath 2007-02-16 06:08:15 UTC (rev 1498)
@@ -6,15 +6,16 @@
  <classpathentry kind="lib" path="lib/commons-httpclient-3.0-rc3.jar"/>
  <classpathentry kind="lib" path="lib/dsi.unimi.it-1.2.0.jar"/>
  <classpathentry kind="lib" path="lib/jets3t-0.5.0.jar"/>
+ <classpathentry kind="lib" path="lib/libidn-0.5.9.jar"/>
  <classpathentry kind="lib" path="conf"/>
- <classpathentry kind="lib" path="build"/>
  <classpathentry combineaccessrules="false" kind="src" path="/heritrix"/>
  <classpathentry combineaccessrules="false" kind="src" path="/nutch"/>
- <classpathentry combineaccessrules="false" kind="src" path="/Hadoop"/>
  <classpathentry kind="lib" path="/nutch/lib/servlet-api.jar"/>
  <classpathentry kind="lib" path="/nutch/lib/commons-logging-1.0.4.jar"/>
  <classpathentry kind="lib" path="/nutch/lib/junit-3.8.1.jar"/>
  <classpathentry kind="lib" path="/nutch/conf"/>
- <classpathentry kind="lib" path="nutch/build"/>
+ <classpathentry kind="lib" path="lib/wayback-0.9.0-200702150450.jar" />
+ <classpathentry kind="lib" path="/nutch/build"/>
+ <classpathentry kind="lib" path="build"/>
  <classpathentry kind="output" path="target"/>
 </classpath>

Added: trunk/archive-access/projects/nutchwax/lib/libidn-0.5.9.jar
===================================================================
(Binary files differ)


Property changes on: trunk/archive-access/projects/nutchwax/lib/libidn-0.5.9.jar
___________________________________________________________________
Name: svn:mime-type
   + application/octet-stream

Modified: trunk/archive-access/projects/nutchwax/project.properties
===================================================================
--- trunk/archive-access/projects/nutchwax/project.properties 2007-02-16 06:04:32 UTC (rev 1497)
+++ trunk/archive-access/projects/nutchwax/project.properties 2007-02-16 06:08:15 UTC (rev 1498)
@@ -23,6 +23,7 @@
 maven.jar.wayback = ${basedir}/lib/wayback-0.9.0-200702150450.jar
 maven.jar.servlet-api = ${basedir}/nutch/lib/servlet-api.jar
 maven.jar.commons-codec = ${basedir}/lib/commons-codec-1.3.jar
+maven.jar.commons-codec = ${basedir}/lib/libidn-0.5.9.jar
 maven.jar.commons-httpclient-local = ${basedir}/lib/commons-httpclient-3.0-rc3.jar
 maven.jar.jets3t = ${basedir}/lib/jets3t-0.5.0.jar
 maven.jar.local-commons-logging = ${basedir}/nutch/lib/commons-logging-1.0.4.jar

Modified: trunk/archive-access/projects/nutchwax/project.xml
===================================================================
--- trunk/archive-access/projects/nutchwax/project.xml 2007-02-16 06:04:32 UTC (rev 1497)
+++ trunk/archive-access/projects/nutchwax/project.xml 2007-02-16 06:08:15 UTC (rev 1498)
@@ -297,6 +297,23 @@
         http://www.apache.org/licenses/LICENSE-2.0</license>
       </properties>
     </dependency>
+    <dependency>
+      <id>libidn</id>
+      <version>0.5.9</version>
+      <url>http://www.gnu.org/software/libidn/</url>
+      <properties>
+        <war.bundle>true</war.bundle>
+        <ear.bundle>true</ear.bundle>
+        <ear.bundle.dir>APP-INF/lib</ear.bundle.dir>
+        <description>GNU Libidn is an implementation of the Stringprep,
+        Punycode and IDNA specifications defined by the IETF
+        Internationalized Domain Names (IDN) working group, used for
+        internationalized domain names.
+        </description>
+        <license>GNU Lesser General Public License
+        http://www.gnu.org/licenses/lgpl.txt</license>
+      </properties>
+    </dependency>
   </dependencies>

From: <sta...@us...> - 2007-02-20 21:15:58
Revision: 1505 http://archive-access.svn.sourceforge.net/archive-access/?rev=1505&view=rev Author: stack-sf Date: 2007-02-20 13:13:09 -0800 (Tue, 20 Feb 2007) Log Message: ----------- Use subversion svn:externals feature to maintain the nutchwax nutch dependency. See http://svnbook.red-bean.com/en/1.1/svn-book.html#svn-ch-7-sect-3. Suggested a while back by Doug Cutting. No more need of independent nutch checkout. * . Added third-party/nutch -r 492357 http://svn.apache.org/repos/asf/lucene/nutch/trunk * src/java/overview.html Amend how to build from src instruction. * src/plugin/build-plugin.xml Point at nutch over in its new third-party subdirectory * README.txt Remove hadoop checksum error patch reference and the lease patch (its been fixed in recent hadoops). * build.xml Add targets to build our nutch dependency. Modified Paths: -------------- trunk/archive-access/projects/nutchwax/README.txt trunk/archive-access/projects/nutchwax/build.xml trunk/archive-access/projects/nutchwax/src/java/overview.html trunk/archive-access/projects/nutchwax/src/plugin/build-plugin.xml Property Changed: ---------------- trunk/archive-access/projects/nutchwax/ Property changes on: trunk/archive-access/projects/nutchwax ___________________________________________________________________ Name: svn:externals + third-party/nutch -r 492357 http://svn.apache.org/repos/asf/lucene/nutch/trunk Modified: trunk/archive-access/projects/nutchwax/README.txt =================================================================== --- trunk/archive-access/projects/nutchwax/README.txt 2007-02-17 01:06:39 UTC (rev 1504) +++ trunk/archive-access/projects/nutchwax/README.txt 2007-02-20 21:13:09 UTC (rev 1505) @@ -3,10 +3,7 @@ See associated docs directory for requirements, installation and build instruction or visit http://archive-access.sourceforge.net/projects/nutch/. 
-The rest of the README is taken up with versions of hadoop and nutch that -nutchwax depends on including patches made to hadoop and nutch to releases. - HADOOP VERSION AND PATCHES Hadoop release version is 0.9.2. 0.9.1 fails when you try to use local @@ -14,81 +11,9 @@ hadoop 0.9.1. has it set to true in bundled hadoop-default.xml. See HADOOP-827. -Here is single patch we make against it (TODO: TEST still works and -still needed): -http://issues.apache.org/jira/browse/HADOOP-145 - -Index: src/java/org/apache/hadoop/fs/LocalFileSystem.java -=================================================================== ---- src/java/org/apache/hadoop/fs/LocalFileSystem.java (revision 393675) -+++ src/java/org/apache/hadoop/fs/LocalFileSystem.java (working copy) -@@ -362,6 +362,11 @@ - public void reportChecksumFailure(File f, FSInputStream in, - long start, long length, int crc) { - try { -+ if (getConf().getBoolean("io.skip.checksum.errors", false)) { -+ // If this flag is set, do not move aside the file. -+ LOG.warn("DEBUG: Not moving file " + p.toString()); -+ return; -+ } - // canonicalize f - f = makeAbsolute(f).getCanonicalFile(); - - -If you are seeing jobs fail because of complaints about DFS lease expiration, -try the below patch with an ipc.client.timeout setting of 20 or 30 seconds: - -Index: src/java/org/apache/hadoop/dfs/DFSClient.java -=================================================================== ---- src/java/org/apache/hadoop/dfs/DFSClient.java (revision 409788) -+++ src/java/org/apache/hadoop/dfs/DFSClient.java (working copy) -@@ -403,18 +434,23 @@ - public void run() { - long lastRenewed = 0; - while (running) { -- if (System.currentTimeMillis() - lastRenewed > (LEASE_PERIOD / 2)) { -+ // Divide by 3 instead of by 2 so we start renewing earlier -+ // and set down "ipc.client.timeout" from its 60 to 20 or 30. 
-+ // See this note for why: -+ // http://mail-archives.apache.org/mod_mbox/lucene-hadoop-dev/200607.mbox/%3C3...@ya...%3E -+ if (System.currentTimeMillis() - lastRenewed > (LEASE_PERIOD / 3)) { - try { - namenode.renewLease(clientName); - lastRenewed = System.currentTimeMillis(); - } catch (IOException ie) { - String err = StringUtils.stringifyException(ie); -- LOG.warning("Problem renewing lease for " + clientName + -+ LOG.warn("Problem renewing lease for " + clientName + - ": " + err); - } - } - try { -- Thread.sleep(1000); -+ // Renew every 3 seconds, not every 1 second. -+ Thread.sleep(1000 * 3); - } catch (InterruptedException ie) { - } - } - - NUTCH VERSION AND PATCHES -Version of nutch on builds.archive.org NutchWAX is built against. - -stack@bregeon:~/workspace/nutch$ svn info -Path: . -URL: http://svn.apache.org/repos/asf/lucene/nutch/trunk -Repository Root: http://svn.apache.org/repos/asf -Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68 -Revision: 492357 -Node Kind: directory -Schedule: normal -Last Changed Author: ab -Last Changed Rev: 491291 -Last Changed Date: 2006-12-30 11:13:06 -0800 (Sat, 30 Dec 2006) -Properties Last Updated: 2007-01-03 11:34:45 -0800 (Wed, 03 Jan 2007) - Below are patches made against the nutch thats built into nutchwax. You may be able to do without them. Apply if you you are OOME'ing because too many links found building crawldb or merging segments. Modified: trunk/archive-access/projects/nutchwax/build.xml =================================================================== --- trunk/archive-access/projects/nutchwax/build.xml 2007-02-17 01:06:39 UTC (rev 1504) +++ trunk/archive-access/projects/nutchwax/build.xml 2007-02-20 21:13:09 UTC (rev 1505) @@ -1,13 +1,13 @@ <?xml version="1.0"?> -<project name="nutchwax" default="war"> +<project name="nutchwax" default="all"> <property name="name" value="${ant.project.name}"/> <property name="root" value="${basedir}"/> <!--'nutch.root' is pointer at core nutch. 
Expect to find it in - 'basedir' named 'nutch'. + '${basedir}/third-party' named 'nutch'. --> - <property name="nutch.root" location="${root}/nutch"/> + <property name="nutch.root" location="${root}/third-party/nutch"/> <property file="${user.home}/.$(name}.build.properties" /> @@ -57,6 +57,19 @@ <path refid="classpath"/> </path> + <target name="third.party.jar"> + <echo message="Building nutch third-party dependency (jar)" /> + <ant dir="third-party/nutch" target="jar" inheritAll="false"/> + </target> + <target name="third.party.war"> + <echo message="Building nutch third-party dependency (war)" /> + <ant dir="third-party/nutch" target="war" inheritAll="false"/> + </target> + <target name="third.party.clean"> + <echo message="Cleaning nutch third-party dependency" /> + <ant dir="third-party/nutch" target="clean" inheritAll="false"/> + </target> + <!-- ====================================================== --> <!-- Stuff needed by all targets --> <!-- ====================================================== --> @@ -167,6 +180,12 @@ </zip> </target> + <!-- ================================================================== --> + <!-- Build all including third-party dependencies (i.e. nutch) --> + <!-- ================================================================== --> + <!-- --> + <!-- ================================================================== --> + <target name="all" depends="third.party.jar,third.party.war,jar,compile,war" /> <!-- ================================================================== --> <!-- Compile test code --> @@ -290,4 +309,12 @@ <delete dir="${build.dir}"/> </target> + <!-- ================================================================== --> + <!-- Clean all. 
Delete the build files including third-party builds --> + <!-- and their directories --> + <!-- ================================================================== --> + <target name="clean-all" + depends="clean,third.party.clean" + description="Clean up all built including third-party dependencies" /> + </project> Modified: trunk/archive-access/projects/nutchwax/src/java/overview.html =================================================================== --- trunk/archive-access/projects/nutchwax/src/java/overview.html 2007-02-17 01:06:39 UTC (rev 1504) +++ trunk/archive-access/projects/nutchwax/src/java/overview.html 2007-02-20 21:13:09 UTC (rev 1505) @@ -383,26 +383,17 @@ </a></li> (See the NutchWAX README for details). </ol> <p>Checkout NutchWAX [See <a href="http://sourceforge.net/svn/?group_id=118427">Source Repository</a> for how]. +As the checkout runs, subversion will fetch the version of nutch the NutchWAX trunk is pegged against into +the <code>${NUTCHWAX_HOME}/third-party</code> directory using +the <a href="http://svnbook.red-bean.com/en/1.1/svn-book.html#svn-ch-7-sect-3">svn:externals</a> mechanism. </p> -<p>Make a symbolic link under NutchWAX to your Nutch checkout: -<pre> - % ln -s ${NUTCH_HOME} ${NUTCHWAX_HOME}/nutch -orojects/nutch/project.xml.r1445: <connection>scm:cvs:pserver:ano...@ar...:/cvsroot/archive-access:archive-access/projects/nutch</connection> -</pre> </p> -<p>Build Nutch: -orojects/nutch/project.xml.mine: http://sourceforge.net/mailarchive/forum.php?forum=archive-access-cvs +<p>To build NutchWAX and its nutch dependency, run the default 'all' target: <pre> - % cd ${NUTCH_HOME} - % ant jar war -</pre> -</p> -<p>To build NutchWAX, do the same: -<pre> % cd ${NUTCHWAX_HOME} - % ant jar war - % cd ${NUTCHWAX_HOME} + % ant all </pre> +This will generate the NutchWAX jar and war. 
</p> <p>To build the NutchWAX site or distribution, run maven: <pre> Modified: trunk/archive-access/projects/nutchwax/src/plugin/build-plugin.xml =================================================================== --- trunk/archive-access/projects/nutchwax/src/plugin/build-plugin.xml 2007-02-17 01:06:39 UTC (rev 1504) +++ trunk/archive-access/projects/nutchwax/src/plugin/build-plugin.xml 2007-02-20 21:13:09 UTC (rev 1505) @@ -16,7 +16,12 @@ <property file="${user.home}/$(name}.build.properties" /> <property file="${root}/build.properties" /> + <!--Point at nutchwax home instead of at nutch. + --> <property name="nutch.root" location="${root}/../../../"/> + <!--Point at nutch under third-party subdir. + --> + <property name="real.nutch.root" location="${nutch.root}/third-party/nutch"/> <property name="src.dir" location="${root}/src/java"/> <property name="src.test" location="${root}/src/test"/> @@ -50,11 +55,11 @@ <include name="*.jar" /> </fileset> <!--IA: Add the nutch jars.--> - <fileset dir="${nutch.root}/nutch/lib"> + <fileset dir="${real.nutch.root}/lib"> <include name="*.jar" /> </fileset> <!--IA: Add nutch classes.--> - <pathelement location="${nutch.root}/nutch/build/classes"/> + <pathelement location="${real.nutch.root}/build/classes"/> </path> <!-- the unit test classpath --> This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <sta...@us...> - 2007-02-21 16:19:28
Revision: 1507 http://archive-access.svn.sourceforge.net/archive-access/?rev=1507&view=rev Author: stack-sf Date: 2007-02-21 08:19:09 -0800 (Wed, 21 Feb 2007) Log Message: ----------- Move to nutch revision 508238 (from 492357). Includes move to hadoop 0.10.1. * . Update svn:externals. - third-party/nutch -r 492357 http://svn.apache.org/repos/asf/lucene/nutch/trunk + third-party/nutch -r 508238 http://svn.apache.org/repos/asf/lucene/nutch/trunk * src/java/org/archive/access/nutch/Nutchwax.java (invert): Add 'force removal of locks' to signature. * src/java/org/archive/access/nutch/NutchwaxIndexer.java Call parents indexer mapper. * src/java/org/archive/access/nutch/ImportArcs.java Put collection name from command line into job conf. * src/java/org/archive/access/nutch/NutchwaxLinkDb.java Add in lock handling for linkdb from parent. Revision Links: -------------- http://archive-access.svn.sourceforge.net/archive-access/?rev=508238&view=rev Modified Paths: -------------- trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/ImportArcs.java trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/Nutchwax.java trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/NutchwaxIndexer.java trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/NutchwaxLinkDb.java Property Changed: ---------------- trunk/archive-access/projects/nutchwax/ Property changes on: trunk/archive-access/projects/nutchwax ___________________________________________________________________ Name: svn:externals - third-party/nutch -r 492357 http://svn.apache.org/repos/asf/lucene/nutch/trunk + third-party/nutch -r 508238 http://svn.apache.org/repos/asf/lucene/nutch/trunk Modified: trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/ImportArcs.java =================================================================== --- trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/ImportArcs.java 
2007-02-20 22:30:08 UTC (rev 1506) +++ trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/ImportArcs.java 2007-02-21 16:19:09 UTC (rev 1507) @@ -245,6 +245,8 @@ this.filters = new URLFilters(job); this.parseUtil = new ParseUtil(job); + + this.collectionName = job.get(ImportArcs.WAX_SUFFIX + ImportArcs.ARCCOLLECTION_KEY); } public void onARCOpen() { @@ -878,4 +880,4 @@ return -1; } } -} \ No newline at end of file +} Modified: trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/Nutchwax.java =================================================================== --- trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/Nutchwax.java 2007-02-20 22:30:08 UTC (rev 1506) +++ trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/Nutchwax.java 2007-02-21 16:19:09 UTC (rev 1507) @@ -216,14 +216,14 @@ throws IOException { createLinkdb(od); new NutchwaxLinkDb(getJobConf()). - invert(od.getLinkDb(), segments, true, true); + invert(od.getLinkDb(), segments, true, true, false); } protected void doInvert(final OutputDirectories od) throws IOException { LOG.info("inverting links in " + od.getSegments()); new NutchwaxLinkDb(getJobConf()). 
- invert(od.getLinkDb(), getSegments(od), true, true); + invert(od.getLinkDb(), getSegments(od), true, true, false); } protected boolean createLinkdb(final OutputDirectories od) Modified: trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/NutchwaxIndexer.java =================================================================== --- trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/NutchwaxIndexer.java 2007-02-20 22:30:08 UTC (rev 1506) +++ trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/NutchwaxIndexer.java 2007-02-21 16:19:09 UTC (rev 1507) @@ -83,6 +83,7 @@ job.addInputPath(new Path(linkDb, LinkDb.CURRENT_NAME)); job.setInputFormat(SequenceFileInputFormat.class); + job.setMapperClass(Indexer.class); job.setReducerClass(NutchwaxIndexer.class); job.setOutputPath(indexDir); Modified: trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/NutchwaxLinkDb.java =================================================================== --- trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/NutchwaxLinkDb.java 2007-02-20 22:30:08 UTC (rev 1506) +++ trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/NutchwaxLinkDb.java 2007-02-21 16:19:09 UTC (rev 1507) @@ -26,6 +26,7 @@ import org.apache.nutch.net.URLNormalizers; import org.apache.nutch.parse.Outlink; import org.apache.nutch.parse.ParseData; +import org.apache.nutch.util.LockUtil; import org.apache.nutch.util.NutchJob; /** @@ -150,8 +151,12 @@ } public void invert(Path linkDb, final Path[] segments, - final boolean normalize, final boolean filter) + final boolean normalize, final boolean filter, boolean force) throws IOException { + Path lock = new Path(linkDb, LOCK_NAME); + FileSystem fs = FileSystem.get(getConf()); + LockUtil.createLockFile(fs, lock, force); + Path currentLinkDb = new Path(linkDb, CURRENT_NAME); if (LOG.isInfoEnabled()) { LOG.info("NutchwaxLinkDb: starting"); 
LOG.info("NutchwaxLinkDb: linkdb: " + linkDb); @@ -159,16 +164,19 @@ LOG.info("LinkDb: URL filter: " + filter); } JobConf job = createJob(getConf(), linkDb, normalize, filter); - for (int i = 0; i < segments.length; i++) { if (LOG.isInfoEnabled()) { LOG.info("LinkDb: adding segment: " + segments[i]); } job.addInputPath(new Path(segments[i], ParseData.DIR_NAME)); } - JobClient.runJob(job); - FileSystem fs = FileSystem.get(getConf()); - if (fs.exists(linkDb)) { + try { + JobClient.runJob(job); + } catch (IOException e) { + LockUtil.removeLockFile(fs, lock); + throw e; + } + if (fs.exists(currentLinkDb)) { if (LOG.isInfoEnabled()) { LOG.info("LinkDb: merging with existing linkdb: " + linkDb); } @@ -178,9 +186,15 @@ job.setJobName("NutchwaxLinkDb merge " + linkDb + " " + Arrays.asList(segments)); job.setMapperClass(NutchwaxLinkDbFilter.class); - job.addInputPath(new Path(linkDb, CURRENT_NAME)); + job.addInputPath(currentLinkDb); job.addInputPath(newLinkDb); - JobClient.runJob(job); + try { + JobClient.runJob(job); + } catch (IOException e) { + LockUtil.removeLockFile(fs, lock); + fs.delete(newLinkDb); + throw e; + } fs.delete(newLinkDb); } LinkDb.install(job, linkDb); This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
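The NutchwaxLinkDb change in r1507 takes a lock file before running the invert and merge jobs, and removes it again if either job throws. The following is a minimal, self-contained sketch of that pattern only. It assumes plain java.nio.file in place of the Hadoop FileSystem and Nutch LockUtil calls shown in the diff, and the class name LockedJobSketch, the Job interface, and the ".locked" file name are hypothetical. The sketch also releases the lock itself on success; whether the real code does that inside LinkDb.install is not visible in this diff.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LockedJobSketch {

    /** Hypothetical stand-in for a job that may fail with an IOException. */
    public interface Job {
        void run() throws IOException;
    }

    /**
     * Creates the lock file. An existing lock is an error unless force is set,
     * in which case the existing file is accepted as-is (assumed semantics).
     */
    public static void createLockFile(Path lock, boolean force) throws IOException {
        if (Files.exists(lock)) {
            if (!force) {
                throw new IOException("lock file " + lock + " already exists");
            }
            return; // forced: carry on with the lock that is already there
        }
        Files.createFile(lock);
    }

    /** Takes the lock, runs the job, and removes the lock on failure or success. */
    public static void runLocked(Path dir, boolean force, Job job) throws IOException {
        Path lock = dir.resolve(".locked"); // hypothetical lock file name
        createLockFile(lock, force);
        try {
            job.run();
        } catch (IOException e) {
            // Mirror the diff: clean up the lock before rethrowing.
            Files.deleteIfExists(lock);
            throw e;
        }
        // Assumption: release on success here, since there is no install step.
        Files.deleteIfExists(lock);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("linkdb");
        runLocked(dir, false, () -> System.out.println("job ran"));
        try {
            runLocked(dir, false, () -> { throw new IOException("job failed"); });
        } catch (IOException e) {
            System.out.println("lock left behind: " + Files.exists(dir.resolve(".locked")));
        }
    }
}
```

Passing force as false, as the two Nutchwax.java call sites in this revision do, means a leftover lock from an earlier crashed run makes the next invert fail fast rather than run against a linkdb that may be in use.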
From: <sta...@us...> - 2007-02-22 22:32:16
|
Revision: 1512 http://archive-access.svn.sourceforge.net/archive-access/?rev=1512&view=rev Author: stack-sf Date: 2007-02-22 14:32:13 -0800 (Thu, 22 Feb 2007) Log Message: ----------- Add first cut at m2 pom for nutchwax. Just delegates to ant build.xml * src/plugin/build-plugin.xml * build.xml * maven.xml Build into target instead of into build. * pom.xml m2 pom for nutchwax. site, docbook, sourceforge update still todo. * project.properties * project.xml We moved to new nutch (and new hadoop) a few days ago. Modified Paths: -------------- trunk/archive-access/projects/nutchwax/build.xml trunk/archive-access/projects/nutchwax/maven.xml trunk/archive-access/projects/nutchwax/project.properties trunk/archive-access/projects/nutchwax/project.xml trunk/archive-access/projects/nutchwax/src/plugin/build-plugin.xml Added Paths: ----------- trunk/archive-access/projects/nutchwax/pom.xml Modified: trunk/archive-access/projects/nutchwax/build.xml =================================================================== --- trunk/archive-access/projects/nutchwax/build.xml 2007-02-22 20:09:13 UTC (rev 1511) +++ trunk/archive-access/projects/nutchwax/build.xml 2007-02-22 22:32:13 UTC (rev 1512) @@ -18,7 +18,7 @@ <property name="conf.dir" location="${root}/conf"/> - <property name="build.dir" location="${root}/build"/> + <property name="build.dir" location="${root}/target"/> <property name="build.classes" location="${build.dir}/classes"/> <property name="build.test" location="${build.dir}/test"/> @@ -57,17 +57,17 @@ <path refid="classpath"/> </path> - <target name="third.party.jar"> + <target name="third.party.jar" description="Build third-party jars"> <echo message="Building nutch third-party dependency (jar)" /> - <ant dir="third-party/nutch" target="jar" inheritAll="false"/> + <ant dir="third-party/nutch" target="jar" inheritAll="false" /> </target> - <target name="third.party.war"> + <target name="third.party.war" description="Build third-party wars"> <echo message="Building 
nutch third-party dependency (war)" /> - <ant dir="third-party/nutch" target="war" inheritAll="false"/> + <ant dir="third-party/nutch" target="war" inheritAll="false" /> </target> - <target name="third.party.clean"> + <target name="third.party.clean" description="Clean third-party software"> <echo message="Cleaning nutch third-party dependency" /> - <ant dir="third-party/nutch" target="clean" inheritAll="false"/> + <ant dir="third-party/nutch" target="clean" inheritAll="false" /> </target> <!-- ====================================================== --> @@ -273,7 +273,7 @@ </lib> <!--Copy into place the nutchwax classes.--> <zipfileset prefix="WEB-INF/classes" - dir="${root}/build/classes/" /> + dir="${build.dir}/classes/" /> <!--Be selective about plugins to copy. Shrinks size of webapp. --> @@ -294,7 +294,7 @@ <include name="urlfilter-*/**" /> </zipfileset> <zipfileset prefix="WEB-INF/classes/plugins" - dir="${root}/build/wax-plugins"/> + dir="${build.dir}/wax-plugins"/> <webinf dir="${nutch.root}/lib"> <include name="taglibs-*.tld"/> </webinf> Modified: trunk/archive-access/projects/nutchwax/maven.xml =================================================================== --- trunk/archive-access/projects/nutchwax/maven.xml 2007-02-22 20:09:13 UTC (rev 1511) +++ trunk/archive-access/projects/nutchwax/maven.xml 2007-02-22 22:32:13 UTC (rev 1512) @@ -63,10 +63,11 @@ <attainGoal name="ant:clean" /> --> </preGoal> + <goal name="jar:jar"><!--Block building of jar--></goal> <postGoal name="dist:build-setup"> - <ant:available file="${basedir}/build/nutchwax.jar" + <ant:available file="${basedir}/target/nutchwax.jar" property="job.jar.exists"/> <ant:fail message="Must run ant 'jar' and 'war' targets before maven dist" @@ -86,9 +87,8 @@ filtering="true" overwrite="true" > <fileset dir="${basedir}/bin" /> </copy> - <!--Copy over war and jar made by the ant build.--> <copy todir="${maven.dist.bin.assembly.dir}"> - <fileset dir="${basedir}/build/"> + <fileset 
dir="${basedir}/target/"> <include name="nutchwax.war"/> <include name="nutchwax.jar"/> </fileset> Added: trunk/archive-access/projects/nutchwax/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/pom.xml (rev 0) +++ trunk/archive-access/projects/nutchwax/pom.xml 2007-02-22 22:32:13 UTC (rev 1512) @@ -0,0 +1,171 @@ +<?xml version="1.0"?> +<!-- + POM reference: http://maven.apache.org/pom.html + + List of the better articles on maven: + + http://www.javaworld.com/javaworld/jw-05-2006/jw-0529-maven.html + http://www.javaworld.com/javaworld/jw-02-2006/jw-0227-maven_p.html + + URLs on converting from 1.0 to 2.0 maven (not much good generally): + + http://wiki.osafoundation.org/bin/view/Journal/Maven2Upgrade + http://maven.apache.org/guides/mini/guide-m1-m2.html + --> +<project xmlns="http://maven.apache.org/POM/4.0.0" + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> + + <modelVersion>4.0.0</modelVersion> + <groupId>org.archive.access</groupId> + <artifactId>nutchwax</artifactId> + <version>0.11-0-SNAPSHOT</version> + <packaging>pom</packaging> + + <name>NutchWAX</name> + <description>NutchWAX is + <i> + "<a href="http://nutch.org">Nutch</a> Web Archive eXtensions" + </i>. Nutch + NutchWAX can be used search Web Archive Collections + (WACs). Extensions include adaptation of the Nutch fetcher step to go + against web archives rather than open net. Index-time and + query-time plugins + add to the index and allow querying of a records' WAC + location info., collection name, etc. This project is sponsored by the + <a href="http://netpreserve.org">International Internet Preservation + Consortium</a>. 
+ </description> + <url>http://archive-access.sourceforge.net/projects/nutchwax/</url> + <inceptionYear>2005</inceptionYear> + + <licenses> + <license> + <name>GNU LESSER GENERAL PUBLIC LICENSE</name> + <url>http://www.gnu.org/licenses/lgpl.txt</url> + <distribution>repo</distribution> + </license> + </licenses> + + <organization> + <name>Internet Archive</name> + <url>http://www.archive.org/</url> + </organization> + + <issueManagement> + <system>SourceForge</system> + <url>http://sourceforge.net/tracker/?group_id=118427</url> + </issueManagement> + <ciManagement> + <system>cruisecontrol</system> + <url>http://builds.archive.org:8080/cruisecontrol/</url> + </ciManagement> + <mailingLists> + <mailingList> + <name>Archive Access ARC Tools Discussion List</name> + <subscribe> + http://lists.sourceforge.net/lists/listinfo/archive-access-discuss + </subscribe> + <unsubscribe> + http://lists.sourceforge.net/lists/listinfo/archive-access-discuss + </unsubscribe> + <post>archive-access-discuss</post> + <archive> + http://sourceforge.net/mailarchive/forum.php?forum_id=45842 + </archive> + </mailingList> + <mailingList> + <name>Archive Access ARC Tools Commits</name> + <subscribe> + https://lists.sourceforge.net/lists/listinfo/archive-access-cvs + </subscribe> + <unsubscribe> + https://lists.sourceforge.net/lists/listinfo/archive-access-cvs + </unsubscribe> + <post>archive-access-cvs</post> + <archive> + http://sourceforge.net/mailarchive/forum.php?forum=archive-access-cvs + </archive> + </mailingList> + </mailingLists> + <scm> + <connection>scm:svn:https://archive-access.svn.sourceforge.net/svnroot/archive-access/projects/nutchwax</connection> + <tag>HEAD</tag> + <url>https://archive-access.svn.sourceforge.net/svnroot/archive-access/projects/nutchwax</url> + </scm> + + <prerequisites> + <maven>2.0.4</maven> + </prerequisites> + + + <dependencies> + <dependency> + <groupId>junit</groupId> + <artifactId>junit</artifactId> + <version>3.8.1</version> + <scope>test</scope> + 
</dependency> + + <dependency> + <groupId>org.archive</groupId> + <artifactId>archive-commons</artifactId> + <version>1.11.0-SNAPSHOT</version> + </dependency> + + + </dependencies> + <build> + <plugins> + + <plugin> + <artifactId>maven-antrun-plugin</artifactId> + <executions> + <execution > + <id>antrun.compile</id> + <phase>compile</phase> + <configuration> + <tasks> + <echo>Compiling third.party dependencies and nutchwax</echo> + <!--From http://www.mail-archive.com/us...@ma.../msg60131.html --> + <property name="build.compiler" value="extJavac"/> + <ant target="third.party.jar"/> + <ant target="compile"/> + </tasks> + </configuration> + <goals> + <goal>run</goal> + </goals> + </execution> + <execution > + <id>antrun.package</id> + <phase>package</phase> + <configuration> + <tasks> + <echo>Assembling JAR and WAR targets</echo> + <ant target="jar"/> + <ant target="war"/> + </tasks> + </configuration> + <goals> + <goal>run</goal> + </goals> + </execution> + <execution > + <id>antrun.clean</id> + <phase>clean</phase> + <configuration> + <tasks> + <echo>Cleaning nutchwax</echo> + <ant target="clean-all"/> + </tasks> + </configuration> + <goals> + <goal>run</goal> + </goals> + </execution> + </executions> + </plugin> + </plugins> + </build> + +</project> Modified: trunk/archive-access/projects/nutchwax/project.properties =================================================================== --- trunk/archive-access/projects/nutchwax/project.properties 2007-02-22 20:09:13 UTC (rev 1511) +++ trunk/archive-access/projects/nutchwax/project.properties 2007-02-22 22:32:13 UTC (rev 1512) @@ -18,7 +18,7 @@ # Local jars to add to classpath. 
maven.jar.override = on maven.jar.corenutch = ${basedir}/third-party/nutch/build/nutch-0.9-dev.jar -maven.jar.hadoop = ${basedir}/third-party/nutch/lib/hadoop-0.9.2.jar +maven.jar.hadoop = ${basedir}/third-party/nutch/lib/hadoop-0.10.1-core.jar maven.jar.archive-commons = ${basedir}/lib/archive-commons-1.11.0-200702160009.jar maven.jar.wayback = ${basedir}/lib/wayback-0.9.0-200702150450.jar maven.jar.servlet-api = ${basedir}/third-party/nutch/lib/servlet-api.jar Modified: trunk/archive-access/projects/nutchwax/project.xml =================================================================== --- trunk/archive-access/projects/nutchwax/project.xml 2007-02-22 20:09:13 UTC (rev 1511) +++ trunk/archive-access/projects/nutchwax/project.xml 2007-02-22 22:32:13 UTC (rev 1512) @@ -175,7 +175,7 @@ </dependency> <dependency> <id>hadoop</id> - <version>0.9.2</version> + <version>0.10.1</version> <url>http://lucene.apache.org/hadoop</url> <properties> <war.bundle>true</war.bundle> Modified: trunk/archive-access/projects/nutchwax/src/plugin/build-plugin.xml =================================================================== --- trunk/archive-access/projects/nutchwax/src/plugin/build-plugin.xml 2007-02-22 20:09:13 UTC (rev 1511) +++ trunk/archive-access/projects/nutchwax/src/plugin/build-plugin.xml 2007-02-22 22:32:13 UTC (rev 1512) @@ -30,11 +30,11 @@ <property name="conf.dir" location="${nutch.root}/conf"/> - <property name="build.dir" location="${nutch.root}/build/${name}"/> + <property name="build.dir" location="${nutch.root}/target/${name}"/> <property name="build.classes" location="${build.dir}/classes"/> <property name="build.test" location="${build.dir}/test"/> - <property name="deploy.dir" location="${nutch.root}/build/wax-plugins/${name}"/> + <property name="deploy.dir" location="${nutch.root}/target//wax-plugins/${name}"/> <property name="javac.deprecation" value="off"/> <property name="javac.debug" value="on"/> @@ -50,7 +50,7 @@ <path id="classpath"> <pathelement 
location="${build.classes}"/> <fileset refid="lib.jars"/> - <pathelement location="${nutch.root}/build/classes"/> + <pathelement location="${nutch.root}/target/classes"/> <fileset dir="${nutch.root}/lib"> <include name="*.jar" /> </fileset> @@ -65,10 +65,10 @@ <!-- the unit test classpath --> <path id="test.classpath"> <pathelement location="${build.test}" /> - <pathelement location="${nutch.root}/build/test/classes"/> + <pathelement location="${nutch.root}/target/test/classes"/> <pathelement location="${nutch.root}/src/test"/> <pathelement location="${conf.dir}"/> - <pathelement location="${nutch.root}/build"/> + <pathelement location="${nutch.root}/target"/> <path refid="classpath"/> </path>
From: <sta...@us...> - 2007-02-23 00:38:14
|
Revision: 1513 http://archive-access.svn.sourceforge.net/archive-access/?rev=1513&view=rev Author: stack-sf Date: 2007-02-22 16:38:13 -0800 (Thu, 22 Feb 2007) Log Message: ----------- A nutchwax/src/main A nutchwax/src/main/assembly A nutchwax/src/main/assembly/src-distribution.xml A nutchwax/src/main/assembly/bin-distribution.xml A nutchwax/src/main/filters A nutchwax/src/main/filters/filter.properties M nutchwax/pom.xml M nutchwax/build.xml Add assembler descriptors for src and bin. Modified Paths: -------------- trunk/archive-access/projects/nutchwax/build.xml trunk/archive-access/projects/nutchwax/pom.xml Added Paths: ----------- trunk/archive-access/projects/nutchwax/src/main/ trunk/archive-access/projects/nutchwax/src/main/assembly/ trunk/archive-access/projects/nutchwax/src/main/assembly/bin-distribution.xml trunk/archive-access/projects/nutchwax/src/main/assembly/src-distribution.xml trunk/archive-access/projects/nutchwax/src/main/filters/ trunk/archive-access/projects/nutchwax/src/main/filters/filter.properties Modified: trunk/archive-access/projects/nutchwax/build.xml =================================================================== --- trunk/archive-access/projects/nutchwax/build.xml 2007-02-22 22:32:13 UTC (rev 1512) +++ trunk/archive-access/projects/nutchwax/build.xml 2007-02-23 00:38:13 UTC (rev 1513) @@ -59,15 +59,21 @@ <target name="third.party.jar" description="Build third-party jars"> <echo message="Building nutch third-party dependency (jar)" /> - <ant dir="third-party/nutch" target="jar" inheritAll="false" /> + <ant dir="third-party/nutch" target="jar" inheritAll="false" > + <property name="build.compiler" value="extJavac" /> + </ant> </target> <target name="third.party.war" description="Build third-party wars"> <echo message="Building nutch third-party dependency (war)" /> - <ant dir="third-party/nutch" target="war" inheritAll="false" /> + <ant dir="third-party/nutch" target="war" inheritAll="false" > + <property name="build.compiler" 
value="extJavac" /> + </ant> </target> <target name="third.party.clean" description="Clean third-party software"> <echo message="Cleaning nutch third-party dependency" /> - <ant dir="third-party/nutch" target="clean" inheritAll="false" /> + <ant dir="third-party/nutch" target="clean" inheritAll="false" > + <property name="build.compiler" value="extJavac" /> + </ant> </target> <!-- ====================================================== --> @@ -102,7 +108,9 @@ <!-- ====================================================== --> <target name="compile-plugins" description="Compile all nutchwax plugins"> - <ant dir="src/plugin" target="deploy" inheritAll="false"/> + <ant dir="src/plugin" target="deploy" inheritAll="false"> + <property name="build.compiler" value="extJavac" /> + </ant> </target> <!-- ================================================================== --> Modified: trunk/archive-access/projects/nutchwax/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/pom.xml 2007-02-22 22:32:13 UTC (rev 1512) +++ trunk/archive-access/projects/nutchwax/pom.xml 2007-02-23 00:38:13 UTC (rev 1513) @@ -118,6 +118,21 @@ <plugins> <plugin> + <artifactId>maven-assembly-plugin</artifactId> + + <configuration> + <filters> + <filter>src/main/filters/filter.properties</filter> + </filters> + <descriptors> + <descriptor>src/main/assembly/bin-distribution.xml</descriptor> + <descriptor>src/main/assembly/src-distribution.xml</descriptor> + </descriptors> + </configuration> + + </plugin> + + <plugin> <artifactId>maven-antrun-plugin</artifactId> <executions> <execution > @@ -126,10 +141,7 @@ <configuration> <tasks> <echo>Compiling third.party dependencies and nutchwax</echo> - <!--From http://www.mail-archive.com/us...@ma.../msg60131.html --> - <property name="build.compiler" value="extJavac"/> <ant target="third.party.jar"/> - <ant target="compile"/> </tasks> </configuration> <goals> @@ -142,6 +154,7 @@ <configuration> 
<tasks> <echo>Assembling JAR and WAR targets</echo> + <ant target="compile"/> <ant target="jar"/> <ant target="war"/> </tasks> Added: trunk/archive-access/projects/nutchwax/src/main/assembly/bin-distribution.xml =================================================================== --- trunk/archive-access/projects/nutchwax/src/main/assembly/bin-distribution.xml (rev 0) +++ trunk/archive-access/projects/nutchwax/src/main/assembly/bin-distribution.xml 2007-02-23 00:38:13 UTC (rev 1513) @@ -0,0 +1,29 @@ +<assembly> + <id>bin</id> + <formats> + <format>tar.gz</format> + <format>zip</format> + </formats> + <fileSets> + <fileSet> + <includes> + <include>*.txt</include> + </includes> + </fileSet> + <fileSet> + <directory>bin</directory> + <fileMode>0744</fileMode> + </fileSet> + <fileSet> + <directory>target</directory> + <outputDirectory /> + <includes> + <include>nutchwax*.jar</include> + <include>nutchwax*.war</include> + </includes> + </fileSet> + <fileSet> + <directory>target/docs</directory> + </fileSet> + </fileSets> +</assembly> Added: trunk/archive-access/projects/nutchwax/src/main/assembly/src-distribution.xml =================================================================== --- trunk/archive-access/projects/nutchwax/src/main/assembly/src-distribution.xml (rev 0) +++ trunk/archive-access/projects/nutchwax/src/main/assembly/src-distribution.xml 2007-02-23 00:38:13 UTC (rev 1513) @@ -0,0 +1,38 @@ +<assembly> + <id>src</id> + <formats> + <format>tar.gz</format> + <format>zip</format> + </formats> + <fileSets> + <fileSet> + <includes> + <include>*.txt</include> + <include>pom.xml</include> + <include>build.xml</include> + </includes> + </fileSet> + <fileSet> + <directory>bin</directory> + <includes> + <include>**/**</include> + </includes> + <fileMode>0744</fileMode> + </fileSet> + <fileSet> + <directory>src</directory> + </fileSet> + <fileSet> + <directory>conf</directory> + </fileSet> + <fileSet> + <directory>xdocs</directory> + </fileSet> + <fileSet> + 
<directory>third-party</directory> + <excludes> + <exclude>**/build/**</exclude> + </excludes> + </fileSet> + </fileSets> +</assembly> Added: trunk/archive-access/projects/nutchwax/src/main/filters/filter.properties =================================================================== --- trunk/archive-access/projects/nutchwax/src/main/filters/filter.properties (rev 0) +++ trunk/archive-access/projects/nutchwax/src/main/filters/filter.properties 2007-02-23 00:38:13 UTC (rev 1513) @@ -0,0 +1,2 @@ +variable1=value1 +variable2=value2
From: <sta...@us...> - 2007-02-27 01:57:12
|
Revision: 1516 http://archive-access.svn.sourceforge.net/archive-access/?rev=1516&view=rev Author: stack-sf Date: 2007-02-26 17:57:11 -0800 (Mon, 26 Feb 2007) Log Message: ----------- * src/site/site.xml Added m2 site.xml. Needs work. * pom.xml Upped m2 requirement. Point at IA repository. * build.xml Added a build nutch plugins target. Made jar and war depend on it. Modified Paths: -------------- trunk/archive-access/projects/nutchwax/build.xml trunk/archive-access/projects/nutchwax/pom.xml Added Paths: ----------- trunk/archive-access/projects/nutchwax/src/site/ trunk/archive-access/projects/nutchwax/src/site/site.xml Modified: trunk/archive-access/projects/nutchwax/build.xml =================================================================== --- trunk/archive-access/projects/nutchwax/build.xml 2007-02-27 00:03:41 UTC (rev 1515) +++ trunk/archive-access/projects/nutchwax/build.xml 2007-02-27 01:57:11 UTC (rev 1516) @@ -57,13 +57,21 @@ <path refid="classpath"/> </path> - <target name="third.party.jar" description="Build third-party jars"> + <target name="third.party.plugins" description="Build third-party plugins"> + <echo message="Building nutch third-party dependency (plugins)" /> + <ant dir="third-party/nutch" target="compile-plugins" inheritAll="false" > + <property name="build.compiler" value="extJavac" /> + </ant> + </target> + <target name="third.party.jar" description="Build third-party jars" + depends="third.party.plugins"> <echo message="Building nutch third-party dependency (jar)" /> <ant dir="third-party/nutch" target="jar" inheritAll="false" > <property name="build.compiler" value="extJavac" /> </ant> </target> - <target name="third.party.war" description="Build third-party wars"> + <target name="third.party.war" description="Build third-party wars" + depends="third.party.plugins"> <echo message="Building nutch third-party dependency (war)" /> <ant dir="third-party/nutch" target="war" inheritAll="false" > <property name="build.compiler" 
value="extJavac" /> @@ -90,6 +98,7 @@ <!-- ====================================================== --> <target name="compile" depends="init" description="Compile nutchwax classes"> + <property name="build.compiler" value="extJavac" /> <javac encoding="${build.encoding}" srcdir="${src.dir}" Modified: trunk/archive-access/projects/nutchwax/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/pom.xml 2007-02-27 00:03:41 UTC (rev 1515) +++ trunk/archive-access/projects/nutchwax/pom.xml 2007-02-27 01:57:11 UTC (rev 1516) @@ -87,6 +87,7 @@ </archive> </mailingList> </mailingLists> + <scm> <connection>scm:svn:https://archive-access.svn.sourceforge.net/svnroot/archive-access/projects/nutchwax</connection> <tag>HEAD</tag> @@ -94,11 +95,12 @@ </scm> <prerequisites> - <maven>2.0.4</maven> + <maven>2.0.5</maven> </prerequisites> <dependencies> + <dependency> <groupId>junit</groupId> <artifactId>junit</artifactId> @@ -112,14 +114,18 @@ <version>1.11.0-SNAPSHOT</version> </dependency> - </dependencies> <build> <plugins> + <plugin> + <artifactId>maven-site-plugin</artifactId> + <configuration> + <xdocDirectory>${basedir}/xdocs</xdocDirectory> + </configuration> + </plugin> <plugin> <artifactId>maven-assembly-plugin</artifactId> - <configuration> <filters> <filter>src/main/filters/filter.properties</filter> @@ -129,7 +135,6 @@ <descriptor>src/main/assembly/src-distribution.xml</descriptor> </descriptors> </configuration> - </plugin> <plugin> @@ -181,4 +186,23 @@ </plugins> </build> + <repositories> + <repository> + <releases> + <enabled>true</enabled> + <updatePolicy>always</updatePolicy> + <checksumPolicy>warn</checksumPolicy> + </releases> + <snapshots> + <enabled>true</enabled> + <updatePolicy>never</updatePolicy> + <checksumPolicy>fail</checksumPolicy> + </snapshots> + <id>internetarchive</id> + <name>Internet Archive Maven Repository</name> + <url>http://builds.archive.org:8080/maven2</url> + 
<layout>default</layout> + </repository> + </repositories> + </project> Added: trunk/archive-access/projects/nutchwax/src/site/site.xml =================================================================== --- trunk/archive-access/projects/nutchwax/src/site/site.xml (rev 0) +++ trunk/archive-access/projects/nutchwax/src/site/site.xml 2007-02-27 01:57:11 UTC (rev 1516) @@ -0,0 +1,38 @@ +<project name="nutchwax"> + +<bannerLeft> + <name>NutchWAXXXXX</name> + <href>http://maven.apache.org/</href> + <src>http://maven.apache.org/images/apache-maven archive_logo.png</src> +</bannerLeft> + +<bannerRight> + <name>NutchWAXProficio</name> + <src>http://maven.apache.org/images/apache-maven project.png</src> +</bannerRight> + +<skin> + <groupId>org.archive</groupId> + <artifactId>maven-skin</artifactId> + <version>1.0-SNAPSHOT</version> +</skin> + +<publishDate format="dd MMM yyyy" /> +<body> + <links> + <item name="Apache" href="http://www.apache.org/"/> + <item name="Maven" href="http://maven.apache.org/"/> + <item name="Continuum" href="http://maven.apache.org/continuum"/> + </links> + <head> + <meta name="faq" content="proficio"/> + </head> + <menu name="Quick Links"> + <item name="Features" href="/maven-features.html"/> + </menu> + <menu name="About Proficio"> + <item name="What is Proficio?" href="/what-is-maven.html"/> + </menu> + ${reports} +</body> +</project>
From: <sta...@us...> - 2007-02-27 16:05:31
|
Revision: 1518 http://archive-access.svn.sourceforge.net/archive-access/?rev=1518&view=rev Author: stack-sf Date: 2007-02-27 07:57:53 -0800 (Tue, 27 Feb 2007) Log Message: ----------- * src/site/site.xml * pom.xml Developing out the nutchwax site using new m2 toolset. Modified Paths: -------------- trunk/archive-access/projects/nutchwax/pom.xml trunk/archive-access/projects/nutchwax/src/site/site.xml Added Paths: ----------- trunk/archive-access/projects/nutchwax/src/site/resources/ trunk/archive-access/projects/nutchwax/src/site/resources/css/ trunk/archive-access/projects/nutchwax/src/site/resources/images/ trunk/archive-access/projects/nutchwax/src/site/resources/images/ia_logo.gif trunk/archive-access/projects/nutchwax/src/site/resources/images/iipc.gif trunk/archive-access/projects/nutchwax/src/site/resources/images/nutchwax.jpg trunk/archive-access/projects/nutchwax/src/site/resources/images/nwa.jpg Modified: trunk/archive-access/projects/nutchwax/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/pom.xml 2007-02-27 02:01:35 UTC (rev 1517) +++ trunk/archive-access/projects/nutchwax/pom.xml 2007-02-27 15:57:53 UTC (rev 1518) @@ -119,9 +119,11 @@ <plugins> <plugin> <artifactId>maven-site-plugin</artifactId> - <configuration> - <xdocDirectory>${basedir}/xdocs</xdocDirectory> - </configuration> + <configuration > + <xdocDirectory> + ${basedir}/xdocs + </xdocDirectory> + </configuration > </plugin> <plugin> @@ -185,6 +187,28 @@ </plugin> </plugins> </build> + <reporting> + <plugins> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-project-info-reports-plugin</artifactId> + <reportSets> + <reportSet> + <reports> + <report>dependencies</report> + <report>project-team</report> + <report>mailing-list</report> + <report>cim</report> + <report>issue-tracking</report> + <report>license</report> + <report>scm</report> + <report>javadoc</report> + </reports> + </reportSet> + 
</reportSets> + </plugin> + </plugins> + </reporting> <repositories> <repository> Added: trunk/archive-access/projects/nutchwax/src/site/resources/images/ia_logo.gif =================================================================== (Binary files differ) Property changes on: trunk/archive-access/projects/nutchwax/src/site/resources/images/ia_logo.gif ___________________________________________________________________ Name: svn:mime-type + application/octet-stream Added: trunk/archive-access/projects/nutchwax/src/site/resources/images/iipc.gif =================================================================== (Binary files differ) Property changes on: trunk/archive-access/projects/nutchwax/src/site/resources/images/iipc.gif ___________________________________________________________________ Name: svn:mime-type + application/octet-stream Added: trunk/archive-access/projects/nutchwax/src/site/resources/images/nutchwax.jpg =================================================================== (Binary files differ) Property changes on: trunk/archive-access/projects/nutchwax/src/site/resources/images/nutchwax.jpg ___________________________________________________________________ Name: svn:mime-type + application/octet-stream Added: trunk/archive-access/projects/nutchwax/src/site/resources/images/nwa.jpg =================================================================== (Binary files differ) Property changes on: trunk/archive-access/projects/nutchwax/src/site/resources/images/nwa.jpg ___________________________________________________________________ Name: svn:mime-type + application/octet-stream Modified: trunk/archive-access/projects/nutchwax/src/site/site.xml =================================================================== --- trunk/archive-access/projects/nutchwax/src/site/site.xml 2007-02-27 02:01:35 UTC (rev 1517) +++ trunk/archive-access/projects/nutchwax/src/site/site.xml 2007-02-27 15:57:53 UTC (rev 1518) @@ -1,14 +1,16 @@ +<?xml version="1.0" 
encoding="ISO-8859-1"?> <project name="nutchwax"> <bannerLeft> - <name>NutchWAXXXXX</name> - <href>http://maven.apache.org/</href> - <src>http://maven.apache.org/images/apache-maven archive_logo.png</src> + <name>Internet Archive</name> + <src>images/ia_logo.gif</src> + <href>http://www.archive.org/</href> </bannerLeft> <bannerRight> - <name>NutchWAXProficio</name> - <src>http://maven.apache.org/images/apache-maven project.png</src> + <name>NutchWAX</name> + <src>images/nutchwax.jpg</src> + <href >http://archive-access.sf.net/projects/nutchwax/</href> </bannerRight> <skin> @@ -18,21 +20,26 @@ </skin> <publishDate format="dd MMM yyyy" /> + <body> + <links> - <item name="Apache" href="http://www.apache.org/"/> - <item name="Maven" href="http://maven.apache.org/"/> - <item name="Continuum" href="http://maven.apache.org/continuum"/> + <item name="Sourceforge" href="http://archive-access.sf.net"/> + <item name="Heritrix" href="http://crawler.archive.org"/> </links> - <head> - <meta name="faq" content="proficio"/> - </head> - <menu name="Quick Links"> - <item name="Features" href="/maven-features.html"/> - </menu> - <menu name="About Proficio"> - <item name="What is Proficio?" href="/what-is-maven.html"/> - </menu> + + <menu name="NutchWAX"> + <item name="Downloads" href="downloads.html"/> + <item name="Getting Started" href="apidocs/overview-summary.html#toc"/> + <item name="User Query-time Help" href="help-queries.html"/> + <item name="FAQ" href="faq.html"/> + <item name="Building from Source" href="apidocs/overview-summary.html#src"/> + <item name="Regression Test Suite" href="regress.html"/> + <item name="Praxis" href="practices.html"/> + <item name="Wayback-NutchWAX" href="wayback.html"/> + </menu> + ${reports} + </body> </project>
From: <sta...@us...> - 2007-02-27 18:01:32
|
Revision: 1521 http://archive-access.svn.sourceforge.net/archive-access/?rev=1521&view=rev Author: stack-sf Date: 2007-02-27 10:01:29 -0800 (Tue, 27 Feb 2007) Log Message: ----------- Add dependency on new archive-mapred jar. * src/java/org/archive/access/nutch/ImportArcs.java Package for mapreduce classes changes when we add dependency on archive-mapred jar. * src/java/org/archive/access/nutch/mapred/ARCReporter.java * src/java/org/archive/access/nutch/mapred/ARCRecordMapper.java * src/java/org/archive/access/nutch/mapred/ARCMapRunner.java Removed. Import from archive-mapred jar instead. * .classpath Update with new archive-mapred. Change path to nutch jar. * project.xml * project.properties Update with new archive-mapred * lib/archive-mapred-0.1.0-20070227.175246-2.jar Added. Modified Paths: -------------- trunk/archive-access/projects/nutchwax/.classpath trunk/archive-access/projects/nutchwax/project.properties trunk/archive-access/projects/nutchwax/project.xml trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/ImportArcs.java Added Paths: ----------- trunk/archive-access/projects/nutchwax/lib/archive-mapred-0.1.0-20070227.175246-2.jar Removed Paths: ------------- trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/mapred/ARCMapRunner.java trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/mapred/ARCRecordMapper.java trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/mapred/ARCReporter.java Modified: trunk/archive-access/projects/nutchwax/.classpath =================================================================== --- trunk/archive-access/projects/nutchwax/.classpath 2007-02-27 17:46:30 UTC (rev 1520) +++ trunk/archive-access/projects/nutchwax/.classpath 2007-02-27 18:01:29 UTC (rev 1521) @@ -14,8 +14,9 @@ <classpathentry kind="lib" path="/nutch/lib/commons-logging-1.0.4.jar"/> <classpathentry kind="lib" path="/nutch/lib/junit-3.8.1.jar"/> <classpathentry kind="lib" 
path="/nutch/conf"/> - <classpathentry kind="lib" path="lib/wayback-0.9.0-200702150450.jar" /> - <classpathentry kind="lib" path="/nutch/build"/> - <classpathentry kind="lib" path="build"/> + <classpathentry kind="lib" path="lib/wayback-0.9.0-200702150450.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build"/> + <classpathentry combineaccessrules="false" kind="src" path="/hadoop"/> + <classpathentry kind="lib" path="lib/archive-mapred-0.1.0-20070227.175246-2.jar"/> <classpathentry kind="output" path="target"/> </classpath> Added: trunk/archive-access/projects/nutchwax/lib/archive-mapred-0.1.0-20070227.175246-2.jar =================================================================== (Binary files differ) Property changes on: trunk/archive-access/projects/nutchwax/lib/archive-mapred-0.1.0-20070227.175246-2.jar ___________________________________________________________________ Name: svn:mime-type + application/octet-stream Modified: trunk/archive-access/projects/nutchwax/project.properties =================================================================== --- trunk/archive-access/projects/nutchwax/project.properties 2007-02-27 17:46:30 UTC (rev 1520) +++ trunk/archive-access/projects/nutchwax/project.properties 2007-02-27 18:01:29 UTC (rev 1521) @@ -20,6 +20,7 @@ maven.jar.corenutch = ${basedir}/third-party/nutch/build/nutch-0.9-dev.jar maven.jar.hadoop = ${basedir}/third-party/nutch/lib/hadoop-0.10.1-core.jar maven.jar.archive-commons = ${basedir}/lib/archive-commons-1.11.0-200702160009.jar +maven.jar.archive-mapred = ${basedir}/lib/archive-mapred-0.1.0-20070227.175246-2.jar maven.jar.wayback = ${basedir}/lib/wayback-0.9.0-200702150450.jar maven.jar.servlet-api = ${basedir}/third-party/nutch/lib/servlet-api.jar maven.jar.commons-codec = ${basedir}/lib/commons-codec-1.3.jar Modified: trunk/archive-access/projects/nutchwax/project.xml =================================================================== --- trunk/archive-access/projects/nutchwax/project.xml 
2007-02-27 17:46:30 UTC (rev 1520) +++ trunk/archive-access/projects/nutchwax/project.xml 2007-02-27 18:01:29 UTC (rev 1521) @@ -273,6 +273,17 @@ </properties> </dependency> <dependency> + <id>archive-mapred</id> + <version>0.1.0-SNAPSHOT</version> + <url>http://archive-access.sf.net/projects/mapred/</url> + <properties> + <war.bundle>true</war.bundle> + <description>Archive mapreduce classes. + </description> + <license>LGPL</license> + </properties> + </dependency> + <dependency> <id>wayback</id> <version>0.9.0</version> <url>http://builds.archive.org:8080/cruisecontrol/buildresults/HEAD-archive-access</url> Modified: trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/ImportArcs.java =================================================================== --- trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/ImportArcs.java 2007-02-27 17:46:30 UTC (rev 1520) +++ trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/ImportArcs.java 2007-02-27 18:01:29 UTC (rev 1521) @@ -82,11 +82,11 @@ import org.apache.nutch.util.mime.MimeType; import org.apache.nutch.util.mime.MimeTypeException; import org.apache.nutch.util.mime.MimeTypes; -import org.archive.access.nutch.mapred.ARCMapRunner; -import org.archive.access.nutch.mapred.ARCRecordMapper; -import org.archive.access.nutch.mapred.ARCReporter; import org.archive.io.arc.ARCRecord; import org.archive.io.arc.ARCRecordMetaData; +import org.archive.mapred.ARCMapRunner; +import org.archive.mapred.ARCRecordMapper; +import org.archive.mapred.ARCReporter; import org.archive.util.Base32; import org.archive.util.MimetypeUtils; import org.archive.util.TextUtils; Deleted: trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/mapred/ARCMapRunner.java =================================================================== --- trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/mapred/ARCMapRunner.java 2007-02-27 17:46:30 UTC (rev 1520) +++ 
trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/mapred/ARCMapRunner.java 2007-02-27 18:01:29 UTC (rev 1521) @@ -1,263 +0,0 @@ -/* - * $Id: ImportArcs.java 1494 2007-02-15 17:47:58Z stack-sf $ - * - * Copyright (C) 2007 Internet Archive. - * - * This file is part of the archive-access tools project - * (http://sourceforge.net/projects/archive-access). - * - * The archive-access tools are free software; you can redistribute them and/or - * modify them under the terms of the GNU Lesser Public License as published by - * the Free Software Foundation; either version 2.1 of the License, or any - * later version. - * - * The archive-access tools are distributed in the hope that they will be - * useful, but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser - * Public License for more details. - * - * You should have received a copy of the GNU Lesser Public License along with - * the archive-access tools; if not, write to the Free Software Foundation, - * Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - */ -package org.archive.access.nutch.mapred; - -import java.io.IOException; -import java.util.Iterator; - -import org.apache.commons.logging.Log; -import org.apache.commons.logging.LogFactory; -import org.apache.hadoop.io.ObjectWritable; -import org.apache.hadoop.io.Text; -import org.apache.hadoop.io.Writable; -import org.apache.hadoop.io.WritableComparable; -import org.apache.hadoop.mapred.JobConf; -import org.apache.hadoop.mapred.MapRunnable; -import org.apache.hadoop.mapred.OutputCollector; -import org.apache.hadoop.mapred.RecordReader; -import org.apache.hadoop.mapred.Reporter; -import org.apache.hadoop.util.ReflectionUtils; -import org.archive.io.ArchiveReader; -import org.archive.io.ArchiveReaderFactory; -import org.archive.io.arc.ARCConstants; -import org.archive.io.arc.ARCRecord; - -/** - * MapRunner that passes an ARCRecord to configured mapper. 
- * Configured mapper must be implementation of {@link ARCMapRunner}. - * @author stack - */ -public class ARCMapRunner implements MapRunnable { - public final Log LOG = LogFactory.getLog(this.getClass().getName()); - private ARCRecordMapper mapper; - - /** - * How long to spend indexing. - */ - private long maxtime; - - - public void configure(JobConf job) { - this.mapper = (ARCRecordMapper)ReflectionUtils. - newInstance(job.getMapperClass(), job); - // Value is in minutes. - this.maxtime = job.getLong("wax.index.timeout", 60) * 60 * 1000; - } - - public void run(RecordReader input, OutputCollector output, - Reporter reporter) - throws IOException { - try { - WritableComparable key = input.createKey(); // Unused. - Writable value = input.createValue(); - while (input.next(key, value)) { - doArc(value.toString(), output, new ARCReporter(reporter)); - } - } finally { - this.mapper.close(); - } - } - - protected void doArc(final String arcurl, final OutputCollector output, - final ARCReporter reporter) - throws IOException { - if ((arcurl == null) || arcurl.endsWith("work")) { - reporter.setStatus("skipping " + arcurl, true); - return; - } - - // Set off indexing in a thread so I can cover it with a timer. - final Thread t = new IndexingThread(arcurl, output, reporter); - t.setDaemon(true); - t.start(); - final long start = System.currentTimeMillis(); - try { - for (long period = this.maxtime; t.isAlive() && (period > 0); - period = this.maxtime - (System.currentTimeMillis() - start)) { - try { - t.join(period); - } catch (final InterruptedException e) { - e.printStackTrace(); - } - } - } finally { - cleanup(t, reporter); - } - } - - protected void cleanup(final Thread t, final ARCReporter reporter) - throws IOException { - if (!t.isAlive()) { - return; - } - reporter.setStatus("Killing indexing thread " + t.getName(), true); - t.interrupt(); - try { - // Give it some time to die. 
- t.join(1000); - } catch (final InterruptedException e) { - e.printStackTrace(); - } - if (t.isAlive()) { - LOG.info(t.getName() + " will not die"); - } - } - - private class IndexingThread extends Thread { - private final String arcLocation; - private final OutputCollector output; - private final ARCReporter reporter; - - public IndexingThread(final String arcloc, final OutputCollector o, - final ARCReporter r) { - // Name this thread same as ARC location. - super(arcloc); - this.arcLocation = arcloc; - this.output = o; - this.reporter = r; - } - - /** - * @return Null if fails download. - */ - protected ArchiveReader getArchiveReader() { - ArchiveReader arc = null; - // Need a thread that will keep updating TaskTracker during long - // downloads else tasktracker will kill us. - Thread reportingDuringDownload = null; - try { - this.reporter.setStatus("opening " + this.arcLocation, true); - reportingDuringDownload = new Thread("reportDuringDownload") { - public void run() { - while (!this.isInterrupted()) { - try { - synchronized (this) { - sleep(1000 * 60); // Sleep a minute. - } - reporter.setStatus("downloading " + - arcLocation); - } catch (final IOException e) { - e.printStackTrace(); - // No point hanging around if we're failing - // status. - break; - } catch (final InterruptedException e) { - // Interrupt flag is cleared. Just fall out. 
- break; - } - } - } - }; - reportingDuringDownload.setDaemon(true); - reportingDuringDownload.start(); - arc = ArchiveReaderFactory.get(this.arcLocation); - } catch (final Throwable e) { - try { - final String msg = "Error opening " + this.arcLocation - + ": " + e.toString(); - this.reporter.setStatus(msg, true); - LOG.info(msg); - } catch (final IOException ioe) { - LOG.warn(this.arcLocation, ioe); - } - } finally { - if ((reportingDuringDownload != null) - && reportingDuringDownload.isAlive()) { - reportingDuringDownload.interrupt(); - } - } - return arc; - } - - public void run() { - if (this.arcLocation == null || this.arcLocation.length() <= 0) { - return; - } - ArchiveReader arc = getArchiveReader(); - if (arc == null) { - return; - } - - try { - ARCMapRunner.this.mapper.onARCOpen(); - - // Iterate over each ARCRecord. - for (final Iterator i = arc.iterator(); - i.hasNext() && !currentThread().isInterrupted();) { - final ARCRecord rec = (ARCRecord)i.next(); - - - try { - ARCMapRunner.this.mapper.map( - new Text(rec.getMetaData().getUrl()), - new ObjectWritable(rec), this.output, - this.reporter); - - final long b = rec.getMetaData().getContentBegin(); - final long l = rec.getMetaData().getLength(); - final long recordLength = (l > b)? (l - b): l; - if (recordLength > - ARCConstants.DEFAULT_MAX_ARC_FILE_SIZE) { - // Now, if the content length is larger than a - // standard ARC, then it is most likely the last - // record in the ARC because ARC is closed after we - // exceed 100MB (DEFAULT_MAX_ARC...). Calling - // hasNext above will make us read through the - // whole record, even if its a 1.7G video. On a - // loaded machine, this might cause us timeout with - // tasktracker -- so, just skip out here. - this.reporter.setStatus("skipping " + - this.arcLocation + " -- very long record " + - rec.getMetaData()); - break; - } - } catch (final Throwable e) { - // Failed parse of record. Keep going. 
- LOG.warn("Error processing " + rec.getMetaData(), e); - } - } - if (currentThread().isInterrupted()) { - LOG.info(currentThread().getName() + " interrupted"); - } - this.reporter.setStatus("closing " + this.arcLocation, true); - } catch (final Throwable e) { - // Problem parsing arc file. - final String msg = "Error parsing " + this.arcLocation; - try { - this.reporter.setStatus(msg, true); - } catch (final IOException ioe) { - ioe.printStackTrace(); - } - LOG.warn(msg, e); - } finally { - try { - arc.close(); - ARCMapRunner.this.mapper.onARCClose(); - } catch (final IOException e) { - e.printStackTrace(); - } - } - } - } - -} \ No newline at end of file Deleted: trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/mapred/ARCRecordMapper.java =================================================================== --- trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/mapred/ARCRecordMapper.java 2007-02-27 17:46:30 UTC (rev 1520) +++ trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/mapred/ARCRecordMapper.java 2007-02-27 18:01:29 UTC (rev 1521) @@ -1,49 +0,0 @@ -/* - * $Id: ImportArcs.java 1494 2007-02-15 17:47:58Z stack-sf $ - * - * Copyright (C) 2007 Internet Archive. - * - * This file is part of the archive-access tools project - * (http://sourceforge.net/projects/archive-access). - * - * The archive-access tools are free software; you can redistribute them and/or - * modify them under the terms of the GNU Lesser Public License as published by - * the Free Software Foundation; either version 2.1 of the License, or any - * later version. - * - * The archive-access tools are distributed in the hope that they will be - * useful, but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser - * Public License for more details. 
- * - * You should have received a copy of the GNU Lesser Public License along with - * the archive-access tools; if not, write to the Free Software Foundation, - * Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - */ -package org.archive.access.nutch.mapred; - -import java.io.IOException; - -import org.apache.hadoop.mapred.Mapper; -import org.apache.hadoop.mapred.OutputCollector; -import org.apache.hadoop.mapred.Reporter; -import org.archive.io.arc.ARCRecord; - -/** - * Like {@link Mapper} but adds signaling of ARC open and close. - * @author stack - */ -public interface ARCRecordMapper extends Mapper { - /** - * Called after ARC open but before we call - * {@link #map(String, ARCRecord, OutputCollector, Reporter)} - * @throws IOException - */ - public void onARCOpen() throws IOException; - - /** - * Called on ARC close. - * @throws IOException - */ - public void onARCClose() throws IOException; -} Deleted: trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/mapred/ARCReporter.java =================================================================== --- trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/mapred/ARCReporter.java 2007-02-27 17:46:30 UTC (rev 1520) +++ trunk/archive-access/projects/nutchwax/src/java/org/archive/access/nutch/mapred/ARCReporter.java 2007-02-27 18:01:29 UTC (rev 1521) @@ -1,80 +0,0 @@ -/* - * $Id: ImportArcs.java 1494 2007-02-15 17:47:58Z stack-sf $ - * - * Copyright (C) 2007 Internet Archive. - * - * This file is part of the archive-access tools project - * (http://sourceforge.net/projects/archive-access). - * - * The archive-access tools are free software; you can redistribute them and/or - * modify them under the terms of the GNU Lesser Public License as published by - * the Free Software Foundation; either version 2.1 of the License, or any - * later version. 
- * - * The archive-access tools are distributed in the hope that they will be - * useful, but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser - * Public License for more details. - * - * You should have received a copy of the GNU Lesser Public License along with - * the archive-access tools; if not, write to the Free Software Foundation, - * Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - */ -package org.archive.access.nutch.mapred; - -import java.io.IOException; - -import org.apache.commons.logging.Log; -import org.apache.commons.logging.LogFactory; -import org.apache.hadoop.mapred.Reporter; - -/** - * Reporter that logs all status passed; a combined Reporter and logger. Only - * reports home every so often. - * @author stack - */ -public class ARCReporter implements Reporter { - public final Log LOG = LogFactory.getLog(this.getClass().getName()); - private final Reporter wrappedReporter; - private long nextUpdate = 0; - private long time = System.currentTimeMillis(); - - private static final long FIVE_MINUTES = 1000 * 60 * 5; - - public ARCReporter(final Reporter r) { - this.wrappedReporter = r; - } - - public void setStatus(final String msg) throws IOException { - setStatus(msg, false); - } - - public void setStatus(final String msg, final boolean writeThrough) - throws IOException { - LOG.info(msg); - // Only update tasktracker every second -- not for every record. - long now = System.currentTimeMillis(); - if (writeThrough || now > this.nextUpdate) { - this.wrappedReporter.setStatus(msg); - this.nextUpdate = now + 1000; - this.time = now; - } - } - - /** - * Update reporter if its a long time since last log only. - * @param msg Message to report IF we haven't reported in a long time. 
- * @throws IOException - */ - public void setStatusIfElapse(final String msg) - throws IOException { - long now = System.currentTimeMillis(); - if ((now - this.time) > FIVE_MINUTES) { - setStatus(msg); - } - } - - public void progress() throws IOException { - this.wrappedReporter.progress(); - } -} \ No newline at end of file This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
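For readers following the ARCReporter removal in r1521: the deleted class logged every status message but forwarded it to the wrapped Hadoop Reporter only when the caller asked for write-through or when more than a second had passed since the last forward. The sketch below illustrates just that throttling rule. It is not the archive-mapred implementation: the names ThrottledStatus and ThrottledStatusDemo, the Consumer sink standing in for Reporter.setStatus, and the injected clock are all assumptions made so the example runs without Hadoop on the classpath, and the logging step is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of the throttling rule in the removed ARCReporter:
 * forward a status message at most once per interval, unless the caller
 * requests write-through.
 */
class ThrottledStatus {
    private final Consumer<String> sink;  // assumption: stands in for Reporter.setStatus
    private final LongSupplier clock;     // assumption: injected so no sleeping is needed
    private final long intervalMillis;
    private long nextUpdate = 0;

    ThrottledStatus(Consumer<String> sink, LongSupplier clock, long intervalMillis) {
        this.sink = sink;
        this.clock = clock;
        this.intervalMillis = intervalMillis;
    }

    /** Returns true if the message was forwarded to the sink. */
    boolean setStatus(String msg, boolean writeThrough) {
        long now = clock.getAsLong();
        if (writeThrough || now > nextUpdate) {
            sink.accept(msg);
            nextUpdate = now + intervalMillis;
            return true;
        }
        return false;
    }
}

class ThrottledStatusDemo {
    public static void main(String[] args) {
        List<String> forwarded = new ArrayList<>();
        long[] now = {5000};  // fake clock, in milliseconds
        ThrottledStatus status =
            new ThrottledStatus(forwarded::add, () -> now[0], 1000);

        status.setStatus("opening", false);   // forwarded: first call
        status.setStatus("record 1", false);  // dropped: interval not yet elapsed
        status.setStatus("closing", true);    // forwarded: write-through
        now[0] += 1001;
        status.setStatus("record 2", false);  // forwarded: interval elapsed

        System.out.println(forwarded);  // prints [opening, closing, record 2]
    }
}
```

Injecting the clock is a choice made for this sketch only; the removed class called System.currentTimeMillis() directly and used a fixed 1000 ms interval.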
From: <sta...@us...> - 2007-02-27 20:01:30
|
Revision: 1523 http://archive-access.svn.sourceforge.net/archive-access/?rev=1523&view=rev Author: stack-sf Date: 2007-02-27 12:01:25 -0800 (Tue, 27 Feb 2007) Log Message: ----------- More m2 migration work. * src/site/site.xml Use default skin rather than custom archive one (Lets wait till we want to spend time on styling before we bring in our own skin). Move stuff around in menus so closer to current layout. * pom.xml Add dependencies needed by just-added javadoc report. Modified Paths: -------------- trunk/archive-access/projects/nutchwax/pom.xml trunk/archive-access/projects/nutchwax/src/site/site.xml Modified: trunk/archive-access/projects/nutchwax/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/pom.xml 2007-02-27 19:03:59 UTC (rev 1522) +++ trunk/archive-access/projects/nutchwax/pom.xml 2007-02-27 20:01:25 UTC (rev 1523) @@ -114,6 +114,44 @@ <version>1.11.0-SNAPSHOT</version> </dependency> + <dependency> + <groupId>commons-logging</groupId> + <artifactId>commons-logging</artifactId> + <version>1.0.4</version> + </dependency> + + <dependency> + <groupId>org.apache</groupId> + <artifactId>hadoop</artifactId> + <version>0.10.1-core</version> + </dependency> + + <dependency> + <groupId>org.apache</groupId> + <artifactId>nutch</artifactId> + <version>0.9-dev-508238</version> + </dependency> + + <dependency> + <groupId>servletapi</groupId> + <artifactId>servletapi</artifactId> + <version>2.4</version> + </dependency> + + <dependency> + <groupId>org.archive</groupId> + <artifactId>archive-commons</artifactId> + <!--SNAPSHOT means use latest. 
+ When archive-commons is deployed to the local repository, use: + $ JAVA_HOME=/usr/lib/j2sdk1.5-sun/ bash /0/builds/bin/maven-2.0.5/bin/mvn deploy:deploy-file \ + -Dfile=/tmp/archive-commons-1.11.0-SNAPSHOT.jar -Durl=file:/0/maven2-repository/ \ + -DgroupId=org.archive -DartifactId=archive-commons -Dpackaging=jar -Dversion=1.11.0-SNAPSHOT + --> + <version>1.11.0-SNAPSHOT</version> + </dependency> + + + </dependencies> <build> <plugins> @@ -189,6 +227,15 @@ </build> <reporting> <plugins> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-javadoc-plugin</artifactId> + <configuration> + <javadocDirectory> + ${basedir}/src/java + </javadocDirectory> + </configuration> + </plugin> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-project-info-reports-plugin</artifactId> Modified: trunk/archive-access/projects/nutchwax/src/site/site.xml =================================================================== --- trunk/archive-access/projects/nutchwax/src/site/site.xml 2007-02-27 19:03:59 UTC (rev 1522) +++ trunk/archive-access/projects/nutchwax/src/site/site.xml 2007-02-27 20:01:25 UTC (rev 1523) @@ -1,10 +1,9 @@ <?xml version="1.0" encoding="ISO-8859-1"?> <project name="nutchwax"> +<!--Have nothing showing on LHS--> <bannerLeft> - <name>Internet Archive</name> - <src>images/ia_logo.gif</src> - <href>http://www.archive.org/</href> + <name /> </bannerLeft> <bannerRight> @@ -13,12 +12,6 @@ <href >http://archive-access.sf.net/projects/nutchwax/</href> </bannerRight> -<skin> - <groupId>org.archive</groupId> - <artifactId>maven-skin</artifactId> - <version>1.0-SNAPSHOT</version> -</skin> - <publishDate format="dd MMM yyyy" /> <body> @@ -26,20 +19,26 @@ <links> <item name="Sourceforge" href="http://archive-access.sf.net"/> <item name="Heritrix" href="http://crawler.archive.org"/> + <item name="Home" href="index.html"/> </links> <menu name="NutchWAX"> - <item name="Downloads" href="downloads.html"/> + <item name="Home" 
href="index.html"/> + <item name="Downloads" href="downloads.html" /> <item name="Getting Started" href="apidocs/overview-summary.html#toc"/> - <item name="User Query-time Help" href="help-queries.html"/> - <item name="FAQ" href="faq.html"/> <item name="Building from Source" href="apidocs/overview-summary.html#src"/> + <item name="User Query-time Help" href="help-queries.html"/> <item name="Regression Test Suite" href="regress.html"/> - <item name="Praxis" href="practices.html"/> <item name="Wayback-NutchWAX" href="wayback.html"/> + <item name="Praxis" href="practices.html"/> + <item name="FAQ" href="faq.html"/> </menu> - ${reports} + <!--Its not possible to change the labels used in reports, not yet anyways. + Current ones don't jibe well. Reports are headed 'Project Documentation' + but its only a subset of all documentation. + --> + ${reports} </body> </project> |
From: <sta...@us...> - 2007-02-27 23:42:53
|
Revision: 1524 http://archive-access.svn.sourceforge.net/archive-access/?rev=1524&view=rev Author: stack-sf Date: 2007-02-27 15:42:50 -0800 (Tue, 27 Feb 2007) Log Message: ----------- More m2 building. * src/site/site.xml Comment. * pom.xml Pretty-print. Found docbook plugin. Mapped old config. to new. Made it run as part of site phase. Requires new repository though had to installl plugin into our repo because it can't be found at its home location by mvn. Added in more dependencies to make javadoc build complete. Modified Paths: -------------- trunk/archive-access/projects/nutchwax/pom.xml trunk/archive-access/projects/nutchwax/src/site/site.xml Modified: trunk/archive-access/projects/nutchwax/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/pom.xml 2007-02-27 20:01:25 UTC (rev 1523) +++ trunk/archive-access/projects/nutchwax/pom.xml 2007-02-27 23:42:50 UTC (rev 1524) @@ -12,16 +12,12 @@ http://wiki.osafoundation.org/bin/view/Journal/Maven2Upgrade http://maven.apache.org/guides/mini/guide-m1-m2.html --> -<project xmlns="http://maven.apache.org/POM/4.0.0" - xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" - xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> - +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>org.archive.access</groupId> <artifactId>nutchwax</artifactId> - <version>0.11-0-SNAPSHOT</version> + <version>0.11-0-SNAPSHOT</version> <packaging>pom</packaging> - <name>NutchWAX</name> <description>NutchWAX is <i> @@ -37,20 +33,17 @@ </description> <url>http://archive-access.sourceforge.net/projects/nutchwax/</url> <inceptionYear>2005</inceptionYear> - - <licenses> - <license> - <name>GNU LESSER GENERAL PUBLIC LICENSE</name> - 
<url>http://www.gnu.org/licenses/lgpl.txt</url> - <distribution>repo</distribution> - </license> - </licenses> - + <licenses> + <license> + <name>GNU LESSER GENERAL PUBLIC LICENSE</name> + <url>http://www.gnu.org/licenses/lgpl.txt</url> + <distribution>repo</distribution> + </license> + </licenses> <organization> <name>Internet Archive</name> <url>http://www.archive.org/</url> </organization> - <issueManagement> <system>SourceForge</system> <url>http://sourceforge.net/tracker/?group_id=118427</url> @@ -59,111 +52,162 @@ <system>cruisecontrol</system> <url>http://builds.archive.org:8080/cruisecontrol/</url> </ciManagement> - <mailingLists> - <mailingList> - <name>Archive Access ARC Tools Discussion List</name> - <subscribe> + <mailingLists> + <mailingList> + <name>Archive Access ARC Tools Discussion List</name> + <subscribe> http://lists.sourceforge.net/lists/listinfo/archive-access-discuss </subscribe> - <unsubscribe> + <unsubscribe> http://lists.sourceforge.net/lists/listinfo/archive-access-discuss </unsubscribe> - <post>archive-access-discuss</post> - <archive> + <post>archive-access-discuss</post> + <archive> http://sourceforge.net/mailarchive/forum.php?forum_id=45842 </archive> - </mailingList> - <mailingList> - <name>Archive Access ARC Tools Commits</name> - <subscribe> + </mailingList> + <mailingList> + <name>Archive Access ARC Tools Commits</name> + <subscribe> https://lists.sourceforge.net/lists/listinfo/archive-access-cvs </subscribe> - <unsubscribe> + <unsubscribe> https://lists.sourceforge.net/lists/listinfo/archive-access-cvs </unsubscribe> - <post>archive-access-cvs</post> - <archive> + <post>archive-access-cvs</post> + <archive> http://sourceforge.net/mailarchive/forum.php?forum=archive-access-cvs </archive> - </mailingList> - </mailingLists> - - <scm> - <connection>scm:svn:https://archive-access.svn.sourceforge.net/svnroot/archive-access/projects/nutchwax</connection> - <tag>HEAD</tag> - 
<url>https://archive-access.svn.sourceforge.net/svnroot/archive-access/projects/nutchwax</url> - </scm> - - <prerequisites> - <maven>2.0.5</maven> - </prerequisites> - - + </mailingList> + </mailingLists> + <scm> + <connection>scm:svn:https://archive-access.svn.sourceforge.net/svnroot/archive-access/projects/nutchwax</connection> + <tag>HEAD</tag> + <url>https://archive-access.svn.sourceforge.net/svnroot/archive-access/projects/nutchwax</url> + </scm> + <prerequisites> + <maven>2.0.5</maven> + </prerequisites> <dependencies> - <dependency> <groupId>junit</groupId> <artifactId>junit</artifactId> <version>3.8.1</version> +<!--Needed because we have test code under src/java. <scope>test</scope> + --> </dependency> - - <dependency> - <groupId>org.archive</groupId> - <artifactId>archive-commons</artifactId> - <version>1.11.0-SNAPSHOT</version> - </dependency> - - <dependency> - <groupId>commons-logging</groupId> - <artifactId>commons-logging</artifactId> - <version>1.0.4</version> - </dependency> - - <dependency> - <groupId>org.apache</groupId> - <artifactId>hadoop</artifactId> - <version>0.10.1-core</version> - </dependency> - - <dependency> - <groupId>org.apache</groupId> - <artifactId>nutch</artifactId> - <version>0.9-dev-508238</version> - </dependency> - - <dependency> - <groupId>servletapi</groupId> - <artifactId>servletapi</artifactId> - <version>2.4</version> - </dependency> - - <dependency> - <groupId>org.archive</groupId> - <artifactId>archive-commons</artifactId> - <!--SNAPSHOT means use latest. 
+ <dependency> + <groupId>org.archive</groupId> + <artifactId>archive-commons</artifactId> + <version>1.11.0-SNAPSHOT</version> + </dependency> + <dependency> + <groupId>org.archive</groupId> + <artifactId>wayback</artifactId> + <version>0.9.0-SNAPSHOT</version> + </dependency> + <dependency> + <groupId>commons-logging</groupId> + <artifactId>commons-logging</artifactId> + <version>1.0.4</version> + </dependency> + <dependency> + <groupId>commons-httpclient</groupId> + <artifactId>commons-httpclient</artifactId> + <version>3.0.1</version> + </dependency> + <dependency> + <groupId>commons-cli</groupId> + <artifactId>commons-cli</artifactId> + <version>1.0-beta-2</version> + </dependency> + <dependency> + <groupId>org.apache</groupId> + <artifactId>hadoop</artifactId> + <version>0.10.1-core</version> + </dependency> + <dependency> + <groupId>org.apache</groupId> + <artifactId>nutch</artifactId> + <version>0.9-dev-508238</version> + </dependency> + <dependency> + <groupId>javax.servlet</groupId> + <artifactId>servlet-api</artifactId> + <version>2.4</version> + </dependency> + <dependency> + <groupId>org.archive</groupId> + <artifactId>archive-commons</artifactId> +<!--SNAPSHOT means use latest. 
When archive-commons is deployed to the local repository, use: $ JAVA_HOME=/usr/lib/j2sdk1.5-sun/ bash /0/builds/bin/maven-2.0.5/bin/mvn deploy:deploy-file \ -Dfile=/tmp/archive-commons-1.11.0-SNAPSHOT.jar -Durl=file:/0/maven2-repository/ \ -DgroupId=org.archive -DartifactId=archive-commons -Dpackaging=jar -Dversion=1.11.0-SNAPSHOT --> - <version>1.11.0-SNAPSHOT</version> - </dependency> - - - + <version>1.11.0-SNAPSHOT</version> + </dependency> + <dependency> + <groupId>org.archive</groupId> + <artifactId>archive-mapred</artifactId> + <version>0.1.0-SNAPSHOT</version> + </dependency> </dependencies> <build> <plugins> <plugin> +<!--See here for doc: http://agilejava.com/blog/?p=51 + + Plugin is in our maven repository because can't be + found by mvn given how its currently deployed + in its home repo. + --> + <groupId>com.agilejava.docbkx</groupId> + <artifactId>docbkx-maven-plugin</artifactId> + <version>2.0.3-SNAPSHOT</version> + <executions> + <execution> + <goals> + <goal>generate-html</goal> + </goals> + <phase>site</phase> + </execution> + </executions> + <dependencies> + <dependency> + <groupId> + org.docbook + </groupId> + <artifactId>docbook-xml</artifactId> + <version>4.4</version> + <scope>runtime</scope> + </dependency> + </dependencies> + <configuration> + <includes>**/*.xml</includes> + <sourceDirectory> ${basedir}/src/articles </sourceDirectory> + <targetDirectory> ${project.reporting.outputDirectory}/articles </targetDirectory> + <generateIdAttributes>1</generateIdAttributes> + <sectionAutolabel>1</sectionAutolabel> + <partAutolabel>1</partAutolabel> + <chapterAutolabel>1</chapterAutolabel> + <generateMetaAbstract>1</generateMetaAbstract> + <htmlStylesheet>docbook.css</htmlStylesheet> + <cssDecoration>1</cssDecoration> + <postProcess> + <copy file="src/articles/docbook.css" overwrite="true" todir="${project.reporting.outputDirectory}/articles"/> + </postProcess> + </configuration> + </plugin> + <plugin> <artifactId>maven-site-plugin</artifactId> - 
<configuration > - <xdocDirectory> + <configuration> + <xdocDirectory> ${basedir}/xdocs </xdocDirectory> - </configuration > + </configuration> </plugin> - <plugin> <artifactId>maven-assembly-plugin</artifactId> <configuration> @@ -176,11 +220,10 @@ </descriptors> </configuration> </plugin> - <plugin> <artifactId>maven-antrun-plugin</artifactId> <executions> - <execution > + <execution> <id>antrun.compile</id> <phase>compile</phase> <configuration> @@ -193,7 +236,7 @@ <goal>run</goal> </goals> </execution> - <execution > + <execution> <id>antrun.package</id> <phase>package</phase> <configuration> @@ -208,7 +251,7 @@ <goal>run</goal> </goals> </execution> - <execution > + <execution> <id>antrun.clean</id> <phase>clean</phase> <configuration> @@ -221,42 +264,54 @@ <goal>run</goal> </goals> </execution> + <execution> + <id>antrun.docbook</id> + <phase> + docbook + </phase> + <configuration> + <tasks> + <echo>docbook</echo> + </tasks> + </configuration> + <goals> + <goal>run</goal> + </goals> + </execution> </executions> </plugin> </plugins> </build> - <reporting> - <plugins> + <reporting> + <plugins> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-javadoc-plugin</artifactId> <configuration> - <javadocDirectory> + <javadocDirectory> ${basedir}/src/java </javadocDirectory> </configuration> </plugin> - <plugin> - <groupId>org.apache.maven.plugins</groupId> - <artifactId>maven-project-info-reports-plugin</artifactId> - <reportSets> - <reportSet> - <reports> - <report>dependencies</report> - <report>project-team</report> - <report>mailing-list</report> - <report>cim</report> - <report>issue-tracking</report> - <report>license</report> - <report>scm</report> - <report>javadoc</report> - </reports> - </reportSet> - </reportSets> - </plugin> - </plugins> - </reporting> - + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-project-info-reports-plugin</artifactId> + <reportSets> + <reportSet> + <reports> + 
<report>dependencies</report> + <report>project-team</report> + <report>mailing-list</report> + <report>cim</report> + <report>issue-tracking</report> + <report>license</report> + <report>scm</report> + </reports> + </reportSet> + </reportSets> + </plugin> + </plugins> + </reporting> <repositories> <repository> <releases> @@ -271,9 +326,26 @@ </snapshots> <id>internetarchive</id> <name>Internet Archive Maven Repository</name> - <url>http://builds.archive.org:8080/maven2</url> + <url>http://builds.archive.org:8080/maven2/</url> <layout>default</layout> </repository> +<!--Needed for docbkx plugin dependencies. + --> + <repository> + <releases> + <enabled>true</enabled> + <updatePolicy>always</updatePolicy> + <checksumPolicy>warn</checksumPolicy> + </releases> + <snapshots> + <enabled>true</enabled> + <updatePolicy>never</updatePolicy> + <checksumPolicy>fail</checksumPolicy> + </snapshots> + <id>agilejava</id> + <name>Repository with docbook plugin.</name> + <url>http://agilejava.com/maven/</url> + <layout>default</layout> + </repository> </repositories> - </project> Modified: trunk/archive-access/projects/nutchwax/src/site/site.xml =================================================================== --- trunk/archive-access/projects/nutchwax/src/site/site.xml 2007-02-27 20:01:25 UTC (rev 1523) +++ trunk/archive-access/projects/nutchwax/src/site/site.xml 2007-02-27 23:42:50 UTC (rev 1524) @@ -40,5 +40,8 @@ --> ${reports} + <!--I want to get the sourceforge image in here but not sure how. + --> + </body> </project> This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <sta...@us...> - 2007-02-27 23:59:27
|
Revision: 1525 http://archive-access.svn.sourceforge.net/archive-access/?rev=1525&view=rev Author: stack-sf Date: 2007-02-27 15:59:28 -0800 (Tue, 27 Feb 2007) Log Message: ----------- * pom.xml Do explicit build of plugins in here. * build.xml Don't depend on compile and plugins. Was double-running compile. Modified Paths: -------------- trunk/archive-access/projects/nutchwax/build.xml trunk/archive-access/projects/nutchwax/pom.xml Modified: trunk/archive-access/projects/nutchwax/build.xml =================================================================== --- trunk/archive-access/projects/nutchwax/build.xml 2007-02-27 23:42:50 UTC (rev 1524) +++ trunk/archive-access/projects/nutchwax/build.xml 2007-02-27 23:59:28 UTC (rev 1525) @@ -69,15 +69,13 @@ <property name="build.compiler" value="extJavac" /> </ant> </target> - <target name="third.party.jar" description="Build third-party jars" - depends="third.party.compile,third.party.plugins"> + <target name="third.party.jar" description="Build third-party jars" > <echo message="Building nutch third-party dependency (jar)" /> <ant dir="third-party/nutch" target="jar" inheritAll="false" > <property name="build.compiler" value="extJavac" /> </ant> </target> - <target name="third.party.war" description="Build third-party wars" - depends="third.party.plugins"> + <target name="third.party.war" description="Build third-party wars" > <echo message="Building nutch third-party dependency (war)" /> <ant dir="third-party/nutch" target="war" inheritAll="false" > <property name="build.compiler" value="extJavac" /> Modified: trunk/archive-access/projects/nutchwax/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/pom.xml 2007-02-27 23:42:50 UTC (rev 1524) +++ trunk/archive-access/projects/nutchwax/pom.xml 2007-02-27 23:59:28 UTC (rev 1525) @@ -228,8 +228,10 @@ <phase>compile</phase> <configuration> <tasks> + <!-- Make these conditional so do not run everytime--> 
 <echo>Compiling third.party dependencies and nutchwax</echo> <ant target="third.party.jar"/> + <ant target="third.party.plugins"/> </tasks> </configuration> <goals> |
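The "Make these conditional so do not run everytime" note added to pom.xml in rev 1525 could be handled on the Ant side. The following is only a sketch, not part of the commit: the target name 'third.party.check' and the property 'third.party.uptodate' are invented for illustration, and the jar and source paths are assumptions about the third-party/nutch layout. It relies on Ant's standard uptodate task and the 'unless' attribute of a target.

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch. 'third.party.check' and 'third.party.uptodate'
     are illustrative names; the jar and source paths are assumed. -->
<project name="conditional-sketch" default="third.party.jar" basedir=".">

  <target name="third.party.check">
    <!-- Sets the property only if the jar is newer than every source file. -->
    <uptodate property="third.party.uptodate"
              targetfile="third-party/nutch/build/nutch-0.9-dev.jar">
      <srcfiles dir="third-party/nutch/src/java" includes="**/*.java"/>
    </uptodate>
  </target>

  <!-- 'unless' skips the target body when the property has been set. -->
  <target name="third.party.jar" depends="third.party.check"
          unless="third.party.uptodate"
          description="Build third-party jars">
    <echo message="Building nutch third-party dependency (jar)"/>
    <ant dir="third-party/nutch" target="jar" inheritAll="false">
      <property name="build.compiler" value="extJavac"/>
    </ant>
  </target>
</project>
```

With something like this in place, the antrun execution in the pom could keep calling third.party.jar unchanged; the skip decision would live entirely in build.xml.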
From: <sta...@us...> - 2007-03-16 00:18:48
|
Revision: 1584 http://archive-access.svn.sourceforge.net/archive-access/?rev=1584&view=rev Author: stack-sf Date: 2007-03-15 17:18:49 -0700 (Thu, 15 Mar 2007) Log Message: ----------- M nutchwax/pom.xml Add a repository to deploy to. M nutchwax/build.xml Add a version string. Modified Paths: -------------- trunk/archive-access/projects/nutchwax/build.xml trunk/archive-access/projects/nutchwax/pom.xml Modified: trunk/archive-access/projects/nutchwax/build.xml =================================================================== --- trunk/archive-access/projects/nutchwax/build.xml 2007-03-15 19:10:17 UTC (rev 1583) +++ trunk/archive-access/projects/nutchwax/build.xml 2007-03-16 00:18:49 UTC (rev 1584) @@ -7,6 +7,8 @@ <!--'nutch.root' is pointer at core nutch. Expect to find it in '${basedir}/third-party' named 'nutch'. --> + <!--Keep this aligned with whats in maven2 pom--> + <property name="nutchwax.version" value="-0.11.0-SNAPSHOT"/> <property name="nutch.root" location="${root}/third-party/nutch"/> <property file="${user.home}/.$(name}.build.properties" /> @@ -133,7 +135,7 @@ <!-- ================================================================== --> <target name="jar" depends="compile, compile-plugins" description="Builds nutchwax jobs jar of all tasks to do import, etc." > - <zip destfile="${build.dir}/${name}.jar"> + <zip destfile="${build.dir}/${name}${nutchwax.version}.jar"> <zipfileset prefix="META-INF" file="${conf.dir}/MANIFEST.MF"/> <zipfileset file="${conf.dir}/log4j.properties"/> <zipfileset file="${conf.dir}/wax-parse-plugins.xml"/> @@ -255,7 +257,7 @@ <!--Copy our nutchwax nutch-site.xml template into the build dir as nutch-site.xml. Then in the below, add it into the WEB-INF/classes dir. 
 --> - <war destfile="${build.dir}/${name}.war" webxml="${this.web}/web.xml"> + <war destfile="${build.dir}/${name}${nutchwax.version}.war" webxml="${this.web}/web.xml"> <fileset dir="${nutch.web}/jsp"> <exclude name="**/search.jsp"/> <exclude name="**/web.xml"/> Modified: trunk/archive-access/projects/nutchwax/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/pom.xml 2007-03-15 19:10:17 UTC (rev 1583) +++ trunk/archive-access/projects/nutchwax/pom.xml 2007-03-16 00:18:49 UTC (rev 1584) @@ -383,6 +383,12 @@ </pluginRepositories> <distributionManagement> + <repository> + <id>repository</id> + <name>Repository</name> + <!--Pass as command-line system property to maven--> + <url>${repository.url}</url> + </repository> <site> <id>website</id> <name>Website</name> |
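Rev 1584 hard-codes nutchwax.version in build.xml with a reminder to keep it aligned with the pom. One way to avoid the duplication, sketched here as an assumption rather than anything the commit does, is to let the antrun execution hand the Maven version to Ant. A property nested inside the ant task behaves like a command-line property and should override the value declared in build.xml.

```xml
<!-- Hypothetical sketch, not part of the commit: pass the Maven project
     version down to Ant instead of repeating it in build.xml. -->
<plugin>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <id>antrun.package</id>
      <phase>package</phase>
      <configuration>
        <tasks>
          <ant target="jar">
            <!-- Leading '-' matches the "-0.11.0-SNAPSHOT" form that
                 build.xml expects in ${name}${nutchwax.version}.jar -->
            <property name="nutchwax.version" value="-${project.version}"/>
          </ant>
        </tasks>
      </configuration>
      <goals>
        <goal>run</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

The default in build.xml would still apply when Ant is run directly, so a standalone ant build keeps working.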
From: <sta...@us...> - 2007-03-16 06:12:51
|
Revision: 1585 http://archive-access.svn.sourceforge.net/archive-access/?rev=1585&view=rev Author: stack-sf Date: 2007-03-15 23:12:50 -0700 (Thu, 15 Mar 2007) Log Message: ----------- * src/main/assembly/src-distribution.xml * src/main/assembly/bin-distribution.xml Removed. * src/main/assembly/distribution.xml Added in place of above two assemblies. * nutchwax-war/src/main/assembly/placeholder.xml * nutchwax-job/src/main/assembly/placeholder.xml A do near-nothing assembly whose product gets overwritten (see comment at head of nutchwax-job pom.xml). * nutchwax-war/pom.xml Added. * pom.xml Added two modules, one for job jar and one for war. Added a dependency management. * build.xml Renamed the outputs to nutchwax-job.jar from nutchwax.jar and nutchwax-webapp.war from nutchwax.war. The old names were better but m2 won't let me have names like this in repository (All artifactids must be unique). * nutchwax-job/pom.xml Note on my hack to get nutchwax job jar and war installed into repository. * nutchwax/nutchwax-war * nutchwax/nutchwax-job Added. 
Modified Paths: -------------- trunk/archive-access/projects/nutchwax/build.xml trunk/archive-access/projects/nutchwax/pom.xml Added Paths: ----------- trunk/archive-access/projects/nutchwax/nutchwax-job/ trunk/archive-access/projects/nutchwax/nutchwax-job/pom.xml trunk/archive-access/projects/nutchwax/nutchwax-job/src/ trunk/archive-access/projects/nutchwax/nutchwax-job/src/main/ trunk/archive-access/projects/nutchwax/nutchwax-job/src/main/assembly/ trunk/archive-access/projects/nutchwax/nutchwax-job/src/main/assembly/placeholder.xml trunk/archive-access/projects/nutchwax/nutchwax-war/ trunk/archive-access/projects/nutchwax/nutchwax-war/pom.xml trunk/archive-access/projects/nutchwax/nutchwax-war/src/ trunk/archive-access/projects/nutchwax/nutchwax-war/src/main/ trunk/archive-access/projects/nutchwax/nutchwax-war/src/main/assembly/ trunk/archive-access/projects/nutchwax/nutchwax-war/src/main/assembly/placeholder.xml trunk/archive-access/projects/nutchwax/src/main/assembly/distribution.xml Removed Paths: ------------- trunk/archive-access/projects/nutchwax/src/main/assembly/bin-distribution.xml trunk/archive-access/projects/nutchwax/src/main/assembly/src-distribution.xml Modified: trunk/archive-access/projects/nutchwax/build.xml =================================================================== --- trunk/archive-access/projects/nutchwax/build.xml 2007-03-16 00:18:49 UTC (rev 1584) +++ trunk/archive-access/projects/nutchwax/build.xml 2007-03-16 06:12:50 UTC (rev 1585) @@ -135,7 +135,7 @@ <!-- ================================================================== --> <target name="jar" depends="compile, compile-plugins" description="Builds nutchwax jobs jar of all tasks to do import, etc." 
> - <zip destfile="${build.dir}/${name}${nutchwax.version}.jar"> + <zip destfile="${build.dir}/${name}-job${nutchwax.version}.jar"> <zipfileset prefix="META-INF" file="${conf.dir}/MANIFEST.MF"/> <zipfileset file="${conf.dir}/log4j.properties"/> <zipfileset file="${conf.dir}/wax-parse-plugins.xml"/> @@ -257,7 +257,7 @@ <!--Copy our nutchwax nutch-site.xml template into the build dir as nutch-site.xml. Then in the below, add it into the WEB-INF/classes dir. --> - <war destfile="${build.dir}/${name}${nutchwax.version}.war" webxml="${this.web}/web.xml"> + <war destfile="${build.dir}/${name}-webapp${nutchwax.version}.war" webxml="${this.web}/web.xml"> <fileset dir="${nutch.web}/jsp"> <exclude name="**/search.jsp"/> <exclude name="**/web.xml"/> Added: trunk/archive-access/projects/nutchwax/nutchwax-job/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-job/pom.xml (rev 0) +++ trunk/archive-access/projects/nutchwax/nutchwax-job/pom.xml 2007-03-16 06:12:50 UTC (rev 1585) @@ -0,0 +1,150 @@ +<?xml version="1.0"?> +<!-- + Warning!!!! HACK Alert!!!! + + I want m2 to use the product of the old ant builds and + have m2 install the ant built nutchwax jar and war into the + maven2 repository. I do not want to rewrite and duplicate + the ant build.xml in m2 especially as its awkward with its + first needing to build nutch first. + + Maven has only a fixed set of target packages: war, jar, etc., + and a pom can only build one such target package. Its the product + of these targets that gets installed into the repository. There is + no way to have m2 substitute something other than the result of the + running of the war, jar, etc. packaging that I can see. But there + is a means of 'attach'ing something so it will get installed alongside + the pom target. + + So, to get our ant product installed into the repository, the + following was done. 
+ + We set the packaging target for this pom to be 'pom' ('pom' is a + special target used as parent of submodules; if no submodules + 'nothing' is build but a pom file. Also, the pom target package + must be different from any assembly used or trouble). We then + used the maven-assembly-plugin and 'attach'ed an assembly to the + 'package' phase. Unfortunately, You cannot have two assemblies + in the one pom, one for assembly time and another 'attached' to + the package goal, at least not with each mention referring to + different assemblies (i.e. one building the distribution, the + other the jar or war). So, I had to make submodules of nutchwax + to hold the 'attach' assemblies, one for this job jar and another + for the war file. In each submodule, we attach the build of an + empty jar (or war over in the nutchwax-war submodule) to the + package goal below. This builds a jar named + nutchwax-job-VERSION.jar. We have also attached to the package + goal AFTER the assembly, the copying from the parent project of the + nutchwax-job-VERSION.jar down to overwrite what was written by the + just-previous 'attach' assembly. + + The overwrite makes it so the jar built by the parent gets installed + into the repository and not the one built by the assembly step. + + Ugly but it works. + + I tried making this submodule build with ant scripts but maven calls + the parent with an amended basedir, one that points into parent dir. + This makes it awkward; have to have a build.xml in both locations with + same targets. 
+ + POM reference: http://maven.apache.org/pom.html + + List of the better articles on maven: + + http://www.javaworld.com/javaworld/jw-05-2006/jw-0529-maven.html + http://www.javaworld.com/javaworld/jw-02-2006/jw-0227-maven_p.html + + URLs on converting from 1.0 to 2.0 maven (not much good generally): + + http://wiki.osafoundation.org/bin/view/Journal/Maven2Upgrade + http://maven.apache.org/guides/mini/guide-m1-m2.html + --> +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> +<parent> + <groupId>org.archive</groupId> + <artifactId>nutchwax</artifactId> + <version>0.11.0-SNAPSHOT</version> +</parent> + <modelVersion>4.0.0</modelVersion> + <groupId>org.archive.nutchwax</groupId> + <artifactId>nutchwax-job</artifactId> + <packaging>pom</packaging> + <name>NutchWAX Job Jar</name> + <build> + <plugins> + <plugin> + <!-- NOTE: We don't need a groupId specification because the group is + org.apache.maven.plugins ...which is assumed by default. + --> + <artifactId>maven-assembly-plugin</artifactId> + <configuration> + <descriptors> + <descriptor> + src/main/assembly/placeholder.xml + </descriptor> + </descriptors> + <appendAssemblyId> + false + </appendAssemblyId> + </configuration> + <executions> + <execution> + <phase>package</phase> + <goals> + <goal>attached</goal> + </goals> + </execution> + </executions> + </plugin> + <plugin> + <artifactId>maven-antrun-plugin</artifactId> + <executions> + <execution> + <id>antrun.compile</id> + <phase>compile</phase> + <configuration> + <tasks> + <!-- Make these conditional so do not run everytime--> + <echo>Compiling third.party dependencies and nutchwax</echo> + <ant dir=".." target="third.party.jar"/> + <ant dir=".." 
target="third.party.plugins"/> + </tasks> + </configuration> + <goals> + <goal>run</goal> + </goals> + </execution> + <execution> + <id>antrun.package</id> + <phase>package</phase> + <configuration> + <tasks> + <echo>Assembling Job JAR</echo> + <ant dir=".." target="compile"/> + <ant dir=".." target="jar"/> + <copy file="../target/nutchwax-job-${project.version}.jar" overwrite="true" + verbose="true" tofile="target/${project.artifactId}-${project.version}.jar" /> + </tasks> + </configuration> + <goals> + <goal>run</goal> + </goals> + </execution> + <execution> + <id>antrun.clean</id> + <phase>clean</phase> + <configuration> + <tasks> + <ant dir=".." target="clean-all"/> + </tasks> + </configuration> + <goals> + <goal>run</goal> + </goals> + </execution> + </executions> + </plugin> + </plugins> + </build> +</project> Added: trunk/archive-access/projects/nutchwax/nutchwax-job/src/main/assembly/placeholder.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-job/src/main/assembly/placeholder.xml (rev 0) +++ trunk/archive-access/projects/nutchwax/nutchwax-job/src/main/assembly/placeholder.xml 2007-03-16 06:12:50 UTC (rev 1585) @@ -0,0 +1,6 @@ +<assembly> + <id>placeholder</id> + <formats> + <format>jar</format> + </formats> +</assembly> Added: trunk/archive-access/projects/nutchwax/nutchwax-war/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-war/pom.xml (rev 0) +++ trunk/archive-access/projects/nutchwax/nutchwax-war/pom.xml 2007-03-16 06:12:50 UTC (rev 1585) @@ -0,0 +1,84 @@ +<?xml version="1.0"?> +<!-- + POM reference: http://maven.apache.org/pom.html + + List of the better articles on maven: + + http://www.javaworld.com/javaworld/jw-05-2006/jw-0529-maven.html + http://www.javaworld.com/javaworld/jw-02-2006/jw-0227-maven_p.html + + URLs on converting from 1.0 to 2.0 maven (not much good generally): + + 
http://wiki.osafoundation.org/bin/view/Journal/Maven2Upgrade + http://maven.apache.org/guides/mini/guide-m1-m2.html + --> +<project xmlns="http://maven.apache.org/POM/4.0.0" + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 + http://maven.apache.org/maven-v4_0_0.xsd"> + <parent> + <groupId>org.archive</groupId> + <artifactId>nutchwax</artifactId> + <version>0.11.0-SNAPSHOT</version> + </parent> + <modelVersion>4.0.0</modelVersion> + <groupId>org.archive.nutchwax</groupId> + <artifactId>nutchwax-war</artifactId> + <packaging>pom</packaging> + <name>NutchWAX Webapp</name> + <build> + <plugins> + <plugin> + <!-- NOTE: We don't need a groupId specification because the group is + org.apache.maven.plugins ...which is assumed by default. + --> + <artifactId>maven-assembly-plugin</artifactId> + <configuration> + <descriptors> + <descriptor> + src/main/assembly/placeholder.xml + </descriptor> + </descriptors> + <appendAssemblyId> + false + </appendAssemblyId> + </configuration> + <executions> + <execution> + <phase>package</phase> + <goals> + <goal>attached</goal> + </goals> + </execution> + </executions> + </plugin> + <plugin> + <artifactId>maven-antrun-plugin</artifactId> + <executions> + <execution> + <id>antrun.package</id> + <phase>package</phase> + <configuration> + <tasks> + <echo>Assembling Job JAR</echo> + <ant dir=".." 
target="war"/> + <copy file="../target/nutchwax-webapp-${project.version}.war" overwrite="true" + verbose="true" tofile="target/${project.artifactId}-${project.version}.war" /> + </tasks> + </configuration> + <goals> + <goal>run</goal> + </goals> + </execution> + </executions> + </plugin> + </plugins> + </build> + <dependencies> + <dependency> + <groupId>org.archive.nutchwax</groupId> + <artifactId>nutchwax-job</artifactId> + <scope>compile</scope> + </dependency> + </dependencies> +</project> Added: trunk/archive-access/projects/nutchwax/nutchwax-war/src/main/assembly/placeholder.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-war/src/main/assembly/placeholder.xml (rev 0) +++ trunk/archive-access/projects/nutchwax/nutchwax-war/src/main/assembly/placeholder.xml 2007-03-16 06:12:50 UTC (rev 1585) @@ -0,0 +1,6 @@ +<assembly> + <id>placeholder</id> + <formats> + <format>war</format> + </formats> +</assembly> Modified: trunk/archive-access/projects/nutchwax/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/pom.xml 2007-03-16 00:18:49 UTC (rev 1584) +++ trunk/archive-access/projects/nutchwax/pom.xml 2007-03-16 06:12:50 UTC (rev 1585) @@ -254,59 +254,10 @@ <filter>src/main/filters/filter.properties</filter> </filters> <descriptors> - <descriptor>src/main/assembly/bin-distribution.xml</descriptor> - <descriptor>src/main/assembly/src-distribution.xml</descriptor> + <descriptor>src/main/assembly/distribution.xml</descriptor> </descriptors> </configuration> </plugin> - <plugin> - <artifactId>maven-antrun-plugin</artifactId> - <executions> - <execution> - <id>antrun.compile</id> - <phase>compile</phase> - <configuration> - <tasks> - <!-- Make these conditional so do not run everytime--> - <echo>Compiling third.party dependencies and nutchwax</echo> - <ant target="third.party.jar"/> - <ant target="third.party.plugins"/> - </tasks> - 
</configuration> - <goals> - <goal>run</goal> - </goals> - </execution> - <execution> - <id>antrun.package</id> - <phase>package</phase> - <configuration> - <tasks> - <echo>Assembling JAR and WAR targets</echo> - <ant target="compile"/> - <ant target="jar"/> - <ant target="war"/> - </tasks> - </configuration> - <goals> - <goal>run</goal> - </goals> - </execution> - <execution> - <id>antrun.clean</id> - <phase>clean</phase> - <configuration> - <tasks> - <echo>Cleaning nutchwax</echo> - <ant target="clean-all"/> - </tasks> - </configuration> - <goals> - <goal>run</goal> - </goals> - </execution> - </executions> - </plugin> </plugins> </build> <reporting> @@ -396,4 +347,27 @@ <url>${website.url}/projects/${artifactId}</url> </site> </distributionManagement> + <!--Dependeny management is not same as dependencies (ugh)--> + <dependencyManagement> + <dependencies> + <dependency> + <groupId>org.archive.nutchwax</groupId> + <artifactId>nutchwax-job</artifactId> + <version>${project.version}</version> + </dependency> + <dependency> + <groupId>org.archive.nutchwax</groupId> + <artifactId>nutchwax-war</artifactId> + <version>${project.version}</version> + </dependency> + </dependencies> + </dependencyManagement> + <modules> + <module> + nutchwax-job + </module> + <module> + nutchwax-war + </module> + </modules> </project> Deleted: trunk/archive-access/projects/nutchwax/src/main/assembly/bin-distribution.xml =================================================================== --- trunk/archive-access/projects/nutchwax/src/main/assembly/bin-distribution.xml 2007-03-16 00:18:49 UTC (rev 1584) +++ trunk/archive-access/projects/nutchwax/src/main/assembly/bin-distribution.xml 2007-03-16 06:12:50 UTC (rev 1585) @@ -1,29 +0,0 @@ -<assembly> - <id>bin</id> - <formats> - <format>tar.gz</format> - <format>zip</format> - </formats> - <fileSets> - <fileSet> - <includes> - <include>*.txt</include> - </includes> - </fileSet> - <fileSet> - <directory>bin</directory> - 
<fileMode>0744</fileMode> - </fileSet> - <fileSet> - <directory>target</directory> - <outputDirectory /> - <includes> - <include>nutchwax*.jar</include> - <include>nutchwax*.war</include> - </includes> - </fileSet> - <fileSet> - <directory>target/docs</directory> - </fileSet> - </fileSets> -</assembly> Added: trunk/archive-access/projects/nutchwax/src/main/assembly/distribution.xml =================================================================== --- trunk/archive-access/projects/nutchwax/src/main/assembly/distribution.xml (rev 0) +++ trunk/archive-access/projects/nutchwax/src/main/assembly/distribution.xml 2007-03-16 06:12:50 UTC (rev 1585) @@ -0,0 +1,28 @@ +<assembly> + <id>distribution</id> + <formats> + <format>tar.gz</format> + </formats> + <fileSets> + <fileSet> + <includes> + <include>*.txt</include> + </includes> + </fileSet> + <fileSet> + <directory>bin</directory> + <fileMode>0744</fileMode> + </fileSet> + <fileSet> + <directory>target</directory> + <outputDirectory /> + <includes> + <include>nutchwax*.jar</include> + <include>nutchwax*.war</include> + </includes> + </fileSet> + <fileSet> + <directory>target/docs</directory> + </fileSet> + </fileSets> +</assembly> Deleted: trunk/archive-access/projects/nutchwax/src/main/assembly/src-distribution.xml =================================================================== --- trunk/archive-access/projects/nutchwax/src/main/assembly/src-distribution.xml 2007-03-16 00:18:49 UTC (rev 1584) +++ trunk/archive-access/projects/nutchwax/src/main/assembly/src-distribution.xml 2007-03-16 06:12:50 UTC (rev 1585) @@ -1,38 +0,0 @@ -<assembly> - <id>src</id> - <formats> - <format>tar.gz</format> - <format>zip</format> - </formats> - <fileSets> - <fileSet> - <includes> - <include>*.txt</include> - <include>pom.xml</include> - <include>build.xml</include> - </includes> - </fileSet> - <fileSet> - <directory>bin</directory> - <includes> - <include>**/**</include> - </includes> - <fileMode>0744</fileMode> - </fileSet> - 
 <fileSet> - <directory>src</directory> - </fileSet> - <fileSet> - <directory>conf</directory> - </fileSet> - <fileSet> - <directory>xdocs</directory> - </fileSet> - <fileSet> - <directory>third-party</directory> - <excludes> - <exclude>**/build/**</exclude> - </excludes> - </fileSet> - </fileSets> -</assembly> |
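The "Dependency management is not same as dependencies" comment in rev 1585 is easier to follow with the two halves side by side. The fragments below are lifted from the parent pom.xml and nutchwax-war/pom.xml in this commit and trimmed to one artifact; only the comments are new.

```xml
<!-- Parent pom.xml (org.archive:nutchwax): pins the version only.
     Nothing declared here is added to any module's classpath. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.archive.nutchwax</groupId>
      <artifactId>nutchwax-job</artifactId>
      <version>${project.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>

<!-- Child nutchwax-war/pom.xml: this is the actual dependency.
     It carries no <version>; the managed entry above supplies it. -->
<dependencies>
  <dependency>
    <groupId>org.archive.nutchwax</groupId>
    <artifactId>nutchwax-job</artifactId>
    <scope>compile</scope>
  </dependency>
</dependencies>
```

Declaring the dependency in the war module also means the reactor builds nutchwax-job before nutchwax-war.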
From: <sta...@us...> - 2007-03-18 12:25:40
|
Revision: 1589 http://archive-access.svn.sourceforge.net/archive-access/?rev=1589&view=rev Author: stack-sf Date: 2007-03-16 17:34:39 -0700 (Fri, 16 Mar 2007) Log Message: ----------- * .project Trying mavenized .project (Using http://m2eclipse.codehaus.org/) * pom.xml m2eclipse rewrote my pom. Try it. * .classpath Update so has all nutch src too. * lib/commons-codec-1.3.jar * lib/dsi.unimi.it-1.2.0.jar * lib/archive-mapred-0.1.0-20070227.175246-2.jar * lib/libidn-0.5.9.jar * lib/commons-httpclient-3.0-rc3.jar * lib/jets3t-0.5.0.jar * lib/wayback-0.9.0-200702150450.jar * lib/archive-commons-1.11.0-200702160009.jar Try getting from m2 repo instead. Modified Paths: -------------- trunk/archive-access/projects/nutchwax/.classpath trunk/archive-access/projects/nutchwax/.project Removed Paths: ------------- trunk/archive-access/projects/nutchwax/lib/ Modified: trunk/archive-access/projects/nutchwax/.classpath =================================================================== --- trunk/archive-access/projects/nutchwax/.classpath 2007-03-17 00:20:22 UTC (rev 1588) +++ trunk/archive-access/projects/nutchwax/.classpath 2007-03-17 00:34:39 UTC (rev 1589) @@ -1,22 +1,272 @@ <?xml version="1.0" encoding="UTF-8"?> <classpath> <classpathentry kind="src" path="src/java"/> - <classpathentry kind="var" path="JRE_LIB" sourcepath="JRE_SRC"/> - <classpathentry kind="lib" path="lib/commons-codec-1.3.jar"/> - <classpathentry kind="lib" path="lib/commons-httpclient-3.0-rc3.jar"/> - <classpathentry kind="lib" path="lib/dsi.unimi.it-1.2.0.jar"/> - <classpathentry kind="lib" path="lib/jets3t-0.5.0.jar"/> - <classpathentry kind="lib" path="lib/libidn-0.5.9.jar"/> - <classpathentry kind="lib" path="conf"/> - <classpathentry combineaccessrules="false" kind="src" path="/heritrix"/> - <classpathentry combineaccessrules="false" kind="src" path="/nutch"/> - <classpathentry kind="lib" path="/nutch/lib/servlet-api.jar"/> - <classpathentry kind="lib" 
path="/nutch/lib/commons-logging-1.0.4.jar"/> - <classpathentry kind="lib" path="/nutch/lib/junit-3.8.1.jar"/> - <classpathentry kind="lib" path="/nutch/conf"/> - <classpathentry kind="lib" path="lib/wayback-0.9.0-200702150450.jar"/> - <classpathentry kind="lib" path="third-party/nutch/build"/> - <classpathentry combineaccessrules="false" kind="src" path="/hadoop"/> - <classpathentry kind="lib" path="lib/archive-mapred-0.1.0-20070227.175246-2.jar"/> + <classpathentry kind="src" path="src/plugin/index-wax/src/java"/> + <classpathentry kind="src" path="src/plugin/parse-default/src/java"/> + <classpathentry kind="src" path="src/plugin/parse-waxext/src/java"/> + <classpathentry kind="src" path="src/plugin/query-anchor/src/java"/> + <classpathentry kind="src" path="src/plugin/query-content/src/java"/> + <classpathentry kind="src" path="src/plugin/query-host/src/java"/> + <classpathentry kind="src" path="src/plugin/query-title/src/java"/> + <classpathentry kind="src" path="src/plugin/query-wax/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/analysis-de/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/analysis-fr/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/clustering-carrot2/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/clustering-carrot2/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/creativecommons/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/creativecommons/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/index-basic/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/index-more/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/languageidentifier/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/languageidentifier/src/test"/> + 
<classpathentry kind="src" path="third-party/nutch/src/plugin/lib-http/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/lib-http/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/lib-parsems/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/lib-regex-filter/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/lib-regex-filter/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/microformats-reltag/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/ontology/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/ontology/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-ext/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-ext/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-html/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-html/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-js/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-msexcel/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-msexcel/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-mspowerpoint/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-mspowerpoint/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-msword/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-msword/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-oo/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-oo/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-pdf/src/java"/> + <classpathentry kind="src" 
path="third-party/nutch/src/plugin/parse-pdf/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-rss/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-rss/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-swf/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-swf/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-text/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-zip/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-zip/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/protocol-file/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/protocol-ftp/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/protocol-http/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/protocol-httpclient/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/query-basic/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/query-more/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/query-site/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/query-url/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/query-url/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/scoring-opic/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/subcollection/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/subcollection/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/summary-basic/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/summary-lucene/src/java"/> + <classpathentry kind="src" 
path="third-party/nutch/src/plugin/urlfilter-automaton/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/urlfilter-automaton/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/urlfilter-prefix/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/urlfilter-regex/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/urlfilter-regex/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/urlfilter-suffix/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/urlfilter-suffix/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/urlnormalizer-basic/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/urlnormalizer-basic/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/urlnormalizer-pass/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/urlnormalizer-pass/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/urlnormalizer-regex/src/java"/> + <classpathentry kind="src" path="third-party/nutch/src/plugin/urlnormalizer-regex/src/test"/> + <classpathentry kind="src" path="third-party/nutch/src/test"/> + <classpathentry kind="lib" path="third-party/nutch/build/clustering-carrot2/clustering-carrot2.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/creativecommons/creativecommons.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/index-basic/index-basic.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/index-more/index-more.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/language-identifier/language-identifier.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/lib-http/lib-http.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/lib-jakarta-poi/poi-3.0-alpha1-20050704.jar"/> + <classpathentry kind="lib" 
path="third-party/nutch/build/lib-jakarta-poi/poi-scratchpad-3.0-alpha1-20050704.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/lib-log4j/log4j-1.2.11.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/lib-lucene-analyzers/lucene-analyzers-2.0.0.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/lib-nekohtml/nekohtml-0.9.4.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/lib-parsems/lib-parsems.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/lib-regex-filter/lib-regex-filter.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/lib-xml/jaxen-core.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/lib-xml/jaxen-jdom.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/lib-xml/jdom.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/lib-xml/saxpath.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/lib-xml/xercesImpl.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/microformats-reltag/microformats-reltag.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/nutch-0.9-dev.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/nutch-extensionpoints/nutch-extensionpoints.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/ontology/ontology.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/parse-ext/parse-ext.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/parse-html/parse-html.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/parse-js/parse-js.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/parse-msexcel/parse-msexcel.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/parse-mspowerpoint/parse-mspowerpoint.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/parse-msword/parse-msword.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/parse-oo/parse-oo.jar"/> 
+ <classpathentry kind="lib" path="third-party/nutch/build/parse-pdf/parse-pdf.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/parse-rss/parse-rss.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/parse-swf/parse-swf.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/parse-text/parse-text.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/parse-zip/parse-zip.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/clustering-carrot2/carrot2-filter-lingo.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/clustering-carrot2/carrot2-local-core.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/clustering-carrot2/carrot2-snowball-stemmers.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/clustering-carrot2/carrot2-util-common.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/clustering-carrot2/carrot2-util-tokenizer.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/clustering-carrot2/clustering-carrot2.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/clustering-carrot2/commons-collections-3.1-patched.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/clustering-carrot2/commons-pool-1.1.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/clustering-carrot2/Jama-1.0.1-patched.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/clustering-carrot2/violinstrings-1.0.2.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/creativecommons/creativecommons.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/index-basic/index-basic.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/index-more/index-more.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/language-identifier/language-identifier.jar"/> + 
<classpathentry kind="lib" path="third-party/nutch/build/plugins/lib-http/lib-http.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/lib-jakarta-poi/poi-3.0-alpha1-20050704.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/lib-jakarta-poi/poi-scratchpad-3.0-alpha1-20050704.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/lib-log4j/log4j-1.2.11.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/lib-lucene-analyzers/lucene-analyzers-2.0.0.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/lib-nekohtml/nekohtml-0.9.4.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/lib-parsems/lib-parsems.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/lib-regex-filter/lib-regex-filter.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/lib-xml/jaxen-core.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/lib-xml/jaxen-jdom.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/lib-xml/jdom.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/lib-xml/saxpath.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/lib-xml/xercesImpl.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/microformats-reltag/microformats-reltag.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/nutch-extensionpoints/nutch-extensionpoints.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/ontology/commons-logging-1.0.3.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/ontology/icu4j_2_6_1.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/ontology/jena-2.1.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/ontology/ontology.jar"/> + <classpathentry kind="lib" 
path="third-party/nutch/build/plugins/parse-ext/parse-ext.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/parse-html/parse-html.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/parse-html/tagsoup-1.0rc3.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/parse-js/parse-js.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/parse-msexcel/parse-msexcel.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/parse-mspowerpoint/parse-mspowerpoint.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/parse-msword/parse-msword.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/parse-oo/parse-oo.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/parse-pdf/parse-pdf.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/parse-pdf/PDFBox-0.7.2-log4j.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/parse-rss/commons-feedparser-0.6-fork.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/parse-rss/parse-rss.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/parse-rss/xmlrpc-1.2.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/parse-swf/javaswf.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/parse-swf/parse-swf.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/parse-text/parse-text.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/parse-zip/parse-zip.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/protocol-file/protocol-file.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/protocol-ftp/commons-net-1.2.0-dev.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/protocol-ftp/protocol-ftp.jar"/> + <classpathentry kind="lib" 
path="third-party/nutch/build/plugins/protocol-http/protocol-http.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/protocol-httpclient/protocol-httpclient.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/query-basic/query-basic.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/query-more/query-more.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/query-site/query-site.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/query-url/query-url.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/scoring-opic/scoring-opic.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/subcollection/subcollection.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/summary-basic/summary-basic.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/summary-lucene/lucene-highlighter-2.0.0.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/summary-lucene/summary-lucene.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/urlfilter-automaton/automaton.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/urlfilter-automaton/urlfilter-automaton.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/urlfilter-prefix/urlfilter-prefix.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/urlfilter-regex/urlfilter-regex.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/urlfilter-suffix/urlfilter-suffix.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/urlnormalizer-basic/urlnormalizer-basic.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/urlnormalizer-pass/urlnormalizer-pass.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/plugins/urlnormalizer-regex/urlnormalizer-regex.jar"/> + <classpathentry 
kind="lib" path="third-party/nutch/build/protocol-file/protocol-file.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/protocol-ftp/protocol-ftp.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/protocol-http/protocol-http.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/protocol-httpclient/protocol-httpclient.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/query-basic/query-basic.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/query-more/query-more.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/query-site/query-site.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/query-url/query-url.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/scoring-opic/scoring-opic.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/subcollection/subcollection.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/summary-basic/summary-basic.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/summary-lucene/summary-lucene.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/urlfilter-automaton/urlfilter-automaton.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/urlfilter-prefix/urlfilter-prefix.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/urlfilter-regex/urlfilter-regex.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/urlfilter-suffix/urlfilter-suffix.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/urlnormalizer-basic/urlnormalizer-basic.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/urlnormalizer-pass/urlnormalizer-pass.jar"/> + <classpathentry kind="lib" path="third-party/nutch/build/urlnormalizer-regex/urlnormalizer-regex.jar"/> + <classpathentry kind="lib" path="third-party/nutch/contrib/web2/lib/commons-beanutils.jar"/> + <classpathentry kind="lib" 
path="third-party/nutch/contrib/web2/lib/commons-collections-3.0.jar"/> + <classpathentry kind="lib" path="third-party/nutch/contrib/web2/lib/commons-digester.jar"/> + <classpathentry kind="lib" path="third-party/nutch/contrib/web2/lib/jstl.jar"/> + <classpathentry kind="lib" path="third-party/nutch/contrib/web2/lib/standard.jar"/> + <classpathentry kind="lib" path="third-party/nutch/contrib/web2/lib/struts.jar"/> + <classpathentry kind="lib" path="third-party/nutch/contrib/web2/plugins/web-caching-oscache/lib/oscache-2.1.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/commons-cli-2.0-SNAPSHOT.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/commons-codec-1.3.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/commons-httpclient-3.0.1.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/commons-lang-2.1.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/commons-logging-1.0.4.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/commons-logging-api-1.0.4.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/hadoop-0.10.1-core.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/jakarta-oro-2.0.7.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/jets3t.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/jetty-5.1.4.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/jetty-ext/ant.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/jetty-ext/commons-el.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/jetty-ext/jasper-compiler.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/jetty-ext/jasper-runtime.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/jetty-ext/jsp-api.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/junit-3.8.1.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/log4j-1.2.13.jar"/> + <classpathentry kind="lib" 
path="third-party/nutch/lib/lucene-core-2.0.0.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/lucene-misc-2.0.0.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/pmd-ext/jakarta-oro-2.0.8.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/pmd-ext/jaxen-1.1-beta-7.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/pmd-ext/pmd-3.6.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/servlet-api.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/taglibs-i18n.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/xerces-2_6_2-apis.jar"/> + <classpathentry kind="lib" path="third-party/nutch/lib/xerces-2_6_2.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/clustering-carrot2/lib/carrot2-filter-lingo.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/clustering-carrot2/lib/carrot2-local-core.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/clustering-carrot2/lib/carrot2-snowball-stemmers.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/clustering-carrot2/lib/carrot2-util-common.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/clustering-carrot2/lib/carrot2-util-tokenizer.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/clustering-carrot2/lib/commons-collections-3.1-patched.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/clustering-carrot2/lib/commons-pool-1.1.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/clustering-carrot2/lib/Jama-1.0.1-patched.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/clustering-carrot2/lib/violinstrings-1.0.2.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/lib-jakarta-poi/lib/poi-3.0-alpha1-20050704.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/lib-jakarta-poi/lib/poi-scratchpad-3.0-alpha1-20050704.jar"/> + 
<classpathentry kind="lib" path="third-party/nutch/src/plugin/lib-log4j/lib/log4j-1.2.11.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/lib-lucene-analyzers/lib/lucene-analyzers-2.0.0.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/lib-nekohtml/lib/nekohtml-0.9.4.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/lib-xml/lib/jaxen-core.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/lib-xml/lib/jaxen-jdom.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/lib-xml/lib/jdom.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/lib-xml/lib/saxpath.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/lib-xml/lib/xercesImpl.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/ontology/lib/commons-logging-1.0.3.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/ontology/lib/icu4j_2_6_1.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/ontology/lib/jena-2.1.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/parse-html/lib/tagsoup-1.0rc3.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/parse-pdf/lib/PDFBox-0.7.2-log4j.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/parse-rss/lib/commons-feedparser-0.6-fork.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/parse-rss/lib/xmlrpc-1.2.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/parse-swf/lib/javaswf.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/protocol-ftp/lib/commons-net-1.2.0-dev.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/summary-lucene/lib/lucene-highlighter-2.0.0.jar"/> + <classpathentry kind="lib" path="third-party/nutch/src/plugin/urlfilter-automaton/lib/automaton.jar"/> + <classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/> + 
<classpathentry kind="con" path="org.maven.ide.eclipse.MAVEN2_CLASSPATH_CONTAINER"/> <classpathentry kind="output" path="target"/> </classpath> Modified: trunk/archive-access/projects/nutchwax/.project =================================================================== --- trunk/archive-access/projects/nutchwax/.project 2007-03-17 00:20:22 UTC (rev 1588) +++ trunk/archive-access/projects/nutchwax/.project 2007-03-17 00:34:39 UTC (rev 1589) @@ -10,15 +10,14 @@ <arguments> </arguments> </buildCommand> + <buildCommand> + <name>org.maven.ide.eclipse.maven2Builder</name> + <arguments> + </arguments> + </buildCommand> </buildSpec> <natures> <nature>org.eclipse.jdt.core.javanature</nature> + <nature>org.maven.ide.eclipse.maven2Nature</nature> </natures> - <linkedResources> - <link> - <name>java</name> - <type>2</type> - <locationURI>src/java</locationURI> - </link> - </linkedResources> </projectDescription> This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <sta...@us...> - 2007-03-19 19:11:48
|
Revision: 1591 http://archive-access.svn.sourceforge.net/archive-access/?rev=1591&view=rev Author: stack-sf Date: 2007-03-19 12:11:31 -0700 (Mon, 19 Mar 2007) Log Message: ----------- A nutchwax-core A nutchwax-core/pom.xml Added new module to build nutchwax jar. Added Paths: ----------- trunk/archive-access/projects/nutchwax/nutchwax-core/ trunk/archive-access/projects/nutchwax/nutchwax-core/pom.xml Added: trunk/archive-access/projects/nutchwax/nutchwax-core/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-core/pom.xml (rev 0) +++ trunk/archive-access/projects/nutchwax/nutchwax-core/pom.xml 2007-03-19 19:11:31 UTC (rev 1591) @@ -0,0 +1,57 @@ +<?xml version="1.0"?> +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> +<parent> + <groupId>org.archive</groupId> + <artifactId>nutchwax</artifactId> + <version>0.11.0-SNAPSHOT</version> +</parent> + <modelVersion>4.0.0</modelVersion> + <groupId>org.archive.nutchwax</groupId> + <artifactId>nutchwax-core</artifactId> + <packaging>jar</packaging> + <name>NutchWAX Core Jar</name> + <build> + <sourceDirectory>../src/java</sourceDirectory> + <plugins> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-compiler-plugin</artifactId> + <configuration> + <source>1.5</source> + <target>1.5</target> + </configuration> + </plugin> + <plugin> + <artifactId>maven-antrun-plugin</artifactId> + <executions> + <execution> + <id>antrun.generate.sources</id> + <phase>generate-sources</phase> + <configuration> + <tasks> + <!-- Make these conditional so do not run every time--> + <echo>Compiling third.party dependencies as part of generate-sources</echo> + <ant dir=".." 
target="third.party.jar"/> + </tasks> + </configuration> + <goals> + <goal>run</goal> + </goals> + </execution> + <execution> + <id>antrun.clean</id> + <phase>clean</phase> + <configuration> + <tasks> + <ant dir=".." target="clean-all"/> + </tasks> + </configuration> + <goals> + <goal>run</goal> + </goals> + </execution> + </executions> + </plugin> + </plugins> + </build> +</project>
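The comment in the antrun.generate.sources execution above asks for the third-party build to be made conditional so that it does not run on every build. A minimal sketch of one way to do that on the Ant side follows. It assumes the third-party build leaves third-party/nutch/build/nutch-0.9-dev.jar behind (the path listed in .classpath); the target name check.third.party and the property third.party.built are hypothetical, and the body of third.party.jar is left out because it is not shown in this commit.

```xml
<!-- Sketch only: check.third.party and third.party.built are made-up names. -->
<target name="check.third.party">
  <!-- Sets third.party.built only if the nutch jar already exists. -->
  <available file="third-party/nutch/build/nutch-0.9-dev.jar"
             property="third.party.built"/>
</target>

<!-- Ant runs the depends list first, then skips this target's body
     when third.party.built is set. -->
<target name="third.party.jar" depends="check.third.party"
        unless="third.party.built">
  <!-- existing third-party build steps would go here -->
</target>
```

With a guard of this kind the pom can keep calling the target unconditionally; the check lives in build.xml, so a plain ant invocation benefits as well.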
From: <sta...@us...> - 2007-03-20 00:29:12
|
Revision: 1592 http://archive-access.svn.sourceforge.net/archive-access/?rev=1592&view=rev Author: stack-sf Date: 2007-03-19 17:29:13 -0700 (Mon, 19 Mar 2007) Log Message: ----------- D src/web Moved under nutchwax-war. D nutchwax-war/src/main/assembly/placeholder.xml A nutchwax-war/src/main/assembly/assemble-war.xml A nutchwax-war/src/main/webapp A nutchwax-war/src/main/webapp/WEB-INF Added from src/web M nutchwax-war/pom.xml Change target packaging to be war. Modified Paths: -------------- trunk/archive-access/projects/nutchwax/nutchwax-war/pom.xml Added Paths: ----------- trunk/archive-access/projects/nutchwax/nutchwax-war/src/main/assembly/assemble-war.xml trunk/archive-access/projects/nutchwax/nutchwax-war/src/main/webapp/ trunk/archive-access/projects/nutchwax/nutchwax-war/src/main/webapp/WEB-INF/ Removed Paths: ------------- trunk/archive-access/projects/nutchwax/nutchwax-war/src/main/assembly/placeholder.xml trunk/archive-access/projects/nutchwax/src/web/ Modified: trunk/archive-access/projects/nutchwax/nutchwax-war/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-war/pom.xml 2007-03-19 19:11:31 UTC (rev 1591) +++ trunk/archive-access/projects/nutchwax/nutchwax-war/pom.xml 2007-03-20 00:29:13 UTC (rev 1592) @@ -29,72 +29,20 @@ <modelVersion>4.0.0</modelVersion> <groupId>org.archive.nutchwax</groupId> <artifactId>nutchwax-war</artifactId> - <packaging>pom</packaging> + <packaging>war</packaging> <name>NutchWAX Webapp</name> - <build> - <plugins> - <plugin> - <!-- NOTE: We don't need a groupId specification because the group is - org.apache.maven.plugins ...which is assumed by default. 
- --> - <artifactId>maven-assembly-plugin</artifactId> - <configuration> - <descriptors> - <descriptor> - src/main/assembly/placeholder.xml - </descriptor> - </descriptors> - <appendAssemblyId> - false - </appendAssemblyId> - </configuration> - <executions> - <execution> - <phase>package</phase> - <goals> - <goal>attached</goal> - </goals> - </execution> - </executions> - </plugin> - <plugin> - <artifactId>maven-antrun-plugin</artifactId> - <executions> - <execution> - <id>antrun.package</id> - <phase>package</phase> - <configuration> - <tasks> - <echo>Assembling Job JAR</echo> - <ant dir=".." target="war"/> - <copy file="../target/nutchwax-webapp-${project.version}.war" overwrite="true" - verbose="true" tofile="target/${project.artifactId}-${project.version}.war" /> - </tasks> - </configuration> - <goals> - <goal>run</goal> - </goals> - </execution> - </executions> - </plugin> - </plugins> - </build> <dependencies> <dependency> <groupId>org.archive.nutchwax</groupId> <artifactId>nutchwax-core</artifactId> </dependency> </dependencies> - <!--If I uncomment the below, we fail trying to download our - dependency from remote repository. It should be getting it from - the local repository. It must be some artifact of our hack. - Leaving it off for now. 
- - <dependencies> - <dependency> - <groupId>org.archive.nutchwax</groupId> - <artifactId>nutchwax-job</artifactId> - </dependency> - </dependencies> - --> + <distributionManagement> + <site> + <id>website</id> + <name>Website</name> + <!--Pass as command-line system property to maven--> + <url>${website.url}/projects/${project.parent.artifactId}/${project.artifactId}</url> + </site> + </distributionManagement> </project> Copied: trunk/archive-access/projects/nutchwax/nutchwax-war/src/main/assembly/assemble-war.xml (from rev 1589, trunk/archive-access/projects/nutchwax/nutchwax-war/src/main/assembly/placeholder.xml) =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-war/src/main/assembly/assemble-war.xml (rev 0) +++ trunk/archive-access/projects/nutchwax/nutchwax-war/src/main/assembly/assemble-war.xml 2007-03-20 00:29:13 UTC (rev 1592) @@ -0,0 +1,116 @@ +<assembly> + <id>war</id> + <formats> + <format>war</format> + </formats> + <includeBaseDirectory>false</includeBaseDirectory> + <fileSets> + <fileSet> + <directory>../third-party/nutch/src/web/jsp</directory> + <outputDirectory>/</outputDirectory> + <excludes> + <exclude>**/search.jsp</exclude> + <exclude>**/web.xml</exclude> + <exclude>**/refine*.xml</exclude> + <exclude>**/cluster.jsp</exclude> + <exclude>**/refine-query*</exclude> + </excludes> + </fileSet> + <fileSet> + <directory>../third-party/nutch/src/web/jsp</directory> + <outputDirectory>/</outputDirectory> + <excludes> + <exclude>**/web.xml</exclude> + </excludes> + </fileSet> + + <fileSet> + <directory>../target/wax-plugins</directory> + <outputDirectory>/wax-plugins</outputDirectory> + </fileSet> + <fileSet> + <directory>../src/plugin/parse-waxext/bin</directory> + <outputDirectory>/bin</outputDirectory> + </fileSet> + <fileSet> + <directory>..</directory> + <outputDirectory>/</outputDirectory> + <includes> + <include> + README* + </include> + </includes> + </fileSet> + <fileSet> + 
<directory>../conf</directory> + <outputDirectory>/</outputDirectory> + <includes> + <include>log4j.properties</include> + <include>wax-parse-plugins.xml</include> + <include>wax-default.xml</include> + <include>regex-normalize.xml</include> + <include>regex-urlfilter.txt</include> + </includes> + </fileSet> + <fileSet> + <directory>../third-party/nutch/build/plugins</directory> + <outputDirectory>/plugins</outputDirectory> + <includes> + <include>analysis-*/**</include> + <include>index-*/**</include> + <include>language-*/**</include> + <include>lib-*/**</include> + <include>nutch-*/**</include> + <include>scoring-*/**</include> + <include>query-*/**</include> + <include>summary-*/**</include> + <include>urlfilter-*/**</include> + <include>urlnormalizer-*/**</include> + <include>parse-*/**</include> + </includes> + <excludes> + <exclude>parse-js/**</exclude> + </excludes> + </fileSet> + <fileSet> + <directory>../third-party/nutch/conf</directory> + <outputDirectory>/</outputDirectory> + <includes> + <include>mime-types.xml</include> + <include>nutch-default.xml</include> + <include>nutch-site.xml</include> + <include>common-terms.utf8</include> + </includes> + </fileSet> + <fileSet> + <directory>../third-party/nutch/lib</directory> + <outputDirectory>/lib</outputDirectory> + <includes> + <include>commons-lang*</include> + <include>lucene*</include> + <include>jakarta-oro*</include> + <include>xerces*</include> + <include>concurrent*</include> + </includes> + </fileSet> + </fileSets> + <dependencySets> + <dependencySet> + <outputDirectory>/lib</outputDirectory> + <!--<scope>runtime</scope> + --> + <excludes> + <exclude>commons-cli:commons-cli</exclude> + <exclude>commons-collections:commons-collections</exclude> + <exclude>commons-pool:commons-pool</exclude> + <exclude>commons-logging:commons-logging</exclude> + <exclude>org.apache:hadoop</exclude> + <exclude>org.apache:nutch</exclude> + <exclude>org.apache:nutch</exclude> + <exclude>com.sleepycat:je</exclude> + 
<exclude>junit:junit</exclude> + <exclude>javax.servlet:servlet-api</exclude> + </excludes> + </dependencySet> + </dependencySets> +</assembly> Deleted: trunk/archive-access/projects/nutchwax/nutchwax-war/src/main/assembly/placeholder.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-war/src/main/assembly/placeholder.xml 2007-03-19 19:11:31 UTC (rev 1591) +++ trunk/archive-access/projects/nutchwax/nutchwax-war/src/main/assembly/placeholder.xml 2007-03-20 00:29:13 UTC (rev 1592) @@ -1,6 +0,0 @@ -<assembly> - <id>placeholder</id> - <formats> - <format>war</format> - </formats> -</assembly> Copied: trunk/archive-access/projects/nutchwax/nutchwax-war/src/main/webapp (from rev 1589, trunk/archive-access/projects/nutchwax/src/web)
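This revision adds assemble-war.xml but also removes the maven-assembly-plugin section from nutchwax-war/pom.xml, so nothing in the pom shown here refers to the new descriptor. If it were to be bound the same way the removed placeholder.xml was, the plugin section might look roughly like the sketch below. This is an assumption for illustration, not something this commit contains; it simply reuses the removed configuration with the descriptor path swapped.

```xml
<!-- Sketch only: mirrors the removed placeholder.xml binding. -->
<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptors>
      <descriptor>src/main/assembly/assemble-war.xml</descriptor>
    </descriptors>
    <appendAssemblyId>false</appendAssemblyId>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <!-- 'attached' is the goal the removed configuration used;
             later plugin releases favor 'single'. -->
        <goal>attached</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

Whether the binding is needed at all depends on how the war packaging is meant to interact with the descriptor, which the log message does not say.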
From: <sta...@us...> - 2007-03-20 00:51:03
|
Revision: 1594 http://archive-access.svn.sourceforge.net/archive-access/?rev=1594&view=rev Author: stack-sf Date: 2007-03-19 17:38:08 -0700 (Mon, 19 Mar 2007) Log Message: ----------- A nutchwax-thirdparty A nutchwax-thirdparty/pom.xml Added. Added Paths: ----------- trunk/archive-access/projects/nutchwax/nutchwax-thirdparty/ trunk/archive-access/projects/nutchwax/nutchwax-thirdparty/pom.xml Added: trunk/archive-access/projects/nutchwax/nutchwax-thirdparty/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-thirdparty/pom.xml (rev 0) +++ trunk/archive-access/projects/nutchwax/nutchwax-thirdparty/pom.xml 2007-03-20 00:38:08 UTC (rev 1594) @@ -0,0 +1,58 @@ +<?xml version="1.0"?> +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> +<parent> + <groupId>org.archive</groupId> + <artifactId>nutchwax</artifactId> + <version>0.11.0-SNAPSHOT</version> +</parent> + <modelVersion>4.0.0</modelVersion> + <groupId>org.archive.nutchwax</groupId> + <artifactId>nutchwax-thirdparty</artifactId> + <!--This project just compiles third-party code. + The produced jar has nothing in it but is useful + as a product subsequent modules can check for + to ensure third-party build has preceded their + build. + --> + <packaging>jar</packaging> + <name>NutchWAX Third-party Dependencies</name> + <build> + <plugins> + <plugin> + <artifactId>maven-antrun-plugin</artifactId> + <executions> + <execution> + <id>antrun.generate.sources</id> + <phase>generate-sources</phase> + <configuration> + <tasks> + <!-- Make these conditional so do not run every time + Done as part of the generate-sources step so that + we can invoke it from eclipse. + --> + <echo>Compiling third.party dependencies as part of generate-sources</echo> + <ant dir=".." target="third.party.jar"/> + <ant dir=".." 
target="third.party.plugins"/> + </tasks> + </configuration> + <goals> + <goal>run</goal> + </goals> + </execution> + <execution> + <id>antrun.clean</id> + <phase>clean</phase> + <configuration> + <tasks> + <ant dir=".." target="clean-all"/> + </tasks> + </configuration> + <goals> + <goal>run</goal> + </goals> + </execution> + </executions> + </plugin> + </plugins> + </build> +</project> This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <sta...@us...> - 2007-03-20 00:51:03
|
Revision: 1593
          http://archive-access.svn.sourceforge.net/archive-access/?rev=1593&view=rev
Author:   stack-sf
Date:     2007-03-19 17:36:23 -0700 (Mon, 19 Mar 2007)

Log Message:
-----------
More on moving nutchwax to maven2.

* projects/nutchwax/src/plugin/index-wax/plugin.xml
* projects/nutchwax/src/plugin/index-wax/lib/archive-commons-1.12.0.jar
    Updated plugin.
* projects/nutchwax/src/plugin/index-wax/lib/archive-commons-1.11.0-200702160009.jar
    Removed.
* projects/nutchwax/src/plugin/build-plugin.xml
    Call from maven2. Pass the maven2 dependencies to javac as extra
    classpath argument.
* projects/nutchwax/src/plugin/build.xml
    Replaced by call from maven2.
* projects/nutchwax/nutchwax-core/pom.xml
    Moved generation of sources to the nutchwax-thirdparty module.
    Make this module dependent on nutchwax-thirdparty module.
* projects/nutchwax/pom.xml
    Set scope on dependencies. Make update of released plugins daily
    instead of always. Add in new modules thirdparty and plugins.
* projects/nutchwax/build.xml
    Remove lib dir references (it has been removed).
* projects/nutchwax/nutchwax-job/src/main/assembly/assemble-job.xml
    Assemble a job jar (we used to do this in an ant file).
* projects/nutchwax/nutchwax-job/pom.xml
    Renamed assembler as assemble-job... It used to be just a placeholder
    but now we no longer need to do the copy from parent hack that we
    used to rely on.

Modified Paths: -------------- trunk/archive-access/projects/nutchwax/build.xml trunk/archive-access/projects/nutchwax/nutchwax-core/pom.xml trunk/archive-access/projects/nutchwax/nutchwax-job/pom.xml trunk/archive-access/projects/nutchwax/nutchwax-job/src/main/assembly/assemble-job.xml trunk/archive-access/projects/nutchwax/pom.xml trunk/archive-access/projects/nutchwax/src/plugin/build-plugin.xml trunk/archive-access/projects/nutchwax/src/plugin/index-wax/plugin.xml Added Paths: ----------- trunk/archive-access/projects/nutchwax/src/plugin/index-wax/lib/archive-commons-1.12.0.jar Removed Paths: ------------- trunk/archive-access/projects/nutchwax/src/plugin/build.xml trunk/archive-access/projects/nutchwax/src/plugin/index-wax/lib/archive-commons-1.11.0-200702160009.jar Modified: trunk/archive-access/projects/nutchwax/build.xml =================================================================== --- trunk/archive-access/projects/nutchwax/build.xml 2007-03-20 00:29:13 UTC (rev 1592) +++ trunk/archive-access/projects/nutchwax/build.xml 2007-03-20 00:36:23 UTC (rev 1593) @@ -38,13 +38,10 @@ <property name="build.encoding" value="ISO-8859-1"/> - <fileset id="lib.jars" dir="${root}" includes="lib/*.jar"/> - <!-- the normal classpath --> <path id="classpath"> <pathelement location="${build.classes}"/> <pathelement location="${nutch.root}/build/classes"/> - <fileset refid="lib.jars"/> <fileset dir="${nutch.root}/lib"> <include name="*.jar" /> </fileset> @@ -146,7 +143,9 @@ <zipfileset file="${nutch.root}/conf/nutch-default.xml"/> <zipfileset file="${nutch.root}/conf/common-terms.utf8"/> <zipfileset prefix="bin" file="${basedir}/src/plugin/parse-waxext/bin/parse-pdf.sh" filemode="555"/> - <zipfileset refid="lib.jars"/> + <!--<zipfileset refid="lib.jars"/> + --> + <!--Include all class files both nutch and nutchwax at top level so all needed to launch a job using the 'hadoop jar nutchwax.jobs' is on the classpath (Only classes that are at top-level in a jar can Modified: 
trunk/archive-access/projects/nutchwax/nutchwax-core/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-core/pom.xml 2007-03-20 00:29:13 UTC (rev 1592) +++ trunk/archive-access/projects/nutchwax/nutchwax-core/pom.xml 2007-03-20 00:36:23 UTC (rev 1593) @@ -19,39 +19,22 @@ <configuration> <source>1.5</source> <target>1.5</target> + <!-- + <compilerArgument> -verbose -classpath ../third-party/nutch/build/classes</compilerArgument> + --> </configuration> </plugin> - <plugin> - <artifactId>maven-antrun-plugin</artifactId> - <executions> - <execution> - <id>antrun.generate.sources</id> - <phase>generate-sources</phase> - <configuration> - <tasks> - <!-- Make these conditional so do not run everytime--> - <echo>Compiling third.party dependencies as part of generate-sources</echo> - <ant dir=".." target="third.party.jar"/> - </tasks> - </configuration> - <goals> - <goal>run</goal> - </goals> - </execution> - <execution> - <id>antrun.clean</id> - <phase>clean</phase> - <configuration> - <tasks> - <ant dir=".." target="clean-all"/> - </tasks> - </configuration> - <goals> - <goal>run</goal> - </goals> - </execution> - </executions> - </plugin> </plugins> </build> + <!--Look for placeholder nutchwax-thirdparty jar + Means third-party sources have been compiled. + The jar itself is empty. 
+ --> + <dependencies> + <dependency> + <groupId>org.archive.nutchwax</groupId> + <artifactId>nutchwax-thirdparty</artifactId> + <scope>compile</scope> + </dependency> + </dependencies> </project> Modified: trunk/archive-access/projects/nutchwax/nutchwax-job/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-job/pom.xml 2007-03-20 00:29:13 UTC (rev 1592) +++ trunk/archive-access/projects/nutchwax/nutchwax-job/pom.xml 2007-03-20 00:36:23 UTC (rev 1593) @@ -21,6 +21,9 @@ <modelVersion>4.0.0</modelVersion> <groupId>org.archive.nutchwax</groupId> <artifactId>nutchwax-job</artifactId> + <!--Below we attach the job jar to the pom production. + The 'attach'ed assembly generates the job jar. + --> <packaging>pom</packaging> <name>NutchWAX Job Jar</name> <build> @@ -33,19 +36,17 @@ <configuration> <descriptors> <descriptor> - src/main/assembly/placeholder.xml + src/main/assembly/assemble-job.xml </descriptor> </descriptors> <appendAssemblyId> false </appendAssemblyId> - <archive> <manifest> <mainClass>org.archive.access.nutch.Nutchwax</mainClass> </manifest> </archive> - </configuration> <executions> <execution> @@ -57,25 +58,6 @@ </execution> </executions> </plugin> - <plugin> - <artifactId>maven-antrun-plugin</artifactId> - <executions> - <execution> - <id>antrun.generate.sources</id> - <phase>generate-sources</phase> - <configuration> - <tasks> - <!-- Make these conditional so do not run everytime--> - <echo>Compiling third.party plugins as part of generate-sources</echo> - <ant dir=".." 
target="third.party.plugins"/> - </tasks> - </configuration> - <goals> - <goal>run</goal> - </goals> - </execution> - </executions> - </plugin> </plugins> </build> <dependencies> Modified: trunk/archive-access/projects/nutchwax/nutchwax-job/src/main/assembly/assemble-job.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-job/src/main/assembly/assemble-job.xml 2007-03-20 00:29:13 UTC (rev 1592) +++ trunk/archive-access/projects/nutchwax/nutchwax-job/src/main/assembly/assemble-job.xml 2007-03-20 00:36:23 UTC (rev 1593) @@ -6,14 +6,92 @@ <includeBaseDirectory>false</includeBaseDirectory> <fileSets> <fileSet> - <directory>target/classes</directory> + <directory>../target/wax-plugins</directory> + <outputDirectory>/wax-plugins</outputDirectory> + </fileSet> + <fileSet> + <directory>../src/plugin/parse-waxext/bin</directory> + <outputDirectory>/bin</outputDirectory> + </fileSet> + <fileSet> + <directory>..</directory> <outputDirectory>/</outputDirectory> + <includes> + <include> + README* + </include> + </includes> </fileSet> + <fileSet> + <directory>../conf</directory> + <outputDirectory>/</outputDirectory> + <includes> + <include>log4j.properties</include> + <include>wax-parse-plugins.xml</include> + <include>wax-default.xml</include> + <include>regex-normalize.xml</include> + <include>regex-urlfilter.txt</include> + </includes> + </fileSet> + <fileSet> + <directory>../third-party/nutch/build/plugins</directory> + <outputDirectory>/plugins</outputDirectory> + <includes> + <include>analysis-*/**</include> + <include>index-*/**</include> + <include>language-*/**</include> + <include>lib-*/**</include> + <include>nutch-*/**</include> + <include>scoring-*/**</include> + <include>query-*/**</include> + <include>summary-*/**</include> + <include>urlfilter-*/**</include> + <include>urlnormalizer-*/**</include> + <include>parse-*/**</include> + </includes> + <excludes> + <exclude>parse-js/**</exclude> + 
</excludes> + </fileSet> + <fileSet> + <directory>../third-party/nutch/conf</directory> + <outputDirectory>/</outputDirectory> + <includes> + <include>mime-types.xml</include> + <include>nutch-default.xml</include> + <include>nutch-site.xml</include> + <include>common-terms.utf8</include> + </includes> + </fileSet> + <fileSet> + <directory>../third-party/nutch/lib</directory> + <outputDirectory>/lib</outputDirectory> + <includes> + <include>commons-lang*</include> + <include>lucene*</include> + <include>jakarta-oro*</include> + <include>xerces*</include> + <include>concurrent*</include> + </includes> + </fileSet> </fileSets> <dependencySets> <dependencySet> <outputDirectory>/lib</outputDirectory> - <scope>runtime</scope> + <!--<scope>runtime</scope> + --> + <excludes> + <exclude>commons-cli:commons-cli</exclude> + <exclude>commons-collections:commons-collections</exclude> + <exclude>commons-pool:commons-pool</exclude> + <exclude>commons-logging:commons-logging</exclude> + <exclude>org.apache:hadoop</exclude> + <exclude>org.apache:nutch</exclude> + <exclude>org.apache:nutch</exclude> + <exclude>com.sleepycat:je</exclude> + <exclude>junit:junit</exclude> + <exclude>javax.servlet:servlet-api</exclude> + </excludes> </dependencySet> </dependencySets> </assembly> Modified: trunk/archive-access/projects/nutchwax/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/pom.xml 2007-03-20 00:29:13 UTC (rev 1592) +++ trunk/archive-access/projects/nutchwax/pom.xml 2007-03-20 00:36:23 UTC (rev 1593) @@ -166,7 +166,9 @@ <groupId>commons-cli</groupId> <artifactId>commons-cli</artifactId> <version>1.0-beta-2</version> + <scope>compile</scope> </dependency> + <!-- <dependency> <groupId>org.apache</groupId> <artifactId>hadoop</artifactId> @@ -179,6 +181,7 @@ <version>0.9-dev-508238</version> <scope>compile</scope> </dependency> + --> <dependency> <groupId>javax.servlet</groupId> <artifactId>servlet-api</artifactId> @@ 
-297,7 +300,7 @@ <repository> <releases> <enabled>true</enabled> - <updatePolicy>always</updatePolicy> + <updatePolicy>daily</updatePolicy> <checksumPolicy>warn</checksumPolicy> </releases> <snapshots> @@ -322,7 +325,7 @@ <layout>default</layout> <releases> <enabled>true</enabled> - <updatePolicy>always</updatePolicy> + <updatePolicy>daily</updatePolicy> <checksumPolicy>warn</checksumPolicy> </releases> <!-- @@ -354,6 +357,11 @@ <dependencies> <dependency> <groupId>org.archive.nutchwax</groupId> + <artifactId>nutchwax-thirdparty</artifactId> + <version>${project.version}</version> + </dependency> + <dependency> + <groupId>org.archive.nutchwax</groupId> <artifactId>nutchwax-core</artifactId> <version>${project.version}</version> </dependency> @@ -371,15 +379,19 @@ </dependencyManagement> <modules> <module> + nutchwax-thirdparty + </module> + <module> nutchwax-core </module> <module> + nutchwax-plugins + </module> + <module> nutchwax-job </module> - <!-- <module> nutchwax-war </module> - --> </modules> </project> Modified: trunk/archive-access/projects/nutchwax/src/plugin/build-plugin.xml =================================================================== --- trunk/archive-access/projects/nutchwax/src/plugin/build-plugin.xml 2007-03-20 00:29:13 UTC (rev 1592) +++ trunk/archive-access/projects/nutchwax/src/plugin/build-plugin.xml 2007-03-20 00:36:23 UTC (rev 1593) @@ -3,6 +3,9 @@ <!--Copied from nutch/src/plugin. Changes so we build into nutchwax/build and so we get dependencies from a nutch we expect to be in the nutchwax directory. + + + Called from maven2. --> <!-- Imported by plugin build.xml files to define default targets. 
--> @@ -44,16 +47,11 @@ <property name="build.encoding" value="ISO-8859-1"/> - <fileset id="lib.jars" dir="${root}" includes="lib/*.jar"/> <!-- the normal classpath --> <path id="classpath"> <pathelement location="${build.classes}"/> - <fileset refid="lib.jars"/> <pathelement location="${nutch.root}/target/classes"/> - <fileset dir="${nutch.root}/lib"> - <include name="*.jar" /> - </fileset> <!--IA: Add the nutch jars.--> <fileset dir="${real.nutch.root}/lib"> <include name="*.jar" /> @@ -99,6 +97,9 @@ debug="${javac.debug}" deprecation="${javac.deprecation}"> <classpath refid="classpath"/> + <!--This build file is being called out of maven2. Its + setting the below reference to maven.compile.classpath.--> + <classpath refid="maven.compile.classpath"/> </javac> </target> @@ -124,9 +125,6 @@ <copy file="plugin.xml" todir="${deploy.dir}" preservelastmodified="true"/> <copy file="${build.dir}/${name}.jar" todir="${deploy.dir}"/> - <copy todir="${deploy.dir}" flatten="true"> - <fileset refid="lib.jars"/> - </copy> </target> <!-- ================================================================== --> Deleted: trunk/archive-access/projects/nutchwax/src/plugin/build.xml =================================================================== --- trunk/archive-access/projects/nutchwax/src/plugin/build.xml 2007-03-20 00:29:13 UTC (rev 1592) +++ trunk/archive-access/projects/nutchwax/src/plugin/build.xml 2007-03-20 00:36:23 UTC (rev 1593) @@ -1,43 +0,0 @@ -<?xml version="1.0"?> - -<project name="Nutch" default="deploy" basedir="."> - - <!-- ====================================================== --> - <!-- Build & deploy all the plugin jars. 
--> - <!-- ====================================================== --> - <target name="deploy"> - <ant dir="index-wax" target="deploy"/> - <ant dir="query-wax" target="deploy"/> - <ant dir="parse-default" target="deploy"/> - <ant dir="parse-waxext" target="deploy"/> - <ant dir="query-host" target="deploy"/> - <ant dir="query-anchor" target="deploy"/> - <ant dir="query-title" target="deploy"/> - <ant dir="query-content" target="deploy"/> - </target> - - <!-- ====================================================== --> - <!-- Test all of the plugins. --> - <!-- ====================================================== --> - <target name="test"> - <ant dir="index-wax" target="test"/> - <ant dir="query-wax" target="test"/> - <ant dir="parse-default" target="test"/> - <ant dir="parse-waxext" target="test"/> - </target> - - <!-- ====================================================== --> - <!-- Clean all of the plugins. --> - <!-- ====================================================== --> - <target name="clean"> - <ant dir="index-wax" target="clean"/> - <ant dir="query-wax" target="clean"/> - <ant dir="parse-default" target="clean"/> - <ant dir="parse-waxext" target="clean"/> - <ant dir="query-host" target="clean"/> - <ant dir="query-anchor" target="clean"/> - <ant dir="query-title" target="clean"/> - <ant dir="query-content" target="clean"/> - </target> - -</project> Deleted: trunk/archive-access/projects/nutchwax/src/plugin/index-wax/lib/archive-commons-1.11.0-200702160009.jar =================================================================== (Binary files differ) Added: trunk/archive-access/projects/nutchwax/src/plugin/index-wax/lib/archive-commons-1.12.0.jar =================================================================== (Binary files differ) Property changes on: trunk/archive-access/projects/nutchwax/src/plugin/index-wax/lib/archive-commons-1.12.0.jar ___________________________________________________________________ Name: svn:mime-type + application/octet-stream 
Modified: trunk/archive-access/projects/nutchwax/src/plugin/index-wax/plugin.xml =================================================================== --- trunk/archive-access/projects/nutchwax/src/plugin/index-wax/plugin.xml 2007-03-20 00:29:13 UTC (rev 1592) +++ trunk/archive-access/projects/nutchwax/src/plugin/index-wax/plugin.xml 2007-03-20 00:36:23 UTC (rev 1593) @@ -12,7 +12,7 @@ <!--Alternative is to change the nutch script so that it includes libs from other than its local directory. Without that, need to have lib local to plugin.--> - <library name="archive-commons-1.11.0-200702160009.jar" /> + <library name="archive-commons-1.12.0.jar" /> </runtime> <extension id="org.archive.access.nutch.indexer"
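Editorial sketch (not part of the commit): revision 1593 has nutchwax-core declare its dependency on nutchwax-thirdparty without a version, while the parent pom.xml gains a matching dependencyManagement entry. The two fragments below show how those pieces appear to fit together. The coordinates are taken from the diffs above; all other POM content is trimmed, so treat this as an illustration rather than the actual files.

```xml
<!-- Fragment 1: parent projects/nutchwax/pom.xml (trimmed).
     The version for the sibling module is pinned once, here. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.archive.nutchwax</groupId>
      <artifactId>nutchwax-thirdparty</artifactId>
      <version>${project.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>

<!-- Fragment 2: child nutchwax-core/pom.xml (trimmed).
     No <version> element: Maven takes it from the parent's
     dependencyManagement section shown in fragment 1. -->
<dependencies>
  <dependency>
    <groupId>org.archive.nutchwax</groupId>
    <artifactId>nutchwax-thirdparty</artifactId>
    <scope>compile</scope>
  </dependency>
</dependencies>
```

Listing nutchwax-thirdparty ahead of nutchwax-core under modules, together with this dependency, is what lets the reactor build the third-party code first.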
From: <sta...@us...> - 2007-03-20 04:53:25
|
Revision: 1615 http://archive-access.svn.sourceforge.net/archive-access/?rev=1615&view=rev Author: stack-sf Date: 2007-03-19 20:41:46 -0700 (Mon, 19 Mar 2007) Log Message: ----------- A nutchwax-plugins A nutchwax-plugins/pom.xml Add. Added Paths: ----------- trunk/archive-access/projects/nutchwax/nutchwax-plugins/ trunk/archive-access/projects/nutchwax/nutchwax-plugins/pom.xml Added: trunk/archive-access/projects/nutchwax/nutchwax-plugins/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-plugins/pom.xml (rev 0) +++ trunk/archive-access/projects/nutchwax/nutchwax-plugins/pom.xml 2007-03-20 03:41:46 UTC (rev 1615) @@ -0,0 +1,122 @@ +<?xml version="1.0"?> +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> +<parent> + <groupId>org.archive</groupId> + <artifactId>nutchwax</artifactId> + <version>0.11.0-SNAPSHOT</version> +</parent> + <modelVersion>4.0.0</modelVersion> + <groupId>org.archive.nutchwax</groupId> + <artifactId>nutchwax-plugins</artifactId> + <!--This project compiles the nutchwax plugins. + It does not produce any product. + --> + <packaging>pom</packaging> + <name>NutchWAX Plugins</name> + <build> + <plugins> + <plugin> + <artifactId>maven-antrun-plugin</artifactId> + <executions> + <execution> + <id>antrun.compile</id> + <phase>compile</phase> + <configuration> + <tasks> + <!--Call each of our plugins. Set the inheritRef so we can get + at the maven dependencies when we go to compile. 
+ --> + <echo>Compiling plugins</echo> + <ant dir="../src/plugin/index-wax" target="deploy" inheritAll="false" + inheritRefs="true"> + <property name="build.compiler" value="extJavac" /> + </ant> + <ant dir="../src/plugin/query-wax" target="deploy" inheritAll="false" + inheritRefs="true"> + <property name="build.compiler" value="extJavac" /> + </ant> + <ant dir="../src/plugin/parse-default" target="deploy" inheritAll="false" + inheritRefs="true"> + <property name="build.compiler" value="extJavac" /> + </ant> + <ant dir="../src/plugin/parse-waxext" target="deploy" inheritAll="false" + inheritRefs="true"> + <property name="build.compiler" value="extJavac" /> + </ant> + <ant dir="../src/plugin/query-host" target="deploy" inheritAll="false" + inheritRefs="true"> + <property name="build.compiler" value="extJavac" /> + </ant> + <ant dir="../src/plugin/query-anchor" target="deploy" inheritAll="false" + inheritRefs="true"> + <property name="build.compiler" value="extJavac" /> + </ant> + <ant dir="../src/plugin/query-title" target="deploy" inheritAll="false" + inheritRefs="true"> + <property name="build.compiler" value="extJavac" /> + </ant> + <ant dir="../src/plugin/query-content" target="deploy" inheritAll="false" + inheritRefs="true"> + <property name="build.compiler" value="extJavac" /> + </ant> + </tasks> + </configuration> + <goals> + <goal>run</goal> + </goals> + </execution> + <execution> + <id>antrun.clean</id> + <phase>clean</phase> + <configuration> + <tasks> + <!-- Make these conditional so do not run everytime + Done as part of the generate-sources step so that + we can invoke it from eclipse. 
+ --> + <echo>Cleaning plugins</echo> + <ant dir="../src/plugin/index-wax" target="clean" inheritAll="false" + inheritRefs="true"> + <property name="build.compiler" value="extJavac" /> + </ant> + <ant dir="../src/plugin/query-wax" target="clean" inheritAll="false"> + <property name="build.compiler" value="extJavac" /> + </ant> + <ant dir="../src/plugin/parse-default" target="clean" inheritAll="false"> + <property name="build.compiler" value="extJavac" /> + </ant> + <ant dir="../src/plugin/parse-waxext" target="clean" inheritAll="false"> + <property name="build.compiler" value="extJavac" /> + </ant> + <ant dir="../src/plugin/query-host" target="clean" inheritAll="false"> + <property name="build.compiler" value="extJavac" /> + </ant> + <ant dir="../src/plugin/query-anchor" target="clean" inheritAll="false"> + <property name="build.compiler" value="extJavac" /> + </ant> + <ant dir="../src/plugin/query-title" target="clean" inheritAll="false"> + <property name="build.compiler" value="extJavac" /> + </ant> + <ant dir="../src/plugin/query-content" target="clean" inheritAll="false"> + <property name="build.compiler" value="extJavac" /> + </ant> + </tasks> + </configuration> + <goals> + <goal>run</goal> + </goals> + </execution> + </executions> + </plugin> + </plugins> + </build> + <dependencies> + <dependency> + <groupId>org.archive.nutchwax</groupId> + <artifactId>nutchwax-core</artifactId> + <scope> + compile + </scope> + </dependency> + </dependencies> +</project>
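Editorial sketch (not part of the commit): the plugin compile in this revision relies on two pieces lining up. The antrun execution calls each plugin build with inheritRefs="true", and build-plugin.xml (rev 1593) adds a second classpath element that refers to maven.compile.classpath. The condensed pairing below assumes the maven-antrun-plugin behaviour of defining a maven.compile.classpath path reference for the tasks it runs. Only one plugin directory is shown, and the target name "compile" and the ${src.dir} property are placeholders that do not come from the diffs.

```xml
<!-- Caller side: inside <tasks> of the antrun execution in
     nutchwax-plugins/pom.xml. inheritRefs="true" hands Ant
     references, including maven.compile.classpath, to the
     called build file. -->
<ant dir="../src/plugin/index-wax" target="deploy"
     inheritAll="false" inheritRefs="true">
  <property name="build.compiler" value="extJavac"/>
</ant>

<!-- Callee side: src/plugin/build-plugin.xml (trimmed).
     Target name and srcdir property are assumptions. -->
<target name="compile">
  <javac srcdir="${src.dir}" destdir="${build.classes}"
         debug="${javac.debug}" deprecation="${javac.deprecation}">
    <!-- Classpath defined locally in the Ant build file -->
    <classpath refid="classpath"/>
    <!-- Maven-resolved dependencies, inherited from the caller -->
    <classpath refid="maven.compile.classpath"/>
  </javac>
</target>
```

Without inheritRefs="true" on the caller side, the second refid would be undefined when the plugin build runs, which may be why the compile calls set it on every plugin.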
From: <sta...@us...> - 2007-03-20 21:34:07
|
Revision: 1618 http://archive-access.svn.sourceforge.net/archive-access/?rev=1618&view=rev Author: stack-sf Date: 2007-03-20 14:32:18 -0700 (Tue, 20 Mar 2007) Log Message: ----------- M nutchwax/nutchwax-core/pom.xml D nutchwax/nutchwax-war Moved to nutchwax-webapp. M nutchwax/nutchwax-thirdparty/pom.xml Copy nutch classes to target/classes so can be found by subsequent modules. M nutchwax/pom.xml A nutchwax/nutchwax-webapp M nutchwax/nutchwax-webapp/pom.xml Renamed webapp module. Modified Paths: -------------- trunk/archive-access/projects/nutchwax/nutchwax-core/pom.xml trunk/archive-access/projects/nutchwax/nutchwax-thirdparty/pom.xml trunk/archive-access/projects/nutchwax/pom.xml Added Paths: ----------- trunk/archive-access/projects/nutchwax/nutchwax-webapp/ trunk/archive-access/projects/nutchwax/nutchwax-webapp/pom.xml trunk/archive-access/projects/nutchwax/nutchwax-webapp/src/ Removed Paths: ------------- trunk/archive-access/projects/nutchwax/nutchwax-war/ trunk/archive-access/projects/nutchwax/nutchwax-webapp/pom.xml trunk/archive-access/projects/nutchwax/nutchwax-webapp/src/ Modified: trunk/archive-access/projects/nutchwax/nutchwax-core/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-core/pom.xml 2007-03-20 14:47:42 UTC (rev 1617) +++ trunk/archive-access/projects/nutchwax/nutchwax-core/pom.xml 2007-03-20 21:32:18 UTC (rev 1618) @@ -10,8 +10,14 @@ <artifactId>nutchwax-core</artifactId> <packaging>jar</packaging> <name>NutchWAX Core Jar</name> + <dependencies> + <dependency> + <groupId>org.archive.nutchwax</groupId> + <artifactId>nutchwax-thirdparty</artifactId> + </dependency> + </dependencies> <build> - <sourceDirectory>../src/java</sourceDirectory> + <sourceDirectory>../src/java</sourceDirectory> <plugins> <plugin> <groupId>org.apache.maven.plugins</groupId> @@ -20,21 +26,11 @@ <source>1.5</source> <target>1.5</target> <!-- + <compilerArgument> -verbose -cp 
../third-party/nutch/build/classes</compilerArgument> <compilerArgument> -verbose -classpath ../third-party/nutch/build/classes</compilerArgument> --> </configuration> </plugin> </plugins> </build> - <!--Look for placeholder nutchwax-thirdparty jar - Means third-party sources have been compiled. - The jar itself is empty. - --> - <dependencies> - <dependency> - <groupId>org.archive.nutchwax</groupId> - <artifactId>nutchwax-thirdparty</artifactId> - <scope>compile</scope> - </dependency> - </dependencies> </project> Modified: trunk/archive-access/projects/nutchwax/nutchwax-thirdparty/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-thirdparty/pom.xml 2007-03-20 14:47:42 UTC (rev 1617) +++ trunk/archive-access/projects/nutchwax/nutchwax-thirdparty/pom.xml 2007-03-20 21:32:18 UTC (rev 1618) @@ -8,11 +8,14 @@ <modelVersion>4.0.0</modelVersion> <groupId>org.archive.nutchwax</groupId> <artifactId>nutchwax-thirdparty</artifactId> - <!--This project just compiles third-party code. - The produced jar has nothing in it but is useful - as a product subsequent modules can check for - to ensure third-party build has preceeded their - build. + <!--This pom produces an empty placeholder jar. We + used to build the nutch classes into the produced + jar and then reference it in later projects but looking + at maven with debug enabled, it does not actually use the + jars produced by earlier modules, it instead puts the + target/classes directory on the classpath instead. So, + below, after building nutch, we copy the nutch classes + to target/classes so later modules can find them. 
--> <packaging>jar</packaging> <name>NutchWAX Third-party Dependencies</name> @@ -22,8 +25,8 @@ <artifactId>maven-antrun-plugin</artifactId> <executions> <execution> - <id>antrun.generate.sources</id> - <phase>generate-sources</phase> + <id>antrun.compile</id> + <phase>compile</phase> <configuration> <tasks> <!-- Make these conditional so do not run everytime @@ -32,6 +35,13 @@ --> <echo>Compiling third.party dependencies as part of generate-sources</echo> <ant dir=".." target="third.party.jar"/> + <!--Copy over the nutch classes to target/classes so they + can be found by later modules (target/classes is what maven + has on its classpath when it goes to build subsequent modules). + --> + <copy todir="target/classes" overwrite="true"> + <fileset dir="../third-party/nutch/build/classes" /> + </copy> <ant dir=".." target="third.party.plugins"/> </tasks> </configuration> Copied: trunk/archive-access/projects/nutchwax/nutchwax-webapp (from rev 1615, trunk/archive-access/projects/nutchwax/nutchwax-war) Deleted: trunk/archive-access/projects/nutchwax/nutchwax-webapp/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-war/pom.xml 2007-03-20 03:41:46 UTC (rev 1615) +++ trunk/archive-access/projects/nutchwax/nutchwax-webapp/pom.xml 2007-03-20 21:32:18 UTC (rev 1618) @@ -1,48 +0,0 @@ -<?xml version="1.0"?> -<!-- - See head of the nutchwax-job/pom.xml for some pointers - on the 'weird' stuff that is going on in here, the - overwriting of this poms' product by a copy from the - directory above. 
- - POM reference: http://maven.apache.org/pom.html - - List of the better articles on maven: - - http://www.javaworld.com/javaworld/jw-05-2006/jw-0529-maven.html - http://www.javaworld.com/javaworld/jw-02-2006/jw-0227-maven_p.html - - URLs on converting from 1.0 to 2.0 maven (not much good generally): - - http://wiki.osafoundation.org/bin/view/Journal/Maven2Upgrade - http://maven.apache.org/guides/mini/guide-m1-m2.html - --> -<project xmlns="http://maven.apache.org/POM/4.0.0" - xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" - xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 - http://maven.apache.org/maven-v4_0_0.xsd"> - <parent> - <groupId>org.archive</groupId> - <artifactId>nutchwax</artifactId> - <version>0.11.0-SNAPSHOT</version> - </parent> - <modelVersion>4.0.0</modelVersion> - <groupId>org.archive.nutchwax</groupId> - <artifactId>nutchwax-war</artifactId> - <packaging>war</packaging> - <name>NutchWAX Webapp</name> - <dependencies> - <dependency> - <groupId>org.archive.nutchwax</groupId> - <artifactId>nutchwax-core</artifactId> - </dependency> - </dependencies> - <distributionManagement> - <site> - <id>website</id> - <name>Website</name> - <!--Pass as command-line system property to maven--> - <url>${website.url}/projects/${project.parent.artifactId}/${project.artifactId}</url> - </site> - </distributionManagement> -</project> Copied: trunk/archive-access/projects/nutchwax/nutchwax-webapp/pom.xml (from rev 1617, trunk/archive-access/projects/nutchwax/nutchwax-war/pom.xml) =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-webapp/pom.xml (rev 0) +++ trunk/archive-access/projects/nutchwax/nutchwax-webapp/pom.xml 2007-03-20 21:32:18 UTC (rev 1618) @@ -0,0 +1,48 @@ +<?xml version="1.0"?> +<!-- + See head of the nutchwax-job/pom.xml for some pointers + on the 'weird' stuff that is going on in here, the + overwriting of this poms' product by a copy from the + directory above. 
+ + POM reference: http://maven.apache.org/pom.html + + List of the better articles on maven: + + http://www.javaworld.com/javaworld/jw-05-2006/jw-0529-maven.html + http://www.javaworld.com/javaworld/jw-02-2006/jw-0227-maven_p.html + + URLs on converting from 1.0 to 2.0 maven (not much good generally): + + http://wiki.osafoundation.org/bin/view/Journal/Maven2Upgrade + http://maven.apache.org/guides/mini/guide-m1-m2.html + --> +<project xmlns="http://maven.apache.org/POM/4.0.0" + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 + http://maven.apache.org/maven-v4_0_0.xsd"> + <parent> + <groupId>org.archive</groupId> + <artifactId>nutchwax</artifactId> + <version>0.11.0-SNAPSHOT</version> + </parent> + <modelVersion>4.0.0</modelVersion> + <groupId>org.archive.nutchwax</groupId> + <artifactId>nutchwax-webapp</artifactId> + <packaging>war</packaging> + <name>NutchWAX Webapp</name> + <dependencies> + <dependency> + <groupId>org.archive.nutchwax</groupId> + <artifactId>nutchwax-core</artifactId> + </dependency> + </dependencies> + <distributionManagement> + <site> + <id>website</id> + <name>Website</name> + <!--Pass as command-line system property to maven--> + <url>${website.url}/projects/${project.parent.artifactId}/${project.artifactId}</url> + </site> + </distributionManagement> +</project> Copied: trunk/archive-access/projects/nutchwax/nutchwax-webapp/src (from rev 1617, trunk/archive-access/projects/nutchwax/nutchwax-war/src) Modified: trunk/archive-access/projects/nutchwax/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/pom.xml 2007-03-20 14:47:42 UTC (rev 1617) +++ trunk/archive-access/projects/nutchwax/pom.xml 2007-03-20 21:32:18 UTC (rev 1618) @@ -168,21 +168,7 @@ <version>1.0-beta-2</version> <scope>compile</scope> </dependency> - <!-- <dependency> - <groupId>org.apache</groupId> - <artifactId>hadoop</artifactId> - 
<version>0.10.1-core</version> - <scope>compile</scope> - </dependency> - <dependency> - <groupId>org.apache</groupId> - <artifactId>nutch</artifactId> - <version>0.9-dev-508238</version> - <scope>compile</scope> - </dependency> - --> - <dependency> <groupId>javax.servlet</groupId> <artifactId>servlet-api</artifactId> <version>2.4</version> @@ -372,7 +358,7 @@ </dependency> <dependency> <groupId>org.archive.nutchwax</groupId> - <artifactId>nutchwax-war</artifactId> + <artifactId>nutchwax-webapp</artifactId> <version>${project.version}</version> </dependency> </dependencies> @@ -391,7 +377,7 @@ nutchwax-job </module> <module> - nutchwax-war + nutchwax-webapp </module> </modules> </project>
From: <sta...@us...> - 2007-03-21 17:19:02
|
Revision: 1622 http://archive-access.svn.sourceforge.net/archive-access/?rev=1622&view=rev Author: stack-sf Date: 2007-03-21 10:06:07 -0700 (Wed, 21 Mar 2007) Log Message: ----------- * nutchwax-plugins/pom.xml Comment on why packaging has to be jar though no real maven product produced by this module. * nutchwax-core/pom.xml Set scope on when nutchwax-thirdparty is needed. * nutchwax-thirdparty/pom.xml Updated comment on packaging type. * pom.xml Add in mention of plugins module. * .classpath Removed some plugins and added in conf and plugin dirs. * nutchwax-job/src/main/assembly/assemble-job.xml Do not include plugins empty jar in assembly. * nutchwax-job/pom.xml Add dependency on plugins and fastutil. Modified Paths: -------------- trunk/archive-access/projects/nutchwax/.classpath trunk/archive-access/projects/nutchwax/nutchwax-core/pom.xml trunk/archive-access/projects/nutchwax/nutchwax-job/pom.xml trunk/archive-access/projects/nutchwax/nutchwax-job/src/main/assembly/assemble-job.xml trunk/archive-access/projects/nutchwax/nutchwax-plugins/pom.xml trunk/archive-access/projects/nutchwax/nutchwax-thirdparty/pom.xml trunk/archive-access/projects/nutchwax/pom.xml Modified: trunk/archive-access/projects/nutchwax/.classpath =================================================================== --- trunk/archive-access/projects/nutchwax/.classpath 2007-03-21 04:51:07 UTC (rev 1621) +++ trunk/archive-access/projects/nutchwax/.classpath 2007-03-21 17:06:07 UTC (rev 1622) @@ -10,58 +10,26 @@ <classpathentry kind="src" path="src/plugin/query-title/src/java"/> <classpathentry kind="src" path="src/plugin/query-wax/src/java"/> <classpathentry kind="src" path="third-party/nutch/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/analysis-de/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/analysis-fr/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/clustering-carrot2/src/java"/> - <classpathentry 
kind="src" path="third-party/nutch/src/plugin/clustering-carrot2/src/test"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/creativecommons/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/creativecommons/src/test"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/index-basic/src/java"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/index-more/src/java"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/languageidentifier/src/java"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/languageidentifier/src/test"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/lib-http/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/lib-http/src/test"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/lib-parsems/src/java"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/lib-regex-filter/src/java"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/lib-regex-filter/src/test"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/microformats-reltag/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/ontology/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/ontology/src/test"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-ext/src/java"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-ext/src/test"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-html/src/java"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-html/src/test"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-js/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-msexcel/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-msexcel/src/test"/> - <classpathentry kind="src" 
path="third-party/nutch/src/plugin/parse-mspowerpoint/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-mspowerpoint/src/test"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-msword/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-msword/src/test"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-oo/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-oo/src/test"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-pdf/src/java"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-pdf/src/test"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-rss/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-rss/src/test"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-swf/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-swf/src/test"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-text/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-zip/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/parse-zip/src/test"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/protocol-file/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/protocol-ftp/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/protocol-http/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/protocol-httpclient/src/java"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/query-basic/src/java"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/query-more/src/java"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/query-site/src/java"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/query-url/src/java"/> <classpathentry 
kind="src" path="third-party/nutch/src/plugin/query-url/src/test"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/scoring-opic/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/subcollection/src/java"/> - <classpathentry kind="src" path="third-party/nutch/src/plugin/subcollection/src/test"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/summary-basic/src/java"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/summary-lucene/src/java"/> <classpathentry kind="src" path="third-party/nutch/src/plugin/urlfilter-automaton/src/java"/> @@ -268,5 +236,7 @@ <classpathentry kind="lib" path="third-party/nutch/src/plugin/urlfilter-automaton/lib/automaton.jar"/> <classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/> <classpathentry kind="con" path="org.maven.ide.eclipse.MAVEN2_CLASSPATH_CONTAINER"/> + <classpathentry kind="lib" path="third-party/nutch/build"/> + <classpathentry kind="lib" path="third-party/nutch/conf"/> <classpathentry kind="output" path="target"/> </classpath> Modified: trunk/archive-access/projects/nutchwax/nutchwax-core/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-core/pom.xml 2007-03-21 04:51:07 UTC (rev 1621) +++ trunk/archive-access/projects/nutchwax/nutchwax-core/pom.xml 2007-03-21 17:06:07 UTC (rev 1622) @@ -14,6 +14,7 @@ <dependency> <groupId>org.archive.nutchwax</groupId> <artifactId>nutchwax-thirdparty</artifactId> + <scope>compile</scope> </dependency> </dependencies> <build> Modified: trunk/archive-access/projects/nutchwax/nutchwax-job/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-job/pom.xml 2007-03-21 04:51:07 UTC (rev 1621) +++ trunk/archive-access/projects/nutchwax/nutchwax-job/pom.xml 2007-03-21 17:06:07 UTC (rev 1622) @@ -65,6 +65,15 @@ <groupId>org.archive.nutchwax</groupId> 
<artifactId>nutchwax-core</artifactId> </dependency> + <dependency> + <groupId>org.archive.nutchwax</groupId> + <artifactId>nutchwax-plugins</artifactId> + </dependency> + <dependency> + <groupId>it.unimi.dsi</groupId> + <artifactId>fastutil</artifactId> + <version>5.0.3</version> + </dependency> </dependencies> <distributionManagement> <site> Modified: trunk/archive-access/projects/nutchwax/nutchwax-job/src/main/assembly/assemble-job.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-job/src/main/assembly/assemble-job.xml 2007-03-21 04:51:07 UTC (rev 1621) +++ trunk/archive-access/projects/nutchwax/nutchwax-job/src/main/assembly/assemble-job.xml 2007-03-21 17:06:07 UTC (rev 1622) @@ -108,6 +108,7 @@ <exclude>junit:junit</exclude> <exclude>javax.servlet:servlet-api</exclude> <exclude>org.archive.nutchwax:nutchwax-thirdparty</exclude> + <exclude>org.archive.nutchwax:nutchwax-plugins</exclude> </excludes> </dependencySet> </dependencySets> Modified: trunk/archive-access/projects/nutchwax/nutchwax-plugins/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-plugins/pom.xml 2007-03-21 04:51:07 UTC (rev 1621) +++ trunk/archive-access/projects/nutchwax/nutchwax-plugins/pom.xml 2007-03-21 17:06:07 UTC (rev 1622) @@ -9,9 +9,15 @@ <groupId>org.archive.nutchwax</groupId> <artifactId>nutchwax-plugins</artifactId> <!--This project compiles the nutchwax plugins. - It does not produce any product. + It does not produce any product. It has a + packaging of type jar so there will be an + empty jar, but it is just ignored. I tried + setting the type to pom having downstream + modules depend on a pom, but maven doesn't + seem to like that. You cannot depend + on a pom; must be jar (or war I suppose). 
--> - <packaging>pom</packaging> + <packaging>jar</packaging> <name>NutchWAX Plugins</name> <build> <plugins> Modified: trunk/archive-access/projects/nutchwax/nutchwax-thirdparty/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-thirdparty/pom.xml 2007-03-21 04:51:07 UTC (rev 1621) +++ trunk/archive-access/projects/nutchwax/nutchwax-thirdparty/pom.xml 2007-03-21 17:06:07 UTC (rev 1622) @@ -13,9 +13,13 @@ jar and then reference it in later projects but looking at maven with debug enabled, it does not actually use the jars produced by earlier modules, it instead puts the - target/classes directory on the classpath instead. So, - below, after building nutch, we copy the nutch classes + target/classes directory on the classpath. So, below, + after building nutch, we copy the nutch classes to target/classes so later modules can find them. + + The packaging needs to be jar and not pom since + it doesn not seem like you can have downstream modules + depend on a pom (only on a jar). --> <packaging>jar</packaging> <name>NutchWAX Third-party Dependencies</name> Modified: trunk/archive-access/projects/nutchwax/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/pom.xml 2007-03-21 04:51:07 UTC (rev 1621) +++ trunk/archive-access/projects/nutchwax/pom.xml 2007-03-21 17:06:07 UTC (rev 1622) @@ -361,6 +361,11 @@ <artifactId>nutchwax-webapp</artifactId> <version>${project.version}</version> </dependency> + <dependency> + <groupId>org.archive.nutchwax</groupId> + <artifactId>nutchwax-plugins</artifactId> + <version>${project.version}</version> + </dependency> </dependencies> </dependencyManagement> <modules>
From: <sta...@us...> - 2007-03-21 19:14:49
|
Revision: 1623 http://archive-access.svn.sourceforge.net/archive-access/?rev=1623&view=rev Author: stack-sf Date: 2007-03-21 12:13:23 -0700 (Wed, 21 Mar 2007) Log Message: ----------- M nutchwax/src/main/assembly/distribution.xml Change where we fetch job and war jars from. M nutchwax/nutchwax-thirdparty/pom.xml D nutchwax/build.xml Removed build.xml. Remove misimpression that you can use ant to build this project. Call third.party build.xml directly instead. Modified Paths: -------------- trunk/archive-access/projects/nutchwax/nutchwax-thirdparty/pom.xml trunk/archive-access/projects/nutchwax/src/main/assembly/distribution.xml Removed Paths: ------------- trunk/archive-access/projects/nutchwax/build.xml Deleted: trunk/archive-access/projects/nutchwax/build.xml =================================================================== --- trunk/archive-access/projects/nutchwax/build.xml 2007-03-21 17:06:07 UTC (rev 1622) +++ trunk/archive-access/projects/nutchwax/build.xml 2007-03-21 19:13:23 UTC (rev 1623) @@ -1,342 +0,0 @@ -<?xml version="1.0"?> - -<project name="nutchwax" default="all"> - <property name="name" value="${ant.project.name}"/> - <property name="root" value="${basedir}"/> - - <!--'nutch.root' is pointer at core nutch. Expect to find it in - '${basedir}/third-party' named 'nutch'. 
- --> - <!--Keep this aligned with whats in maven2 pom--> - <property name="nutchwax.version" value="-0.11.0-SNAPSHOT"/> - <property name="nutch.root" location="${root}/third-party/nutch"/> - - <property file="${user.home}/.$(name}.build.properties" /> - - <property name="src.dir" location="${root}/src/java"/> - <property name="src.test" location="${root}/src/test"/> - - <available file="${src.test}" type="dir" property="test.available"/> - - <property name="conf.dir" location="${root}/conf"/> - - <property name="build.dir" location="${root}/target"/> - <property name="build.classes" location="${build.dir}/classes"/> - <property name="build.test" location="${build.dir}/test"/> - - <property name="build.plugins" location="${nutch.root}/build/plugins"/> - <property name="deploy.dir" location="${build.plugins}/${name}"/> - - <property name="this.web" location="${root}/src/web"/> - <property name="nutch.web" location="${nutch.root}/src/web"/> - - <property name="javac.deprecation" value="off"/> - <property name="javac.debug" value="on"/> - - <property name="javadoc.link" - value="http://java.sun.com/j2se/1.5/docs/api/"/> - - <property name="build.encoding" value="ISO-8859-1"/> - - <!-- the normal classpath --> - <path id="classpath"> - <pathelement location="${build.classes}"/> - <pathelement location="${nutch.root}/build/classes"/> - <fileset dir="${nutch.root}/lib"> - <include name="*.jar" /> - </fileset> - </path> - - <!-- the unit test classpath --> - <path id="test.classpath"> - <pathelement location="${build.test}" /> - <pathelement location="${conf.dir}"/> - <pathelement location="${nutch.root}/conf"/> - <pathelement location="${nutch.root}/build"/> - <path refid="classpath"/> - </path> - - <target name="third.party.plugins" description="Build third-party plugins"> - <echo message="Building nutch third-party dependency (plugins)" /> - <ant dir="third-party/nutch" target="compile-plugins" inheritAll="false" > - <property name="build.compiler" value="extJavac" /> 
- </ant> - </target> - <target name="third.party.compile" description="Compile third-party src"> - <echo message="Building nutch third-party dependency (compile)" /> - <ant dir="third-party/nutch" target="compile" inheritAll="false" > - <property name="build.compiler" value="extJavac" /> - </ant> - </target> - <target name="third.party.jar" description="Build third-party jars" > - <echo message="Building nutch third-party dependency (jar)" /> - <ant dir="third-party/nutch" target="jar" inheritAll="false" > - <property name="build.compiler" value="extJavac" /> - </ant> - </target> - <target name="third.party.war" description="Build third-party wars" > - <echo message="Building nutch third-party dependency (war)" /> - <ant dir="third-party/nutch" target="war" inheritAll="false" > - <property name="build.compiler" value="extJavac" /> - </ant> - </target> - <target name="third.party.clean" description="Clean third-party software"> - <echo message="Cleaning nutch third-party dependency" /> - <ant dir="third-party/nutch" target="clean" inheritAll="false" > - <property name="build.compiler" value="extJavac" /> - </ant> - </target> - - <!-- ====================================================== --> - <!-- Stuff needed by all targets --> - <!-- ====================================================== --> - <target name="init"> - <mkdir dir="${build.dir}"/> - <mkdir dir="${build.classes}"/> - <mkdir dir="${build.test}"/> - </target> - - <!-- ====================================================== --> - <!-- Compile the Java files --> - <!-- ====================================================== --> - <target name="compile" depends="init" - description="Compile nutchwax classes"> - <property name="build.compiler" value="extJavac" /> - <javac - encoding="${build.encoding}" - srcdir="${src.dir}" - includes="**/*.java" - destdir="${build.classes}" - debug="${javac.debug}" - target="1.5" - source="1.5" - deprecation="${javac.deprecation}"> - <classpath refid="classpath"/> - </javac> 
- </target> - - <!-- ====================================================== --> - <!-- Compile plugins --> - <!-- ====================================================== --> - <target name="compile-plugins" - description="Compile all nutchwax plugins"> - <ant dir="src/plugin" target="deploy" inheritAll="false"> - <property name="build.compiler" value="extJavac" /> - </ant> - </target> - - <!-- ================================================================== --> - <!-- Make job jar --> - <!-- ================================================================== --> - <!-- --> - <!-- ================================================================== --> - <target name="jar" depends="compile, compile-plugins" - description="Builds nutchwax jobs jar of all tasks to do import, etc." > - <zip destfile="${build.dir}/${name}-job${nutchwax.version}.jar"> - <zipfileset prefix="META-INF" file="${conf.dir}/MANIFEST.MF"/> - <zipfileset file="${conf.dir}/log4j.properties"/> - <zipfileset file="${conf.dir}/wax-parse-plugins.xml"/> - <zipfileset file="${conf.dir}/wax-default.xml"/> - <zipfileset file="${conf.dir}/regex-normalize.xml"/> - <zipfileset file="${conf.dir}/regex-urlfilter.txt"/> - <zipfileset file="${nutch.root}/conf/mime-types.xml"/> - <zipfileset file="${nutch.root}/conf/nutch-default.xml"/> - <zipfileset file="${nutch.root}/conf/common-terms.utf8"/> - <zipfileset prefix="bin" file="${basedir}/src/plugin/parse-waxext/bin/parse-pdf.sh" filemode="555"/> - <!--<zipfileset refid="lib.jars"/> - --> - - <!--Include all class files both nutch and nutchwax at top level - so all needed to launch a job using the 'hadoop jar nutchwax.jobs' - is on the classpath (Only classes that are at top-level in a jar can - be found on CLASSPATH. Jars inside jars or classes under 'classes' - directory cannot be found or added to CLASSPATH, not without custom - classloader: See - http://java.sun.com/docs/books/tutorial/deployment/jar/downman.html). 
- --> - <zipfileset dir="${build.dir}/classes" /> - <zipfileset dir="${nutch.root}/build/classes" /> - <!-- Be selective about which plugins to copy over. Otherwise - the jar gets massive (16Megs with all plugins. 10Megs not - including plugins used at other than indexing time). - - Include query-time filters for case where we're running in - distributed mode. - - Note, we EXCLUDE parse-js. Otherwise, its run as part of - html parse. We don't want this because the parse-js currently - adds base url as anchor text polluting the linkdb and its kinda - messy regards URLs it finds in javascript. It needs some work. - Meantime, we'll do w/o the URLs it finds in linkdb. See - NUTCH-425 and - http://sourceforge.net/tracker/index.php?func=detail&aid=1591709&group_id=118427&atid=681137 - --> - <zipfileset prefix="plugins" dir="${nutch.root}/build/plugins"> - <!-- See above why we exclude parse-js--> - <exclude name="parse-js/**" /> - <include name="analysis-*/**" /> - <include name="index-*/**" /> - <include name="language-*/**" /> - <include name="lib-*/**" /> - <include name="nutch-*/**" /> - <include name="scoring-*/**" /> - <include name="query-*/**" /> - <include name="summary-*/**" /> - <include name="urlfilter-*/**" /> - <include name="urlnormalizer-*/**" /> - <include name="parse-*/**" /> - </zipfileset> - <!--Add wax plugins--> - <zipfileset prefix="wax-plugins" dir="${build.dir}/wax-plugins"> - <include name="*/**" /> - </zipfileset> - <!--Include nutch dependencies in job jar. 
--> - <zipfileset prefix="lib" file="${nutch.root}/lib/commons-lang*jar"/> - <zipfileset prefix="lib" file="${nutch.root}/lib/lucene*jar"/> - <zipfileset prefix="lib" file="${nutch.root}/lib/jakarta-oro*jar"/> - <zipfileset prefix="lib" file="${nutch.root}/lib/xerces*jar"/> - <zipfileset prefix="lib" file="${nutch.root}/lib//concurrent-1.3.4.jar"/> - <!--Finally, include the README.txt file so can tell what - hadoop and nutch this was built against--> - <zipfileset file="${root}/README.txt"/> - </zip> - </target> - - <!-- ================================================================== --> - <!-- Build all including third-party dependencies (i.e. nutch) --> - <!-- ================================================================== --> - <!-- --> - <!-- ================================================================== --> - <target name="all" depends="third.party.jar,third.party.war,jar,compile,war" /> - - <!-- ================================================================== --> - <!-- Compile test code --> - <!-- ================================================================== --> - <target name="compile-test" depends="compile" if="test.available"> - <javac - encoding="${build.encoding}" - srcdir="${src.test}" - includes="**/*.java" - destdir="${build.test}" - debug="${debug}"> - <classpath refid="test.classpath"/> - </javac> - </target> - - <!-- ================================================================== --> - <!-- Run unit tests --> - <!-- ================================================================== --> - <target name="test" depends="compile-test" if="test.available" - description="Run tests"> - - <junit printsummary="yes" haltonfailure="no" fork="yes" - errorProperty="tests.failed" failureProperty="tests.failed"> - <sysproperty key="test.data" value="${build.test}/data"/> - <sysproperty key="test.input" value="${root}/data"/> - <classpath refid="test.classpath"/> - <formatter type="plain" /> - <batchtest todir="${build.test}" 
unless="testcase"> - <fileset dir="${src.test}" - includes="**/Test*.java" excludes="**/${test.exclude}.java" /> - </batchtest> - <batchtest todir="${build.test}" if="testcase"> - <fileset dir="${src.test}" includes="**/${testcase}.java"/> - </batchtest> - </junit> - - <fail if="tests.failed">Tests failed!</fail> - - </target> - - <!-- ================================================================== --> - <!-- build war file --> - <!-- ================================================================== --> - <target name="war" depends="compile, compile-plugins" - description="Builds nutchwax war" > - <!--Copy our nutchwax nutch-site.xml template into the build dir as - nutch-site.xml. Then in the below, add it into the WEB-INF/classes dir. - --> - <war destfile="${build.dir}/${name}-webapp${nutchwax.version}.war" webxml="${this.web}/web.xml"> - <fileset dir="${nutch.web}/jsp"> - <exclude name="**/search.jsp"/> - <exclude name="**/web.xml"/> - <exclude name="**/refine*.xml"/> - <!--Don't copy these over until they jsp compile.--> - <exclude name="**/cluster.jsp"/> - <exclude name="**/refine-query*"/> - </fileset> - <fileset dir="${this.web}"> - <exclude name="**/web.xml"/> - </fileset> - <classes dir="${nutch.root}/conf" > - <exclude name="**/*.template"/> - </classes> - <classes dir="${root}/conf"> - <exclude name="**/*.template"/> - </classes> - <classes dir="${nutch.web}/locale"/> - <classes file="${this.web}/log4j.properties"/> - <lib dir="${root}/lib"> - <include name="archive-commons-*.jar" /> - </lib> - <lib dir="${nutch.root}/build"> - <include name="nutch*.jar"/> - </lib> - <lib dir="${nutch.root}/lib"> - <include name="lucene*.jar"/> - <include name="hadoop*.jar"/> - <include name="taglibs-*.jar"/> - <include name="dom4j-*.jar"/> - <include name="xerces-*.jar"/> - <include name="log4j-*.jar"/> - <include name="commons-lang-*.jar"/> - <include name="commons-cli-*.jar"/> - <include name="commons-logging-*.jar"/> - </lib> - <!--Copy into place the nutchwax 
classes.--> - <zipfileset prefix="WEB-INF/classes" - dir="${build.dir}/classes/" /> - - <!--Be selective about plugins to copy. Shrinks size of webapp. - --> - <zipfileset prefix="WEB-INF/classes/plugins" - dir="${nutch.root}/build/plugins"> - <include name="analysis-*/**" /> - <include name="clustering-*/**" /> - <include name="language-*/**" /> - <include name="lib-lucene-*/**" /> - <include name="lib-log4j-*/**" /> - <include name="lib-regex-*/**" /> - <include name="microformats-*/**" /> - <include name="nutch-*/**" /> - <include name="query-*/**" /> - <include name="urlfilter-*/**" /> - <include name="urlnormalizer-*/**" /> - <include name="summary-*/**" /> - <include name="urlfilter-*/**" /> - </zipfileset> - <zipfileset prefix="WEB-INF/classes/plugins" - dir="${build.dir}/wax-plugins"/> - <webinf dir="${nutch.root}/lib"> - <include name="taglibs-*.tld"/> - </webinf> - </war> - </target> - - - <!-- ================================================================== --> - <!-- Clean. Delete the build files, and their directories --> - <!-- ================================================================== --> - <target name="clean" description="Clean up all built"> - <delete dir="${build.dir}"/> - </target> - - <!-- ================================================================== --> - <!-- Clean all. 
Delete the build files including third-party builds --> - <!-- and their directories --> - <!-- ================================================================== --> - <target name="clean-all" - depends="clean,third.party.clean" - description="Clean up all built including third-party dependencies" /> - -</project> Modified: trunk/archive-access/projects/nutchwax/nutchwax-thirdparty/pom.xml =================================================================== --- trunk/archive-access/projects/nutchwax/nutchwax-thirdparty/pom.xml 2007-03-21 17:06:07 UTC (rev 1622) +++ trunk/archive-access/projects/nutchwax/nutchwax-thirdparty/pom.xml 2007-03-21 19:13:23 UTC (rev 1623) @@ -37,8 +37,10 @@ Done as part of the generate-sources step so that we can invoke it from eclipse. --> - <echo>Compiling third.party dependencies as part of generate-sources</echo> - <ant dir=".." target="third.party.jar"/> + <echo>Building nutch third-party dependency (jar)</echo> + <ant dir="../third-party/nutch" target="jar" inheritAll="false" > + <property name="build.compiler" value="extJavac" /> + </ant> <!--Copy over the nutch classes to target/classes so they can be found by later modules (target/classes is what maven has on its classpath when it goes to build subsequent modules). @@ -46,7 +48,10 @@ <copy todir="target/classes" overwrite="true"> <fileset dir="../third-party/nutch/build/classes" /> </copy> - <ant dir=".." target="third.party.plugins"/> + <echo>Building nutch third-party dependency (plugins)</echo> + <ant dir="../third-party/nutch" target="compile-plugins" inheritAll="false" > + <property name="build.compiler" value="extJavac" /> + </ant> </tasks> </configuration> <goals> @@ -58,7 +63,10 @@ <phase>clean</phase> <configuration> <tasks> - <ant dir=".." 
target="clean-all"/> + <echo>Cleaning nutch third-party dependency</echo> + <ant dir="../third-party/nutch" target="clean" inheritAll="false" > + <property name="build.compiler" value="extJavac" /> + </ant> </tasks> </configuration> <goals> Modified: trunk/archive-access/projects/nutchwax/src/main/assembly/distribution.xml =================================================================== --- trunk/archive-access/projects/nutchwax/src/main/assembly/distribution.xml 2007-03-21 17:06:07 UTC (rev 1622) +++ trunk/archive-access/projects/nutchwax/src/main/assembly/distribution.xml 2007-03-21 19:13:23 UTC (rev 1623) @@ -17,8 +17,8 @@ <directory>target</directory> <outputDirectory /> <includes> - <include>nutchwax*.jar</include> - <include>nutchwax*.war</include> + <include>nutchwax-job/target/nutchwax-job*.jar</include> + <include>nutchwax-webapp/nutchwax-webapp*.war</include> </includes> </fileSet> <fileSet>