From: <bra...@us...> - 2007-09-28 00:09:00
Revision: 2009
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2009&view=rev
Author:   bradtofel
Date:     2007-09-27 17:09:04 -0700 (Thu, 27 Sep 2007)

Log Message:
-----------
MOVED: src/site/xdocs => src/site/xdoc

Modified Paths:
--------------
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/navigation.xml

Added Paths:
-----------
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/

Removed Paths:
-------------
    trunk/archive-access/projects/wayback/dist/src/site/xdocs/

Copied: trunk/archive-access/projects/wayback/dist/src/site/xdoc (from rev 1983, trunk/archive-access/projects/wayback/dist/src/site/xdocs)

Modified: trunk/archive-access/projects/wayback/dist/src/site/xdoc/navigation.xml
===================================================================
--- trunk/archive-access/projects/wayback/dist/src/site/xdocs/navigation.xml 2007-08-31 20:24:58 UTC (rev 1983)
+++ trunk/archive-access/projects/wayback/dist/src/site/xdoc/navigation.xml 2007-09-28 00:09:04 UTC (rev 2009)
@@ -14,6 +14,7 @@
       <item name="Requirements" href="requirements.html"/>
       <item name="Downloads" href="downloads.html"/>
       <item name="User Manual" href="user_manual.html"/>
+      <item name="Test" href="test.html"/>
       <item name="FAQ" href="/faq.html"/>
       <item name="API" href="./apidocs"/>
       <item name="Browse/Submit a Bug"


This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
From: <bra...@us...> - 2007-09-28 00:13:10
Revision: 2011
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2011&view=rev
Author:   bradtofel
Date:     2007-09-27 17:13:11 -0700 (Thu, 27 Sep 2007)

Log Message:
-----------
moved *.dia to src/site/resources/dia
moved *.png to src/site/resources/images

Added Paths:
-----------
    trunk/archive-access/projects/wayback/dist/src/site/resources/dia/
    trunk/archive-access/projects/wayback/dist/src/site/resources/dia/ARCProxy.dia
    trunk/archive-access/projects/wayback/dist/src/site/resources/dia/AlphaRemoteResourceIndex.dia
    trunk/archive-access/projects/wayback/dist/src/site/resources/dia/DynamicCDXResourceIndex.dia
    trunk/archive-access/projects/wayback/dist/src/site/resources/dia/HTTP11ResourceStore.dia
    trunk/archive-access/projects/wayback/dist/src/site/resources/dia/RemoteResourceIndex.dia
    trunk/archive-access/projects/wayback/dist/src/site/resources/dia/WM-Shared-Small.dia
    trunk/archive-access/projects/wayback/dist/src/site/resources/dia/WM-Shared.dia
    trunk/archive-access/projects/wayback/dist/src/site/resources/dia/WM-Standard.dia
    trunk/archive-access/projects/wayback/dist/src/site/resources/images/ARCProxy.png
    trunk/archive-access/projects/wayback/dist/src/site/resources/images/AlphaRemoteResourceIndex.png
    trunk/archive-access/projects/wayback/dist/src/site/resources/images/DynamicCDXResourceIndex.png
    trunk/archive-access/projects/wayback/dist/src/site/resources/images/HTTP11ResourceStore.png
    trunk/archive-access/projects/wayback/dist/src/site/resources/images/RemoteResourceIndex.png
    trunk/archive-access/projects/wayback/dist/src/site/resources/images/WM-Component.png
    trunk/archive-access/projects/wayback/dist/src/site/resources/images/WM-Shared-Small.png
    trunk/archive-access/projects/wayback/dist/src/site/resources/images/WM-Shared.png
    trunk/archive-access/projects/wayback/dist/src/site/resources/images/WM-Standard.png

Removed Paths:
-------------
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/ARCProxy.dia
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/ARCProxy.png
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/AlphaRemoteResourceIndex.dia
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/AlphaRemoteResourceIndex.png
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/DynamicCDXResourceIndex.dia
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/DynamicCDXResourceIndex.png
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/HTTP11ResourceStore.dia
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/HTTP11ResourceStore.png
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/RemoteResourceIndex.dia
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/RemoteResourceIndex.png
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Component.png
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Shared-Small.dia
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Shared-Small.png
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Shared.dia
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Shared.png
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Standard.dia
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Standard.png

Copied: trunk/archive-access/projects/wayback/dist/src/site/resources/dia/ARCProxy.dia (from rev 2009, trunk/archive-access/projects/wayback/dist/src/site/xdoc/ARCProxy.dia)
===================================================================
(Binary files differ)

Copied: trunk/archive-access/projects/wayback/dist/src/site/resources/dia/AlphaRemoteResourceIndex.dia (from rev 2009, trunk/archive-access/projects/wayback/dist/src/site/xdoc/AlphaRemoteResourceIndex.dia)
===================================================================
(Binary files differ)

Copied: trunk/archive-access/projects/wayback/dist/src/site/resources/dia/DynamicCDXResourceIndex.dia (from rev 2009, trunk/archive-access/projects/wayback/dist/src/site/xdoc/DynamicCDXResourceIndex.dia)
===================================================================
(Binary files differ)

Copied: trunk/archive-access/projects/wayback/dist/src/site/resources/dia/HTTP11ResourceStore.dia (from rev 2009, trunk/archive-access/projects/wayback/dist/src/site/xdoc/HTTP11ResourceStore.dia)
===================================================================
(Binary files differ)

Copied: trunk/archive-access/projects/wayback/dist/src/site/resources/dia/RemoteResourceIndex.dia (from rev 2009, trunk/archive-access/projects/wayback/dist/src/site/xdoc/RemoteResourceIndex.dia)
===================================================================
(Binary files differ)

Copied: trunk/archive-access/projects/wayback/dist/src/site/resources/dia/WM-Shared-Small.dia (from rev 2009, trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Shared-Small.dia)
===================================================================
(Binary files differ)

Copied: trunk/archive-access/projects/wayback/dist/src/site/resources/dia/WM-Shared.dia (from rev 2009, trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Shared.dia)
===================================================================
(Binary files differ)

Copied: trunk/archive-access/projects/wayback/dist/src/site/resources/dia/WM-Standard.dia (from rev 2009, trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Standard.dia)
===================================================================
(Binary files differ)

Copied: trunk/archive-access/projects/wayback/dist/src/site/resources/images/ARCProxy.png (from rev 2009, trunk/archive-access/projects/wayback/dist/src/site/xdoc/ARCProxy.png)
===================================================================
(Binary files differ)

Copied: trunk/archive-access/projects/wayback/dist/src/site/resources/images/AlphaRemoteResourceIndex.png (from rev 2009, trunk/archive-access/projects/wayback/dist/src/site/xdoc/AlphaRemoteResourceIndex.png)
===================================================================
(Binary files differ)

Copied: trunk/archive-access/projects/wayback/dist/src/site/resources/images/DynamicCDXResourceIndex.png (from rev 2009, trunk/archive-access/projects/wayback/dist/src/site/xdoc/DynamicCDXResourceIndex.png)
===================================================================
(Binary files differ)

Copied: trunk/archive-access/projects/wayback/dist/src/site/resources/images/HTTP11ResourceStore.png (from rev 2009, trunk/archive-access/projects/wayback/dist/src/site/xdoc/HTTP11ResourceStore.png)
===================================================================
(Binary files differ)

Copied: trunk/archive-access/projects/wayback/dist/src/site/resources/images/RemoteResourceIndex.png (from rev 2009, trunk/archive-access/projects/wayback/dist/src/site/xdoc/RemoteResourceIndex.png)
===================================================================
(Binary files differ)

Copied: trunk/archive-access/projects/wayback/dist/src/site/resources/images/WM-Component.png (from rev 2009, trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Component.png)
===================================================================
(Binary files differ)

Copied: trunk/archive-access/projects/wayback/dist/src/site/resources/images/WM-Shared-Small.png (from rev 2009, trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Shared-Small.png)
===================================================================
(Binary files differ)

Copied: trunk/archive-access/projects/wayback/dist/src/site/resources/images/WM-Shared.png (from rev 2009, trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Shared.png)
===================================================================
(Binary files differ)

Copied: trunk/archive-access/projects/wayback/dist/src/site/resources/images/WM-Standard.png (from rev 2009, trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Standard.png)
===================================================================
(Binary files differ)

Deleted: trunk/archive-access/projects/wayback/dist/src/site/xdoc/ARCProxy.dia
===================================================================
(Binary files differ)

Deleted: trunk/archive-access/projects/wayback/dist/src/site/xdoc/ARCProxy.png
===================================================================
(Binary files differ)

Deleted: trunk/archive-access/projects/wayback/dist/src/site/xdoc/AlphaRemoteResourceIndex.dia
===================================================================
(Binary files differ)

Deleted: trunk/archive-access/projects/wayback/dist/src/site/xdoc/AlphaRemoteResourceIndex.png
===================================================================
(Binary files differ)

Deleted: trunk/archive-access/projects/wayback/dist/src/site/xdoc/DynamicCDXResourceIndex.dia
===================================================================
(Binary files differ)

Deleted: trunk/archive-access/projects/wayback/dist/src/site/xdoc/DynamicCDXResourceIndex.png
===================================================================
(Binary files differ)

Deleted: trunk/archive-access/projects/wayback/dist/src/site/xdoc/HTTP11ResourceStore.dia
===================================================================
(Binary files differ)

Deleted: trunk/archive-access/projects/wayback/dist/src/site/xdoc/HTTP11ResourceStore.png
===================================================================
(Binary files differ)

Deleted: trunk/archive-access/projects/wayback/dist/src/site/xdoc/RemoteResourceIndex.dia
===================================================================
(Binary files differ)

Deleted: trunk/archive-access/projects/wayback/dist/src/site/xdoc/RemoteResourceIndex.png
===================================================================
(Binary files differ)

Deleted: trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Component.png
===================================================================
(Binary files differ)

Deleted: trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Shared-Small.dia
===================================================================
(Binary files differ)

Deleted: trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Shared-Small.png
===================================================================
(Binary files differ)

Deleted: trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Shared.dia
===================================================================
(Binary files differ)

Deleted: trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Shared.png
===================================================================
(Binary files differ)

Deleted: trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Standard.dia
===================================================================
(Binary files differ)

Deleted: trunk/archive-access/projects/wayback/dist/src/site/xdoc/WM-Standard.png
===================================================================
(Binary files differ)

From: <bra...@us...> - 2007-09-28 00:14:42
Revision: 2012
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2012&view=rev
Author:   bradtofel
Date:     2007-09-27 17:14:43 -0700 (Thu, 27 Sep 2007)

Log Message:
-----------
moved src/site/xdoc/faq.fml to src/site/fml/faq.fml

Added Paths:
-----------
    trunk/archive-access/projects/wayback/dist/src/site/fml/
    trunk/archive-access/projects/wayback/dist/src/site/fml/faq.fml

Removed Paths:
-------------
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/faq.fml

Copied: trunk/archive-access/projects/wayback/dist/src/site/fml/faq.fml (from rev 2009, trunk/archive-access/projects/wayback/dist/src/site/xdoc/faq.fml)
===================================================================
--- trunk/archive-access/projects/wayback/dist/src/site/fml/faq.fml (rev 0)
+++ trunk/archive-access/projects/wayback/dist/src/site/fml/faq.fml 2007-09-28 00:14:43 UTC (rev 2012)
@@ -0,0 +1,39 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<faqs title="Frequently Asked Questions">
+
+  <part id="general">
+    <title>General</title>
+
+    <faq id="about">
+      <question>
+        What is this project all about?
+      </question>
+      <answer>
+        <p>
+          The project is designed to replace the current Wayback Machine with an
+          all Java solution that is flexible enough to provide an easy-to-use
+          solution for the single-machine at-home user, as well as scaling up
+          to hundreds of machines for a full historical collection.
+        </p>
+        <p>
+          Primarily it is a few easily replaceable interfaces, and some core
+          classes that utilize those interfaces to provide the Wayback
+          service. Presently only trivial implementations of those interfaces
+          have been developed, but we hope that these interfaces will allow a
+          high degree of flexibility and experimentation.
+        </p>
+      </answer>
+    </faq>
+    <faq id="install">
+      <question>
+        How can I install and use this?
+      </question>
+      <answer>
+        <p>
+          See the <a href="user_manual.html">User Manual</a> for information
+          about installing and using this application.
+        </p>
+      </answer>
+    </faq>
+  </part>
+</faqs>

Deleted: trunk/archive-access/projects/wayback/dist/src/site/xdoc/faq.fml
===================================================================
--- trunk/archive-access/projects/wayback/dist/src/site/xdoc/faq.fml 2007-09-28 00:13:11 UTC (rev 2011)
+++ trunk/archive-access/projects/wayback/dist/src/site/xdoc/faq.fml 2007-09-28 00:14:43 UTC (rev 2012)
@@ -1,39 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<faqs title="Frequently Asked Questions">
-
-  <part id="general">
-    <title>General</title>
-
-    <faq id="about">
-      <question>
-        What is this project all about?
-      </question>
-      <answer>
-        <p>
-          The project is designed to replace the current Wayback Machine with an
-          all Java solution that is flexible enough to provide an easy-to-use
-          solution for the single-machine at-home user, as well as scaling up
-          to hundreds of machines for a full historical collection.
-        </p>
-        <p>
-          Primarily it is a few easily replaceable interfaces, and some core
-          classes that utilize those interfaces to provide the Wayback
-          service. Presently only trivial implementations of those interfaces
-          have been developed, but we hope that these interfaces will allow a
-          high degree of flexibility and experimentation.
-        </p>
-      </answer>
-    </faq>
-    <faq id="install">
-      <question>
-        How can I install and use this?
-      </question>
-      <answer>
-        <p>
-          See the <a href="user_manual.html">User Manual</a> for information
-          about installing and using this application.
-        </p>
-      </answer>
-    </faq>
-  </part>
-</faqs>

From: <bra...@us...> - 2007-09-28 00:55:20
Revision: 2013
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2013&view=rev
Author:   bradtofel
Date:     2007-09-27 17:55:20 -0700 (Thu, 27 Sep 2007)

Log Message:
-----------
ADDED: new developer_environment.apt file

Added Paths:
-----------
    trunk/archive-access/projects/wayback/dist/src/site/apt/
    trunk/archive-access/projects/wayback/dist/src/site/apt/developer_environment.apt

Added: trunk/archive-access/projects/wayback/dist/src/site/apt/developer_environment.apt
===================================================================
--- trunk/archive-access/projects/wayback/dist/src/site/apt/developer_environment.apt (rev 0)
+++ trunk/archive-access/projects/wayback/dist/src/site/apt/developer_environment.apt 2007-09-28 00:55:20 UTC (rev 2013)
@@ -0,0 +1,134 @@
+ ---
+ Setting up the Wayback Eclipse Development Environment
+ ---
+ Brad Tofel (brad at archive dot org)
+ ---
+
+
+Getting Eclipse Europa installed
+
+ [[1]] download and unpack eclipse-europa with "JEE" support
+
+ [[2]] do latest software updates
+
+ [[3]] software update: find and install:
+
+      * Search for new features to install
+
+      * New Remote Site
+
+      * Name: <<<Subclipse>>> <(or whatever)>
+
+      * Url: <<<http://subclipse.tigris.org/update_1.2.x>>>
+
+ [[4]] select both Subclipse and Europa Discovery Site search sites
+
+ [[5]] Check box to install "Subclipse" features
+
+ [[6]] on same dialog, choose "Select Required" button.\
+      this will select all dependencies from the Europa Discovery Site.
+
+ [[7]] "Next", then accept terms of aggreement, then "Next", then "Finish"
+
+ [[8]] Choose "Install All" on Feature Verification dialog.
+
+ [[9]] choose "yes" to restart now
+
+ [[10]] software update: find and install:
+
+      * Search for new features to install
+
+      * New Remote Site
+
+      * Name: <<<Maven 2>>> <(or whatever)>
+
+      * Url: <<<http://m2eclipse.codehaus.org/update/>>>
+
+ [[11]] search Maven 2 plugin site with "Finish"
+
+ [[12]] Check box to install "Subclipse" features
+
+ [[13]] "Next", then accept terms of agreement, then "Next", then "Finish"
+
+ [[14]] Choose "Install All" on Feature Verification dialog.
+
+ [[15]] choose "yes" to restart now
+
+
+Install Apache Tomcat
+
+ From {{http://tomcat.apache.org/download-55.cgi}}.
+
+Adding a Tomcat server to Eclipse
+
+ [[1]] Choose "File" \>\> "New" \>\> "Other..."
+
+ [[2]] Choose "Server" \>\> "Server", and click "Next"
+
+ [[3]] Fill out dialog "New Server:Define a New Server"
+
+      * Server's host name: <<<localhost>>>
+
+      * server type: "Apache" \>\> "Tomcat v5.5 Server", and click "Next"
+
+ [[4]] Fill out dialog "New Server:Tomcat Server"
+
+      * Name: <<<Apache Tomcat v5.5>>>
+
+      * Tomcat installation directory: <(locate directory where you installed Tomcat 5.5)>
+
+Add WORKSPACE_ROOT classpath variable
+
+ [[1]] Choose "Window" \>\> "Preferences..."
+
+ [[2]] "General" \>\> "Workspace" \>\> "Linked Resources" \>\> "New..."
+
+ [[3]] Fill in "New Variable" dialog:
+
+      * Name: <<<WORKSPACE_ROOT>>>
+
+      * Location: <(path to your workspace)>
+
+
+Checking out source from SVN
+
+ [[1]] Choose "File" \>\> "New" \>\> "Project..."
+
+ [[2]] Choose "SVN" \>\> "Checkout Projects from SVN"
+
+ [[3]] Choose "Create a new repository location", then "Next"
+
+ [[4]] Fill in SVN repository Url:
+
+      * Url: <<<https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback>>>
+
+ [[5]] select top directory, and click "Finish"
+
+ [[6]] wait for project to checkout and workspace to be rebuilt
+
+Running and Debugging webapp on local Tomcat server:
+
+ [[1]] Choose "File" \>\> "Import..."
+
+ [[2]] Choose "General" \>\> "Existing Projects into Workspace", then "Next"
+
+ [[3]] Choose "Select root directory" then "Browse..."
+
+ [[4]] under <<<wayback>>> choose <<<wayback-webapp>>> directory, and click "OK"
+
+ [[5]] "Finish"
+
+Configuring wayback-webapp to run on the Apache Tomcat 5.5 server
+
+ [[1]] On the Servers tab, right-click then choose "Add and Remove Projects..."
+
+ [[2]] add <<<wayback-webapp>>>, click "Finish"
+
+ [[3]] exit and restart Eclipse
+
+ [[4]] start server
+
+Accessing ARC file content:
+
+ [[1]] place arc.gz files in <</tmp/wayback/arcs/>> <(or whatever directory you've changed the store:arcDir property to)>
+ 
\ No newline at end of file

From: <bra...@us...> - 2008-02-06 02:00:38
Revision: 2175 http://archive-access.svn.sourceforge.net/archive-access/?rev=2175&view=rev Author: bradtofel Date: 2008-02-05 18:00:42 -0800 (Tue, 05 Feb 2008) Log Message: ----------- DOCS: updated with new 1.2.0 features, configuration, added release_notes.html Modified Paths: -------------- trunk/archive-access/projects/wayback/dist/src/site/site.xml trunk/archive-access/projects/wayback/dist/src/site/xdoc/administrator_manual.xml trunk/archive-access/projects/wayback/dist/src/site/xdoc/index.xml trunk/archive-access/projects/wayback/dist/src/site/xdoc/navigation.xml Added Paths: ----------- trunk/archive-access/projects/wayback/dist/src/site/xdoc/release_notes.xml Modified: trunk/archive-access/projects/wayback/dist/src/site/site.xml =================================================================== --- trunk/archive-access/projects/wayback/dist/src/site/site.xml 2008-02-06 01:13:42 UTC (rev 2174) +++ trunk/archive-access/projects/wayback/dist/src/site/site.xml 2008-02-06 02:00:42 UTC (rev 2175) @@ -31,6 +31,7 @@ <item name="User Manual" href="user_manual.html"/> <item name="Administrator Manual" href="administrator_manual.html"/> <item name="Developer Manual" href="developer_manual.html"/> + <item name="Release Notes" href="release_notes.html"/> <item name="FAQ" href="/faq.html"/> <item name="API" href="./apidocs"/> <item name="Browse/Submit a Bug" Modified: trunk/archive-access/projects/wayback/dist/src/site/xdoc/administrator_manual.xml =================================================================== --- trunk/archive-access/projects/wayback/dist/src/site/xdoc/administrator_manual.xml 2008-02-06 01:13:42 UTC (rev 2174) +++ trunk/archive-access/projects/wayback/dist/src/site/xdoc/administrator_manual.xml 2008-02-06 02:00:42 UTC (rev 2175) @@ -127,34 +127,40 @@ <section name="org.archive.wayback.ResourceStore implementations"> - <subsection name="LocalARCResourceStore"> + <subsection name="LocalResourceStore"> <p> This implementation works well for small - 
collections, where all the ARC files can be placed in a single + collections, where all the ARC/WARC files can be placed in a single directory on the same computer running the wayback application. Using NFS or another network filesystem technology and symbolic - links can allow this implementation to deal with ARC files in + links can allow this implementation to deal with files in multiple directories, or across multiple storage nodes. This implementation also includes the capability to run a background - thread to automatically notice new ARC files appearing, index - those ARC files, and hand off the index data for merging with + thread to automatically notice new ARC/WARC files appearing, index + those files, and hand off the index data for merging with a BDBResourceIndex. </p> <p> - The XML configuration template for a LocalARCResourceStore follows: + The XML configuration template for a LocalResourceStore follows: <pre> -<property name="resourceStore"> - <bean class="org.archive.wayback.resourcestore.LocalARCResourceStore" - init-method="init"> - <property name="arcDir" value="/tmp/wayback/arcs/" /> - <property name="queuedDir" value="/tmp/wayback/arc-indexer/queued" /> - <property name="workDir" value="/tmp/wayback/arc-indexer/work" /> - <property name="runInterval" value="10000" /> - <property name="indexClient"> - <bean class="org.archive.wayback.resourceindex.indexer.IndexClient"> - <property name="tmpDir" value="/tmp/wayback/arc-indexer/tmp" /> - <property name="target" value="/tmp/wayback/index-data/incoming" /> +<property name="resourceStore"> + <bean class="org.archive.wayback.resourcestore.LocalResourceStore" + init-method="init"> + + <property name="dataDir" value="/tmp/wayback/arcs/" /> + + <property name="indexThread"> + <bean class="org.archive.wayback.resourcestore.AutoIndexThread"> + <property name="queuedDir" value="/tmp/wayback/arc-indexer/queued" /> + <property name="workDir" value="/tmp/wayback/arc-indexer/work" /> + <property 
name="runInterval" value="10000" /> + <property name="indexClient"> + <bean class="org.archive.wayback.resourceindex.indexer.IndexClient"> + <property name="tmpDir" value="/tmp/wayback/arc-indexer/tmp" /> + <property name="target" value="/tmp/wayback/index-data/incoming" /> + </bean> + </property> </bean> </property> </bean> @@ -167,7 +173,7 @@ <ul> <li> <b> - arcDir + dataDir </b> is the local directory where ARC files will be located. @@ -175,7 +181,8 @@ </ul> </p> <p> - Optional configuration (only needed for automatic indexing) + Optional configuration (only needed if the indexThread property-bean + is specified, for automatic indexing) <ul> <li> <b> @@ -226,19 +233,19 @@ </subsection> - <subsection name="HttpARCResourceStore"> + <subsection name="Http11ResourceStore"> <p> - This implementation allows the wayback - application to access documents in remote ARC files via HTTP 1.1, - and scales to millions of ARC files. + This implementation allows the wayback application to access + documents in remote ARC/WARC files via HTTP 1.1, and scales to + millions of ARC/WARC files. </p> <p> - The XML configuration template for an HttpARCResourceStore follows: + The XML configuration template for an Http11ResourceStore follows: <pre> -<property name="resourceStore"> - <bean class="org.archive.wayback.resourcestore.HttpARCResourceStore"> - <property name="urlPrefix" value="http://localhost:8080/arcproxy/" /> +<property name="resourceStore"> + <bean class="org.archive.wayback.resourcestore.Http11ResourceStore"> + <property name="urlPrefix" value="http://localhost:8080/arcproxy/" /> </bean> </property> @@ -251,8 +258,8 @@ <b> urlPrefix </b> - this is the http:// prefix where ARC files are exported with an - ArcProxy installation. See elsewhere in this document for + this is the http:// prefix where ARC/WARC files are exported with + an ArcProxy installation. See elsewhere in this document for information about setting up an ArcProxy. 
</li> </ul> @@ -346,10 +353,11 @@ This implementation is good for larger scale installations, bounded mostly by the size of the index you can (first create, and later) store on a single machine. Using the command line tool - <b>arc-indexer</b>, and the standard UNIX <b>sort</b> tool - (see note below on LC_ALL), you create a sorted flat text file - that is searched on each request. Building these sorted files, - and updating the index are manual operations presently. + <b>arc-indexer</b> or <b>warc-indexer</b>, and the standard UNIX + <b>sort</b> tool (see note below on LC_ALL), you create a sorted + flat text file that is searched on each request. Building these + sorted files, and updating the index are manual operations + presently. <pre> <bean id="cdxsearchresultsource" class="org.archive.wayback.resourceindex.cdx.CDXIndex"> @@ -460,8 +468,8 @@ also provide different levels of access to a collection. For example, users within a particular subnet may be able to access all documents within a collection via one AccessPoint, but users outside that subnet - may only be restricted to viewing documents currently allowed by a - web sites current robots.txt file. + may be restricted to viewing documents allowed by a web sites current + robots.txt file. </p> <p> The XML configuration template for an AccessPoint follows: @@ -735,23 +743,27 @@ HTML documents returned in Archival URL Replay mode are modified from the original version to provide a replay experience more consistent to viewing the original - content. This is accomplished by the insertion of - Javascript, which executes in the client browser after - the page has loaded. This Javascript modifies most URLs - within the HTML page, both Anchors (links) as well as - embedded content (images, applets, etc) so that they - become appropriate Archival URL requests back to the Wayback - application. + content. This is accomplished by one of two methods. 
The first + includes modification of a subset of the HTML tags on the server, + combined with the insertion of JavaScript into the HTML page. This + JavaScript executes in the client browser after the page has loaded, + and modifies the remaining URLs within the HTML page, both + Anchors (links) as well as embedded content (images, applets, etc) + so that they become appropriate Archival URL requests back to the + Wayback application. The second method involves rewriting all HTML + tags within the page on the server, to make embedded URLs point back + into the Wayback application. </div> <br></br> <div> - This Javascript is imperfect: sometimes requests - "leak" to the live web temporarily, before the - Javascript has executed. Also, not all URLs are - rewritten correctly, especially URLs that are created - by Javascript that was in the original page, and - specialized file types containing links like Flash and - PDF documents. + There is a trade-off between these two approaches. The entirely + server-side rewriting requires more server resources, and is less + tested than the JavaScript method. The JavaScript is also imperfect: + sometimes requests "leak" to the live web temporarily, before the + Javascript has executed. With both methods, not all URLs are + rewritten correctly, especially URLs that are created by JavaScript + that was in the original page, and specialized file types containing + links like Flash and PDF documents. 
</div> <br></br> <div> @@ -854,13 +866,11 @@ <property name="replay"> <bean class="org.archive.wayback.archivalurl.ArchivalUrlReplayDispatcher"> - <property name="jsInserts"> - <list> - <value>http://wayback.somehost.org:8080/wb-webapp/wm.js</value> - </list> - </property> + <property name="serverSideRendering" value="false" /> <property name="jspInserts"> <list> + <value>/replay/ArchiveComment.jsp</value> + <value>/replay/ClientSideJSInsert.jsp</value> <value>/replay/Timeline.jsp</value> </list> </property> @@ -897,16 +907,20 @@ </tr> <tr> <td> - jsInserts + serverSideRendering </td> <td> required </td> <td> - This list must include a reference to the wm.js javascript file, - but references to additional javascript files here will result in - a reference to those javascript URLs within all replayed HTML - pages. + When set to true, all URL rewriting occurs on the server, + eliminating the need for client side Javascript rewriting. If this + option is set to false, then the <i>ClientSideJSInsert.jsp</i> + <b>jspInsert</b> should be used. If this option is true, and + you're attempting to set up an entirely JavaScript free + installation which includes an embedded Timeline in replayed + HTML documents, you can use the <i>JSLessTimeline.jsp</i> + <b>jspInsert</b>. </td> </tr> <tr> @@ -917,12 +931,27 @@ optional </td> <td> - If any values are referenced here, then those .jsp files will be + If any values are included here, then those .jsp files will be invoked for every replayed document, and the resulting output will be included in replayed HTML pages. The example included - here will result in a Timeline banner in-page presence being - included with each replayed HTML page, allowing navigation - between different versions of the current URL. + here will result in: + <ul> + <li> + An HTML comment embedded inside replayed web pages indicating + the dates the document was captured and the date it was served + by wayback. 
+ </li> + <li> + A reference to a javascript file, client-rewrite.js, which + will attempt to modify URLs within the users browser to make + them direct back into wayback. + </li> + <li> + A timeline banner embedded in the top of HTML pages that + allows navigation between other versions of the currently + viewed document. + </li> + </ul> </td> </tr> <tr> @@ -962,6 +991,12 @@ </td> </tr> </table> + <p> + Note that the old <b>jsInserts</b> configuration has been deprecated, + in favor of including references to JavaScript files using jspInserts. + Also note that the use of the ClientSideJSInsert.jsp is required when + serverSideRendering is set to false. + </p> </subsection> <subsection name="Proxy"> @@ -973,8 +1008,6 @@ documents from the live web, the Wayback Machine will retrieve documents from the local repository of ARC files. </p> - <br></br> - <br></br> <p> Proxy Replay mode does not suffer from the shortcomings of the inserted Javascript that the Archival URL mode uses, @@ -984,8 +1017,6 @@ client browser to the Wayback Machine - no date information is sent with the request. </p> - <br></br> - <br></br> <p> In Proxy Replay mode, the Wayback Machine will return the most recent version captured of any requested page. This @@ -996,15 +1027,10 @@ here </a>. </p> - <br></br> - <br></br> <p> Thanks Oskar! </p> - - <br></br> - <br></br> - <div> + <p> The following is an example Proxy Replay Access Point definition. It assumes to be running on a host <b>wayback.somehost.org</b>, that a Tomcat Connector has been added for port <b>8090</b>, @@ -1036,16 +1062,14 @@ </bean> </pre> - </div> - <br></br> - <br></br> - <div> + </p> + <p> <b>redirectURI</b> is required, and must be set to the name of the host where the Wayback application is running. If this is not the primary name of the machine running the Wayback application, then you may need to also specify the hostname used for the Wayback application in the <b>localhostNames</b> configuration list. 
- </div> + </p> </subsection> </section> @@ -1181,7 +1205,7 @@ <pre> -UIResults results = UIResults.getFromRequest(request); +UIQueryResults results = (UIQueryResults) UIResults.getFromRequest(request); String instString = results.getContextConfig("inst"); String logoString = results.getContextConfig("logo"); @@ -1199,15 +1223,8 @@ <p> All the command line tools can be found which can be found underneath the directory where you unpacked your distribution - at:<b>bin/*</b> (example: <i>bin/location-client</i>). You will - need to change permissions on the tools to allow them to be - executed: + at:<b>bin/*</b> (example: <i>bin/location-client</i>). </p> - <p> - <code> - chmod a+x bin/* - </code> - </p> <subsection name="bdb-client"> <p> @@ -1219,10 +1236,10 @@ <code> bin/bdb-client -r BDB_DIR BDB_NAME [PREFIX] </code> - <p> + <div> Output records from a BDB database on STDOUT. - </p> - <p> + </div> + <div> where: <ul> <li> @@ -1241,17 +1258,17 @@ order. </li> </ul> - </p> + </div> </li> <li> <code> bin/bdb-client -w BDB_DIR BDB_NAME </code> - <p> + <div> Read CDX format lines from STDIN, and insert into a BDB, creating the BDB if needed. - </p> - <p> + </div> + <div> where: <ul> <li> @@ -1262,7 +1279,7 @@ <i>BDB_NAME</i> Open BDB with this name. </li> </ul> - </p> + </div> </li> </ol> </p> @@ -1284,27 +1301,35 @@ output. </li> <li> - <i>FILE [FILE2 ...]</i> Sequentially search through - each file specified, outputting the lines prefixed - with KEY for each file. Note that the complete - output of bin-search will be sorted when used with - a single file, but when multiple files are searched, - the results may not be sorted completely. + <i>FILE [FILE2 ...]</i> Search through all files specified, + outputting the lines prefixed with KEY from each file in a single, + sorted stream. This assumes that all FILE arguments are sorted. 
</li> </ul> </p> </subsection> - <subsection name="arc-indexer"> + <subsection name="arc-indexer|warc-indexer"> <p> - This tool creates a CDX format index for the ARC file at ARC_PATH, - either on STDOUT, or at the path specified by CDX_PATH. The resulting - file can be sorted and merged with other CDX format index files to - generate CDX format ResourceIndex. - <code> - bin/arc-indexer ARC_PATH [CDX_PATH] - </code> + These tools create a CDX format index for the ARC/WARC file at + PATH, either on STDOUT, or at the path specified by CDX_PATH. The + resulting file can be sorted and merged with other CDX format index + files to generate CDX format ResourceIndex. </p> + <pre> + bin/arc-indexer [-identity] PATH [CDX_PATH] + bin/warc-indexer [-identity] PATH [CDX_PATH] + </pre> + <p> + Note that when manually constructing CDX files using these tools, you + <b>must</b> set the environment variable <b>LC_ALL=C</b> when using + the standard UNIX <b>sort</b> command line tool. + </p> + <p> + The <b>-identity</b> option causes the tools to skip canonicalization + of URLs. See the documentation for the <b>url-client</b> tool, and + the URL Canonicalization section below for more information. + </p> </subsection> <subsection name="location-client"> @@ -1367,7 +1392,7 @@ <subsection name="url-client"> <p> URLs stored in BDB and CDX format ResourceIndexes are - <i>canonicalized</i> to a more genertic form. Before + <i>canonicalized</i> to a more generic form. Before performing a lookup operation on the ResourceIndex, the same canonicalization function is applied to requested URLs. This tool will read space(" ") delimited lines from STDIN, and @@ -1380,20 +1405,25 @@ This tool is mostly useful for debugging the canonicalization function, but can also be used, if the canonicalization function is altered, to update an existing - CDX index, without recreating CDX files from original ARCs. + CDX index, without recreating CDX files from original ARCs. 
See the + section URL Canonicalization for more information. </p> <p> <code> - bin/url-client [-cdx] [-f FIELD] + bin/url-client [-cdx] [-d DELIMITER] [-f FIELD] [-f FIELD2] ... </code> <ul> <li> - <i>-cdx</i> Pass thru lines prefixed with " CDX " - unchanged. + <i>-cdx</i> Pass thru lines prefixed with " CDX " unchanged. </li> <li> + <i>-d DELIMITER</i> Use DELIMITER to separate fields instead + of default Space(" "). + </li> + <li> <i>-f FIELD</i> alter column FIELD of each line, - instead of the default column 1. + instead of the default column 1. If specified multiple times, then + each column will be canonicalized in transformed lines. </li> </ul> </p> @@ -1455,6 +1485,165 @@ </pre> </section> + <section name="URL Canonicalization"> + <subsection name="Introduction and Concepts"> + <p> + Sometimes URLs found in the field can have multiple forms, for + example: + <pre> + http://www.example.com/img/foo.gif + http://www.example.com/docs/../img/foo.gif + </pre> + are both valid representations of the exact same URL. Another, less + certain example would be: + <pre> + http://www.example.com/Interview.html + http://www.example.com/interview.html + </pre> + which differ only in the capitalization of the letter "i". On some + operating systems, these two URLs legitimately specify two distinct + documents. On Windows platforms, they refer to the same document. If + the document on a web server is actually named "Interview.html", but + a web designer creates a web page that refers to this document using + the lowercase "interview.html", then the link will work, and they and + the web site visitors may never notice the difference. The same + situation on a different operating system would probably not work + (although some web server plugins and modules will also correct this + problem transparently) and the web designer would probably notice and + correct the problem. 
In practice, we have found that it is very rare + for the two URLs above with different capitalization to refer to + different documents, and they can be treated as equivalent in most + situations. + </p> + <p> + Another example, which occurs far more often in the real world, + involves web servers injecting a session ID inside paths to documents + hosted on that web server. These session IDs allow the web server to + track individual users' state. Here are some example URLs + demonstrating path session ID injection: + <pre> + http://www.example.com/(S(4hqa0555fwsecu455xqckv45))/page1.aspx + http://www.example.com/(S(4hqa0555fwsecu455xqckv45))/page2.aspx + http://www.example.com/(S(a63098d96360a63098d96360))/page3.aspx + </pre> + In these examples, the first two URLs are using one session ID, and + the third uses a different session ID. If <b>page3.aspx</b> refers to + <b>page1.aspx</b> using an anchor like this: + <pre> + <a href="page1.aspx">page1</a> + </pre> + and a user visiting <b>page3.aspx</b> clicks the link to page1, then + the wayback will receive a request for the URL: + <pre> + http://www.example.com/(S(a63098d96360a63098d96360))/page1.aspx + </pre> + If page1.aspx was captured using a different session ID, then the + wayback will be unable to locate this document in the index, even + though it was captured. + </p> + <p> + This session ID problem can be mitigated by <i>canonicalizing</i> the + URLs as they are placed in the index, so the index would contain the + following URLs, instead of the original form, which the crawler + captured: + <pre> + http://www.example.com/page1.aspx + http://www.example.com/page2.aspx + http://www.example.com/page3.aspx + </pre> + If the same canonicalization scheme is used to transform incoming + requests, before attempting to look up URLs in the index, then the + software is able to locate and return the documents correctly. 
+ </p> + </subsection> + <subsection name="Current Status within Wayback"> + <p> + Currently the Wayback includes only a single reference implementation + of a canonicalization scheme, which is called + <b>AggressiveUrlCanonicalizer</b>. This implementation provides the + following canonicalization: + <ul> + <li> + <b>www# removal</b> + http://www.example.com => example.com, + http://www13.example.com => example.com + </li> + <li> + <b>user info removal</b> + http://us...@ex... => example.com, + http://user:pas...@ex... => example.com, + </li> + <li> + <b>session ID removal</b> + http://www.example.com/(S(a63098d96360a63098d96360))/page1.aspx + => + example.com/page1.aspx + <br></br> + <i>(and other common session ID path injection schemes)</i> + </li> + <li> + <b>path and CGI argument lowercasing</b> + http://www.example.com/Interviews.cgi?Interview=Left + => + example.com/interviews.cgi?interview=left + </li> + <li> + <b>extra query argument delimiter removal</b> + http://www.example.com/Interviews.cgi?Interview=Left& + => + example.com/interviews.cgi?interview=left + </li> + <li> + <b>unneeded query specifier removal</b> + http://www.example.com/Interviews.cgi? + => + example.com/interviews.cgi + </li> + </ul> + These heuristics generally correct many common URL lookup + problems, but in some cases, these operations do the wrong thing, + typically by making content which is actually different appear to be + the same thing. + </p> + <p> + At the IA, we have recently switched to building CDX files using the + <b>-identity</b> option on the <b>arc-indexer</b> and + <b>warc-indexer</b> tools, and have added an additional step in our + CDX creation processes which uses the <b>url-client</b> tool before + sorting and merging CDX files. By keeping the original "identity" CDX + files, we have been able to test various URL canonicalization + strategies without the overhead of re-processing all the source + materials. 
+ </p> + </subsection> + <subsection name="Future Directions within Wayback"> + <p> + In upcoming wayback releases, we intend to provide more + canonicalization implementations, including a configurable + implementation that will allow broad customization capabilities. + </p> + <p> + We also intend to alter the format of wayback indexes significantly. + Using this new format will be optional, but once indexes are created + in the new format, other indexes with different + canonicalization strategies can be built from them without requiring + a complete reindex of the original ARC/WARC content. + </p> + <p> + The new format will also allow a degree of dynamic canonicalization + at run-time, meaning different strategies can be tested using the + same indexes, and site-specific canonicalization strategies may be + possible. + </p> + <p> + We believe that allowing (advanced) users to easily change between + canonicalization strategies within the same wayback session will + promote better community understanding of the impacts of different + strategies, and will enable the community to build a set of best + practices for URL canonicalization. + </p> + </subsection> + </section> </body> </document> Modified: trunk/archive-access/projects/wayback/dist/src/site/xdoc/index.xml =================================================================== --- trunk/archive-access/projects/wayback/dist/src/site/xdoc/index.xml 2008-02-06 01:13:42 UTC (rev 2174) +++ trunk/archive-access/projects/wayback/dist/src/site/xdoc/index.xml 2008-02-06 02:00:42 UTC (rev 2175) @@ -8,7 +8,86 @@ </properties> <body> + <section name="Introduction"> + <p><b>wayback</b> is an open source java implementation of the + <a href="http://www.archive.org/web/web.php">The Internet Archive + Wayback Machine</a>. + </p> + <p> + The current production version of the Wayback Machine is implemented in + perl, and lacks in maintainability and extensibility. Also, the code is + not open source. 
Primary motivation for the new version is to address + these three issues, enabling public distribution of the application, and + easy experimentation with new features and access technologies. + </p> + <p> + The current Java version of the Wayback Machine supports three access, + or Replay, modes of operation: "Archival Url" mode, "Proxy" mode, and + "Domain Prefix" mode. + </p> + <p> + Archival URL mode provides a user experience very close to the current + production Wayback Machine. All query and replay access requests can be + expressed as URLs. In Archival Url replay mode, archived content is + modified as it is returned to users, attempting to make links and + embedded content refer back to the Wayback Machine by rewriting them as + Archival URLs. + </p> + <p> + Proxy URL mode allows replaying of archived documents within a client + browser by configuring the browser to proxy all HTTP requests through + the Wayback Machine. This has the strong advantage that no Javascript + or server side page markup is required to coerce the client browser to + request additional URLs and embedded content from the Wayback Machine + -- content just works as-is. When used with the Firefox plugin + extension, available + <a href="http://archive-access.sourceforge.net/projects/waxtoolbar/"> + here + </a>, client browsers can navigate between versions of the current + document, and the Wayback Machine server will attempt to display images + from the same time period as pages being viewed. The Proxy URL mode + requires special configuration of the client web browser to access the + Wayback Service. This browser configuration is not complex, but it + means that content cannot be accessed as a global URL. + </p> + <p>See the <a href="administrator_manual.html">Administrator Manual</a> + to learn more about access modes. 
+ </p> + <p> + The current Java version can operate in several deployment modes, + ranging from a stand alone application on a single host holding all + archived documents and indexes, up to a highly distributed system where + indexes and archived content are spread across hundreds of machines. + </p> + <p> + In the local, standalone mode, this software includes the capability to + scan for new archived content in a specified location, and to + automatically index and serve the new content as it appears. Directing + the Wayback to look for ARC files in the directory where an instance of + the Heritrix web crawler is writing ARC output should provide the + capability to browse content archived by Heritrix as it is crawled. + </p> + </section> <section name="News"> + <subsection name="New Release - 1.2.0, 1/30/2008"> + <p> + Release 1.2.0 has several new features, as well as several + bug-fixes. Wayback now supports compressed and uncompressed + ARC and WARC formats. Previously there was only support for + compressed ARC files. This version also includes a new Archival URL + replay mechanism, where all URL rewriting occurs on the server, + obviating the need for client-side Javascript, and preventing + some request leakage. This version also includes the capability to + replace the default URL canonicalization scheme (currently there is + still only one implementation available, but the groundwork for + using different schemes is now in place). This version also + includes support for de-duplicated WARC records. + </p> + <p> + Please see the <a href="release_notes.html">Release Notes</a> for + specific features and bug fixes. + </p> + </subsection> <subsection name="New Release - 1.0.0, 10/12/2007"> <p> Release 1.0.0 has several significant changes, most notably a @@ -124,83 +203,10 @@ </p> </subsection> <subsection name="First Release - 0.2.0, 12/09/2005"> - <p>First public release of the open source wayback. 
- See below in the <a href="#Introduction">Introduction</a> - section for a listing of initial features. + <p> + First public release of the open source wayback. </p> </subsection> </section> - <section name="Introduction"> - <p><b>wayback</b> is an open source java implementation of the - <a href="http://www.archive.org/web/web.php">The Internet Archive - Wayback Machine</a>. - </p> - <p> - The current production version of the Wayback Machine is implemented in - perl, and lacks in maintainability and extensibility. Also, the code is - not open source. Primary motivation for the new version is to address - these three issues, enabling public distribution of the application, and - easy experimentation with new features and access technologies. - </p> - <p> - The current Java version of the Wayback Machine supports two access, or - replay modes of operation: "Archival Url" mode and "Proxy" mode. - </p> - <p> - Archival URL mode provides a user experience very close to the current - production Wayback Machine. All query and replay access requests can be - expressed as URLs. In Archival Url replay mode, HTML documents are - delivered with additional Javascript embedded in the page. This - Javascript alters the document within the browser, attempting to make - links and embedded content refer back to the Wayback Machine by - rewriting them as Archival URLs. - </p> - <p> - Proxy URL mode allows replaying of archived documents within a client - browser by configuring the browser to proxy all HTTP requests through - the Wayback Machine. This has the strong advantage that no Javascript - page markup is required to coerce the client browser to request - additional URLs and embedded content from the Wayback Machine -- content - just works as-is. 
When used with the Firefox plugin extension, available - <a href="http://archive-access.sourceforge.net/projects/waxtoolbar/"> - here - </a>, client browsers can navigate between versions of the current - document, and the Wayback Machine server will attempt to display images - from the same time period as pages being viewed. The Proxy URL mode - requires special configuration of the client web browser to access the - Wayback Service. This browser configuration is not complex, but it - means that content cannot be accessed as a global URL. - </p> - <p> - Timeline Mode allows for navigation between different dates collected - of the current page, similar to the WERA application, using framesets. - </p> - <p>See the <a href="user_manual.html">User Manual</a> to learn more - about access modes. - </p> - <p> - The current Java version is intended to operate as a standalone webapp, - maintaining an index on the machine hosting the webapp. This index - contains records of the resources within a set of ARC files, which are - also assumed to be stored on the same machine hosting the webapp. - </p> - <p> - This software includes the capability to scan for ARC files in a - specified location, and to automatically index and serve content in - newly discovered ARC files as they appear. Directing the Wayback - Machine to look for ARC files in the directory where an instance of the - Heritrix web crawler is writing ARC output should provide the - capability to browse content archived by Heritrix as it is crawled. - </p> - <p> - The 0.4.0 version includes the capability to retrieve documents from ARC - files stored on remote hosts using HTTP 1.1. Please see the User Manual - for more information about using this and other new features. - </p> - <p> - Future versions of this software may integrate more tightly with the - Heritrix web crawler application. 
- </p> - </section> </body> </document> Modified: trunk/archive-access/projects/wayback/dist/src/site/xdoc/navigation.xml =================================================================== --- trunk/archive-access/projects/wayback/dist/src/site/xdoc/navigation.xml 2008-02-06 01:13:42 UTC (rev 2174) +++ trunk/archive-access/projects/wayback/dist/src/site/xdoc/navigation.xml 2008-02-06 02:00:42 UTC (rev 2175) @@ -5,7 +5,7 @@ <properties> <title>Wayback</title> <author email="brad at archive dot org">Brad Tofel</author> - <revision>$Id$</revision> + <revision>$Id:navigation.xml 2009 2007-09-28 00:09:04Z bradtofel $</revision> </properties> <body> @@ -14,6 +14,7 @@ <item name="Requirements" href="requirements.html"/> <item name="Downloads" href="downloads.html"/> <item name="User Manual" href="user_manual.html"/> + <item name="Release Notes" href="release_notes.html"/> <item name="Test" href="test.html"/> <item name="FAQ" href="/faq.html"/> <item name="API" href="./apidocs"/> Added: trunk/archive-access/projects/wayback/dist/src/site/xdoc/release_notes.xml =================================================================== --- trunk/archive-access/projects/wayback/dist/src/site/xdoc/release_notes.xml (rev 0) +++ trunk/archive-access/projects/wayback/dist/src/site/xdoc/release_notes.xml 2008-02-06 02:00:42 UTC (rev 2175) @@ -0,0 +1,123 @@ +<?xml version="1.0" encoding="ISO-8859-1"?> + +<document> + <properties> + <title>Release Notes</title> + <author email="brad at archive dot org">Brad Tofel</author> + <revision>$Id: index.xml 2040 2007-10-12 23:21:40Z bradtofel $</revision> + </properties> + + <body> + <section name="Releases"> + <p> + Full listing of changes and bug fixes are not currently available prior + to release 1.2.0. + </p> + </section> + <section name="Release 1.2.0"> + <subsection name="Features"> + <ul> + <li> + now supports compressed and uncompressed ARC and WARC files. 
+ </li> + <li> + initial revision of "deduplicated" WARC record handling, which + returns the last version that was actually stored when + subsequent captures are not saved because they have not changed. + </li> + <li> + now filters (literal) duplicate records from the ResourceIndex, + in case the same capture (url + date) appears twice, or in two + CDX files. + </li> + <li> + UrlCanonicalizer is now pluggable; current functionality is now + implemented in AggressiveUrlCanonicalizer. Added + IdentityUrlCanonicalizer, which performs no canonicalization. + </li> + <li> + <b>bin-search</b> command line tool now outputs a single stream of + sorted results from multiple files, instead of returning matches + from each file sequentially. + </li> + <li> + extracted several replay features into separate jspInserts that + can now be mixed and matched. + </li> + <li> + now handles most text/css URL rewriting, both inside HTML pages, + and in externally linked .css files. + </li> + <li> + externalized comment embedded inside replayed HTML pages into + jspInsert: ArchiveComment.jsp. + </li> + <li> + non-javascript Archival URL replay mode, where all URL rewriting + occurs on the server. This includes a non-javascript + Timeline jspInsert. + </li> + <li> + added two-month timeline partition. + </li> + <li> + root page of webapp now lists access points, when users make + a request that does not specify one. Also, now access point + "slash-pages" are available "without the slash". + </li> + </ul> + </subsection> + <subsection name="Bug Fixes"> + <ul> + <li> + now rewrites Location and Content-Base HTTP headers in non-HTML + Archival URL replayed documents. + </li> + <li> + now rewrites all <b>background</b> attributes found in returned + pages (archival URL mode only) instead of just on BODY tags. + </li> + <li> + now rewrites <b>src</b> attributes on INPUT tags. 
+ </li> + <li> + command line tools now allow whitespace arguments, important for + tools accepting delimiter arguments. + </li> + <li> + replay URLs in query results now include non-standard ports, if + needed. + </li> + <li> + Timezone is now explicitly set to GMT/UTC, fixing a Calendar + result partitioning problem. + </li> + <li> + uncaught character-encoding exceptions now handled, plus + slightly improved detection of correct character encoding by + removing internal whitespace in declared encoding names. + </li> + <li> + archival URL parsing of query end-date now assumes latest + possible date given a partial end-date, instead of earliest + possible date. + </li> + <li> + re-implemented lost "closest" indicator for XML results. + </li> + <li> + now supports multiple auto index threads, one per ResourceStore, + and also multiple auto index merge threads, one per BDB + ResourceIndex. + </li> + <li> + fixed hard-coded maximum year issue. + </li> + <li> + reimplemented NotInArchive logging, which was lost in 1.0.0. + </li> + </ul> + </subsection> + </section> + </body> +</document> \ No newline at end of file |
From: <bra...@us...> - 2008-04-17 20:53:19
|
Revision: 2254 http://archive-access.svn.sourceforge.net/archive-access/?rev=2254&view=rev Author: bradtofel Date: 2008-04-17 13:52:50 -0700 (Thu, 17 Apr 2008) Log Message: ----------- DOCS: explicit mention of LocalARCResourceStore => LocalResourceStore implementation class change. Updated bug tracking URL to webteam.archive.org/jira/ Added 1.2.1 release notes. Modified Paths: -------------- trunk/archive-access/projects/wayback/dist/src/site/site.xml trunk/archive-access/projects/wayback/dist/src/site/xdoc/administrator_manual.xml trunk/archive-access/projects/wayback/dist/src/site/xdoc/release_notes.xml Modified: trunk/archive-access/projects/wayback/dist/src/site/site.xml =================================================================== --- trunk/archive-access/projects/wayback/dist/src/site/site.xml 2008-04-17 20:39:00 UTC (rev 2253) +++ trunk/archive-access/projects/wayback/dist/src/site/site.xml 2008-04-17 20:52:50 UTC (rev 2254) @@ -35,7 +35,7 @@ <item name="FAQ" href="/faq.html"/> <item name="API" href="./apidocs"/> <item name="Browse/Submit a Bug" - href="http://sourceforge.net/tracker/?group_id=118427&atid=681137"/> + href="http://webteam.archive.org/jira/secure/IssueNavigator.jspa?component=10031"/> </menu> <!--Its not possible to change the labels used in reports, not yet anyways. Modified: trunk/archive-access/projects/wayback/dist/src/site/xdoc/administrator_manual.xml =================================================================== --- trunk/archive-access/projects/wayback/dist/src/site/xdoc/administrator_manual.xml 2008-04-17 20:39:00 UTC (rev 2253) +++ trunk/archive-access/projects/wayback/dist/src/site/xdoc/administrator_manual.xml 2008-04-17 20:52:50 UTC (rev 2254) @@ -230,6 +230,12 @@ </li> </ul> </p> + <p> + <b>Note:</b> upgrading from Wayback 1.0 to 1.2 requires changing + ResourceStore implementations from <b>LocalARCResourceStore</b> to + <b>LocalResourceStore</b>. <b>LocalARCResourceStore</b> is now + deprecated. 
+ </p> </subsection> Modified: trunk/archive-access/projects/wayback/dist/src/site/xdoc/release_notes.xml =================================================================== --- trunk/archive-access/projects/wayback/dist/src/site/xdoc/release_notes.xml 2008-04-17 20:39:00 UTC (rev 2253) +++ trunk/archive-access/projects/wayback/dist/src/site/xdoc/release_notes.xml 2008-04-17 20:52:50 UTC (rev 2254) @@ -11,9 +11,68 @@ <section name="Releases"> <p> Full listing of changes and bug fixes are not currently available prior - to release 1.2.0. + to release 1.2.1. </p> </section> + <section name="Release 1.2.1"> + <subsection name="Features"> + <ul> + <li> + Now explicitly sets the <b>charset</b> component of replayed HTML + page <b>Content-Type</b> HTTP headers in Archival URL mode. This + overrides Tomcat's default behavior of explicitly setting this value + to Tomcat's <b>default</b> encoding character set, if a document + does not set it explicitly. The original <b>Content-Type</b> HTTP + header value is now returned as HTTP header + <b>X-Wayback-Orig-Content-Type</b>. + </li> + </ul> + </subsection> + <subsection name="Bug Fixes"> + <ul> + <li> + added getter/setter for replay image, css, javascript, and html + error handling .jsps + </li> + <li> + now returns "closest" indicator on XML query results, fixing problem + with WAXToolbar/Proxy mode.(<i>ACC-11</i>) + </li> + <li> + <b>auto-indexer</b> now closes ARC/WARC files after indexing, fixing + out-of-filehandle problem(<i>ACC-12</i>) + </li> + <li> + <b>location-client</b> now syncs .warc and .warc.gz files with + locationDB, in addition to .arc and .arc.gz files.(<i>ACC-13</i>) + </li> + <li> + fixed problem which prevented captures archived after webapp was + deployed from being returned. Now captures up to the current moment + are returned. (<i>ACC-14</i>) + </li> + <li> + changed all .jsp files to return UTF-8(<i>ACC-18</i>) + </li> + <li> + now sending correct end Date to remote NutchWAX index. 
+ (<i>ACC-20</i>) + </li> + <li> + fixed String OOB exception when attempting to rewrite some CSS text + (<i>ACC-17</i>) + </li> + <li> + now updates CSS "import 'URL';" and 'import "URL";' content. + Previously only updated "import url(URL);" content. + </li> + <li> + fixed Replay redirect loop when using RemoteResourceIndex + (<i>ACC-15</i>) + </li> + </ul> + </subsection> + </section> <section name="Release 1.2.0"> <subsection name="Features"> <ul> |