From: <bra...@us...> - 2010-12-31 01:05:51
Revision: 3360
          http://archive-access.svn.sourceforge.net/archive-access/?rev=3360&view=rev
Author:   bradtofel
Date:     2010-12-31 01:05:43 +0000 (Fri, 31 Dec 2010)

Log Message:
-----------
MOVED: dist/src/site/ to src/site/.
POM: updated arctifactId of parent pom to make urls and names come out right again

Modified Paths:
--------------
    trunk/archive-access/projects/wayback/dist/pom.xml
    trunk/archive-access/projects/wayback/pom.xml
    trunk/archive-access/projects/wayback/wayback-core/pom.xml
    trunk/archive-access/projects/wayback/wayback-hadoop/pom.xml
    trunk/archive-access/projects/wayback/wayback-hadoop-java/pom.xml
    trunk/archive-access/projects/wayback/wayback-webapp/pom.xml

Added Paths:
-----------
    trunk/archive-access/projects/wayback/src/site/
    trunk/archive-access/projects/wayback/src/site/site.xml
    trunk/archive-access/projects/wayback/src/site/xdoc/administrator_manual.xml
    trunk/archive-access/projects/wayback/src/site/xdoc/downloads.xml
    trunk/archive-access/projects/wayback/src/site/xdoc/hadoop.xml
    trunk/archive-access/projects/wayback/src/site/xdoc/index.xml
    trunk/archive-access/projects/wayback/src/site/xdoc/release_notes.xml

Removed Paths:
-------------
    trunk/archive-access/projects/wayback/dist/src/site/apt/
    trunk/archive-access/projects/wayback/dist/src/site/articles/
    trunk/archive-access/projects/wayback/dist/src/site/fml/
    trunk/archive-access/projects/wayback/dist/src/site/resources/
    trunk/archive-access/projects/wayback/dist/src/site/site.xml
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/access_point_naming.xml
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/administrator_manual.xml
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/downloads.xml
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/hadoop.xml
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/index.xml
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/release_notes.xml
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/requirements.xml
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/resource_index.xml
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/resource_store.xml
    trunk/archive-access/projects/wayback/dist/src/site/xdoc/user_manual.xml
    trunk/archive-access/projects/wayback/src/site/site.xml
    trunk/archive-access/projects/wayback/src/site/xdoc/administrator_manual.xml
    trunk/archive-access/projects/wayback/src/site/xdoc/downloads.xml
    trunk/archive-access/projects/wayback/src/site/xdoc/hadoop.xml
    trunk/archive-access/projects/wayback/src/site/xdoc/index.xml
    trunk/archive-access/projects/wayback/src/site/xdoc/navigation.xml
    trunk/archive-access/projects/wayback/src/site/xdoc/release_notes.xml

Modified: trunk/archive-access/projects/wayback/dist/pom.xml
===================================================================
--- trunk/archive-access/projects/wayback/dist/pom.xml	2010-12-31 00:22:07 UTC (rev 3359)
+++ trunk/archive-access/projects/wayback/dist/pom.xml	2010-12-31 01:05:43 UTC (rev 3360)
@@ -5,7 +5,7 @@
   <modelVersion>4.0.0</modelVersion>
 
   <parent>
-    <artifactId>parent</artifactId>
+    <artifactId>wayback</artifactId>
     <groupId>org.archive.wayback</groupId>
     <version>1.6.0</version>
   </parent>
@@ -33,7 +33,7 @@
           <descriptors>
             <descriptor>src/main/assembly/distribution.xml</descriptor>
           </descriptors>
-          <finalName>wayback-${project.version}</finalName>
+          <finalName>wayback</finalName>
         </configuration>
         <executions>
           <execution>
@@ -45,15 +45,6 @@
         </executions>
       </plugin>
 
-      <plugin>
-        <artifactId>maven-site-plugin</artifactId>
-        <version>2.1</version>
-        <configuration>
-          <inputencoding>utf-8</inputencoding>
-          <outputencoding>utf-8</outputencoding>
-        </configuration>
-      </plugin>
-
     </plugins>
   </build>
 

Deleted: trunk/archive-access/projects/wayback/dist/src/site/site.xml
===================================================================
--- trunk/archive-access/projects/wayback/dist/src/site/site.xml	2010-12-31 00:22:07 UTC (rev 3359)
+++
trunk/archive-access/projects/wayback/dist/src/site/site.xml	2010-12-31 01:05:43 UTC (rev 3360)
@@ -1,37 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<project name="Wayback">
-  <bannerLeft>
-    <name>Wayback</name>
-    <src>images/ia_logo.gif</src>
-    <href>http://archive-access.sourceforge.net/projects/wayback/</href>
-  </bannerLeft>
-  <bannerRight>
-    <name>Wayback</name>
-  </bannerRight>
-  <body>
-
-    <links>
-      <item name="Sourceforge" href="http://www.sourceforge.net"/>
-      <item name="Heritrix" href="http://crawler.archive.org"/>
-      <item name="NutchWAX" href="http://archive-access.sourceforge.net/projects/nutchwax/"/>
-      <item name="Archive Access" href="http://archive-access.sourceforge.net"/>
-      <item name="Internet Archive" href="http://www.archive.org"/>
-      <item name="Home" href="index.html"/>
-    </links>
-
-    <menu name="Overview">
-      <item name="Requirements" href="requirements.html"/>
-      <item name="Downloads" href="downloads.html"/>
-      <item name="Administrator Manual" href="administrator_manual.html"/>
-      <item name="Hadoop CDX Generation" href="hadoop.html"/>
-      <item name="Release Notes" href="release_notes.html"/>
-      <item name="FAQ" href="/faq.html"/>
-      <item name="API" href="./apidocs"/>
-      <item name="Browse/Submit a Bug"
-        href="https://webarchive.jira.com/browse/ACC/component/10031"/>
-    </menu>
-
-    <menu ref="reports"/>
-
-  </body>
-</project>

Deleted: trunk/archive-access/projects/wayback/dist/src/site/xdoc/access_point_naming.xml
===================================================================
--- trunk/archive-access/projects/wayback/dist/src/site/xdoc/access_point_naming.xml	2010-12-31 00:22:07 UTC (rev 3359)
+++ trunk/archive-access/projects/wayback/dist/src/site/xdoc/access_point_naming.xml	2010-12-31 01:05:43 UTC (rev 3360)
@@ -1,287 +0,0 @@
-<?xml version="1.0" encoding="utf-8"?>
-<document>
-  <properties>
-    <title>Access Point Naming</title>
-    <author email="brad at archive dot org">Brad Tofel</author>
-    <revision>$$Id$$</revision>
-  </properties>
- - <body> - - - - <section name="Overview"> - <p> - Tomcat (or other servlet containers) are configured to listen on one or - more ports, so each request received on one of those ports is targeted - to a particular webapp based on the name of the .war file deployed under - the <b>webapps/</b> directory. The targeted webapp is determined based on - the first directory in incoming requests. - </p> - <p> - If there are two webapps deployed under the <b>webapps/</b> directory, - called <b>webappA.war</b> and <b>webappB.war</b>, then an incoming - request <b>/webappA/file1</b> will be received by the webapp inside - <b>webappA.war</b> as the request <b>/file1</b>. An incoming request - for <b>webappB/images/foo.gif</b> will be received by the webapp inside - <b>webappB.war</b> as <b>/images/foo.gif</b>. - </p> - <p> - Tomcat (and other servlet containers) allow a special .war file to be - deployed under the <b>webapps/</b> directory called <b>ROOT.war</b> - which will receive requests not matching another webapp. If the above - example also included a webapp deployed under the <b>webapps/</b> - directory named <b>ROOT.war</b>, then requests starting with <b>webappA/</b> - will be received by <b>webappA.war</b>, requests starting with <b>webappB/</b> - will be received by <b>webappB.war</b>, and all other requests will be - receieved by the <b>ROOT.war</b> webapp. - </p> - <p> - If possible, deploying your webapp as <b>ROOT.war</b> will result in - somewhat cleaner public URLs, but this is not a requirement. The - examples below all include alternate URL configuration prefixes depending - on whether you deploy the Wayback .war file as either <b>ROOT.war</b> or - <b>wayback.war</b>. - </p> - <subsection name="AccessPoint Names"> - <p> - Each AccessPoint Spring XML bean definition must include a <b>name</b> - property: - <br></br> - <code> - -<bean name="8080:wayback" class="org.archive.wayback.webapp.AccessPoint"> - ... 
-</bean> - - </code> - <br></br> - The <b>name</b> property indicates how requests <b>that are received - by the Wayback webapp</b> are routed to the appropriate AccessPoint. - Wayback allows targeting AccessPoints based on: - <ul> - <li>hostname</li> - <li>port</li> - <li>first path <b>after</b> the optional webapp deployment name - (which is empty if you deploy your Wayback webapp as - <b>ROOT.war</b>)</li> - </ul> - using the AccessPoint bean <b>name</b> field composed of <b>hostname</b>:<b>port</b>:<b>first_path</b>. - </p> - <p> - If you have configured DNS to resolve multiple hostnames to the same - computer, you can use the <b>hostname:</b> to control AccessPoint - resolving based on virtual hosts. - </p> - <p> - Port is the only required configuration component within the - AccessPoint <b>name</b> configuration. If you have multiple Tomcat - <b>Connector</b>s you can alter this AccessPoint name configuration to - target specific AccessPoints, otherwise, all your AccessPoint names - will have the same port, likely one of 8080, or 80. - </p> - <p> - A more commonly useful AccessPoint name resolving component is the - <b>first-path</b>, which allows you to easily expose multiple - collections within a single Wayback webapp deployment, without varying - hostnames, or ports (which often require network or system - administrator assistance). - </p> - </subsection> - <subsection name="Example AccessPoint names and URLs"> - <p> - The following table shows how urls will map to particular AccessPoints - assuming you have deployed the Wayback webapp as <b>ROOT.war</b>, on - a host with the name "access.example.org", using port 8080. 
- <table> - <tr> - <th>Access Point bean name</th> - <th>Archival URL prefix</th> - <th>Archival URL query example for <b>http://archive.org</b></th> - </tr> - <tr> - <td>8080:collectionA</td> - <td>http://access.example.org:8080/collectionA/</td> - <td>http://access.example.org:8080/collectionA/*/http://archive.org/</td> - </tr> - <tr> - <td>8080:collectionB</td> - <td>http://access.example.org:8080/collectionB/</td> - <td>http://access.example.org:8080/collectionB/*/http://archive.org/</td> - </tr> - </table> - </p> - <p> - If you deployed your Wayback webapp with the name <b>wayback.war</b> - the following table shows how urls will map to particular - AccessPoints, on a host with the name "access.example.org", using port - 8080. - <table> - <tr> - <th>Access Point bean name</th> - <th>Archival URL prefix</th> - <th>Archival URL query example for <b>http://archive.org</b></th> - </tr> - <tr> - <td>8080:collectionA</td> - <td>http://access.example.org:8080/wayback/collectionA/</td> - <td>http://access.example.org:8080/wayback/collectionA/*/http://archive.org/</td> - </tr> - <tr> - <td>8080:collectionB</td> - <td>http://access.example.org:8080/wayback/collectionB/</td> - <td>http://access.example.org:8080/wayback/collectionB/*/http://archive.org/</td> - </tr> - </table> - </p> - <p> - If you have configured multiple <b>Connector</b>s for your Tomcat - server, listening on both port <b>80</b>, and port <b>8080</b>, and - you deploy <b>ROOT.war</b> you can target different AccessPoints by - port, as shown below. These examples assume your servers hostname is - still "access.example.org". 
- <table> - <tr> - <th>Access Point bean name</th> - <th>Archival URL prefix</th> - <th>Archival URL query example for <b>http://archive.org</b></th> - </tr> - <tr> - <td>80:collectionA</td> - <td>http://access.example.org/collectionA/</td> - <td>http://access.example.org/collectionA/*/http://archive.org/</td> - </tr> - <tr> - <td>8080:collectionB</td> - <td>http://access.example.org:8080/collectionB/</td> - <td>http://access.example.org:8080/collectionB/*/http://archive.org/</td> - </tr> - <tr> - <td>80:collectionC</td> - <td>http://access.example.org/collectionC/</td> - <td>http://access.example.org/collectionC/*/http://archive.org/</td> - </tr> - </table> - </p> - <p> - If you have a very limited number of AccessPoints to expose, you can - do away with the <b>first-path</b> component, to achieve potentially - very uncluttered Archival URLs. Assuming multiple <b>Connector</b>s - for your Tomcat server, listening on both port <b>80</b>, and port - <b>8080</b>, and you deploy <b>ROOT.war</b> you can target different - AccessPoints by port alone, as shown below. These examples still - assume your servers hostname is "access.example.org". - <table> - <tr> - <th>Access Point bean name</th> - <th>Archival URL prefix</th> - <th>Archival URL query example for <b>http://archive.org</b></th> - </tr> - <tr> - <td>80</td> - <td>http://access.example.org/</td> - <td>http://access.example.org/*/http://archive.org/</td> - </tr> - <tr> - <td>8080</td> - <td>http://access.example.org:8080/</td> - <td>http://access.example.org:8080/*/http://archive.org/</td> - </tr> - </table> - </p> - <p> - Getting somewhat fancy, you can use virtual hosts, doing away with - non-standard ports, and use hostnames alone to specify AccessPoints. - This means getting your Tomcat to listen on port <b>80</b>, and - deploying the webapp as <b>ROOT.war</b>. 
You'd have to configure your - DNS so both "collection1.example.org" and "collection2.example.org" - point to the host running Wayback: - <table> - <tr> - <th>Access Point bean name</th> - <th>Archival URL prefix</th> - <th>Archival URL query example for <b>http://archive.org</b></th> - </tr> - <tr> - <td>collection1.example.org:80</td> - <td>http://collection1.example.org/</td> - <td>http://collection1.example.org/*/http://archive.org/</td> - </tr> - <tr> - <td>collection2.example.org:80</td> - <td>http://collection2.example.org/</td> - <td>http://collection2.example.org/*/http://archive.org/</td> - </tr> - </table> - </p> - </subsection> - <subsection name="Getting really fancy"> - - <p> - Assuming you've deployed your webapp as <b>ROOT.war</b> and have Tomcat - listening on both port 80 and 8080, with the hostnames - "collection1.example.org" and "collection2.example.org" both - pointing to the host running wayback: - <table> - <tr> - <th>Access Point bean name</th> - <th>Archival URL prefix</th> - <th>Archival URL query example for <b>http://archive.org</b></th> - </tr> - <tr> - <td>collection1.example.org:80</td> - <td>http://collection1.example.org/</td> - <td>http://collection1.example.org/*/http://archive.org/</td> - </tr> - <tr> - <td>collection1.example.org:8080:subset1</td> - <td>http://collection1.example.org:8080/subset1/</td> - <td>http://collection1.example.org:8080/subset1/*/http://archive.org/</td> - </tr> - <tr> - <td>collection1.example.org:8080:subset2</td> - <td>http://collection1.example.org:8080/subset2/</td> - <td>http://collection1.example.org:8080/subset2/*/http://archive.org/</td> - </tr> - <tr> - <td>collection2.example.org:8080</td> - <td>http://collection1.example.org:8080/</td> - <td>http://collection1.example.org:8080/*/http://archive.org/</td> - </tr> - <tr> - <td>collection2.example.org:80:internal</td> - <td>http://collection2.example.org/internal/</td> - <td>http://collection2.example.org/internal/*/http://archive.org/</td> - 
</tr> - <tr> - <td>collection2.example.org:80:public</td> - <td>http://collection2.example.org/public/</td> - <td>http://collection2.example.org/public/*/http://archive.org/</td> - </tr> - </table> - </p> - </subsection> -<!-- - <subsection name="ArchivalURL Server-Relative URL rewriting"> - <p> - As hard as we've tried to make Server-side rewrite "do the right - thing" in ArchivalURL Replay mode, sometimes things don't work out - right. For example, if a page, <b>http://example.net/news/a.html</b> - contains some Javascript, that generates the following HTML with a - <b>document.write()</b> call: - <br></br> - <code> - -<img src="/foo.gif"></img> - </code> - <br></br> - And you were running an AccessPoint at <b>http://archive.org/web/</b>, - the then page would be expecting that URL to resolve as - <b>http://example.net/foo.gif</b>, but in fact, the page being - displayed as - </p> - <subsection> ---> - </section> - </body> -</document> \ No newline at end of file Deleted: trunk/archive-access/projects/wayback/dist/src/site/xdoc/administrator_manual.xml =================================================================== --- trunk/archive-access/projects/wayback/dist/src/site/xdoc/administrator_manual.xml 2010-12-31 00:22:07 UTC (rev 3359) +++ trunk/archive-access/projects/wayback/dist/src/site/xdoc/administrator_manual.xml 2010-12-31 01:05:43 UTC (rev 3360) @@ -1,1243 +0,0 @@ -<?xml version="1.0" encoding="utf-8"?> -<document> - <properties> - <title>Administrators Manual</title> - <author email="brad at archive dot org">Brad Tofel</author> - <revision>$$Id$$</revision> - </properties> - - <body> - - - - <section name="Requirements"> - - <subsection name="Third Party Packages"> - <p> - Please see the - <a href="requirements.html"> - System Requirements - </a> - . - </p> - </subsection> - - - <subsection name="Wayback Software"> - <p> - Please see the - <a href="downloads.html"> - Software Downloads page - </a> - . 
- </p> - </subsection> - - - </section> - - - - <section name="Installing"> - - - <subsection name="Installing Tomcat"> - <p> - Please refer to the README file included with your Tomcat distribution. - </p> - </subsection> - - - <subsection name="Installing Wayback"> - <p> - Once you have downloaded the .tar.gz file from - sourceforge, you will need to unpack the file to access the - webapp file, <b>wayback-webapp-1.6.0.war</b>. - </p> - <p> - Installation and configuration of this software involves the - following steps: - <ol> - <li> - Placing .war file in appropriate location. - </li> - <li> - Waiting for Tomcat to unpack the .war file. - </li> - <li> - Customizing base wayback.xml and possibly other XML configuration files. - </li> - <li> - Restarting tomcat. - </li> - </ol> - </p> - </subsection> - </section> - - - - <section name="Wayback Configuration Overview"> - <p> - The wayback software provides Query and Replay access to archived - documents. Query access allows users to locate particular documents - within the collection by URL and date. Replay access allows users to - view archived pages within their web browsers. Some Replay modes - require altering the original pages and resources, so embedded and - referenced content is also loaded from the Wayback service, and not - from the live web. - </p> - <p> - A WaybackCollection defines a set of archived documents and an index - which allows documents to be quickly located within the collection. 
A - WaybackCollection may be exposed to end users through one or more - AccessPoints, which define: - <ul> - <li>the WaybackCollection itself</li> - <li>the URL where users can access the collection</li> - <li>how query results are presented to users (the Query UI)</li> - <li>how documents are returned to users so they appear correctly in - their web browsers (the Replay UI)</li> - <li>the look and feel of the wayback user interface</li> - <li>who can access the documents in the collection</li> - <li>which documents from the collection are available</li> - </ul> - </p> - <p> - Wayback is configured using - <a href="http://static.springsource.org/spring/docs/2.5.x/reference/beans.html#beans-basics">Spring IOC</a>, - to specify and configure concrete implementations of several basic - modules. Please see the - <a href="http://static.springsource.org/spring/docs/2.5.x/reference/beans.html#beans-basics">Spring website</a> for more information on - configuring beans using Spring XML. - </p> - <subsection name="AccessPoint configuration options"> - <p> - An AccessPoint's configuration must specify the following - implementations: - <ul> - <li><a href="WaybackCollection_Configuration"><b>collection</b></a> - the specific WaybackCollection being exposed via this - AccessPoint. - </li> - <li><a href="Query_UI"><b>query</b></a> responsible for generating - user visible content(HTML, XML, etc) in response to user - Queries.</li> - <li><a href="Replay_Modes"><b>replay</b></a> responsible for - determining the appropriate ReplayRenderer implementation based - on the users request and the particular document to be - Replayed.</li> - <li><b>uriConverter</b> responsible for constructing Replay URLs - from records matching users queries. See Replay Modes below. - </li> - <li><b>parser</b> - responsible for translating incoming requests - into WaybackRequests. 
See Replay Modes below.</li> - </ul> - </p> - <p> - An AccessPoint's configuration may optionally specify the following, - but must specify at least one of replayPrefix, queryPrefix, or - staticPrefix: - <ul> - <li><a href="Exception_Rendering"><b>exception</b></a> - an - implementation responsible for generating error pages to users - </li> - <li> - <a href="Adding_Additional_Configurations_to_an_AccessPoint"> - <b>configs</b> - </a> - a Properties associating arbitrary key-value pairs which - are accessible to .jsp files responsible for generating the UI - </li> - <li> - <a href="Excluding_Documents_within_an_AccessPoint"> - <b>exclusionFactory</b> - </a> - an implementation specifying what documents should be - accessible within this AccessPoint - </li> - <li> - <a href="Restricting_who_can_interact_with_an_AccessPoint"> - <b>authentication</b> - </a> - an implementation specifying who is allowed to connect to - this AccessPoint - </li> - <li> - <b>replayPrefix</b> - a String URL prefix indicating the host, - port, and path to the correct Replay AccessPoint. If unspecified, - defaults to queryPrefix, then staticPrefix. - </li> - <li> - <b>queryPrefix</b> - a String URL prefix indicating the host, - port, and path to the correct Query AccessPoint. If unspecified, - defaults to staticPrefix, then replayPrefix. - </li> - <li> - <b>staticPrefix</b> - a String URL prefix indicating the host, - port, and path to static content used within the UI. If - unspecified, defaults to queryPrefix, then replayPrefix. - </li> - <li> - <b>livewebPrefix</b> - a String URL prefix indicating the host, - port, and path to an AccessPoint configured with Live Web fetching. - </li> - <li><b>locale</b> - A specific Locale to use for all requests - within this AccessPoint, overriding the users preferred Locale - as specified by their web browser. 
- </li> - <li> - <b>exactHostMatch</b> - true or false, if true, only returns - results exactly matching a given request hostname (case insensitive). - Default is false. - </li> - <li> - <b>exactSchemeMatch</b> - true of false, if true, only returns - results exactly matching a given request scheme. Default is true. - </li> - </ul> - </p> - <p> - AccessPoints can be used to provide different levels and types of - access to the same collection for different users. For example, you - can provide both Proxy and Archival URL mode access to a single - collection by defining 2 AccessPoints with different Replay User - Interfaces but the same WaybackCollection. Using AccessPoints, you can - also provide different levels of access to a collection. For example, - users within a particular subnet may be able to access all documents - within a collection via one AccessPoint, but users outside that subnet - may be restricted to viewing documents allowed by a web sites current - robots.txt file. - </p> - <p> - Please refer to - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/wayback.xml">wayback.xml</a> - within the wayback .war file for detailed example AccessPoint - configurations. 
- </p> - </subsection> - <subsection name="WaybackCollection Configuration"> - <p> - A WaybackCollection's configuration must specify the following - implementations: - <ul> - <li><a href="resource_store.html">resourceStore</a> the specific - implementation used to specific set of documents within this - collection, and how to access them for Replay requests.</li> - <li><a href="resource_index.html">resourceIndex</a> the specific - implementation responsible for locating documents within the - collection.</li> - </ul> - </p> - <p> - A WaybackCollection's configuration may optionally specify the - following: - <ul> - <li>shutdownables - an List of one or more beans implementing - org.archive.wayback.Shutdownable needed to maintain this - WaybackCollection, typically Daemon Threads which perform - automatic indexing operations on the resourceStore and the - resourceIndex.</li> - </ul> - </p> - <p> - For more information on WaybackCollection configuration options and - automatic indexing, please refer to the following documentation pages - and to the example Spring .xml configuration files within the wayback - .war: - <ul> - <li><a href="resource_store.html">ResourceStore configuration and - automatic indexing</a></li> - <li><a href="resource_index.html">ResourceIndex configuration</a></li> - <li><a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/BDBCollection.xml">BDBCollection.xml</a></li> - <li><a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/CDXCollection.xml">CDXCollection.xml</a></li> - <li><a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/RemoteCollection.xml">RemoteCollection.xml</a></li> -<!-- - <li><a 
href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/NutchCollection.xml">NutchCollection.xml</a></li> ---> - </ul> - </p> - </subsection> - </section> - - <section name="Replay Modes"> - <p> - There are presently 3 Replay modes supported by the Wayback software, - Archival URL mode, Proxy mode, and an experimental DomainPrefix mode. - </p> - <subsection name="Archival URL Replay Mode"> - <p> - Archival URL Replay mode uses a modified URL to designate - documents stored in ARC/WARC files. The general form of an - Archival URL is: - <br></br> - <div> - <code> - http://HOSTNAME:PORT/CONTEXT/TIMESTAMP/URL - </code> - </div> - <br></br> - where - <ul> - <li> - <b>HOSTNAME</b> is the host where the Wayback software is - running. - </li> - <li> - <b>PORT</b> is the port where Tomcat is listening for - incoming HTTP requests, which also refers to part of the name of - the Access Point. See below for example CONTEXT mappings. - </li> - <li> - <b>CONTEXT</b> is an optional context where the Wayback webapp - has been deployed, plus an optional name of the Access Point - within the webapp. See below for example CONTEXT mappings. - </li> - <li> - <b>TIMESTAMP</b> is 0 to 14 digits of a date, possibly - followed by an asterisk ('*'), or one or more tags providing - further specifics for the request. 
The format of a - TIMESTAMP is: - <div> - <code> - YYYYMMDDHHmmss - </code> - </div> - where - <ul> - <li> - <b>YYYY</b> represents a 4-digit year - </li> - <li> - <b>MM</b> represents a 2-digit, 1-based month - (Jan = 1 - Dec = 12) - </li> - <li> - <b>DD</b> represents a 2-digit day of the month - (01-31) - </li> - <li> - <b>HH</b> represents a 2-digit hour (01-24) - </li> - <li> - <b>mm</b> represents a 2-digit minute (00-59) - </li> - <li> - <b>ss</b> represents a 2-digit second (00-59) - </li> - </ul> - The following are example dates expressed as - 14-digit Timestamps: - <br></br> - <div> - Jan 13, 1999 03:34:35 (am UTC) - 19990113033435 - </div> - <br></br> - <div> - Dec 31, 2004 23:01:00 (pm UTC) - 20041231230100 - </div> - <br></br> - <p> - Following the date portion of a timestamp, the following flags - can be appended: - <ul> - <li> - <b>id_</b> Identity - perform no alterations of the original - resource, return it as it was archived. - </li> - <li> - <b>js_</b> Javascript - return document marked up as javascript. - </li> - <li> - <b>cs_</b> CSS - return document marked up as CSS. - </li> - <li> - <b>im_</b> Image - return document as an image. - </li> - </ul> - </p> - </li> - <li> - <b>URL</b> represents the actual URL that should be - replayed. - </li> - </ul> - <br></br> - <div> - For some simple and more elaborate examples of how AccessPoint bean - names interact with Archival URLs, please refer to - <a href="access_point_naming.html">Access Point Naming</a>. - </div> - <br></br> - <div> - Archival URL mode allows replay of all versions captured - of a particular URL, by modifying the Timestamp. When an - Archival URL Replay request is received for a URL, the - Wayback Machine will replay the closest version in time - to the Timestamp requested of the particular URL. 
- </div> - <br></br> - <div> - HTML documents returned in Archival URL Replay mode are - modified from the original version to provide a replay - experience more consistent to viewing the original - content. This is accomplished by one of two methods. The first - includes modification of a subset of the HTML tags on the server, - combined with the insertion of JavaScript into the HTML page. This - JavaScript executes in the client browser after the page has loaded, - and modifies the remaining URLs within the HTML page, both - Anchors (links) as well as embedded content (images, applets, etc) - so that they become appropriate Archival URL requests back to the - Wayback application. The second method involves rewriting all HTML - tags within the page on the server, to make embedded URLs point back - into the Wayback application. - </div> - <br></br> - <div> - Currently, we are recommending the entirely server-side rewriting - method, and are deprecating the original server-side plus Javascript - method, but this functionality is still available in Wayback. - Neither method is perfect, not all URLs are rewritten correctly, - particularly URLs that are created by JavaScript in the original - pages, and specialized file types containing links like Flash - and PDF documents. 
- </div> - <br></br> - </p> - <p> - The properties <b>parser</b> and <b>uriConverter</b> - for Archival URL Access Points must be set to the following - implementations: - <pre> - - <property name="parser"> - <bean class="org.archive.wayback.archivalurl.ArchivalUrlRequestParser" - init-method="init"> - <property name="maxRecords" value="1000" /> - <property name="earliestTimestamp" value="1996" /> - </bean> - </property> - - <property name="uriConverter"> - <bean class="org.archive.wayback.archivalurl.ArchivalUrlResultURIConverter"> - <property name="replayURIPrefix" value="http://wayback.example.org:8080/collection/" /> - </bean> - </property> - - </pre> - </p> - <table> - <tr> - <th> - configuration - </th> - <th> - optional/required - </th> - <th> - description - </th> - </tr> - <tr> - <td> - maxRecords - </td> - <td> - optional - </td> - <td> - Sets the default maximum requested records for Archival URL query - requests. - </td> - </tr> - <tr> - <td> - earliestTimestamp - </td> - <td> - optional - </td> - <td> - Set the default start date for requested records for Archival - URL query requests. - </td> - </tr> - <tr> - <td> - replayURIPrefix - </td> - <td> - required - </td> - <td> - Points to the Archival URL prefix of the Access Point as - illustrated in <a href="access_point_naming.html">Access Point Naming</a> document. - </td> - </tr> - </table> - <p> - For additional configuration examples and information about - ArchivalUrl Replay mode, please see the file - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/ArchivalUrlReplay.xml">ArchivalUrlReplay.xml</a> - </p> - </subsection> - - <subsection name="Proxy Replay Mode"> - <p> - Wayback can be configured to act as an HTTP proxy server. 
To utilize - this mode, the wayback webapp <b>must</b> be deployed as the ROOT - context, no other AccessPoints can use the port dedicated to the - Proxy AccessPoint, and client browsers must be configured to proxy - all HTTP requests through the Wayback Machine application. Instead of - retrieving documents from the live web, the Wayback Machine will - retrieve documents from the configured WaybackCollection. - </p> - <p> - Proxy Replay mode does not suffer from the shortcomings of - the inserted Javascript that the Archival URL mode uses, all URLs - function as they did originally, but there can be another drawback - to using this feature: no date information is sent with each request. - Wayback attempts to address this problem by associating the date - clicked on query pages when a Replay session is begun, with the - users IP address. This can fail to work properly in situations where - multiple users are behind a NAT system which causes them to appear to - have the same IP address. - </p> - <p> - Additionally, there is an experimental Firefox-specific plugin - developed by Oskar Grenholm, which provides a novel interface - to navigate between different captured versions of a page within - Proxy mode, and also sends a special HTTP header which allows Wayback - to uniquely associate the correct date with browsers, even those - behind a NAT system. You can find out more about - this plugin and download it - <a href="http://archive-access.sourceforge.net/projects/waxtoolbar/"> - here - </a>. - </p> - <p> - Thanks Oskar! - </p> - <p> - The following is an example Proxy Replay Access Point definition. It - assumes to be running on a host <b>wayback.somehost.org</b>, that a - Tomcat Connector has been added for port <b>8090</b>, - that the Wayback webapp has been deployed at the ROOT context, and - that another Archival URL Access Point named "8080:wayback" has been - configured. 
- <pre> - -<bean name="8090" parent="8080:wayback"> - <property name="queryPrefix" value="http://wayback.somehost.org/" /> - <property name="replay" ref="proxyreplay" /> - <property name="uriConverter"> - <bean class="org.archive.wayback.proxy.RedirectResultURIConverter"> - <property name="redirectURI" value="http://wayback.somehost.org/jsp/Redirect.jsp" /> - </bean> - </property> - <property name="parser"> - <bean class="org.archive.wayback.proxy.ProxyRequestParser" > - <property name="localhostNames"> - <list> - <value>wayback.somehost.org</value> - </list> - </property> - <property name="maxRecords" value="1000" /> - </bean> - </property> -</bean> - - </pre> - </p> - <p> - <b>redirectURI</b> is required, and must be set to the name of the - host where the Wayback application is running. If this is not the - primary name of the machine running the Wayback application, then you - may need to also specify the hostname used for the Wayback application - in the <b>localhostNames</b> configuration list. - </p> - <p> - For additional configuration examples and information about - Proxy Replay mode, please see the file - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/ProxyReplay.xml">ProxyReplay.xml</a> - </p> - </subsection> - - <subsection name="DomainPrefix Replay Mode"> - <p> - Wayback includes an additional, experimental Replay mode which is - similar to Archival URL mode, in that any document can be referenced - as a global URL, without any browser configuration requirements. This - mode requires deploying the Wayback webapp in ROOT context, and a - special DNS wildcard aliasing, so that all hostnames with a common - suffix will be directed to your host running Wayback. 
- </p> - <p> - The general form of a DomainPrefix URL is: - <br></br> - <div> - <code> - http://TIMESTAMP.ARCHIVE-HOSTNAME.WAYBACK-HOSTNAME:PORT/ARCHIVE-PATH - </code> - </div> - </p> - <p> - Here is an example DomainPrefix URL, on an assumed host - <b>wayback.somehost.org</b>, with a wayback webapp deployed as - <b>ROOT</b>, via the Access Point named <b>8081</b> (which indicates the - port Wayback requests will be received on) for the - page <b>http://www.yahoo.com/foo.gif</b> on Dec 31, 1999 at 12:00:00 UTC. - <br></br> - <div> - <code> - http://19991231120000.www.yahoo.com.wayback.somehost.org:8081/foo.gif - </code> - </div> - </p> - <p> - This mode performs all URL rewriting on the server side, so it needs no - client-side Javascript to execute, and also does not suffer from some - of the request leakage problems present in Archival URL mode. It - presently is somewhat naive about rewriting links within returned - documents, and will also rewrite URLs in the text of pages - (not desired), as well as URLs referenced within the page (desired). - </p> - <p> - For additional configuration examples and information about - Domain Prefix Replay mode, please see the files - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/wayback.xml">wayback.xml</a> - and - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/DomainPrefixReplay.xml">DomainPrefixReplay.xml</a> - . 
- </p> - </subsection> - </section> - - - <section name="Wayback UI customization options"> - <p> - Wayback provides several opportunities for customizing the user - interface presented to users, which can be grouped into 4 categories: - <ul> - <li>Query UI rendering .jsp files.</li> - <li>Replay insert .jsp files.</li> - <li>Exception rendering .jsp files.</li> - <li>Localization .properties files.</li> - </ul> - </p> - <subsection name="Query UI"> - <p> - All content returned by Wayback in response to Query requests is - generated by .jsp files, which are executed and provided access to - the results found within the ResourceIndex. Wayback is distributed - with several sample implementations. - </p> - <p> - To alter the default behavior, you may either provide your own .jsp - files, and configure the Renderer to use them instead of the - default .jsp files, or the default .jsp files may be modified - directly. - <ul> - <li> - <b>captureJsp</b> - used when the request indicates that - a listing of all dates available for a single URL should be - returned. Default is - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/query/HTMLCaptureResults.jsp">/WEB-INF/query/HTMLCaptureResults.jsp</a>. - An alternate implementation, /WEB-INF/query/CalendarResults.jsp - will generate HTML output similar to the global Wayback Machine - service. - </li> - <li> - <b>urlJsp</b> - used when the request indicates that a summary - of captures available for a number of URLs should be returned. - Default is - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/query/HTMLUrlResults.jsp">/WEB-INF/query/HTMLUrlResults.jsp</a> - </li> - <li> - <b>xmlCaptureJsp</b> - used when the request indicates that - a listing of all dates available for a single URL should be - returned in XML format. 
Default is - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/query/XMLCaptureResults.jsp">/WEB-INF/query/XMLCaptureResults.jsp</a>. - </li> - <li> - <b>xmlUrlJsp</b> - used when the request indicates that a - summary of captures available for a number of URLs should be - returned in XML format. - Default is - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/query/XMLUrlResults.jsp">/WEB-INF/query/XMLUrlResults.jsp</a> - </li> - </ul> - </p> - </subsection> - <subsection name="Replay Inserts"> - <p> - Wayback allows for embedding additional content within replayed HTML - pages in all Replay modes. This is accomplished by executing one or - more .jsp files with access to context information about the request, - the results, and the actual Resource being returned. The output of - each .jsp file is included within the returned page. - </p> - <p> - Wayback is distributed with several example .jsp insert files that - can be used as is, modified to suit installation requirements, or - used as examples for more elaborate customizations: - <ul> - <li> - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/replay/ArchiveComment.jsp">/WEB-INF/replay/ArchiveComment.jsp</a> - inserts an HTML comment indicating when the document was - captured and retrieved. 
- </li> - <li> - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/replay/ClientSideJSInsert.jsp">/WEB-INF/replay/ClientSideJSInsert.jsp</a> - inserts some Javascript into the returned HTML page that updates - links, images, and other embedded content, attempting to make - all URL references within the page point back into the Wayback - service. - </li> - <li> - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/replay/DebugBanner.jsp">/WEB-INF/replay/DebugBanner.jsp</a> - Not intended for production use, but a slightly more complex - jsp insert example that demonstrates how to access various - request context data, and is sometimes useful for debugging. - </li> - <li> - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/replay/Disclaimer.jsp">/WEB-INF/replay/Disclaimer.jsp</a> - Inserts a small banner at the top of replayed HTML pages, - alerting users that they are viewing an archived page, and - providing some information about the particular capture. - </li> - <li> - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/replay/JSLessTimeline.jsp">/WEB-INF/replay/JSLessTimeline.jsp</a> - Inserts a banner in the top of replayed documents which allows - users to navigate directly between other captures of the current - page they are viewing. This version does not use Javascript to - place the banner, so it will appear in all HTML pages within a - frameset. 
- </li> - <li> - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/replay/Timeline.jsp">/WEB-INF/replay/Timeline.jsp</a> - Inserts a banner in the top of replayed documents which allows - users to navigate directly between other captures of the current - page they are viewing. This version uses Javascript to - place the banner, attempting to only place the banner in the - largest frame within a frameset. - </li> - <li> - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/replay/Toolbar.jsp">/WEB-INF/replay/Toolbar.jsp</a> - Inserts a fancier banner in the top of replayed documents which - includes a graphic representation of the number of captures over - time and allows users to navigate directly between other captures - of the current page they are viewing. This version uses Javascript - to place the banner, attempting to only place the banner in the - largest frame within a frameset. - </li> - </ul> - </p> - </subsection> - <subsection name="Exception Rendering"> - <p> - Wayback is distributed with a default ExceptionRenderer that allows - customization of several types of anticipated exceptions that can - occur through normal operations. The BaseExceptionRenderer allows - installations to provide alternate .jsp files which are executed, and - the output of these .jsp files is returned to end users. To alter - the default behavior, you may either provide your own .jsp files, and - configure the BaseExceptionRenderer to use them instead of the - default .jsp files, or the default .jsp files may be modified - directly. - <ul> - <li> - <b>xmlErrorJsp</b> - used when the request indicates that XML - data should be returned. 
Default is - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/exception/XMLError.jsp">/WEB-INF/exception/XMLError.jsp</a> - </li> - <li> - <b>errorJsp</b> - used for HTML Replay exceptions, and for all - Query exceptions. Default is - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/exception/HTMLError.jsp">/WEB-INF/exception/HTMLError.jsp</a> - </li> - <li> - <b>imageErrorJsp</b> - used when the request appears to be an - embedded Replay request that expects an image to be returned. - Default is - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/exception/HTMLError.jsp">/WEB-INF/exception/HTMLError.jsp</a> - which produces HTML output. This may be desirable over - returning an actual image, since web browsers will usually show - any HTML alternate text associated with the image in place of - the image when image data is not returned. Wayback also - includes a 1x1 pixel gif, error_image.gif, which can be used to - display a gray box in place of image requests that result in - an exception. - </li> - <li> - <b>javascriptErrorJsp</b> - used when the request appears to be an - embedded Replay request that expects Javascript content to be - returned. Default is - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/exception/JavaScriptError.jsp">/WEB-INF/exception/JavaScriptError.jsp</a> - </li> - <li> - <b>cssErrorJsp</b> - used when the request appears to be an - embedded Replay request that expects CSS content to be returned. 
- Default is - <a href="https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/exception/CSSError.jsp">/WEB-INF/exception/CSSError.jsp</a> - </li> - </ul> - </p> - </subsection> - <subsection name="Localization .properties files."> - <p> - Wayback is packaged with a set of reference implementation .jsp files - for generating Query, Replay, and Exception user interface pages. - References to user-visible text are abstracted within these - .jsp files so the specific text to display in various pages is read - from a .properties file. Wayback will automatically search for a - Locale-specific .properties file from which these text values should - be loaded, allowing the language presented to users to be changed. - </p> - <p> - By default, Wayback will use the language preference indicated by the - user's web browser to find an appropriate .properties file, - defaulting to the standard English text if the user's preferred - language is not available. Particular AccessPoints can be forced to a - particular Locale using the AccessPoint.locale property. - </p> - <p> - Several language customization .properties files have already been - contributed by users in the community and are now included with the - standard Wayback distribution. We plan for a completely new and - improved UI implementation for version 1.6, and plan a more active - outreach program to create customizations in as many languages as - possible once this new UI is completed, and the required text - elements are determined. - </p> - </subsection> - </section> - - - <section name="Excluding Documents within an AccessPoint"> - <subsection name="Excluding Documents with live Robots.txt"> - Documents may be excluded from access within an Access Point by - retroactively enforcing the policies in a web site's live robots.txt - document by adding the following configuration in the Access Point. 
- <pre> - -<property name="exclusionFactory" ref="excluder-factory-robot" /> - - </pre> - - <br></br> - Please see the default wayback.xml packaged with this software for an - example bean definition for the referenced <b>excluder-factory-robot</b> - bean. - </subsection> - - <subsection name="Excluding Documents with an Administrative List"> - Documents may be excluded from access within an Access Point by - using a plain text file listing URL prefixes which should be blocked. - If this option is used with a non-zero value for <b>checkInterval</b>, - the Wayback software will monitor the external file, and will - automatically reload the file when it changes. - <br></br> - The following Spring configuration defines a static exclusion file that - causes URLs listed in the file <b>/tmp/exclude.txt</b> to be blocked, - with the file being checked for updates every 10 minutes. - <pre> - -<bean id="static-exclusion" class="org.archive.wayback.... [truncated message content] |