From: Michael S. <sta...@us...> - 2005-10-20 16:34:32
Update of /cvsroot/archive-access/archive-access/projects/wera/src/articles
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17502/src/articles

Modified Files:
    manual.xml what-is-wera.xml
Log Message:
* project.xml
* src/articles/manual.xml
    Added pointer to what-is-wera.html.
* src/articles/what-is-wera.xml
    Spelling. Formatting.

Index: manual.xml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wera/src/articles/manual.xml,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** manual.xml 20 Oct 2005 00:16:39 -0000 1.4
--- manual.xml 20 Oct 2005 16:34:11 -0000 1.5
***************
*** 92,95 ****
--- 92,100 ----
  available.</para>
+ <note>
+ <para>For more on the workings and architecture of WERA,
+ see <ulink url="what-is-wera.html"></ulink>.</para>
+ </note>
+
  <section>
  <title>NutchWAX</title>

Index: what-is-wera.xml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wera/src/articles/what-is-wera.xml,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** what-is-wera.xml 20 Oct 2005 13:56:49 -0000 1.2
--- what-is-wera.xml 20 Oct 2005 16:34:11 -0000 1.3
***************
*** 6,10 ****
  <articleinfo>
! <releaseinfo>$id$</releaseinfo>
  <author>
--- 6,10 ----
  <articleinfo>
! <releaseinfo>$Id$</releaseinfo>
  <author>
***************
*** 35,39 ****
  </figure>
! <para>Whne the user clicks the Timeline link of a specific hit, the
  Timeline View shows up (shown below). Each version (timestamp) of the
  given url is marked along the timeline. The user may navigate between the
--- 35,39 ----
  </figure>
! <para>When the user clicks the Timeline link of a specific hit, the
  Timeline View shows up (shown below). Each version (timestamp) of the
  given url is marked along the timeline. The user may navigate between the
***************
*** 80,84 ****
  <para>Based on the query submitted Wera constructs a search
  request and sends it (1) to NutchWax (http get request, e.g.
! http://localhost:8082/nutchwax/opensearch?query=lux&start=0&hitsPerPage=10&hitsPerDup=1&dedupField=exacturl)</para>
  </listitem>
--- 80,84 ----
  <para>Based on the query submitted Wera constructs a search
  request and sends it (1) to NutchWax (http get request, e.g.
! <literal>http://localhost:8082/nutchwax/opensearch?query=lux&start=0&hitsPerPage=10&hitsPerDup=1&dedupField=exacturl</literal>)</para>
  </listitem>
***************
*** 109,123 ****
  <listitem>
! <para>Wera executes searches on <emphasis>exaturl</emphasis> to
  find the version closest to the timestamp submitted as parameter
  to the timeline view script (1,2). For that particular version
! Wera constructs a request to the arcretriever containing the name
! of the ARC file where the version recides as well as the offset
  within that file where the version is stored (the ARC name and
  offset are stored in the index). Wera now requests, and receives
! an archived resource (3, 4) from the arcretriever (request example:
! http://localhost:8082/arcretriever/arcretriever?reqtype=getfile&aid=5902508/IAH-20051004171809-00000-test).
! If the resource is of type text/html (information in result set
  from NutchWax), a javascript link rewriter is inserted in the
  resource to ensure that links point to Wera rather than out to the
--- 109,126 ----
  <listitem>
! <para>Wera executes searches on <emphasis>exacturl</emphasis> to
  find the version closest to the timestamp submitted as parameter
  to the timeline view script (1,2). For that particular version
! Wera constructs a request to the <emphasis>arcretriever</emphasis>
! containing the name
! of the ARC file where the version resides as well as the offset
  within that file where the version is stored (the ARC name and
  offset are stored in the index). Wera now requests, and receives
! an archived resource (3, 4) from the
! <emphasis>arcretriever</emphasis> (request example:
! <literal>http://localhost:8082/arcretriever/arcretriever?reqtype=getfile&aid=5902508/IAH-20051004171809-00000-test</literal>).
! If the resource is of type
! <literal>text/html</literal> (information in result set
  from NutchWax), a javascript link rewriter is inserted in the
  resource to ensure that links point to Wera rather than out to the
***************
*** 129,133 ****
  <note>
! <para>A resource of type text/html will often contain inline
  references to images etc. Provided the javascript link rewriter
  does its job on these, the step above will be repeated for each
--- 132,137 ----
  <note>
! <para>A resource of type <literal>text/html</literal> will often
! contain inline
  references to images etc. Provided the javascript link rewriter
  does its job on these, the step above will be repeated for each
***************
*** 142,146 ****
  <title>Practical use</title>
! <para>The original vision for the NwaToolset (the predecessor of Wera)
  was to enable search across the different Nordic Web Archives and provide
  seamless navigation within the different archives. The ability
--- 146,151 ----
  <title>Practical use</title>
! <para>The original vision for the
! <ulink url="http://nwa.nb.no">NwaToolset</ulink> (the predecessor of Wera)
  was to enable search across the different Nordic Web Archives and provide
  seamless navigation within the different archives. The ability
***************
*** 148,153 ****
  url="http://fastsearch.com/">Fast Search & Transfer</ulink>'s multi
  node architecture. To enable Wera to retrieve a particular document with
! a given aid from the right archive the collection field was introduced
! in the index (also present in the NutxhWax index). The Wera config file
  holds the mapping from collection to archive (or rather Wera
  installation).</para>
--- 153,159 ----
  url="http://fastsearch.com/">Fast Search & Transfer</ulink>'s multi
  node architecture. To enable Wera to retrieve a particular document with
! a given <literal>aid</literal> (Archive ID) from the right archive the
! collection field was introduced
! in the index (also present in the NutchWax index). The Wera config file
  holds the mapping from collection to archive (or rather Wera
  installation).</para>
***************
*** 156,160 ****
  the actual link rewriting was done by the owner of the document. Each
  archive holder would have to set up their own Wera installation. When
! one Wera was requesting a document from a remote archive the remote Wera
  should make the necessary changes to the document before delivering it to
  the calling Wera. The reason for this was to make sure that the owner
--- 162,166 ----
  the actual link rewriting was done by the owner of the document. Each
  archive holder would have to set up their own Wera installation. When
! one Wera was requesting a document from a remote archive, the remote Wera
  should make the necessary changes to the document before delivering it to
  the calling Wera. The reason for this was to make sure that the owner
***************
*** 175,188 ****
  </figure>
! <para>In the Wera installation of W1 the different collections indexed
! in NutchWax is mapped to corresponding Wera installations of W2- Wn.
  When the timeline view on W1 encounters a resource located on a
  different node (e.g. the collection mapping points to the Wera
! installation of W2) it requests that resource from the Wera installation
! at W2. Wera at W2 fetches the resource from its Retriever and does the
! necessary changes to the file before delivering it to Wera at W1 (e.g.
  inserts javascript link rewriter or rewrites it server side). When Wera
! at W1 receives this file it does an additional rewrite in order to have
! the links point to itself rather than to W2's Wera.</para>
  <para>In a real-life large scale Web Archive where the ARC files are
--- 181,199 ----
  </figure>
! <para>In the Wera installation of
! <emphasis>W1</emphasis> the different collections indexed
! in NutchWax are mapped to corresponding Wera installations of
! <emphasis>W2- Wn</emphasis>.
  When the timeline view on W1 encounters a resource located on a
  different node (e.g. the collection mapping points to the Wera
! installation of <emphasis>W2</emphasis>) it requests that resource from
! the Wera installation at <literal>W2</literal>. Wera at
! <literal>W2</literal> fetches the resource from its Retriever and does
! the necessary changes to the file before delivering it to Wera at
! <literal>W1</literal> (e.g.
  inserts javascript link rewriter or rewrites it server side). When Wera
! at <literal>W1</literal> receives this file it does an additional
! rewrite in order to have the links point to itself rather than to
! <literal>W2</literal>'s Wera.</para>
  <para>In a real-life large scale Web Archive where the ARC files are
***************
*** 194,196 ****
  </section>
  </section>
! </article>
\ No newline at end of file
--- 205,207 ----
  </section>
  </section>
! </article>
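
Note on the request examples in what-is-wera.xml: the two URLs quoted in the patched text (the NutchWax search request and the arcretriever fetch) can be assembled as in the minimal Python sketch below. The helper names build_search_url and build_retrieve_url are hypothetical and not part of Wera; the host, port, paths and parameter names are copied from the examples in the text, and a real installation may differ.

```python
from urllib.parse import urlencode

# Base URLs copied from the examples in what-is-wera.xml; a real
# deployment would use its own host, port and paths (assumption).
NUTCHWAX_OPENSEARCH = "http://localhost:8082/nutchwax/opensearch"
ARCRETRIEVER = "http://localhost:8082/arcretriever/arcretriever"


def build_search_url(query, start=0, hits_per_page=10):
    """Hypothetical helper: build the search request sent to NutchWax."""
    params = {
        "query": query,
        "start": start,
        "hitsPerPage": hits_per_page,
        "hitsPerDup": 1,
        "dedupField": "exacturl",
    }
    return NUTCHWAX_OPENSEARCH + "?" + urlencode(params)


def build_retrieve_url(aid):
    """Hypothetical helper: build the arcretriever request for one aid."""
    params = {"reqtype": "getfile", "aid": aid}
    # Keep "/" unescaped so the aid looks the same as in the documented example.
    return ARCRETRIEVER + "?" + urlencode(params, safe="/")


if __name__ == "__main__":
    print(build_search_url("lux"))
    print(build_retrieve_url("5902508/IAH-20051004171809-00000-test"))
```

With the inputs shown, the two printed URLs should match the request examples quoted in the article text.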