From: Sverre B. <sv...@us...> - 2005-10-26 09:21:13
Update of /cvsroot/archive-access/archive-access/projects/wera/src/articles In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31313/src/articles Modified Files: what-is-wera.xml Log Message: Added section on WERA future Index: what-is-wera.xml =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wera/src/articles/what-is-wera.xml,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** what-is-wera.xml 21 Oct 2005 07:33:39 -0000 1.4 --- what-is-wera.xml 26 Oct 2005 09:21:02 -0000 1.5 *************** *** 138,200 **** </listitem> </itemizedlist> ! <section> ! <title>Practical use</title> ! <para>The original vision for the <ulink ! url="http://nwa.nb.no">NwaToolset</ulink> (the predecessor of Wera) was ! to enable search across the different Nordic Web Archives and provide ! seamless navigation within the different archives. The ability to search ! across the different indexes was solved by the using <ulink ! url="http://fastsearch.com/">Fast Search & Transfer</ulink>'s multi ! node architecture. To enable Wera to retrieve a particular document with ! a given <literal>aid</literal> (Archive ID) from the right archive the ! collection field was introduced in the index (also present in the ! NutchWax index). The Wera config file holds the mapping from collection ! to archive (or rather Wera installation).</para> ! <para>Another reason to include the collection field was to ensure that ! the actual link rewriting was done by the owner of the document. Each ! archive holder would have to set up their own Wera installation. When ! one Wera was requesting a document from a remote archive, the remote ! Wera should make the necessary changes to the document before delivering ! it to the calling Wera. The reason for this was to make sure that the ! owner had full control over what was delivered to the calling site, thus ! 
being able to threat the document in accordance with local policies ! rather than the policies of the caller site. The figure below ! illustrates the currently supported use of mapping between collection ! and archive nodes.</para> ! <figure> ! <title>Wera interfacing several archive nodes</title> ! <mediaobject> ! <imageobject> ! <imagedata fileref="images/wera3.png" /> ! </imageobject> ! </mediaobject> ! </figure> ! <para>In the Wera installation of <emphasis>W1</emphasis> the different ! collections indexed in NutchWax are mapped to corresponding Wera ! installations of <emphasis>W2- Wn</emphasis>. When the timeline view on ! W1 encounters a resource located on a different node (e.g. the ! collection mapping points to the Wera installation of ! <emphasis>W2</emphasis>) it requests that resource from the Wera ! installation at <literal>W2</literal>. Wera at <literal>W2</literal> ! fetches the resource from its Retriever and does the necessary changes ! to the file before delivering it to Wera at <literal>W1</literal> (e.g. ! inserts javascript link rewriter or rewrites it server side). When Wera ! at <literal>W1</literal> receives this file it does an additional ! rewrite in order to have the links point to itself rather than to ! <literal>W2</literal>'s Wera.</para> ! <para>In a real-life large scale Web Archive where the ARC files are ! distributed across tens or hundreds of hosts it will not be practical to ! set up one Wera installation for each of these. A better solution will ! be to introduce communication between the different retrievers or have ! one front-end retriever interfacing all the other retrievers within one ! archive. This has to be added in a later release of Wera.</para> ! </section> </section> </article> \ No newline at end of file --- 138,236 ---- </listitem> </itemizedlist> + </section> ! <section> ! <title>Practical use</title> ! <para>The original vision for the <ulink ! 
url="http://nwa.nb.no">NwaToolset</ulink> (the predecessor of Wera) was to ! enable search across the different Nordic Web Archives and provide ! seamless navigation within the different archives. The ability to search ! across the different indexes was solved by the using <ulink ! url="http://fastsearch.com/">Fast Search & Transfer</ulink>'s multi ! node architecture. To enable Wera to retrieve a particular document with a ! given <literal>aid</literal> (Archive ID) from the right archive the ! collection field was introduced in the index (also present in the NutchWax ! index). The Wera config file holds the mapping from collection to archive ! (or rather Wera installation).</para> ! <para>Another reason to include the collection field was to ensure that ! the actual link rewriting was done by the owner of the document. Each ! archive holder would have to set up their own Wera installation. When one ! Wera was requesting a document from a remote archive, the remote Wera ! should make the necessary changes to the document before delivering it to ! the calling Wera. The reason for this was to make sure that the owner had ! full control over what was delivered to the calling site, thus being able ! to threat the document in accordance with local policies rather than the ! policies of the caller site. The figure below illustrates the currently ! supported use of mapping between collection and archive nodes.</para> ! <figure> ! <title>Wera interfacing several archive nodes</title> ! <mediaobject> ! <imageobject> ! <imagedata fileref="images/wera3.png" /> ! </imageobject> ! </mediaobject> ! </figure> ! <para>In the Wera installation of <emphasis>W1</emphasis> the different ! collections indexed in NutchWax are mapped to corresponding Wera ! installations of <emphasis>W2- Wn</emphasis>. When the timeline view on W1 ! encounters a resource located on a different node (e.g. the collection ! mapping points to the Wera installation of <emphasis>W2</emphasis>) it ! 
requests that resource from the Wera installation at ! <literal>W2</literal>. Wera at <literal>W2</literal> fetches the resource ! from its Retriever and does the necessary changes to the file before ! delivering it to Wera at <literal>W1</literal> (e.g. inserts javascript ! link rewriter or rewrites it server side). When Wera at ! <literal>W1</literal> receives this file it does an additional rewrite in ! order to have the links point to itself rather than to ! <literal>W2</literal>'s Wera.</para> ! <para>In a real-life large scale Web Archive where the ARC files are ! distributed across tens or hundreds of hosts it will not be practical to ! set up one Wera installation for each of these. A better solution will be ! to introduce communication between the different retrievers or have one ! front-end retriever interfacing all the other retrievers within one ! archive. This has to be added in a later release of Wera.</para> ! </section> ! ! <section> ! <title>The future of WERA</title> ! ! <para>As long as there are institutions using WERA, and these institutions ! see a need for fixing bugs and adding functionality, WERA will evolve. Of ! course, the actual work put into it will depend on the resources available ! at these institutions. We also hope that future enhancements of WERA will ! be funded, or partly funded by IIPC, as was the case with the work done to ! enable release 0.4.0 of WERA (and NutchWax).</para> ! ! <para>The most important requirement for a future release of WERA will be ! to support retrieval from several Web Archive hosts through one single ARC ! retriever interface. In addition we need to do something with the ! remaining bugs that didn't make it into the 0.4.0. release (handling of ! redirects and better handling of frames). There are also a few requests ! for enhancements registered that needs attention, one of them being the ! advanced search interface.</para> ! ! <para>One of the main complaints from users has been that WERA required ! 
the user to install and set up Tomcat, Apache + PHP and Perl + a number of ! CPAN modules. The dependency on Perl is long since removed but WERA still ! requires Tomcat (java Arc Retriever) and Apache (PHP web applications for ! searching and navigating). Over time, we would like WERA to move ! completely to Java, both for simplifying the install, setup and ! maintenance as well as improving the chances of getting users involved in ! the further development of WERA. Fortunately the move to Java may be done ! gradually because WERA is modular, and http is used to communicate between ! the different modules. The work of porting WERA to Java should be ! coordinated with the work done on <ulink ! url="http://archive-access.sourceforge.net/projects/wayback/">wayback</ulink>, ! to prevent implementing the same functionallity twice.</para> ! ! <para>We strongly encourage users of WERA/NutchWax to contribute by ! submitting bugs and RFE's, as well as providing feedback on the ! usefullness of the tools.</para> </section> </article> \ No newline at end of file