From: Brad <bra...@us...> - 2005-10-18 02:30:53
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/webapp In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv3336/src/webapp Log Message: Directory /cvsroot/archive-access/archive-access/projects/wayback/src/webapp added to the repository |
From: Brad <bra...@us...> - 2005-10-18 02:30:53
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/jsp/ReplayUI In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv3336/src/webapp/jsp/ReplayUI Log Message: Directory /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/jsp/ReplayUI added to the repository |
From: Brad <bra...@us...> - 2005-10-18 02:30:53
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/jsp In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv3336/src/webapp/jsp Log Message: Directory /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/jsp added to the repository |
From: Brad <bra...@us...> - 2005-10-18 02:30:53
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/jsp/QueryUI In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv3336/src/webapp/jsp/QueryUI Log Message: Directory /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/jsp/QueryUI added to the repository |
From: Brad <bra...@us...> - 2005-10-18 02:29:06
|
Update of /cvsroot/archive-access/archive-access/projects/wayback In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv3007/wayback Log Message: Directory /cvsroot/archive-access/archive-access/projects/wayback added to the repository |
From: Michael S. <sta...@us...> - 2005-10-18 02:21:07
|
Update of /cvsroot/archive-access/archive-access/projects/nutch/xdocs In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv2066/xdocs Modified Files: srcbuild.xml Log Message: * xdocs//srcbuild.xml Note on how to build with 0.7.1 nutch. Index: srcbuild.xml =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/nutch/xdocs/srcbuild.xml,v retrieving revision 1.5 retrieving revision 1.6 diff -C2 -d -r1.5 -r1.6 *** srcbuild.xml 17 Oct 2005 20:53:13 -0000 1.5 --- srcbuild.xml 18 Oct 2005 02:20:55 -0000 1.6 *************** *** 29,33 **** the ${NUTCHWAX} directory so you need to either rename nutch directory as Nutch or make a symbolic link from ! nutch-0.?.? to Nutch.</p> <p>Symlink ${NUTCHWAX}/nutch/conf/nutch-site.xml to --- 29,39 ---- the ${NUTCHWAX} directory so you need to either rename nutch directory as Nutch or make a symbolic link from ! nutch-0.?.? to Nutch. ! If building against 0.7.1, you'll need to create the directory ! <literal>${NUTCH_HOME}/src/plugins/nutch-extensionpoints/src/java</literal> ! else the nutch ant build fails. You'll also have to update ! ${NUTCHWAX}/project.properties to point at the nutch 0.7.1 jar rather ! than at the 0.7.0 jar. ! </p> <p>Symlink ${NUTCHWAX}/nutch/conf/nutch-site.xml to |
From: Michael S. <sta...@us...> - 2005-10-17 21:51:49
|
Update of /cvsroot/archive-access/archive-access/projects/wera In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv1967 Modified Files: project.xml Log Message: * project.xml Added pointer to downloads page. * xdocs/navigation.xml Added pointer to new release notes. * src/articles/releasenotes.xml Added new release notes document. Brought over old release notes and put them here. Index: project.xml =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wera/project.xml,v retrieving revision 1.10 retrieving revision 1.11 diff -C2 -d -r1.10 -r1.11 *** project.xml 11 Oct 2005 01:09:50 -0000 1.10 --- project.xml 17 Oct 2005 21:51:34 -0000 1.11 *************** *** 42,45 **** --- 42,47 ---- href="http://www.netpreserve.net">International Internet Preservation Consortium (IIPC)</a>. + <p />See the <a href="downloads.html">downloads page</a> + for the latest release of WERA. </description> <!-- a short description of what the project does --> |
From: Michael S. <sta...@us...> - 2005-10-17 21:51:48
|
Update of /cvsroot/archive-access/archive-access/projects/wera/src/articles In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv1967/src/articles Added Files: releasenotes.xml Log Message: * project.xml Added pointer to downloads page. * xdocs/navigation.xml Added pointer to new release notes. * src/articles/releasenotes.xml Added new release notes document. Brought over old release notes and put them here. --- NEW FILE: releasenotes.xml --- <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"> <article> <title>WERA Release Notes</title> <articleinfo> <date>$Date: 2005/10/17 21:51:35 $</date> </articleinfo> <sect1 id="0_4_0"> <title>Release 0.4.0 - NOT YET RELEASED</title> <abstract> <para>TODO</para> </abstract> <sect2 id="0_4_0_limitations"> <title>Known Limitations/Issues</title> <sect3><title>?</title> <para> ? </para> </sect3> </sect2> <sect2 id="0_4_0_changes"> <title>Changes</title> <sect3 > <title>?</title> <para /> </sect3> </sect2> </sect1> <sect1 id="0_2_2"> <title>Release 0.2.2</title> <abstract> <para>Bug fixes</para> </abstract> <sect2 id="0_2_2_changes"> <title>Changes</title> <para>Fixed <ulink url="http://sourceforge.net/tracker/index.php?func=detail&aid=1277376&group_id=118427&atid=681137">1277376 duplicate hits in result list</ulink>. WERA now uses NutchWAX's dedup functionality to supress duplicate hits in result list. Gives improved performance.</para> </sect2> </sect1> <sect1 id="0_2_1"> <title>Release 0.2.1</title> <abstract> <para>First release of WERA</para> </abstract> <sect2> <title>Known Limitations/Issues</title> <para>When no X installed the Java based installer should fall back to console mode. Some reports of problems with this. If so, install wera manually. See manual. </para> <para>WERA does not work properly with PHP5. Has to do with PHP5's new Object Model. 
When using the 'NEAR' mode of the documentLocator it will return a resultset concatenated by the resultsets for 'BEFORE' and 'AFTER' instead of returning the one closest in time. Results in wrong aid to the documentRetriever when presenting inline objects. </para> </sect2> <sect2 id="0_2_1_changes"> <title>Changes</title> <para> <orderedlist> <listitem>Support for nutchwax search engine added</listitem> <listitem>Support for nwalucene search removed (replaced by the above). </listitem> <listitem>Support for Fast Search Engine currently not working (will be added in later version).</listitem> <listitem>Advanced search removed (may be added in later version).</listitem> <listitem>Server side link rewriting replaced by javascript client side link rewriting.</listitem> </orderedlist> </para> </sect2> </sect1> </article> |
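The 'NEAR' limitation in the 0.2.1 release notes above describes the intended behaviour as returning the single capture closest in time, rather than the 'BEFORE' and 'AFTER' result sets concatenated. A minimal Java sketch of that selection follows. It assumes captures are represented as epoch-millisecond timestamps; the class and method names are hypothetical and this is not WERA's own code.

```java
import java.util.List;

/** Hypothetical illustration of "closest in time" selection; not WERA code. */
final class NearestCaptureSketch {
    private NearestCaptureSketch() {}

    /**
     * Returns the capture timestamp closest to the requested time.
     * On a tie, the capture that appears first in the list wins.
     */
    static long nearest(List<Long> captures, long requested) {
        if (captures.isEmpty()) {
            throw new IllegalArgumentException("no captures to choose from");
        }
        long best = captures.get(0);
        for (long candidate : captures) {
            // Keep the candidate only when it is strictly closer than the best so far.
            if (Math.abs(candidate - requested) < Math.abs(best - requested)) {
                best = candidate;
            }
        }
        return best;
    }
}
```

With captures at 100, 200 and 400 and a requested time of 260, the sketch picks 200, whichever side of the requested time the capture falls on.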
From: Michael S. <sta...@us...> - 2005-10-17 21:51:48
|
Update of /cvsroot/archive-access/archive-access/projects/wera/xdocs In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv1967/xdocs Modified Files: navigation.xml Log Message: * project.xml Added pointer to downloads page. * xdocs/navigation.xml Added pointer to new release notes. * src/articles/releasenotes.xml Added new release notes document. Brought over old release notes and put them here. Index: navigation.xml =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wera/xdocs/navigation.xml,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** navigation.xml 6 Oct 2005 22:32:46 -0000 1.3 --- navigation.xml 17 Oct 2005 21:51:35 -0000 1.4 *************** *** 16,19 **** --- 16,20 ---- <item name="Documentation" > <item name="Wera Manual" href="/articles/manual.html"/> + <item name="Release Notes" href="/articles/releasenotes.html"/> <item name="FAQ" href="faq.html"/> </item> |
From: Michael S. <sta...@us...> - 2005-10-17 20:57:14
|
Update of /cvsroot/archive-access/archive-access/projects/nutch/src/articles In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv22684/articles Added Files: releasenotes.xml Log Message: * articles/releasenotes.xml Startup some release notes. --- NEW FILE: releasenotes.xml --- <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"> <article> <title>Nutchwax Release Notes</title> <articleinfo> <date>$Date: 2005/10/17 20:57:03 $</date> <authorgroup> <corpauthor>Internet Archive</corpauthor> </authorgroup> </articleinfo> <sect1 id="1_6_0"> <title>Release 0.4.0 - NOT YET RELEASED</title> <abstract> <para>TODO</para> </abstract> <sect2 id="0_4_0_limitations"> <title>Known Limitations/Issues</title> <sect3 id="bdb_nfs"><title>java.io.IOException: No locks available</title> <para>Bdb will complain 'No locks available' when crawler is being built/run on an NFS mount. Workaround is not run on an NFS-mounted volume. </para> </sect3> </sect2> <sect2 id="0_4_0_changes"> <title>Changes</title> <sect3 id="postselector"> <title>Postselector</title> <para>The Postselector has been refactored out of existence. Its responsibilities have been parcelled out to two new Processors: LinksScoper and FrontierScheduler. LinksScoper is responsible for scope checking of extracted links. FrontierScheduler does the scheduling of URIs with the Frontier. </para> <para>This change was done to allow introduction of processors between scope checking and Frontier scheduling steps. </para> <para>Because of this change, order files from 1.4.0 Heritrix or before will need to be updated -- Postselector references replaced by LinkScoper and FrontierScheduler references -- before they can be used with Heritrix 1.6.0 (Referencing a non-existent Postselector in an order file usually shows as -50 fetch status in crawl.log). </para> </sect3> </sect2> </sect1> </article> |
From: Michael S. <sta...@us...> - 2005-10-17 20:56:39
|
Update of /cvsroot/archive-access/archive-access/projects/nutch/src/articles In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv22605/articles Log Message: Directory /cvsroot/archive-access/archive-access/projects/nutch/src/articles added to the repository |
From: Michael S. <sta...@us...> - 2005-10-17 20:53:29
|
Update of /cvsroot/archive-access/archive-access/projects/nutch/xdocs In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21794/xdocs Modified Files: srcbuild.xml Log Message: * xdocs/srcbuild.xml Note that we work w/ 0.7.x of nutch. Index: srcbuild.xml =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/nutch/xdocs/srcbuild.xml,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** srcbuild.xml 29 Jul 2005 22:12:24 -0000 1.4 --- srcbuild.xml 17 Oct 2005 20:53:13 -0000 1.5 *************** *** 20,32 **** Let ${NUTCHWAX} be this directory everafter.</p> ! <p>Obtain a Nutch ! <a href="http://lucene.apache.org/nutch/release/nightly/">nightly build</a>. ! The below has been tested working using 07/13/2005. Revert to this ! version of Nutch if problems in Nutch (It won't work with release 0.6 ! of Nutch). Unbundle the nightly build. It usually untars as nutch-nightly. ! The build scripts are looking for 'nutch' in the ${NUTCHWAX} directory so you need to ! either rename nutch-nightly as Nutch or make a symbolic link from ! nutch-nightly to Nutch.</p> <p>Symlink ${NUTCHWAX}/nutch/conf/nutch-site.xml to --- 20,33 ---- Let ${NUTCHWAX} be this directory everafter.</p> ! <p>Obtain the latest Nutch release. See ! <a href="http://www.apache.org/dyn/closer.cgi/lucene/nutch/">nutch ! downloads</a>. ! The below has been tested working using nutch 0.7.0 and 0.7.1. Revert to ! this version of Nutch if problems building (Nutchwax will not work with ! release 0.6 of Nutch). Unbundle the nutch release It usually untars as ! nutch-0.?.?. The build scripts are looking for 'nutch' in the ${NUTCHWAX} directory so you need to ! either rename nutch directory as Nutch or make a symbolic link from ! nutch-0.?.? to Nutch.</p> <p>Symlink ${NUTCHWAX}/nutch/conf/nutch-site.xml to |
From: Sverre B. <sv...@us...> - 2005-10-17 11:04:23
|
Update of /cvsroot/archive-access/archive-access/projects/wera/src/webapps/wera In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv6310 Modified Files: top.php Log Message: Improved output when no hit on url ... Index: top.php =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wera/src/webapps/wera/top.php,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** top.php 5 Oct 2005 01:38:18 -0000 1.3 --- top.php 17 Oct 2005 11:04:11 -0000 1.4 *************** *** 82,89 **** if ($timeline_data == false) { ! include($conf_includepath . "/header.inc"); ! print $timeline->getError(); ! include($conf_includepath . "/footer.inc"); ! die(); } --- 82,93 ---- if ($timeline_data == false) { ! $error = $timeline->getError(); ! if ($error) { ! include($conf_includepath . "/header.inc"); ! print $error; ! include($conf_includepath . "/footer.inc"); ! die(); ! } ! # else, no hits ... } |
From: Michael S. <sta...@us...> - 2005-10-15 01:19:04
|
Update of /cvsroot/archive-access/archive-access/projects/nutch/src/java/org/archive/access/nutch In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv9411/src/java/org/archive/access/nutch Modified Files: NutchwaxOpenSearchServlet.java Log Message: * src/java/org/archive/access/nutch/NutchwaxOpenSearchServlet.java Use same code as new version 2 patch that I put up into NUTCH-110. Index: NutchwaxOpenSearchServlet.java =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/nutch/src/java/org/archive/access/nutch/NutchwaxOpenSearchServlet.java,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** NutchwaxOpenSearchServlet.java 13 Oct 2005 15:53:36 -0000 1.4 --- NutchwaxOpenSearchServlet.java 15 Oct 2005 01:18:56 -0000 1.5 *************** *** 268,272 **** String name, String text) { Element child = doc.createElement(name); ! child.appendChild(doc.createTextNode(getLegalXml(text))); parent.appendChild(child); } --- 268,272 ---- String name, String text) { Element child = doc.createElement(name); ! child.appendChild(doc.createTextNode(toValidXmlText(text))); parent.appendChild(child); } *************** *** 275,279 **** String ns, String name, String text) { Element child = doc.createElementNS((String)NS_MAP.get(ns), ns+":"+name); ! child.appendChild(doc.createTextNode(getLegalXml(text))); parent.appendChild(child); } --- 275,279 ---- String ns, String name, String text) { Element child = doc.createElementNS((String)NS_MAP.get(ns), ns+":"+name); ! child.appendChild(doc.createTextNode(toValidXmlText(text))); parent.appendChild(child); } *************** *** 282,330 **** String name, String value) { Attr attribute = doc.createAttribute(name); ! attribute.setValue(getLegalXml(value)); node.getAttributes().setNamedItem(attribute); } ! /* ! * Ensure string is legal xml. ! * First look to see if string has illegal characters. If it doesn't, ! * just return it. 
Otherwise, create new string with illegal characters ! * @param text String to verify. ! * @return Passed <code>text</code> or a new string with illegal ! * characters removed if any found in <code>text</code>. ! * @see http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char */ ! private static String getLegalXml(final String text) { ! if (text == null) { ! return null; ! } ! boolean allLegal = true; ! for (int i = 0; i < text.length(); i++) { ! if (!isLegalXml(text.charAt(i))) { ! allLegal = false; ! break; ! } ! } ! return allLegal? text: createLegalXml(text); } ! private static String createLegalXml(final String text) { ! if (text == null) { ! return null; ! } ! StringBuffer buffer = new StringBuffer(text.length()); ! for (int i = 0; i < text.length(); i++) { ! char c = text.charAt(i); ! if (isLegalXml(c)) { ! buffer.append(c); ! } } ! return buffer.toString(); ! } ! ! private static boolean isLegalXml(final char c) { ! return c == 0x9 || c == 0xa || c == 0xd || (c >= 0x20 && c <= 0xd7ff) ! || (c >= 0xe000 && c <= 0xfffd) || (c >= 0x10000 && c <= 0x10ffff); } } - --- 282,427 ---- String name, String value) { Attr attribute = doc.createAttribute(name); ! attribute.setValue(value); node.getAttributes().setNamedItem(attribute); } ! /** ! * Escapes a string so that it can be safely put into an XML text node. ! * Please note that some characters cannot be serialized into an XML text ! * (Such characters are dropped from the String returned). Refer to ! * <a href="http://www.w3.org/TR/2000/REC-xml-20001006#charsets">XML ! * specification</a> for more information. ! * ! * @param str The string to be escaped. ! * <code>IllegalArgumentException</code> is thrown when an unescapable ! * sequence of characters is encountered. Otherwise, the offending ! * characters will be omitted in the output. ! * @return A string that is safe to use in an XML element or attribute. The ! * xml 5 'special characters' are entity encoded if present and characters ! 
* outside of the legal range for xml documents will have been removed. ! * @author Dawid Weiss */ ! public static String toValidXmlText(final String str) ! { ! return toValidXmlText(str, false); } ! /** ! * Escapes a string so that it can be safely put into an XML text node. ! * Please note that some characters cannot be serialized into an XML text. ! * Refer to <a href="http://www.w3.org/TR/2000/REC-xml-20001006#charsets">XML ! * specification</a> for more information. ! * ! * @param str The string to be escaped. ! * @param exceptionOnUnescapable If true, ! * <code>IllegalArgumentException</code> is thrown when an unescapable ! * sequence of characters is encountered. Otherwise, the offending ! * characters will be omitted in the output. ! * @return A string that is safe to use in an XML element or attribute. The ! * xml 5 'special characters' are entity encoded if present and characters ! * outside of the legal range for xml documents will have been removed ! * (if <code>exceptionOnUnescapable</code> is true. ! * @author Dawid Weiss ! */ ! public static String toValidXmlText(final String str, ! final boolean exceptionOnUnescapable) ! { ! StringBuffer buffer = null; ! ! for (int i = 0; i < str.length(); i++) ! { ! char ch = str.charAt(i); ! String entity; ! ! switch (ch) ! { ! case '<': // '<' ! entity = "&lt;"; ! ! break; ! ! case '>': // '>' ! entity = "&gt;"; ! ! break; ! ! case '&': // '&' ! entity = "&amp;"; ! ! break; ! ! case '\'': ! entity = "&apos;"; ! ! break; ! ! case '"': ! entity = "&quot;"; ! ! break; ! ! case 0x09: // valid xml characters ! case 0x0a: ! case 0x0d: ! entity = null; ! ! break; ! ! default: ! ! // check if valid XML characters ! if ( ! ((ch >= 0x20) && (ch <= 0xD7FF)) || ! ((ch >= 0xe000) && (ch <= 0xfffd)) || ! ((ch >= 0x10000) && (ch <= 0x10ffff)) ! ) ! { ! entity = null; ! ! break; ! } ! else ! { ! if (exceptionOnUnescapable) ! { ! throw new IllegalArgumentException( ! "Character is not within valid XML characters " + ! 
"(code: 0x" + Integer.toHexString(ch) + ! ", position: " + i + ")." ! ); ! } ! else ! { ! // replace the character with an empty string. ! entity = ""; ! ! break; ! } ! } ! } ! ! if (buffer == null) ! { ! if (entity != null) ! { ! buffer = new StringBuffer(str.length() + 20); ! buffer.append(str.substring(0, i)); ! buffer.append(entity); ! } ! } ! else ! { ! if (entity == null) ! { ! buffer.append(ch); ! } ! else ! { ! buffer.append(entity); ! } ! } } ! ! return (buffer != null) ? buffer.toString() : str; } } |
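The escaping described in the Javadoc of the commit above (entity-encode the five XML special characters, then drop or reject characters outside the legal XML range) can be sketched in isolation as below. This is a simplified illustration, not the committed method: the class name `XmlTextSketch` and the method names are hypothetical, and it always builds a `StringBuilder` instead of allocating the buffer lazily as the diff does.

```java
/** Hypothetical stand-alone sketch of the escaping idea; not the NutchWAX class. */
final class XmlTextSketch {
    private XmlTextSketch() {}

    /** True for characters XML 1.0 allows that fit in a single Java char. */
    static boolean isLegalXmlChar(char ch) {
        return ch == 0x09 || ch == 0x0a || ch == 0x0d
            || (ch >= 0x20 && ch <= 0xd7ff)
            || (ch >= 0xe000 && ch <= 0xfffd);
    }

    /**
     * Entity-encodes the five XML special characters. Characters outside the
     * legal range are dropped, or rejected when exceptionOnUnescapable is true.
     */
    static String escape(String str, boolean exceptionOnUnescapable) {
        StringBuilder out = new StringBuilder(str.length() + 20);
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            switch (ch) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '\'': out.append("&apos;"); break;
                case '"':  out.append("&quot;"); break;
                default:
                    if (isLegalXmlChar(ch)) {
                        out.append(ch);
                    } else if (exceptionOnUnescapable) {
                        throw new IllegalArgumentException(
                            "Character is not within valid XML characters (code: 0x"
                            + Integer.toHexString(ch) + ", position: " + i + ").");
                    }
                    // Otherwise the illegal character is silently dropped.
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: Tom &amp; &quot;Jerry&quot; &lt;cartoon&gt;
        System.out.println(escape("Tom & \"Jerry\" <cartoon>", false));
    }
}
```

Because a Java `char` is 16 bits wide, this per-`char` check also drops surrogate pairs, so characters above U+FFFF are removed; keeping them would need iteration by code point.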
From: Michael S. <sta...@us...> - 2005-10-13 22:09:36
|
Update of /cvsroot/archive-access/archive-access/projects/wera/src/articles In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv8665/src/articles Modified Files: manual.xml Log Message: * src/articles/manual.xml Improved note on gui installer -- that it won't be available in next release. Index: manual.xml =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wera/src/articles/manual.xml,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** manual.xml 6 Oct 2005 22:32:46 -0000 1.2 --- manual.xml 13 Oct 2005 22:09:28 -0000 1.3 *************** *** 275,279 **** <note> <para>The java-based installer is momentarily unavailable. Will ! be fixed in upcoming release. </para> </note> --- 275,279 ---- <note> <para>The java-based installer is momentarily unavailable. Will ! be fixed in upcoming -- post 0.4.0 -- release. </para> </note> |
From: Michael S. <sta...@us...> - 2005-10-13 15:53:49
|
Update of /cvsroot/archive-access/archive-access/projects/nutch/src/java/org/archive/access/nutch In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv16586/src/java/org/archive/access/nutch Modified Files: NutchwaxOpenSearchServlet.java Log Message: Fix for ' 1312212 ] bad xml chars in search results' * src/java/org/archive/access/nutch/NutchwaxOpenSearchServlet.java I didn't want to bring into nutchwax a complete copy of OpenSearchServlet but have no choice if I want to fix bad xml bug. Have submitted patch to nutch. If it gets applied I'll remove the inclusion of the total servlet. Meantime, the below runs all text through a filter that looks for disallowed xml characters. Index: NutchwaxOpenSearchServlet.java =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/nutch/src/java/org/archive/access/nutch/NutchwaxOpenSearchServlet.java,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** NutchwaxOpenSearchServlet.java 6 Oct 2005 17:35:02 -0000 1.3 --- NutchwaxOpenSearchServlet.java 13 Oct 2005 15:53:36 -0000 1.4 *************** *** 1,274 **** ! /* NutchwaxOpenSearchServlet.java * ! * $Id$ * ! * Created Jul 26, 2005 * ! * Copyright (C) 2005 Internet Archive. ! * ! * This file is part of the archive-access tools project ! * (http://sourceforge.net/projects/archive-access). ! * ! * The archive-access tools are free software; you can redistribute them and/or ! * modify them under the terms of the GNU Lesser Public License as published by ! * the Free Software Foundation; either version 2.1 of the License, or any ! * later version. ! * ! * The archive-access tools are distributed in the hope that they will be ! * useful, but WITHOUT ANY WARRANTY; without even the implied warranty of ! * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser ! * Public License for more details. ! * ! 
* You should have received a copy of the GNU Lesser Public License along with ! * the archive-access tools; if not, write to the Free Software Foundation, ! * Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.access.nutch; ! import java.io.BufferedReader; import java.io.IOException; ! import java.io.UnsupportedEncodingException; ! import java.security.Principal; ! import java.util.Enumeration; ! import java.util.Locale; import java.util.Map; - import javax.servlet.RequestDispatcher; import javax.servlet.ServletException; ! import javax.servlet.ServletInputStream; ! import javax.servlet.http.Cookie; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse; - import javax.servlet.http.HttpSession; - - import org.apache.nutch.searcher.OpenSearchServlet; - - public class NutchwaxOpenSearchServlet extends OpenSearchServlet { - public void doGet(final HttpServletRequest req, - final HttpServletResponse res) - throws ServletException, IOException { - // Make a delegating method that preprocesses the query string - // converting any exacturl values so they'll pass the NutchAnalysis. - HttpServletRequest delegatingReq = new HttpServletRequest() { - public String getParameter(String parameter) { - String q = req.getParameter(parameter); - return (parameter != null && parameter.equals("query"))? 
- NutchwaxQuery.encodeExacturl(q): q; - } - - public String getAuthType() { - return req.getAuthType(); - } - - public Cookie[] getCookies() { - return req.getCookies(); - } - - public long getDateHeader(String arg0) { - return req.getDateHeader(arg0); - } - - public String getHeader(String arg0) { - return req.getHeader(arg0); - } - - public Enumeration getHeaders(String arg0) { - return req.getHeaders(arg0); - } - - public Enumeration getHeaderNames() { - return req.getHeaderNames(); - } - - public int getIntHeader(String arg0) { - return req.getIntHeader(arg0); - } - - public String getMethod() { - return req.getMethod(); - } - - public String getPathInfo() { - return req.getPathInfo(); - } ! public String getPathTranslated() { ! return req.getPathTranslated(); ! } ! ! public String getContextPath() { ! return req.getContextPath(); ! } ! ! public String getQueryString() { ! return req.getQueryString(); ! } - public String getRemoteUser() { - return req.getRemoteUser(); - } ! public boolean isUserInRole(String arg0) { ! return req.isUserInRole(arg0); ! } ! public Principal getUserPrincipal() { ! return req.getUserPrincipal(); ! } ! public String getRequestedSessionId() { ! return req.getRequestedSessionId(); ! } ! public String getRequestURI() { ! return req.getRequestURI(); ! } ! public StringBuffer getRequestURL() { ! return req.getRequestURL(); ! } ! public String getServletPath() { ! return req.getServletPath(); ! } ! public HttpSession getSession(boolean arg0) { ! return req.getSession(arg0); ! } ! public HttpSession getSession() { ! return req.getSession(); ! } ! public boolean isRequestedSessionIdValid() { ! return req.isRequestedSessionIdValid(); ! } ! public boolean isRequestedSessionIdFromCookie() { ! return req.isRequestedSessionIdFromCookie(); ! } ! public boolean isRequestedSessionIdFromURL() { ! return req.isRequestedSessionIdFromURL(); ! } ! public boolean isRequestedSessionIdFromUrl() { ! return req.isRequestedSessionIdFromUrl(); ! } ! 
public Object getAttribute(String arg0) { ! return req.getAttribute(arg0); ! } ! public Enumeration getAttributeNames() { ! return req.getAttributeNames(); ! } ! public String getCharacterEncoding() { ! return req.getCharacterEncoding(); ! } ! public void setCharacterEncoding(String arg0) ! throws UnsupportedEncodingException { ! req.setCharacterEncoding(arg0); ! } ! public int getContentLength() { ! return req.getContentLength(); ! } ! public String getContentType() { ! return req.getContentType(); ! } ! public ServletInputStream getInputStream() throws IOException { ! return req.getInputStream(); ! } ! public Enumeration getParameterNames() { ! return req.getParameterNames(); ! } ! public String[] getParameterValues(String arg0) { ! return req.getParameterValues(arg0); ! } ! public Map getParameterMap() { ! return req.getParameterMap(); ! } ! public String getProtocol() { ! return req.getProtocol(); ! } ! public String getScheme() { ! return req.getScheme(); ! } ! public String getServerName() { ! return req.getServerName(); ! } ! public int getServerPort() { ! return req.getServerPort(); ! } ! public BufferedReader getReader() throws IOException { ! return req.getReader(); ! } ! public String getRemoteAddr() { ! return req.getRemoteAddr(); ! } ! public String getRemoteHost() { ! return req.getRemoteHost(); ! } ! public void setAttribute(String arg0, Object arg1) { ! req.setAttribute(arg0, arg1); ! } ! public void removeAttribute(String arg0) { ! req.removeAttribute(arg0); ! } ! public Locale getLocale() { ! return req.getLocale(); ! } ! public Enumeration getLocales() { ! return req.getLocales(); ! } ! public boolean isSecure() { ! return req.isSecure(); ! } ! public RequestDispatcher getRequestDispatcher(String arg0) { ! return req.getRequestDispatcher(arg0); ! } ! public String getRealPath(String arg0) { ! return req.getRealPath(arg0); ! } ! public int getRemotePort() { ! return req.getRemotePort(); ! } ! public String getLocalName() { ! 
return req.getLocalName(); ! } ! public String getLocalAddr() { ! return req.getLocalAddr(); ! } ! public int getLocalPort() { ! return req.getLocalPort(); ! } ! }; ! super.doGet(delegatingReq, res); ! } } --- 1,330 ---- ! /** ! * Copyright 2005 The Apache Software Foundation * ! * Licensed under the Apache License, Version 2.0 (the "License"); ! * you may not use this file except in compliance with the License. ! * You may obtain a copy of the License at * ! * http://www.apache.org/licenses/LICENSE-2.0 * ! * Unless required by applicable law or agreed to in writing, software ! * distributed under the License is distributed on an "AS IS" BASIS, ! * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ! * See the License for the specific language governing permissions and ! * limitations under the License. */ + + // Changed package name by St.Ack package org.archive.access.nutch; ! // Added by St.Ack. ! import org.apache.nutch.searcher.NutchBean; ! import org.apache.nutch.searcher.Query; ! import org.apache.nutch.searcher.HitDetails; ! import org.apache.nutch.searcher.Hit; ! import org.apache.nutch.searcher.Hits; ! import java.io.IOException; ! import java.net.URLEncoder; ! import java.util.logging.Level; import java.util.Map; + import java.util.HashMap; + import java.util.Set; + import java.util.HashSet; import javax.servlet.ServletException; ! import javax.servlet.ServletConfig; ! import javax.servlet.http.HttpServlet; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse; ! import javax.xml.parsers.*; ! import org.w3c.dom.*; ! import javax.xml.transform.TransformerFactory; ! import javax.xml.transform.Transformer; ! import javax.xml.transform.dom.DOMSource; ! import javax.xml.transform.stream.StreamResult; ! /** Present search results using A9's OpenSearch extensions to RSS, plus a few ! * Nutch-specific extensions. ! * ! * This is the nutch version with filtering for bad xml characters and ! 
* encoding of exacturl. St.Ack 10/12/2005. ! */ ! public class NutchwaxOpenSearchServlet extends HttpServlet { ! private static final Map NS_MAP = new HashMap(); ! static { ! NS_MAP.put("opensearch", "http://a9.com/-/spec/opensearchrss/1.0/"); ! NS_MAP.put("nutch", "http://www.nutch.org/opensearchrss/1.0/"); ! } ! private static final Set SKIP_DETAILS = new HashSet(); ! static { ! SKIP_DETAILS.add("url"); // redundant with RSS link ! SKIP_DETAILS.add("title"); // redundant with RSS title ! } ! private NutchBean bean; ! public void init(ServletConfig config) throws ServletException { ! try { ! bean = NutchBean.get(config.getServletContext()); ! } catch (IOException e) { ! throw new ServletException(e); ! } ! } ! public void doGet(HttpServletRequest request, HttpServletResponse response) ! throws ServletException, IOException { ! NutchBean.LOG.info("query request from " + request.getRemoteAddr()); ! // get parameters from request ! request.setCharacterEncoding("UTF-8"); ! String queryString = request.getParameter("query"); ! if (queryString == null) ! queryString = ""; ! // Do exacturl encoding. Added by St.Ack ! queryString = NutchwaxQuery.encodeExacturl(queryString); ! String urlQuery = URLEncoder.encode(queryString, "UTF-8"); ! int start = 0; // first hit to display ! String startString = request.getParameter("start"); ! if (startString != null) ! start = Integer.parseInt(startString); ! ! int hitsPerPage = 10; // number of hits to display ! String hitsString = request.getParameter("hitsPerPage"); ! if (hitsString != null) ! hitsPerPage = Integer.parseInt(hitsString); ! String sort = request.getParameter("sort"); ! boolean reverse = ! sort!=null && "true".equals(request.getParameter("reverse")); ! // De-Duplicate handling. Look for duplicates field and for how many ! // duplicates per results to return. Default duplicates field is 'site' ! // and duplicates per results default is '2'. ! String dedupField = request.getParameter("dedupField"); ! 
if (dedupField == null || dedupField.length() == 0) { ! dedupField = "site"; ! } ! int hitsPerDup = 2; ! String hitsPerDupString = request.getParameter("hitsPerDup"); ! if (hitsPerDupString != null && hitsPerDupString.length() > 0) { ! hitsPerDup = Integer.parseInt(hitsPerDupString); ! } else { ! // If 'hitsPerSite' present, use that value. ! String hitsPerSiteString = request.getParameter("hitsPerSite"); ! if (hitsPerSiteString != null && hitsPerSiteString.length() > 0) { ! hitsPerDup = Integer.parseInt(hitsPerSiteString); ! } ! } ! ! // Make up query string for use later drawing the 'rss' logo. ! String params = "&hitsPerPage=" + hitsPerPage + ! (sort == null ? "" : "&sort=" + sort + (reverse? "&reverse=true": "") + ! (dedupField == null ? "" : "&dedupField=" + dedupField)); ! Query query = Query.parse(queryString); ! NutchBean.LOG.info("query: " + queryString); ! // execute the query ! Hits hits; ! try { ! hits = bean.search(query, start + hitsPerPage, hitsPerDup, dedupField, ! sort, reverse); ! } catch (IOException e) { ! NutchBean.LOG.log(Level.WARNING, "Search Error", e); ! hits = new Hits(0,new Hit[0]); ! } ! NutchBean.LOG.info("total hits: " + hits.getTotal()); ! // generate xml results ! int end = (int)Math.min(hits.getLength(), start + hitsPerPage); ! int length = end-start; ! Hit[] show = hits.getHits(start, end-start); ! HitDetails[] details = bean.getDetails(show); ! String[] summaries = bean.getSummary(details, query); ! String requestUrl = request.getRequestURL().toString(); ! String base = requestUrl.substring(0, requestUrl.lastIndexOf('/')); ! ! try { ! DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); ! factory.setNamespaceAware(true); ! Document doc = factory.newDocumentBuilder().newDocument(); ! ! Element rss = addNode(doc, doc, "rss"); ! addAttribute(doc, rss, "version", "2.0"); ! addAttribute(doc, rss, "xmlns:opensearch", ! (String)NS_MAP.get("opensearch")); ! 
addAttribute(doc, rss, "xmlns:nutch", (String)NS_MAP.get("nutch")); ! Element channel = addNode(doc, rss, "channel"); ! ! addNode(doc, channel, "title", "Nutch: " + queryString); ! addNode(doc, channel, "description", "Nutch search results for query: " ! + queryString); ! addNode(doc, channel, "link", ! base+"/search.jsp" ! +"?query="+urlQuery ! +"&start="+start ! +"&hitsPerDup="+hitsPerDup ! +params); ! addNode(doc, channel, "opensearch", "totalResults", ""+hits.getTotal()); ! addNode(doc, channel, "opensearch", "startIndex", ""+start); ! addNode(doc, channel, "opensearch", "itemsPerPage", ""+hitsPerPage); ! addNode(doc, channel, "nutch", "query", queryString); ! ! if ((hits.totalIsExact() && end < hits.getTotal()) // more hits to show ! || (!hits.totalIsExact() && (hits.getLength() > start+hitsPerPage))){ ! addNode(doc, channel, "nutch", "nextPage", requestUrl ! +"?query="+urlQuery ! +"&start="+end ! +"&hitsPerDup="+hitsPerDup ! +params); ! } ! if ((!hits.totalIsExact() && (hits.getLength() <= start+hitsPerPage))) { ! addNode(doc, channel, "nutch", "showAllHits", requestUrl ! +"?query="+urlQuery ! +"&hitsPerDup="+0 ! +params); ! } ! for (int i = 0; i < length; i++) { ! Hit hit = show[i]; ! HitDetails detail = details[i]; ! String title = detail.getValue("title"); ! String url = detail.getValue("url"); ! String id = "idx=" + hit.getIndexNo() + "&id=" + hit.getIndexDocNo(); ! ! if (title == null || title.equals("")) // use url for docs w/o title ! title = url; ! Element item = addNode(doc, channel, "item"); ! addNode(doc, item, "title", title); ! addNode(doc, item, "description", summaries[i]); ! addNode(doc, item, "link", url); ! addNode(doc, item, "nutch", "site", hit.getDedupValue()); ! addNode(doc, item, "nutch", "cache", base+"/cached.jsp?"+id); ! addNode(doc, item, "nutch", "explain", base+"/explain.jsp?"+id ! +"&query="+urlQuery); ! if (hit.moreFromDupExcluded()) { ! addNode(doc, item, "nutch", "moreFromSite", requestUrl ! +"?query=" ! 
+URLEncoder.encode("site:"+hit.getDedupValue() ! +" "+queryString, "UTF-8") ! +"&hitsPerSite="+0 ! +params); ! } ! for (int j = 0; j < detail.getLength(); j++) { // add all from detail ! String field = detail.getField(j); ! if (!SKIP_DETAILS.contains(field)) ! addNode(doc, item, "nutch", field, detail.getValue(j)); ! } ! } ! // dump DOM tree ! DOMSource source = new DOMSource(doc); ! TransformerFactory transFactory = TransformerFactory.newInstance(); ! Transformer transformer = transFactory.newTransformer(); ! transformer.setOutputProperty("indent", "yes"); ! StreamResult result = new StreamResult(response.getOutputStream()); ! response.setContentType("text/xml"); ! transformer.transform(source, result); ! } catch (javax.xml.parsers.ParserConfigurationException e) { ! throw new ServletException(e); ! } catch (javax.xml.transform.TransformerException e) { ! throw new ServletException(e); ! } ! ! } ! private static Element addNode(Document doc, Node parent, String name) { ! Element child = doc.createElement(name); ! parent.appendChild(child); ! return child; ! } ! private static void addNode(Document doc, Node parent, ! String name, String text) { ! Element child = doc.createElement(name); ! child.appendChild(doc.createTextNode(getLegalXml(text))); ! parent.appendChild(child); ! } ! private static void addNode(Document doc, Node parent, ! String ns, String name, String text) { ! Element child = doc.createElementNS((String)NS_MAP.get(ns), ns+":"+name); ! child.appendChild(doc.createTextNode(getLegalXml(text))); ! parent.appendChild(child); ! } ! private static void addAttribute(Document doc, Element node, ! String name, String value) { ! Attr attribute = doc.createAttribute(name); ! attribute.setValue(getLegalXml(value)); ! node.getAttributes().setNamedItem(attribute); ! } ! /* ! * Ensure string is legal xml. ! * First look to see if string has illegal characters. If it doesn't, ! * just return it. Otherwise, create new string with illegal characters ! 
* @param text String to verify. ! * @return Passed <code>text</code> or a new string with illegal ! * characters removed if any found in <code>text</code>. ! * @see http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char ! */ ! private static String getLegalXml(final String text) { ! if (text == null) { ! return null; ! } ! boolean allLegal = true; ! for (int i = 0; i < text.length(); i++) { ! if (!isLegalXml(text.charAt(i))) { ! allLegal = false; ! break; ! } ! } ! return allLegal? text: createLegalXml(text); ! } ! private static String createLegalXml(final String text) { ! if (text == null) { ! return null; ! } ! StringBuffer buffer = new StringBuffer(text.length()); ! for (int i = 0; i < text.length(); i++) { ! char c = text.charAt(i); ! if (isLegalXml(c)) { ! buffer.append(c); ! } ! } ! return buffer.toString(); ! } ! ! private static boolean isLegalXml(final char c) { ! return c == 0x9 || c == 0xa || c == 0xd || (c >= 0x20 && c <= 0xd7ff) ! || (c >= 0xe000 && c <= 0xfffd) || (c >= 0x10000 && c <= 0x10ffff); ! } } + |
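Note on the `isLegalXml(final char c)` check in the commit above: a Java `char` is at most 0xFFFF, so the `c >= 0x10000` branch can never be true. Characters outside the Basic Multilingual Plane arrive as surrogate pairs, and each half falls in 0xD800..0xDFFF, which the check rejects. The following is a minimal sketch of the same filter working on code points instead of chars. `LegalXmlFilter` is a hypothetical name and not part of the commit; it keeps the commit's behavior of returning the original string when nothing needs removing.

```java
// Sketch only: LegalXmlFilter is a hypothetical class, not the committed code.
final class LegalXmlFilter {

    // True if the code point matches the XML 1.0 Char production.
    static boolean isLegalXml(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns text itself when every code point is legal,
    // otherwise a copy with the illegal code points removed.
    static String getLegalXml(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = null; // allocated only once an illegal code point is seen
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isLegalXml(cp)) {
                if (out != null) {
                    out.appendCodePoint(cp);
                }
            } else if (out == null) {
                // First illegal code point: copy the legal prefix, skip this one.
                out = new StringBuilder(text.length());
                out.append(text, 0, i);
            }
            i += Character.charCount(cp);
        }
        return out == null ? text : out.toString();
    }
}
```

An unpaired surrogate is returned by `codePointAt` as its own value, so it is still removed, while a valid pair is kept as one code point.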
From: Doug C. <cu...@us...> - 2005-10-12 16:49:13
|
Update of /cvsroot/archive-access/archive-access/projects/nutch/src/java/org/archive/access/nutch In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv3474 Modified Files: Tag: mapred IndexArcs.java Log Message: Add some command line options. Index: IndexArcs.java =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/nutch/src/java/org/archive/access/nutch/Attic/IndexArcs.java,v retrieving revision 1.1.2.1 retrieving revision 1.1.2.2 diff -C2 -d -r1.1.2.1 -r1.1.2.2 *** IndexArcs.java 1 Sep 2005 18:45:29 -0000 1.1.2.1 --- IndexArcs.java 12 Oct 2005 16:49:04 -0000 1.1.2.2 *************** *** 40,45 **** /* Import and index a set of arc files. */ public static void main(String args[]) throws Exception { ! if (args.length < 1) { ! System.out.println("Usage: IndexArcs <arcsDir> [-dir d]"); return; } --- 40,45 ---- /* Import and index a set of arc files. */ public static void main(String args[]) throws Exception { ! if (args.length < 2) { ! System.out.println("Usage: IndexArcs <arcsDir> <crawlDir> [-noimport] [-noinvert] [-noindex]"); return; } *************** *** 47,85 **** JobConf conf = new JobConf(NutchConf.get()); ! File arcsDir = null; ! File dir = new File("crawl-" + getDate()); ! for (int i = 0; i < args.length; i++) { ! if ("-dir".equals(args[i])) { ! dir = new File(args[i+1]); ! i++; ! } else if (args[i] != null) { ! arcsDir = new File(args[i]); } } NutchFileSystem fs = NutchFileSystem.get(conf); - if (fs.exists(dir)) { - throw new RuntimeException(dir + " already exists."); - } ! LOG.info("IndexArcs started in: " + dir); LOG.info("arcsDir = " + arcsDir); ! File linkDb = new File(dir + "/linkdb"); ! File index = new File(dir + "/indexes"); ! File segments = new File(dir + "/segments"); ! File segment = new File(segments, getDate()); ! // import arcs ! new ImportArcs(conf).importArcs(arcsDir, segment); ! // invert links ! new LinkDb(conf).invert(linkDb, segments); ! // index everything ! 
new Indexer(conf).index(index, linkDb, fs.listFiles(segments)); ! LOG.info("IndexArcs finished: " + dir); } } --- 47,93 ---- JobConf conf = new JobConf(NutchConf.get()); ! File arcsDir = new File(args[0]); ! File crawlDir = new File(args[1]); ! boolean noImport = false; ! boolean noInvert = false; ! boolean noIndex = false; ! ! for (int i = 2; i < args.length; i++) { ! if ("-noimport".equals(args[i])) { ! noImport = true; ! } else if ("-noinvert".equals(args[i])) { ! noInvert = true; ! } else if ("-noindex".equals(args[i])) { ! noIndex = true; } } NutchFileSystem fs = NutchFileSystem.get(conf); ! LOG.info("IndexArcs started in: " + crawlDir); LOG.info("arcsDir = " + arcsDir); ! File linkDb = new File(crawlDir + "/linkdb"); ! File segments = new File(crawlDir + "/segments"); ! if (!noImport) { // import arcs ! File segment = new File(segments, getDate()); ! LOG.info("importing arcs in " + arcsDir + " to " + segment); ! new ImportArcs(conf).importArcs(arcsDir, segment); ! } ! if (!noInvert) { // invert links ! LOG.info("inverting links in " + segments); ! new LinkDb(conf).invert(linkDb, segments); ! } ! if (!noIndex) { // index ! File index = new File(crawlDir + "/indexes"); ! LOG.info("indexing " + crawlDir); ! new Indexer(conf).index(index, linkDb, fs.listFiles(segments)); ! } ! LOG.info("IndexArcs finished: " + crawlDir); } } |
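The option handling added in this commit (two required positional arguments, then any mix of `-noimport`, `-noinvert`, `-noindex`) can be sketched in isolation as below. `IndexArcsOptions` is a hypothetical helper, not a class in the repository. One deliberate difference, labeled in the code: the sketch rejects unknown options, while the committed loop silently ignores them.

```java
// Sketch only: IndexArcsOptions is a hypothetical helper, not part of the commit.
final class IndexArcsOptions {
    final String arcsDir;
    final String crawlDir;
    final boolean noImport;
    final boolean noInvert;
    final boolean noIndex;

    private IndexArcsOptions(String arcsDir, String crawlDir,
                             boolean noImport, boolean noInvert, boolean noIndex) {
        this.arcsDir = arcsDir;
        this.crawlDir = crawlDir;
        this.noImport = noImport;
        this.noInvert = noInvert;
        this.noIndex = noIndex;
    }

    static IndexArcsOptions parse(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException(
                "Usage: IndexArcs <arcsDir> <crawlDir> [-noimport] [-noinvert] [-noindex]");
        }
        boolean noImport = false;
        boolean noInvert = false;
        boolean noIndex = false;
        // Flags start at index 2, after the two positional arguments.
        for (int i = 2; i < args.length; i++) {
            if ("-noimport".equals(args[i])) {
                noImport = true;
            } else if ("-noinvert".equals(args[i])) {
                noInvert = true;
            } else if ("-noindex".equals(args[i])) {
                noIndex = true;
            } else {
                // Assumption of this sketch: fail fast on typos.
                throw new IllegalArgumentException("Unknown option: " + args[i]);
            }
        }
        return new IndexArcsOptions(args[0], args[1], noImport, noInvert, noIndex);
    }
}
```

For example, `parse(new String[] {"arcs", "crawl", "-noimport"})` describes a run that skips the import step and goes straight to link inversion and indexing of the existing segments.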
From: Sverre B. <sv...@us...> - 2005-10-12 13:25:37
|
Update of /cvsroot/archive-access/archive-access/projects/wera/src/webapps/wera In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv9329 Modified Files: index.php Log Message: Printing debug output when $conf_debug=1 Index: index.php =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wera/src/webapps/wera/index.php,v retrieving revision 1.5 retrieving revision 1.6 diff -C2 -d -r1.5 -r1.6 *** index.php 10 Oct 2005 13:13:22 -0000 1.5 --- index.php 12 Oct 2005 13:25:29 -0000 1.6 *************** *** 233,239 **** $results = $search->getResultSet(); ! //print "<pre>"; ! //print_r($results); ! //print "</pre>"; if ($total > 0) { print (nls("Total number of versions found")." : <b>$total</b>. "); --- 233,242 ---- $results = $search->getResultSet(); ! if ($conf_debug == 1) { ! print "Query url : <a href=\"" . $search->queryurl . "\">" . $search->queryurl . "</a>"; ! print "<pre>"; ! print_r($results); ! print "</pre>"; ! } if ($total > 0) { print (nls("Total number of versions found")." : <b>$total</b>. "); *************** *** 265,268 **** --- 268,274 ---- $versions = $search2->getResultSet(); $numversions = $search2->getNumHitsTotal(); + if ($conf_debug == 1) { + $count_versions_matching_queryurl = $search2->queryurl; + } } else { *************** *** 277,280 **** --- 283,289 ---- if ($search2->doQuery()) { $totalversions = $search2->getNumHitsTotal(); + if ($conf_debug == 1) { + $count_versions_total_queryurl = $search2->queryurl; + } } else { *************** *** 285,288 **** --- 294,305 ---- print $numversions_text1 . " "; print $numversions_text2 . $totalversions."<br>"; + if ($conf_debug == 1) { + if (isset($count_versions_matching_queryurl)) { + print "Url for counting versions matching query : <a href=\"" . $count_versions_matching_queryurl. "\">" . $count_versions_matching_queryurl . 
"</a><br/>"; + } + if (isset($count_versions_total_queryurl)) { + print "Url for counting versions total : <a href=\"" . $count_versions_total_queryurl. "\">" . $count_versions_total_queryurl . "</a><br/>"; + } + } } |
From: Sverre B. <sv...@us...> - 2005-10-12 13:01:34
|
Update of /cvsroot/archive-access/archive-access/projects/wera/src/webapps/wera/help
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv3306/help

Modified Files:
  no_help.php en_help.php
Log Message:
Bug 1322668

Index: en_help.php
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wera/src/webapps/wera/help/en_help.php,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** en_help.php 5 Oct 2005 01:38:18 -0000 1.2
--- en_help.php 12 Oct 2005 13:01:20 -0000 1.3
***************
*** 68,76 ****
  <h1>Search</h1>
- <p><b>Match</b><br>
- You can query for several words and require that each document in the result set contains all those words <i>(and)</i>, any of the words <i>(or)</i>, or an exact phrase.</p>
-
  <p><b>Query string</b><br>
! The query string can be a whole word or a truncated one, like <i>air*</i> (would match e.g. <i>airport</i> or <i>airmail</i>).</p>
  <p><b>Search period</b><br>
--- 68,80 ----
  <h1>Search</h1>
  <p><b>Query string</b><br>
! <p>
! Type one or more search terms. Wera will present the results matching <b>all</b> of the search terms you type in.
! <ul>
! <li>Search for a phrase: ["term<sub>1</sub> .. term<sub>n</sub>"]</li>
! <li>Searching for all documents of type text/html: [type:text type:html]</li>
! <li>Search for a specific url: [exacturl:http://www.nb.no/]</li>
! </ul>
! </p>
  <p><b>Search period</b><br>

Index: no_help.php
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wera/src/webapps/wera/help/no_help.php,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** no_help.php 5 Oct 2005 01:38:18 -0000 1.2
--- no_help.php 12 Oct 2005 13:01:20 -0000 1.3
***************
*** 66,77 ****
  <!-- ********************************************************************************* -->
! <h1>Search</h1>
  <p><b>Søk etter</b><br>
! Du kan søke etter flere ord og kreve at alle dokumentene i resultatet inneholder alle ordene, et av ordene, eller en eksakt frase</p>
!
! <p><b>Spørring</b><br>
! Spørringen kan være et helt ord eller trunkert, som <i>traktor*</i> (vil gi treff på f.eks. <i>traktorsko</i> og <i>traktordekk</i>).
!
  <p><b>År (fra - til)</b><br>
  For avgrense søket til en tidsperiode fyll i årstall fra og til (evt. kun et av dem for før og etter et gitt årstall).
--- 66,80 ----
  <!-- ********************************************************************************* -->
! <h1>Søk</h1>
  <p><b>Søk etter</b><br>
! <p>
! Tast inn ett eller flere ord. Wera vil vise resultatene som matcher <b>alle</b> ordene du tastet inn.
! <ul>
! <li>Frase: ["term<sub>1</sub> .. term<sub>n</sub>"]</li>
! <li>Alle dokumenter av type text/html: [type:text type:html]</li>
! <li>Søk etter en gitt url: [exacturl:http://www.nb.no/]</li>
! </ul>
! </p>
  <p><b>År (fra - til)</b><br>
  For avgrense søket til en tidsperiode fyll i årstall fra og til (evt. kun et av dem for før og etter et gitt årstall).
|
From: Sverre B. <sv...@us...> - 2005-10-12 13:00:09
|
Update of /cvsroot/archive-access/archive-access/projects/wera/src/webapps/wera
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv2861

Modified Files:
  help.php
Log Message:

Index: help.php
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wera/src/webapps/wera/help.php,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** help.php 5 Oct 2005 01:38:18 -0000 1.2
--- help.php 12 Oct 2005 12:59:54 -0000 1.3
***************
*** 38,42 ****
  $helpfile = "$conf_rootpath/help/" . $language . "_help.php";
  }
!
  ?>
--- 38,42 ----
  $helpfile = "$conf_rootpath/help/" . $language . "_help.php";
  }
! Header("content-type: text/html; charset=UTF-8", false);
  ?>
***************
*** 44,47 ****
--- 44,48 ----
  <HEAD>
  <link rel="stylesheet" href="<?php print $conf_gui_style;?>" type="text/css">
+ <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
  <TITLE>WERA help</TITLE>
  </HEAD>
|
From: Sverre B. <sv...@us...> - 2005-10-12 11:56:06
|
Update of /cvsroot/archive-access/archive-access/projects/wera/src/webapps/wera/lib/seal
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18934/lib/seal

Modified Files:
  nutch.inc
Log Message:
Removed debug output because of bug 1324757. Debug output should be produced in the calling scripts instead.

Index: nutch.inc
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wera/src/webapps/wera/lib/seal/nutch.inc,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** nutch.inc 6 Oct 2005 19:18:42 -0000 1.5
--- nutch.inc 12 Oct 2005 11:55:56 -0000 1.6
***************
*** 177,184 ****
  }
- if ($this->debug == 1) {
- print $this->queryurl;
- }
-
  if ($this->isReady()) {
  $this->hitno = $this->offset;
--- 177,180 ----
***************
*** 190,194 ****
  if ($data) {
  if (!xml_parse($this->xml_parser, $data)) {
- #die(sprintf("XML error: %s at line %d", xml_error_string(xml_get_error_code($this->xml_parser)), xml_get_current_line_number($this->xml_parser)));
  $retval = false;
  $this->errormsg = sprintf("XML error: %s at line %d", xml_error_string(xml_get_error_code($this->xml_parser)), xml_get_current_line_number($this->xml_parser));
--- 186,189 ----
***************
*** 208,217 ****
  $this->timespent = (microtime_float() - $time_start);
- if ($this->debug == 1) {
- print "\n<!--";
- print $this->queryurl;
- print_r($this->resultset);
- print "-->\n";
- }
  }
  }
--- 203,206 ----
|
From: Sverre B. <sv...@us...> - 2005-10-12 11:43:46
|
Update of /cvsroot/archive-access/archive-access/projects/wera/src/webapps/wera
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv16734

Modified Files:
  documentDispatcher.php
Log Message:
Fixed bug 1324755

Index: documentDispatcher.php
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wera/src/webapps/wera/documentDispatcher.php,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** documentDispatcher.php 10 Oct 2005 13:11:37 -0000 1.5
--- documentDispatcher.php 12 Oct 2005 11:43:35 -0000 1.6
***************
*** 208,212 ****
  $handler_url .= '?aid='.urlencode($conf_document_retriever . $document['archiveidentifier']).'&time='.$document['date'].'&mime='.$document['mime'].'&url='.$document['url'];
  if ($document['encoding']) {
! Header("content-type: " . $document['mime'] . ", " . $document['encoding'], false);
  }
  else {
--- 208,212 ----
  $handler_url .= '?aid='.urlencode($conf_document_retriever . $document['archiveidentifier']).'&time='.$document['date'].'&mime='.$document['mime'].'&url='.$document['url'];
  if ($document['encoding']) {
! Header("content-type: " . $document['mime'] . "; charset=" . $document['encoding'], false);
  }
  else {
|
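The fix above replaces `mime, encoding` with `mime; charset=encoding`, which is the form an HTTP Content-Type value takes when it carries a character set. A minimal sketch of that formatting rule follows. `ContentTypeHeader` is a hypothetical name used only for illustration; the project itself does this inline in PHP.

```java
// Sketch only: ContentTypeHeader is a hypothetical helper, not project code.
final class ContentTypeHeader {

    // Builds the value of a Content-Type header.
    // With no encoding the bare media type is returned, mirroring the
    // else branch of the PHP code.
    static String format(String mime, String encoding) {
        if (encoding == null || encoding.isEmpty()) {
            return mime;
        }
        // Parameters are separated from the media type by ';', not ','.
        return mime + "; charset=" + encoding;
    }
}
```

With the old comma-separated form a browser has no `charset` parameter to read, so it may fall back to guessing the encoding of the archived page.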
From: Sverre B. <sv...@us...> - 2005-10-11 11:26:51
|
Update of /cvsroot/archive-access/archive-access/projects/wera/src/webapps/wera/handlers In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv24016/handlers Modified Files: html_javascript.php html_javascript.php.js Log Message: Added url and time in the javascript popup showing at the top of the viewed web page Index: html_javascript.php.js =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wera/src/webapps/wera/handlers/html_javascript.php.js,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** html_javascript.php.js 4 Oct 2005 22:59:27 -0000 1.1 --- html_javascript.php.js 11 Oct 2005 11:26:54 -0000 1.2 *************** *** 78,84 **** "<div style='" + "position:relative;z-index:99999;"+ ! "border:1px solid;color:black;background-color:lightYellow;font-size:10px;font-family:sans-serif;padding:5px'>" + "WERA... External links, forms, and search boxes may not function within this collection. " + ! "[ <a style='color:blue;font-size:10px;text-decoration:underline' href=\"javascript:void(top.disclaimElem.style.display='none')\">hide</a> ]" + "</div>"; --- 78,85 ---- "<div style='" + "position:relative;z-index:99999;"+ ! "border:1px solid;color:black;background-color:lightYellow;font-size:10px;font-family:sans-serif;padding:5px'>" + "WERA... External links, forms, and search boxes may not function within this collection. " + ! " Url: " + weraUrl + ", time: " + weraTime + ! 
" [ <a style='color:blue;font-size:10px;text-decoration:underline' href=\"javascript:void(top.disclaimElem.style.display='none')\">hide</a> ]" + "</div>"; Index: html_javascript.php =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wera/src/webapps/wera/handlers/html_javascript.php,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** html_javascript.php 5 Oct 2005 01:38:18 -0000 1.2 --- html_javascript.php 11 Oct 2005 11:26:54 -0000 1.3 *************** *** 72,78 **** $document = preg_replace("/<html>/i", "<HTML>\n" . $hrefstring . "</HEAD>", $document, 1); } ! $js_to_insert = "<SCRIPT language=\"Javascript\">\n"; $js_to_insert .= "var sWayBackCGI = \"##P#R#E#F#I#X##TIME#$time\"\n"; $js_to_insert .= "</SCRIPT>\n"; $js_to_insert .= file_get_contents($_SERVER['SCRIPT_FILENAME'] . ".js"); --- 72,80 ---- $document = preg_replace("/<html>/i", "<HTML>\n" . $hrefstring . "</HEAD>", $document, 1); } ! $weraTime = substr($time,0,4) . "-" . substr($time,4,2) . "-" . substr($time,6,2) . " " . substr($time,8,2) . ":" . substr($time,10,2) . ":" . substr($time,12,2); $js_to_insert = "<SCRIPT language=\"Javascript\">\n"; $js_to_insert .= "var sWayBackCGI = \"##P#R#E#F#I#X##TIME#$time\"\n"; + $js_to_insert .= "var weraTime = \"$weraTime\"\n"; + $js_to_insert .= "var weraUrl = \"$url\"\n"; $js_to_insert .= "</SCRIPT>\n"; $js_to_insert .= file_get_contents($_SERVER['SCRIPT_FILENAME'] . ".js"); |
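The `$weraTime` expression in this commit slices the archive timestamp by fixed offsets. The sketch below shows the same transformation with an explicit input check. It assumes, based only on those `substr` offsets, that the timestamp is 14 digits in `yyyyMMddHHmmss` order. `WeraTime` is a hypothetical name.

```java
// Sketch only: WeraTime is a hypothetical helper, not project code.
final class WeraTime {

    // Turns "20051011112654" into "2005-10-11 11:26:54".
    // Assumption: the input is exactly 14 digits (yyyyMMddHHmmss).
    static String format(String time) {
        if (time == null || !time.matches("\\d{14}")) {
            throw new IllegalArgumentException("expected 14 digits, got: " + time);
        }
        return time.substring(0, 4) + "-" + time.substring(4, 6) + "-" + time.substring(6, 8)
            + " " + time.substring(8, 10) + ":" + time.substring(10, 12) + ":" + time.substring(12, 14);
    }
}
```

The PHP version does no such check, so a shorter timestamp would produce a partially empty string in the popup instead of an error.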
From: Michael S. <sta...@us...> - 2005-10-11 01:09:50
|
Update of /cvsroot/archive-access/archive-access/projects/wera
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv22924

Modified Files:
  project.xml
Log Message:
* project.xml
Add demo link.

Index: project.xml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wera/project.xml,v
retrieving revision 1.9
retrieving revision 1.10
diff -C2 -d -r1.9 -r1.10
*** project.xml 6 Oct 2005 23:16:29 -0000 1.9
--- project.xml 11 Oct 2005 01:09:50 -0000 1.10
***************
*** 37,40 ****
--- 37,42 ----
  See the <a href="articles/manual">wera Manual</a> for more on how wera works, requirements, and installation.
+ For a demo of wera+nutchwax, see
+ <a href="http://nwa.nb.no/wera/">nwa.nb.no/wera</a>.
  Wera development has been sponsored by the <a href="http://www.netpreserve.net">International Internet Preservation
|
From: Sverre B. <sv...@us...> - 2005-10-10 13:13:28
|
Update of /cvsroot/archive-access/archive-access/projects/wera/src/webapps/wera
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv13455

Modified Files:
  index.php
Log Message:
Fixed bug 1322554

Index: index.php
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wera/src/webapps/wera/index.php,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** index.php 6 Oct 2005 02:07:37 -0000 1.4
--- index.php 10 Oct 2005 13:13:22 -0000 1.5
***************
*** 259,266 ****
  $numversions_text2 = "";
! if ($conf_show_num_verions_matching_query) {
  $vquery = $querystring . " exacturl:" . urlencode($value["url"]);
  $search2->setQuery($vquery);
-
  if ($search2->doQuery()) {
  $versions = $search2->getResultSet();
--- 259,265 ----
  $numversions_text2 = "";
! if ($conf_show_num_verions_matching_query and !strstr($querystring, "exacturl:")) {
  $vquery = $querystring . " exacturl:" . urlencode($value["url"]);
  $search2->setQuery($vquery);
  if ($search2->doQuery()) {
  $versions = $search2->getResultSet();
|
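The guard added here skips the per-result version count when the user's query already contains an `exacturl:` term, so a second `exacturl:` is never appended. A minimal sketch of that rule is below. `VersionQuery` is a hypothetical name, and Java's `URLEncoder` stands in for PHP's `urlencode`; the two agree on the characters in this example but are not identical in general.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch only: VersionQuery is a hypothetical helper, not project code.
final class VersionQuery {

    // Returns the query used to count versions of one result url,
    // or null when the user's query already restricts by exacturl.
    static String build(String query, String url) {
        if (query.contains("exacturl:")) {
            return null; // caller skips the extra search
        }
        try {
            return query + " exacturl:" + URLEncoder.encode(url, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `build("bibliotek", "http://www.nb.no/")` yields `bibliotek exacturl:http%3A%2F%2Fwww.nb.no%2F`, while a query that already contains `exacturl:` yields `null`.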