From: Brad <bra...@us...> - 2005-11-16 03:11:42
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/core
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/java/org/archive/wayback/core

Modified Files:
    WaybackLogic.java Timestamp.java Resource.java
Added Files:
    RequestFilter.java WaybackRequest.java SearchResult.java SearchResults.java
Removed Files:
    ResourceResult.java ResourceResults.java WMRequest.java
Log Message:
Massive overhaul decomposing into three main categories of changes:

1) All internal datatypes are now extensible (currently Properties, but should be Maps) including:
   a) WaybackRequest (was WBRequest)
   b) SearchResults (was ResourceResults)
   c) SearchResult (was ResourceResult)
   d) Resource
   so that there is no longer an assumption of Archival URL queries, or "CDX-style" index results. This will put more responsibility on the UI components to interrogate SearchResults to decide how to render, but should enable extension to data returned from Indexes, as well as allow far more flexibility in queries, predominantly geared towards free-text searching. This is still somewhat clunky, as there are no convenience accessor methods, so all users refer to constants when interacting with them.

2) Major cleanup of servlet and filter interaction with servlet container. ReplayUI and QueryUI are now just plain old servlets, and filters can be optionally added to allow non-CGI argument requests to be coerced into standard WaybackRequest objects.

3) Alternate "Proxy" Replay mode is now functional, and some work has been done towards an alternate Nutch ResourceIndex. Currently the web.xml contains example configurations for both Proxy and Archival Url replay modes, but the Proxy related configurations are commented out. Proxy mode *requires* changing the servlet context to ROOT. ArchivalUrl replay mode works as ROOT context and as any (I think) other context. There are some cosmetic double-slash issues to work out.
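Point 1 of the log message can be illustrated with a minimal sketch, assuming a Map-backed variant of the result types (the log message says the Properties "should be Maps"). All class names and key strings here are hypothetical; the commit keeps its real keys in WaybackConstants. Note that this sketch treats firstResultDate as the earliest capture and lastResultDate as the latest, which is what the field names suggest; the comparisons in the committed SearchResults.addSearchResult point the other way, and the log message does not say which is intended.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical key names for illustration only.
final class ResultKeys {
    static final String CAPTURE_DATE = "capturedate";
    static final String URL = "url";

    private ResultKeys() {
    }
}

// Map-backed record: callers add or read fields by constant key,
// so new fields need no change to this class.
final class MapSearchResult {
    private final Map<String, String> data = new HashMap<>();

    boolean containsKey(String key) {
        return data.containsKey(key);
    }

    String get(String key) {
        return data.get(key);
    }

    String put(String key, String value) {
        return data.put(key, value);
    }
}

// Collection that tracks the earliest and latest capture date it has seen.
final class MapSearchResults {
    private final List<MapSearchResult> results = new ArrayList<>();
    private String firstResultDate; // earliest date seen (assumption)
    private String lastResultDate;  // latest date seen (assumption)

    void add(MapSearchResult result) {
        String date = result.get(ResultKeys.CAPTURE_DATE);
        if (date != null) {
            // Equal-length digit timestamps sort chronologically as strings.
            if (firstResultDate == null || date.compareTo(firstResultDate) < 0) {
                firstResultDate = date;
            }
            if (lastResultDate == null || date.compareTo(lastResultDate) > 0) {
                lastResultDate = date;
            }
        }
        results.add(result);
    }

    int size() {
        return results.size();
    }

    String getFirstResultDate() {
        return firstResultDate;
    }

    String getLastResultDate() {
        return lastResultDate;
    }
}

class MapSearchResultsDemo {
    public static void main(String[] args) {
        MapSearchResults all = new MapSearchResults();
        String[] dates = {"20050601000000", "20040101000000", "20051116031129"};
        for (String d : dates) {
            MapSearchResult r = new MapSearchResult();
            r.put(ResultKeys.CAPTURE_DATE, d);
            r.put(ResultKeys.URL, "http://example.com/");
            all.add(r);
        }
        System.out.println(all.getFirstResultDate() + " .. " + all.getLastResultDate());
    }
}
```

Because every field is addressed through a constant, a UI component can probe a result with containsKey before deciding how to render it, which is the trade-off the log message describes.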
--- NEW FILE: SearchResult.java ---
/* SearchResult
 *
 * $Id: SearchResult.java,v 1.1 2005/11/16 03:11:29 bradtofel Exp $
 *
 * Created on 12:45:18 PM Nov 9, 2005.
 *
 * Copyright (C) 2005 Internet Archive.
 *
 * This file is part of wayback.
 *
 * wayback is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or
 * any later version.
 *
 * wayback is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Lesser Public License for more details.
 *
 * You should have received a copy of the GNU Lesser Public License
 * along with wayback; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 */
package org.archive.wayback.core;

import java.util.Properties;

/**
 *
 *
 * @author brad
 * @version $Date: 2005/11/16 03:11:29 $, $Revision: 1.1 $
 */
public class SearchResult {
    private Properties data = null;

    public SearchResult() {
        super();
        data = new Properties();
    }

    public boolean containsKey(String key) {
        return data.containsKey(key);
    }

    public String get(String key) {
        return (String) data.get(key);
    }

    public String put(String key, String value) {
        return (String) data.put(key, value);
    }
}

--- NEW FILE: SearchResults.java ---
/* SearchResults
 *
 * $Id: SearchResults.java,v 1.1 2005/11/16 03:11:29 bradtofel Exp $
 *
 * Created on 12:52:13 PM Nov 9, 2005.
 *
 * Copyright (C) 2005 Internet Archive.
 *
 * This file is part of wayback.
 *
 * wayback is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or
 * any later version.
 *
 * wayback is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Lesser Public License for more details.
 *
 * You should have received a copy of the GNU Lesser Public License
 * along with wayback; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 */
package org.archive.wayback.core;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.Properties;

import org.archive.wayback.WaybackConstants;

/**
 *
 *
 * @author brad
 * @version $Date: 2005/11/16 03:11:29 $, $Revision: 1.1 $
 */
public class SearchResults {
    private ArrayList results = null;
    private String firstResultDate;
    private String lastResultDate;
    private Properties filters = new Properties();

    public SearchResults() {
        super();
        results = new ArrayList();
    }

    /**
     * @return true if no SearchResult objects, false otherwise.
     */
    public boolean isEmpty() {
        return results.isEmpty();
    }

    /**
     * @param result
     *            SearchResult to add to this set
     */
    public void addSearchResult(final SearchResult result) {
        String resultDate = result.get(WaybackConstants.RESULT_CAPTURE_DATE);
        if((firstResultDate == null) ||
                (firstResultDate.compareTo(resultDate) < 0)) {
            firstResultDate = resultDate;
        }
        if((lastResultDate == null) ||
                (lastResultDate.compareTo(resultDate) > 0)) {
            lastResultDate = resultDate;
        }
        results.add(result);
    }

    /**
     * @return number of SearchResult objects contained in these SearchResults
     */
    public int getResultCount() {
        return results.size();
    }

    /**
     * @return an Iterator that contains the ResourceResult objects
     */
    public Iterator iterator() {
        return results.iterator();
    }

    /**
     * @return Returns the firstResultDate.
     */
    public String getFirstResultDate() {
        return firstResultDate;
    }

    /**
     * @return Returns the lastResultDate.
*/ public String getLastResultDate() { return lastResultDate; } public boolean containsFilter(String key) { return filters.containsKey(key); } public String getFilter(String key) { return (String) filters.get(key); } public String putFilter(String key, String value) { return (String) filters.put(key, value); } } --- ResourceResult.java DELETED --- --- WMRequest.java DELETED --- Index: Resource.java =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/core/Resource.java,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** Resource.java 19 Oct 2005 01:22:36 -0000 1.2 --- Resource.java 16 Nov 2005 03:11:29 -0000 1.3 *************** *** 24,27 **** --- 24,36 ---- package org.archive.wayback.core; + import java.io.IOException; + import java.io.InputStream; + import java.util.Enumeration; + import java.util.Iterator; + import java.util.Map; + import java.util.Properties; + import java.util.Set; + + import org.apache.commons.httpclient.Header; import org.archive.io.arc.ARCRecord; *************** *** 30,43 **** * to allow the Wayback to operator with non-ARC file format resources. Probably * the interface required will end up looking very much like ARCRecord, but can ! * be reimplemented to handle new ARC formats or non-ARC formats. At the moment, ! * users of this class just grab the ARCRecord out and use it directly. * * @author Brad Tofel * @version $Date$, $Revision$ */ ! public class Resource { ARCRecord arcRecord = null; ! /** * Constructor --- 39,55 ---- * to allow the Wayback to operator with non-ARC file format resources. Probably * the interface required will end up looking very much like ARCRecord, but can ! * be reimplemented to handle new ARC formats or non-ARC formats. * * @author Brad Tofel * @version $Date$, $Revision$ */ ! 
public class Resource extends InputStream { + private static String ARC_META_PREFIX = "arcmeta."; + private static String HTTP_HEADER_PREFIX = "httpheader."; ARCRecord arcRecord = null; ! boolean parsedHeader = false; ! Properties metaData = new Properties(); ! /** * Constructor *************** *** 50,53 **** --- 62,126 ---- } + public void parseHeaders () throws IOException { + if(!parsedHeader) { + arcRecord.skipHttpHeader(); + + // copy all HTTP headers to metaData, prefixing with + // HTTP_HEADER_PREFIX + Header[] headers = arcRecord.getHttpHeaders(); + if (headers != null) { + for (int i = 0; i < headers.length; i++) { + String value = headers[i].getValue(); + String name = headers[i].getName(); + metaData.put(HTTP_HEADER_PREFIX + name,value); + } + } + + // copy all ARC record header fields to metaData, prefixing with + // ARC_META_PREFIX + Map headerMetaMap = arcRecord.getMetaData().getHeaderFields(); + Set keys = headerMetaMap.keySet(); + Iterator itr = keys.iterator(); + while(itr.hasNext()) { + Object metaKey = itr.next(); + Object metaValue = headerMetaMap.get(metaKey); + String metaStringValue = (metaValue == null) ? 
"" : + metaValue.toString(); + metaData.put(ARC_META_PREFIX + metaKey.toString(), + metaStringValue); + } + + parsedHeader = true; + } + } + + public Properties filterMeta(String prefix) { + Properties matching = new Properties(); + for (Enumeration e = metaData.keys(); e.hasMoreElements();) { + String key = (String) e.nextElement(); + if (key.startsWith(prefix)) { + String finalKey = key.substring(prefix.length()); + String value = (String) metaData.get(key); + matching.put(finalKey, value); + } + } + return matching; + } + + public Properties getHttpHeaders() { + return filterMeta(HTTP_HEADER_PREFIX); + } + + public Properties getARCMetadata() { + return filterMeta(ARC_META_PREFIX); + } + + /* (non-Javadoc) + * @see org.archive.io.arc.ARCRecord#getStatusCode() + */ + public int getStatusCode() { + return arcRecord.getStatusCode(); + } + /** * @return the ARCRecord underlying this Resource. *************** *** 64,66 **** --- 137,167 ---- } + /* (non-Javadoc) + * @see org.archive.io.arc.ARCRecord#read() + */ + public int read() throws IOException { + return arcRecord.read(); + } + + /* (non-Javadoc) + * @see org.archive.io.arc.ARCRecord#read(byte[], int, int) + */ + public int read(byte[] arg0, int arg1, int arg2) throws IOException { + return arcRecord.read(arg0, arg1, arg2); + } + + /* (non-Javadoc) + * @see java.io.InputStream#read(byte[]) + */ + public int read(byte[] b) throws IOException { + return arcRecord.read(b); + } + + /* (non-Javadoc) + * @see org.archive.io.arc.ARCRecord#skip(long) + */ + public long skip(long arg0) throws IOException { + return arcRecord.skip(arg0); + } + } Index: Timestamp.java =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/core/Timestamp.java,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** Timestamp.java 20 Oct 2005 00:40:41 -0000 1.3 --- Timestamp.java 16 Nov 2005 03:11:29 -0000 1.4 
*************** *** 146,150 **** String last = LAST1_TIMESTAMP; if (input.length() == 0) { ! return LAST2_TIMESTAMP; } if (input.length() < 4) { --- 146,150 ---- String last = LAST1_TIMESTAMP; if (input.length() == 0) { ! return ArchiveUtils.get14DigitDate(new Date()); } if (input.length() < 4) { Index: WaybackLogic.java =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/core/WaybackLogic.java,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** WaybackLogic.java 19 Oct 2005 01:22:36 -0000 1.2 --- WaybackLogic.java 16 Nov 2005 03:11:29 -0000 1.3 *************** *** 24,34 **** package org.archive.wayback.core; import java.util.Properties; import java.util.logging.Logger; ! import org.archive.wayback.QueryUI; ! import org.archive.wayback.ReplayUI; import org.archive.wayback.ResourceIndex; import org.archive.wayback.ResourceStore; /** --- 24,38 ---- package org.archive.wayback.core; + import java.util.Enumeration; import java.util.Properties; import java.util.logging.Logger; ! import org.archive.wayback.PropertyConfigurable; ! import org.archive.wayback.ReplayRenderer; ! import org.archive.wayback.QueryRenderer; ! import org.archive.wayback.ReplayResultURIConverter; import org.archive.wayback.ResourceIndex; import org.archive.wayback.ResourceStore; + import org.archive.wayback.exception.ConfigurationException; /** *************** *** 38,56 **** * @version $Date$, $Revision$ */ ! public class WaybackLogic { private static final Logger LOGGER = Logger.getLogger(WaybackLogic.class .getName()); ! private static final String REPLAY_UI_CLASS = "replayui.class"; ! private static final String QUERY_UI_CLASS = "queryui.class"; ! private static final String RESOURCE_STORE_CLASS = "resourcestore.class"; ! private static final String RESOURCE_INDEX_CLASS = "resourceindex.class"; ! private ReplayUI replayUI = null; ! 
private QueryUI queryUI = null; private ResourceIndex resourceIndex = null; --- 42,65 ---- * @version $Date$, $Revision$ */ ! public class WaybackLogic implements PropertyConfigurable { private static final Logger LOGGER = Logger.getLogger(WaybackLogic.class .getName()); ! private static final String REPLAY_URI_CONVERTER_PROPERTY = ! "replayuriconverter"; ! private static final String REPLAY_RENDERER_PROPERTY = "replayrenderer"; ! private static final String QUERY_RENDERER_PROPERTY = "queryrenderer"; ! private static final String RESOURCE_STORE_PROPERTY = "resourcestore"; ! private static final String RESOURCE_INDEX_PROPERTY = "resourceindex"; ! private ReplayResultURIConverter uriConverter = null; ! ! private ReplayRenderer replayRenderer = null; ! ! private QueryRenderer queryRenderer = null; private ResourceIndex resourceIndex = null; *************** *** 74,152 **** * @throws Exception */ ! public void init(Properties p) throws Exception { LOGGER.info("WaybackLogic constructing classes..."); ! replayUI = (ReplayUI) getInstance(p, REPLAY_UI_CLASS, "replayui"); ! queryUI = (QueryUI) getInstance(p, QUERY_UI_CLASS, "queryUI"); ! resourceStore = (ResourceStore) getInstance(p, RESOURCE_STORE_CLASS, ! "resourceStore"); ! resourceIndex = (ResourceIndex) getInstance(p, RESOURCE_INDEX_CLASS, ! "resourceIndex"); ! LOGGER.info("WaybackLogic initializing classes..."); ! try { ! replayUI.init(p); ! LOGGER.info("initialized replayUI"); ! queryUI.init(p); ! LOGGER.info("initialized queryUI"); ! resourceStore.init(p); ! LOGGER.info("initialized resourceStore"); ! resourceIndex.init(p); ! LOGGER.info("initialized resourceIndex"); - } catch (Exception e) { - throw new Exception(e.getMessage()); - } } ! protected Object getInstance(final Properties p, ! final String classProperty, final String pretty) throws Exception { ! Object result = null; ! String className = (String) p.get(classProperty); ! if ((className == null) || (className.length() <= 0)) { ! 
throw new Exception("No config (" + classProperty + " for " ! + pretty + ")"); } try { ! result = Class.forName(className).newInstance(); ! LOGGER.info("new " + className + " " + pretty + " created."); } catch (Exception e) { ! // Convert. Add info. ! throw new Exception("Failed making " + pretty + " with " ! + className + ": " + e.getMessage()); } return result; } /** ! * @return Returns the queryUI. */ ! public QueryUI getQueryUI() { ! return queryUI; } /** ! * @return Returns the replayUI. */ ! public ReplayUI getReplayUI() { ! return replayUI; } /** ! * @return Returns the resourceIndex. */ ! public ResourceIndex getResourceIndex() { ! return resourceIndex; } /** ! * @return Returns the resourceStore. */ ! public ResourceStore getResourceStore() { ! return resourceStore; } --- 83,185 ---- * @throws Exception */ ! public void init(Properties p) throws ConfigurationException { LOGGER.info("WaybackLogic constructing classes..."); ! uriConverter = (ReplayResultURIConverter) getInstance(p, ! REPLAY_URI_CONVERTER_PROPERTY); ! replayRenderer = (ReplayRenderer) getInstance(p, ! REPLAY_RENDERER_PROPERTY); ! queryRenderer = (QueryRenderer) getInstance(p, QUERY_RENDERER_PROPERTY); ! resourceStore = (ResourceStore) getInstance(p, RESOURCE_STORE_PROPERTY); ! resourceIndex = (ResourceIndex) getInstance(p, RESOURCE_INDEX_PROPERTY); ! ! LOGGER.info("WaybackLogic initialized classes..."); } ! protected PropertyConfigurable getInstance(final Properties p, ! final String classPrefix) throws ConfigurationException { ! PropertyConfigurable result = null; ! ! String classNameKey = classPrefix + ".classname"; ! String propertyPrefix = classPrefix + "."; ! String className = null; ! ! // build new class-specific Properties for class initialization: ! Properties classProperties = new Properties(); ! for (Enumeration e = p.keys(); e.hasMoreElements();) { ! String key = (String) e.nextElement(); ! ! if (key.equals(classNameKey)) { ! ! // special .classname value: ! 
className = (String) p.get(key); ! ! } else if (key.startsWith(propertyPrefix)) { ! ! String finalKey = key.substring(propertyPrefix.length()); ! String value = (String) p.get(key); ! classProperties.put(finalKey, value); ! ! } ! } ! ! // did we find the implementation class? ! if (className == null) { ! throw new ConfigurationException("No configuration for (" ! + classNameKey + ")"); } try { ! result = (PropertyConfigurable) Class.forName(className) ! .newInstance(); } catch (Exception e) { ! e.printStackTrace(); ! throw new ConfigurationException(e.getMessage()); } + LOGGER.info("new " + className + " created."); + result.init(p); + LOGGER.info("initialized " + className); + return result; } /** ! * @return Returns the resourceIndex. */ ! public ResourceIndex getResourceIndex() { ! return resourceIndex; } /** ! * @return Returns the resourceStore. */ ! public ResourceStore getResourceStore() { ! return resourceStore; } /** ! * @return Returns the uriConverter. */ ! public ReplayResultURIConverter getURIConverter() { ! return uriConverter; } /** ! * @return Returns the replayRenderer. */ ! public ReplayRenderer getReplayRenderer() { ! return replayRenderer; ! } ! ! /** ! * @return Returns the queryRenderer. ! */ ! public QueryRenderer getQueryRenderer() { ! return queryRenderer; } --- NEW FILE: WaybackRequest.java --- /* WMRequest * * Created on 2005/10/18 14:00:00 * * Copyright (C) 2005 Internet Archive. * * This file is part of the Wayback Machine (crawler.archive.org). * * Wayback Machine is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. * * Wayback Machine is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. 
* * You should have received a copy of the GNU Lesser Public License * along with Wayback Machine; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback.core; import java.util.Properties; /** * Abstraction of all the data associated with a users request to the Wayback * Machine. * * @author Brad Tofel * @version $Date: 2005/11/16 03:11:29 $, $Revision: 1.1 $ */ public class WaybackRequest { private int resultsPerPage = 1000; private int pageNum = 1; private Properties filters = new Properties(); /** * Constructor */ public WaybackRequest() { super(); } /** * @return Returns the pageNum. */ public int getPageNum() { return pageNum; } /** * @param pageNum The pageNum to set. */ public void setPageNum(int pageNum) { this.pageNum = pageNum; } /** * @return Returns the resultsPerPage. */ public int getResultsPerPage() { return resultsPerPage; } /** * @param resultsPerPage The resultsPerPage to set. */ public void setResultsPerPage(int resultsPerPage) { this.resultsPerPage = resultsPerPage; } public boolean containsKey(String key) { return filters.containsKey(key); } public String get(String key) { return (String) filters.get(key); } public void put(String key, String value) { filters.put(key, value); } } --- ResourceResults.java DELETED --- --- NEW FILE: RequestFilter.java --- /* RequestFilter * * $Id: RequestFilter.java,v 1.1 2005/11/16 03:11:29 bradtofel Exp $ * * Created on 1:17:08 PM Nov 8, 2005. * * Copyright (C) 2005 Internet Archive. * * This file is part of wayback. * * wayback is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. * * wayback is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the * GNU Lesser Public License for more details. * * You should have received a copy of the GNU Lesser Public License * along with wayback; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback.core; import java.io.IOException; import javax.servlet.Filter; import javax.servlet.FilterChain; import javax.servlet.FilterConfig; import javax.servlet.RequestDispatcher; import javax.servlet.ServletException; import javax.servlet.ServletRequest; import javax.servlet.ServletResponse; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse; /** * * * @author brad * @version $Date: 2005/11/16 03:11:29 $, $Revision: 1.1 $ */ public abstract class RequestFilter implements Filter { private static final String WMREQUEST_ATTRIBUTE = "wmrequest.attribute"; private static final String HANDLER_URL = "handler.url"; private String handlerUrl = null; /** * Constructor */ public RequestFilter() { super(); } public void init(FilterConfig c) throws ServletException { handlerUrl = c.getInitParameter(HANDLER_URL); if ((handlerUrl == null) || (handlerUrl.length() <= 0)) { throw new ServletException("No config (" + HANDLER_URL + ")"); } } public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException { if (!handle(request, response)) { chain.doFilter(request, response); } } protected boolean handle(final ServletRequest request, final ServletResponse response) throws IOException, ServletException { if (!(request instanceof HttpServletRequest)) { return false; } if (!(response instanceof HttpServletResponse)) { return false; } HttpServletRequest httpRequest = (HttpServletRequest) request; //HttpServletResponse httpResponse = (HttpServletResponse) response; WaybackRequest wbRequest = parseRequest(httpRequest); if (wbRequest == null) { return false; } request.setAttribute(WMREQUEST_ATTRIBUTE, wbRequest); 
RequestDispatcher dispatcher = request.getRequestDispatcher(handlerUrl); dispatcher.forward(request, response); return true; } protected abstract WaybackRequest parseRequest( HttpServletRequest httpRequest); public void destroy() { } } |
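WaybackLogic.getInstance and Resource.filterMeta above both select entries by key prefix and strip the prefix from the result. The following is a minimal, self-contained sketch of that technique using only java.util.Properties. The class name PrefixedProperties and the keys "arcpath" and "indexpath" are hypothetical; only the ".classname" suffix comes from the diff. One thing to be aware of: the committed getInstance builds a class-specific Properties but then calls result.init(p) with the full set. Whether that is intentional is not stated in the log message, so this sketch simply returns the scoped set and leaves the choice to the caller.

```java
import java.util.Properties;

// Hypothetical helper illustrating prefix-scoped configuration.
final class PrefixedProperties {
    private PrefixedProperties() {
    }

    // Implementation class configured for a component, or null if absent.
    static String classNameFor(Properties all, String prefix) {
        return all.getProperty(prefix + ".classname");
    }

    // Copy every "prefix.xyz" entry into a new Properties under key "xyz",
    // leaving out the special "prefix.classname" entry.
    static Properties slice(Properties all, String prefix) {
        String classNameKey = prefix + ".classname";
        String dotted = prefix + ".";
        Properties scoped = new Properties();
        for (String key : all.stringPropertyNames()) {
            if (key.equals(classNameKey)) {
                continue;
            }
            if (key.startsWith(dotted)) {
                scoped.setProperty(key.substring(dotted.length()),
                        all.getProperty(key));
            }
        }
        return scoped;
    }
}

class PrefixedPropertiesDemo {
    public static void main(String[] args) {
        Properties p = new Properties();
        // Key names after the prefix are made up for this example.
        p.setProperty("resourcestore.classname", "com.example.DemoStore");
        p.setProperty("resourcestore.arcpath", "/tmp/arcs");
        p.setProperty("resourceindex.indexpath", "/tmp/index");

        System.out.println(PrefixedProperties.classNameFor(p, "resourcestore"));
        System.out.println(PrefixedProperties.slice(p, "resourcestore"));
    }
}
```

Scoping the configuration this way lets two components use the same short key without colliding, at the cost of hiding shared settings from each component unless they are repeated under every prefix.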
From: Brad <bra...@us...> - 2005-11-16 03:11:42
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/jsp/PipelineUI
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/webapp/jsp/PipelineUI

Modified Files:
    PipelineStatus.jsp
Log Message:
Massive overhaul decomposing into three main categories of changes:

1) All internal datatypes are now extensible (currently Properties, but should be Maps) including:
   a) WaybackRequest (was WBRequest)
   b) SearchResults (was ResourceResults)
   c) SearchResult (was ResourceResult)
   d) Resource
   so that there is no longer an assumption of Archival URL queries, or "CDX-style" index results. This will put more responsibility on the UI components to interrogate SearchResults to decide how to render, but should enable extension to data returned from Indexes, as well as allow far more flexibility in queries, predominantly geared towards free-text searching. This is still somewhat clunky, as there are no convenience accessor methods, so all users refer to constants when interacting with them.

2) Major cleanup of servlet and filter interaction with servlet container. ReplayUI and QueryUI are now just plain old servlets, and filters can be optionally added to allow non-CGI argument requests to be coerced into standard WaybackRequest objects.

3) Alternate "Proxy" Replay mode is now functional, and some work has been done towards an alternate Nutch ResourceIndex. Currently the web.xml contains example configurations for both Proxy and Archival Url replay modes, but the Proxy related configurations are commented out. Proxy mode *requires* changing the servlet context to ROOT. ArchivalUrl replay mode works as ROOT context and as any (I think) other context. There are some cosmetic double-slash issues to work out.
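Point 2 of the log message, coercing a non-CGI request path into a request object, can be sketched without the servlet API. The two regular expressions below are copied from the QueryFilter.java added in this commit; everything else (the class name ArchivalUrlParser, the map keys "type", "date" and "url", and the type strings) is hypothetical and stands in for WaybackRequest and the WaybackConstants values. The real filter also expands the date into a start/end range and normalizes the URL through UURIFactory, which this sketch leaves out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the path parsing done by a request filter.
final class ArchivalUrlParser {
    // "/<0-13 digits>*/<url>" where the URL does not end in '*'
    private static final Pattern URL_QUERY =
            Pattern.compile("^/(\\d{0,13})\\*/(.*[^*])$");
    // "/<0-13 digits>*/<url prefix>*"
    private static final Pattern PREFIX_QUERY =
            Pattern.compile("^/(\\d{0,13})\\*/(.*)\\*$");

    private ArchivalUrlParser() {
    }

    // Returns the parsed request, or null when the path is not a query,
    // in which case a filter would pass the request down the chain.
    static Map<String, String> parse(String path) {
        String type = "urlquery";
        Matcher m = URL_QUERY.matcher(path);
        if (!m.matches()) {
            type = "urlprefixquery";
            m = PREFIX_QUERY.matcher(path);
            if (!m.matches()) {
                return null;
            }
        }
        String url = m.group(2);
        if (!url.startsWith("http://")) {
            url = "http://" + url;
        }
        Map<String, String> request = new LinkedHashMap<>();
        request.put("type", type);
        request.put("date", m.group(1));
        request.put("url", url);
        return request;
    }
}

class ArchivalUrlParserDemo {
    public static void main(String[] args) {
        System.out.println(ArchivalUrlParser.parse("/2005*/example.com/"));
        System.out.println(ArchivalUrlParser.parse("/2005*/example.com/*"));
        System.out.println(ArchivalUrlParser.parse("/about.html"));
    }
}
```

Returning null for an unrecognized path mirrors the contract of RequestFilter.parseRequest, where a null result means the filter does not handle the request.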
Index: PipelineStatus.jsp
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/jsp/PipelineUI/PipelineStatus.jsp,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** PipelineStatus.jsp 21 Oct 2005 03:24:40 -0000 1.1
--- PipelineStatus.jsp 16 Nov 2005 03:11:30 -0000 1.2
***************
*** 1,4 ****
  <jsp:include page="../../template/UI-header.jsp" />
! <%@ page import="org.archive.wayback.arcindexer.PipelineStatus" %>
  <%
  PipelineStatus status = (PipelineStatus) request.getAttribute("pipelinestatus");
--- 1,4 ----
  <jsp:include page="../../template/UI-header.jsp" />
! <%@ page import="org.archive.wayback.cdx.indexer.PipelineStatus" %>
  <%
  PipelineStatus status = (PipelineStatus) request.getAttribute("pipelinestatus");
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/archivalurl
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/java/org/archive/wayback/archivalurl

Added Files:
    ReplayFilter.java QueryFilter.java JSReplayRenderer.java ResultURIConverter.java
Log Message:
Massive overhaul decomposing into three main categories of changes:

1) All internal datatypes are now extensible (currently Properties, but should be Maps) including:
   a) WaybackRequest (was WBRequest)
   b) SearchResults (was ResourceResults)
   c) SearchResult (was ResourceResult)
   d) Resource
   so that there is no longer an assumption of Archival URL queries, or "CDX-style" index results. This will put more responsibility on the UI components to interrogate SearchResults to decide how to render, but should enable extension to data returned from Indexes, as well as allow far more flexibility in queries, predominantly geared towards free-text searching. This is still somewhat clunky, as there are no convenience accessor methods, so all users refer to constants when interacting with them.

2) Major cleanup of servlet and filter interaction with servlet container. ReplayUI and QueryUI are now just plain old servlets, and filters can be optionally added to allow non-CGI argument requests to be coerced into standard WaybackRequest objects.

3) Alternate "Proxy" Replay mode is now functional, and some work has been done towards an alternate Nutch ResourceIndex. Currently the web.xml contains example configurations for both Proxy and Archival Url replay modes, but the Proxy related configurations are commented out. Proxy mode *requires* changing the servlet context to ROOT. ArchivalUrl replay mode works as ROOT context and as any (I think) other context. There are some cosmetic double-slash issues to work out.

--- NEW FILE: QueryFilter.java ---
/* QueryFilter
 *
 * $Id: QueryFilter.java,v 1.1 2005/11/16 03:11:30 bradtofel Exp $
 *
 * Created on 1:22:14 PM Nov 8, 2005.
 *
 * Copyright (C) 2005 Internet Archive.
 *
 * This file is part of wayback.
 *
 * wayback is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or
 * any later version.
 *
 * wayback is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Lesser Public License for more details.
 *
 * You should have received a copy of the GNU Lesser Public License
 * along with wayback; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 */
package org.archive.wayback.archivalurl;

import java.text.ParseException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import javax.servlet.http.HttpServletRequest;

import org.apache.commons.httpclient.URIException;
import org.archive.net.UURI;
import org.archive.net.UURIFactory;
import org.archive.wayback.WaybackConstants;
import org.archive.wayback.core.RequestFilter;
import org.archive.wayback.core.Timestamp;
import org.archive.wayback.core.WaybackRequest;

/**
 *
 *
 * @author brad
 * @version $Date: 2005/11/16 03:11:30 $, $Revision: 1.1 $
 */
public class QueryFilter extends RequestFilter {
    private final static Pattern WB_QUERY_REGEX = Pattern
            .compile("^/(\\d{0,13})\\*/(.*[^*])$");

    private final static Pattern WB_PATH_QUERY_REGEX = Pattern
            .compile("^/(\\d{0,13})\\*/(.*)\\*$");

    public WaybackRequest parseRequest(HttpServletRequest request) {
        WaybackRequest wbRequest = null;
        Matcher matcher = null;

        String queryString = request.getQueryString();
        String origRequestPath = request.getRequestURI();
        if (queryString != null) {
            origRequestPath = request.getRequestURI() + "?" + queryString;
        }
        String contextPath = request.getContextPath();
        if (!origRequestPath.startsWith(contextPath)) {
            return null;
        }
        String requestPath = origRequestPath.substring(contextPath.length());

        matcher = WB_QUERY_REGEX.matcher(requestPath);
        if (matcher != null && matcher.matches()) {

            wbRequest = new WaybackRequest();
            String dateStr = matcher.group(1);
            String urlStr = matcher.group(2);
            try {
                String startDate = Timestamp.parseBefore(dateStr).getDateStr();
                String endDate = Timestamp.parseAfter(dateStr).getDateStr();
                wbRequest.put(WaybackConstants.REQUEST_START_DATE,startDate);
                wbRequest.put(WaybackConstants.REQUEST_END_DATE,endDate);
                // wbRequest.setStartTimestamp(Timestamp.parseBefore(dateStr));
                // wbRequest.setEndTimestamp(Timestamp.parseAfter(dateStr));
            } catch (ParseException e1) {
                e1.printStackTrace();
                return null;
            }
            wbRequest.put(WaybackConstants.REQUEST_TYPE,
                    WaybackConstants.REQUEST_URL_QUERY);
            // wbRequest.setQuery();
            if (!urlStr.startsWith("http://")) {
                urlStr = "http://" + urlStr;
            }

            try {
                UURI requestURI = UURIFactory.getInstance(urlStr);
                wbRequest.put(WaybackConstants.REQUEST_URL,
                        requestURI.toString());
                // wbRequest.setRequestURI(requestURI);
            } catch (URIException e) {
                wbRequest = null;
            }
        } else {
            matcher = WB_PATH_QUERY_REGEX.matcher(requestPath);
            if (matcher != null && matcher.matches()) {

                wbRequest = new WaybackRequest();
                String dateStr = matcher.group(1);
                String urlStr = matcher.group(2);
                try {
                    String startDate = Timestamp.parseBefore(dateStr).getDateStr();
                    String endDate = Timestamp.parseAfter(dateStr).getDateStr();
                    wbRequest.put(WaybackConstants.REQUEST_START_DATE,
                            startDate);
                    wbRequest.put(WaybackConstants.REQUEST_END_DATE,endDate);
                    // wbRequest.setStartTimestamp(Timestamp.parseBefore(dateStr));
                    // wbRequest.setEndTimestamp(Timestamp.parseAfter(dateStr));
                } catch (ParseException e1) {
                    e1.printStackTrace();
                    return null;
                }
                wbRequest.put(WaybackConstants.REQUEST_TYPE,
                        WaybackConstants.REQUEST_URL_PREFIX_QUERY);
                // wbRequest.setPathQuery();
                if (!urlStr.startsWith("http://")) {
                    urlStr = "http://" + urlStr;
                }

                try {
                    UURI requestURI = UURIFactory.getInstance(urlStr);
                    wbRequest.put(WaybackConstants.REQUEST_URL,requestURI.toString());
                } catch (URIException e) {
                    wbRequest = null;
                }
            }
        }
        return wbRequest;
    }
}

--- NEW FILE: ResultURIConverter.java ---
/* ResultURIConverter
 *
 * $Id: ResultURIConverter.java,v 1.1 2005/11/16 03:11:30 bradtofel Exp $
 *
 * Created on 5:24:36 PM Nov 1, 2005.
 *
 * Copyright (C) 2005 Internet Archive.
 *
 * This file is part of wayback.
 *
 * wayback is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or
 * any later version.
 *
 * wayback is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Lesser Public License for more details.
 *
 * You should have received a copy of the GNU Lesser Public License
 * along with wayback; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 */
package org.archive.wayback.archivalurl;

import java.util.Properties;

import org.apache.commons.httpclient.URIException;
import org.archive.net.UURI;
import org.archive.net.UURIFactory;
import org.archive.wayback.WaybackConstants;
import org.archive.wayback.ReplayResultURIConverter;
import org.archive.wayback.core.SearchResult;
import org.archive.wayback.exception.ConfigurationException;

/**
 *
 *
 * @author brad
 * @version $Date: 2005/11/16 03:11:30 $, $Revision: 1.1 $
 */
public class ResultURIConverter implements ReplayResultURIConverter {
    private final static String REPLAY_URI_PREFIX_PROPERTY = "replayuriprefix";

    private String replayUriPrefix;

    /* (non-Javadoc)
     * @see org.archive.wayback.ReplayResultURIConverter#init(java.util.Properties)
     */
    public void init(Properties p) throws ConfigurationException {
		// the replay URI prefix is required configuration:
		replayUriPrefix = (String) p.get(REPLAY_URI_PREFIX_PROPERTY);
		if (replayUriPrefix == null || replayUriPrefix.length() <= 0) {
			throw new ConfigurationException("Failed to find "
					+ REPLAY_URI_PREFIX_PROPERTY);
		}
	}

	/* (non-Javadoc)
	 * @see org.archive.wayback.ReplayResultURIConverter#makeReplayURI(org.archive.wayback.core.SearchResult)
	 */
	public String makeReplayURI(SearchResult result) {
		return replayUriPrefix + "/"
				+ result.get(WaybackConstants.RESULT_CAPTURE_DATE) + "/"
				+ result.get(WaybackConstants.RESULT_URL);
	}

	/**
	 * @return Returns the replayUriPrefix.
	 */
	public String getReplayUriPrefix() {
		return replayUriPrefix;
	}

	/* (non-Javadoc)
	 * @see org.archive.wayback.ReplayResultURIConverter#makeRedirectReplayURI(org.archive.wayback.core.SearchResult, java.lang.String)
	 */
	public String makeRedirectReplayURI(SearchResult result, String url) {
		String finalUrl = url;
		try {
			UURI origURI = UURIFactory.getInstance(url);
			if (!origURI.isAbsoluteURI()) {
				// resolve a relative redirect against the captured URL:
				String resultUrl = result.get(WaybackConstants.RESULT_URL);
				UURI absResultURI = UURIFactory.getInstance(resultUrl);
				UURI finalURI = absResultURI.resolve(url);
				finalUrl = finalURI.getEscapedURI();
			}
		} catch (URIException e) {
			// fall back to the unresolved URL; the failure is only logged
			e.printStackTrace();
		}
		return replayUriPrefix + "/"
				+ result.get(WaybackConstants.RESULT_CAPTURE_DATE) + "/"
				+ finalUrl;
	}
}

--- NEW FILE: JSReplayRenderer.java ---
/* JSRenderer
 *
 * $Id: JSReplayRenderer.java,v 1.1 2005/11/16 03:11:30 bradtofel Exp $
 *
 * Created on 1:34:16 PM Nov 8, 2005.
 *
 * Copyright (C) 2005 Internet Archive.
 *
 * This file is part of wayback.
 *
 * wayback is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or
 * any later version.
* * wayback is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. * * You should have received a copy of the GNU Lesser Public License * along with wayback; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback.archivalurl; import java.io.IOException; import java.text.ParseException; import javax.servlet.ServletException; import javax.servlet.ServletOutputStream; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse; import org.archive.wayback.WaybackConstants; import org.archive.wayback.ReplayResultURIConverter; import org.archive.wayback.core.Resource; import org.archive.wayback.core.SearchResult; import org.archive.wayback.core.Timestamp; import org.archive.wayback.core.WaybackRequest; import org.archive.wayback.proxy.RawReplayRenderer; /** * * * @author brad * @version $Date: 2005/11/16 03:11:30 $, $Revision: 1.1 $ */ public class JSReplayRenderer extends RawReplayRenderer { private final static String TEXT_HTML_MIME = "text/html"; private boolean isRawReplayResult(SearchResult result) { if (-1 == result.get(WaybackConstants.RESULT_MIME_TYPE).indexOf( TEXT_HTML_MIME)) { return true; } return false; } private void redirectToBetterUrl(HttpServletResponse httpResponse, String url) throws IOException { httpResponse.sendRedirect(url); } public void renderResource(HttpServletRequest httpRequest, HttpServletResponse httpResponse, WaybackRequest wbRequest, SearchResult result, Resource resource, ReplayResultURIConverter uriConverter) throws ServletException, IOException { if (resource == null) { throw new IllegalArgumentException("No resource"); } if (result == null) { throw new IllegalArgumentException("No result"); } // redirect to actual date if diff than request: if 
(!wbRequest.get(WaybackConstants.REQUEST_EXACT_DATE).equals(
				result.get(WaybackConstants.RESULT_CAPTURE_DATE))) {
			String betterURI = uriConverter.makeReplayURI(result);
			redirectToBetterUrl(httpResponse, betterURI);
		} else {
			if (isRawReplayResult(result)) {
				super.renderResource(httpRequest, httpResponse, wbRequest,
						result, resource, uriConverter);
			} else {
				resource.parseHeaders();
				copyRecordHttpHeader(httpResponse, resource, uriConverter,
						result, false);

				// slurp the whole thing into RAM:
				byte[] bbuffer = new byte[4 * 1024];
				StringBuffer sbuffer = new StringBuffer();
				for (int r = -1;
						(r = resource.read(bbuffer, 0, bbuffer.length)) != -1;) {
					// decode only the r bytes actually read; decoding the
					// whole buffer and cutting at r characters miscounts
					// whenever a character spans more than one byte
					sbuffer.append(new String(bbuffer, 0, r));
				}
				markUpPage(sbuffer, result, uriConverter);
				// NOTE: this counts characters, not bytes
				httpResponse.setHeader("Content-Length", "" + sbuffer.length());
				ServletOutputStream out = httpResponse.getOutputStream();
				out.print(new String(sbuffer));
			}
		}
	}

	private void markUpPage(StringBuffer page, SearchResult result,
			ReplayResultURIConverter uriConverter) {
		// TODO deal with frames..
insertBaseTag(page, result); insertJavascript(page, result, uriConverter); } private void insertBaseTag(StringBuffer page, SearchResult result) { String resultUrl = result.get(WaybackConstants.RESULT_URL); String baseTag = "<BASE HREF=\"http://" + resultUrl + "\">"; int insertPoint = page.indexOf("<head>"); if (-1 == insertPoint) { insertPoint = page.indexOf("<HEAD>"); } if (-1 == insertPoint) { insertPoint = 0; } else { insertPoint += 6; // just after the tag } page.insert(insertPoint, baseTag); } private void insertJavascript(StringBuffer page, SearchResult result, ReplayResultURIConverter uriConverter) { String resourceTS = result.get(WaybackConstants.RESULT_CAPTURE_DATE); String nowTS; try { nowTS = Timestamp.currentTimestamp().getDateStr(); } catch (ParseException e) { nowTS = "UNKNOWN"; } String contextPath = uriConverter.getReplayUriPrefix() + "/" + resourceTS + "/"; String scriptInsert = "<SCRIPT language=\"Javascript\">\n" + "<!--\n" + "\n" + "// FILE ARCHIVED ON " + resourceTS + " AND RETRIEVED FROM THE\n" + "// INTERNET ARCHIVE ON " + nowTS + ".\n" + "// JAVASCRIPT APPENDED BY WAYBACK MACHINE, COPYRIGHT INTERNET ARCHIVE.\n" + "//\n" + "// ALL OTHER CONTENT MAY ALSO BE PROTECTED BY COPYRIGHT (17 U.S.C.\n" + "// SECTION 108(a)(3)).\n" + "\n" + " var sWayBackCGI = \"" + contextPath + "\";\n" + " \n" + "function xResolveUrl(url) {\n" + " var image = new Image();\n" + " image.src = url;\n" + " return image.src;\n" + "}\n" + "function xLateUrl(aCollection, sProp) {\n" + " var i = 0;\n" + " for(i = 0; i < aCollection.length; i++) {\n" + " if (typeof(aCollection[i][sProp]) == \"string\") {\n" + " if (aCollection[i][sProp].indexOf(\"mailto:\") == -1 &&\n" + " aCollection[i][sProp].indexOf(\"javascript:\") == -1) {\n" + " if(aCollection[i][sProp].indexOf(\"http\") == 0) {\n" + " aCollection[i][sProp] = sWayBackCGI + aCollection[i][sProp];\n" + " } else {\n" + " aCollection[i][sProp] = sWayBackCGI + xResolveUrl(aCollection[i][sProp]);\n" + " }\n" + " }\n" + " }\n" 
+ " }\n" + "}\n" + " \n" + " xLateUrl(document.getElementsByTagName(\"IMG\"),\"src\");\n" + " xLateUrl(document.getElementsByTagName(\"A\"),\"href\");\n" + " xLateUrl(document.getElementsByTagName(\"AREA\"),\"href\");\n" + " xLateUrl(document.getElementsByTagName(\"OBJECT\"),\"codebase\");\n" + " xLateUrl(document.getElementsByTagName(\"OBJECT\"),\"data\");\n" + " xLateUrl(document.getElementsByTagName(\"APPLET\"),\"codebase\");\n" + " xLateUrl(document.getElementsByTagName(\"APPLET\"),\"archive\");\n" + " xLateUrl(document.getElementsByTagName(\"EMBED\"),\"src\");\n" + " xLateUrl(document.getElementsByTagName(\"BODY\"),\"background\");\n" + "\n" + "// -->\n" + "\n" + "</SCRIPT>\n"; int insertPoint = page.indexOf("</body>"); if (-1 == insertPoint) { insertPoint = page.indexOf("</BODY>"); } if (-1 == insertPoint) { insertPoint = page.length(); } page.insert(insertPoint, scriptInsert); } } --- NEW FILE: ReplayFilter.java --- /* ReplayFilter * * $Id: ReplayFilter.java,v 1.1 2005/11/16 03:11:30 bradtofel Exp $ * * Created on 1:08:38 PM Nov 8, 2005. * * Copyright (C) 2005 Internet Archive. * * This file is part of wayback. * * wayback is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. * * wayback is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. 
* * You should have received a copy of the GNU Lesser Public License * along with wayback; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback.archivalurl; import java.text.ParseException; import java.util.regex.Matcher; import java.util.regex.Pattern; import javax.servlet.http.HttpServletRequest; import org.apache.commons.httpclient.URIException; import org.archive.net.UURI; import org.archive.net.UURIFactory; import org.archive.wayback.WaybackConstants; import org.archive.wayback.core.RequestFilter; import org.archive.wayback.core.Timestamp; import org.archive.wayback.core.WaybackRequest; /** * * * @author brad * @version $Date: 2005/11/16 03:11:30 $, $Revision: 1.1 $ */ public class ReplayFilter extends RequestFilter { private final Pattern WB_REQUEST_REGEX = Pattern .compile("^/(\\d{1,14})/(.*)$"); /** * Constructor */ public ReplayFilter() { super(); } public WaybackRequest parseRequest(HttpServletRequest httpRequest) { WaybackRequest wbRequest = null; Matcher matcher = null; String queryString = httpRequest.getQueryString(); String origRequestPath = httpRequest.getRequestURI(); if (queryString != null) { origRequestPath = httpRequest.getRequestURI() + "?" 
					+ queryString;
		}
		String contextPath = httpRequest.getContextPath();
		if (!origRequestPath.startsWith(contextPath)) {
			return null;
		}
		String requestPath = origRequestPath.substring(contextPath.length());

		matcher = WB_REQUEST_REGEX.matcher(requestPath);
		if (matcher != null && matcher.matches()) {
			wbRequest = new WaybackRequest();
			String dateStr = matcher.group(1);
			String urlStr = matcher.group(2);
			if (!urlStr.startsWith("http://")) {
				urlStr = "http://" + urlStr;
			}

			wbRequest.put(WaybackConstants.REQUEST_EXACT_DATE,dateStr);
			try {
				String startDate = Timestamp.earliestTimestamp().getDateStr();
				String endDate = Timestamp.currentTimestamp().getDateStr();
				wbRequest.put(WaybackConstants.REQUEST_START_DATE,startDate);
				wbRequest.put(WaybackConstants.REQUEST_END_DATE,endDate);
			} catch (ParseException e1) {
				e1.printStackTrace();
				return null;
			}

			wbRequest.put(WaybackConstants.REQUEST_TYPE,
					WaybackConstants.REQUEST_REPLAY_QUERY);

			String referer = httpRequest.getHeader("REFERER");
			if (referer == null) {
				// a Properties-backed request cannot hold a null value
				referer = "";
			}
			wbRequest.put(WaybackConstants.REQUEST_REFERER_URL,referer);

			try {
				UURI requestURI = UURIFactory.getInstance(urlStr);
				wbRequest.put(WaybackConstants.REQUEST_URL,
						requestURI.toString());
			} catch (URIException e) {
				wbRequest = null;
			}
		}
		return wbRequest;
	}
} |
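Note on the request paths handled by QueryFilter and ReplayFilter above: the following is a minimal sketch, not code from this commit, that applies the three regular expressions from those two classes to a context-relative path and reports which kind of request it would become. The class name ArchivalUrlPathSketch, the classify method and the "replay" label are assumptions made for illustration; "urlquery" and "urlprefixquery" are borrowed from the radio button values in requestform.jsp and may not match the actual WaybackConstants values. The real filters are separate classes, so the order in which they are tried depends on web.xml rather than on this method.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the commit: classifies archival-URL paths. */
final class ArchivalUrlPathSketch {
    // Patterns copied from QueryFilter and ReplayFilter.
    private static final Pattern WB_QUERY_REGEX =
            Pattern.compile("^/(\\d{0,13})\\*/(.*[^*])$");
    private static final Pattern WB_PATH_QUERY_REGEX =
            Pattern.compile("^/(\\d{0,13})\\*/(.*)\\*$");
    private static final Pattern WB_REQUEST_REGEX =
            Pattern.compile("^/(\\d{1,14})/(.*)$");

    /** Returns {type, date, url}, or null when no pattern matches. */
    static String[] classify(String requestPath) {
        Matcher m = WB_QUERY_REGEX.matcher(requestPath);
        if (m.matches()) {
            return describe("urlquery", m);
        }
        m = WB_PATH_QUERY_REGEX.matcher(requestPath);
        if (m.matches()) {
            return describe("urlprefixquery", m);
        }
        m = WB_REQUEST_REGEX.matcher(requestPath);
        if (m.matches()) {
            return describe("replay", m);
        }
        return null;
    }

    private static String[] describe(String type, Matcher m) {
        String url = m.group(2);
        // Same scheme defaulting as the filters: assume http:// when absent.
        if (!url.startsWith("http://")) {
            url = "http://" + url;
        }
        return new String[] { type, m.group(1), url };
    }
}
```

A path with a trailing asterisk after the URL is treated as a prefix query, a path with an asterisk only after the (possibly empty) date is a single-URL query, and a path with a plain 1 to 14 digit date is a replay request. Anything else yields null, which mirrors the filters returning a null WaybackRequest.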
From: Brad <bra...@us...> - 2005-11-16 03:11:42
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/rawreplayui
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/java/org/archive/wayback/rawreplayui

Removed Files:
	RawReplayUI.java
Log Message:
Massive overhaul decomposing into three main categories of changes:

1) All internal datatypes are now extensible (currently Properties, but should be Maps) including:
   a) WaybackRequest (was WBRequest)
   b) SearchResults (was ResourceResults)
   c) SearchResult (was ResourceResult)
   d) Resource
   so that there is no longer an assumption of Archival URL queries, or "CDX-style" index results. This will put more responsibility on the UI components to interrogate SearchResults to decide how to render, but should enable extension to data returned from Indexes, as well as allow far more flexibility in queries, predominantly geared towards free-text searching. This is still somewhat clunky, as there are no convenience accessor methods, so all users refer to constants when interacting with them.

2) Major cleanup of servlet and filter interaction with servlet container. ReplayUI and QueryUI are now just plain old servlets, and filters can be optionally added to allow non-CGI argument requests to be coerced into standard WaybackRequest objects.

3) Alternate "Proxy" Replay mode is now functional, and some work has been done towards an alternate Nutch ResourceIndex. Currently the web.xml contains example configurations for both Proxy and Archival Url replay modes, but the Proxy related configurations are commented out. Proxy mode *requires* changing the servlet context to ROOT. ArchivalUrl replay mode works as ROOT context and as any (I think) other context. There are some cosmetic double-slash issues to work out.

--- RawReplayUI.java DELETED --- |
From: Brad <bra...@us...> - 2005-11-16 03:11:41
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/jsp/QueryUI
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/webapp/jsp/QueryUI

Modified Files:
	requestform.jsp ErrorResult.jsp PathQueryResults.jsp QueryResults.jsp
Log Message:
Massive overhaul decomposing into three main categories of changes:

1) All internal datatypes are now extensible (currently Properties, but should be Maps) including:
   a) WaybackRequest (was WBRequest)
   b) SearchResults (was ResourceResults)
   c) SearchResult (was ResourceResult)
   d) Resource
   so that there is no longer an assumption of Archival URL queries, or "CDX-style" index results. This will put more responsibility on the UI components to interrogate SearchResults to decide how to render, but should enable extension to data returned from Indexes, as well as allow far more flexibility in queries, predominantly geared towards free-text searching. This is still somewhat clunky, as there are no convenience accessor methods, so all users refer to constants when interacting with them.

2) Major cleanup of servlet and filter interaction with servlet container. ReplayUI and QueryUI are now just plain old servlets, and filters can be optionally added to allow non-CGI argument requests to be coerced into standard WaybackRequest objects.

3) Alternate "Proxy" Replay mode is now functional, and some work has been done towards an alternate Nutch ResourceIndex. Currently the web.xml contains example configurations for both Proxy and Archival Url replay modes, but the Proxy related configurations are commented out. Proxy mode *requires* changing the servlet context to ROOT. ArchivalUrl replay mode works as ROOT context and as any (I think) other context. There are some cosmetic double-slash issues to work out.

Index: PathQueryResults.jsp =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/jsp/QueryUI/PathQueryResults.jsp,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** PathQueryResults.jsp 26 Oct 2005 01:13:30 -0000 1.3 --- PathQueryResults.jsp 16 Nov 2005 03:11:29 -0000 1.4 *************** *** 1,25 **** <%@ page import="java.util.Iterator" %> <%@ page import="java.util.ArrayList" %> ! <%@ page import="org.archive.wayback.core.ResourceResult" %> <%@ page import="org.archive.wayback.core.Timestamp" %> ! <%@ page import="org.archive.wayback.simplequeryui.UIResults" %> <jsp:include page="../../template/UI-header.jsp" /> <% ! UIResults results = (UIResults) request.getAttribute("ui-results"); String searchString = results.getSearchUrl(); ! int resultCount = results.getNumResults(); ! Timestamp searchStartTs = results.getStartTimestamp(); ! Timestamp searchEndTs = results.getEndTimestamp(); ! String prettySearchStart = searchStartTs.prettyDate(); ! String prettySearchEnd = searchEndTs.prettyDate(); Iterator itr = results.resultsIterator(); %> ! <B><%= resultCount %></B> results for <B><%= searchString %></B><BR> ! between <B><%= prettySearchStart %></B> and <B><%= prettySearchEnd %></B> ! <HR> <% --- 1,24 ---- <%@ page import="java.util.Iterator" %> <%@ page import="java.util.ArrayList" %> ! <%@ page import="org.archive.wayback.WaybackConstants" %> ! <%@ page import="org.archive.wayback.core.SearchResult" %> <%@ page import="org.archive.wayback.core.Timestamp" %> ! <%@ page import="org.archive.wayback.query.UIQueryResults" %> <jsp:include page="../../template/UI-header.jsp" /> <% ! UIQueryResults results = (UIQueryResults) request.getAttribute("ui-results"); String searchString = results.getSearchUrl(); ! int resultCount = results.getResultCount(); ! String prettySearchStart = results.prettySearchStartDate(); ! 
String prettySearchEnd = results.prettySearchEndDate(); Iterator itr = results.resultsIterator(); %> ! <b><%= resultCount %></b> results for <b><%= searchString %></b><br></br> ! between <b><%= prettySearchStart %></b> and <b><%= prettySearchEnd %></b> ! <hr></hr> <% *************** *** 28,40 **** String lastMD5 = null; while(itr.hasNext()) { ! ResourceResult result = (ResourceResult) itr.next(); ! String url = result.getUrl(); ! String prettyDate = result.getTimestamp().prettyDate(); ! String origHost = result.getOrigHost(); ! String MD5 = result.getMd5Fragment(); ! String redirectFlag = result.isRedirect() ? "(redirect)" : ""; ! String httpResponse = result.getHttpResponseCode(); ! String mimeType = result.getMimeType(); String replayUrl = results.resultToReplayUrl(result); --- 27,41 ---- String lastMD5 = null; while(itr.hasNext()) { ! SearchResult result = (SearchResult) itr.next(); ! String url = result.get(WaybackConstants.RESULT_URL); ! String prettyDate = result.get(WaybackConstants.RESULT_CAPTURE_DATE); ! String origHost = result.get(WaybackConstants.RESULT_ORIG_HOST); ! String MD5 = result.get(WaybackConstants.RESULT_MD5_DIGEST); ! String redirectFlag = (0 == result.get( ! WaybackConstants.RESULT_REDIRECT_URL).compareTo("-")) ! ? "" : "(redirect)"; ! String httpResponse = result.get(WaybackConstants.RESULT_HTTP_CODE); ! String mimeType = result.get(WaybackConstants.RESULT_MIME_TYPE); String replayUrl = results.resultToReplayUrl(result); *************** *** 52,56 **** if(newUrl) { %> ! <HR><B><%= url %></B><BR> <% } --- 53,57 ---- if(newUrl) { %> ! <hr></hr><b><%= url %></b><br></br> <% } *************** *** 60,70 **** %> ! <A HREF="<%= replayUrl %>"><%= prettyDate %></A> ! <SPAN style="color:black;"><%= origHost %></SPAN> ! <SPAN style="color:gray;"><%= httpResponse %></SPAN> ! <SPAN style="color:brown;"><%= mimeType %></SPAN> <%= redirectFlag %> (new version) ! <BR> <% --- 61,71 ---- %> ! <a href="<%= replayUrl %>"><%= prettyDate %></a> ! 
<span style="color:black;"><%= origHost %></span> ! <span style="color:gray;"><%= httpResponse %></span> ! <span style="color:brown;"><%= mimeType %></span> <%= redirectFlag %> (new version) ! <br></br> <% *************** *** 72,79 **** %> ! <A HREF="<%= replayUrl %>"><%= prettyDate %></A> ! <SPAN style="color:green;"><%= origHost %></SPAN> ! <SPAN style="color:lightgray;">unchanged</SPAN> ! <BR> <% --- 73,80 ---- %> ! <a href="<%= replayUrl %>"><%= prettyDate %></a> ! <span style="color:green;"><%= origHost %></span> ! <span style="color:lightgray;">unchanged</span> ! <br></br> <% Index: requestform.jsp =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/jsp/QueryUI/requestform.jsp,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** requestform.jsp 26 Oct 2005 01:14:00 -0000 1.2 --- requestform.jsp 16 Nov 2005 03:11:29 -0000 1.3 *************** *** 1,24 **** <jsp:include page="../../template/UI-header.jsp" /> ! <h2>Wayabck Search form:</h2> ! <p>The URL field is required. All date fields are optional.<br> ! To search for a single URL only, use the Query Type.<br> ! To search for all URLs beginning with a prefix URL, use PathQuery Type.<br> </p> <hr> <table> ! <FORM ACTION="../../query"> ! <tr><td>URL:</td><td><INPUT TYPE="TEXT" NAME="url" WIDTH="80"></td></tr> ! <tr><td>Exact Date:</td><td><INPUT TYPE="TEXT" NAME="date" WIDTH="80"></td></tr> ! <tr><td>Earliest Date:</td><td><INPUT TYPE="TEXT" NAME="earliest" WIDTH="80"></td></tr> ! <tr><td>Latest Date:</td><td><INPUT TYPE="TEXT" NAME="latest" WIDTH="80"></td></tr> <tr> <td>Type:</td> <td> ! Query <INPUT TYPE="RADIO" NAME="type" VALUE="query" CHECKED="YES"> ! PathQuery <INPUT TYPE="RADIO" NAME="type" VALUE="pathQuery"> </td> </tr> ! <tr><td colspan="2" align="left"><INPUT TYPE="SUBMIT" VALUE="Submit"></td></tr> ! 
</FORM> </table> <jsp:include page="../../template/UI-footer.jsp" /> --- 1,24 ---- <jsp:include page="../../template/UI-header.jsp" /> ! <h2>Wayback Search form:</h2> ! <p>The URL field is required. All date fields are optional.<br></br> ! To search for a single URL only, use the Query Type.<br></br> ! To search for all URLs beginning with a prefix URL, use PathQuery Type.<br></br> </p> <hr> <table> ! <form action="<%= request.getScheme() + "://" + request.getLocalName() + ":" + request.getLocalPort() + request.getContextPath() %>/query"> ! <tr><td>URL:</td><td><input type="TEXT" name="url" WIDTH="80"></td></tr> ! <tr><td>Exact Date:</td><td><input type="TEXT" name="exactdate" WIDTH="80"></td></tr> ! <tr><td>Earliest Date:</td><td><input type="TEXT" name="startdate" WIDTH="80"></td></tr> ! <tr><td>Latest Date:</td><td><input type="TEXT" name="enddate" WIDTH="80"></td></tr> <tr> <td>Type:</td> <td> ! Query <input type="RADIO" name="type" value="urlquery" CHECKED="YES"> ! PathQuery <input type="RADIO" name="type" value="urlprefixquery"> </td> </tr> ! <tr><td colspan="2" align="left"><input type="SUBMIT" value="Submit"></td></tr> ! </form> </table> <jsp:include page="../../template/UI-footer.jsp" /> Index: QueryResults.jsp =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/jsp/QueryUI/QueryResults.jsp,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** QueryResults.jsp 20 Oct 2005 00:40:41 -0000 1.2 --- QueryResults.jsp 16 Nov 2005 03:11:29 -0000 1.3 *************** *** 1,13 **** <%@ page import="java.util.Iterator" %> <%@ page import="java.util.ArrayList" %> ! <%@ page import="org.archive.wayback.core.ResourceResult" %> <%@ page import="org.archive.wayback.core.Timestamp" %> ! <%@ page import="org.archive.wayback.simplequeryui.UIResults" %> <jsp:include page="../../template/UI-header.jsp" /> <% ! 
UIResults results = (UIResults) request.getAttribute("ui-results"); String searchString = results.getSearchUrl(); ! int resultCount = results.getNumResults(); Timestamp searchStartTs = results.getStartTimestamp(); --- 1,14 ---- <%@ page import="java.util.Iterator" %> <%@ page import="java.util.ArrayList" %> ! <%@ page import="org.archive.wayback.WaybackConstants" %> ! <%@ page import="org.archive.wayback.core.SearchResult" %> <%@ page import="org.archive.wayback.core.Timestamp" %> ! <%@ page import="org.archive.wayback.query.UIQueryResults" %> <jsp:include page="../../template/UI-header.jsp" /> <% ! UIQueryResults results = (UIQueryResults) request.getAttribute("ui-results"); String searchString = results.getSearchUrl(); ! int resultCount = results.getResultCount(); Timestamp searchStartTs = results.getStartTimestamp(); *************** *** 19,25 **** %> ! <B><%= resultCount %></B> results for <B><%= searchString %></B><BR> ! between <B><%= prettySearchStart %></B> and <B><%= prettySearchEnd %></B> ! <HR> <% --- 20,26 ---- %> ! <b><%= resultCount %></b> results for <b><%= searchString %></b><br></br> ! between <b><%= prettySearchStart %></b> and <b><%= prettySearchEnd %></b> ! <hr></hr> <% *************** *** 27,38 **** String lastMD5 = null; while(itr.hasNext()) { ! ResourceResult result = (ResourceResult) itr.next(); ! String prettyDate = result.getTimestamp().prettyDate(); ! String origHost = result.getOrigHost(); ! String MD5 = result.getMd5Fragment(); ! String redirectFlag = result.isRedirect() ? "(redirect)" : ""; ! String httpResponse = result.getHttpResponseCode(); ! String mimeType = result.getMimeType(); String replayUrl = results.resultToReplayUrl(result); --- 28,42 ---- String lastMD5 = null; while(itr.hasNext()) { ! SearchResult result = (SearchResult) itr.next(); ! String url = result.get(WaybackConstants.RESULT_URL); ! String prettyDate = result.get(WaybackConstants.RESULT_CAPTURE_DATE); ! 
String origHost = result.get(WaybackConstants.RESULT_ORIG_HOST); ! String MD5 = result.get(WaybackConstants.RESULT_MD5_DIGEST); ! String redirectFlag = (0 == result.get( ! WaybackConstants.RESULT_REDIRECT_URL).compareTo("-")) ! ? "" : "(redirect)"; ! String httpResponse = result.get(WaybackConstants.RESULT_HTTP_CODE); ! String mimeType = result.get(WaybackConstants.RESULT_MIME_TYPE); String replayUrl = results.resultToReplayUrl(result); *************** *** 48,64 **** if(updated) { %> ! <A HREF="<%= replayUrl %>"><%= prettyDate %></A> ! <SPAN style="color:black;"><%= origHost %></SPAN> ! <SPAN style="color:gray;"><%= httpResponse %></SPAN> ! <SPAN style="color:brown;"><%= mimeType %></SPAN> <%= redirectFlag %> (new version) ! <BR> <% } else { %> ! <A HREF="<%= replayUrl %>"><%= prettyDate %></A> ! <SPAN style="color:green;"><%= origHost %></SPAN> ! <BR> <% } --- 52,68 ---- if(updated) { %> ! <a href="<%= replayUrl %>"><%= prettyDate %></a> ! <span style="color:black;"><%= origHost %></span> ! <span style="color:gray;"><%= httpResponse %></span> ! <span style="color:brown;"><%= mimeType %></span> <%= redirectFlag %> (new version) ! <br></br> <% } else { %> ! <a href="<%= replayUrl %>"><%= prettyDate %></a> ! <span style="color:green;"><%= origHost %></span> ! <br></br> <% } Index: ErrorResult.jsp =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/jsp/QueryUI/ErrorResult.jsp,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** ErrorResult.jsp 20 Oct 2005 00:40:41 -0000 1.2 --- ErrorResult.jsp 16 Nov 2005 03:11:29 -0000 1.3 *************** *** 1,3 **** <jsp:include page="../../template/UI-header.jsp" /> ! <B><%= (String) request.getAttribute("message") %></B> <jsp:include page="../../template/UI-footer.jsp" /> --- 1,12 ---- + <%@ page import="org.archive.wayback.exception.WaybackException" %> <jsp:include page="../../template/UI-header.jsp" /> ! <% ! ! 
WaybackException e = (WaybackException) request.getAttribute("exception"); ! ! %> ! ! <h2><%= (String) e.getTitle() %></h2> ! <p><b><%= (String) e.getMessage() %></b></p> ! <p><%= (String) e.getDetails() %></p> <jsp:include page="../../template/UI-footer.jsp" /> |
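Note on the JSP changes above: both result pages now read every field through result.get(WaybackConstants.…) and compute the redirect flag inline by comparing RESULT_REDIRECT_URL with "-". The following is a minimal sketch, not code from this commit, of the kind of convenience accessor the log message says is still missing. The class SearchResultSketch and its key strings are hypothetical (the real key values live in WaybackConstants and are not shown in this message), and a Map stands in for the Properties object mentioned in the log. Treating a missing redirect field as "not a redirect" is also an assumption; the JSP code as committed would throw a NullPointerException in that case.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a map-backed SearchResult with one accessor. */
final class SearchResultSketch {
    // Assumed key names for illustration only; see WaybackConstants for the
    // real values.
    static final String RESULT_URL = "url";
    static final String RESULT_REDIRECT_URL = "redirecturl";

    private final Map<String, String> data = new HashMap<String, String>();

    void put(String key, String value) {
        data.put(key, value);
    }

    String get(String key) {
        return data.get(key);
    }

    /** True when the index recorded a redirect target other than "-". */
    boolean isRedirect() {
        String target = get(RESULT_REDIRECT_URL);
        return target != null && !"-".equals(target);
    }

    /** Same text the JSPs print next to each capture. */
    String redirectFlag() {
        return isRedirect() ? "(redirect)" : "";
    }
}
```

With an accessor like this, the three-line conditional in PathQueryResults.jsp and QueryResults.jsp would reduce to a single call, and the "-" convention would live in one place.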
From: Brad <bra...@us...> - 2005-11-16 03:11:41
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/localresourcestore
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/java/org/archive/wayback/localresourcestore

Modified Files:
	LocalARCResourceStore.java
Log Message:
Massive overhaul decomposing into three main categories of changes:

1) All internal datatypes are now extensible (currently Properties, but should be Maps) including:
   a) WaybackRequest (was WBRequest)
   b) SearchResults (was ResourceResults)
   c) SearchResult (was ResourceResult)
   d) Resource
   so that there is no longer an assumption of Archival URL queries, or "CDX-style" index results. This will put more responsibility on the UI components to interrogate SearchResults to decide how to render, but should enable extension to data returned from Indexes, as well as allow far more flexibility in queries, predominantly geared towards free-text searching. This is still somewhat clunky, as there are no convenience accessor methods, so all users refer to constants when interacting with them.

2) Major cleanup of servlet and filter interaction with servlet container. ReplayUI and QueryUI are now just plain old servlets, and filters can be optionally added to allow non-CGI argument requests to be coerced into standard WaybackRequest objects.

3) Alternate "Proxy" Replay mode is now functional, and some work has been done towards an alternate Nutch ResourceIndex. Currently the web.xml contains example configurations for both Proxy and Archival Url replay modes, but the Proxy related configurations are commented out. Proxy mode *requires* changing the servlet context to ROOT. ArchivalUrl replay mode works as ROOT context and as any (I think) other context. There are some cosmetic double-slash issues to work out.

Index: LocalARCResourceStore.java =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/localresourcestore/LocalARCResourceStore.java,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** LocalARCResourceStore.java 21 Oct 2005 03:24:40 -0000 1.3 --- LocalARCResourceStore.java 16 Nov 2005 03:11:30 -0000 1.4 *************** *** 31,36 **** --- 31,39 ---- import org.archive.io.arc.ARCReader; import org.archive.io.arc.ARCReaderFactory; + import org.archive.wayback.WaybackConstants; import org.archive.wayback.ResourceStore; import org.archive.wayback.core.Resource; + import org.archive.wayback.core.SearchResult; + import org.archive.wayback.exception.ConfigurationException; /** *************** *** 43,48 **** private static final String RESOURCE_PATH = "arcpath"; - private static final String ARCTAIL = ".arc.gz"; - private String path = null; --- 46,49 ---- *************** *** 54,61 **** } ! public void init(Properties p) throws Exception { String configPath = (String) p.get(RESOURCE_PATH); if ((configPath == null) || (configPath.length() < 1)) { ! throw new IllegalArgumentException("Failed to find " + RESOURCE_PATH); } --- 55,62 ---- } ! public void init(Properties p) throws ConfigurationException { String configPath = (String) p.get(RESOURCE_PATH); if ((configPath == null) || (configPath.length() < 1)) { ! throw new ConfigurationException("Failed to find " + RESOURCE_PATH); } *************** *** 64,71 **** } ! public Resource retrieveResource(ARCLocation location) throws IOException { String arcName = location.getName(); ! if (!arcName.endsWith(ARCTAIL)) { ! arcName += ARCTAIL; } File arcFile = new File(arcName); --- 65,73 ---- } ! public Resource retrieveResource(SearchResult result) throws IOException { ! ARCLocation location = resultToARCLocation(result); String arcName = location.getName(); ! 
if (!arcName.endsWith(ARCReader.DOT_COMPRESSED_ARC_FILE_EXTENSION)) { ! arcName += ARCReader.DOT_COMPRESSED_ARC_FILE_EXTENSION; } File arcFile = new File(arcName); *************** *** 77,81 **** --- 79,87 ---- + arcFile.getAbsolutePath() + ")"); } else { + + // TODO: does this "just work" with HTTP 1.1 ranges? + // seems like we'd have to know the length for that to work.. ARCReader reader = ARCReaderFactory.get(arcFile); + Resource r = new Resource(reader.get(location.getOffset())); return r; *************** *** 83,86 **** --- 89,112 ---- } + public ARCLocation resultToARCLocation(SearchResult result) { + final String daArcName = result.get(WaybackConstants.RESULT_ARC_FILE); + final long daOffset = Long.parseLong(result.get( + WaybackConstants.RESULT_OFFSET)); + + return new ARCLocation() { + private String filename = daArcName; + + private long offset = daOffset; + + public String getName() { + return this.filename; + } + + public long getOffset() { + return this.offset; + } + }; + } + /** * @param args |
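The new retrieveResource(SearchResult) path in the diff above first turns a result into an ARC file name plus a byte offset, then appends the compressed-ARC extension if it is missing. A minimal, self-contained sketch of that mapping follows. It is not the committed code: java.util.Properties stands in for SearchResult, the Location interface is a hypothetical stand-in for org.archive.io.arc.ARCLocation, and ".arc.gz" is only assumed to be the value of ARCReader.DOT_COMPRESSED_ARC_FILE_EXTENSION (it is the value of the ARCTAIL constant this commit removes).

```java
import java.util.Properties;

class ArcLocationSketch {
    // Key names copied from WaybackConstants.RESULT_ARC_FILE / RESULT_OFFSET in this commit.
    static final String RESULT_ARC_FILE = "arcfile";
    static final String RESULT_OFFSET = "compressedoffset";

    // ASSUMPTION: stands in for ARCReader.DOT_COMPRESSED_ARC_FILE_EXTENSION;
    // ".arc.gz" is the value of the removed ARCTAIL constant.
    static final String ARC_EXTENSION = ".arc.gz";

    /** Hypothetical stand-in for org.archive.io.arc.ARCLocation. */
    interface Location {
        String getName();
        long getOffset();
    }

    /** Reads the ARC file name and compressed offset out of a result bag. */
    static Location toLocation(Properties result) {
        final String name = result.getProperty(RESULT_ARC_FILE);
        // Offsets are stored as strings in the bag, so they are parsed on the way out.
        final long offset = Long.parseLong(result.getProperty(RESULT_OFFSET));
        return new Location() {
            public String getName() { return name; }
            public long getOffset() { return offset; }
        };
    }

    /** Appends the extension only when the index stored a bare basename. */
    static String withArcExtension(String arcName) {
        return arcName.endsWith(ARC_EXTENSION) ? arcName : arcName + ARC_EXTENSION;
    }

    public static void main(String[] args) {
        Properties result = new Properties();
        result.setProperty(RESULT_ARC_FILE, "example-crawl-0001"); // made-up file name
        result.setProperty(RESULT_OFFSET, "4096");                 // made-up offset
        Location loc = toLocation(result);
        // prints "example-crawl-0001.arc.gz @ 4096"
        System.out.println(withArcExtension(loc.getName()) + " @ " + loc.getOffset());
    }
}
```

Because the offset travels as a string, a malformed index record surfaces as a NumberFormatException at lookup time rather than at index time; a real store would probably want to translate that into one of the Wayback exception types.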
From: Brad <bra...@us...> - 2005-11-16 03:11:41
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/jsreplayui
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/java/org/archive/wayback/jsreplayui

Removed Files:
    JSReplayUI.java
Log Message:
Massive overhaul decomposing into three main categories of changes:

1) All internal datatypes are now extensible (currently Properties, but should be Maps), including:
   a) WaybackRequest (was WMRequest)
   b) SearchResults (was ResourceResults)
   c) SearchResult (was ResourceResult)
   d) Resource
   so that there is no longer an assumption of Archival URL queries or "CDX-style" index results. This will put more responsibility on the UI components to interrogate SearchResults to decide how to render, but should enable extension to data returned from Indexes, as well as allow far more flexibility in queries, predominantly geared towards free-text searching. This is still somewhat clunky, as there are no convenience accessor methods, so all users refer to constants when interacting with them.

2) Major cleanup of servlet and filter interaction with the servlet container. ReplayUI and QueryUI are now just plain old servlets, and filters can be optionally added to allow non-CGI argument requests to be coerced into standard WaybackRequest objects.

3) Alternate "Proxy" Replay mode is now functional, and some work has been done towards an alternate Nutch ResourceIndex. Currently the web.xml contains example configurations for both Proxy and Archival Url replay modes, but the Proxy-related configurations are commented out. Proxy mode *requires* changing the servlet context to ROOT. ArchivalUrl replay mode works as ROOT context and as any (I think) other context. There are some cosmetic double-slash issues to work out.

--- JSReplayUI.java DELETED ---
From: Brad <bra...@us...> - 2005-11-16 03:11:41
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/java/org/archive/wayback

Modified Files:
    ResourceStore.java ResourceIndex.java
Added Files:
    WaybackConstants.java ReplayResultURIConverter.java QueryRenderer.java ReplayRenderer.java PropertyConfigurable.java
Removed Files:
    ReplayUI.java RequestParser.java QueryUI.java
Log Message:
Massive overhaul decomposing into three main categories of changes:

1) All internal datatypes are now extensible (currently Properties, but should be Maps), including:
   a) WaybackRequest (was WMRequest)
   b) SearchResults (was ResourceResults)
   c) SearchResult (was ResourceResult)
   d) Resource
   so that there is no longer an assumption of Archival URL queries or "CDX-style" index results. This will put more responsibility on the UI components to interrogate SearchResults to decide how to render, but should enable extension to data returned from Indexes, as well as allow far more flexibility in queries, predominantly geared towards free-text searching. This is still somewhat clunky, as there are no convenience accessor methods, so all users refer to constants when interacting with them.

2) Major cleanup of servlet and filter interaction with the servlet container. ReplayUI and QueryUI are now just plain old servlets, and filters can be optionally added to allow non-CGI argument requests to be coerced into standard WaybackRequest objects.

3) Alternate "Proxy" Replay mode is now functional, and some work has been done towards an alternate Nutch ResourceIndex. Currently the web.xml contains example configurations for both Proxy and Archival Url replay modes, but the Proxy-related configurations are commented out. Proxy mode *requires* changing the servlet context to ROOT. ArchivalUrl replay mode works as ROOT context and as any (I think) other context. There are some cosmetic double-slash issues to work out.

--- NEW FILE: PropertyConfigurable.java --- /* PropertyConfigurable * * $Id: PropertyConfigurable.java,v 1.1 2005/11/16 03:11:29 bradtofel Exp $ * * Created on 3:46:34 PM Nov 7, 2005. * * Copyright (C) 2005 Internet Archive. * * This file is part of wayback. * * wayback is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. * * wayback is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. * * You should have received a copy of the GNU Lesser Public License * along with wayback; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback; import java.util.Properties; import org.archive.wayback.exception.ConfigurationException; /** * * * @author brad * @version $Date: 2005/11/16 03:11:29 $, $Revision: 1.1 $ */ public interface PropertyConfigurable { /** * Initialize this Object. Pass in the specific * configurations via Properties. * * @param p * Generic properties bag for configurations * @throws ConfigurationException */ public void init(final Properties p) throws ConfigurationException; } --- QueryUI.java DELETED --- --- NEW FILE: ReplayRenderer.java --- /* ReplayRenderer * * $Id: ReplayRenderer.java,v 1.1 2005/11/16 03:11:29 bradtofel Exp $ * * Created on 5:27:09 PM Nov 1, 2005. * * Copyright (C) 2005 Internet Archive. * * This file is part of wayback. * * wayback is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. 
* * wayback is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. * * You should have received a copy of the GNU Lesser Public License * along with wayback; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback; import java.io.IOException; import javax.servlet.ServletException; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse; import org.archive.wayback.core.Resource; import org.archive.wayback.core.SearchResult; import org.archive.wayback.core.WaybackRequest; import org.archive.wayback.exception.WaybackException; /** * * * @author brad * @version $Date: 2005/11/16 03:11:29 $, $Revision: 1.1 $ */ public interface ReplayRenderer extends PropertyConfigurable { public void renderException(HttpServletRequest httpRequest, HttpServletResponse httpResponse, WaybackRequest wbRequest, WaybackException exception) throws ServletException, IOException; public void renderResource(HttpServletRequest httpRequest, HttpServletResponse httpResponse, WaybackRequest wbRequest, SearchResult result, Resource resource, ReplayResultURIConverter uriConverter) throws ServletException, IOException; } --- ReplayUI.java DELETED --- --- NEW FILE: QueryRenderer.java --- /* QueryRenderer * * $Id: QueryRenderer.java,v 1.1 2005/11/16 03:11:29 bradtofel Exp $ * * Created on 2:39:48 PM Nov 7, 2005. * * Copyright (C) 2005 Internet Archive. * * This file is part of wayback. * * wayback is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. 
* * wayback is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. * * You should have received a copy of the GNU Lesser Public License * along with wayback; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback; import java.io.IOException; import javax.servlet.ServletException; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse; import org.archive.wayback.core.SearchResults; import org.archive.wayback.core.WaybackRequest; import org.archive.wayback.exception.WaybackException; /** * * * @author brad * @version $Date: 2005/11/16 03:11:29 $, $Revision: 1.1 $ */ public interface QueryRenderer extends PropertyConfigurable { public void renderException(HttpServletRequest httpRequest, HttpServletResponse httpResponse, WaybackRequest wbRequest, WaybackException exception) throws ServletException, IOException; public void renderUrlResults(HttpServletRequest httpRequest, HttpServletResponse response, WaybackRequest wbRequest, SearchResults results, ReplayResultURIConverter uriConverter) throws ServletException, IOException; public void renderUrlPrefixResults(HttpServletRequest httpRequest, HttpServletResponse response, WaybackRequest wbRequest, SearchResults results, ReplayResultURIConverter uriConverter) throws ServletException, IOException; } Index: ResourceStore.java =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/ResourceStore.java,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** ResourceStore.java 19 Oct 2005 01:22:37 -0000 1.2 --- ResourceStore.java 16 Nov 2005 03:11:29 -0000 1.3 *************** *** 25,32 **** import 
java.io.IOException; - import java.util.Properties; - import org.archive.io.arc.ARCLocation; import org.archive.wayback.core.Resource; /** --- 25,31 ---- import java.io.IOException; import org.archive.wayback.core.Resource; + import org.archive.wayback.core.SearchResult; /** *************** *** 36,40 **** * @version $Date$, $Revision$ */ ! public interface ResourceStore { /** * Transform an ARCLocation into a Resource --- 35,39 ---- * @version $Date$, $Revision$ */ ! public interface ResourceStore extends PropertyConfigurable { /** * Transform an ARCLocation into a Resource *************** *** 44,57 **** * @throws IOException */ ! public Resource retrieveResource(ARCLocation location) throws IOException; - /** - * Initialize this ResourceStore. Pass in the specific configurations via - * Properties. - * - * @param p - * Generic properties bag for configurations - * @throws Exception - */ - public void init(Properties p) throws Exception; } --- 43,47 ---- * @throws IOException */ ! public Resource retrieveResource(SearchResult result) throws IOException; } --- RequestParser.java DELETED --- --- NEW FILE: WaybackConstants.java --- /* WaybackConstants * * $Id: WaybackConstants.java,v 1.1 2005/11/16 03:11:29 bradtofel Exp $ * * Created on 3:28:47 PM Nov 14, 2005. * * Copyright (C) 2005 Internet Archive. * * This file is part of wayback. * * wayback is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. * * wayback is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. 
* * You should have received a copy of the GNU Lesser Public License * along with wayback; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback; /** * * * @author brad * @version $Date: 2005/11/16 03:11:29 $, $Revision: 1.1 $ */ public interface WaybackConstants { /** * Request: exclude results captured before this 14-digit timestamp */ public static final String REQUEST_START_DATE = "startdate"; /** * Request: exclude results captured after this 14-digit timestamp */ public static final String REQUEST_END_DATE = "enddate"; /** * Request: (replay) find closest result to this 14-digit timestamp */ public static final String REQUEST_EXACT_DATE = "exactdate"; /** * Request: URL or URL prefix requested */ public static final String REQUEST_URL = "url"; /** * Request: URL of referrer, if supplied, or "" if not */ public static final String REQUEST_REFERER_URL = "refererurl"; /** * Request: defines type - urlquery, urlprefixquery, or replay */ public static final String REQUEST_TYPE = "type"; /** * Request: urlquery type request */ public static final String REQUEST_URL_QUERY = "urlquery"; /** * Request: urlprefixquery type request */ public static final String REQUEST_URL_PREFIX_QUERY = "urlprefixquery"; /** * Request: replay type request */ public static final String REQUEST_REPLAY_QUERY = "replay"; /** * Results: int first record of all matching returned, 1-based */ public static final String RESULTS_FIRST_RECORD = "firstrecord"; /** * Results: int first page of all matching pages to return, 1-based */ public static final String RESULTS_FIRST_PAGE = "firstpage"; /** * Results: boolean: "true"|"false" if there are more records matching * than those returned in the current SearchResults */ public static final String RESULTS_HAS_MORE = "hasmore"; /** * Result: URL of captured document */ public static final String RESULT_URL = "url"; /** * Result: 14-digit timestamp when document was captured */ public 
static final String RESULT_CAPTURE_DATE = "capturedate"; /** * Result: basename of ARC file containing this document. */ public static final String RESULT_ARC_FILE = "arcfile"; /** * Result: compressed byte offset within ARC file where this document's * gzip envelope begins. */ public static final String RESULT_OFFSET = "compressedoffset"; /** * Result: original exact host from which this document was captured. */ public static final String RESULT_ORIG_HOST = "originalhost"; /** * Result: best-guess at mime-type of this document. */ public static final String RESULT_MIME_TYPE = "mimetype"; /** * Result: 3-digit integer HTTP response code. may be '0' in some * fringe conditions, old ARCs, bug in crawler, etc. */ public static final String RESULT_HTTP_CODE = "httpresponsecode"; /** * Result: all or part of the 32-digit hexadecimal MD5 digest of this * document */ public static final String RESULT_MD5_DIGEST= "md5digest"; /** * Result: URL that this document redirected to, or '-' if it does * not redirect */ public static final String RESULT_REDIRECT_URL = "redirecturl"; } --- NEW FILE: ReplayResultURIConverter.java --- /* ReplayURI * * $Id: ReplayResultURIConverter.java,v 1.1 2005/11/16 03:11:29 bradtofel Exp $ * * Created on 5:20:43 PM Nov 1, 2005. * * Copyright (C) 2005 Internet Archive. * * This file is part of wayback. * * wayback is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. * * wayback is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. 
* * You should have received a copy of the GNU Lesser Public License * along with wayback; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback; import org.archive.wayback.core.SearchResult; /** * * * @author brad * @version $Date: 2005/11/16 03:11:29 $, $Revision: 1.1 $ */ public interface ReplayResultURIConverter extends PropertyConfigurable { /** * @param result * @return user-viewable String URL that will replay the ResourceResult */ public String makeReplayURI(final SearchResult result); public String makeRedirectReplayURI(final SearchResult result, String url); public String getReplayUriPrefix (); } Index: ResourceIndex.java =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/ResourceIndex.java,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** ResourceIndex.java 19 Oct 2005 01:22:37 -0000 1.2 --- ResourceIndex.java 16 Nov 2005 03:11:29 -0000 1.3 *************** *** 24,33 **** package org.archive.wayback; ! import java.io.IOException; ! import java.util.Properties; ! ! import org.archive.wayback.core.ResourceResults; ! import org.archive.wayback.core.WMRequest; ! import org.archive.wayback.exception.WaybackException; /** --- 24,32 ---- package org.archive.wayback; ! import org.archive.wayback.core.SearchResults; ! import org.archive.wayback.core.WaybackRequest; ! import org.archive.wayback.exception.BadQueryException; ! import org.archive.wayback.exception.ResourceIndexNotAvailableException; ! import org.archive.wayback.exception.ResourceNotInArchiveException; /** *************** *** 37,41 **** * @version $Date$, $Revision$ */ ! public interface ResourceIndex { /** * Transform a WMRequest into a ResourceResults. --- 36,40 ---- * @version $Date$, $Revision$ */ ! 
public interface ResourceIndex extends PropertyConfigurable { /** * Transform a WMRequest into a ResourceResults. *************** *** 45,62 **** * WMRequest * ! * @throws IOException ! * @throws WaybackException ! */ ! public ResourceResults query(final WMRequest request) throws IOException, ! WaybackException; ! ! /** ! * Initialize this ResourceIndex. Pass in the specific configurations via ! * Properties. ! * ! * @param p ! * Generic properties bag for configurations ! * @throws Exception */ ! public void init(Properties p) throws Exception; } --- 44,52 ---- * WMRequest * ! * @throws ResourceIndexNotAvailableException ! * @throws ResourceNotInArchiveException */ ! public SearchResults query(final WaybackRequest request) ! throws ResourceIndexNotAvailableException, ! ResourceNotInArchiveException, BadQueryException; } |
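The log message above notes that the new datatypes are plain property bags with no convenience accessors, so callers address every field through WaybackConstants. A minimal sketch of what that looks like from the caller's side is given below. It is an illustration only: java.util.Properties stands in for WaybackRequest, the key strings are copied from the WaybackConstants file in this commit, and the replayRequest and describe helpers are hypothetical names that do not exist in the codebase.

```java
import java.util.Properties;

class RequestBagSketch {
    // Key and value strings copied from WaybackConstants in this commit.
    static final String REQUEST_URL = "url";
    static final String REQUEST_TYPE = "type";
    static final String REQUEST_EXACT_DATE = "exactdate";
    static final String REQUEST_URL_QUERY = "urlquery";
    static final String REQUEST_URL_PREFIX_QUERY = "urlprefixquery";
    static final String REQUEST_REPLAY_QUERY = "replay";

    /** Hypothetical helper: builds a replay request as a bare property bag. */
    static Properties replayRequest(String url, String timestamp14) {
        Properties request = new Properties();
        request.setProperty(REQUEST_TYPE, REQUEST_REPLAY_QUERY);
        request.setProperty(REQUEST_URL, url);
        request.setProperty(REQUEST_EXACT_DATE, timestamp14);
        return request;
    }

    /** Hypothetical helper: dispatches on the request type, as a renderer or index would. */
    static String describe(Properties request) {
        String type = request.getProperty(REQUEST_TYPE, "");
        String url = request.getProperty(REQUEST_URL, "");
        if (type.equals(REQUEST_REPLAY_QUERY)) {
            return "replay " + url + " closest to " + request.getProperty(REQUEST_EXACT_DATE);
        } else if (type.equals(REQUEST_URL_QUERY)) {
            return "list captures of " + url;
        } else if (type.equals(REQUEST_URL_PREFIX_QUERY)) {
            return "list captures under prefix " + url;
        }
        throw new IllegalArgumentException("unknown request type: " + type);
    }

    public static void main(String[] args) {
        // The URL and timestamp are made-up sample values.
        Properties request = replayRequest("http://example.org/", "20051116031129");
        System.out.println(describe(request));
    }
}
```

The trade-off the log message calls "clunky" is visible here: a misspelled key compiles fine and simply returns null or the default, which is the price of letting new index types attach fields without changing the core classes.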
From: Brad <bra...@us...> - 2005-11-16 03:11:40
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/cdx
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/java/org/archive/wayback/cdx

Added Files:
    CDXRecord.java LocalBDBResourceIndex.java BDBResourceIndex.java
Log Message:
Massive overhaul decomposing into three main categories of changes:

1) All internal datatypes are now extensible (currently Properties, but should be Maps), including:
   a) WaybackRequest (was WMRequest)
   b) SearchResults (was ResourceResults)
   c) SearchResult (was ResourceResult)
   d) Resource
   so that there is no longer an assumption of Archival URL queries or "CDX-style" index results. This will put more responsibility on the UI components to interrogate SearchResults to decide how to render, but should enable extension to data returned from Indexes, as well as allow far more flexibility in queries, predominantly geared towards free-text searching. This is still somewhat clunky, as there are no convenience accessor methods, so all users refer to constants when interacting with them.

2) Major cleanup of servlet and filter interaction with the servlet container. ReplayUI and QueryUI are now just plain old servlets, and filters can be optionally added to allow non-CGI argument requests to be coerced into standard WaybackRequest objects.

3) Alternate "Proxy" Replay mode is now functional, and some work has been done towards an alternate Nutch ResourceIndex. Currently the web.xml contains example configurations for both Proxy and Archival Url replay modes, but the Proxy-related configurations are commented out. Proxy mode *requires* changing the servlet context to ROOT. ArchivalUrl replay mode works as ROOT context and as any (I think) other context. There are some cosmetic double-slash issues to work out.

--- NEW FILE: BDBResourceIndex.java --- /* BDBResourceIndex * * Created on 2005/10/18 14:00:00 * * Copyright (C) 2005 Internet Archive. 
* * This file is part of the Wayback Machine (crawler.archive.org). * * Wayback Machine is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. * * Wayback Machine is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. * * You should have received a copy of the GNU Lesser Public License * along with Wayback Machine; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback.cdx; import java.io.File; import java.text.ParseException; import java.util.Iterator; import org.archive.wayback.WaybackConstants; import org.archive.wayback.core.SearchResult; import org.archive.wayback.core.SearchResults; import com.sleepycat.je.Cursor; import com.sleepycat.je.Database; import com.sleepycat.je.DatabaseConfig; import com.sleepycat.je.DatabaseEntry; import com.sleepycat.je.DatabaseException; import com.sleepycat.je.Environment; import com.sleepycat.je.EnvironmentConfig; import com.sleepycat.je.LockMode; import com.sleepycat.je.OperationStatus; /** * ResourceResults-specific wrapper on top of the BDBJE database. 
* * @author Brad Tofel * @version $Date: 2005/11/16 03:11:30 $, $Revision: 1.1 $ */ public class BDBResourceIndex { private String path; private String dbName; Environment env = null; Database db = null; /** * Constructor * * @param thePath * directory where BDBJE files are stored * @param theDbName * name of BDB database * @throws DatabaseException */ public BDBResourceIndex(final String thePath, final String theDbName) throws DatabaseException { super(); initializeDB(thePath, theDbName); } protected void initializeDB(final String thePath, final String theDbName) throws DatabaseException { path = thePath; dbName = theDbName; EnvironmentConfig environmentConfig = new EnvironmentConfig(); environmentConfig.setAllowCreate(true); environmentConfig.setTransactional(false); File file = new File(path); env = new Environment(file, environmentConfig); DatabaseConfig databaseConfig = new DatabaseConfig(); databaseConfig.setAllowCreate(true); databaseConfig.setTransactional(false); // perform other database configurations db = env.openDatabase(null, dbName, databaseConfig); } /** * shut down the BDB. * * @throws DatabaseException */ public void shutdownDB() throws DatabaseException { if (db != null) { db.close(); } if (env != null) { env.close(); } } // TODO add additional "replay" search method which allows passing in of // an exact date, and use a "scrolling window" of the best results, to // allow for returning the N closest results to a particular date, within // a specific window of dates... 
protected SearchResults doUrlSearch(final String url, final String firstDate, final String lastDate, final String exactHost, final int startRecord, final int maxRecords) { SearchResults results = new SearchResults(); DatabaseEntry key = new DatabaseEntry(); DatabaseEntry value = new DatabaseEntry(); int numRecords = 0; int numSkipped = 0; String searchStart = url + " " + firstDate; key.setData(searchStart.getBytes()); key.setPartial(false); try { Cursor cursor = db.openCursor(null, null); OperationStatus status = cursor.getSearchKeyRange(key, value, LockMode.DEFAULT); while (status == OperationStatus.SUCCESS) { // String keyString = new String(key.getData()); String valueString = new String(value.getData()); CDXRecord parser = new CDXRecord(); parser.parseLine(valueString, 0); if (!parser.url.equals(url)) { break; } if (parser.captureDate.compareTo(lastDate) > 0) { break; } if (parser.captureDate.compareTo(firstDate) >= 0) { if (numSkipped >= startRecord) { results.addSearchResult(parser.toSearchResult()); numRecords++; if (numRecords >= maxRecords) { results.putFilter(WaybackConstants.RESULTS_HAS_MORE, "true"); break; } } else { numSkipped++; } } status = cursor.getNext(key, value, LockMode.DEFAULT); } cursor.close(); } catch (DatabaseException dbe) { // TODO: let this bubble up as Index error dbe.printStackTrace(); } catch (ParseException e) { // TODO: let this bubble up as Index error e.printStackTrace(); } return results; } protected SearchResults doUrlPrefixSearch(final String urlPrefix, final String firstDate, final String lastDate, final String exactHost, final int startRecord, final int maxRecords) { SearchResults results = new SearchResults(); DatabaseEntry key = new DatabaseEntry(); DatabaseEntry value = new DatabaseEntry(); int numRecords = 0; int numSkipped = 0; String searchStart = urlPrefix; key.setData(searchStart.getBytes()); key.setPartial(false); try { Cursor cursor = db.openCursor(null, null); OperationStatus status = 
cursor.getSearchKeyRange(key, value, LockMode.DEFAULT); while (status == OperationStatus.SUCCESS) { String valueString = new String(value.getData()); CDXRecord parser = new CDXRecord(); parser.parseLine(valueString, 0); if (!parser.url.startsWith(urlPrefix)) { break; } if ((parser.captureDate.compareTo(lastDate) <= 0) && (parser.captureDate.compareTo(firstDate) >= 0)) { if (numSkipped >= startRecord) { results.addSearchResult(parser.toSearchResult()); numRecords++; if (numRecords >= maxRecords) { // TODO should this be here?... results.putFilter(WaybackConstants.RESULTS_HAS_MORE, "true"); break; } } else { numSkipped++; } } status = cursor.getNext(key, value, LockMode.DEFAULT); } cursor.close(); } catch (DatabaseException dbe) { // TODO: let this bubble up as Index error dbe.printStackTrace(); } catch (ParseException e) { // TODO: let this bubble up as Index error e.printStackTrace(); } return results; } /** * Add all ResourceResult in results to BDB index * @param results * @throws Exception */ public void addResults(SearchResults results) throws Exception { Iterator itr = results.iterator(); DatabaseEntry key = new DatabaseEntry(); DatabaseEntry value = new DatabaseEntry(); OperationStatus status = null; CDXRecord parser = new CDXRecord(); try { Cursor cursor = db.openCursor(null, null); while (itr.hasNext()) { SearchResult result = (SearchResult) itr.next(); parser.fromSearchResult(result); String keyString = parser.toKey(); String valueString = parser.toValue(); key.setData(keyString.getBytes()); value.setData(valueString.getBytes()); status = cursor.put(key, value); if (status != OperationStatus.SUCCESS) { throw new Exception("oops, put had non-success status"); } } cursor.close(); } catch (DatabaseException e) { e.printStackTrace(); } } } --- NEW FILE: CDXRecord.java --- /* CDXRecord * * $Id: CDXRecord.java,v 1.1 2005/11/16 03:11:30 bradtofel Exp $ * * Created on 4:40:45 PM Nov 10, 2005. * * Copyright (C) 2005 Internet Archive. 
* * This file is part of wayback. * * wayback is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. * * wayback is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. * * You should have received a copy of the GNU Lesser Public License * along with wayback; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback.cdx; import java.text.ParseException; import org.archive.wayback.WaybackConstants; import org.archive.wayback.core.SearchResult; /** * * * @author brad * @version $Date: 2005/11/16 03:11:30 $, $Revision: 1.1 $ */ public class CDXRecord { public String url; public String captureDate; public String origHost = null; public String mimeType = null; public String httpResponseCode = null; public String md5Fragment = null; public String redirectUrl = null; public long compressedOffset = -1; public String arcFileName = null; public CDXRecord() { super(); } /** * Attempt to deserialize state from a single text line, fields delimited by * spaces. There are standard ways to do this, and this is not one of * them... for no good reason. 
* * @param line * @param lineNumber * @throws ParseException */ public void parseLine(final String line, final int lineNumber) throws ParseException { String[] tokens = line.split(" "); if (tokens.length != 9) { throw new ParseException(line, lineNumber); } url = tokens[0]; captureDate = tokens[1]; origHost = tokens[2]; mimeType = tokens[3]; httpResponseCode = tokens[4]; md5Fragment = tokens[5]; redirectUrl = tokens[6]; compressedOffset = Long.parseLong(tokens[7]); arcFileName = tokens[8]; } public SearchResult toSearchResult() { SearchResult result = new SearchResult(); result.put(WaybackConstants.RESULT_URL, url); result.put(WaybackConstants.RESULT_CAPTURE_DATE, captureDate); result.put(WaybackConstants.RESULT_ORIG_HOST, origHost); result.put(WaybackConstants.RESULT_MIME_TYPE, mimeType); result.put(WaybackConstants.RESULT_HTTP_CODE, httpResponseCode); result.put(WaybackConstants.RESULT_MD5_DIGEST, md5Fragment); result.put(WaybackConstants.RESULT_REDIRECT_URL, redirectUrl); // HACKHACK: result.put(WaybackConstants.RESULT_OFFSET, "" + compressedOffset); result.put(WaybackConstants.RESULT_ARC_FILE, arcFileName); return result; } public void fromSearchResult(final SearchResult result) { url = result.get(WaybackConstants.RESULT_URL); captureDate = result.get(WaybackConstants.RESULT_CAPTURE_DATE); origHost = result.get(WaybackConstants.RESULT_ORIG_HOST); mimeType = result.get(WaybackConstants.RESULT_MIME_TYPE); httpResponseCode = result.get(WaybackConstants.RESULT_HTTP_CODE); md5Fragment = result.get(WaybackConstants.RESULT_MD5_DIGEST); redirectUrl = result.get(WaybackConstants.RESULT_REDIRECT_URL); compressedOffset = Long.parseLong(result.get( WaybackConstants.RESULT_OFFSET)); arcFileName = result.get(WaybackConstants.RESULT_ARC_FILE); } public String toValue() { return url + " " + captureDate + " " + origHost + " " + mimeType + " " + httpResponseCode + " " + md5Fragment + " " + redirectUrl + " " + compressedOffset + " " + arcFileName; } public String toKey() { return 
url + " " + captureDate; } } --- NEW FILE: LocalBDBResourceIndex.java --- /* LocalBDBResourceIndex * * Created on 2005/10/18 14:00:00 * * Copyright (C) 2005 Internet Archive. * * This file is part of the Wayback Machine (crawler.archive.org). * * Wayback Machine is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. * * Wayback Machine is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. * * You should have received a copy of the GNU Lesser Public License * along with Wayback Machine; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback.cdx; import java.text.ParseException; import java.util.Properties; import org.apache.commons.httpclient.URIException; import org.archive.net.UURI; import org.archive.net.UURIFactory; import org.archive.wayback.ResourceIndex; import org.archive.wayback.WaybackConstants; import org.archive.wayback.cdx.indexer.IndexPipeline; import org.archive.wayback.core.Timestamp; import org.archive.wayback.core.SearchResults; import org.archive.wayback.core.WaybackRequest; import org.archive.wayback.exception.BadQueryException; import org.archive.wayback.exception.ConfigurationException; import org.archive.wayback.exception.ResourceIndexNotAvailableException; import org.archive.wayback.exception.ResourceNotInArchiveException; import com.sleepycat.je.DatabaseException; /** * Implements ResourceIndex interface using a BDBResourceIndex * * @author Brad Tofel * @version $Date: 2005/11/16 03:11:30 $, $Revision: 1.1 $ */ public class LocalBDBResourceIndex implements ResourceIndex { private final static String INDEX_PATH = 
"resourceindex.indexpath"; private final static String DB_NAME = "resourceindex.dbname"; private final static int MAX_RECORDS = 1000; private BDBResourceIndex db = null; private IndexPipeline pipeline = null; /** * Constructor */ public LocalBDBResourceIndex() { super(); } public void init(Properties p) throws ConfigurationException { System.out.println("initializing LocalDBDResourceIndex..."); String dbPath = (String) p.get(INDEX_PATH); if (dbPath == null || (dbPath.length() <= 0)) { throw new IllegalArgumentException("Failed to find " + INDEX_PATH); } String dbName = (String) p.get(DB_NAME); if (dbName == null || (dbName.length() <= 0)) { throw new IllegalArgumentException("Failed to find " + DB_NAME); } try { db = new BDBResourceIndex(dbPath, dbName); } catch (DatabaseException e) { e.printStackTrace(); throw new ConfigurationException(e.getMessage()); } pipeline = new IndexPipeline(); pipeline.init(p); } public SearchResults query(WaybackRequest wbRequest) throws ResourceIndexNotAvailableException, ResourceNotInArchiveException, BadQueryException { UURI searchURI; String searchHost; String searchPath; int resultsPerPage = wbRequest.getResultsPerPage(); int pageNum = wbRequest.getPageNum(); int startResult; String searchUrl = wbRequest.get(WaybackConstants.REQUEST_URL); String searchType = wbRequest.get(WaybackConstants.REQUEST_TYPE); String startDate = wbRequest.get(WaybackConstants.REQUEST_START_DATE); String endDate = wbRequest.get(WaybackConstants.REQUEST_END_DATE); if (resultsPerPage < 1) { throw new BadQueryException("resultsPerPage cannot be < 1"); } if (resultsPerPage > MAX_RECORDS) { throw new BadQueryException("resultsPerPage cannot be > " + MAX_RECORDS); } if(pageNum < 1) { throw new BadQueryException("pageNum must be > 0"); } startResult = (pageNum - 1) * resultsPerPage; if ((searchUrl == null) || (searchUrl.length() == 0)) { throw new BadQueryException(WaybackConstants.REQUEST_URL + " must be specified"); } if ((searchType == null) || 
(searchType.length() == 0)) { throw new BadQueryException(WaybackConstants.REQUEST_TYPE + " must be specified"); } if ((startDate == null) || (startDate.length() == 0)) { try { startDate = Timestamp.earliestTimestamp().getDateStr(); } catch (ParseException e) { e.printStackTrace(); throw new BadQueryException("unexpected data error " + e.getMessage()); } } if ((endDate == null) || (endDate.length() == 0)) { try { endDate = Timestamp.currentTimestamp().getDateStr(); } catch (ParseException e) { e.printStackTrace(); throw new BadQueryException("unexpected data error " + e.getMessage()); } } try { if (searchUrl.startsWith("http://")) { if (-1 == searchUrl.indexOf('/', 8)) { searchUrl = searchUrl + "/"; } } else { if (!searchUrl.contains("/")) { searchUrl = searchUrl + "/"; } searchUrl = "http://" + searchUrl; } searchURI = UURIFactory.getInstance(searchUrl); searchHost = searchURI.getHostBasename(); searchPath = searchURI.getEscapedPathQuery(); } catch (URIException e) { e.printStackTrace(); throw new BadQueryException("Problem with URI " + e.getMessage()); } String keyUrl = searchHost + searchPath; SearchResults results; if (searchType.equals(WaybackConstants.REQUEST_REPLAY_QUERY)) { results = db.doUrlSearch(keyUrl, startDate, endDate, null, startResult, resultsPerPage); } else if (searchType.equals(WaybackConstants.REQUEST_URL_QUERY)) { results = db.doUrlSearch(keyUrl, startDate, endDate, null, startResult, resultsPerPage); } else if (searchType.equals( WaybackConstants.REQUEST_URL_PREFIX_QUERY)) { results = db.doUrlPrefixSearch(keyUrl, startDate, endDate, null, startResult, resultsPerPage); } else { throw new BadQueryException("Unknown query type, must be " + WaybackConstants.REQUEST_REPLAY_QUERY + ", " + WaybackConstants.REQUEST_URL_QUERY + ", or " + WaybackConstants.REQUEST_URL_PREFIX_QUERY); } if(results.isEmpty()) { throw new ResourceNotInArchiveException("the URL " + keyUrl + " is not in the archive."); } results.putFilter(WaybackConstants.REQUEST_URL,keyUrl); 
results.putFilter(WaybackConstants.REQUEST_START_DATE,startDate); results.putFilter(WaybackConstants.REQUEST_END_DATE,endDate); results.putFilter(WaybackConstants.RESULTS_FIRST_RECORD,""+startResult); return results; } } |
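The paging arithmetic and URL defaulting in query() above can be tried in isolation. What follows is a minimal sketch under stated assumptions, not the committed code: the class name QueryPagingSketch and both method names are made up for illustration, and the UURI host/path canonicalization is left out entirely. The sketch looks for a path slash starting at index 7, directly after "http://", which is an assumption; the committed code starts at index 8.

```java
final class QueryPagingSketch {
    // Same cap as MAX_RECORDS in LocalBDBResourceIndex.
    static final int MAX_RECORDS = 1000;

    /** Converts a 1-based page number into the 0-based index of the first record. */
    public static int startResult(int pageNum, int resultsPerPage) {
        if (resultsPerPage < 1 || resultsPerPage > MAX_RECORDS) {
            throw new IllegalArgumentException(
                    "resultsPerPage must be between 1 and " + MAX_RECORDS);
        }
        if (pageNum < 1) {
            throw new IllegalArgumentException("pageNum must be > 0");
        }
        return (pageNum - 1) * resultsPerPage;
    }

    /** Prepends "http://" when missing and adds "/" when the URL has no path. */
    public static String withSchemeAndPath(String searchUrl) {
        if (searchUrl.startsWith("http://")) {
            // Hypothetical choice: scan from index 7, the first character of the host.
            if (searchUrl.indexOf('/', 7) == -1) {
                searchUrl = searchUrl + "/";
            }
        } else {
            if (searchUrl.indexOf('/') == -1) {
                searchUrl = searchUrl + "/";
            }
            searchUrl = "http://" + searchUrl;
        }
        return searchUrl;
    }

    public static void main(String[] args) {
        System.out.println(startResult(3, 10));
        System.out.println(withSchemeAndPath("archive.org"));
        System.out.println(withSchemeAndPath("http://archive.org"));
        System.out.println(withSchemeAndPath("archive.org/about/"));
    }
}
```

Keeping the page number 1-based at the request level and converting once to a 0-based offset means the index only ever sees (startResult, resultsPerPage), which is how doUrlSearch and doUrlPrefixSearch are called above.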
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/cdx/indexer In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/java/org/archive/wayback/cdx/indexer Added Files: BDBResourceIndexWriter.java ArcIndexer.java PipelineFilter.java PipelineStatus.java IndexPipeline.java Log Message: Massive overhaul decomposing into three main categories of changes: 1) All internal datatypes are now extensible (currently Properties, but should be Maps) including: a) WaybackRequest (was WBRequest) b) SearchResults (was ResourceResults) c) SearchResult (was ResourceResult) d) Resource so that there is no longer an assumption of Archival URL queries, or "CDX-style" index results. This will put more responsibility on the UI components to interrogate SearchResults to decide how to render, but should enable extension to data returned from Indexes, as well as allow far more flexibility in queries, predominantly geared towards free-text searching. This is still somewhat clunky, as there are no convenience accessor methods, so all users refer to constants when interacting with them. 2) Major cleanup of servlet and filter interaction with servlet container. ReplayUI and QueryUI are now just plain old servlets, and filters can be optionally added to allow non-CGI argument requests to be coerced into standard WaybackRequest objects. 3) Alternate "Proxy" Replay mode is now functional, and some work has been done towards an alternate Nutch ResourceIndex. Currently the web.xml contains example configurations for both Proxy and Archival Url replay modes, but the Proxy related configurations are commented out. Proxy mode *requires* changing the servlet context to ROOT. ArchivalUrl replay mode works as ROOT context and as any (I think) other context. There are some cosmetic double-slash issues to work out. 
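Point 1 of the log message (string bags keyed by constants, with no convenience accessors yet) can be pictured roughly as follows. This is only a sketch under assumptions: the class Result is a stand-in for SearchResult, the key strings are hypothetical (the real values live in WaybackConstants and are not shown in this mail), and getOffset() is an example of the kind of accessor the log message says does not exist yet.

```java
import java.util.Properties;

final class ExtensibleResultSketch {
    // Hypothetical key values; the real ones are defined in WaybackConstants.
    static final String RESULT_URL = "url";
    static final String RESULT_CAPTURE_DATE = "capturedate";
    static final String RESULT_OFFSET = "compressedoffset";

    /** Stand-in for SearchResult: an open-ended bag of string fields. */
    public static final class Result {
        private final Properties data = new Properties();

        public String get(String key) {
            return data.getProperty(key);
        }

        public void put(String key, String value) {
            data.setProperty(key, value);
        }

        /** Example convenience accessor layered over the constant-keyed bag. */
        public long getOffset() {
            return Long.parseLong(get(RESULT_OFFSET));
        }
    }

    public static void main(String[] args) {
        Result r = new Result();
        r.put(RESULT_URL, "archive.org/");
        r.put(RESULT_CAPTURE_DATE, "20051116031130");
        r.put(RESULT_OFFSET, "" + 4096L);
        // An index that returns extra data can add fields without a class change.
        r.put("score", "0.87");
        System.out.println(r.get(RESULT_URL) + " " + r.getOffset());
    }
}
```

The trade-off is the one the log message names: any index can attach new fields, but every caller has to know the key constants and parse the strings itself.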
--- NEW FILE: BDBResourceIndexWriter.java --- /* BDBResourceIndexWriter * * Created on 2005/10/18 14:00:00 * * Copyright (C) 2005 Internet Archive. * * This file is part of the Wayback Machine (crawler.archive.org). * * Wayback Machine is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. * * Wayback Machine is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. * * You should have received a copy of the GNU Lesser Public License * along with Wayback Machine; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback.cdx.indexer; import java.io.File; import java.io.RandomAccessFile; import org.archive.wayback.cdx.BDBResourceIndex; import org.archive.wayback.cdx.CDXRecord; import org.archive.wayback.core.SearchResult; import org.archive.wayback.core.SearchResults; import com.sleepycat.je.DatabaseException; /** * Implements updates to a BDBResourceIndex * * @author Brad Tofel * @version $Date: 2005/11/16 03:11:29 $, $Revision: 1.1 $ */ public class BDBResourceIndexWriter { // TODO: move to somewhere better... private final static String CDX_HEADER_MAGIC = " CDX "; private BDBResourceIndex db = null; /** * Constructor */ public BDBResourceIndexWriter() { super(); } protected void init(final String thePath, final String theDbName) throws Exception { db = new BDBResourceIndex(thePath, theDbName); } protected void init(BDBResourceIndex db) { this.db = db; } protected void shutdown() throws DatabaseException { db.shutdownDB(); } /** * reads all ResourceResult objects from CDX at filePath, and merges them * into the BDBResourceIndex. 
* * @param indexFile * to CDX file * @throws Exception */ public void importFile(File indexFile) throws Exception { SearchResults results = readFile(indexFile); db.addResults(results); } private SearchResults readFile(File indexFile) throws Exception { RandomAccessFile raFile = new RandomAccessFile(indexFile, "r"); SearchResults results = new SearchResults(); int lineNumber = 0; CDXRecord cdxRecord = new CDXRecord(); while (true) { String line = raFile.readLine(); if (line == null) { break; } lineNumber++; if ((lineNumber == 1) && (line.contains(CDX_HEADER_MAGIC))) { continue; } cdxRecord.parseLine(line, lineNumber); SearchResult result = cdxRecord.toSearchResult(); results.addSearchResult(result); } return results; } /** * @param args */ public static void main(String[] args) { try { BDBResourceIndexWriter idx = new BDBResourceIndexWriter(); idx.init(args[0], args[1]); idx.importFile(new File(args[2])); idx.shutdown(); } catch (Exception e) { e.printStackTrace(); } } } --- NEW FILE: ArcIndexer.java --- /* ArcIndexer * * Created on 2005/10/18 14:00:00 * * Copyright (C) 2005 Internet Archive. * * This file is part of the Wayback Machine (crawler.archive.org). * * Wayback Machine is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. * * Wayback Machine is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. 
* * You should have received a copy of the GNU Lesser Public License * along with Wayback Machine; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback.cdx.indexer; import java.io.File; import java.io.FileOutputStream; import java.io.IOException; import java.text.ParseException; import java.util.Iterator; import org.archive.io.arc.ARCReader; import org.archive.io.arc.ARCReaderFactory; import org.archive.io.arc.ARCRecord; import org.archive.io.arc.ARCRecordMetaData; import org.archive.net.UURI; import org.archive.wayback.WaybackConstants; import org.archive.wayback.core.SearchResult; import org.archive.wayback.core.SearchResults; import org.apache.commons.httpclient.Header; /** * Transforms an ARC file into ResourceResults, or a serialized ResourceResults * file(CDX). * * @author Brad Tofel * @version $Date: 2005/11/16 03:11:29 $, $Revision: 1.1 $ */ public class ArcIndexer { private final static String LOCATION_HTTP_HEADER = "Location"; private final static String CDX_HEADER_STRING = " CDX N b h m s k r V g"; /** * Constructor */ public ArcIndexer() { super(); } /** * Create a ResourceResults representing the records in ARC file at arcPath. * * @param arc * @return ResourceResults in arcPath. * @throws IOException */ public SearchResults indexArc(File arc) throws IOException { SearchResults results = new SearchResults(); ARCReader arcReader = ARCReaderFactory.get(arc); arcReader.setParseHttpHeaders(true); // doh. 
this does not generate quite the columns we need: // arcReader.createCDXIndexFile(arcPath); Iterator itr = arcReader.iterator(); while (itr.hasNext()) { ARCRecord rec = (ARCRecord) itr.next(); SearchResult result; try { result = arcRecordToSearchResult(rec, arc); } catch (NullPointerException e) { e.printStackTrace(); continue; } catch (ParseException e) { e.printStackTrace(); continue; } if(result != null) { results.addSearchResult(result); } } return results; } private SearchResult arcRecordToSearchResult(final ARCRecord rec, File arc) throws NullPointerException, IOException, ParseException { rec.close(); ARCRecordMetaData meta = rec.getMetaData(); SearchResult result = new SearchResult(); result.put(WaybackConstants.RESULT_ARC_FILE,arc.getName()); result.put(WaybackConstants.RESULT_OFFSET,""+meta.getOffset()); String statusCode = (meta.getStatusCode() == null) ? "-" : meta .getStatusCode(); result.put(WaybackConstants.RESULT_HTTP_CODE,statusCode); result.put(WaybackConstants.RESULT_MD5_DIGEST,meta.getDigest()); result.put(WaybackConstants.RESULT_MIME_TYPE,meta.getMimetype()); String uriStr = meta.getUrl(); if(uriStr.startsWith(ARCRecord.ARC_MAGIC_NUMBER)) { // skip filedesc record... 
return null; } UURI uri = new UURI(uriStr, false); result.put(WaybackConstants.RESULT_ORIG_HOST,uri.getHost()); String redirectUrl = "-"; Header[] headers = rec.getHttpHeaders(); if (headers != null) { for (int i = 0; i < headers.length; i++) { /* HTTP header names are case-insensitive */ if (headers[i].getName().equalsIgnoreCase(LOCATION_HTTP_HEADER)) { redirectUrl = headers[i].getValue(); break; } } } result.put(WaybackConstants.RESULT_REDIRECT_URL,redirectUrl); result.put(WaybackConstants.RESULT_CAPTURE_DATE,meta.getDate()); UURI uriCap = new UURI(meta.getUrl(), false); String searchHost = uriCap.getHostBasename(); String searchPath = uriCap.getEscapedPathQuery(); String indexUrl = searchHost + searchPath; result.put(WaybackConstants.RESULT_URL,indexUrl); return result; } /** * Write out SearchResults into the CDX file at target * * @param results * @param target * @throws IOException */ public void serializeResults(final SearchResults results, File target) throws IOException { FileOutputStream output = new FileOutputStream(target); try { output.write((CDX_HEADER_STRING + "\n").getBytes()); Iterator itr = results.iterator(); while (itr.hasNext()) { SearchResult result = (SearchResult) itr.next(); output.write((result.toString() + "\n").getBytes()); } } finally { /* the stream is not closed when it falls out of scope, so close it explicitly */ output.close(); } } /** * @param args */ public static void main(String[] args) { ArcIndexer indexer = new ArcIndexer(); File arc = new File(args[0]); File cdx = new File(args[1]); try { SearchResults results = indexer.indexArc(arc); indexer.serializeResults(results, cdx); } catch (Exception e) { e.printStackTrace(); } } } --- NEW FILE: PipelineStatus.java --- /* PipelineStatus * * Created on Oct 20, 2005 * * Copyright (C) 2005 Internet Archive. * * This file is part of the wayback (crawler.archive.org). * * wayback is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. 
* * wayback is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. * * You should have received a copy of the GNU Lesser Public License * along with wayback; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback.cdx.indexer; /** * Data bag for handing off status of Pipeline to PipelineStatus.jsp. * * @author brad * @version $Date: 2005/11/16 03:11:29 $, $Revision: 1.1 $ */ public class PipelineStatus { private String numQueuedForIndex; private String numQueuedForMerge; /** * Constructor */ public PipelineStatus() { super(); // TODO Auto-generated constructor stub } /** * @return Returns the numQueuedForIndex. */ public String getNumQueuedForIndex() { return numQueuedForIndex; } /** * @param numQueuedForIndex * The numQueuedForIndex to set. */ public void setNumQueuedForIndex(String numQueuedForIndex) { this.numQueuedForIndex = numQueuedForIndex; } /** * @return Returns the numQueuedForMerge. */ public String getNumQueuedForMerge() { return numQueuedForMerge; } /** * @param numQueuedForMerge * The numQueuedForMerge to set. */ public void setNumQueuedForMerge(String numQueuedForMerge) { this.numQueuedForMerge = numQueuedForMerge; } /** * @param args */ public static void main(String[] args) { // TODO Auto-generated method stub } } --- NEW FILE: IndexPipeline.java --- /* IndexPipeline * * Created on 2005/10/18 14:00:00 * * Copyright (C) 2005 Internet Archive. * * This file is part of the Wayback Machine (crawler.archive.org). * * Wayback Machine is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. 
* * Wayback Machine is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. * * You should have received a copy of the GNU Lesser Public License * along with Wayback Machine; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback.cdx.indexer; import java.io.File; import java.io.IOException; import java.net.MalformedURLException; import java.util.ArrayList; import java.util.Iterator; import java.util.Properties; import org.archive.wayback.PropertyConfigurable; import org.archive.wayback.cdx.BDBResourceIndex; import org.archive.wayback.core.SearchResults; import org.archive.wayback.exception.ConfigurationException; import com.sleepycat.je.DatabaseException; import com.sun.org.apache.xml.internal.utils.StringToStringTable; /** * Implements indexing of new ARC files, and merging with a BDBResourceIndex. * Assumes LocalBDBResourceIndex and LocalARCResourceStore for now. * Maintains state using directories and files for now. * * There are 3 primary components, each could be a thread, but the steps are * run in serial for the moment: * 1) watch for new ARC files, and queue them for indexing * 2) index queued ARC files into CDX format, queue the CDX files for merging. * 3) merge queued CDX files with the ResourceIndex. 
* * @author Brad Tofel * @version $Date: 2005/11/16 03:11:29 $, $Revision: 1.1 $ */ public class IndexPipeline implements PropertyConfigurable{ private final static String RUN_PIPELINE = "indexpipeline.runpipeline"; private final static String INDEX_PATH = "resourceindex.indexpath"; private final static String DB_NAME = "resourceindex.dbname"; private final static String ARC_PATH = "arcpath"; private final static String WORK_PATH = "indexpipeline.workpath"; private final static String QUEUED_DIR = "queued"; private final static String TO_BE_INDEXED_DIR = "toBeIndexed"; private final static String INDEXING_DIR = "indexing"; private final static String TO_BE_MERGED_DIR = "toBeMerged"; private File arcDir = null; private File workDir = null; private File queuedDir = null; private File toBeIndexedDir = null; private File indexingDir = null; private File toBeMergedDir = null; private BDBResourceIndex db = null; private static Thread indexUpdateThread = null; /** * Constructor */ public IndexPipeline() { super(); } private void ensureDir(File dir) throws IOException { if (!dir.isDirectory() && !dir.mkdirs()) { throw new IOException("FAILED to create " + dir.getAbsolutePath()); } } /** * Initialize this object, creating directories if needed, and starting * thread if configured. * * @param p configuration * @throws IOException */ public void init(Properties p) throws ConfigurationException { // where do we find ARC files? String arcPath = (String) p.get(ARC_PATH); if (arcPath == null || (arcPath.length() <= 0)) { throw new IllegalArgumentException("Failed to find " + ARC_PATH); } // where is the BDB? (and what is it named?) String dbPath = (String) p.get(INDEX_PATH); if (dbPath == null || (dbPath.length() <= 0)) { throw new IllegalArgumentException("Failed to find " + INDEX_PATH); } String dbName = (String) p.get(DB_NAME); if (dbName == null || (dbName.length() <= 0)) { throw new IllegalArgumentException("Failed to find " + DB_NAME); } // where do we keep working files? 
String workPath = (String) p.get(WORK_PATH); if (workPath == null || (workPath.length() <= 0)) { throw new IllegalArgumentException("Failed to find " + WORK_PATH); } arcDir = new File(arcPath); workDir = new File(workPath); queuedDir = new File(workDir,QUEUED_DIR); toBeIndexedDir = new File(workDir,TO_BE_INDEXED_DIR); indexingDir = new File(workDir,INDEXING_DIR); toBeMergedDir = new File(workDir,TO_BE_MERGED_DIR); try { ensureDir(workDir); ensureDir(queuedDir); ensureDir(toBeIndexedDir); ensureDir(indexingDir); ensureDir(toBeMergedDir); File dbFile = new File(dbPath); ensureDir(dbFile); } catch (IOException e) { e.printStackTrace(); throw new ConfigurationException(e.getMessage()); } String runPipeline = (String) p.get(RUN_PIPELINE); try { db = new BDBResourceIndex(dbPath, dbName); } catch (DatabaseException e) { e.printStackTrace(); throw new ConfigurationException(e.getMessage()); } if ((runPipeline != null) && (runPipeline.equals("1"))) { // TODO: Logger! System.out.println("LocalDBDResourceIndex starting pipeline " + "thread..."); if (indexUpdateThread == null) { startIndexPipelineThread(db); } } } private synchronized void startIndexPipelineThread( final BDBResourceIndex bdb) { if (indexUpdateThread != null) { return; } indexUpdateThread = new IndexPipelineThread(bdb, this); indexUpdateThread.start(); } private StringToStringTable getQueuedFiles() { StringToStringTable hash = new StringToStringTable(); String entries[] = queuedDir.list(); for (int i = 0; i < entries.length; i++) { hash.put(entries[i], "i"); } return hash; } private Iterator getDirFilesIterator(File dir) { String files[] = dir.list(); ArrayList list = new ArrayList(); if (files != null) { for (int i = 0; i < files.length; i++) { File file = new File(dir, files[i]); if (file.isFile()) { list.add(files[i]); } } } return list.iterator(); } // this should be a method call into ResourceStore... 
private Iterator getNewArcs() { StringToStringTable queued = getQueuedFiles(); ArrayList newArcs = new ArrayList(); String arcs[] = arcDir.list(); if (arcs != null) { for (int i = 0; i < arcs.length; i++) { File arc = new File(arcDir,arcs[i]); if(arc.isFile() && arcs[i].endsWith(".arc.gz")) { if (!queued.contains(arcs[i])) { newArcs.add(arcs[i]); } } } } return newArcs.iterator(); } private void queueArcForIndex(final String newArc) throws IOException { File newQueuedFile = new File(queuedDir,newArc); File newToBeIndexedFile = new File(toBeIndexedDir,newArc); newToBeIndexedFile.createNewFile(); newQueuedFile.createNewFile(); } /** * Find any new ARC files and queue them for indexing. * @throws IOException */ public void queueNewArcsForIndex() throws IOException { Iterator newArcs = getNewArcs(); while(newArcs.hasNext()) { String newArc = (String) newArcs.next(); queueArcForIndex(newArc); } } /** * Index any ARC files queued for indexing, queueing the resulting CDX files * for merging with the BDBResourceIndex. 
* * @param indexer * @throws MalformedURLException * @throws IOException */ public void indexArcs(ArcIndexer indexer) throws MalformedURLException, IOException { Iterator toBeIndexed = getDirFilesIterator(toBeIndexedDir); while(toBeIndexed.hasNext()) { String base = (String) toBeIndexed.next(); File arcFile = new File(arcDir,base); File toBeIndexedFlagFile = new File(toBeIndexedDir,base); File indexFile = new File(indexingDir,base); File toBeMergedFile = new File(toBeMergedDir,base); SearchResults res = indexer.indexArc(arcFile); indexer.serializeResults(res, indexFile); if (!indexFile.renameTo(toBeMergedFile)) { throw new IOException("Unable to move " + indexFile.getAbsolutePath() + " to " + toBeMergedFile.getAbsolutePath()); } if (!toBeIndexedFlagFile.delete()) { throw new IOException("Unable to delete " + toBeIndexedFlagFile.getAbsolutePath()); } } } /** * Add any new CDX files in toBeMergedDir to the BDB, deleting the CDX * files as they are merged * @param dbWriter */ public void mergeIndex(BDBResourceIndexWriter dbWriter) { int numMerged = 0; Iterator toBeMerged = getDirFilesIterator(toBeMergedDir); while(toBeMerged.hasNext()) { File indexFile = new File(toBeMergedDir,(String) toBeMerged.next()); try { dbWriter.importFile(indexFile); if (!indexFile.delete()) { throw new IOException("Unable to unlink " + indexFile.getAbsolutePath()); } numMerged++; } catch (Exception e) { e.printStackTrace(); } } if (numMerged > 0) { System.out.println("Merged " + numMerged + " files."); } } /** * Gather a snapshot of the pipeline in a PipelineStatus object. * @return PipelineStatus */ public PipelineStatus getStatus() { PipelineStatus status = new PipelineStatus(); String index[] = toBeIndexedDir.list(); String merge[] = toBeMergedDir.list(); String numQueuedForIndex = (index == null) ? "0" : "" + index.length; String numQueuedForMerge = (merge == null) ? 
"0" : "" + merge.length; status.setNumQueuedForIndex(numQueuedForIndex); status.setNumQueuedForMerge(numQueuedForMerge); return status; } /** * @param args */ public static void main(String[] args) { } /** * Thread that repeatedly runs processing of an IndexPipeline and merges new * data into a BDBResourceIndex * * @author Brad Tofel * @version $Date: 2005/11/16 03:11:29 $, $Revision: 1.1 $ */ private class IndexPipelineThread extends Thread { private final static int SLEEP_MILLISECONDS = 10000; private BDBResourceIndexWriter merger = null; private ArcIndexer indexer = new ArcIndexer(); IndexPipeline pipeline = null; /** * Constructor * * @param bdb * initialized BDBResourceIndex * @param pipeline * initialized IndexPipeline */ public IndexPipelineThread(final BDBResourceIndex bdb, IndexPipeline pipeline) { super("IndexPipelineThread"); super.setDaemon(true); merger = new BDBResourceIndexWriter(); merger.init(bdb); this.pipeline = pipeline; System.out.print("Pipeline Thread is ALIVE!"); } public void run() { while (true) { try { pipeline.queueNewArcsForIndex(); pipeline.indexArcs(indexer); pipeline.mergeIndex(merger); sleep(SLEEP_MILLISECONDS); } catch (InterruptedException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } } } } } --- NEW FILE: PipelineFilter.java --- /* PipeLineServletFilter * * Created on Oct 20, 2005 * * Copyright (C) 2005 Internet Archive. * * This file is part of the wayback (crawler.archive.org). * * wayback is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. * * wayback is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. 
* * You should have received a copy of the GNU Lesser Public License * along with wayback; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback.cdx.indexer; import java.io.IOException; import java.util.Enumeration; import java.util.Properties; import javax.servlet.Filter; import javax.servlet.FilterChain; import javax.servlet.FilterConfig; import javax.servlet.RequestDispatcher; import javax.servlet.ServletContext; import javax.servlet.ServletException; import javax.servlet.ServletRequest; import javax.servlet.ServletResponse; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse; import org.archive.wayback.exception.ConfigurationException; /** * @author brad * */ public class PipelineFilter implements Filter { private final String PIPELINE_STATUS_JSP = "pipeline.statusjsp"; private IndexPipeline pipeline = null; private String pipelineStatusJsp = null; /** * Constructor */ public PipelineFilter() { super(); } public void init(FilterConfig c) throws ServletException { Properties p = new Properties(); pipelineStatusJsp = c.getInitParameter(PIPELINE_STATUS_JSP); if ((pipelineStatusJsp == null) || (pipelineStatusJsp.length() <= 0)) { throw new ServletException("No config (" + PIPELINE_STATUS_JSP + ")"); } ServletContext sc = c.getServletContext(); for (Enumeration e = sc.getInitParameterNames(); e.hasMoreElements();) { String key = (String) e.nextElement(); p.put(key, sc.getInitParameter(key)); } pipeline = new IndexPipeline(); try { pipeline.init(p); } catch (ConfigurationException e) { e.printStackTrace(); throw new ServletException(e.getMessage()); } } /* * (non-Javadoc) * * @see javax.servlet.Filter#doFilter(javax.servlet.ServletRequest, * javax.servlet.ServletResponse, javax.servlet.FilterChain) */ public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException { if 
(!handle(request, response)) { chain.doFilter(request, response); } } protected boolean handle(final ServletRequest request, final ServletResponse response) throws IOException, ServletException { if (!(request instanceof HttpServletRequest)) { return false; } if (!(response instanceof HttpServletResponse)) { return false; } HttpServletRequest httpRequest = (HttpServletRequest) request; PipelineStatus status = pipeline.getStatus(); request.setAttribute("pipelinestatus", status); RequestDispatcher dispatcher = httpRequest .getRequestDispatcher(pipelineStatusJsp); dispatcher.forward(request, response); return true; } /* * (non-Javadoc) * * @see javax.servlet.Filter#destroy() */ public void destroy() { } } |
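The directory-based state that IndexPipeline keeps (flag files in queued/ and toBeIndexed/, work in indexing/, finished CDX files renamed into toBeMerged/) can be sketched on its own. This is a simplified illustration under assumptions, not the committed class: PipelineStagesSketch and its method names are hypothetical, the ARC names are passed in as a list instead of being read from an ARC directory, and the "indexing" step writes a placeholder line rather than calling ArcIndexer.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PipelineStagesSketch {
    final File queued;
    final File toBeIndexed;
    final File indexing;
    final File toBeMerged;

    public PipelineStagesSketch(File workDir) throws IOException {
        queued = ensureDir(new File(workDir, "queued"));
        toBeIndexed = ensureDir(new File(workDir, "toBeIndexed"));
        indexing = ensureDir(new File(workDir, "indexing"));
        toBeMerged = ensureDir(new File(workDir, "toBeMerged"));
    }

    static File ensureDir(File dir) throws IOException {
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("FAILED to create " + dir.getAbsolutePath());
        }
        return dir;
    }

    /** Stage 1: flag each ARC name not seen before; returns the newly queued names. */
    public List<String> queueNew(List<String> arcNames) throws IOException {
        List<String> added = new ArrayList<>();
        for (String name : arcNames) {
            File seen = new File(queued, name);
            if (!name.endsWith(".arc.gz") || seen.exists()) {
                continue;
            }
            new File(toBeIndexed, name).createNewFile();
            seen.createNewFile();
            added.add(name);
        }
        return added;
    }

    /** Stage 2: write an index in indexing/, rename it into toBeMerged/, drop the flag. */
    public int indexQueued() throws IOException {
        String[] names = toBeIndexed.list();
        int done = 0;
        if (names == null) {
            return done;
        }
        for (String name : names) {
            File work = new File(indexing, name);
            // Placeholder content; the real pipeline serializes CDX lines here.
            Files.write(work.toPath(), " CDX placeholder\n".getBytes(StandardCharsets.UTF_8));
            if (!work.renameTo(new File(toBeMerged, name))) {
                throw new IOException("Unable to move " + work.getAbsolutePath());
            }
            if (!new File(toBeIndexed, name).delete()) {
                throw new IOException("Unable to delete flag for " + name);
            }
            done++;
        }
        return done;
    }

    public static void main(String[] args) throws IOException {
        File work = Files.createTempDirectory("pipeline-sketch").toFile();
        PipelineStagesSketch p = new PipelineStagesSketch(work);
        List<String> arcs = Arrays.asList("a.arc.gz", "b.arc.gz", "notes.txt");
        System.out.println("queued: " + p.queueNew(arcs));
        System.out.println("queued again: " + p.queueNew(arcs));
        System.out.println("indexed: " + p.indexQueued());
    }
}
```

Writing into indexing/ and only renaming into toBeMerged/ once the file is complete means the merge step should never pick up a half-written CDX file, and the permanent flag in queued/ keeps an ARC from being indexed twice.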
From: Brad <bra...@us...> - 2005-11-16 03:11:40
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/WEB-INF In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/webapp/WEB-INF Modified Files: web.xml Log Message: Massive overhaul decomposing into three main categories of changes: 1) All internal datatypes are now extensible (currently Properties, but should be Maps) including: a) WaybackRequest (was WBRequest) b) SearchResults (was ResourceResults) c) SearchResult (was ResourceResult) d) Resource so that there is no longer an assumption of Archival URL queries, or "CDX-style" index results. This will put more responsibility on the UI components to interrogate SearchResults to decide how to render, but should enable extension to data returned from Indexes, as well as allow far more flexibility in queries, predominantly geared towards free-text searching. This is still somewhat clunky, as there are no convenience accessor methods, so all users refer to constants when interacting with them. 2) Major cleanup of servlet and filter interaction with servlet container. ReplayUI and QueryUI are now just plain old servlets, and filters can be optionally added to allow non-CGI argument requests to be coerced into standard WaybackRequest objects. 3) Alternate "Proxy" Replay mode is now functional, and some work has been done towards an alternate Nutch ResourceIndex. Currently the web.xml contains example configurations for both Proxy and Archival Url replay modes, but the Proxy related configurations are commented out. Proxy mode *requires* changing the servlet context to ROOT. ArchivalUrl replay mode works as ROOT context and as any (I think) other context. There are some cosmetic double-slash issues to work out. 
Index: web.xml =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/WEB-INF/web.xml,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** web.xml 26 Oct 2005 01:17:13 -0000 1.4 --- web.xml 16 Nov 2005 03:11:30 -0000 1.5 *************** *** 4,7 **** --- 4,11 ---- <web-app> + <!-- Local Arc Path Configuration: + used by both indexpipeline and LocalARCResourceStore + --> + <context-param> <param-name>arcpath</param-name> *************** *** 15,61 **** - <!-- ReplayUI Configuration --> - - <context-param> - <param-name>replayui.class</param-name> - <!-- - <param-value>org.archive.wayback.rawreplayui.RawReplayUI</param-value> - --> - <param-value>org.archive.wayback.jsreplayui.JSReplayUI</param-value> - <description>Class that implements ReplayUI for this Wayback</description> - </context-param> - - <context-param> - <param-name>replayui.jsppath</param-name> - <param-value>jsp/ReplayUI</param-value> - <description> - RawReplayUI specific path to jsp pages. relative to webapp/ - </description> - </context-param> - - - - <!-- QueryUI Configuration --> - - <context-param> - <param-name>queryui.class</param-name> - <param-value>org.archive.wayback.simplequeryui.SimpleQueryUI</param-value> - <description>Class that implements QueryUI for this Wayback</description> - </context-param> - - <context-param> - <param-name>queryui.jsppath</param-name> - <param-value>jsp/QueryUI</param-value> - <description> - SimpleQueryUI specific path to jsp pages. relative to webapp/ - </description> - </context-param> - - - <!-- ResourceStore Configuration --> <context-param> ! <param-name>resourcestore.class</param-name> <param-value>org.archive.wayback.localresourcestore.LocalARCResourceStore</param-value> <description>Class that implements ResourceStore for this Wayback</description> --- 19,26 ---- <!-- ResourceStore Configuration --> <context-param> ! 
<param-name>resourcestore.classname</param-name> <param-value>org.archive.wayback.localresourcestore.LocalARCResourceStore</param-value> <description>Class that implements ResourceStore for this Wayback</description> *************** *** 67,72 **** <context-param> ! <param-name>resourceindex.class</param-name> ! <param-value>org.archive.wayback.localbdbresourceindex.LocalBDBResourceIndex</param-value> <description>Class that implements ResourceIndex for this Wayback</description> </context-param> --- 32,37 ---- <context-param> ! <param-name>resourceindex.classname</param-name> ! <param-value>org.archive.wayback.cdx.LocalBDBResourceIndex</param-value> <description>Class that implements ResourceIndex for this Wayback</description> </context-param> *************** *** 89,92 **** --- 54,61 ---- </context-param> + + + <!-- ResourceIndex Pipeline Configuration --> + <context-param> <param-name>indexpipeline.workpath</param-name> *************** *** 106,148 **** </context-param> ! ! <!-- Replay Servlet Configuration --> ! ! <servlet> ! <servlet-name>ReplayServlet</servlet-name> ! <servlet-class>org.archive.wayback.servletglue.WBReplayUIServlet</servlet-class> ! </servlet> ! <servlet-mapping> ! <servlet-name>ReplayServlet</servlet-name> ! <url-pattern>/replay</url-pattern> ! </servlet-mapping> ! ! <!-- Replay Filter Configuration --> <filter> ! <filter-name>RetrievalFilter</filter-name> ! <filter-class>org.archive.wayback.servletglue.RequestFilter</filter-class> ! ! <init-param> ! <param-name>requestparser.class</param-name> ! <param-value>org.archive.wayback.rawreplayui.RawReplayUI</param-value> ! </init-param> <init-param> ! <param-name>handler.url</param-name> ! <param-value>/replay</param-value> </init-param> </filter> <filter-mapping> ! <filter-name>RetrievalFilter</filter-name> ! <url-pattern>/*</url-pattern> </filter-mapping> <!-- Query Servlet Configuration --> <servlet> <servlet-name>QueryServlet</servlet-name> ! 
<servlet-class>org.archive.wayback.servletglue.WBQueryUIServlet</servlet-class> </servlet> <servlet-mapping> --- 75,104 ---- </context-param> ! <!-- Pipeline Filter Configuration ! this enables a trival (and very in-progress) UI for viewing the ! pipeline status. ! --> <filter> ! <filter-name>PipelineFilter</filter-name> ! <filter-class>org.archive.wayback.cdx.indexer.PipelineFilter</filter-class> <init-param> ! <param-name>pipeline.statusjsp</param-name> ! <param-value>jsp/PipelineUI/PipelineStatus.jsp</param-value> </init-param> </filter> <filter-mapping> ! <filter-name>PipelineFilter</filter-name> ! <url-pattern>/pipeline</url-pattern> </filter-mapping> + <!-- Query Servlet Configuration --> <servlet> <servlet-name>QueryServlet</servlet-name> ! <servlet-class>org.archive.wayback.query.QueryServlet</servlet-class> </servlet> <servlet-mapping> *************** *** 151,169 **** </servlet-mapping> ! <!-- Query Filter Configuration --> <filter> <filter-name>QueryFilter</filter-name> ! <filter-class>org.archive.wayback.servletglue.RequestFilter</filter-class> <init-param> - <param-name>requestparser.class</param-name> - <param-value>org.archive.wayback.simplequeryui.SimpleQueryUI</param-value> - </init-param> - <init-param> <param-name>handler.url</param-name> <param-value>/query</param-value> </init-param> - </filter> <filter-mapping> --- 107,152 ---- </servlet-mapping> ! <!-- QueryUI Configuration --> ! ! <context-param> ! <param-name>queryrenderer.classname</param-name> ! <param-value>org.archive.wayback.query.Renderer</param-value> ! <description>Implementation responsible for drawing Index Query results</description> ! </context-param> ! ! <context-param> ! <param-name>queryui.jsppath</param-name> ! <param-value>jsp/QueryUI</param-value> ! <description> ! SimpleQueryUI specific path to jsp pages. relative to webapp/ ! </description> ! </context-param> ! ! ! ! ! <!-- Replay Servlet Configuration --> ! ! <servlet> ! <servlet-name>ReplayServlet</servlet-name> ! 
<servlet-class>org.archive.wayback.replay.ReplayServlet</servlet-class> ! </servlet> ! <servlet-mapping> ! <servlet-name>ReplayServlet</servlet-name> ! <url-pattern>/replay</url-pattern> ! </servlet-mapping> ! ! ! ! <!-- Archival Url Query Filter Configuration --> <filter> <filter-name>QueryFilter</filter-name> ! <filter-class>org.archive.wayback.archivalurl.QueryFilter</filter-class> <init-param> <param-name>handler.url</param-name> <param-value>/query</param-value> </init-param> </filter> <filter-mapping> *************** *** 171,191 **** <url-pattern>/*</url-pattern> </filter-mapping> ! ! ! <!-- Pipeline Filter Configuration --> <filter> ! <filter-name>PipelineFilter</filter-name> ! <filter-class>org.archive.wayback.arcindexer.PipelineFilter</filter-class> <init-param> ! <param-name>pipeline.statusjsp</param-name> ! <param-value>jsp/PipelineUI/PipelineStatus.jsp</param-value> </init-param> </filter> <filter-mapping> ! <filter-name>PipelineFilter</filter-name> ! <url-pattern>/pipeline</url-pattern> </filter-mapping> ! ! </web-app> --- 154,250 ---- <url-pattern>/*</url-pattern> </filter-mapping> ! ! ! <!-- Archival Url Replay Filter Configuration --> <filter> ! <filter-name>ReplayFilter</filter-name> ! <filter-class>org.archive.wayback.archivalurl.ReplayFilter</filter-class> ! <init-param> ! <param-name>handler.url</param-name> ! <param-value>/replay</param-value> </init-param> </filter> <filter-mapping> ! <filter-name>ReplayFilter</filter-name> ! <url-pattern>/*</url-pattern> </filter-mapping> ! ! ! <!-- Archival Url URI Conversion Configuration --> ! ! <context-param> ! <param-name>replayuriconverter.classname</param-name> ! <param-value>org.archive.wayback.archivalurl.ResultURIConverter</param-value> ! <description>Class that implements translation of index results to Replayable URIs for this Wayback</description> ! </context-param> ! ! <context-param> ! <param-name>replayuriprefix</param-name> ! <param-value>http://localhost:8080/wayback</param-value> ! 
<description>HTTP URI prefix for the replay UI</description> ! </context-param> ! ! ! <!-- Archival Url JSReplayUI Configuration --> ! ! <context-param> ! <param-name>replayrenderer.classname</param-name> ! <param-value>org.archive.wayback.archivalurl.JSReplayRenderer</param-value> ! <description>Implementation responsible for drawing replayed resources and replay error messages</description> ! </context-param> ! ! <context-param> ! <param-name>replayui.jsppath</param-name> ! <param-value>jsp/ReplayUI</param-value> ! <description> ! RawReplayUI specific path to jsp pages. relative to webapp/ ! </description> ! </context-param> ! ! ! <!-- Proxy RawReplayUI Configuration --> ! <!-- ! ! <context-param> ! <param-name>replayrenderer.classname</param-name> ! <param-value>org.archive.wayback.proxy.RawReplayRenderer</param-value> ! <description>Implementation responsible for drawing replayed resources and replay error messages</description> ! </context-param> ! ! <context-param> ! <param-name>replayui.jsppath</param-name> ! <param-value>jsp/ReplayUI</param-value> ! <description> ! RawReplayUI specific path to jsp pages. relative to webapp/ ! </description> ! </context-param> ! --> ! <!-- Proxy URI Conversion Configuration --> ! <!-- ! ! <context-param> ! <param-name>replayuriconverter.classname</param-name> ! <param-value>org.archive.wayback.proxy.ResultURIConverter</param-value> ! <description>Class that implements translation of index results to Replayable URIs for this Wayback</description> ! </context-param> ! --> ! <!-- Proxy ReplayFilter Configuration --> ! <!-- ! <filter> ! <filter-name>ReplayFilter</filter-name> ! <filter-class>org.archive.wayback.proxy.ReplayFilter</filter-class> ! ! <init-param> ! <param-name>handler.url</param-name> ! <param-value>/replay</param-value> ! </init-param> ! </filter> ! <filter-mapping> ! <filter-name>ReplayFilter</filter-name> ! <url-pattern>/*</url-pattern> ! </filter-mapping> ! --> ! </web-app> |
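The web.xml above selects implementations through "*.classname" context parameters (resourcestore.classname, resourceindex.classname, queryrenderer.classname and so on). How WaybackLogic actually resolves them is not shown in this message, so the following is only a minimal sketch of one common way to do it, assuming the parameters have already been copied into a Properties object as the servlets do. The class name ClassnameLoaderSketch and the method instantiate are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper, not part of wayback: shows one way a "*.classname"
// parameter could be turned into an instance via reflection.
class ClassnameLoaderSketch {

    static Object instantiate(Properties p, String key) {
        String className = p.getProperty(key);
        if (className == null || className.isEmpty()) {
            // Assumption: a missing parameter is treated as a configuration error.
            throw new IllegalArgumentException("Failed to find " + key);
        }
        try {
            // Requires the named class to have a public no-argument constructor.
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Unable to create " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        // java.util.ArrayList stands in for a real ResourceStore implementation.
        p.setProperty("resourcestore.classname", "java.util.ArrayList");
        System.out.println(instantiate(p, "resourcestore.classname").getClass().getName());
    }
}
```

With this approach, switching between the Archival Url and Proxy configurations would only require changing the parameter values, which matches the commented-out alternatives in the file.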
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/query In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/java/org/archive/wayback/query Added Files: Renderer.java UIQueryResults.java OpenSearchQueryParser.java QueryServlet.java Log Message: Massive overhaul decomposing into three main categories of changes: 1) All internal datatypes are now extensible (currently Properties, but should be Maps) including: a) WaybackRequest(was WBRequest) b) SearchResults (was ResourceResults) c) SearchResult (was ResourceResult) d) Resource so that there is no longer an assumption of Archival URL queries, or "CDX-style" index results. This will put more responsiblility on the UI components to interrogate SearchResults to decide how to render, but should enable extension to data returned from Indexes, as well as allow far more flexibility in queries, predominantly geared towards free-text searching. This is still somewhat clunky, as there are no convenience accessor methods, so all users refer to constants when interacting with them. 2) Major cleanup of servlet and filter interaction with servlet container. ReplayUI and QueryUI are now just plain old servlets, and filters can be optionally added to allow non-CGI argument requests to be coerced into standard WaybackRequest objects. 3) Alternate "Proxy" Replay mode is now functional, and some work has been done towards an alternate Nutch ResourceIndex. Currently the web.xml contains example configurations for both Proxy and Archival Url replay modes, but the Proxy related configurations are commented out. Proxy mode *requires* changing the servlet context to ROOT. ArchivalUrl replay mode works as ROOT context and as any (I think) other context. There are some cosmetic double-slashe issues to work out. --- NEW FILE: UIQueryResults.java --- /* UIQueryResults * * $Id: UIQueryResults.java,v 1.1 2005/11/16 03:11:30 bradtofel Exp $ * * Created on 12:03:14 PM Nov 8, 2005. 
* * Copyright (C) 2005 Internet Archive. * * This file is part of wayback. * * wayback is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. * * wayback is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. * * You should have received a copy of the GNU Lesser Public License * along with wayback; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback.query; import java.text.ParseException; import java.util.Iterator; import org.archive.wayback.WaybackConstants; import org.archive.wayback.ReplayResultURIConverter; import org.archive.wayback.core.SearchResult; import org.archive.wayback.core.SearchResults; import org.archive.wayback.core.Timestamp; import org.archive.wayback.core.WaybackRequest; /** * * * @author brad * @version $Date: 2005/11/16 03:11:30 $, $Revision: 1.1 $ */ public class UIQueryResults { private String searchUrl; private Timestamp startTimestamp; private Timestamp endTimestamp; private Timestamp firstResultTimestamp; private Timestamp lastResultTimestamp; private int resultCount; private SearchResults results; private ReplayResultURIConverter uriConverter; /** * @param wmRequest * @param results * @param request * @param replayUI * @throws ParseException */ public UIQueryResults(WaybackRequest wbRequest, SearchResults results, ReplayResultURIConverter uriConverter) throws ParseException { this.searchUrl = wbRequest.get(WaybackConstants.RESULT_URL); this.startTimestamp = Timestamp.parseBefore(results. 
getFilter(WaybackConstants.REQUEST_START_DATE)); this.endTimestamp = Timestamp.parseBefore(results.getFilter( WaybackConstants.REQUEST_END_DATE)); this.firstResultTimestamp = Timestamp.parseBefore(results .getFirstResultDate()); this.lastResultTimestamp = Timestamp.parseBefore(results .getLastResultDate()); this.resultCount = results.getResultCount(); this.results = results; this.uriConverter = uriConverter; } /** * @return Timestamp end cutoff requested by user */ public Timestamp getEndTimestamp() { return endTimestamp; } /** * @return first Timestamp in returned ResourceResults */ public Timestamp getFirstResultTimestamp() { return firstResultTimestamp; } /** * @return last Timestamp in returned ResourceResults */ public Timestamp getLastResultTimestamp() { return lastResultTimestamp; } /** * @return number of SearchResult objects in response */ public int getResultCount() { return resultCount; } /** * @return URL or URL prefix requested by user */ public String getSearchUrl() { return searchUrl; } /** * @return Timestamp start cutoff requested by user */ public Timestamp getStartTimestamp() { return startTimestamp; } /** * @return Iterator of ResourceResults */ public Iterator resultsIterator() { return results.iterator(); } /** * @param result * @return URL string that will replay the specified Resource Result. */ public String resultToReplayUrl(SearchResult result) { return uriConverter.makeReplayURI(result); } public String prettySearchEndDate() { return endTimestamp.prettyDate(); } public String prettySearchStartDate() { return startTimestamp.prettyDate(); } } --- NEW FILE: QueryServlet.java --- /* QueryServlet * * $Id: QueryServlet.java,v 1.1 2005/11/16 03:11:30 bradtofel Exp $ * * Created on 2:42:50 PM Nov 7, 2005. * * Copyright (C) 2005 Internet Archive. * * This file is part of wayback. 
* * wayback is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. * * wayback is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. * * You should have received a copy of the GNU Lesser Public License * along with wayback; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback.query; import java.io.IOException; import java.util.Enumeration; import java.util.Iterator; import java.util.Map; import java.util.Properties; import java.util.Set; import javax.servlet.ServletConfig; import javax.servlet.ServletContext; import javax.servlet.ServletException; import javax.servlet.http.HttpServlet; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse; import org.archive.wayback.WaybackConstants; import org.archive.wayback.QueryRenderer; import org.archive.wayback.ReplayResultURIConverter; import org.archive.wayback.ResourceIndex; import org.archive.wayback.core.SearchResults; import org.archive.wayback.core.WaybackLogic; import org.archive.wayback.core.WaybackRequest; import org.archive.wayback.exception.BadQueryException; import org.archive.wayback.exception.WaybackException; /** * * * @author brad * @version $Date: 2005/11/16 03:11:30 $, $Revision: 1.1 $ */ public class QueryServlet extends HttpServlet { /** * */ private static final String WMREQUEST_ATTRIBUTE = "wmrequest.attribute"; private static final long serialVersionUID = 1L; private WaybackLogic wayback = new WaybackLogic(); /** * Constructor */ public QueryServlet() { super(); } public void init(ServletConfig c) throws ServletException { Properties p = new 
Properties(); for (Enumeration e = c.getInitParameterNames(); e.hasMoreElements();) { String key = (String) e.nextElement(); p.put(key, c.getInitParameter(key)); } ServletContext sc = c.getServletContext(); for (Enumeration e = sc.getInitParameterNames(); e.hasMoreElements();) { String key = (String) e.nextElement(); p.put(key, sc.getInitParameter(key)); } // TODO initialize renderer try { wayback.init(p); } catch (Exception e) { throw new ServletException(e.getMessage()); } } private String getMapParam(Map queryMap, String field) { String arr[] = (String[]) queryMap.get(field); if (arr == null || arr.length == 0) { return null; } return arr[0]; } public WaybackRequest parseCGIRequest(HttpServletRequest httpRequest) throws BadQueryException { WaybackRequest wbRequest = new WaybackRequest(); Map queryMap = httpRequest.getParameterMap(); Set keys = queryMap.keySet(); Iterator itr = keys.iterator(); while(itr.hasNext()) { String key = (String) itr.next(); String val = getMapParam(queryMap,key); wbRequest.put(key,val); } return wbRequest; } public void doGet(HttpServletRequest httpRequest, HttpServletResponse httpResponse) throws IOException, ServletException { WaybackRequest wbRequest = (WaybackRequest) httpRequest .getAttribute(WMREQUEST_ATTRIBUTE); ResourceIndex idx = wayback.getResourceIndex(); QueryRenderer renderer = wayback.getQueryRenderer(); ReplayResultURIConverter uriConverter = wayback.getURIConverter(); try { if (wbRequest == null) { wbRequest = parseCGIRequest(httpRequest); } SearchResults results; results = idx.query(wbRequest); if (wbRequest.get(WaybackConstants.REQUEST_TYPE).equals( WaybackConstants.REQUEST_URL_QUERY)) { renderer.renderUrlResults(httpRequest, httpResponse, wbRequest, results, uriConverter); } else if (wbRequest.get(WaybackConstants.REQUEST_TYPE).equals( WaybackConstants.REQUEST_URL_PREFIX_QUERY)) { renderer.renderUrlPrefixResults(httpRequest, httpResponse, wbRequest, results, uriConverter); } else { throw new BadQueryException("Unknown 
query " + WaybackConstants.REQUEST_TYPE); } } catch (WaybackException wbe) { renderer.renderException(httpRequest, httpResponse, wbRequest, wbe); } } } --- NEW FILE: OpenSearchQueryParser.java --- /* OpenSearchParser * * $Id: OpenSearchQueryParser.java,v 1.1 2005/11/16 03:11:30 bradtofel Exp $ * * Created on 1:37:19 PM Nov 14, 2005. * * Copyright (C) 2005 Internet Archive. * * This file is part of wayback. * * wayback is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. * * wayback is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. * * You should have received a copy of the GNU Lesser Public License * along with wayback; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback.query; import java.util.Map; import java.util.regex.Pattern; import org.archive.wayback.core.WaybackRequest; import org.archive.wayback.exception.BadQueryException; /** * * * @author brad * @version $Date: 2005/11/16 03:11:30 $, $Revision: 1.1 $ */ public class OpenSearchQueryParser { private final static String SEARCH_QUERY = "q"; private final static String SEARCH_RESULTS = "count"; private final static String START_PAGE = "start_page"; // private final static String START_INDEX = "start_index"; private final static Pattern WHITESPACE_PATTERN = Pattern.compile("\\s+"); // singles consume the next non-whitespace token following the term private String[] singleTokens = { "url", "site", "mimetype", "noredirect" }; // lines consume the entire rest of the query private String[] lineTokens = { "terms" }; private String getMapParam(Map queryMap, String field) { String arr[] = 
(String[]) queryMap.get(field); if (arr == null || arr.length == 0) { return null; } return arr[0]; } public WaybackRequest parseQuery(Map queryMap) throws BadQueryException { WaybackRequest wbRequest = new WaybackRequest(); String query = getMapParam(queryMap, SEARCH_QUERY); String numResults = getMapParam(queryMap, SEARCH_RESULTS); String startPage = getMapParam(queryMap, START_PAGE); if (numResults != null) { int nr = Integer.parseInt(numResults); wbRequest.setResultsPerPage(nr); } if (startPage != null) { int sp = Integer.parseInt(startPage); wbRequest.setPageNum(sp); } if (query == null) { throw new BadQueryException("No search query argument"); } parseTerms(wbRequest, query); return wbRequest; } private void parseTerms(WaybackRequest wbRequest, String query) throws BadQueryException { // first try the entire line_tokens: for (int i = 0; i < lineTokens.length; i++) { String token = lineTokens[i] + ":"; int index = query.indexOf(token); if (index > -1) { // found it, take value as the remainder of the query String value = query.substring(index + token.length()); // TODO: trim trailing whitespace? wbRequest.put(lineTokens[i], value); query = query.substring(0, index); } } // now split whatever is left on whitespace: String[] parts = WHITESPACE_PATTERN.split(query); for (int i = 0; i < parts.length; i++) { String token = parts[i]; int colonIndex = token.indexOf(":"); if (colonIndex == -1) { throw new BadQueryException("Bad search token(" + token + ")"); } String key = token.substring(0, colonIndex); String value = token.substring(colonIndex + 1); // TODO: make sure key is in singleTokens? // let's just let em all thru for now: wbRequest.put(key, value); } } } --- NEW FILE: Renderer.java --- /* QueryRenderer * * $Id: Renderer.java,v 1.1 2005/11/16 03:11:30 bradtofel Exp $ * * Created on 2:47:42 PM Nov 7, 2005. * * Copyright (C) 2005 Internet Archive. * * This file is part of wayback. 
* * wayback is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. * * wayback is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. * * You should have received a copy of the GNU Lesser Public License * along with wayback; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback.query; import java.io.IOException; import java.text.ParseException; import java.util.Properties; import javax.servlet.RequestDispatcher; import javax.servlet.ServletException; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse; import org.archive.wayback.QueryRenderer; import org.archive.wayback.ReplayResultURIConverter; import org.archive.wayback.core.SearchResults; import org.archive.wayback.core.WaybackRequest; import org.archive.wayback.exception.ConfigurationException; import org.archive.wayback.exception.WaybackException; /** * * * @author brad * @version $Date: 2005/11/16 03:11:30 $, $Revision: 1.1 $ */ public class Renderer implements QueryRenderer { private final static String JSP_PATH = "queryui.jsppath"; private String jspPath = null; private final String ERROR_JSP = "ErrorResult.jsp"; private final String QUERY_JSP = "QueryResults.jsp"; private final String PREFIX_QUERY_JSP = "PathQueryResults.jsp"; public void init(Properties p) throws ConfigurationException { this.jspPath = (String) p.get(JSP_PATH); if (this.jspPath == null || this.jspPath.length() <= 0) { throw new ConfigurationException("Failed to find " + JSP_PATH); } } public void renderException(HttpServletRequest httpRequest, HttpServletResponse httpResponse, 
WaybackRequest wbRequest, WaybackException exception) throws ServletException, IOException { httpRequest.setAttribute("exception", exception); String finalJspPath = jspPath + "/" + ERROR_JSP; RequestDispatcher dispatcher = httpRequest .getRequestDispatcher(finalJspPath); dispatcher.forward(httpRequest, httpResponse); } public void renderUrlResults(HttpServletRequest httpRequest, HttpServletResponse httpResponse, WaybackRequest wbRequest, SearchResults results, ReplayResultURIConverter uriConverter) throws ServletException, IOException { UIQueryResults uiResults; try { uiResults = new UIQueryResults(wbRequest, results, uriConverter); } catch (ParseException e) { // I don't think this should happen... e.printStackTrace(); throw new ServletException(e.getMessage()); } httpRequest.setAttribute("ui-results", uiResults); proxyRequest(httpRequest, httpResponse, QUERY_JSP); } public void renderUrlPrefixResults(HttpServletRequest httpRequest, HttpServletResponse httpResponse, WaybackRequest wbRequest, SearchResults results, ReplayResultURIConverter uriConverter) throws ServletException, IOException { UIQueryResults uiResults; try { uiResults = new UIQueryResults(wbRequest, results, uriConverter); } catch (ParseException e) { // I don't think this should happen... e.printStackTrace(); throw new ServletException(e.getMessage()); } httpRequest.setAttribute("ui-results", uiResults); proxyRequest(httpRequest, httpResponse, PREFIX_QUERY_JSP); } /** * @param request * @param response * @param jspName * @throws ServletException * @throws IOException */ private void proxyRequest(HttpServletRequest request, HttpServletResponse response, final String jspName) throws ServletException, IOException { String finalJspPath = jspPath + "/" + jspName; RequestDispatcher dispatcher = request .getRequestDispatcher(finalJspPath); dispatcher.forward(request, response); } } |
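The term-splitting rule in OpenSearchQueryParser.parseTerms can be tried in isolation. The sketch below is a standalone approximation, not the committed code: it returns a plain Map instead of a WaybackRequest, throws IllegalArgumentException in place of BadQueryException, and, unlike the committed version, trims the "terms" value and skips empty tokens so that a query consisting only of "terms:..." is accepted. The class name QueryTermSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical standalone approximation of the parseTerms logic.
class QueryTermSketch {

    // The single "line token": it consumes the entire rest of the query.
    private static final String LINE_TOKEN = "terms";

    static Map<String, String> parseTerms(String query) {
        Map<String, String> fields = new LinkedHashMap<>();

        // First pull off the line token, taking its value as the remainder.
        String marker = LINE_TOKEN + ":";
        int index = query.indexOf(marker);
        if (index > -1) {
            fields.put(LINE_TOKEN, query.substring(index + marker.length()).trim());
            query = query.substring(0, index);
        }

        // Then split whatever is left on whitespace into key:value tokens.
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // deviation from the original: tolerate an empty remainder
            }
            int colon = token.indexOf(':');
            if (colon == -1) {
                throw new IllegalArgumentException("Bad search token(" + token + ")");
            }
            // Only the first colon separates key and value, so URLs survive intact.
            fields.put(token.substring(0, colon), token.substring(colon + 1));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseTerms("url:http://example.org/ mimetype:text/html terms:free text here"));
    }
}
```

Because the key ends at the first colon, a value such as http://example.org/ keeps its own colons.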
From: Brad <bra...@us...> - 2005-11-16 03:11:40
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/replay
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/java/org/archive/wayback/replay

Added Files:
    ReplayServlet.java
Log Message:
Massive overhaul decomposing into three main categories of changes:

1) All internal datatypes are now extensible (currently Properties, but
   should be Maps) including:
     a) WaybackRequest (was WBRequest)
     b) SearchResults (was ResourceResults)
     c) SearchResult (was ResourceResult)
     d) Resource
   so that there is no longer an assumption of Archival URL queries, or
   "CDX-style" index results. This will put more responsibility on the UI
   components to interrogate SearchResults to decide how to render, but
   should enable extension to data returned from Indexes, as well as allow
   far more flexibility in queries, predominantly geared towards free-text
   searching. This is still somewhat clunky, as there are no convenience
   accessor methods, so all users refer to constants when interacting with
   them.

2) Major cleanup of servlet and filter interaction with servlet container.
   ReplayUI and QueryUI are now just plain old servlets, and filters can be
   optionally added to allow non-CGI argument requests to be coerced into
   standard WaybackRequest objects.

3) Alternate "Proxy" Replay mode is now functional, and some work has been
   done towards an alternate Nutch ResourceIndex.

Currently the web.xml contains example configurations for both Proxy and
Archival Url replay modes, but the Proxy related configurations are
commented out. Proxy mode *requires* changing the servlet context to ROOT.
ArchivalUrl replay mode works as ROOT context and as any (I think) other
context. There are some cosmetic double-slash issues to work out.

--- NEW FILE: ReplayServlet.java ---
/* WBReplayUIServlet
 *
 * Created on 2005/10/18 14:00:00
 *
 * Copyright (C) 2005 Internet Archive.
 *
 * This file is part of the Wayback Machine (crawler.archive.org).
 *
 * Wayback Machine is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or
 * any later version.
 *
 * Wayback Machine is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Lesser Public License for more details.
 *
 * You should have received a copy of the GNU Lesser Public License
 * along with Wayback Machine; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 */

package org.archive.wayback.replay;

import java.io.IOException;
import java.text.ParseException;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

import javax.servlet.ServletConfig;
import javax.servlet.ServletContext;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.archive.wayback.WaybackConstants;
import org.archive.wayback.ReplayRenderer;
import org.archive.wayback.ReplayResultURIConverter;
import org.archive.wayback.ResourceIndex;
import org.archive.wayback.ResourceStore;
import org.archive.wayback.core.Resource;
import org.archive.wayback.core.SearchResult;
import org.archive.wayback.core.SearchResults;
import org.archive.wayback.core.Timestamp;
import org.archive.wayback.core.WaybackRequest;
import org.archive.wayback.core.WaybackLogic;
import org.archive.wayback.exception.BadQueryException;
import org.archive.wayback.exception.WaybackException;

/**
 * Servlet implementation for Wayback Replay requests.
 *
 * @author Brad Tofel
 * @version $Date: 2005/11/16 03:11:30 $, $Revision: 1.1 $
 */
public class ReplayServlet extends HttpServlet {
    private static final String WMREQUEST_ATTRIBUTE = "wmrequest.attribute";

    private static final long serialVersionUID = 1L;

    private WaybackLogic wayback = new WaybackLogic();

    /**
     * Constructor
     */
    public ReplayServlet() {
        super();
    }

    public void init(ServletConfig c) throws ServletException {

        Properties p = new Properties();
        for (Enumeration e = c.getInitParameterNames(); e.hasMoreElements();) {
            String key = (String) e.nextElement();
            p.put(key, c.getInitParameter(key));
        }
        ServletContext sc = c.getServletContext();
        for (Enumeration e = sc.getInitParameterNames(); e.hasMoreElements();) {
            String key = (String) e.nextElement();
            p.put(key, sc.getInitParameter(key));
        }

        try {
            wayback.init(p);
        } catch (Exception e) {
            throw new ServletException(e.getMessage());
        }
    }

    private String getMapParam(Map queryMap, String field) {
        String arr[] = (String[]) queryMap.get(field);
        if (arr == null || arr.length == 0) {
            return null;
        }
        return arr[0];
    }

    public WaybackRequest parseCGIRequest(HttpServletRequest httpRequest)
            throws BadQueryException {
        WaybackRequest wbRequest = new WaybackRequest();
        Map queryMap = httpRequest.getParameterMap();
        Set keys = queryMap.keySet();
        Iterator itr = keys.iterator();
        while(itr.hasNext()) {
            String key = (String) itr.next();
            String val = getMapParam(queryMap,key);
            wbRequest.put(key,val);
        }
        String referer = httpRequest.getHeader("REFERER");
        if (referer == null) {
            referer = null;
        }
        wbRequest.put(WaybackConstants.REQUEST_REFERER_URL,referer);

        return wbRequest;
    }

    private SearchResult getClosest(SearchResults results,
            WaybackRequest wbRequest) throws ParseException {

        SearchResult closest = null;
        long closestDistance = 0;
        SearchResult cur = null;
        Timestamp wantTimestamp;
        wantTimestamp = Timestamp.parseBefore(wbRequest.
                get(WaybackConstants.REQUEST_EXACT_DATE));

        Iterator itr = results.iterator();
        while (itr.hasNext()) {
            cur = (SearchResult) itr.next();
            long curDistance;
            try {
                Timestamp curTimestamp = Timestamp.parseBefore(cur.
                        get(WaybackConstants.RESULT_CAPTURE_DATE));
                curDistance = curTimestamp.absDistanceFromTimestamp(
                        wantTimestamp);
            } catch (ParseException e) {
                continue;
            }
            if ((closest == null) || (curDistance < closestDistance)) {
                closest = cur;
                closestDistance = curDistance;
            }
        }
        return closest;
    }

    public void doGet(HttpServletRequest httpRequest,
            HttpServletResponse httpResponse) throws IOException,
            ServletException {

        WaybackRequest wbRequest = (WaybackRequest) httpRequest
                .getAttribute(WMREQUEST_ATTRIBUTE);

        ResourceIndex idx = wayback.getResourceIndex();
        ResourceStore store = wayback.getResourceStore();
        ReplayResultURIConverter uriConverter = wayback.getURIConverter();
        ReplayRenderer renderer = wayback.getReplayRenderer();

        try {

            if (wbRequest == null) {
                wbRequest = parseCGIRequest(httpRequest);
            }

            SearchResults results = idx.query(wbRequest);

            SearchResult closest = getClosest(results,wbRequest);

            // TODO loop here looking for closest online/available version?
            // OPTIMIZ maybe assume version is here and redirect now if not
            // exactly the date user requested, before retrieving it...
            Resource resource = store.retrieveResource(closest);

            renderer.renderResource(httpRequest, httpResponse, wbRequest,
                    closest, resource,uriConverter);

        } catch (WaybackException wbe) {

            renderer.renderException(httpRequest, httpResponse, wbRequest, wbe);

        } catch (Exception e) {
            // TODO show something Wayback'ish to the user rather than letting
            // the container deal?
            e.printStackTrace();
            throw new ServletException(e.getMessage());
        }
    }
}
|
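The getClosest method above picks the capture whose date is nearest the requested exact date and skips any capture whose date fails to parse. The sketch below shows the same selection on plain 14-digit date strings, assuming a yyyyMMddHHmmss layout. It uses java.time rather than wayback's Timestamp class, whose parseBefore and absDistanceFromTimestamp behaviour is not shown in this message, and the class name ClosestCaptureSketch is hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone version of the closest-capture selection.
class ClosestCaptureSketch {

    // Assumption: dates are full 14-digit strings such as 20051116031130.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    static String closest(List<String> captureDates, String wanted) {
        LocalDateTime want = LocalDateTime.parse(wanted, FORMAT);
        String best = null;
        long bestDistance = 0;
        for (String candidate : captureDates) {
            long distance;
            try {
                LocalDateTime cur = LocalDateTime.parse(candidate, FORMAT);
                distance = Math.abs(Duration.between(want, cur).getSeconds());
            } catch (DateTimeParseException e) {
                continue; // same policy as getClosest: ignore unparseable dates
            }
            if (best == null || distance < bestDistance) {
                best = candidate;
                bestDistance = distance;
            }
        }
        return best; // null when no candidate could be used
    }

    public static void main(String[] args) {
        List<String> captures =
                Arrays.asList("20050101000000", "20051115000000", "20060101000000");
        System.out.println(closest(captures, "20051116031130"));
    }
}
```

As in the servlet, the result is null when there are no usable captures, so a caller would need to handle that case before retrieving a resource.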
From: Brad <bra...@us...> - 2005-11-16 03:11:39
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/java

Modified Files:
    README.txt
Log Message:
Massive overhaul decomposing into three main categories of changes:

1) All internal datatypes are now extensible (currently Properties, but
   should be Maps) including:
     a) WaybackRequest (was WBRequest)
     b) SearchResults (was ResourceResults)
     c) SearchResult (was ResourceResult)
     d) Resource
   so that there is no longer an assumption of Archival URL queries, or
   "CDX-style" index results. This will put more responsibility on the UI
   components to interrogate SearchResults to decide how to render, but
   should enable extension to data returned from Indexes, as well as allow
   far more flexibility in queries, predominantly geared towards free-text
   searching. This is still somewhat clunky, as there are no convenience
   accessor methods, so all users refer to constants when interacting with
   them.

2) Major cleanup of servlet and filter interaction with servlet container.
   ReplayUI and QueryUI are now just plain old servlets, and filters can be
   optionally added to allow non-CGI argument requests to be coerced into
   standard WaybackRequest objects.

3) Alternate "Proxy" Replay mode is now functional, and some work has been
   done towards an alternate Nutch ResourceIndex. Currently the web.xml
   contains example configurations for both Proxy and Archival Url replay
   modes, but the Proxy related configurations are commented out. Proxy mode
   *requires* changing the servlet context to ROOT. ArchivalUrl replay mode
   works as ROOT context and as any (I think) other context. There are some
   cosmetic double-slash issues to work out.

Index: README.txt =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/java/README.txt,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** README.txt 19 Oct 2005 01:22:37 -0000 1.2 --- README.txt 16 Nov 2005 03:11:30 -0000 1.3 *************** *** 63,66 **** the JSPs use header + footer templates for page consistancy, and are very basic. ! ! \ No newline at end of file --- 63,89 ---- the JSPs use header + footer templates for page consistancy, and are very basic. ! ! WaybackRequest: ! resultsPerPage ! pageNum ! Properties: ! startdate = all results must be AFTER this date ! enddate = all results must be BEFORE this date ! exactdate = return results matching EXACTLY this date ! type = "replay" | "urlquery" | "urlprefixquery" | "text" ! url = all results must either exactly match this, or begin with ! this, depending on type ! ! SearchResults: ! CDX: ! url ! capturedate ! arcfile ! compressedoffset ! originalhost ! mimetype ! httpresponsecode ! md5fragment ! redirecturl ! compressedoffset ! arcfilename \ No newline at end of file |
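The request properties listed in the README diff above can be illustrated with a plain java.util.Properties object. This is a sketch only: the key names and the four type values are taken from the README text, while the helper class, its methods, and the 14-digit date strings are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper; only the key names and type values come from README.txt.
class RequestPropertiesSketch {
    static final List<String> TYPES =
            Arrays.asList("replay", "urlquery", "urlprefixquery", "text");

    /** Builds a URL query restricted to captures between the two dates. */
    static Properties urlQuery(String url, String startDate, String endDate) {
        Properties request = new Properties();
        request.setProperty("type", "urlquery");
        request.setProperty("url", url);
        request.setProperty("startdate", startDate); // results must be after this
        request.setProperty("enddate", endDate);     // results must be before this
        return request;
    }

    /** True when the request carries one of the documented type values. */
    static boolean hasKnownType(Properties request) {
        return TYPES.contains(request.getProperty("type"));
    }

    public static void main(String[] args) {
        Properties request =
                urlQuery("http://www.archive.org/", "20050101000000", "20051231235959");
        System.out.println(request.getProperty("type") + " " + hasKnownType(request));
    }
}
```

Because the datatypes are keyed by string constants, a check such as hasKnownType() is the kind of validation each caller currently has to do itself.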
From: Brad <bra...@us...> - 2005-11-16 03:11:39
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/template
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/webapp/template

Modified Files:
    UI-header.jsp UI-footer.jsp
Log Message:
Massive overhaul decomposing into three main categories of changes:

1) All internal datatypes are now extensible (currently Properties, but
   should be Maps) including:
     a) WaybackRequest (was WBRequest)
     b) SearchResults (was ResourceResults)
     c) SearchResult (was ResourceResult)
     d) Resource
   so that there is no longer an assumption of Archival URL queries, or
   "CDX-style" index results. This will put more responsibility on the UI
   components to interrogate SearchResults to decide how to render, but
   should enable extension to data returned from Indexes, as well as allow
   far more flexibility in queries, predominantly geared towards free-text
   searching. This is still somewhat clunky, as there are no convenience
   accessor methods, so all users refer to constants when interacting with
   them.

2) Major cleanup of servlet and filter interaction with servlet container.
   ReplayUI and QueryUI are now just plain old servlets, and filters can be
   optionally added to allow non-CGI argument requests to be coerced into
   standard WaybackRequest objects.

3) Alternate "Proxy" Replay mode is now functional, and some work has been
   done towards an alternate Nutch ResourceIndex. Currently the web.xml
   contains example configurations for both Proxy and Archival Url replay
   modes, but the Proxy related configurations are commented out. Proxy mode
   *requires* changing the servlet context to ROOT. ArchivalUrl replay mode
   works as ROOT context and as any (I think) other context. There are some
   cosmetic double-slash issues to work out.

Index: UI-header.jsp =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/template/UI-header.jsp,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** UI-header.jsp 26 Oct 2005 01:15:01 -0000 1.2 --- UI-header.jsp 16 Nov 2005 03:11:29 -0000 1.3 *************** *** 1,4 **** <!-- HEADER --> ! <html> <head> --- 1,6 ---- + <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" + "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <!-- HEADER --> ! <html xmlns="http://www.w3.org/1999/xhtml"> <head> *************** *** 6,11 **** <link rel="stylesheet" type="text/css" ! href="<%= request.getContextPath() %>/css/styles.css" ! src="<%= request.getContextPath() %>/css/styles.css"> <title>Internet Archive Wayback Machine</title> <base target="_top"> --- 8,13 ---- <link rel="stylesheet" type="text/css" ! href="<%= request.getScheme() + "://" + request.getLocalName() + ":" + request.getLocalPort() + request.getContextPath() %>/css/styles.css" ! src="<%= request.getScheme() + "://" + request.getLocalName() + ":" + request.getLocalPort() + request.getContextPath() %>/css/styles.css"> <title>Internet Archive Wayback Machine</title> <base target="_top"> *************** *** 21,25 **** <!-- WAYBACK LOGO --> ! <td width="26%"><a href="<%= request.getContextPath() %>"><img src="<%= request.getContextPath() %>/images/wayback_logo_sm.gif" width="153" height="54" border="0"></a></td> <!-- /WAYBACK LOGO --> --- 23,27 ---- <!-- WAYBACK LOGO --> ! <td width="26%"><a href="<%= request.getScheme() + "://" + request.getLocalName() + ":" + request.getLocalPort() + request.getContextPath() %>"><img src="<%= request.getScheme() + "://" + request.getLocalName() + ":" + request.getLocalPort() + request.getContextPath() %>/images/wayback_logo_sm.gif" width="153" height="54" border="0"></a></td> <!-- /WAYBACK LOGO --> *************** *** 44,52 **** <!-- URL FORM --> ! 
<form action="<%= request.getContextPath() %>/query" method="GET"> <tr> ! <td nowrap align="center"><img src="images/shim.gif" width="1" height="20"> <b class="mainBodyW"> --- 46,54 ---- <!-- URL FORM --> ! <form action="<%= request.getScheme() + "://" + request.getLocalName() + ":" + request.getLocalPort() + request.getContextPath() %>/query" method="GET"> <tr> ! <td nowrap align="center"><img src="<%= request.getScheme() + "://" + request.getLocalName() + ":" + request.getLocalPort() + request.getContextPath() %>/images/shim.gif" width="1" height="20"> <b class="mainBodyW"> *************** *** 54,58 **** Enter Web Address: </font> ! <input type="hidden" name="type" value="query"> <input type="text" name="url" value="http://" size="24" maxlength="256"> --- 56,60 ---- Enter Web Address: </font> ! <input type="hidden" name="type" value="urlquery"> <input type="text" name="url" value="http://" size="24" maxlength="256"> *************** *** 74,78 **** <input type="Submit" name="Submit" value="Take Me Back" align="absMiddle"> ! <a href="<%= request.getContextPath() %>/jsp/QueryUI/requestform.jsp" style="color:white;font-size:11px"> Adv. Search </a> --- 76,80 ---- <input type="Submit" name="Submit" value="Take Me Back" align="absMiddle"> ! <a href="<%= request.getScheme() + "://" + request.getLocalName() + ":" + request.getLocalPort() + request.getContextPath() %>/jsp/QueryUI/requestform.jsp" style="color:white;font-size:11px"> Adv. Search </a> Index: UI-footer.jsp =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/template/UI-footer.jsp,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** UI-footer.jsp 20 Oct 2005 00:40:41 -0000 1.1 --- UI-footer.jsp 16 Nov 2005 03:11:29 -0000 1.2 *************** *** 4,15 **** <p> ! <a href="<%= request.getContextPath() %>">Home</a> | ! <a href="<%= request.getContextPath() %>/help.jsp">Help</a> ! </p> ! <p> ! 
<a href="http://www.archive.org">Internet Archive</a> | ! <a href="http://www.archive.org/about/terms.php">Terms of Use</a> | ! <a href="http://www.archive.org/about/terms.php#privacy">Privacy Policy</a> </p> </div> --- 4,17 ---- <p> ! <a href="<%= request.getScheme() + "://" + request.getLocalName() + ":" + request.getLocalPort() + request.getContextPath() %>">Home</a> | ! <a href="<%= request.getScheme() + "://" + request.getLocalName() + ":" + request.getLocalPort() + request.getContextPath() %>/help.jsp">Help</a> </p> + <!-- + <p> + <a href="http://www.archive.org">Internet Archive</a> | + <a href="http://www.archive.org/about/terms.php">Terms of Use</a> | + <a href="http://www.archive.org/about/terms.php#privacy">Privacy Policy</a> + </p> + --> </div> |
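The templates above now build every link from scheme, local name, local port and context path. A small sketch of that concatenation follows, with a join step that avoids doubled slashes. The class and method names are hypothetical, and the values are passed in as plain arguments so the example does not depend on the servlet API.

```java
// Hypothetical helper mirroring the expression repeated in UI-header.jsp
// and UI-footer.jsp; not part of wayback.
class BaseUrlSketch {
    /** scheme + "://" + host + ":" + port + contextPath, as in the JSPs. */
    static String baseUrl(String scheme, String host, int port, String contextPath) {
        return scheme + "://" + host + ":" + port + contextPath;
    }

    /** Joins base and path with exactly one slash between them. */
    static String join(String base, String path) {
        String left = base.endsWith("/")
                ? base.substring(0, base.length() - 1) : base;
        String right = path.startsWith("/") ? path.substring(1) : path;
        return left + "/" + right;
    }

    public static void main(String[] args) {
        String base = baseUrl("http", "localhost", 8080, "/wayback");
        System.out.println(join(base, "/css/styles.css"));
        // With the ROOT context the context path is empty.
        System.out.println(join(baseUrl("http", "localhost", 8080, ""), "/query"));
    }
}
```

Computing the prefix once and reusing it would also shorten the templates, since the same expression currently appears in every href and src attribute.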
From: Brad <bra...@us...> - 2005-11-16 03:11:39
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/servletglue
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/java/org/archive/wayback/servletglue

Removed Files:
    RequestFilter.java WBQueryUIServlet.java WBReplayUIServlet.java
Log Message:
Massive overhaul decomposing into three main categories of changes:

1) All internal datatypes are now extensible (currently Properties, but
   should be Maps) including:
     a) WaybackRequest (was WBRequest)
     b) SearchResults (was ResourceResults)
     c) SearchResult (was ResourceResult)
     d) Resource
   so that there is no longer an assumption of Archival URL queries, or
   "CDX-style" index results. This will put more responsibility on the UI
   components to interrogate SearchResults to decide how to render, but
   should enable extension to data returned from Indexes, as well as allow
   far more flexibility in queries, predominantly geared towards free-text
   searching. This is still somewhat clunky, as there are no convenience
   accessor methods, so all users refer to constants when interacting with
   them.

2) Major cleanup of servlet and filter interaction with servlet container.
   ReplayUI and QueryUI are now just plain old servlets, and filters can be
   optionally added to allow non-CGI argument requests to be coerced into
   standard WaybackRequest objects.

3) Alternate "Proxy" Replay mode is now functional, and some work has been
   done towards an alternate Nutch ResourceIndex. Currently the web.xml
   contains example configurations for both Proxy and Archival Url replay
   modes, but the Proxy related configurations are commented out. Proxy mode
   *requires* changing the servlet context to ROOT. ArchivalUrl replay mode
   works as ROOT context and as any (I think) other context. There are some
   cosmetic double-slash issues to work out.

--- WBQueryUIServlet.java DELETED ---
--- WBReplayUIServlet.java DELETED ---
--- RequestFilter.java DELETED ---
|
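The log message says filters can coerce non-CGI requests into standard request objects. A rough sketch of that idea follows, assuming an Archival URL path of the form /<timestamp>/<url>. That path layout, the class name, and the method are assumptions for illustration; they are not taken from the RequestFilter or ReplayFilter sources. The property keys are the ones named in README.txt.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser; the path layout is an assumption, not the wayback code.
class ArchivalUrlSketch {
    // Up to 14 timestamp digits, then the original URL.
    private static final Pattern ARCHIVAL_PATH =
            Pattern.compile("^/(\\d{1,14})/(.+)$");

    /** Returns a replay request for the path, or null if it does not match. */
    static Properties parse(String path) {
        Matcher m = ARCHIVAL_PATH.matcher(path);
        if (!m.matches()) {
            return null; // leave the request for CGI-style parsing
        }
        Properties request = new Properties();
        request.setProperty("type", "replay");
        request.setProperty("exactdate", m.group(1));
        request.setProperty("url", m.group(2));
        return request;
    }

    public static void main(String[] args) {
        Properties request = parse("/20051116031130/http://www.archive.org/");
        System.out.println(request.getProperty("exactdate")
                + " " + request.getProperty("url"));
    }
}
```

A filter built this way would store the parsed request as a request attribute, so the servlet only falls back to CGI parameter parsing when no attribute is present.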
From: Brad <bra...@us...> - 2005-11-16 03:11:39
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/ippreplayui
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/java/org/archive/wayback/ippreplayui

Removed Files:
    InPagePresenceReplayUI.java
Log Message:
Massive overhaul decomposing into three main categories of changes:

1) All internal datatypes are now extensible (currently Properties, but
   should be Maps) including:
     a) WaybackRequest (was WBRequest)
     b) SearchResults (was ResourceResults)
     c) SearchResult (was ResourceResult)
     d) Resource
   so that there is no longer an assumption of Archival URL queries, or
   "CDX-style" index results. This will put more responsibility on the UI
   components to interrogate SearchResults to decide how to render, but
   should enable extension to data returned from Indexes, as well as allow
   far more flexibility in queries, predominantly geared towards free-text
   searching. This is still somewhat clunky, as there are no convenience
   accessor methods, so all users refer to constants when interacting with
   them.

2) Major cleanup of servlet and filter interaction with servlet container.
   ReplayUI and QueryUI are now just plain old servlets, and filters can be
   optionally added to allow non-CGI argument requests to be coerced into
   standard WaybackRequest objects.

3) Alternate "Proxy" Replay mode is now functional, and some work has been
   done towards an alternate Nutch ResourceIndex. Currently the web.xml
   contains example configurations for both Proxy and Archival Url replay
   modes, but the Proxy related configurations are commented out. Proxy mode
   *requires* changing the servlet context to ROOT. ArchivalUrl replay mode
   works as ROOT context and as any (I think) other context. There are some
   cosmetic double-slash issues to work out.

--- InPagePresenceReplayUI.java DELETED ---
|
From: Brad <bra...@us...> - 2005-11-16 03:11:39
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/localbdbresourceindex
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/java/org/archive/wayback/localbdbresourceindex

Removed Files:
    LocalBDBResourceIndex.java BDBResourceIndex.java
Log Message:
Massive overhaul decomposing into three main categories of changes:

1) All internal datatypes are now extensible (currently Properties, but
   should be Maps) including:
     a) WaybackRequest (was WBRequest)
     b) SearchResults (was ResourceResults)
     c) SearchResult (was ResourceResult)
     d) Resource
   so that there is no longer an assumption of Archival URL queries, or
   "CDX-style" index results. This will put more responsibility on the UI
   components to interrogate SearchResults to decide how to render, but
   should enable extension to data returned from Indexes, as well as allow
   far more flexibility in queries, predominantly geared towards free-text
   searching. This is still somewhat clunky, as there are no convenience
   accessor methods, so all users refer to constants when interacting with
   them.

2) Major cleanup of servlet and filter interaction with servlet container.
   ReplayUI and QueryUI are now just plain old servlets, and filters can be
   optionally added to allow non-CGI argument requests to be coerced into
   standard WaybackRequest objects.

3) Alternate "Proxy" Replay mode is now functional, and some work has been
   done towards an alternate Nutch ResourceIndex. Currently the web.xml
   contains example configurations for both Proxy and Archival Url replay
   modes, but the Proxy related configurations are commented out. Proxy mode
   *requires* changing the servlet context to ROOT. ArchivalUrl replay mode
   works as ROOT context and as any (I think) other context. There are some
   cosmetic double-slash issues to work out.

--- BDBResourceIndex.java DELETED ---
--- LocalBDBResourceIndex.java DELETED ---
|
From: Brad <bra...@us...> - 2005-11-16 03:11:38
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/exception
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/java/org/archive/wayback/exception

Modified Files:
    WaybackException.java BadQueryException.java
Added Files:
    ConfigurationException.java BetterRequestException.java
    ResourceNotInArchiveException.java
    ResourceIndexNotAvailableException.java
    ResourceNotAvailableException.java
Log Message:
Massive overhaul decomposing into three main categories of changes:

1) All internal datatypes are now extensible (currently Properties, but
   should be Maps) including:
     a) WaybackRequest (was WBRequest)
     b) SearchResults (was ResourceResults)
     c) SearchResult (was ResourceResult)
     d) Resource
   so that there is no longer an assumption of Archival URL queries, or
   "CDX-style" index results. This will put more responsibility on the UI
   components to interrogate SearchResults to decide how to render, but
   should enable extension to data returned from Indexes, as well as allow
   far more flexibility in queries, predominantly geared towards free-text
   searching. This is still somewhat clunky, as there are no convenience
   accessor methods, so all users refer to constants when interacting with
   them.

2) Major cleanup of servlet and filter interaction with servlet container.
   ReplayUI and QueryUI are now just plain old servlets, and filters can be
   optionally added to allow non-CGI argument requests to be coerced into
   standard WaybackRequest objects.

3) Alternate "Proxy" Replay mode is now functional, and some work has been
   done towards an alternate Nutch ResourceIndex. Currently the web.xml
   contains example configurations for both Proxy and Archival Url replay
   modes, but the Proxy related configurations are commented out. Proxy mode
   *requires* changing the servlet context to ROOT. ArchivalUrl replay mode
   works as ROOT context and as any (I think) other context. There are some
   cosmetic double-slash issues to work out.

Index: BadQueryException.java =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/exception/BadQueryException.java,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** BadQueryException.java 19 Oct 2005 01:22:36 -0000 1.2 --- BadQueryException.java 16 Nov 2005 03:11:29 -0000 1.3 *************** *** 43,55 **** */ public BadQueryException(String message) { ! super(message); } ! ! /** ! * @param args ! */ ! public static void main(String[] args) { ! } - } --- 43,50 ---- */ public BadQueryException(String message) { ! super(message,"Bad Query"); } ! public BadQueryException(String message, String details) { ! super(message,"Bad Query",details); } } --- NEW FILE: ResourceNotInArchiveException.java --- package org.archive.wayback.exception; public class ResourceNotInArchiveException extends WaybackException { /** * */ private static final long serialVersionUID = 1L; /** * Constructor * * @param message */ public ResourceNotInArchiveException(String message) { super(message,"Not in Archive"); } public ResourceNotInArchiveException(String message,String details) { super(message,"Not in Archive",details); } } --- NEW FILE: ResourceIndexNotAvailableException.java --- package org.archive.wayback.exception; public class ResourceIndexNotAvailableException extends WaybackException { /** * */ private static final long serialVersionUID = 1L; /** * Constructor * * @param message */ public ResourceIndexNotAvailableException(String message) { super(message,"Index not available"); } public ResourceIndexNotAvailableException(String message, String details) { super(message,"Index not available",details); } } --- NEW FILE: BetterRequestException.java --- /* BetterRequestException * * $Id: BetterRequestException.java,v 1.1 2005/11/16 03:11:29 bradtofel Exp $ * * Created on 6:42:01 PM Oct 31, 2005. * * Copyright (C) 2005 Internet Archive. 
* * This file is part of wayback. * * wayback is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. * * wayback is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. * * You should have received a copy of the GNU Lesser Public License * along with wayback; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback.exception; /** * * * @author brad * @version $Date: 2005/11/16 03:11:29 $, $Revision: 1.1 $ */ public class BetterRequestException extends WaybackException { /** * */ private static final long serialVersionUID = 1L; private String betterURI; /** * Constructor * * @param message */ public BetterRequestException(String betterURI) { super("Better URI for query"); this.betterURI = betterURI; } /** * @return Returns the betterURI. */ public String getBetterURI() { return betterURI; } } Index: WaybackException.java =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/exception/WaybackException.java,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** WaybackException.java 19 Oct 2005 01:22:36 -0000 1.2 --- WaybackException.java 16 Nov 2005 03:11:29 -0000 1.3 *************** *** 36,40 **** */ private static final long serialVersionUID = 1L; ! /** * Constructor --- 36,43 ---- */ private static final long serialVersionUID = 1L; ! private String message = ""; ! private String title = "Wayback Exception"; ! private String details = ""; ! 
/** * Constructor *************** *** 44,55 **** public WaybackException(String message) { super(message); } ! /** ! * @param args */ ! public static void main(String[] args) { } } --- 47,85 ---- public WaybackException(String message) { super(message); + this.message = message; } ! ! public WaybackException(String message, String title) { ! super(message); ! this.message = message; ! this.title= title; ! } ! ! public WaybackException(String message, String title, String details) { ! super(message); ! this.message = message; ! this.title= title; ! this.details = details; ! } ! /** ! * @return Returns the title. */ ! public String getTitle() { ! return title; ! } + /** + * @return Returns the message. + */ + public String getMessage() { + return message; } + /** + * @return Returns the details. + */ + public String getDetails() { + return details; + } } --- NEW FILE: ResourceNotAvailableException.java --- package org.archive.wayback.exception; public class ResourceNotAvailableException extends WaybackException { /** * */ private static final long serialVersionUID = 1L; /** * Constructor * * @param message */ public ResourceNotAvailableException(String message) { super(message,"Resource not available"); } public ResourceNotAvailableException(String message,String details) { super(message,"Resource not available",details); } } --- NEW FILE: ConfigurationException.java --- /* ConfigurationException * * $Id: ConfigurationException.java,v 1.1 2005/11/16 03:11:29 bradtofel Exp $ * * Created on 6:35:13 PM Oct 31, 2005. * * Copyright (C) 2005 Internet Archive. * * This file is part of wayback. * * wayback is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * any later version. 
* * wayback is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser Public License for more details. * * You should have received a copy of the GNU Lesser Public License * along with wayback; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ package org.archive.wayback.exception; /** * * * @author brad * @version $Date: 2005/11/16 03:11:29 $, $Revision: 1.1 $ */ public class ConfigurationException extends WaybackException { /** * */ private static final long serialVersionUID = 1L; /** * Constructor * * @param message */ public ConfigurationException(String message) { super(message,"Configuration Error"); } public ConfigurationException(String message, String details) { super(message,"Configuration Error",details); } } |
From: Brad <bra...@us...> - 2005-11-16 03:11:38
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/jsp/ReplayUI
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/webapp/jsp/ReplayUI

Modified Files:
    requestform.jsp ErrorResult.jsp
Added Files:
    ErrorJavascript.jsp
Log Message:
Massive overhaul decomposing into three main categories of changes:

1) All internal datatypes are now extensible (currently Properties, but
   should be Maps) including:
     a) WaybackRequest (was WBRequest)
     b) SearchResults (was ResourceResults)
     c) SearchResult (was ResourceResult)
     d) Resource
   so that there is no longer an assumption of Archival URL queries, or
   "CDX-style" index results. This will put more responsibility on the UI
   components to interrogate SearchResults to decide how to render, but
   should enable extension to data returned from Indexes, as well as allow
   far more flexibility in queries, predominantly geared towards free-text
   searching. This is still somewhat clunky, as there are no convenience
   accessor methods, so all users refer to constants when interacting with
   them.

2) Major cleanup of servlet and filter interaction with servlet container.
   ReplayUI and QueryUI are now just plain old servlets, and filters can be
   optionally added to allow non-CGI argument requests to be coerced into
   standard WaybackRequest objects.

3) Alternate "Proxy" Replay mode is now functional, and some work has been
   done towards an alternate Nutch ResourceIndex. Currently the web.xml
   contains example configurations for both Proxy and Archival Url replay
   modes, but the Proxy related configurations are commented out. Proxy mode
   *requires* changing the servlet context to ROOT. ArchivalUrl replay mode
   works as ROOT context and as any (I think) other context. There are some
   cosmetic double-slash issues to work out.

Index: requestform.jsp =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/jsp/ReplayUI/requestform.jsp,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** requestform.jsp 20 Oct 2005 00:40:41 -0000 1.1 --- requestform.jsp 16 Nov 2005 03:11:30 -0000 1.2 *************** *** 1,10 **** <jsp:include page="../../template/UI-header.jsp" /> ! <FORM ACTION="../../replay"> ! URL:<INPUT TYPE="TEXT" NAME="url" WIDTH="80"><BR> ! Exact Date:<INPUT TYPE="TEXT" NAME="date" WIDTH="80"><BR> ! Earliest Date:<INPUT TYPE="TEXT" NAME="earliest" WIDTH="80"><BR> ! Latest Date:<INPUT TYPE="TEXT" NAME="latest" WIDTH="80"><BR> ! <INPUT TYPE="HIDDEN" NAME="type" VALUE="replay"> ! <INPUT TYPE="SUBMIT" VALUE="Submit"> ! </FORM> <jsp:include page="../../template/UI-footer.jsp" /> --- 1,10 ---- <jsp:include page="../../template/UI-header.jsp" /> ! <form action="../../replay"> ! URL:<input type="TEXT" name="url" width="80"><br></br> ! Exact Date:<input type="TEXT" name="exactdate" width="80"><br></br> ! Earliest Date:<input type="TEXT" name="startdate" width="80"><br></br> ! Latest Date:<input type="TEXT" name="enddate" width="80"><br></br> ! <input type="HIDDEN" name="type" value="replay"> ! <input type="SUBMIT" value="Submit"> ! 
</form> <jsp:include page="../../template/UI-footer.jsp" /> --- NEW FILE: ErrorJavascript.jsp --- <%@ page import="org.archive.wayback.exception.WaybackException" %> <% WaybackException e = (WaybackException) request.getAttribute("exception"); %> // Javascript wayback retrieval error: // // Title: <%= (String) e.getTitle() %> // Message: <%= (String) e.getMessage() %> // Details: <%= (String) e.getDetails() %> Index: ErrorResult.jsp =================================================================== RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/jsp/ReplayUI/ErrorResult.jsp,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** ErrorResult.jsp 20 Oct 2005 00:40:41 -0000 1.2 --- ErrorResult.jsp 16 Nov 2005 03:11:30 -0000 1.3 *************** *** 1,3 **** <jsp:include page="../../template/UI-header.jsp" /> ! <B><%= (String) request.getAttribute("message") %></B> <jsp:include page="../../template/UI-footer.jsp" /> --- 1,12 ---- + <%@ page import="org.archive.wayback.exception.WaybackException" %> <jsp:include page="../../template/UI-header.jsp" /> ! <% ! ! WaybackException e = (WaybackException) request.getAttribute("exception"); ! ! %> ! ! <h2><%= (String) e.getTitle() %></h2> ! <p><b><%= (String) e.getMessage() %></b></p> ! <p><%= (String) e.getDetails() %></p> <jsp:include page="../../template/UI-footer.jsp" /> |
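ErrorResult.jsp and ErrorJavascript.jsp both read a title, a message and details from the exception stored under the "exception" request attribute. The sketch below shows the same three-field pattern outside a JSP. TitledException is a hypothetical stand-in for WaybackException, and the render method is an illustration, not code from the project.

```java
// Hypothetical renderer producing the same fields ErrorJavascript.jsp prints.
class ErrorTextSketch {
    static String render(TitledException e) {
        return "Title: " + e.getTitle() + "\n"
                + "Message: " + e.getMessage() + "\n"
                + "Details: " + e.getDetails();
    }

    public static void main(String[] args) {
        TitledException e = new TitledException(
                "No captures for the requested URL", "Not in Archive", "");
        System.out.println(render(e));
    }
}

// Stand-in for WaybackException: a message plus a title and optional details.
class TitledException extends Exception {
    private static final long serialVersionUID = 1L;
    private final String title;
    private final String details;

    TitledException(String message, String title, String details) {
        super(message);
        this.title = title;
        this.details = details;
    }

    String getTitle() {
        return title;
    }

    String getDetails() {
        return details;
    }
}
```

Note that neither JSP escapes these strings before writing them into the page, so a renderer fed user-supplied text would probably want to escape it first.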
From: Brad <bra...@us...> - 2005-11-16 03:11:38
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/simplequeryui
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/java/org/archive/wayback/simplequeryui

Removed Files:
    SimpleQueryUI.java UIResults.java
Log Message:
Massive overhaul decomposing into three main categories of changes:

1) All internal datatypes are now extensible (currently Properties, but
   should be Maps) including:
     a) WaybackRequest (was WBRequest)
     b) SearchResults (was ResourceResults)
     c) SearchResult (was ResourceResult)
     d) Resource
   so that there is no longer an assumption of Archival URL queries, or
   "CDX-style" index results. This will put more responsibility on the UI
   components to interrogate SearchResults to decide how to render, but
   should enable extension to data returned from Indexes, as well as allow
   far more flexibility in queries, predominantly geared towards free-text
   searching. This is still somewhat clunky, as there are no convenience
   accessor methods, so all users refer to constants when interacting with
   them.

2) Major cleanup of servlet and filter interaction with servlet container.
   ReplayUI and QueryUI are now just plain old servlets, and filters can be
   optionally added to allow non-CGI argument requests to be coerced into
   standard WaybackRequest objects.

3) Alternate "Proxy" Replay mode is now functional, and some work has been
   done towards an alternate Nutch ResourceIndex. Currently the web.xml
   contains example configurations for both Proxy and Archival Url replay
   modes, but the Proxy related configurations are commented out. Proxy mode
   *requires* changing the servlet context to ROOT. ArchivalUrl replay mode
   works as ROOT context and as any (I think) other context. There are some
   cosmetic double-slash issues to work out.

--- UIResults.java DELETED ---
--- SimpleQueryUI.java DELETED ---
|
From: Brad <bra...@us...> - 2005-11-16 03:11:37
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/proxy In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/java/org/archive/wayback/proxy Added Files: ReplayFilter.java ResultURIConverter.java RawReplayRenderer.java Log Message: Massive overhaul decomposing into three main categories of changes: 1) All internal datatypes are now extensible (currently Properties, but should be Maps) including: a) WaybackRequest(was WBRequest) b) SearchResults (was ResourceResults) c) SearchResult (was ResourceResult) d) Resource so that there is no longer an assumption of Archival URL queries, or "CDX-style" index results. This will put more responsiblility on the UI components to interrogate SearchResults to decide how to render, but should enable extension to data returned from Indexes, as well as allow far more flexibility in queries, predominantly geared towards free-text searching. This is still somewhat clunky, as there are no convenience accessor methods, so all users refer to constants when interacting with them. 2) Major cleanup of servlet and filter interaction with servlet container. ReplayUI and QueryUI are now just plain old servlets, and filters can be optionally added to allow non-CGI argument requests to be coerced into standard WaybackRequest objects. 3) Alternate "Proxy" Replay mode is now functional, and some work has been done towards an alternate Nutch ResourceIndex. Currently the web.xml contains example configurations for both Proxy and Archival Url replay modes, but the Proxy related configurations are commented out. Proxy mode *requires* changing the servlet context to ROOT. ArchivalUrl replay mode works as ROOT context and as any (I think) other context. There are some cosmetic double-slashe issues to work out. --- NEW FILE: ResultURIConverter.java --- /* ProxyResultURIConverter * * $Id: ResultURIConverter.java,v 1.1 2005/11/16 03:11:29 bradtofel Exp $ * * Created on 4:19:21 PM Nov 15, 2005. 
 *
 * Copyright (C) 2005 Internet Archive.
 *
 * This file is part of wayback.
 *
 * wayback is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or
 * any later version.
 *
 * wayback is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Lesser Public License for more details.
 *
 * You should have received a copy of the GNU Lesser Public License
 * along with wayback; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 */
package org.archive.wayback.proxy;

import java.util.Properties;

import org.apache.commons.httpclient.URIException;
import org.archive.net.UURI;
import org.archive.net.UURIFactory;
import org.archive.wayback.ReplayResultURIConverter;
import org.archive.wayback.WaybackConstants;
import org.archive.wayback.core.SearchResult;
import org.archive.wayback.exception.ConfigurationException;

/**
 *
 *
 * @author brad
 * @version $Date: 2005/11/16 03:11:29 $, $Revision: 1.1 $
 */
public class ResultURIConverter implements ReplayResultURIConverter {

    /* (non-Javadoc)
     * @see org.archive.wayback.ReplayResultURIConverter#init(java.util.Properties)
     */
    public void init(Properties p) throws ConfigurationException {
    }

    /* (non-Javadoc)
     * @see org.archive.wayback.ReplayResultURIConverter#makeReplayURI(org.archive.wayback.core.ResourceResult)
     */
    public String makeReplayURI(SearchResult result) {
        String finalUrl = result.get(WaybackConstants.RESULT_URL);
        if(!finalUrl.startsWith("http://")) {
            finalUrl = "http://" + finalUrl;
        }
        return finalUrl;
    }

    /**
     * @return Returns the replayUriPrefix.
     */
    public String getReplayUriPrefix() {
        return "";
    }

    /* (non-Javadoc)
     * @see org.archive.wayback.ReplayResultURIConverter#makeRedirectReplayURI(org.archive.wayback.core.SearchResult, java.lang.String)
     */
    public String makeRedirectReplayURI(SearchResult result, String url) {
        String finalUrl = url;
        try {
            UURI origURI = UURIFactory.getInstance(url);
            if(!origURI.isAbsoluteURI()) {
                String resultUrl = result.get(WaybackConstants.RESULT_URL);
                UURI absResultURI = UURIFactory.getInstance(resultUrl);
                UURI finalURI = absResultURI.resolve(url);
                finalUrl = finalURI.getEscapedURI();
            }
        } catch (URIException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        if(!finalUrl.startsWith("http://")) {
            finalUrl = "http://" + finalUrl;
        }
        return finalUrl;
    }
}

--- NEW FILE: RawReplayRenderer.java ---
/* ReplayRenderer
 *
 * $Id: RawReplayRenderer.java,v 1.1 2005/11/16 03:11:29 bradtofel Exp $
 *
 * Created on 5:50:38 PM Oct 31, 2005.
 *
 * Copyright (C) 2005 Internet Archive.
 *
 * This file is part of wayback.
 *
 * wayback is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or
 * any later version.
 *
 * wayback is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Lesser Public License for more details.
 *
 * You should have received a copy of the GNU Lesser Public License
 * along with wayback; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 */
package org.archive.wayback.proxy;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Enumeration;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import javax.servlet.RequestDispatcher;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.archive.wayback.ReplayRenderer;
import org.archive.wayback.ReplayResultURIConverter;
import org.archive.wayback.WaybackConstants;
import org.archive.wayback.core.Resource;
import org.archive.wayback.core.SearchResult;
import org.archive.wayback.core.WaybackRequest;
import org.archive.wayback.exception.ConfigurationException;
import org.archive.wayback.exception.WaybackException;

/**
 *
 *
 * @author brad
 * @version $Date: 2005/11/16 03:11:29 $, $Revision: 1.1 $
 */
public class RawReplayRenderer implements ReplayRenderer {
    private final static String JSP_PATH = "replayui.jsppath";

    private final static String HTTP_LENGTH_HEADER= "Content-Length";

    private final static String HTTP_LOCATION_HEADER = "Location";

    protected final Pattern IMAGE_REGEX = Pattern
            .compile(".*\\.(jpg|jpeg|gif|png|bmp|tiff|tif)$");

    private String jspPath;

    private final String ERROR_JSP = "ErrorResult.jsp";

    private final String ERROR_JAVASCRIPT = "ErrorJavascript.jsp";

    private final String ERROR_IMAGE = "error_image.gif";

    public void init(Properties p) throws ConfigurationException {
        this.jspPath = (String) p.get(JSP_PATH);
        if (this.jspPath == null || this.jspPath.length() <= 0) {
            throw new IllegalArgumentException("Failed to find " + JSP_PATH);
        }
    }

    private boolean requestIsEmbedded(HttpServletRequest httpRequest,
            WaybackRequest wbRequest) {
        String referer = wbRequest.get(WaybackConstants.REQUEST_REFERER_URL);
        return (referer != null && referer.length() > 0);
    }

    private boolean requestIsImage (HttpServletRequest httpRequest,
            WaybackRequest wbRequest) {
        String requestUrl = wbRequest.get(WaybackConstants.REQUEST_URL);
        Matcher matcher = IMAGE_REGEX.matcher(requestUrl);
        return (matcher != null && matcher.matches());
    }

    private boolean requestIsJavascript (HttpServletRequest httpRequest,
            WaybackRequest wbRequest) {
        String requestUrl = wbRequest.get(WaybackConstants.REQUEST_URL);
        return requestUrl.endsWith(".js");
    }

    // TODO special handling for Javascript and Images: send empty image
    // or empty text file to avoid client errors
    public void renderException(HttpServletRequest httpRequest,
            HttpServletResponse httpResponse, WaybackRequest wbRequest,
            WaybackException exception) throws ServletException, IOException {

        String finalJspPath = jspPath + "/" + ERROR_JSP;

        // is this object embedded?
        if(requestIsEmbedded(httpRequest,wbRequest)) {
            if(requestIsJavascript(httpRequest,wbRequest)) {
                finalJspPath = jspPath + "/" + ERROR_JAVASCRIPT;
            } else if(requestIsImage(httpRequest,wbRequest)) {
                finalJspPath = jspPath + "/" + ERROR_IMAGE;
            }
        }

        httpRequest.setAttribute("exception", exception);

        RequestDispatcher dispatcher = httpRequest
                .getRequestDispatcher(finalJspPath);

        dispatcher.forward(httpRequest, httpResponse);
    }

    public void renderResource(HttpServletRequest httpRequest,
            HttpServletResponse httpResponse, WaybackRequest wbRequest,
            SearchResult result, Resource resource,
            ReplayResultURIConverter uriConverter) throws ServletException,
            IOException {

        resource.parseHeaders();
        copyRecordHttpHeader(httpResponse, resource, uriConverter, result,
                false);
        copy(resource, httpResponse.getOutputStream());
    }

    protected void copyRecordHttpHeader(HttpServletResponse response,
            Resource resource, ReplayResultURIConverter uriConverter,
            SearchResult result, boolean noLength) throws IOException {

        Properties headers = resource.getHttpHeaders();
        int code = resource.getStatusCode();
        // Only return legit status codes -- don't return any minus
        // codes, etc.
        if (code <= HttpServletResponse.SC_CONTINUE) {
            String identifier = "";
            response.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR,
                    "Bad status code " + code + " (" + identifier + ").");
            return;
        }
        response.setStatus(code);
        if (headers != null) {
            // Copy all headers to the response -- even date and
            // server, but don't copy Content-Length if arguments indicate
            for (Enumeration e = headers.keys(); e.hasMoreElements();) {
                String key = (String) e.nextElement();
                String value = (String) headers.get(key);
                if (noLength) {
                    if (-1 != key.indexOf(HTTP_LENGTH_HEADER)) {
                        continue;
                    }
                }
                if(0 == key.indexOf(HTTP_LOCATION_HEADER)) {
                    value = uriConverter.makeRedirectReplayURI(result,value);
                }
                response.setHeader(key, (value == null) ? "" : value);
            }
        }
    }

    protected void copy(InputStream is, OutputStream os) throws IOException {
        // TODO: Don't allocate everytime.
        byte[] buffer = new byte[4 * 1024];
        for (int r = -1; (r = is.read(buffer, 0, buffer.length)) != -1;) {
            os.write(buffer, 0, r);
        }
    }
}

--- NEW FILE: ReplayFilter.java ---
/* ProxyReplayFilter
 *
 * $Id: ReplayFilter.java,v 1.1 2005/11/16 03:11:29 bradtofel Exp $
 *
 * Created on 6:08:59 PM Nov 14, 2005.
 *
 * Copyright (C) 2005 Internet Archive.
 *
 * This file is part of wayback.
 *
 * wayback is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or
 * any later version.
 *
 * wayback is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Lesser Public License for more details.
 *
 * You should have received a copy of the GNU Lesser Public License
 * along with wayback; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 */
package org.archive.wayback.proxy;

import java.text.ParseException;
import java.util.List;

import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;

import org.archive.util.InetAddressUtil;
import org.archive.wayback.WaybackConstants;
import org.archive.wayback.core.Timestamp;
import org.archive.wayback.core.RequestFilter;
import org.archive.wayback.core.WaybackRequest;

/**
 *
 *
 * @author brad
 * @version $Date: 2005/11/16 03:11:29 $, $Revision: 1.1 $
 */
public class ReplayFilter extends RequestFilter {
    private List localhostNames = null;

    public ReplayFilter() {
        super();
    }

    public void init(final FilterConfig c) throws ServletException {
        this.localhostNames = InetAddressUtil.getAllLocalHostNames();
        super.init(c);
    }

    /* (non-Javadoc)
     * @see org.archive.wayback.core.RequestFilter#parseRequest(javax.servlet.http.HttpServletRequest)
     */
    @Override
    protected WaybackRequest parseRequest(HttpServletRequest httpRequest) {
        WaybackRequest wbRequest = null;
        if(isLocalRequest(httpRequest)) {
            return wbRequest;
        }
        String requestServer = httpRequest.getServerName();
        String requestPath = httpRequest.getRequestURI();
        //int port = httpRequest.getServerPort();
        String requestQuery = httpRequest.getQueryString();
        String requestScheme = httpRequest.getScheme();
        if (requestQuery != null) {
            requestPath = requestPath + "?" + requestQuery;
        }

        String requestUrl = requestScheme + "://" + requestServer + requestPath;

        wbRequest = new WaybackRequest();
        wbRequest.put(WaybackConstants.REQUEST_URL,requestUrl);
        wbRequest.put(WaybackConstants.REQUEST_TYPE,
                WaybackConstants.REQUEST_REPLAY_QUERY);
        String referer = httpRequest.getHeader("REFERER");
        if (referer == null) {
            referer = "";
        }
        wbRequest.put(WaybackConstants.REQUEST_REFERER_URL,referer);
        try {
            wbRequest.put(WaybackConstants.REQUEST_EXACT_DATE,
                    Timestamp.currentTimestamp().getDateStr());
        } catch (ParseException e) {
            // Shouldn't happen...
            e.printStackTrace();
        }

        return wbRequest;
    }

    protected boolean isLocalRequest(HttpServletRequest httpRequest) {
        return this.localhostNames.contains(httpRequest.getServerName());
    }
}
|
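For readers following the redirect handling in makeRedirectReplayURI above, here is a minimal sketch of the same idea using only the JDK. It is not the committed code: java.net.URI stands in for the Heritrix UURI classes, the class and method names are hypothetical, and the scheme check is "contains ://" rather than the literal "http://" prefix test used in the commit.

```java
import java.net.URI;

/** Hypothetical helper for illustration; not part of the wayback sources. */
final class RedirectResolverSketch {

    /**
     * Resolves a Location header value against the URL of the captured
     * document. Absolute values are returned unchanged; relative values
     * are resolved against the capture's URL.
     */
    static String resolveRedirect(String captureUrl, String location) {
        // Assumption: index URLs may be stored without a scheme, so default
        // to http:// in that case, as the committed code does.
        String base = captureUrl.contains("://")
                ? captureUrl
                : "http://" + captureUrl;
        URI target = URI.create(location);
        if (target.isAbsolute()) {
            return target.toString();
        }
        // URI.resolve applies standard relative-reference resolution.
        return URI.create(base).resolve(target).toString();
    }

    public static void main(String[] args) {
        System.out.println(
                resolveRedirect("www.example.org/a/b.html", "news/index.html"));
        System.out.println(
                resolveRedirect("http://www.example.org/a/", "http://other.example/x"));
    }
}
```

In proxy mode the resolved URL can be returned to the browser as-is, since the browser sends every request through the proxy anyway; that appears to be why getReplayUriPrefix returns an empty string here.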
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/arcindexer
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30992/src/java/org/archive/wayback/arcindexer

Removed Files:
    IndexPipeline.java PipelineFilter.java ArcIndexer.java
    PipelineStatus.java BDBResourceIndexWriter.java
Log Message:
Massive overhaul decomposing into three main categories of changes:

1) All internal datatypes are now extensible (currently Properties, but
   should be Maps) including:
     a) WaybackRequest (was WBRequest)
     b) SearchResults (was ResourceResults)
     c) SearchResult (was ResourceResult)
     d) Resource
   so that there is no longer an assumption of Archival URL queries, or
   "CDX-style" index results. This will put more responsibility on the UI
   components to interrogate SearchResults to decide how to render, but
   should enable extension to data returned from Indexes, as well as allow
   far more flexibility in queries, predominantly geared towards free-text
   searching. This is still somewhat clunky, as there are no convenience
   accessor methods, so all users refer to constants when interacting with
   them.

2) Major cleanup of servlet and filter interaction with servlet container.
   ReplayUI and QueryUI are now just plain old servlets, and filters can be
   optionally added to allow non-CGI argument requests to be coerced into
   standard WaybackRequest objects.

3) Alternate "Proxy" Replay mode is now functional, and some work has been
   done towards an alternate Nutch ResourceIndex.

Currently the web.xml contains example configurations for both Proxy and
Archival Url replay modes, but the Proxy related configurations are
commented out. Proxy mode *requires* changing the servlet context to ROOT.
ArchivalUrl replay mode works as ROOT context and as any (I think) other
context. There are some cosmetic double-slash issues to work out.

--- BDBResourceIndexWriter.java DELETED ---

--- ArcIndexer.java DELETED ---

--- PipelineStatus.java DELETED ---

--- IndexPipeline.java DELETED ---

--- PipelineFilter.java DELETED ---
|
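Point 1 of the log message describes datatypes that are backed by Properties and addressed through shared constants, with no convenience accessors yet. A rough sketch of that pattern follows. Everything in it is an assumption made for illustration: the class name, the key strings, and the two accessor methods are hypothetical and are not taken from WaybackRequest or WaybackConstants.

```java
import java.util.Properties;

/** Hypothetical stand-in for a Properties-backed request object. */
final class PropertyBackedRequestSketch {
    // Hypothetical keys; the real values live in WaybackConstants and are
    // not shown in this commit mail.
    static final String REQUEST_URL = "url";
    static final String REQUEST_REFERER_URL = "refererurl";

    private final Properties fields = new Properties();

    /** Generic setter: any component may attach any key. */
    void put(String key, String value) {
        fields.setProperty(key, value);
    }

    /** Generic getter: returns null when the key was never set. */
    String get(String key) {
        return fields.getProperty(key);
    }

    /** A convenience accessor of the kind the log message says is missing. */
    String getRequestUrl() {
        return get(REQUEST_URL);
    }

    /** Same test RawReplayRenderer applies: a non-empty referer. */
    boolean isEmbedded() {
        String referer = get(REQUEST_REFERER_URL);
        return referer != null && referer.length() > 0;
    }
}
```

The generic put/get pair is what makes the type extensible, since a new index or UI component can add keys without changing the class. The cost is the one the log message names: callers must know the constants, and a mistyped key fails silently with null.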
From: Brad <bra...@us...> - 2005-11-16 03:11:29
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/proxy
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30954/src/java/org/archive/wayback/proxy

Log Message:
Directory /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/proxy added to the repository
|
From: Brad <bra...@us...> - 2005-11-16 03:11:29
|
Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/archivalurl
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30954/src/java/org/archive/wayback/archivalurl

Log Message:
Directory /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/archivalurl added to the repository
|