From: <bi...@us...> - 2010-02-12 20:39:58
Revision: 2955
http://archive-access.svn.sourceforge.net/archive-access/?rev=2955&view=rev
Author: binzino
Date: 2010-02-12 20:39:44 +0000 (Fri, 12 Feb 2010)
Log Message:
-----------
Fix counting of hits and add use of hitsPerSite argument.
Modified Paths:
--------------
trunk/archive-access/projects/nutchwax/archive/src/nutch/src/java/org/apache/nutch/searcher/NutchBean.java
Modified: trunk/archive-access/projects/nutchwax/archive/src/nutch/src/java/org/apache/nutch/searcher/NutchBean.java
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/src/nutch/src/java/org/apache/nutch/searcher/NutchBean.java 2010-01-29 00:22:22 UTC (rev 2954)
+++ trunk/archive-access/projects/nutchwax/archive/src/nutch/src/java/org/apache/nutch/searcher/NutchBean.java 2010-02-12 20:39:44 UTC (rev 2955)
@@ -431,10 +431,10 @@
try {
final Query query = Query.parse( queryString, conf);
- final Hits hits = bean.search(query, numHits);
+ final Hits hits = bean.search(query, numHits, hitsPerSite);
System.out.println( "Total hits : " + hits.getTotal () );
System.out.println( "Hits length: " + hits.getLength() );
- final int length = (int)Math.min(hits.getTotal(), numHits);
+ final int length = (int)Math.min(hits.getLength(), numHits);
final Hit[] show = hits.getHits(0, length);
final HitDetails[] details = bean.getDetails(show);
final Summary[] summaries = bean.getSummary(details, query);
This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
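The second hunk line in r2955 clamps the display length by hits.getLength() instead of hits.getTotal(). A minimal standalone sketch of why that matters, assuming the returned array can be shorter than the requested count (for example after per-site de-duplication). Plain numbers stand in for Nutch's Hits class, and the class and method names below are hypothetical, not part of Nutch:

```java
class HitWindow {
    // Hypothetical helper: number of hits that can safely be fetched.
    // 'returned' stands in for hits.getLength(), 'requested' for numHits.
    static int showCount(long returned, int requested) {
        return (int) Math.min(returned, requested);
    }

    public static void main(String[] args) {
        long total = 1000;   // stands in for hits.getTotal(): all matches in the index
        long returned = 3;   // stands in for hits.getLength(): hits actually held
        int requested = 10;  // stands in for numHits

        // Clamping by the total would ask for 10 hits when only 3 are held.
        int byTotal = (int) Math.min(total, requested);
        int byLength = showCount(returned, requested);
        System.out.println("by total: " + byTotal + ", by length: " + byLength);
    }
}
```

Clamping by the held length keeps the later getHits(0, length) call inside the bounds of what was actually returned.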
From: <bi...@us...> - 2010-01-29 00:50:29
Revision: 2953
http://archive-access.svn.sourceforge.net/archive-access/?rev=2953&view=rev
Author: binzino
Date: 2010-01-29 00:20:42 +0000 (Fri, 29 Jan 2010)
Log Message:
-----------
Updated to use NutchBean since NutchWaxBean is deprecated. Also fixed bug in NutchBean not observing the -n parameter.
Modified Paths:
--------------
trunk/archive-access/projects/nutchwax/archive/bin/nutchwax
trunk/archive-access/projects/nutchwax/archive/src/nutch/src/java/org/apache/nutch/searcher/NutchBean.java
Modified: trunk/archive-access/projects/nutchwax/archive/bin/nutchwax
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/bin/nutchwax 2010-01-19 22:11:50 UTC (rev 2952)
+++ trunk/archive-access/projects/nutchwax/archive/bin/nutchwax 2010-01-29 00:20:42 UTC (rev 2953)
@@ -80,7 +80,7 @@
;;
search)
shift
- ${NUTCH_HOME}/bin/nutch org.archive.nutchwax.NutchWaxBean "$@"
+ ${NUTCH_HOME}/bin/nutch org.apache.nutch.searcher.NutchBean "$@"
;;
*)
echo ""
Modified: trunk/archive-access/projects/nutchwax/archive/src/nutch/src/java/org/apache/nutch/searcher/NutchBean.java
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/src/nutch/src/java/org/apache/nutch/searcher/NutchBean.java 2010-01-19 22:11:50 UTC (rev 2952)
+++ trunk/archive-access/projects/nutchwax/archive/src/nutch/src/java/org/apache/nutch/searcher/NutchBean.java 2010-01-29 00:20:42 UTC (rev 2953)
@@ -431,10 +431,10 @@
try {
final Query query = Query.parse( queryString, conf);
- final Hits hits = bean.search(query, 10);
+ final Hits hits = bean.search(query, numHits);
System.out.println( "Total hits : " + hits.getTotal () );
System.out.println( "Hits length: " + hits.getLength() );
- final int length = (int)Math.min(hits.getTotal(), 10);
+ final int length = (int)Math.min(hits.getTotal(), numHits);
final Hit[] show = hits.getHits(0, length);
final HitDetails[] details = bean.getDetails(show);
final Summary[] summaries = bean.getSummary(details, query);
From: <bi...@us...> - 2010-01-29 00:43:10
Revision: 2954
http://archive-access.svn.sourceforge.net/archive-access/?rev=2954&view=rev
Author: binzino
Date: 2010-01-29 00:22:22 +0000 (Fri, 29 Jan 2010)
Log Message:
-----------
Added code (from NutchWAX 0.12.x) to handle cases where segment is missing or record is not found in the segment.
Modified Paths:
--------------
trunk/archive-access/projects/nutchwax/archive/src/nutch/src/java/org/apache/nutch/searcher/FetchedSegments.java
Modified: trunk/archive-access/projects/nutchwax/archive/src/nutch/src/java/org/apache/nutch/searcher/FetchedSegments.java
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/src/nutch/src/java/org/apache/nutch/searcher/FetchedSegments.java 2010-01-29 00:20:42 UTC (rev 2953)
+++ trunk/archive-access/projects/nutchwax/archive/src/nutch/src/java/org/apache/nutch/searcher/FetchedSegments.java 2010-01-29 00:22:22 UTC (rev 2954)
@@ -340,10 +340,26 @@
if (this.summarizer == null) { return new Summary(); }
- final Segment segment = getSegment(details);
- final ParseText parseText = segment.getParseText(getKey(details));
- final String text = (parseText != null) ? parseText.getText() : "";
+ String text = "";
+ Segment segment = getSegment(details);
+ if ( segment != null )
+ {
+ try
+ {
+ ParseText parseText = segment.getParseText(getKey(details));
+ text = (parseText != null ) ? parseText.getText() : "";
+ }
+ catch ( Exception e )
+ {
+ LOG.error( "segment = " + segment.segmentDir, e );
+ }
+ }
+ else
+ {
+ LOG.warn( "No segment for: " + details );
+ }
+
return this.summarizer.getSummary(text, query);
}
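The pattern in r2954 is to fall back to empty text when the segment is missing, the record is absent, or the lookup throws, so that a summary can still be produced. A minimal standalone sketch of that fallback, assuming a hypothetical TextSource interface in place of Nutch's Segment and ParseText classes:

```java
class SafeTextLookup {
    // Hypothetical stand-in for a segment that can look up parsed text by key.
    interface TextSource {
        String parseText(String key) throws Exception;
    }

    // Returns the parsed text, or "" if the source is missing, the record
    // is not found, or the lookup fails.
    static String textFor(TextSource segment, String key) {
        if (segment == null) {
            System.err.println("No segment for: " + key);
            return "";
        }
        try {
            String text = segment.parseText(key);
            return (text != null) ? text : "";
        } catch (Exception e) {
            System.err.println("Lookup failed for " + key + ": " + e);
            return "";
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + textFor(null, "k") + "]");
        System.out.println("[" + textFor(k -> "some text", "k") + "]");
    }
}
```

The trade-off is that a broken segment yields an empty summary instead of a failed search request; the log output is the only trace of the problem.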
From: <bi...@us...> - 2010-01-19 22:11:58
Revision: 2952
http://archive-access.svn.sourceforge.net/archive-access/?rev=2952&view=rev
Author: binzino
Date: 2010-01-19 22:11:50 +0000 (Tue, 19 Jan 2010)
Log Message:
-----------
Added property check to disable writing of crawl datum.
Modified Paths:
--------------
tags/nutchwax-0_12_9/archive/src/java/org/archive/nutchwax/Importer.java
Modified: tags/nutchwax-0_12_9/archive/src/java/org/archive/nutchwax/Importer.java
===================================================================
--- tags/nutchwax-0_12_9/archive/src/java/org/archive/nutchwax/Importer.java 2010-01-13 01:31:20 UTC (rev 2951)
+++ tags/nutchwax-0_12_9/archive/src/java/org/archive/nutchwax/Importer.java 2010-01-19 22:11:50 UTC (rev 2952)
@@ -465,7 +465,11 @@
try
{
- output.collect( key, new NutchWritable( datum ) );
+ // Back-port this little change from NutchWAX 0.13. We don't need the crawl datum.
+ if ( jobConf.getBoolean( "nutchwax.import.store.crawl", false ) )
+ {
+ output.collect( key, new NutchWritable( datum ) );
+ }
if ( jobConf.getBoolean( "nutchwax.import.store.content", false ) )
{
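The r2952 change gates the crawl datum write behind a boolean property that defaults to false, so the write is skipped unless explicitly enabled. A minimal standalone sketch of that default-off lookup, using java.util.Properties as a stand-in for the job configuration; the ImportSettings class and isEnabled method are hypothetical names for illustration:

```java
import java.util.Properties;

class ImportSettings {
    // Returns the boolean value of a property, or defaultValue when unset.
    static boolean isEnabled(Properties conf, String name, boolean defaultValue) {
        String raw = conf.getProperty(name);
        return (raw == null) ? defaultValue : Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();

        // Unset: the crawl datum would not be written.
        System.out.println(isEnabled(conf, "nutchwax.import.store.crawl", false));

        // Explicitly enabled: the crawl datum would be written.
        conf.setProperty("nutchwax.import.store.crawl", "true");
        System.out.println(isEnabled(conf, "nutchwax.import.store.crawl", false));
    }
}
```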
Revision: 2951
http://archive-access.svn.sourceforge.net/archive-access/?rev=2951&view=rev
Author: bradtofel
Date: 2010-01-13 01:31:20 +0000 (Wed, 13 Jan 2010)
Log Message:
-----------
BUGFIX(unreported): ID String was incorrect due to very old copy-paste problem
Modified Paths:
--------------
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/exception/LiveDocumentNotAvailableException.java
Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/exception/LiveDocumentNotAvailableException.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/exception/LiveDocumentNotAvailableException.java 2010-01-13 01:30:27 UTC (rev 2950)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/exception/LiveDocumentNotAvailableException.java 2010-01-13 01:31:20 UTC (rev 2951)
@@ -39,7 +39,7 @@
*
*/
private static final long serialVersionUID = 1L;
- protected static final String ID = "resourceIndexNotAvailable";
+ protected static final String ID = "liveDocumentNotAvailable";
protected static final String defaultMessage = "Live document unavailable";
/**
* Constructor
Revision: 2950
http://archive-access.svn.sourceforge.net/archive-access/?rev=2950&view=rev
Author: bradtofel
Date: 2010-01-13 01:30:27 +0000 (Wed, 13 Jan 2010)
Log Message:
-----------
INITIAL REV: Exception to indicate unexpected and uncorrectable problems with the LiveWebCache
Added Paths:
-----------
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/exception/LiveWebCacheUnavailableException.java
Added: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/exception/LiveWebCacheUnavailableException.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/exception/LiveWebCacheUnavailableException.java (rev 0)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/exception/LiveWebCacheUnavailableException.java 2010-01-13 01:30:27 UTC (rev 2950)
@@ -0,0 +1,79 @@
+/* LiveWebCacheUnavailableException
+ *
+ * $Id$:
+ *
+ * Created on Dec 18, 2009.
+ *
+ * Copyright (C) 2006 Internet Archive.
+ *
+ * This file is part of Wayback.
+ *
+ * Wayback is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License, or
+ * any later version.
+ *
+ * Wayback is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser Public License
+ * along with Wayback; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+package org.archive.wayback.exception;
+
+import java.net.URL;
+
+import javax.servlet.http.HttpServletResponse;
+
+/**
+ * @author brad
+ *
+ */
+public class LiveWebCacheUnavailableException extends WaybackException {
+ /**
+ *
+ */
+ private static final long serialVersionUID = 1L;
+ protected static final String ID = "liveWebCacheNotAvailable";
+ protected static final String defaultMessage = "LiveWebCache unavailable";
+ /**
+ * Constructor
+ * @param url
+ * @param code
+ */
+ public LiveWebCacheUnavailableException(URL url, int code) {
+ super("The URL " + url.toString() + " is not available(HTTP " + code +
+ " returned)",defaultMessage);
+ id = ID;
+ }
+ /**
+ * Constructor with message and details
+ * @param url
+ * @param code
+ * @param details
+ */
+ public LiveWebCacheUnavailableException(URL url, int code, String details){
+ super("The URL " + url.toString() + " is not available(HTTP " + code +
+ " returned)",defaultMessage,details);
+ id = ID;
+ }
+ /**
+ * @param url
+ */
+ public LiveWebCacheUnavailableException(String url){
+ super("The URL " + url + " is not available",defaultMessage);
+ id = ID;
+ }
+
+ /**
+ * @return the HTTP status code appropriate to this exception class.
+ */
+ public int getStatus() {
+ return HttpServletResponse.SC_BAD_GATEWAY;
+ }
+
+}
Property changes on: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/exception/LiveWebCacheUnavailableException.java
___________________________________________________________________
Added: svn:keywords
+ Author Date Revision Id
Revision: 2949
http://archive-access.svn.sourceforge.net/archive-access/?rev=2949&view=rev
Author: bradtofel
Date: 2010-01-13 01:28:52 +0000 (Wed, 13 Jan 2010)
Log Message:
-----------
FEATURE: added logging
Modified Paths:
--------------
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/accesscontrol/robotstxt/RobotExclusionFilter.java
Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/accesscontrol/robotstxt/RobotExclusionFilter.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/accesscontrol/robotstxt/RobotExclusionFilter.java 2010-01-13 01:26:57 UTC (rev 2948)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/accesscontrol/robotstxt/RobotExclusionFilter.java 2010-01-13 01:28:52 UTC (rev 2949)
@@ -31,6 +31,7 @@
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
+import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
@@ -38,6 +39,7 @@
import org.archive.wayback.core.Resource;
import org.archive.wayback.core.CaptureSearchResult;
import org.archive.wayback.exception.LiveDocumentNotAvailableException;
+import org.archive.wayback.exception.LiveWebCacheUnavailableException;
import org.archive.wayback.liveweb.LiveWebCache;
import org.archive.wayback.util.ObjectFilter;
@@ -58,6 +60,8 @@
*/
public class RobotExclusionFilter implements ObjectFilter<CaptureSearchResult> {
+ private final static Logger LOGGER = Logger.getLogger(RobotExclusionFilter.class.getName());
+
private final static String HTTP_PREFIX = "http://";
private final static String ROBOT_SUFFIX = "/robots.txt";
@@ -142,18 +146,28 @@
firstUrlString = urlString;
}
if(rulesCache.containsKey(urlString)) {
+ LOGGER.fine("ROBOT: Cached("+urlString+")");
rules = rulesCache.get(urlString);
} else {
try {
-
+ LOGGER.fine("ROBOT: NotCached("+urlString+")");
+
tmpRules = new RobotRules();
Resource resource = webCache.getCachedResource(new URL(urlString),
maxCacheMS,true);
+ if(resource.getStatusCode() != 200) {
+ LOGGER.info("ROBOT: NotAvailable("+urlString+")");
+ throw new LiveDocumentNotAvailableException(urlString);
+ }
tmpRules.parse(resource);
rulesCache.put(firstUrlString,tmpRules);
rules = tmpRules;
+ LOGGER.info("ROBOT: Downloaded("+urlString+")");
} catch (LiveDocumentNotAvailableException e) {
+ // cache an empty rule: all OK
+// rulesCache.put(firstUrlString, emptyRules);
+// rules = emptyRules;
continue;
} catch (MalformedURLException e) {
e.printStackTrace();
@@ -161,6 +175,9 @@
} catch (IOException e) {
e.printStackTrace();
return null;
+ } catch (LiveWebCacheUnavailableException e) {
+ e.printStackTrace();
+ return null;
}
}
}
@@ -186,6 +203,8 @@
url = new URL(ArchiveUtils.addImpliedHttpIfNecessary(resultURL));
if(!rules.blocksPathForUA(url.getPath(), userAgent)) {
filterResult = ObjectFilter.FILTER_INCLUDE;
+ } else {
+ LOGGER.info("ROBOT: BLOCKED("+resultURL+")");
}
} catch (MalformedURLException e) {
e.printStackTrace();
From: <bra...@us...> - 2010-01-13 01:27:04
Revision: 2948
http://archive-access.svn.sourceforge.net/archive-access/?rev=2948&view=rev
Author: bradtofel
Date: 2010-01-13 01:26:57 +0000 (Wed, 13 Jan 2010)
Log Message:
-----------
FEATURE: added proxy host & port configuration methods
Modified Paths:
--------------
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/accesscontrol/oracleclient/OracleExclusionFilter.java
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/accesscontrol/oracleclient/OracleExclusionFilterFactory.java
Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/accesscontrol/oracleclient/OracleExclusionFilter.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/accesscontrol/oracleclient/OracleExclusionFilter.java 2010-01-13 00:26:44 UTC (rev 2947)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/accesscontrol/oracleclient/OracleExclusionFilter.java 2010-01-13 01:26:57 UTC (rev 2948)
@@ -51,7 +51,24 @@
* @param accessGroup String group to use with requests to the Oracle
*/
public OracleExclusionFilter(String oracleUrl, String accessGroup) {
+ this(oracleUrl,accessGroup,null);
+ }
+ /**
+ * @param oracleUrl String URL prefix for the Oracle HTTP server
+ * @param accessGroup String group to use with requests to the Oracle
+ * @param proxyHostPort String proxyHost:proxyPort to use for robots.txt
+ */
+ public OracleExclusionFilter(String oracleUrl, String accessGroup,
+ String proxyHostPort) {
client = new AccessControlClient(oracleUrl);
+ if(proxyHostPort != null) {
+ int colonIdx = proxyHostPort.indexOf(':');
+ if(colonIdx > 0) {
+ String host = proxyHostPort.substring(0,colonIdx);
+ int port = Integer.valueOf(proxyHostPort.substring(colonIdx+1));
+ client.setRobotProxy(host, port);
+ }
+ }
this.accessGroup = accessGroup;
}
Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/accesscontrol/oracleclient/OracleExclusionFilterFactory.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/accesscontrol/oracleclient/OracleExclusionFilterFactory.java 2010-01-13 00:26:44 UTC (rev 2947)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/accesscontrol/oracleclient/OracleExclusionFilterFactory.java 2010-01-13 01:26:57 UTC (rev 2948)
@@ -38,10 +38,11 @@
private String oracleUrl = null;
private String accessGroup = null;
+ private String proxyHostPort = null;
public ObjectFilter<CaptureSearchResult> get() {
OracleExclusionFilter filter = new OracleExclusionFilter(oracleUrl,
- accessGroup);
+ accessGroup, proxyHostPort);
return filter;
}
@@ -77,4 +78,18 @@
this.accessGroup = accessGroup;
}
+ /**
+ * @return the proxyHostPort
+ */
+ public String getProxyHostPort() {
+ return proxyHostPort;
+ }
+
+ /**
+ * @param proxyHostPort the proxyHostPort to set, ex. "localhost:3128"
+ */
+ public void setProxyHostPort(String proxyHostPort) {
+ this.proxyHostPort = proxyHostPort;
+ }
+
}
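The proxyHostPort setting in r2948 is a single "host:port" string (for example "localhost:3128") that the filter splits at the first colon. A minimal standalone sketch of that parsing, assuming the same rule as the diff (a null value or a value without a host part is ignored); the ProxyHostPort class is a hypothetical name, not part of Wayback:

```java
class ProxyHostPort {
    final String host;
    final int port;

    ProxyHostPort(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Returns null when spec is null or has no host before the colon,
    // mirroring the diff, which skips proxy setup in those cases.
    // A non-numeric or empty port throws NumberFormatException.
    static ProxyHostPort parse(String spec) {
        if (spec == null) {
            return null;
        }
        int colonIdx = spec.indexOf(':');
        if (colonIdx <= 0) {
            return null;
        }
        String host = spec.substring(0, colonIdx);
        int port = Integer.parseInt(spec.substring(colonIdx + 1));
        return new ProxyHostPort(host, port);
    }

    public static void main(String[] args) {
        ProxyHostPort p = parse("localhost:3128");
        System.out.println(p.host + " " + p.port);
    }
}
```

Note that a value such as "localhost:" or "localhost:abc" is not silently ignored; it fails with a NumberFormatException, so a configuration typo would surface when the filter is constructed.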
From: <bi...@us...> - 2010-01-13 01:16:00
Revision: 2946
http://archive-access.svn.sourceforge.net/archive-access/?rev=2946&view=rev
Author: binzino
Date: 2010-01-13 00:14:06 +0000 (Wed, 13 Jan 2010)
Log Message:
-----------
Completed fix for WAX-68.
Modified Paths:
--------------
tags/nutchwax-0_12_9/archive/src/nutch/src/java/org/apache/nutch/searcher/FetchedSegments.java
Modified: tags/nutchwax-0_12_9/archive/src/nutch/src/java/org/apache/nutch/searcher/FetchedSegments.java
===================================================================
--- tags/nutchwax-0_12_9/archive/src/nutch/src/java/org/apache/nutch/searcher/FetchedSegments.java 2010-01-12 22:27:18 UTC (rev 2945)
+++ tags/nutchwax-0_12_9/archive/src/nutch/src/java/org/apache/nutch/searcher/FetchedSegments.java 2010-01-13 00:14:06 UTC (rev 2946)
@@ -233,20 +233,25 @@
}
String version = fields[1];
- if ( ! ( "10".equals( version ) || "12".equals( version ) ) )
+ if ( "10".equals( version ) )
{
- LOG.warn( "Malformed versions line, invalid version ("+version+"): " + version );
- continue;
+ LOG.info( "Version: " + fields[0] + " : " + fields[1] );
+ if ( this.oldFormatSegments == null )
+ {
+ this.oldFormatSegments = new HashSet( );
+ }
+
+ this.oldFormatSegments.add( segment );
}
-
- LOG.info( "Version: " + fields[0] + " : " + fields[1] );
-
- if ( this.oldFormatSegments == null )
+ else if ( "12".equals( version ) )
{
- this.oldFormatSegments = new HashSet( );
+ LOG.info( "Version: " + fields[0] + " : " + fields[1] );
+ // For version 12, nothing to do.
}
-
- this.oldFormatSegments.add( segment );
+ else
+ {
+ LOG.warn( "Malformed versions line, invalid version ("+version+"): " + line );
+ }
}
}
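After r2946, only segments listed with version 10 are recorded as old-format; version 12 lines are accepted without further action, and any other version is logged and skipped. A minimal standalone sketch of that classification, assuming each line of the "versions" file is a segment name and a version separated by whitespace (the splitting code is not shown in the diff, so that part is an assumption, as is the SegmentVersions class name):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SegmentVersions {
    // Returns the names of segments listed as version 10 (old format).
    // Segments listed as 12, or not listed at all, are treated as current.
    static Set<String> oldFormatSegments(List<String> lines) {
        Set<String> old = new HashSet<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length != 2) {
                continue; // blank or malformed line: skip it
            }
            String segment = fields[0];
            String version = fields[1];
            if ("10".equals(version)) {
                old.add(segment);
            } else if ("12".equals(version)) {
                // For version 12, nothing to do.
            } else {
                System.err.println("Malformed versions line, invalid version ("
                        + version + "): " + line);
            }
        }
        return old;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("foo-segment 10", "bar-segment 12");
        System.out.println(oldFormatSegments(lines));
    }
}
```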
From: <bi...@us...> - 2010-01-13 00:26:52
Revision: 2947
http://archive-access.svn.sourceforge.net/archive-access/?rev=2947&view=rev
Author: binzino
Date: 2010-01-13 00:26:44 +0000 (Wed, 13 Jan 2010)
Log Message:
-----------
Updated for 0.12.9 release.
Modified Paths:
--------------
tags/nutchwax-0_12_9/archive/BUILD-NOTES.txt
tags/nutchwax-0_12_9/archive/HOWTO.txt
tags/nutchwax-0_12_9/archive/INSTALL.txt
tags/nutchwax-0_12_9/archive/RELEASE-NOTES.txt
Modified: tags/nutchwax-0_12_9/archive/BUILD-NOTES.txt
===================================================================
--- tags/nutchwax-0_12_9/archive/BUILD-NOTES.txt 2010-01-13 00:14:06 UTC (rev 2946)
+++ tags/nutchwax-0_12_9/archive/BUILD-NOTES.txt 2010-01-13 00:26:44 UTC (rev 2947)
@@ -1,6 +1,6 @@
BUILD-NOTES.txt
-2009-09-18
+2010-01-13
Aaron Binns
======================================================================
@@ -79,7 +79,7 @@
----------------------------------------------------------------------
The file
- /opt/nutchwax-0.12.8/conf/tika-mimetypes.xml
+ /opt/nutchwax-0.12.9/conf/tika-mimetypes.xml
contains two errors: one where a mimetype is referenced before it is
defined; and a second where a definition has an illegal character.
@@ -110,11 +110,11 @@
You can either apply these patches yourself, or copy an already-patched
copy from:
- /opt/nutchwax-0.12.8/contrib/archive/conf/tika-mimetypes.xml
+ /opt/nutchwax-0.12.9/contrib/archive/conf/tika-mimetypes.xml
to
- /opt/nutchwax-0.12.8/conf/tika-mimetypes.xml
+ /opt/nutchwax-0.12.9/conf/tika-mimetypes.xml
----------------------------------------------------------------------
Modified: tags/nutchwax-0_12_9/archive/HOWTO.txt
===================================================================
--- tags/nutchwax-0_12_9/archive/HOWTO.txt 2010-01-13 00:14:06 UTC (rev 2946)
+++ tags/nutchwax-0_12_9/archive/HOWTO.txt 2010-01-13 00:26:44 UTC (rev 2947)
@@ -1,6 +1,6 @@
HOWTO.txt
-2009-09-18
+2010-01-13
Aaron Binns
Table of Contents
@@ -26,7 +26,7 @@
This HOWTO assumes it is installed in
- /opt/nutchwax-0.12.8
+ /opt/nutchwax-0.12.9
2. ARC/WARC files.
@@ -68,14 +68,10 @@
$ mkdir crawl
$ cd crawl
- $ /opt/nutchwax-0.12.8/bin/nutchwax import ../manifest
- $ /opt/nutchwax-0.12.8/bin/nutch updatedb crawldb -dir segments
- $ /opt/nutchwax-0.12.8/bin/nutch invertlinks linkdb -dir segments
- $ /opt/nutchwax-0.12.8/bin/nutch index indexes crawldb linkdb segments/*
+ $ /opt/nutchwax-0.12.9/bin/nutchwax import ../manifest
+ $ /opt/nutchwax-0.12.9/bin/nutchwax index indexes segments/*
$ ls -F1
- crawldb/
indexes/
- linkdb/
segments/
To those already familiar with Nutch, these steps should be quite
@@ -96,7 +92,7 @@
$ cd ../
$ ls -F1
crawl/
- $ /opt/nutchwax-0.12.8/bin/nutchwax search computer
+ $ /opt/nutchwax-0.12.9/bin/nutchwax search computer
This calls the NutchWaxBean to execute a simple keyword search for
"computer". Use whatever query term you think appears in the
@@ -109,7 +105,7 @@
The Nutch(WAX) web application is bundled with NutchWAX as
- /opt/nutchwax-0.12.8/nutch-1.0-dev.war
+ /opt/nutchwax-0.12.9/nutch-1.0-dev.war
Simply deploy that web application in the same fashion as with
Nutch.
Modified: tags/nutchwax-0_12_9/archive/INSTALL.txt
===================================================================
--- tags/nutchwax-0_12_9/archive/INSTALL.txt 2010-01-13 00:14:06 UTC (rev 2946)
+++ tags/nutchwax-0_12_9/archive/INSTALL.txt 2010-01-13 00:26:44 UTC (rev 2947)
@@ -1,6 +1,6 @@
INSTALL.txt
-2009-09-18
+2010-01-13
Aaron Binns
Table of Contents
@@ -63,10 +63,10 @@
------------------
As mentioned above, NutchWAX 0.12 is built against Nutch-1.0-dev.
Although the Nutch project released 1.0 in early 2009, there were so
-many changes that NutchWAX 0.12.8 is still built against pre-1.0
+many changes that NutchWAX 0.12.9 is still built against pre-1.0
codebase.
-The specific SVN revision that NutchWAX 0.12.8 is built against is:
+The specific SVN revision that NutchWAX 0.12.9 is built against is:
701524
@@ -81,14 +81,14 @@
SVN: NutchWAX
-------------
-Once you have Nutch-1.0-dev checked-out, check-out NutchWAX 0.12.8
+Once you have Nutch-1.0-dev checked-out, check-out NutchWAX 0.12.9
source into Nutch's "contrib" directory.
$ cd contrib
$ svn checkout http://archive-access.svn.sourceforge.net/svnroot/archive-access/tags/nutchwax-0_12_8/archive
This will create a sub-directory named "archive" containing the
-NutchWAX 0.12.8 sources.
+NutchWAX 0.12.9 sources.
Build and install
-----------------
@@ -115,7 +115,7 @@
$ cd /opt
$ tar xvfz nutch-1.0-dev.tar.gz
- $ mv nutch-1.0-dev nutchwax-0.12.8
+ $ mv nutch-1.0-dev nutchwax-0.12.9
======================================================================
@@ -128,24 +128,24 @@
Install it simply by untarring it, for example:
$ cd /opt
- $ tar xvfz nutchwax-0.12.8.tar.gz
+ $ tar xvfz nutchwax-0.12.9.tar.gz
======================================================================
Install start-up scripts
======================================================================
-NutchWAX 0.12.8 comes with a Unix init.d script which can be used to
+NutchWAX 0.12.9 comes with a Unix init.d script which can be used to
automatically start the searcher slaves for a multi-node search
configuration.
Assuming you installed NutchWAX as
- /opt/nutchwax-0.12.8
+ /opt/nutchwax-0.12.9
the script is found at
- /opt/nutchwax-0.12.8/contrib/archive/etc/init.d/searcher-slave
+ /opt/nutchwax-0.12.9/contrib/archive/etc/init.d/searcher-slave
This script can be placed in /etc/init.d then added to the list of
startup scripts to run at bootup by using commands appropriate to your
Modified: tags/nutchwax-0_12_9/archive/RELEASE-NOTES.txt
===================================================================
--- tags/nutchwax-0_12_9/archive/RELEASE-NOTES.txt 2010-01-13 00:14:06 UTC (rev 2946)
+++ tags/nutchwax-0_12_9/archive/RELEASE-NOTES.txt 2010-01-13 00:26:44 UTC (rev 2947)
@@ -1,70 +1,50 @@
RELEASE-NOTES.TXT
-2009-09-18
+2010-01-13
Aaron Binns
-Release notes for NutchWAX 0.12.8
+Release notes for NutchWAX 0.12.9
For the most recent updates and information on NutchWAX,
please visit the project wiki at:
- http://webteam.archive.org/confluence/display/search/NutchWAX
+ http://webarchive.jira.com/wiki/display/search/NutchWAX
-
======================================================================
Overview
======================================================================
-The main enhancement in NutchWAX 0.12.8 is the ability to configure
-HTTP headers to support caching.
+The main enhancement in NutchWAX 0.12.9 is the ability to search
+indexes created with NutchWAX 0.10.
-The Archive is starting to use Squid to cache the HTTP responses from
-NutchWAX and some explicit HTTP response headers were needed to enable
-this. Rather than relying on the servlet container (Tomcat/Jetty) to
-add the response headers, we added a servlet filter to NutchWAX.
+In the segments directory, create a "versions" file and in it
+list the names of the segments and their version, e.g.
-Right now the filter is very basic, in the web.xml file we now have
+ foo-segment 10
+ bar-segment 12
- <filter>
- <filter-name>Cache Settings</filter-name>
- <filter-class>org.archive.nutchwax.CacheSettingsFilter</filter-class>
- <init-param>
- <param-name>max-age</param-name>
- <param-value>259200</param-value> <!-- 72 hours (in seconds) -->
- </init-param>
- </filter>
+where the version number is either 10 or 12. If a segment is not
+listed in the "versions" file, it is assumed to be version 12.
- <filter-mapping>
- <filter-name>Cache Settings</filter-name>
- <servlet-name>OpenSearch</servlet-name>
- </filter-mapping>
+Also, a minor, but convenient enhancement is to no longer require the
+crawldb and linkdb to be present at index time. Neither one of these
+are actually used for indexing and the fact that they were required to
+be given to the index step was a legecy of Nutch. Now, there is a
+NutchWAX 'index' command which only requires the segment(s) to be
+present.
-which configures the filter to add a 'max-age' header with a 72 hour
-limit. This filter is then applied to all instances of the OpenSearch
-servlet.
-
-This allows browsers to cache the OpenSearch response for up to 72
-hours. It also enables any proxies between the browser and server to
-cache the response as well. With the addition of Squid into our
-deployment, we let Squid serve cached responses to repeat queries.
-
-Since our deployment updates every 4 days, a 72-hour expiration works
-well.
-
======================================================================
Issues
======================================================================
For an up-to-date list of NutchWAX issues:
- http://webteam.archive.org/jira/browse/WAX
+ http://webarchive.jira.com/browse/WAX
Issues resolved in this release:
-WAX-61 Change mime-type of OpenSearch XML response from text/xml to
- application/xml.
+WAX-66 Index documents without crawldb nor linkdb.
-WAX-62 Add ability to configure HTTP headers to support caching.
+WAX-67 Nutch OpenOffice parser does not pass along metadata.
-WAX-63 LengthNormUpdater returning error code if no fields in index
- have norms is inconvenient.
+WAX-68 Compatibility with {index+segment}s created by NutchWAX 0.10.
From: <bra...@us...> - 2010-01-12 22:27:24
Revision: 2945
http://archive-access.svn.sourceforge.net/archive-access/?rev=2945&view=rev
Author: bradtofel
Date: 2010-01-12 22:27:18 +0000 (Tue, 12 Jan 2010)
Log Message:
-----------
FEATURE: Added identity flag to incoming requests - the intention being to allow clients to explicitly request a raw copy of archived docs.
Modified Paths:
--------------
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/archivalurl/ArchivalUrlRequestParser.java
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/archivalurl/requestparser/ReplayRequestParser.java
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/core/WaybackRequest.java
Added Paths:
-----------
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/selector/IdentityRequestSelector.java
Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/archivalurl/ArchivalUrlRequestParser.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/archivalurl/ArchivalUrlRequestParser.java 2010-01-12 22:24:04 UTC (rev 2944)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/archivalurl/ArchivalUrlRequestParser.java 2010-01-12 22:27:18 UTC (rev 2945)
@@ -59,6 +59,10 @@
*/
public final static String IMG_CONTEXT = "im";
/**
+ * raw/identity context
+ */
+ public final static String IDENTITY_CONTEXT = "id";
+ /**
* Charset detection strategy context - should be followed by an integer
* indicating which strategy to use
*/
Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/archivalurl/requestparser/ReplayRequestParser.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/archivalurl/requestparser/ReplayRequestParser.java 2010-01-12 22:24:04 UTC (rev 2944)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/archivalurl/requestparser/ReplayRequestParser.java 2010-01-12 22:27:18 UTC (rev 2945)
@@ -124,6 +124,8 @@
wbRequest.setJSContext(true);
} else if(flag.equals(ArchivalUrlRequestParser.IMG_CONTEXT)) {
wbRequest.setIMGContext(true);
+ } else if(flag.equals(ArchivalUrlRequestParser.IDENTITY_CONTEXT)) {
+ wbRequest.setIdentityContext(true);
} else if(flag.startsWith(ArchivalUrlRequestParser.CHARSET_MODE)) {
String modeString = flag.substring(
ArchivalUrlRequestParser.CHARSET_MODE.length());
Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/core/WaybackRequest.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/core/WaybackRequest.java 2010-01-12 22:24:04 UTC (rev 2944)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/core/WaybackRequest.java 2010-01-12 22:27:18 UTC (rev 2945)
@@ -262,6 +262,12 @@
public static final String REQUEST_IMAGE_CONTEXT = "imagecontext";
/**
+ * Request: Identity context requested (totally transparent)
+ */
+ public static final String REQUEST_IDENTITY_CONTEXT = "identitycontext";
+
+
+ /**
* Request: Charset detection mode
*/
public static final String REQUEST_CHARSET_MODE = "charsetmode";
@@ -488,6 +494,7 @@
this.exclusionFilter = exclusionFilter;
}
+ @Deprecated
public ObjectFilter<CaptureSearchResult> getResultFilters() {
ObjectFilterChain<CaptureSearchResult> tmpFilters =
new ObjectFilterChain<CaptureSearchResult>();
@@ -772,6 +779,13 @@
return getBoolean(REQUEST_IMAGE_CONTEXT);
}
+ public void setIdentityContext(boolean isIdentityContext) {
+ setBoolean(REQUEST_IDENTITY_CONTEXT,isIdentityContext);
+ }
+ public boolean isIdentityContext() {
+ return getBoolean(REQUEST_IDENTITY_CONTEXT);
+ }
+
public void setCharsetMode(int mode) {
setInt(REQUEST_CHARSET_MODE,mode);
}
Added: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/selector/IdentityRequestSelector.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/selector/IdentityRequestSelector.java (rev 0)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/selector/IdentityRequestSelector.java 2010-01-12 22:27:18 UTC (rev 2945)
@@ -0,0 +1,48 @@
+/* IdentityRequestSelector
+ *
+ * $Id$:
+ *
+ * Created on Dec 17, 2009.
+ *
+ * Copyright (C) 2006 Internet Archive.
+ *
+ * This file is part of Wayback.
+ *
+ * Wayback is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License, or
+ * any later version.
+ *
+ * Wayback is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser Public License
+ * along with Wayback; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+package org.archive.wayback.replay.selector;
+
+import org.archive.wayback.core.CaptureSearchResult;
+import org.archive.wayback.core.Resource;
+import org.archive.wayback.core.WaybackRequest;
+
+/**
+ * @author brad
+ *
+ */
+public class IdentityRequestSelector extends BaseReplayRendererSelector {
+
+ /* (non-Javadoc)
+ * @see org.archive.wayback.replay.selector.BaseReplayRendererSelector#canHandle(org.archive.wayback.core.WaybackRequest, org.archive.wayback.core.CaptureSearchResult, org.archive.wayback.core.Resource)
+ */
+ @Override
+ public boolean canHandle(WaybackRequest wbRequest,
+ CaptureSearchResult result, Resource resource) {
+ return wbRequest.isIdentityContext();
+ }
+
+
+}
Property changes on: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/selector/IdentityRequestSelector.java
___________________________________________________________________
Added: svn:keywords
+ Author Date Revision Id
|
|
From: <bra...@us...> - 2010-01-12 22:24:11
|
Revision: 2944
http://archive-access.svn.sourceforge.net/archive-access/?rev=2944&view=rev
Author: bradtofel
Date: 2010-01-12 22:24:04 +0000 (Tue, 12 Jan 2010)
Log Message:
-----------
FEATURE: added copyStream() methods to drain bytes from an InputStream to an OutputStream
Modified Paths:
--------------
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/util/ByteOp.java
Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/util/ByteOp.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/util/ByteOp.java 2010-01-12 22:17:44 UTC (rev 2943)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/util/ByteOp.java 2010-01-12 22:24:04 UTC (rev 2944)
@@ -24,7 +24,13 @@
*/
package org.archive.wayback.util;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+
public class ByteOp {
+ public final static int BUFFER_SIZE = 4096;
+
public static byte[] copy(byte[] src, int offset, int length) {
byte[] copy = new byte[length];
System.arraycopy(src, offset, copy, 0, length);
@@ -41,4 +47,28 @@
}
return true;
}
+ /**
+ * Write all bytes from is to os. Does not close either stream.
+ * @param is to copy bytes from
+ * @param os to copy bytes to
+ * @throws IOException for usual reasons
+ */
+ public static void copyStream(InputStream is, OutputStream os)
+ throws IOException {
+ copyStream(is,os,BUFFER_SIZE);
+ }
+ /**
+ * Write all bytes from is to os. Does not close either stream.
+ * @param is to copy bytes from
+ * @param os to copy bytes to
+ * @param size number of bytes to buffer between read and write operations
+ * @throws IOException for usual reasons
+ */
+ public static void copyStream(InputStream is, OutputStream os, int size)
+ throws IOException {
+ byte[] buffer = new byte[size];
+ for (int r = -1; (r = is.read(buffer, 0, size)) != -1;) {
+ os.write(buffer, 0, r);
+ }
+ }
}
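For readers who want to try the copy loop outside Wayback, here is a minimal stand-alone sketch. The class name CopyStreamSketch and the sample data are hypothetical; the loop mirrors the copyStream() added in this revision and, like it, closes neither stream. Note that a buffer size of 0 would make read() return 0 forever, so the size is assumed to be positive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone class; not part of the Wayback code base.
class CopyStreamSketch {
    static final int BUFFER_SIZE = 4096;

    // Drains every byte from is into os; closes neither stream.
    // Assumes size > 0.
    static void copyStream(InputStream is, OutputStream os, int size)
            throws IOException {
        byte[] buffer = new byte[size];
        int r;
        while ((r = is.read(buffer, 0, size)) != -1) {
            os.write(buffer, 0, r);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "archived bytes".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // A tiny buffer forces several read/write rounds.
        copyStream(new ByteArrayInputStream(data), out, 4);
        System.out.println(out.toString("UTF-8"));
    }
}
```

Because the caller keeps ownership of both streams, the same output stream can receive several inputs in a row before it is closed.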
|
|
From: <bi...@us...> - 2010-01-12 22:17:50
|
Revision: 2943
http://archive-access.svn.sourceforge.net/archive-access/?rev=2943&view=rev
Author: binzino
Date: 2010-01-12 22:17:44 +0000 (Tue, 12 Jan 2010)
Log Message:
-----------
WAX-69. Comment out code that writes crawl_data.
Modified Paths:
--------------
trunk/archive-access/projects/nutchwax/archive/src/java/org/archive/nutchwax/Importer.java
Modified: trunk/archive-access/projects/nutchwax/archive/src/java/org/archive/nutchwax/Importer.java
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/src/java/org/archive/nutchwax/Importer.java 2010-01-11 21:46:57 UTC (rev 2942)
+++ trunk/archive-access/projects/nutchwax/archive/src/java/org/archive/nutchwax/Importer.java 2010-01-12 22:17:44 UTC (rev 2943)
@@ -467,7 +467,14 @@
try
{
- output.collect( key, new NutchWritable( datum ) );
+ // Some weird problem with Hadoop 0.19.x - when the crawl_data
+ // is merged during the reduce step, the classloader cannot
+ // find the org.apache.nutch.protocol.ProtocolStatus class.
+ //
+ // We avoid the whole issue by omitting the crawl_data
+ // altogether, which we don't use anyway.
+ //
+ // output.collect( key, new NutchWritable( datum ) );
if ( jobConf.getBoolean( "nutchwax.import.store.content", false ) )
{
|
|
From: <bra...@us...> - 2010-01-11 21:47:07
|
Revision: 2942
http://archive-access.svn.sourceforge.net/archive-access/?rev=2942&view=rev
Author: bradtofel
Date: 2010-01-11 21:46:57 +0000 (Mon, 11 Jan 2010)
Log Message:
-----------
Adding explicit error handler so stack traces aren't exposed to users.
Modified Paths:
--------------
trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/web.xml
Modified: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/web.xml
===================================================================
--- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/web.xml 2009-12-22 05:15:56 UTC (rev 2941)
+++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/web.xml 2010-01-11 21:46:57 UTC (rev 2942)
@@ -52,4 +52,11 @@
<realm-name>Secured-Wayback</realm-name>
</login-config>
-->
+
+
+ <error-page>
+ <exception-type>java.lang.Exception</exception-type>
+ <location>/WEB-INF/exception/HTMLError.jsp</location>
+ </error-page>
+
</web-app>
|
|
From: <bra...@us...> - 2009-12-22 05:16:06
|
Revision: 2941
http://archive-access.svn.sourceforge.net/archive-access/?rev=2941&view=rev
Author: bradtofel
Date: 2009-12-22 05:15:56 +0000 (Tue, 22 Dec 2009)
Log Message:
-----------
Sending File not String to ArchiveReaderFactory.get() methods
Modified Paths:
--------------
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourcestore/indexer/ArcIndexer.java
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourcestore/indexer/WarcIndexer.java
Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourcestore/indexer/ArcIndexer.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourcestore/indexer/ArcIndexer.java 2009-12-18 00:34:47 UTC (rev 2940)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourcestore/indexer/ArcIndexer.java 2009-12-22 05:15:56 UTC (rev 2941)
@@ -69,7 +69,12 @@
*/
public CloseableIterator<CaptureSearchResult> iterator(String pathOrUrl)
throws IOException {
- return iterator(ARCReaderFactory.get(pathOrUrl));
+ File f = new File(pathOrUrl);
+ if(f.isFile()) {
+ return iterator(ARCReaderFactory.get(f));
+ } else {
+ return iterator(ARCReaderFactory.get(pathOrUrl));
+ }
}
/**
Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourcestore/indexer/WarcIndexer.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourcestore/indexer/WarcIndexer.java 2009-12-18 00:34:47 UTC (rev 2940)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourcestore/indexer/WarcIndexer.java 2009-12-22 05:15:56 UTC (rev 2941)
@@ -71,7 +71,12 @@
*/
public CloseableIterator<CaptureSearchResult> iterator(String pathOrUrl)
throws IOException {
- return iterator(WARCReaderFactory.get(pathOrUrl));
+ File f = new File(pathOrUrl);
+ if(f.isFile()) {
+ return iterator(WARCReaderFactory.get(f));
+ } else {
+ return iterator(WARCReaderFactory.get(pathOrUrl));
+ }
}
/**
* @param arc
|
Revision: 2940
http://archive-access.svn.sourceforge.net/archive-access/?rev=2940&view=rev
Author: bradtofel
Date: 2009-12-18 00:34:47 +0000 (Fri, 18 Dec 2009)
Log Message:
-----------
BUGFIX(ACC-89): now explicitly declare valid hostname characters, to try to ensure a legitimate match.
Modified Paths:
--------------
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/html/transformer/JSStringTransformer.java
Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/html/transformer/JSStringTransformer.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/html/transformer/JSStringTransformer.java 2009-12-18 00:33:13 UTC (rev 2939)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/html/transformer/JSStringTransformer.java 2009-12-18 00:34:47 UTC (rev 2940)
@@ -38,7 +38,7 @@
*/
public class JSStringTransformer implements StringTransformer {
private final static Pattern httpPattern = Pattern
- .compile("(http://[^/]*/)");
+ .compile("(http://[A-Za-z0-9:_@.-]+)");
public String transform(ReplayParseContext context, String input) {
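To see why the character class matters, the following stand-alone sketch runs both patterns from this diff against a made-up line of JavaScript. The class name HostPatternSketch, the helper firstMatch(), and the sample input are hypothetical; only the two regular expressions come from the change above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone class; only the two patterns come from the diff.
class HostPatternSketch {
    // Old pattern: anything up to the next slash, which must be present.
    static final Pattern OLD = Pattern.compile("(http://[^/]*/)");
    // New pattern: only characters that may appear in the host part.
    static final Pattern NEW = Pattern.compile("(http://[A-Za-z0-9:_@.-]+)");

    // Returns the first captured group, or null when nothing matches.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Sample input: a URL without a trailing slash, followed by
        // an unrelated string that happens to contain a slash.
        String js = "var a = \"http://example.com\"; var b = \"/x/\";";
        System.out.println("old: " + firstMatch(OLD, js));
        System.out.println("new: " + firstMatch(NEW, js));
    }
}
```

With this input the old pattern runs past the closing quote until it reaches the first slash of the second string, while the new pattern stops at the end of the hostname.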
|
Revision: 2939
http://archive-access.svn.sourceforge.net/archive-access/?rev=2939&view=rev
Author: bradtofel
Date: 2009-12-18 00:33:13 +0000 (Fri, 18 Dec 2009)
Log Message:
-----------
BUGFIX(ACC-89): now explicitly declare valid hostname characters, to try to ensure a legitimate match.
Modified Paths:
--------------
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/archivalurl/ArchivalUrlJSReplayRenderer.java
Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/archivalurl/ArchivalUrlJSReplayRenderer.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/archivalurl/ArchivalUrlJSReplayRenderer.java 2009-12-09 06:50:07 UTC (rev 2938)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/archivalurl/ArchivalUrlJSReplayRenderer.java 2009-12-18 00:33:13 UTC (rev 2939)
@@ -63,7 +63,7 @@
}
private final static Pattern httpPattern = Pattern
- .compile("(http://[^/]*/)");
+ .compile("(http://[A-Za-z0-9:_@.-]+)");
protected void updatePage(TextDocument page,
HttpServletRequest httpRequest, HttpServletResponse httpResponse,
|
|
From: <bra...@us...> - 2009-12-09 06:50:16
|
Revision: 2938
http://archive-access.svn.sourceforge.net/archive-access/?rev=2938&view=rev
Author: bradtofel
Date: 2009-12-09 06:50:07 +0000 (Wed, 09 Dec 2009)
Log Message:
-----------
INITIAL REV: SearchResultSource composed of a series of alphabetically partitioned ziplined CDX files.
Added Paths:
-----------
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/StringPrefixIterator.java
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/ZiplinedBlock.java
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/ZiplinesChunkIterator.java
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/ZiplinesSearchResultSource.java
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/ZiplinesSearchResultSourceTest.java
Added: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/StringPrefixIterator.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/StringPrefixIterator.java (rev 0)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/StringPrefixIterator.java 2009-12-09 06:50:07 UTC (rev 2938)
@@ -0,0 +1,90 @@
+/* StringPrefixIterator
+ *
+ * $Id$:
+ *
+ * Created on Nov 23, 2009.
+ *
+ * Copyright (C) 2006 Internet Archive.
+ *
+ * This file is part of Wayback.
+ *
+ * Wayback is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License, or
+ * any later version.
+ *
+ * Wayback is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser Public License
+ * along with Wayback; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+package org.archive.wayback.resourceindex.ziplines;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+import org.archive.wayback.util.CloseableIterator;
+
+/**
+ * @author brad
+ *
+ */
+public class StringPrefixIterator implements CloseableIterator<String> {
+ private String prefix = null;
+ Iterator<String> inner = null;
+ private String cachedNext = null;
+ private boolean done = false;
+ public StringPrefixIterator(Iterator<String> inner, String prefix) {
+ this.prefix = prefix;
+ this.inner = inner;
+ }
+ /* (non-Javadoc)
+ * @see java.util.Iterator#hasNext()
+ */
+ public boolean hasNext() {
+ if(done) return false;
+ if(cachedNext != null) {
+ return true;
+ }
+ while(inner.hasNext()) {
+ String tmp = inner.next();
+ if(tmp.startsWith(prefix)) {
+ cachedNext = tmp;
+ return true;
+ } else if(tmp.compareTo(prefix) > 0) {
+ done = true;
+ return false;
+ }
+ }
+ return false;
+ }
+ /* (non-Javadoc)
+ * @see java.util.Iterator#next()
+ */
+ public String next() {
+ String tmp = cachedNext;
+ cachedNext = null;
+ return tmp;
+ }
+ /* (non-Javadoc)
+ * @see java.util.Iterator#remove()
+ */
+ public void remove() {
+ // TODO Auto-generated method stub
+
+ }
+ /* (non-Javadoc)
+ * @see java.io.Closeable#close()
+ */
+ public void close() throws IOException {
+ if(inner instanceof CloseableIterator) {
+ CloseableIterator<String> toBeClosed = (CloseableIterator<String>) inner;
+ toBeClosed.close();
+ }
+ }
+}
Property changes on: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/StringPrefixIterator.java
___________________________________________________________________
Added: svn:keywords
+ Author Date Revision Id
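The prefix filtering in StringPrefixIterator can be tried in isolation with a sketch like the one below. It is a variation, not the committed code: the class name PrefixFilterSketch and the collect() helper are hypothetical, next() calls hasNext() and throws NoSuchElementException when exhausted, and remove() is unsupported. As in the committed class, the input is assumed to be sorted, so the scan stops at the first non-matching line that sorts after the prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical variation on StringPrefixIterator for sorted input.
class PrefixFilterSketch implements Iterator<String> {
    private final Iterator<String> inner;
    private final String prefix;
    private String cachedNext = null;
    private boolean done = false;

    PrefixFilterSketch(Iterator<String> inner, String prefix) {
        this.inner = inner;
        this.prefix = prefix;
    }

    public boolean hasNext() {
        if (cachedNext != null) {
            return true;
        }
        while (!done && inner.hasNext()) {
            String tmp = inner.next();
            if (tmp.startsWith(prefix)) {
                cachedNext = tmp;
                return true;
            }
            // Sorted input: once past the prefix, nothing later can match.
            if (tmp.compareTo(prefix) > 0) {
                done = true;
            }
        }
        return false;
    }

    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String tmp = cachedNext;
        cachedNext = null;
        return tmp;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }

    // Convenience helper: gathers all matching lines into a list.
    static List<String> collect(List<String> sorted, String prefix) {
        List<String> out = new ArrayList<String>();
        Iterator<String> itr = new PrefixFilterSketch(sorted.iterator(), prefix);
        while (itr.hasNext()) {
            out.add(itr.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "a.com/ 1", "b.com/ 1", "b.com/x 2", "c.com/ 1");
        System.out.println(collect(lines, "b.com/"));
    }
}
```

Lines that sort before the prefix are skipped, which is what lets the caller start from a block that begins slightly before the first real match.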
Added: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/ZiplinedBlock.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/ZiplinedBlock.java (rev 0)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/ZiplinedBlock.java 2009-12-09 06:50:07 UTC (rev 2938)
@@ -0,0 +1,68 @@
+/* ZiplinedBlock
+ *
+ * $Id$:
+ *
+ * Created on Nov 23, 2009.
+ *
+ * Copyright (C) 2006 Internet Archive.
+ *
+ * This file is part of Wayback.
+ *
+ * Wayback is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License, or
+ * any later version.
+ *
+ * Wayback is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser Public License
+ * along with Wayback; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+package org.archive.wayback.resourceindex.ziplines;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.net.URL;
+import java.net.URLConnection;
+import java.util.zip.GZIPInputStream;
+
+/**
+ * @author brad
+ *
+ */
+public class ZiplinedBlock {
+ String urlOrPath = null;
+ long offset = -1;
+ public final static int BLOCK_SIZE = 128 * 1024;
+ private final static String RANGE_HEADER = "Range";
+ private final static String BYTES_HEADER = "bytes=";
+ private final static String BYTES_MINUS = "-";
+ /**
+ * @param urlOrPath URL where this file can be downloaded
+ * @param offset start of 128K block boundary.
+ */
+ public ZiplinedBlock(String urlOrPath, long offset) {
+ this.urlOrPath = urlOrPath;
+ this.offset = offset;
+ }
+ /**
+ * @return a BufferedReader of the underlying compressed data in this block
+ * @throws IOException for usual reasons
+ */
+ public BufferedReader readBlock() throws IOException {
+ URL u = new URL(urlOrPath);
+ URLConnection uc = u.openConnection();
+ StringBuilder sb = new StringBuilder(16);
+ sb.append(BYTES_HEADER).append(offset).append(BYTES_MINUS);
+ sb.append((offset + BLOCK_SIZE)-1);
+ uc.setRequestProperty(RANGE_HEADER, sb.toString());
+ return new BufferedReader(new InputStreamReader(
+ new GZIPInputStream(uc.getInputStream())));
+ }
+}
Property changes on: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/ZiplinedBlock.java
___________________________________________________________________
Added: svn:keywords
+ Author Date Revision Id
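The Range header that readBlock() builds can be checked with a small sketch. The class name RangeHeaderSketch and the method rangeFor() are hypothetical; the block size and the "bytes=start-end" layout follow the code above. HTTP byte ranges are inclusive at both ends, which is why the end position is offset + BLOCK_SIZE - 1.

```java
// Hypothetical stand-alone class; mirrors the header built in readBlock().
class RangeHeaderSketch {
    static final int BLOCK_SIZE = 128 * 1024;

    // Returns the Range header value covering one block starting at offset.
    static String rangeFor(long offset) {
        StringBuilder sb = new StringBuilder(32);
        sb.append("bytes=").append(offset).append("-");
        sb.append((offset + BLOCK_SIZE) - 1);
        return sb.toString();
    }

    public static void main(String[] args) {
        // First and second block of a ziplined file.
        System.out.println(rangeFor(0L));
        System.out.println(rangeFor((long) BLOCK_SIZE));
    }
}
```

For the first two blocks this yields bytes=0-131071 and bytes=131072-262143, so consecutive blocks neither overlap nor leave a gap.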
Added: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/ZiplinesChunkIterator.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/ZiplinesChunkIterator.java (rev 0)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/ZiplinesChunkIterator.java 2009-12-09 06:50:07 UTC (rev 2938)
@@ -0,0 +1,151 @@
+/* ZiplinesChunkIterator
+ *
+ * $Id$:
+ *
+ * Created on Nov 23, 2009.
+ *
+ * Copyright (C) 2006 Internet Archive.
+ *
+ * This file is part of Wayback.
+ *
+ * Wayback is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License, or
+ * any later version.
+ *
+ * Wayback is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser Public License
+ * along with Wayback; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+package org.archive.wayback.resourceindex.ziplines;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.RandomAccessFile;
+import java.util.Iterator;
+import java.util.List;
+import java.util.RandomAccess;
+import java.util.zip.GZIPInputStream;
+
+import org.archive.wayback.util.CloseableIterator;
+
+/**
+ * @author brad
+ *
+ */
+public class ZiplinesChunkIterator implements CloseableIterator<String> {
+ private BufferedReader br = null;
+ private Iterator<ZiplinedBlock> blockItr = null;
+ private String cachedNext = null;
+ /**
+ * @param blocks which should be fetched and unzipped, one after another
+ */
+ public ZiplinesChunkIterator(List<ZiplinedBlock> blocks) {
+ blockItr = blocks.iterator();
+ }
+ /* (non-Javadoc)
+ * @see java.util.Iterator#hasNext()
+ */
+ public boolean hasNext() {
+ if(cachedNext != null) {
+ return true;
+ }
+ while(cachedNext == null) {
+ if(br != null) {
+ // attempt to read the next line from this:
+ try {
+ cachedNext = br.readLine();
+ if(cachedNext == null) {
+ br = null;
+ // next loop:
+ } else {
+ return true;
+ }
+ } catch (IOException e) {
+ e.printStackTrace();
+ br = null;
+ }
+ } else {
+ // do we have more blocks to use?
+ if(blockItr.hasNext()) {
+ try {
+ br = blockItr.next().readBlock();
+ } catch (IOException e) {
+ e.printStackTrace();
+ }
+ } else {
+ return false;
+ }
+ }
+ }
+
+ return false;
+ }
+
+ /* (non-Javadoc)
+ * @see java.util.Iterator#next()
+ */
+ public String next() {
+ String tmp = cachedNext;
+ cachedNext = null;
+ return tmp;
+ }
+
+ /* (non-Javadoc)
+ * @see java.util.Iterator#remove()
+ */
+ public void remove() {
+ throw new UnsupportedOperationException();
+ }
+
+ /* (non-Javadoc)
+ * @see java.io.Closeable#close()
+ */
+ public void close() throws IOException {
+ if(br != null) {
+ br.close();
+ }
+ }
+ public static void main(String[] args) {
+ if(args.length != 1) {
+ System.err.println("Usage: ZIPLINES_PATH");
+ System.exit(1);
+ }
+ File f = new File(args[0]);
+ long size = f.length();
+ long numBlocks = (long) (size / ZiplinedBlock.BLOCK_SIZE);
+ long size2 = numBlocks * ZiplinedBlock.BLOCK_SIZE;
+ if(size != size2) {
+ System.err.println("File size of " + args[0] + " is not a multiple"
+ + " of " + ZiplinedBlock.BLOCK_SIZE);
+ }
+ try {
+ RandomAccessFile raf = new RandomAccessFile(f, "r");
+ for(int i = 0; i < numBlocks; i++) {
+ long offset = i * ZiplinedBlock.BLOCK_SIZE;
+ raf.seek(offset);
+ BufferedReader br = new BufferedReader(new InputStreamReader(
+ new GZIPInputStream(new FileInputStream(raf.getFD()))));
+ String line = br.readLine();
+ if(line == null) {
+ System.err.println("Bad block at " + offset + " in " + args[0]);
+ System.exit(1);
+ }
+ System.out.println(args[0] + " " + offset + " " + line);
+ }
+ } catch (IOException e) {
+ e.printStackTrace();
+ System.exit(1);
+ }
+ }
+}
Property changes on: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/ZiplinesChunkIterator.java
___________________________________________________________________
Added: svn:keywords
+ Author Date Revision Id
Added: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/ZiplinesSearchResultSource.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/ZiplinesSearchResultSource.java (rev 0)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/ZiplinesSearchResultSource.java 2009-12-09 06:50:07 UTC (rev 2938)
@@ -0,0 +1,218 @@
+/* ZiplinesSearchResultSource
+ *
+ * $Id$:
+ *
+ * Created on Nov 23, 2009.
+ *
+ * Copyright (C) 2006 Internet Archive.
+ *
+ * This file is part of Wayback.
+ *
+ * Wayback is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License, or
+ * any later version.
+ *
+ * Wayback is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser Public License
+ * along with Wayback; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+package org.archive.wayback.resourceindex.ziplines;
+
+import it.unimi.dsi.mg4j.util.FrontCodedStringList;
+
+import java.io.BufferedReader;
+import java.io.FileReader;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.Iterator;
+
+import org.archive.wayback.core.CaptureSearchResult;
+import org.archive.wayback.exception.ResourceIndexNotAvailableException;
+import org.archive.wayback.resourceindex.SearchResultSource;
+import org.archive.wayback.resourceindex.cdx.CDXFormatToSearchResultAdapter;
+import org.archive.wayback.resourceindex.cdx.format.CDXFormat;
+import org.archive.wayback.resourceindex.cdx.format.CDXFormatException;
+import org.archive.wayback.util.AdaptedIterator;
+import org.archive.wayback.util.CloseableIterator;
+import org.archive.wayback.util.flatfile.FlatFile;
+
+/**
+ * A set of Ziplines files, which are CDX files specially compressed into a
+ * series of GZipMembers such that:
+ *
+ * 1) each member is exactly 128K, padded using a GZip comment header
+ * 2) each member contains complete lines: no line spans two GZip members
+ *
+ * If the data put into these files is sorted, then the data within the files
+ * can be uncompressed when needed, minimizing the total data to be uncompressed
+ *
+ * This SearchResultSource assumes a set of alphabetically partitioned Ziplined
+ * CDX files, so that each file is sorted, and no regions overlap.
+ *
+ * This class takes 2 files as input:
+ * 1) a specially constructed map of the first N bytes of data from each GZip
+ * member, and the filename and offset of that GZip member.
+ * 2) a mapping of filenames to URLs
+ *
+ * Data from #1 is actually stored in a serialized
+ *
+ *
+ *
+ * @author brad
+ *
+ */
+public class ZiplinesSearchResultSource implements SearchResultSource {
+
+ /**
+ * Local path containing map of URL,TIMESTAMP,CHUNK,OFFSET for each 128K chunk
+ */
+ private String chunkIndexPath = null;
+ private FlatFile chunkIndex = null;
+ /**
+ * Local path containing URL for each CHUNK
+ */
+ private String chunkMapPath = null;
+ private HashMap<String,String> chunkMap = null;
+ private CDXFormat format = null;
+
+ public ZiplinesSearchResultSource() {
+ }
+ public ZiplinesSearchResultSource(CDXFormat format) {
+ this.format = format;
+ }
+ public void init() throws IOException {
+ chunkMap = new HashMap<String, String>();
+ FlatFile ff = new FlatFile(chunkMapPath);
+ Iterator<String> lines = ff.getSequentialIterator();
+ while(lines.hasNext()) {
+ String line = lines.next();
+ String[] parts = line.split("\\s");
+ if(parts.length != 2) {
+ throw new IOException("Bad line(" + line +") in (" +
+ chunkMapPath + ")");
+ }
+ chunkMap.put(parts[0],parts[1]);
+ }
+ chunkIndex = new FlatFile(chunkIndexPath);
+ }
+ protected CloseableIterator<CaptureSearchResult> adaptIterator(Iterator<String> itr)
+ throws IOException {
+ return new AdaptedIterator<String,CaptureSearchResult>(itr,
+ new CDXFormatToSearchResultAdapter(format));
+ }
+
+ /* (non-Javadoc)
+ * @see org.archive.wayback.resourceindex.SearchResultSource#cleanup(org.archive.wayback.util.CloseableIterator)
+ */
+ public void cleanup(CloseableIterator<CaptureSearchResult> c)
+ throws IOException {
+ c.close();
+ }
+
+ /* (non-Javadoc)
+ * @see org.archive.wayback.resourceindex.SearchResultSource#getPrefixIterator(java.lang.String)
+ */
+ public CloseableIterator<CaptureSearchResult> getPrefixIterator(
+ String prefix) throws ResourceIndexNotAvailableException {
+ try {
+ return adaptIterator(getStringPrefixIterator(prefix));
+ } catch (IOException e) {
+ throw new ResourceIndexNotAvailableException(e.getMessage());
+ }
+ }
+
+ public Iterator<String> getStringPrefixIterator(String prefix) throws ResourceIndexNotAvailableException, IOException {
+ Iterator<String> itr = chunkIndex.getRecordIteratorLT(prefix);
+ ArrayList<ZiplinedBlock> blocks = new ArrayList<ZiplinedBlock>();
+ boolean first = true;
+ while(itr.hasNext()) {
+ String blockDescriptor = itr.next();
+ String parts[] = blockDescriptor.split("\t");
+ if(parts.length != 3) {
+ throw new ResourceIndexNotAvailableException("Bad line(" +
+ blockDescriptor + ")");
+ }
+ // only compare the correct length:
+ String prefCmp = prefix;
+ String blockCmp = parts[0];
+// if(prefCmp.length() < blockCmp.length()) {
+// blockCmp = blockCmp.substring(0,prefCmp.length());
+// } else {
+// prefCmp = prefCmp.substring(0,blockCmp.length());
+// }
+ if(first) {
+ // always add first:
+ first = false;
+// } else if(blockCmp.compareTo(prefCmp) > 0) {
+ } else if(!blockCmp.startsWith(prefCmp)) {
+ // all done;
+ break;
+ }
+ // add this block and keep looking...
+ String url = chunkMap.get(parts[1]);
+ long offset = Long.parseLong(parts[2]);
+ blocks.add(new ZiplinedBlock(url, offset));
+ }
+ return new StringPrefixIterator(new ZiplinesChunkIterator(blocks),prefix);
+ }
+
+ /* (non-Javadoc)
+ * @see org.archive.wayback.resourceindex.SearchResultSource#getPrefixReverseIterator(java.lang.String)
+ */
+ public CloseableIterator<CaptureSearchResult> getPrefixReverseIterator(
+ String prefix) throws ResourceIndexNotAvailableException {
+ throw new ResourceIndexNotAvailableException("unsupported op");
+ }
+
+ /* (non-Javadoc)
+ * @see org.archive.wayback.resourceindex.SearchResultSource#shutdown()
+ */
+ public void shutdown() throws IOException {
+ // no-op..
+ }
+ /**
+ * @return the format
+ */
+ public CDXFormat getFormat() {
+ return format;
+ }
+ /**
+ * @param format the format to set
+ */
+ public void setFormat(CDXFormat format) {
+ this.format = format;
+ }
+ /**
+ * @return the chunkIndexPath
+ */
+ public String getChunkIndexPath() {
+ return chunkIndexPath;
+ }
+ /**
+ * @param chunkIndexPath the chunkIndexPath to set
+ */
+ public void setChunkIndexPath(String chunkIndexPath) {
+ this.chunkIndexPath = chunkIndexPath;
+ }
+ /**
+ * @return the chunkMapPath
+ */
+ public String getChunkMapPath() {
+ return chunkMapPath;
+ }
+ /**
+ * @param chunkMapPath the chunkMapPath to set
+ */
+ public void setChunkMapPath(String chunkMapPath) {
+ this.chunkMapPath = chunkMapPath;
+ }
+
+}
Property changes on: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/ZiplinesSearchResultSource.java
___________________________________________________________________
Added: svn:keywords
+ Author Date Revision Id
Added: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/ZiplinesSearchResultSourceTest.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/ZiplinesSearchResultSourceTest.java (rev 0)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/ZiplinesSearchResultSourceTest.java 2009-12-09 06:50:07 UTC (rev 2938)
@@ -0,0 +1,64 @@
+/* ZiplinesSearchResultSourceTest
+ *
+ * $Id$:
+ *
+ * Created on Nov 23, 2009.
+ *
+ * Copyright (C) 2006 Internet Archive.
+ *
+ * This file is part of Wayback.
+ *
+ * Wayback is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License, or
+ * any later version.
+ *
+ * Wayback is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser Public License
+ * along with Wayback; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+package org.archive.wayback.resourceindex.ziplines;
+
+import java.util.Iterator;
+
+import org.archive.wayback.resourceindex.cdx.format.CDXFormat;
+import org.archive.wayback.resourceindex.cdx.format.CDXFormatException;
+
+import junit.framework.TestCase;
+
+/**
+ * @author brad
+ *
+ */
+public class ZiplinesSearchResultSourceTest extends TestCase {
+
+ /**
+ * Test method for {@link org.archive.wayback.resourceindex.ziplines.ZiplinesSearchResultSource#getPrefixIterator(java.lang.String)}.
+ * @throws CDXFormatException
+ */
+ public void testGetPrefixIterator() throws Exception {
+ CDXFormat format = new CDXFormat(" CDX N b a m s k r M V g");
+ ZiplinesSearchResultSource zsrs = new ZiplinesSearchResultSource(format);
+// zsrs.setChunkIndexPath("/home/brad/zipline-test/part-00005-frag.cdx.zlm");
+// zsrs.setChunkMapPath("/home/brad/zipline-test/manifest.txt");
+ zsrs.setChunkIndexPath("/home/brad/ALL.summary");
+ zsrs.setChunkMapPath("/home/brad/ALL.loc");
+ zsrs.init();
+ Iterator<String> i = zsrs.getStringPrefixIterator("krunch.com/ ");
+ int max = 100;
+ int done = 0;
+ while(i.hasNext()) {
+ System.out.println(i.next());
+ if(++done >= max) {
+ break;
+ }
+ }
+ }
+
+}
Property changes on: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/ziplines/ZiplinesSearchResultSourceTest.java
___________________________________________________________________
Added: svn:keywords
+ Author Date Revision Id
This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
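Note on r2938: the block selection done in getStringPrefixIterator() can be looked at in isolation. The sketch below is not the Wayback code. It assumes the chunk index has already been reduced to an in-memory list holding the first key of each compressed block, sorted ascending, and the names BlockSelectionSketch and selectBlocks are hypothetical. It starts at the last block whose first key sorts before the prefix (that block may still contain matching lines), then keeps adding blocks for as long as their first key starts with the prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Wayback.
class BlockSelectionSketch {

    /**
     * @param sortedKeys first key of each block, in ascending order
     * @param prefix     the URL prefix being searched for
     * @return indexes of the blocks that may hold lines starting with prefix
     */
    static List<Integer> selectBlocks(List<String> sortedKeys, String prefix) {
        // Find the last block whose first key sorts strictly before the prefix.
        // If there is none, fall back to the first block.
        int start = 0;
        for (int i = 0; i < sortedKeys.size(); i++) {
            if (sortedKeys.get(i).compareTo(prefix) < 0) {
                start = i;
            } else {
                break;
            }
        }
        List<Integer> selected = new ArrayList<Integer>();
        for (int i = start; i < sortedKeys.size(); i++) {
            // The starting block is always kept; later blocks only while
            // their first key still begins with the prefix.
            if (i > start && !sortedKeys.get(i).startsWith(prefix)) {
                break;
            }
            selected.add(i);
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "a.com/", "b.com/", "b.com/x", "b.com/y", "c.com/");
        System.out.println(selectBlocks(keys, "b.com/"));
    }
}
```

The linear scan for the starting block stands in for the binary search that FlatFile performs on disk; only the selection rule is meant to carry over.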
|
|
From: <bra...@us...> - 2009-12-09 06:48:35
|
Revision: 2937
http://archive-access.svn.sourceforge.net/archive-access/?rev=2937&view=rev
Author: bradtofel
Date: 2009-12-09 06:48:28 +0000 (Wed, 09 Dec 2009)
Log Message:
-----------
FEATURE: added method to allow searching for greatest line <= search term
Modified Paths:
--------------
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/util/flatfile/FlatFile.java
Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/util/flatfile/FlatFile.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/util/flatfile/FlatFile.java 2009-12-09 06:47:35 UTC (rev 2936)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/util/flatfile/FlatFile.java 2009-12-09 06:48:28 UTC (rev 2937)
@@ -126,6 +126,39 @@
fh.seek(min);
return min;
}
+ public long findKeyOffsetLT(RandomAccessFile fh, String key) throws IOException {
+ int blockSize = 8192;
+ long fileSize = fh.length();
+ long min = 0;
+ long max = (long) fileSize / blockSize;
+ long mid;
+ String line;
+ while (max - min > 1) {
+ mid = min + (long)((max - min) / 2);
+ fh.seek(mid * blockSize);
+ if(mid > 0) line = fh.readLine(); // probably a partial line
+ line = fh.readLine();
+ if (line != null && key.compareTo(line) > 0) {
+ min = mid;
+ } else {
+ max = mid;
+ }
+ }
+ // find the right line
+ min = min * blockSize;
+ fh.seek(min);
+ if(min > 0) line = fh.readLine();
+ long last = min;
+ while(true) {
+ min = fh.getFilePointer();
+ line = fh.readLine();
+ if(line == null) break;
+ if(line.compareTo(key) >= 0) break;
+ last = min;
+ }
+ fh.seek(last);
+ return last;
+ }
/**
* @return Returns the lastMatchOffset.
*/
@@ -157,6 +190,16 @@
return itr;
}
+ public Iterator<String> getRecordIteratorLT(final String prefix) throws IOException {
+ RecordIterator itr = null;
+ RandomAccessFile raf = new RandomAccessFile(file,"r");
+ long offset = findKeyOffsetLT(raf,prefix);
+ lastMatchOffset = offset;
+ BufferedReader br = new BufferedReader(new FileReader(raf.getFD()));
+ itr = new RecordIterator(br);
+ return itr;
+ }
+
/**
*
* @param prefix
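Note on r2937: the log message says "greatest line <= search term", but as far as the diff shows, the final scan in findKeyOffsetLT() stops at the first line that compares >= the key, so it lands on the greatest line strictly less than the key (or the start of the file when there is none). A minimal in-memory sketch of that search follows. It assumes a sorted list of lines instead of a RandomAccessFile, and the names GreatestLessThanSketch and indexOfGreatestLessThan are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Wayback.
class GreatestLessThanSketch {

    /**
     * @param sorted lines in ascending String order
     * @param key    the search term
     * @return index of the greatest line strictly less than key, or -1 if
     *         every line is >= key
     */
    static int indexOfGreatestLessThan(List<String> sorted, String key) {
        int lo = 0;
        int hi = sorted.size();
        // Invariant: every element before lo is < key,
        // every element from hi onward is >= key.
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (sorted.get(mid).compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo - 1;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("apple", "banana", "cherry");
        System.out.println(indexOfGreatestLessThan(lines, "blueberry"));
    }
}
```

The on-disk version has to do the same thing at block granularity first, because a seek to an arbitrary offset usually lands in the middle of a line.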
|
|
From: <bra...@us...> - 2009-12-09 06:47:47
|
Revision: 2936
http://archive-access.svn.sourceforge.net/archive-access/?rev=2936&view=rev
Author: bradtofel
Date: 2009-12-09 06:47:35 +0000 (Wed, 09 Dec 2009)
Log Message:
-----------
Hackery to get live web caching
Modified Paths:
--------------
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/liveweb/URLCacher.java
Added Paths:
-----------
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/liveweb/ARCCachingProxy.java
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/liveweb/FileRegion.java
Added: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/liveweb/ARCCachingProxy.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/liveweb/ARCCachingProxy.java (rev 0)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/liveweb/ARCCachingProxy.java 2009-12-09 06:47:35 UTC (rev 2936)
@@ -0,0 +1,157 @@
+/* ARCCachingProxy
+ *
+ * $Id$:
+ *
+ * Created on Dec 8, 2009.
+ *
+ * Copyright (C) 2006 Internet Archive.
+ *
+ * This file is part of Wayback.
+ *
+ * Wayback is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License, or
+ * any later version.
+ *
+ * Wayback is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser Public License
+ * along with Wayback; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+package org.archive.wayback.liveweb;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.io.PrintWriter;
+import java.io.RandomAccessFile;
+import java.net.URL;
+
+import javax.servlet.ServletException;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+
+import org.apache.log4j.Logger;
+import org.archive.io.arc.ARCLocation;
+import org.archive.io.arc.ARCRecord;
+import org.archive.wayback.core.CaptureSearchResult;
+import org.archive.wayback.core.Resource;
+import org.archive.wayback.exception.LiveDocumentNotAvailableException;
+import org.archive.wayback.resourcestore.resourcefile.ArcResource;
+import org.archive.wayback.webapp.ServletRequestContext;
+
+/**
+ * @author brad
+ *
+ */
+public class ARCCachingProxy extends ServletRequestContext {
+
+ private final static String EXPIRES_HEADER = "Expires";
+
+ private final static String ARC_RECORD_CONTENT_TYPE = "application/x-arc-record";
+ private static final Logger LOGGER = Logger.getLogger(
+ ARCCachingProxy.class.getName());
+ private ARCCacheDirectory arcCacheDir = null;
+ private URLCacher cacher = null;
+ private long expiresMS = 60 * 60 * 1000;
+ /* (non-Javadoc)
+ * @see org.archive.wayback.webapp.ServletRequestContext#handleRequest(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse)
+ */
+ @Override
+ public boolean handleRequest(HttpServletRequest httpRequest,
+ HttpServletResponse httpResponse) throws ServletException,
+ IOException {
+
+ StringBuffer sb = httpRequest.getRequestURL();
+ String query = httpRequest.getQueryString();
+ if(query != null) {
+ sb.append("?").append(query);
+ }
+ URL url = new URL(sb.toString());
+ FileRegion r = null;
+ try {
+ r = getLiveResource(url);
+ httpResponse.setStatus(HttpServletResponse.SC_OK);
+ httpResponse.setContentLength((int)r.getLength());
+ httpResponse.setContentType(ARC_RECORD_CONTENT_TYPE);
+ httpResponse.setDateHeader(EXPIRES_HEADER, System.currentTimeMillis() + expiresMS);
+ r.copyToOutputStream(httpResponse.getOutputStream());
+
+ } catch (LiveDocumentNotAvailableException e) {
+
+ e.printStackTrace();
+ httpResponse.sendError(HttpServletResponse.SC_NOT_FOUND);
+ }
+// httpResponse.setContentType("text/plain");
+// PrintWriter pw = httpResponse.getWriter();
+// pw.println("PathInfo:" + httpRequest.getPathInfo());
+// pw.println("RequestURI:" + httpRequest.getRequestURI());
+// pw.println("RequestURL:" + httpRequest.getRequestURL());
+// pw.println("QueryString:" + httpRequest.getQueryString());
+// pw.println("PathTranslated:" + httpRequest.getPathTranslated());
+// pw.println("ServletPath:" + httpRequest.getServletPath());
+// pw.println("ContextPath:" + httpRequest.getContextPath());
+// if(r != null) {
+// pw.println("CachePath:" + r.file.getAbsolutePath());
+// pw.println("CacheStart:" + r.start);
+// pw.println("CacheEnd:" + r.end);
+// } else {
+// pw.println("FAILED CACHE!");
+// }
+
+ return true;
+ }
+
+
+ private FileRegion getLiveResource(URL url)
+ throws LiveDocumentNotAvailableException, IOException {
+
+ Resource resource = null;
+
+ LOGGER.info("Caching URL(" + url.toString() + ")");
+ FileRegion region = cacher.cache2(arcCacheDir, url.toString());
+ if(region != null) {
+ LOGGER.info("Cached URL(" + url.toString() + ") in " +
+ "ARC(" + region.file.getAbsolutePath() + ") at ("
+ + region.start + " - " + region.end + ")");
+
+ } else {
+ throw new IOException("No location!");
+ }
+
+ return region;
+}
+
+ /**
+ * @return the arcCacheDir
+ */
+ public ARCCacheDirectory getArcCacheDir() {
+ return arcCacheDir;
+ }
+
+ /**
+ * @param arcCacheDir the arcCacheDir to set
+ */
+ public void setArcCacheDir(ARCCacheDirectory arcCacheDir) {
+ this.arcCacheDir = arcCacheDir;
+ }
+
+ /**
+ * @return the cacher
+ */
+ public URLCacher getCacher() {
+ return cacher;
+ }
+
+ /**
+ * @param cacher the cacher to set
+ */
+ public void setCacher(URLCacher cacher) {
+ this.cacher = cacher;
+ }
+}
Property changes on: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/liveweb/ARCCachingProxy.java
___________________________________________________________________
Added: svn:keywords
+ Author Date Revision Id
Added: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/liveweb/FileRegion.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/liveweb/FileRegion.java (rev 0)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/liveweb/FileRegion.java 2009-12-09 06:47:35 UTC (rev 2936)
@@ -0,0 +1,62 @@
+/* FileRegion
+ *
+ * $Id$:
+ *
+ * Created on Dec 8, 2009.
+ *
+ * Copyright (C) 2006 Internet Archive.
+ *
+ * This file is part of Wayback.
+ *
+ * Wayback is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License, or
+ * any later version.
+ *
+ * Wayback is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser Public License
+ * along with Wayback; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+package org.archive.wayback.liveweb;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.io.RandomAccessFile;
+
+/**
+ * @author brad
+ *
+ */
+public class FileRegion {
+ File file = null;
+ long start = -1;
+ long end = -1;
+ public long getLength() {
+ return end - start;
+ }
+ public void copyToOutputStream(OutputStream o) throws IOException {
+ long left = end - start;
+ int BUFF_SIZE = 4096;
+ byte buf[] = new byte[BUFF_SIZE];
+ RandomAccessFile raf = new RandomAccessFile(file, "r");
+ raf.seek(start);
+ while(left > 0) {
+ int amtToRead = (int) Math.min(left, BUFF_SIZE);
+ int amtRead = raf.read(buf, 0, amtToRead);
+ if(amtRead < 0) {
+ throw new IOException("Not enough to read! EOF before expected region end");
+ }
+ o.write(buf,0,amtRead);
+ left -= amtRead;
+ }
+ raf.close();
+ }
+
+}
Property changes on: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/liveweb/FileRegion.java
___________________________________________________________________
Added: svn:keywords
+ Author Date Revision Id
Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/liveweb/URLCacher.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/liveweb/URLCacher.java 2009-12-01 23:21:59 UTC (rev 2935)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/liveweb/URLCacher.java 2009-12-09 06:47:35 UTC (rev 2936)
@@ -156,21 +156,52 @@
writer.write(url,mime,ip,captureDate.getTime(),len,fis);
writer.checkSize();
-// long newSize = writer.getPosition();
-// long oSize = writer.getFile().length();
+ long newSize = writer.getPosition();
+ long oSize = writer.getFile().length();
+ final long arcEndOffset = oSize;
LOGGER.info("Wrote " + url + " at " + arcPath + ":" + arcOffset);
+ LOGGER.info("NewSize:" + newSize + " oSize: " + oSize);
fis.close();
return new ARCLocation() {
private String filename = arcPath;
private long offset = arcOffset;
+ private long endOffset = arcEndOffset;
public String getName() { return this.filename; }
-
public long getOffset() { return this.offset; }
+ public long getEndOffset() { return this.endOffset; }
+
};
}
+ private FileRegion storeFile2(File file, ARCWriter writer, String url,
+ ExtendedGetMethod method) throws IOException {
+
+ FileInputStream fis = new FileInputStream(file);
+ int len = (int) file.length();
+ String mime = method.getMime();
+ String ip = method.getRemoteIP();
+ Date captureDate = method.getCaptureDate();
+
+ writer.checkSize();
+ final long arcOffset = writer.getPosition();
+ final String arcPath = writer.getFile().getAbsolutePath();
+ writer.write(url,mime,ip,captureDate.getTime(),len,fis);
+ writer.checkSize();
+ long newSize = writer.getPosition();
+ long oSize = writer.getFile().length();
+ final long arcEndOffset = oSize;
+ LOGGER.info("Wrote " + url + " at " + arcPath + ":" + arcOffset);
+ LOGGER.info("NewSize:" + newSize + " oSize: " + oSize);
+ fis.close();
+ FileRegion fr = new FileRegion();
+ fr.file = writer.getFile();
+ fr.start = arcOffset;
+ fr.end = oSize;
+ return fr;
+ }
+
/**
* Retrieve urlString, and store using ARCWriter, returning
* ARCLocation where the document was stored.
@@ -219,7 +250,44 @@
}
return location;
}
+ public FileRegion cache2(ARCCacheDirectory cache, String urlString)
+ throws LiveDocumentNotAvailableException, IOException, URIException {
+ // localize URL
+ File tmpFile = getTmpFile();
+ ExtendedGetMethod method;
+ try {
+ method = urlToFile(urlString,tmpFile);
+ } catch (LiveDocumentNotAvailableException e) {
+ LOGGER.info("Attempt to get " + urlString + " failed...");
+ tmpFile.delete();
+ throw e;
+ } catch (URIException e) {
+ tmpFile.delete();
+ throw e;
+ } catch (IOException e) {
+ tmpFile.delete();
+ throw e;
+ }
+
+ // store URL
+ FileRegion region = null;
+ ARCWriter writer = null;
+ try {
+ writer = cache.getWriter();
+ region = storeFile2(tmpFile, writer, urlString, method);
+ } catch(IOException e) {
+ e.printStackTrace();
+ throw e;
+ } finally {
+ if(writer != null) {
+ cache.returnWriter(writer);
+ }
+ tmpFile.delete();
+ }
+ return region;
+}
+
/**
* @param args
*/
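Note on r2936: FileRegion.copyToOutputStream() closes its RandomAccessFile only on the success path, so a failed read or write would leave the file handle open. The sketch below shows one way the same copy loop could be written with try/finally, which also works on Java 5. It is a standalone illustration, not a patch: RegionCopySketch and copyRegion are hypothetical names, and the region is passed as plain arguments rather than through a FileRegion.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;

// Hypothetical illustration only; not part of Wayback.
class RegionCopySketch {

    /** Copies bytes [start, end) of file to out, always closing the file. */
    static void copyRegion(File file, long start, long end, OutputStream out)
            throws IOException {
        byte[] buf = new byte[4096];
        long left = end - start;
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            raf.seek(start);
            while (left > 0) {
                int amtRead = raf.read(buf, 0, (int) Math.min(left, buf.length));
                if (amtRead < 0) {
                    throw new IOException("EOF before expected region end");
                }
                out.write(buf, 0, amtRead);
                left -= amtRead;
            }
        } finally {
            // Runs on success and on any exception thrown above.
            raf.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("region", ".bin");
        tmp.deleteOnExit();
        FileOutputStream fos = new FileOutputStream(tmp);
        try {
            fos.write("0123456789".getBytes("US-ASCII"));
        } finally {
            fos.close();
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        copyRegion(tmp, 2, 6, bos);
        System.out.println(bos.toString("US-ASCII")); // prints 2345
    }
}
```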
|
Revision: 2935
http://archive-access.svn.sourceforge.net/archive-access/?rev=2935&view=rev
Author: bradtofel
Date: 2009-12-01 23:21:59 +0000 (Tue, 01 Dec 2009)
Log Message:
-----------
IOException(Exception) constructor not available in java 5.
Modified Paths:
--------------
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/cdx/CDXFormatIndex.java
Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/cdx/CDXFormatIndex.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/cdx/CDXFormatIndex.java 2009-12-01 23:19:20 UTC (rev 2934)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/cdx/CDXFormatIndex.java 2009-12-01 23:21:59 UTC (rev 2935)
@@ -58,7 +58,7 @@
try {
cdx = new CDXFormat(CDX_HEADER_MAGIC);
} catch (CDXFormatException e1) {
- throw new IOException(e1);
+ throw new IOException(e1.getMessage());
}
}
}
|
|
From: <bra...@us...> - 2009-12-01 23:19:29
|
Revision: 2934
http://archive-access.svn.sourceforge.net/archive-access/?rev=2934&view=rev
Author: bradtofel
Date: 2009-12-01 23:19:20 +0000 (Tue, 01 Dec 2009)
Log Message:
-----------
IOException(Exception) constructor not available in java 5.
Modified Paths:
--------------
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/archivalurl/ArchivalUrlSAXRewriteReplayRenderer.java
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/html/rules/AfterBodyStartTagJSPExecRule.java
trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/html/rules/BeforeBodyEndTagJSPExecRule.java
Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/archivalurl/ArchivalUrlSAXRewriteReplayRenderer.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/archivalurl/ArchivalUrlSAXRewriteReplayRenderer.java 2009-12-01 20:54:41 UTC (rev 2933)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/archivalurl/ArchivalUrlSAXRewriteReplayRenderer.java 2009-12-01 23:19:20 UTC (rev 2934)
@@ -100,7 +100,8 @@
url = new URL(result.getOriginalUrl());
} catch (MalformedURLException e1) {
// TODO: this shouldn't happen...
- throw new IOException(e1);
+ e1.printStackTrace();
+ throw new IOException(e1.getMessage());
}
// To make sure we get the length, we have to buffer it all up...
@@ -132,7 +133,7 @@
delegator.handleParseComplete(context);
} catch (ParserException e) {
e.printStackTrace();
- throw new IOException(e);
+ throw new IOException(e.getMessage());
}
// At this point, baos contains the utf-8 encoded bytes of our result:
Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/html/rules/AfterBodyStartTagJSPExecRule.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/html/rules/AfterBodyStartTagJSPExecRule.java 2009-12-01 20:54:41 UTC (rev 2933)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/html/rules/AfterBodyStartTagJSPExecRule.java 2009-12-01 23:19:20 UTC (rev 2934)
@@ -79,7 +79,8 @@
try {
super.emit(context, node);
} catch (ServletException e) {
- throw new IOException(e);
+ e.printStackTrace();
+ throw new IOException(e.getMessage());
}
}
}
Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/html/rules/BeforeBodyEndTagJSPExecRule.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/html/rules/BeforeBodyEndTagJSPExecRule.java 2009-12-01 20:54:41 UTC (rev 2933)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/html/rules/BeforeBodyEndTagJSPExecRule.java 2009-12-01 23:19:20 UTC (rev 2934)
@@ -54,7 +54,8 @@
try {
super.emit(context, node);
} catch (ServletException e) {
- throw new IOException(e);
+ e.printStackTrace();
+ throw new IOException(e.getMessage());
}
}
}
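Note on r2934 and r2935: replacing new IOException(e) with new IOException(e.getMessage()) compiles on Java 5 but drops the original exception and its stack trace from the chain. Throwable.initCause() has been available since Java 1.4, so the cause can still be attached. The sketch below shows that alternative. WrapCauseSketch and wrap are hypothetical names, and this is a suggestion rather than what the commits do.

```java
import java.io.IOException;

// Hypothetical illustration only; not part of Wayback.
class WrapCauseSketch {

    /** Builds an IOException that keeps the message and the original cause. */
    static IOException wrap(Exception cause) {
        IOException ioe = new IOException(cause.getMessage());
        // initCause() may be called once per exception; it records the
        // original exception so getCause() and stack traces still show it.
        ioe.initCause(cause);
        return ioe;
    }

    public static void main(String[] args) {
        IOException wrapped = wrap(new IllegalStateException("bad CDX header"));
        System.out.println(wrapped.getMessage() + " / cause: " + wrapped.getCause());
    }
}
```

A call site would then read "throw WrapCauseSketch.wrap(e1);" in place of "throw new IOException(e1.getMessage());", which also makes the extra e.printStackTrace() calls unnecessary.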
|
|
From: <bra...@us...> - 2009-12-01 20:54:53
|
Revision: 2933
http://archive-access.svn.sourceforge.net/archive-access/?rev=2933&view=rev
Author: bradtofel
Date: 2009-12-01 20:54:41 +0000 (Tue, 01 Dec 2009)
Log Message:
-----------
BUGFIX: was mis-referencing jsBlockHandler
Modified Paths:
--------------
trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/ArchivalUrlSaxReplay.xml
Modified: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/ArchivalUrlSaxReplay.xml
===================================================================
--- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/ArchivalUrlSaxReplay.xml 2009-12-01 20:51:38 UTC (rev 2932)
+++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/ArchivalUrlSaxReplay.xml 2009-12-01 20:54:41 UTC (rev 2933)
@@ -160,7 +160,7 @@
</bean>
<bean class="org.archive.wayback.replay.html.rules.AttributeModifyingRule">
<property name="modifyAttributeName" value="ONCLICK" />
- <property name="transformer" ref="jsBlockRewriter" />
+ <property name="transformer" ref="jsBlockHandler" />
</bean>
<bean class="org.archive.wayback.replay.html.rules.AttributeModifyingRule">
<property name="modifyAttributeName" value="style" />
|
|
From: <bra...@us...> - 2009-12-01 20:51:48
|
Revision: 2932
http://archive-access.svn.sourceforge.net/archive-access/?rev=2932&view=rev
Author: bradtofel
Date: 2009-12-01 20:51:38 +0000 (Tue, 01 Dec 2009)
Log Message:
-----------
Patch to allow dev testing of Wayback under jetty:
Modified Paths:
--------------
trunk/archive-access/projects/wayback/wayback-webapp/pom.xml
Modified: trunk/archive-access/projects/wayback/wayback-webapp/pom.xml
===================================================================
--- trunk/archive-access/projects/wayback/wayback-webapp/pom.xml 2009-11-20 05:50:23 UTC (rev 2931)
+++ trunk/archive-access/projects/wayback/wayback-webapp/pom.xml 2009-12-01 20:51:38 UTC (rev 2932)
@@ -22,6 +22,11 @@
</archive>
</configuration>
</plugin>
+ <plugin>
+ <groupId>org.mortbay.jetty</groupId>
+ <artifactId>maven-jetty-plugin</artifactId>
+ <version>6.1.22</version>
+ </plugin>
</plugins>
</build>
<dependencies>
|
Revision: 2931
http://archive-access.svn.sourceforge.net/archive-access/?rev=2931&view=rev
Author: alexoz
Date: 2009-11-20 05:50:23 +0000 (Fri, 20 Nov 2009)
Log Message:
-----------
BUGFIX: Don't NPE when there's no default rule, instead explain the problem.
Modified Paths:
--------------
trunk/archive-access/projects/access-control/access-control/src/main/java/org/archive/accesscontrol/AccessControlClient.java
Modified: trunk/archive-access/projects/access-control/access-control/src/main/java/org/archive/accesscontrol/AccessControlClient.java
===================================================================
--- trunk/archive-access/projects/access-control/access-control/src/main/java/org/archive/accesscontrol/AccessControlClient.java 2009-11-12 22:22:32 UTC (rev 2930)
+++ trunk/archive-access/projects/access-control/access-control/src/main/java/org/archive/accesscontrol/AccessControlClient.java 2009-11-20 05:50:23 UTC (rev 2931)
@@ -59,6 +59,11 @@
throw new RobotsUnavailableException(e);
}
}
+ if (rule == null) {
+ throw new RuntimeException("No applicable rule found. "
+ + "Please make sure you have a default rule set"
+ + " on the root SURT '(' in the oracle.");
+ }
return rule.getPolicy();
}
|