From: <nl...@ar...> - 2011-08-03 00:08:28
|
Wayback-1 - Build # 27 - Successful: Check console output at https://builds.archive.org:1443/job/Wayback-1/27/ to view the results. |
From: <nl...@ar...> - 2011-08-03 00:04:12
|
Access-Control - Build # 29 - Successful: Check console output at https://builds.archive.org:1443/job/Access-Control/29/ to view the results. |
Revision: 3493 http://archive-access.svn.sourceforge.net/archive-access/?rev=3493&view=rev Author: binzino Date: 2011-08-02 23:23:25 +0000 (Tue, 02 Aug 2011) Log Message: ----------- Fix ARI-2784: Add "utf-8" encoding to pdftotext invocation, as well as when reading the output into Java. Modified Paths: -------------- tags/nutchwax-0_13-JIRA-WAX-75/archive/src/plugin/parse-pdf2/src/java/org/archive/nutchwax/parse/pdf/PDFParser.java Modified: tags/nutchwax-0_13-JIRA-WAX-75/archive/src/plugin/parse-pdf2/src/java/org/archive/nutchwax/parse/pdf/PDFParser.java =================================================================== --- tags/nutchwax-0_13-JIRA-WAX-75/archive/src/plugin/parse-pdf2/src/java/org/archive/nutchwax/parse/pdf/PDFParser.java 2011-08-02 22:40:10 UTC (rev 3492) +++ tags/nutchwax-0_13-JIRA-WAX-75/archive/src/plugin/parse-pdf2/src/java/org/archive/nutchwax/parse/pdf/PDFParser.java 2011-08-02 23:23:25 UTC (rev 3493) @@ -83,12 +83,12 @@ String exepath = this.conf.get( "org.archive.nutchwax.parse.pdf.pdftotext.path", "/usr/bin/pdftotext" ); // Now create a Process to call 'pdftotext' to extract the metadata. 
- ProcessBuilder pb = new ProcessBuilder( exepath, "-q", "-nopgbrk", "-htmlmeta", "-f", "1", "-l", "1", tmpfile.toString(), "-" ); + ProcessBuilder pb = new ProcessBuilder( exepath, "-q", "-nopgbrk", "-enc", "UTF-8", "-htmlmeta", "-f", "1", "-l", "1", tmpfile.toString(), "-" ); Process p = pb.start(); p.getOutputStream( ).close(); - String head = suck( new InputStreamReader( p.getInputStream( ) ) ); + String head = suck( new InputStreamReader( p.getInputStream( ), "utf-8" ) ); byte[] err = suck( p.getErrorStream( ) ); if ( err.length > 0 ) @@ -98,11 +98,11 @@ p.destroy( ); - pb = new ProcessBuilder( exepath, "-q", "-nopgbrk", tmpfile.toString(), "-" ); + pb = new ProcessBuilder( exepath, "-q", "-nopgbrk", "-enc", "UTF-8", tmpfile.toString(), "-" ); p = pb.start( ); p.getOutputStream( ).close( ); - text = suck( new InputStreamReader( p.getInputStream( ) ) ); + text = suck( new InputStreamReader( p.getInputStream( ), "utf-8" ) ); err = suck( p.getErrorStream( ) ); if ( err.length > 0 ) This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
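The fix in r3493 has two halves: `-enc UTF-8` tells pdftotext what to emit, and the two-argument `InputStreamReader` constructor tells Java how to decode it, instead of relying on the platform default charset. The Java half could be sketched roughly as below. `Utf8ReadSketch`, `readAll` and `decodeUtf8` are hypothetical names: `readAll` only stands in for the commit's `suck` helper, whose body is not shown in the diff, and a byte array stands in for the real pdftotext process output.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class Utf8ReadSketch
{
  // Hypothetical stand-in for the 'suck' helper: drain a Reader into a String.
  static String readAll( Reader reader ) throws IOException
  {
    StringBuilder sb = new StringBuilder( );
    char[] buf = new char[4096];
    int n;
    while ( ( n = reader.read( buf ) ) != -1 )
      {
        sb.append( buf, 0, n );
      }
    return sb.toString( );
  }

  // Decode a byte stream as UTF-8, independent of the JVM's default charset.
  static String decodeUtf8( InputStream in )
  {
    try
      {
        return readAll( new InputStreamReader( in, StandardCharsets.UTF_8 ) );
      }
    catch ( IOException e )
      {
        // Wrapped only to keep the sketch easy to call.
        throw new UncheckedIOException( e );
      }
  }
}
```

In the real parser the stream would come from `p.getInputStream( )`; the point is only that the charset is named explicitly on both sides of the pipe.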
From: <ikr...@us...> - 2011-08-02 22:40:17
|
Revision: 3492 http://archive-access.svn.sourceforge.net/archive-access/?rev=3492&view=rev Author: ikreymer Date: 2011-08-02 22:40:10 +0000 (Tue, 02 Aug 2011) Log Message: ----------- ACCESS CONTROL: RuleSet rule matching gives preference to rules that match by accessgroup over rules that apply to all access groups. -Commented out debug comment/extra response read Modified Paths: -------------- trunk/archive-access/projects/access-control/access-control/src/main/java/org/archive/accesscontrol/HttpRuleDao.java trunk/archive-access/projects/access-control/access-control/src/main/java/org/archive/accesscontrol/model/RuleSet.java Modified: trunk/archive-access/projects/access-control/access-control/src/main/java/org/archive/accesscontrol/HttpRuleDao.java =================================================================== --- trunk/archive-access/projects/access-control/access-control/src/main/java/org/archive/accesscontrol/HttpRuleDao.java 2011-08-01 21:49:16 UTC (rev 3491) +++ trunk/archive-access/projects/access-control/access-control/src/main/java/org/archive/accesscontrol/HttpRuleDao.java 2011-08-02 22:40:10 UTC (rev 3492) @@ -14,7 +14,7 @@ /** * The HTTP Rule Data Access Object enables a rule database to be queried via - * the REST interface\xCAan oracle. + * the REST interface. 
* * For details of the protocol, see: * http://webteam.archive.org/confluence/display/wayback/Exclusions+API @@ -44,8 +44,8 @@ try { http.executeMethod(method); - String response = method.getResponseBodyAsString(); - System.out.println(response); +// String response = method.getResponseBodyAsString(); +// System.out.println(response); rules = (RuleSet) xstream.fromXML(method.getResponseBodyAsStream()); } catch (IOException e) { throw new RuleOracleUnavailableException(e); Modified: trunk/archive-access/projects/access-control/access-control/src/main/java/org/archive/accesscontrol/model/RuleSet.java =================================================================== --- trunk/archive-access/projects/access-control/access-control/src/main/java/org/archive/accesscontrol/model/RuleSet.java 2011-08-01 21:49:16 UTC (rev 3491) +++ trunk/archive-access/projects/access-control/access-control/src/main/java/org/archive/accesscontrol/model/RuleSet.java 2011-08-02 22:40:10 UTC (rev 3492) @@ -68,18 +68,28 @@ Date retrievalDate, String who) { NewSurtTokenizer tok = new NewSurtTokenizer(surt); + + // Best general rule (when accessGroup is blank) + Rule ruleGeneral = null; for (String key: tok.getSearchList()) { - Iterable<Rule> rules = rulemap.get(key); + Iterable<Rule> rules = rulemap.get(key); if (rules != null) { for (Rule rule : rules) { if (rule.matches(surt, captureDate, retrievalDate, who)) { - return rule; + // Return this if accessGroup (who) matches exactly + if ((who != null) && who.equals(rule.getWho())) { + return rule; + // otherwise, store the first/best one + } else if (ruleGeneral == null) { + ruleGeneral = rule; + } } } } } - return null; + + return ruleGeneral; } public void addAll(Iterable<Rule> rules) { This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
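The new matching order in r3492 (an exact access-group match wins, otherwise fall back to the first rule that matched at all) can be illustrated in isolation. This is a simplified sketch, not the RuleSet code: `SimpleRule` and `pick` are hypothetical, and the candidate list is assumed to contain only rules that already passed `rule.matches(...)`.

```java
import java.util.List;

final class RulePreferenceSketch
{
  // Hypothetical minimal rule: 'who' is the access group, null meaning "all groups".
  static final class SimpleRule
  {
    final String name;
    final String who;

    SimpleRule( String name, String who )
    {
      this.name = name;
      this.who = who;
    }
  }

  // Return the first rule whose group equals 'who'; otherwise the first candidate seen.
  static SimpleRule pick( List<SimpleRule> matching, String who )
  {
    SimpleRule general = null;
    for ( SimpleRule rule : matching )
      {
        if ( who != null && who.equals( rule.who ) )
          {
            return rule;          // exact access-group match wins immediately
          }
        if ( general == null )
          {
            general = rule;       // remember the first fallback candidate
          }
      }
    return general;               // may be null when nothing matched
  }
}
```

Note that the fallback is whichever non-exact rule came first in iteration order, so the ordering of the candidates still matters for the general case.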
From: <nl...@ar...> - 2011-08-02 01:46:34
|
Wayback-1 - Build # 26 - Successful: Check console output at https://builds.archive.org:1443/job/Wayback-1/26/ to view the results. |
From: <nl...@ar...> - 2011-08-02 01:42:26
|
Access-Control - Build # 28 - Successful: Check console output at https://builds.archive.org:1443/job/Access-Control/28/ to view the results. |
From: <bi...@us...> - 2011-08-01 21:49:22
|
Revision: 3491 http://archive-access.svn.sourceforge.net/archive-access/?rev=3491&view=rev Author: binzino Date: 2011-08-01 21:49:16 +0000 (Mon, 01 Aug 2011) Log Message: ----------- Changed maxDate to be exclusive. Modified Paths: -------------- tags/nutchwax-0_13-JIRA-WAX-75/archive/src/java/org/archive/nutchwax/tools/DateAdder.java Modified: tags/nutchwax-0_13-JIRA-WAX-75/archive/src/java/org/archive/nutchwax/tools/DateAdder.java =================================================================== --- tags/nutchwax-0_13-JIRA-WAX-75/archive/src/java/org/archive/nutchwax/tools/DateAdder.java 2011-08-01 21:16:54 UTC (rev 3490) +++ tags/nutchwax-0_13-JIRA-WAX-75/archive/src/java/org/archive/nutchwax/tools/DateAdder.java 2011-08-01 21:49:16 UTC (rev 3491) @@ -81,12 +81,8 @@ recordsStream = new FileInputStream( recordsFile ); } - System.out.println( "this.conf: " + this.getConf() ); - String filterSpecs = this.getConf().get( "nutchwax.filter.dates.allow" ); - System.out.println( "filterSpecs: " + filterSpecs ); - if ( filterSpecs != null ) { String spec = filterSpecs.trim(); @@ -104,16 +100,24 @@ break; case 2: minDate = Long.parseLong( values[0] + "00000000000000".substring( values[0].length() ) ); - maxDate = Long.parseLong( values[1] + "99999999999999".substring( values[1].length() ) ); + maxDate = Long.parseLong( values[1] + "00000000000000".substring( values[1].length() ) ); break; default: - LOG.warn( "Illegal format for nutchwax.filter.dates.allow: " + values ); + LOG.error( "Illegal format for nutchwax.filter.dates.allow: " + values ); + return 1; } } catch ( NumberFormatException nfe ) { - LOG.warn( "Illegal format for nutchwax.filter.dates.allow: " + values, nfe ); + LOG.error( "Illegal format for nutchwax.filter.dates.allow: " + values, nfe ); + return 1; } + + if ( minDate >= maxDate ) + { + LOG.error( "Min date must be before max date for nutchwax.filter.dates.allow: " + minDate + ", " + maxDate ); + return 1; + } } LOG.info( "Allowing dates in range: " + 
minDate + "-" + maxDate ); @@ -233,14 +237,14 @@ { long d = Long.parseLong( date ); - if ( minDate <= d && d <= maxDate ) + if ( minDate <= d && d < maxDate ) { - LOG.info( "Include date: " + date ); + LOG.debug( "Include date: " + date ); return true; } else { - LOG.info( "Exclude date: " + date ); + LOG.debug( "Exclude date: " + date ); return false; } } |
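With r3491 the allowed range becomes half-open: both bounds are right-padded with zeros to 14 digits, the minimum is inclusive and the maximum is exclusive. A small sketch of that arithmetic, using hypothetical helper names (`padToTimestamp`, `inRange`) rather than the DateAdder methods themselves:

```java
final class DateRangeSketch
{
  // Right-pad a partial timestamp such as "2011" with zeros to 14 digits.
  // Assumes the prefix is at most 14 characters long.
  static long padToTimestamp( String prefix )
  {
    return Long.parseLong( prefix + "00000000000000".substring( prefix.length( ) ) );
  }

  // Half-open check: min is inclusive, max is exclusive; unparsable dates are rejected.
  static boolean inRange( long min, long max, String date )
  {
    try
      {
        long d = Long.parseLong( date );
        return min <= d && d < max;
      }
    catch ( NumberFormatException nfe )
      {
        return false;
      }
  }
}
```

Under this scheme a spec of `2011-2012` would admit every 2011 timestamp and reject `20120000000000` itself, which is what "exclusive" means here.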
From: <bi...@us...> - 2011-08-01 21:17:00
|
Revision: 3490 http://archive-access.svn.sourceforge.net/archive-access/?rev=3490&view=rev Author: binzino Date: 2011-08-01 21:16:54 +0000 (Mon, 01 Aug 2011) Log Message: ----------- Add date filtering and removal of documents w/o any dates. Modified Paths: -------------- tags/nutchwax-0_13-JIRA-WAX-75/archive/src/java/org/archive/nutchwax/tools/DateAdder.java Added Paths: ----------- tags/nutchwax-0_13-JIRA-WAX-75/archive/src/java/org/archive/nutchwax/tools/DateCleaner.java Modified: tags/nutchwax-0_13-JIRA-WAX-75/archive/src/java/org/archive/nutchwax/tools/DateAdder.java =================================================================== --- tags/nutchwax-0_13-JIRA-WAX-75/archive/src/java/org/archive/nutchwax/tools/DateAdder.java 2011-07-08 22:13:00 UTC (rev 3489) +++ tags/nutchwax-0_13-JIRA-WAX-75/archive/src/java/org/archive/nutchwax/tools/DateAdder.java 2011-08-01 21:16:54 UTC (rev 3490) @@ -23,6 +23,9 @@ import java.io.*; import java.util.*; +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; + import org.apache.lucene.index.IndexReader; import org.apache.lucene.index.IndexWriter; import org.apache.lucene.document.Document; @@ -48,12 +51,20 @@ */ public class DateAdder extends Configured implements Tool { + public static final Log LOG = LogFactory.getLog( DateAdder.class ); + + private Configuration conf; + + long minDate = 0; + long maxDate = 99999999999999L; + public int run( String[] args ) throws Exception { if ( args.length < 4 ) { System.out.println( "DateAdder <key-index> <source1> ... 
<sourceN> <dest> <records>" ); - System.exit( 0 ); + + return 0; } String mainIndexDir = args[0].trim(); @@ -70,6 +81,43 @@ recordsStream = new FileInputStream( recordsFile ); } + System.out.println( "this.conf: " + this.getConf() ); + + String filterSpecs = this.getConf().get( "nutchwax.filter.dates.allow" ); + + System.out.println( "filterSpecs: " + filterSpecs ); + + if ( filterSpecs != null ) + { + String spec = filterSpecs.trim(); + + String values[] = spec.split( "[-]" ); + + try + { + switch ( values.length ) + { + case 0: + break; + case 1: + minDate = Long.parseLong( values[0] + "00000000000000".substring( values[0].length() ) ); + break; + case 2: + minDate = Long.parseLong( values[0] + "00000000000000".substring( values[0].length() ) ); + maxDate = Long.parseLong( values[1] + "99999999999999".substring( values[1].length() ) ); + break; + default: + LOG.warn( "Illegal format for nutchwax.filter.dates.allow: " + values ); + } + } + catch ( NumberFormatException nfe ) + { + LOG.warn( "Illegal format for nutchwax.filter.dates.allow: " + values, nfe ); + } + } + + LOG.info( "Allowing dates in range: " + minDate + "-" + maxDate ); + // Read date-addition records from stdin. Map<String,String> dateRecords = new HashMap<String,String>( ); BufferedReader br = new BufferedReader( new InputStreamReader( recordsStream, "UTF-8" ) ); @@ -98,7 +146,7 @@ } - IndexReader reader = IndexReader.open( new NIOFSDirectory( new File( mainIndexDir ) ), true ); + IndexReader reader = IndexReader.open( new NIOFSDirectory( new File( mainIndexDir ) ), false ); IndexReader sourceReaders[] = new IndexReader[args.length-3]; for ( int i = 0 ; i < sourceReaders.length ; i++ ) @@ -129,6 +177,8 @@ } for ( String date : uniqueDates ) { + if ( ! 
allowDate( date ) ) continue ; + newDoc.add( new Field( NutchWax.DATE_KEY, date, Field.Store.YES, Field.Index.NO ) ); newDoc.add( new Field( NutchWax.DATE_KEY, date.substring( 0, 4 ), Field.Store.NO, Field.Index.NOT_ANALYZED_NO_NORMS ) ); newDoc.add( new Field( NutchWax.DATE_KEY, date.substring( 0, 6 ), Field.Store.NO, Field.Index.NOT_ANALYZED_NO_NORMS ) ); @@ -159,6 +209,8 @@ { for ( String date : newDates.split("\\s+") ) { + if ( ! allowDate( date ) ) continue ; + newDoc.add( new Field( NutchWax.DATE_KEY, date, Field.Store.YES, Field.Index.NO ) ); newDoc.add( new Field( NutchWax.DATE_KEY, date.substring( 0, 4 ), Field.Store.NO, Field.Index.NOT_ANALYZED_NO_NORMS ) ); newDoc.add( new Field( NutchWax.DATE_KEY, date.substring( 0, 6 ), Field.Store.NO, Field.Index.NOT_ANALYZED_NO_NORMS ) ); @@ -175,6 +227,30 @@ return 0; } + private boolean allowDate( String date ) + { + try + { + long d = Long.parseLong( date ); + + if ( minDate <= d && d <= maxDate ) + { + LOG.info( "Include date: " + date ); + return true; + } + else + { + LOG.info( "Exclude date: " + date ); + return false; + } + } + catch ( NumberFormatException nfe ) + { + LOG.warn( "Invalid date: " + date ); + return false; + } + } + /** * Utility function to instantiate a UrlCanonicalizer based on an * implementation specified in the configuration. Added: tags/nutchwax-0_13-JIRA-WAX-75/archive/src/java/org/archive/nutchwax/tools/DateCleaner.java =================================================================== --- tags/nutchwax-0_13-JIRA-WAX-75/archive/src/java/org/archive/nutchwax/tools/DateCleaner.java (rev 0) +++ tags/nutchwax-0_13-JIRA-WAX-75/archive/src/java/org/archive/nutchwax/tools/DateCleaner.java 2011-08-01 21:16:54 UTC (rev 3490) @@ -0,0 +1,92 @@ +/* + * Copyright (C) 2008 Internet Archive. + * + * This file is part of the archive-access tools project + * (http://sourceforge.net/projects/archive-access). 
+ * + * The archive-access tools are free software; you can redistribute them and/or + * modify them under the terms of the GNU Lesser Public License as published by + * the Free Software Foundation; either version 2.1 of the License, or any + * later version. + * + * The archive-access tools are distributed in the hope that they will be + * useful, but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser + * Public License for more details. + * + * You should have received a copy of the GNU Lesser Public License along with + * the archive-access tools; if not, write to the Free Software Foundation, + * Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ +package org.archive.nutchwax.tools; + +import java.io.*; +import java.util.*; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; + +import org.apache.hadoop.conf.Configured; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.util.Tool; +import org.apache.hadoop.util.ToolRunner; + +import org.apache.lucene.index.IndexWriter; +import org.apache.lucene.store.NIOFSDirectory; +import org.apache.lucene.analysis.*; +import org.apache.lucene.search.Query; +import org.apache.lucene.util.Version; +import org.apache.lucene.queryParser.QueryParser; + +import org.apache.nutch.util.NutchConfiguration; + +/** + * + */ +public class DateCleaner extends Configured implements Tool +{ + public static final Log LOG = LogFactory.getLog( DateCleaner.class ); + + public int run( String[] args ) throws Exception + { + if ( args.length != 1 || args[0] == "-h" ) + { + System.out.println( "DateCleaner <index>" ); + System.out.println( "Delete all documents w/o a date from the index." 
); + + return 0; + } + + IndexWriter writer = new IndexWriter( new NIOFSDirectory( new File( args[0] ) ), + new KeywordAnalyzer(), + IndexWriter.MaxFieldLength.UNLIMITED ); + + QueryParser qp = new QueryParser( Version.LUCENE_30, "date", new KeywordAnalyzer() ); + qp.setAllowLeadingWildcard( true ); + + Query q = qp.parse( "*:* -date:*" ); + + try + { + writer.deleteDocuments( q ); + writer.close(); + } + catch ( Throwable t ) + { + writer.close(); + } + + return 0; + } + + /** + * Command-line driver. Runs the Importer as a Hadoop job. + */ + public static void main( String args[] ) throws Exception + { + int result = ToolRunner.run( NutchConfiguration.create(), new DateCleaner(), args ); + + System.exit( result ); + } + +} \ No newline at end of file |
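One detail in the DateCleaner usage check above is worth a note: `args[0] == "-h"` compares object references, not string contents, so a `-h` passed on the command line would most likely not be recognised and would be treated as an index path. A content comparison might look like the following sketch; `UsageCheckSketch` and `wantsUsage` are hypothetical names, not part of the commit.

```java
final class UsageCheckSketch
{
  // True when the usage message should be printed:
  // wrong argument count, or an explicit "-h" compared by content.
  static boolean wantsUsage( String[] args )
  {
    return args.length != 1 || "-h".equals( args[0] );
  }
}
```

Putting the literal on the left also keeps the check safe if the argument were ever null.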
From: <nl...@ar...> - 2011-07-30 02:05:57
|
Wayback-1 - Build # 25 - Successful: Check console output at https://builds.archive.org:1443/job/Wayback-1/25/ to view the results. |
From: <nl...@ar...> - 2011-07-30 02:01:49
|
Access-Control - Build # 27 - Successful: Check console output at https://builds.archive.org:1443/job/Access-Control/27/ to view the results. |
From: <nl...@ar...> - 2011-07-27 20:41:57
|
Wayback-1 - Build # 24 - Successful: Check console output at https://builds.archive.org:1443/job/Wayback-1/24/ to view the results. |
From: <nl...@ar...> - 2011-07-27 20:37:55
|
Access-Control - Build # 26 - Successful: Check console output at https://builds.archive.org:1443/job/Access-Control/26/ to view the results. |
From: <ikr...@us...> - 2011-07-08 22:13:06
|
Revision: 3489 http://archive-access.svn.sourceforge.net/archive-access/?rev=3489&view=rev Author: ikreymer Date: 2011-07-08 22:13:00 +0000 (Fri, 08 Jul 2011) Log Message: ----------- Access Control - Exclusion Oracle: UPDATE: Added support for jetty run for oracle Updated to latest hsqldb FIX: Rule.compareTo includes a compare by access group name so that rules that differ only by access group are distinct Modified Paths: -------------- trunk/archive-access/projects/access-control/access-control/src/main/java/org/archive/accesscontrol/model/Rule.java trunk/archive-access/projects/access-control/oracle/pom.xml Modified: trunk/archive-access/projects/access-control/access-control/src/main/java/org/archive/accesscontrol/model/Rule.java =================================================================== --- trunk/archive-access/projects/access-control/access-control/src/main/java/org/archive/accesscontrol/model/Rule.java 2011-07-08 05:13:03 UTC (rev 3488) +++ trunk/archive-access/projects/access-control/access-control/src/main/java/org/archive/accesscontrol/model/Rule.java 2011-07-08 22:13:00 UTC (rev 3489) @@ -277,6 +277,8 @@ i = -1; } else if (getWho() == null && o.getWho() != null) { i = 1; + } else if (getWho() != null && o.getWho() != null) { + i = getWho().compareTo(o.getWho()); } else { i = getPolicyId().compareTo(o.getPolicyId()); } Modified: trunk/archive-access/projects/access-control/oracle/pom.xml =================================================================== --- trunk/archive-access/projects/access-control/oracle/pom.xml 2011-07-08 05:13:03 UTC (rev 3488) +++ trunk/archive-access/projects/access-control/oracle/pom.xml 2011-07-08 22:13:00 UTC (rev 3489) @@ -1,4 +1,4 @@ -<?xml version="1.0" encoding="UTF-8"?><project> +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> <parent> 
<artifactId>access-control</artifactId> <groupId>org.archive</groupId> @@ -14,9 +14,9 @@ <build> <extensions> <extension> - <groupId>hsqldb</groupId> - <artifactId>hsqldb</artifactId> - <version>1.8.0.7</version> + <groupId>org.hsqldb</groupId> + <artifactId>hsqldb</artifactId> + <version>2.2.4</version> </extension> </extensions> <plugins> @@ -48,6 +48,24 @@ </componentProperties> </configuration> </plugin> + + <plugin> + <groupId>org.mortbay.jetty</groupId> + <artifactId>maven-jetty-plugin</artifactId> + <configuration> + <systemProperties> + <systemProperty> + <name>jetty.port</name> + <value>8080</value> + </systemProperty> + </systemProperties> + <scanIntervalSeconds>5</scanIntervalSeconds> + <contextPath>oracle</contextPath> + <stopKey>foo</stopKey> + <stopPort>9995</stopPort> + </configuration> + </plugin> + </plugins> </build> <repositories> @@ -115,9 +133,9 @@ <version>8.2-504.jdbc3</version> </dependency> <dependency> - <groupId>hsqldb</groupId> - <artifactId>hsqldb</artifactId> - <version>1.8.0.7</version> + <groupId>org.hsqldb</groupId> + <artifactId>hsqldb</artifactId> + <version>2.2.4</version> </dependency> <dependency> <groupId>com.thoughtworks.xstream</groupId> This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
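The Rule.compareTo change in r3489 matters because sorted collections treat a comparison result of 0 as "same element", so two rules differing only in access group would otherwise collapse into one. A null-aware comparison in that spirit is sketched below. `WhoCompareSketch` and `compareWho` are hypothetical, and the ordering of null versus non-null groups is an assumption inferred from the visible part of the diff.

```java
final class WhoCompareSketch
{
  // Order by access group: a named group sorts before "no group",
  // two named groups sort alphabetically, and two nulls tie.
  static int compareWho( String a, String b )
  {
    if ( a != null && b == null )
      {
        return -1;
      }
    if ( a == null && b != null )
      {
        return 1;
      }
    if ( a != null )
      {
        return a.compareTo( b );   // both non-null: distinct groups never compare as 0
      }
    return 0;                      // both null: the caller falls through to other fields
  }
}
```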
Revision: 3488
          http://archive-access.svn.sourceforge.net/archive-access/?rev=3488&view=rev
Author:   bradtofel
Date:     2011-07-08 05:13:03 +0000 (Fri, 08 Jul 2011)

Log Message:
-----------

Modified Paths:
--------------
    trunk/archive-access/projects/wayback/wayback-core/src/test/java/org/archive/wayback/accesscontrol/staticmap/StaticMapExclusionFilterTest.java

Modified: trunk/archive-access/projects/wayback/wayback-core/src/test/java/org/archive/wayback/accesscontrol/staticmap/StaticMapExclusionFilterTest.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/test/java/org/archive/wayback/accesscontrol/staticmap/StaticMapExclusionFilterTest.java	2011-07-08 04:50:15 UTC (rev 3487)
+++ trunk/archive-access/projects/wayback/wayback-core/src/test/java/org/archive/wayback/accesscontrol/staticmap/StaticMapExclusionFilterTest.java	2011-07-08 05:13:03 UTC (rev 3488)
@@ -24,6 +24,7 @@
 import java.io.IOException;
 import java.util.Map;
 
+import org.apache.commons.httpclient.URIException;
 import org.archive.wayback.UrlCanonicalizer;
 import org.archive.wayback.core.CaptureSearchResult;
 import org.archive.wayback.util.ObjectFilter;
@@ -146,9 +147,10 @@
 		assertTrue("emptypath",isBlocked(filter,"http://www.peagreenboat.com/"));
 	}
 
-	private boolean isBlocked(ObjectFilter<CaptureSearchResult> filter, String url) {
+	private boolean isBlocked(ObjectFilter<CaptureSearchResult> filter, String url) throws URIException {
 		CaptureSearchResult result = new CaptureSearchResult();
 		result.setOriginalUrl(url);
+		result.setUrlKey(canonicalizer.urlStringToKey(url));
 		int filterResult = filter.filterObject(result);
 		if(filterResult == ObjectFilter.FILTER_EXCLUDE) {
 			return true;
@@ -163,7 +165,7 @@
 		Map<String,Object> map = factory.loadFile(tmpFile.getAbsolutePath());
 		return new StaticMapExclusionFilter(map,canonicalizer);
 	}
-	
+
 	private void setTmpContents(String[] lines) throws IOException {
 		if(tmpFile != null && tmpFile.exists()) {
 			tmpFile.delete();
|
From: <bra...@us...> - 2011-07-08 04:50:21
|
Revision: 3487
          http://archive-access.svn.sourceforge.net/archive-access/?rev=3487&view=rev
Author:   bradtofel
Date:     2011-07-08 04:50:15 +0000 (Fri, 08 Jul 2011)

Log Message:
-----------
DEPENDENCY: moved heritrix-commons to 3.1.0

Modified Paths:
--------------
    trunk/archive-access/projects/wayback/pom.xml

Modified: trunk/archive-access/projects/wayback/pom.xml
===================================================================
--- trunk/archive-access/projects/wayback/pom.xml	2011-07-08 04:49:09 UTC (rev 3486)
+++ trunk/archive-access/projects/wayback/pom.xml	2011-07-08 04:50:15 UTC (rev 3487)
@@ -262,9 +262,16 @@
     <dependency>
       <groupId>org.archive.heritrix</groupId>
       <artifactId>heritrix-commons</artifactId>
-      <version>3.0.1-SNAPSHOT</version>
+      <version>3.1.0-SNAPSHOT</version>
     </dependency>
+<!--    <dependency>
+      <groupId>org.archive.heritrix</groupId>
+      <artifactId>heritrix-modules</artifactId>
+      <version>3.1.0-SNAPSHOT</version>
+    </dependency>
+-->
+
     <dependency>
       <groupId>org.archive.access-control</groupId>
       <artifactId>access-control</artifactId>
       <version>0.0.1-SNAPSHOT</version>
|
Revision: 3486
          http://archive-access.svn.sourceforge.net/archive-access/?rev=3486&view=rev
Author:   bradtofel
Date:     2011-07-08 04:49:09 +0000 (Fri, 08 Jul 2011)

Log Message:
-----------
OPTIMIZ: removed unneeded canonicalization - just use the urlKey from the search result

Modified Paths:
--------------
    trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/accesscontrol/staticmap/StaticMapExclusionFilter.java

Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/accesscontrol/staticmap/StaticMapExclusionFilter.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/accesscontrol/staticmap/StaticMapExclusionFilter.java	2011-07-08 04:47:51 UTC (rev 3485)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/accesscontrol/staticmap/StaticMapExclusionFilter.java	2011-07-08 04:49:09 UTC (rev 3486)
@@ -84,14 +84,7 @@
 			}
 			notifiedSeen = true;
 		}
-		String url;
-		try {
-			url = canonicalizer.urlStringToKey(r.getOriginalUrl());
-		} catch (URIException e) {
-
-			//e.printStackTrace();
-			return FILTER_EXCLUDE;
-		}
+		String url = r.getUrlKey();
 		if(lastChecked != null) {
 			if(lastChecked.equals(url)) {
 				if(lastCheckedExcluded) {
|
Revision: 3485 http://archive-access.svn.sourceforge.net/archive-access/?rev=3485&view=rev Author: bradtofel Date: 2011-07-08 04:47:51 +0000 (Fri, 08 Jul 2011) Log Message: ----------- LOGGING: changed logging levels for most messages, added PerformanceLogger Modified Paths: -------------- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/accesscontrol/robotstxt/RobotExclusionFilter.java Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/accesscontrol/robotstxt/RobotExclusionFilter.java =================================================================== --- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/accesscontrol/robotstxt/RobotExclusionFilter.java 2011-07-08 04:40:04 UTC (rev 3484) +++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/accesscontrol/robotstxt/RobotExclusionFilter.java 2011-07-08 04:47:51 UTC (rev 3485) @@ -40,6 +40,7 @@ import org.archive.wayback.resourceindex.filters.ExclusionFilter; import org.archive.wayback.util.ObjectFilter; import org.archive.wayback.util.url.UrlOperations; +import org.archive.wayback.webapp.PerformanceLogger; /** * CaptureSearchResult Filter that uses a LiveWebCache to retrieve robots.txt @@ -172,16 +173,20 @@ LOGGER.fine("ROBOT: Cached("+urlString+")"); rules = rulesCache.get(urlString); if(!urlString.equals(firstUrlString)) { - LOGGER.info("Adding extra url("+firstUrlString+") for prev cached rules("+urlString+")"); + LOGGER.fine("Adding extra url("+firstUrlString+") for prev cached rules("+urlString+")"); rulesCache.put(firstUrlString, rules); } } else { try { - LOGGER.info("ROBOT: NotCached - Downloading("+urlString+")"); + LOGGER.fine("ROBOT: NotCached - Downloading("+urlString+")"); tmpRules = new RobotRules(); + long start = System.currentTimeMillis(); Resource resource = webCache.getCachedResource(new URL(urlString), maxCacheMS,true); + long elapsed = 
System.currentTimeMillis() - start; + PerformanceLogger.noteElapsed("RobotRequest", elapsed, urlString); + if(resource.getStatusCode() != 200) { LOGGER.info("ROBOT: NotAvailable("+urlString+")"); throw new LiveDocumentNotAvailableException(urlString); @@ -189,24 +194,24 @@ tmpRules.parse(resource); rulesCache.put(firstUrlString,tmpRules); rules = tmpRules; - LOGGER.info("ROBOT: Downloaded("+urlString+")"); + LOGGER.fine("ROBOT: Downloaded("+urlString+")"); } catch (LiveDocumentNotAvailableException e) { LOGGER.info("ROBOT: LiveDocumentNotAvailableException("+urlString+")"); } catch (MalformedURLException e) { // e.printStackTrace(); - LOGGER.info("ROBOT: MalformedURLException("+urlString+")"); + LOGGER.warning("ROBOT: MalformedURLException("+urlString+")"); return null; } catch (IOException e) { LOGGER.warning("ROBOT: IOException("+urlString+"):"+e.getLocalizedMessage()); return null; } catch (LiveWebCacheUnavailableException e) { - LOGGER.info("ROBOT: LiveWebCacheUnavailableException("+urlString+")"); + LOGGER.severe("ROBOT: LiveWebCacheUnavailableException("+urlString+")"); filterGroup.setLiveWebGone(); return null; } catch (LiveWebTimeoutException e) { - LOGGER.info("ROBOT: LiveDocumentTimedOutException("+urlString+")"); + LOGGER.severe("ROBOT: LiveDocumentTimedOutException("+urlString+")"); filterGroup.setRobotTimedOut(); return null; } @@ -216,7 +221,7 @@ // special-case, allow empty rules if no longer available. 
rulesCache.put(firstUrlString,emptyRules); rules = emptyRules; - LOGGER.info("No rules available, using emptyRules for:" + firstUrlString); + LOGGER.fine("No rules available, using emptyRules for:" + firstUrlString); } return rules; } @@ -257,9 +262,9 @@ notifiedPassed = true; } filterResult = ObjectFilter.FILTER_INCLUDE; - LOGGER.fine("ROBOT: ALLOWED("+resultURL+")"); + LOGGER.finer("ROBOT: ALLOWED("+resultURL+")"); } else { - LOGGER.info("ROBOT: BLOCKED("+resultURL+")"); + LOGGER.fine("ROBOT: BLOCKED("+resultURL+")"); } } return filterResult; |
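The timing added in r3485 follows a common pattern: take a timestamp before the slow call, subtract afterwards, and hand the difference to a logger. A generic version of that pattern is sketched below. `ElapsedSketch`, `timed` and `lastElapsedMs` are hypothetical, and `java.util.logging` stands in for Wayback's `PerformanceLogger`, whose implementation is not shown in the diff.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

final class ElapsedSketch
{
  private static final Logger LOGGER = Logger.getLogger( ElapsedSketch.class.getName( ) );

  // Most recent measurement, kept only so the sketch is easy to inspect.
  static long lastElapsedMs = -1;

  // Run 'work', measure wall-clock milliseconds around it, and return its result.
  static <T> T timed( String label, Supplier<T> work )
  {
    long start = System.currentTimeMillis( );
    try
      {
        return work.get( );
      }
    finally
      {
        // Recorded even when 'work' throws, so failures are timed too.
        lastElapsedMs = System.currentTimeMillis( ) - start;
        LOGGER.fine( label + " took " + lastElapsedMs + " ms" );
      }
  }
}
```

Logging at `fine` mirrors the commit's move of routine messages below `info`, so the timing output stays quiet unless detailed logging is enabled.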
From: <bra...@us...> - 2011-07-08 04:40:11
|
Revision: 3484 http://archive-access.svn.sourceforge.net/archive-access/?rev=3484&view=rev Author: bradtofel Date: 2011-07-08 04:40:04 +0000 (Fri, 08 Jul 2011) Log Message: ----------- Updated to Heritrix 3.1.0 Modified Paths: -------------- trunk/archive-access/projects/access-control/access-control/pom.xml trunk/archive-access/projects/access-control/access-control/src/main/java/org/archive/accesscontrol/AccessControlClient.java trunk/archive-access/projects/access-control/oracle/pom.xml Modified: trunk/archive-access/projects/access-control/access-control/pom.xml =================================================================== --- trunk/archive-access/projects/access-control/access-control/pom.xml 2011-07-06 02:20:09 UTC (rev 3483) +++ trunk/archive-access/projects/access-control/access-control/pom.xml 2011-07-08 04:40:04 UTC (rev 3484) @@ -63,8 +63,8 @@ </dependency> <dependency> <groupId>org.archive.heritrix</groupId> - <artifactId>commons</artifactId> - <version>2.0.0-RC1</version> + <artifactId>heritrix-commons</artifactId> + <version>3.1.0-SNAPSHOT</version> </dependency> <dependency> Modified: trunk/archive-access/projects/access-control/access-control/src/main/java/org/archive/accesscontrol/AccessControlClient.java =================================================================== --- trunk/archive-access/projects/access-control/access-control/src/main/java/org/archive/accesscontrol/AccessControlClient.java 2011-07-06 02:20:09 UTC (rev 3483) +++ trunk/archive-access/projects/access-control/access-control/src/main/java/org/archive/accesscontrol/AccessControlClient.java 2011-07-08 04:40:04 UTC (rev 3484) @@ -129,8 +129,9 @@ String who) throws RuleOracleUnavailableException { url = ArchiveUtils.addImpliedHttpIfNecessary(url); String surt = SURT.fromURI(url); +// PublicSuffixes.reduceSurtToAssignmentLevel(surt) String publicSuffix = PublicSuffixes - .reduceSurtToTopmostAssigned(getSurtAuthority(surt)); + 
.reduceSurtToAssignmentLevel(getSurtAuthority(surt)); RuleSet rules = ruleDao.getRuleTree(getScheme(surt) + "(" + publicSuffix); @@ -187,7 +188,7 @@ for (String url: urls) { String surt = SURT.fromURI(ArchiveUtils.addImpliedHttpIfNecessary(url)); publicSuffixes.add(PublicSuffixes - .reduceSurtToTopmostAssigned(getSurtAuthority(surt))); + .reduceSurtToAssignmentLevel(getSurtAuthority(surt))); } ruleDao.prepare(publicSuffixes); Modified: trunk/archive-access/projects/access-control/oracle/pom.xml =================================================================== --- trunk/archive-access/projects/access-control/oracle/pom.xml 2011-07-06 02:20:09 UTC (rev 3483) +++ trunk/archive-access/projects/access-control/oracle/pom.xml 2011-07-08 04:40:04 UTC (rev 3484) @@ -131,8 +131,8 @@ </dependency> <dependency> <groupId>org.archive.heritrix</groupId> - <artifactId>commons</artifactId> - <version>2.0.1-SNAPSHOT</version> + <artifactId>heritrix-commons</artifactId> + <version>3.1.0-SNAPSHOT</version> </dependency> <dependency> <groupId>org.hibernate</groupId> This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
Revision: 3483 http://archive-access.svn.sourceforge.net/archive-access/?rev=3483&view=rev Author: bradtofel Date: 2011-07-06 02:20:09 +0000 (Wed, 06 Jul 2011) Log Message: ----------- FEATURE: made StringRewriters instance members, not static, changed JSStringTransformer to generic StringTransformer, exposed it for configuration via Spring Modified Paths: -------------- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/archivalurl/FastArchivalUrlReplayParseEventHandler.java Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/archivalurl/FastArchivalUrlReplayParseEventHandler.java =================================================================== --- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/archivalurl/FastArchivalUrlReplayParseEventHandler.java 2011-06-29 10:11:19 UTC (rev 3482) +++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/archivalurl/FastArchivalUrlReplayParseEventHandler.java 2011-07-06 02:20:09 UTC (rev 3483) @@ -67,19 +67,19 @@ private final static String FRAMESET_TAG = "FRAMESET"; private final static String BODY_TAG = "BODY"; - private static BlockCSSStringTransformer cssBlockTrans = + private BlockCSSStringTransformer cssBlockTrans = new BlockCSSStringTransformer(); - private static InlineCSSStringTransformer cssInlineTrans = + private InlineCSSStringTransformer cssInlineTrans = new InlineCSSStringTransformer(); - private static JSStringTransformer jsBlockTrans = + private StringTransformer jsBlockTrans = new JSStringTransformer(); - private static MetaRefreshUrlStringTransformer metaRefreshTrans = + private MetaRefreshUrlStringTransformer metaRefreshTrans = new MetaRefreshUrlStringTransformer(); - private static URLStringTransformer anchorUrlTrans = null; - static { - anchorUrlTrans = new URLStringTransformer(); - anchorUrlTrans.setJsTransformer(jsBlockTrans); - } + private 
URLStringTransformer anchorUrlTrans = new URLStringTransformer(); +// static { +// anchorUrlTrans = new URLStringTransformer(); +// anchorUrlTrans.setJsTransformer(jsBlockTrans); +// } private static URLStringTransformer framesetUrlTrans = new URLStringTransformer("fw_"); private static URLStringTransformer cssUrlTrans = @@ -95,6 +95,7 @@ for (String tag : okHeadTags) { okHeadTagMap.put(tag, null); } + anchorUrlTrans.setJsTransformer(jsBlockTrans); } // TODO: This should all be refactored up into an abstract base class with @@ -429,4 +430,20 @@ public void setEndJsp(String endJsp) { this.endJsp = endJsp; } + + /** + * @return the jsBlockTrans + */ + public StringTransformer getJsBlockTrans() { + return jsBlockTrans; + } + + /** + * @param jsBlockTrans the jsBlockTrans to set + */ + public void setJsBlockTrans(StringTransformer jsBlockTrans) { + this.jsBlockTrans = jsBlockTrans; + anchorUrlTrans.setJsTransformer(jsBlockTrans); + + } } This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
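The revision above turns the transformers into instance members and has the new setter push the replacement into anchorUrlTrans as well. A minimal sketch of why that second call matters, using simplified stand-ins; `TransformerWiring`, `UrlTransformer`, `Handler` and `rewriteAnchorJs` are hypothetical names, and the nested `StringTransformer` only mimics the interface named in the diff:

```java
public class TransformerWiring {
    /** Hypothetical stand-in for the StringTransformer type in the diff. */
    public interface StringTransformer {
        String transform(String input);
    }

    /** Hypothetical stand-in for URLStringTransformer: delegates JS rewriting. */
    public static class UrlTransformer {
        private StringTransformer jsTransformer;

        public void setJsTransformer(StringTransformer jsTransformer) {
            this.jsTransformer = jsTransformer;
        }

        public String transformJs(String js) {
            return jsTransformer.transform(js);
        }
    }

    /** Stand-in for the handler: per-instance fields, wired in the constructor. */
    public static class Handler {
        private StringTransformer jsBlockTrans = s -> "default:" + s;
        private final UrlTransformer anchorUrlTrans = new UrlTransformer();

        public Handler() {
            // Field initializers have already run, so jsBlockTrans is set here.
            anchorUrlTrans.setJsTransformer(jsBlockTrans);
        }

        public void setJsBlockTrans(StringTransformer jsBlockTrans) {
            this.jsBlockTrans = jsBlockTrans;
            // Without this line the anchor transformer would keep the old delegate.
            anchorUrlTrans.setJsTransformer(jsBlockTrans);
        }

        public String rewriteAnchorJs(String js) {
            return anchorUrlTrans.transformJs(js);
        }
    }

    public static void main(String[] args) {
        Handler first = new Handler();
        Handler second = new Handler();
        second.setJsBlockTrans(s -> "custom:" + s);
        System.out.println(first.rewriteAnchorJs("x"));
        System.out.println(second.rewriteAnchorJs("x"));
    }
}
```

Because the fields are no longer static, configuring one handler (for example from a Spring bean definition) leaves any other handler instance untouched.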
From: <bra...@us...> - 2011-06-29 10:11:25
|
Revision: 3482
          http://archive-access.svn.sourceforge.net/archive-access/?rev=3482&view=rev
Author:   bradtofel
Date:     2011-06-29 10:11:19 +0000 (Wed, 29 Jun 2011)

Log Message:
-----------
TWEAK: removed jetty servlet api from hadoop inclusion. was causing jar mismatch problems under tomcat6

Modified Paths:
--------------
    trunk/archive-access/projects/wayback/wayback-core/pom.xml

Modified: trunk/archive-access/projects/wayback/wayback-core/pom.xml
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/pom.xml	2011-06-26 03:00:40 UTC (rev 3481)
+++ trunk/archive-access/projects/wayback/wayback-core/pom.xml	2011-06-29 10:11:19 UTC (rev 3482)
@@ -78,6 +78,10 @@
           <groupId>commons-httpclient</groupId>
           <artifactId>commons-httpclient</artifactId>
         </exclusion>
+        <exclusion>
+          <groupId>org.mortbay.jetty</groupId>
+          <artifactId>servlet-api-2.5</artifactId>
+        </exclusion>
       </exclusions>
     </dependency>

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
|
Revision: 3481
          http://archive-access.svn.sourceforge.net/archive-access/?rev=3481&view=rev
Author:   bradtofel
Date:     2011-06-26 03:00:40 +0000 (Sun, 26 Jun 2011)

Log Message:
-----------
BUGFIX: was not checking all NPE cases: String.compareTo(other) throws NPE when other is null.

Modified Paths:
--------------
    trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/filters/SelfRedirectFilter.java

Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/filters/SelfRedirectFilter.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/filters/SelfRedirectFilter.java	2011-06-25 03:40:03 UTC (rev 3480)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/filters/SelfRedirectFilter.java	2011-06-26 03:00:40 UTC (rev 3481)
@@ -58,14 +58,14 @@
 			String urlKey = r.getUrlKey();
 			try {
 				String redirectKey = canonicalizer.urlStringToKey(redirect);
-				if((redirectKey != null) &&
+				if((redirectKey != null) && (urlKey != null) &&
 						(redirectKey.compareTo(urlKey) == 0)) {
 					// only omit if same scheme:
 					String origScheme = 
 						UrlOperations.urlToScheme(r.getOriginalUrl());
 					String redirScheme = 
 						UrlOperations.urlToScheme(redirect);
-					if((origScheme != null) &&
+					if((origScheme != null) && (redirScheme != null) &&
 							(origScheme.compareTo(redirScheme) == 0)) {
 						return FILTER_EXCLUDE;
 					}

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
|
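The log message above points out that String.compareTo(other) throws a NullPointerException when other is null, so both operands need a guard. A minimal sketch of the same check pulled into one helper; the class `NullSafeKeys` and the method `sameKey` are hypothetical names for illustration, not part of SelfRedirectFilter:

```java
public class NullSafeKeys {
    /**
     * Returns true only when both keys are non-null and compare as equal.
     * Either key being null yields false instead of a NullPointerException.
     */
    public static boolean sameKey(String a, String b) {
        return (a != null) && (b != null) && (a.compareTo(b) == 0);
    }

    public static void main(String[] args) {
        System.out.println(sameKey("example.com/", "example.com/"));
        System.out.println(sameKey("example.com/", null));
        System.out.println(sameKey(null, "example.com/"));
        System.out.println(sameKey(null, null));
    }
}
```

Note that this differs from java.util.Objects.equals, which treats two nulls as equal; in the filter's logic, a missing key on either side means the record is not treated as a self-redirect.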
From: <bra...@us...> - 2011-06-25 03:40:09
|
Revision: 3480
          http://archive-access.svn.sourceforge.net/archive-access/?rev=3480&view=rev
Author:   bradtofel
Date:     2011-06-25 03:40:03 +0000 (Sat, 25 Jun 2011)

Log Message:
-----------
Added Hadoop reference - needed for new HDFS block loader implementation

Modified Paths:
--------------
    trunk/archive-access/projects/wayback/wayback-core/pom.xml

Modified: trunk/archive-access/projects/wayback/wayback-core/pom.xml
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/pom.xml	2011-06-24 13:31:21 UTC (rev 3479)
+++ trunk/archive-access/projects/wayback/wayback-core/pom.xml	2011-06-25 03:40:03 UTC (rev 3480)
@@ -70,6 +70,16 @@
       <groupId>com.flagstone</groupId>
       <artifactId>transform</artifactId>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-core</artifactId>
+      <exclusions>
+        <exclusion>
+          <groupId>commons-httpclient</groupId>
+          <artifactId>commons-httpclient</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>

   </dependencies>

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
|
Revision: 3479
          http://archive-access.svn.sourceforge.net/archive-access/?rev=3479&view=rev
Author:   bradtofel
Date:     2011-06-24 13:31:21 +0000 (Fri, 24 Jun 2011)

Log Message:
-----------
BUGFIX: Simple NPE - not checking returned values

Modified Paths:
--------------
    trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/filters/SelfRedirectFilter.java

Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/filters/SelfRedirectFilter.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/filters/SelfRedirectFilter.java	2011-06-16 17:26:31 UTC (rev 3478)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/filters/SelfRedirectFilter.java	2011-06-24 13:31:21 UTC (rev 3479)
@@ -58,13 +58,15 @@
 			String urlKey = r.getUrlKey();
 			try {
 				String redirectKey = canonicalizer.urlStringToKey(redirect);
-				if(redirectKey.compareTo(urlKey) == 0) {
+				if((redirectKey != null) &&
+						(redirectKey.compareTo(urlKey) == 0)) {
 					// only omit if same scheme:
 					String origScheme = 
 						UrlOperations.urlToScheme(r.getOriginalUrl());
 					String redirScheme = 
 						UrlOperations.urlToScheme(redirect);
-					if(origScheme.compareTo(redirScheme) == 0) {
+					if((origScheme != null) &&
+							(origScheme.compareTo(redirScheme) == 0)) {
 						return FILTER_EXCLUDE;
 					}
 				}

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
|
From: Rudolf K. <wes...@gm...> - 2011-06-23 21:16:34
|
Hi,

There are a few solutions to your question; this one is a two-step process using a CDX index.

1.) In heritrix/bin there is cdx-indexer, which will index the URLs in your archive files (WARC/ARC).
// http://archive-access.sourceforge.net/projects/wayback/administrator_manual.html#cdx-indexer

2.) Then, in Wayback, find CDXCollection.xml; there you can set the path to your CDX index and the path to your archive files.
// http://archive-access.sourceforge.net/projects/wayback/administrator_manual.html#WaybackCollection_Configuration

I hope it helps,
rudolf

2011/6/23 Magnus Norberg <man...@ho...>:
> Hello everyone,
>
> I have made some crawling with Heritrix. Now I want to look at my crawled
> web pages with Waybackmachine. When I go to
> http://localhost:8080/wayback-1.6.0/ I write some adress and click on "Take
> me back" but then I all I get is a white, empty page.
>
> How do I tell Waybackmachine to find my ARC/WARC-files? In what directory on
> my harddrive should I put the ARC/WARC-files?
>
> Thank you!
>
> /Greetings from Magnus
|
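Both steps in the reply above start from knowing which ARC/WARC files a crawl produced. A minimal sketch, assuming the archive files sit directly in one directory and are recognisable by extension; `ArchiveFileLister` and its methods are hypothetical helpers, not part of Heritrix or Wayback, and the sketch neither runs cdx-indexer nor edits CDXCollection.xml:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ArchiveFileLister {
    /** True for names ending in .arc, .arc.gz, .warc or .warc.gz (case-insensitive). */
    public static boolean isArchiveFile(String name) {
        String lower = name.toLowerCase();
        return lower.endsWith(".arc") || lower.endsWith(".arc.gz")
                || lower.endsWith(".warc") || lower.endsWith(".warc.gz");
    }

    /** Lists the archive files directly inside dir, sorted by path. */
    public static List<Path> listArchiveFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> isArchiveFile(p.getFileName().toString()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // The directory is an assumption: pass your crawl's output directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : listArchiveFiles(dir)) {
            System.out.println(p.toAbsolutePath());
        }
    }
}
```

The printed paths are the files to index in step 1, and their directory is the one to configure as the archive path in step 2.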
From: Magnus N. <man...@ho...> - 2011-06-23 17:52:38
|
Hello everyone,

I have done some crawling with Heritrix. Now I want to look at my crawled web pages with the Wayback Machine. When I go to http://localhost:8080/wayback-1.6.0/, I enter an address and click on "Take me back", but all I get is a white, empty page.

How do I tell the Wayback Machine where to find my ARC/WARC files? In what directory on my hard drive should I put the ARC/WARC files?

Thank you!

/Greetings from Magnus
|