archive-access-cvs Mailing List for Web Archive Access Utilities

Brought to you by: binzino, bradtofel, gojomo, ia_igor, and 5 others

archive-access-cvs — CVS commits

[Archive-access-cvs] archive-access/projects/wera/src/webapps/wera/lib/seal nutch.inc,1.8,1.9

From: Sverre B. <sv...@us...> - 2005-11-03 13:25:54

Update of /cvsroot/archive-access/archive-access/projects/wera/src/webapps/wera/lib/seal
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25466/lib/seal

Modified Files:
	nutch.inc 
Log Message:
RFE1346889 Google-like result presentation

Index: nutch.inc
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wera/src/webapps/wera/lib/seal/nutch.inc,v
retrieving revision 1.8
retrieving revision 1.9
diff -C2 -d -r1.8 -r1.9
*** nutch.inc	20 Oct 2005 10:40:48 -0000	1.8
--- nutch.inc	3 Nov 2005 13:25:29 -0000	1.9
***************
*** 61,66 ****
    var $sort; 
    var $debug;
-   var $supressduplicates;
    var $morepages;
    
    /**
--- 61,67 ----
    var $sort; 
    var $debug;
    var $morepages;
+   var $dedupfield;
+   var $hitsperdup;
    
    /**
***************
*** 77,82 ****
      $this->offset = 0;
      $this->timespent = 0;
!     $this->unsetSupressDuplicates();
!     $this->morepages = false;  
    }
  
--- 78,83 ----
      $this->offset = 0;
      $this->timespent = 0;
!     $this->morepages = false;
!     $this->setDedup();  
    }
  
***************
*** 116,120 ****
      # e.g &dedupField=date&hitsPerDup=100&sort=date
      if ($sortorder == "ascending" or $sortorder == "descending") {
!         $this->sort = "&dedupField=date&sort=date";
          if ($sortorder == "descending") {
              $this->sort .= "&reverse=true";
--- 117,122 ----
      # e.g &dedupField=date&hitsPerDup=100&sort=date
      if ($sortorder == "ascending" or $sortorder == "descending") {
!     			$this->setDedup(100, "date");
!         $this->sort = "&sort=date";
          if ($sortorder == "descending") {
              $this->sort .= "&reverse=true";
***************
*** 123,140 ****
      
    } 
- 
-   /** 
-   * Set suppress duplicate urls 
-   */
-   function setSupressDuplicates() {
- 	$this->supressduplicates = "&hitsPerDup=1&dedupField=exacturl";
-   } 
    
!   /** 
!   * Unset suppress duplicate urls 
!   */
!   function unsetSupressDuplicates() {
! 	$this->supressduplicates = "&hitsPerDup=0";
!   } 
    
    /**
--- 125,142 ----
      
    } 
    
!   /**
!    * Set deduplication
!    *  
!    * If dedupfield is emty, NutchWax defaults to 'site'
!    * To turn off dedup, set hitsperdup to 0
!    * 
!    * @param  integer  Hits per duplicate
!    * @param  string   Field to deduplicate on
!    */
!   function setDedup($hitsperdup = 0, $dedupfield = "") {
!   		$this->hitsperdup = $hitsperdup;
!   		$this->dedupfield = $dedupfield;
!   }
    
    /**
***************
*** 171,175 ****
      $time_start = microtime_float();
      
!     $this->queryurl = $this->searchengineurl . "?query=" . $this->adaptQuery($this->query) . "&start=" . $this->offset . "&hitsPerPage=" . $this->hitsperset . $this->supressduplicates;
      
      if ($this->sort != "") {
--- 173,177 ----
      $time_start = microtime_float();
      
!     $this->queryurl = $this->searchengineurl . "?query=" . $this->adaptQuery($this->query) . "&start=" . $this->offset . "&hitsPerPage=" . $this->hitsperset . "&hitsPerDup=" . $this->hitsperdup . "&dedupField=" . $this->dedupfield;
      
      if ($this->sort != "") {
***************
*** 287,291 ****
              $this->resultset[$this->hitno]['encoding'] .= $data;
            }
!           break;                             
        }
      }
--- 289,298 ----
              $this->resultset[$this->hitno]['encoding'] .= $data;
            }
!           break;
!         case "NUTCH:SITE":
!           if (in_array("site", $this->resultfields)) {
!             $this->resultset[$this->hitno]['site'] .= $data;
!           }
!           break;                                       
        }
      }
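
Revision 1.9 replaces the fixed `supressduplicates` string with two fields, `hitsperdup` and `dedupfield`. Both fields are now always appended to the NutchWax query URL. The following is a minimal Java sketch of the resulting URL construction, not the WERA code itself. The class and method names are hypothetical. URL-encoding the query is an assumption standing in for `adaptQuery`, whose body the diff does not show.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical Java mirror of the query-URL logic in nutch.inc revision 1.9.
class DedupQuery {
    private int hitsPerDup = 0;      // 0 turns deduplication off
    private String dedupField = "";  // empty: NutchWax defaults to 'site'
    private String sort = "";

    // Mirrors setDedup($hitsperdup = 0, $dedupfield = "").
    void setDedup(int hitsPerDup, String dedupField) {
        this.hitsPerDup = hitsPerDup;
        this.dedupField = dedupField;
    }

    // Mirrors the date-sort branch: dedup on "date", allow up to 100 hits per date.
    void setSortOrder(String sortOrder) {
        if (sortOrder.equals("ascending") || sortOrder.equals("descending")) {
            setDedup(100, "date");
            sort = "&sort=date";
            if (sortOrder.equals("descending")) {
                sort += "&reverse=true";
            }
        }
    }

    // Dedup parameters are always sent explicitly, followed by the optional sort suffix.
    // URLEncoder stands in for adaptQuery (assumption).
    String buildUrl(String engineUrl, String query, int start, int hitsPerPage) {
        return engineUrl + "?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
            + "&start=" + start + "&hitsPerPage=" + hitsPerPage
            + "&hitsPerDup=" + hitsPerDup + "&dedupField=" + dedupField
            + sort;
    }

    public static void main(String[] args) {
        DedupQuery q = new DedupQuery();
        q.setSortOrder("descending");
        System.out.println(q.buildUrl(
            "http://localhost:8080/archive-access-nutch/search.jsp", "irish music", 0, 10));
    }
}
```

With sort order "descending", the sketch ends the URL with `?query=irish+music&start=0&hitsPerPage=10&hitsPerDup=100&dedupField=date&sort=date&reverse=true`. These are the same dedup and sort parameters used in the NutchWax FAQ's sort-by-date example.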

[Archive-access-cvs] archive-access/projects/nutch/xdocs faq.fml,1.14,1.15

From: Michael S. <sta...@us...> - 2005-11-01 19:17:17

Update of /cvsroot/archive-access/archive-access/projects/nutch/xdocs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv3465/xdocs

Modified Files:
	faq.fml 
Log Message:

* xdocs/faq.fml 
    More edit of scoring section.


Index: faq.fml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/xdocs/faq.fml,v
retrieving revision 1.14
retrieving revision 1.15
diff -C2 -d -r1.14 -r1.15
*** faq.fml	1 Nov 2005 19:13:47 -0000	1.14
--- faq.fml	1 Nov 2005 19:17:09 -0000	1.15
***************
*** 272,276 ****
  query.host.boost, 2.0f
  query.phrase.boost, 1.0f</pre></p>
! <p>You can change the above boosts by editing your nutch-site.xml</p>
  <p>Anchor text makes a large contribution to a document ranking score.
  You can see the anchor text for a page by browsing to the 'explain' then
--- 272,278 ----
  query.host.boost, 2.0f
  query.phrase.boost, 1.0f</pre></p>
! <p>From the list above, you can see that terms found in a document URL get
! the highest boost with anchor text next, etc.
! You can change the above boosts by editing your nutch-site.xml</p>
  <p>Anchor text makes a large contribution to a document ranking score.
  You can see the anchor text for a page by browsing to the 'explain' then
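
The FAQ text ranks fields by their default query-time boosts. The small Java sketch below orders the listed property names by boost value. It also shows how a site-level value would replace a default key by key, the way nutch-site.xml overrides nutch-default.xml. The class is hypothetical and the merge is a simplification, not Nutch's configuration loader. Note that with the defaults, anchor text and host share the second-highest boost (2.0).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: orders the FAQ's default query boosts and applies site overrides.
class BoostTable {
    // Default query-time boosts as listed in the FAQ entry.
    static Map<String, Float> defaults() {
        Map<String, Float> m = new LinkedHashMap<>();
        m.put("query.url.boost", 4.0f);
        m.put("query.anchor.boost", 2.0f);
        m.put("query.title.boost", 1.5f);
        m.put("query.host.boost", 2.0f);
        m.put("query.phrase.boost", 1.0f);
        return m;
    }

    // Site values replace defaults key by key (simplified model of nutch-site.xml).
    static Map<String, Float> effective(Map<String, Float> site) {
        Map<String, Float> m = defaults();
        m.putAll(site);
        return m;
    }

    // Property names from highest to lowest boost; the stable sort keeps ties in listing order.
    static List<String> ranked(Map<String, Float> boosts) {
        List<String> keys = new ArrayList<>(boosts.keySet());
        keys.sort((a, b) -> Float.compare(boosts.get(b), boosts.get(a)));
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(ranked(defaults()));
        System.out.println(ranked(effective(Map.of("query.title.boost", 3.0f))));
    }
}
```

The first line printed is `[query.url.boost, query.anchor.boost, query.host.boost, query.title.boost, query.phrase.boost]`. Raising the title boost to a hypothetical 3.0 moves title into second place.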

[Archive-access-cvs] archive-access/projects/nutch/xdocs faq.fml,1.13,1.14

From: Michael S. <sta...@us...> - 2005-11-01 19:13:56

Update of /cvsroot/archive-access/archive-access/projects/nutch/xdocs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv2303/xdocs

Modified Files:
	faq.fml 
Log Message:

* xdocs/faq.fml 
    Edit on ranking on how you can change query time boost.


Index: faq.fml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/xdocs/faq.fml,v
retrieving revision 1.13
retrieving revision 1.14
diff -C2 -d -r1.13 -r1.14
*** faq.fml	1 Nov 2005 19:12:10 -0000	1.13
--- faq.fml	1 Nov 2005 19:13:47 -0000	1.14
***************
*** 272,275 ****
--- 272,276 ----
  query.host.boost, 2.0f
  query.phrase.boost, 1.0f</pre></p>
+ <p>You can change the above boosts by editing your nutch-site.xml</p>
  <p>Anchor text makes a large contribution to a document ranking score.
  You can see the anchor text for a page by browsing to the 'explain' then

[Archive-access-cvs] archive-access/projects/nutch/xdocs faq.fml,1.12,1.13

From: Michael S. <sta...@us...> - 2005-11-01 19:12:18

Update of /cvsroot/archive-access/archive-access/projects/nutch/xdocs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv1807/xdocs

Modified Files:
	faq.fml 
Log Message:

* xdocs/faq.fml 
    Add question on nutch ranking.


Index: faq.fml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/xdocs/faq.fml,v
retrieving revision 1.12
retrieving revision 1.13
diff -C2 -d -r1.12 -r1.13
*** faq.fml	20 Oct 2005 23:51:35 -0000	1.12
--- faq.fml	1 Nov 2005 19:12:10 -0000	1.13
***************
*** 237,241 ****
  <question>How to sort results by date?
  </question>
- 
  <answer><p>
  <code>http://localhost:8080/archive-access-nutch/search.jsp?query=traditional+irish+music+paddy&amp;hitsPerPage=100&amp;dedupField=date&amp;hitsPerDup=100&amp;sort=date</code>
--- 237,240 ----
***************
*** 251,256 ****
  </p></answer>
      </faq>
!     <faq>
! <question id="mimetype">How to query for mimetypes?
  </question>
  <answer>
--- 250,255 ----
  </p></answer>
      </faq>
!     <faq  id="mimetype">
! <question>How to query for mimetypes?
  </question>
  <answer>
***************
*** 263,266 ****
--- 262,281 ----
  </answer>
      </faq>
+     <faq id="scoring">
+         <question>Tell me more about how scoring is done in
+         nutch/nutchwax.</question>
+         <answer>
+         <p>By default, at query time, the following fields are boosted as follows:
+         <pre>query.url.boost, 4.0f
+ query.anchor.boost, 2.0f
+ query.title.boost, 1.5f
+ query.host.boost, 2.0f
+ query.phrase.boost, 1.0f</pre></p>
+ <p>Anchor text makes a large contribution to a document ranking score.
+ You can see the anchor text for a page by browsing to the 'explain' then
+ editing the URL to put in place 'anchors.jsp' instead of 'explain.jsp'.
+ </p>
+         </answer>
+     </faq>
    </part>
  </faqs>

[Archive-access-cvs] archive-access/projects/wayback .classpath,1.5,1.6

From: Michael S. <sta...@us...> - 2005-10-31 21:08:28

Update of /cvsroot/archive-access/archive-access/projects/wayback
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27518

Modified Files:
	.classpath 
Log Message:

* .classpath 
    Had a full path for the codec jar.  Fix.


Index: .classpath
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/.classpath,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** .classpath	25 Oct 2005 20:09:31 -0000	1.5
--- .classpath	31 Oct 2005 21:08:20 -0000	1.6
***************
*** 18,22 ****
          path="src/webapp/WEB-INF/lib/libidn-0.5.9.jar"/>
  	<classpathentry kind="lib" 
!         path="/src/webapp/WEB-INF/lib/commons-codec-1.3.jar"/>
  	<classpathentry kind="lib" 
          path="src/webapp/WEB-INF/lib/dsi-unimi-it-1.0.0.kb.jar"/>
--- 18,22 ----
          path="src/webapp/WEB-INF/lib/libidn-0.5.9.jar"/>
  	<classpathentry kind="lib" 
!         path="src/webapp/WEB-INF/lib/commons-codec-1.3.jar"/>
  	<classpathentry kind="lib" 
          path="src/webapp/WEB-INF/lib/dsi-unimi-it-1.0.0.kb.jar"/>

[Archive-access-cvs] archive-access/projects/nutch/src/java/org/archive/access/nutch NutchwaxSegmentMergeTool.java,1.3,1.4

From: Michael S. <sta...@us...> - 2005-10-31 18:00:26

Update of /cvsroot/archive-access/archive-access/projects/nutch/src/java/org/archive/access/nutch
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv13604/src/java/org/archive/access/nutch

Modified Files:
	NutchwaxSegmentMergeTool.java 
Log Message:

* src/java/org/archive/access/nutch/NutchwaxSegmentMergeTool.java 
    Added deduping that counts the collection name.


Index: NutchwaxSegmentMergeTool.java
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/src/java/org/archive/access/nutch/NutchwaxSegmentMergeTool.java,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** NutchwaxSegmentMergeTool.java	27 Oct 2005 16:09:52 -0000	1.3
--- NutchwaxSegmentMergeTool.java	31 Oct 2005 18:00:17 -0000	1.4
***************
*** 238,256 ****
          String name = sr.segmentDir.getName();
          FetcherOutput fo = new FetcherOutput();
          for (long i = 0; i < sr.size; i++) {
            try {
!             if (!sr.get(i, fo, null, null, null)) break;
  
              Document doc = new Document();
              
              // compute boost
!             float boost = IndexSegment.calculateBoost(fo.getFetchListEntry().getPage().getScore(),
                      scorePower, boostByLinkCount, fo.getAnchors().length);
              doc.add(new Field("sd", name + "|" + i, true, false, false));
!             doc.add(new Field("uh", MD5Hash.digest(fo.getUrl().toString()).toString(), true, true, false));
!             doc.add(new Field("ch", fo.getMD5Hash().toString(), true, true, false));
!             doc.add(new Field("time", DateField.timeToString(fo.getFetchDate()), true, false, false));
!             doc.add(new Field("score", boost + "", true, false, false));
!             doc.add(new Field("ul", fo.getUrl().toString().length() + "", true, false, false));
              iw.addDocument(doc);
              processedRecords++;
--- 238,269 ----
          String name = sr.segmentDir.getName();
          FetcherOutput fo = new FetcherOutput();
+         ParseData pd = new ParseData();
          for (long i = 0; i < sr.size; i++) {
            try {
!             if (!sr.get(i, fo, null, null, pd))
!                 break;
  
              Document doc = new Document();
              
              // compute boost
!             float boost = IndexSegment.calculateBoost(
!                     fo.getFetchListEntry().getPage().getScore(),
                      scorePower, boostByLinkCount, fo.getAnchors().length);
              doc.add(new Field("sd", name + "|" + i, true, false, false));
!             // doc.add(new Field("uh", 
!             // MD5Hash.digest(fo.getUrl().toString()).toString(), true, true, false));
!             // doc.add(new Field("ch", fo.getMD5Hash().toString(), 
!             // true, true, false));
!             doc.add(new Field("time", 
!                 DateField.timeToString(fo.getFetchDate()), true, false, false));
!             // doc.add(new Field("score", boost + "", true, false, false));
!             // doc.add(new Field("ul", fo.getUrl().toString().length() + "", true,
!             // false, false));
! 
!             // Hash up the content hash, the url itself and the collection name.
!             String hashStr = fo.getMD5Hash().toString() + fo.getUrl().toString() +
!                 pd.getMetadata().getProperty("collection");
!             doc.add(new Field("ucc", MD5Hash.digest(hashStr).toString(), true, true,
!                 false));
              iw.addDocument(doc);
              processedRecords++;
***************
*** 298,411 ****
        }
        iw.close();
!       LOG.info("* Optimizing index took " + (System.currentTimeMillis() - s1) + " ms");
!         LOG.info("* Skipping deduplicate step...");
! //      LOG.info("* Removing duplicate entries...");
! //      stage = SegmentMergeStatus.STAGE_DEDUP;
!         IndexReader ir = IndexReader.open(masterDir);
! //      int i = 0;
! //      long cnt = 0L;
! //      processedRecords = 0L;
! //      s1 = System.currentTimeMillis();
! //      delta = s1;
! //      TermEnum te = ir.terms();
! //      while(te.next()) {
! //        Term t = te.term();
! //        if (t == null) continue;
! //        if (!(t.field().equals("ch") || t.field().equals("uh"))) continue;
! //        cnt++;
! //        processedRecords = cnt / 2;
! //        if (cnt > 0 && (cnt % (LOG_STEP  * 2) == 0)) {
! //          LOG.info(" Processed " + processedRecords + " records (" +
! //                  (float)(LOG_STEP * 1000)/(float)(System.currentTimeMillis() - delta) + " rec/s)");
! //          delta = System.currentTimeMillis();
! //        }
! //        // Enumerate all docs with the same URL hash or content hash
! //        TermDocs td = ir.termDocs(t);
! //        if (td == null) continue;
! //        if (t.field().equals("uh")) {
! //          // Keep only the latest version of the document with
! //          // the same url hash. Note: even if the content
! //          // hash is identical, other metadata may be different, so even
! //          // in this case it makes sense to keep the latest version.
! //          int id = -1;
! //          String time = null;
! //          Document doc = null;
! //          while (td.next()) {
! //            int docid = td.doc();
! //            if (!ir.isDeleted(docid)) {
! //              doc = ir.document(docid);
! //              if (time == null) {
! //                time = doc.get("time");
! //                id = docid;
! //                continue;
! //              }
! //              String dtime = doc.get("time");
! //              // "time" is a DateField, and can be compared lexicographically
! //              if (dtime.compareTo(time) > 0) {
! //                if (id != -1) {
! //                  ir.delete(id);
! //                }
! //                time = dtime;
! //                id = docid;
! //              } else {
! //                ir.delete(docid);
! //              }
! //            }
! //          }
! //        } else if (t.field().equals("ch")) {
! //          // Keep only the version of the document with
! //          // the highest score, and then with the shortest url.
! //          int id = -1;
! //          int ul = 0;
! //          float score = 0.0f;
! //          Document doc = null;
! //          while (td.next()) {
! //            int docid = td.doc();
! //            if (!ir.isDeleted(docid)) {
! //              doc = ir.document(docid);
! //              if (ul == 0) {
! //                try {
! //                  ul = Integer.parseInt(doc.get("ul"));
! //                  score = Float.parseFloat(doc.get("score"));
! //                } catch (Exception e) {};
! //                id = docid;
! //                continue;
! //              }
! //              int dul = 0;
! //              float dscore = 0.0f;
! //              try {
! //                dul = Integer.parseInt(doc.get("ul"));
! //                dscore = Float.parseFloat(doc.get("score"));
! //              } catch (Exception e) {};
! //              int cmp = Float.compare(dscore, score);
! //              if (cmp == 0) {
! //                // equal scores, select the one with shortest url
! //                if (dul < ul) {
! //                  if (id != -1) {
! //                    ir.delete(id);
! //                  }
! //                  ul = dul;
! //                  id = docid;
! //                } else {
! //                  ir.delete(docid);
! //                }
! //              } else if (cmp < 0) {
! //                ir.delete(docid);
! //              } else {
! //                if (id != -1) {
! //                  ir.delete(id);
! //                }
! //                ul = dul;
! //                id = docid;
! //              }
! //            }
! //          }
! //        }
! //      }
! //      //
! //      // keep the IndexReader open...
! //      //
! //      
! //      LOG.info("* Deduplicating took " + (System.currentTimeMillis() - s1) + " ms");
        stage = SegmentMergeStatus.STAGE_WRITING;
        processedRecords = 0L;
--- 311,375 ----
        }
        iw.close();
!       LOG.info("* Optimizing index took " + (System.currentTimeMillis() - s1) +
!             " ms");
!       LOG.info("* Dedupling based off hash of content-md5 + url + collection...");
!       stage = SegmentMergeStatus.STAGE_DEDUP;
!       IndexReader ir = IndexReader.open(masterDir);
!       int i = 0;
!       long cnt = 0L;
!       processedRecords = 0L;
!       s1 = System.currentTimeMillis();
!       delta = s1;
!       TermEnum te = ir.terms();
!       while(te.next()) {
!         Term t = te.term();
!         if (t == null) continue;
!         if (!(t.field().equals("ucc"))) continue;
!         cnt++;
!         processedRecords = cnt / 2;
!         if (cnt > 0 && (cnt % (LOG_STEP  * 2) == 0)) {
!           LOG.info(" Processed " + processedRecords + " records (" +
!             (float)(LOG_STEP * 1000)/(float)(System.currentTimeMillis() - delta) +
!             " rec/s)");
!           delta = System.currentTimeMillis();
!         }
!         // Enumerate all docs with the same URL + content + collection  hash.
!         TermDocs td = ir.termDocs(t);
!         if (td == null) continue;
!         if (t.field().equals("ucc")) {
!           // Keep only the latest version of the document with
!           // the same url + content + collection hash. 
!           int id = -1;
!           String time = null;
!           Document doc = null;
!           while (td.next()) {
!             int docid = td.doc();
!             if (!ir.isDeleted(docid)) {
!               doc = ir.document(docid);
!               if (time == null) {
!                 time = doc.get("time");
!                 id = docid;
!                 continue;
!               }
!               String dtime = doc.get("time");
!               // "time" is a DateField, and can be compared lexicographically
!               if (dtime.compareTo(time) > 0) {
!                 if (id != -1) {
!                   ir.delete(id);
!                 }
!                 time = dtime;
!                 id = docid;
!               } else {
!                 ir.delete(docid);
!               }
!             }
!           }
!         }
!       }
!       //
!       // keep the IndexReader open...
!       //
!       
!       LOG.info("* Deduplicating took " + (System.currentTimeMillis() - s1) + " ms");
        stage = SegmentMergeStatus.STAGE_WRITING;
        processedRecords = 0L;
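
Revision 1.4 collapses the old `uh`/`ch` passes into a single key: an MD5 of the content hash, URL and collection name (`ucc`). For each key it keeps only the record with the latest `time`. Because the content hash is part of the key, only identical captures of the same URL within the same collection are merged. Changed content, or the same capture in another collection, survives as a separate document.

The following is a minimal in-memory Java sketch of that rule, not the Lucene-based tool. The `Rec` type and method names are hypothetical, and UTF-8 encoding of the concatenated string is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the "ucc" dedup step.
class UccDedup {
    // One fetched record; time is a fixed-width string that sorts lexicographically, like DateField.
    record Rec(String contentMd5, String url, String collection, String time) {}

    // Hex MD5 of content hash + URL + collection (UTF-8 bytes assumed).
    static String ucc(Rec r) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] d = md.digest((r.contentMd5() + r.url() + r.collection())
                .getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : d) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Per ucc key, keep the record with the greatest time; on a tie the first one seen wins.
    static List<Rec> dedup(List<Rec> recs) {
        Map<String, Rec> latest = new LinkedHashMap<>();
        for (Rec r : recs) {
            latest.merge(ucc(r), r,
                (kept, cand) -> cand.time().compareTo(kept.time()) > 0 ? cand : kept);
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<Rec> in = List.of(
            new Rec("c1", "http://a/", "coll1", "20051001"),
            new Rec("c1", "http://a/", "coll1", "20051015"),  // same key, newer: kept
            new Rec("c1", "http://a/", "coll2", "20051002"),  // other collection: kept
            new Rec("c2", "http://a/", "coll1", "20051020")); // changed content: kept
        dedup(in).forEach(System.out::println);
    }
}
```

The committed loop still carries `processedRecords = cnt / 2` and `cnt % (LOG_STEP * 2)` from the earlier version, where each document contributed both a `uh` and a `ch` term. With a single `ucc` term per document, that arithmetic appears to report about half the true record count.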

[Archive-access-cvs] archive-access/projects/nutch/src/java/org/archive/access/nutch NutchwaxSegmentMergeTool.java,1.2,1.3

From: Michael S. <sta...@us...> - 2005-10-27 16:10:00

Update of /cvsroot/archive-access/archive-access/projects/nutch/src/java/org/archive/access/nutch
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30856/src/java/org/archive/access/nutch

Modified Files:
	NutchwaxSegmentMergeTool.java 
Log Message:

* src/java/org/archive/access/nutch/NutchwaxSegmentMergeTool.java 
    Change information message from severe to info.


Index: NutchwaxSegmentMergeTool.java
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/src/java/org/archive/access/nutch/NutchwaxSegmentMergeTool.java,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** NutchwaxSegmentMergeTool.java	6 Oct 2005 01:45:35 -0000	1.2
--- NutchwaxSegmentMergeTool.java	27 Oct 2005 16:09:52 -0000	1.3
***************
*** 225,229 ****
        }
        masters.add(masterDir);
!       LOG.severe("MasterDir is " + masterDir.toString());
        IndexWriter iw = new IndexWriter(masterDir, new WhitespaceAnalyzer(), true);
        iw.setUseCompoundFile(false);
--- 225,229 ----
        }
        masters.add(masterDir);
!       LOG.info("MasterDir is " + masterDir.toString());
        IndexWriter iw = new IndexWriter(masterDir, new WhitespaceAnalyzer(), true);
        iw.setUseCompoundFile(false);

[Archive-access-cvs] archive-access/projects/wera/src/articles what-is-wera.xml,1.4,1.5

From: Sverre B. <sv...@us...> - 2005-10-26 09:21:13

Update of /cvsroot/archive-access/archive-access/projects/wera/src/articles
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31313/src/articles

Modified Files:
	what-is-wera.xml 
Log Message:
Added section on WERA future 

Index: what-is-wera.xml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wera/src/articles/what-is-wera.xml,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** what-is-wera.xml	21 Oct 2005 07:33:39 -0000	1.4
--- what-is-wera.xml	26 Oct 2005 09:21:02 -0000	1.5
***************
*** 138,200 ****
        </listitem>
      </itemizedlist>
  
!     <section>
!       <title>Practical use</title>
  
!       <para>The original vision for the <ulink
!       url="http://nwa.nb.no">NwaToolset</ulink> (the predecessor of Wera) was
!       to enable search across the different Nordic Web Archives and provide
!       seamless navigation within the different archives. The ability to search
!       across the different indexes was solved by the using <ulink
!       url="http://fastsearch.com/">Fast Search &amp; Transfer</ulink>'s multi
!       node architecture. To enable Wera to retrieve a particular document with
!       a given <literal>aid</literal> (Archive ID) from the right archive the
!       collection field was introduced in the index (also present in the
!       NutchWax index). The Wera config file holds the mapping from collection
!       to archive (or rather Wera installation).</para>
  
!       <para>Another reason to include the collection field was to ensure that
!       the actual link rewriting was done by the owner of the document. Each
!       archive holder would have to set up their own Wera installation. When
!       one Wera was requesting a document from a remote archive, the remote
!       Wera should make the necessary changes to the document before delivering
!       it to the calling Wera. The reason for this was to make sure that the
!       owner had full control over what was delivered to the calling site, thus
!       being able to threat the document in accordance with local policies
!       rather than the policies of the caller site. The figure below
!       illustrates the currently supported use of mapping between collection
!       and archive nodes.</para>
  
!       <figure>
!         <title>Wera interfacing several archive nodes</title>
  
!         <mediaobject>
!           <imageobject>
!             <imagedata fileref="images/wera3.png" />
!           </imageobject>
!         </mediaobject>
!       </figure>
  
!       <para>In the Wera installation of <emphasis>W1</emphasis> the different
!       collections indexed in NutchWax are mapped to corresponding Wera
!       installations of <emphasis>W2- Wn</emphasis>. When the timeline view on
!       W1 encounters a resource located on a different node (e.g. the
!       collection mapping points to the Wera installation of
!       <emphasis>W2</emphasis>) it requests that resource from the Wera
!       installation at <literal>W2</literal>. Wera at <literal>W2</literal>
!       fetches the resource from its Retriever and does the necessary changes
!       to the file before delivering it to Wera at <literal>W1</literal> (e.g.
!       inserts javascript link rewriter or rewrites it server side). When Wera
!       at <literal>W1</literal> receives this file it does an additional
!       rewrite in order to have the links point to itself rather than to
!       <literal>W2</literal>'s Wera.</para>
  
!       <para>In a real-life large scale Web Archive where the ARC files are
!       distributed across tens or hundreds of hosts it will not be practical to
!       set up one Wera installation for each of these. A better solution will
!       be to introduce communication between the different retrievers or have
!       one front-end retriever interfacing all the other retrievers within one
!       archive. This has to be added in a later release of Wera.</para>
!     </section>
    </section>
  </article>
\ No newline at end of file
--- 138,236 ----
        </listitem>
      </itemizedlist>
+   </section>
  
!   <section>
!     <title>Practical use</title>
  
!     <para>The original vision for the <ulink
!     url="http://nwa.nb.no">NwaToolset</ulink> (the predecessor of Wera) was to
!     enable search across the different Nordic Web Archives and provide
!     seamless navigation within the different archives. The ability to search
!     across the different indexes was solved by the using <ulink
!     url="http://fastsearch.com/">Fast Search &amp; Transfer</ulink>'s multi
!     node architecture. To enable Wera to retrieve a particular document with a
!     given <literal>aid</literal> (Archive ID) from the right archive the
!     collection field was introduced in the index (also present in the NutchWax
!     index). The Wera config file holds the mapping from collection to archive
!     (or rather Wera installation).</para>
  
!     <para>Another reason to include the collection field was to ensure that
!     the actual link rewriting was done by the owner of the document. Each
!     archive holder would have to set up their own Wera installation. When one
!     Wera was requesting a document from a remote archive, the remote Wera
!     should make the necessary changes to the document before delivering it to
!     the calling Wera. The reason for this was to make sure that the owner had
!     full control over what was delivered to the calling site, thus being able
!     to threat the document in accordance with local policies rather than the
!     policies of the caller site. The figure below illustrates the currently
!     supported use of mapping between collection and archive nodes.</para>
  
!     <figure>
!       <title>Wera interfacing several archive nodes</title>
  
!       <mediaobject>
!         <imageobject>
!           <imagedata fileref="images/wera3.png" />
!         </imageobject>
!       </mediaobject>
!     </figure>
  
!     <para>In the Wera installation of <emphasis>W1</emphasis> the different
!     collections indexed in NutchWax are mapped to corresponding Wera
!     installations of <emphasis>W2- Wn</emphasis>. When the timeline view on W1
!     encounters a resource located on a different node (e.g. the collection
!     mapping points to the Wera installation of <emphasis>W2</emphasis>) it
!     requests that resource from the Wera installation at
!     <literal>W2</literal>. Wera at <literal>W2</literal> fetches the resource
!     from its Retriever and does the necessary changes to the file before
!     delivering it to Wera at <literal>W1</literal> (e.g. inserts javascript
!     link rewriter or rewrites it server side). When Wera at
!     <literal>W1</literal> receives this file it does an additional rewrite in
!     order to have the links point to itself rather than to
!     <literal>W2</literal>'s Wera.</para>
  
!     <para>In a real-life large scale Web Archive where the ARC files are
!     distributed across tens or hundreds of hosts it will not be practical to
!     set up one Wera installation for each of these. A better solution will be
!     to introduce communication between the different retrievers or have one
!     front-end retriever interfacing all the other retrievers within one
!     archive. This has to be added in a later release of Wera.</para>
!   </section>
! 
!   <section>
!     <title>The future of WERA</title>
! 
!     <para>As long as there are institutions using WERA, and these institutions
!     see a need for fixing bugs and adding functionality, WERA will evolve. Of
!     course, the actual work put into it will depend on the resources available
!     at these institutions. We also hope that future enhancements of WERA will
!     be funded, or partly funded by IIPC, as was the case with the work done to
!     enable release 0.4.0 of WERA (and NutchWax).</para>
! 
!     <para>The most important requirement for a future release of WERA will be
!     to support retrieval from several Web Archive hosts through one single ARC
!     retriever interface. In addition we need to do something with the
!     remaining bugs that didn't make it into the 0.4.0. release (handling of
!     redirects and better handling of frames). There are also a few requests
!     for enhancements registered that needs attention, one of them being the
!     advanced search interface.</para>
! 
!     <para>One of the main complaints from users has been that WERA required
!     the user to install and set up Tomcat, Apache + PHP, and Perl + a number of
!     CPAN modules. The dependency on Perl has long since been removed, but WERA still
!     requires Tomcat (Java ARC Retriever) and Apache (PHP web applications for
!     searching and navigating). Over time, we would like WERA to move
!     completely to Java, both to simplify installation, setup, and
!     maintenance and to improve the chances of getting users involved in
!     the further development of WERA. Fortunately, the move to Java can be done
!     gradually because WERA is modular and HTTP is used to communicate between
!     the different modules. The work of porting WERA to Java should be
!     coordinated with the work done on <ulink
!     url="http://archive-access.sourceforge.net/projects/wayback/">wayback</ulink>
!     to avoid implementing the same functionality twice.</para>
! 
!     <para>We strongly encourage users of WERA/NutchWax to contribute by
!     submitting bugs and RFEs, as well as by providing feedback on the
!     usefulness of the tools.</para>
    </section>
  </article>
\ No newline at end of file
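The "one front-end retriever interfacing with all the other retrievers" idea can be pictured as a lookup from ARC file name to the host that stores it. The sketch below is hypothetical throughout: the `FrontEndRetriever` class, the in-memory registry, and the `aid`/`offset` parameter names are illustrative assumptions, not WERA's interface. It needs Java 10 or later for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical front-end retriever: knows which host holds each ARC file and
 * builds the URL to forward a retrieval request to. Not part of WERA.
 */
final class FrontEndRetriever {
    // Hypothetical registry: ARC file name -> base URL of the retriever on its host.
    private final Map<String, String> arcToRetriever = new HashMap<>();

    void register(String arcFile, String retrieverBaseUrl) {
        arcToRetriever.put(arcFile, retrieverBaseUrl);
    }

    /** Returns the URL to forward to, or null if no registered host holds the file. */
    String routeFor(String arcFile, long offset) {
        String base = arcToRetriever.get(arcFile);
        if (base == null) {
            return null;
        }
        // "aid" and "offset" are illustrative parameter names, not WERA's interface.
        return base + "?aid=" + URLEncoder.encode(arcFile, StandardCharsets.UTF_8)
                + "&offset=" + offset;
    }

    public static void main(String[] args) {
        FrontEndRetriever front = new FrontEndRetriever();
        front.register("archive-00001.arc.gz", "http://host1:8080/retriever");
        front.register("archive-00002.arc.gz", "http://host2:8080/retriever");
        System.out.println(front.routeFor("archive-00002.arc.gz", 1234));
        // http://host2:8080/retriever?aid=archive-00002.arc.gz&offset=1234
    }
}
```

A real deployment would need the registry to be built from the archive's own file inventory rather than hand-registered entries.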

[Archive-access-cvs] archive-access/projects/wayback/src/webapp/WEB-INF web.xml,1.3,1.4

From: Brad <bra...@us...> - 2005-10-26 01:17:24

Update of /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/WEB-INF
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv12135/src/webapp/WEB-INF

Modified Files:
	web.xml 
Log Message:
TWEAK: switched to JSReplayUI

Index: web.xml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/WEB-INF/web.xml,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** web.xml	21 Oct 2005 03:24:40 -0000	1.3
--- web.xml	26 Oct 2005 01:17:13 -0000	1.4
***************
*** 19,23 ****
--- 19,26 ----
  	<context-param>
  		<param-name>replayui.class</param-name>
+ <!--
  		<param-value>org.archive.wayback.rawreplayui.RawReplayUI</param-value>
+ -->
+ 		<param-value>org.archive.wayback.jsreplayui.JSReplayUI</param-value>
  		<description>Class that implements ReplayUI for this Wayback</description>
  	</context-param>
***************
*** 103,115 ****
  	</context-param>
  
- <!--
- 	<context-param>
- 		<param-name></param-name>
- 		<param-value></param-value>
- 		<description></description>
- 	</context-param>
- -->
- 
- 
  
  	<!-- Replay Servlet Configuration -->
--- 106,109 ----
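Because `replayui.class` names the implementing class, switching from `RawReplayUI` to `JSReplayUI` only needed a web.xml change. A minimal sketch of how a servlet could turn such a class-name parameter into an instance via reflection; how wayback itself loads the class is not shown in this commit, and the `ClassParam` helper is hypothetical. A JDK class stands in for a ReplayUI implementation so the example runs on its own.

```java
import java.util.List;

final class ClassParam {
    /**
     * Instantiates the class named by a configuration value and checks that it
     * implements the expected type. Requires a public no-argument constructor.
     */
    static <T> T instantiate(String className, Class<T> expectedType) {
        try {
            Class<?> cls = Class.forName(className.trim());
            Object instance = cls.getDeclaredConstructor().newInstance();
            return expectedType.cast(instance); // ClassCastException if wrong type
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // In a servlet this value would come from
        // getServletContext().getInitParameter("replayui.class").
        List<?> ui = instantiate("java.util.ArrayList", List.class);
        System.out.println(ui.getClass().getName()); // prints java.util.ArrayList
    }
}
```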

[Archive-access-cvs] archive-access/projects/wayback/xdocs faq.fml,1.1,1.2

From: Brad <bra...@us...> - 2005-10-26 01:16:58

Update of /cvsroot/archive-access/archive-access/projects/wayback/xdocs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv12039/xdocs

Modified Files:
	faq.fml 
Log Message:
FEATURE: added basic "what is" answer, added question and answer, "how to install"

Index: faq.fml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/xdocs/faq.fml,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** faq.fml	20 Oct 2005 01:30:37 -0000	1.1
--- faq.fml	26 Oct 2005 01:16:47 -0000	1.2
***************
*** 11,15 ****
        <answer>
          <p>
!         Fill in..
          </p>
        </answer>
--- 11,79 ----
        <answer>
          <p>
!         The project is designed to replace the current Wayback Machine with an
!         all-Java solution that is flexible enough to be easy to use for the
!         single-machine, at-home user, while also scaling up
!         to hundreds of machines for a full historical collection.
!         </p>
!         <p>
!         Primarily, it consists of a few interfaces and some core classes that use 
!         those interfaces to provide the Wayback service. Presently, only 
!         trivial implementations of those interfaces have been developed,
!         but we hope that these interfaces will allow a high degree of
!         flexibility and experimentation.
!         </p>
!       </answer>
!     </faq>
!     <faq id="install">
!       <question>
!             How can I install and use this?
!       </question>
!       <answer>
!         <p>
!         The project output is a .WAR file, so it can be used with any servlet
!         container (though it has only been tested with Tomcat on Linux).
!         </p>
!         <p>
!         Once it is unpacked, the following five parameters in the web.xml file
!         can be customized:
!         <table>
!         	<tr>
!         		<td>parameter</td>
!         		<td>description</td>
!         		<td>default</td>
!         	</tr>
!         	<tr>
!         		<td>arcpath</td>
!         		<td>directory where ARC files are found</td>
!         		<td><b>/tmp/wayback/arcs</b></td>
!         	</tr>
!         	<tr>
!         		<td>resourceindex.indexpath</td>
!         		<td>directory where index should be stored</td>
!         		<td><b>/tmp/wayback/index</b></td>
!         	</tr>
!         	<tr>
!         		<td>resourceindex.dbname</td>
!         		<td>name of the index within that directory</td>
!         		<td><b>DB1</b></td>
!         	</tr>
!         	<tr>
!         		<td>indexpipeline.workpath</td>
!         		<td>directory where temporary files and processing state are stored</td>
!         		<td><b>/tmp/wayback/pipeline</b></td>
!         	</tr>
!         	<tr>
!         		<td>indexpipeline.runpipeline</td>
!         		<td>if set to '1', then new ARC files will be indexed</td>
!         		<td><b>1</b></td>
!         	</tr>
!         </table>
!         </p>
!         <p>
!         	All directories MUST exist before the servlet is initialized. Once 
!         	these parameters are set and the servlet container is running,
!         	the service can be accessed at http://localhost:8080/wayback/. 
!         	If you are running on a different port, machine, or
!         	context path, adjust the URL accordingly.
          </p>
        </answer>
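Since every configured directory must exist before the servlet starts, a startup check is a natural companion to the table above. This sketch uses the directory-valued parameter names and defaults from the table (`resourceindex.dbname` and `indexpipeline.runpipeline` are not directories and are left out); the `WaybackConfig` class and the idea of checking in `init()` are assumptions, not something this commit adds.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class WaybackConfig {
    /** Directory-valued parameters and their defaults, as listed in the FAQ table. */
    static final Map<String, String> DIRECTORY_DEFAULTS = new LinkedHashMap<>();
    static {
        DIRECTORY_DEFAULTS.put("arcpath", "/tmp/wayback/arcs");
        DIRECTORY_DEFAULTS.put("resourceindex.indexpath", "/tmp/wayback/index");
        DIRECTORY_DEFAULTS.put("indexpipeline.workpath", "/tmp/wayback/pipeline");
    }

    /**
     * Returns "name=path" for each directory parameter whose configured (or default)
     * path is not an existing directory. Missing keys fall back to the defaults.
     */
    static List<String> missingDirectories(Map<String, String> configured) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> e : DIRECTORY_DEFAULTS.entrySet()) {
            String value = configured.getOrDefault(e.getKey(), e.getValue());
            if (!Files.isDirectory(Paths.get(value))) {
                missing.add(e.getKey() + "=" + value);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // With no overrides, report which default directories are missing here.
        System.out.println(missingDirectories(new HashMap<>()));
    }
}
```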

[Archive-access-cvs] archive-access/projects/wayback/xdocs index.xml,1.2,1.3

From: Brad <bra...@us...> - 2005-10-26 01:16:15

Update of /cvsroot/archive-access/archive-access/projects/wayback/xdocs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv11951/xdocs

Modified Files:
	index.xml 
Log Message:
TWEAK -- slightly fleshed out, long way to go

Index: index.xml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/xdocs/index.xml,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** index.xml	20 Oct 2005 16:51:54 -0000	1.2
--- index.xml	26 Oct 2005 01:16:08 -0000	1.3
***************
*** 10,16 ****
    <body>
      <section name="Introduction">
!         <p><b>wayback</b> is an open source implementation of the
!         The Internet Archive Wayback Machine. Stay tuned for first release.
!         	</p>
      </section>
    </body>
--- 10,27 ----
    <body>
      <section name="Introduction">
!         <p><b>wayback</b> is an open source Java implementation of the
!         Internet Archive Wayback Machine.
!         </p>
!         <p>
!         The first revision is intended to operate as a standalone webapp. 
!         It currently supports Archival URL queries, similar to the current 
!         Wayback Machine, and hopefully soon will integrate fully with 
!         Heritrix to provide browsing of crawled data as it is crawled.
!         </p>
!         <p>
!         This version includes some basic ARC file indexing, so it can be directed to
!         scan for and automatically index new content in the directory where Heritrix 
!         writes its output ARC files.
!         </p>
      </section>
    </body>
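Automatic indexing of the Heritrix output directory amounts to periodically listing it and picking up ARC files not yet seen. A minimal sketch under stated assumptions: a file ending in `.arc` or `.arc.gz` is complete, and an in-memory set stands in for the pipeline's persistent state. Heritrix typically writes in-progress files with an extra `.open` suffix, which this name filter skips. `ArcDirectoryScanner` is a hypothetical class, not wayback's pipeline code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ArcDirectoryScanner {
    private final File arcDir;
    // Names already handed to the indexer (the real pipeline would persist this).
    private final Set<String> seen = new HashSet<>();

    ArcDirectoryScanner(File arcDir) {
        this.arcDir = arcDir;
    }

    /** True for finished ARC files; in-progress "*.open" files do not match. */
    static boolean isFinishedArc(String name) {
        return name.endsWith(".arc") || name.endsWith(".arc.gz");
    }

    /** Returns ARC files that appeared since the previous call, sorted by name. */
    List<File> newArcFiles() {
        List<File> fresh = new ArrayList<>();
        File[] files = arcDir.listFiles();
        if (files == null) {
            return fresh; // not a directory, or an I/O error occurred
        }
        for (File f : files) {
            if (f.isFile() && isFinishedArc(f.getName()) && seen.add(f.getName())) {
                fresh.add(f);
            }
        }
        fresh.sort(Comparator.comparing(File::getName));
        return fresh;
    }

    public static void main(String[] args) {
        ArcDirectoryScanner scanner = new ArcDirectoryScanner(new File("/tmp/wayback/arcs"));
        for (File f : scanner.newArcFiles()) {
            System.out.println("would index " + f);
        }
    }
}
```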

[Archive-access-cvs] archive-access/projects/wayback/src/webapp help.jsp,1.1,1.2 index.jsp,1.1,1.2

From: Brad <bra...@us...> - 2005-10-26 01:15:43

Update of /cvsroot/archive-access/archive-access/projects/wayback/src/webapp
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv11845/src/webapp

Modified Files:
	help.jsp index.jsp 
Log Message:
TWEAK: minimal UI improvement -- still very rough..

Index: help.jsp
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/help.jsp,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** help.jsp	20 Oct 2005 00:40:41 -0000	1.1
--- help.jsp	26 Oct 2005 01:15:35 -0000	1.2
***************
*** 1,3 ****
  <jsp:include page="template/UI-header.jsp" />
! Sorry, no help yet.
  <jsp:include page="template/UI-footer.jsp" />
--- 1,4 ----
  <jsp:include page="template/UI-header.jsp" />
! Please refer to the FAQs
! <a href="http://archive-access.sourceforge.net/projects/wayback/faq.html">here</a>.
  <jsp:include page="template/UI-footer.jsp" />

Index: index.jsp
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/index.jsp,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** index.jsp	20 Oct 2005 00:40:41 -0000	1.1
--- index.jsp	26 Oct 2005 01:15:35 -0000	1.2
***************
*** 1,3 ****
  <jsp:include page="template/UI-header.jsp" />
! This is the wayback Machine!
  <jsp:include page="template/UI-footer.jsp" />
--- 1,10 ----
  <jsp:include page="template/UI-header.jsp" />
! <p>
! This is the new Wayback Machine prototype. Any URL in ARC files accessible to 
! this service can be searched above.
! </p>
! <p>
! If you have configured the ARC indexing pipeline, basic status can be accessed
! <a href="pipeline">here</a>.
! </p>
  <jsp:include page="template/UI-footer.jsp" />

[Archive-access-cvs] archive-access/projects/wayback/src/webapp/template UI-header.jsp,1.1,1.2

From: Brad <bra...@us...> - 2005-10-26 01:15:16

Update of /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/template
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv11715/src/webapp/template

Modified Files:
	UI-header.jsp 
Log Message:
BUGFIX: after ArchivalUrl Query or PathQuery, form at top of page was missing -- the ACTION was relative, now is absolute

Index: UI-header.jsp
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/template/UI-header.jsp,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** UI-header.jsp	20 Oct 2005 00:40:40 -0000	1.1
--- UI-header.jsp	26 Oct 2005 01:15:01 -0000	1.2
***************
*** 44,48 ****
  
  									<!-- URL FORM -->
! 									<form action="query" method="GET">
  
  
--- 44,48 ----
  
  									<!-- URL FORM -->
! 									<form action="<%= request.getContextPath() %>/query" method="GET">
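The underlying issue is ordinary relative-URL resolution: the browser resolves `action="query"` against the URL of the page that contains the form, so the same shared header points at different paths depending on where the including page lives. The sketch uses `java.net.URI.resolve`, which applies the same resolution rules, to show the difference; the host, port, and `/wayback` context path are illustrative.

```java
import java.net.URI;

final class FormActionDemo {
    public static void main(String[] args) {
        URI home = URI.create("http://localhost:8080/wayback/index.jsp");
        URI nested = URI.create("http://localhost:8080/wayback/jsp/QueryUI/requestform.jsp");

        // Relative action "query": the target depends on where the page lives.
        System.out.println(home.resolve("query"));      // http://localhost:8080/wayback/query
        System.out.println(nested.resolve("query"));    // http://localhost:8080/wayback/jsp/QueryUI/query
        System.out.println(nested.resolve("../../query")); // http://localhost:8080/wayback/query

        // Absolute action built from the context path, as in the fix.
        String contextPath = "/wayback"; // what request.getContextPath() would return
        System.out.println(nested.resolve(contextPath + "/query")); // http://localhost:8080/wayback/query
    }
}
```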

[Archive-access-cvs] archive-access/projects/wayback/src/webapp/jsp/QueryUI requestform.jsp,1.1,1.2

From: Brad <bra...@us...> - 2005-10-26 01:14:08

Update of /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/jsp/QueryUI
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv11461/src/webapp/jsp/QueryUI

Modified Files:
	requestform.jsp 
Log Message:
TWEAK: added minimal instructions, put FORM into TABLE to pretty it up a bit.

Index: requestform.jsp
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/jsp/QueryUI/requestform.jsp,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** requestform.jsp	20 Oct 2005 00:40:41 -0000	1.1
--- requestform.jsp	26 Oct 2005 01:14:00 -0000	1.2
***************
*** 1,12 ****
  <jsp:include page="../../template/UI-header.jsp" />
  <FORM ACTION="../../query">
! URL:<INPUT TYPE="TEXT" NAME="url" WIDTH="80"><BR>
! Exact Date:<INPUT TYPE="TEXT" NAME="date" WIDTH="80"><BR>
! Earliest Date:<INPUT TYPE="TEXT" NAME="earliest" WIDTH="80"><BR>
! Latest Date:<INPUT TYPE="TEXT" NAME="latest" WIDTH="80"><BR>
! Type:
! Query<INPUT TYPE="RADIO" NAME="type" VALUE="query" CHECKED="YES">
! PathQuery<INPUT TYPE="RADIO" NAME="type" VALUE="pathQuery">
! <INPUT TYPE="SUBMIT" VALUE="Submit">
  </FORM>
  <jsp:include page="../../template/UI-footer.jsp" />
--- 1,24 ----
  <jsp:include page="../../template/UI-header.jsp" />
+ <h2>Wayback Search form:</h2>
+ <p>The URL field is required. All date fields are optional.<br>
+ To search for a single URL only, use the Query type.<br>
+ To search for all URLs beginning with a prefix URL, use the PathQuery type.<br>
+ </p>
+ <hr>
+ <table>
  <FORM ACTION="../../query">
! <tr><td>URL:</td><td><INPUT TYPE="TEXT" NAME="url" WIDTH="80"></td></tr>
! <tr><td>Exact Date:</td><td><INPUT TYPE="TEXT" NAME="date" WIDTH="80"></td></tr>
! <tr><td>Earliest Date:</td><td><INPUT TYPE="TEXT" NAME="earliest" WIDTH="80"></td></tr>
! <tr><td>Latest Date:</td><td><INPUT TYPE="TEXT" NAME="latest" WIDTH="80"></td></tr>
! <tr>
! 	<td>Type:</td>
! 	<td>
! 		Query <INPUT TYPE="RADIO" NAME="type" VALUE="query" CHECKED="YES">
! 		PathQuery <INPUT TYPE="RADIO" NAME="type" VALUE="pathQuery">
! 	</td>
! </tr>
! <tr><td colspan="2" align="left"><INPUT TYPE="SUBMIT" VALUE="Submit"></td></tr>
  </FORM>
+ </table>
  <jsp:include page="../../template/UI-footer.jsp" />
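A PathQuery asks for every URL that begins with a given prefix. If the index keeps URLs sorted, those URLs form one contiguous range, so the query can be a range scan instead of a full pass. In this sketch a `TreeSet` is an in-memory stand-in for the index; how wayback's resource index performs the scan is not shown in these commits, and `PathQuerySketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

final class PathQuerySketch {
    /** Returns every entry that starts with prefix, as a view over the sorted set. */
    static SortedSet<String> withPrefix(NavigableSet<String> urls, String prefix) {
        // Strings with this prefix sort at or after the prefix and before
        // prefix + U+FFFF (a character that does not occur in URLs).
        return urls.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    public static void main(String[] args) {
        NavigableSet<String> index = new TreeSet<>(Arrays.asList(
                "http://a.org/", "http://a.org/x/1", "http://a.org/x/2",
                "http://a.org/y", "http://b.org/"));
        System.out.println(withPrefix(index, "http://a.org/x/"));
        // [http://a.org/x/1, http://a.org/x/2]
    }
}
```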

[Archive-access-cvs] archive-access/projects/wayback/src/webapp/jsp/QueryUI PathQueryResults.jsp,1.2,1.3

From: Brad <bra...@us...> - 2005-10-26 01:13:38

Update of /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/jsp/QueryUI
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv11316/src/webapp/jsp/QueryUI

Modified Files:
	PathQueryResults.jsp 
Log Message:
TWEAK: added HR before new URLs to help break up the results.

Index: PathQueryResults.jsp
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/jsp/QueryUI/PathQueryResults.jsp,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** PathQueryResults.jsp	20 Oct 2005 00:40:41 -0000	1.2
--- PathQueryResults.jsp	26 Oct 2005 01:13:30 -0000	1.3
***************
*** 52,56 ****
  	if(newUrl) {
  		%>
! 		<B><%= url %></B><BR>
  		<%
  	}
--- 52,56 ----
  	if(newUrl) {
  		%>
! 		<HR><B><%= url %></B><BR>
  		<%
  	}

[Archive-access-cvs] archive-access/projects/wayback/src/java/org/archive/wayback/rawreplayui RawReplayUI.java,1.4,1.5

From: Brad <bra...@us...> - 2005-10-26 01:12:43

Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/rawreplayui
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv11139/src/java/org/archive/wayback/rawreplayui

Modified Files:
	RawReplayUI.java 
Log Message:
BUGFIX: now uses current timestamp as end of search, instead of last possible timestamp (in 2099...)

Index: RawReplayUI.java
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/rawreplayui/RawReplayUI.java,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** RawReplayUI.java	25 Oct 2005 21:42:34 -0000	1.4
--- RawReplayUI.java	26 Oct 2005 01:12:35 -0000	1.5
***************
*** 145,149 ****
  				wmRequest.setExactTimestamp(Timestamp.parseBefore(dateStr));
  				wmRequest.setStartTimestamp(Timestamp.earliestTimestamp());
! 				wmRequest.setEndTimestamp(Timestamp.latestTimestamp());
  			} catch (ParseException e1) {
  				e1.printStackTrace();
--- 145,149 ----
  				wmRequest.setExactTimestamp(Timestamp.parseBefore(dateStr));
  				wmRequest.setStartTimestamp(Timestamp.earliestTimestamp());
! 				wmRequest.setEndTimestamp(Timestamp.currentTimestamp());
  			} catch (ParseException e1) {
  				e1.printStackTrace();
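The change bounds an open-ended search at the present moment instead of a far-future sentinel. A sketch of that default using `java.time`, assuming the 14-digit `yyyyMMddHHmmss` timestamp layout and UTC; injecting a `Clock` keeps it testable. `SearchBounds` is an illustration, not the `Timestamp` class from the commit.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class SearchBounds {
    // Assumed 14-digit timestamp layout, e.g. 20051026011235.
    static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    /** Uses the requested end if given, otherwise "now" according to the clock, in UTC. */
    static String endTimestamp(String requestedEnd, Clock clock) {
        if (requestedEnd != null) {
            return requestedEnd;
        }
        return LocalDateTime.now(clock.withZone(ZoneOffset.UTC)).format(TS);
    }

    public static void main(String[] args) {
        Clock fixed = Clock.fixed(Instant.parse("2005-10-26T01:12:35Z"), ZoneOffset.UTC);
        System.out.println(endTimestamp(null, fixed));             // 20051026011235
        System.out.println(endTimestamp("20041231235959", fixed)); // 20041231235959
    }
}
```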

[Archive-access-cvs] archive-access/projects/wayback/src/java/org/archive/wayback/jsreplayui JSReplayUI.java,NONE,1.1

From: Brad <bra...@us...> - 2005-10-26 01:11:39

Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/jsreplayui
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv10835/src/java/org/archive/wayback/jsreplayui

Added Files:
	JSReplayUI.java 
Log Message:
FEATURE: new ReplayUI that adds Javascript to HTML result pages which attempts to make URLs point back to this service.
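The rewriting rule that the injected `xLateUrl` Javascript applies in the browser can be restated in Java, for example to test it: links containing `mailto:` or `javascript:` are left alone, links starting with `http` get the replay prefix prepended as-is, and anything else is first made absolute against the page's base URL. `ReplayLinkRewriter` is hypothetical; `java.net.URI.resolve` stands in for the browser's resolution through `new Image().src`, and unlike a browser it rejects links containing illegal characters such as spaces.

```java
import java.net.URI;

final class ReplayLinkRewriter {
    /** Builds the replay prefix the way insertJavascript builds sWayBackCGI. */
    static String replayPrefix(String serverName, int port, String contextPath, String timestamp) {
        return "http://" + serverName + (port == 80 ? "" : ":" + port)
                + contextPath + "/" + timestamp + "/";
    }

    /** Mirrors xLateUrl: leave mailto/javascript links alone, prefix everything else. */
    static String rewrite(String prefix, URI pageBase, String link) {
        if (link.contains("mailto:") || link.contains("javascript:")) {
            return link;
        }
        if (link.startsWith("http")) {
            return prefix + link;
        }
        return prefix + pageBase.resolve(link); // relative link made absolute first
    }

    public static void main(String[] args) {
        String prefix = replayPrefix("localhost", 8080, "/wayback", "20051026011139");
        URI base = URI.create("http://example.com/dir/page.html");
        System.out.println(rewrite(prefix, base, "img/logo.gif"));
        // http://localhost:8080/wayback/20051026011139/http://example.com/dir/img/logo.gif
        System.out.println(rewrite(prefix, base, "mailto:info@example.com"));
        // mailto:info@example.com
    }
}
```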

--- NEW FILE: JSReplayUI.java ---
/* JSReplayUI
 *
 * Created on Oct 25, 2005
 *
 * Copyright (C) 2005 Internet Archive.
 *
 * This file is part of the wayback (crawler.archive.org).
 *
 * wayback is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or
 * any later version.
 *
 * wayback is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU Lesser Public License for more details.
 *
 * You should have received a copy of the GNU Lesser Public License
 * along with wayback; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 */

package org.archive.wayback.jsreplayui;

import java.io.IOException;
import java.text.ParseException;

import javax.servlet.ServletOutputStream;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.archive.io.arc.ARCRecord;
import org.archive.wayback.core.Resource;
import org.archive.wayback.core.ResourceResult;
import org.archive.wayback.core.ResourceResults;
import org.archive.wayback.core.Timestamp;
import org.archive.wayback.core.WMRequest;
import org.archive.wayback.rawreplayui.RawReplayUI;

/**
 * ReplayUI that inserts classic Wayback Machine Javascript into pages to
 * rewrite images and anchors for HTML pages.
 * 
 * @author brad
 * @version $Date: 2005/10/26 01:11:27 $, $Revision: 1.1 $
 */
public class JSReplayUI extends RawReplayUI {

	/**
	 * Constructor
	 */
	public JSReplayUI() {
		super();
		// TODO Auto-generated constructor stub
	}

	private boolean isRawReplayResult(ResourceResult result) {
		if (-1 == result.getMimeType().indexOf("text/html")) {
			return true;
		}
		return false;
	}

	public void replayResource(WMRequest wmRequest, ResourceResult result,
			Resource resource, HttpServletRequest request,
			HttpServletResponse response, ResourceResults results)
			throws IOException {

		if (resource == null) {
			throw new IllegalArgumentException("No resource");
		}
		if (result == null) {
			throw new IllegalArgumentException("No result");
		}
		if (isRawReplayResult(result)) {
			super.replayResource(wmRequest, result, resource, request,
					response, results);
			return;
		}

		ARCRecord record = resource.getArcRecord();
		record.skipHttpHeader();
		copyRecordHttpHeader(response, record, true);
		// slurp the whole thing into RAM:
		byte[] bbuffer = new byte[4 * 1024];
		StringBuffer sbuffer = new StringBuffer();
		for (int r = -1; (r = record.read(bbuffer, 0, bbuffer.length)) != -1;) {
			String chunk = new String(bbuffer);
			sbuffer.append(chunk.substring(0, r));
		}

		markUpPage(sbuffer, result, results, request);
		
		response.setHeader("Content-Length", "" + sbuffer.length());
		ServletOutputStream out = response.getOutputStream();
		out.print(new String(sbuffer));
	}

	private void markUpPage(StringBuffer page, ResourceResult result,
			ResourceResults results, HttpServletRequest request) {
		insertBaseTag(page, result, request);
		insertJavascript(page, result, request);
	}

	private void insertBaseTag(StringBuffer page, ResourceResult result,
			HttpServletRequest request) {
		String resultUrl = result.getUrl();
		String baseTag = "<BASE HREF=\"http://" + resultUrl + "\">";
		int insertPoint = page.indexOf("<head>");
		if (-1 == insertPoint) {
			insertPoint = page.indexOf("<HEAD>");
		}
		if (-1 == insertPoint) {
			insertPoint = 0;
		} else {
			insertPoint += 6; // just after the tag
		}
		page.insert(insertPoint, baseTag);
	}

	private void insertJavascript(StringBuffer page, ResourceResult result,
			HttpServletRequest request) {
		String resourceTS = result.getTimestamp().getDateStr();
		String nowTS;
		try {
			nowTS = Timestamp.currentTimestamp().getDateStr();
		} catch (ParseException e) {
			nowTS = "UNKNOWN";
		}
		
		String protocol = "http";
		String serverName = request.getServerName();
		int serverPort = request.getServerPort();
		String context = request.getContextPath();
		String contextPath = protocol + "://" + serverName
				+ (serverPort == 80 ? "" : ":" + serverPort) + context + "/"
				+ result.getTimestamp().getDateStr() + "/";
		
		String scriptInsert = "<SCRIPT language=\"Javascript\">\n"
				+ "<!--\n"
				+ "\n"
				+ "//		 FILE ARCHIVED ON " + resourceTS + " AND RETRIEVED FROM THE\n"
				+ "//		 INTERNET ARCHIVE ON " + nowTS + ".\n"
				+ "//		 JAVASCRIPT APPENDED BY WAYBACK MACHINE, COPYRIGHT INTERNET ARCHIVE.\n"
				+ "//\n"
				+ "// ALL OTHER CONTENT MAY ALSO BE PROTECTED BY COPYRIGHT (17 U.S.C.\n"
				+ "// SECTION 108(a)(3)).\n"
				+ "\n"
				+ "		   var sWayBackCGI = \"" + contextPath + "\";\n"
				+ "		   \n"
				
				+ "function xResolveUrl(url) {\n"
				+ "   var image = new Image();\n"
				+ "   image.src = url;\n"
				+ "   return image.src;\n"
				+ "}\n"
				+ "function xLateUrl(aCollection, sProp) {\n"
				+ "   var i = 0;\n"
				+ "   for(i = 0; i < aCollection.length; i++) {\n"
				+ "      if (typeof(aCollection[i][sProp]) == \"string\") {\n" 
				+ "       if (aCollection[i][sProp].indexOf(\"mailto:\") == -1 &&\n"
				+ "          aCollection[i][sProp].indexOf(\"javascript:\") == -1) {\n"
				+ "          if(aCollection[i][sProp].indexOf(\"http\") == 0) {\n"
				+ "              aCollection[i][sProp] = sWayBackCGI + aCollection[i][sProp];\n"
				+ "          } else {\n"
				+ "              aCollection[i][sProp] = sWayBackCGI + xResolveUrl(aCollection[i][sProp]);\n"
				+ "          }\n"
				+ "       }\n"
				+ "      }\n"
				+ "   }\n"
				+ "}\n"
				+ "		   \n"
				+ "        xLateUrl(document.getElementsByTagName(\"IMG\"),\"src\");\n"
				+ "        xLateUrl(document.getElementsByTagName(\"A\"),\"href\");\n"
				+ "        xLateUrl(document.getElementsByTagName(\"AREA\"),\"href\");\n"
				+ "        xLateUrl(document.getElementsByTagName(\"OBJECT\"),\"codebase\");\n"
				+ "        xLateUrl(document.getElementsByTagName(\"OBJECT\"),\"data\");\n"
				+ "        xLateUrl(document.getElementsByTagName(\"APPLET\"),\"codebase\");\n"
				+ "        xLateUrl(document.getElementsByTagName(\"APPLET\"),\"archive\");\n"
				+ "        xLateUrl(document.getElementsByTagName(\"EMBED\"),\"src\");\n"
				+ "        xLateUrl(document.getElementsByTagName(\"BODY\"),\"background\");\n"
				+ "\n" 
				+ "//		-->\n" 
				+ "\n"
				+ "</SCRIPT>\n";

		int insertPoint = page.indexOf("</body>");
		if (-1 == insertPoint) {
			insertPoint = page.indexOf("</BODY>");
		}
		if (-1 == insertPoint) {
			insertPoint = page.length();
		}
		page.insert(insertPoint, scriptInsert);
	}
}
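Two details of `replayResource` matter when adapting this class. `new String(bbuffer)` decodes each 4 KB chunk separately with the platform default charset, so a multi-byte character split across a chunk boundary can be garbled; and `sbuffer.length()` counts chars, whereas `Content-Length` counts bytes. One way to avoid both, sketched below with a hypothetical `PageBuffer` helper, is to buffer raw bytes, decode once, and size the response from the re-encoded bytes. The charset is passed in here; deriving it from the record's `Content-Type` is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PageBuffer {
    /** Reads the whole stream as bytes, then decodes once so no character is split. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte[] chunk = new byte[4 * 1024];
        for (int r; (r = in.read(chunk, 0, chunk.length)) != -1; ) {
            bytes.write(chunk, 0, r);
        }
        return new String(bytes.toByteArray(), charset);
    }

    /** Encodes the marked-up page; the array length is the correct Content-Length. */
    static byte[] encode(CharSequence page, Charset charset) {
        return page.toString().getBytes(charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "<p>Troms\u00f8</p>".getBytes(StandardCharsets.UTF_8);
        String page = readAll(new ByteArrayInputStream(original), StandardCharsets.UTF_8);
        byte[] body = encode(page, StandardCharsets.UTF_8);
        System.out.println(page.length() + " chars, " + body.length + " bytes"); // 13 chars, 14 bytes
        // In a servlet: response.setContentLength(body.length); out.write(body);
    }
}
```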

[Archive-access-cvs] archive-access/projects/wayback/src/java/org/archive/wayback/jsreplayui - New directory

From: Brad <bra...@us...> - 2005-10-26 01:10:37

Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/jsreplayui
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv10481/src/java/org/archive/wayback/jsreplayui

Log Message:
Directory /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/jsreplayui added to the repository

[Archive-access-cvs] archive-access/projects/wayback/src/java/org/archive/wayback/core WMRequest.java,1.3,1.4

From: Brad <bra...@us...> - 2005-10-26 01:10:15

Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/core
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv10314/src/java/org/archive/wayback/core

Modified Files:
	WMRequest.java 
Log Message:
BUGFIX: was not correctly parsing CGI Queries where url had no trailing '/'

Index: WMRequest.java
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/core/WMRequest.java,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** WMRequest.java	20 Oct 2005 00:40:41 -0000	1.3
--- WMRequest.java	26 Oct 2005 01:10:00 -0000	1.4
***************
*** 269,275 ****
  		}
  		parseCGIArgsDates(queryMap);
! 		if (!requestURIStr.startsWith("http://")) {
  			requestURIStr = "http://" + requestURIStr;
  		}
  		requestURI = new UURI(requestURIStr,false);
  		setRetrieval();
--- 269,283 ----
  		}
  		parseCGIArgsDates(queryMap);
! 		if (requestURIStr.startsWith("http://")) {
! 			if(-1 == requestURIStr.indexOf('/',8)) {
! 				requestURIStr = requestURIStr + "/";				
! 			}
! 		} else {
! 			if (!requestURIStr.contains("/")) {
! 				requestURIStr = requestURIStr + "/";
! 			}			
  			requestURIStr = "http://" + requestURIStr;
  		}
+ 		
  		requestURI = new UURI(requestURIStr,false);
  		setRetrieval();
***************
*** 302,306 ****
  		}
  		parseCGIArgsDates(queryMap);
! 		if (!requestURIStr.startsWith("http://")) {
  			requestURIStr = "http://" + requestURIStr;
  		}
--- 310,321 ----
  		}
  		parseCGIArgsDates(queryMap);
! 		if (requestURIStr.startsWith("http://")) {
! 			if(-1 == requestURIStr.indexOf('/',8)) {
! 				requestURIStr = requestURIStr + "/";				
! 			}
! 		} else {
! 			if (!requestURIStr.contains("/")) {
! 				requestURIStr = requestURIStr + "/";
! 			}			
  			requestURIStr = "http://" + requestURIStr;
  		}
***************
*** 358,362 ****
  			// the latest possible:
  			if(origExactDateRequest == null) {
! 				endTimestamp = Timestamp.latestTimestamp();
  			} else {
  				// no end specified, but they asked for an exact date.
--- 373,377 ----
  			// the latest possible:
  			if(origExactDateRequest == null) {
! 				endTimestamp = Timestamp.currentTimestamp();
  			} else {
  				// no end specified, but they asked for an exact date.
***************
*** 365,369 ****
  
  				if(origExactDateRequest.equals(exactTimestamp.getDateStr())) {
! 					endTimestamp = Timestamp.latestTimestamp();
  				} else {
  					endTimestamp = Timestamp.parseAfter(exactDateRequest);
--- 380,384 ----
  
  				if(origExactDateRequest.equals(exactTimestamp.getDateStr())) {
! 					endTimestamp = Timestamp.currentTimestamp();
  				} else {
  					endTimestamp = Timestamp.parseAfter(exactDateRequest);
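The fix makes a host-only URL such as `example.com` come out as `http://example.com/`, whether or not the scheme was typed. Extracted into a stand-alone method (the `QueryUrlNormalizer` name is hypothetical), the committed logic reads as below. It inherits one quirk: a host followed directly by a query string, such as `example.com?a=1`, contains no `/` and therefore becomes `http://example.com?a=1/`.

```java
final class QueryUrlNormalizer {
    private static final String SCHEME = "http://";

    /** Adds a missing "http://" and a trailing "/" when the URL has no path at all. */
    static String normalize(String url) {
        if (url.startsWith(SCHEME)) {
            // No '/' after the scheme means a bare host, e.g. "http://example.com".
            if (url.indexOf('/', SCHEME.length()) == -1) {
                url = url + "/";
            }
        } else {
            if (!url.contains("/")) {
                url = url + "/";
            }
            url = SCHEME + url;
        }
        return url;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"example.com", "http://example.com", "example.com/a.html"}) {
            System.out.println(s + " -> " + normalize(s));
        }
    }
}
```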

[Archive-access-cvs] archive-access/projects/wayback/src/java/org/archive/wayback/arcindexer ArcIndexer.java,1.3,1.4

From: Brad <bra...@us...> - 2005-10-26 01:08:30

Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/arcindexer
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv9934/src/java/org/archive/wayback/arcindexer

Modified Files:
	ArcIndexer.java 
Log Message:
BUGFIX: was including filedesc record in output

Index: ArcIndexer.java
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/arcindexer/ArcIndexer.java,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** ArcIndexer.java	21 Oct 2005 03:24:40 -0000	1.3
--- ArcIndexer.java	26 Oct 2005 01:08:22 -0000	1.4
***************
*** 83,87 ****
  				continue;
  			}
! 			results.addResourceResult(result);
  		}
  		return results;
--- 83,89 ----
  				continue;
  			}
! 			if(result != null) {
! 				results.addResourceResult(result);
! 			}
  		}
  		return results;
***************
*** 103,107 ****
  		result.setMd5Fragment(meta.getDigest());
  		result.setMimeType(meta.getMimetype());
! 		UURI uri = new UURI(meta.getUrl(), false);
  		result.setOrigHost(uri.getHost());
  
--- 105,115 ----
  		result.setMd5Fragment(meta.getDigest());
  		result.setMimeType(meta.getMimetype());
! 		
! 		String uriStr = meta.getUrl();
! 		if(uriStr.startsWith("filedesc")) {
! 			// skip filedesc record...
! 			return null;
! 		}
! 		UURI uri = new UURI(uriStr, false);
  		result.setOrigHost(uri.getHost());
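Every ARC file opens with a version-block record whose URL starts with `filedesc://`; it describes the file itself rather than a captured document, so it does not belong in the index. The commit returns `null` for that record and the caller skips nulls. The sketch folds both steps into one filter over record URLs; `IndexableRecords` and the sample file name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class IndexableRecords {
    /** True for records describing captured documents, false for the ARC version block. */
    static boolean isIndexable(String recordUrl) {
        return recordUrl != null && !recordUrl.startsWith("filedesc");
    }

    static List<String> indexable(List<String> recordUrls) {
        List<String> out = new ArrayList<>();
        for (String url : recordUrls) {
            if (isIndexable(url)) {
                out.add(url);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "filedesc://IAH-20051026-00000.arc", // hypothetical ARC file name
                "http://example.com/",
                "http://example.com/a.gif");
        System.out.println(indexable(urls)); // [http://example.com/, http://example.com/a.gif]
    }
}
```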

[Archive-access-cvs] archive-access/projects/wayback/src/java/org/archive/wayback/simplequeryui SimpleQueryUI.java,1.3,1.4

From: Brad <bra...@us...> - 2005-10-25 21:42:45

Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/simplequeryui
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv3731/src/java/org/archive/wayback/simplequeryui

Modified Files:
	SimpleQueryUI.java 
Log Message:
BUGFIX: incorrect generation of query arguments -- only append '?' + query args if they are present.

Index: SimpleQueryUI.java
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/simplequeryui/SimpleQueryUI.java,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** SimpleQueryUI.java	22 Oct 2005 00:29:20 -0000	1.3
--- SimpleQueryUI.java	25 Oct 2005 21:42:34 -0000	1.4
***************
*** 82,87 ****
  		WMRequest wmRequest = null;
  		Matcher matcher = null;
! 
! 		String origRequestPath = request.getRequestURI() + "?" + request.getQueryString();
  		String contextPath = request.getContextPath();
  		if (!origRequestPath.startsWith(contextPath)) {
--- 82,90 ----
  		WMRequest wmRequest = null;
  		Matcher matcher = null;
! 		String queryString = request.getQueryString();
! 		String origRequestPath = request.getRequestURI();
! 		if(queryString != null) {
! 			origRequestPath = request.getRequestURI() + "?" + queryString;
! 		}
  		String contextPath = request.getContextPath();
  		if (!origRequestPath.startsWith(contextPath)) {

[Archive-access-cvs] archive-access/projects/wayback/src/java/org/archive/wayback/rawreplayui RawReplayUI.java,1.3,1.4

From: Brad <bra...@us...> - 2005-10-25 21:42:44

Update of /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/rawreplayui
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv3731/src/java/org/archive/wayback/rawreplayui

Modified Files:
	RawReplayUI.java 
Log Message:
BUGFIX: incorrect generation of query arguments -- only append '?' + query args if they are present.

Index: RawReplayUI.java
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/src/java/org/archive/wayback/rawreplayui/RawReplayUI.java,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** RawReplayUI.java	22 Oct 2005 00:29:20 -0000	1.3
--- RawReplayUI.java	25 Oct 2005 21:42:34 -0000	1.4
***************
*** 98,102 ****
  		Matcher matcher = null;
  
! 		String origRequestPath = request.getRequestURI() + "?" + request.getQueryString();
  		String contextPath = request.getContextPath();
  		if (!origRequestPath.startsWith(contextPath)) {
--- 98,106 ----
  		Matcher matcher = null;
  
! 		String queryString = request.getQueryString();
! 		String origRequestPath = request.getRequestURI();
! 		if(queryString != null) {
! 			origRequestPath = request.getRequestURI() + "?" + queryString;
! 		}
  		String contextPath = request.getContextPath();
  		if (!origRequestPath.startsWith(contextPath)) {
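`HttpServletRequest.getQueryString()` returns `null` when the request has no query string, and Java string concatenation turns that `null` into the text `"null"`, so the old code produced paths ending in `?null`. The guarded version from both commits, as a small helper whose name is hypothetical:

```java
final class RequestPaths {
    /** Rebuilds "URI[?query]" without appending "?null" when there is no query string. */
    static String withQuery(String requestUri, String queryString) {
        return queryString == null ? requestUri : requestUri + "?" + queryString;
    }

    public static void main(String[] args) {
        System.out.println(withQuery("/wayback/query", null));        // /wayback/query
        System.out.println(withQuery("/wayback/query", "url=a.org")); // /wayback/query?url=a.org
        String missing = null;
        System.out.println("/wayback/query" + "?" + missing);       // /wayback/query?null (old behavior)
    }
}
```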

[Archive-access-cvs] archive-access/projects/wayback .classpath,1.4,1.5 project.properties,1.2,1.3 project.xml,1.3,1.4

From: Michael S. <sta...@us...> - 2005-10-25 20:09:45

Update of /cvsroot/archive-access/archive-access/projects/wayback
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv13537

Modified Files:
	.classpath project.properties project.xml 
Log Message:

* .classpath
* project.properties
* project.xml 
* build.xml
* src/webapp/WEB-INF/lib/libidn-0.5.9.jar 
    Add libidn jar.
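libidn implements IDNA, which maps internationalized host names to an ASCII (Punycode) form. The JDK has offered the same conversion since Java 6 as `java.net.IDN`; the snippet below uses that class purely to show what the conversion does, and it is not the libidn API.

```java
import java.net.IDN;

final class IdnDemo {
    public static void main(String[] args) {
        String unicodeHost = "b\u00fccher.example"; // "bücher.example"
        String asciiHost = IDN.toASCII(unicodeHost);
        System.out.println(asciiHost);                // xn--bcher-kva.example
        System.out.println(IDN.toUnicode(asciiHost)); // bücher.example
    }
}
```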


Index: .classpath
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/.classpath,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** .classpath	25 Oct 2005 03:23:41 -0000	1.4
--- .classpath	25 Oct 2005 20:09:31 -0000	1.5
***************
*** 16,19 ****
--- 16,21 ----
          path="src/webapp/WEB-INF/lib/arc-1.5.1-200510181911.jar"/>
  	<classpathentry kind="lib" 
+         path="src/webapp/WEB-INF/lib/libidn-0.5.9.jar"/>
+ 	<classpathentry kind="lib" 
          path="/src/webapp/WEB-INF/lib/commons-codec-1.3.jar"/>
  	<classpathentry kind="lib" 

Index: project.properties
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/project.properties,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** project.properties	25 Oct 2005 03:23:41 -0000	1.2
--- project.properties	25 Oct 2005 20:09:31 -0000	1.3
***************
*** 21,24 ****
--- 21,25 ----
  maven.jar.je = ${basedir}/src/webapp/WEB-INF/lib/je-2.0.83.jar
  maven.jar.arc = ${basedir}/src/webapp/WEB-INF/lib/arc-1.5.1-200510181911.jar
+ maven.jar.libidn = ${basedir}/src/webapp/WEB-INF/lib/libidn-0.5.9.jar
  maven.jar.commons-codec = ${basedir}/src/webapp/WEB-INF/lib/commons-codec-1.3.jar
  maven.jar.dsi-unimi-it = ${basedir}/src/webapp/WEB-INF/lib/dsi-unimi-it-1.0.0.kb.jar

Index: project.xml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/project.xml,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** project.xml	25 Oct 2005 03:23:41 -0000	1.3
--- project.xml	25 Oct 2005 20:09:31 -0000	1.4
***************
*** 211,214 ****
--- 211,231 ----
          </dependency>
          <dependency>
+             <id>libidn</id>
+             <version>0.5.9</version>
+             <url>http://www.gnu.org/software/libidn/</url>
+             <properties>
+                 <war.bundle>true</war.bundle>
+                 <ear.bundle>true</ear.bundle>
+                 <ear.bundle.dir>APP-INF/lib</ear.bundle.dir>
+                 <description>GNU Libidn is an implementation of the Stringprep, 
+                 Punycode and IDNA specifications defined by the IETF 
+                 Internationalized Domain Names (IDN) working group, used for 
+                 internationalized domain names.
+                 </description>
+                 <license>GNU Lesser General Public License 
+                 http://www.gnu.org/licenses/lgpl.txt</license>
+             </properties>
+         </dependency>
+         <dependency>
              <id>commons-codec</id>
              <version>1.3</version>
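The libidn dependency description above says the library implements Punycode and IDNA, which convert internationalized host names into an ASCII-compatible form. As an illustration of what that conversion does, the sketch below uses the JDK's own java.net.IDN class (Java 6 and later). This is a swapped-in API and not libidn's, and it does not show how wayback itself calls libidn. The host name is a hypothetical example.

```java
import java.net.IDN;

/**
 * Illustration of IDNA/Punycode host-name conversion using the JDK's
 * java.net.IDN, standing in for the libidn jar added in this commit.
 */
public class IdnaSketch {

    public static void main(String[] args) {
        // "b\u00fccher" is "bücher"; the escape avoids source-encoding issues.
        String unicodeHost = "b\u00fccher.example";

        // Unicode label -> ASCII-compatible "xn--" Punycode label.
        String asciiHost = IDN.toASCII(unicodeHost);
        System.out.println(asciiHost);

        // Round-trip back to the Unicode form.
        System.out.println(IDN.toUnicode(asciiHost));
    }
}
```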

[Archive-access-cvs] archive-access/projects/wayback/src/webapp/WEB-INF/lib libidn-0.5.9.jar,NONE,1.1

From: Michael S. <sta...@us...> - 2005-10-25 20:09:40

Update of /cvsroot/archive-access/archive-access/projects/wayback/src/webapp/WEB-INF/lib
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv13537/src/webapp/WEB-INF/lib

Added Files:
	libidn-0.5.9.jar 
Log Message:

* .classpath
* project.properties
* project.xml 
* build.xml
* src/webapp/WEB-INF/lib/libidn-0.5.9.jar 
    Add libidn jar.


--- NEW FILE: libidn-0.5.9.jar ---
(This appears to be a binary file; contents omitted.)

[Archive-access-cvs] archive-access/projects/wayback build.xml,NONE,1.1 .classpath,1.3,1.4 project.properties,1.1,1.2 project.xml,1.2,1.3

From: Michael S. <sta...@us...> - 2005-10-25 03:23:51

Update of /cvsroot/archive-access/archive-access/projects/wayback
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29017

Modified Files:
	.classpath project.properties project.xml 
Added Files:
	build.xml 
Log Message:
* .classpath
* project.properties
* project.xml 
    Add commons-codec and dsi lib.
* build.xml
    Empty, placeholder build.xml (Prevents harmless exception spew during
    maven build).
* src/webapp/WEB-INF/lib/commons-codec-1.3.jar 
* src/webapp/WEB-INF/lib/dsi-unimi-it-1.0.0.kb.jar 
    Added.



Index: .classpath
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/.classpath,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** .classpath	20 Oct 2005 01:37:39 -0000	1.3
--- .classpath	25 Oct 2005 03:23:41 -0000	1.4
***************
*** 15,18 ****
--- 15,22 ----
  	<classpathentry kind="lib" 
          path="src/webapp/WEB-INF/lib/arc-1.5.1-200510181911.jar"/>
+ 	<classpathentry kind="lib" 
+         path="/src/webapp/WEB-INF/lib/commons-codec-1.3.jar"/>
+ 	<classpathentry kind="lib" 
+         path="src/webapp/WEB-INF/lib/dsi-unimi-it-1.0.0.kb.jar"/>
  	<classpathentry kind="output" path="src/webapp/WEB-INF/classes"/>
  </classpath>

Index: project.properties
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/project.properties,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** project.properties	20 Oct 2005 01:30:36 -0000	1.1
--- project.properties	25 Oct 2005 03:23:41 -0000	1.2
***************
*** 21,24 ****
--- 21,26 ----
  maven.jar.je = ${basedir}/src/webapp/WEB-INF/lib/je-2.0.83.jar
  maven.jar.arc = ${basedir}/src/webapp/WEB-INF/lib/arc-1.5.1-200510181911.jar
+ maven.jar.commons-codec = ${basedir}/src/webapp/WEB-INF/lib/commons-codec-1.3.jar
+ maven.jar.dsi-unimi-it = ${basedir}/src/webapp/WEB-INF/lib/dsi-unimi-it-1.0.0.kb.jar
  
  # Junit properties

--- NEW FILE: build.xml ---
<?xml version="1.0" encoding="UTF-8"?>
<!--Use maven to build.  Ant not supported. 

    (This is a placeholder build.xml. Without it, the maven build of src
    will try to autogenerate an ant build file spewing an ugly exception
    into the build).
 -->

Index: project.xml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/wayback/project.xml,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** project.xml	20 Oct 2005 16:51:54 -0000	1.2
--- project.xml	25 Oct 2005 03:23:41 -0000	1.3
***************
*** 210,213 ****
--- 210,247 ----
              </properties>
          </dependency>
+         <dependency>
+             <id>commons-codec</id>
+             <version>1.3</version>
+             <url>http://jakarta.apache.org/commons/codec/</url>
+             <properties>
+                 <war.bundle>true</war.bundle>
+                 <ear.bundle>true</ear.bundle>
+                 <ear.bundle.dir>APP-INF/lib</ear.bundle.dir>
+                 <description>Commons Codec provides implementations of common
+                 encoders and decoders such as Base64, Hex, various phonetic
+                 encodings, and URLs.</description>
+                 <license>Apache 2.0
+                 http://www.apache.org/licenses/LICENSE-2.0</license>
+             </properties>
+         </dependency>
+         <dependency>
+             <id>dsi-unimi-it</id>
+             <version>1.0.0</version>
+             <url>http://mg4j.dsi.unimi.it/</url>
+             <properties>
+                 <war.bundle>true</war.bundle>
+                 <ear.bundle>true</ear.bundle>
+                 <ear.bundle.dir>APP-INF/lib</ear.bundle.dir>
+                 <description>Alternatives to String, 
+                 StringBuffer, unsynchronized I/0, and a ConsistentHashFunction.
+                 Made from subsets of mg4j-0.9.1 and fastutil-4.4.0,
+                 -- two jars that came of the ubicrawler project,
+                 http://ubi0.iit.cnr.it/projects/ubi/ -- using autojar:
+                 java -jar ~/workspace/autojar-1.2.2/autojar-1.2.2.jar -v -o
+                 dss.unimi.it-1.0.0.jar -c fastutil-4.4.0/fastutil-4.4.0.jar:mg4j-0.9.1/mg4j-0.9.1.jar:ubix-1.0.3/ubix-1.0.3.jar: it.unimi.dsi.mg4j.util.MutableString.class it.unimi.dsi.mg4j.io.FastBufferedInputStream.class it.unimi.dsi.mg4j.io.FastBufferedOutputStream.class it.unimi.dsi.mg4j.io.FastBufferedReader.class it.unimi.dsi.mg4j.io.FastByteArrayInputStream.class it.unimi.dsi.mg4j.io.FastByteArrayOutputStream.class it.unimi.dsi.mg4j.io.FastMultiByteArrayInputStream.class it.unimi.dsi.ubix.ConsistentHashFunction.class</description>
+                 <license>MG4J, ConsistentHashFunction, and fastutils are
+                 LGPL</license>
+             </properties>
+         </dependency>
    </dependencies>
