From: <bi...@us...> - 2008-07-03 22:01:39
Revision: 2405
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2405&view=rev
Author:   binzino
Date:     2008-07-03 15:01:43 -0700 (Thu, 03 Jul 2008)

Log Message:
-----------
Create NutchWAX 0.12 release tag.

Added Paths:
-----------
    tags/nutchwax-0_12/

Copied: tags/nutchwax-0_12 (from rev 2404, trunk/archive-access/projects/nutchwax)

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
From: <bi...@us...> - 2008-07-03 21:39:48
Revision: 2404
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2404&view=rev
Author:   binzino
Date:     2008-07-03 14:39:57 -0700 (Thu, 03 Jul 2008)

Log Message:
-----------
Updated with current SVN revision for Nutch that we build against.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/archive/INSTALL.txt

Modified: trunk/archive-access/projects/nutchwax/archive/INSTALL.txt
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/INSTALL.txt 2008-07-03 20:37:17 UTC (rev 2403)
+++ trunk/archive-access/projects/nutchwax/archive/INSTALL.txt 2008-07-03 21:39:57 UTC (rev 2404)
@@ -46,11 +46,11 @@
 Nutch SVN trunk. The specific SVN revision that NutchWAX 0.12 is
 built against is:

-  673464
+  673823

 To checkout this revision of Nutch, use:

-  $ svn checkout -r 673464 http://svn.apache.org/repos/asf/lucene/nutch/trunk nutch
+  $ svn checkout -r 673823 http://svn.apache.org/repos/asf/lucene/nutch/trunk nutch
   $ cd nutch
From: <bi...@us...> - 2008-07-03 20:37:13
Revision: 2403
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2403&view=rev
Author:   binzino
Date:     2008-07-03 13:37:17 -0700 (Thu, 03 Jul 2008)

Log Message:
-----------
Initial revision.

Added Paths:
-----------
    trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt

Added: trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt (rev 0)
+++ trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt 2008-07-03 20:37:17 UTC (rev 2403)
@@ -0,0 +1,62 @@
+
+RELEASE-NOTES.TXT
+2007-07-03
+Aaron Binns
+
+Release notes for NutchWAX 0.12
+
+For the most recent updates and information on NutchWAX,
+please visit the project wiki at:
+
+  http://webteam.archive.org/confluence/display/search/NutchWAX
+
+
+======================================================================
+Overview
+======================================================================
+
+NutchWAX 0.12-beta-1 was released on June 2, 2008. We anticipated
+releasing another beta mid-June with bug fixes and some minor
+enhancements based on feedback from the community.
+
+During internal testing by the Internet Archive Web Team, a few
+serious problems were found, the most critical being the failure to
+store different copies of the same URL when importing large batches of
+archive files.
+
+The NutchWAX team canceled the mid-month release in order to focus on
+fixing this problem.
+
+The good news is that not only has that problem been fixed, but the
+solution is part of a broader enhancement to manage the de-duplication
+of archive content during import and indexing.
+
+For more details on de-duplication in NutchWAX, please see
+
+  HOWTO-dedup.txt
+  README-dedup.txt
+
+
+======================================================================
+Issues
+======================================================================
+
+For an up-to-date list of NutchWAX issues:
+
+  http://webteam.archive.org/jira/browse/WAX
+
+Issues resolved in this release:
+
+WAX-9 Entire file not imported
+WAX-8 Investigate why so many PDFs fail to parse
+
+  Fixing the first one caused nearly all of the PDF parsing errors to
+  disappear.
+
+WAX-7 Change config so that URL filters are not applied during link inversion
+
+  This is easily achieved by using command-line options when invoking
+  the Nutch "invertlinks" command.
+
+WAX-3 Observe content size limit on importing
+WAX-2 Date queries cause TooManyClauses exceptions
From: <bi...@us...> - 2008-07-03 18:53:14
Revision: 2402
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2402&view=rev
Author:   binzino
Date:     2008-07-03 11:53:12 -0700 (Thu, 03 Jul 2008)

Log Message:
-----------
Added comments regarding WARCs.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/archive/HOWTO-dedup.txt

Modified: trunk/archive-access/projects/nutchwax/archive/HOWTO-dedup.txt
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/HOWTO-dedup.txt 2008-07-03 18:29:09 UTC (rev 2401)
+++ trunk/archive-access/projects/nutchwax/archive/HOWTO-dedup.txt 2008-07-03 18:53:12 UTC (rev 2402)
@@ -75,7 +75,7 @@


 ======================================================================
-Generate DUP
+Generate DUP/Revisits
 ======================================================================

 Now that we have 'all.cdx' containing a sorted list of all the records
@@ -98,6 +98,25 @@

 This file is then used as an exclusion filter for importing.

+
+WARC
+----
+If we are using WARC files with revisit records instead of ARC files,
+then we don't generate a list of duplicate records because there
+shouldn't be any.
+
+However, the revisit records in the WARC files do have the dates when
+a URL was revisited and seen to have not changed -- which is more or
+less the same thing as our "dup" lines above.
+
+For extracting these revisits from WARC CDX files, we use the
+'revisits' utility provided by NutchWAX
+
+  $ revisits all-warc.cdx > all-warc.dup
+
+The output of 'revisits' is in the same format as 'dedup-cdx'.
+
+
 ======================================================================
 Import
 ======================================================================
@@ -121,7 +140,13 @@
 If you examine the Nutch "hadoop.log" file, you will see INFO-level
 lines from the NutchWAX Importer showing which URLs were excluded.

+WARC
+----
+If you are importing WARC files with revisit records, then you
+typically won't need to provide an exclusion file as the WARC files
+were de-duplicated during the crawl.

+
 ======================================================================
 Update and Invert
 ======================================================================
@@ -224,6 +249,15 @@
 the previous "dates" index with the new one.


+WARC
+----
+This step is the same for ARCs and WARCs.
+
+The only difference is that our "all.dup" file containing the list of
+revisit dates was created by different utilities: 'dedup-cdx' for ARCs
+and 'revisits' for WARCs.
+
+
 ======================================================================
 Search
 ======================================================================
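To make the idea of a "dup" list concrete, here is a minimal shell sketch of one way such a list could be derived from CDX-like input. It is an assumption about the general approach, not the code of 'dedup-cdx': the function name 'dup_sketch', the sample records and the field layout ($1 = URL key, $2 = date, $6 = digest, borrowed from the 'revisits' script in this repository) are illustrative only.

```shell
#!/bin/sh
# dup_sketch: hypothetical stand-in for a duplicate finder. Reads
# CDX-like lines on stdin and prints every record after the first one
# seen for a given URL+digest pair, in "URL sha1:digest date" form.
# Assumed field layout: $1 = URL key, $2 = date, $6 = digest.
dup_sketch() {
  awk '{
    key = $1 " " $6
    # seen[key] is 0 the first time, so the first capture is skipped
    if (seen[key]++) print $1 " sha1:" $6 " " $2
  }'
}

# Made-up records: the same URL captured three times, twice with the
# same digest.
dup_sketch <<'EOF'
a.example/ 20080101000000 http://a.example/ text/html 200 DIGESTA - - 0 x.arc.gz
a.example/ 20080102000000 http://a.example/ text/html 200 DIGESTB - - 100 x.arc.gz
a.example/ 20080103000000 http://a.example/ text/html 200 DIGESTA - - 200 x.arc.gz
EOF
```

Only the third record is reported, since it repeats the URL and digest of the first. Keeping the first capture and listing the rest matches the HOWTO's description that only the first visit date ends up in the main index.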
From: <bi...@us...> - 2008-07-03 18:29:05
Revision: 2401
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2401&view=rev
Author:   binzino
Date:     2008-07-03 11:29:09 -0700 (Thu, 03 Jul 2008)

Log Message:
-----------
Initial revision.

Added Paths:
-----------
    trunk/archive-access/projects/nutchwax/archive/HOWTO-dedup.txt

Added: trunk/archive-access/projects/nutchwax/archive/HOWTO-dedup.txt
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/HOWTO-dedup.txt (rev 0)
+++ trunk/archive-access/projects/nutchwax/archive/HOWTO-dedup.txt 2008-07-03 18:29:09 UTC (rev 2401)
@@ -0,0 +1,289 @@
+
+HOWTO-dedup.txt
+2008-07-03
+Aaron Binns
+
+Table of Contents
+  o Prerequisites
+    - NutchWAX HOWTO.txt
+    - Wayback 1.2.1
+  o Overview
+  o Generate CDX
+  o Generate DUP
+  o Import
+  o Update and Invert
+  o Index
+  o Add Revisit Dates
+  o Search
+  o Web Deployment
+
+
+======================================================================
+Prerequisites
+======================================================================
+
+This de-duplication HOWTO assumes you've already read the main HOWTO
+and are familiar with importing and indexing archive files with
+NutchWAX.
+
+For de-duplication, the Wayback Machine tools are required. This guide
+assumes you have Wayback 1.2.1 installed in
+
+  /opt/wayback-1.2.1
+
+
+======================================================================
+Overview
+======================================================================
+
+The README-dedup.txt explains the de-duplication process in greater
+detail, including implementation details.
+
+NutchWAX does not automagically detect and eliminate duplicate records
+when importing and indexing. However, tools are provided to help the
+user implement a system to perform de-duplication.
+
+This guide describes one such system using the tools provided by
+NutchWAX and Wayback.
+
+
+======================================================================
+Generate CDX
+======================================================================
+
+The first step is to generate a list of duplicate records for a set of
+ARC files.
+
+This step is not necessary if your archive files are in WARC format
+and de-duplication was performed during the crawl.
+
+To generate the list of duplicates, we use the Wayback 'arc-indexer'
+with the NutchWAX 'dedup-cdx' utility. The CDX files *must* be
+sorted.
+
+  $ arc-indexer foo.arc.gz | sort > foo.cdx
+  $ arc-indexer bar.arc.gz | sort > bar.cdx
+  $ arc-indexer baz.arc.gz | sort > baz.cdx
+
+Then we combine the CDX files into one sorted CDX containing all the
+records:
+
+  $ sort -m foo.cdx bar.cdx baz.cdx > all.cdx
+
+The "-m" option speeds up the sort by merging the already-sorted
+files.
+
+
+======================================================================
+Generate DUP
+======================================================================
+
+Now that we have 'all.cdx' containing a sorted list of all the records
+in the ARC files, we can generate a list of duplicates therein:
+
+  $ dedup-cdx all.cdx > all.dup
+
+This "all.dup" file contains lines of the form
+
+  example.org/robots.txt sha1:4G3PAROKCYJNRGZIHJO5PVLZ724FX3GN 20080618133034
+  example.org/robots.txt sha1:AGW5DJIEUBL67473477TDVBBGDZ37AEZ 20080613194800
+  example.org/robots.txt sha1:AGW5DJIEUBL67473477TDVBBGDZ37AEZ 20080616061312
+  example.org/robots.txt sha1:AGW5DJIEUBL67473477TDVBBGDZ37AEZ 20080618132204
+  example.org/robots.txt sha1:AGW5DJIEUBL67473477TDVBBGDZ37AEZ 20080618132213
+  example.org/robots.txt sha1:AGW5DJIEUBL67473477TDVBBGDZ37AEZ 20080619132911
+
+Where each line is
+
+  URL digest date
+
+This file is then used as an exclusion filter for importing.
+
+======================================================================
+Import
+======================================================================
+
+The import process is essentially the same as in NutchWAX, but now
+we use "all.dup" as our exclusion list.
+
+First, we create a manifest
+
+  $ cat > manifest
+  foo.arc.gz test-collection
+  bar.arc.gz test-collection
+  baz.arc.gz test-collection
+  ^D
+
+  $ nutchwax import -e all.dup manifest
+
+The result will be a newly-created Nutch segment, same as importing
+without de-duplication.
+
+If you examine the Nutch "hadoop.log" file, you will see INFO-level
+lines from the NutchWAX Importer showing which URLs were excluded.
+
+
+======================================================================
+Update and Invert
+======================================================================
+
+Perform the Nutch "updatedb" and "invertlinks" steps as normal.
+
+Nothing special/different to do here with respect to de-duplication.
+
+
+======================================================================
+Index
+======================================================================
+
+The only change we make to the indexing step is the destination of the
+index directory.
+
+By default, Nutch expects the per-segment index directory to live in a
+sub-directory called 'indexes' and the index command is accordingly
+
+  $ nutch index indexes crawldb linkdb segments/*
+
+Resulting in an index directory structure of the form
+
+  indexes/part-00000
+
+For de-duplication, we use a slightly different directory structure,
+which will be used by a de-duplication-aware NutchWaxBean at
+search-time. The directory structure we use is:
+
+  pindexes/<segment>/part-00000
+
+Using the segment name is not strictly required, but it is a good
+practice and is strongly recommended. This way the segment and its
+corresponding index directory are easily matched.
+
+Let's assume that the segment directory created during the import is
+named
+
+  segments/20080703050349
+
+In that case, our index command becomes:
+
+  $ nutch index pindexes/20080703050349 crawldb linkdb segments/20080703050349
+
+Upon completion, the Lucene index is created in
+
+  pindexes/20080703050349/part-00000
+
+This index is exactly the same as one normally created by Nutch; the
+only difference is the location.
+
+
+======================================================================
+Add Revisit Dates
+======================================================================
+
+Now that we have the Nutch index, we add the revisit dates to it.
+
+Examine the "all.dup" file again; it has lines of the form
+
+  example.org/robots.txt sha1:4G3PAROKCYJNRGZIHJO5PVLZ724FX3GN 20080618133034
+  example.org/robots.txt sha1:AGW5DJIEUBL67473477TDVBBGDZ37AEZ 20080613194800
+  example.org/robots.txt sha1:AGW5DJIEUBL67473477TDVBBGDZ37AEZ 20080616061312
+  example.org/robots.txt sha1:AGW5DJIEUBL67473477TDVBBGDZ37AEZ 20080618132204
+  example.org/robots.txt sha1:AGW5DJIEUBL67473477TDVBBGDZ37AEZ 20080618132213
+  example.org/robots.txt sha1:AGW5DJIEUBL67473477TDVBBGDZ37AEZ 20080619132911
+
+These are the revisit dates that need to be added to the records in
+the Lucene index. When we generated the index, only the date of the
+first visit was put in the index. Now we have to add these.
+
+As explained in README-dedup.txt, modifying the Lucene index to
+actually add these dates is infeasible. What we do is create a
+parallel index next to the main index (the part-00000 created above)
+that contains all the dates for each record.
+
+The NutchWAX 'add-dates' command creates this parallel index for us.
+
+  $ nutchwax add-dates pindexes/20080703050349/part-00000 \
+                       pindexes/20080703050349/part-00000 \
+                       pindexes/20080703050349/dates \
+                       all.dup
+
+Yes, the part-00000 argument does appear twice. This is because it is
+both the "key" index and the "source" index.
+
+
+Suppose we did another crawl and had even more dates to add to the
+existing index. In that case we would run
+
+  $ nutchwax add-dates pindexes/20080703050349/part-00000 \
+                       pindexes/20080703050349/dates \
+                       pindexes/20080703050349/new-dates \
+                       new-crawl.dup
+  $ rm -r pindexes/20080703050349/dates
+  $ mv pindexes/20080703050349/new-dates pindexes/20080703050349/dates
+
+This copies the existing dates from "dates" to "new-dates" and adds
+additional ones from "new-crawl.dup" along the way. Then we replace
+the previous "dates" index with the new one.
+
+
+======================================================================
+Search
+======================================================================
+
+Test/debug searches can be run from the command-line, but instead of
+using the 'NutchBean' we use 'NutchWaxBean'.
+
+The "NutchWaxBean" extends NutchBean by adding support for parallel
+indexes.
+
+  $ nutch org.archive.nutchwax.NutchWaxBean <query>
+
+The "NutchWaxBean" also gives slightly more verbose and useful output,
+
+  $ nutch org.archive.nutchwax.NutchWaxBean carolina
+  Total hits: 247338
+  0 [20080702053119] [http://www.ncfilm.com/incentives-benefits/facilities/carolina-pinnacle-studios.html] [http://www.ncfilm.com/incentives-benefits/facilities/carolina-pinnacle-studios.html sha1:WAMSFQPBRDMLOV3KETKCCTLJE3OTB23A] [sha1:WAMSFQPBRDMLOV3KETKCCTLJE3OTB23A] [20080618133218, 20080618133218]
+  ... Studios Blue Ridge Motion Pictures Carolina Pinnacle Creative Network EUE/Screen ... Trailblazer Studios Federal Tax Incentive Carolina Pinnacle Studios ...
+  1 [20080703023605] [http://www.ncfilm.com/incentives-benefits/facilities/carolina-pinnacle-studios.html] [http://www.ncfilm.com/incentives-benefits/facilities/carolina-pinnacle-studios.html sha1:WAMSFQPBRDMLOV3KETKCCTLJE3OTB23A] [sha1:WAMSFQPBRDMLOV3KETKCCTLJE3OTB23A] [20080613200046, 20080618133218]
+
+The output consists of
+
+  hit number
+  segment
+  url
+  key (which is url + digest)
+  digest
+  dates
+
+The most useful bit here for testing de-duplication is the list of
+dates.
+
+
+======================================================================
+Web Deployment
+======================================================================
+
+As noted in the HOWTO.txt document, when the nutch(wax) webapp is
+deployed, changes made to the configuration must be also applied to
+the deployed webapp.
+
+In addition to those configuration changes, the "web.xml" file must
+also be modified.
+
+In Nutch, the "web.xml" file contains a directive to call a static
+method on 'NutchBean' to initialize it. In order to search the
+parallel indexes we have to use 'NutchWaxBean'. This is done by
+modifying the "web.xml" to call a NutchWaxBean initializer after the
+NutchBean initializer.
+
+Change "web.xml" from
+
+  <listener>
+    <listener-class>org.apache.nutch.searcher.NutchBean$NutchBeanConstructor</listener-class>
+  </listener>
+
+to:
+
+  <listener>
+    <listener-class>org.apache.nutch.searcher.NutchBean$NutchBeanConstructor</listener-class>
+    <listener-class>org.archive.nutchwax.NutchWaxBean$NutchWaxBeanConstructor</listener-class>
+  </listener>
+
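The "Generate CDX" step in HOWTO-dedup.txt relies on 'sort -m' giving the same result as a full sort when every input file is already sorted. The shell sketch below checks that on tiny made-up files. The file contents are placeholders rather than real 'arc-indexer' output, and forcing the C locale is an added assumption (the HOWTO does not mention locales) so that every sort uses the same ordering.

```shell
#!/bin/sh
# Assumption: force byte-wise ordering so every sort agrees; the HOWTO
# itself says nothing about locale settings.
LC_ALL=C
export LC_ALL

workdir=$(mktemp -d)

# Hypothetical stand-ins for per-ARC CDX files, each sorted on its own.
printf '%s\n' 'b.example/ 20080102' 'a.example/ 20080101' | sort > "$workdir/foo.cdx"
printf '%s\n' 'c.example/ 20080103' 'a.example/ 20080105' | sort > "$workdir/bar.cdx"

# Merge the already-sorted files, as in the HOWTO.
sort -m "$workdir/foo.cdx" "$workdir/bar.cdx" > "$workdir/all.cdx"

# Cross-check against a full sort of the concatenated input.
cat "$workdir/foo.cdx" "$workdir/bar.cdx" | sort > "$workdir/full.cdx"

if cmp -s "$workdir/all.cdx" "$workdir/full.cdx"; then
  echo "merge matches full sort"
fi
```

If the per-file sorts and the merge ran under different locales, the merged file might not be in a consistent order, which is why the sketch pins the locale once at the top.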
From: <bi...@us...> - 2008-07-03 18:28:48
Revision: 2400
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2400&view=rev
Author:   binzino
Date:     2008-07-03 11:28:55 -0700 (Thu, 03 Jul 2008)

Log Message:
-----------
Added info on new configuration properties.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/archive/HOWTO.txt

Modified: trunk/archive-access/projects/nutchwax/archive/HOWTO.txt
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/HOWTO.txt 2008-07-03 02:03:41 UTC (rev 2399)
+++ trunk/archive-access/projects/nutchwax/archive/HOWTO.txt 2008-07-03 18:28:55 UTC (rev 2400)
@@ -120,12 +120,13 @@

 to

-  protocol-http|parse-(text|html|js|pdf)|index-(basic|anchor|nutchwax)|query-(basic|site|url|nutchwax)|summary-basic|scoring-opic
+  protocol-http|parse-(text|html|js|pdf)|index-(basic|anchor|nutchwax)|query-(basic|site|url|nutchwax)|summary-basic|scoring-opic|urlfilter-nutchwax

 In short, we add:

   index-nutchwax
   query-nutchwax
+  urlfilter-nutchwax
   parse-pdf

 and remove:
@@ -136,19 +137,37 @@
 The only *required* changes are the additions of the NutchWAX index
 and query plugins. The rest are optional, but recommended.

-The addition of the "parse-pdf" plugin is simply because we have lots
-of PDFs in our archives and we want to index them. We sometimes
-remove the "parse-js" plugin if we don't care to index JavaScript
-files.
+The "parse-pdf" plugin is added simply because we have lots of PDFs in
+our archives and we want to index them. We sometimes remove the
+"parse-js" plugin if we don't care to index JavaScript files.

-We also remove the URL filtering and normalizing plugins because we do
-not need the URLs normalized nor filtered. We trust that the tool
-that produced the ARC/WARC file will have normalized the URLs
-contained therein according to its own rules so there's no need to
-normalize here. Also, we don't filter by URL since we want to index
-as much of the ARC/WARC file as we have parsers for.
+We also remove the default Nutch URL filtering and normalizing plugins
+because we do not need the URLs normalized nor filtered. We trust
+that the tool that produced the ARC/WARC file will have normalized the
+URLs contained therein according to its own rules so there's no need
+to normalize here. Also, we don't filter by URL since we want to
+index as much of the ARC/WARC file as we have parsers for.

+We do, however, add the NutchWAX URL filter. If de-duplication is
+being performed upon import, this plugin is required. It performs URL
+filtering of the list of ARC records to exclude based on
+URL+digest+date.
+
 --------------------------------------------------
+indexingfilter.order
+--------------------------------------------------
+
+Add this property with a value of
+
+  org.apache.nutch.indexer.basic.BasicIndexingFilter
+  org.archive.nutchwax.index.ConfigurableIndexingFilter
+
+So that the NutchWAX indexing filter is run after the Nutch basic
+indexing filter.
+
+A full explanation is given in "README-dedup.txt".
+
+--------------------------------------------------
 mime.type.magic
 --------------------------------------------------
 We disable mimetype detection in Nutch for two reasons:
@@ -172,12 +191,12 @@
 nutchwax.filter.index
 --------------------------------------------------
 Configure the 'index-nutchwax' plugin. Specify how the metadata
-fields added by the ArcsToSegment are mapped to the Lucene documents
-during indexing.
+fields added by the Importer are mapped to the Lucene documents during
+indexing.

 The specifications here are of the form:

-  src-key:lowercase:store:tokenize:dest-key
+  src-key:lowercase:store:tokenize:exclusive:dest-key

 where the only required part is the "src-key", the rest will assume
 the following defaults:
@@ -185,6 +204,7 @@
   lowercase = true
   store = true
   tokenize = false
+  exclusive = true
   dest-key = src-key

 We recommend:
@@ -192,6 +212,9 @@
 <property>
   <name>nutchwax.filter.index</name>
   <value>
+    url:false:true:true
+    orig:false
+    digest:false
     arcname:false
     collection
     date
@@ -199,39 +222,50 @@
   </value>
 </property>

+The "url", "orig" and "digest" values are required, the rest are
+optional, but strongly recommended.
+
 --------------------------------------------------
 nutchwax.filter.query
 --------------------------------------------------
 Configure the 'query-nutchwax' plugin. Specify which fields to make
-searchable via "[field]:[term|phrase]" query syntax, and whether they
+searchable via "field:[term|phrase]" query syntax, and whether they
 are "raw" fields or not.

-The specification format is
+The specification format is one of:

-  raw:name:lowercase:boost
-or
-  field:name:boost
+  field:<name>:<boost>
+  raw:<name>:<lowercase>:<boost>
+  group:<name>:<lowercase>:<delimiter>:<boost>

 Default values are

   lowercase = true
+  delimiter = ","
   boost = 1.0f

 There is no "lowercase" property for "field" specification because the
 Nutch FieldQueryFilter doesn't expose the option, unlike the
 RawFieldQueryFilter.

-NTOE: We do *not* use this filter for handling "date" queries, there is a
-specific filter for that: DateQueryFilter
+The "group" fields are raw fields that can accept multiple values,
+separated by a delimiter. Multiple values appearing in a query are
+automagically translated into required OR-groups, such as

+  collection:"193,221,36" => +(collection:193 collection:221 collection:36)
+
+NOTE: We do *not* use this filter for handling "date" queries, there
+is a specific filter for that: DateQueryFilter
+
 We recommend:

 <property>
   <name>nutchwax.filter.query</name>
   <value>
+    raw:digest:false
     raw:arcname:false
-    raw:collection
-    raw:type
+    group:collection
+    group:type
     field:anchor
     field:content
     field:host
@@ -240,6 +274,52 @@
 </property>

+--------------------------------------------------
+nutchwax.urlfilter.wayback.exclusions
+--------------------------------------------------
+File containing the exclusion list for importing.
+
+Normally, this is specified on the command line when the NutchWAX
+Importer is invoked. It can be specified here if preferred.
+
+--------------------------------------------------
+nutchwax.urlfilter.wayback.canonicalizer
+--------------------------------------------------
+
+For CDX-based de-duplication, the same URL canonicalization algorithm
+must be used here as was used to generate the CDX files.
+
+The default canonicalizer in Wayback's '(w)arc-indexer' utility
+is
+
+  org.archive.wayback.util.url.AggressiveUrlCanonicalizer
+
+which is the value provided in "nutch-site.xml".
+
+If the '(w)arc-indexer' is executed with the "-i" (identity)
+command-line option, then the matching canonicalizer
+
+  org.archive.wayback.util.url.IdentityUrlCanonicalizer
+
+must be specified here.
+
+--------------------------------------------------
+nutchwax.import.content.limit
+--------------------------------------------------
+Similar to Nutch's
+
+  file.content.limit
+  http.content.limit
+  ftp.content.limit
+
+properties, this specifies a limit on the size of a document imported
+via NutchWAX.
+
+We recommend setting this to a size compatible with the memory
+capacity of the computers performing the import. Something in the
+1-4MB range is typical.
+ + ====================================================================== Create a manifest ====================================================================== This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <bi...@us...> - 2008-07-03 02:03:32
Revision: 2399
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2399&view=rev
Author:   binzino
Date:     2008-07-02 19:03:41 -0700 (Wed, 02 Jul 2008)

Log Message:
-----------
Updated with changes in RC-1.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/archive/README.txt

Modified: trunk/archive-access/projects/nutchwax/archive/README.txt
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/README.txt 2008-07-03 02:03:09 UTC (rev 2398)
+++ trunk/archive-access/projects/nutchwax/archive/README.txt 2008-07-03 02:03:41 UTC (rev 2399)
@@ -1,6 +1,6 @@

 README.txt
-2008-05-20
+2008-07-02
 Aaron Binns

 Welcome to NutchWAX 0.12!
@@ -22,13 +22,13 @@
 The goal of NutchWAX is to enable full-text indexing and searching of
 documents stored in web archive file formats (ARC and WARC).

-The way we achieve that goal is by providing add-on tools and plugins
+The way we achieve that goal is by providing plugins and add-on tools
 to Nutch to read documents directly from ARC/WARC files. We call this
 process "importing" archive files.

-Importing produces a Nutch segment, the same as if Nutch had actually
-crawled the documents itself. In this scenario, document importing
-replaces the conventional "generate/fetch/update" cycle of Nutch.
+Importing produces a Nutch segment, similar to Nutch crawling the
+documents itself. In this scenario, document importing replaces the
+conventional "generate/fetch/update" cycle of Nutch.

 Once the archival documents have been imported into a segment, the
 regular Nutch commands to update the 'crawldb', invert the links and
@@ -36,12 +36,12 @@

 ======================================================================

-The NutchWAX add-ons consist of:
+The main NutchWAX add-ons are:

   bin/nutchwax

-    A shell script that is used to run the NutchWAX command-line tools,
-    such as document importing.
+    A shell script that is used to run the NutchWAX commands, such as
+    document importing.

     This is patterned after the 'bin/nutch' shell script.

@@ -55,6 +55,16 @@
     Query plugin which allows for querying against the metadata fields
     added by 'index-nutchwax'.

+  plugins/urlfilter-nutchwax
+
+    Filtering plugin which can be used to exclude URLs from import. It
+    can be used as part of a NutchWAX de-duplication scheme.
+
+  conf/nutch-site.xml
+
+    Sample configuration properties file showing suggested settings for
+    Nutch and NutchWAX.
+
 There is no separate 'lib/nutchwax.jar' file for NutchWAX. NutchWAX is
 distributed in source code form and is intended to be built in
 conjunction with Nutch.
@@ -84,7 +94,7 @@
 already familiar with the inner workings of Nutch. Still, special
 attention on one class is worth while:

-  src/java/org/archive/nutchwax/ArcsToSegment.java
+  src/java/org/archive/nutchwax/Importer.java

 This is where ARC/WARC files are read and their documents are imported
 into a Nutch segment.
@@ -113,10 +123,14 @@
   o We add metadata fields to the document, which are then available to
     the "index-nutchwax" plugin at indexing-time.

-    ArcsToSegment.importRecord()
+    Importer.importRecord()
       ...
       contentMetadata.set( NutchWax.CONTENT_TYPE_KEY, meta.getMimetype() );
       contentMetadata.set( NutchWax.ARCNAME_KEY, meta.getArcFile().getName() );
       contentMetadata.set( NutchWax.COLLECTION_KEY, collectionName );
       contentMetadata.set( NutchWax.DATE_KEY, meta.getDate() );
       ...
+
+
+======================================================================
+
From: <bi...@us...> - 2008-07-03 02:03:00
Revision: 2398
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2398&view=rev
Author:   binzino
Date:     2008-07-02 19:03:09 -0700 (Wed, 02 Jul 2008)

Log Message:
-----------
Changed sort to sort -u to only emit uniq revisits.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/archive/bin/revisits

Modified: trunk/archive-access/projects/nutchwax/archive/bin/revisits
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/bin/revisits 2008-07-03 02:02:38 UTC (rev 2397)
+++ trunk/archive-access/projects/nutchwax/archive/bin/revisits 2008-07-03 02:03:09 UTC (rev 2398)
@@ -9,4 +9,4 @@
   exit 1;
 fi

-cat $@ | awk '{ if ( $9 == "-" ) print $1 " sha1:" $6 " " $2 }' | sort
+cat $@ | awk '{ if ( $9 == "-" ) print $1 " sha1:" $6 " " $2 }' | sort -u
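To see what the change from 'sort' to 'sort -u' does in practice, here is a small shell sketch that runs the same awk filter over made-up input. The field positions come from the awk expression in the diff ($1 = URL key, $2 = date, $6 = digest, $9 must be "-"); the sample lines and the function names 'sample_cdx' and 'revisits_sketch' are hypothetical and do not represent real crawl data.

```shell
#!/bin/sh
# Hypothetical sample: space-separated CDX-like lines. Only fields 1, 2,
# 6 and 9 matter to the filter; the other values are placeholders. The
# first two lines are identical on purpose.
sample_cdx() {
  cat <<'EOF'
example.org/robots.txt 20080613194800 http://example.org/robots.txt text/plain 200 AGW5DJIEUBL67473477TDVBBGDZ37AEZ - - - foo.warc.gz
example.org/robots.txt 20080613194800 http://example.org/robots.txt text/plain 200 AGW5DJIEUBL67473477TDVBBGDZ37AEZ - - - foo.warc.gz
example.org/index.html 20080613194805 http://example.org/index.html text/html 200 4G3PAROKCYJNRGZIHJO5PVLZ724FX3GN - - 1234 foo.warc.gz
EOF
}

# Same filter as bin/revisits after r2398: keep lines whose ninth field
# is "-", print "URL sha1:digest date", and let 'sort -u' drop repeats.
# Reads the named files, or stdin when no file is given.
revisits_sketch() {
  cat "$@" | awk '{ if ( $9 == "-" ) print $1 " sha1:" $6 " " $2 }' | sort -u
}

# prints one line:
# example.org/robots.txt sha1:AGW5DJIEUBL67473477TDVBBGDZ37AEZ 20080613194800
sample_cdx | revisits_sketch
```

With plain 'sort' the robots.txt line would appear twice, so the "-u" keeps repeated revisit records from producing duplicate entries in the ".dup" file.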
From: <bi...@us...> - 2008-07-03 02:02:28
Revision: 2397
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2397&view=rev
Author:   binzino
Date:     2008-07-02 19:02:38 -0700 (Wed, 02 Jul 2008)

Log Message:
-----------
Updated with latest Nutch SVN revision NW 0.12 built against.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/archive/INSTALL.txt

Modified: trunk/archive-access/projects/nutchwax/archive/INSTALL.txt
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/INSTALL.txt 2008-07-03 02:01:46 UTC (rev 2396)
+++ trunk/archive-access/projects/nutchwax/archive/INSTALL.txt 2008-07-03 02:02:38 UTC (rev 2397)
@@ -1,6 +1,6 @@

 INSTALL.txt
-2008-06-02
+2008-07-02
 Aaron Binns

 This installation guide assumes the reader is already familiar with
@@ -46,11 +46,11 @@
 Nutch SVN trunk. The specific SVN revision that NutchWAX 0.12 is
 built against is:

-  650739
+  673464

 To checkout this revision of Nutch, use:

-  $ svn checkout -r 650739 http://svn.apache.org/repos/asf/lucene/nutch/trunk nutch
+  $ svn checkout -r 673464 http://svn.apache.org/repos/asf/lucene/nutch/trunk nutch
   $ cd nutch
From: <bi...@us...> - 2008-07-03 02:01:39
|
Revision: 2396 http://archive-access.svn.sourceforge.net/archive-access/?rev=2396&view=rev Author: binzino Date: 2008-07-02 19:01:46 -0700 (Wed, 02 Jul 2008) Log Message: ----------- Initial revision. Very rough draft. Added Paths: ----------- trunk/archive-access/projects/nutchwax/archive/README-dedup.txt Added: trunk/archive-access/projects/nutchwax/archive/README-dedup.txt =================================================================== --- trunk/archive-access/projects/nutchwax/archive/README-dedup.txt (rev 0) +++ trunk/archive-access/projects/nutchwax/archive/README-dedup.txt 2008-07-03 02:01:46 UTC (rev 2396) @@ -0,0 +1,697 @@ + +README-dedup.txt +2008-07-02 +Aaron Binns + +De-duplication and NutchWAX + +This document assumes that the reader is familiar with the topic of +de-duplication with regard to archiving web data. That said, let us +review what we mean by de-duplication in NutchWAX. + +When archive files (ARC/WARC) are written, the tool used to create +them may or may not prevent multiple copies of the same URL from being +written. Some archive file creation tools perform duplicate +prevention, but many do not. + +What NutchWAX has to contend with is the scenario where one or more +archive files that are imported and indexed have multiple copies of a +URL. + +Ideally, NutchWAX would only import and index one unique version of +the URL. If the same version of the URL was seen a second, third, +fourth, etc. time, then NutchWAX would simply update the existing +record in its search index by adding the subsequent crawl dates to it. +This way, if a URL was crawled 10 times and didn't change, there would +only be one entry in the search index for it, but with 10 crawl dates +associated with it. + +====================================================================== + +This sounds simple enough, but in practice the implementation is not +as straightforward as suggested by the above. 
+ +For one, Nutch's underlying Lucene search indexes are not easily +modified "in place". That is, updating an existing record by adding +an additional date to it is not easily accomplished via the Lucene +public APIs. The Lucene documentation informs us that records are not +modified in place, but rather are deleted and re-added with the +modified/new information. + +Doing a complete delete+re-add on a large Lucene database containing +possibly millions of records is a computationally expensive process. +Furthermore, since many fields in Nutch's Lucene indexes are not +stored, it is infeasible to delete and re-add them w/o data loss. + +Fortunately, using parallel Lucene indexes and the ParallelIndexReader +can help solve the problem. More on that later. + +====================================================================== + +Another challenge in handling duplicates is defining what makes for a +unique version of a URL. + +Most tools, Nutch included, use the URL as a unique identifier for a +page. Since most tools don't care about old versions of pages, +retaining only the latest version and using the URL to identify it is +sufficient. + +However, for archive data, we need to use more than just the URL to +identify a page, we need something that has the URL but also some +notion of the *version* of the page. + +For example, consider a page like + + http://www.cnn.com/index.html + +This page changes frequently. If it were crawled 10 times, once per +week, each crawl could capture a different version of the page. We +would have 10 different, unique versions. + +Now, if NutchWAX used *only* the URL as the unique identifier for the +page, there would be no way to distinguish the first one from the +second, from the third, etc. + +NutchWAX needs a unique identifier that has the URL and also some +notion of the *version* of the page. For that we use a digest of the +page's content. The digest is used as a version number of sorts. 
+Each version will have a different digest. So, if we need to find a +specific version of the page, we can use the URL combined with the +digest to uniquely identify it. + +Currently we use SHA-1 for digesting the content. + +Using URL+digest rather than just the URL as a unique page identifier +is conceptually simple, but does have some repercussions within Nutch. +Nutch assumes that the URL alone is a unique identifier and that +assumption is coded into the software in various ways. To use the +URL+digest instead, we had to work around some of those hard-coded +assumptions in various ways. More on that later. + +====================================================================== + +The next challenge is to know if a version of a URL (the URL+digest +described above) has already been imported and indexed so that we +don't import it again. + +To prevent the importing of multiple copies of the same version of a +page, we could get the URL+digest of the page to be imported, then +look in the existing Nutch index to see if we already have it. If we +do, do not import it; instead, add the crawl date to the existing +record in the search index. + +Now, the above describes two challenges: + + 1. Searching the existing index to see if there is an existing record + to be updated. + + 2. Updating an existing record. This was discussed above and we do + have a solution, which we'll describe in more detail later. + +The first doesn't seem challenging at first glance, and in theory it isn't. +However, in practice it is difficult because for a large deployment, +we usually have many Lucene indexes spread over many machines. It's +not as simple as opening up a single Lucene index on the local machine +and searching for a matching URL+digest. In one of the deployments at +the Internet Archive, we have 100s of Lucene indexes spread over 5 +machines. + +Now, we could use the Nutch web search rather than accessing the +Lucene indexes directly. 
That is, to find out if we have already +indexed a URL+digest, we could send an HTTP request to the Nutch +search server asking if the URL+digest is already in the index or not. + +Although this is a workable solution, performing a search for each and +every URL being imported would likely put too much strain on the +search server and would slow down the importing process. When some +import & index jobs process 100s of millions of documents and take +weeks to run, adding a 5-second HTTP request to each URL import is a +significant cost. + +What would be ideal is a centralized database of all the URLs +processed by NutchWAX. Ideally, this centralized database would also +be used by the archiver (e.g. Heritrix) to perform de-duplication +during a crawl; and also by the Wayback for storing historical +metadata. + +The de-duplication strategy described in this document utilizes the +Wayback tools and CDX files as the central URL database for performing +NutchWAX de-duplication. + +====================================================================== + +Review +------ + +Our de-duplication strategy for NutchWAX as described so far has three +key elements: + + o Use URL+digest as a unique identifier for a unique version of a page. + o Use ParallelIndexReader to provide index record modification/update. + o Use Wayback and CDX files as a central database of URL processing state. + +====================================================================== + +Using CDX files to detect duplicate pages in a set of archive files is +fortunately rather straightforward. + +CDX files are text files with one line for each and every page +(record) in an archive file. These CDX lines have three bits of data +we can use for detecting duplicate pages: + + o URL + o digest + o date + +NutchWAX provides a 'dedup-cdx' script that reads a CDX file and +produces a "duplicates" file containing the URL, digest and date of +each duplicate copy of a unique version of a URL in the CDX file. 
+ +For example, suppose we have a collection of 100 ARC files. In those +ARC files, the page + + http://www.example.org/index.html + +appears 10 times, but only 5 of those are different; the other 5 are +duplicate copies. Suppose we have + + Date Digest Content sample + 2007-10-01 abc123 Hello, welcome to my page. + 2007-10-02 abc123 Hello, welcome to my page. + 2007-10-03 def456 Sorry I haven't updated this in a while. + 2007-10-04 def456 Sorry I haven't updated this in a while. + 2007-10-05 abc123 Hello, welcome to my page. + 2007-10-06 abc123 Hello, welcome to my page. + 2007-10-07 ghi789 Hey, I finally updated this. + 2007-10-08 jkl012 Under construction. + 2007-10-09 jkl012 Under construction. + 2007-10-10 mno345 My homepage is great! + +Notice how we started with the "abc123" version, changed to the +"def456" version then reverted to the "abc123" version. In this +simple example, we have a webmaster who just can't make up his mind +on what to say. + +The point is that our CDX file will have lines of the form + + 20071001 abc123 example.org/index.html + 20071002 abc123 example.org/index.html + 20071003 def456 example.org/index.html + 20071004 def456 example.org/index.html + 20071005 abc123 example.org/index.html + 20071006 abc123 example.org/index.html + 20071007 ghi789 example.org/index.html + 20071008 jkl012 example.org/index.html + 20071009 jkl012 example.org/index.html + 20071010 mno345 example.org/index.html + +It's easy to find the duplicate lines in the CDX file. + +The NutchWAX 'dedup-cdx' script will extract the duplicates, writing out all +the duplicate lines, except for the first. For the above, the output is + + 20071002 abc123 example.org/index.html + 20071004 def456 example.org/index.html + 20071005 abc123 example.org/index.html + 20071006 abc123 example.org/index.html + 20071009 jkl012 example.org/index.html + +Only the 2nd, 3rd, etc. instances of a URL+digest line are printed. 
+The first instance of "abc123" is not printed, nor is "ghi789" since it +has no duplicates. + +Now what do we do with these? + +When importing archive files with NutchWAX, we pass it this list of +duplicates, which it uses as an exclusion list. Any URL+digest+date +on the list is excluded from import; all others pass through. + +Looking at our CDX sample again + + Date Digest URL Import? + 20071001 abc123 example.org/index.html Y + 20071002 abc123 example.org/index.html N + 20071003 def456 example.org/index.html Y + 20071004 def456 example.org/index.html N + 20071005 abc123 example.org/index.html N + 20071006 abc123 example.org/index.html N + 20071007 ghi789 example.org/index.html Y + 20071008 jkl012 example.org/index.html Y + 20071009 jkl012 example.org/index.html N + 20071010 mno345 example.org/index.html Y + +Excellent, we've just prevented duplicate copies of the same version +of a page from being imported! + +====================================================================== + +But what about the fact that we crawled the page on 5 dates and it +didn't change? We want to record that somewhere, right? + +Yes. + +NutchWAX provides an "add-dates" command (in the 'nutchwax' script) +for adding dates to an existing index by creating a parallel index for +it. + +Using our "add-dates" command, we can add those crawl dates to the +index so that each unique version of the page will have all the crawl +dates associated with it. For our example above, the result is: + + Date Digest URL + 20071001, abc123 example.org/index.html + 20071002, + 20071005, + 20071006 + + 20071003, def456 example.org/index.html + 20071004 + + 20071007 ghi789 example.org/index.html + + 20071008, jkl012 example.org/index.html + 20071009 + + 20071010 mno345 example.org/index.html + +Voila! + +====================================================================== + +Recap +----- + +By using CDX files and the NutchWAX tools we are able to de-duplicate +during import. 
+ +For example, for a list of arcs + + $ wayback/bin/arc-indexer foo.arc.gz > foo.cdx + $ nutchwax/bin/dedup-cdx foo.cdx > foo.dup + $ echo "foo.arc.gz" > manifest + $ nutchwax/bin/nutchwax import -e foo.dup manifest + $ nutchwax/bin/nutch updatedb crawldb -dir segments + $ nutchwax/bin/nutch invertlinks linkdb -dir segments + $ nutchwax/bin/nutch index indexes crawldb linkdb segments/* + $ nutchwax/bin/nutchwax add-dates indexes/part-00000 indexes/part-00000 indexes/dates foo.dup + +The important steps are the creation of the "foo.dup" file +containing the duplicate records, the use of that file to exclude +duplicates during import, and the use of that same file for adding the +crawl dates to the index. + +====================================================================== + +Parallel Indexes + +Since updating an existing Lucene index is not feasible, we "virtually +update" an index by using a modified version of the Lucene +ParallelIndexReader. + +The basic idea is to take the metadata field you want to update and +put it in a parallel index. In DB table-speak, this would be moving a +column to a separate table and using the record index/position as the +foreign key to join the two tables. + +The NutchWAX 'add-dates' command does this for the date metadata +field. It will take an existing index and create a parallel index, +adding dates listed in an external file. + +The command-line syntax is of the form: + + nutchwax add-dates <key index> <source indices>... <dest index> <dates> + +Suppose we have an index created by the Nutch "index" command and we also have +a list of crawl dates we want to add to it. The index is in a sub-directory +"indexes/part-00000" and the dates are in a file "dates.txt" + + $ nutchwax add-dates indexes/part-00000 indexes/part-00000 indexes/dates dates.txt + +In this case our key index and source index are the same, since we +want to preserve any dates in the original index and add the new dates +to them. 
+But let's suppose we've already done this once, but then have even more +dates to add, in a file "dates2.txt" + + $ nutchwax add-dates indexes/part-00000 indexes/dates indexes/dates2 dates2.txt + $ rm -r indexes/dates + $ mv indexes/dates2 indexes/dates + +In this case, we copy the values from the existing "dates" index, +adding the new dates to them. Afterwards, we replace the old "dates" +index with the new, fully up-to-date one. + +---------------------------------------------------------------------- + +Using Parallel Index + +This is all well and good, but how do we make Nutch(WAX) use these +parallel indices? + +NutchWAX provides a NutchWaxBean, which extends NutchBean by adding +support for parallel indices. The NutchWaxBean follows the NutchBean +conventions by looking for a directory containing the indices in a +directory named "crawl" or as specified in the "searcher.dir" +configuration property. + +However, rather than looking for indices in "index" and "indexes", +NutchWaxBean looks in "pindexes". If that directory is found, it +iterates through all sub-directories and expects each to contain a set +of parallel indices within it. A sample directory structure might +look like: + + crawl/pindexes/foo + dates + main + bar + dates + main + baz + dates + main + +where "dates" and "main" are parallel indexes. + +---------------------------------------------------------------------- + +This is all fine and good when calling the NutchWaxBean from +the command-line, but what about in a webapp? + +The NutchBean has a static method for self-initialization upon receipt +of an application startup message from the servlet container. We have +a similar hook in NutchWaxBean, which is run after the NutchBean is +initialized. 
+ +The NutchWaxBean hook must be added to the Nutch web.xml file: + + <listener> + <listener-class>org.apache.nutch.searcher.NutchBean$NutchBeanConstructor</listener-class> + <listener-class>org.archive.nutchwax.NutchWaxBean$NutchWaxBeanConstructor</listener-class> + </listener> + +If you don't do this, then the NutchBean won't use the +ParallelIndexReader and your parallel indices won't be used. + +====================================================================== + +WARC + revisit records + +The WARC format supports revisit records. Revisit records are +typically written by WARC writing tools (such as Heritrix) when a URL +is visited a second, third, etc. time and the content hasn't changed. + +Taking our example from above, whenever the page is crawled and hasn't +changed, a revisit record would be written to the WARC file. + +For de-duplication, WARC files with revisit records are nice because +the crawler is doing the duplicate detection for us. Rather than write +a duplicate copy of the page, it writes a record that has + + URL + digest + date + +of the visit. Now, if you look at the output of 'dedup-cdx' you'll +notice a similarity. + +In fact, WARC revisit records can be used to create a list of additional crawl +dates without having to actually perform the full CDX de-duplication +(which can be computationally expensive). + +A CDX file generated from a WARC will have the 9th field set to "-" +for revisit records. We can use this to easily find those lines and +generate a list of crawl dates for a URL+digest. + +NutchWAX comes with a script called 'revisits' that does precisely +that. It takes CDX files as input, finds the lines for the revisit +records, then emits them in a form that can be used by the 'add-dates' +command. 
+ +For example + + $ wayback/bin/warc-indexer foo.warc.gz > foo.cdx + $ nutchwax/bin/revisits foo.cdx > foo.dup + $ nutchwax/bin/nutchwax add-dates indexes/part-00000 indexes/part-00000 indexes/dates foo.dup + +Since the WARC files are known not to contain duplicates, we don't +have to de-dup them in order to provide the importing process with an +exclusion list. However, we still use the 'revisits' script to +generate a list of crawl dates for the revisit records so we can add +them to the parallel index. + +====================================================================== + +Doesn't NutchWAX (0.10) already handle duplicates? + +All this business about URL vs. URL+digest as a unique identifier for +a version of a page may seem surprising to some. Many users of +NutchWAX have been importing and indexing ARC files and haven't seen a +situation where a newer version of a URL over-writes an older one. + +That is true: in certain circumstances different versions of a page +will peacefully co-exist in a Nutch deployment. + +* The key is in the grouping of ARC files for importing. * + +When I said that by default, only one version of a URL can live in a +Nutch index I was being a bit general. Actually, only one version of +a URL can live in a Nutch *segment*. + +When a batch of ARC files is imported, a new segment is created. If +you are lucky, then ARC files containing duplicates will be imported +in different batches and the different versions of the same URL will +each live in a separate segment. + +Consider the most extreme case, where a NutchWAX user imports ARC +files one-at-a-time. The result would be a Nutch segment for each ARC +file. This would be nice because if there were 5 different versions +of a URL in 5 different ARC files, then there would be 5 segments, each +containing one of the 5 versions of the URL. No conflicts among +versions. 
+ +However, using a one-segment-per-ARC plan is not practical since most +NutchWAX users have 1000s, 10000s, 100000s or more ARC files. Having +100000 segment directories on disk is simply not practical. + +Most NutchWAX users import ARC files in groups that either correspond +to distinct crawls, or groups that are sized according to memory +and/or CPU limits. + +We can't rely on good fortune to provide us with ARC file batches that +don't have multiple versions of a URL. + +---------------------------------------------------------------------- + +The worst-case scenario is if all the ARC files for a single +collection are imported in one batch. In this case, they would all go +into a single Nutch segment and only 1 version of each URL would be +imported and indexed. All other versions would be discarded. + +---------------------------------------------------------------------- + +If Nutch does this "automatic deduplication" by URL within each +segment, why does it have a "dedup" command? + +That command is designed to operate on a set of segment indexes. The +segments are deduped internally automatically, the "dedup" command +removes duplicates across segments. + +---------------------------------------------------------------------- + +The fact that a later version of a URL replaced an earlier one is not +always easy to notice just by performing searches against the +resulting index. One would have to know the contents of the pages +such that a query would be able to find one specific version -- or not +if it wasn't there. + +And especially with large collections, if a version of a page is +missing from the search index, it could easily go unnoticed for quite +some time. + + +One way to test an existing index is to use CDX files in conjunction +with the NutchWAX 'dumpindex' command. + + o Generate a list of duplicate records from all CDX for the entire + collection. + + o Using the Wayback, identify a URL that has many different + versions. 
+Choose a URL that will be indexed for full-text search, + such as an HTML, text or PDF document; not an image. + + o Dump the entire Lucene index with NutchWAX 'dumpindex' and + find all the records for the URL. + +Chances are some of the versions of the URL will be in the index but +not all. + + +====================================================================== + +Is this all necessary? + +No. If you don't want to de-duplicate ARC files during import and +indexing you don't have to. + +You can continue to perform the import, update, invert and index steps +like before and just live with the consequences of not de-duplicating. + +If you don't de-duplicate, you will just have redundant records in +your search index. This means that you'll have a search result hit +for each copy of the page in the index. If you imported the same page +10 times, then a search query that finds that page will find all 10 +copies and return 10 identical search results -- one for each copy. + + +In addition, the de-duplication feature and the add-dates feature with +the parallel index are also independent of each other. You can +de-duplicate but decide to not use parallel indices to add dates to +the records in the Lucene index. + +In this case, you would only have 1 date associated with each record: +the date the record was imported. Any information about subsequent +revisits to the same version of the page would not be in the search +index. + + +Also, if you have a system of your own devising that keeps track of +duplicates in archive files, have it output the duplicates file in +the same form as the 'dedup-cdx' script. The import command doesn't +care where the exclusion list comes from, just that it has the correct +format. 
+ + +====================================================================== + +Implementation notes on URL+digest vs URL + +Although the use of 'dedup-cdx' and associated tools for de-duping and +managing revisit dates is entirely optional and has no impact on +Nutch(WAX) if not used, one area of change in NutchWAX that does +impact Nutch is changing the unique 'key' for a document from URL to +URL+digest. + +Without this change, you cannot have different versions of the same +URL in a Nutch segment. Such a limitation is simply incompatible with +NutchWAX and archive files. This change is not optional. + +The core of the change from URL to URL+digest happens in the NutchWAX +Importer class. In that class the segment is created and all the +document-related information is added to it. When a document is added +to a segment, it is written to a Hadoop MapFile. + +Hadoop MapFiles act like Java Maps. They are essentially key/value +pairs. In Nutch, the key is the URL and the value is a collection of +information for that URL. + +In the Importer.java source code, where we add the information to the +segment, we use + + <URL> <digest> + +as the key, such as + + "http://www.example.org/index.html sha1:HJG5ZWG3MQQKHIN43BXJY3FUWP7WTU43" + +instead of simply + + "http://www.example.org/index.html" + +We also stuff the URL into the document in a metadata field titled +"url", which we use later in our indexing filter plugin. + +This is simple enough in the Importer code; it does, however, have a few +consequences elsewhere in Nutch. The places where it affects Nutch +are where Nutch assumes + + URL == key + +There are two places in particular where this assumption causes a +problem because the URL is no longer the key. + +1. 
BasicIndexingFilter (index-basic plugin) + +In the call from Indexer.java to BasicIndexingFilter.java, the key is +treated as the URL: + + Indexer.java: + + 249: doc = this.filters.filter(doc, parse, key, fetchDatum, inlinks); + + BasicIndexingFilter.java: + + 55 public Document filter(Document doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) + 56 throws IndexingException { + 57 + 58 Text reprUrl = (Text) datum.getMetaData().get(Nutch.WRITABLE_REPR_URL_KEY); + 59 String reprUrlString = reprUrl != null ? reprUrl.toString() : null; + 60 String urlString = url.toString(); + +The Indexer passes the key; the BasicIndexingFilter treats it as the +URL. + +Not only that, but the BasicIndexingFilter goes on to insert that +urlString into the Lucene document in the "url" field. + +We work around this by configuring our NutchWAX indexing filter plugin +to run *after* the BasicIndexingFilter and over-write the "url" field +with the correct URL. + +We do this by setting the Nutch configuration property (in +nutch-site.xml for example) with + + <property> + <name>indexingfilter.order</name> + <value> + org.apache.nutch.indexer.basic.BasicIndexingFilter + org.archive.nutchwax.index.ConfigurableIndexingFilter + </value> + </property> + +Without this property, the indexing filters are run in an arbitrary +order. We need our ConfigurableIndexingFilter to run after the +BasicIndexingFilter. + +The configuration for the ConfigurableIndexingFilter specifies that +the "url" field will be filled with the value from the "url" metadata +field (which we set in Importer.java, remember) and over-write any +previous value. + + +2. FetchedSegments + +This class has a lovely little routine called "getUrl" which is used +*not* to get the URL per se; rather, it gets the URL from a Lucene +document /in order to use it as a document key/. 
+ +Let's take a look: + + private Text getUrl(HitDetails details) { + String url = details.getValue("orig"); + if (StringUtils.isBlank(url)) { + url = details.getValue("url"); + } + return new Text(url); + } + +The problem is that we've stored the true URL in the "url" field, so +the value returned is the true URL. Now when the code that calls this +method tries to use it as the key, it can't find the document since +the key is "URL digest". + +Since this method is private and this code is rather deep inside of +Nutch, over-riding it with a subclass isn't feasible. + +But, if you notice, getUrl does come with a little oddity where it +first consults "orig" before "url". We don't use "orig" for anything, +so in our Importer, we set the "orig" metadata field to be the key. + +This way, when getUrl calls + + String url = details.getValue("orig"); + +the key is found and everything is happy. + +Yes, it's a hack. No, I'm not ashamed. + +====================================================================== + This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
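The duplicate-detection rule that README-dedup.txt attributes to 'dedup-cdx' (print every URL+digest line except the first one seen) can be sketched in a few lines of awk. This is only an illustration and not the real nutchwax/bin/dedup-cdx script, which is not shown in this commit; the function name find_dups is hypothetical, and the input is assumed to be the simplified "date digest url" form used in the README sample, already in date order, rather than real CDX.

```shell
# Hypothetical sketch, NOT the real nutchwax/bin/dedup-cdx script.
# Prints each line whose URL+digest pair has already been seen,
# so the first capture of every version is left out of the list.
find_dups() {
  awk '{ key = $3 " " $2; if (seen[key]++) print }' "$@"
}

# The ten sample captures from the README.
find_dups <<'EOF'
20071001 abc123 example.org/index.html
20071002 abc123 example.org/index.html
20071003 def456 example.org/index.html
20071004 def456 example.org/index.html
20071005 abc123 example.org/index.html
20071006 abc123 example.org/index.html
20071007 ghi789 example.org/index.html
20071008 jkl012 example.org/index.html
20071009 jkl012 example.org/index.html
20071010 mno345 example.org/index.html
EOF
```

On this input the sketch prints the same five lines the README lists as duplicates (20071002, 20071004, 20071005, 20071006 and 20071009). Because the key is URL plus digest, the "abc123" captures on 20071005 and 20071006 are still recognised as duplicates even though "def456" was crawled in between.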
Revision: 2395 http://archive-access.svn.sourceforge.net/archive-access/?rev=2395&view=rev Author: bradtofel Date: 2008-07-01 18:13:24 -0700 (Tue, 01 Jul 2008) Log Message: ----------- FEATURE: added numSeen() Modified Paths: -------------- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/filters/WindowEndFilter.java Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/filters/WindowEndFilter.java =================================================================== --- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/filters/WindowEndFilter.java 2008-07-02 01:02:08 UTC (rev 2394) +++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/filters/WindowEndFilter.java 2008-07-02 01:13:24 UTC (rev 2395) @@ -48,6 +48,9 @@ public int getNumReturned() { return numReturned; } + public int getNumSeen() { + return numSeen; + } /* (non-Javadoc) * @see org.archive.wayback.util.ObjectFilter#filterObject(java.lang.Object) */ This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
Revision: 2394 http://archive-access.svn.sourceforge.net/archive-access/?rev=2394&view=rev Author: bradtofel Date: 2008-07-01 18:02:08 -0700 (Tue, 01 Jul 2008) Log Message: ----------- BUGFIX(unreported): was not setting number of results requested Modified Paths: -------------- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/LocalResourceIndex.java Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/LocalResourceIndex.java =================================================================== --- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/LocalResourceIndex.java 2008-07-02 00:35:41 UTC (rev 2393) +++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/LocalResourceIndex.java 2008-07-02 01:02:08 UTC (rev 2394) @@ -478,7 +478,7 @@ } public void annotateResults(SearchResults results) { results.setFirstReturned(startResult); - results.setReturnedCount(resultsPerPage); + results.setNumRequested(resultsPerPage); // how many went by the filters: results.setMatchingCount(startFilter.getNumSeen()); This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <bra...@us...> - 2008-07-02 00:35:32
|
Revision: 2393 http://archive-access.svn.sourceforge.net/archive-access/?rev=2393&view=rev Author: bradtofel Date: 2008-07-01 17:35:41 -0700 (Tue, 01 Jul 2008) Log Message: ----------- BUGFIX(unreported) SearchResult count methods now return long values not int values. Modified Paths: -------------- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/HTMLCaptureResults.jsp trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/HTMLUrlResults.jsp Modified: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/HTMLCaptureResults.jsp =================================================================== --- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/HTMLCaptureResults.jsp 2008-07-02 00:33:10 UTC (rev 2392) +++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/HTMLCaptureResults.jsp 2008-07-02 00:35:41 UTC (rev 2393) @@ -16,7 +16,7 @@ String searchString = results.getSearchUrl(); - int resultCount = results.getResultsReturned(); +long resultCount = results.getResultsReturned(); Timestamp searchStartTs = results.getStartTimestamp(); Timestamp searchEndTs = results.getEndTimestamp(); Modified: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/HTMLUrlResults.jsp =================================================================== --- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/HTMLUrlResults.jsp 2008-07-02 00:33:10 UTC (rev 2392) +++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/HTMLUrlResults.jsp 2008-07-02 00:35:41 UTC (rev 2393) @@ -25,11 +25,11 @@ // new PathQuerySearchResultPartitioner(results.getResults(), // results.getURIConverter()); -int firstResult = results.getFirstResult(); -int lastResult = results.getLastResult(); -int resultCount = results.getResultsMatching(); +long firstResult = results.getFirstResult(); +long lastResult = results.getLastResult(); +long 
resultCount = results.getResultsMatching(); -int totalCaptures = results.getResultsMatching(); +long totalCaptures = results.getResultsMatching(); %> <%= fmt.format("PathPrefixQuery.showingResults",firstResult,lastResult, |
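Note on the int to long change in rev 2393: in Java, assigning the result of a method that returns long to an int local does not compile without an explicit cast, and an explicit cast silently wraps large values. The sketch below illustrates both points. The class and method names (ResultCountWidth, resultsMatching, narrowed) are hypothetical stand-ins and are not part of the Wayback code base.

```java
public class ResultCountWidth {
    // Hypothetical stand-in for a count accessor that now returns long.
    static long resultsMatching() {
        return 3_000_000_000L; // larger than Integer.MAX_VALUE (2,147,483,647)
    }

    // An explicit narrowing cast keeps only the low 32 bits, so a large count wraps.
    static int narrowed(long count) {
        return (int) count;
    }

    public static void main(String[] args) {
        // int bad = resultsMatching();  // would not compile: possible lossy conversion
        long count = resultsMatching();  // matches the declarations in the patched JSPs
        System.out.println(count);
        System.out.println(narrowed(count)); // wrapped, negative value
    }
}
```

Declaring the locals as long, as the patch does, avoids both the compile error and the wraparound.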
From: <bra...@us...> - 2008-07-02 00:33:11
|
Revision: 2392 http://archive-access.svn.sourceforge.net/archive-access/?rev=2392&view=rev Author: bradtofel Date: 2008-07-01 17:33:10 -0700 (Tue, 01 Jul 2008) Log Message: ----------- German translation, thanks Andreas! Added Paths: ----------- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/classes/WaybackUI_de.properties Added: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/classes/WaybackUI_de.properties =================================================================== --- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/classes/WaybackUI_de.properties (rev 0) +++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/WEB-INF/classes/WaybackUI_de.properties 2008-07-02 00:33:10 UTC (rev 2392) @@ -0,0 +1,114 @@ +Exception.wayback.title=Wayback Fehler +Exception.wayback.message=Ein unbekannter Fehler ist aufgetreten. {0} +Exception.accessControl.title=Zugriffsfehler +Exception.accessControl.message=Der Zugriff auf den Inhalt ist gesperrt. {0} +Exception.authenticationControl.title=Authentisierungsfehler +Exception.authenticationControl.message=Dieser Inhalt ist f\xFCr den aktuellen Benutzer oder vom aktuellen Ort nicht möglich. {0} +Exception.badContent.title=Inhaltsfehler +Exception.badContent.message=Der archivierte Inhalt konnte nicht wiedergegeben werden. +Exception.badQuery.title=Anfragefehler +Exception.badQuery.message=Für die Anfrage fehlen Informationen oder konnte vom Server nicht verstanden werden. {0} +Exception.betterRequest.title=Anfragefehler +Exception.betterRequest.message=Die gemachte Anfrage kann durch einen andere Anfrage besser ausgedrückt werden. {0} +Exception.configuration.title=Konfigurationsfehler +Exception.configuration.message=Das Service wurde nicht korrekt konfiguriert. 
{0} +Exception.resourceIndexNotAvailable.title=Der Ressourcen Index ist nicht verfübar Exception +Exception.resourceIndexNotAvailable.message=Der, für Anfrage notwendige Ressourcen Index ist zwischenzeitlich nicht verfügbar. Bitte versuchen Sie es später nocheinmal. +Exception.resourceNotAvailable.title=Ressource ist nicht verfügbar +Exception.resourceNotAvailable.message=Die angeforderte Ressource ist zwischenzeitlich nicht verfügbar. Bitte versuchen Sie es später nocheinmal. +Exception.resourceNotInArchive.title=Ressource ist nicht im Archiv +Exception.resourceNotInArchive.message=Die angeforderte Ressource ist nicht im Archiv. + +UIGlobal.pageTitle=Internet Archive Wayback Machine +UIGlobal.helpLink=Hilfe +UIGlobal.enterWebAddress=Internet Adresse: +UIGlobal.selectYearAll=Alle +UIGlobal.urlSearchButton=Suche +UIGlobal.advancedSearchLink=Erweiterte Suche +UIGlobal.homeLink=Home +UIGlobal.indexPage=Das ist der neue Wayback Machine Prototyp. Jede URL, die in den ARC Dateien verfügbar ist, kann oben gesucht werden. +UIGlobal.helpPage=Bitte beziehen sie sich auf <a href="{0}">Wayback FAQ</a>. 
+ +MetaReplay.title=Document Metadata +MetaReplay.HTTPHeaders=HTTP Headers +MetaReplay.originalURL=Original URL +MetaReplay.URLKey=URL Schlüssel +MetaReplay.captureDate=Speicherdatum +MetaReplay.captureDateDisplay={0,date,dd.MM.yyyy HH:mm:ss} +MetaReplay.archiveID=Archive ID +MetaReplay.MIMEType=Mime Type +MetaReplay.digest=Digest + +TimelineView.viewingVersion=Anzeige der Version {0,number,integer} von {1,number,integer} +TimelineView.viewingVersionDate={0,date,dd.MM.yyyy HH:mm:ss} +TimelineView.timeRange=Zeitraum +TimelineView.timeRange.years=Jahre +TimelineView.timeRange.twomonths=Monate +TimelineView.timeRange.months=Monate +TimelineView.timeRange.days=Tage +TimelineView.timeRange.hours=Stunden +TimelineView.timeRange.unknown=unbekannt +TimelineView.timeRange.auto=Auto({0}) +TimelineView.metaDataCheck=Metadata: +TimelineView.markDateTitle={0,date,dd.MM.yyyy HH:mm:ss} +TimelineView.firstVersionTitle=Erste Version ({0,date,dd.MM.yyyy HH:mm:ss}) +TimelineView.prevVersionTitle=Vorherige Version ({0,date,dd.MM.yyyy HH:mm:ss}) +TimelineView.nextVersionTitle=Nächste Version ({0,date,dd.MM.yyyy HH:mm:ss}) +TimelineView.lastVersionTitle=Letzte Version ({0,date,dd.MM.yyyy HH:mm:ss}) +TimelineView.frameSetTitle=WB-Zeitstrahl +TimelineView.frameSetNoFramesMessage=Ein Browser der Frames unterstützt wird f\xFCr die Anzeige benötigt. + + +ReplayView.banner=Wayback - externe Links, Formulare und Suchabfragen werden f\xFCr diese Kollektion nicht funktionieren. 
Url: {0} time: {1,date,dd.MM.yyyy HH:mm:ss} +ReplayView.bannerHideLink=[versteckt] + +PathQuery.resultsSummary={0,number,integer} Resultate für {1} +PathQuery.resultRange=zwischen {0,date,dd.MM.yyyy} und {1,date,dd.MM.yyyy} +PathQuery.newVersionIndicator=(neue Version) +PathQuery.redirectIndicator=(redirect) +PathQuery.classicResultLinkText={0,date,dd.MM.yyyy} + +PathPrefixQuery.showingResults=Anzeige von {0,number,integer} bis {1,number,integer} von {2,number,integer} Resultaten für {3} +PathPrefixQuery.unchangedIndicator=unverändert + +PathQueryClassic.searchedFor=Suche nach <a href="{0}"><b>{0}</b></a> +PathQueryClassic.searchResults=Suchergebnis für {0,date,dd.MM.yyyy} - {1,date,dd.MM.yyyy} +PathQueryClassic.resultsSummary={0,choice,0#0 Treffer|1#1 Treffer|1<{0,number,integer} Treffer} +PathQueryClassic.versionsSummary={0,choice,0#(0 Versionen)|1#(1 Version)|1<({0,number,integer} Versionen)} + + +# 0 = number of unique versions of a page +PathPrefixQuery.versionCount={0,choice,1#1 Version|1<{0,number,integer} Versionen} + +# shown when only a single capture of an URL is found in the index: +# 0 = Date of capture +PathPrefixQuery.singleCaptureDate=1 Seite von {0,date,dd.MM.yyyy} + +# shown when multiple captures of an URL are found in the index: +# 0 = number of captures +# 1 = Date of first capture +# 2 = Date of last capture +PathPrefixQuery.multiCaptureDate={0,choice,1#1 Seite|1<{0,number,integer} Seiten} zwischen {1,date,dd.MM.yyyy} und {2,date,dd.MM.yyyy} + +ResultPartition.columnSummary={0,choice,0#0 Seiten|1#1 Seite|1<{0,number,integer} Seiten} +ResultPartitions.day={0,date,d.M.} +ResultPartitions.hour={0,date,h a} +ResultPartitions.month={0,date,M/yyyy} +ResultPartitions.twoMonth={0,date,M/yyyy} - {1,date,M/yyyy} +ResultPartitions.week={0,date,d.M.} - {1,date,d.M.} +ResultPartitions.year={0,date,yyyy} + +ReplayView.javaScriptComment=\ +// DATEI ARCHIVIERT AM {0,date,dd.MM.yyyy HH:mm:ss} UND EMPFANGEN VOM\n\ +// INTERNET ARCHIVE AM {1,date,dd.MM.yyyy 
HH:mm:ss}.\n\ +// JAVASCRIPT HINZUGEFÜGT VON WAYBACK MACHINE, COPYRIGHT INTERNET ARCHIVE.\n\ +//\n\ +// JEDER ANDERE INHALT IST EBENSO GESCHÜTZT DURCH COPYRIGHT (17 U.S.C.\n\ +// SECTION 108(a)(3)).\n\ +\n + +AdvancedSearch.url=URL: +AdvancedSearch.exactDate=Genaues Datum: +AdvancedSearch.earliestDate=Frühestes Datum: +AdvancedSearch.latestDate=Spätestes Datum: +AdvancedSearch.submitButton=Suche |
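Note on the choice patterns in WaybackUI_de.properties: entries such as PathQueryClassic.resultsSummary use the java.text.MessageFormat "choice" syntax to select a singular or plural form. The sketch below runs that one pattern through MessageFormat directly. It assumes that Wayback's StringFormatter ultimately delegates to MessageFormat, which this commit does not show, and the class name ChoicePatternDemo is made up for illustration.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class ChoicePatternDemo {
    // Pattern copied from PathQueryClassic.resultsSummary in the properties file.
    static final String RESULTS_SUMMARY =
        "{0,choice,0#0 Treffer|1#1 Treffer|1<{0,number,integer} Treffer}";

    // Formats a hit count. The choice branch for values above 1 contains a
    // nested {0,number,integer}, which MessageFormat formats recursively.
    static String summary(long hits) {
        MessageFormat mf = new MessageFormat(RESULTS_SUMMARY, Locale.GERMANY);
        return mf.format(new Object[] { hits });
    }

    public static void main(String[] args) {
        System.out.println(summary(0)); // 0 Treffer
        System.out.println(summary(1)); // 1 Treffer
        System.out.println(summary(5)); // 5 Treffer
    }
}
```

Because the argument is consumed as a number either way, the same pattern works whether the JSP passes an int or a long count.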
From: <bra...@us...> - 2008-07-02 00:31:36
|
Revision: 2391 http://archive-access.svn.sourceforge.net/archive-access/?rev=2391&view=rev Author: bradtofel Date: 2008-07-01 17:31:45 -0700 (Tue, 01 Jul 2008) Log Message: ----------- REMOVED: replaced with /query/(HTML|XML)(Url|Capture)Results.jsp Removed Paths: ------------- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/HTMLResults.jsp trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/XMLResults.jsp Deleted: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/HTMLResults.jsp =================================================================== --- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/HTMLResults.jsp 2008-07-02 00:30:47 UTC (rev 2390) +++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/HTMLResults.jsp 2008-07-02 00:31:45 UTC (rev 2391) @@ -1,194 +0,0 @@ -<%@ page language="java" pageEncoding="utf-8" contentType="text/html;charset=utf-8"%> -<%@ page import="java.util.Iterator" %> -<%@ page import="java.util.ArrayList" %> -<%@ page import="java.util.Date" %> -<%@ page import="org.archive.wayback.WaybackConstants" %> -<%@ page import="org.archive.wayback.core.SearchResult" %> -<%@ page import="org.archive.wayback.core.Timestamp" %> -<%@ page import="org.archive.wayback.core.UIResults" %> -<%@ page import="org.archive.wayback.resourceindex.filters.CaptureToUrlResultFilter" %> - -<%@ page import="org.archive.wayback.query.UIQueryResults" %> -<%@ page import="org.archive.wayback.util.StringFormatter" %> -<jsp:include page="/template/UI-header.jsp" flush="true" /> -<% - -UIQueryResults results = (UIQueryResults) UIResults.getFromRequest(request); -StringFormatter fmt = results.getFormatter(); - -String searchString = results.getSearchUrl(); - - -if(results.isCaptureResults()) { - - int resultCount = results.getResultsReturned(); - - Timestamp searchStartTs = results.getStartTimestamp(); - Timestamp searchEndTs = results.getEndTimestamp(); - 
Date searchStartDate = searchStartTs.getDate(); - Date searchEndDate = searchEndTs.getDate(); - - Iterator itr = results.resultsIterator(); - %> - <%= fmt.format("PathQuery.resultsSummary",resultCount,searchString) %> - <br></br> - <%= fmt.format("PathQuery.resultRange",searchStartDate,searchEndDate) %> - <hr></hr> - <% - boolean first = false; - String lastMD5 = null; - while(itr.hasNext()) { - SearchResult result = (SearchResult) itr.next(); - - String url = result.get(WaybackConstants.RESULT_URL); - - String prettyDate = result.get(WaybackConstants.RESULT_CAPTURE_DATE); - String origHost = result.get(WaybackConstants.RESULT_ORIG_HOST); - String MD5 = result.get(WaybackConstants.RESULT_MD5_DIGEST); - String redirectFlag = (0 == result.get( - WaybackConstants.RESULT_REDIRECT_URL).compareTo("-")) - ? "" : fmt.format("PathQuery.redirectIndicator"); - String httpResponse = result.get(WaybackConstants.RESULT_HTTP_CODE); - String mimeType = result.get(WaybackConstants.RESULT_MIME_TYPE); - - String arcFile = result.get(WaybackConstants.RESULT_ARC_FILE); - String arcOffset = result.get(WaybackConstants.RESULT_OFFSET); - - String replayUrl = results.resultToReplayUrl(result); - - boolean updated = false; - if(lastMD5 == null) { - lastMD5 = MD5; - updated = true; - } else if(0 != lastMD5.compareTo(MD5)) { - updated = true; - lastMD5 = MD5; - } - if(updated) { - %> - <a href="<%= replayUrl %>"><%= prettyDate %></a> - <span style="color:black;"><%= origHost %></span> - <span style="color:gray;"><%= httpResponse %></span> - <span style="color:brown;"><%= mimeType %></span> - <!-- - <span style="color:red;"><%= arcFile %></span> - <span style="color:red;"><%= arcOffset %></span> - --> - <%= redirectFlag %> - <%= fmt.format("PathQuery.newVersionIndicator") %> - - <br/> - <% - } else { - %> - <a href="<%= replayUrl %>"><%= prettyDate %></a> - <span style="color:green;"><%= origHost %></span> - <!-- - <span style="color:red;"><%= arcFile %></span> - <span style="color:red;"><%= 
arcOffset %></span> - --> - <br/> - <% - } - } - -} else if(results.isUrlResults()) { - - - - Date searchStartDate = results.getStartTimestamp().getDate(); - Date searchEndDate = results.getEndTimestamp().getDate(); - -// PathQuerySearchResultPartitioner partitioner = -// new PathQuerySearchResultPartitioner(results.getResults(), -// results.getURIConverter()); - - int firstResult = results.getFirstResult(); - int lastResult = results.getLastResult(); - int resultCount = results.getResultsMatching(); - - int totalCaptures = results.getResultsMatching(); - - %> - <%= fmt.format("PathPrefixQuery.showingResults",firstResult,lastResult, - resultCount,searchString) %> - <br/> - - <hr></hr> - <% - Iterator itr = results.resultsIterator(); - while(itr.hasNext()) { - SearchResult result = (SearchResult) itr.next(); - - String url = result.get(CaptureToUrlResultFilter.RESULT_ORIGINAL_URL); - String urlKey = result.get(CaptureToUrlResultFilter.RESULT_URL); - String firstDateTS = result.get(CaptureToUrlResultFilter.RESULT_FIRST_CAPTURE); - String lastDateTS = result.get(CaptureToUrlResultFilter.RESULT_LAST_CAPTURE); - int numCaptures = Integer.valueOf(result.get(CaptureToUrlResultFilter.RESULT_NUM_CAPTURES)); - int numVersions = Integer.valueOf(result.get(CaptureToUrlResultFilter.RESULT_NUM_VERSIONS)); - - Date firstDate = results.timestampToDate(firstDateTS); - Date lastDate = results.timestampToDate(lastDateTS); - - if(numCaptures == 1) { - String anchor = results.makeReplayUrl(url,firstDateTS); - %> - <a href="<%= anchor %>"> - <%= url %> - </a> - <span class="mainSearchText"> - <%= fmt.format("PathPrefixQuery.versionCount",numVersions) %> - </span> - <br/> - <span class="mainSearchText"> - <%= fmt.format("PathPrefixQuery.singleCaptureDate",firstDate) %> - </span> - <% - - } else { - String anchor = results.makeCaptureQueryUrl(url); - %> - <a href="<%= anchor %>"> - <%= url %> - </a> - <span class="mainSearchText"> - <%= 
fmt.format("PathPrefixQuery.versionCount",numVersions) %> - </span> - <br/> - <span class="mainSearchText"> - <%= fmt.format("PathPrefixQuery.multiCaptureDate",numCaptures,firstDate,lastDate) %> - </span> - <% - } - %> - <br/> - <br/> - <% - } -} -// show page indicators: -int curPage = results.getCurPage(); -if(curPage > results.getNumPages()) { - %> - <hr></hr> - <a href="<%= results.urlForPage(1) %>">First results</a> - <% -} else if(results.getNumPages() > 1) { - %> - <hr></hr> - <% - for(int i = 1; i <= results.getNumPages(); i++) { - if(i == curPage) { - %> - <b><%= i %></b> - <% - } else { - %> - <a href="<%= results.urlForPage(i) %>"><%= i %></a> - <% - } - } -} -%> - -<jsp:include page="/template/UI-footer.jsp" flush="true" /> Deleted: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/XMLResults.jsp =================================================================== --- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/XMLResults.jsp 2008-07-02 00:30:47 UTC (rev 2390) +++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/XMLResults.jsp 2008-07-02 00:31:45 UTC (rev 2391) @@ -1,58 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<%@ page language="java" pageEncoding="utf-8" contentType="text/xml;charset=utf-8"%> -<%@ page import="java.util.Iterator" %> -<%@ page import="java.util.ArrayList" %> -<%@ page import="java.util.Properties" %> -<%@ page import="java.util.Enumeration" %> -<%@ page import="org.archive.wayback.WaybackConstants" %> -<%@ page import="org.archive.wayback.core.SearchResults" %> -<%@ page import="org.archive.wayback.core.SearchResult" %> -<%@ page import="org.archive.wayback.core.Timestamp" %> -<%@ page import="org.archive.wayback.core.UIResults" %> -<%@ page import="org.archive.wayback.query.UIQueryResults" %> -<% -UIQueryResults uiResults = (UIQueryResults) UIResults.getFromRequest(request); -SearchResults results = uiResults.getResults(); -Iterator itr = 
uiResults.resultsIterator(); -%> -<wayback> - <request> -<% - Properties p = results.getFilters(); - for (Enumeration e = p.keys(); e.hasMoreElements();) { - String key = UIQueryResults.encodeXMLEntity((String) e.nextElement()); - String value = UIQueryResults.encodeXMLContent((String) p.get(key)); - %> - <<%= key %>><%= value %></<%= key %>> - <% - } - String type = WaybackConstants.RESULTS_TYPE_CAPTURE; - if(uiResults.isUrlResults()) { - type = WaybackConstants.RESULTS_TYPE_URL; - } -%> - <<%= WaybackConstants.RESULTS_TYPE %>><%= type %></<%= WaybackConstants.RESULTS_TYPE %>> - </request> - <results> -<% - while(itr.hasNext()) { - %> - <result> - <% - SearchResult result = (SearchResult) itr.next(); - Properties p2 = result.getData(); - for (Enumeration e = p2.keys(); e.hasMoreElements();) { - // TODO: encode! - String key = UIQueryResults.encodeXMLEntity((String) e.nextElement()); - String value = UIQueryResults.encodeXMLContent((String) p2.get(key)); - %> - <<%= key %>><%= value %></<%= key %>> - <% - } - %> - </result> - <% - } -%> - </results> -</wayback> This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
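Note on the "new version" indicator in the removed HTMLResults.jsp: the capture loop compares each capture's MD5 digest with the previous one and flags the capture as a new version when the digest changes or when it is the first capture. The same logic, pulled out of the JSP into a plain method, might look like the sketch below. NewVersionMarker and markNewVersions are hypothetical names, and the input is a plain list of digest strings rather than Wayback's result objects.

```java
import java.util.ArrayList;
import java.util.List;

public class NewVersionMarker {
    // Returns one flag per capture: true when the digest differs from the
    // previous capture's digest (or there is no previous capture).
    static List<Boolean> markNewVersions(List<String> digests) {
        List<Boolean> flags = new ArrayList<>();
        String lastDigest = null;
        for (String digest : digests) {
            boolean updated = (lastDigest == null) || !lastDigest.equals(digest);
            flags.add(updated);
            lastDigest = digest;
        }
        return flags;
    }

    public static void main(String[] args) {
        List<String> digests = new ArrayList<>();
        digests.add("aaa");
        digests.add("aaa");
        digests.add("bbb");
        System.out.println(markNewVersions(digests)); // [true, false, true]
    }
}
```

Only adjacent captures are compared, so a page that changes and later reverts to earlier content is flagged again, as in the JSP.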
From: <bra...@us...> - 2008-07-02 00:30:38
|
Revision: 2390 http://archive-access.svn.sourceforge.net/archive-access/?rev=2390&view=rev Author: bradtofel Date: 2008-07-01 17:30:47 -0700 (Tue, 01 Jul 2008) Log Message: ----------- MOVE: replay related .jsp files to /replay/ Added Paths: ----------- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/Redirect.jsp trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/ResultMeta.jsp Removed Paths: ------------- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/Redirect.jsp trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/ResultMeta.jsp Deleted: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/Redirect.jsp =================================================================== --- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/Redirect.jsp 2008-07-02 00:25:51 UTC (rev 2389) +++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/Redirect.jsp 2008-07-02 00:30:47 UTC (rev 2390) @@ -1,14 +0,0 @@ -<%@ page import="org.archive.wayback.core.Timestamp" %> - -<% - String url = request.getParameter("url"); - String time = request.getParameter("time"); - - // Put time-mapping for this id, or if no id, the ip-addr. - String id = request.getHeader("Proxy-Id"); - if(id == null) id = request.getRemoteAddr(); - Timestamp.addTimestampForId(request.getContextPath(),id, time); - - // Now redirect to the page the user wanted. 
- response.sendRedirect(url); -%> Deleted: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/ResultMeta.jsp =================================================================== --- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/ResultMeta.jsp 2008-07-02 00:25:51 UTC (rev 2389) +++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/ResultMeta.jsp 2008-07-02 00:30:47 UTC (rev 2390) @@ -1,125 +0,0 @@ -<%@ page language="java" pageEncoding="utf-8" contentType="text/html;charset=utf-8"%> -<%@ page import="java.util.Iterator" %> -<%@ page import="java.util.Map" %> -<%@ page import="org.archive.wayback.core.Timestamp" %> -<%@ page import="org.archive.wayback.core.UIResults" %> -<%@ page import="org.archive.wayback.replay.UIReplayResult" %> -<%@ page import="org.archive.wayback.util.StringFormatter" %> -<% - -UIReplayResult uiResults = (UIReplayResult) UIResults.getFromRequest(request); -StringFormatter fmt = uiResults.getFormatter(); - -String origUrl = uiResults.getOriginalUrl(); -String urlKey = uiResults.getUrlKey(); -String archiveID = uiResults.getArchiveID(); -Timestamp captureTS = uiResults.getCaptureTimestamp(); -String capturePrettyDateTime = fmt.format("MetaReplay.captureDateDisplay", - captureTS.getDate()); -String mimeType = uiResults.getMimeType(); -String digest = uiResults.getDigest(); -Map<String,String> headers = uiResults.getHttpHeaders(); - -%> -<html> - <head> - <title> - <%= fmt.format("MetaReplay.title") + urlKey +" / " + - capturePrettyDateTime %> - </title> - </head> - <body> - <h2> - <%= fmt.format("MetaReplay.title") %> - </h2> - <table> - <tr> - <td class="field-cell"> - <%= fmt.format("MetaReplay.originalURL") %> - </td> - <td class="value-cell"> - <b> - <%= origUrl %> - </b> - </td> - </tr> - <tr> - <td class="field-cell"> - <%= fmt.format("MetaReplay.URLKey") %> - </td> - <td class="value-cell"> - <b> - <%= urlKey %> - </b> - </td> - </tr> - <tr> - <td class="field-cell"> - 
<%= fmt.format("MetaReplay.captureDate") %> - </td> - <td class="value-cell"> - <b> - <%= capturePrettyDateTime %> - </b> - </td> - </tr> - <tr> - <td class="field-cell"> - <%= fmt.format("MetaReplay.archiveID") %> - </td> - <td class="value-cell"> - <b> - <%= archiveID %> - </b> - </td> - </tr> - <tr> - <td class="field-cell"> - <%= fmt.format("MetaReplay.MIMEType") %> - </td> - <td class="value-cell"> - <b> - <%= mimeType %> - </b> - </td> - </tr> - <tr> - <td class="field-cell"> - <%= fmt.format("MetaReplay.digest") %> - </td> - <td class="value-cell"> - <b> - <%= digest %> - </b> - </td> - </tr> - </table> - <p> - <h2> - <%= fmt.format("MetaReplay.HTTPHeaders") %> - </h2> - <table> - <% - Iterator<String> itr = headers.keySet().iterator(); - while(itr.hasNext()) { - String key = itr.next(); - String value = headers.get(key); - %> - <tr> - <td class="field-cell"> - <%= key %> - </td> - <td class="value-cell"> - <b> - <%= value %> - </b> - </td> - </tr> - <% - } - %> - </table> - - </body> -</html> - Copied: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/Redirect.jsp (from rev 2055, trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/Redirect.jsp) =================================================================== --- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/Redirect.jsp (rev 0) +++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/Redirect.jsp 2008-07-02 00:30:47 UTC (rev 2390) @@ -0,0 +1,14 @@ +<%@ page import="org.archive.wayback.core.Timestamp" %> + +<% + String url = request.getParameter("url"); + String time = request.getParameter("time"); + + // Put time-mapping for this id, or if no id, the ip-addr. + String id = request.getHeader("Proxy-Id"); + if(id == null) id = request.getRemoteAddr(); + Timestamp.addTimestampForId(request.getContextPath(),id, time); + + // Now redirect to the page the user wanted. 
+ response.sendRedirect(url); +%> Copied: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/ResultMeta.jsp (from rev 2228, trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/ResultMeta.jsp) =================================================================== --- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/ResultMeta.jsp (rev 0) +++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/ResultMeta.jsp 2008-07-02 00:30:47 UTC (rev 2390) @@ -0,0 +1,125 @@ +<%@ page language="java" pageEncoding="utf-8" contentType="text/html;charset=utf-8"%> +<%@ page import="java.util.Iterator" %> +<%@ page import="java.util.Map" %> +<%@ page import="org.archive.wayback.core.Timestamp" %> +<%@ page import="org.archive.wayback.core.UIResults" %> +<%@ page import="org.archive.wayback.replay.UIReplayResult" %> +<%@ page import="org.archive.wayback.util.StringFormatter" %> +<% + +UIReplayResult uiResults = (UIReplayResult) UIResults.getFromRequest(request); +StringFormatter fmt = uiResults.getFormatter(); + +String origUrl = uiResults.getOriginalUrl(); +String urlKey = uiResults.getUrlKey(); +String archiveID = uiResults.getArchiveID(); +Timestamp captureTS = uiResults.getCaptureTimestamp(); +String capturePrettyDateTime = fmt.format("MetaReplay.captureDateDisplay", + captureTS.getDate()); +String mimeType = uiResults.getMimeType(); +String digest = uiResults.getDigest(); +Map<String,String> headers = uiResults.getHttpHeaders(); + +%> +<html> + <head> + <title> + <%= fmt.format("MetaReplay.title") + urlKey +" / " + + capturePrettyDateTime %> + </title> + </head> + <body> + <h2> + <%= fmt.format("MetaReplay.title") %> + </h2> + <table> + <tr> + <td class="field-cell"> + <%= fmt.format("MetaReplay.originalURL") %> + </td> + <td class="value-cell"> + <b> + <%= origUrl %> + </b> + </td> + </tr> + <tr> + <td class="field-cell"> + <%= fmt.format("MetaReplay.URLKey") %> + </td> + <td 
class="value-cell"> + <b> + <%= urlKey %> + </b> + </td> + </tr> + <tr> + <td class="field-cell"> + <%= fmt.format("MetaReplay.captureDate") %> + </td> + <td class="value-cell"> + <b> + <%= capturePrettyDateTime %> + </b> + </td> + </tr> + <tr> + <td class="field-cell"> + <%= fmt.format("MetaReplay.archiveID") %> + </td> + <td class="value-cell"> + <b> + <%= archiveID %> + </b> + </td> + </tr> + <tr> + <td class="field-cell"> + <%= fmt.format("MetaReplay.MIMEType") %> + </td> + <td class="value-cell"> + <b> + <%= mimeType %> + </b> + </td> + </tr> + <tr> + <td class="field-cell"> + <%= fmt.format("MetaReplay.digest") %> + </td> + <td class="value-cell"> + <b> + <%= digest %> + </b> + </td> + </tr> + </table> + <p> + <h2> + <%= fmt.format("MetaReplay.HTTPHeaders") %> + </h2> + <table> + <% + Iterator<String> itr = headers.keySet().iterator(); + while(itr.hasNext()) { + String key = itr.next(); + String value = headers.get(key); + %> + <tr> + <td class="field-cell"> + <%= key %> + </td> + <td class="value-cell"> + <b> + <%= value %> + </b> + </td> + </tr> + <% + } + %> + </table> + + </body> +</html> + This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <bra...@us...> - 2008-07-02 00:25:42
|
Revision: 2389 http://archive-access.svn.sourceforge.net/archive-access/?rev=2389&view=rev Author: bradtofel Date: 2008-07-01 17:25:51 -0700 (Tue, 01 Jul 2008) Log Message: ----------- MOVE: moved all query related .jsps to /query/. separated URL and Capture query renderers into seprate .jsp files now use UICaptureQueryResults and UIUrlQueryResults for context. Added Paths: ----------- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/HTMLCaptureResults.jsp trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/HTMLUrlResults.jsp trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/XMLCaptureResults.jsp trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/XMLUrlResults.jsp Added: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/HTMLCaptureResults.jsp =================================================================== --- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/HTMLCaptureResults.jsp (rev 0) +++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/HTMLCaptureResults.jsp 2008-07-02 00:25:51 UTC (rev 2389) @@ -0,0 +1,114 @@ +<%@ page language="java" pageEncoding="utf-8" contentType="text/html;charset=utf-8"%> +<%@ page import="java.util.Iterator" %> +<%@ page import="java.util.ArrayList" %> +<%@ page import="java.util.Date" %> +<%@ page import="org.archive.wayback.WaybackConstants" %> +<%@ page import="org.archive.wayback.core.CaptureSearchResult" %> +<%@ page import="org.archive.wayback.core.Timestamp" %> +<%@ page import="org.archive.wayback.core.UIResults" %> +<%@ page import="org.archive.wayback.query.UICaptureQueryResults" %> +<%@ page import="org.archive.wayback.util.StringFormatter" %> +<jsp:include page="/template/UI-header.jsp" flush="true" /> +<% + +UICaptureQueryResults results = (UICaptureQueryResults) UIResults.getFromRequest(request); +StringFormatter fmt = results.getFormatter(); + 
+String searchString = results.getSearchUrl(); + + int resultCount = results.getResultsReturned(); + + Timestamp searchStartTs = results.getStartTimestamp(); + Timestamp searchEndTs = results.getEndTimestamp(); + Date searchStartDate = searchStartTs.getDate(); + Date searchEndDate = searchEndTs.getDate(); + + Iterator<CaptureSearchResult> itr = results.resultsIterator(); + %> + <%= fmt.format("PathQuery.resultsSummary",resultCount,searchString) %> + <br></br> + <%= fmt.format("PathQuery.resultRange",searchStartDate,searchEndDate) %> + <hr></hr> + <% + boolean first = false; + String lastMD5 = null; + while(itr.hasNext()) { + CaptureSearchResult result = (CaptureSearchResult) itr.next(); + + String url = result.getUrlKey(); + + String prettyDate = result.getCaptureTimestamp(); + String origHost = result.getOriginalHost(); + String MD5 = result.getDigest(); + String redirectFlag = (0 == result.getRedirectUrl().compareTo("-")) + ? "" : fmt.format("PathQuery.redirectIndicator"); + String httpResponse = result.getHttpCode(); + String mimeType = result.getMimeType(); + + String arcFile = result.getFile(); + String arcOffset = String.valueOf(result.getOffset()); + + String replayUrl = results.resultToReplayUrl(result); + + boolean updated = false; + if(lastMD5 == null) { + lastMD5 = MD5; + updated = true; + } else if(0 != lastMD5.compareTo(MD5)) { + updated = true; + lastMD5 = MD5; + } + if(updated) { + %> + <a href="<%= replayUrl %>"><%= prettyDate %></a> + <span style="color:black;"><%= origHost %></span> + <span style="color:gray;"><%= httpResponse %></span> + <span style="color:brown;"><%= mimeType %></span> + <!-- + <span style="color:red;"><%= arcFile %></span> + <span style="color:red;"><%= arcOffset %></span> + --> + <%= redirectFlag %> + <%= fmt.format("PathQuery.newVersionIndicator") %> + + <br/> + <% + } else { + %> + <a href="<%= replayUrl %>"><%= prettyDate %></a> + <span style="color:green;"><%= origHost %></span> + <!-- + <span style="color:red;"><%= 
arcFile %></span> + <span style="color:red;"><%= arcOffset %></span> + --> + <br/> + <% + } + } + +// show page indicators: +int curPage = results.getCurPage(); +if(curPage > results.getNumPages()) { + %> + <hr></hr> + <a href="<%= results.urlForPage(1) %>">First results</a> + <% +} else if(results.getNumPages() > 1) { + %> + <hr></hr> + <% + for(int i = 1; i <= results.getNumPages(); i++) { + if(i == curPage) { + %> + <b><%= i %></b> + <% + } else { + %> + <a href="<%= results.urlForPage(i) %>"><%= i %></a> + <% + } + } +} +%> + +<jsp:include page="/template/UI-footer.jsp" flush="true" /> Added: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/HTMLUrlResults.jsp =================================================================== --- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/HTMLUrlResults.jsp (rev 0) +++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/HTMLUrlResults.jsp 2008-07-02 00:25:51 UTC (rev 2389) @@ -0,0 +1,116 @@ +<%@ page language="java" pageEncoding="utf-8" contentType="text/html;charset=utf-8"%> +<%@ page import="java.util.Iterator" %> +<%@ page import="java.util.ArrayList" %> +<%@ page import="java.util.Date" %> +<%@ page import="org.archive.wayback.WaybackConstants" %> +<%@ page import="org.archive.wayback.core.Timestamp" %> +<%@ page import="org.archive.wayback.core.UIResults" %> +<%@ page import="org.archive.wayback.core.UrlSearchResult" %> +<%@ page import="org.archive.wayback.query.UIUrlQueryResults" %> +<%@ page import="org.archive.wayback.util.StringFormatter" %> +<jsp:include page="/template/UI-header.jsp" flush="true" /> +<% + +UIUrlQueryResults results = (UIUrlQueryResults) UIResults.getFromRequest(request); +StringFormatter fmt = results.getFormatter(); + +String searchString = results.getSearchUrl(); + + + +Date searchStartDate = results.getStartTimestamp().getDate(); +Date searchEndDate = results.getEndTimestamp().getDate(); + 
+//PathQuerySearchResultPartitioner partitioner = +// new PathQuerySearchResultPartitioner(results.getResults(), +// results.getURIConverter()); + +int firstResult = results.getFirstResult(); +int lastResult = results.getLastResult(); +int resultCount = results.getResultsMatching(); + +int totalCaptures = results.getResultsMatching(); + +%> +<%= fmt.format("PathPrefixQuery.showingResults",firstResult,lastResult, + resultCount,searchString) %> +<br/> + +<hr></hr> +<% +Iterator<UrlSearchResult> itr = results.resultsIterator(); +while(itr.hasNext()) { + UrlSearchResult result = itr.next(); + + String urlKey = result.getUrlKey(); + String originalUrl = result.getOriginalUrl(); + String firstDateTS = result.getFirstCaptureTimestamp(); + String lastDateTS = result.getLastCaptureTimestamp(); + long numCaptures = result.getNumCaptures(); + long numVersions = result.getNumVersions(); + + Date firstDate = results.timestampToDate(firstDateTS); + Date lastDate = results.timestampToDate(lastDateTS); + + if(numCaptures == 1) { + String anchor = results.makeReplayUrl(originalUrl,firstDateTS); + %> + <a href="<%= anchor %>"> + <%= urlKey %> + </a> + <span class="mainSearchText"> + <%= fmt.format("PathPrefixQuery.versionCount",numVersions) %> + </span> + <br/> + <span class="mainSearchText"> + <%= fmt.format("PathPrefixQuery.singleCaptureDate",firstDate) %> + </span> + <% + + } else { + String anchor = results.makeCaptureQueryUrl(originalUrl); + %> + <a href="<%= anchor %>"> + <%= urlKey %> + </a> + <span class="mainSearchText"> + <%= fmt.format("PathPrefixQuery.versionCount",numVersions) %> + </span> + <br/> + <span class="mainSearchText"> + <%= fmt.format("PathPrefixQuery.multiCaptureDate",numCaptures,firstDate,lastDate) %> + </span> + <% + } + %> + <br/> + <br/> + <% +} + +// show page indicators: +int curPage = results.getCurPage(); +if(curPage > results.getNumPages()) { + %> + <hr></hr> + <a href="<%= results.urlForPage(1) %>">First results</a> + <% +} else 
if(results.getNumPages() > 1) { + %> + <hr></hr> + <% + for(int i = 1; i <= results.getNumPages(); i++) { + if(i == curPage) { + %> + <b><%= i %></b> + <% + } else { + %> + <a href="<%= results.urlForPage(i) %>"><%= i %></a> + <% + } + } +} +%> + +<jsp:include page="/template/UI-footer.jsp" flush="true" /> Added: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/XMLCaptureResults.jsp =================================================================== --- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/XMLCaptureResults.jsp (rev 0) +++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/XMLCaptureResults.jsp 2008-07-02 00:25:51 UTC (rev 2389) @@ -0,0 +1,59 @@ +<?xml version="1.0" encoding="UTF-8"?> +<%@ page language="java" pageEncoding="utf-8" contentType="text/xml;charset=utf-8"%> +<%@ page import="java.util.Iterator" %> +<%@ page import="java.util.ArrayList" %> +<%@ page import="java.util.Map" %> +<%@ page import="java.util.Enumeration" %> +<%@ page import="org.archive.wayback.WaybackConstants" %> +<%@ page import="org.archive.wayback.core.CaptureSearchResult" %> +<%@ page import="org.archive.wayback.core.CaptureSearchResults" %> +<%@ page import="org.archive.wayback.core.Timestamp" %> +<%@ page import="org.archive.wayback.core.UIResults" %> +<%@ page import="org.archive.wayback.query.UICaptureQueryResults" %> +<% +UICaptureQueryResults uiResults = (UICaptureQueryResults) UIResults.getFromRequest(request); + +CaptureSearchResults results = uiResults.getResults(); +Iterator<CaptureSearchResult> itr = uiResults.resultsIterator(); +%> +<wayback> + <request> +<% + Map<String,String> p = results.getFilters(); + Iterator<String> kitr = p.keySet().iterator(); + while(kitr.hasNext()) { + String key = kitr.next(); + String oKey = UIResults.encodeXMLEntity(key); + String oValue = UIResults.encodeXMLContent(p.get(key)); + %> + <<%= oKey %>><%= oValue %></<%= oKey %>> + <% + } +%> + <<%= 
WaybackConstants.RESULTS_TYPE %>><%= WaybackConstants.RESULTS_TYPE_CAPTURE %></<%= WaybackConstants.RESULTS_TYPE %>> + </request> + <results> +<% + while(itr.hasNext()) { + %> + <result> + <% + CaptureSearchResult result = itr.next(); + Map<String,String> p2 = result.toCanonicalStringMap(); + kitr = p2.keySet().iterator(); + + while(kitr.hasNext()) { + String key = kitr.next(); + String oKey = UIResults.encodeXMLEntity(key); + String oValue = UIResults.encodeXMLContent(p2.get(key)); + %> + <<%= oKey %>><%= oValue %></<%= oKey %>> + <% + } + %> + </result> + <% + } +%> + </results> +</wayback> Added: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/XMLUrlResults.jsp =================================================================== --- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/XMLUrlResults.jsp (rev 0) +++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/XMLUrlResults.jsp 2008-07-02 00:25:51 UTC (rev 2389) @@ -0,0 +1,59 @@ +<?xml version="1.0" encoding="UTF-8"?> +<%@ page language="java" pageEncoding="utf-8" contentType="text/xml;charset=utf-8"%> +<%@ page import="java.util.Iterator" %> +<%@ page import="java.util.ArrayList" %> +<%@ page import="java.util.Map" %> +<%@ page import="java.util.Enumeration" %> +<%@ page import="org.archive.wayback.WaybackConstants" %> +<%@ page import="org.archive.wayback.core.UrlSearchResults" %> +<%@ page import="org.archive.wayback.core.UrlSearchResult" %> +<%@ page import="org.archive.wayback.core.Timestamp" %> +<%@ page import="org.archive.wayback.core.UIResults" %> +<%@ page import="org.archive.wayback.query.UIUrlQueryResults" %> +<% +UIUrlQueryResults uiResults = (UIUrlQueryResults) UIResults.getFromRequest(request); + +UrlSearchResults results = uiResults.getResults(); +Iterator<UrlSearchResult> itr = uiResults.resultsIterator(); +%> +<wayback> + <request> +<% + Map<String,String> p = results.getFilters(); + Iterator<String> kitr = 
p.keySet().iterator();
+  while(kitr.hasNext()) {
+    String key = kitr.next();
+    String oKey = UIResults.encodeXMLEntity(key);
+    String oValue = UIResults.encodeXMLContent(p.get(key));
+    %>
+    <<%= oKey %>><%= oValue %></<%= oKey %>>
+    <%
+  }
+%>
+    <<%= WaybackConstants.RESULTS_TYPE %>><%= WaybackConstants.RESULTS_TYPE_URL %></<%= WaybackConstants.RESULTS_TYPE %>>
+  </request>
+  <results>
+<%
+  while(itr.hasNext()) {
+    %>
+    <result>
+    <%
+    UrlSearchResult result = itr.next();
+    Map<String,String> p2 = result.toCanonicalStringMap();
+    kitr = p2.keySet().iterator();
+
+    while(kitr.hasNext()) {
+      String key = kitr.next();
+      String oKey = UIResults.encodeXMLEntity(key);
+      String oValue = UIResults.encodeXMLContent(p2.get(key));
+      %>
+      <<%= oKey %>><%= oValue %></<%= oKey %>>
+      <%
+    }
+    %>
+    </result>
+    <%
+  }
+%>
+  </results>
+</wayback>
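The page-indicator scriptlet added to the result JSPs in this revision can be read more easily outside the JSP markup. The following is only a sketch of that branching in plain Java, not code from the Wayback tree: the class name PageIndicator, the render method, and the urlForPage function parameter are hypothetical stand-ins for results.getCurPage(), results.getNumPages() and results.urlForPage(i).

```java
import java.util.function.IntFunction;

// Hypothetical helper, not part of Wayback: mirrors the branching of the
// "show page indicators" scriptlet in the query result JSPs.
class PageIndicator {

    // Builds the pager markup for the given current page and page count.
    // urlForPage is an assumed callback that maps a page number to its link.
    static String render(int curPage, int numPages, IntFunction<String> urlForPage) {
        StringBuilder html = new StringBuilder();
        if (curPage > numPages) {
            // Requested page is past the end: offer a link back to page 1.
            html.append("<hr></hr>")
                .append("<a href=\"").append(urlForPage.apply(1))
                .append("\">First results</a>");
        } else if (numPages > 1) {
            // More than one page: current page in bold, the others as links.
            html.append("<hr></hr>");
            for (int i = 1; i <= numPages; i++) {
                if (i == curPage) {
                    html.append("<b>").append(i).append("</b>");
                } else {
                    html.append("<a href=\"").append(urlForPage.apply(i))
                        .append("\">").append(i).append("</a>");
                }
            }
        }
        // A single page of results renders no pager at all.
        return html.toString();
    }

    public static void main(String[] args) {
        // Prints <hr></hr><a href="?page=1">1</a><b>2</b><a href="?page=3">3</a>
        System.out.println(render(2, 3, page -> "?page=" + page));
    }
}
```

Note that the older CalendarResults.jsp pager has no out-of-range branch; the sketch follows the newer form with the curPage > getNumPages() check.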
From: <bra...@us...> - 2008-07-02 00:25:07
Revision: 2388
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2388&view=rev
Author:   bradtofel
Date:     2008-07-01 17:25:15 -0700 (Tue, 01 Jul 2008)

Log Message:
-----------
MOVE: moved all query related .jsps to /query/.
separated URL and Capture query renderers into separate .jsp files
now use UICaptureQueryResults and UIUrlQueryResults for context.

Added Paths:
-----------
    trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/
    trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/CalendarResults.jsp

Removed Paths:
-------------
    trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/CalendarResults.jsp

Deleted: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/CalendarResults.jsp
===================================================================
--- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/CalendarResults.jsp	2008-07-02 00:22:06 UTC (rev 2387)
+++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/CalendarResults.jsp	2008-07-02 00:25:15 UTC (rev 2388)
@@ -1,175 +0,0 @@
-<%@ page language="java" pageEncoding="utf-8" contentType="text/html;charset=utf-8"%>
-<%@ page import="java.util.ArrayList" %>
-<%@ page import="java.util.Date" %>
-<%@ page import="java.util.Iterator" %>
-<%@ page import="java.text.ParseException" %>
-<%@ page import="org.archive.wayback.WaybackConstants" %>
-<%@ page import="org.archive.wayback.core.SearchResult" %>
-<%@ page import="org.archive.wayback.core.Timestamp" %>
-<%@ page import="org.archive.wayback.core.UIResults" %>
-<%@ page import="org.archive.wayback.query.UIQueryResults" %>
-<%@ page import="org.archive.wayback.query.resultspartitioner.ResultsPartitionsFactory" %>
-<%@ page import="org.archive.wayback.query.resultspartitioner.ResultsPartition" %>
-<%@ page import="org.archive.wayback.util.StringFormatter" %>
-<jsp:include page="/template/UI-header.jsp" flush="true" />
-<%
-
-UIQueryResults results = (UIQueryResults) UIResults.getFromRequest(request); -StringFormatter fmt = results.getFormatter(); -String searchString = results.getSearchUrl(); - -Date searchStartDate = results.getStartTimestamp().getDate(); -Date searchEndDate = results.getEndTimestamp().getDate(); -int firstResult = results.getFirstResult(); -int lastResult = results.getLastResult(); -int resultCount = results.getResultsMatching(); - -//Timestamp searchStartTs = results.getStartTimestamp(); -//Timestamp searchEndTs = results.getEndTimestamp(); -//String prettySearchStart = results.prettyDateFull(searchStartTs.getDate()); -//String prettySearchEnd = results.prettyDateFull(searchEndTs.getDate()); - -ArrayList<ResultsPartition> partitions = ResultsPartitionsFactory.get( - results.getResults(),results.getWbRequest()); -int numPartitions = partitions.size(); -%> -<table border="0" cellpadding="5" width="100%" class="mainSearchBanner" cellspacing="0"> - <tr> - <td> - <%= fmt.format("PathQueryClassic.searchedFor",searchString) %> - </td> - <td align="right"> - <%= fmt.format("PathQueryClassic.resultsSummary",resultCount) %> - </td> - </tr> -</table> -<br> - - -<table border="0" width="100%"> - <tr bgcolor="#CCCCCC"> - <td colspan="<%= numPartitions %>" align="center" class="mainCalendar"> - <%= fmt.format("PathQueryClassic.searchResults",searchStartDate,searchEndDate) %> - </td> - </tr> - -<!-- RESULT COLUMN HEADERS --> - <tr bgcolor="#CCCCCC"> -<% - for(int i = 0; i < numPartitions; i++) { - ResultsPartition partition = partitions.get(i); -%> - <td align="center" class="mainBigBody"> - <%= partition.getTitle() %> - </td> -<% - } -%> - </tr> -<!-- /RESULT COLUMN HEADERS --> - - - -<!-- RESULT COLUMN COUNTS --> - <tr bgcolor="#CCCCCC"> -<% - for(int i = 0; i < numPartitions; i++) { - ResultsPartition partition = (ResultsPartition) partitions.get(i); -%> - <td align="center" class="mainBigBody"> - <%= fmt.format("ResultPartition.columnSummary",partition.resultsCount()) %> - 
</td> -<% - } -%> - </tr> -<!-- /RESULT COLUMN COUNTS --> - - -<!-- RESULT COLUMN DATA --> - <tr bgcolor="#EBEBEB"> -<% - boolean first = false; - String lastMD5 = null; - - for(int i = 0; i < numPartitions; i++) { - ResultsPartition partition = (ResultsPartition) partitions.get(i); - ArrayList<SearchResult> partitionResults = partition.getMatches(); -%> - <td nowrap class="mainBody" valign="top"> -<% - if(partitionResults.size() == 0) { -%> - -<% - } else { - - for(int j = 0; j < partitionResults.size(); j++) { - - SearchResult result = partitionResults.get(j); - String url = result.get(WaybackConstants.RESULT_URL); - String captureDate = result.get(WaybackConstants.RESULT_CAPTURE_DATE); - Timestamp captureTS = Timestamp.parseBefore(captureDate); - String prettyDate = fmt.format("PathQuery.classicResultLinkText", - captureTS.getDate()); - String origHost = result.get(WaybackConstants.RESULT_ORIG_HOST); - String MD5 = result.get(WaybackConstants.RESULT_MD5_DIGEST); - String redirectFlag = (0 == result.get( - WaybackConstants.RESULT_REDIRECT_URL).compareTo("-")) - ? "" : fmt.format("PathPrefixQuery.redirectIndicator"); - String httpResponse = result.get(WaybackConstants.RESULT_HTTP_CODE); - String mimeType = result.get(WaybackConstants.RESULT_MIME_TYPE); - - String arcFile = result.get(WaybackConstants.RESULT_ARC_FILE); - String arcOffset = result.get(WaybackConstants.RESULT_OFFSET); - - String replayUrl = results.resultToReplayUrl(result); - - boolean updated = false; - if(lastMD5 == null) { - lastMD5 = MD5; - updated = true; - } else if(0 != lastMD5.compareTo(MD5)) { - updated = true; - lastMD5 = MD5; - } - String updateStar = updated ? 
"*" : ""; -%> - <a href="<%= replayUrl %>"><%= prettyDate %></a> <%= updateStar %><br></br> -<% - - } - - } -%> - </td> -<% - } - -%> - </tr> -<!-- /RESULT COLUMN DATA --> -</table> - - -<% -// show page indicators: -if(results.getNumPages() > 1) { - int curPage = results.getCurPage(); - %> - <hr></hr> - <% - for(int i = 1; i <= results.getNumPages(); i++) { - if(i == curPage) { - %> - <b><%= i %></b> - <% - } else { - %> - <a href="<%= results.urlForPage(i) %>"><%= i %></a> - <% - } - } -} -%> -<jsp:include page="/template/UI-footer.jsp" flush="true" /> Copied: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/CalendarResults.jsp (from rev 2228, trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/CalendarResults.jsp) =================================================================== --- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/CalendarResults.jsp (rev 0) +++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/query/CalendarResults.jsp 2008-07-02 00:25:15 UTC (rev 2388) @@ -0,0 +1,174 @@ +<%@ page language="java" pageEncoding="utf-8" contentType="text/html;charset=utf-8"%> +<%@ page import="java.util.ArrayList" %> +<%@ page import="java.util.Date" %> +<%@ page import="java.util.Iterator" %> +<%@ page import="java.text.ParseException" %> +<%@ page import="org.archive.wayback.WaybackConstants" %> +<%@ page import="org.archive.wayback.core.CaptureSearchResult" %> +<%@ page import="org.archive.wayback.core.Timestamp" %> +<%@ page import="org.archive.wayback.core.UIResults" %> +<%@ page import="org.archive.wayback.query.UICaptureQueryResults" %> +<%@ page import="org.archive.wayback.query.resultspartitioner.ResultsPartitionsFactory" %> +<%@ page import="org.archive.wayback.query.resultspartitioner.ResultsPartition" %> +<%@ page import="org.archive.wayback.util.StringFormatter" %> +<jsp:include page="/template/UI-header.jsp" flush="true" /> +<% + +UICaptureQueryResults 
results = (UICaptureQueryResults) UIResults.getFromRequest(request); +StringFormatter fmt = results.getFormatter(); +String searchString = results.getSearchUrl(); + +Date searchStartDate = results.getStartTimestamp().getDate(); +Date searchEndDate = results.getEndTimestamp().getDate(); +long firstResult = results.getFirstResult(); +long lastResult = results.getLastResult(); +long resultCount = results.getResultsMatching(); + +//Timestamp searchStartTs = results.getStartTimestamp(); +//Timestamp searchEndTs = results.getEndTimestamp(); +//String prettySearchStart = results.prettyDateFull(searchStartTs.getDate()); +//String prettySearchEnd = results.prettyDateFull(searchEndTs.getDate()); + +ArrayList<ResultsPartition> partitions = ResultsPartitionsFactory.get( + results.getResults(),results.getWbRequest()); +int numPartitions = partitions.size(); +%> +<table border="0" cellpadding="5" width="100%" class="mainSearchBanner" cellspacing="0"> + <tr> + <td> + <%= fmt.format("PathQueryClassic.searchedFor",searchString) %> + </td> + <td align="right"> + <%= fmt.format("PathQueryClassic.resultsSummary",resultCount) %> + </td> + </tr> +</table> +<br> + + +<table border="0" width="100%"> + <tr bgcolor="#CCCCCC"> + <td colspan="<%= numPartitions %>" align="center" class="mainCalendar"> + <%= fmt.format("PathQueryClassic.searchResults",searchStartDate,searchEndDate) %> + </td> + </tr> + +<!-- RESULT COLUMN HEADERS --> + <tr bgcolor="#CCCCCC"> +<% + for(int i = 0; i < numPartitions; i++) { + ResultsPartition partition = partitions.get(i); +%> + <td align="center" class="mainBigBody"> + <%= partition.getTitle() %> + </td> +<% + } +%> + </tr> +<!-- /RESULT COLUMN HEADERS --> + + + +<!-- RESULT COLUMN COUNTS --> + <tr bgcolor="#CCCCCC"> +<% + for(int i = 0; i < numPartitions; i++) { + ResultsPartition partition = (ResultsPartition) partitions.get(i); +%> + <td align="center" class="mainBigBody"> + <%= fmt.format("ResultPartition.columnSummary",partition.resultsCount()) %> + </td> 
+<% + } +%> + </tr> +<!-- /RESULT COLUMN COUNTS --> + + +<!-- RESULT COLUMN DATA --> + <tr bgcolor="#EBEBEB"> +<% + boolean first = false; + String lastMD5 = null; + + for(int i = 0; i < numPartitions; i++) { + ResultsPartition partition = (ResultsPartition) partitions.get(i); + ArrayList<CaptureSearchResult> partitionResults = partition.getMatches(); +%> + <td nowrap class="mainBody" valign="top"> +<% + if(partitionResults.size() == 0) { +%> + +<% + } else { + + for(int j = 0; j < partitionResults.size(); j++) { + + CaptureSearchResult result = partitionResults.get(j); + String url = result.getUrlKey(); + String captureDate = result.getCaptureTimestamp(); + Timestamp captureTS = Timestamp.parseBefore(captureDate); + String prettyDate = fmt.format("PathQuery.classicResultLinkText", + captureTS.getDate()); + String origHost = result.getOriginalHost(); + String MD5 = result.getDigest(); + String redirectFlag = (0 == result.getRedirectUrl().compareTo("-")) + ? "" : fmt.format("PathPrefixQuery.redirectIndicator"); + String httpResponse = result.getHttpCode(); + String mimeType = result.getMimeType(); + + String arcFile = result.getFile(); + String arcOffset = String.valueOf(result.getOffset()); + + String replayUrl = results.resultToReplayUrl(result); + + boolean updated = false; + if(lastMD5 == null) { + lastMD5 = MD5; + updated = true; + } else if(0 != lastMD5.compareTo(MD5)) { + updated = true; + lastMD5 = MD5; + } + String updateStar = updated ? 
"*" : ""; +%> + <a href="<%= replayUrl %>"><%= prettyDate %></a> <%= updateStar %><br></br> +<% + + } + + } +%> + </td> +<% + } + +%> + </tr> +<!-- /RESULT COLUMN DATA --> +</table> + + +<% +// show page indicators: +if(results.getNumPages() > 1) { + int curPage = results.getCurPage(); + %> + <hr></hr> + <% + for(int i = 1; i <= results.getNumPages(); i++) { + if(i == curPage) { + %> + <b><%= i %></b> + <% + } else { + %> + <a href="<%= results.urlForPage(i) %>"><%= i %></a> + <% + } + } +} +%> +<jsp:include page="/template/UI-footer.jsp" flush="true" /> This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <bra...@us...> - 2008-07-02 00:22:00
Revision: 2387
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2387&view=rev
Author:   bradtofel
Date:     2008-07-01 17:22:06 -0700 (Tue, 01 Jul 2008)

Log Message:
-----------
REFACTOR: now uses UIReplayResult object to extract context

Modified Paths:
--------------
    trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/ArchiveComment.jsp
    trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/ClientSideJSInsert.jsp
    trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/Disclaimer.jsp
    trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/JSLessTimeline.jsp
    trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/Timeline.jsp

Modified: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/ArchiveComment.jsp
===================================================================
--- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/ArchiveComment.jsp	2008-07-02 00:17:37 UTC (rev 2386)
+++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/ArchiveComment.jsp	2008-07-02 00:22:06 UTC (rev 2387)
@@ -2,12 +2,12 @@
 <%@ page import="java.util.Date" %>
 <%@ page import="org.archive.wayback.core.Timestamp" %>
 <%@ page import="org.archive.wayback.core.UIResults" %>
-<%@ page import="org.archive.wayback.query.UIQueryResults" %>
+<%@ page import="org.archive.wayback.replay.UIReplayResult" %>
 <%@ page import="org.archive.wayback.util.StringFormatter" %>
 <%
-UIQueryResults results = (UIQueryResults) UIResults.getFromRequest(request);
+UIReplayResult results = (UIReplayResult) UIResults.getFromRequest(request);
 StringFormatter fmt = results.getFormatter();
-Date exactDate = results.getExactRequestedTimestamp().getDate();
+Date exactDate = results.getResult().getCaptureDate();
 Date now = new Date();
 String prettyDateFormat = "{0,date,H:mm:ss MMM d, yyyy}";
 String prettyArchiveString =
fmt.format(prettyDateFormat,exactDate); Modified: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/ClientSideJSInsert.jsp =================================================================== --- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/ClientSideJSInsert.jsp 2008-07-02 00:17:37 UTC (rev 2386) +++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/ClientSideJSInsert.jsp 2008-07-02 00:22:06 UTC (rev 2387) @@ -4,13 +4,12 @@ <%@ page import="org.archive.wayback.core.Timestamp" %> <%@ page import="org.archive.wayback.core.UIResults" %> <%@ page import="org.archive.wayback.core.WaybackRequest" %> -<%@ page import="org.archive.wayback.query.UIQueryResults" %> +<%@ page import="org.archive.wayback.replay.UIReplayResult" %> <%@ page import="org.archive.wayback.util.StringFormatter" %> <% -UIQueryResults results = (UIQueryResults) UIResults.getFromRequest(request); -ResultURIConverter uriConverter = results.getURIConverter(); -String requestDate = results.getExactRequestedTimestamp().getDateStr(); -String contextPath = uriConverter.makeReplayURI(requestDate, ""); +UIReplayResult results = (UIReplayResult) UIResults.getFromRequest(request); +String requestDate = results.getResult().getCaptureTimestamp(); +String contextPath = results.makeReplayUrl("",requestDate); String contextRoot = request.getScheme() + "://" + request.getServerName() + ":" + request.getServerPort() + request.getContextPath(); Modified: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/Disclaimer.jsp =================================================================== --- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/Disclaimer.jsp 2008-07-02 00:17:37 UTC (rev 2386) +++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/Disclaimer.jsp 2008-07-02 00:22:06 UTC (rev 2387) @@ -2,21 +2,20 @@ <%@ page import="java.util.Date" %> <%@ page 
import="org.archive.wayback.WaybackConstants" %> <%@ page import="org.archive.wayback.core.Timestamp" %> -<%@ page import="org.archive.wayback.core.SearchResult" %> +<%@ page import="org.archive.wayback.core.CaptureSearchResult" %> <%@ page import="org.archive.wayback.core.UIResults" %> <%@ page import="org.archive.wayback.core.WaybackRequest" %> -<%@ page import="org.archive.wayback.query.UIQueryResults" %> +<%@ page import="org.archive.wayback.replay.UIReplayResult" %> <%@ page import="org.archive.wayback.util.StringFormatter" %> <% -UIQueryResults results = (UIQueryResults) UIResults.getFromRequest(request); +UIReplayResult results = (UIReplayResult) UIResults.getFromRequest(request); StringFormatter fmt = results.getFormatter(); -SearchResult result = results.getResult(); +CaptureSearchResult result = results.getResult(); String dupeMsg = ""; if(result != null) { - String dupeType = result.get(WaybackConstants.RESULT_DUPLICATE_ANNOTATION); - if(dupeType != null) { - String dupeDate = result.get(WaybackConstants.RESULT_DUPLICATE_STORED_DATE); + if(result.isDuplicateDigest()) { + String dupeDate = result.getDuplicateDigestStoredTimestamp(); String prettyDate = ""; if(dupeDate != null) { Timestamp dupeTS = Timestamp.parseBefore(dupeDate); @@ -29,10 +28,10 @@ } } -Date requestDate = results.getExactRequestedTimestamp().getDate(); -String requestUrl = results.getSearchUrl(); +Date resultDate = result.getCaptureDate(); +String resultUrl = result.getOriginalUrl(); -String wmNotice = fmt.format("ReplayView.banner", requestUrl, requestDate); +String wmNotice = fmt.format("ReplayView.banner", resultUrl, resultDate); String wmHideNotice = fmt.format("ReplayView.bannerHideLink"); String contextRoot = request.getScheme() + "://" + request.getServerName() + ":" Modified: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/JSLessTimeline.jsp =================================================================== --- 
trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/JSLessTimeline.jsp 2008-07-02 00:17:37 UTC (rev 2386) +++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/JSLessTimeline.jsp 2008-07-02 00:22:06 UTC (rev 2387) @@ -4,11 +4,12 @@ <%@ page import="java.util.Date" %> <%@ page import="java.text.ParseException" %> <%@ page import="org.archive.wayback.WaybackConstants" %> -<%@ page import="org.archive.wayback.core.SearchResult" %> +<%@ page import="org.archive.wayback.core.CaptureSearchResult" %> +<%@ page import="org.archive.wayback.core.CaptureSearchResults" %> <%@ page import="org.archive.wayback.core.Timestamp" %> <%@ page import="org.archive.wayback.core.UIResults" %> <%@ page import="org.archive.wayback.core.WaybackRequest" %> -<%@ page import="org.archive.wayback.query.UIQueryResults" %> +<%@ page import="org.archive.wayback.replay.UIReplayResult" %> <%@ page import="org.archive.wayback.query.resultspartitioner.ResultsTimelinePartitionsFactory" %> <%@ page import="org.archive.wayback.query.resultspartitioner.ResultsPartition" %> <%@ page import="org.archive.wayback.util.StringFormatter" %> @@ -17,40 +18,38 @@ String contextRoot = request.getScheme() + "://" + request.getServerName() + ":" + request.getServerPort() + request.getContextPath(); -UIQueryResults results = (UIQueryResults) UIResults.getFromRequest(request); +UIReplayResult results = (UIReplayResult) UIResults.getFromRequest(request); StringFormatter fmt = results.getFormatter(); - -Timestamp searchStartTs = results.getStartTimestamp(); -Timestamp searchEndTs = results.getEndTimestamp(); -Timestamp exactTs = results.getExactRequestedTimestamp(); -String searchUrl = results.getSearchUrl(); -Date exactDate = exactTs.getDate(); - -String exactDateStr = exactTs.getDateStr(); WaybackRequest wbRequest = results.getWbRequest(); +CaptureSearchResults cResults = results.getResults(); + +String exactDateStr = wbRequest.get(WaybackConstants.REQUEST_DATE); +String 
searchUrl = wbRequest.get(WaybackConstants.REQUEST_URL); String resolution = wbRequest.get(WaybackConstants.REQUEST_RESOLUTION); +String metaMode = wbRequest.get(WaybackConstants.REQUEST_META_MODE); + +Date exactDate = Timestamp.parseBefore(exactDateStr).getDate(); + + if(resolution == null) { resolution = WaybackConstants.REQUEST_RESOLUTION_AUTO; } -String metaMode = wbRequest.get(WaybackConstants.REQUEST_META_MODE); String metaChecked = ""; if(metaMode != null && metaMode.equals("yes")) { metaChecked = "checked"; } -String searchString = results.getSearchUrl(); +CaptureSearchResult first = null; +CaptureSearchResult prev = null; +CaptureSearchResult next = null; +CaptureSearchResult last = null; -SearchResult first = null; -SearchResult prev = null; -SearchResult next = null; -SearchResult last = null; - -int resultCount = results.getResultsReturned(); +long resultCount = cResults.getReturnedCount(); int resultIndex = 1; -Iterator<SearchResult> it = results.resultsIterator(); +Iterator<CaptureSearchResult> it = cResults.iterator(); while(it.hasNext()) { - SearchResult res = it.next(); - String resDateStr = res.get(WaybackConstants.RESULT_CAPTURE_DATE); + CaptureSearchResult res = it.next(); + String resDateStr = res.getCaptureTimestamp(); int compared = resDateStr.compareTo(exactDateStr.substring(0,resDateStr.length())); if(compared < 0) { resultIndex++; @@ -72,8 +71,7 @@ String hoursOptSelected = ""; String autoOptSelected = ""; -String minResolution = ResultsTimelinePartitionsFactory.getMinResolution( - results.getResults()); +String minResolution = ResultsTimelinePartitionsFactory.getMinResolution(cResults); String optimal = ""; if(minResolution.equals(WaybackConstants.REQUEST_RESOLUTION_HOURS)) { @@ -174,7 +172,7 @@ if(first != null) { titleString = "title=\"" + fmt.format("TimelineView.firstVersionTitle", - results.resultToDate(first)) + "\""; + first.getCaptureDate()) + "\""; %><a wmSpecial="1" href="<%= results.resultToReplayUrl(first) %>"><% } %><img <%= 
titleString %> wmSpecial="1" border=0 width=19 height=20 src="<%= contextRoot %>/images/first.jpg"><% @@ -185,7 +183,7 @@ if(prev != null) { titleString = "title=\"" + fmt.format("TimelineView.prevVersionTitle", - results.resultToDate(prev)) + "\""; + prev.getCaptureDate()) + "\""; %><a wmSpecial="1" href="<%= results.resultToReplayUrl(prev) %>"><% } %><img <%= titleString %> wmSpecial="1" border=0 width=13 height=20 src="<%= contextRoot %>/images/prev.jpg"><% @@ -204,15 +202,15 @@ String prettyDateTime = null; if(numResults == 1) { imageUrl = contextRoot + "/images/mark_one.jpg"; - SearchResult result = (SearchResult) partitionResults.get(0); + CaptureSearchResult result = (CaptureSearchResult) partitionResults.get(0); replayUrl = results.resultToReplayUrl(result); - prettyDateTime = fmt.format("TimelineView.markDateTitle",results.resultToDate(result)); + prettyDateTime = fmt.format("TimelineView.markDateTitle",result.getCaptureDate()); } else if (numResults > 1) { imageUrl = contextRoot + "/images/mark_several.jpg"; - SearchResult result = (SearchResult) partitionResults.get(numResults - 1); + CaptureSearchResult result = (CaptureSearchResult) partitionResults.get(numResults - 1); replayUrl = results.resultToReplayUrl(result); - prettyDateTime = fmt.format("TimelineView.markDateTitle",results.resultToDate(result)); + prettyDateTime = fmt.format("TimelineView.markDateTitle",result.getCaptureDate()); } if((i > 0) && (i < numPartitions)) { @@ -238,7 +236,7 @@ if(next != null) { titleString = "title=\"" + fmt.format("TimelineView.nextVersionTitle", - results.resultToDate(next)) + "\""; + next.getCaptureDate()) + "\""; %><a wmSpecial="1" href="<%= results.resultToReplayUrl(next) %>"><% } %><img wmSpecial="1" <%= titleString %> border=0 width=13 height=20 src="<%= contextRoot %>/images/next.jpg"><% @@ -249,7 +247,7 @@ if(last != null) { titleString = "title=\"" + fmt.format("TimelineView.lastVersionTitle", - results.resultToDate(last)) + "\""; + last.getCaptureDate()) 
+ "\""; %><a wmSpecial="1" href="<%= results.resultToReplayUrl(last) %>"><% } %><img wmSpecial="1" <%= titleString %> border=0 width=19 height=20 src="<%= contextRoot %>/images/last.jpg"><% Modified: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/Timeline.jsp =================================================================== --- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/Timeline.jsp 2008-07-02 00:17:37 UTC (rev 2386) +++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/replay/Timeline.jsp 2008-07-02 00:22:06 UTC (rev 2387) @@ -4,11 +4,12 @@ <%@ page import="java.util.Date" %> <%@ page import="java.text.ParseException" %> <%@ page import="org.archive.wayback.WaybackConstants" %> -<%@ page import="org.archive.wayback.core.SearchResult" %> +<%@ page import="org.archive.wayback.core.CaptureSearchResult" %> +<%@ page import="org.archive.wayback.core.CaptureSearchResults" %> <%@ page import="org.archive.wayback.core.Timestamp" %> <%@ page import="org.archive.wayback.core.UIResults" %> <%@ page import="org.archive.wayback.core.WaybackRequest" %> -<%@ page import="org.archive.wayback.query.UIQueryResults" %> +<%@ page import="org.archive.wayback.replay.UIReplayResult" %> <%@ page import="org.archive.wayback.query.resultspartitioner.ResultsTimelinePartitionsFactory" %> <%@ page import="org.archive.wayback.query.resultspartitioner.ResultsPartition" %> <%@ page import="org.archive.wayback.util.StringFormatter" %> @@ -17,53 +18,51 @@ String contextRoot = request.getScheme() + "://" + request.getServerName() + ":" + request.getServerPort() + request.getContextPath(); -UIQueryResults results = (UIQueryResults) UIResults.getFromRequest(request); +UIReplayResult results = (UIReplayResult) UIResults.getFromRequest(request); StringFormatter fmt = results.getFormatter(); - -Timestamp searchStartTs = results.getStartTimestamp(); -Timestamp searchEndTs = results.getEndTimestamp(); -Timestamp exactTs = 
results.getExactRequestedTimestamp(); -String searchUrl = results.getSearchUrl(); -Date exactDate = exactTs.getDate(); - -String exactDateStr = exactTs.getDateStr(); WaybackRequest wbRequest = results.getWbRequest(); +CaptureSearchResults cResults = results.getResults(); + +String exactDateStr = wbRequest.get(WaybackConstants.REQUEST_DATE); +String searchUrl = wbRequest.get(WaybackConstants.REQUEST_URL); String resolution = wbRequest.get(WaybackConstants.REQUEST_RESOLUTION); +String metaMode = wbRequest.get(WaybackConstants.REQUEST_META_MODE); + +Date exactDate = Timestamp.parseBefore(exactDateStr).getDate(); + + if(resolution == null) { - resolution = WaybackConstants.REQUEST_RESOLUTION_AUTO; + resolution = WaybackConstants.REQUEST_RESOLUTION_AUTO; } -String metaMode = wbRequest.get(WaybackConstants.REQUEST_META_MODE); String metaChecked = ""; if(metaMode != null && metaMode.equals("yes")) { - metaChecked = "checked"; + metaChecked = "checked"; } -String searchString = results.getSearchUrl(); +CaptureSearchResult first = null; +CaptureSearchResult prev = null; +CaptureSearchResult next = null; +CaptureSearchResult last = null; -SearchResult first = null; -SearchResult prev = null; -SearchResult next = null; -SearchResult last = null; - -int resultCount = results.getResultsReturned(); +long resultCount = cResults.getReturnedCount(); int resultIndex = 1; -Iterator<SearchResult> it = results.resultsIterator(); +Iterator<CaptureSearchResult> it = cResults.iterator(); while(it.hasNext()) { - SearchResult res = it.next(); - String resDateStr = res.get(WaybackConstants.RESULT_CAPTURE_DATE); - int compared = resDateStr.compareTo(exactDateStr.substring(0,resDateStr.length())); - if(compared < 0) { - resultIndex++; - prev = res; - if(first == null) { - first = res; - } - } else if(compared > 0) { - last = res; - if(next == null) { - next = res; - } - } + CaptureSearchResult res = it.next(); + String resDateStr = res.getCaptureTimestamp(); + int compared = 
resDateStr.compareTo(exactDateStr.substring(0,resDateStr.length())); + if(compared < 0) { + resultIndex++; + prev = res; + if(first == null) { + first = res; + } + } else if(compared > 0) { + last = res; + if(next == null) { + next = res; + } + } } // string to indicate which select option is currently active String yearsOptSelected = ""; @@ -72,50 +71,49 @@ String hoursOptSelected = ""; String autoOptSelected = ""; -String minResolution = ResultsTimelinePartitionsFactory.getMinResolution( - results.getResults()); +String minResolution = ResultsTimelinePartitionsFactory.getMinResolution(cResults); String optimal = ""; if(minResolution.equals(WaybackConstants.REQUEST_RESOLUTION_HOURS)) { - optimal = fmt.format("TimelineView.timeRange.hours"); + optimal = fmt.format("TimelineView.timeRange.hours"); } else if(minResolution.equals(WaybackConstants.REQUEST_RESOLUTION_DAYS)) { - optimal = fmt.format("TimelineView.timeRange.days"); + optimal = fmt.format("TimelineView.timeRange.days"); } else if(minResolution.equals(WaybackConstants.REQUEST_RESOLUTION_MONTHS)) { - optimal = fmt.format("TimelineView.timeRange.months"); + optimal = fmt.format("TimelineView.timeRange.months"); } else if(minResolution.equals(WaybackConstants.REQUEST_RESOLUTION_TWO_MONTHS)) { - optimal = fmt.format("TimelineView.timeRange.twomonths"); + optimal = fmt.format("TimelineView.timeRange.twomonths"); } else if(minResolution.equals(WaybackConstants.REQUEST_RESOLUTION_YEARS)) { - optimal = fmt.format("TimelineView.timeRange.years"); + optimal = fmt.format("TimelineView.timeRange.years"); } else { - optimal = fmt.format("TimelineView.timeRange.unknown"); + optimal = fmt.format("TimelineView.timeRange.unknown"); } String autoOptString = fmt.format("TimelineView.timeRange.auto",optimal); ArrayList<ResultsPartition> partitions; if(resolution.equals(WaybackConstants.REQUEST_RESOLUTION_HOURS)) { - hoursOptSelected = "selected"; - partitions = ResultsTimelinePartitionsFactory.getHour(results.getResults(), - 
wbRequest); + hoursOptSelected = "selected"; + partitions = ResultsTimelinePartitionsFactory.getHour(results.getResults(), + wbRequest); } else if(resolution.equals(WaybackConstants.REQUEST_RESOLUTION_DAYS)) { - daysOptSelected = "selected"; - partitions = ResultsTimelinePartitionsFactory.getDay(results.getResults(), - wbRequest); + daysOptSelected = "selected"; + partitions = ResultsTimelinePartitionsFactory.getDay(results.getResults(), + wbRequest); } else if(resolution.equals(WaybackConstants.REQUEST_RESOLUTION_MONTHS)) { - monthsOptSelected = "selected"; - partitions = ResultsTimelinePartitionsFactory.getMonth(results.getResults(), - wbRequest); + monthsOptSelected = "selected"; + partitions = ResultsTimelinePartitionsFactory.getMonth(results.getResults(), + wbRequest); } else if(resolution.equals(WaybackConstants.REQUEST_RESOLUTION_TWO_MONTHS)) { - monthsOptSelected = "selected"; - partitions = ResultsTimelinePartitionsFactory.getTwoMonth(results.getResults(), - wbRequest); + monthsOptSelected = "selected"; + partitions = ResultsTimelinePartitionsFactory.getTwoMonth(results.getResults(), + wbRequest); } else if(resolution.equals(WaybackConstants.REQUEST_RESOLUTION_YEARS)) { - yearsOptSelected = "selected"; - partitions = ResultsTimelinePartitionsFactory.getYear(results.getResults(), - wbRequest); + yearsOptSelected = "selected"; + partitions = ResultsTimelinePartitionsFactory.getYear(results.getResults(), + wbRequest); } else { - autoOptSelected = "selected"; - partitions = ResultsTimelinePartitionsFactory.getAuto(results.getResults(), - wbRequest); + autoOptSelected = "selected"; + partitions = ResultsTimelinePartitionsFactory.getAuto(results.getResults(), + wbRequest); } int numPartitions = partitions.size(); ResultsPartition firstP = (ResultsPartition) partitions.get(0); @@ -196,7 +194,7 @@ if(first != null) { titleString = "title=\"" + fmt.format("TimelineView.firstVersionTitle", - results.resultToDate(first)) + "\""; + first.getCaptureDate()) + "\""; %><a 
wmSpecial="1" href="<%= results.resultToReplayUrl(first) %>"><% } %><img <%= titleString %> wmSpecial="1" border=0 width=19 height=20 src="<%= contextRoot %>/images/first.jpg"><% @@ -207,7 +205,7 @@ if(prev != null) { titleString = "title=\"" + fmt.format("TimelineView.prevVersionTitle", - results.resultToDate(prev)) + "\""; + prev.getCaptureDate()) + "\""; %><a wmSpecial="1" href="<%= results.resultToReplayUrl(prev) %>"><% } %><img <%= titleString %> wmSpecial="1" border=0 width=13 height=20 src="<%= contextRoot %>/images/prev.jpg"><% @@ -226,15 +224,15 @@ String prettyDateTime = null; if(numResults == 1) { imageUrl = contextRoot + "/images/mark_one.jpg"; - SearchResult result = (SearchResult) partitionResults.get(0); + CaptureSearchResult result = (CaptureSearchResult) partitionResults.get(0); replayUrl = results.resultToReplayUrl(result); - prettyDateTime = fmt.format("TimelineView.markDateTitle",results.resultToDate(result)); + prettyDateTime = fmt.format("TimelineView.markDateTitle",result.getCaptureDate()); } else if (numResults > 1) { imageUrl = contextRoot + "/images/mark_several.jpg"; - SearchResult result = (SearchResult) partitionResults.get(numResults - 1); + CaptureSearchResult result = (CaptureSearchResult) partitionResults.get(numResults - 1); replayUrl = results.resultToReplayUrl(result); - prettyDateTime = fmt.format("TimelineView.markDateTitle",results.resultToDate(result)); + prettyDateTime = fmt.format("TimelineView.markDateTitle",result.getCaptureDate()); } if((i > 0) && (i < numPartitions)) { @@ -260,7 +258,7 @@ if(next != null) { titleString = "title=\"" + fmt.format("TimelineView.nextVersionTitle", - results.resultToDate(next)) + "\""; + next.getCaptureDate()) + "\""; %><a wmSpecial="1" href="<%= results.resultToReplayUrl(next) %>"><% } %><img wmSpecial="1" <%= titleString %> border=0 width=13 height=20 src="<%= contextRoot %>/images/next.jpg"><% @@ -271,7 +269,7 @@ if(last != null) { titleString = "title=\"" + 
fmt.format("TimelineView.lastVersionTitle", - results.resultToDate(last)) + "\""; + last.getCaptureDate()) + "\""; %><a wmSpecial="1" href="<%= results.resultToReplayUrl(last) %>"><% } %><img wmSpecial="1" <%= titleString %> border=0 width=19 height=20 src="<%= contextRoot %>/images/last.jpg"><% This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
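Editor's note on the Timeline.jsp change above: the loop that picks the first/prev/next/last captures around the requested date can be sketched in isolation as follows. This is a minimal sketch, not the Wayback code: the class name TimelineNeighbors and the use of plain timestamp strings in place of CaptureSearchResult objects are assumptions made for illustration. It also compares over the common prefix of both strings, whereas the JSP only truncates the requested date and appears to assume it is at least as long as each capture timestamp.

```java
import java.util.List;

// Hypothetical helper, not part of Wayback: mirrors the first/prev/next/last
// selection loop from Timeline.jsp using plain timestamp strings.
final class TimelineNeighbors {
    final String first;
    final String prev;
    final String next;
    final String last;
    final int resultIndex; // 1-based position of the requested capture

    private TimelineNeighbors(String first, String prev, String next,
            String last, int resultIndex) {
        this.first = first;
        this.prev = prev;
        this.next = next;
        this.last = last;
        this.resultIndex = resultIndex;
    }

    // Assumes timestamps are sorted ascending, as the JSP loop does.
    static TimelineNeighbors find(List<String> sortedTimestamps, String requested) {
        String first = null;
        String prev = null;
        String next = null;
        String last = null;
        int resultIndex = 1;
        for (String ts : sortedTimestamps) {
            // Compare only the prefix both strings share (assumption: the
            // original truncates just the requested date).
            int n = Math.min(ts.length(), requested.length());
            int compared = ts.substring(0, n).compareTo(requested.substring(0, n));
            if (compared < 0) {
                resultIndex++;
                prev = ts;          // latest capture before the request so far
                if (first == null) {
                    first = ts;     // earliest capture before the request
                }
            } else if (compared > 0) {
                last = ts;          // latest capture after the request so far
                if (next == null) {
                    next = ts;      // earliest capture after the request
                }
            }
        }
        return new TimelineNeighbors(first, prev, next, last, resultIndex);
    }
}
```

A capture whose timestamp equals the requested date is skipped by both branches, so it never appears as its own neighbour.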
From: <bra...@us...> - 2008-07-02 00:17:28
Revision: 2386
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2386&view=rev
Author:   bradtofel
Date:     2008-07-01 17:17:37 -0700 (Tue, 01 Jul 2008)

Log Message:
-----------
MOVED: exception related rendering .jsps to /exception/

Added Paths:
-----------
    trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/exception/
    trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/exception/CSSError.jsp
    trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/exception/HTMLError.jsp
    trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/exception/JavaScriptError.jsp
    trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/exception/XMLError.jsp
    trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/exception/error_image.gif

Removed Paths:
-------------
    trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/CSSError.jsp
    trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/HTMLError.jsp
    trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/JavaScriptError.jsp
    trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/XMLError.jsp
    trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/error_image.gif

Copied: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/exception/CSSError.jsp (from rev 2228, trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/CSSError.jsp)
===================================================================
--- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/exception/CSSError.jsp	(rev 0)
+++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/exception/CSSError.jsp	2008-07-02 00:17:37 UTC (rev 2386)
@@ -0,0 +1,18 @@
+<%@ page language="java" pageEncoding="utf-8" contentType="text/html;charset=utf-8"%>
+<%@ page import="org.archive.wayback.exception.WaybackException" %>
+<%@ page import="org.archive.wayback.core.UIResults" %>
+<%@ page import="org.archive.wayback.util.StringFormatter" %>
+<%
+
+WaybackException e = (WaybackException) request.getAttribute("exception");
+UIResults results = UIResults.getFromRequest(request);
+StringFormatter fmt = results.getFormatter();
+response.setStatus(e.getStatus());
+
+%>
+/* CSS wayback retrieval error:
+
+   Title: <%= fmt.format(e.getTitleKey()) %>
+   Message: <%= fmt.format(e.getMessageKey()) %>
+
+ */

Copied: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/exception/HTMLError.jsp (from rev 2228, trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/HTMLError.jsp)
===================================================================
--- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/exception/HTMLError.jsp	(rev 0)
+++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/exception/HTMLError.jsp	2008-07-02 00:17:37 UTC (rev 2386)
@@ -0,0 +1,19 @@
+<%@ page language="java" pageEncoding="utf-8" contentType="text/html;charset=utf-8"%>
+<%@ page import="org.archive.wayback.exception.WaybackException" %>
+<%@ page import="org.archive.wayback.core.UIResults" %>
+<%@ page import="org.archive.wayback.util.StringFormatter" %>
+<%
+WaybackException e = (WaybackException) request.getAttribute("exception");
+e.setupResponse(response);
+%>
+<jsp:include page="/template/UI-header.jsp" flush="true" />
+<%
+
+UIResults results = UIResults.getFromRequest(request);
+StringFormatter fmt = results.getFormatter();
+
+%>
+
+<h2><%= fmt.format(e.getTitleKey()) %></h2>
+<p><b><%= fmt.format(e.getMessageKey(),e.getMessage()) %></b></p>
+<jsp:include page="/template/UI-footer.jsp" flush="true" />

Copied: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/exception/JavaScriptError.jsp (from rev 2228, trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/JavaScriptError.jsp)
===================================================================
--- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/exception/JavaScriptError.jsp	(rev 0)
+++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/exception/JavaScriptError.jsp	2008-07-02 00:17:37 UTC (rev 2386)
@@ -0,0 +1,16 @@
+<%@ page language="java" pageEncoding="utf-8" contentType="text/html;charset=utf-8"%>
+<%@ page import="org.archive.wayback.exception.WaybackException" %>
+<%@ page import="org.archive.wayback.core.UIResults" %>
+<%@ page import="org.archive.wayback.util.StringFormatter" %>
+<%
+
+WaybackException e = (WaybackException) request.getAttribute("exception");
+UIResults results = UIResults.getFromRequest(request);
+StringFormatter fmt = results.getFormatter();
+response.setStatus(e.getStatus());
+
+%>
+// Javascript wayback retrieval error:
+//
+// Title: <%= fmt.format(e.getTitleKey()) %>
+// Message: <%= fmt.format(e.getMessageKey()) %>

Copied: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/exception/XMLError.jsp (from rev 2228, trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/XMLError.jsp)
===================================================================
--- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/exception/XMLError.jsp	(rev 0)
+++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/exception/XMLError.jsp	2008-07-02 00:17:37 UTC (rev 2386)
@@ -0,0 +1,19 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<%@ page language="java" pageEncoding="utf-8" contentType="text/xml;charset=utf-8"%>
+<%@ page import="org.archive.wayback.exception.WaybackException" %>
+<%@ page import="org.archive.wayback.core.UIResults" %>
+<%@ page import="org.archive.wayback.util.StringFormatter" %>
+<%
+
+WaybackException e = (WaybackException) request.getAttribute("exception");
+UIResults results = UIResults.getFromRequest(request);
+StringFormatter fmt = results.getFormatter();
+//response.setStatus(e.getStatus());
+
+%>
+<wayback>
+	<error>
+		<title><%= UIResults.encodeXMLContent(fmt.format(e.getTitleKey())) %></title>
+		<message><%= UIResults.encodeXMLContent(fmt.format(e.getMessageKey())) %></message>
+	</error>
+</wayback>

Copied: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/exception/error_image.gif (from rev 2055, trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/error_image.gif)
===================================================================
(Binary files differ)

Deleted: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/CSSError.jsp
===================================================================
--- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/CSSError.jsp	2008-07-02 00:16:07 UTC (rev 2385)
+++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/CSSError.jsp	2008-07-02 00:17:37 UTC (rev 2386)
@@ -1,18 +0,0 @@
-<%@ page language="java" pageEncoding="utf-8" contentType="text/html;charset=utf-8"%>
-<%@ page import="org.archive.wayback.exception.WaybackException" %>
-<%@ page import="org.archive.wayback.core.UIResults" %>
-<%@ page import="org.archive.wayback.util.StringFormatter" %>
-<%
-
-WaybackException e = (WaybackException) request.getAttribute("exception");
-UIResults results = UIResults.getFromRequest(request);
-StringFormatter fmt = results.getFormatter();
-response.setStatus(e.getStatus());
-
-%>
-/* CSS wayback retrieval error:
-
-   Title: <%= fmt.format(e.getTitleKey()) %>
-   Message: <%= fmt.format(e.getMessageKey()) %>
-
- */

Deleted: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/HTMLError.jsp
===================================================================
--- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/HTMLError.jsp	2008-07-02 00:16:07 UTC (rev 2385)
+++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/HTMLError.jsp	2008-07-02 00:17:37 UTC (rev 2386)
@@ -1,19 +0,0 @@
-<%@ page language="java" pageEncoding="utf-8" contentType="text/html;charset=utf-8"%>
-<%@ page import="org.archive.wayback.exception.WaybackException" %>
-<%@ page import="org.archive.wayback.core.UIResults" %>
-<%@ page import="org.archive.wayback.util.StringFormatter" %>
-<%
-WaybackException e = (WaybackException) request.getAttribute("exception");
-e.setupResponse(response);
-%>
-<jsp:include page="/template/UI-header.jsp" flush="true" />
-<%
-
-UIResults results = UIResults.getFromRequest(request);
-StringFormatter fmt = results.getFormatter();
-
-%>
-
-<h2><%= fmt.format(e.getTitleKey()) %></h2>
-<p><b><%= fmt.format(e.getMessageKey(),e.getMessage()) %></b></p>
-<jsp:include page="/template/UI-footer.jsp" flush="true" />

Deleted: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/JavaScriptError.jsp
===================================================================
--- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/JavaScriptError.jsp	2008-07-02 00:16:07 UTC (rev 2385)
+++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/JavaScriptError.jsp	2008-07-02 00:17:37 UTC (rev 2386)
@@ -1,16 +0,0 @@
-<%@ page language="java" pageEncoding="utf-8" contentType="text/html;charset=utf-8"%>
-<%@ page import="org.archive.wayback.exception.WaybackException" %>
-<%@ page import="org.archive.wayback.core.UIResults" %>
-<%@ page import="org.archive.wayback.util.StringFormatter" %>
-<%
-
-WaybackException e = (WaybackException) request.getAttribute("exception");
-UIResults results = UIResults.getFromRequest(request);
-StringFormatter fmt = results.getFormatter();
-response.setStatus(e.getStatus());
-
-%>
-// Javascript wayback retrieval error:
-//
-// Title: <%= fmt.format(e.getTitleKey()) %>
-// Message: <%= fmt.format(e.getMessageKey()) %>

Deleted: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/XMLError.jsp
===================================================================
--- trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/XMLError.jsp	2008-07-02 00:16:07 UTC (rev 2385)
+++ trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/XMLError.jsp	2008-07-02 00:17:37 UTC (rev 2386)
@@ -1,19 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<%@ page language="java" pageEncoding="utf-8" contentType="text/xml;charset=utf-8"%>
-<%@ page import="org.archive.wayback.exception.WaybackException" %>
-<%@ page import="org.archive.wayback.core.UIResults" %>
-<%@ page import="org.archive.wayback.util.StringFormatter" %>
-<%
-
-WaybackException e = (WaybackException) request.getAttribute("exception");
-UIResults results = UIResults.getFromRequest(request);
-StringFormatter fmt = results.getFormatter();
-//response.setStatus(e.getStatus());
-
-%>
-<wayback>
-	<error>
-		<title><%= UIResults.encodeXMLContent(fmt.format(e.getTitleKey())) %></title>
-		<message><%= UIResults.encodeXMLContent(fmt.format(e.getMessageKey())) %></message>
-	</error>
-</wayback>

Deleted: trunk/archive-access/projects/wayback/wayback-webapp/src/main/webapp/jsp/error_image.gif
===================================================================
(Binary files differ)
Revision: 2385
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2385&view=rev
Author:   bradtofel
Date:     2008-07-01 17:16:07 -0700 (Tue, 01 Jul 2008)

Log Message:
-----------
REFACTOR: SearchResult => (Url|Capture)SearchResult

Modified Paths:
--------------
    trunk/archive-access/projects/wayback/wayback-mapreduce-prereq/src/main/java/org/archive/wayback/resourceindex/indexer/hadoop/Driver.java

Modified: trunk/archive-access/projects/wayback/wayback-mapreduce-prereq/src/main/java/org/archive/wayback/resourceindex/indexer/hadoop/Driver.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-mapreduce-prereq/src/main/java/org/archive/wayback/resourceindex/indexer/hadoop/Driver.java	2008-07-02 00:15:22 UTC (rev 2384)
+++ trunk/archive-access/projects/wayback/wayback-mapreduce-prereq/src/main/java/org/archive/wayback/resourceindex/indexer/hadoop/Driver.java	2008-07-02 00:16:07 UTC (rev 2385)
@@ -24,8 +24,8 @@
 import org.archive.io.arc.ARCRecord;
 import org.archive.mapred.ARCMapRunner;
 import org.archive.mapred.ARCRecordMapper;
-import org.archive.wayback.core.SearchResult;
-import org.archive.wayback.resourcestore.ARCRecordToSearchResultAdapter;
+import org.archive.wayback.core.CaptureSearchResult;
+import org.archive.wayback.resourcestore.indexer.ARCRecordToSearchResultAdapter;
 import org.archive.wayback.resourceindex.cdx.SearchResultToCDXLineAdapter;
 
 /**
@@ -58,7 +58,7 @@
 			ObjectWritable ow = (ObjectWritable) value;
 			ARCRecord rec = (ARCRecord) ow.get();
 			String line;
-			SearchResult result = ARtoSR.adapt(rec);
+			CaptureSearchResult result = ARtoSR.adapt(rec);
 			if(result != null) {
 				line = SRtoCDX.adapt(result);
 				if(line != null) {
Revision: 2384
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2384&view=rev
Author:   bradtofel
Date:     2008-07-01 17:15:22 -0700 (Tue, 01 Jul 2008)

Log Message:
-----------
REMOVED: no longer needed with new simplified ReplayDispatcher interface.

Removed Paths:
-------------
    trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/BaseReplayDispatcher.java

Deleted: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/BaseReplayDispatcher.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/BaseReplayDispatcher.java	2008-07-01 23:56:58 UTC (rev 2383)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/BaseReplayDispatcher.java	2008-07-02 00:15:22 UTC (rev 2384)
@@ -1,210 +0,0 @@
-/* ReplayRendererDispatcher
- *
- * $Id$
- *
- * Created on 5:23:35 PM Aug 8, 2007.
- *
- * Copyright (C) 2007 Internet Archive.
- *
- * This file is part of wayback-core.
- *
- * wayback-core is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or
- * any later version.
- *
- * wayback-core is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU Lesser Public License for more details.
- *
- * You should have received a copy of the GNU Lesser Public License
- * along with wayback-core; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
- */
-package org.archive.wayback.replay;
-
-import java.io.IOException;
-import java.util.regex.Matcher;
-import java.util.regex.Pattern;
-
-import javax.servlet.RequestDispatcher;
-import javax.servlet.ServletException;
-import javax.servlet.http.HttpServletRequest;
-import javax.servlet.http.HttpServletResponse;
-
-import org.archive.wayback.ReplayDispatcher;
-import org.archive.wayback.ReplayRenderer;
-import org.archive.wayback.ResultURIConverter;
-import org.archive.wayback.WaybackConstants;
-import org.archive.wayback.core.Resource;
-import org.archive.wayback.core.SearchResult;
-import org.archive.wayback.core.SearchResults;
-import org.archive.wayback.core.UIResults;
-import org.archive.wayback.core.WaybackRequest;
-import org.archive.wayback.exception.WaybackException;
-
-/**
- *
- *
- * @author brad
- * @version $Date$, $Revision$
- */
-public abstract class BaseReplayDispatcher implements ReplayDispatcher {
-
-	private String errorJsp = "/jsp/HTMLError.jsp";
-	private String imageErrorJsp = "/jsp/HTMLError.jsp";
-	private String javascriptErrorJsp = "/jsp/JavaScriptError.jsp";
-	private String cssErrorJsp = "/jsp/CSSError.jsp";
-
-	protected final Pattern IMAGE_REGEX = Pattern
-			.compile(".*\\.(jpg|jpeg|gif|png|bmp|tiff|tif)$");
-
-	/* ERROR HANDLING RESPONSES: */
-
-	private boolean requestIsEmbedded(HttpServletRequest httpRequest,
-			WaybackRequest wbRequest) {
-		// without a wbRequest, assume it is not embedded: send back HTML
-		if (wbRequest == null) {
-			return false;
-		}
-		String referer = wbRequest.get(WaybackConstants.REQUEST_REFERER_URL);
-		return (referer != null && referer.length() > 0);
-	}
-
-	private boolean requestIsImage(HttpServletRequest httpRequest,
-			WaybackRequest wbRequest) {
-		String requestUrl = wbRequest.get(WaybackConstants.REQUEST_URL);
-		if (requestUrl == null)
-			return false;
-		Matcher matcher = IMAGE_REGEX.matcher(requestUrl);
-		return (matcher != null && matcher.matches());
-	}
-
-	private boolean requestIsJavascript(HttpServletRequest httpRequest,
-			WaybackRequest wbRequest) {
-
-		String requestUrl = wbRequest.get(WaybackConstants.REQUEST_URL);
-		return (requestUrl != null) && requestUrl.endsWith(".js");
-	}
-
-	private boolean requestIsCSS(HttpServletRequest httpRequest,
-			WaybackRequest wbRequest) {
-
-		String requestUrl = wbRequest.get(WaybackConstants.REQUEST_URL);
-		return (requestUrl != null) && requestUrl.endsWith(".css");
-	}
-
-	/*
-	 * (non-Javadoc)
-	 *
-	 * @see org.archive.wayback.ReplayRenderer#renderException(javax.servlet.http.HttpServletRequest,
-	 *      javax.servlet.http.HttpServletResponse,
-	 *      org.archive.wayback.core.WaybackRequest,
-	 *      org.archive.wayback.exception.WaybackException)
-	 */
-	public void renderException(HttpServletRequest httpRequest,
-			HttpServletResponse httpResponse, WaybackRequest wbRequest,
-			WaybackException exception) throws ServletException, IOException {
-
-		// the "standard HTML" response handler:
-		String finalJspPath = errorJsp;
-
-		// try to not cause client errors by sending the HTML response if
-		// this request is ebedded, and is obviously one of the special types:
-		if (requestIsEmbedded(httpRequest, wbRequest)) {
-
-			if (requestIsJavascript(httpRequest, wbRequest)) {
-
-				finalJspPath = javascriptErrorJsp;
-
-			} else if (requestIsCSS(httpRequest, wbRequest)) {
-
-				finalJspPath = cssErrorJsp;
-
-			} else if (requestIsImage(httpRequest, wbRequest)) {
-
-				finalJspPath = imageErrorJsp;
-
-			}
-		}
-
-		httpRequest.setAttribute("exception", exception);
-		UIResults uiResults = new UIResults(wbRequest);
-		uiResults.storeInRequest(httpRequest, finalJspPath);
-
-		RequestDispatcher dispatcher = httpRequest
-				.getRequestDispatcher(finalJspPath);
-		if(dispatcher == null) {
-			throw new ServletException("Null dispatcher for " + finalJspPath);
-		}
-		dispatcher.forward(httpRequest, httpResponse);
-	}
-
-	/**
-	 * @param wbRequest
-	 * @param result
-	 * @param resource
-	 * @return the correct ReplayRenderer for the Resource
-	 */
-	public abstract ReplayRenderer getRenderer(WaybackRequest wbRequest,
-			SearchResult result, Resource resource);
-
-	/*
-	 * (non-Javadoc)
-	 *
-	 * @see org.archive.wayback.ReplayRenderer#renderResource(javax.servlet.http.HttpServletRequest,
-	 *      javax.servlet.http.HttpServletResponse,
-	 *      org.archive.wayback.core.WaybackRequest,
-	 *      org.archive.wayback.core.SearchResult,
-	 *      org.archive.wayback.core.Resource,
-	 *      org.archive.wayback.ResultURIConverter,
-	 *      org.archive.wayback.core.SearchResults)
-	 */
-	public void renderResource(HttpServletRequest httpRequest,
-			HttpServletResponse httpResponse, WaybackRequest wbRequest,
-			SearchResult result, Resource resource,
-			ResultURIConverter uriConverter, SearchResults results)
-			throws ServletException, IOException {
-
-		ReplayRenderer renderer = getRenderer(wbRequest, result, resource);
-		try {
-			renderer.renderResource(httpRequest, httpResponse, wbRequest, result,
-					resource, uriConverter, results);
-		} catch (WaybackException e) {
-			renderException(httpRequest, httpResponse, wbRequest, e);
-		}
-	}
-
-	public String getErrorJsp() {
-		return errorJsp;
-	}
-
-	public void setErrorJsp(String errorJsp) {
-		this.errorJsp = errorJsp;
-	}
-
-	public String getImageErrorJsp() {
-		return imageErrorJsp;
-	}
-
-	public void setImageErrorJsp(String imageErrorJsp) {
-		this.imageErrorJsp = imageErrorJsp;
-	}
-
-	public String getJavascriptErrorJsp() {
-		return javascriptErrorJsp;
-	}
-
-	public void setJavascriptErrorJsp(String javascriptErrorJsp) {
-		this.javascriptErrorJsp = javascriptErrorJsp;
-	}
-
-	public String getCssErrorJsp() {
-		return cssErrorJsp;
-	}
-
-	public void setCssErrorJsp(String cssErrorJsp) {
-		this.cssErrorJsp = cssErrorJsp;
-	}
-}
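Editor's note on the removed BaseReplayDispatcher above: its renderException() picked an error JSP from the request URL and the presence of a referer. A minimal standalone sketch of that decision follows. The class name ErrorPageChooser and the two plain-string parameters are assumptions for illustration (the original read both values from a WaybackRequest), and the paths are the /jsp/ defaults from the deleted class, not the /exception/ locations introduced in rev 2386.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Wayback: reproduces the error-page
// choice made by the removed BaseReplayDispatcher.renderException().
final class ErrorPageChooser {
    // Default paths copied from the deleted class.
    static final String HTML_ERROR = "/jsp/HTMLError.jsp";
    static final String JS_ERROR = "/jsp/JavaScriptError.jsp";
    static final String CSS_ERROR = "/jsp/CSSError.jsp";
    // The deleted class pointed image errors at the HTML page by default.
    static final String IMAGE_ERROR = "/jsp/HTMLError.jsp";

    private static final Pattern IMAGE_REGEX =
            Pattern.compile(".*\\.(jpg|jpeg|gif|png|bmp|tiff|tif)$");

    static String choose(String requestUrl, String refererUrl) {
        // A request counts as embedded only when it carries a referer.
        boolean embedded = refererUrl != null && refererUrl.length() > 0;
        if (!embedded || requestUrl == null) {
            return HTML_ERROR;
        }
        if (requestUrl.endsWith(".js")) {
            return JS_ERROR;
        }
        if (requestUrl.endsWith(".css")) {
            return CSS_ERROR;
        }
        if (IMAGE_REGEX.matcher(requestUrl).matches()) {
            return IMAGE_ERROR;
        }
        return HTML_ERROR;
    }
}
```

The point of the special cases is that an error returned for an embedded script or stylesheet stays syntactically valid (a comment) instead of injecting HTML into the page that requested it.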
Revision: 2383
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2383&view=rev
Author:   bradtofel
Date:     2008-07-01 16:56:58 -0700 (Tue, 01 Jul 2008)

Log Message:
-----------
REFACTOR: replaced with adapter.

Removed Paths:
-------------
    trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/filters/CaptureToUrlResultFilter.java

Deleted: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/filters/CaptureToUrlResultFilter.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/filters/CaptureToUrlResultFilter.java	2008-07-01 23:56:23 UTC (rev 2382)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourceindex/filters/CaptureToUrlResultFilter.java	2008-07-01 23:56:58 UTC (rev 2383)
@@ -1,117 +0,0 @@
-/* CaptureToUrlResultFilter
- *
- * $Id$
- *
- * Created on 6:23:07 PM Apr 19, 2007.
- *
- * Copyright (C) 2007 Internet Archive.
- *
- * This file is part of wayback-core.
- *
- * wayback-core is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or
- * any later version.
- *
- * wayback-core is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU Lesser Public License for more details.
- *
- * You should have received a copy of the GNU Lesser Public License
- * along with wayback-core; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
- */
-package org.archive.wayback.resourceindex.filters;
-
-import java.util.HashMap;
-import java.util.Properties;
-
-import org.archive.wayback.WaybackConstants;
-import org.archive.wayback.core.SearchResult;
-import org.archive.wayback.util.ObjectFilter;
-
-/**
- *
- *
- * @author brad
- * @version $Date$, $Revision$
- */
-public class CaptureToUrlResultFilter implements ObjectFilter<SearchResult> {
-	private String currentUrl;
-	private String firstCapture;
-	private String lastCapture;
-	private int numCaptures;
-	private HashMap<String,Object> digests;
-	private SearchResult resultRef = null;
-
-	/**
-	 *
-	 */
-	public final static String RESULT_URL = "result.url";
-	/**
-	 *
-	 */
-	public final static String RESULT_FIRST_CAPTURE = "result.firstcapture";
-	/**
-	 *
-	 */
-	public final static String RESULT_LAST_CAPTURE = "result.lastcapture";
-	/**
-	 *
-	 */
-	public final static String RESULT_NUM_CAPTURES = "result.numcaptures";
-	/**
-	 *
-	 */
-	public final static String RESULT_NUM_VERSIONS = "result.numversions";
-	/**
-	 *
-	 */
-	public final static String RESULT_ORIGINAL_URL = "result.originalurl";
-
-	private void fungeSearchResult(SearchResult result) {
-		String originalUrl = result.get(WaybackConstants.RESULT_URL);
-		currentUrl = result.get(WaybackConstants.RESULT_URL_KEY);
-		firstCapture = result.get(WaybackConstants.RESULT_CAPTURE_DATE);
-		lastCapture = result.get(WaybackConstants.RESULT_CAPTURE_DATE);
-		digests = new HashMap<String,Object>();
-		digests.put(result.get(WaybackConstants.RESULT_MD5_DIGEST),null);
-		numCaptures = 1;
-
-		Properties p = result.getData();
-		p.clear();
-		resultRef = result;
-		resultRef.put(RESULT_ORIGINAL_URL,originalUrl);
-		resultRef.put(RESULT_URL,currentUrl);
-		resultRef.put(RESULT_FIRST_CAPTURE,firstCapture);
-		resultRef.put(RESULT_LAST_CAPTURE,lastCapture);
-		resultRef.put(RESULT_NUM_CAPTURES,"1");
-		resultRef.put(RESULT_NUM_VERSIONS,"1");
-	}
-
-	public int filterObject(SearchResult r) {
-		String urlKey = r.get(WaybackConstants.RESULT_URL_KEY);
-		if(resultRef == null || !currentUrl.equals(urlKey)) {
-			fungeSearchResult(r);
-			return FILTER_INCLUDE;
-		}
-
-		// same url -- accumulate:
-		String captureDate = r.get(WaybackConstants.RESULT_CAPTURE_DATE);
-		if(captureDate.compareTo(firstCapture) < 0) {
-			firstCapture = captureDate;
-			resultRef.put(RESULT_FIRST_CAPTURE,firstCapture);
-		}
-		if(captureDate.compareTo(lastCapture) > 0) {
-			lastCapture = captureDate;
-			resultRef.put(RESULT_LAST_CAPTURE,lastCapture);
-		}
-		numCaptures++;
-		digests.put(r.get(WaybackConstants.RESULT_MD5_DIGEST), null);
-		resultRef.put(RESULT_NUM_CAPTURES,String.valueOf(numCaptures));
-		resultRef.put(RESULT_NUM_VERSIONS,String.valueOf(digests.size()));
-		return FILTER_EXCLUDE;
-	}
-
-}
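Editor's note on the removed CaptureToUrlResultFilter above: it folded a run of captures sharing one URL key into a single per-URL record (first capture, last capture, capture count, distinct digests). The log message says only that it was "replaced with adapter"; the replacement is not shown in this message. The sketch below illustrates the accumulation idea with plain value classes. UrlSummarySketch, Capture and UrlSummary are hypothetical names, and the code assumes, as the removed filter did, that captures for the same URL key arrive adjacent to one another.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the Wayback adapter: collapses adjacent captures
// with the same URL key into one summary per URL.
final class UrlSummarySketch {

    static final class Capture {
        final String urlKey;
        final String captureDate; // sortable timestamp string
        final String digest;

        Capture(String urlKey, String captureDate, String digest) {
            this.urlKey = urlKey;
            this.captureDate = captureDate;
            this.digest = digest;
        }
    }

    static final class UrlSummary {
        final String urlKey;
        String firstCapture;
        String lastCapture;
        int numCaptures;
        final Set<String> digests = new HashSet<String>();

        UrlSummary(Capture c) {
            urlKey = c.urlKey;
            firstCapture = c.captureDate;
            lastCapture = c.captureDate;
            numCaptures = 1;
            digests.add(c.digest);
        }

        void add(Capture c) {
            if (c.captureDate.compareTo(firstCapture) < 0) {
                firstCapture = c.captureDate;
            }
            if (c.captureDate.compareTo(lastCapture) > 0) {
                lastCapture = c.captureDate;
            }
            numCaptures++;
            digests.add(c.digest);
        }

        // Distinct digests stand in for distinct versions of the document.
        int numVersions() {
            return digests.size();
        }
    }

    static List<UrlSummary> summarize(List<Capture> captures) {
        List<UrlSummary> out = new ArrayList<UrlSummary>();
        UrlSummary current = null;
        for (Capture c : captures) {
            if (current == null || !current.urlKey.equals(c.urlKey)) {
                current = new UrlSummary(c); // new URL key starts a new record
                out.add(current);
            } else {
                current.add(c);              // same URL key: accumulate
            }
        }
        return out;
    }
}
```

Unlike the removed filter, this version builds separate summary objects instead of overwriting the first SearchResult of each run in place.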
Revision: 2382
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2382&view=rev
Author:   bradtofel
Date:     2008-07-01 16:56:23 -0700 (Tue, 01 Jul 2008)

Log Message:
-----------
REFACTOR: SearchResult => (Url|Capture)SearchResult

Modified Paths:
--------------
    trunk/archive-access/projects/wayback/wayback-core/src/test/java/org/archive/wayback/accesscontrol/staticmap/StaticMapExclusionFilterTest.java

Modified: trunk/archive-access/projects/wayback/wayback-core/src/test/java/org/archive/wayback/accesscontrol/staticmap/StaticMapExclusionFilterTest.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/test/java/org/archive/wayback/accesscontrol/staticmap/StaticMapExclusionFilterTest.java	2008-07-01 23:56:08 UTC (rev 2381)
+++ trunk/archive-access/projects/wayback/wayback-core/src/test/java/org/archive/wayback/accesscontrol/staticmap/StaticMapExclusionFilterTest.java	2008-07-01 23:56:23 UTC (rev 2382)
@@ -29,8 +29,7 @@
 import java.io.IOException;
 import java.util.Map;
 
-import org.archive.wayback.WaybackConstants;
-import org.archive.wayback.core.SearchResult;
+import org.archive.wayback.core.CaptureSearchResult;
 import org.archive.wayback.util.ObjectFilter;
 
 import junit.framework.TestCase;
@@ -72,21 +71,21 @@
 		String bases[] = {"http://www.peagreenboat.com/",
 				"http://peagreenboat.com/"};
 //		setTmpContents(bases);
-		ObjectFilter<SearchResult> filter = getFilter(bases);
-		assertTrue("unmassaged",isBlocked(filter,"www.peagreenboat.com"));
-		assertTrue("unmassaged",isBlocked(filter,"peagreenboat.com"));
-		assertFalse("other1",isBlocked(filter,"peagreenboatt.com"));
-		assertFalse("other2",isBlocked(filter,"peagreenboat.org"));
-		assertFalse("other3",isBlocked(filter,"www.peagreenboat.org"));
+		ObjectFilter<CaptureSearchResult> filter = getFilter(bases);
+		assertTrue("unmassaged",isBlocked(filter,"http://www.peagreenboat.com"));
+		assertTrue("unmassaged",isBlocked(filter,"http://peagreenboat.com"));
+		assertFalse("other1",isBlocked(filter,"http://peagreenboatt.com"));
+		assertFalse("other2",isBlocked(filter,"http://peagreenboat.org"));
+		assertFalse("other3",isBlocked(filter,"http://www.peagreenboat.org"));
 		// there is a problem with the SURTTokenizer... deal with ports!
-//		assertFalse("other4",isBlocked(filter,"www.peagreenboat.com:8080"));
-		assertTrue("subpath",isBlocked(filter,"www.peagreenboat.com/foo"));
-		assertTrue("emptypath",isBlocked(filter,"www.peagreenboat.com/"));
+//		assertFalse("other4",isBlocked(filter,"http://www.peagreenboat.com:8080"));
+		assertTrue("subpath",isBlocked(filter,"http://www.peagreenboat.com/foo"));
+		assertTrue("emptypath",isBlocked(filter,"http://www.peagreenboat.com/"));
 	}
 
-	private boolean isBlocked(ObjectFilter<SearchResult> filter, String url) {
-		SearchResult result = new SearchResult();
-		result.put(WaybackConstants.RESULT_URL,url);
+	private boolean isBlocked(ObjectFilter<CaptureSearchResult> filter, String url) {
+		CaptureSearchResult result = new CaptureSearchResult();
+		result.setOriginalUrl(url);
 		int filterResult = filter.filterObject(result);
 		if(filterResult == ObjectFilter.FILTER_EXCLUDE) {
 			return true;
@@ -94,7 +93,7 @@
 		return false;
 	}
 
-	private ObjectFilter<SearchResult> getFilter(String lines[])
+	private ObjectFilter<CaptureSearchResult> getFilter(String lines[])
 		throws IOException {
 
 		setTmpContents(lines);
From: <bra...@us...> - 2008-07-01 23:55:59
|
Revision: 2381
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2381&view=rev
Author:   bradtofel
Date:     2008-07-01 16:56:08 -0700 (Tue, 01 Jul 2008)

Log Message:
-----------
REFACTOR: SearchResult => (Url|Capture)SearchResult

Modified Paths:
--------------
    trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/webapp/AccessPoint.java

Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/webapp/AccessPoint.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/webapp/AccessPoint.java 2008-07-01 23:55:46 UTC (rev 2380)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/webapp/AccessPoint.java 2008-07-01 23:56:08 UTC (rev 2381)
@@ -41,11 +41,12 @@
 import org.archive.wayback.ResultURIConverter;
 import org.archive.wayback.WaybackConstants;
 import org.archive.wayback.accesscontrol.ExclusionFilterFactory;
+import org.archive.wayback.core.CaptureSearchResult;
 import org.archive.wayback.core.CaptureSearchResults;
 import org.archive.wayback.core.Resource;
-import org.archive.wayback.core.SearchResult;
 import org.archive.wayback.core.SearchResults;
 import org.archive.wayback.core.UIResults;
+import org.archive.wayback.core.UrlSearchResults;
 import org.archive.wayback.core.WaybackRequest;
 import org.archive.wayback.exception.AuthenticationControlException;
 import org.archive.wayback.exception.BaseExceptionRenderer;
@@ -230,7 +231,7 @@
         WaybackRequest wbRequest = new WaybackRequest();
         wbRequest.setContextPrefix(getAbsoluteLocalPrefix(httpRequest));
         wbRequest.setContext(this);
-        UIResults uiResults = new UIResults(wbRequest);
+        UIResults uiResults = new UIResults(wbRequest,uriConverter);
         String translated = "/" + translateRequestPathQuery(httpRequest);
         uiResults.storeInRequest(httpRequest,translated);
         RequestDispatcher dispatcher = null;
@@ -310,7 +311,7 @@
             CaptureSearchResults captureResults = (CaptureSearchResults) results;
 
             // TODO: check which versions are actually accessible right now?
-            SearchResult closest = captureResults.getClosest(wbRequest);
+            CaptureSearchResult closest = captureResults.getClosest(wbRequest);
             resource = collection.getResourceStore().retrieveResource(closest);
             ReplayRenderer renderer = replay.getRenderer(wbRequest, closest, resource);
             renderer.renderResource(httpRequest, httpResponse, wbRequest,
@@ -327,18 +328,19 @@
     throws ServletException, IOException, WaybackException {
 
         SearchResults results = collection.getResourceIndex().query(wbRequest);
-        if(results.getResultsType().equals(
-                WaybackConstants.RESULTS_TYPE_CAPTURE)) {
+        if(results instanceof CaptureSearchResults) {
             CaptureSearchResults cResults = (CaptureSearchResults) results;
-            SearchResult closest = cResults.getClosest(wbRequest);
-            closest.put(WaybackConstants.RESULT_CLOSEST_INDICATOR,
-                    WaybackConstants.RESULT_CLOSEST_VALUE);
+            CaptureSearchResult closest = cResults.getClosest(wbRequest);
+            closest.setClosest(true);
+            query.renderCaptureResults(httpRequest,httpResponse,wbRequest,
+                    cResults,uriConverter);
+
+        } else if(results instanceof UrlSearchResults) {
+            UrlSearchResults uResults = (UrlSearchResults) results;
             query.renderUrlResults(httpRequest,httpResponse,wbRequest,
-                    results,uriConverter);
-
+                    uResults,uriConverter);
         } else {
-            query.renderUrlPrefixResults(httpRequest,httpResponse,wbRequest,
-                    results,uriConverter);
+            throw new WaybackException("Unknown index format");
         }
     }

|