From: Brad T. <br...@ar...> - 2007-01-13 00:29:02
|
This note is to announce the release of Wayback 0.8.0. It's available for download from SourceForge at http://sourceforge.net/project/showfiles.php?group_id=118427. Wayback 0.8.0 includes numerous bug fixes, improves character-detection reliability, and introduces a new ResourceIndex implementation using sorted CDX flat files, which allows far larger indexes to be used with the Wayback software. This new version also includes several new command-line tools for creating, maintaining, and transitioning between BDB and CDX indexes. The site documentation has also been significantly revised. This new version requires significant changes to the web.xml file -- the recommended transition strategy is to start with the new default web.xml and repeat any customizations made in previous versions. It also requires a new format of BDB data, which will need to be recreated. Yours, Internet Archive Webteam |
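The sorted CDX flat files mentioned in the announcement make large indexes practical because a lookup over sorted lines needs only binary search, not an in-memory database. A minimal sketch of the idea; the four-field line layout and the `lookup` helper are invented for illustration (real CDX files carry more fields):

```python
import bisect

# Toy CDX lines: sort key (canonicalized URL), capture timestamp,
# ARC file name, byte offset. Real CDX files carry more fields.
cdx_lines = sorted([
    "example.com/ 20061213014500 arcfile-1.arc 1024",
    "example.com/about 20061001120000 arcfile-1.arc 2048",
    "example.org/ 20060601080000 arcfile-2.arc 512",
])

def lookup(url_key):
    """Binary-search the sorted lines for all captures of url_key."""
    lo = bisect.bisect_left(cdx_lines, url_key + " ")
    hi = bisect.bisect_right(cdx_lines, url_key + " \x7f")
    return cdx_lines[lo:hi]

print(lookup("example.com/"))
```

On disk the same trick works by seeking into the sorted file, which is why a CDX index can grow far beyond what a BDB index comfortably holds.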
|
From: Michael S. <st...@ar...> - 2006-12-26 17:48:55
|
Artem Antonov wrote: > Hello, > > I'm a novice at the NutchWax. > I'm using the lateset version from the Sourceforge (0.8.0 Release). > > Please, could you give me a hint how I can parse ARC file from my Java > application using the NutchWax. Here is a pointer to the code that does parse of ARCs in NutchWAX: http://archive-access.sourceforge.net/projects/nutch/xref/org/archive/access/nutch/ImportArcs.html#434. To obtain an ARCReader, use the ArchiveReaderFactory: http://crawler.archive.org/apidocs/org/archive/io/ArchiveReaderFactory.html. Does this answer your question? St.Ack > > Thanks. > > Regards, > Artem Antonov. > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > ------------------------------------------------------------------------ > > ------------------------------------------------------------------------- > Take Surveys. Earn Cash. Influence the Future of IT > Join SourceForge.net's Techsay panel and you'll get the chance to share your > opinions on IT & business topics through brief surveys - and earn cash > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV > ------------------------------------------------------------------------ > > _______________________________________________ > Archive-access-discuss mailing list > Arc...@li... > https://lists.sourceforge.net/lists/listinfo/archive-access-discuss > |
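For a sense of what ImportArcs and ArchiveReaderFactory are dealing with: an ARC v1 record begins with a single header line of five space-separated fields (URL, IP address, 14-digit capture date, MIME type, content length in bytes), followed by that many bytes of content. A toy header parser, independent of the org.archive classes; the function name and dict layout are just for illustration:

```python
def parse_arc_header(line):
    """Split an ARC v1 record header line into its five fields."""
    url, ip, date, mime, length = line.strip().split(" ")
    return {"url": url, "ip": ip, "date": date,
            "mime": mime, "length": int(length)}

hdr = parse_arc_header(
    "http://example.com/ 192.0.2.1 20061225121836 text/html 1234")
# The next hdr["length"] bytes of the file are the record body.
print(hdr["mime"], hdr["length"])
```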
|
From: Artem A. <ant...@ya...> - 2006-12-25 12:18:36
|
Hello, I'm a novice at the NutchWax. I'm using the latest version from SourceForge (0.8.0 Release). Please, could you give me a hint on how I can parse an ARC file from my Java application using the NutchWax. Thanks. Regards, Artem Antonov. |
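Artem's message, like several others on this list, was transmitted in quoted-printable encoding; raw archive dumps therefore show `=0A` escapes (newlines) and trailing `=` soft line breaks. Python's stdlib `quopri` module undoes both:

```python
import quopri

# A quoted-printable fragment: "=0A" encodes a newline and a trailing
# "=" before a line break is a soft wrap that should be removed.
raw = b"Do You Yahoo!?=0ATired of spam? Yahoo! Mail has the best spam =\nprotection around"
decoded = quopri.decodestring(raw).decode("utf-8")
print(decoded)
```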
|
From: Michael S. <st...@ar...> - 2006-12-20 20:01:47
|
Hey Kaisa. The script is built into the nutchwax.jar, but it's a bug that it's not found when you run in standalone mode (I'm guessing this is what you're doing, since if you run distributed -- or even pseudo-distributed -- the parse-pdf.sh script is found). As a workaround, you can download the script from http://archive-access.cvs.sourceforge.net/*checkout*/archive-access/archive-access/projects/nutch/src/plugin/parse-waxext/bin/parse-pdf.sh?content-type=text%2Fplain (or unjar the jar and get it from there) and put it where it can be found by the indexing job -- such as under a 'bin' directory in your current working directory (the latter being wherever you launched the indexing from) -- or you can try running in pseudo-distributed mode. I should fix this issue, but let's have 0.8 stew for a bit and see if any other issues show up before I spend time on a new release. Thanks Kaisa. St.Ack

Kaisa Kaunonen wrote: > Thanks for the new nutchwax release 0.8.0 > > I haven't yet studied it deeper, only test-indexed one collection. I had a problem with pdf files because a script 'parse-pdf' is missing. I can't find it in nutchwax-0.8.0/bin. Yes, I have xpdf installed in the path, but I guess this script is needed to launch it? > > Quote from logs => 'External command /bin/bash ./bin/parse-pdf.sh failed with error: /bin/bash: ./bin/parse-pdf.sh: No such file or directory..' > > Otherwise, it's very useful to now have incremental indexing and multiple collections in a single index. > > Best, > Kaisa > > ---------- Forwarded message ---------- > Date: Tue, 12 Dec 2006 17:45:20 -0800 > From: Michael Stack <st...@ar...> > To: arc...@li... > Subject: [Archive-access-discuss] [ANN] nutchwax-0.8.0 released > > This note is to announce release of NutchWAX 0.8.0. It's available for download from sourceforge at http://sourceforge.net/project/showfiles.php?group_id=118427&package_id=128933&release_id=470852. NutchWAX 0.8.0 is built against Nutch 0.8.1, released 09/24/2006. A version of this software was recently used to make an index of greater than 400 million documents. See Release Notes [http://archive-access.sourceforge.net/projects/nutch/articles/releasenotes.html] for significant changes and fixes since NutchWAX 0.6.0. The site documentation has also been significantly revised. > > Yours, > Internet Archive Webteam |
|
From: Kaisa K. <kau...@cc...> - 2006-12-20 11:02:12
|
Thanks for the new nutchwax release 0.8.0 I haven't yet studied it deeper, only test-indexed one collection. I had a problem with pdf files because a script 'parse-pdf' is missing. I can't find it in nutchwax-0.8.0/bin Yes, I have xpdf installed in path but I guess this script is needed to launch it? Quote from logs => 'External command /bin/bash ./bin/parse-pdf.sh failed with error: /bin/bash: ./bin/parse-pdf.sh: No such file or directory..' Otherwise, it's very useful to now have incremental indexing and multiple collections in a single index. Best, Kaisa ---------- Forwarded message ---------- Date: Tue, 12 Dec 2006 17:45:20 -0800 From: Michael Stack <st...@ar...> To: arc...@li... Subject: [Archive-access-discuss] [ANN] nutchwax-0.8.0 released This note is to announce release of NutchWAX 0.8.0. Its available for download from sourceforge at http://sourceforge.net/project/showfiles.php?group_id=118427&package_id=128933&release_id=470852. NutchWAX 0.8.0 is built against Nutch 0.8.1, released 09/24/2006. A version of this software was recently used to make an index of greater than 400 million documents. See Release Notes [http://archive-access.sourceforge.net/projects/nutch/articles/releasenotes.html] for significant changes and fixes since NutchWAX 0.6.0. The site documentation has also been significantly revised. Yours, Internet Archive Webteam |
|
From: Michael S. <st...@ar...> - 2006-12-13 01:45:25
|
This note is to announce the release of NutchWAX 0.8.0. It's available for download from SourceForge at http://sourceforge.net/project/showfiles.php?group_id=118427&package_id=128933&release_id=470852. NutchWAX 0.8.0 is built against Nutch 0.8.1, released 09/24/2006. A version of this software was recently used to make an index of greater than 400 million documents. See the Release Notes [http://archive-access.sourceforge.net/projects/nutch/articles/releasenotes.html] for significant changes and fixes since NutchWAX 0.6.0. The site documentation has also been significantly revised. Yours, Internet Archive Webteam |
|
From: Armel T. N. <arm...@id...> - 2006-12-06 23:43:48
|
Hi, I have set up Nutch to crawl my local filesystem. I set topN to 20 and depth to 2. But when Nutch re-crawls, it re-crawls the same files over and over again. The directory doesn't contain any other sub-directories; can someone tell me what might be the cause? There are more than 20 files in the directory, so why is Nutch only getting the same twenty files? Thanks, Armel

-----Original Message----- From: Michael Stack [mailto:st...@ar...] Sent: 06 December 2006 16:04 To: Shay Lawless Cc: nut...@lu...; nut...@lu...; arc...@li... Subject: Re: [Archive-access-discuss] Full List of Metadata Fields Hey Shay. Some friendly advice. Cross-posting a question will make you unpopular fast. It's best to start on the most appropriate-seeming list and only move on from there if you are getting no satisfaction. The below question looks best at home over on the archive-access list. Let me have a go at answering it there. Yours, St.Ack Shay Lawless wrote: > Hi all, > > I'm using NutchWax (Version 0.7.0-200611082313) and Wera (Version 0.5.0-200611082313) to index a collection of ARC files generated by a web crawl using the Heritrix web crawler (Version 1.4.0). > > When I check the metadata tag on the wera front-end, the following list of tags is displayed: > ARC Identifier > URL > Time of Archival > Last Modified Time > Mime-Type > File Status > Content Checksum > HTTP Header > > When I click on the explain link in the NutchWax front-end, the following list of tags is displayed: > Segment > Digest > Date > ARCDate > Encoding > Collection > ARCName > ARCOffset > ContentLength > PrimaryType > subType > URL > Title > Boost > > Is there a full list of the metadata fields that NutchWax/Nutch creates when indexing? I'm particularly interested in tags relating to the actual content on each page, i.e. content type, description, etc. When searching, does NutchWax/Nutch search across such tags or just across the parsed text of each page for occurrences of keywords? > > Any help you can provide would be greatly appreciated! > > Shay |
|
From: Michael S. <st...@ar...> - 2006-12-06 16:00:25
|
Hey Shay. Some friendly advice. Cross-posting a question will make you unpopular fast. It's best to start on the most appropriate-seeming list and only move on from there if you are getting no satisfaction. The below question looks best at home over on the archive-access list. Let me have a go at answering it there. Yours, St.Ack Shay Lawless wrote: > Hi all, > > I'm using NutchWax (Version 0.7.0-200611082313) and Wera (Version 0.5.0-200611082313) to index a collection of ARC files generated by a web crawl using the Heritrix web crawler (Version 1.4.0). > > When I check the metadata tag on the wera front-end, the following list of tags is displayed: > ARC Identifier > URL > Time of Archival > Last Modified Time > Mime-Type > File Status > Content Checksum > HTTP Header > > When I click on the explain link in the NutchWax front-end, the following list of tags is displayed: > Segment > Digest > Date > ARCDate > Encoding > Collection > ARCName > ARCOffset > ContentLength > PrimaryType > subType > URL > Title > Boost > > Is there a full list of the metadata fields that NutchWax/Nutch creates when indexing? I'm particularly interested in tags relating to the actual content on each page, i.e. content type, description, etc. When searching, does NutchWax/Nutch search across such tags or just across the parsed text of each page for occurrences of keywords? > > Any help you can provide would be greatly appreciated! > > Shay |
|
From: Shay L. <sea...@gm...> - 2006-12-06 15:31:49
|
Hi all, I'm using NutchWax (Version 0.7.0-200611082313) and Wera (Version 0.5.0-200611082313) to index a collection of ARC files generated by a web crawl using the Heritrix web crawler (Version 1.4.0). When I check the metadata tag on the wera front-end, the following list of tags is displayed: ARC Identifier, URL, Time of Archival, Last Modified Time, Mime-Type, File Status, Content Checksum, HTTP Header. When I click on the explain link in the NutchWax front-end, the following list of tags is displayed: Segment, Digest, Date, ARCDate, Encoding, Collection, ARCName, ARCOffset, ContentLength, PrimaryType, subType, URL, Title, Boost. Is there a full list of the metadata fields that NutchWax/Nutch creates when indexing? I'm particularly interested in tags relating to the actual content on each page, i.e. content type, description, etc. When searching, does NutchWax/Nutch search across such tags or just across the parsed text of each page for occurrences of keywords? Any help you can provide would be greatly appreciated! Shay |
|
From: Dang N. H. <dan...@ya...> - 2006-12-06 07:18:36
|
Hi everyone, My project used Wayback to render the webpages crawled by Heritrix. However, we encountered some problems related to server-side redirected links. The website that we want to crawl is a JSP site running on Tomcat. There are many links which are redirected by the server (not using js, but by the jsp script itself). I can check that Heritrix actually follows these redirected links and crawls these webpages into the ARC file. However, when we try to render using Wayback, we cannot follow these redirected links (and Wayback displays an error of "No resource available"). So I wonder whether any of you have a plan to fix it, and what is your approach? Thanks, Nam Hai |
|
From: Lukas M. <lma...@gm...> - 2006-11-27 16:35:58
|
On Sunday, 26 November 2006 at 04:57, AaRon wrote: > Hi, > > I have been following the Getting Started guide to get NutchWAX up and running in standalone configuration but I just keep getting the following error: > > 06/11/26 11:24:58 WARN mapred.LocalJobRunner: job_afrgrp > java.lang.ClassCastException: org.apache.nutch.crawl.CrawlDatum > at org.apache.nutch.indexer.Indexer$InputFormat$1.next(Indexer.java:67) I ran into the same problem. Lukas > at org.apache.hadoop.mapred.MapTask$3.next(MapTask.java:203) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:215) > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:107) > Exception in thread "main" java.io.IOException: Job failed! > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399) > at org.archive.access.nutch.NutchwaxIndexer.index(NutchwaxIndexer.java:193) > at org.archive.access.nutch.Nutchwax.doIndexing(Nutchwax.java:241) > at org.archive.access.nutch.Nutchwax.doIndexing(Nutchwax.java:234) > at org.archive.access.nutch.Nutchwax.doAll(Nutchwax.java:154) > at org.archive.access.nutch.Nutchwax.doJob(Nutchwax.java:379) > at org.archive.access.nutch.Nutchwax.main(Nutchwax.java:651) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:585) > at org.apache.hadoop.util.RunJar.main(RunJar.java:149) > > I'd appreciate it if someone can help me on this. I'm using a hadoop-0.8.0 installation with a nightly build of nutchwax (nutchwax-0.7.0-200611202206). > > Thanks, > Aaron |
|
From: AaRon <aw...@gm...> - 2006-11-26 03:57:47
|
Hi,
I have been following the Getting Started guide to get NutchWAX up and
running in standalone configuration but I just keep getting the following
error:
06/11/26 11:24:58 WARN mapred.LocalJobRunner: job_afrgrp
java.lang.ClassCastException: org.apache.nutch.crawl.CrawlDatum
        at org.apache.nutch.indexer.Indexer$InputFormat$1.next(Indexer.java:67)
        at org.apache.hadoop.mapred.MapTask$3.next(MapTask.java:203)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:215)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:107)
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
        at org.archive.access.nutch.NutchwaxIndexer.index(NutchwaxIndexer.java:193)
        at org.archive.access.nutch.Nutchwax.doIndexing(Nutchwax.java:241)
        at org.archive.access.nutch.Nutchwax.doIndexing(Nutchwax.java:234)
        at org.archive.access.nutch.Nutchwax.doAll(Nutchwax.java:154)
        at org.archive.access.nutch.Nutchwax.doJob(Nutchwax.java:379)
        at org.archive.access.nutch.Nutchwax.main(Nutchwax.java:651)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
I'd appreciate it if someone can help me on this. I'm using a hadoop-0.8.0 installation with a nightly build of nutchwax (nutchwax-0.7.0-200611202206).
Thanks,
Aaron
|
|
From: James G. <jg...@si...> - 2006-11-17 23:07:40
|
> 1) Images *usually* don't seem to be displayed. Aha, the image thing was my fault; seems the images I was missing were on a different domain (and I had restricted my crawl to the single domain). I'll have to do another test, but I think that's explained. jamesG James Grahn wrote: > A few quick comments: > > 1) Images *usually* don't seem to be displayed. Though I saved images > in one of the ARC files I'm using, they do not appear on the page in > WERA. I've also noticed this occurring on the WERA test site > http://nwa.nb.no/wera/ when I search for "library" and examine the front > page of the library of congress. > > Behavior: The image will appear, only to be replaced by the image's > "alt" tag as the page has its links remapped. > > Expected behavior: The image should reappear after the links are > remapped (because the image should be in the ARC). > > 2) There are some webpages that throw off the formatting of WERA. They > seem to be primarily textareas with html embedded. > > When indexed, they sometimes throw off the table formatting of WERA and > sometimes cause input boxes and submit buttons to appear on the search page. 
> > Always-valid examples: > http://cl.cnn.com/ctxtlink/jsp/cnn/cl/1.5/cnn-story-cl.jsp > http://sportsillustrated.cnn.com/.element/ssi/misc/2.0/contextual/story.html > http://www.cnn.com/.element/ssi/sect/1.3/WEATHER/weatherPageBox.html > http://www.cnn.com/WEATHER/ > http://cnn.dyn.cnn.com/intlWeatherBox.html > > > Perhaps-not-always-valid examples: > http://www.cnn.com/.element/ssi/www/breaking_news/1.1/banner.exclude.html > > An example of such offending html: > <textarea name="breakingNews"><!--breaking news banner--> > <div id="cnnBNBBreakingNews"> > <table cellpadding="0" cellspacing="0" border="0"> > <tr valign="middle"> > <td width="181" valign="top"><img > src="http://i.a.cnn.net/cnn/.element/img/1.5/ceiling/bnb/breaking_news.gif" > alt="" width="181" height="47" hspace="0" vspace="0" border="0"></td> > <td class="right"><div id="cnnNarrowBulletinText">Britney Spears > files for divorce from her husband Kevin Federline, citing > irreconcilable differences. </div></td> > </tr> > </table> > </div> > <!--/breaking news banner--> > </textarea> > > > 3) This xml file resulted in an abrupt end of a table in WERA: > http://edition.cnn.com/.element/img/1.3/swf/pipeline_mainpage/config_intl.xml > > The source for this was a crawl of CNN at a depth of 2 links. > Hopefully the examples are revealing. > > jamesG > > ------------------------------------------------------------------------- > Take Surveys. Earn Cash. Influence the Future of IT > Join SourceForge.net's Techsay panel and you'll get the chance to share your > opinions on IT & business topics through brief surveys - and earn cash > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV > _______________________________________________ > Archive-access-discuss mailing list > Arc...@li... > https://lists.sourceforge.net/lists/listinfo/archive-access-discuss > > |
|
From: James G. <jg...@si...> - 2006-11-17 01:06:18
|
A few quick comments: 1) Images *usually* don't seem to be displayed. Though I saved images in one of the ARC files I'm using, they do not appear on the page in WERA. I've also noticed this occurring on the WERA test site http://nwa.nb.no/wera/ when I search for "library" and examine the front page of the library of congress. Behavior: The image will appear, only to be replaced by the image's "alt" tag as the page has its links remapped. Expected behavior: The image should reappear after the links are remapped (because the image should be in the ARC). 2) There are some webpages that throw off the formatting of WERA. They seem to be primarily textareas with html embedded. When indexed, they sometimes throw off the table formatting of WERA and sometimes cause input boxes and submit buttons to appear on the search page. Always-valid examples: http://cl.cnn.com/ctxtlink/jsp/cnn/cl/1.5/cnn-story-cl.jsp http://sportsillustrated.cnn.com/.element/ssi/misc/2.0/contextual/story.html http://www.cnn.com/.element/ssi/sect/1.3/WEATHER/weatherPageBox.html http://www.cnn.com/WEATHER/ http://cnn.dyn.cnn.com/intlWeatherBox.html Perhaps-not-always-valid examples: http://www.cnn.com/.element/ssi/www/breaking_news/1.1/banner.exclude.html An example of such offending html: <textarea name="breakingNews"><!--breaking news banner--> <div id="cnnBNBBreakingNews"> <table cellpadding="0" cellspacing="0" border="0"> <tr valign="middle"> <td width="181" valign="top"><img src="http://i.a.cnn.net/cnn/.element/img/1.5/ceiling/bnb/breaking_news.gif" alt="" width="181" height="47" hspace="0" vspace="0" border="0"></td> <td class="right"><div id="cnnNarrowBulletinText">Britney Spears files for divorce from her husband Kevin Federline, citing irreconcilable differences. 
</div></td> </tr> </table> </div> <!--/breaking news banner--> </textarea> 3) This xml file resulted in an abrupt end of a table in WERA: http://edition.cnn.com/.element/img/1.3/swf/pipeline_mainpage/config_intl.xml The source for this was a crawl of CNN at a depth of 2 links. Hopefully the examples are revealing. jamesG |
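The breakage James describes is classic markup injection: WERA embeds captured page fragments in its own result tables, so an unescaped `<textarea>` full of HTML spills live tags into the surrounding layout. A defensive sketch (not WERA's actual code) showing how escaping captured content before embedding keeps it inert:

```python
from html import escape

# A captured fragment containing live markup, as in the CNN example above.
captured = ('<textarea name="breakingNews"><table><tr><td>'
            'Breaking news</td></tr></table></textarea>')

# Embedding the fragment raw would open a table inside the results page;
# escaping turns the tags into harmless text.
cell = "<td>" + escape(captured) + "</td>"
print(cell.startswith("<td>&lt;textarea"))
```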
|
From: James G. <jg...@si...> - 2006-11-15 20:37:53
|
Forwarded by St.Ack's request. Michael Stack wrote: > I'm glad its working for you now. Suggestions for improving doc. so > others don't fall into your little wormhole? > Thanks James, > St.Ack The "Getting Started" document was great for initial testing of the system. I had a problem with Hadoop early, but I was using the hadoop that came packaged with nutch 0.8.1... which turned out to be version 0.4. I had assumed incorrectly that nutch itself would be using a recent version of hadoop. When I wanted to begin working with WERA and keep multiple versions of a page around, however, my resources were: St.Ack's response to someone else on this list, the bug report about keeping multiple versions of a webpage, and revisiting the "Getting Started" document (since it contained the listing of commands in order). So I'd say a guide outlining the steps to take to preserve multiple versions of a webpage would have been a plus. Current documentation about how to do incremental indexing would be nice too, as this is something I'll be working on soon (I suppose the old FAQ solution applies?). Outside of documentation, most of my desires from Heritrix/NutchWAX/WERA would be for automation and integration: - I'm looking forward to the automatic recrawling that I've seen on the roadmap of Heritrix. - A non-manual way of importing new crawls from Heritrix to NutchWAX would be desirable - It would have been really nice if WERA was Tomcat friendly, so WERA, NutchWAX, ArcRetriever, and even Heritrix could coexist on one server. - It would have also been nice if ArcRetriever had the same args as the wayback machine, so that either could be used with NutchWAX. (though perhaps they are compatible and I missed it?) But, as I said, I realize that most of these tools are pre-version-1.0, and I'm happy that they're around to begin with. jamesG |
|
From: James G. <jg...@si...> - 2006-11-10 03:15:42
|
I'm currently having the same problem that Natalia initially had... I'm using the nightly build (from a few days back) of nutchwax and am trying to build an index that will be used by wera. It seems to me that if you are going to store the crawls under different collection names, then you have to do multiple imports (with differing collection names), before proceeding through update, invert, index, dedup, and merge. I have been attempting to do this with multiple collections, using the optional "segments" arguments to keep the tools aware of the multiple collections. I've gone through several permutations of the command line arguments but have not had any luck yet; what's the proper sequence of commands to get this running? Thanks, James Michael Stack wrote: > Natalia Torres wrote: >> Hello >> >> I'm trying nutchwax+wera whith multiple crawls of some web pages. After >> index it I can't see it on wera. The Overview page only shows one crawl >> date. For us that's an important issue. >> >> >> I found it as a bug from july in the Nutchwax bug list (1518431 - Search >> multiple versions of one URL broken). >> >> >> There's a new version cooming soon? How can I solve it? >> >> > Did you give each crawl a different collection name or are they indexed > all with the same collection name? > > In nutch, the URL for a page is used as the key in mapreduce processing > (Keys are used to identify records and must be unique). It makes it so > you can only have one URL in a nutch index. While an URL as primary key > is far from optimal, its convenient having the key be an URL. It makes > it so the URL is easily available at various points during indexing > processing. > > In nutchwax, we've made it so that the key is collection-name + URL so > you can have multiple URLs as long as they are of different > collections. This is a climb-down from how it used to work in nutchwax > -- pre-mapreduce -- where you could have multiple URLs distingushed by > date alone. 
> > I'm wondering if a key of collection-name+URL is sufficient? It means > indexing, collection names must be carefully chosen. Otherwise, we need > to make the key uglier still: collection-name+URL+date. > > Yours, > St.Ack > P.S. Yes a new release is imminent. > > ------------------------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, security? > Get stuff done quickly with pre-integrated technology to make your job easier > Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > _______________________________________________ > Archive-access-discuss mailing list > Arc...@li... > https://lists.sourceforge.net/lists/listinfo/archive-access-discuss > > |
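St.Ack's key discussion is easy to demonstrate: when the MapReduce key is collection-name + URL, a re-crawl of the same URL in the same collection replaces the earlier capture, while widening the key to include the date keeps every version. A toy illustration using Python dicts as stand-ins for the index (the field names and sample values are invented):

```python
records = [
    ("mycoll", "http://example.com/", "20061001", "first capture"),
    ("mycoll", "http://example.com/", "20061109", "second capture"),
]

# Key = (collection, URL): the later capture overwrites the earlier one.
by_coll_url = {(c, u): body for c, u, d, body in records}

# Key = (collection, URL, date): both captures survive.
by_coll_url_date = {(c, u, d): body for c, u, d, body in records}

print(len(by_coll_url), len(by_coll_url_date))
```

This is why, short of the uglier collection-name+URL+date key, the workaround in the thread is to give each crawl its own collection name.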
|
From: Natalia T. <nt...@ce...> - 2006-11-09 16:56:44
|
Thanks Michael. All these crawls (varying numbers of crawls of more than 20 different URIs) are indexed in the same collection because I want to make specific searches on this collection (using "collection:mycollection query" as the search). If I index the crawls using a collection for each date on which the URLs were crawled, I can see them on the overview page. But how can I do the same search? More questions: How many collections can I create? Does the number of collections affect the response time when a search is made? Natalia |
|
From: Shay L. <sea...@gm...> - 2006-11-09 16:51:32
|
Happens every time I click the "RSS" link at the bottom of the NutchWax screen.
061109 165132 11 query request from 134.226.35.130
061109 165132 11 query: introduction select statement
061109 165132 11 searching for 20 raw hits
061109 165132 11 total hits: 111
061109 165132 11 SEVERE Servlet.service() for servlet NutchwaxOpenSearch threw exception
java.lang.StringIndexOutOfBoundsException: String index out of range: -87
        at java.lang.String.substring(String.java:1768)
        at org.archive.access.nutch.NutchwaxOpenSearchServlet.xmlize(NutchwaxOpenSearchServlet.java:372)
        at org.archive.access.nutch.NutchwaxOpenSearchServlet.getXmlStr(NutchwaxOpenSearchServlet.java:331)
        at org.archive.access.nutch.NutchwaxOpenSearchServlet.addNode(NutchwaxOpenSearchServlet.java:280)
        at org.archive.access.nutch.NutchwaxOpenSearchServlet.doGet(NutchwaxOpenSearchServlet.java:181)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
        at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
        at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
        at java.lang.Thread.run(Thread.java:595)
Thanks,
Shay
On 09/11/06, Michael Stack <st...@ar...> wrote:
>
> Happens on every URL?
>
> Paste in the full stacktrace. That might help figure the problem.
>
> St.Ack
>
> Shay Lawless wrote:
> > Hi,
> >
> > I have installed nutchWax (0.6.1) and it seems to be indexing and
> > searching my arc files fine. However when I click on the RSS tag I am
> > getting the following error message.
> >
> > SEVERE Servlet.service() for servlet NutchwaxOpenSearch threw exception
> > java.lang.StringIndexOutOfBoundsException: String index out of range:
> -58
> >
> > This appears to be a problem with the opensearch servlet generating
> > the rss version of the url. Any ideas on this?
> >
> > Thanks in advance
> >
> > Shay
|
|
From: Shay L. <sea...@gm...> - 2006-11-09 12:07:38
|
Hi,

I have installed NutchWAX (0.6.1) and it seems to be indexing and searching my arc files fine. However, when I click on the RSS tag I am getting the following error message:

SEVERE Servlet.service() for servlet NutchwaxOpenSearch threw exception
java.lang.StringIndexOutOfBoundsException: String index out of range: -58

This appears to be a problem with the opensearch servlet generating the RSS version of the URL. Any ideas on this?

Thanks in advance,
Shay
|
|
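[Editor's note] The negative value in "String index out of range: -58" is the classic signature of `String.substring(begin, end)` being called with an end index smaller than its begin index: on the JVMs of that era the exception message carries `end - begin`. A minimal sketch of how that arises when two markers are searched for in the wrong order -- the string and markers here are invented for illustration and say nothing about the servlet's actual parsing logic:

```java
public class NegativeSubstring {
    public static void main(String[] args) {
        // Hypothetical parsing step: extract the URL portion of a record key.
        String s = "collection/20061109/http://example.com/";
        int begin = s.indexOf("http://"); // 20
        int end = s.indexOf('/');         // 10 -- first '/', which is BEFORE begin
        try {
            s.substring(begin, end);      // end < begin => negative length
        } catch (StringIndexOutOfBoundsException e) {
            // Java 5 reports the negative difference, e.g. "String index out of range: -10"
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

A record whose fields sit at unexpected offsets (as with some archived URLs) would make the subtraction go negative only sometimes, which fits the error appearing on one installation and not another.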
From: Michael S. <st...@ar...> - 2006-11-08 17:24:04
|
Natalia Torres wrote:
> Hello
>
> I'm trying nutchwax+wera with multiple crawls of some web pages. After
> indexing I can't see them in WERA. The Overview page only shows one crawl
> date. For us that's an important issue.
>
> I found it as a bug from July in the NutchWAX bug list (1518431 - Search
> multiple versions of one URL broken).
>
> Is there a new version coming soon? How can I solve it?

Did you give each crawl a different collection name, or are they all indexed with the same collection name?

In nutch, the URL for a page is used as the key in mapreduce processing (keys are used to identify records and must be unique). It makes it so you can only have one URL in a nutch index. While a URL as primary key is far from optimal, it's convenient having the key be a URL: it makes the URL easily available at various points during indexing processing.

In nutchwax, we've made it so that the key is collection-name + URL, so you can have multiple URLs as long as they are of different collections. This is a climb-down from how it used to work in nutchwax -- pre-mapreduce -- where you could have multiple URLs distinguished by date alone.

I'm wondering if a key of collection-name+URL is sufficient? It means that, when indexing, collection names must be carefully chosen. Otherwise, we need to make the key uglier still: collection-name+URL+date.

Yours,
St.Ack

P.S. Yes, a new release is imminent.
|
|
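[Editor's note] The key scheme St.Ack describes above can be sketched as plain string composition. This is an illustrative sketch only -- the method names and the space separator are hypothetical, not NutchWAX's actual API:

```java
public class CollectionKey {
    // Hypothetical key builders illustrating the trade-off discussed above.

    // Current nutchwax scheme: one record per URL per collection, so two
    // crawls of the same URL in the same collection collapse onto one key.
    static String key(String collection, String url) {
        return collection + " " + url;
    }

    // The "uglier" scheme: folding in a 14-digit crawl timestamp would also
    // distinguish multiple crawl dates of the same URL within one collection.
    static String key(String collection, String url, String date14) {
        return collection + " " + url + " " + date14;
    }

    public static void main(String[] args) {
        System.out.println(key("mycollection", "http://example.com/"));
        System.out.println(key("mycollection", "http://example.com/", "20061108154000"));
    }
}
```

This illustrates why Natalia's many crawls of the same URIs, all indexed under one collection name, surface as a single version: under the two-part key they are the same record.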
From: Natalia T. <nt...@ce...> - 2006-11-08 15:40:00
|
Hello,

I'm trying nutchwax+wera with multiple crawls of some web pages. After indexing, I can't see them in WERA. The Overview page only shows one crawl date. For us that's an important issue.

I found it as a bug from July in the NutchWAX bug list (1518431 - Search multiple versions of one URL broken).

Is there a new version coming soon? How can I solve it?

Thanks,
Natalia
|
|
From: Michael S. <st...@ar...> - 2006-11-07 18:46:42
|
James Grahn wrote:
> Just FYI,
> My problem was resolved by switching to the nightly build of NutchWAX
> (at St.Ack's advice) and switching to the .5 version of hadoop (I think
> I was using 0.6.2).
>
> I now can generate a search page properly.
>
> A few problems remain, though.
> 1) All results link to a non-existent page:
> http://example.com/test/--dateOfCrawl--/http://--actualwebsite--.com/

Check out the 'Searching' section here:
http://archive-access.sourceforge.net/projects/nutch/apidocs/overview-summary.html.
It doesn't make mention of the 'wax.host' property you'll need to change -- I'll fix that -- but if you look at this file, available in the src version of nutchwax, it notes the property to change and others you might want to change too:
http://archive-access.cvs.sourceforge.net/*checkout*/archive-access/archive-access/projects/nutch/conf/hadoop-site.xml.template?revision=1.7

> 2) The "Other versions" link likewise directs me to example.com
>
> I have looked for a way to change that in the configuration, but
> couldn't find it. The "Other versions" has me curious though; is this
> going to be an integration point for something like WERA?

By default, we'll only show the most recent version of a page. If there are multiple versions in an index, we'll show all (set hitsPerDup to '0', which says show all -- usually hitsPerDup is 1).

> Additional problem:
> 3) Inaccurate "hits" count: the page claims to display results 1-3 out
> of 20, but the "next page" displays nothing ("Results 4-3"). This bug
> seems to originate from it not taking into account the pages hidden by
> the "more from cnn.com". Because I currently just have a single domain
> crawled, it's especially obvious.

Yeah. Known issue. Need to fix. Also in play is the fact that we only show one hit per site by default (add hitsPerSite=0 to your query string to confirm this, rather than 'more', is the issue).

> Also, I was wondering; would implementing something like query expansion
> be accomplished in the same manner as it is in nutch? That is, would
> changing the nutch configuration file in the webapps directory to
> perform query expansion work in NutchWAX?

Nutchwax includes near all of nutch. The only reason it wouldn't work would be because we've not built in a plugin or some conf file, or our jsp page diverges slightly from default nutch. If you let me know what's missing, I'll change the build scripts to include it.

Yours,
St.Ack
|
|
From: James G. <jg...@si...> - 2006-11-07 18:22:49
|
Just FYI,

My problem was resolved by switching to the nightly build of NutchWAX (at St.Ack's advice) and switching to the .5 version of hadoop (I think I was using 0.6.2). I now can generate a search page properly.

A few problems remain, though.

1) All results link to a non-existent page:
http://example.com/test/--dateOfCrawl--/http://--actualwebsite--.com/

2) The "Other versions" link likewise directs me to example.com

I have looked for a way to change that in the configuration, but couldn't find it. The "Other versions" has me curious though; is this going to be an integration point for something like WERA?

Additional problem:

3) Inaccurate "hits" count: the page claims to display results 1-3 out of 20, but the "next page" displays nothing ("Results 4-3"). This bug seems to originate from it not taking into account the pages hidden by the "more from cnn.com". Because I currently just have a single domain crawled, it's especially obvious.

Also, I was wondering; would implementing something like query expansion be accomplished in the same manner as it is in nutch? That is, would changing the nutch configuration file in the webapps directory to perform query expansion work in NutchWAX?

Thanks,
James

James Grahn wrote:
> Greets,
> I have been attempting to follow the tutorial to get NutchWAX up and
> running in standalone mode, but I've reached an error that confounds me.
>
> The printlns seem to indicate that NutchWAX does successfully import the
> ARC files.
>
> I see this line:
> opening /tmp/mirror/heretrix/IAH-20061026194403-00000.arc.gz
>
> And after many individual pages being imported, I see this line:
>
> 061102 115327 opening /tmp/mirror/heretrix/IAH-20061026194522-00001.arc.gz
>
> This is followed by more individual pages. So that seems fine. But no
> index is generated and the printlns end like this:
>
> ...
> 061102 115345 adding http://www.cnn.com/CNN/Programs/student.news/ 24869 text/html
> 061102 115345 adding http://www.cnn.com/CNN/Programs/people/ 367 text/html
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
>         at org.archive.access.nutch.ImportArcs.importArcs(ImportArcs.java:519)
>         at org.archive.access.nutch.IndexArcs.doImport(IndexArcs.java:154)
>         at org.archive.access.nutch.IndexArcs.doAll(IndexArcs.java:139)
>         at org.archive.access.nutch.IndexArcs.doJob(IndexArcs.java:246)
>         at org.archive.access.nutch.IndexArcs.main(IndexArcs.java:439)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:585)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:130)
>
> --------
>
> Any suggestions for this error? I am using a hadoop installation I
> acquired with the current version of nutch, and am running the "all"
> command as per the tutorial:
>
> ${HADOOP_HOME}/bin/hadoop jar ${NUTCHWAX_HOME}/nutchwax.jar all /tmp/inputs /tmp/outputs test
>
> Thanks,
> James
|
|
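[Editor's note] James's problem 3 ("Results 4-3" on a 20-hit query) is consistent with a total-hits count computed before site-deduplication while the result pages are rendered after it. A toy illustration of that mismatch with invented data -- this is not the actual NutchWAX pagination code:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DedupPaging {
    public static void main(String[] args) {
        // 20 raw hits, all from the same site -- as in a single-domain crawl.
        List<String> rawHits = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            rawHits.add("cnn.com/page" + i);
        }
        int totalReported = rawHits.size(); // "out of 20" shown to the user

        // hitsPerSite=1 behaviour: keep only the first hit from each site.
        Set<String> seenSites = new LinkedHashSet<>();
        List<String> shown = new ArrayList<>();
        for (String hit : rawHits) {
            String site = hit.substring(0, hit.indexOf('/'));
            if (seenSites.add(site)) {
                shown.add(hit);
            }
        }

        // Page 2 asks for results starting at index 3 of the "20", but almost
        // everything was collapsed behind "more from cnn.com".
        System.out.println("claimed total: " + totalReported); // 20
        System.out.println("actually shown: " + shown.size()); // 1
    }
}
```

With a single-domain crawl the gap between the claimed total and the deduplicated list is maximal, which is why James sees it so clearly; St.Ack's suggestion of hitsPerSite=0 disables the per-site collapse and makes the two numbers agree.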
From: Kaisa K. <kau...@cc...> - 2006-11-06 11:54:44
|
I deleted everything old and started a new index, and now the log shines with the golden words 'nutchwax finished'. I vaguely remember deleting old indexes every now and then when testing different versions of hadoop+nutchwax, but probably didn't do it when it was really needed.

OK, the one-arc test went smoothly with hadoop-0.5.0 + nutchwax-0.7.0-200611030343. Next I'll try to index the whole of our library's recent mini-size music archive.

Many thanks,
Kaisa

On Sat, 4 Nov 2006, Michael Stack wrote:
> When you changed hadoop+nutchwax combinations, did you clean the target
> directory of all previous outputs? What I see in the log below is that the
> import works fine, but when we move to do the crawldb update, it's complaining
> that the sequencefiles it's being fed don't jibe with what it already
> digested. Was there a crawldb already in place made with a different
> version of hadoop?
>
> You should use the latest nutchwax build + hadoop-0.5.0. Current nutchwax is
> based on the nutch 0.8.1 release. Nutch 0.8.1 is built against hadoop-0.5.0.
> Nutch and Hadoop are moving at different rates.
> The latest nutchwax + hadoop-0.5.0 is what we're currently using internally,
> running a large indexing job: ~800 million documents. We're learning lots
> operating at this new scale. I'll try and summarize our findings and post
> them alongside the new release when it goes out (should happen when this big
> job completes -- in a week or so).
>
> Yours,
> St.Ack
>
> Kaisa Kaunonen wrote:
>> Hi all,
>>
>> I don't seem to find a combination of hadoop-0.5.0 and
>> nutchwax-0.6.x or nutchwax-0.7.x that would index on my
>> machines.
>>
>> hadoop-0.5.0 + nutchwax-0.6.1 (latest official) fails
>> (for different reasons than 0.7.0-200611030343)
>>
>> hadoop-0.5.0 + nutchwax-0.7.0-200611030343 (latest build artifact) fails
>>
>> Attached log from the 0.7.0 run when trying to index one arc.
>> The run stops by saying 'A record version mismatch occurred.
>> Expecting v3, found v5'
>>
>> Best,
>> Kaisa Kaunonen
>> Nat.Lib.Finland
|
|
From: Lukas M. <lma...@gm...> - 2006-11-05 19:51:16
|
On Friday, 3 November 2006 at 15:33, Shay Lawless wrote:
> Hi,
>
> I am using NutchWAX to index a series of ARC files created in a web crawl
> using the Heritrix crawler.

Which version of NutchWAX do you use?

> My problem occurs when I perform a query on NutchWAX and attempt to view
> the results: nutch attempts to send me to the URL in question rather than
> the archived content item. As a result I am getting an error, as the URL is
> not being correctly formed.
>
> Has anyone any experience with displaying content from an ARC content
> archive rather than directly from the URL? Do I require an ARC-access
> redisplay tool such as the 'Wayback Machine' to achieve this? If so, can anyone
> give advice on this or other similar tools for ARC redisplay?

arcretriever, part of WERA (previously NWA), allows retrieving an ARCRecord
through offset and arcname.

> Any help would be greatly appreciated; thanks in advance.
>
> Seamus

Lukas
|