From: Natalia T. <nt...@ce...> - 2008-01-04 11:20:00
|
Hello,

I installed wayback 0.8 following the instructions on the web page: placing the .war file in the appropriate location, waiting for Tomcat to unpack the .war file, customizing the base wayback.xml file, and restarting Tomcat.

I customized web.xml to use the wayback machine in timeline access mode with:
- Local-ARC ResourceStore options: resourcestore.autoindex = 1, resourcestore.indexinterval = 1000, all paths set
- Local-BDB ResourceIndex options: resourceindex.mergeinterval = 10000, maxresults = 1000

Three days after the Tomcat restart, I still can't search many of the URLs in those ARCs. The arc directory () contains 6548 arc.gz files; listing the information generated by wayback, the index-data/merged dir has only 14 files and the arc-index/word dir has the other 6534 files. The log file has no error messages and shows that ARC files are being read.

How much time is needed to index the ARCs? Are more changes to the configuration needed?

Thanks,
N. |
|
From: Gerhardt, M. <Mat...@sb...> - 2007-12-06 15:56:32
|
Hi @all,

I'd like to run NutchWax in cooperation with Wera to search through my arc-files that are created with Heritrix. The crawl of the arc files looks fine, in my opinion. The indexing with hadoop and nutchwax.war looks fine, too: the indexes and the segments for nutch are created. If I try a first search with nutchwax it looks good as well, and I get the first information from the index. Further searching through the fulltext, if I try to get it, is honoured with an error message:

HTTP Status 404 - /blubb/20071203121503/http://crossasia.org/en/home/
type Status report
message /blubb/20071203121503/http://crossasia.org/en/home/
description The requested resource (/blubb/20071203121503/http://crossasia.org/en/home/) is not available.

You can have a look at http://ogea.crossasia.org:8080/nutchwax/ to get a feeling for it.

If I try to get information from Wera I get similar results: the list is OK, but the archived version isn't shown at all; I get a white page. If I try to get the metadata for the chosen site in the chosen timeline, I get an error: Failed to open stream.

Have a look here:
http://ogea.crossasia.org/wera/result.php?auto=on&meta=on&query=crossasia&url=http%3A%2F%2Fcrossasia.org%2Fen%2Fhome%2F&time=20071203111503&level=6&autolevel=5&manlevel=5&autocheckbox=1&metacheckbox=1

What kind of misconfiguration could I have made? Is there any help out there for me?

Kind regards,

Matthias Gerhardt
___________________________________________
technischer Leiter Virtuelle Fachbibliothek Ostasien
Staatsbibliothek zu Berlin - Preußischer Kulturbesitz
10772 Berlin, Germany
Telefon: +49(0)30-266-2496
E-Mail : mat...@sb...
___________________________________________ |
|
From: Darius S. <Dar...@si...> - 2007-12-04 09:05:42
|
Hello everyone,

I use nutchwax 0.10.0 and I'm trying to search for a word which contains a non-ASCII character. I get no results, and the search phrase in the search field comes back broken: all non-ASCII characters are replaced by ugly symbols. This search phrase definitely exists in the ARC. All ASCII phrases work fine.

Is this a nutchwax, hadoop or tomcat problem? Where can I define the encoding? I looked through the XML files and didn't find anything.

P.S. Does nutchwax do highlighting for searched phrases?

Thank you in advance,
Darius |
|
From: Brad T. <br...@ar...> - 2007-11-27 21:32:53
|
Hi Thomas,

Sorry for the delay in response to your offline post, I was out of the office last week.

I've just reproduced the problem -- a pretty simple and silly bug. The thread variable in the LocalARCResourceStore is marked "static", so yes, with the current software, only one indexing thread can be active at a time. The problem also exists in the BDBIndexUpdater class, so there can only be one merging thread as well.

I've created a bug, ACC-8: http://webteam.archive.org/jira/browse/ACC-8

We will either make a new 1.0.2 release which fixes this, or may postpone the fix until 1.2.0 is released in the next couple of weeks.

As a workaround, you can configure/activate them independently until all ARCs in each collection are indexed and merged.

Thanks for posting the problem!

Brad

Thomas Beekman wrote:
> Hello all,
>
> My name is Thomas Beekman, and I'm the Technical Lead of Web archiving
> at the Royal Library of the Netherlands. I've been testing the Open Source
> Wayback Machine for about two weeks now, but I have found rather
> strange behavior when using two different sets of indexes.
>
> I'm trying to set up a regular index, for production use (DB1, as a
> BDB), and a test index, for QA (DBQA, also BDB). It seems like the
> second DB is not automatically indexed, even when I put CDX files
> manually in the incoming directory of the index-data folder. When
> copying an existing DB into /tmp/wayback-qa (the QA DB folder), it
> seems to work though. So I guess there is a problem with the
> indexClient. I have included the wayback.xml which I used to test this.
>
> I hope that someone can help me, or will reply if the problem does not
> lie in the software but in my configuration.
> > > > > > Greetings, > > Thomas Beekman > > > > > > ------------------------------------------------------------------------ > > <?xml version="1.0" encoding="UTF-8"?> > <!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd"> > <beans> > > <!-- > The following 3 beans are required when using the ArcProxy for providing > HTTP 1.1 remote access to ARC files distributed across multiple computers > or directories. > --> > <!-- > <bean id="filelocationdb" class="org.archive.wayback.resourcestore.http.FileLocationDB" > init-method="init"> > <property name="bdbPath" value="/tmp/wayback/arc-db" /> > <property name="bdbName" value="DB1" /> > <property name="logPath" value="/tmp/wayback/arc-db.log" /> > </bean> > > <bean name="8080:arcproxy" class="org.archive.wayback.resourcestore.http.ArcProxyServlet"> > <property name="locationDB" ref="filelocationdb" /> > </bean> > <bean name="8080:locationdb" class="org.archive.wayback.resourcestore.http.FileLocationDBServlet"> > <property name="locationDB" ref="filelocationdb" /> > </bean> > --> > > > <!-- > The following 2 beans are required when using exclusions based on live > robots.txt documents. 
> --> > <!-- > <bean id="livewebcache" class="org.archive.wayback.liveweb.LiveWebCache"> > > <property name="arcCacheDir"> > <bean class="org.archive.wayback.liveweb.ARCCacheDirectory" > init-method="init"> > > <property name="arcDir" value="/tmp/wayback/liveweb/arcs/" /> > <property name="arcPrefix" value="live" /> > </bean> > </property> > > <property name="cacher"> > <bean class="org.archive.wayback.liveweb.URLCacher"> > <property name="tmpDir" value="/tmp/wayback/liveweb/tmp/" /> > </bean> > </property> > > <property name="index"> > <bean class="org.archive.wayback.liveweb.LiveWebLocalResourceIndex"> > > <property name="source"> > <bean class="org.archive.wayback.resourceindex.bdb.BDBIndex" > init-method="init"> > > <property name="bdbName" value="DB1" /> > <property name="bdbPath" value="/tmp/wayback/liveweb/db/" /> > </bean> > </property> > </bean> > </property> > </bean> > > <bean id="excluder-factory-robot" class="org.archive.wayback.accesscontrol.robotstxt.RobotExclusionFilterFactory"> > <property name="maxCacheMS" value="86400000" /> > <property name="userAgent" value="ia_archiver" /> > <property name="webCache" ref="livewebcache" /> > </bean> > --> > > <bean id="localbdbcollection" class="org.archive.wayback.webapp.WaybackCollection"> > <property name="resourceStore"> > <bean class="org.archive.wayback.resourcestore.LocalARCResourceStore" > init-method="init"> > <property name="arcDir" value="/arcs/" /> > <property name="queuedDir" value="/tmp/wayback/arc-indexer/queued" /> > <property name="workDir" value="/tmp/wayback/arc-indexer/work" /> > <property name="runInterval" value="10000" /> > <property name="indexClient"> > <bean class="org.archive.wayback.resourceindex.indexer.IndexClient"> > <property name="tmpDir" value="/tmp/wayback/arc-indexer/tmp" /> > <property name="target" value="/tmp/wayback/index-data/incoming" /> > </bean> > </property> > </bean> > </property> > > <property name="resourceIndex"> > <bean 
class="org.archive.wayback.resourceindex.LocalResourceIndex"> > <property name="source"> > <bean class="org.archive.wayback.resourceindex.bdb.BDBIndex" > init-method="init"> > <property name="bdbName" value="DB1" /> > <property name="bdbPath" value="/tmp/wayback/index/" /> > <property name="updater"> > <bean class="org.archive.wayback.resourceindex.bdb.BDBIndexUpdater"> > <property name="incoming" value="/tmp/wayback/index-data/incoming/" /> > <property name="failed" value="/tmp/wayback/index-data/failed/" /> > <property name="merged" value="/tmp/wayback/index-data/merged/" /> > <property name="runInterval" value="10000" /> > </bean> > </property> > </bean> > </property> > <property name="maxRecords" value="10000" /> > </bean> > </property> > </bean> > > <bean id="localqacollection" class="org.archive.wayback.webapp.WaybackCollection"> > <property name="resourceStore"> > <bean class="org.archive.wayback.resourcestore.LocalARCResourceStore" > init-method="init"> > <property name="arcDir" value="/arcs-qa/" /> > <property name="queuedDir" value="/tmp/wayback-qa/arc-indexer/queued" /> > <property name="workDir" value="/tmp/wayback-qa/arc-indexer/work" /> > <property name="runInterval" value="10000" /> > <property name="indexClient"> > <bean class="org.archive.wayback.resourceindex.indexer.IndexClient"> > <property name="tmpDir" value="/tmp/wayback-qa/arc-indexer/tmp" /> > <property name="target" value="/tmp/wayback-qa/index-data/incoming" /> > </bean> > </property> > </bean> > </property> > > <property name="resourceIndex"> > <bean class="org.archive.wayback.resourceindex.LocalResourceIndex"> > <property name="source"> > <bean class="org.archive.wayback.resourceindex.bdb.BDBIndex" > init-method="init"> > <property name="bdbName" value="DBQA" /> > <property name="bdbPath" value="/tmp/wayback-qa/index/" /> > <property name="updater"> > <bean class="org.archive.wayback.resourceindex.bdb.BDBIndexUpdater"> > <property name="incoming" 
value="/tmp/wayback-qa/index-data/incoming/" /> > <property name="failed" value="/tmp/wayback-qa/index-data/failed/" /> > <property name="merged" value="/tmp/wayback-qa/index-data/merged/" /> > <property name="runInterval" value="10000" /> > </bean> > </property> > </bean> > </property> > <property name="maxRecords" value="10000" /> > </bean> > </property> > </bean> > > <!-- > The following WaybackCollection bean template is required when using a > manually built local CDX index. > --> > > <bean id="localcdxcollection" class="org.archive.wayback.webapp.WaybackCollection"> > > <property name="resourceStore"> > <bean class="org.archive.wayback.resourcestore.LocalARCResourceStore" > init-method="init"> > <property name="arcDir" value="/arcs-qa/" /> > </bean> > </property> > > <property name="resourceIndex"> > <bean class="org.archive.wayback.resourceindex.LocalResourceIndex"> > <property name="source"> > <bean id="cdxsearchresultsource" class="org.archive.wayback.resourceindex.cdx.CDXIndex"> > <property name="path" value="/tmp/wayback-qa/cdx-index/index.cdx" /> > </bean> > </property> > <property name="maxRecords" value="10000" /> > </bean> > </property> > </bean> > > > > <!-- > The following WaybackCollection bean template is required when using a > remote ResourceIndex and ResourceStore implementation. This will also > required setting up an arcproxy and locationdb on the host specified by > the resourceStore:urlPrefix configuration, and an addition AccessPoint > on the host specified by the resourceIndex:searchUrlBase configuration. 
> --> > <!-- > <bean id="remotecollection" class="org.archive.wayback.webapp.WaybackCollection"> > > <property name="resourceStore"> > <bean class="org.archive.wayback.resourcestore.HttpARCResourceStore"> > <property name="urlPrefix" value="http://localhost:8080/arcproxy/" /> > </bean> > </property> > > <property name="resourceIndex"> > <bean class="org.archive.wayback.resourceindex.RemoteResourceIndex" > init-method="init"> > <property name="searchUrlBase" value="http://indexhost:8080/index/xmlquery" /> > </bean> > </property> > </bean> > --> > > <!-- > This is the only AccessPoint defined by default within this wayback.xml > Spring configuration file, providing an ArchivalURL Replay UI to the > "localbdbcollection" by providing ArchivalURL-specific implementations > of the replay, parser, and uriConverter. > > This AccessPoint currently will provide access only from the machine > running Tomcat. To provide external access, replace "localhost" with your > fully qualified hostname of the computer running Tomcat. 
> --> > > <!-- QueryUI templates --> > <bean id="standardquery" class="org.archive.wayback.query.Renderer"> > <property name="captureJsp" value="/jsp/HTMLResults.jsp" /> > </bean> > <bean id="calendarquery" class="org.archive.wayback.query.Renderer"> > <property name="captureJsp" value="/jsp/CalendarResults.jsp" /> > </bean> > > <bean name="8080:wayback" class="org.archive.wayback.webapp.AccessPoint"> > <property name="collection" ref="localbdbcollection" /> > <property name="query" ref="calendarquery" /> > <property name="replay"> > <bean class="org.archive.wayback.archivalurl.ArchivalUrlReplayDispatcher"> > <property name="jsInserts"> > <list> > <value>http://localhost:8080/wayback/wm.js</value> > </list> > </property> > <property name="jspInserts"> > <list> > <value>/replay/Timeline.jsp</value> > </list> > </property> > </bean> > </property> > <property name="parser"> > <bean class="org.archive.wayback.archivalurl.ArchivalUrlRequestParser" init-method="init"> > <property name="maxRecords" value="1000" /> > <property name="earliestTimestamp" value="2006" /> > </bean> > </property> > <property name="uriConverter"> > <bean class="org.archive.wayback.archivalurl.ArchivalUrlResultURIConverter"> > <property name="replayURIPrefix" value="http://localhost:8080/wayback/" /> > </bean> > </property> > > <!-- > <property name="query"> > <bean class="org.archive.wayback.query.Renderer"> > <property name="captureJsp" value="/jsp/HTMLResults.jsp" /> > </bean> > </property> > > <property name="replay"> > <bean class="org.archive.wayback.archivalurl.ArchivalUrlReplayDispatcher"> > <property name="jsInserts"> > <list> > <value>http://localhost:8080/wayback/wm.js</value> > </list> > </property> > <property name="jspInserts"> > <list> > <value>/replay/Timeline.jsp</value> > </list> > </property> > </bean> > </property> > > <property name="parser"> > <bean class="org.archive.wayback.archivalurl.ArchivalUrlRequestParser" > init-method="init"> > <property name="maxRecords" 
value="1000" /> > <property name="earliestTimestamp" value="1996" /> > </bean> > </property> > > <property name="uriConverter"> > <bean class="org.archive.wayback.archivalurl.ArchivalUrlResultURIConverter"> > <property name="replayURIPrefix" value="http://localhost:8080/wayback/" /> > </bean> > </property> > --> > </bean> > > <bean name="8080:wayback-qa" class="org.archive.wayback.webapp.AccessPoint"> > <property name="collection" ref="localqacollection" /> > <property name="query" ref="standardquery" /> > <property name="replay"> > <bean class="org.archive.wayback.archivalurl.ArchivalUrlReplayDispatcher"> > <property name="jsInserts"> > <list> > <value>http://localhost:8080/wayback-qa/wm.js</value> > </list> > </property> > <property name="jspInserts"> > <list> > <value>/replay/Timeline.jsp</value> > </list> > </property> > </bean> > </property> > <property name="parser"> > <bean class="org.archive.wayback.archivalurl.ArchivalUrlRequestParser" init-method="init"> > <property name="maxRecords" value="1000" /> > <property name="earliestTimestamp" value="2006" /> > </bean> > </property> > <property name="uriConverter"> > <bean class="org.archive.wayback.archivalurl.ArchivalUrlResultURIConverter"> > <property name="replayURIPrefix" value="http://localhost:8080/wayback-qa/" /> > </bean> > </property> > </bean> > > <!-- > The following AccessPoint inherits all configuration from the 8080:wayback > AccessPoint, but only allows access from the specified IP network. 
> --> > <!-- > <bean name="8080:netsecure" parent="8080:wayback"> > <property name="authentication"> > <bean class="org.archive.wayback.authenticationcontrol.IPMatchesBooleanOperator"> > <property name="allowedRanges"> > <list> > <value>192.168.1.16/24</value> > </list> > </property> > </bean> > </property> > <property name="uriConverter"> > <bean class="org.archive.wayback.archivalurl.ArchivalUrlResultURIConverter"> > <property name="replayURIPrefix" value="http://192.168.1.16:8080/netsecure/" /> > </bean> > </property> > </bean> > --> > > <!-- > The following AccessPoint inherits all configuration from the 8080:wayback > AccessPoint, but checks live web robots.txt documents to determine if > archived content should be accessible. > > Note: using this AccessPoint requires enabling the "livewebcache" and > "excluder-factory-robot" beans declared at the top of this file. > --> > <!-- > <bean name="8080:robots" parent="8080:wayback"> > <property name="exclusionFactory" ref="excluder-factory-robot" /> > <property name="uriConverter"> > <bean class="org.archive.wayback.archivalurl.ArchivalUrlResultURIConverter"> > <property name="replayURIPrefix" value="http://localhost:8080/robots/" /> > </bean> > </property> > </bean> > --> > > > <!-- > The following AccessPoint inherits all configuration from the 8080:wayback > AccessPoint, but provides a Proxy Replay UI to the same collection. These > two access points can be used simultaneously on the same Tomcat > installation. > > Note: using this AccessPoint requires adding a "Connector" on port 8090 > in your Tomcat's server.xml file. 
> --> > <!-- > <bean name="8090" parent="8080:wayback"> > <property name="useServerName" value="true" /> > <property name="replay"> > <bean class="org.archive.wayback.proxy.ProxyReplayDispatcher" /> > </property> > <property name="uriConverter"> > <bean class="org.archive.wayback.proxy.RedirectResultURIConverter"> > <property name="redirectURI" value="http://foo.archive.org:8090/jsp/Redirect.jsp" /> > </bean> > </property> > <property name="parser"> > <bean class="org.archive.wayback.proxy.ProxyRequestParser" init-method="init"> > <property name="localhostNames"> > <list> > <value>foo.archive.org</value> > </list> > </property> > <property name="maxRecords" value="1000" /> > </bean> > </property> > </bean> > --> > </beans> > > ------------------------------------------------------------------------ > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Microsoft > Defy all challenges. Microsoft(R) Visual Studio 2005. > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > ------------------------------------------------------------------------ > > _______________________________________________ > Archive-access-discuss mailing list > Arc...@li... > https://lists.sourceforge.net/lists/listinfo/archive-access-discuss > |
|
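Brad's diagnosis in the reply above (a thread variable marked "static", so only one indexing thread can be active) can be illustrated with a small stand-alone sketch. This is not the Wayback source: the class names `SharedSlotStore` and `OwnSlotStore`, the `init` signature, and the "start only if the slot is empty" check are all assumptions made for illustration. The collection names DB1 and DBQA are taken from Thomas's configuration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical illustration only; none of these classes exist in Wayback.
public class StaticThreadSketch {

    /** Buggy shape: the thread slot is static, so every instance shares it. */
    static class SharedSlotStore {
        private static Thread worker;
        private final String name;

        SharedSlotStore(String name) { this.name = name; }

        void init(List<String> started) {
            // Assumed guard: start a worker only if none has been recorded yet.
            if (worker == null) {
                worker = startAndWait(name, started);
            }
        }
    }

    /** Fixed shape: each instance owns its thread slot. */
    static class OwnSlotStore {
        private Thread worker;
        private final String name;

        OwnSlotStore(String name) { this.name = name; }

        void init(List<String> started) {
            if (worker == null) {
                worker = startAndWait(name, started);
            }
        }
    }

    // Starts a thread that records its collection name, then waits for it.
    private static Thread startAndWait(String name, List<String> started) {
        Thread t = new Thread(() -> started.add(name), "indexer-" + name);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return t;
    }

    static List<String> runBuggy() {
        SharedSlotStore.worker = null; // reset so repeated calls behave the same
        List<String> started = new CopyOnWriteArrayList<>();
        new SharedSlotStore("DB1").init(started);
        new SharedSlotStore("DBQA").init(started); // sees the shared slot as taken
        return started;
    }

    static List<String> runFixed() {
        List<String> started = new CopyOnWriteArrayList<>();
        new OwnSlotStore("DB1").init(started);
        new OwnSlotStore("DBQA").init(started);
        return started;
    }

    public static void main(String[] args) {
        System.out.println("static field:   " + runBuggy());
        System.out.println("instance field: " + runFixed());
    }
}
```

With the static field only DB1 gets a worker; with the instance field both DB1 and DBQA do, which matches the symptom Thomas reported for his second collection.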
From: Brad T. <br...@ar...> - 2007-11-27 18:59:26
|
Have you considered proxy mode?

We're working on some software called "browser monkeys" which helps automate large-scale processing like you're describing (link checking, for example) using a Firefox plugin, among other components. There may be a version available in the next one to two months. For this testing, we use proxy mode, and if that's an option for your link checking system, I'd recommend it. We'll announce the release of the browser monkey software on this forum.

Re: the domain prefix, I'll try to get some rough documentation online in the near term, but it may be a couple of weeks. For now, you'll need to look at the wayback.xml and wayback-templates.xml files, and the source code, but feel free to post specific issues, observations, and questions here.

Brad

> Hi,
> we have almost completed development of an HTTrack archive to ARC
> conversion tool and are in the middle of testing. To ensure that our
> conversion process is successful I've installed wayback 1.0 (which seems to
> be working well) and have loaded some of our converted ARC files into the
> system. From here I intended to run a link checker over the harvested
> website to check how well the conversion process worked. Because the link
> checker won't process the Javascript that internalizes the links, I was
> hoping that the domain-prefix replay mode would allow us to get around
> this.
>
> Is this the case? And where can we get configuration information for
> domain-prefix?
>
> Thanks,
> This e-mail is intended for the addressee only and may contain information
> which is subject to legal privilege. The contents are not necessarily the
> official view or communication of the National Library of New Zealand. If
> you are not the intended recipient you must not use, disclose, copy or
> distribute this e-mail or any information in, or attached to it. If you
> have received this e-mail in error, please contact the sender immediately
> or return the original message to the National Library by e-mail, and
> destroy any copies. The National Library does not accept any liability for
> changes made to this e-mail or attachments after sending.
>
> All e-mails have been scanned for viruses and content by security
> software. The National Library reserves the right to monitor all e-mail
> communications through its network. |
|
From: Mathew B. <Mat...@na...> - 2007-11-27 04:58:14
|
Hi,
we have almost completed development of an HTTrack archive to ARC conversion tool and are in the middle of testing. To ensure that our conversion process is successful I've installed wayback 1.0 (which seems to be working well) and have loaded some of our converted ARC files into the system. From here I intended to run a link checker over the harvested website to check how well the conversion process worked. Because the link checker won't process the Javascript that internalizes the links, I was hoping that the domain-prefix replay mode would allow us to get around this.

Is this the case? And where can we get configuration information for domain-prefix?

Thanks,
This e-mail is intended for the addressee only and may contain information which is subject to legal privilege. The contents are not necessarily the official view or communication of the National Library of New Zealand. If you are not the intended recipient you must not use, disclose, copy or distribute this e-mail or any information in, or attached to it. If you have received this e-mail in error, please contact the sender immediately or return the original message to the National Library by e-mail, and destroy any copies. The National Library does not accept any liability for changes made to this e-mail or attachments after sending.

All e-mails have been scanned for viruses and content by security software. The National Library reserves the right to monitor all e-mail communications through its network. |
|
From: Chris V. <cv...@gm...> - 2007-11-19 22:58:00
|
HI all, I'm back to working on this problem, but still with no success. Basically, what I want to do is to add additional metadata to an index created by NutchWax. I am able to add new fields and values to documents using standard Lucene classes, IndexReader, IndexWriter, IndexSearcher, and SimpleAnalyzer, and following the proper technique for updating Lucene documents (I think). New fields are added as stored, indexed, and un-tokenized. After the documents are updated, there is some strange behavior during querying. Queries against collection, date, url, and the newly added fields work fine. Unfortunately, queries against content and title no longer work. So it seems like the technique I'm using to update the documents is either insufficient (further action on index components is needed), or damaging (mangling part of the index or documents). If anyone is interested, I have a small sample index that exhibits the problem. Any insight is greatly appreciated. Thanks, Chris On Oct 23, 2007 5:06 PM, Chris Vicary <cv...@gm...> wrote: > Hi, > > I'd like to add extra metadata to indexes produced by NutchWax. The goal is > to perform searches against this metadata and full text at the same time. My > initial idea is to update documents similarly to suggested practices for > updating documents in Lucene indexes: retrieve documents based on search > term(s), delete documents from index, add new fields to documents, and then > add documents back to index. I am able to follow this strategy using the > Lucene 2.0 classes IndexSearcher, IndexReader and IndexWriter (or > IndexModifier). After the index documents have been updated, I can query > against the new metadata using the IndexSearcher class without any problem. > I can also use Luke to view the contents of the index and verify that the > metadata has been added to the documents. 
The problem is that once the > Index* classes are done updating the index documents, the NutchWax webapp is > unable to locate those documents (even after a restart). > > My question is what is the best way to add fields to NutchWax index > documents? Are there any Nutch or NutchWax classes I should use instead of > the Lucene Index* classes (I didn't see any likely candidates in either > project)? Is it possible I am leaving out some important steps when using > the Lucene Index* classes? > > Any help is appreciated, > > Chris > |
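The update cycle Chris describes (retrieve a document, delete it, add fields, add it back) can be modelled in plain Java. What follows is a toy in-memory index, not the Lucene or NutchWax API; every class and method name in it is hypothetical. It illustrates one thing that may be worth checking: a document read back from a Lucene index carries only its stored fields, so a field that was indexed without being stored is not part of what gets re-added. Whether that is what happened to `content` and `title` in this case is an assumption; the thread does not confirm it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only; this is not how Lucene is implemented.
public class ReindexSketch {

    static class ToyIndex {
        // Stored field values per document id (what a retrieval returns).
        private final Map<Integer, Map<String, String>> stored = new HashMap<>();
        // Inverted index: "field:term" -> ids of documents containing the term.
        private final Map<String, Set<Integer>> postings = new HashMap<>();

        void add(int id, Map<String, String> storedFields, Map<String, String> unstoredFields) {
            stored.put(id, new HashMap<>(storedFields));
            indexAll(id, storedFields);
            indexAll(id, unstoredFields); // searchable, but the text is not kept
        }

        private void indexAll(int id, Map<String, String> fields) {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                for (String term : e.getValue().toLowerCase().split("\\s+")) {
                    postings.computeIfAbsent(e.getKey() + ":" + term, k -> new HashSet<>()).add(id);
                }
            }
        }

        // Returns only the stored fields, like reading a document back.
        Map<String, String> retrieve(int id) {
            return new HashMap<>(stored.get(id));
        }

        void delete(int id) {
            stored.remove(id);
            for (Set<Integer> ids : postings.values()) {
                ids.remove(id);
            }
        }

        boolean matches(String field, String term, int id) {
            return postings.getOrDefault(field + ":" + term, Collections.emptySet()).contains(id);
        }
    }

    static ToyIndex buildAndUpdate() {
        ToyIndex index = new ToyIndex();
        Map<String, String> storedFields = new HashMap<>();
        storedFields.put("url", "http://example.org/");
        Map<String, String> unstoredFields = new HashMap<>();
        unstoredFields.put("content", "archived page text");
        index.add(1, storedFields, unstoredFields);

        // The update cycle: retrieve, delete, add a field, add back.
        Map<String, String> doc = index.retrieve(1);
        index.delete(1);
        doc.put("subject", "history");
        index.add(1, doc, Collections.emptyMap());
        return index;
    }

    public static void main(String[] args) {
        ToyIndex index = buildAndUpdate();
        System.out.println("url query hits:     " + index.matches("url", "http://example.org/", 1));
        System.out.println("subject query hits: " + index.matches("subject", "history", 1));
        System.out.println("content query hits: " + index.matches("content", "archived", 1));
    }
}
```

In this model the `url` and new `subject` queries still match after the update, while the `content` query does not, because the content text was never available to re-index. If the real index behaves the same way, the original text would have to be supplied again when the document is re-added.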
|
From: Brad T. <br...@ar...> - 2007-11-15 19:55:55
|
There are some newline problems in your last message, but it looks OK at first glance.

I haven't run wayback on Windows in a long time - this could be the problem.

Your procedure to install the webapp looks OK. Your changes to the wayback.xml also look good, and placing the ARC file in the 'arcDir' directory should be all that is needed with that configuration.

Are there additional messages in the catalina.out log file indicating that the ARC file(s) were indexed and then merged?

There is also a command line tool (again, not sure if these work on Windows...) called 'bdb-client' that may help you debug the BDB file, or at least verify that your documents are in the index, which is the 00000000.jdb file you mentioned. As the BDB grows, more files, 00000001.jdb, etc, will be created, too.

I'll do a test on Windows in the next day or two.

Brad
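Brad's question about catalina.out can be checked mechanically. The sketch below pulls out log entries whose header line names a logger under `org.archive.wayback.`, together with the message line that follows. The class name `WaybackLogFilter` and the default path `logs/catalina.out` are assumptions for illustration. The exact wording of the "indexed" and "merged" messages is not given in this thread, so the filter matches on the package prefix only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the Wayback distribution.
public class WaybackLogFilter {

    private static final String PREFIX = "org.archive.wayback.";

    // Returns each header line that names a Wayback logger, followed by
    // the next line (the message), in their original order.
    static List<String> waybackLines(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < logLines.size(); i++) {
            if (logLines.get(i).contains(PREFIX)) {
                hits.add(logLines.get(i));
                if (i + 1 < logLines.size()) {
                    hits.add(logLines.get(i + 1));
                    i++; // the message line has been consumed
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass the real path as the first argument.
        Path log = Paths.get(args.length > 0 ? args[0] : "logs/catalina.out");
        // ISO-8859-1 maps every byte to a character, so reading never fails
        // on locale-specific date text; such characters may display wrongly.
        for (String line : waybackLines(Files.readAllLines(log, StandardCharsets.ISO_8859_1))) {
            System.out.println(line);
        }
    }
}
```

If the output contains only the startup entries (for example the "is alive" messages) and nothing after them, that suggests the indexing and merging threads started but have not yet reported any work.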
Sitttichai Sombat wrote:
> Thanks Brad, for this answer, but I checked the catalina.out log file and it looks good. This is the catalina.out log file:
>
> 15 พ.ย. 2550 10:52:21 org.apache.catalina.core.AprLifecycleListener lifecycleEvent
> INFO: The Apache Portable Runtime which allows optimal performance in production environments was not found on the java.library.path: C:\Program Files\Apache Software Foundation\Tomcat 5.5\bin;.;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem
> 15 พ.ย. 2550 10:52:21 org.apache.coyote.http11.Http11BaseProtocol init
> INFO: Initializing Coyote HTTP/1.1 on http-8080
> 15 พ.ย. 2550 10:52:21 org.apache.catalina.startup.Catalina load
> INFO: Initialization processed in 922 ms
> 15 พ.ย. 2550 10:52:21 org.apache.catalina.core.StandardService start
> INFO: Starting service Catalina
> 15 พ.ย. 2550 10:52:21 org.apache.catalina.core.StandardEngine start
> INFO: Starting Servlet Engine: Apache Tomcat/5.5.12
> 15 พ.ย. 2550 10:52:21 org.apache.catalina.core.StandardHost start
> INFO: XML validation disabled
> 15 พ.ย. 2550 10:52:22 org.apache.catalina.startup.HostConfig deployWAR
> INFO: Deploying web application archive wayback-webapp-1.0.1.war
> 15 พ.ย. 2550 10:52:22 org.apache.catalina.startup.ContextConfig validateSecurityRoles
> INFO: WARNING: Security role name wayback used in an <auth-constraint> without being defined in a <security-role>
> 15 พ.ย. 2550 10:52:22 org.archive.wayback.webapp.RequestFilter init
> INFO: Wayback Filter initializing...
> 15 พ.ย. 2550 10:52:23 org.springframework.beans.factory.xml.XmlBeanDefinitionReader loadBeanDefinitions
> INFO: Loading XML bean definitions from file [C:\Program Files\Apache Software Foundation\Tomcat 5.5\webapps\wayback-webapp-1.0.1\WEB-INF\wayback.xml]
> 15 พ.ย. 2550 10:52:23 org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
> INFO: Pre-instantiating singletons in org.springframework.beans.factory.xml.XmlBeanFactory@1c5f743: defining beans [localbdbcollection,8080:wayback]; root of factory hierarchy
> 15 พ.ย. 2550 10:52:23 org.archive.wayback.resourcestore.LocalARCResourceStore$AutoARCIndexThread <init>
> INFO: AutoARCIndexThread is alive.
> 15 พ.ย. 2550 10:52:24 org.archive.wayback.resourceindex.bdb.BDBIndexUpdater$BDBIndexUpdaterThread <init>
> INFO: BDBIndexUpdaterThread is alive.
> 15 พ.ย. 2550 10:52:24 org.archive.wayback.webapp.RequestFilter init
> INFO: Wayback Filter initialization complete.
> 15 พ.ย. 2550 10:52:25 org.apache.coyote.http11.Http11BaseProtocol start
> INFO: Starting Coyote HTTP/1.1 on http-8080
> 15 พ.ย. 2550 10:52:25 org.apache.jk.common.ChannelSocket init
> INFO: JK: ajp13 listening on /0.0.0.0:8009
> 15 พ.ย. 2550 10:52:25 org.apache.jk.server.JkMain start
> INFO: Jk running ID=0 time=0/94 config=null
> 15 พ.ย. 2550 10:52:25 org.apache.catalina.storeconfig.StoreLoader load
> INFO: Find registry server-registry.xml at classpath resource
> 15 พ.ย. 2550 10:52:25 org.apache.catalina.startup.Catalina start
> INFO: Server startup in 3937 ms
>
> Thanks again, and this is all the work on wayback that I do:
> 1. I use wayback version 1.0.1.
> 2. I place the .war file in a non-ROOT context, "wayback-webapp-1.0.1".
> 3. I place the .arc.gz file in the "arcDir" as configured in the wayback.xml configuration file.
> 4. I modify the wayback.xml file as follows:
>    <property name="jsInserts">
>      <list>
>        <value>http://localhost:8080/wayback-webapp-1.0.1/wm.js</value>
>        from
>        <value>http://localhost:8080/wm.js</value>
>      </list>
>    </property>
>    and
>    <property name="uriConverter">
>      <bean class="org.archive.wayback.archivalurl.ArchivalUrlResultURIConverter">
>        <property name="replayURIPrefix" value="http://localhost:8080/wayback-webapp-1.0.1/wayback/" />
>        from
>        <property name="replayURIPrefix" value="http://localhost:8080/wayback/" />
>      </bean>
>    </property>
> 5. Is this work enough?
> 6. Is the file that is searched the "00000000.jdb" file, yes or no?
>
> Thank you very much. Sorry if this question uses wrong sentences.
>
> > Hi Sitttichai,
> >
> > > 1. If I have a ARC file. Can I use it immediately?
> >
> > The current version requires compressed ARC files (.arc.gz), which, using
> > the automatic indexing features of the wayback software, should be all you
> > need. Just place the .arc.gz files in the "arcDir" as configured in the
> > wayback.xml configuration file. With the default wayback.xml
> > configuration, automatic indexing is enabled. If you're still having
> > problems, you might try checking the catalina.out Tomcat log file and see
> > if you find anything that looks bad -- feel free to post log messages with
> > questions.
> >
> > The next release, which should
be available in the next few weeks will> support uncompressed ARC files.=
> > > 2. If I cannot use immediately then how I should be?> > see #1> > >=
3. Wayback must be work relation with NutchWax?> > Wayback does not req=
uire NutchWax. NutchWax allows you to perfo
> rm> full-text searches against documents in your ARC files. Wayback cu=
rrently> allows URL-based searching only, which is enough to browse conte=
nt in ARC> files, and perform queries by URL.> > Brad> > >> > thank yo> >=
> > _________________________________________________________________> > =
Express yourself instantly with MSN Messenger! Download today it's FREE!>=
> http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/-------=
------------------------------------------------------------------> > Thi=
s SF.net email is sponsored by: Splunk Inc.> > Still grepping through log=
files to find problems? Stop.> > Now Search log events and configuratio=
n files using AJAX and a browser.> > Download your FREE copy of Splunk no=
w >>> > http://get.splunk.com/___________________________________________=
____> > Archive-access-discuss mailing list> > Archive-access-discuss@lis=
ts.sourceforge.net> > https://lists.sourceforge.net/lists/listinfo/archiv=
e-access-discuss> >> >=20
> _________________________________________________________________
> Express yourself instantly with MSN Messenger! Download today it's FREE=
!
> http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/
> =20
|
|
From: Pope, J. <Jac...@bl...> - 2007-11-15 09:21:47
|
Hiya Brad,
Erik's fix has partially solved the problem. Now when I click on a search result in NutchWax, it appears correctly in Wayback, and when I enter the archival URL directly (either a specific time or '*') it works too. However, entering the same URL in the wayback search box and clicking 'Take me back' generates the following URL:
http://194.66.226.116:8080/wayback/query?type=urlquery&url=http%3A%2F%2Fwhenthebelfastchildsinsagain.blogspot.com%2Fsearch%2Flabel%2Fbelfast&date=&Submit=Take+Me+Back
and returns "Resource not found in archive".
Also, the number of instances (both in the timeline pane and the search results page in wayback) is 99, and yet only one is listed. 99 is the number I set ArchivalUrlRequestParser.maxRecords to, as part of Erik's fix.
I've extracted my nutch index from HADOOP and copied it to an NFS share; both nutchwax and wayback are pointing to the same index. I've also attached my wayback.xml. Incidentally, I changed the name of the property for the NutchResourceIndex from remotenutchindex (as shown on the documentation webpage) to resourceIndex, to fix a crash on loading tomcat.
Cheers,
Jack
Jackson Pope
Technical Lead
Web Archiving Team
The British Library
+44 (0)1937 54 6942
-----Original Message-----
From: Brad Tofel [mailto:br...@ar...]
Sent: 13 November 2007 20:10
To: Pope, Jackson
Subject: Re: [Archive-access-discuss] Wayback 1.0.1 and NutchWax
Hi Jackson,
Are you trying to use the Nutch index for the Wayback, or have you built a separate index for the Wayback?
Functionality for using the remote Nutch resource Index may not be working at the moment -- we found too many performance issues with this, and moved to having both.
If you are using a separate wayback-specific Index, does the wayback function independently of Nutch?
Can you send on your wayback.xml file, and the link for some search results from NutchWax?
Brad
Pope, Jackson wrote:
> Hiya All,
>
> I'm trying to get NutchWax working with Wayback 1.0.1. I've installed
> wayback as ROOT, and have a single collection (8080:wayback). I've
> created the index (which NutchWax is searching ok) and made the arc
> files available. However when I click on a search result in NutchWax or
> enter something in the search box in Wayback it fails. The returned URL
> looks right, but I get the following error message:
>
> Bad Query Exception
>
> The request is missing information, or is not understood by this server.
> {0}
>
> Has anyone experienced this? Any ideas what the cause might be?
>
> Cheers,
>
> Jack
>
> Jackson Pope
> Technical Lead
> Web Archiving Team
> The British Library
> +44 (0)1937 54 6942 |
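To compare what the search form sends with the archival URL that works, the query string of the URL quoted in this message can be unpacked. This is a minimal sketch in Python that assumes the URL is the one from the message with the mail line-wrap debris removed; the function name `query_params` is only illustrative, and nothing here contacts a Wayback server.

```python
from urllib.parse import urlsplit, parse_qs

# URL as generated by the Wayback search form in the message,
# with the quoted-printable line breaks removed.
FORM_URL = (
    "http://194.66.226.116:8080/wayback/query?type=urlquery"
    "&url=http%3A%2F%2Fwhenthebelfastchildsinsagain.blogspot.com"
    "%2Fsearch%2Flabel%2Fbelfast&date=&Submit=Take+Me+Back"
)


def query_params(url):
    """Return the decoded query parameters, keeping empty ones such as date=."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


params = query_params(FORM_URL)
for name, values in params.items():
    print(f"{name!r}: {values!r}")
```

Listing the parameters this way makes it easy to see that the form submits an empty `date` value alongside the decoded `url`, which is one of the differences from typing the archival URL by hand.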
|
From: Sitttichai S. <non...@ho...> - 2007-11-15 06:07:38
|
Thanks Brad for this answer, but I checked the catalina.out log file and it looks good.
This is the catalina.out log file:
15 พ.ย. 2550 10:52:21 org.apache.catalina.core.AprLifecycleListener lifecycleEvent
INFO: The Apache Portable Runtime which allows optimal performance in production environments was not found on the java.library.path: C:\Program Files\Apache Software Foundation\Tomcat 5.5\bin;.;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem
15 พ.ย. 2550 10:52:21 org.apache.coyote.http11.Http11BaseProtocol init
INFO: Initializing Coyote HTTP/1.1 on http-8080
15 พ.ย. 2550 10:52:21 org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 922 ms
15 พ.ย. 2550 10:52:21 org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
15 พ.ย. 2550 10:52:21 org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/5.5.12
15 พ.ย. 2550 10:52:21 org.apache.catalina.core.StandardHost start
INFO: XML validation disabled
15 พ.ย. 2550 10:52:22 org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive wayback-webapp-1.0.1.war
15 พ.ย. 2550 10:52:22 org.apache.catalina.startup.ContextConfig validateSecurityRoles
INFO: WARNING: Security role name wayback used in an <auth-constraint> without being defined in a <security-role>
15 พ.ย. 2550 10:52:22 org.archive.wayback.webapp.RequestFilter init
INFO: Wayback Filter initializing...
15 พ.ย. 2550 10:52:23 org.springframework.beans.factory.xml.XmlBeanDefinitionReader loadBeanDefinitions
INFO: Loading XML bean definitions from file [C:\Program Files\Apache Software Foundation\Tomcat 5.5\webapps\wayback-webapp-1.0.1\WEB-INF\wayback.xml]
15 พ.ย. 2550 10:52:23 org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.xml.XmlBeanFactory@1c5f743: defining beans [localbdbcollection,8080:wayback]; root of factory hierarchy
15 พ.ย. 2550 10:52:23 org.archive.wayback.resourcestore.LocalARCResourceStore$AutoARCIndexThread <init>
INFO: AutoARCIndexThread is alive.
15 พ.ย. 2550 10:52:24 org.archive.wayback.resourceindex.bdb.BDBIndexUpdater$BDBIndexUpdaterThread <init>
INFO: BDBIndexUpdaterThread is alive.
15 พ.ย. 2550 10:52:24 org.archive.wayback.webapp.RequestFilter init
INFO: Wayback Filter initialization complete.
15 พ.ย. 2550 10:52:25 org.apache.coyote.http11.Http11BaseProtocol start
INFO: Starting Coyote HTTP/1.1 on http-8080
15 พ.ย. 2550 10:52:25 org.apache.jk.common.ChannelSocket init
INFO: JK: ajp13 listening on /0.0.0.0:8009
15 พ.ย. 2550 10:52:25 org.apache.jk.server.JkMain start
INFO: Jk running ID=0 time=0/94 config=null
15 พ.ย. 2550 10:52:25 org.apache.catalina.storeconfig.StoreLoader load
INFO: Find registry server-registry.xml at classpath resource
15 พ.ย. 2550 10:52:25 org.apache.catalina.startup.Catalina start
INFO: Server startup in 3937 ms
Thanks again. This is everything I have done with wayback:
1. I use wayback version 1.0.1.
2. I placed the .war file in a non-ROOT context, "wayback-webapp-1.0.1".
3. I placed the .arc.gz file in the "arcDir" as configured in the wayback.xml configuration file.
4. I modified the wayback.xml file as follows:
<property name="jsInserts">
  <list>
    <value>http://localhost:8080/wayback-webapp-1.0.1/wm.js</value>
  </list>
</property>
changed from
    <value>http://localhost:8080/wm.js</value>
and
<property name="uriConverter">
  <bean class="org.archive.wayback.archivalurl.ArchivalUrlResultURIConverter">
    <property name="replayURIPrefix" value="http://localhost:8080/wayback-webapp-1.0.1/wayback/" />
  </bean>
</property>
changed from
    <property name="replayURIPrefix" value="http://localhost:8080/wayback/" />
5. This work is enough.
6. Is the file that gets searched the "00000000.jdb" file, yes or no?
Thank you very much. Sorry if this question is worded badly.
> Hi Sitttichai,
>
>
> >
> > 1. If I have a ARC file. Can I use it immediately?
>
> The current version requires compressed ARC files (.arc.gz) which using
> the automatic indexing features of the wayback software should be all you
> need. Just place the .arc.gz files in the "arcDir" as configured in the
> wayback.xml configuration file. With the default wayback.xml
> configuration, automatic indexing is enabled. If you're still having
> problems, you might try checking the catalina.out Tomcat log file and see
> if you find anything that looks bad -- feel free to post log messages with
> questions.
>
> The next release, which should be available in the next few weeks will
> support uncompressed ARC files.
>
> > 2. If I cannot use immediately then how I should be?
>
> see #1
>
> > 3. Wayback must be work relation with NutchWax?
>
> Wayback does not require NutchWax. NutchWax allows you to perform
> full-text searches against documents in your ARC files. Wayback currently
> allows URL-based searching only, which is enough to browse content in ARC
> files, and perform queries by URL.
>
> Brad
>
> >
> > thank yo
> >
|
|
From: Sitttichai S. <non...@ho...> - 2007-11-15 04:31:31
|
Thanks Brad for this answer, but I checked the catalina.out log file and it looks good.
This is the catalina.out log file:
15 พ.ย. 2550 10:52:21 org.apache.catalina.core.AprLifecycleListener lifecycleEvent
INFO: The Apache Portable Runtime which allows optimal performance in production environments was not found on the java.library.path: C:\Program Files\Apache Software Foundation\Tomcat 5.5\bin;.;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem
15 พ.ย. 2550 10:52:21 org.apache.coyote.http11.Http11BaseProtocol init
INFO: Initializing Coyote HTTP/1.1 on http-8080
15 พ.ย. 2550 10:52:21 org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 922 ms
15 พ.ย. 2550 10:52:21 org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
15 พ.ย. 2550 10:52:21 org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/5.5.12
15 พ.ย. 2550 10:52:21 org.apache.catalina.core.StandardHost start
INFO: XML validation disabled
15 พ.ย. 2550 10:52:22 org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive wayback-webapp-1.0.1.war
15 พ.ย. 2550 10:52:22 org.apache.catalina.startup.ContextConfig validateSecurityRoles
INFO: WARNING: Security role name wayback used in an <auth-constraint> without being defined in a <security-role>
15 พ.ย. 2550 10:52:22 org.archive.wayback.webapp.RequestFilter init
INFO: Wayback Filter initializing...
15 พ.ย. 2550 10:52:23 org.springframework.beans.factory.xml.XmlBeanDefinitionReader loadBeanDefinitions
INFO: Loading XML bean definitions from file [C:\Program Files\Apache Software Foundation\Tomcat 5.5\webapps\wayback-webapp-1.0.1\WEB-INF\wayback.xml]
15 พ.ย. 2550 10:52:23 org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.xml.XmlBeanFactory@1c5f743: defining beans [localbdbcollection,8080:wayback]; root of factory hierarchy
15 พ.ย. 2550 10:52:23 org.archive.wayback.resourcestore.LocalARCResourceStore$AutoARCIndexThread <init>
INFO: AutoARCIndexThread is alive.
15 พ.ย. 2550 10:52:24 org.archive.wayback.resourceindex.bdb.BDBIndexUpdater$BDBIndexUpdaterThread <init>
INFO: BDBIndexUpdaterThread is alive.
15 พ.ย. 2550 10:52:24 org.archive.wayback.webapp.RequestFilter init
INFO: Wayback Filter initialization complete.
15 พ.ย. 2550 10:52:25 org.apache.coyote.http11.Http11BaseProtocol start
INFO: Starting Coyote HTTP/1.1 on http-8080
15 พ.ย. 2550 10:52:25 org.apache.jk.common.ChannelSocket init
INFO: JK: ajp13 listening on /0.0.0.0:8009
15 พ.ย. 2550 10:52:25 org.apache.jk.server.JkMain start
INFO: Jk running ID=0 time=0/94 config=null
15 พ.ย. 2550 10:52:25 org.apache.catalina.storeconfig.StoreLoader load
INFO: Find registry server-registry.xml at classpath resource
15 พ.ย. 2550 10:52:25 org.apache.catalina.startup.Catalina start
INFO: Server startup in 3937 ms
Thanks again. This is everything I have done with wayback:
1. I use wayback version 1.0.1.
2. I placed the .war file in a non-ROOT context, "wayback-webapp-1.0.1".
3. I placed the .arc.gz file in the "arcDir" as configured in the wayback.xml configuration file.
4. I modified the wayback.xml file as follows:
<property name="jsInserts">
  <list>
    <value>http://localhost:8080/wayback-webapp-1.0.1/wm.js</value>
  </list>
</property>
changed from
    <value>http://localhost:8080/wm.js</value>
and
<property name="uriConverter">
  <bean class="org.archive.wayback.archivalurl.ArchivalUrlResultURIConverter">
    <property name="replayURIPrefix" value="http://localhost:8080/wayback-webapp-1.0.1/wayback/" />
  </bean>
</property>
changed from
    <property name="replayURIPrefix" value="http://localhost:8080/wayback/" />
5. This work is enough.
6. Is the file that gets searched the "00000000.jdb" file, yes or no?
Thank you very much. Sorry if this question is worded badly.
> Hi Sitttichai,
>
> > 1. If I have a ARC file. Can I use it immediately?
>
> The current version requires compressed ARC files (.arc.gz) which using
> the automatic indexing features of the wayback software should be all you
> need. Just place the .arc.gz files in the "arcDir" as configured in the
> wayback.xml configuration file. With the default wayback.xml
> configuration, automatic indexing is enabled. If you're still having
> problems, you might try checking the catalina.out Tomcat log file and see
> if you find anything that looks bad -- feel free to post log messages with
> questions.
>
> The next release, which should be available in the next few weeks will
> support uncompressed ARC files.
>
> > 2. If I cannot use immediately then how I should be?
>
> see #1
>
> > 3. Wayback must be work relation with NutchWax?
>
> Wayback does not require NutchWax. NutchWax allows you to perform
> full-text searches against documents in your ARC files. Wayback currently
> allows URL-based searching only, which is enough to browse content in ARC
> files, and perform queries by URL.
>
> Brad
>
> > thank yo |
|
From: Brad T. <br...@ar...> - 2007-11-14 22:23:41
|
Hi Sitttichai,
> 1. If I have a ARC file. Can I use it immediately?
The current version requires compressed ARC files (.arc.gz); with the automatic indexing features of the wayback software, that should be all you need. Just place the .arc.gz files in the "arcDir" as configured in the wayback.xml configuration file. With the default wayback.xml configuration, automatic indexing is enabled. If you're still having problems, you might try checking the catalina.out Tomcat log file to see if you find anything that looks bad -- feel free to post log messages with questions.
The next release, which should be available in the next few weeks, will support uncompressed ARC files.
> 2. If I cannot use immediately then how I should be?
see #1
> 3. Wayback must be work relation with NutchWax?
Wayback does not require NutchWax. NutchWax allows you to perform full-text searches against documents in your ARC files. Wayback currently allows URL-based searching only, which is enough to browse content in ARC files, and perform queries by URL.
Brad
> thank yo |
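Checking catalina.out for "anything that looks bad" can be partly automated. The following is a rough sketch in Python, not part of Wayback: it assumes the log uses the java.util.logging levels (SEVERE, WARNING) seen in Tomcat output, and that the two "is alive" messages quoted elsewhere in this thread are the signs that automatic indexing started. The function name `scan_catalina_log` is made up for illustration.

```python
import re

# Lines mentioning a SEVERE/WARNING level or any exception are worth a look.
TROUBLE = re.compile(r"(SEVERE|WARNING):|Exception")

# Messages logged when the auto-indexing threads start (taken from the
# catalina.out excerpt posted in this thread).
STARTUP_MARKERS = (
    "AutoARCIndexThread is alive.",
    "BDBIndexUpdaterThread is alive.",
)


def scan_catalina_log(text):
    """Return (suspicious lines, startup markers that never appeared)."""
    trouble = [line for line in text.splitlines() if TROUBLE.search(line)]
    missing = [marker for marker in STARTUP_MARKERS if marker not in text]
    return trouble, missing


# Usage (the path is an assumption; adjust to your Tomcat install):
#   with open("logs/catalina.out", encoding="utf-8", errors="replace") as f:
#       trouble, missing = scan_catalina_log(f.read())
```

An empty `missing` list only shows that the indexing threads were started; it does not prove that every ARC file has been indexed yet.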
|
From: Chris V. <cv...@gm...> - 2007-11-14 21:44:40
|
Thanks Brad, that did the trick. I had an idea it was something simple that was unclear to me.
-Chris
On Nov 9, 2007 7:03 PM, Brad Tofel <br...@ar...> wrote:
> Hi Chris,
>
> If you've deployed the .war as "wayback-webapp-1.0.1.war" (on port 8080,
> for example), and you define an AccessPoint named "8080:wayback", then
> you need to use the public access URL:
>
> http://yourhost.org:8080/wayback-webapp-1.0.1/wayback/
>
> which would place subsequent queries at:
>
> http://yourhost.org:8080/wayback-webapp-1.0.1/wayback/query?...
>
> The fact that the wayback returns "normal" looking UI pages when users
> do not provide the "name" of the AccessPoint has been the source of a
> lot of confusion. This should be addressed in the next release.
>
> Let me know if adding the AccessPoint name works for you, or if you're
> still having other problems, please forward on your wayback.xml
> configuration.
>
> Brad
>
> Chris Vicary wrote:
> > Hi,
> >
> > I'm having some difficulty configuring the newest version of wayback.
> > It is deployed in a non-ROOT webapp context (wayback-webapp-1.0.1, by
> > default) in tomcat 5.5.20 and configured for a localcdxindex
> > collection. Tomcat starts normally and I am able to access the wayback
> > search interface, but whenever I attempt a URL query, a 404 page is
> > returned with the description "The requested resource
> > (/wayback-webapp-1.0.1/query) is not available." There are no errors
> > reported in catalina.out. I'm guessing I missed something simple in a
> > configuration file, but not sure what.
> >
> > Thanks,
> >
> > Chris |
|
From: Sitttichai S. <non...@ho...> - 2007-11-14 04:09:09
|
1. If I have an ARC file, can I use it immediately?
2. If I cannot use it immediately, what should I do?
3. Must Wayback work together with NutchWax?
Thank you |
|
From: Erik H. <eri...@uc...> - 2007-11-13 21:52:01
|
At Tue, 13 Nov 2007 16:41:16 -0000,
"Pope, Jackson" <Jac...@bl...> wrote:
> Hiya All,
>
> I'm trying to get NutchWax working with Wayback 1.0.1. I've installed
> wayback as ROOT, and have a single collection (8080:wayback). I've
> created the index (which NutchWax is searching ok) and made the arc
> files available. However when I click on a search result in NutchWax or
> enter something in the search box in Wayback it fails. The returned URL
> looks right, but I get the following error message:
>
>
> Bad Query Exception
>
>
> The request is missing information, or is not understood by this server.
> {0}
>
> Has anyone experienced this? Any ideas what the cause might be?
We encountered the same issue here.
The cause is an exception which is thrown when maxRecords >
wbrequest.getResultsPerPage. The getResultsPerPage variable is set by
the form request parser to the value of maxRecords.
Basically, you need to set the ‘maxRecords’ property of both your
org.archive.wayback.resourceindex.NutchResourceIndex bean and your
org.archive.wayback.archivalurl.ArchivalUrlRequestParser bean, and the
former value must be greater than the latter value.
That eliminates the error, but paging still seems to be broken with a
nutch resourceindex.
best,
Erik Hetzner
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3
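The constraint described in this message can be checked against a wayback.xml before restarting Tomcat. What follows is a minimal sketch in Python, assuming that `maxRecords` is set through a `<property name="maxRecords" value="..."/>` child of each bean; the two bean class names come from the message, while the sample values (100 and 99), the sample XML, and the helper names are purely illustrative.

```python
import xml.etree.ElementTree as ET

INDEX_CLASS = "org.archive.wayback.resourceindex.NutchResourceIndex"
PARSER_CLASS = "org.archive.wayback.archivalurl.ArchivalUrlRequestParser"


def local_name(tag):
    """Strip an XML namespace such as '{http://...}bean' down to 'bean'."""
    return tag.rsplit("}", 1)[-1]


def max_records(root, bean_class):
    """Return maxRecords of the first bean of the given class, or None."""
    for element in root.iter():
        if not isinstance(element.tag, str):
            continue  # skip comments and processing instructions
        if local_name(element.tag) == "bean" and element.get("class") == bean_class:
            for child in element:
                if (isinstance(child.tag, str)
                        and local_name(child.tag) == "property"
                        and child.get("name") == "maxRecords"
                        and child.get("value") is not None):
                    return int(child.get("value"))
    return None


def check_max_records(xml_text):
    """Return (index value, parser value, whether index > parser)."""
    root = ET.fromstring(xml_text)
    index_max = max_records(root, INDEX_CLASS)
    parser_max = max_records(root, PARSER_CLASS)
    ok = (index_max is not None and parser_max is not None
          and index_max > parser_max)
    return index_max, parser_max, ok


# Hypothetical fragment; a real wayback.xml contains many more properties.
SAMPLE = """<beans>
  <bean class="org.archive.wayback.resourceindex.NutchResourceIndex">
    <property name="maxRecords" value="100" />
  </bean>
  <bean class="org.archive.wayback.archivalurl.ArchivalUrlRequestParser">
    <property name="maxRecords" value="99" />
  </bean>
</beans>"""

print(check_max_records(SAMPLE))
```

The check only encodes the rule as stated in the message (index value greater than parser value); it says nothing about the paging problem mentioned afterwards.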
|
|
From: Pope, J. <Jac...@bl...> - 2007-11-13 16:41:17
|
Hiya All,
I'm trying to get NutchWax working with Wayback 1.0.1. I've installed
wayback as ROOT, and have a single collection (8080:wayback). I've
created the index (which NutchWax is searching ok) and made the arc
files available. However when I click on a search result in NutchWax or
enter something in the search box in Wayback it fails. The returned URL
looks right, but I get the following error message:
Bad Query Exception
The request is missing information, or is not understood by this server.
{0}
Has anyone experienced this? Any ideas what the cause might be?
Cheers,
Jack
Jackson Pope
Technical Lead
Web Archiving Team
The British Library
+44 (0)1937 54 6942
**************************************************************************
Experience the British Library online at www.bl.uk
The British Library's new interactive Annual Report and Accounts 2006/07 : www.bl.uk/mylibrary
Help the British Library conserve the world's knowledge. Adopt a Book. www.bl.uk/adoptabook
The Library's St Pancras site is WiFi - enabled
*************************************************************************
The information contained in this e-mail is confidential and may be legally privileged. It is intended for the addressee(s) only. If you are not the intended recipient, please delete this e-mail and notify the pos...@bl... : The contents of this e-mail must not be disclosed or copied without the sender's consent.
The statements and opinions expressed in this message are those of the author and do not necessarily reflect those of the British Library. The British Library does not take any responsibility for the views of the author.
*************************************************************************
|
|
From: Sitttichai S. <non...@ho...> - 2007-11-13 08:12:25
|
Hi,
I have a problem using the wayback project. I downloaded wayback 1.0.1 and installed it in tomcat 5.5 on localhost on Windows XP, using the default configuration. I put an arc file in "/tmp/wayback/arcs". But when I test a search, the result is "Resource Not In Archive". How can I correct this problem?
Thank you. |
|
From: Brad T. <br...@ar...> - 2007-11-12 20:35:46
|
Hi Oskar,
This functionality will be back in the next release, and will be available before then in SVN and on the nightly build box. I'll drop a line when it has been checked in and is available.
Brad
Oskar Grenholm wrote:
> Hello Brad (and everyone else)!
>
> I sent an email to you a couple of weeks ago about my first impressions of the Wayback 1.0, and asked some questions (mainly about getting the WAXToolbar to work with 1.0). But it seems that the mail never reached you, so I'll try sending it to the Archive-access list instead.
>
> Here is the old mail:
>
> "Hi Brad!
> This is Oskar Grenholm from the Swedish National Library (we met briefly when we visited SF and the Archive in June, but I'll forgive you if you don't remember that ;-)
>
> First I must compliment you on a fine piece of software. I downloaded the 1.0 a couple of days ago, and it looks good. The Spring beans are a big improvement and it feels really thought through and well designed.
> I especially like the idea of Access Points and WB Collections.
>
> I've got it working in both Archival and Proxy Mode, but only with local stuff so far. I'll try with the remote stuff some day though.
>
> But when I tried it with my old WAXToolbar, that didn't go as smoothly. It worked okay, and surfed the archive, but I couldn't get the list of all available dates to work. A quick review of the javascript code revealed that it relied on an annotation in the <result> tag called <closest> that should be set to "true" (in the wayback-xml returned from a xmlquery). This annotation wasn't there anymore (I can't remember if this was something that I wrote and added or if it was there already and I just used it). So now I wonder: is there any special reason that this isn't there anymore? Do you have any good ideas of another way to easily know which date is the closest to the date being surfed (i.e., being the one looked at) from within my javascript code? I guess I could hold a reference to the time being surfed and calculate it myself in there, but I liked the idea of the Wayback doing all that (since it probably does that anyway).
>
> Best regards,
> Oskar."
>
> -----Original message-----
> From: arc...@li... on behalf of Brad Tofel
> Sent: Fri 2007-10-19 01:41
> To: arc...@li...
> Subject: [Archive-access-discuss] [ANN] Wayback 1.0.1 released
>
> This maintenance release fixes a bug which prevented AccessPoints from working properly when the webapp was deployed to a non-ROOT context. |
|
From: Oskar G. <Osk...@kb...> - 2007-11-11 17:06:52
|
Hello Brad (and everyone else)!

I sent an email to you a couple of weeks ago about my first impressions of the Wayback 1.0, and asked some questions (mainly about getting the WAXToolbar to work with 1.0). But it seems that the mail never reached you, so I'll try sending it to the Archive-access list instead.

Here is the old mail:

"Hi Brad!

This is Oskar Grenholm from the Swedish National Library (we met briefly when we visited SF and the Archive in June, but I'll forgive you if you don't remember that ;-)

First I must compliment you on a fine piece of software. I downloaded the 1.0 a couple of days ago, and it looks good. The Spring beans are a big improvement and it feels really thought through and well designed. I especially like the idea of Access Points and WB Collections.

I've got it working in both Archival and Proxy Mode, but only with local stuff so far. I'll try with the remote stuff some day though.

But when I tried it with my old WAXToolbar, that didn't go as smoothly. It worked okay, and surfed the archive, but I couldn't get the list of all available dates to work. A quick review of the JavaScript code revealed that it relied on an annotation in the <result> tag called <closest> that should be set to "true" (in the wayback-xml returned from an xmlquery). This annotation wasn't there anymore (I can't remember if this was something that I wrote and added or if it was there already and I just used it). So now I wonder: is there any special reason that this isn't there anymore? Do you have any good ideas for another easy way to know which date is the closest to the date being surfed (i.e., the one being looked at) from within my JavaScript code? I guess I could hold a reference to the time being surfed and calculate it myself in there, but I liked the idea of the Wayback doing all that (since it probably does that anyway).

Best regards,
Oskar."

-----Original message-----
From: arc...@li... on behalf of Brad Tofel
Sent: Fri 2007-10-19 01:41
To: arc...@li...
Subject: [Archive-access-discuss] [ANN] Wayback 1.0.1 released

This maintenance release fixes a bug which prevented AccessPoints from working properly when the webapp was deployed to a non-ROOT context. |
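Oskar's fallback idea (keep a reference to the time being surfed and work out the closest capture on the client side) can be sketched as follows. This is only an illustration in Python rather than the toolbar's JavaScript; the function names are hypothetical, and it assumes capture dates arrive as 14-digit YYYYMMDDhhmmss strings.

```python
from datetime import datetime

# Assumption: capture dates are 14-digit YYYYMMDDhhmmss strings.
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


def parse_timestamp(ts: str) -> datetime:
    """Turn a 14-digit timestamp string into a datetime."""
    return datetime.strptime(ts, TIMESTAMP_FORMAT)


def closest_capture(capture_timestamps, viewed_timestamp):
    """Return the capture timestamp nearest to the one being viewed.

    Hypothetical helper: it stands in for the missing <closest>
    annotation by comparing absolute time differences.
    Raises ValueError if capture_timestamps is empty.
    """
    viewed = parse_timestamp(viewed_timestamp)
    return min(
        capture_timestamps,
        key=lambda ts: abs(parse_timestamp(ts) - viewed),
    )


captures = ["20070101000000", "20070615120000", "20071019014100"]
print(closest_capture(captures, "20070701000000"))
```

If two captures are equally distant, `min` keeps the first one in the list, so the order of the result list decides ties.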
|
From: Brad T. <br...@ar...> - 2007-11-10 00:02:35
|
Hi Chris,

If you've deployed the .war as "wayback-webapp-1.0.1.war" (on port 8080, for example), and you define an AccessPoint named "8080:wayback", then you need to use the public access URL:

http://yourhost.org:8080/wayback-webapp-1.0.1/wayback/

which would place subsequent queries at:

http://yourhost.org:8080/wayback-webapp-1.0.1/wayback/query?...

The fact that the wayback returns "normal" looking UI pages when users do not provide the "name" of the AccessPoint has been the source of a lot of confusion. This should be addressed in the next release.

Let me know if adding the AccessPoint name works for you, or if you're still having other problems, please forward on your wayback.xml configuration.

Brad |
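The URL rule Brad describes (host and port, then the webapp context taken from the .war name, then the AccessPoint name) can be written out as a small sketch. The helper names are hypothetical, and the "port:name" split and the special case for a ROOT deployment are assumptions drawn from the examples in this thread, not from the Wayback code.

```python
def access_point_url(host: str, war_file: str, access_point_bean: str) -> str:
    """Build the public URL for an AccessPoint.

    Assumptions (illustrative only): the AccessPoint bean is named
    "port:name" as in "8080:wayback", and a webapp deployed as ROOT.war
    is served from "/" with no context segment.
    """
    port, sep, name = access_point_bean.partition(":")
    if not sep or not port.isdigit() or not name:
        raise ValueError("expected an AccessPoint name like '8080:wayback'")
    # The context path is the .war file name without its extension.
    context = war_file[:-4] if war_file.endswith(".war") else war_file
    prefix = "" if context == "ROOT" else context + "/"
    return f"http://{host}:{port}/{prefix}{name}/"


def query_url(host: str, war_file: str, access_point_bean: str) -> str:
    """Queries live under the AccessPoint URL, not under the bare context."""
    return access_point_url(host, war_file, access_point_bean) + "query"


print(access_point_url("yourhost.org", "wayback-webapp-1.0.1.war", "8080:wayback"))
print(query_url("yourhost.org", "wayback-webapp-1.0.1.war", "8080:wayback"))
```

The 404 in Chris's report matches a query sent to /wayback-webapp-1.0.1/query, that is, the context without the AccessPoint name segment.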
|
From: Chris V. <cv...@gm...> - 2007-11-09 22:08:12
|
Hi, I'm having some difficulty configuring the newest version of wayback. It is deployed in a non-ROOT webapp context (wayback-webapp-1.0.1, by default) in tomcat 5.5.20 and configured for a localcdxindex collection. Tomcat starts normally and I am able to access the wayback search interface, but whenever I attempt a URL query, a 404 page is returned with the description "The requested resource (/wayback-webapp-1.0.1/query) is not available." There are no errors reported in catalina.out. I'm guessing I missed something simple in a configuration file, but not sure what. Thanks, Chris |
|
From: Gina J. <gj...@lo...> - 2007-11-08 20:39:26
|
Hi Brad,

We are not finding any problems accessing collections using Wayback 1.0. David Brooks and I did a final quality review on our Supreme Court collection using Wayback 1.0. No odd or unexpected behaviours.

I like the timeline/archival mode, and plan to ask Ignacio to configure the layout differently when we implement it here, most specifically to change the Timestamp, the layout, and content.

However, there are numerous layout issues caused by the timeline in Wayback 1.0 which need to be fixed before we migrate our collections to it. I have done screenshots of where I have seen problems. I can put those up if you like or send URLs.

Ignacio, I am going to repeat your evaluation of what you think the problem may be:

<snip>
From Ignacio

I looked a little bit at the code in the file and, from what I've seen, the problem may be in two places.

1. The page being displayed uses absolute positioning to display its top elements, which will end up covered by the timeline box, since it is positioned at the top by the JavaScript.
2. The page has some coding problems and the PARENT of the timeline element is not the BODY tag but some other tag, so the box is not moved to the top of the page.

The way the script works, it grabs the actual code for the timeline box and also records the PARENT of the element. Then it removes the box (from the bottom of the page) and places it at the beginning of the PARENT element (in front of the first child). This works if the PARENT of the box is the BODY tag.
</snip>

Gina |
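The relocation step Ignacio describes (record the PARENT, remove the box, re-insert it in front of the PARENT's first child) can be simulated outside the browser to show why his second case fails. This is a Python model of the behaviour, not the Wayback JavaScript; the "timeline" id, the "wrapper" id and the sample markup are made up for the illustration.

```python
import xml.etree.ElementTree as ET


def move_to_front_of_parent(root, element_id):
    """Move the element with the given id in front of its parent's first child.

    Models the described script: it uses the element's immediate parent,
    whatever that is, and never looks for BODY explicitly.
    Returns the parent element.
    """
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    box = next(el for el in root.iter() if el.get("id") == element_id)
    parent = parent_of[box]
    parent.remove(box)
    parent.insert(0, box)
    return parent


# Case 1: the box is a direct child of BODY, so it ends up at the page top.
good = ET.fromstring('<html><body><h1/><p/><div id="timeline"/></body></html>')
print(move_to_front_of_parent(good, "timeline").tag)

# Case 2: broken markup leaves the box inside another element, so it only
# moves to the front of that element and BODY still starts with <h1>.
bad = ET.fromstring(
    '<html><body><h1/><div id="wrapper"><p/><div id="timeline"/></div></body></html>'
)
print(move_to_front_of_parent(bad, "timeline").get("id"))
print(bad.find("body")[0].tag)
```

A script that looked up BODY directly instead of relying on the recorded parent would avoid the second case, though that is a suggestion and not something the thread confirms was done.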
|
From: Brad T. <br...@ar...> - 2007-11-06 19:53:18
|
I think I did the deploy at the "wb-webapp" ServletContext and had success, but I'll double check that this did work on our setup.

Re: the trailing slash, this is a bug that has come up twice in the last week. There'll be a fix checked into SVN and available on our build box in the next day or two; I'll send a note when it's available.

The static map file should not require a leading "http://"; let me know if you find this is not the case. The wayback currently does not have much flexibility in URL canonicalization: currently leading "www." and "www[0-9]*\." are stripped from hostnames. We intend to make this functionality configurable going forward, but have no schedule yet. Let me know what priority it is for your installations.

We are looking at doing a maintenance 1.2.0 release in the next few weeks, which will also have this fix present.

Brad |
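The hostname canonicalization Brad mentions (stripping a leading "www." or "www[0-9]*\.") can be illustrated with a single regular expression, since the second pattern already covers the first. This is a sketch of the stated rule only; the function name is hypothetical and the real Wayback canonicalizer may do more than this.

```python
import re

# "www." is the zero-digit case of "www[0-9]*\.", so one anchored
# pattern covers both forms named in the message.
_WWW_PREFIX = re.compile(r"^www[0-9]*\.")


def strip_www(hostname: str) -> str:
    """Remove one leading www / wwwN label from a hostname, if present."""
    return _WWW_PREFIX.sub("", hostname)


for host in ("www.example.org", "www2.example.org", "example.org", "wwwfoo.example.org"):
    print(host, "->", strip_www(host))
```

Under this rule an exclusion entry written with or without "www." refers to the same canonical host, which is consistent with Ignacio's observation that the "www." made no difference.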
|
From: Ignacio G. <igc...@gm...> - 2007-11-06 14:45:45
|
Hello Brad,

Thanks for the response...

I just installed Wayback 1.0.1 once again in a non-ROOT context and this time it worked. I do not know what the problem was before, but it is now working with no problems. The only thing I can think of that might have been a problem is the name of the context. I was using "wb-webapp" before and I installed it as "wayback" this time. It might be that the "-" (dash) in the middle of the context name was making the application fail.

The addition of 'init-method="init"' to the exclusion bean also worked, so we are good there too. The only question regarding this is: what is the naming convention for URLs that we must use? From my testing it seems like the "http://" always needs to be present, and then it does not matter whether the "www." is there or not. I tried without http:// and nothing got blocked, so I am assuming the template would be: http://(www.)?DOMAIN.TO.BLOCK

One last thing from my first email that you did not address is the issue of accessing the AccessPoints without the trailing slash. After making version 1.0.1 work, it seems like the problem is still there:

http://xyz.com/wayback/collectionA/ -> works
http://xyz.com/wayback/collectionA -> does not work

Do you have any ideas on this one?

Thank you. |
|
From: Brad T. <br...@ar...> - 2007-11-03 01:32:50
|
Hi Ignacio,

Glad the new configuration system is working well for you.

I am still unable to reproduce the problem with the non-ROOT context; I will hopefully have more info Monday on this issue.

I am also unable to reproduce the multiple IP address problem -- perhaps adding some additional logging within this module will simplify things. A restart of Tomcat should be all that's needed to reload the new wayback.xml configuration, so let me know if you're still having problems with this.

Re: the same collection exported with different Exclusion configuration, you'll need multiple AccessPoints.

AccessPoint A: exports Collection Foo with no Exclusions, and limits via authentication users within one of your IP ranges.

AccessPoint B: exports Collection Foo with your administrative list Exclusions, and has no authentication configuration.

Re: the administrative list exclusions: I just noticed that the documentation does not include the 'init-method="init"' which needs to be part of the StaticMapExclusionFilterFactory bean definition. I've just checked in this documentation change now, and will push it live to the wayback website on Monday.

Let me know how this works for you.

Brad

Ignacio Garcia wrote:
> Hello Brad, everyone,
>
> I have been playing around with Wayback 1.0 for a couple of weeks, since it got released, and here is a list of my comments, questions and issues.
>
> I will start by saying that I really like the changes that have been made, especially in the configuration aspect of the tool. It is now much easier to configure, to understand what each section does and to set up the environment.
>
> I have been able to set up several AccessPoints (3) that access different collections (3) and they all seem to work as expected. They are set up on port 8088, so changing the port is not an issue and can be done easily, using the AccessPoint configuration. All three collections use CDX indexes, so this also works perfectly.
> However, I was only able to make Wayback work using version 1.0.0 under the ROOT context. I downloaded and tried version 1.0.1 but it did not start due to errors in the configuration (even using the default setup). I do not think that using the ROOT context is a big issue, since the AccessPoints provide path control and differentiation, but it would be good if we could deploy Wayback under different contexts.
> Also, I have found that if you try to access an AccessPoint location without the trailing slash '/' it will not work. A Not-Found (404) error is displayed instead. This means that typing http://xyz.com/myCollection/ displays the Wayback interface successfully, but using http://xyz.com/myCollection will not. I do not know if this is something that should be corrected in the server configuration and is not a Wayback issue, but I thought I should let you know.
>
> My next comments are regarding the exclusion and restriction mechanisms. Bear in mind that I am using version 1.0.0, so I do not know if a working 1.0.1 has these issues resolved.
>
> I was able to successfully implement an IP-based restriction on one of my collections, and it did block content for all IPs outside of the specified range. However, I had some problems when trying to specify more than one <value> element in the IP <list>. I wanted to use two IP ranges, and there were some issues. I will have to test this more extensively, because it might be a problem of Wayback not updating properly after a simple restart.
>
> I also tried to implement a static exclusion using a plain text file and I have to say that I was not able to make this work at all. I added this code section to my wayback.xml file. It was by itself, outside any AccessPoint or Collection.
>
> <bean name="2004-exclusion-list" class="org.archive.wayback.accesscontrol.staticmap.StaticMapExclusionFilterFactory">
>   <property name="file" value="/vol/webcapture/wayback_indexes/el2004/exclude.txt" />
>   <property name="checkInterval" value="10" />
> </bean>
>
> Then, inside the desired AccessPoint, I added the following:
>
> <property name="exclusionFactory" ref="2004-exclusion-list" />
>
> The Catalina log does not show any information regarding Wayback accessing the file, so I believe that the configuration file parsed correctly, but it chose to ignore the exclusion and that is why it is not being applied.
>
> My last question has to do with the integration of these two exclusion/restriction mechanisms. In some of my AccessPoints, I would like to be able to block some URLs, but only for those users that are outside of the range provided. Will I have to create two AccessPoints, one with the IP restriction that will allow users to view the complete collection, and then a different one that will block the contents for everyone, or can I put them together in a single AccessPoint?
> Since I could not implement the static exclusion I was not able to test if these properties could be nested one inside the other, but I think that this would be a very important option. Otherwise, we would have to implement server-side redirection based on IP addresses to point users to the correct AccessPoint, and that would eliminate most of the benefit of integrating IP recognition inside Wayback.
>
> This is what I have experienced up to this point. I will keep testing other aspects that we might use and report back with my findings.
>
> Thank you. |
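Brad's diagnosis is that the StaticMapExclusionFilterFactory bean needs 'init-method="init"' in its definition. A quick way to check a configuration for that omission is sketched below. The checker is a hypothetical helper, not part of Wayback; the sample XML is a reduced stand-in for a real wayback.xml, and the "fixed-list" bean and its file path are invented for the example.

```python
import xml.etree.ElementTree as ET

FACTORY_CLASS = (
    "org.archive.wayback.accesscontrol.staticmap.StaticMapExclusionFilterFactory"
)


def beans_missing_init(xml_text: str) -> list:
    """List the names of exclusion-factory beans lacking init-method="init"."""
    root = ET.fromstring(xml_text)
    missing = []
    for element in root.iter():
        # Drop any "{namespace}" prefix so namespaced Spring files also match.
        if element.tag.rsplit("}", 1)[-1] != "bean":
            continue
        # strip() tolerates class names wrapped onto their own line.
        if element.get("class", "").strip() != FACTORY_CLASS:
            continue
        if element.get("init-method") != "init":
            missing.append(element.get("name") or element.get("id"))
    return missing


# Reduced stand-in for wayback.xml; "fixed-list" is an invented example bean.
SAMPLE = """
<beans>
  <bean name="2004-exclusion-list"
        class="org.archive.wayback.accesscontrol.staticmap.StaticMapExclusionFilterFactory">
    <property name="file" value="/vol/webcapture/wayback_indexes/el2004/exclude.txt" />
    <property name="checkInterval" value="10" />
  </bean>
  <bean name="fixed-list" init-method="init"
        class="org.archive.wayback.accesscontrol.staticmap.StaticMapExclusionFilterFactory">
    <property name="file" value="/tmp/exclude.txt" />
  </bean>
</beans>
"""

print(beans_missing_init(SAMPLE))
```

Run against Ignacio's original bean, the check reports "2004-exclusion-list", which fits his observation that the file parsed without errors but the exclusion was never applied.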