From: Brad T. <br...@ar...> - 2007-11-27 21:32:53
Hi Thomas,

Sorry for the delay in responding to your offline post; I was out of the
office last week.

I've just reproduced the problem -- a pretty simple and silly bug. The
thread variable in LocalARCResourceStore is marked "static", so yes, with
the current software only one indexing thread can be active at a time.
The same problem exists in the BDBIndexUpdater class, so there can only
be one merging thread as well.

I've created a bug, ACC-8:

http://webteam.archive.org/jira/browse/ACC-8

We will either make a new 1.0.2 release which fixes this, or postpone
the fix until 1.2.0 is released in the next couple of weeks.

As a workaround, you can configure/activate the collections
independently, one at a time, until all ARCs in each collection are
indexed and merged.

Thanks for posting the problem!

Brad

Thomas Beekman wrote:
> Hello all,
>
> My name is Thomas Beekman, and I'm the Technical Lead of Web archiving
> at the Royal Library of the Netherlands. I have been testing the Open
> Source Wayback Machine for about two weeks now, and I have found some
> rather strange behavior when using two different sets of indexes.
>
> I'm trying to set up a regular index for production use (DB1, as a
> BDB) and a test index for QA (DBQA, also BDB). It seems that the
> second DB is not automatically indexed, even when I put CDX files
> manually in the incoming directory of the index-data folder. When I
> copy an existing DB into /tmp/wayback-qa (the QA DB folder), it
> seems to work though. So I guess there is a problem with the
> indexClient. I have included the wayback.xml which I used to test this.
>
> I hope that someone can help me, or will reply if the problem does not
> lie in the software but in my configuration.
>
> Greetings,
>
> Thomas Beekman
>
> ------------------------------------------------------------------------
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd">
> <beans>
>
> <!--
>   The following 3 beans are required when using the ArcProxy for providing
>   HTTP 1.1 remote access to ARC files distributed across multiple computers
>   or directories.
> -->
> <!--
> <bean id="filelocationdb" class="org.archive.wayback.resourcestore.http.FileLocationDB"
>       init-method="init">
>   <property name="bdbPath" value="/tmp/wayback/arc-db" />
>   <property name="bdbName" value="DB1" />
>   <property name="logPath" value="/tmp/wayback/arc-db.log" />
> </bean>
>
> <bean name="8080:arcproxy" class="org.archive.wayback.resourcestore.http.ArcProxyServlet">
>   <property name="locationDB" ref="filelocationdb" />
> </bean>
> <bean name="8080:locationdb" class="org.archive.wayback.resourcestore.http.FileLocationDBServlet">
>   <property name="locationDB" ref="filelocationdb" />
> </bean>
> -->
>
> <!--
>   The following 2 beans are required when using exclusions based on live
>   robots.txt documents.
> -->
> <!--
> <bean id="livewebcache" class="org.archive.wayback.liveweb.LiveWebCache">
>
>   <property name="arcCacheDir">
>     <bean class="org.archive.wayback.liveweb.ARCCacheDirectory"
>           init-method="init">
>
>       <property name="arcDir" value="/tmp/wayback/liveweb/arcs/" />
>       <property name="arcPrefix" value="live" />
>     </bean>
>   </property>
>
>   <property name="cacher">
>     <bean class="org.archive.wayback.liveweb.URLCacher">
>       <property name="tmpDir" value="/tmp/wayback/liveweb/tmp/" />
>     </bean>
>   </property>
>
>   <property name="index">
>     <bean class="org.archive.wayback.liveweb.LiveWebLocalResourceIndex">
>
>       <property name="source">
>         <bean class="org.archive.wayback.resourceindex.bdb.BDBIndex"
>               init-method="init">
>
>           <property name="bdbName" value="DB1" />
>           <property name="bdbPath" value="/tmp/wayback/liveweb/db/" />
>         </bean>
>       </property>
>     </bean>
>   </property>
> </bean>
>
> <bean id="excluder-factory-robot" class="org.archive.wayback.accesscontrol.robotstxt.RobotExclusionFilterFactory">
>   <property name="maxCacheMS" value="86400000" />
>   <property name="userAgent" value="ia_archiver" />
>   <property name="webCache" ref="livewebcache" />
> </bean>
> -->
>
> <bean id="localbdbcollection" class="org.archive.wayback.webapp.WaybackCollection">
>   <property name="resourceStore">
>     <bean class="org.archive.wayback.resourcestore.LocalARCResourceStore"
>           init-method="init">
>       <property name="arcDir" value="/arcs/" />
>       <property name="queuedDir" value="/tmp/wayback/arc-indexer/queued" />
>       <property name="workDir" value="/tmp/wayback/arc-indexer/work" />
>       <property name="runInterval" value="10000" />
>       <property name="indexClient">
>         <bean class="org.archive.wayback.resourceindex.indexer.IndexClient">
>           <property name="tmpDir" value="/tmp/wayback/arc-indexer/tmp" />
>           <property name="target" value="/tmp/wayback/index-data/incoming" />
>         </bean>
>       </property>
>     </bean>
>   </property>
>
>   <property name="resourceIndex">
>     <bean class="org.archive.wayback.resourceindex.LocalResourceIndex">
>       <property name="source">
>         <bean class="org.archive.wayback.resourceindex.bdb.BDBIndex"
>               init-method="init">
>           <property name="bdbName" value="DB1" />
>           <property name="bdbPath" value="/tmp/wayback/index/" />
>           <property name="updater">
>             <bean class="org.archive.wayback.resourceindex.bdb.BDBIndexUpdater">
>               <property name="incoming" value="/tmp/wayback/index-data/incoming/" />
>               <property name="failed" value="/tmp/wayback/index-data/failed/" />
>               <property name="merged" value="/tmp/wayback/index-data/merged/" />
>               <property name="runInterval" value="10000" />
>             </bean>
>           </property>
>         </bean>
>       </property>
>       <property name="maxRecords" value="10000" />
>     </bean>
>   </property>
> </bean>
>
> <bean id="localqacollection" class="org.archive.wayback.webapp.WaybackCollection">
>   <property name="resourceStore">
>     <bean class="org.archive.wayback.resourcestore.LocalARCResourceStore"
>           init-method="init">
>       <property name="arcDir" value="/arcs-qa/" />
>       <property name="queuedDir" value="/tmp/wayback-qa/arc-indexer/queued" />
>       <property name="workDir" value="/tmp/wayback-qa/arc-indexer/work" />
>       <property name="runInterval" value="10000" />
>       <property name="indexClient">
>         <bean class="org.archive.wayback.resourceindex.indexer.IndexClient">
>           <property name="tmpDir" value="/tmp/wayback-qa/arc-indexer/tmp" />
>           <property name="target" value="/tmp/wayback-qa/index-data/incoming" />
>         </bean>
>       </property>
>     </bean>
>   </property>
>
>   <property name="resourceIndex">
>     <bean class="org.archive.wayback.resourceindex.LocalResourceIndex">
>       <property name="source">
>         <bean class="org.archive.wayback.resourceindex.bdb.BDBIndex"
>               init-method="init">
>           <property name="bdbName" value="DBQA" />
>           <property name="bdbPath" value="/tmp/wayback-qa/index/" />
>           <property name="updater">
>             <bean class="org.archive.wayback.resourceindex.bdb.BDBIndexUpdater">
>               <property name="incoming" value="/tmp/wayback-qa/index-data/incoming/" />
>               <property name="failed" value="/tmp/wayback-qa/index-data/failed/" />
>               <property name="merged" value="/tmp/wayback-qa/index-data/merged/" />
>               <property name="runInterval" value="10000" />
>             </bean>
>           </property>
>         </bean>
>       </property>
>       <property name="maxRecords" value="10000" />
>     </bean>
>   </property>
> </bean>
>
> <!--
>   The following WaybackCollection bean template is required when using a
>   manually built local CDX index.
> -->
>
> <bean id="localcdxcollection" class="org.archive.wayback.webapp.WaybackCollection">
>
>   <property name="resourceStore">
>     <bean class="org.archive.wayback.resourcestore.LocalARCResourceStore"
>           init-method="init">
>       <property name="arcDir" value="/arcs-qa/" />
>     </bean>
>   </property>
>
>   <property name="resourceIndex">
>     <bean class="org.archive.wayback.resourceindex.LocalResourceIndex">
>       <property name="source">
>         <bean id="cdxsearchresultsource" class="org.archive.wayback.resourceindex.cdx.CDXIndex">
>           <property name="path" value="/tmp/wayback-qa/cdx-index/index.cdx" />
>         </bean>
>       </property>
>       <property name="maxRecords" value="10000" />
>     </bean>
>   </property>
> </bean>
>
> <!--
>   The following WaybackCollection bean template is required when using a
>   remote ResourceIndex and ResourceStore implementation. This will also
>   require setting up an arcproxy and locationdb on the host specified by
>   the resourceStore:urlPrefix configuration, and an additional AccessPoint
>   on the host specified by the resourceIndex:searchUrlBase configuration.
> -->
> <!--
> <bean id="remotecollection" class="org.archive.wayback.webapp.WaybackCollection">
>
>   <property name="resourceStore">
>     <bean class="org.archive.wayback.resourcestore.HttpARCResourceStore">
>       <property name="urlPrefix" value="http://localhost:8080/arcproxy/" />
>     </bean>
>   </property>
>
>   <property name="resourceIndex">
>     <bean class="org.archive.wayback.resourceindex.RemoteResourceIndex"
>           init-method="init">
>       <property name="searchUrlBase" value="http://indexhost:8080/index/xmlquery" />
>     </bean>
>   </property>
> </bean>
> -->
>
> <!--
>   This is the only AccessPoint defined by default within this wayback.xml
>   Spring configuration file, providing an ArchivalURL Replay UI to the
>   "localbdbcollection" by providing ArchivalURL-specific implementations
>   of the replay, parser, and uriConverter.
>
>   This AccessPoint currently will provide access only from the machine
>   running Tomcat. To provide external access, replace "localhost" with the
>   fully qualified hostname of the computer running Tomcat.
> -->
>
> <!-- QueryUI templates -->
> <bean id="standardquery" class="org.archive.wayback.query.Renderer">
>   <property name="captureJsp" value="/jsp/HTMLResults.jsp" />
> </bean>
> <bean id="calendarquery" class="org.archive.wayback.query.Renderer">
>   <property name="captureJsp" value="/jsp/CalendarResults.jsp" />
> </bean>
>
> <bean name="8080:wayback" class="org.archive.wayback.webapp.AccessPoint">
>   <property name="collection" ref="localbdbcollection" />
>   <property name="query" ref="calendarquery" />
>   <property name="replay">
>     <bean class="org.archive.wayback.archivalurl.ArchivalUrlReplayDispatcher">
>       <property name="jsInserts">
>         <list>
>           <value>http://localhost:8080/wayback/wm.js</value>
>         </list>
>       </property>
>       <property name="jspInserts">
>         <list>
>           <value>/replay/Timeline.jsp</value>
>         </list>
>       </property>
>     </bean>
>   </property>
>   <property name="parser">
>     <bean class="org.archive.wayback.archivalurl.ArchivalUrlRequestParser" init-method="init">
>       <property name="maxRecords" value="1000" />
>       <property name="earliestTimestamp" value="2006" />
>     </bean>
>   </property>
>   <property name="uriConverter">
>     <bean class="org.archive.wayback.archivalurl.ArchivalUrlResultURIConverter">
>       <property name="replayURIPrefix" value="http://localhost:8080/wayback/" />
>     </bean>
>   </property>
>
>   <!--
>   <property name="query">
>     <bean class="org.archive.wayback.query.Renderer">
>       <property name="captureJsp" value="/jsp/HTMLResults.jsp" />
>     </bean>
>   </property>
>
>   <property name="replay">
>     <bean class="org.archive.wayback.archivalurl.ArchivalUrlReplayDispatcher">
>       <property name="jsInserts">
>         <list>
>           <value>http://localhost:8080/wayback/wm.js</value>
>         </list>
>       </property>
>       <property name="jspInserts">
>         <list>
>           <value>/replay/Timeline.jsp</value>
>         </list>
>       </property>
>     </bean>
>   </property>
>
>   <property name="parser">
>     <bean class="org.archive.wayback.archivalurl.ArchivalUrlRequestParser"
>           init-method="init">
>       <property name="maxRecords" value="1000" />
>       <property name="earliestTimestamp" value="1996" />
>     </bean>
>   </property>
>
>   <property name="uriConverter">
>     <bean class="org.archive.wayback.archivalurl.ArchivalUrlResultURIConverter">
>       <property name="replayURIPrefix" value="http://localhost:8080/wayback/" />
>     </bean>
>   </property>
>   -->
> </bean>
>
> <bean name="8080:wayback-qa" class="org.archive.wayback.webapp.AccessPoint">
>   <property name="collection" ref="localqacollection" />
>   <property name="query" ref="standardquery" />
>   <property name="replay">
>     <bean class="org.archive.wayback.archivalurl.ArchivalUrlReplayDispatcher">
>       <property name="jsInserts">
>         <list>
>           <value>http://localhost:8080/wayback-qa/wm.js</value>
>         </list>
>       </property>
>       <property name="jspInserts">
>         <list>
>           <value>/replay/Timeline.jsp</value>
>         </list>
>       </property>
>     </bean>
>   </property>
>   <property name="parser">
>     <bean class="org.archive.wayback.archivalurl.ArchivalUrlRequestParser" init-method="init">
>       <property name="maxRecords" value="1000" />
>       <property name="earliestTimestamp" value="2006" />
>     </bean>
>   </property>
>   <property name="uriConverter">
>     <bean class="org.archive.wayback.archivalurl.ArchivalUrlResultURIConverter">
>       <property name="replayURIPrefix" value="http://localhost:8080/wayback-qa/" />
>     </bean>
>   </property>
> </bean>
>
> <!--
>   The following AccessPoint inherits all configuration from the 8080:wayback
>   AccessPoint, but only allows access from the specified IP network.
> -->
> <!--
> <bean name="8080:netsecure" parent="8080:wayback">
>   <property name="authentication">
>     <bean class="org.archive.wayback.authenticationcontrol.IPMatchesBooleanOperator">
>       <property name="allowedRanges">
>         <list>
>           <value>192.168.1.16/24</value>
>         </list>
>       </property>
>     </bean>
>   </property>
>   <property name="uriConverter">
>     <bean class="org.archive.wayback.archivalurl.ArchivalUrlResultURIConverter">
>       <property name="replayURIPrefix" value="http://192.168.1.16:8080/netsecure/" />
>     </bean>
>   </property>
> </bean>
> -->
>
> <!--
>   The following AccessPoint inherits all configuration from the 8080:wayback
>   AccessPoint, but checks live web robots.txt documents to determine if
>   archived content should be accessible.
>
>   Note: using this AccessPoint requires enabling the "livewebcache" and
>   "excluder-factory-robot" beans declared at the top of this file.
> -->
> <!--
> <bean name="8080:robots" parent="8080:wayback">
>   <property name="exclusionFactory" ref="excluder-factory-robot" />
>   <property name="uriConverter">
>     <bean class="org.archive.wayback.archivalurl.ArchivalUrlResultURIConverter">
>       <property name="replayURIPrefix" value="http://localhost:8080/robots/" />
>     </bean>
>   </property>
> </bean>
> -->
>
> <!--
>   The following AccessPoint inherits all configuration from the 8080:wayback
>   AccessPoint, but provides a Proxy Replay UI to the same collection. These
>   two access points can be used simultaneously on the same Tomcat
>   installation.
>
>   Note: using this AccessPoint requires adding a "Connector" on port 8090
>   in your Tomcat's server.xml file.
> -->
> <!--
> <bean name="8090" parent="8080:wayback">
>   <property name="useServerName" value="true" />
>   <property name="replay">
>     <bean class="org.archive.wayback.proxy.ProxyReplayDispatcher" />
>   </property>
>   <property name="uriConverter">
>     <bean class="org.archive.wayback.proxy.RedirectResultURIConverter">
>       <property name="redirectURI" value="http://foo.archive.org:8090/jsp/Redirect.jsp" />
>     </bean>
>   </property>
>   <property name="parser">
>     <bean class="org.archive.wayback.proxy.ProxyRequestParser" init-method="init">
>       <property name="localhostNames">
>         <list>
>           <value>foo.archive.org</value>
>         </list>
>       </property>
>       <property name="maxRecords" value="1000" />
>     </bean>
>   </property>
> </bean>
> -->
> </beans>
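
For anyone curious why a "static" thread variable limits the whole webapp to one indexing thread, here is a minimal sketch of the pattern. It is not the Wayback source: the class names, the init() signature, and the "start the worker only if the field is still unset" check are all hypothetical stand-ins chosen for illustration. The only detail taken from the reply above is that the thread field was declared static.

```java
/**
 * Illustration only: hypothetical classes, not the actual Wayback code.
 * Assumes each store starts its worker thread only if the field is unset.
 */
class StaticThreadDemo {
    public static void main(String[] args) {
        boolean s1 = new StaticThreadStore("DB1").init();
        boolean s2 = new StaticThreadStore("DBQA").init();
        System.out.println("static field:   DB1 started=" + s1 + ", DBQA started=" + s2);

        boolean p1 = new PerInstanceThreadStore("DB1").init();
        boolean p2 = new PerInstanceThreadStore("DBQA").init();
        System.out.println("instance field: DB1 started=" + p1 + ", DBQA started=" + p2);
    }

    /** Starts a placeholder daemon thread; a real worker would poll its queue here. */
    static Thread startWorker(String name) {
        Thread t = new Thread(() -> { }, "indexer-" + name);
        t.setDaemon(true);
        t.start();
        return t;
    }
}

/** Bug pattern: the field is static, so every instance shares one slot. */
class StaticThreadStore {
    private static Thread worker;
    private final String name;

    StaticThreadStore(String name) {
        this.name = name;
    }

    /** Returns true if this call started a worker thread. */
    boolean init() {
        synchronized (StaticThreadStore.class) {
            if (worker != null) {
                return false; // another instance already claimed the only slot
            }
            worker = StaticThreadDemo.startWorker(name);
            return true;
        }
    }
}

/** One possible fix: an instance field gives each store its own worker. */
class PerInstanceThreadStore {
    private Thread worker;
    private final String name;

    PerInstanceThreadStore(String name) {
        this.name = name;
    }

    /** Returns true if this call started a worker thread. */
    synchronized boolean init() {
        if (worker != null) {
            return false; // this instance was already initialized
        }
        worker = StaticThreadDemo.startWorker(name);
        return true;
    }
}
```

With the static field, the second store (DBQA in this sketch) never gets a worker, which matches the symptom of a second collection that is never indexed. With the instance field, both stores start their own thread.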