From: Brad T. <br...@ar...> - 2008-03-01 02:08:51
Hi Thomas,

Thanks for the kind feedback. Couple of suggestions, and also some follow-up questions interspersed:

Thomas Beekman wrote:
> Hi all,
>
> At the KB we are severely testing Wayback 1.2.0 at the moment. My first
> impression is quite positive; many new functions are added, it is quite
> easy to implement different modules for different access points and
> several indexing threads can live side by side now.
>
> I have a few questions though. First of all, I'm experiencing errors
> which did not occur in older versions; java.lang.OutOfMemoryError: GC
> overhead limit exceeded. Does anyone know how to fix this?

I haven't seen this before, and some quick Google searches indicate it may be one of:

A) a JVM problem (which JVM are you using?)
B) too little heap space in the java startup arguments
C) the wayback software doing lots of object creation+destruction.

We have large installations in production at the IA, one using 700+ Collections and 1400+ AccessPoints, so I'm hoping that C is not the problem. Note, though, that these all use CDX indexes, which are more resource efficient, and we haven't yet needed to do a heavy optimization pass over the code, so it could be Wayback itself.

Are you using IBM's JVM? Have you tried increasing the heap? If that doesn't address the problem, can you please send me a copy of your wayback.xml Spring configuration?

> Second; when closing down Wayback in Tomcat, the lock file for the
> localbdb is not erased. A restart is therefore not possible. Could this
> be fixed so that if the webapp is closed down, the lock file is erased?

On what platform (OS+JVM) are you running Wayback? Is the BDB index stored over NFS or another networked file system?

I haven't experienced this problem on any of our systems -- the BDBJE just starts up, even with the lock file still existing. I haven't looked into this, but guessed that it was using the lock file via flock() type semantics, instead of using its existence to indicate a lock.
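To make the difference between those two lock-file semantics concrete, here is a minimal Java sketch. It is an illustration only and not BDB JE's actual code: the class and method names (LockFileSemantics, lockedByExistence, lockedByOsLock) are hypothetical, and the temp file merely simulates a lock file left behind by an unclean shutdown.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of two lock-file semantics; not taken from BDB JE.
class LockFileSemantics {

    // Existence-based: the file being present is treated as "in use",
    // so a file left over from an unclean shutdown blocks a restart.
    static boolean lockedByExistence(Path lockFile) {
        return Files.exists(lockFile);
    }

    // flock()-style: the file may exist; what matters is whether some
    // live process currently holds an OS-level lock on it.
    static boolean lockedByOsLock(Path lockFile) throws IOException {
        try (FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return true;  // held by another process
            }
            lock.release();
            return false;     // nobody holds it; a stale file is harmless
        } catch (OverlappingFileLockException e) {
            return true;      // already held inside this JVM
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a lock file that survived a shutdown.
        Path stale = Files.createTempFile("stale", ".lck");
        System.out.println("existence check says locked: " + lockedByExistence(stale));
        System.out.println("OS lock check says locked:   " + lockedByOsLock(stale));
        Files.delete(stale);
    }
}
```

With OS-level locking a stale file does not prevent startup, which would match what we see on local disks; OS-level locks are often unreliable on networked file systems, which is why the storage location matters here.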
BDBJE may determine that the DB is on a remote system, where flock() semantics don't work, in which case it may be falling back to using the existence of the lock file to indicate usage.

In any case, I've just implemented the "clean shutdown" processing in my development environment, but will probably hold off to do more testing before including it in a release. We are preparing a 1.2.1 release which addresses a couple of bugs discovered by folks in the field, but are holding this release for feedback from one more user having trouble reading some ARC files.

> Third; with a few websites the timeline GUI is scrambled. I get a full
> yellow screen with on every line a mark. After scrolling down that page,
> the website is presented normally. This is not the case with every
> website.

Yes, the CSS implementation in the current timeline is prone to inheriting some styles from some web pages. Could you please send me a few example pages on the live web that demonstrate the problem you're seeing?

> My fourth and last problem is in the configuration. I would like to do
> some tests using the remote NutchWAX search, but there is not a clear
> manual of how to implement this precisely, which beans to use for
> example. Does anyone have a good example for me?

Setting up a collection with this bean:

    <property name="resourceIndex">
      <bean class="org.archive.wayback.resourceindex.NutchResourceIndex" init-method="init">
        <property name="searchUrlBase" value="http://webteam-ws.us.archive.org:8080/katrina/opensearch" />
        <property name="maxRecords" value="100" />
      </bean>
    </property>

should do the trick. Note that if using Archival URL mode, you should be sure to set the maxRecords property on the RequestParser to the same value. This may be a bug -- it would be friendlier to use the min() of both values.

    <property name="parser">
      <bean class="org.archive.wayback.archivalurl.ArchivalUrlRequestParser" init-method="init">
        <property name="maxRecords" value="100" />
        <property name="earliestTimestamp" value="1996" />
      </bean>
    </property>

Hopefully this works for you, and please let me know about the questions above.

Brad