From: Ignacio G. <igc...@gm...> - 2008-03-12 13:32:32
Hello Brad,

I just started "playing" with this new version of Wayback, and there is
one thing that seems very strange to me.

On every page resource I visit, I always get the header information
plastered at the top of the page, for example:

    HTTP/1.1 200 OK
    Server: Microsoft-IIS/5.0
    Date: Tue, 03 Oct 2000 07:31:49 GMT
    Connection: Keep-Alive
    Content-Length: 13027
    Content-Type: text/html
    Set-Cookie: GWBSiteCookie=header%5Ftype=Text&mode=false&browser=Default&browser%5Fchecked=true&browser%5Fwidth=0; path=/
    Cache-control: private

This is the header information that was retrieved at the time of the
crawl (as you can see by the date). What I do not understand is why I am
seeing it when I access a page via Wayback. It appears at the very top,
even over the TimeLine section.

Any ideas on why this might be, or how to get rid of it?

Thanks.

On 2/29/08, Brad Tofel <br...@ar...> wrote:
>
> Hi Thomas,
>
> Thanks for the kind feedback.
>
> Couple of suggestions, and also some follow-up questions interspersed:
>
> Thomas Beekman wrote:
> > Hi all,
> >
> > At the KB we are severely testing Wayback 1.2.0 at the moment. My first
> > impression is quite positive; many new functions are added, it is quite
> > easy to implement different modules for different access points and
> > several indexing threads can live side by side now.
> >
> > I have a few questions though. First of all, I'm experiencing errors
> > which did not occur in older versions; java.lang.OutOfMemoryError: GC
> > overhead limit exceeded. Does anyone know how to fix this?
> >
>
> I haven't seen this before, and some quick google searches indicate it
> may be one of:
>
> A) a JVM problem (which JVM are you using?)
> B) too little heap space in the java startup arguments
> C) the wayback software doing lots of object creation+destruction.
>
> We have large installations in production at the IA, one using
> 700+ Collections and 1400+ AccessPoints.
> Note that these all use CDX
> indexes, which are more resource efficient. I'm hoping that C is not the
> problem, but we haven't yet needed to do a heavy optimization pass over
> the code, so it could be Wayback itself. Are you using IBM's JVM? Have
> you tried increasing the heap? If that doesn't address the problem, can
> you please send me a copy of your wayback.xml Spring configuration?
>
> > Second; when closing down Wayback in Tomcat, the lock file for the
> > localbdb is not erased. A restart is therefore not possible. Could this
> > be fixed so that if the webapp is closed down, the lock file is erased?
> >
>
> On what platform (OS+JVM) are you running Wayback? Is the BDB index
> stored over NFS or another networked file system? I haven't experienced
> this problem on any of our systems -- the BDBJE just starts up, even
> with the lock file still existing. I haven't looked into this, but
> guessed that it was using the lock file via flock() type semantics,
> instead of using its existence to indicate a lock. BDBJE may determine
> that the DB is on a remote system, where flock() semantics don't work,
> in which case it may be falling back to using the existence of the lock
> file to indicate usage.
>
> In any case, I've just implemented the "clean shutdown" processing in my
> development environment, but will probably hold off to do more testing
> before including it in a release.
>
> We are preparing a 1.2.1 release which addresses a couple of bugs
> discovered by folks in the field, but are holding this release for
> feedback from one more user having trouble reading some ARC files.
>
> > Third; with a few websites the timeline GUI is scrambled. I get a full
> > yellow screen with on every line a mark. After scrolling down that page,
> > the website is presented normally. This is not the case with every
> > website.
> >
>
> Yes, the CSS implementation in the current timeline is prone to
> inheriting some styles from some web pages.
> Could you please send me a
> few example pages on the live web that demonstrate the problem you're
> seeing?
>
> > My fourth and last problem is in the configuration. I would like to do
> > some tests using the remote NutchWAX search, but there is not a clear
> > manual of how to implement this precisely, which beans to use for
> > example. Does anyone have a good example for me?
> >
>
> Setting up a collection with this bean:
>
> <property name="resourceIndex">
>   <bean class="org.archive.wayback.resourceindex.NutchResourceIndex"
>         init-method="init">
>     <property name="searchUrlBase"
>               value="http://webteam-ws.us.archive.org:8080/katrina/opensearch" />
>     <property name="maxRecords" value="100" />
>   </bean>
> </property>
>
> Should do the trick. Note that if using Archival URL mode, you should be
> sure to set the maxRecords property on the RequestParser to the same
> value as the index's maxRecords. This may be a bug -- would be more
> friendly to use the min() of both values.
>
> <property name="parser">
>   <bean class="org.archive.wayback.archivalurl.ArchivalUrlRequestParser"
>         init-method="init">
>     <property name="maxRecords" value="100" />
>     <property name="earliestTimestamp" value="1996" />
>   </bean>
> </property>
>
> Hopefully this works for you, and please let me know about the questions
> above.
>
> Brad
>
> _______________________________________________
> Archive-access-discuss mailing list
> Arc...@li...
> https://lists.sourceforge.net/lists/listinfo/archive-access-discuss
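
The "use the min() of both values" idea from Brad's note on maxRecords
could look roughly like the sketch below. This is only an illustration,
not Wayback code: the class MaxRecordsClamp and the method
effectiveMaxRecords are hypothetical names, and it assumes both limits
are plain positive integers read from the two beans.

```java
// Hypothetical helper, not part of Wayback: picks the smaller of the
// request parser's maxRecords and the resource index's maxRecords so a
// mismatch between the two bean settings cannot over-request results.
final class MaxRecordsClamp {

    private MaxRecordsClamp() {
        // static utility, no instances
    }

    /**
     * @param parserMaxRecords maxRecords configured on the request parser
     * @param indexMaxRecords  maxRecords configured on the resource index
     * @return the smaller of the two limits
     * @throws IllegalArgumentException if either limit is not positive
     */
    static int effectiveMaxRecords(int parserMaxRecords, int indexMaxRecords) {
        if (parserMaxRecords <= 0 || indexMaxRecords <= 0) {
            throw new IllegalArgumentException(
                "maxRecords values must be positive: parser="
                    + parserMaxRecords + ", index=" + indexMaxRecords);
        }
        return Math.min(parserMaxRecords, indexMaxRecords);
    }

    public static void main(String[] args) {
        // Parser allows 1000 records but the index only 100: use 100.
        System.out.println(effectiveMaxRecords(1000, 100));
    }
}
```

With a clamp like this, the two maxRecords properties would not have to
be kept identical by hand; the lower one would simply win.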