From: stack <st...@ar...> - 2006-02-23 17:40:55
Charles Foetz wrote:
> Hello St.Ack, Lukas and everyone else,

Good to hear from you again Charles.

We still owe you a response to the long list of issues you found in the
WERA+NutchWAX combo. A good few have been addressed, but others still
remain.

> Long time since I posted any news concerning Luxembourg's web
> archiving efforts - as you know, we are very limited in human
> resources (we only have 2 IT people at the national library) and
> therefore need to find a balance between many different projects.
>
> Last time we were forced to put our web archiving project on hold due
> to the known limitations of the WERA Access tool (no canonicalization
> of URLs, no handling of redirects, encoding issues)... As a prototype
> project we had archived at several dates the sites of 7 political
> parties during local elections. The two limitations above made it
> impossible for WERA to access most parts of 4 out of these 7 archived
> sites (links to "http://site.com" instead of "http://www.site.com"
> were quite common, for instance), we therefore had pretty much nothing
> to "show" and didn't go further than the prototype.

Your report on canonicalization failures was captured as this issue:
https://sourceforge.net/tracker/index.php?func=detail&aid=1312202&group_id=118427&atid=681137

We should make WERA requery with the 'www' stripped (or prepended) if
it gets a 404 out of the index.

> I am now wondering what the plans are for WERA... are the issues above
> likely to be fixed any time soon or are they considered low priority?
> Is a new release planned or is the focus on other tools at the moment
> (I realise you guys also struggle at many fronts at the same time) ?
> If you think we could help on the development of WERA ourselves and
> maybe should be having a go at trying to fix the issues above, let me
> know.
> Another question: via the archive-access-cvs list, I noticed a lot of
> updates on the wayback project. What is this project? An open-source
> implementation of the Wayback machine (I've heard this mentioned
> before)? Has there been a release and at which stage is it? alpha?
> beta? working version? Where does this project fit in? Should it be
> seen as an alternative to WERA?

Sverre knows the WERA story best. I'll let him speak to the above.
Would be sweet if we could fix enough for you to launch at least a
prototype.

The long term plan is to transition from WERA on to the new wayback.
Sverre points this out in the section on the future of WERA at the end
of the 'What is WERA?' document:
http://archive-access.sourceforge.net/projects/wera/articles/what-is-wera.html#N100AE

For a description of the new wayback, see
http://archive-access.sourceforge.net/projects/wayback/. The front page
does a good job situating the project. It's alpha software currently,
though a pending release will move it past this designation. (Let me
kick our Brad and get him to introduce the wayback on this list.)
Wayback is currently focusing on scaling and on being able to act as a
replacement for the http://web.archive.org wayback for small
collections.

IMO, we're a ways yet from the wayback replacing WERA. While it already
has capabilities in excess of the WERA+ARCRetriever in certain regards,
its focus is elsewhere -- at least for now -- and it lacks core WERA UI
functionality, the quality documentation, and the sweet installer.

St.Ack

> Best regards,
>
> Charlie Foetz
> Bibliothèque nationale Luxembourg
> Spécialiste de la gestion électronique de l'information
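
For illustration only, the 'www' requery idea from the reply above could
look roughly like the sketch below. It is written in Python rather than
in WERA's own code, and it is not the actual WERA or NutchWAX fix. The
names toggle_www and lookup_with_fallback are hypothetical, and the
lookup argument stands in for whatever index query is really used; it is
assumed to return None on a miss (the "404 out of the index" case).

```python
from urllib.parse import urlsplit, urlunsplit


def toggle_www(url):
    """Return url with a leading 'www.' stripped from the host, or prepended if absent."""
    parts = urlsplit(url)
    host = parts.hostname  # lower-cased host without port, or None
    if not host:
        return url
    alt = host[4:] if host.startswith("www.") else "www." + host
    # Keep an explicit port if there was one (userinfo is ignored in this sketch).
    netloc = alt if parts.port is None else f"{alt}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def lookup_with_fallback(url, lookup):
    """Query the index for url; on a miss, retry once with 'www.' toggled.

    lookup is a hypothetical callable: URL in, record out, None when not found.
    """
    hit = lookup(url)
    if hit is not None:
        return hit
    alt = toggle_www(url)
    if alt == url:
        return None
    return lookup(alt)


if __name__ == "__main__":
    # Toy in-memory "index" standing in for the real one.
    index = {"http://www.site.com/": "archived record"}
    print(lookup_with_fallback("http://site.com/", index.get))
```

Retrying only after a miss leaves lookups that already succeed untouched;
canonicalizing hosts at indexing time would be the other way to go, at
the cost of a reindex.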