Charles Foetz wrote:
> Hello St.Ack, Lukas and everyone else,
Good to hear from you again Charles.
We still owe you a response to the long list of issues you found in the
WERA+NutchWAX combo. A good few have been addressed, but others still
remain.
>
> Long time since I posted any news concerning Luxembourg's web
> archiving efforts - as you know, we are very limited in human
> resources (we only have 2 IT people at the national library) and
> therefore need to find a balance between many different projects.
>
> Last time we were forced to put our web archiving project on hold due
> to the known limitations of the WERA Access tool (no canonicalization
> of URLs, no handling of redirects, encoding issues)... As a prototype
> project we had archived at several dates the sites of 7 political
> parties during local elections. The first two limitations above made it
> impossible for WERA to access most parts of 4 out of these 7 archived
> sites (links to "http://site.com" instead of "http://www.site.com"
> were quite common, for instance). We therefore had pretty much nothing
> to "show" and didn't go further than the prototype.
Your report on canonicalization failures was captured as this issue:
https://sourceforge.net/tracker/index.php?func=detail&aid=1312202&group_id=118427&atid=681137
We should make WERA requery with the 'www' stripped (or prepended) if
it gets a 404 out of the index.
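
To make the requery idea concrete, here is a rough sketch in Python. It is not WERA code: `toggle_www`, `lookup_with_www_fallback` and the `lookup` callable (a stand-in for whatever queries the index and returns None on a miss) are all hypothetical names, and the sketch assumes URLs without userinfo.

```python
from urllib.parse import urlsplit, urlunsplit


def toggle_www(url):
    """Strip a leading 'www.' from the host, or prepend it if absent."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host:
        # Nothing sensible to toggle (e.g. a relative URL); return unchanged.
        return url
    new_host = host[4:] if host.startswith("www.") else "www." + host
    # Keep an explicit port if there was one. Userinfo is dropped (assumption:
    # archived URLs do not carry credentials).
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


def lookup_with_www_fallback(url, lookup):
    """Query the index; on a miss, retry once with the 'www' toggled.

    `lookup` is a hypothetical callable returning a record, or None
    when the index has nothing for the URL (the '404' case).
    """
    hit = lookup(url)
    if hit is not None:
        return hit
    return lookup(toggle_www(url))
```

With a plain dict standing in for the index, `lookup_with_www_fallback("http://site.com/", index.get)` would find a record stored under "http://www.site.com/". A single retry only covers the www/no-www case; redirects and other canonicalization problems would need separate handling.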
>
> I am now wondering what the plans are for WERA... are the issues above
> likely to be fixed any time soon or are they considered low priority?
> Is a new release planned or is the focus on other tools at the moment
> (I realise you guys also struggle on many fronts at the same time)?
> If you think we could help on the development of WERA ourselves and
> maybe should be having a go at trying to fix the issues above, let me
> know.
> Another question: via the archive-access-cvs list, I noticed a lot of
> updates on the wayback project. What is this project? An open-source
> implementation of the Wayback machine (I've heard this mentioned
> before)? Has there been a release and at which stage is it? alpha?
> beta? working version? Where does this project fit in? Should it be
> seen as an alternative to WERA?
>
Sverre knows the WERA story best. I'll let him speak to the above.
Would be sweet if we could fix enough for you to launch at least a
prototype.
The long-term plan is to transition from WERA to the new wayback.
Sverre points this out at the end of the 'What is WERA?' document, in
the section on the future of WERA:
http://archive-access.sourceforge.net/projects/wera/articles/what-is-wera.html#N100AE
For a description of the new wayback, see
http://archive-access.sourceforge.net/projects/wayback/. The front page
does a good job of situating the project. It's alpha software currently,
though a pending release will move it past this designation (let
me kick our Brad and get him to introduce the wayback on this list).
Wayback is currently focusing on scaling and on being able to act as a
replacement for the http://web.archive.org wayback for small collections.
IMO, we're a ways yet from the wayback replacing WERA. While it already
has capabilities in excess of the WERA+ARCRetriever in certain regards,
its focus is elsewhere -- at least for now -- and it lacks core WERA UI
functionality, the quality documentation, and the sweet installer.
St.Ack
> Best regards,
>
> Charlie Foetz
> Bibliothèque nationale Luxembourg
> Spécialiste de la gestion électronique de l'information
>