From: Bing Z. <bz...@sd...> - 2007-01-25 01:51:03
Hi Brad,
Here are 3 problems I found in the new Wayback 0.8.0.
1. Error when indexing an arc file.
===============================
After I installed the new Wayback 0.8.0 and placed an arc file for testing, I received the following error message in Tomcat's log file. The error is reproducible whenever I install a new Wayback instance.
org.apache.commons.httpclient.URIException: invalid port number
    at org.apache.commons.httpclient.URI.parseAuthority(URI.java:2226)
    at org.archive.net.LaxURI.parseAuthority(LaxURI.java:183)
    at org.archive.net.LaxURI.parseUriReference(LaxURI.java:348)
    at org.apache.commons.httpclient.URI.<init>(URI.java:145)
    at org.archive.net.LaxURI.<init>(LaxURI.java:73)
    at org.archive.net.UURI.<init>(UURI.java:124)
    at org.archive.net.UURIFactory.create(UURIFactory.java:320)
    at org.archive.net.UURIFactory.create(UURIFactory.java:310)
    at org.archive.net.UURIFactory.getInstance(UURIFactory.java:263)
    at org.archive.wayback.resourceindex.cdx.CDXLineToSearchResultAdapter.adapt(CDXLineToSearchResultAdapter.java:66)
    at org.archive.wayback.util.AdaptedIterator.hasNext(AdaptedIterator.java:57)
    at org.archive.wayback.util.AdaptedIterator.hasNext(AdaptedIterator.java:55)
    at org.archive.wayback.bdb.BDBRecordSet.insertRecords(BDBRecordSet.java:177)
    at org.archive.wayback.resourceindex.bdb.BDBIndexUpdater.mergeFile(BDBIndexUpdater.java:152)
    at org.archive.wayback.resourceindex.bdb.BDBIndexUpdater.mergeAll(BDBIndexUpdater.java:219)
    at org.archive.wayback.resourceindex.bdb.BDBIndexUpdater$BDBIndexUpdaterThread.run(BDBIndexUpdater.java:260)
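The trace shows the exception being raised in URI.parseAuthority while CDXLineToSearchResultAdapter.adapt was converting an index line, but it does not show which URL was being parsed. As a rough way to narrow that down, the sketch below flags URLs whose port is not a plain number. It is only an illustration under that assumption: PortCheckSketch and hasParsablePort are hypothetical names, the check is done with plain string handling, and it is not a copy of the commons-httpclient or Heritrix parsing rules.

```java
public class PortCheckSketch {

    /**
     * Returns true when the URL has no port, an empty port, or a port made
     * only of ASCII digits that fits in an int. Hypothetical helper, not
     * part of Wayback or commons-httpclient.
     */
    public static boolean hasParsablePort(String url) {
        int schemeEnd = url.indexOf("://");
        int start = (schemeEnd < 0) ? 0 : schemeEnd + 3;

        // The authority ends at the first '/', '?' or '#' after the scheme.
        int end = url.length();
        for (int i = start; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                end = i;
                break;
            }
        }
        String authority = url.substring(start, end);

        // Drop any userinfo ("user:pass@") before looking for the port.
        String hostPort = authority.substring(authority.lastIndexOf('@') + 1);

        // Skip a bracketed IPv6 literal so its colons are not taken as the port separator.
        int bracket = hostPort.lastIndexOf(']');
        int colon = hostPort.indexOf(':', bracket + 1);
        if (colon < 0) {
            return true; // no port at all
        }
        String port = hostPort.substring(colon + 1);
        if (port.length() == 0) {
            return true; // "host:" with an empty port
        }
        for (int i = 0; i < port.length(); i++) {
            char c = port.charAt(i);
            if (c < '0' || c > '9') {
                return false; // non-digit character in the port
            }
        }
        try {
            Integer.parseInt(port); // rejects digit strings too large for an int
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass candidate URLs on the command line; suspicious ones are printed.
        for (String url : args) {
            if (!hasParsablePort(url)) {
                System.out.println("suspicious port: " + url);
            }
        }
    }
}
```

Running this over the URLs in the CDX data produced from the test arc file might point to the record that triggers the error, if a malformed port is in fact the cause.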
2. Low page replay quality compared with previous release.
===========================================
Although I had the above error, I was able to use IE to query a URL and got a link back from the query result. After I clicked the link to replay the archived page, a lot of images were missing. The page replay quality of Wayback 0.8.0 is not as good as that of Wayback 0.6.0 in this test case.
3. Backward incompatibility for db and files generated by Wayback 0.6.0.
==================================================
When I copied the following files from Wayback 0.6.0 to Wayback 0.8.0, Wayback 0.8.0 displayed an error in IE when I queried for an archived web site (a valid website).
    db files from /0.6.0/index/ to /0.0.8/index
    files from /0.6.0/pipeline/queued to /0.8.0/arc-indexer/queued
    files from /0.6.0/pipeline/merged to /0.8.0//index-data/merged
Here is the error message shown in the IE browser:
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    java.lang.String.substring(String.java:1768)
    org.archive.wayback.resourceindex.bdb.BDBRecordToSearchResultAdapter.adapt(BDBRecordToSearchResultAdapter.java:63)
    org.archive.wayback.util.AdaptedIterator.hasNext(AdaptedIterator.java:57)
    org.archive.wayback.resourceindex.LocalResourceIndex.filterRecords(LocalResourceIndex.java:120)
    org.archive.wayback.resourceindex.LocalResourceIndex.query(LocalResourceIndex.java:296)
    org.archive.wayback.query.QueryServlet.doGet(QueryServlet.java:95)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    org.archive.wayback.core.RequestFilter.doFilter(RequestFilter.java:117)
    org.archive.wayback.core.RequestFilter.doFilter(RequestFilter.java:117)
Note that when I was using only Wayback 0.6.0, I copied the above-mentioned files to new locations and ran another Wayback 0.6.0 instance on top of the moved db and other files. That Wayback 0.6.0 instance worked fine.
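An index of -1 passed to String.substring often comes from an indexOf call that did not find the delimiter it was looking for. The sketch below illustrates that pattern and a guarded alternative. It is a guess at the kind of failure, not a diagnosis: the Wayback code at BDBRecordToSearchResultAdapter.java:63 was not inspected, and the class name, method names, and space-delimited record layout are hypothetical.

```java
public class SubstringSketch {

    /** Unguarded: throws StringIndexOutOfBoundsException when the record has no space. */
    public static String firstFieldUnguarded(String record) {
        int end = record.indexOf(' ');    // -1 when the delimiter is absent
        return record.substring(0, end);  // substring(0, -1) throws
    }

    /** Guarded: returns null for a record that does not have the expected layout. */
    public static String firstFieldGuarded(String record) {
        int end = record.indexOf(' ');
        return (end < 0) ? null : record.substring(0, end);
    }

    public static void main(String[] args) {
        // Hypothetical record layouts: one with the delimiter, one without.
        String expected = "example.com/ 20070125015103";
        String unexpected = "record-in-some-other-layout";

        System.out.println(firstFieldGuarded(expected));
        System.out.println(firstFieldGuarded(unexpected));
        try {
            firstFieldUnguarded(unexpected);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unguarded call failed: " + e.getMessage());
        }
    }
}
```

If the record layout did change between 0.6.0 and 0.8.0, a check of this kind would let the adapter report an unreadable record instead of failing the whole query.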
Here is some information about the system I used for the above tests:
Java: 1.5.0_09
Tomcat: 5.5.17
OS: Linux 2.4.20-28.7
Sincerely,
Bing Zhu
San Diego Supercomputer Center
email: bz...@sd...