|
From: <kau...@cs...> - 2005-11-11 12:35:07
|
On 11/11/2005, "stack" <st...@ar...> wrote:
> The below should be fixed by upgrade to nutchwax 0.4.1, if you haven't
> already.
>
> St.Ack

Yes I noticed that errors in my archive were of the same type as reported
later by other people. After installing nutchwax 0.4.1 the archive looks
better now, thanks very much.

I still have some isolated cases where a file is inside archive but wera
shows 'not found'. Here are some examples of problem urls

Perhaps '&' right after '?' is too much
http://www.helsinki2005.fi/index.php?&Lang=eng
http://www.helsinki2005.fi/index.php?&Name=xteams
http://www.helsinki2005.fi/index.php?&Name=tickets

http://www.noc.fi/mp/db/tiedotteet/foo/IMG?num=18157&FIELD=kuva0_kk&R=897830
http://www.noc.fi/taustasivut/artikkeliarkisto/?num=17075&JKNUM=17075
http://www.slu.fi/mp/db/tiedotteet/foo/IMG?num=29512&FIELD=kuva0_pieni&R=064184

Other urls with '&' work fine, but these with 'Name=something&' do
not.
http://www.helsinki2005.fi/index.php?Name=newsitem&item=322
http://www.helsinki2005.fi/index.php?Name=newsitem&item=405
http://www.helsinki2005.fi/index.php?Name=tickets&lang=eng

When I make in wera a query
url:http://www.helsinki2005.fi/index.php?Name=tickets&lang=eng
it reports 102 hits, the first one being
http://www.helsinki2005.fi/index.php?Name=tickets_1
but wera only wants to display hits 1-10 and 11-16.

For some reason all images with a '%' character in url still refuse to
come out. This could apply to html file urls as well if there were any
in the archive.
http://www.helsinki2005.fi/files/pics/1079364264_mascot%20medium.gif

I'm not sure which part of nutchwax&wera combination causes it.

Kaisa
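Editorial sketch, not part of the original message: the thread does not say which component mishandles '%'. One possible cause, assumed here purely for illustration, is that an already-escaped URL gets percent-decoded once more than intended somewhere between the index and the viewer, so the lookup key no longer matches the archived one. The Python below uses only the standard library's urllib.parse and is not code from nutchwax or wera.

```python
from urllib.parse import quote, unquote

# Archived URL from the report; "%20" is part of the stored key.
ARCHIVED = "http://www.helsinki2005.fi/files/pics/1079364264_mascot%20medium.gif"

# An extra decoding step (assumed, not confirmed by the thread) turns
# "%20" into a literal space, so the key differs from the archived URL.
decoded_once = unquote(ARCHIVED)

# Escaping the whole URL before passing it on protects the "%" itself
# ("%" becomes "%25"), so one decoding step restores the original key.
protected = quote(ARCHIVED, safe="")
round_trip = unquote(protected)

print(decoded_once)
print(protected)
print(round_trip == ARCHIVED)
```

If the archive is keyed on the escaped form, only the round-tripped value would match it; whether that is what happens in this setup is exactly what the report leaves open.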
|
From: Sverre B. <sve...@nb...> - 2005-12-14 10:04:15
|
Hi,

Good that you got the jpg's displaying. Not so good with the &'s. I don't have that problem myself (!?). I hope you can assist in flushing out what's wrong.

Some questions related to the &-urls:

1. Does the result list display those urls correctly, i.e. are the numbers correct (e.g. 1/1 and not 0/1)?
2. Does the URL display properly in the url field of the timeline view?
3. When in the timeline view, could you take a look at the source and check whether the url is correctly formatted there (the one that starts with: frame src="http://your_wera_host/wera/documentDispatcher.php?url= .....)?
4. What NutchWax version are you using? And are you sure you are using the nutchwax webapp from the same version in tomcat?
5. What Tomcat and jvm (java -version) versions are you using?
6. Could you create a new file, e.g. phpinfo.php, in the wera directory with the contents:

   <?php
   phpinfo();
   ?>

   Point a browser to the corresponding url, save the output to a file and send that file to me.

If you could just quickly report on these questions, I'm sure I can come up with some more questions ;-)

Sverre

Kaisa Kaunonen wrote:
> Hi,
> with the new package my archive pages look better than
> before. All jpg:s having % character in urls are now visible.
> Thanks!
>
> I still have some problems, though. Trying to open a link
> http://www.helsinki2005.fi/index.php?Name=newsitem&item454
> results in error 'Sorry, no documents with the given url,
> http://www.helsinki2005.fi/index.php?Name=newsitem
> were found'
>
> In a similar way all other links are cut after the first
> & (ampersand) character in url
>
> eg.
> http://www.event-travel.fi/?openmenu=16&docID=71&sitelang=en
> becomes
> http://www.event-travel.fi/?openmenu=16
>
> Kaisa
>
> On Thu, 8 Dec 2005, Sverre Bang wrote:
>
>> Sorry about that. Here is a manual pack, not including the retriever war file (use the one you got).
>>
>> You have to update lib/config.inc.
>>
>> Sverre
>>
>> -----Original Message-----
>> From: Kaisa Kaunonen [mailto:kau...@cs...]
>> Sent: to 08.12.2005 13:42
>> To: Sverre Bang
>> Cc: st...@ar...
>> Subject: Re: [Archive-access-discuss] Wera 0.4.0 and &%s in urls
>>
>> Hi,
>> the java installer has never worked here. Even the new version
>> below doesn't work, so I'd need a package to install manually.
>> (I have the strange PC, WinaXe and Unix combination discussed
>> before)
>>
>> But anyway it's fine a new version is available
>>
>> Kaisa
>>
>> On Thu, 8 Dec 2005, Sverre Bang wrote:
>>
>>> Hi Kaisa,
>>> I've rewritten the url encoding stuff in Wera (and a few other things as
>>> well ;-) . Could you please try it out?
>>>
>>> http://nwa.nb.no/tmp/wera-200512081017-installer.jar
>>>
>>> I've tested on nutchWax 0.4.2
>>>
>>> Sverre
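Editorial sketch, not part of the original thread: the truncation Kaisa describes is what a query-string parser produces when an archived URL is placed unescaped after "url=" in a dispatcher link, because each '&' in the inner URL then starts a new outer parameter. The Python below reproduces that with the standard library's urllib.parse. It is an illustration under assumptions, not Wera's actual code: the parameter name `url` and the host placeholder come from Sverre's question 3, and the helper `url_param` is hypothetical.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Archived URL from the report.
ARCHIVED = "http://www.event-travel.fi/?openmenu=16&docID=71&sitelang=en"
# Placeholder host, as written in the thread ("your_wera_host").
DISPATCHER = "http://your_wera_host/wera/documentDispatcher.php"


def url_param(link):
    """Return the value a query-string parser reads for 'url' (hypothetical helper)."""
    return parse_qs(urlsplit(link).query)["url"][0]


# Unescaped: the inner '&' characters split the outer query string.
raw_link = DISPATCHER + "?url=" + ARCHIVED
# Escaped: safe="" also encodes '/', '?', '&' and '=' in the inner URL.
encoded_link = DISPATCHER + "?url=" + quote(ARCHIVED, safe="")

truncated = url_param(raw_link)
recovered = url_param(encoded_link)

print(truncated)
print(recovered)
```

The unescaped link yields the same cut-off URL as in the report, which is why question 3 (inspecting the frame src in the page source) is a useful check: if the inner URL appears there with raw '&' characters, the escaping step is the likely suspect.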
|
From: stack <st...@ar...> - 2005-11-11 18:28:18
|
kau...@cs... wrote:

>On 11/11/2005, "stack" <st...@ar...> wrote:
>
>>The below should be fixed by upgrade to nutchwax 0.4.1, if you haven't
>>already.
>>
>>St.Ack

Sounds like there is still work to do. Thanks for the detailed report, Kaisa (I've pasted the below into a new encoding issue and will try to figure out what's going on).

St.Ack

>Yes I noticed that errors in my archive were of the same type as reported
>later by other people. After installing nutchwax 0.4.1 the archive looks
>better now, thanks very much.
>
>I still have some isolated cases where a file is inside archive but wera
>shows 'not found'. Here are some examples of problem urls
>
>Perhaps '&' right after '?' is too much
>http://www.helsinki2005.fi/index.php?&Lang=eng
>http://www.helsinki2005.fi/index.php?&Name=xteams
>http://www.helsinki2005.fi/index.php?&Name=tickets
>
>http://www.noc.fi/mp/db/tiedotteet/foo/IMG?num=18157&FIELD=kuva0_kk&R=897830
>http://www.noc.fi/taustasivut/artikkeliarkisto/?num=17075&JKNUM=17075
>http://www.slu.fi/mp/db/tiedotteet/foo/IMG?num=29512&FIELD=kuva0_pieni&R=064184
>
>Other urls with '&' work fine, but these with 'Name=something&' do
>not.
>http://www.helsinki2005.fi/index.php?Name=newsitem&item=322
>http://www.helsinki2005.fi/index.php?Name=newsitem&item=405
>http://www.helsinki2005.fi/index.php?Name=tickets&lang=eng
>
>When I make in wera a query
>url:http://www.helsinki2005.fi/index.php?Name=tickets&lang=eng
>it reports 102 hits, the first one being
>http://www.helsinki2005.fi/index.php?Name=tickets_1
>but wera only wants to display hits 1-10 and 11-16.
>
>For some reason all images with a '%' character in url still refuse to
>come out. This could apply to html file urls as well if there were any
>in the archive.
>http://www.helsinki2005.fi/files/pics/1079364264_mascot%20medium.gif
>
>I'm not sure which part of nutchwax&wera combination causes it.
>
>Kaisa
>
>_______________________________________________
>Archive-access-discuss mailing list
>Arc...@li...
>https://lists.sourceforge.net/lists/listinfo/archive-access-discuss