From: <mat...@ce...> - 2005-11-02 15:45:28
______________________________________________________________
> From: sve...@nb...
> To: arc...@li...
> CC:
> Date: 02.11.2005 14:33
> Subject: RE: [Archive-access-discuss] wera results
>
> Hi there,
> Definitely something wrong in NutchWax. If I execute
> http://war.mzk.cz/~nwa/wera/wera/index.php?query=kniha&year_from=&year_to=
> and click the timeline link of the first hit showing 0/0 hits, I get

Where did you find a hit showing 0/0? It works fine for me (I've just
explored 150 URLs and found no 0/0 hits). Do you remember the total number
of hits? I'd like to know whether it's the same as mine, because I
experimented with a previous version of NutchWax, starting Tomcat on
various instances. For the word "kniha" I got:

Total number of versions found : 49087. Displaying URL's 1-10

-lm

> 'Sorry, no documents with the given uri were found'. The URL displayed
> seems fine, but if you look in the source of the uppermost frame you
> will see that the URL sent to the script was
> http://full.nkp.cz/nkdb/rejstriky/rejstrik.asp?irj=12&amp;start=V.
> The & separating the parameters irj and start has been replaced by its
> HTML character entity reference.
>
> If I press the go button now, the URL submitted to the script is OK.
>
> If I look in the NutchWax result set of the initial search (add &debug=1
> to the search URL to bring out the NutchWax search URLs), I see that the
> URL (link element) returned is already wrong there.
>
> Conclusion: NutchWax mangles the URL it returns by introducing HTML
> entities instead of keeping the URL in its original form.
>
> What version of NutchWax are you using?
>
> Sverre
>
> On Wed, 2005-11-02 at 12:41 +0000, Kristinn Sigurdsson wrote:
> > This looks like the same (or a very similar) problem to the one I've got.
> > I've been discussing it (offlist) with Stack and Sverre Bang, so I know
> > it is being looked into.
> >
> > I notice in your search results (as in mine) that URIs with & in them
> > are showing up as 0/0 versions.
> > I believe that both problems are due to the escaping (or unescaping)
> > of HTML characters in the NutchWAX XML that is used to pass the results
> > to WERA.
> >
> > Possibly this is a misconfiguration of either Tomcat or Apache...?
> >
> > - Kris
> >
> > > -----Original Message-----
> > > From: arc...@li...
> > > [mailto:arc...@li...]
> > > On Behalf Of Lukáš Matějka
> > > Sent: 2 November 2005 11:21
> > > To: arc...@li...
> > > Subject: [Archive-access-discuss] wera results
> > >
> > > Hi,
> > >
> > > for example
> > > http://war.mzk.cz/~nwa/wera/wera/index.php?query=kniha&year_from=&year_to=
> > >
> > > the description of each record is not displayed correctly:
> > >
> > > 1. SKIP, Moje kniha (http://skip.nkp.cz/akcMojekn.htm)
> > > (<b> ... </b>přístupu k internetu v knihovnách
> > > propagovat využití internetu při
> > > zjišťování názorů obyvatel 2. Anketa
> > > Pomocí krátké ankety bude zjišťována
> > > nejoblíbenější <b>kniha</b> obyvatel
> > > České republiky. Pojem nejoblíbenější
> > > <b>kniha</b> je specifikován dalšími výklady,
> > > jako "<b>kniha</b>, která mě nejvíce
> > > ovlivnila", "<b>kniha</b>, ke které se často
> > > vracím", "<b>kniha</b>, kterou bych doporučil/a
> > > dobrým přátelům", "<b>kniha</b>,
> > > která změnila můj život", "<b>kniha</b> na
> > > kterou nemohu zapomenout", "<b>kniha</b>, která mne uvedla
> > > do jiného světa", "<b>kniha</b>, kterou bych si s
> > > sebou vzal/a jako jedinou<b> ... </b>)
> > > Versions (matching query/total) 3/3
> > > Timeline | Overview
> > >
> > > "p=F8=EDstupu" should be "p=C5=C3=ADstupu" (the word is "přístupu";
> > > without diacritics, "pristupu").
> > >
> > > Does anybody have the same problem?
> > >
> > > -lm
> > >
> > > -------------------------------------------------------
> > > SF.Net email is sponsored by:
> > > Tame your development challenges with Apache's Geronimo App Server.
> > > Download it for free - -and be entered to win a 42" plasma tv or your very own
> > > Sony(tm)PSP. Click here to play: http://sourceforge.net/geronimo.php
> > > _______________________________________________
> > > Archive-access-discuss mailing list
> > > Arc...@li...
> > > https://lists.sourceforge.net/lists/listinfo/archive-access-discuss
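The two symptoms discussed in this thread (an `&` in a result URL arriving as its entity reference, and Czech text arriving in the wrong byte encoding) can be reproduced in a few lines. The following is a minimal Python sketch, not NutchWax or WERA code: the `<item>`/`<link>` fragment is a hypothetical stand-in for the NutchWax result XML, and `link_from_result` and `check_encodings` are illustrative names. It assumes that escaping the URL one time too many is one way to end up with `&amp;` in the URL after the XML has been parsed, which would match the link element Sverre describes. It also shows that the bytes `=F8=ED` from the thread are "ří" in windows-1250 (ISO-8859-2 uses the same two bytes for these letters), which is not a valid UTF-8 sequence.

```python
import xml.etree.ElementTree as ET
from html import unescape

# Hypothetical result fragments; the element names are illustrative only
# and are NOT the actual NutchWax XML schema.
# Correct: the & in the URL is escaped exactly once for XML.
WELL_FORMED = (
    "<item><link>"
    "http://full.nkp.cz/nkdb/rejstriky/rejstrik.asp?irj=12&amp;start=V"
    "</link></item>"
)
# Assumed failure mode: the URL was escaped twice before serialization.
DOUBLE_ESCAPED = (
    "<item><link>"
    "http://full.nkp.cz/nkdb/rejstriky/rejstrik.asp?irj=12&amp;amp;start=V"
    "</link></item>"
)


def link_from_result(xml_text: str) -> str:
    """Return the <link> text exactly as an XML parser decodes it."""
    return ET.fromstring(xml_text).findtext("link")


def check_encodings(word: str) -> dict:
    """Show one word as windows-1250 bytes, as UTF-8 bytes, and as
    windows-1250 bytes wrongly decoded as UTF-8."""
    single_byte = word.encode("cp1250")
    return {
        "cp1250": single_byte,
        "utf-8": word.encode("utf-8"),
        # Invalid UTF-8 sequences are replaced with U+FFFD
        "cp1250 read as utf-8": single_byte.decode("utf-8", errors="replace"),
    }


if __name__ == "__main__":
    # Parsing undoes one level of escaping: prints ...?irj=12&start=V
    print(link_from_result(WELL_FORMED))
    # After parsing, the double-escaped URL still contains &amp;
    print(link_from_result(DOUBLE_ESCAPED))
    # One extra unescape recovers the original URL
    print(unescape(link_from_result(DOUBLE_ESCAPED)))
    for name, value in check_encodings("přístupu").items():
        print(name, value)
```

`html.unescape` is only a workaround for text already known to be escaped one extra time. Applying it to a correct URL can corrupt query parameters whose names look like entity names (for example `&copy=`), so the cleaner fix is to escape the URL exactly once when the XML is produced.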