|
From: <mat...@ce...> - 2005-11-02 13:28:31
|
if i use the http://war.mzk.cz:8080/nutchwax/search.jsp?query=kniha&hitsPerPage=10 interface to nutchwax, the description looks fine, so the problem is in the opensearch servlet i guess...

l.

______________________________________________________________
> From: sve...@nb...
> To: arc...@li...
> CC: Lukáš Matějka <mat...@ce...>
> Date: 02.11.2005 14:07
> Subject: Re: [Archive-access-discuss] wera results
>
> The output from nutchwax is partly mangled. See
> http://war.mzk.cz:8080/nutchwax/opensearch?query=kniha&start=0&hitsPerPage=10&hitsPerDup=1&dedupField=exacturl
> where the contents of the description element is garbage while the contents
> of the title element looks fine (!?).
>
> As an example the text
>
> časnosti Žďárských vrchů a Hornosvratecké hornatiny
> (taken from the html source of timeline view) has in nutchwax
> description element become
>
> 69;asnosti Žďárských vrchů a
> Hornosvratecké hornatiny
>
> An observation that may or may not have something to do with this:
> NutchWax does a more or less educated guess of the encoding used in the
> page. For the example it guessed windows-1252 which i believe is closer
> to iso-8859-1 than to the actual encoding specified in the example
> source, iso-8859-2.
>
> I'll keep looking.
>
> Sverre
>
> On Wed, 2005-11-02 at 12:20 +0100, Lukáš Matějka wrote:
> > Hi,
> >
> > for example
> > http://war.mzk.cz/~nwa/wera/wera/index.php?query=kniha&year_from=&year_to=
> >
> > description of each record is not well-displayed
> >
> > 1. SKIP, Moje kniha (http://skip.nkp.cz/akcMojekn.htm)
> > (<b> ... </b>přístupu k internetu v knihovnách
> > propagovat využití internetu při
> > zjišťování názorů obyvatel 2. Anketa
> > Pomocí krátké ankety bude zjišťována
> > nejoblíbenější <b>kniha</b> obyvatel
> > České republiky. Pojem nejoblíbenější
> > <b>kniha</b> je specifikován dalšími výklady,
> > jako "<b>kniha</b>, která mě nejvíce
> > ovlivnila", "<b>kniha</b>, ke které se často
> > vracím", "<b>kniha</b>, kterou bych doporučil/a
> > dobrým přátelům", "<b>kniha</b>,
> > která změnila můj život", "<b>kniha</b> na
> > kterou nemohu zapomenout", "<b>kniha</b>, která mne uvedla
> > do jiného světa", "<b>kniha</b>, kterou bych si s
> > sebou vzal/a jako jedinou<b> ... </b>)
> > Versions (matching query/total) 3/3
> > Timeline | Overview
> >
> > "přístupu" should be "přístupu" (without diacritics "pristupu")
> >
> > does anybody have same problem?
> >
> > -lm
> >
> > -------------------------------------------------------
> > SF.Net email is sponsored by:
> > Tame your development challenges with Apache's Geronimo App Server. Download
> > it for free - -and be entered to win a 42" plasma tv or your very own
> > Sony(tm)PSP. Click here to play: http://sourceforge.net/geronimo.php
> > _______________________________________________
> > Archive-access-discuss mailing list
> > Arc...@li...
> > https://lists.sourceforge.net/lists/listinfo/archive-access-discuss |
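The encoding guess Sverre describes can be reproduced in isolation. What follows is a minimal Python sketch and not NutchWAX code: it assumes the page bytes really are iso-8859-2 and shows what a decoder that guessed windows-1252 makes of them. The sample word comes from the search result quoted in this message; the variable names are illustrative only.

```python
# Hypothetical illustration; none of this is taken from NutchWAX.
word = "přístupu"

# The bytes a page declared as iso-8859-2 would carry for this word.
page_bytes = word.encode("iso-8859-2")

# A wrong guess (windows-1252, Python codec name "cp1252") maps the
# same bytes to different letters without raising any error.
wrong = page_bytes.decode("cp1252")

# Decoding with the declared charset restores the original text.
right = page_bytes.decode("iso-8859-2")

print(wrong)
print(right)
```

Only the characters on which the two charsets disagree are damaged, which would be consistent with a description that looks mostly readable but has the wrong letters in places.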
|
From: <mat...@ce...> - 2005-11-02 15:31:53
|
______________________________________________________________
> From: sve...@nb...
> To: arc...@li...
> CC:
> Date: 02.11.2005 14:33
> Subject: RE: [Archive-access-discuss] wera results
>
> Hi there,
> Definitely something wrong in NutchWax. If i execute
> http://war.mzk.cz/~nwa/wera/wera/index.php?query=kniha&year_from=&year_to=
> and click the timeline link of the first hit showing 0/0 hits i get
> 'Sorry, no documents with the given uri were found'. The url displayed
> seems fine, but if you look in the source of the uppermost frame you
> will see that the url sent to the script was
> http://full.nkp.cz/nkdb/rejstriky/rejstrik.asp?irj=12&amp;start=V.
> The & separating the parameters irj and start has been replaced by its
> html character entity reference.
>
> If i press the go button now the url submitted to the script will be ok.
>
> If i look in the NutchWax result set of the initial search (add &debug=1
> to the search url to bring out the NutchWax search urls) i see that the
> url (link element) returned is wrong already here.
>
> Conclusion : NutchWax mangles the url returned by introducing html
> entities instead of keeping the url in its original form.
>
> What version of NutchWax are you using?

the latest release..

> Sverre
>
> On Wed, 2005-11-02 at 12:41 +0000, Kristinn Sigurdsson wrote:
> > This looks like the same (or very similar) problem as I've got. I've
> > been discussing it (offlist) with Stack and Sverre Bang, so I know it is
> > being looked into.
> >
> > I notice in your search results (as in mine) that URIs with & in them
> > are showing up as 0/0 versions. I believe that both problems are due to
> > the escaping (or unescaping) of HTML characters in the NutchWAX XML that
> > is used to pass the results to WERA.
> >
> > Possibly this is a misconfiguration of either Tomcat or Apache...?
> >
> > - Kris |
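Why a URL whose & has been replaced by its character entity reference stops matching can be sketched in a few lines of Python. This is only an illustration under assumptions, not the WERA or NutchWAX lookup code: `query_params` is a hypothetical helper, and the URL is the one Sverre reports, written with the entity in place of the bare &.

```python
from html import unescape
from urllib.parse import urlsplit


def query_params(url):
    """Split the query string on '&' into a name -> value dict."""
    query = urlsplit(url).query
    return dict(pair.split("=", 1) for pair in query.split("&") if pair)


# The URL as sent to the script, with & turned into an entity.
escaped_url = "http://full.nkp.cz/nkdb/rejstriky/rejstrik.asp?irj=12&amp;start=V"

# Taken literally, the second parameter is named "amp;start".
as_received = query_params(escaped_url)

# Unescaping once restores the original parameter names.
after_unescape = query_params(unescape(escaped_url))

print(as_received)
print(after_unescape)
```

Because the escaped form is a different string with a different parameter name, an exact-match lookup on the archived URI would plausibly find nothing, which fits the 0/0 versions reported in the thread.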
|
From: <mat...@ce...> - 2005-11-02 15:45:28
|
______________________________________________________________
> From: sve...@nb...
> To: arc...@li...
> CC:
> Date: 02.11.2005 14:33
> Subject: RE: [Archive-access-discuss] wera results
>
> Hi there,
> Definitely something wrong in NutchWax. If i execute
> http://war.mzk.cz/~nwa/wera/wera/index.php?query=kniha&year_from=&year_to=
> and click the timeline link of the first hit showing 0/0 hits i get

where did you find a hit showing 0/0?
it works fine for me (i've just explored 150 urls.. and no 0/0 hits)
did you remember the number of total hits? (if it's the same - i experimented with
a previous version of nutchwax, starting tomcat on various instances)

i had for the word "kniha"
Total number of versions found : 49087. Displaying URL's 1-10

-lm |
|
From: <mat...@ce...> - 2005-11-03 08:54:06
|
______________________________________________________________
> From: Sve...@nb...
> To: mat...@ce...
> CC:
> Date: 02.11.2005 19:41
> Subject: RE: [Archive-access-discuss] wera results
>
> I tried the latest opensearch servlet myself. It messed up my Wera, lots
> of 0/0 ...
>
> ;-)

now, i'm using what you sent to me... and everything seems fine...
i can't find any 0/0 :)

i will test it more :)

-lm

> Sverre |
|
From: Kristinn S. <kr...@ar...> - 2005-11-03 09:04:18
|
Looks like you managed to fix the problem on your end.

The issue of 0/0 versions is tied to the incorrect coding of characters in the XML (namely & being double escaped to &amp;amp; rather than just &amp;). This causes any URIs that contain & (or other special characters like < and >) to show up as 0/0 versions.

Any idea what fixed the problem?

- Kris |
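The double escaping Kris describes can be shown with Python's standard `html` module. This is a sketch under stated assumptions, not the OpenSearchServlet code: it assumes the producer escapes the URL twice and that the consumer (WERA in this thread) unescapes exactly once, as an XML parser does. The URL is the one reported earlier in the thread; the variable names are illustrative.

```python
from html import escape, unescape

# The URL as it should be stored and looked up.
url = "http://full.nkp.cz/nkdb/rejstriky/rejstrik.asp?irj=12&start=V"

# Correct XML output: escape once, so & becomes &amp;.
escaped_once = escape(url, quote=False)

# The suspected bug: escaping the already escaped text again,
# so &amp; becomes &amp;amp;.
escaped_twice = escape(escaped_once, quote=False)

# A consumer that unescapes once is left with &amp; inside the URL.
seen_by_consumer = unescape(escaped_twice)

print(escaped_once)
print(escaped_twice)
print(seen_by_consumer == url)
```

After one round of unescaping the consumer holds the singly escaped text rather than the original URL, so only URIs without &, < or > would survive the round trip unchanged.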
|
From: stack <st...@ar...> - 2005-11-03 19:14:25
|
Kristinn Sigurdsson wrote: >Looks like you managed to fix the problem on your end. > >The issue of 0/0 versions is tied to the incorrect coding of characters in the XML (namely & being double escaped to &amp; rather then just &). This causes any URIs that contain & (or other special characters like < and >) to show up as 0/0 versions. > >Any idea what fixed the problem? > > Looks like old version of the nutchwax WAR file with an earlier version of OpenSearchServlet fixed the problem. Trying to get to the bottom of it. Will update the list when leanr more (Looking like need point release of nutchwax). St.Ack >- Kris > > > >>-----Original Message----- >>From: arc...@li... >>[mailto:arc...@li...] >>On Behalf Of Lukáš Matìjka >>Sent: 3. nóvember 2005 08:30 >>To: Sve...@nb... >>Cc: arc...@li... >>Subject: RE: [Archive-access-discuss] wera results >> >> >> >> >>______________________________________________________________ >> >> >>>Od: Sve...@nb... >>>Komu: mat...@ce... >>>CC: >>>Datum: 02.11.2005 19:41 >>>Předmět: RE: [Archive-access-discuss] wera results >>> >>>I tried the latest opensearch servlet myself. It messed up >>> >>> >>my Wera, lots >> >> >>>of 0/0 ... >>> >>>;-) >>> >>> >>now, i'm using what you send to me...and everything seems fine... >>i can't find any 0/0 :) >> >>i will test it more:) >> >>-lm >> >> >> >>>Sverre >>> >>> >>>-----Original Message----- >>>From: Lukás Matejka [mailto:mat...@ce...] >>>Sent: Wed 11/2/2005 4:43 PM >>>To: Sverre Bang >>>Cc: arc...@li... >>>Subject: RE: [Archive-access-discuss] wera results >>> >>> >>> >>>______________________________________________________________ >>> >>> >>>>Od: sve...@nb... >>>>Komu: arc...@li... >>>>CC: >>>>Datum: 02.11.2005 14:33 >>>>Predmet: RE: [Archive-access-discuss] wera results >>>> >>>>Hi there, >>>>Definitely something wrong in NutchWax. 
If i execute >>>> >>>> >>>> >>http://war.mzk.cz/~nwa/wera/wera/index.php?query=kniha&year_fr >>om=&year_to= >> >> >>>>and click the tmeline link of the first hit showing 0/0 hits i get >>>> >>>> >>>where did you find hit showing 0/0? >>>it works fine for me(i've just explored 150 urls..and no 0/0 hits ) >>>did you remeber number of total hits?(if it's same - i >>> >>> >>experimented with >> >> >>>previous version of nutchwax,starting tomcat on various instances) >>> >>>i had for word "kniha" >>>Total number of versions found : 49087. Displaying URL's 1-10 >>> >>>-lm >>> >>> >>> >>>>'Sorry, no documents with the given uri were found'. The >>>> >>>> >>url displyed >> >> >>>>seems fine, but if you look in the source of the >>>> >>>> >>uppermost frame you >> >> >>>>will see that the url sent to the script was >>>>http://full.nkp.cz/nkdb/rejstriky/rejstrik.asp?irj=12&start=V. >>>>The & separating the parameters irj and start has been >>>> >>>> >>replaced by its >> >> >>>>html character entity reference. >>>> >>>>If i press the go button now the url submitted to the >>>> >>>> >>script will be ok. >> >> >>>>If i look in the NutchWax result set of the initial >>>> >>>> >>search (add &debug=1 >> >> >>>>to the search url to bring out the NutchWax search urls) >>>> >>>> >>i see that the >> >> >>>>url (link element) returned is wrong already here. >>>> >>>>Conclusion : NutchWax mangles the url returned by introducing html >>>>entities instead of keeping the url in its original form. >>>> >>>>What version of NutchWax are you using? >>>> >>>>Sverre >>>> >>>>On Wed, 2005-11-02 at 12:41 +0000, Kristinn Sigurdsson wrote: >>>> >>>> >>>>>This looks like the same (or very similar) problem as >>>>> >>>>> >>I've got. I've >> >> >>>>been discussing it (offlist) with Stack and Sverre Bang, >>>> >>>> >>so I know it is >> >> >>>>being looked into. >>>> >>>> >>>>>I notice in your search results (as in mine) that URIs >>>>> >>>>> >>with & in them >> >> >>>>are showing up as 0/0 versions. 
>>>>>I believe that both problems are due to
>>>>>the escaping (or unescaping) of HTML characters in the NutchWAX XML that
>>>>>is used to pass the results to WERA.
>>>>>
>>>>>Possibly this is a misconfiguration of either Tomcat or Apache...?
>>>>>
>>>>>- Kris
>>>>>
>>>>>>-----Original Message-----
>>>>>>From: arc...@li...
>>>>>>[mailto:arc...@li...]
>>>>>>On Behalf Of Lukáš Matějka
>>>>>>Sent: 2. nóvember 2005 11:21
>>>>>>To: arc...@li...
>>>>>>Subject: [Archive-access-discuss] wera results
>>>>>>
>>>>>>Hi,
>>>>>>
>>>>>>for example
>>>>>>http://war.mzk.cz/~nwa/wera/wera/index.php?query=kniha&year_from=&year_to=
>>>>>>
>>>>>>description of each record is not well-displayed
>>>>>>
>>>>>>1. SKIP, Moje kniha (http://skip.nkp.cz/akcMojekn.htm)
>>>>>>(<b> ... </b>prístupu k internetu v knihovnách
>>>>>>propagovat vyuzití internetu pri
>>>>>>zjistování názoru obyvatel 2. Anketa
>>>>>>Pomocí krátké ankety bude zjistována
>>>>>>nejoblíbenejsí <b>kniha</b> obyvatel
>>>>>>Ceské republiky. Pojem nejoblíbenejsí
>>>>>><b>kniha</b> je specifikován dalsími výklady,
>>>>>>jako "<b>kniha</b>, která me nejvíce
>>>>>>ovlivnila", "<b>kniha</b>, ke které se casto
>>>>>>vracím", "<b>kniha</b>, kterou bych doporucil/a
>>>>>>dobrým prátelum", "<b>kniha</b>,
>>>>>>která zmenila muj zivot", "<b>kniha</b> na
>>>>>>kterou nemohu zapomenout", "<b>kniha</b>, která mne uvedla
>>>>>>do jiného sveta", "<b>kniha</b>, kterou bych si s
>>>>>>sebou vzal/a jako jedinou<b> ... </b>)
>>>>>>Versions (matching query/total) 3/3
>>>>>>Timeline | Overview
>>>>>>
>>>>>>"prístupu" should be "přístupu" (without diacritics "pristupu")
>>>>>>
>>>>>>does anybody have same problem?
>>>>>>
>>>>>>-lm
|
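The URL mangling Sverre describes in this message can be reproduced in a few lines. The following is only a minimal sketch in Python, not NutchWAX or WERA code. It assumes, as a guess that the thread does not confirm, that the link element in the search result XML was escaped twice, so that an XML parser hands back a URL still containing "&amp;". The helper names link_text and query_params and the <item> wrapper element are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import unescape
from urllib.parse import parse_qs, urlsplit

# The archived URL from the thread, in its original (unescaped) form.
URL = "http://full.nkp.cz/nkdb/rejstriky/rejstrik.asp?irj=12&start=V"


def link_text(xml_fragment):
    """Parse an XML fragment and return the text of its <link> element."""
    return ET.fromstring(xml_fragment).findtext("link")


def query_params(url):
    """Split the query string on '&' only and return the parameters."""
    return parse_qs(urlsplit(url).query, separator="&")


# Escaped once: valid XML, and the parser returns the original URL.
once = (
    "<item><link>"
    "http://full.nkp.cz/nkdb/rejstriky/rejstrik.asp?irj=12&amp;start=V"
    "</link></item>"
)
# Escaped twice (assumed cause): the parsed URL still contains "&amp;".
twice = once.replace("&amp;", "&amp;amp;")

good = link_text(once)
mangled = link_text(twice)

print(query_params(good))     # parameters irj and start
print(query_params(mangled))  # the start parameter is lost
print(unescape(mangled) == URL)
```

With the mangled URL the query string no longer carries a parameter named start, so a lookup keyed on the exact URL would plausibly find no versions, which would fit the 0/0 counts reported above. Unescaping once on the receiving side restores the URL, but the cleaner fix is to escape exactly once where the XML is produced.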
|
From:
<mat...@ce...> - 2005-11-03 08:54:07
|
______________________________________________________________
> From: st...@ar...
> To: mat...@ce...
> CC: arc...@li...
> Date: 03.11.2005 02:09
> Subject: Re: [Archive-access-discuss] wera results
>
> Lukáš Matějka wrote:
>
> >Hi,
> >
> >for example
> >http://war.mzk.cz/~nwa/wera/wera/index.php?query=kniha&year_from=&year_to=
> >
> >description of each record is not well-displayed
> >
> >1. SKIP, Moje kniha (http://skip.nkp.cz/akcMojekn.htm)
> >(<b> ... </b>přístupu k internetu v knihovnách
> >propagovat využití internetu při
> >zjišťování názorů obyvatel 2. Anketa
> >Pomocí krátké ankety bude zjišťována
> >nejoblíbenější <b>kniha</b> obyvatel
> >České republiky. Pojem nejoblíbenější
> ><b>kniha</b> je specifikován dalšími výklady,
> >jako "<b>kniha</b>, která mě nejvíce
> >ovlivnila", "<b>kniha</b>, ke které se často
> >vracím", "<b>kniha</b>, kterou bych doporučil/a
> >dobrým přátelům", "<b>kniha</b>,
> >která změnila můj život", "<b>kniha</b> na
> >kterou nemohu zapomenout", "<b>kniha</b>, která mne uvedla
> >do jiného světa", "<b>kniha</b>, kterou bych si s
> >sebou vzal/a jako jedinou<b> ... </b>)
> >Versions (matching query/total) 3/3
> >Timeline | Overview
> >
> >"přístupu" should be "přístupu" (without diacritics "pristupu")
> >
> >does anybody have same problem?
> >
>
> Did you change something, Lukáš? When I browse to the link given above,
> the display looks correct: i.e. see "přístupu" in the below (hopefully
> this mail makes it across preserving the original characters).
> St.Ack

Yes, in a previous discussion Stack sent me a file - an old nutchwax.war.
He said that the problem was in the opensearch servlet....
now the results seem to be ok.
-lm

> *1. SKIP, Moje kniha* (http://skip.nkp.cz/akcMojekn.htm)
> (* ... *přístupu k internetu v knihovnách propagovat využití internetu při
> zjišťování názorů obyvatel 2. Anketa Pomocí krátké ankety bude zjišťována
> nejoblíbenější *kniha* obyvatel České republiky. Pojem nejoblíbenější
> *kniha* je specifikován dalšími výklady, jako "*kniha*, která mě nejvíce
> ovlivnila", "*kniha*, ke které se často vracím", "*kniha*, kterou bych
> doporučil/a dobrým přátelům", "*kniha*, která změnila můj život", "*kniha*
> na kterou nemohu zapomenout", "*kniha*, která mne uvedla do jiného světa",
> "*kniha*, kterou bych si s sebou vzal/a jako jedinou* ... *)
> Versions (matching query/total) 3/3
> *Timeline
> <http://war.mzk.cz/%7enwa/wera/wera/result.php?time=20041212180928&url=httpINDX3AINDX2FINDX2FskipINDXDOTnkpINDXDOTczINDX2FakcMojeknINDXDOThtm>
> | Overview
> <http://war.mzk.cz/%7enwa/wera/wera/overview.php?url=httpINDX3AINDX2FINDX2FskipINDXDOTnkpINDXDOTczINDX2FakcMojeknINDXDOThtm>*
>
> >-lm
|