From: <mat...@ce...> - 2005-11-02 13:28:31
If I use the

http://war.mzk.cz:8080/nutchwax/search.jsp?query=kniha&hitsPerPage=10

interface to nutchwax, the description looks fine, so the problem is in the opensearch servlet, I guess...

l.

______________________________________________________________
> From: sve...@nb...
> To: arc...@li...
> CC: Lukáš Matějka <mat...@ce...>
> Date: 02.11.2005 14:07
> Subject: Re: [Archive-access-discuss] wera results
>
> The output from nutchwax is partly mangled. See
> http://war.mzk.cz:8080/nutchwax/opensearch?query=kniha&start=0&hitsPerPage=10&hitsPerDup=1&dedupField=exacturl
> where the contents of the description element is garbage while the contents
> of the title element looks fine (!?).
>
> As an example the text
>
> časnosti Žďárských vrchů a Hornosvratecké hornatiny
> (taken from the html source of timeline view) has in nutchwax
> description element become
>
> 69;asnosti Žďárských vrchů a
> Hornosvratecké hornatiny
>
> An observation that may or may not have something to do with this:
> NutchWax does a more or less educated guess of the encoding used in the
> page. For the example it guessed windows-1252 which I believe is closer
> to iso-8859-1 than to the actual encoding specified in the example
> source, iso-8859-2.
>
> I'll keep looking.
>
> Sverre
>
> On Wed, 2005-11-02 at 12:20 +0100, Lukáš Matějka wrote:
> > Hi,
> >
> > for example
> > http://war.mzk.cz/~nwa/wera/wera/index.php?query=kniha&year_from=&year_to=
> >
> > description of each record is not well-displayed
> >
> > 1. SKIP, Moje kniha (http://skip.nkp.cz/akcMojekn.htm)
> > (<b> ... </b>přístupu k internetu v knihovnách
> > propagovat využití internetu při
> > zjišťování názorů obyvatel 2. Anketa
> > Pomocí krátké ankety bude zjišťována
> > nejoblíbenější <b>kniha</b> obyvatel
> > České republiky. Pojem nejoblíbenější
> > <b>kniha</b> je specifikován dalšími výklady,
> > jako "<b>kniha</b>, která mě nejvíce
> > ovlivnila", "<b>kniha</b>, ke které se často
> > vracím", "<b>kniha</b>, kterou bych doporučil/a
> > dobrým přátelům", "<b>kniha</b>,
> > která změnila můj život", "<b>kniha</b> na
> > kterou nemohu zapomenout", "<b>kniha</b>, která mne uvedla
> > do jiného světa", "<b>kniha</b>, kterou bych si s
> > sebou vzal/a jako jedinou<b> ... </b>)
> > Versions (matching query/total) 3/3
> > Timeline | Overview
> >
> > "přístupu" should be "přístupu" (without diacritics
> > "pristupu")
> >
> > does anybody have same problem?
> >
> > -lm
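
Sverre's observation about the charset guess can be illustrated with a small sketch. It is not NutchWAX code: the helper name `decode_with_guess` is hypothetical, and the sample phrase is assumed to be stored as iso-8859-2 bytes, as the page source is said to declare. It only shows how the same bytes read differently when a windows-1252 guess is applied instead.

```python
# Sketch only: shows how a wrong charset guess garbles Czech text.
# Assumption: the page bytes are iso-8859-2, as declared in the example source.

phrase = "časnosti Žďárských vrchů a Hornosvratecké hornatiny"

# The bytes as they would be stored in an iso-8859-2 page.
page_bytes = phrase.encode("iso-8859-2")


def decode_with_guess(data: bytes, guessed_charset: str) -> str:
    """Decode page bytes using whatever charset the indexer guessed (hypothetical helper)."""
    return data.decode(guessed_charset, errors="replace")


# Decoding with the declared charset round-trips cleanly.
correct = decode_with_guess(page_bytes, "iso-8859-2")

# Decoding with the guessed charset keeps the ASCII letters but replaces
# every accented Czech letter with a Western European look-alike.
garbled = decode_with_guess(page_bytes, "windows-1252")

print(correct)  # časnosti Žďárských vrchů a Hornosvratecké hornatiny
print(garbled)  # èasnosti ®ïárských vrchù a Hornosvratecké hornatiny
```

Letters that sit at the same code point in both charsets (á, é, ý) survive the wrong guess, which is why the mangled text stays partly readable.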