Thread: [Archive-access-discuss] Re: nutchwax

Hi,

I still can't resolve this issue.

Does anybody know how to help?
Can NutchWAX produce output with HTML entities? (Output from NutchWAX should be UTF-8, shouldn't it?)
In the cases quoted below, the invalid XML is caused by special characters in HTML entities.

Thanks for any help,

-lm
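
Regarding the entity question above: XML predefines only five entities (&amp;, &lt;, &gt;, &quot;, &apos;), so a named HTML entity such as &aacute; in a result title or summary makes the feed ill-formed unless it is turned into a plain character first. A minimal Python sketch of that conversion, assuming the text can be post-processed before it is written into the XML response (this is not a NutchWAX option, and html_to_xml_text is a hypothetical name):

```python
import html
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def html_to_xml_text(fragment: str) -> str:
    """Hypothetical helper: make HTML-entity text safe for an XML document.

    XML predefines only &amp;, &lt;, &gt;, &quot; and &apos;, so a named HTML
    entity such as &aacute; is an undefined entity in XML without a DTD.
    """
    decoded = html.unescape(fragment)  # "&aacute;" -> "á", "&amp;" -> "&"
    return escape(decoded)             # re-escape only &, < and >


if __name__ == "__main__":
    raw = "Gradu&aacute;l louck&yacute; &amp; spol."
    try:
        ET.fromstring(f"<title>{raw}</title>")
    except ET.ParseError as err:
        # the XML parser rejects the undeclared HTML entity
        print("raw text rejected:", err)
    fixed = html_to_xml_text(raw)
    print(fixed)                                          # Graduál loucký &amp; spol.
    print(ET.fromstring(f"<title>{fixed}</title>").text)  # Graduál loucký & spol.
```

After decoding, the accented characters are ordinary Unicode text, so they can be written directly into a feed declared as UTF-8 instead of being carried as entities.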

______________________________________________________________
> From: sve...@nb...
> To: stack <st...@ar...>
> CC: Lukáš Matějka <mat...@ce...>
> Date: 12.01.2006 10:38
> Subject: Re: nutchwax
>
> Hi Michael, Lukáš,
>
> On Thursday 12 January 2006 01:33, stack wrote:
> > Lukáš Matějka wrote:
> ...
> > > What's the difference between these cases?
> > >
> > > 1)
> > >
> > > http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l+louck%C3%BD&start=0&hitsPerDup=0&hitsPerPage=10&dedupField=exacturl
> > > -> output is not valid XML (called from WERA)
> > >
> > > 2)
> > >
> > > http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l%20louck%C3%BD&start=0&hitsPerPage=10&hitsPerDup=1&dedupField=exacturl
> > > -> output is valid XML (called from NutchWAX search.jsp)
> >
>
> If I try the above URLs, I find quite the opposite! Case 1 produces valid
> XML; case 2 produces invalid XML.
>
> Test results:
>
> http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l+louck%C3%BD
> -> valid XML
>
> http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l%20louck%C3%BD
> -> valid XML
>
> http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l+louck%C3%BD&hitsPerDup=0&dedupField=exacturl
> -> valid XML
>
> http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l%20louck%C3%BD&hitsPerDup=0&dedupField=exacturl
> -> valid XML
>
> http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l+louck%C3%BD&hitsPerDup=1&dedupField=exacturl
> -> INVALID XML
>
> http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l%20louck%C3%BD&hitsPerDup=1&dedupField=exacturl
> -> INVALID XML
>
> Setting hitsPerDup=2 results in valid XML.
>
> Conclusion:
> A specific record in the index contains invalid XML characters, and it is
> only part of the result list when hitsPerDup=1. Setting hitsPerDup=0 and
> start=10 will produce a result list that includes the record with the
> invalid XML characters.
>
> I don't know if the above was of any help to you, I just had to say
> something about it ;-)
>
> Sverre
>
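
Sverre's tests suggest the problem lies in the data of one record rather than in the URL: "+" and "%20" both encode the space in the query, and both variants behave identically. A minimal Python sketch for checking this offline, assuming the offending record contains characters outside the XML 1.0 Char production (the thread does not say which characters they are). strip_invalid_xml_chars, is_well_formed and opensearch_url are hypothetical helpers, not part of NutchWAX or WERA:

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Everything outside the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Most C0 control characters (e.g. \x0b, \x1f) fall outside it.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Drop characters that can never appear in a well-formed XML 1.0 document."""
    return _INVALID_XML_CHARS.sub("", text)


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the document (well-formedness only)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def opensearch_url(base: str, query: str, hits_per_dup: int,
                   start: int = 0, hits_per_page: int = 10) -> str:
    """Build a query URL with the parameters used in this thread."""
    params = {
        "query": query,
        "start": start,
        "hitsPerPage": hits_per_page,
        "hitsPerDup": hits_per_dup,
        "dedupField": "exacturl",
    }
    # urlencode() percent-encodes UTF-8 and writes the space as "+"
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    base = "http://war.mzk.cz:8080/nutchwax/opensearch"
    print(opensearch_url(base, "graduál loucký", hits_per_dup=1))
    body = "<rss><channel><title>bad\x0bchar</title></channel></rss>"
    print(is_well_formed(body))                           # False
    print(is_well_formed(strip_invalid_xml_chars(body)))  # True
```

Stripping is only one choice; replacing the offending characters with a placeholder (or fixing them at indexing time) would keep the record's text length intact, which may matter if offsets are used for highlighting.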
