From: <mat...@ce...> - 2006-02-09 12:08:33

Hi,

I still can't resolve this issue. Does anybody know how to help?
Can NutchWAX produce output containing HTML entities? (Output from NutchWAX should be UTF, shouldn't it?)
I ask because, in the cases quoted below, the invalid XML is caused by special characters in HTML entities.

Thanks for any help
-lm

______________________________________________________________
> From: sve...@nb...
> To: stack <st...@ar...>
> CC: Lukáš Matějka <mat...@ce...>
> Date: 12.01.2006 10:38
> Subject: Re: nutchwax
>
> Hi Michael, Lukáš ..
>
> On Thursday 12 January 2006 01:33, stack wrote:
> > Lukáš Matějka wrote:
> ...
> > > what's the difference between these cases?
> > >
> > > 1)
> > > http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l+louck%C3%BD&start=0&hitsPerDup=0&hitsPerPage=10&dedupField=exacturl
> > > -> output is not valid xml (called from WERA)
> > >
> > > 2)
> > > http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l%20louck%C3%BD&start=0&hitsPerPage=10&hitsPerDup=1&dedupField=exacturl
> > > output is valid xml (called from Nutchwax search.jsp)
> > >
> If I try the above URLs I find quite the opposite! Case 1 produces valid XML,
> case 2 produces invalid XML.
>
> Test results:
>
> http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l+louck%C3%BD
> -> valid XML
>
> http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l%20louck%C3%BD
> -> valid XML
>
> http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l+louck%C3%BD&hitsPerDup=0&dedupField=exacturl
> -> valid XML
>
> http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l%20louck%C3%BD&hitsPerDup=0&dedupField=exacturl
> -> valid XML
>
> http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l+louck%C3%BD&hitsPerDup=1&dedupField=exacturl
> -> INVALID XML
>
> http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l%20louck%C3%BD&hitsPerDup=1&dedupField=exacturl
> -> INVALID XML
>
> Setting hitsPerDup=2 results in valid XML
>
> Conclusion:
> A specific record in the index contains invalid XML chars, and it is only part
> of the result list when hitsPerDup=1. Setting hitsPerDup=0 and start=10 will
> produce a result list that includes the record with the invalid XML chars.
>
> I don't know if the above was of any help to you, I just had to say something
> about it ;-)
>
> Sverre
>
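
One way to narrow down which record is at fault is to scan each field of a result for characters that XML 1.0 does not allow at all. The snippet below is only a rough sketch of that check, not anything taken from NutchWAX or WERA: the function names and the sample string are made up for illustration, and it assumes the breakage comes from forbidden characters (such as stray control characters) rather than from unescaped markup or undefined entities, which it does not detect.

```python
import re

# Characters permitted by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The pattern matches any single character outside that set.
_INVALID_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 forbids."""
    return [(m.start(), ord(m.group())) for m in _INVALID_XML_CHAR.finditer(text)]


def strip_invalid_xml_chars(text):
    """Return text with every XML 1.0-forbidden character removed."""
    return _INVALID_XML_CHAR.sub("", text)


if __name__ == "__main__":
    # Hypothetical field value: a form feed (U+000C) hidden inside a title.
    sample = "gradu\u00e1l\x0c louck\u00fd"
    print(find_invalid_xml_chars(sample))
    print(strip_invalid_xml_chars(sample))
```

Running the finder over the title, summary and URL of each hit on the failing page should point at the offending record; the stripping function is a blunt workaround and would belong wherever the field text is written into the response, if that turns out to be the cause.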