Hi,
we still have a few problems with the output from NutchWAX,
and I'd like to ask you for a little help.
The XML output from the NutchWAX servlet
http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l+louck%C3%BD&hitsPerDup=1&hitsPerPage=10
contains HTML entities like
Graduál
Is that right? Should NutchWAX return XML with HTML entities, rather than the characters themselves in UTF-8 (the query is sent as UTF-8, percent-encoded as gradu%C3%A1l)?
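
To make the question concrete, here is a minimal sketch of how the three ways of writing "á" behave in a generic XML parser. It assumes the entity in the output is the named HTML entity &aacute; (the exact form in the servlet response is not confirmed here), and is_well_formed is only an illustrative helper, not part of NutchWAX. It does not show what NutchWAX actually emits.

```python
import html
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Assumed sample fragments; only the word itself comes from the search results.
NAMED = "<description>Gradu&aacute;l</description>"   # HTML named entity
NUMERIC = "<description>Gradu&#225;l</description>"   # numeric character reference
LITERAL = "<description>Graduál</description>"        # the character itself

if __name__ == "__main__":
    # XML predefines only &lt; &gt; &amp; &apos; &quot;, so &aacute; is undefined.
    print(is_well_formed(NAMED))     # False
    print(is_well_formed(NUMERIC))   # True
    print(is_well_formed(LITERAL))   # True
    # Decoding an HTML entity back to the character:
    print(html.unescape("Gradu&aacute;l"))  # Graduál
```

If the entities reach the XML unescaped and in named form, a strict parser would reject the document; numeric references or literal UTF-8 characters would not have that problem.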
What is the difference between these two cases?
1)
http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l+louck%C3%BD&start=0&hitsPerDup=0&hitsPerPage=10&dedupField=exacturl
-> output is not valid XML (called from WERA)
2)
http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l%20louck%C3%BD&start=0&hitsPerPage=10&hitsPerDup=1&dedupField=exacturl
-> output is valid XML (called from the NutchWAX search.jsp)
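
One way to see exactly where the two requests differ is to decode both query strings and compare them parameter by parameter. The sketch below uses only the Python standard library; the names URL_WERA, URL_JSP, query_params and diff_params are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

URL_WERA = ("http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l+louck%C3%BD"
            "&start=0&hitsPerDup=0&hitsPerPage=10&dedupField=exacturl")
URL_JSP = ("http://war.mzk.cz:8080/nutchwax/opensearch?query=gradu%C3%A1l%20louck%C3%BD"
           "&start=0&hitsPerPage=10&hitsPerDup=1&dedupField=exacturl")

def query_params(url: str) -> dict:
    """Decode the query string into a {name: value} dict (first value only)."""
    return {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}

def diff_params(url_a: str, url_b: str) -> dict:
    """Return {name: (value_in_a, value_in_b)} for parameters that differ."""
    a, b = query_params(url_a), query_params(url_b)
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b))
            if a.get(k) != b.get(k)}

if __name__ == "__main__":
    # "+" and "%20" both decode to a space, so the query text is identical.
    print(query_params(URL_WERA)["query"])   # graduál loucký
    print(diff_params(URL_WERA, URL_JSP))    # {'hitsPerDup': ('0', '1')}
```

After decoding, the only parameter that differs is hitsPerDup (0 from WERA, 1 from search.jsp); the "+" versus "%20" encoding of the space and the parameter order make no difference to the decoded values.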
Another thing that confuses me :) The characters in the "title" element are displayed correctly, but the text in the "description" element consists of HTML entities (as I described above).
Thanks for any help,
lukas