From: Arjohn K. <arj...@ad...> - 2007-06-25 11:59:12
Christiaan Fluit wrote:
> Leo Sauermann wrote:
>> I see a few bugs left (wrong content-type, should be "rdf/xml", support
>> for HTTP-GET when passing a URL) but otherwise I would heavily advertise
>> this thing as an aperture showcase.
>
> Great idea!
>
> I also spotted a glitch: when processing http://www.w3.org/, the
> resulting XML cannot be shown by IE: it barks about the ö in Österreich.
> This could be a Sesame problem: I believe they chose to just assume
> UTF-8 encoding rather than processing each string, for performance
> reasons. I'll ask Arjohn when he gets back from holiday.
>
> I would rename "web site" to "web page", the former may give the
> impression that we are going to crawl the entire site.

This is clearly a character encoding issue. For whatever reason, IE ignores the character encoding specified by the server and looks at the encoding specified in the XML file instead. The easiest way to fix this is probably to change the encoding of the servlet's output to UTF-8 (it currently uses ISO-8859-1).

You may also consider changing the result's MIME type from "text/plain" to "application/rdf+xml".

Hope this helps,

Arjohn
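
P.S. To illustrate the mismatch described above, here is a minimal, self-contained sketch. It assumes the generated XML declares encoding="UTF-8" while the servlet writes its bytes as ISO-8859-1; I have not checked what the actual output declares, so treat that as an assumption. The class name EncodingMismatchDemo and the helper isValidUtf8 are made up for this example and are not part of Aperture or Sesame.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Hypothetical helper: returns true if the bytes are a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the XML declaration says UTF-8 ("\u00D6" is the letter O with umlaut).
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<label>\u00D6sterreich</label>\n";

        // What the servlet is assumed to send today: ISO-8859-1 bytes.
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);
        // What it would send after the suggested fix: UTF-8 bytes.
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);

        // A parser that trusts the XML declaration decodes both as UTF-8.
        System.out.println("ISO-8859-1 bytes valid as UTF-8: " + isValidUtf8(latin1));
        System.out.println("UTF-8 bytes valid as UTF-8: " + isValidUtf8(utf8));
    }
}
```

In the servlet itself, the corresponding change would presumably be to call response.setContentType("application/rdf+xml") and response.setCharacterEncoding("UTF-8") before obtaining the writer, but I have not looked at the servlet's code, so take that as a suggestion rather than a patch.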