From: Martin <cin...@gm...> - 2004-03-15 23:35:18
On Mon, 15.03.2004 at 23:04, sfk...@hs... wrote:
> Regarding the problem reading http://www.heise.de/newsticker/heise.rdf:
>
> 0. To your question: the encoding declaration ISO-8859-1 in the prolog
> refers to the content. The tags/attributes follow the rules for XML names
> (see chap. 2.3 Common Syntactic Constructs for XML names and chap. 4.3.3
> Character Encoding in Entities for encoding declarations).
>
> 1. What is strange is that when I wget this XML file and save it to my
> localhost directory (where my local apache is running), rssview and
> crimson are able to read it!
>
> Another example that rssview can read flawlessly is
> http://xml.newsisfree.com/feeds/29/629.xml, which also has ISO-8859-1
> encoding. So it is not really an encoding problem.
>
> What makes me suspicious is rather the following:
>
> 2. wget reports "Length: unspecified", which is usually "Length: 1,696":
> whatever the length really is, perhaps crimson doesn't like unspecified
> lengths?

Maybe it's a problem with the HTTP headers that heise sends. I have seen a similar problem with simple HTML pages: heise sends them as ISO-8859-1 (HTTP header), yet they often still contain the Euro symbol, and my browser has problems with that. I have to find out more about the communication while crimson is fetching the pages.

You can look at the headers that heise sends with "wget -S". I don't see any encoding headers there. Maybe crimson says it prefers UTF-8, and heise says "OK" and does not specify the encoding? (My tests have shown nothing suspicious so far. Maybe it's really the length.)

> 3. The top-level exception of the exception stack report is issuing:
> org.xml.sax.SAXParseException: Zeichenumwandlungsfehler: "Malformed UTF-8 char -
>
> - is an XML encoding declaration missing?"
> (Zeilenzahl möglicherweise zu niedrig)
> The German message in parentheses doesn't really have anything to do with
> encoding (freely translated, it reads "Number of lines perhaps too small")??

I really have no idea. If you find out anything, please be so kind as to tell me.

Martin
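
The two checks discussed in this message (which charset and length the server announces, and whether the received bytes are well-formed UTF-8 at all) could be sketched roughly as below. This is only an illustration in Python, not what crimson or rssview actually do; the function names describe_headers, fetch_headers and decodes_as_utf8 are hypothetical helpers invented for this sketch.

```python
import urllib.request


def describe_headers(headers):
    """Summarize the charset and length announced in a header mapping."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    content_type = lowered.get("content-type", "")
    charset = None
    # Parameters follow the media type, separated by semicolons.
    for part in content_type.split(";")[1:]:
        key, _, value = part.strip().partition("=")
        if key.lower() == "charset":
            charset = value.strip('"') or None
    return {"charset": charset, "length": lowered.get("content-length")}


def fetch_headers(url):
    """Fetch the response headers for url (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return dict(response.headers.items())


def decodes_as_utf8(data):
    """Return True if the given bytes are well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Calling describe_headers(fetch_headers(url)) on the feed URL would show whether a charset or a Content-Length is announced at all. The last helper illustrates why a parser that assumes UTF-8 would reject ISO-8859-1 content: a word such as "möglicherweise" encoded as ISO-8859-1 is not valid UTF-8.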