From: Martin <cin...@gm...> - 2004-03-15 17:13:32
On Mon, 15.03.2004 at 0:55, sfk...@hs... wrote:
> Yet, when adding the most popular German news feed,
> http://www.heise.de/newsticker/heise.rdf, Crimson throws an exception (see below).
>
> It seems to be a Crimson problem: the size or a token is invalid.
> => So, how do I fix it?
> => Do you know this problem?
>
> Many thanks and regards, Stefan
> _________________________________________________
> Prof. Stefan F. Keller, Abteilung Informatik HSR, www.integis.ch
>
> Error while fetching channel: http://www.heise.de/newsticker/heise.rdf
> org.xml.sax.SAXParseException: Character conversion error: "Malformed UTF-8 char --
> is an XML encoding declaration missing?" (line number possibly too low)
>     at org.apache.crimson.parser.InputEntity.fatal(InputEntity.java:1100)
>     at org.apache.crimson.parser.InputEntity.fillbuf(InputEntity.java:1072)
>     at org.apache.crimson.parser.InputEntity.isXmlDeclOrTextDeclPrefix(InputEntity.java:914)
>     at org.apache.crimson.parser.Parser2.maybeXmlDecl(Parser2.java:1048)
>     at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:520)
>     at org.apache.crimson.parser.Parser2.parse(Parser2.java:318)
>     at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:442)
>     at de.cinek.rssview.RssParser.parse(RssParser.java:85)
>     at de.cinek.rssview.RssChannel.fetchNews(RssChannel.java:195)
>     at de.cinek.rssview.RssFetchThread.run(RssFetchThread.java:35)

Hello,

first, I can confirm the problem. I'm forwarding it to the rssview developers for discussion, because I don't know whether it's heise's fault or Crimson's.

Short explanation:
- heise.de is a German news site that uses "ISO-8859-1" for its RSS feeds
- the encoding is declared at the top of the document (XML prolog)
- the text _content_ contains lots of German ISO-8859-1 characters

The question is: what does the encoding declaration in the prolog refer to? Is it the encoding of the tags, attributes, etc., or of the content (between the tags)? Or perhaps both?
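One way to test the question empirically is a small standalone program. This is a minimal sketch under stated assumptions: it uses the JAXP SAX parser bundled with a current JDK rather than Crimson, and the class name `EncodingDemo`, the sample document and its `title` attribute are hypothetical, not taken from heise's feed or from the rssview code. It shows two things: ISO-8859-1 umlauts are not valid UTF-8 (the kind of bytes that trigger a "Malformed UTF-8 char" error), and a parser reading raw bytes decodes both the attribute value and the element content according to the declaration in the prolog.

```java
import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingDemo {

    // Hypothetical sample: ISO-8859-1 declared in the prolog, with umlauts
    // in an attribute value ("Gruesse") and in element content ("Ueberblick").
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
        + "<item title=\"Gr\u00fc\u00dfe\">\u00dcberblick</item>";

    /** Encodes the sample as ISO-8859-1 bytes, as a server might send it. */
    static byte[] latin1Bytes() {
        return SAMPLE.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Returns true if strict UTF-8 decoding of the bytes fails. */
    static boolean isMalformedUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }

    /**
     * Parses from a raw byte stream, so the parser itself reads the
     * encoding declaration. Returns {attribute value, element text}.
     */
    static String[] parse(byte[] bytes) {
        final String[] result = new String[2];
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                result[0] = atts.getValue("title");
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Content may arrive in several chunks, so accumulate it.
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(bytes), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        result[1] = text.toString();
        return result;
    }

    public static void main(String[] args) {
        byte[] bytes = latin1Bytes();
        System.out.println("valid UTF-8? " + !isMalformedUtf8(bytes));
        String[] r = parse(bytes);
        System.out.println("attribute: " + r[0] + ", content: " + r[1]);
    }
}
```

If the parser honours the declaration for the whole document, both the attribute and the content come out as the expected umlauts even though the same bytes are rejected by a strict UTF-8 decoder. Note that the declaration can only take effect when the parser receives bytes; text that has already been converted to characters through a `Reader` is not re-decoded by the parser.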
I've been looking for information on this topic on W3C.org, but I haven't found anything so far.

Martin