From: SourceForge.net <no...@so...> - 2003-11-10 21:19:50
Bugs item #839289, was opened at 2003-11-10 08:00
Message generated for change (Comment added) made by hobbs
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=839289&group_id=10894

Category: 29. http Package
Group: 8.4.5
>Status: Pending
>Resolution: Wont Fix
Priority: 7
Submitted By: Vince Darley (vincentdarley)
Assigned to: Jeffrey Hobbs (hobbs)
Summary: http ignores charset in meta tags

Initial Comment:
Many web pages (incl. Tcl'ers Wiki) place charset information in
meta tags, like this:

<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8"></meta>
<title>Graffiti</title>

But the http package ignores these and assumes the text is in
iso8859-1. This means that non-ascii characters are mangled with
something as simple as:

set token [http::geturl http://alphatcl.sourceforge.net/wikit/10]
set contents [http::data $token]

'$contents' should now contain a spanish n~ and an o-umlaut. It
actually contains garbage for both instead:

spanish ene: ñ</p><p>o-umlaut: ö

----------------------------------------------------------------------

>Comment By: Jeffrey Hobbs (hobbs)
Date: 2003-11-10 13:19

Message:
Logged In: YES
user_id=72656

The http package handles the charset parameter correctly when it is
in the headers. The stuff in the doc itself is left up to the user,
and the correct options will handle it right.

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2003-11-10 11:38

Message:
Logged In: YES
user_id=79902

All is not lost.
Try this:

encoding convertfrom identity {{spanish ene: ñ</p><p>o-umlaut: ö}}

----------------------------------------------------------------------

Comment By: Vince Darley (vincentdarley)
Date: 2003-11-10 10:34

Message:
Logged In: YES
user_id=32170

You may well be right (my knowledge/understanding of what ought to be
correct here is certainly lacking!), but it does seem a bit against
the spirit of Tcl to have to go through all that hassle.

If we add '-binary', this also means we are explicitly disabling the
documented charset conversion the package does via http headers! This
would mean an application would actually need to check both the http
headers and the embedded http-equiv stuff! Possible, but a bit messy,
really.

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2003-11-10 10:16

Message:
Logged In: YES
user_id=80530

Perhaps an application that wants to support http-equiv meta tags in
HTML content will need to use the -binary and -progress options of
[geturl] then?

The documented handling of charset refers to the HTTP header, AIUI,
not to any special directive embedded in the content.

Keep in mind that many kinds of content can be transferred using the
HTTP protocol, not just HTML. It seems incorrect to me to put
HTML-specific handling into the http package.

----------------------------------------------------------------------

Comment By: Vince Darley (vincentdarley)
Date: 2003-11-10 10:04

Message:
Logged In: YES
user_id=32170

I think it may be too late by the time an application sees the data.
The data is in utf8 and has been interpreted as iso8859-1 and then
possibly dumped down a channel. It's not clear to me that this is a
non-destructive/reversible operation.

In any case, the http package is documented to handle charsets and
convert to utf-8. I quote:

"charset
The value of the charset attribute from the Content-Type meta-data
value.
If none was specified, this defaults to the RFC standard iso8859-1,
or the value of $::http::defaultCharset. Incoming text data will be
automatically converted from this charset to utf-8.
"

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2003-11-10 08:56

Message:
Logged In: YES
user_id=80530

Isn't this issue something to be taken care of by the application,
and not the http package?

----------------------------------------------------------------------
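
The two workarounds raised in the comments (fetching with -binary and decoding in the application, and recovering text that was already read as iso8859-1) could be sketched roughly as follows. This is only an illustration, not anything the http package provides: the procs charsetToEncoding, metaCharset, fetchHtml and unmangle are hypothetical names, the regular expressions are deliberately simplistic, and the choice to prefer the HTTP header over the meta tag is an assumption. unmangle also assumes the mangled string contains only characters up to U+00FF, which holds when UTF-8 bytes were misread as iso8859-1.

```tcl
package require Tcl 8.4
package require http

# Hypothetical helper: map an IANA-style charset label to a Tcl encoding
# name, falling back to $default when Tcl does not know the encoding.
proc charsetToEncoding {charset {default iso8859-1}} {
    set name [string tolower $charset]
    # Tcl spells "iso-8859-N" as "iso8859-N"; many other names match as-is.
    regsub {^iso-8859-} $name {iso8859-} name
    if {[lsearch -exact [encoding names] $name] < 0} {
        return $default
    }
    return $name
}

# Hypothetical helper: return the charset label found in a <meta> tag,
# or an empty string if there is none.
proc metaCharset {html} {
    set re {<meta[^>]*charset\s*=\s*["']?([-A-Za-z0-9_]+)}
    if {[regexp -nocase $re $html -> charset]} {
        return $charset
    }
    return ""
}

# Fetch raw bytes with -binary so the http package does no conversion,
# then decode using the header charset if present, else the meta tag.
proc fetchHtml {url} {
    set token [http::geturl $url -binary 1]
    set bytes [http::data $token]
    set charset ""
    foreach {key value} [http::meta $token] {
        if {[string equal -nocase $key content-type]} {
            regexp -nocase {charset\s*=\s*"?([-A-Za-z0-9_]+)} $value -> charset
        }
    }
    if {$charset eq ""} {
        set charset [metaCharset $bytes]
    }
    http::cleanup $token
    return [encoding convertfrom [charsetToEncoding $charset] $bytes]
}

# Undo "UTF-8 bytes read as iso8859-1": turn each character back into one
# byte, then decode those bytes as UTF-8.
proc unmangle {text} {
    return [encoding convertfrom utf-8 [encoding convertto iso8859-1 $text]]
}
```

With these definitions, something like "set contents [fetchHtml http://alphatcl.sourceforge.net/wikit/10]" would replace the two-line geturl/data sequence from the initial comment, and "unmangle $contents" could be applied to a string that has already been converted with the wrong charset.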