#3721 http::Event encoding (RSS/UTF-8)

obsolete: 8.4.15
closed-wont-fix
5
2007-06-13
2007-06-08
Roger Niva
No

This snippet of code from http::Event assumes that all non-text content is binary:

if {$state(-binary) || ![string match -nocase text* $state(type)]
        || [string match *gzip* $state(coding)]
        || [string match *compress* $state(coding)]} {
    # Turn off conversions for non-text data
    fconfigure $s -translation binary

We are using the http package to download RSS feeds that are encoded in UTF-8.
MIME type: application/rss+xml

The client's system encoding is also UTF-8.

Since the feed is read in binary mode and never decoded, its strings end up double UTF-8 encoded on the client.
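
The double encoding can be reproduced without any network access. The following is a minimal sketch, not taken from the http package: the sample string and the variable names are made up for illustration, and it only assumes the standard [encoding] command.

```tcl
# A Unicode string as it might appear in a feed (illustrative sample).
set text "caf\u00e9"

# The bytes a server would send for it: UTF-8 encoded.
set wire [encoding convertto utf-8 $text]

# When a channel is read with -translation binary, every byte arrives
# as one character, so $wire stands in for the undecoded body here.
# Treating that value as text and encoding it again (for example by
# writing it to a UTF-8 channel) encodes each byte a second time:
set doubled [encoding convertto utf-8 $wire]

puts [string length $text]     ;# 4 characters
puts [string length $wire]     ;# 5 bytes (the accented letter takes two)
puts [string length $doubled]  ;# 7 bytes (those two bytes were encoded again)

# Decoding the binary body exactly once restores the original text.
set decoded [encoding convertfrom utf-8 $wire]
puts [string equal $decoded $text]  ;# 1
```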

We applied a temporary fix in our client while we wait for a bugfix:

if {$state(-binary) || (![string match -nocase text* $state(type)] && ![string match -nocase *xml* $state(type)])
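
The test in that workaround can be written as a standalone predicate, which makes it easy to try against different Content-Type values. This is only a sketch of the idea: isTextType is a hypothetical name, not a command in the http package, and matching on *xml* is the heuristic from the workaround above rather than a complete classification of media types.

```tcl
# Hypothetical helper: treat text/* and any media type that mentions
# "xml" (application/rss+xml, application/atom+xml, ...) as text.
proc isTextType {type} {
    expr {[string match -nocase text* $type]
          || [string match -nocase *xml* $type]}
}

puts [isTextType application/rss+xml]          ;# 1
puts [isTextType "Text/HTML; charset=utf-8"]   ;# 1
puts [isTextType image/png]                    ;# 0
```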

Discussion

  • Donal K. Fellows

    Logged In: YES
    user_id=79902
    Originator: NO

    set decoded [encoding convertfrom utf-8 [http::data $token]]

     
  • Roger Niva

    Roger Niva - 2007-06-08

    Logged In: YES
    user_id=729740
    Originator: YES

    >set decoded [encoding convertfrom utf-8 [http::data $token]]

    Yes, this is what I first thought of, but I don't think we can assume that every RSS feed is in UTF-8.
    Besides, there are other application/*+xml types that should also be handled in http::Event.

     
  • Donal K. Fellows

    Logged In: YES
    user_id=79902
    Originator: NO

    Well, when you're not certain what is going on with the encoding, you *really* want the data in binary form, and then (after picking apart WTF is going on) to use [encoding convertfrom]. That way you stand a chance of actually getting to a meeting of minds with the webserver (assuming that's not fundamentally confused, which happens unfortunately often...)

     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2007-06-10

    Logged In: YES
    user_id=72656
    Originator: NO

    Doesn't the 'charset' field of the state array deal with this for you? Convert to the named charset when present.

     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2007-06-10
    • status: open --> pending-wont-fix
     
  • Roger Niva

    Roger Niva - 2007-06-11

    Logged In: YES
    user_id=729740
    Originator: YES

    Probably, but isn't that a "private" variable?
    There should, at the very least, exist a proc for returning the content-type so you don't have to muck around with variables that may change in a future version.

     
  • Roger Niva

    Roger Niva - 2007-06-11
    • status: pending-wont-fix --> open-wont-fix
     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2007-06-12

    Logged In: YES
    user_id=72656
    Originator: NO

    The charset key of the state variable is a documented key, along with how to access it, in the http docs.

     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2007-06-12
    • status: open-wont-fix --> pending-wont-fix
     
  • Roger Niva

    Roger Niva - 2007-06-13

    Logged In: YES
    user_id=729740
    Originator: YES

    I bow down to the wisdom of the great elders of tickle.
    Thanks for prompt responses and patience with my ignorance.

     
  • Roger Niva

    Roger Niva - 2007-06-13
    • status: pending-wont-fix --> closed-wont-fix
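
The approach the thread settles on (take the body as binary, then convert it using the charset the server named) might look roughly like the sketch below. charsetToEncoding and decodeBody are hypothetical helper names, the charset mapping only covers the simplest cases, and the http calls are left as comments because they need a network connection and a real $url.

```tcl
# Hypothetical helper: map an IANA charset name to a Tcl encoding name.
# Only handles case differences and ISO-8859-x -> iso8859-x; returns an
# empty string when Tcl has no encoding of that name.
proc charsetToEncoding {charset} {
    set name [string tolower $charset]
    regsub {^iso-} $name iso name
    if {[lsearch -exact [encoding names] $name] >= 0} {
        return $name
    }
    return ""
}

# Hypothetical helper: decode a binary body with the named charset,
# leaving it untouched when the charset is unknown.
proc decodeBody {body charset} {
    set enc [charsetToEncoding $charset]
    if {$enc eq ""} {
        return $body
    }
    return [encoding convertfrom $enc $body]
}

# Stand-in for a binary body as the http package would return it.
set wire [encoding convertto utf-8 "caf\u00e9"]
puts [string equal [decodeBody $wire UTF-8] "caf\u00e9"]  ;# 1
puts [charsetToEncoding ISO-8859-1]                        ;# iso8859-1

# With the http package, usage would be along these lines
# ($url is a placeholder):
#   package require http
#   set token [http::geturl $url -binary 1]
#   upvar #0 $token state
#   set text [decodeBody [http::data $token] $state(charset)]
#   http::cleanup $token
```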