
#11 High ASCII characters in feed generate parsing errors

Status: closed-wont-fix
None
Priority: 5
Updated: 2003-08-27
Created: 2003-07-02
Creator: Curt Lewis
Private: No

When AmphetaDesk retrieves a feed that contains high-ASCII
characters (bytes above 127, which produce special characters
such as an e with an accent mark), it usually reports a
parsing error for that channel.
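To find which bytes in a feed are likely to trigger the error, a quick diagnostic can scan the raw feed for bytes in the 0x80 to 0xFF range and report their offsets. This is a minimal Perl sketch, not part of AmphetaDesk; the sub name `find_high_ascii` is hypothetical, and it assumes the feed is held as an undecoded byte string (as it would be straight off the network or disk).

```perl
use strict;
use warnings;

# Hypothetical diagnostic: return [offset, byte value] for every byte
# >= 0x80 ("high ASCII") in an undecoded feed string.
sub find_high_ascii {
    my ($bytes) = @_;
    my @hits;
    while ($bytes =~ /([\x80-\xFF])/g) {
        # pos() is just past the match, so the byte sits one position back.
        push @hits, [ pos($bytes) - 1, ord $1 ];
    }
    return @hits;    # in scalar context, the number of hits
}

# Example: "caf\xE9" is "cafe" with an accented e in ISO-8859-1.
my $sample = qq{<?xml version="1.0"?><title>caf\xE9</title>};
for my $hit (find_high_ascii($sample)) {
    printf "offset %d: byte 0x%02X\n", @$hit;
}
```

A feed with no hits is plain ASCII and should parse regardless of its declared encoding; any hit means the declaration (or its absence) matters.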

Discussion

  • Morbus Iff

    Morbus Iff - 2003-07-02

    user_id=69804

    And this is the correct behavior. If the feed truly wants to use
    high-ASCII characters, then the producer of the feed needs to
    knowingly declare a character set that supports them in the
    feed's XML declaration. Otherwise, smart quotes, em dashes,
    umlauts, etc., are not allowed as raw high-ASCII bytes in the
    default XML encoding (UTF-8).
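A minimal Perl sketch of the point above, using the core Encode module: the same bytes ("caf" followed by 0xE9, an accented e in ISO-8859-1) are rejected by a strict UTF-8 decode, which is what a parser assumes when no encoding is declared, but accepted once ISO-8859-1 is assumed. The helper name `decodes_as` is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Returns 1 if $input decodes cleanly in $encoding, 0 otherwise.
sub decodes_as {
    my ($encoding, $input) = @_;
    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC stops it from consuming the caller's string.
    my $ok = eval {
        decode($encoding, $input, Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    };
    return $ok ? 1 : 0;
}

# "caf" plus the single byte 0xE9.
my $sample = "caf\xE9";
printf "UTF-8:      %s\n", decodes_as('UTF-8', $sample)      ? 'ok' : 'malformed';
printf "ISO-8859-1: %s\n", decodes_as('ISO-8859-1', $sample) ? 'ok' : 'malformed';
```

Run as-is, it should report the UTF-8 decode as malformed and the ISO-8859-1 decode as ok. Every byte value is valid ISO-8859-1, so declaring that encoding always "succeeds", even when it is not the encoding the feed was really written in.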

     
  • Morbus Iff

    Morbus Iff - 2003-07-02
    • priority: 5 --> 1
    • status: open --> open-wont-fix
     
  • Chad Cloman

    Chad Cloman - 2003-08-23

    user_id=810746

    I have a similar problem with one of my feeds. Here is a
    workaround that works for me, although I cannot guarantee
    it applies in all circumstances. It is for AmphetaDesk
    version 0.93.1.

    In AmphetaDesk::Channels::load_channel() (in Channels.pm),
    add the following code immediately before the call to XMLin():

    # Add encoding for The Register
    if ($channel_filename =~ /register/i) {
        $channel_xml =~ s/\?>/ encoding=\"ISO-8859-1\"\?>/;
    }

    My example is for a specific feed, The Register, that I know
    does not declare its character encoding. You must change
    the search string "register" to match a unique part of the
    local filename for the feed you wish to modify, and you may
    also need to change the encoding from ISO-8859-1 to
    whatever encoding your feed actually uses.
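If the workaround is needed for several feeds, one way to avoid repeating the `if` block is a lookup table from filename fragments to encodings. This is a hypothetical sketch: `%feed_encodings` and `add_feed_encoding` are illustrative names, not AmphetaDesk code, and it reuses the exact substitution from the workaround, so it inherits the same limitations.

```perl
use strict;
use warnings;

# Hypothetical table: a unique part of each feed's local filename,
# mapped to the encoding that feed really uses.
my %feed_encodings = (
    register => 'ISO-8859-1',
    # another_feed => 'windows-1252',
);

# Apply the workaround's substitution for the first matching pattern.
# Hash order is unspecified, so keep the patterns mutually exclusive.
sub add_feed_encoding {
    my ($channel_filename, $channel_xml) = @_;
    for my $pattern (keys %feed_encodings) {
        next unless $channel_filename =~ /\Q$pattern\E/i;
        my $enc = $feed_encodings{$pattern};
        $channel_xml =~ s/\?>/ encoding="$enc"?>/;
        last;
    }
    return $channel_xml;
}
```

Inside load_channel(), it would presumably be called as `$channel_xml = add_feed_encoding($channel_filename, $channel_xml);` just before XMLin(), assuming those are the variable names in scope there, as in the original snippet.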

    The effect of this (highly imperfect) workaround is to
    change the XML declaration from
        <?xml version="xxx"?>
    to
        <?xml version="xxx" encoding="ISO-8859-1"?>
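Part of what makes the substitution imperfect: it rewrites the first `?>` in the file even when the feed has no XML declaration (it could hit an `<?xml-stylesheet ...?>` instruction instead), it adds a second encoding attribute if one is already present, and it puts encoding after standalone, which the XML declaration grammar does not allow. A slightly more defensive Perl sketch follows; `declare_encoding` is a hypothetical helper, not part of AmphetaDesk, and it still trusts you to pick the right encoding.

```perl
use strict;
use warnings;

# Hypothetical helper: make sure the feed declares $enc.
# - Leaves an existing encoding declaration untouched.
# - Inserts encoding right after version="...", so it precedes standalone.
# - Prepends a full declaration if the feed has none.
sub declare_encoding {
    my ($xml, $enc) = @_;
    if ($xml =~ /\A(\s*<\?xml\b[^>]*?)\s*\?>/) {
        my $decl = $1;
        return $xml if $decl =~ /\bencoding\s*=/;    # already declared
        # If there is no version attribute (invalid XML), this is a no-op.
        $xml =~ s/\A(\s*<\?xml\s+version\s*=\s*(["'])[^"']*\2)/$1 encoding="$enc"/;
        return $xml;
    }
    return qq{<?xml version="1.0" encoding="$enc"?>\n} . $xml;
}
```

For example, `declare_encoding(q{<?xml version="1.0" standalone="yes"?><rss/>}, 'ISO-8859-1')` returns the declaration with encoding placed between version and standalone.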

    Use at your own peril!

     
  • Chad Cloman

    Chad Cloman - 2003-08-23

    user_id=810746

    The code fragment in my previous comment is split into two
    lines by the display. It should be one line, with a single
    space between the "/" and the word "encoding", like so:

    s/\?>/ encoding=\"...\"\?>/

     
  • Morbus Iff

    Morbus Iff - 2003-08-27
    • priority: 1 --> 5
    • status: open-wont-fix --> closed-wont-fix
     
  • Morbus Iff

    Morbus Iff - 2003-08-27
    • assigned_to: nobody --> morbus
     
