
#569 Emails and Web pages with non-ASCII Latin-1 characters are not filed

v1.0 (example)
closed-fixed
None
5
2015-09-16
2015-04-03
Ahasuerus
No

Emails and Web pages with non-ASCII Latin-1 characters are not filed by the software. These characters cause a MySQL error at insertion time. A try-except block catches the error, and the filing logic then skips the problem record. Non-ASCII Latin-1 characters are not valid in URLs, but if a user enters them, the software should still escape and file them properly. The underlying technical problem is inconsistent use of XML escaping and encoding. For example, artist and author names are handled as follows: "data = XMLunescape(artist.firstChild.data.encode('iso-8859-1'))", while emails and URLs simply use "address = webpage.firstChild.data".
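One way to remove the inconsistency is to route every filed field (names, emails, URLs) through a single helper that unescapes and encodes. The sketch below is a Python 3 illustration under stated assumptions, not the actual fix. `xml_unescape`, `text_of` and `extract_field` are hypothetical names, and so are the `Submission`, `Author` and `Webpage` element names. It also assumes that the database connection expects Latin-1 (iso-8859-1) bytes. The quoted code in the ticket is Python 2-era, where `.encode()` returns a byte string that `XMLunescape` can still process. In Python 3, the unescaping has to happen on the `str` before encoding.

```python
from xml.dom import minidom
from xml.sax.saxutils import unescape


def xml_unescape(text):
    """Hypothetical stand-in for the software's XMLunescape helper.

    Reverses the standard XML escapes: &amp;, &lt;, &gt;, &apos;, &quot;.
    """
    return unescape(text, {"&apos;": "'", "&quot;": '"'})


def text_of(node):
    """Concatenate all text children of an element (robust to split text nodes)."""
    return "".join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE)


def extract_field(node, encoding="iso-8859-1"):
    """Apply the same treatment to every field: unescape, then encode.

    Assumption: the database expects Latin-1 bytes. A character outside
    Latin-1 raises UnicodeEncodeError here, so the failure surfaces at
    extraction time instead of as a database error.
    """
    return xml_unescape(text_of(node)).encode(encoding)


doc = minidom.parseString(
    "<Submission>"
    "<Author>Jos&#233;</Author>"
    "<Webpage>http://example.com/caf&#233;?a=1&amp;amp;b=2</Webpage>"
    "</Submission>"
)
author = extract_field(doc.getElementsByTagName("Author")[0])
address = extract_field(doc.getElementsByTagName("Webpage")[0])
print(author)   # b'Jos\xe9'
print(address)  # b'http://example.com/caf\xe9?a=1&b=2'
```

With this approach, the author name and the URL follow the same code path, so an accented character in a Web page address is encoded exactly as it would be in a name. `text_of` joins all text children instead of reading `firstChild.data`, so an empty element yields an empty string rather than an `AttributeError`.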

Discussion

  • Ahasuerus

    Ahasuerus - 2015-09-16
    • summary: Emails and Web pages with Unicode characters are not filed --> Emails and Web pages with non-ASCII Latin-1 characters are not filed
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1 +1 @@
    -Emails and Web pages with Unicode characters are not filed by the software. This happens because they cause a MySQL error at insertion time, which is caught by the try-except mechanism, which causes the filing logic to skip the problem record. Unicode characters are invalid in URLs, but the software should be properly escaping and filing them if they are entered. The underlying technical problem is inconsistent use of XML escaping and encoding. For example, artist and author names are handled as follows: "data = XMLunescape(artist.firstChild.data.encode('iso-8859-1'))" while emails and URLs simply use "address = webpage.firstChild.data"
    +Emails and Web pages with non-ASCII Latin-1 characters are not filed by the software. This happens because they cause a MySQL error at insertion time, which is then caught by the try-except mechanism, which causes the filing logic to skip the problem record. Non-ASCII Latin-1 characters are invalid in URLs, but the software should be properly escaping and filing them if they are entered. The underlying technical problem is inconsistent use of XML escaping and encoding. For example, artist and author names are handled as follows: "data = XMLunescape(artist.firstChild.data.encode('iso-8859-1'))" while emails and URLs simply use "address = webpage.firstChild.data"
    
    • assigned_to: Ahasuerus
     
  • Ahasuerus

    Ahasuerus - 2015-09-16
    • status: open --> closed-fixed
     
  • Ahasuerus

    Ahasuerus - 2015-09-16

    Fixed in:

    mod/award_cat_new_file.py 1.5
    mod/award_cat_update_file.py 1.4
    mod/award_type_new_file.py 1.3
    mod/award_type_update_file.py 1.7
    mod/pa_new.py 1.23
    mod/sa_update.py 1.11
    mod/ta_update.py 1.18
    mod/xa_update.py 1.9
    mod/za_update.py 1.6
    

    Installed in r2105-136 on 2015-09-15. Closing.

     
