
#316 HTML codes changed by tv_cat tv_grep tv_sort

none
closed-wont-fix
nobody
tv_cat (2)
5
2014-05-09
2008-03-08
Anonymous
No

I have an xmltv.xml file with HTML codes like &#nnn; or &acute;, and tv_cat and the other tools change these codes to &amp;#nnn; or &amp;acute;.
When the "&" stands alone, they don't change it.

Sorry for my bad English.

Discussion

  • Nick Morrott

    Nick Morrott - 2008-08-26
    • labels: --> tv_cat
     
  • Karl Dietz

    Karl Dietz - 2010-09-27

    Can you post an example?
    The xmltv format doesn't support the usual HTML entities and I don't know a good reason to change that.

     
  • Karl Dietz

    Karl Dietz - 2010-10-26

    Same root cause as #1101376.
    The bugs seem to be rooted in the changing behaviour of XML::Twig, which still
    has quite a few related open bugs over at CPAN (no updates for more than 3
    years now).
    see: https://rt.cpan.org/Public/Dist/Display.html?Name=XML-Twig

    I don't see how we can work around them without requiring a newer
    XML::Twig or ditching XML::Twig in favour of some other library.

     
  • Geoff

    Geoff - 2014-05-09
    > I have a xmltv.xml with html codes like &#nnn or &acute and the tv_cat
    > and other tools change this code by &nnn; or ´
    

     

    This is easily explained by the trite "XML is not HTML" ;-) You are trying to use HTML entities in an XML file.

    If you run your source file through tv_validate_file, you will get an error:

       parser error : Entity 'acute' not defined
    

    Similarly, if you try to open the file in a browser:

       XML Parsing Error: undefined entity
    

     
    The only predefined entities in XML are &quot; &amp; &apos; &lt; and &gt;.
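    To make this concrete, here is a small sketch using Python's standard xml.etree.ElementTree (an expat-based parser) as a stand-in for any conforming XML parser; it is not XML::Twig, and the <title> fragments are hypothetical rather than taken from a real xmltv file. The five predefined entities parse without any DTD, while &acute; is rejected as undefined:

```python
import xml.etree.ElementTree as ET


def parses(doc: str) -> bool:
    """Return True if expat accepts doc as well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# The five predefined XML entities need no DTD at all.
PREDEFINED = "<title>&quot; &amp; &apos; &lt; &gt;</title>"
# &acute; is an HTML entity; XML does not know it without a declaration.
HTML_ENTITY = "<title>caf&acute;</title>"

print(ET.fromstring(PREDEFINED).text)  # " & ' < >
print(parses(HTML_ENTITY))             # False
```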

    To use anything else, it must be defined in the DTD. In other words, the DTD would need a declaration such as

       <!ENTITY acute "&#180;">
    

    for every HTML entity you might possibly use.

    (see http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent )
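    A sketch of what such a declaration does, again using Python's ElementTree rather than the Perl tools: the hypothetical document below declares acute in an internal DTD subset (for illustration only; it is not a proposed change to xmltv.dtd), after which the parser expands &acute; like any other entity:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD subset declares the HTML
# entity name "acute" as U+00B4 (ACUTE ACCENT), written as &#180;.
DOC = """<!DOCTYPE tv [
  <!ENTITY acute "&#180;">
]>
<tv><programme><title>caf&acute;</title></programme></tv>"""

root = ET.fromstring(DOC)
# Once declared, &acute; is replaced by its value during parsing.
title = root.find("./programme/title").text
print(title == "caf\u00b4")  # True
```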

     
    Now, since you haven't defined &acute; as an entity, Twig sees it as just a string of characters: &-a-c-u-t-e-;. And since "&" is a reserved character, it converts that character to the predefined entity &amp;. Hence you end up with "&amp;acute;", as you've seen.

    This isn't a bug; it's what XML::Twig/XML::Writer is supposed to do!
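    The writer side of this can be reproduced with Python's standard library (only an analogy for what the Perl modules do): once "&acute;" is ordinary text rather than an entity reference, any XML serializer has to escape its leading "&":

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Text that merely looks like an entity reference is plain character
# data to a writer, so its "&" is escaped to &amp;.
text = "caf&acute;"
escaped = escape(text)
print(escaped)  # caf&amp;acute;

# ElementTree does the same when serializing an element's text.
elem = ET.Element("title")
elem.text = text
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)  # <title>caf&amp;acute;</title>
```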

     
    It should be possible to use numeric character references such as &#nnn;, but this depends on what has been coded into the library module. It seems XML::Twig doesn't know about &#180; and so (wrongly) behaves as above.
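    For comparison, a conforming XML parser resolves numeric character references itself. This Python sketch (expat via ElementTree, not XML::Twig) shows &#180; and its hexadecimal form &#xB4; both becoming U+00B4, and the serializer writing the character itself when serializing to a Python string:

```python
import xml.etree.ElementTree as ET

# Decimal (&#180;) and hexadecimal (&#xB4;) references to U+00B4.
title = ET.fromstring("<title>caf&#180; caf&#xB4;</title>").text
print(title == "caf\u00b4 caf\u00b4")  # True

# Serializing to a str writes the character, not a reference.
elem = ET.Element("title")
elem.text = "caf\u00b4"
out = ET.tostring(elem, encoding="unicode")
print(out == "<title>caf\u00b4</title>")  # True
```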

     
    Why not simply use the character equivalent of the HTML entity you are trying to use? E.g. 0xB4 in ISO-8859-1, or 0xC2 0xB4 in UTF-8. There is no need to use HTML entities at all in an XML file.
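    If the source data already contains HTML entities, one way to follow this advice is to convert them to plain characters before the file reaches tv_cat. The Python sketch below does that on the raw text; html_entities_to_chars is a hypothetical helper, not part of xmltv, it assumes each entity ends with a semicolon, and the result must then be saved in whatever encoding the file's XML declaration names. The last two lines confirm the byte values quoted above:

```python
import re
from html.entities import name2codepoint

# XML's own predefined entities must stay exactly as they are.
XML_PREDEFINED = {"quot", "amp", "apos", "lt", "gt"}

# Named references such as &acute; (a trailing ";" is assumed).
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_chars(text: str) -> str:
    """Replace HTML named entities with the characters they stand for.

    XML's predefined entities, numeric references and names unknown
    to HTML are left untouched.
    """
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return chr(name2codepoint[name])

    return ENTITY_RE.sub(repl, text)


print(html_entities_to_chars("<title>caf&acute; &amp; more</title>"))
# <title>caf´ &amp; more</title>

# U+00B4 ACUTE ACCENT as raw bytes in each encoding.
print("\u00b4".encode("iso-8859-1"))  # b'\xb4'
print("\u00b4".encode("utf-8"))       # b'\xc2\xb4'
```

    html.unescape is deliberately not used here: it would also turn &amp; and &lt; into bare characters and leave the file malformed.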

     
  • Geoff

    Geoff - 2014-05-09
    • status: open --> closed-wont-fix
    • Group: --> none
     
