I have an xmltv.xml file with HTML codes like &#nnn; or &acute;, and tv_cat and the other tools change them into &amp;#nnn; or &amp;acute;.
When the "&" stands alone, it is not changed.
Same root cause as #1101376.
The bugs seem to be rooted in the changing behaviour of XML::Twig, which still has quite a few related open bugs on CPAN (no updates for more than 3 years now).
See: https://rt.cpan.org/Public/Dist/Display.html?Name=XML-Twig
I don't see how we can work around them without requiring a newer XML::Twig or ditching XML::Twig in favour of some other library.
Can you post an example?
The xmltv format doesn't support the usual HTML entities and I don't know a good reason to change that.
This is easily explained by the trite "XML is not HTML" ;-) You are trying to use HTML entities in an XML file.
If you run your source file through tv_validate_file, you will get an error.
Similarly, you get an error if you try to open the file in a browser.
The only predefined entities in XML are &quot; &amp; &apos; &lt; &gt;
To use anything else, it must be defined in the DTD. In other words, the DTD would need a declaration such as <!ENTITY acute "&#180;"> for every HTML entity you might possibly use (see http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent ).
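To illustrate the rule above with a conforming parser, here is a small sketch using Python's standard-library ElementTree (expat) rather than XML::Twig, so it shows general XML behaviour, not xmltv's own code. The helper name `parses` and the sample `<title>` documents are hypothetical. An undeclared HTML entity is rejected, a numeric character reference needs no declaration, and an entity declared in an internal DTD subset (in the style of xhtml-lat1.ent) is expanded.

```python
import xml.etree.ElementTree as ET


def parses(doc: str) -> bool:
    """Return True if the standard-library (expat) parser accepts doc."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# HTML entities that the document never declares: not well-formed XML.
undeclared = "<title>Caf&eacute; &acute;</title>"

# Numeric character references are always allowed, no declaration needed.
numeric = "<title>Caf&#233; &#180;</title>"

# The same entity declared in an internal DTD subset, like xhtml-lat1.ent does.
declared = '<!DOCTYPE title [<!ENTITY acute "&#180;">]><title>&acute;</title>'

if __name__ == "__main__":
    print(parses(undeclared))  # False
    print(parses(numeric))     # True
    print(ET.fromstring(numeric).text)
    print(ET.fromstring(declared).text)
```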
Now, since you haven't defined &acute; as an entity, Twig sees it as just a string of characters: &-a-c-u-t-e-; And since "&" is a reserved character, it converts that character to the predefined entity &amp;. Hence you end up with "&amp;acute;", as you've seen.
This isn't a bug; it's what XML::Twig/XML::Writer is supposed to do!
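The escaping step described above is the same in any XML serializer. As a sketch, here it is with Python's `xml.sax.saxutils.escape` and ElementTree standing in for the Perl modules (an assumption for illustration; this is not XML::Twig or XML::Writer code). Once "&acute;" or "&#180;" is held as plain text, writing it out turns the leading "&" into "&amp;".

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Text that reached the tool as plain characters: '&', 'a', 'c', 'u', 't', 'e', ';'
text = "&acute;"

# Escaping character data for XML turns the reserved '&' into '&amp;'.
print(escape(text))       # &amp;acute;

# The same happens to a numeric reference that was stored as plain text.
print(escape("&#180;"))   # &amp;#180;

# A serializer applies the same rule when it writes an element's text.
elem = ET.Element("title")
elem.text = text
print(ET.tostring(elem, encoding="unicode"))  # <title>&amp;acute;</title>
```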
It should be possible to use numeric character references such as &#nnn;, but this obviously depends on what has been coded into the library module. It seems XML::Twig doesn't handle these references and so (wrongly) behaves as above.
Why not simply use the character equivalent of the HTML entity you are trying to use? E.g. 0xB4 for ISO-8859-1, or C2 B4 for UTF-8. There is no need to use HTML entities at all in an XML file.
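If the source data arrives with HTML entities, one way to follow this advice is to convert them to real characters before the file reaches the xmltv tools. The sketch below is a hypothetical pre-processing helper (`replace_html_entities` is not part of xmltv); it uses Python's `html.entities.name2codepoint` table of HTML 4 entity names and deliberately leaves the five XML predefined entities, numeric references and unknown names untouched, so the result stays well-formed. It also shows the byte values quoted above for the acute accent (U+00B4).

```python
import re
from html.entities import name2codepoint

# The five entities XML itself defines; these must stay escaped.
XML_PREDEFINED = {"quot", "amp", "apos", "lt", "gt"}

# Named references only; numeric ones like &#180; are already valid XML.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_html_entities(text: str) -> str:
    """Replace HTML named entities with the characters they stand for."""
    def sub(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep XML entities and unknown names as-is
        return chr(name2codepoint[name])
    return ENTITY_RE.sub(sub, text)


if __name__ == "__main__":
    print(replace_html_entities("Caf&eacute; &acute; &amp; &#180;"))  # Café ´ &amp; &#180;
    print("\u00b4".encode("iso-8859-1"))  # b'\xb4'
    print("\u00b4".encode("utf-8"))       # b'\xc2\xb4'
```

Writing the converted text with a UTF-8 (or ISO-8859-1) encoding declaration then gives the tools plain characters, so there is no "&" left for them to escape.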