I have an xmltv.xml with HTML codes like &#nnn; or &acute; and tv_cat and other tools change these codes to &amp;#nnn; or &amp;acute;
When the "&" stands alone, it is not changed.
Sorry for my bad English.
Can you post an example?
The xmltv format doesn't support the usual HTML entities and I don't know a good reason to change that.
same root cause as #1101376
the bugs seem to be rooted in changing behaviour of XML::Twig which still
has quite a bunch of related open bugs over at CPAN (no updates for >3
I don't see how we can work around them without requiring a newer
XML::Twig or ditching XML::Twig in favour of some other library.
> I have a xmltv.xml with html codes like &#nnn; or &acute; and the tv_cat
> and other tools change this code by &amp;#nnn; or &amp;acute;
This is easily explained by the trite "XML is not HTML" ;-) You are trying to use HTML entities in an XML file.
If you run your source file through tv_validate_file you will get an error:
parser error : Entity 'acute' not defined
Similarly, if you try to open the file in a browser:
XML Parsing Error: undefined entity
The only predefined entities in XML are &quot; &amp; &apos; &lt; &gt;
To use anything else it must be defined in the DTD. In other words the DTD would need an
<!ENTITY acute "&#180;">
for every html entity you might possibly use.
(see http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent )
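As a rough illustration of the point about the DTD, here is a small sketch using Python's standard-library parser (an assumption for demonstration only; it is not XML::Twig and not part of xmltv). The sample documents and variable names are made up. The same element fails to parse when &acute; is undeclared and parses once the entity is declared in an internal DTD subset:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: &acute; is used but never declared.
undeclared = "<tv><title>Caf&acute;</title></tv>"

# Same content, but the entity is declared in an internal DTD subset.
declared = (
    '<!DOCTYPE tv [<!ENTITY acute "&#180;">]>'
    "<tv><title>Caf&acute;</title></tv>"
)

try:
    ET.fromstring(undeclared)
    undeclared_error = None
except ET.ParseError as exc:
    # The parser rejects the undeclared entity.
    undeclared_error = str(exc)

# With the declaration present, the entity expands to U+00B4.
title = ET.fromstring(declared).find("title").text

print("undeclared:", undeclared_error)
print("declared:  ", title)
```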
Now, since you haven't defined &acute; as an entity, Twig sees it as just a string of characters: &-a-c-u-t-e-; And since "&" is a reserved character, it converts this character to the predefined entity &amp;. Hence you then have "&amp;acute;" as you've seen.
This isn't a bug; it's what xml twig/writer is supposed to do!
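The escaping step can be sketched with Python's standard library as a stand-in (this is not the XML::Twig or XML::Writer code, just the same general rule applied to a literal string). When a writer is handed the characters &-a-c-u-t-e-; as plain text, the "&" is escaped:

```python
from xml.sax.saxutils import escape

# The writer receives this as literal text, not as an entity reference.
literal_text = "&acute;"

# escape() replaces the reserved "&" with the predefined entity.
written = escape(literal_text)

print(written)
```

A lone "&" gets the same treatment, so escape("Tom & Jerry") yields "Tom &amp; Jerry".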
It should be possible to use numerical character references such as &#nnn; but this obviously depends on what has been coded into the library module. It seems XML::Twig doesn't know about &#180; and so (wrongly) behaves as above.
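For comparison, this is how a numerical character reference behaves in Python's standard-library parser (used here only as a reference point; it says nothing about what XML::Twig itself does). The sample element is hypothetical. The reference needs no DTD declaration and is decoded to the character on input:

```python
import xml.etree.ElementTree as ET

# Numeric character references are part of XML itself; no DTD is needed.
element = ET.fromstring("<title>Caf&#180;</title>")
parsed_text = element.text

# Serialising as ASCII (the default for tostring) writes the character
# back out as a numeric reference rather than escaping the "&".
ascii_output = ET.tostring(element)

print(repr(parsed_text))
print(ascii_output)
```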
Why not simply use the character equivalent of the HTML entity you are trying to use? E.g. 0xB4 for ISO-8859-1, or C2 B4 for UTF-8. There is no need to use HTML entities at all in an XML file.
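One possible way to pre-process a file along these lines, sketched in Python (the function name and sample string are hypothetical, and this is not an xmltv tool): replace named HTML entities with the literal character, while leaving the entities XML already predefines untouched.

```python
import re
from html.entities import name2codepoint

# Entities XML already knows; these must stay escaped.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def replace_html_entities(text):
    """Swap named HTML entities for their literal characters."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave unknown or XML entities alone
        return chr(name2codepoint[name])
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)

fixed = replace_html_entities("<title>Caf&eacute; &amp; bar &acute;</title>")

# The acute accent is a single byte in ISO-8859-1 and two bytes in UTF-8.
latin1_bytes = "\u00b4".encode("iso-8859-1")
utf8_bytes = "\u00b4".encode("utf-8")

print(fixed)
print(latin1_bytes.hex(), utf8_bytes.hex())
```

The output file must then be saved in the encoding named in its XML declaration, otherwise the literal characters will be misread.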