
Extra characters in some titles in XML file

Chris J
  • Chris J

    Chris J - 2016-06-17

    Everyday examples using Freesat EPG data are:-
    CNN Newsroom becomes CNNN Newsroom
    NHK Newsline becomes NHKK Newsline
    CNBC Debate becomes CNNBC Debate

    Also seen in Freesat
    On PBS America a programme about JFK and LBJ became JFK and LBBJJ and JFK Jr became JFK Jrr.

    Others are more random, one-off examples on Freeview:-
    On 5 HD a programme about the KKK had KKKKK in the title
    BBC 4 HD Top of the Pops: 1982 becomes 19822

    4 HD seems the only one to corrupt the episode numbers:-
    Such as Ep 1/3 becomes Ep 11/3 and EP 7/8 becomes EP 77/8

    I have been collecting the info for several weeks trying to detect a pattern, but some occurrences of CNN, for example, are normal. Perhaps the affected Freesat channels have something in common, as the ones I have noticed with a problem are all non-UK. The rest seem to be HD EPG related.

    It's not an important problem, as it has not caused me to miss a recording because of an incorrect title, but I suppose it may.


  • Steve Bickell

    Steve Bickell - 2016-06-17

    As soon as you spot this I need a transport stream dump and an example.

    I then have some chance of tracking it down.

  • Chris J

    Chris J - 2016-06-18

    First, I have included a link to the Freesat dump:-

    Examples of extra characters appearing:-
    All occurrences of "NHK Newsline" are "NHKK Newsline"
    CNN Marketplace becomes CNNN Marketplace
    CNN Newsroom title is OK but is CNNN Newsroom in the description
    CNBC Debate becomes CNNBC Debate
    BBC One West "MOTD Live" becomes "MOOTDD Live"

    I did an EPG update, checked the EPG on the TV and found the above errors. I then checked the EPG Collector Freesat XML file to confirm that all the errors were in the file.

    The BBC One and BBC One HD versions on Freeview display MOTD correctly.
    I will leave the Freeview dump until you have had a look at this.


  • Steve Bickell

    Steve Bickell - 2016-06-30

    Apologies for the delay but I've been very busy lately. Anyway I'm back on the case now.

    The event name is Huffman compressed in exactly the same way as the programme description. I've worked through decompressing some of the titles, and it looks like the multiple letters are actually in the data that is broadcast. The fact that thousands of titles and descriptions are decoded correctly also leads me to that conclusion.

    The episode numbers on FreeSat are broadcast as part of the programme description in various formats.

    The Freeview HD channels are compressed in the same way as FreeSat if I remember correctly.

    Is there any way you can check the data with an STB?

  • Chris J

    Chris J - 2016-06-30

    Thanks, it's not a showstopper but it is a little irritating! It's only been a problem since I bought a DVB-S2 and DVB-T2 tuner several months ago. The first thing I did was look at the Freesat STB EPG, which was always normal. As there were no other complaints, I assumed I must have an incorrect EPG Collector setting somewhere, like several months ago when I should have left the location “Undefined” in Advanced Settings.

    It's the randomness which makes it difficult to trace. The Freesat news channels have regular programmes which cause the problem, but on the Freeview channels there are no regular programmes which cause it. The BBC have had a programme for the last two weeks called MOTD (Match of the Day) but the HD version comes out as MOOTDD! CNN is not always translated to CNNN when it appears in the Freesat EPG.

    The workaround for Freeview HD channels is to map the EPG to use the SD version of the channel data, which is what I have now done.


  • Steve Bickell

    Steve Bickell - 2016-06-30

    I've changed my mind on this. It's a fault in the decompression code when it encounters escape sequences. They are used to deal with character sequences outside of those defined in the Huffman decompression tables. For example, normally in English there is no word where K can follow H so there is no specific entry in the decompression tables for that sequence. In the data stream an escape sequence is used to encode it.

    I put a change in a while ago to fix a problem with escape sequences, and it looks like that change broke the case where the escape sequence applies to only a single byte.

    I'm working on a fix.
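
To illustrate the kind of fault described above, here is a toy Python sketch. It is not the EPG Collector code and it does not use the real broadcast tables: the `SUCCESSORS` table, the `ESC` token and the `buggy` flag are all made up for illustration, and the stream is a list of tokens rather than Huffman-coded bits. It only shows how a decoder that emits an escaped character twice would turn NHK into NHKK while leaving ordinary words alone.

```python
ESC = "ESC"   # hypothetical escape token, not the real on-air marker
START = ""    # context used before the first character

# Toy context tables: previous character -> characters that may follow it.
# A character's position in the string is its "code" in that context.
SUCCESSORS = {
    START: "NC",
    "N": "He",
    "H": "",    # "K" is not listed after "H", so it has to be escaped
    "K": " ",
    " ": "N",
    "e": "w",
    "w": "s",
    "s": "l",
    "l": "i",
    "i": "n",
    "n": "e",
}


def encode(text):
    """Encode text as table indexes, escaping pairs the tables do not cover."""
    tokens, prev = [], START
    for ch in text:
        table = SUCCESSORS.get(prev, "")
        if ch in table:
            tokens.append(table.index(ch))
        else:
            tokens.extend([ESC, ch])  # escape followed by the literal character
        prev = ch
    return tokens


def decode(tokens, buggy=False):
    """Decode a token list; buggy=True emits each escaped character twice."""
    out, prev = [], START
    stream = iter(tokens)
    for tok in stream:
        if tok == ESC:
            ch = next(stream)   # the literal character carried after the escape
            out.append(ch)
            if buggy:
                out.append(ch)  # assumed fault: the literal is emitted again
        else:
            ch = SUCCESSORS[prev][tok]
            out.append(ch)
        prev = ch
    return "".join(out)


tokens = encode("NHK Newsline")
print(decode(tokens))              # NHK Newsline
print(decode(tokens, buggy=True))  # NHKK Newsline
```

In this model only the escaped character is affected, which would be consistent with titles made of ordinary words decoding correctly.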


    Last edit: Steve Bickell 2016-07-01
  • Steve Bickell

    Steve Bickell - 2016-07-01

    Try the attached. It should fix the problem.

  • Chris J

    Chris J - 2016-07-01

    Thanks Steve but it may be Sunday before I am able to install and test it.


  • Chris J

    Chris J - 2016-07-03

    I have installed the update and it seems to have cured the predictable problems on Freesat such as CNN becoming CNNN, NHK becoming NHKK etc. The problems on Freeview were less predictable but the one where MOTD becomes MOOTDD has certainly been cured.

    A big thank you for that, Steve. I will keep an eye on the EPG in case any have slipped through!


  • Steve Bickell

    Steve Bickell - 2016-07-03

    The sequences to watch out for are those where a character pair can't occur normally. For example there are no words that I know of that have T followed by D.

  • Chris J

    Chris J - 2016-07-04

    I now understand that the problem seemed so random because it didn't happen to normal words, only to abnormal combinations of characters.

    I did wonder why no one else noticed the problem, but I presume it's because over 90% of the problem was with Freesat international news channels. It's taken me ages searching the Freeview EPG today to find characters which previously caused a problem.

    Thanks again
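
A manual search like this could be partly automated when a clean reference title is available, for example from the SD version of the same channel. The sketch below is only one possible approach and is not part of EPG Collector; `runs` and `is_stuttered` are hypothetical helper names. It reports whether a suspect title is the clean title with one or more characters emitted twice.

```python
from itertools import groupby


def runs(text):
    """Split text into (character, run length) pairs, e.g. 'CNN' -> C1, N2."""
    return [(ch, len(list(group))) for ch, group in groupby(text)]


def is_stuttered(clean, suspect):
    """True if suspect is clean with at least one character doubled."""
    clean_runs, suspect_runs = runs(clean), runs(suspect)
    if len(clean_runs) != len(suspect_runs):
        return False
    grew = False
    for (c_ch, c_len), (s_ch, s_len) in zip(clean_runs, suspect_runs):
        # Each run may grow, but at most every character in it is doubled.
        if c_ch != s_ch or not (c_len <= s_len <= 2 * c_len):
            return False
        grew = grew or s_len > c_len
    return grew


print(is_stuttered("MOTD Live", "MOOTDD Live"))    # True
print(is_stuttered("CNN Newsroom", "CNN Newsroom"))  # False
```

Comparing runs rather than single characters keeps titles such as CNN, which already contain a repeated letter, from being misjudged.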


  • Chris J

    Chris J - 2016-08-04

    I have noticed another Freesat and Freeview HD corruption in the XML file, connected with the UK pound sign, where the digit following the "£" is repeated. So £400,000 becomes £4400,000 and £300k becomes £3300k. I upgraded to FP5 but it makes no difference.

    There is an obsession with house buying programmes in the UK, so it happens a lot in the Description but only once in the Title, which is where I noticed it, so it is a very minor problem!
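
Candidates for this pound sign corruption can be picked out of titles and descriptions with a regular expression. This is a rough heuristic sketch, not part of EPG Collector, and `POUND_STUTTER` and `pound_suspects` are hypothetical names. It will also flag genuine prices such as £5500, so the matches still need checking by eye.

```python
import re

# A pound sign followed by the same digit twice, e.g. "£44" in "£4400,000".
POUND_STUTTER = re.compile(r"£(\d)\1")


def pound_suspects(texts):
    """Return the texts that contain a pound sign followed by a doubled digit."""
    return [text for text in texts if POUND_STUTTER.search(text)]


samples = [
    "A house for £4400,000",
    "A house for £400,000",
    "Budget of £3300k",
]
print(pound_suspects(samples))
```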


  • Steve Bickell

    Steve Bickell - 2016-08-05

    I'll take a look.

