From: Ed A. <ed...@me...> - 2003-04-28 19:22:15
On Mon, 28 Apr 2003, Christoph Päper wrote:

>Just a very minor thing: ISO 8601 uses CCYY instead of YYYY.

What I meant was 'a four digit year'.  Is CCYY rather than YYYY the
preferred terminology for saying this?

>Is it an option to make it fully standard conform, i.e. with
>optional dashes, omittable values and a "T" between date and time?
>Maybe even day of year, day of week notation.

I don't think this is needed.  The format does 'conform' to ISO 8601
in the sense that legal dates in the XMLTV file are always ISO 8601
dates; however the dual, that any ISO 8601 date is legal in XMLTV,
isn't so necessary.  I would like to have a single date format to get
towards a property of canonicity - that is, two different XMLTV files
have different meanings, after stripping comments and whitespace.  It
isn't necessary to have multiple ways to express the same time.

><timeslot start="20:00:00" channel="das-erste.de" liveness="joined">

No, definitely not that.  The full date and time for every timeslot.
Otherwise it gets much too complicated especially when filtering
listings or sorting them.  (Another advantage of having a single date
format is that it's quick to sort listings into date order.)

>>Within programme data, textual element content is normally free text
>>and has an optional 'lang' attribute associated with it (which should
>>look like 'en' or 'en_US').
>
>Why not using the generic xml:lang

Can you explain more about this?  I admit I am no expert on XML.

>and preferably "en-US" (with dash, not underscore)?

I was wondering about that but I couldn't find the definitive standard
to say whether it should be - or _.  I thought it was ISO 3316 but
that turned out, Kryten-like, to be 'assembly tools for screws and
nuts'.  Perhaps ISO 3166 is the one I want, but ISO standards don't
seem to be on the web.  Ah, it looks like RFC 1766 is the one.  Yes,
it does specify hyphen not underscore.  I have fixed this.
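
Going back to the remark above that a single date format makes it
quick to sort listings into date order, here is a minimal sketch in
Python.  It assumes, purely for illustration, that timestamps use the
ISO 8601 basic form 'YYYYMMDDhhmmss' with no separators; the thread
does not spell out the exact layout, and the function names and sample
listings are made up.

```python
from datetime import datetime

# Assumption for illustration only: ISO 8601 basic form, fixed width,
# most significant field first. The exact XMLTV layout is not given here.
STAMP_FORMAT = "%Y%m%d%H%M%S"


def sort_by_start(timeslots):
    """Sort (start, title) pairs by comparing the start strings directly."""
    return sorted(timeslots, key=lambda slot: slot[0])


def sort_by_parsed_start(timeslots):
    """Reference ordering: parse every start string into a datetime first."""
    return sorted(
        timeslots,
        key=lambda slot: datetime.strptime(slot[0], STAMP_FORMAT),
    )


# Hypothetical sample data.
listings = [
    ("20030428200000", "News"),
    ("20030427233000", "Late film"),
    ("20030428060000", "Breakfast show"),
]

# With one fixed-width format, plain string order and date order agree,
# so no date parsing is needed just to sort.
assert sort_by_start(listings) == sort_by_parsed_start(listings)
```

If several notations were allowed (optional dashes, omitted fields,
day-of-year forms), the plain string comparison would no longer be
enough and every consumer would have to parse before sorting.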
>> <generator-info href="http://membled.com/work/apps/xmltv/"; />
>
>Is that semicolon intentional?

No, it must have crept into the message by accident (it's not in the
DTD file in CVS).

>> <!ELEMENT tv (about?, (channel+, timeslot*)?)>
>
>xmlns?

Again I am not au fait with XML namespaces, I have just ignored them
until now.  Can you recommend an introductory document I should read?

>>However it makes things like diffing easier if you write the
>>channel elements sorted by ASCII order of their ids.
>
>Ignoring the prepended "C", I presume.

I don't think it matters - if we assume that channel ids are all
lowercase like DNS names, then ASCII order with 'C' before digits is
the same as without it!  Hmm, maybe it would be simpler if 'C' were
required for _all_ channel ids.  But uglier, in a way.

>> <!ELEMENT channel (old-id*, display-name+, number*, icon*, link*)>
>> <!ELEMENT old-id EMPTY>
>
>Do you really need to allow multiple 'old-id's, could be an attribute to
>'channel' else.

I originally had an <id> element which could appear one or more times,
and no 'id' attribute in <channel>.  Then I realized this made the
file format too bulky for the common case where channels have just one
id, so I went back to an 'id' attribute (as in the old XMLTV format)
and created this optional <old-id> thing.

'Exactly one' id is okay, but 'one or two' seems a bit wrong - it
should be 'any number'.  So an element is needed because it can be
repeated.  In a way it's good that the old-ids are ugly, because they
are meant to be a transitional measure while you update all your data
sources to use the new channel id.

>> <!ATTLIST icon src    CDATA #REQUIRED
>>                width  CDATA #IMPLIED
>>                height CDATA #IMPLIED>
>
>'type' for MIME-Type? If a UA doesn't know the type, it doesn't have
>to try to open the resource.

<img> in HTML does not have MIME type, but I see looking at the spec
that <object> does.  (This seems mistaken to me - a URL could have
different MIME types because of content negotiation.  But I suppose it
does save some time in downloading.)  What then is the rationale for
including 'type' with <object> but not with <img>?  Which of the two
should I take as an example?

Since the icon is definitely an image, and since in practice it will
always be GIF or PNG or some other common format all browsers can
render, I don't think 'type' needs to be added.  But if <img> in HTML
changes then I will update <icon> accordingly.

>>In HTML links have a URL ('href') and associated link text.  Our links
>>are a bit more general because listings sources don't always have link
>>text, or they may have both some short link text and some extra
>>description.  So the 'text' element gives link text to be underlined
>>in a browser, and 'desc' gives a longer description of what you might
>>find on the visited page.
>
>Could have been an attribute, like HTML's 'title', though.

An attribute isn't really appropriate because the text tends to be
fairly long (longer than the underlined link text itself), and more
importantly should be labelled with a language like the other
human-readable text in XMLTV.  I am trying to use element content for
free text and attributes only for machine-readable things or values
from a limited range.

>>'encryption' is an application-defined value,
>
>Will probably get the name of the scrambling system used?

I'm not sure yet. - No, it won't be the scrambling system because that
applies to a whole channel and the 'encryption' attribute is meant to
indicate encryption for a particular timeslot (like pay-per-view
sporting events).  The assumption is that you can receive the channel
normally but certain programmes might be encrypted more strongly.

>How's 'encryption=""' to be handled?

Well like I said it's an application-defined value :-).  I suppose
this would not make much sense.  Maybe there is a need to indicate
'less encryption' than normal, as when a pay-TV channel broadcasts in
the clear for a couple of days.
Maybe encryption="none" could be used for that?

>> The 'code-time' element is to support systems like PDC
>
>A.k.a. VPS, I guess.

I think they are different systems, but I have added a mention of VPS
to the DTD.

>> The 'code-num' element is for systems like VideoPlus
>
>A.k.a. ShowView, I guess.

Yes, or VCRPlus.  OK, I will add these synonyms.

>Too sad it's not (conveniently) possible to insert ad breaks,
>although of course no current programme listing provides that data.

There could be an <advert> element that can nestle in alongside
<programme> and <unknown>, but as you say no source provides the data.
(And you'd need some way to split a programme into parts, we have
episode-details/part but what if that is already set?)  Since no real
TV listing includes this information, I won't add it to XMLTV for now.

I have thought about a <length> element which gives the true length of
the programme (so length + adverts == timeslot), or alternatively a
<breaks> which states how much of the timeslot is wasted.  This was in
the older version of the DTD but I don't know if any real listings
source gives the information.

>['repeat' / 'premiere' / 'new-show']
>
>'pilot' would obviously indicate the very first episode (or episode
>zero) of a show, regardless if it has been shown on this channel or
>elsewhere.
>'season-pilot' and 'season-finale' are similarly obvious.
>You may think this was a duplication / shorthand for what is
>possible with 'episode-details', but the airing definition of
>"pilot" and "finale" can differ from the filming definition.

Isn't some of this covered by 'repeat'?  So do you mean that a
listings guide might mark a show as 'the pilot episode', even though
it has already been shown many times before, and wasn't actually the
pilot episode for the production company?
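
On the <length> / <breaks> idea above, the relation
'length + adverts == timeslot' can be sketched as follows.  This is
only an illustration: it assumes each timeslot has both a start and a
stop stamp in a 'YYYYMMDDhhmmss' layout and that <length> is given in
minutes, none of which is settled in this thread, and the function
name is hypothetical.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout; not confirmed anywhere in this thread.
STAMP_FORMAT = "%Y%m%d%H%M%S"


def break_time(start, stop, length_minutes):
    """Return the part of a timeslot not filled by the programme itself.

    start, stop    -- hypothetical timeslot boundaries as stamp strings
    length_minutes -- true programme length, as a <length> might give it
    """
    slot = datetime.strptime(stop, STAMP_FORMAT) - datetime.strptime(
        start, STAMP_FORMAT
    )
    length = timedelta(minutes=length_minutes)
    if length > slot:
        # A programme cannot be longer than the slot it is shown in.
        raise ValueError("programme length exceeds its timeslot")
    # length + adverts == timeslot, so adverts == timeslot - length.
    return slot - length


# A one-hour slot holding a 44-minute programme leaves 16 minutes.
wasted = break_time("20030428200000", "20030428210000", 44)
assert wasted == timedelta(minutes=16)
```

Either of <length> or <breaks> determines the other once the timeslot
boundaries are known, so only one of the two would need to be stored.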
>To help identify the meaning of 'repeat' and 'premiere', how about an
>attribute to specify whether a programme was produced
>
> a) by the channel itself or exclusively by a paid production company,
> b) by the channel in cooperation with some other channel(s),
> c) by a foreign channel, then bought,
> d) for cinema?

Are there any listings sources with this information?  If so, I might
add it, but I don't see how it helps with 'repeat' and 'premiere'.

>And maybe a boolean 'syndication' attribute.

So if the show is being broadcast by a single national channel,
syndication="no", but if it's sold to a large number of local
stations, syndication="yes"?

>I'm not sure that boolean attributes are better than a space or comma
>separated list of values of *one* attribute.
>
><!ATTLIST programme
>  repeat        (yes | no) #IMPLIED
>  premiere      (yes | no) #IMPLIED
>  last-chance   (yes | no) #IMPLIED
>  new-show      (yes | no) #IMPLIED
>  syndication   (yes | no) #IMPLIED
>  pilot         (yes | no) #IMPLIED
>  season-pilot  (yes | no) #IMPLIED
>  season-finale (yes | no) #IMPLIED
>  produced      (channel | coop | foreign | cinema) #IMPLIED
>>
>
>or
>
><!ATTLIST programme
>  repeat   (premiere || repeat || syndication || last-chance || new-show) #IMPLIED
>  pilot    (test | show | season | no) #IMPLIED
>  finale   (show | season | no) #IMPLIED
>  produced (channel | coop | foreign | cinema | unknown) #IMPLIED

Yes that makes more sense.  I haven't seen that || before, does it
mean a comma-separated list of zero or more of the values given?
(None of the DTD tutorials I looked at mentioned it.)

I will change some of this stuff, but not immediately.

>> <!ELEMENT director (#PCDATA)>
>All these and others, like 'actor', should have an optional 'href'
>attribute, e.g. to IMDb.

Makes sense.  But this seems inconsistent with the <link> element used
elsewhere, which I created partly to allow multiple links for a
programme.  Ah, what the heck.  I'll just put in an 'href' attribute
and require at most one link for each person - it's not as if any
listings source will give several (although they do sometimes give
several links related to a programme).

>Or should one use XLink?

I think I'll pass on that, href is simple and good enough.  XHTML
doesn't use XLink does it?

>> <!ELEMENT role (character?, actor)>
>
>Better make actor optional also, e.g. in cartoons you may have the
>characters, but not the people who speak them.

I don't know of any listings source that gives a formal listing of
characters in cartoons - but anyway, it is a simple change so okay.
I have moved the 'guest' attribute to <actor> (unless there is a need
to track 'guest characters' as well).

>> <!ATTLIST role guest (yes | no) #IMPLIED>
>
>Extending this to allow 'guest-star', 'guest', 'extra', 'cameo'
>would probably be overkill, no?

If there is a real listings source that provides the information then
I will add it.

>> <!ELEMENT year EMPTY>
>> <!ATTLIST year production CDATA #IMPLIED
>>                release    CDATA #IMPLIED>
>> <!ELEMENT production (#PCDATA)>
>> <!ELEMENT release (#PCDATA)>
>
>Decide between attributes and elements.

Oops - those two elements were just leftovers.

>> <!ELEMENT category (text | category-code)>
>
>Is 'code' already taken elsewhere?

No but we have <language-code> as well, it seems unfair to allow just
one of them to take <code>.

>>FIXME do we need 'length'?
>Yes. Makes it easier to see whether the network edited a certain
>airing to surpass youth protection regulations.

Ah but <length>, as currently present in the older format, specifies
the length of the programme as broadcast.  Which fits in with the rest
of the format - we describe what is shown, not what could have been
shown or what was cut out.  However, if a listings source provides
info on 'edited for language' or whatever then there could be some
meta-information for that.
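
Returning to the credits changes agreed above (an optional 'href' on
people, <actor> made optional inside <role>, and 'guest' moved to
<actor>), here is a rough sketch of how a reader of the format might
cope with them.  The <credits> wrapper element, the sample names and
the example.org URL are all invented for the illustration; only the
element and attribute names discussed above come from the DTD draft.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <credits> wrapper and all values are made up.
SAMPLE = """
<credits>
  <director href="http://example.org/people/director">A. Director</director>
  <role><character>Narrator</character></role>
  <role>
    <character>Detective</character>
    <actor guest="yes">B. Actor</actor>
  </role>
</credits>
"""


def summarise(credits_xml):
    """Return (people, roles) extracted from a credits fragment.

    people -- list of (kind, name, href-or-None)
    roles  -- list of (character, actor-or-None, is_guest)
    """
    root = ET.fromstring(credits_xml)

    people = []
    for director in root.findall("director"):
        # 'href' is optional, so .get() may return None.
        people.append(("director", director.text, director.get("href")))

    roles = []
    for role in root.findall("role"):
        character = role.findtext("character")
        actor = role.find("actor")  # None when only the character is known
        name = None if actor is None else actor.text
        is_guest = actor is not None and actor.get("guest") == "yes"
        roles.append((character, name, is_guest))

    return people, roles


people, roles = summarise(SAMPLE)
```

A role with a character but no actor, as in the cartoon case, simply
comes back with None in the actor position.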
>><!ELEMENT series (title*)>
>
>I'm not a native English speaker, but isn't "season" less
>ambiguous?

In a trousers-and-underpants kind of way, yes.  I'll change it.

><!ATTLIST quality resolution-x CDATA #IMPLIED
>                  resolution-y CDATA #IMPLIED
>                  mode (progressive | interleaved | p | i) #IMPLIED
>                  bit-rate CDATA #IMPLIED -- etc. -->

Do there exist channels where things like interlace vary from one
timeslot to another?  Do any listings sources carry this per-timeslot
information?  If not, I'd prefer to keep the DTD simpler and just
allow the single string for 'quality'.

>It's "(no-video | video?)" in 'programme', are you sure there'll
>never be a way to allow video streams of different quality at the
>same time? E.g., when a channel is aired both digital and analog,
>the analog signal may be 4:3 & stereo, but the digital one 16:9 &
>Dolby.

But they are not alternatives in the same way as different audio
streams, because your receiver can probably only view one or the
other, and they both have the same content except that one is at a
worse quality.

The intention was, like <encryption>, to mark out timeslots that
differ in quality from what is 'usually' on that channel.  Like a TV
listing that occasionally marks programmes as 'widescreen' or
'black-and-white'.  Do you think it is needed to mark the video
settings of everything broadcast on the channel?

If you want data on every timeslot, and not just those which are
'different', then I'm inclined to say that the 4:3 and 16:9 versions
should be two separate channels.

>> <!ELEMENT audio (language*)>
>> <!ATTLIST audio channel   CDATA #IMPLIED
>>                 polyphony (mono | stereo | quad | surround) #IMPLIED>
>
>What about movies with (optional) extra descriptions for the blind?

Is this an extra audio track that is mixed with the existing audio?
Or is it a separate track you can choose that has the ordinary
soundtrack and then some extra?  I suppose it doesn't matter, we can
just model it as the latter case.  Then I suppose an 'extra-for-blind'
attribute or element could be added under <audio>.  But do you know of
listings sources that provide this information?

> <audio channel="A">
>  <language><language-code code="en-GB"/></language>
> </audio>
> <audio channel="A">
>  <language><language-name xml:lang="en">English audio descriptions
>  </language-name></language>
> </audio>
>
>This way?

No I think the language-name would be English but there would be some
separate way to say 'includes extra descriptions'.  (And the two
channel attributes would have to be different, or else just omit
them.)

>Does 'teletext' include closed captioning (CC), used e.g. in the US?

I understand that 'open captioning' is superimposed on the picture
while 'closed captioning' is text you can choose to view.  So yes, and
I will document this in the DTD.

><!ATTLIST classification system CDATA #IMPLIED
>                         age    CDATA #IMPLIED>

Probably not worth it since you can see that a classification 'PG12'
means 12 years of age.

><!ELEMENT star-rating (todd?)>
><!ELEMENT todd EMPTY -- tip of the day -->

Not sure about this, it seems wrong somehow because exactly the same
content might be 'tip of the day' once, but when shown at a different
time not be so tipped.  I'd prefer to add a <desc type="review" /> (if
we add types to <desc>) saying 'this is the tip of the day'.
Otherwise where does it end - name of reviewer, which publication?

-- 
Ed Avis <ed...@me...>