pts, was Re: [xine-devel] FLAC demuxer/decoder

On Thu, 16 Jan 2003, John McCutchan wrote:

> Yes I think that is correct, I don't really know anything about xine
> so I am not sure how to calculate the pts. This same code appeared in
> many demuxers so I copied it. What exactly does a pts represent?

	pts stands for presentation timestamp. In MPEG jargon, the pts is
the time that a piece of decoded data (either a video or audio buffer) is
supposed to be presented to the output unit (monitor or speakers).

	xine handles pts in the MPEG tradition: in reference to a 90 kHz
clock. For example:

  pts      0 is 0 sec
  pts  90000 is 1 sec
  pts 135000 is 1.5 sec
  pts 180000 is 2 sec
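
To make the table concrete, here is a minimal Python sketch of the
conversion in both directions. The names PTS_CLOCK_HZ, pts_to_seconds and
seconds_to_pts are made up for illustration; they are not xine functions.

```python
# Hypothetical helpers, not part of xine: convert between pts units and seconds.
PTS_CLOCK_HZ = 90000  # the 90 kHz MPEG reference clock


def pts_to_seconds(pts):
    """Return the presentation time in seconds for a pts value."""
    return pts / PTS_CLOCK_HZ


def seconds_to_pts(seconds):
    """Return the pts value (rounded to a whole clock tick) for a time in seconds."""
    return round(seconds * PTS_CLOCK_HZ)


if __name__ == "__main__":
    for pts in (0, 90000, 135000, 180000):
        print(pts, "is", pts_to_seconds(pts), "sec")
```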

Let's take some video examples. Suppose that the video in a file is
supposed to be played at 15 frames/second. At what pts should frame 0 be
displayed? 0, that's the easy one. How about frame 1?

   1       pts
  ---  =  -----  =>  pts = 1 * 90000 / 15 = 6000
   15     90000

Frame 2 is presented at pts 12000, frame 3 at 18000, and so on. Frame n
will be presented at (n * 90000 / 15) for this 15 fps file. Some files have
variable framerates, though, and this fixed-rate formula does not cover
them.
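
The same arithmetic as a short Python sketch, assuming a constant
framerate. The function name video_frame_pts is hypothetical, and using
integer division (which truncates when 90000 * n is not a multiple of the
framerate) is my own choice here, not something xine prescribes.

```python
# Hypothetical helper, not part of xine: pts of frame n at a constant framerate.
PTS_CLOCK_HZ = 90000  # the 90 kHz MPEG reference clock


def video_frame_pts(frame_number, frames_per_second):
    """Return the pts at which frame `frame_number` (0-based) is presented."""
    # n / fps seconds, expressed in 90 kHz ticks; truncates any fraction.
    return frame_number * PTS_CLOCK_HZ // frames_per_second


if __name__ == "__main__":
    for n in range(4):
        print("frame", n, "-> pts", video_frame_pts(n, 15))
```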

	For audio you will be sending a buffer of PCM samples. Suppose the
audio is mono, 8 bits, 44100 Hz. 1 second of audio will be 44100 bytes
long. When dispatched in reference to the 90 kHz clock, these 44100 bytes
correspond to 90000 pts units. So if you have a stream of PCM data with
the properties stated above:

  # of bytes      pts
  ----------  =  -----
     44100       90000

But the formula does not always hold. If the same data is stereo, there
will be twice as much data (one 8-bit sample per channel) in 1 second:

   # of bytes
   ----------
       2            pts
  ------------  =  -----
     44100         90000

Further, if the audio were 16 bits instead of 8 bits, an individual audio
frame will be 4 bytes instead of 2:

   # of bytes
   ----------
       4            pts
  ------------  =  -----
     44100         90000

Wait, what's an audio frame? Well, this audio has a sample rate of 44100
samples/second. An audio frame is the amount of data that is being sent
out at each of those 44100 sample points:

  properties:       frame size:
  8 bits, mono    = 1 byte
  8 bits, stereo  = 2 bytes
  16 bits, mono   = 2 bytes
  16 bits, stereo = 4 bytes

BTW, you may have spotted that in the audio decoder you have to specify
the number of frames in the audio buffer where you are putting the decoded
PCM for dispatch. The number of frames will be the total number of bytes
divided by the frame size. 
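
As a rough Python sketch of the frame size table and the frame count, with
made-up helper names (audio_frame_size and frames_in_buffer are not xine
functions) and assuming whole-byte sample sizes:

```python
# Hypothetical helpers, not part of xine.


def audio_frame_size(bits_per_sample, channels):
    """Return the size in bytes of one audio frame (one sample per channel)."""
    # Assumes bits_per_sample is a multiple of 8 (8 or 16 in the table above).
    return (bits_per_sample // 8) * channels


def frames_in_buffer(num_bytes, frame_size):
    """Return how many complete audio frames fit in `num_bytes` of PCM."""
    return num_bytes // frame_size


if __name__ == "__main__":
    for bits, channels in ((8, 1), (8, 2), (16, 1), (16, 2)):
        print(bits, "bits,", channels, "channel(s):",
              audio_frame_size(bits, channels), "byte(s) per frame")
```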

So now our general audio pts calculation becomes:

   # of bytes
   ----------
   frame size       pts                90000 * # of bytes
  ------------  =  -----  =>  pts = ------------------------
   sample rate     90000            frame size * sample rate
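
Here is that general formula as a small Python sketch. The name audio_pts
is hypothetical, and the integer division (which drops any fractional
tick) is my assumption rather than a xine rule.

```python
# Hypothetical helper, not part of xine: pts units covered by a run of PCM bytes.
PTS_CLOCK_HZ = 90000  # the 90 kHz MPEG reference clock


def audio_pts(num_bytes, frame_size, sample_rate):
    """Return the pts units that `num_bytes` of PCM audio correspond to.

    frame_size is bytes per audio frame (e.g. 4 for 16-bit stereo),
    sample_rate is frames per second (e.g. 44100).
    """
    return PTS_CLOCK_HZ * num_bytes // (frame_size * sample_rate)


if __name__ == "__main__":
    # One second of audio is 90000 pts units regardless of the format.
    print(audio_pts(44100, 1, 44100))    # 8 bits, mono
    print(audio_pts(176400, 4, 44100))   # 16 bits, stereo
```

Multiplying before dividing keeps the arithmetic in integers and avoids
losing precision on short buffers.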

All of that covers decoded PCM audio. With VBR audio, either all
variable-length chunks are going to decode to the same number of PCM
samples or there is going to be some way to determine the variable size of
the decoded audio from an individual block. Either way, you will likely
keep a running tally of total decoded PCM bytes in either the demuxer or
the decoder and run this calculation before dispatching the final audio
buffer.
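
A running tally along those lines might look like the Python sketch below.
The class name AudioPtsTally and its method are invented for illustration.
I am also assuming that a buffer's pts is the time at which its first
sample should play, so it is computed from the bytes decoded before that
buffer.

```python
# Hypothetical sketch, not xine code: track total decoded PCM bytes and
# derive a pts for each outgoing audio buffer.
PTS_CLOCK_HZ = 90000  # the 90 kHz MPEG reference clock


class AudioPtsTally:
    def __init__(self, frame_size, sample_rate):
        self.frame_size = frame_size      # bytes per audio frame
        self.sample_rate = sample_rate    # frames per second
        self.total_bytes = 0              # decoded PCM bytes dispatched so far

    def pts_for_next_buffer(self, decoded_bytes):
        """Return the pts for a buffer of `decoded_bytes`, then count it."""
        # Assumption: the pts marks the start of the buffer, so only the
        # bytes already dispatched contribute to it.
        pts = (PTS_CLOCK_HZ * self.total_bytes
               // (self.frame_size * self.sample_rate))
        self.total_bytes += decoded_bytes
        return pts


if __name__ == "__main__":
    tally = AudioPtsTally(frame_size=1, sample_rate=44100)
    # Blocks that decode to different amounts of PCM, as with VBR audio.
    for size in (22050, 44100, 22050):
        print(size, "bytes -> pts", tally.pts_for_next_buffer(size))
```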

	That was a long explanation. But now that I have typed it out it
should probably go in the Hacker's Guide. Go on, ask another question...:)

	Hope this helps...
--
	-Mike Melanson