From: Ashendra S. <ash...@sp...> - 2004-06-29 10:43:42
|
Hi, I was wondering if libmpeg2 is threaded at all, in order to take advantage of SMP machines for example. From what I've been able to find out, the library itself seems to be unthreaded. Threading might make a difference in the MC (motion compensation) parts of decoding high-resolution MPEG-2. Any ideas or opinions? -- Ashendra Singh |
From: Ashendra S. <ash...@sp...> - 2004-07-08 14:31:38
|
Hi, I've found an old libmpeg2 patch by Patric Ljung and Gernot Ziegler. They managed to split an MPEG-2 frame into variable-sized horizontal strips and then create multiple threads to decode each strip (the slice_region function in decode.c). Would it be possible to apply a similar scheme to libmpeg2 as it appears in xine-lib? I've already explored the possibility of "upgrading" libmpeg2 within xine to a newer version, but this is a serious undertaking (and beyond my abilities). Thanks, Ashendra |
From: Barry S. <bar...@on...> - 2004-07-29 10:50:53
|
What situation will threading libmpeg2 help the performance of? If I have a single CPU, will this just drop performance? Barry > -----Original Message----- > From: xin...@li... > [mailto:xin...@li...]On Behalf Of Ashendra > Singh > Sent: 08 July 2004 15:32 > To: Xine List > Subject: [xine-devel] Threading libmpeg2 > > > Hi, > > I've found an old libmpeg2 patch by patric ljung and gernot > ziegler. They've > managed to split a mpeg2 frame into variable sized horizontal > strips and then > creating multiple threads to decode each strip (the slice_region > function in > decode.c). Would it be possible to apply a similar scheme to > libmpeg2 as it > appears in xinelib? I've already explored the possibility of "upgrading" > libmpeg2 within xine with a newer version but this is a serious > undertaking > (and beyond my abilities). > > Thanks, > Ashendra |
From: Michael R. <mr...@us...> - 2004-07-29 11:01:23
|
Hi Barry, > What situation will threading libmpeg2 help the performance of? It might help on a shared memory multiprocessor machine. > If I have a single CPU will this just drop performance? The multiprocessor version will definitely have a performance penalty on single CPU machines, but this can be reduced to maybe one additional comparison per frame (which would choose whether to use the uniprocessor or multiprocessor version of the decoding function), so the impact would be very low. Michael -- panic("sun_82072_fd_inb: How did I get here?"); 2.2.16 /usr/src/linux/include/asm-sparc/floppy.h |
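The "one additional comparison per frame" could look roughly like the sketch below. It is only an illustration: frame_job_t, decode_frame_uni, decode_frame_smp and pick_decoder are hypothetical names made up for this example and are not part of libmpeg2, and the two decode functions are empty placeholders that merely record which path ran.

```c
#include <assert.h>

/* Hypothetical stand-ins: none of these names exist in libmpeg2. */
typedef enum { PATH_NONE, PATH_UNI, PATH_SMP } decode_path_t;
typedef struct { decode_path_t path_taken; } frame_job_t;
typedef void (*decode_fn)(frame_job_t *);

/* Placeholder for the unchanged single-threaded decoder. */
static void decode_frame_uni(frame_job_t *job) { job->path_taken = PATH_UNI; }

/* Placeholder for a decoder that would hand slices to worker threads. */
static void decode_frame_smp(frame_job_t *job) { job->path_taken = PATH_SMP; }

/* The one extra comparison per frame: pick the decode path. */
static decode_fn pick_decoder(int num_workers)
{
    assert(num_workers >= 1);
    return (num_workers > 1) ? decode_frame_smp : decode_frame_uni;
}

/* Decode one frame and report which path was used. */
static decode_path_t decode_one_frame(int num_workers)
{
    frame_job_t job = { PATH_NONE };
    pick_decoder(num_workers)(&job);
    return job.path_taken;
}
```

With num_workers fixed at 1, as it would be on a single-CPU board, every frame takes the unchanged uniprocessor path and pays only for the comparison.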
From: Barry S. <bar...@on...> - 2004-07-30 12:16:13
|
> -----Original Message----- > From: xin...@li... > [mailto:xin...@li...]On Behalf Of Michael > Roitzsch > Sent: 29 July 2004 12:01 > To: xin...@li... > Cc: Barry Scott > Subject: Re: [xine-devel] Threading libmpeg2 > > > Hi Barry, > > > What situation will threading libmpeg2 help the performance of? > > It might help on a shared memory multiprocessor machine. > > > If I have a single CPU will this just drop perfromance? > > The multiprocessor version will definitely have a performance penalty on > single CPU machines, but this can be reduced to maybe one additional > comparison per frame (which would choose whether to use the > uniprocessor or > multiprocessor version of the decoding function), so the impact > would be very > low. I use xine on EPIA single CPU boards and would hate to see the CPU cost of xine increase. I want the cost to go down not up! The added complexity and performance reduction does not sound good to me. Barry |
From: James S. <jst...@us...> - 2004-07-30 12:57:29
|
Hi Barry, On Fri, 30 Jul 2004 13:16:05 +0100, "Barry Scott" <bar...@on...> said: > > The multiprocessor version will definitely have a performance penalty on > > single CPU machines, but this can be reduced to maybe one additional > > comparison per frame [...] > I use xine on EPIA single CPU boards and would hate to see the CPU cost > of xine increase. A single extra comparison per frame will not make the slightest bit of difference. James. |
From: Alien999999999 <ali...@us...> - 2004-07-30 14:11:44
|
On Friday 30 July 2004 14:57, James Stembridge wrote: > Hi Barry, > > On Fri, 30 Jul 2004 13:16:05 +0100, "Barry Scott" > > <bar...@on...> said: > > > The multiprocessor version will definitely have a performance penalty > > > on single CPU machines, but this can be reduced to maybe one additional > > > comparison per frame > > [...] > > > I use xine on EPIA single CPU boards and would hate to see the CPU cost > > of xine increase. > > A single extra comparison per frame will not make the slightest bit of > difference. How about making it possible to avoid both the added complexity and the extra comparison per frame, with a check or an --enable switch (AC_ARG_ENABLE) in configure.ac? -- Alien is my name and head-biting is my game |
From: Ashendra S. <ash...@sp...> - 2004-08-06 16:41:21
|
OK, I seem to be getting somewhere with this, however, within mpeg2_slice, I'm copying the buffer contents to each of my threads and this is seriously slowing things down, I'm using a memcpy (which I know is horribly slow). So, anyone have a suggestion on a fast way to move this buffer to my decoding threads?? Thanks, Ashendra > Hi, > > > Speaking of which, I've managed to get 2 threads running, I'm copying all > > data structures within picture_t including the buffer passed to > > mpeg2_slice. the only items shared by the threads are the reference > > frames and the current frame. > > Sounds good. > > > I still cannot get smooth decoding however, the > > motion comp and idct methods occasionally fail, which tells me that there > > is still something that is unprotected and being accessed by both threads > > simultaneously. > > > > I've got the parse_chunk loop checking if a complete frame has been > > decoded before drawing it and i'm using the vidixfb output plugin, so > > proc_slice in NEXT_MACROBLOCK shouldn't matter. > > > > > > any ideas on what this mysterious trouble making structure maybe ? > > If you are looking for a race, you might want to try helgrind. This is a > valgrind skin (run "valgrind --skin=helgrind --trace-children=yes xine") > that is supposed to find data race conditions. I have never used it, so I > don't know if it actually works or how much surrounding noise it produces. > > Michael |
From: Michael R. <mr...@us...> - 2004-08-07 14:25:45
|
Hi, > OK, I seem to be getting somewhere with this, however, within mpeg2_slice, > I'm copying the buffer contents to each of my threads and this is seriously > slowing things down, I'm using a memcpy (which I know is horribly slow). Why do you need to copy this? Is it not accessed read-only? Shouldn't it be enough to copy the picture_t structure in mpeg2_slice()? Michael -- printk(KERN_ERR "msp3400: chip reset failed, penguin on i2c bus?\n"); 2.2.16 /usr/src/linux/drivers/char/msp3400.c |
From: Ashendra S. <ash...@sp...> - 2004-08-10 11:38:29
|
Yes it is, but it is modified outside slice.c (in decode.c), so if slice.c is supposed to decode slices asynchronously, it needs a saved copy of the buffer, right? > Hi, > > > OK, I seem to be getting somewhere with this, however, within > > mpeg2_slice, I'm copying the buffer contents to each of my threads and > > this is seriously slowing things down, I'm using a memcpy (which I know > > is horribly slow). > > Why do you need to copy this? Is it not accessed read only? Shouldn't it be > enough to copy the picture_t structure in mpeg2_slice()? > > Michael |
From: Michael R. <mr...@us...> - 2004-08-11 12:25:27
|
Hi, > Yes it is, but it is modified outside slice.c (in decode.c), so if slice.c > is supposed to asynchronously decode slices, it needs a saved copy of the > buffer, right? I see. Just to summarize, I don't know your approach, but this is what I would do:
* at init of libmpeg2, spawn a desired number (>0) of worker threads and allocate the same number of bitstream buffers (with the ability to mark bitstream buffers as occupied)
* the worker threads all listen (non-busy wait with pthread_cond_wait) on a job queue
* the usual decoding and bitstream parsing of libmpeg2 is all done in the main thread
* calls to mpeg2_slice() (only one fortunately) are not executed; instead a job is put onto the job queue for one of the workers to execute
* the associated bitstream buffer is marked as occupied and a new one is selected as the current one
* the worker receiving the job would simply call mpeg2_slice() now
* when a worker finishes, it increments the counter of decoded slices and unoccupies the bitstream buffer
* when the main thread wants to finish up a frame, it waits (pthread_cond_wait) for the workers to finish all slices
Michael -- /* Fuck me gently with a chainsaw... */ 2.0.38 /usr/src/linux/arch/sparc/kernel/ptrace.c |
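The scheme outlined in the message above can be sketched with plain pthreads as follows. This is a minimal sketch under several assumptions, not the way libmpeg2 or xine does it: slice_pool_t, worker_main, pool_submit, pool_wait_frame and decode_frame_with_pool are hypothetical names, decode_slice_stub stands in for the real mpeg2_slice() call, and the per-job bitstream buffers (with their occupied flags) are left out to keep the example short.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_CAP   64   /* assumed upper bound on queued slices */
#define MAX_WORKERS 8

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  job_ready;          /* a job was queued, or shutdown */
    pthread_cond_t  all_done;           /* pending dropped to zero */
    int rows[QUEUE_CAP];                /* ring buffer of slice rows */
    int head, count;                    /* read index and fill level */
    int pending;                        /* jobs queued or still running */
    int decoded;                        /* counter of finished slices */
    int shutdown;
    unsigned char row_done[QUEUE_CAP];  /* written by the stub decoder */
} slice_pool_t;

/* Stand-in for the real per-slice decode call. */
static void decode_slice_stub(slice_pool_t *p, int row) { p->row_done[row] = 1; }

static void *worker_main(void *arg)
{
    slice_pool_t *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (p->count == 0 && !p->shutdown)
            pthread_cond_wait(&p->job_ready, &p->lock);  /* non-busy wait */
        if (p->count == 0) {                 /* woken up for shutdown */
            pthread_mutex_unlock(&p->lock);
            return NULL;
        }
        int row = p->rows[p->head];
        p->head = (p->head + 1) % QUEUE_CAP;
        p->count--;
        pthread_mutex_unlock(&p->lock);

        decode_slice_stub(p, row);           /* decode outside the lock */

        pthread_mutex_lock(&p->lock);
        p->decoded++;
        if (--p->pending == 0)
            pthread_cond_broadcast(&p->all_done);
        pthread_mutex_unlock(&p->lock);
    }
}

/* Main thread: queue a slice instead of decoding it directly. */
static void pool_submit(slice_pool_t *p, int row)
{
    pthread_mutex_lock(&p->lock);
    assert(p->count < QUEUE_CAP && row >= 0 && row < QUEUE_CAP);
    p->rows[(p->head + p->count) % QUEUE_CAP] = row;
    p->count++;
    p->pending++;
    pthread_cond_signal(&p->job_ready);
    pthread_mutex_unlock(&p->lock);
}

/* Main thread: block until every queued slice has been decoded. */
static void pool_wait_frame(slice_pool_t *p)
{
    pthread_mutex_lock(&p->lock);
    while (p->pending > 0)
        pthread_cond_wait(&p->all_done, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

/* Decode one frame of num_slices rows; returns the finished-slice count. */
static int decode_frame_with_pool(int num_workers, int num_slices)
{
    slice_pool_t p;
    pthread_t tid[MAX_WORKERS];
    int i;

    assert(num_workers > 0 && num_workers <= MAX_WORKERS);
    assert(num_slices >= 0 && num_slices <= QUEUE_CAP);
    memset(&p, 0, sizeof p);
    pthread_mutex_init(&p.lock, NULL);
    pthread_cond_init(&p.job_ready, NULL);
    pthread_cond_init(&p.all_done, NULL);
    for (i = 0; i < num_workers; i++)
        pthread_create(&tid[i], NULL, worker_main, &p);

    for (i = 0; i < num_slices; i++)
        pool_submit(&p, i);
    pool_wait_frame(&p);                     /* frame is complete here */

    pthread_mutex_lock(&p.lock);
    p.shutdown = 1;
    pthread_cond_broadcast(&p.job_ready);
    pthread_mutex_unlock(&p.lock);
    for (i = 0; i < num_workers; i++)
        pthread_join(tid[i], NULL);
    pthread_cond_destroy(&p.all_done);
    pthread_cond_destroy(&p.job_ready);
    pthread_mutex_destroy(&p.lock);
    return p.decoded;
}
```

In a real decoder the workers would be created once at init rather than per frame; the per-frame driver here only keeps the example self-contained.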
From: Ashendra S. <ash...@sp...> - 2004-08-11 15:14:16
|
I've been trying to get the multithreading working within slice.c, leaving the parsing stuff of decode.c almost untouched. Your approach seems to make more sense though. If I were trying your approach, would i need to fill up the multiple bitstream buffers within copy_chunk ?, depending on which was empty, and then do bit parsing on that buffer? Or would the bit parsing happen and then the filling of a buffer (which would involve a second copy)? > Hi, > > > Yes it is, but it is modified outside slice.c (in decode.c), so if > > slice.c is supposed to asyncronously decode slices, it needs a saved copy > > of the buffer, right? > > I see. > > Just to summarize, I don't know your approach, but this is what I would do: > * at init of libmpeg2, spawn a desired number (>0) of worker threads and > allocate the same number of bitstream buffers (with the ability to mark > bitstream buffers as occupied) > * the worker threads all listen (non-busy wait with pthread_cond_wait) > on a job queue > * the usual decoding and bitstream parsing of libmpeg2 is all done in the > main thread > * calls to mpeg2_slice() (only one fortunately) are not executed, instead a > job is put onto the job queue for one of the workers to execute > * the associated bitstream buffer is marked as occupied and a new one is > selected as the current one > * the worker receiving the job would simply call mpeg2_slice() now > * when a worker finishes, it increments the counter of decoded slices and > unoccupies the bitstream buffer > * when the main thread wants to finish up a frame, it waits > (pthread_cond_wait) for the workers to finish all slices > > Michael -- Ashendra Singh Linux: the last service pack you'll ever need. |
From: Michael R. <mr...@us...> - 2004-06-29 20:23:55
|
Hi, > I was wondering if libmpeg2 was threaded at all in order to take advantage > of smp machines for example. From what i've been able to find out it seems > that the library itself is unthreaded. This might make a difference in the > MC parts of decoding high resolution mpeg2. As you found out: libmpeg2 is one thread only and I guess it would be really difficult to parallelize decoding on such a fine granularity. If the job the CPU does is mostly memory bandwidth limited (and this might be true for HD MPEG, I don't know, Michel?), you will most likely be slower with more CPUs than with one. Michael -- If Darl McBride was in charge, he'd probably make marriage unconstitutional too, since clearly it de-emphasizes the commercial nature of normal human interaction, and probably is a major impediment to the commercial growth of prostitution. - Linus Torvalds |
From: Robin K. <kom...@ge...> - 2004-06-29 21:26:17
|
Michael Roitzsch wrote: > As you found out: libmpeg2 is one thread only and I guess it would be > really difficult to parallelize decoding on such a fine granularity. It would also slow down performance on single processor systems. > If the job the CPU does is mostly memory bandwidth limited (and this > might be true for HD MPEG, I don't know, Michel?), you will most > likely be slower with more CPUs than with one. You assume that all the processors share a single memory controller? -- Wishing you good fortune, --Robin Kay-- (komadori) |
From: Michael R. <mr...@us...> - 2004-06-30 16:09:29
|
Hi, > > As you found out: libmpeg2 is one thread only and I guess it would be > > really difficult to parallelize decoding on such a fine granularity. > > It would also slow down performance on single processor systems. > > > If the job the CPU does is mostly memory bandwidth limited (and this > > might be true for HD MPEG, I don't know, Michel?), you will most > > likely be slower with more CPUs than with one. > > You assume that all the processors share a single memory controller? Yes, because the question was about SMP systems. These are usually UMA systems. Michael -- /* * Hash table gook.. */ 2.4.0-test2 /usr/src/linux/fs/buffer.c |
From: Ashendra S. <ash...@sp...> - 2004-06-30 11:58:58
|
Hello, I found a paper on just this topic (Real-time parallel MPEG-2 Decoding in Software, Bilas) that explores multiprocess decoding. According to it, there are substantial gains to be had from parallelising the decoding. They analysed GOP based and slice based decomposition and found that block based has more to gain than slice based but is harder on memory and not so friendly towards stream seeking. Tests were done on a shared-memory SMP system, which may still be relevant to x86 SMP machines. Bye, PS. The paper can be found at www.ics.forth.gr/~bilas/papers.html > Hi, > > > I was wondering if libmpeg2 was threaded at all in order to take > > advantage of smp machines for example. From what i've been able to find > > out it seems that the library itself is unthreaded. This might make a > > difference in the MC parts of decoding high resolution mpeg2. > > As you found out: libmpeg2 is one thread only and I guess it would be > really difficult to parallelize decoding on such a fine granularity. If the > job the CPU does is mostly memory bandwidth limited (and this might be true > for HD MPEG, I don't know, Michel?), you will most likely be slower with > more CPUs than with one. > > Michael |
From: Michael R. <mr...@us...> - 2004-06-30 16:38:39
|
Hi, > I found a paper on just this topic (Real-time parallel MPEG-2 Decoding in > Software, Bilas) that explores multiprocess decoding. According to it > there are substantial gains to be had in parallelisation of decoding. They > analysed GOP based and slice based decompostion and found that block based > has more to gain than slice based but is harder on memory and not so > friendly towards stream seeking. Tests were done a shared mem smp system > which may still be relevant to x86 smp machines. > > ps. Paper can be found at www.ics.forth.gr/~bilas/papers.html Interesting paper. Quite impressive results. Thanks a lot for the pointer. Now: Anybody who wants to implement this? ;) Michael -- "If you want to travel around the world and be invited to speak at a lot of different places, just write a Unix operating system." -Linus Torvalds |
From: James Courtier-D. <Ja...@su...> - 2004-07-09 12:23:54
|
Ashendra Singh wrote: > Hi, > > I was wondering if libmpeg2 was threaded at all in order to take advantage of > smp machines for example. From what i've been able to find out it seems that > the library itself is unthreaded. This might make a difference in the MC > parts of decoding high resolution mpeg2. > > Any ideas or opinions? A lot of modern graphics cards support XvMC, so this moves the problem of doing the really hard work away from the CPU. The most CPU intensive part in the decoding process is transferring the final image to the graphics card. XSHM is the slowest, because YUV -> RGB conversion/upscaling has to be done first. XV is faster, because the hardware does the YUV -> RGB conversion/upscaling. XvMC is the fastest, because the MPEG-2 data is passed to the graphics card in compressed form, and very little work is required by the software decoder, apart from unpacking the stream. If you do wish to implement a threaded MPEG-2 decoder, I think it would be better to keep the threading code out of libmpeg2 and get the app to do the threading. E.g. libmpeg2 takes in the data stream, then outputs each slice (before decode); the app then decides what to do with each slice. In this way, you can separate the unpacking of the stream, which is really a single-threaded task, from the slice decode, which might be handled with multiple threads, so long as all the threads are on an SMP based machine. If there is no shared memory between processors, threading may actually make things slower. James |
From: Ashendra S. <ash...@sp...> - 2004-07-09 13:19:37
|
Hi, I was trying to get libmpeg2 to decode only slices it was passed, so the app (xine here) would then accumulate the decoded slices and reassemble them into a complete frame. I'm trying to get xine to simultaneously run multiple instances of libmpeg2, each of which decodes separate slices in parallel. From libmpeg2's point of view, it would not be changed in any way. I was wondering now what would be the most efficient way of reassembling the decoded slices into complete frames. From what I can tell, xine needs all the pts info in each frame in order to display anything, so what would be the best way of combining this info into the reassembled frame? I can't use XvMC since I only have an ATI card (Radeon 9800), but I am using vidixfb as my output driver, which is supposed to be really fast with display, right? Thanks, > Ashendra Singh wrote: > > Hi, > > > > I was wondering if libmpeg2 was threaded at all in order to take > > advantage of smp machines for example. From what i've been able to find > > out it seems that the library itself is unthreaded. This might make a > > difference in the MC parts of decoding high resolution mpeg2. > > > > Any ideas or opinions? > > A lot of modern graphics cards support XvMC, so this moves the problem > of doing the really hard work away from the CPU. > > The most CPU intensive part in the decoding process is transferring the > final image to the graphics card. > XSHM is the slowest, because YUV -> RGB conversion/upscaling has to be > done first. > XV is faster, because the hardware does the YUV -> RGB > conversion/upscaling. XvMC is the fastest, because the Mpeg2 data is passed > to the graphics card in compressed form, and very little work is required > by the > software decoder, apart from unpacking the stream. > > If you do wish to implement a threaded mpeg2 decoder, I think it would > be better to exclude the threading code from libmpeg2, and get the app to > do the threading. > e.g. libmpeg2 takes in the data stream, then outputs each slice (before > decode). the app then decides what to do with each slice. > In this way, you can separate the unpacking of the stream, which is > really a single threaded task, from the slice decode, that might be > handled with multiple threads, so long as all the threads are on a SMP > based machine. If there is not shared memory between processors, > threading may actually make things slower. > > James -- Ashendra |
From: Michael R. <mr...@us...> - 2004-07-09 14:16:13
|
Hi, > I was trying to get libmpeg2 to decode only slices it was passed, so the > app (xine here) would then accumulate the decoded slices and reassemble > them into a complete frame. I'm trying to get xine to simultaneously run > multiple instances of libmpeg2 each of which decodes separate slices in > parallel. From libmpeg2's point of view, it would not be changed in any > way. I was wondering now what would be the most efficient way of > reassembling the decoded slices into complete frames. I think you could simply have one target xine frame that all threads write their decoded slices to. If MPEG-2 slices do not overlap (not sure about that, but I think this is true), there should be no race conditions. Then you could add a lock-protected counter which counts the number of slices that have finished decoding. Every thread that increments this counter would check whether all slices are done and, if so, draw the frame. Michael -- printk(KERN_WARNING "Multi-volume CD somehow got mounted.\n"); 2.2.16 /usr/src/linux/fs/isofs/inode.c |
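A minimal sketch of such a lock-protected counter, assuming pthreads. The names frame_progress_t, slice_finished and run_counter_demo are hypothetical, the slice decode itself is omitted, and "drawing the frame" is reduced to bumping frames_drawn so the effect can be checked.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS       4
#define SLICES_PER_THREAD 9

typedef struct {
    pthread_mutex_t lock;
    int slices_total;    /* slices in the frame */
    int slices_done;     /* slices finished so far */
    int frames_drawn;    /* stand-in for actually drawing the frame */
} frame_progress_t;

/* Called by a thread after it has written one decoded slice.
 * Returns 1 for the single caller that completed the frame. */
static int slice_finished(frame_progress_t *fp)
{
    pthread_mutex_lock(&fp->lock);
    int is_last = (++fp->slices_done == fp->slices_total);
    if (is_last)
        fp->frames_drawn++;   /* real code would draw after unlocking */
    pthread_mutex_unlock(&fp->lock);
    return is_last;
}

static void *slice_thread(void *arg)
{
    frame_progress_t *fp = arg;
    for (int i = 0; i < SLICES_PER_THREAD; i++) {
        /* ... decode one slice into the shared target frame here ... */
        slice_finished(fp);
    }
    return NULL;
}

/* Runs all threads over one frame; returns how often it was drawn. */
static int run_counter_demo(void)
{
    frame_progress_t fp;
    pthread_t tid[NUM_THREADS];
    int i;

    pthread_mutex_init(&fp.lock, NULL);
    fp.slices_total = NUM_THREADS * SLICES_PER_THREAD;
    fp.slices_done = 0;
    fp.frames_drawn = 0;
    for (i = 0; i < NUM_THREADS; i++)
        pthread_create(&tid[i], NULL, slice_thread, &fp);
    for (i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
    pthread_mutex_destroy(&fp.lock);
    return fp.frames_drawn;
}
```

Because the increment and the comparison happen under the same lock, exactly one thread sees the counter reach the total, so the frame is drawn once regardless of which thread finishes last.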
From: Ashendra S. <ash...@sp...> - 2004-07-09 15:12:31
|
This would mean I would have to divide up the slices within decode.c and spawn threads there so that they could share a common xine frame to draw to, or is this achievable at a higher level in the decode chain? I think I've managed to get libmpeg2 to selectively decode the slices I'm interested in, but I am unable to get xine to display even the incomplete frames because of incomplete timecode info. picture->current_frame and picture->backward_frame_reference_frame are the same, with picture->forward_reference_frame=0. I'm using an almost unmodified version of parse_chunk and it expects to receive complete frames. Do I maybe have to rearrange the entire structure of decode.c, or is there an easier way to do this that I'm not thinking of? Thanks, > Hi, > > > I was trying to get libmpeg2 to decode only slices it was passed, so the > > app (xine here) would then accumulate the decoded slices and reassemble > > them into a complete frame. I'm trying to get xine to simultaneously run > > multiple instances of libmpeg2 each of which decodes separate slices in > > parallel. From libmpeg2's point of view, it would not be changed in any > > way. I was wondering now what would be the most efficient way of > > reassembling the decoded slices into complete frames. > > I think you could simply have one target xine frame where all threads are > writing their decoded slices to. If MPEG2 slices do not overlap (not sure > about that, but I think this is true) there should be no race conditions. > Then you could add a lock-protected counter which counts the amount of > slices that finished decoding. Everyone who increments this counter would > check, if all slices are done and, if so, draws the frame. > > Michael -- Ashendra |
From: Michael R. <mr...@us...> - 2004-07-09 15:48:51
|
Hi, > This would mean i would have to divide up the slices within decode.c and > spawn threads there so that they could share a common xine frame to draw > to, or is this achievable at a higher level in the decode chain? Well, the threads you launch have to get a memory chunk from somewhere, where they write the decoded slices to. What difference does it make, from the single thread's POV, if these chunks are inside a xine frame? > I think i've managed to get libmpeg2 to selectively decode the slices i'm > interested in but am unable to get xine the display even the incomplete > frames because of incomplete timecode info. picture->current_frame and > picture->backward_frame_reference_frame are the same with > picture->forward_reference_frame=0. I'm using an almost unmodified version > of parse_chunk and it expects to recieve complete frames, do i maybe have > to rearrange the entire structure of decode.c or is there any easier way to > do this that i'm not thinking of? I am not sure how the timecode can be a problem here. The whole parsing of the headers has to be done in one single thread of course. I would simply look for the loop in libmpeg2 that iterates over the slices of one image (line 1634 in slice.c, maybe?) and try to parallelize this in an OpenMP-like way. If you have an OpenMP-capable compiler (the free icc for Linux for example), you can even do that with actual OpenMP. Michael -- panic("kmem_cache_init(): Offsets are wrong - I've been messed with!"); 2.2.16 /usr/src/linux/mm/slab.c |
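As an illustration of parallelizing a per-slice loop "in an OpenMP-like way", here is a small sketch. It is not taken from slice.c: decode_row_stub and row_checksum are hypothetical stand-ins, and the loop assumes each iteration can decode its row independently (for example because the slice start positions are already known). The pragma is guarded so the code also builds, serially, with a compiler that has no OpenMP support.

```c
#include <assert.h>

#define MB_ROWS 36   /* e.g. a 576-line picture: 576 / 16 macroblock rows */

static int row_checksum[MB_ROWS];

/* Stand-in for decoding one slice row; it writes only its own entry. */
static void decode_row_stub(int row)
{
    row_checksum[row] = row * 16;
}

/* Decode all rows of one picture and return a checksum over them. */
static int decode_picture_rows(void)
{
    int row, sum = 0;

    /* Each iteration touches a different row, so iterations are independent. */
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (row = 0; row < MB_ROWS; row++)
        decode_row_stub(row);

    /* The parallel loop has an implicit barrier; all rows are done here. */
    for (row = 0; row < MB_ROWS; row++)
        sum += row_checksum[row];
    return sum;
}
```

With GCC the parallel version would be enabled by compiling with -fopenmp; without that flag the loop simply runs in a single thread.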
From: Ashendra S. <ash...@sp...> - 2004-07-26 17:17:46
|
Hi again, I've been playing around with this for a while now. A single thread manages to decode slices OK. I'm spawning the thread (malloc'ing a new structure that contains a picture_t) and pointing the current, backward and forward frames to the ones passed from decode.c. This seems to work OK. I'm doing some passive frame syncing, i.e. I'm not explicitly checking for completed frames before drawing them, so I occasionally get a partially complete frame drawn. The problem I'm having is that as soon as I spawn a second decoding thread (that shares the same current, forward and backward frames as the first), all hell breaks loose. Do I need to explicitly protect the frames from the two threads, and what would be the best way of doing this? Could I use mutexes every time picture->current_frame is accessed within slice.c only? Is it safe to just protect current_frame? Right now I'm ensuring that the two decoding threads are always on different slices from the same frame (I've broken down mpeg2_slice() and threaded from that point). Thanks for any help, Ashendra > Hi, > > > This would mean i would have to divide up the slices within decode.c and > > spawn threads there so that they could share a common xine frame to draw > > to, or is this achievable at a higher level in the decode chain? > > Well, the threads you launch have to get a memory chunk from somewhere, > where they write the decoded slices to. What difference does it make, from > the single thread's POV, if these chunks are inside a xine frame? > > > I think i've managed to get libmpeg2 to selectively decode the slices i'm > > interested in but am unable to get xine to display even the incomplete > > frames because of incomplete timecode info. picture->current_frame and > > picture->backward_frame_reference_frame are the same with > > picture->forward_reference_frame=0. I'm using an almost unmodified > > version of parse_chunk and it expects to receive complete frames, do i > > maybe have to rearrange the entire structure of decode.c or is there any > > easier way to do this that i'm not thinking of? > > I am not sure how the timecode can be a problem here. The whole parsing of > the headers has to be done in one single thread of course. > > I would simply look for the loop in libmpeg2 that iterates over the slices > of one image (line 1634 in slice.c, maybe?) and try to parallelize this in > an OpenMP-like way. If you have an OpenMP-capable compiler (the free icc > for Linux for example), you can even do that with actual OpenMP. > > Michael |
From: Stephen t. <st...@to...> - 2004-07-26 17:38:13
|
On Mon, 2004-07-26 at 13:16, Ashendra Singh wrote: > Hi again, > > I've been playing around with this for a while now. A single thread manages > to decode slices OK. I'm spawning the thread (malloc'ing a new structure that > contains a picture_t) and pointing the current, b'wards and f'wards frame to > the one passed from decode.c. This seems to work ok, I'm doing some passive > frame syncing ie. I'm not explicitly checking for completed frames before > drawing them, so I occasionally get a partially complete frame drawn. > > The problem i'm having is that as soon as I spawn a second decoding thread > (that shares the same curr, f'wards and b'wards frame as the 1st) all hell > breaks loose. Do i need to explicitly protect the frames from the 2 threads, > and what would be the best way of doing this? Could I use mutexes every time > picture->current_frame is accessed within slice.c only, is it safe to just > protect current_frame? > > Right now i'm ensuring that the 2 decoding threads are always on different > slices from the same frame (i've broken down mpeg2_slice() and threaded from > that point). Well, you have a single-threaded library which can access any of the data structures without a problem. You then added a new thread but did not protect the data structures. So you have a race condition, or something similar, where the two threads both attempt to take a frame and handle it. Adding mutexes to protect the data structures can eliminate the race condition but can lead to deadlock, depending on how many mutexes there are and what order they're acquired in. A byproduct of mutexes is that they cause delay in processing. Here is where my ignorance comes shining through. How many slices are there in a frame? What is the data structure like for a frame? I was wondering whether it would be possible to tell each thread which index positions it is allowed to touch in the frame, if the frame is stored as an array of slices.
Personally I think this is getting too complicated. The moment you add more than one thread to handle a frame, you have to deal with the issue of synchronizing the threads. You have to be sure that they both finish processing a frame before beginning the next one. That means implementing a thread barrier, where each thread waits until all the required threads have reached the barrier. When all threads are there, the threads are released. Stephen -- Email: st...@to... |
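The thread barrier described in the message above can be built from a mutex and a condition variable. The sketch below is one possible shape, with hypothetical names (frame_barrier_t, barrier_wait, run_barrier_demo); the "decoding" is reduced to counting how many threads finished their part of each frame, so that the code can verify nobody moves on to the next frame early.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 2
#define NUM_FRAMES  100

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  all_here;
    int parties;      /* threads that must arrive before release */
    int waiting;      /* threads that have arrived in this round */
    unsigned round;   /* bumped on release; guards against spurious wakeups */
} frame_barrier_t;

/* Block until all `parties` threads have called barrier_wait(). */
static void barrier_wait(frame_barrier_t *b)
{
    pthread_mutex_lock(&b->lock);
    unsigned my_round = b->round;
    if (++b->waiting == b->parties) {   /* last one in releases the rest */
        b->waiting = 0;
        b->round++;
        pthread_cond_broadcast(&b->all_here);
    } else {
        while (b->round == my_round)
            pthread_cond_wait(&b->all_here, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

static frame_barrier_t barrier = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, NUM_THREADS, 0, 0
};
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static int parts_done[NUM_FRAMES];   /* threads finished with each frame */
static int violations;               /* frames left before all were done */

static void *decode_thread(void *arg)
{
    (void)arg;
    for (int f = 0; f < NUM_FRAMES; f++) {
        pthread_mutex_lock(&stats_lock);
        parts_done[f]++;                 /* "decoded" this thread's slices */
        pthread_mutex_unlock(&stats_lock);

        barrier_wait(&barrier);          /* wait for the other thread(s) */

        pthread_mutex_lock(&stats_lock);
        if (parts_done[f] != NUM_THREADS)
            violations++;                /* would mean the barrier leaked */
        pthread_mutex_unlock(&stats_lock);
    }
    return NULL;
}

/* Runs the threads over all frames; returns the number of violations. */
static int run_barrier_demo(void)
{
    pthread_t tid[NUM_THREADS];
    int i;
    for (i = 0; i < NUM_FRAMES; i++)
        parts_done[i] = 0;
    violations = 0;
    for (i = 0; i < NUM_THREADS; i++)
        pthread_create(&tid[i], NULL, decode_thread, NULL);
    for (i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
    return violations;
}
```

The round counter is what makes the barrier reusable from frame to frame: a thread that wakes up checks that the round has actually advanced rather than trusting the wakeup itself.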
From: Ashendra S. <ash...@sp...> - 2004-07-26 17:55:12
|
Hi, A slice is a row of macroblocks. Accepted mpeg2 profiles say that every macroblock in a horizontal row must be in a slice, I also think that slices are limited to 1 row of macroblocks ie they wont span multiple rows. A macroblock is 16 pixels high. Check video_out.* for info on the frame formats. > On Mon, 2004-07-26 at 13:16, Ashendra Singh wrote: > > Hi again, > > > > I've been playing around with this for a while now. A single thread > > manages to decode slices OK. I'm spawing the thread (malloc'ing a new > > structure that contains a picture_t) and pointing the current, b'wards > > and f'wards frame to the one passed from decode.c. This seems to work ok, > > I'm doing some passive frame syncing ie. I'm not explicitly checking for > > completed frames before drawing them, so I occasionally get a partially > > complete frame drawn. > > > > The problem i'm having is that as soon as I spawn a second decoding > > thread (that shares the same curr, f'wards and b'wards frame as the 1st) > > all hell breaks loose. Do i need to explicitly protect the frames from > > the 2 threads, and what would be the best way of doing this? Could I use > > mutexes every time picture->current_frame is accessed within slice.c > > only, is it safe to just protect current_frame? > > > > Right now i'm ensuring that the 2 decoding threads are always on > > different slices from the same frame (i've broken down mpeg2_slice() and > > threaded from that point). > > Well you have a single-threaded library which can access any of the data > structures without a problem. You then added a new thread but did not > protect the data structures. So you have a race condition, or something > similar, where the two threads both attempt to take a frame and handle > it. Adding mutexes to protect the data structures can eliminate the race > condition but can lead to deadlock depending on how many mutexes there > are and what order their acquired. 
A byproduct of mutexes is that they > cause delay in processing. > > Here is now where my ignorance comes shining through. How many slices > are there in a frame? What is the data structure like for a frame? What > I was wondering if it would be possible to tell each thread which index > positions it was allowed to touch in the frame if the frame was stored > as a array of slices. > > Personally I think this is getting too complicated. The moment you add > more than 1 thread to handle a frame you have to deal with the issue of > syncronizing the threads. You have to be sure that they both finish > processing a frame before beginning the next one. That is implementing a > thread barrier where each thread waits until all the required threads of > the barrier are there. When all threads are there the threads are > released. > > Stpehen -- Ashendra Singh Linux: the last service pack you'll ever need. |
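The slice geometry described in this message (one row of macroblocks per slice, 16 pixels high) also answers the question of which rows each thread may touch. The sketch below works that out; mb_rows and rows_for_thread are hypothetical helpers, and it assumes frame pictures with exactly one slice per macroblock row, which a given stream need not satisfy.

```c
#include <assert.h>

#define MB_SIZE 16   /* a macroblock is 16 pixels high */

/* Number of macroblock rows (and, by assumption, slices) in a frame. */
static int mb_rows(int height)
{
    assert(height > 0);
    return (height + MB_SIZE - 1) / MB_SIZE;   /* round up to whole rows */
}

typedef struct {
    int first_row;   /* first row owned by the thread */
    int end_row;     /* one past the last row owned by the thread */
} row_range_t;

/* Contiguous, non-overlapping share of the rows for thread t. */
static row_range_t rows_for_thread(int total_rows, int num_threads, int t)
{
    row_range_t r;
    assert(total_rows >= 0 && num_threads > 0 && t >= 0 && t < num_threads);
    r.first_row = (t * total_rows) / num_threads;
    r.end_row   = ((t + 1) * total_rows) / num_threads;
    return r;
}
```

For a 720x576 frame this gives 36 rows, so two threads would own rows 0-17 and 18-35; a 1080-line frame rounds up to 68 rows. Since the ranges never overlap, the threads write to disjoint parts of the current frame, although motion compensation still reads from the shared reference frames.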