From: Matt L. <mat...@gm...> - 2006-01-12 20:48:40
|
Hi all,

A new issue has come up in the design of the new video library.  It has come to my attention that some input video streams may natively produce images encoded in formats not readily supported by vil (BGR, YUV 4:2:2, YUV 4:1:1, YUV 4:2:0P, etc.).  In addition, some output streams may require images in these formats.  We could automatically convert any color image to RGB and then back, but this would involve a performance hit.

As far as I know there is nothing in a vil_image_view to indicate the color encoding of the pixels.  The vil_image_view also does not support plane step sizes that vary with the plane.  It seems we might need a new vidl2_frame object to manage this additional information.  This could be a simple object used mainly to transmit a frame buffer and its encoding properties from an istream to an ostream.  If you want to do any image processing, the vidl2_frame would know how to deep copy (and possibly convert) itself into a standard RGB vil_image_view.

The vidl2_frame would be a volatile object that is only valid until the next advance() call (unless multiple buffers are available).  The user would be in charge of any memory used in deep copies and encoding conversions.  We could also allow users to wrap a vil_image_view around the vidl2_frame buffer at their own risk.

Any comments or other ideas?

Thanks,
Matt Leotta |
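[Editorial sketch: Matt's proposed vidl2_frame might look roughly like the following.  All names and the encoding enum are hypothetical, not the actual vidl2 API; only the trivial BGR-to-RGB deep copy is filled in, to illustrate the "convert on demand" idea.]

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pixel-encoding tag for a video frame buffer.
enum class frame_encoding { RGB_24, BGR_24, UYVY_422, YUV_420P };

// Minimal sketch of the proposed frame object: it only transports a raw
// buffer and its encoding from an istream to an ostream.  Pixel access
// requires an explicit (deep-copy) conversion.
class vidl2_frame_sketch {
public:
  vidl2_frame_sketch(const std::uint8_t* data, std::size_t size,
                     unsigned ni, unsigned nj, frame_encoding enc)
    : data_(data), size_(size), ni_(ni), nj_(nj), enc_(enc) {}

  frame_encoding encoding() const { return enc_; }
  std::size_t size() const { return size_; }

  // Deep copy into an interleaved RGB buffer (3 bytes per pixel).
  // Only the trivial BGR case is sketched; the YUV cases would apply a
  // colour-space conversion as well.
  std::vector<std::uint8_t> to_rgb() const {
    std::vector<std::uint8_t> rgb(std::size_t(ni_) * nj_ * 3);
    if (enc_ == frame_encoding::BGR_24) {
      for (std::size_t p = 0; p < rgb.size(); p += 3) {
        rgb[p]     = data_[p + 2];  // R comes from the B position
        rgb[p + 1] = data_[p + 1];  // G unchanged
        rgb[p + 2] = data_[p];      // B comes from the R position
      }
    }
    // ... other encodings omitted in this sketch ...
    return rgb;
  }

private:
  const std::uint8_t* data_;  // not owned: valid only until the next advance()
  std::size_t size_;
  unsigned ni_, nj_;
  frame_encoding enc_;
};
```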
From: Peter V. <pet...@ya...> - 2006-01-13 20:01:33
|
> It has come to my attention that some input video streams may
> natively produce images encoded in formats not readily
> supported by vil (BGR, YUV 4:2:2, YUV 4:1:1, YUV 4:2:0P, etc).

Strictly speaking, vil images with 3 layers do not automatically assume RGB encoding.  That's what vil_property is for.  Hence BGR is easy to cope with.  At this moment, no "colour_interpretation" property has been defined yet in vil_property.h; this could easily be done.

In order to support YUV 4:2:2, again vil_property could be used.  But now the colour interpretation is less straightforward, since in this case colour planes are stored in multiples of bits, not just bytes.  Anyhow, at the "raw" vil_image level, nothing special has to be done, IMHO.

> As far as I know there is nothing in a vil_image_view to indicate the
> color encoding of the pixels.

Indeed.  This is good news since it makes sure that there is nowhere in vil an "automatic", "default" interpretation of, e.g., the first plane as "red", or of 3-plane images as "colour RGB".  Only at the vgui level is such an interpretation necessary.  And at that level, 4:2:2 data could be passed directly to the graphics card, if it supports it.

> The vil_image_view also does not support plane step sizes
> that vary with the plane.

Right.  Hence I think 4:2:2 images should be stored as single-plane.  Note that this is not too different from colour-mapped images.

-- Peter. |
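[Editorial sketch: Peter's BGR point can be illustrated without copying any pixels.  The property tag below is hypothetical (vil_property.h defines no such tag, as he notes); the idea is that a colour-interpretation string only changes which plane index the consumer treats as red.]

```cpp
#include <cstring>

// Hypothetical property tag, in the spirit of the constants in
// vil_property.h (no such tag exists there at the time of writing).
const char* const vil_property_colour_interpretation = "colour_interpretation";

// Given the interpretation string reported by an image resource, map the
// logical channels (R, G, B) onto plane indices.  BGR then needs no data
// conversion at all -- only a different plane order.
void rgb_plane_order(const char* interp, unsigned order[3]) {
  if (std::strcmp(interp, "BGR") == 0) {
    order[0] = 2; order[1] = 1; order[2] = 0;  // red lives in plane 2
  } else {
    order[0] = 0; order[1] = 1; order[2] = 2;  // default: assume RGB
  }
}
```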
From: Matt L. <mat...@gm...> - 2006-01-13 20:47:33
|
On 1/13/06, Peter Vanroose <pet...@ya...> wrote:
> Strictly speaking, vil images with 3 layers do not automatically assume
> RGB encoding. That's what vil_property is for. Hence BGR is easy to
> cope with.
> At this moment, no "colour_interpretation" property has been defined
> yet in vil_property.h; this could easily be done.

True, but this issue runs deeper than color interpretation.

> In order to support YUV 4:2:2, again vil_property could be used.
> But now the colour interpretation is less straightforward since in this
> case colour planes are stored in multiples of bits, not just bytes.
> Anyhow, at the "raw" vil_image level, nothing special has to be done,
> IMHO.

Not only are they in multiples of bits, but the multiples vary for each component, so you can't have a single planestep value to step through the components.  How would you handle this in a vil_image_view?

Also, for YUV 4:2:2, the bits are encoded into macro pixels that jointly store the data for 2 adjacent pixels (other formats have different size macro pixels).  This makes it very difficult to iterate over the pixels.  You can't just step to a certain offset in the memory and return a pointer to the pixel data.

> > The vil_image_view also does not support plane step sizes
> > that vary with the plane.
>
> Right. Hence I think 4:2:2 images should be stored as single-plane.

Maybe they should be, but some video decoders and cameras produce image data in this form and I need to be able to handle it.  The planar YUV 4:2:0 format has a Y-plane of size ni x nj bytes followed by U- and V-planes of size ni/2 x nj/2 bytes each.  This is certainly not handled by vil.

My plan is to add a video frame representation to the new video library to handle these unusual encodings.  I will add conversion functions to decode these formats into regular arrays of pixels in a vil_image_view for image processing.  The video frame will not support access to individual pixels, so it will not be useful for image processing without conversion.  However, when you only care about efficient video transcoding, these conversions can be avoided.

--Matt |
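[Editorial sketch: the planar YUV 4:2:0 layout Matt describes can be written down as plain offset arithmetic.  The struct name is hypothetical; it only encodes the layout rules above (full-size Y plane, then quarter-size U and V planes), assuming even ni and nj.]

```cpp
#include <cstddef>

// Plane offsets for planar YUV 4:2:0 (e.g. the common I420 ordering):
// a full-resolution Y plane followed by quarter-size U and V planes.
struct yuv420p_layout {
  unsigned ni, nj;  // image width and height in pixels (assumed even)

  std::size_t y_offset() const { return 0; }
  std::size_t u_offset() const { return std::size_t(ni) * nj; }
  std::size_t v_offset() const { return u_offset() + std::size_t(ni / 2) * (nj / 2); }
  std::size_t total_size() const { return v_offset() + std::size_t(ni / 2) * (nj / 2); }

  // Byte offsets of the three components of pixel (i, j).  The U and V
  // offsets are shared between 2x2 blocks of pixels -- this is why a single
  // (istep, jstep, planestep) triple cannot describe such a view.
  std::size_t y_at(unsigned i, unsigned j) const {
    return y_offset() + std::size_t(j) * ni + i;
  }
  std::size_t u_at(unsigned i, unsigned j) const {
    return u_offset() + std::size_t(j / 2) * (ni / 2) + i / 2;
  }
  std::size_t v_at(unsigned i, unsigned j) const {
    return v_offset() + std::size_t(j / 2) * (ni / 2) + i / 2;
  }
};
```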
From: Amitha P. <pe...@cs...> - 2006-01-17 01:56:59
|
On Fri 13 Jan 2006, Matt Leotta wrote:
> On 1/13/06, Peter Vanroose <pet...@ya...> wrote:
> > Right. Hence I think 4:2:2 images should be stored as single-plane.
>
> Maybe they should be, but some video decoders and cameras produce
> image data in this form and I need to be able to handle it.  The
> planar YUV 4:2:0 format has a Y-plane of size ni x nj bytes followed
> by U- and V-planes of size ni/2 x nj/2 bytes each.  This is certainly
> not handled by vil.

An option is to store these images as a vil_image_view<vil_420>, where sizeof(vil_420) is 2 bytes.  vgui_pixel.h has definitions for similar types, used for efficient OpenGL rendering.  (This may be what Peter was suggesting.)

A video source that supports such a format natively could have a current_native_image() that returns it, but a simple current_image() would return the RGB form that most people would probably want.

Amitha. |
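[Editorial sketch: a 2-byte "cell" type in the spirit of Amitha's suggestion, shown here for packed UYVY 4:2:2 rather than 4:2:0.  The struct name and accessor are hypothetical, not the vgui_pixel.h types; the point is that whether the chroma byte is U or V depends on column parity, so recovering a full (Y, U, V) triple needs the neighbouring cell.]

```cpp
#include <cstdint>

// A packed UYVY 4:2:2 row viewed as an array of 2-byte cells.  Each cell
// holds one full-resolution luma sample plus one chroma sample.
struct yuv422_cell {
  std::uint8_t chroma;  // U in even columns, V in odd columns (UYVY order)
  std::uint8_t y;       // luma, full resolution
};
static_assert(sizeof(yuv422_cell) == 2, "cell must pack to 2 bytes");

// Recover the full (Y, U, V) triple for pixel i of a row of cells.
void yuv_at(const yuv422_cell* row, unsigned i,
            std::uint8_t& y, std::uint8_t& u, std::uint8_t& v) {
  unsigned even = i & ~1u;  // start of the 2-pixel macro pixel
  y = row[i].y;
  u = row[even].chroma;
  v = row[even + 1].chroma;
}
```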
From: Ian S. <ian...@st...> - 2006-01-16 11:36:40
|
Matt Leotta wrote:
> Hi all,
>
> A new issue has come up in the design of the new video library.  It
> has come to my attention that some input video streams may natively
> produce images encoded in formats not readily supported by vil (BGR,
> YUV 4:2:2, YUV 4:1:1, YUV 4:2:0P, etc).  [...]
>
> Any comments or other ideas?

I was aware of the 4:2:2 encoding issues at the time we designed vil.  Then (and now) I couldn't see any way of having vil deal with such images transparently without compromising on the efficiency of handling less complicated images.  There are plenty of non-transparent methods, but they all come down to the code that wants to use 4:2:2 images knowing that it is dealing with 4:2:2 images.  In that case you might as well write a special type, in the library that needs it, for handling these complicated cases, e.g. vidl2_422_image.

When you want to process a 4:2:2 image in vil, you have to make an explicit choice about how to map the colour resolution onto the greyscale resolution.  This is not obvious, and different standards use different mappings (e.g. I seem to remember that MPEG-1 uses interpolation with dither, and H.263 uses simple pixel doubling).  This is, I believe, another argument for not trying to solve this problem too generally.

My suggestion: use special 4:2:2 images internally in vidl2.  Translate them to normal vil_image_views when required.  vidl2_422_image could be a derivative of vil_image_resource, which supports this conversion-on-demand approach.

If you want fast playback of vidl2 video in vgui without having to convert to a normal image (and assuming that your video card supports the 4:2:2 format directly), then either let vgui depend on vidl2, or let vidl2 depend on vgui with a vidl2_vgui_video_pane, derived from the appropriate vgui base class.

Ian. |
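[Editorial sketch: one concrete instance of the conversion-on-demand idea Ian describes, with the chroma-mapping choice made explicit.  This converts a packed UYVY 4:2:2 row to interleaved RGB using the simplest policy he mentions, pixel-doubled chroma, and a fixed-point full-range BT.601 matrix.  Function names are hypothetical.]

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

static std::uint8_t clamp8(int v) {
  return v < 0 ? 0 : (v > 255 ? 255 : std::uint8_t(v));
}

// YCbCr -> RGB (full-range BT.601, 16.16 fixed point).
static void ycbcr_to_rgb(int y, int u, int v,
                         std::uint8_t& r, std::uint8_t& g, std::uint8_t& b) {
  u -= 128; v -= 128;
  r = clamp8(y + ((91881 * v) >> 16));              // y + 1.402 v
  g = clamp8(y - ((22554 * u + 46802 * v) >> 16));  // y - 0.344 u - 0.714 v
  b = clamp8(y + ((116130 * u) >> 16));             // y + 1.772 u
}

// Convert one row of packed UYVY 4:2:2 (4 bytes per 2-pixel macro pixel)
// to interleaved RGB.  Both pixels of a macro pixel reuse the same U and V
// samples -- the "pixel doubling" chroma mapping.  Assumes even ni.
std::vector<std::uint8_t> uyvy_row_to_rgb(const std::uint8_t* uyvy, unsigned ni) {
  std::vector<std::uint8_t> rgb(std::size_t(ni) * 3);
  for (unsigned i = 0; i + 1 < ni; i += 2) {
    const std::uint8_t* m = uyvy + std::size_t(i) * 2;  // macro pixel: U Y0 V Y1
    ycbcr_to_rgb(m[1], m[0], m[2], rgb[i*3],     rgb[i*3 + 1],     rgb[i*3 + 2]);
    ycbcr_to_rgb(m[3], m[0], m[2], rgb[(i+1)*3], rgb[(i+1)*3 + 1], rgb[(i+1)*3 + 2]);
  }
  return rgb;
}
```

An interpolating variant would differ only in how the U and V samples are formed before calling ycbcr_to_rgb, which is exactly the policy choice Ian argues should stay out of a general-purpose image class.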