Rendering Images

2011-03-09
2013-01-26
(Page 1 of 2)
  • Glen Thomas
    2011-03-09

    When I render images to the RenderContext object from the Scan method of the ShowText class, the images render OK.
    But when I render the same images to the RenderContext from the Scan method of the PaintXObject class, they do not.
    The images are the correct height, but they appear as a block of one colour from the original image smeared across the page. I can't work out what is different in the RenderContext between ShowText and PaintXObject. I think they are the same instance of System.Drawing.Graphics, but the properties must be different somehow. I can see that the clipping is different, but I have played with SetClip and the problem still occurs.

    Any ideas what I need to do with the RenderContext for PaintXObject for the images to render properly?

     
  • Glen Thomas
    2011-03-09

    I'm a step closer… Somehow the Transform property of the RenderContext is wrong, which is causing the picture to be drawn incorrectly. When I set the Transform to the same matrix as it has at the point of Scan in the ShowText class, the image is rendered correctly.

    Just need to work out where the transform is changing and what to set it to…

     
  • Glen Thomas
    2011-03-10

    I have found the PrimitiveComposer class, which has a ShowXObject method that looks like it would be used to set up the CTM for the images (and also text), but this isn't being called when I render PDF files with images.

    I've been reading the PDF specification for image objects and I have discovered that images are positioned by a series of cm operations contained within q and Q operators, e.g.:

    q // Save graphics state
       1 0 0 1 100 200 cm // Translate
       0. 7071 0. 7071 -7071 0. 7071 0 0 cm // Rotate
       150 0 0 80 0 0 cm // Scale
       /Image1 Do // Paint image
    Q

    I'm guessing that there are some cm operations taking place as the RenderContext Transform property is changing, although not correctly.
    The specification also notes that performing all of the cm operations as one combined operation will distort the image. Maybe this is what is happening.
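A minimal sketch of how consecutive cm operators accumulate into the current transformation matrix, using the translate, rotate and scale values from the example above. The `Affine` type and its `Concat` and `Apply` methods are hypothetical helpers written for this illustration; they are not part of PDF Clown or System.Drawing. As far as the arithmetic goes, one combined matrix gives the same result as the separate steps provided the order is preserved; it is changing the order that moves or distorts the image.

```csharp
using System;

// Minimal 2D affine matrix [a b c d e f], laid out like the operands of cm.
// A point (x, y) maps to (a*x + c*y + e, b*x + d*y + f).
public readonly struct Affine
{
    public readonly double A, B, C, D, E, F;

    public Affine(double a, double b, double c, double d, double e, double f)
    {
        A = a; B = b; C = c; D = d; E = e; F = f;
    }

    public static readonly Affine Identity = new Affine(1, 0, 0, 1, 0, 0);

    // Effect of "m cm" on the current matrix: CTM' = m x CTM
    // (row-vector convention, so m is applied to points first).
    public Affine Concat(Affine m) => new Affine(
        m.A * A + m.B * C,
        m.A * B + m.B * D,
        m.C * A + m.D * C,
        m.C * B + m.D * D,
        m.E * A + m.F * C + E,
        m.E * B + m.F * D + F);

    // Transforms a point by this matrix.
    public (double X, double Y) Apply(double x, double y) =>
        (A * x + C * y + E, B * x + D * y + F);
}

public static class CtmDemo
{
    // Replays the three cm operators from the q ... Q block in order.
    public static Affine ImageCtm() =>
        Affine.Identity
            .Concat(new Affine(1, 0, 0, 1, 100, 200))                   // translate
            .Concat(new Affine(0.7071, 0.7071, -0.7071, 0.7071, 0, 0))  // rotate
            .Concat(new Affine(150, 0, 0, 80, 0, 0));                   // scale
}
```

With this matrix the image's unit square has its origin corner at (100, 200), and `CtmDemo.ImageCtm().Apply(1, 0)` gives the far end of its bottom edge.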

    When I am debugging the code and step through, I can't see the cm operations being performed, only the Image Do one.

    Do you have any suggestions on what I need to do to fix this, Stefano? I am spending a lot of time on this and making very slow progress, so any info would be useful.

    Thanks.

     
  • Glen Thomas
    2011-03-10

    I have found the cm operation in my test PDF file (which has a JPEG image in the centre of the page):

    cm

    This must be the operation that modifies the Current Transformation Matrix in order to set the image on the page.

    I'm wondering if the part of the library that creates the operation objects combines a series of transformations into a single transformation and I need to split these out…

     
  • Glen Thomas
    2011-03-10

    I think maybe the ContentScanningSample has what I need.

    It scans for images and prints out the x, y coordinates and scaled height and width of the images. This might be enough to draw the images to the page in their correct positions.

     
  • Glen Thomas
    2011-03-10

    FINALLY!

    I am rendering ImageXObjects to the page in their correct positions and sizes. The only problem is that they are back-to-front and upside-down.

    In the scan method of the PaintXObject class I have used the following:

    org.pdfclown.documents.contents.xObjects.XObject xImage = GetXObject(state.Scanner.ContentContext);
    ContentScanner.GraphicsObjectWrapper objectWrapper = state.Scanner.ParentLevel.CurrentWrapper;
    SizeF? imageSize = null; // Image native size.
    if(objectWrapper is ContentScanner.XObjectWrapper)
    {
        ContentScanner.XObjectWrapper xObjectWrapper = (ContentScanner.XObjectWrapper)objectWrapper;
                
        imageSize = xImage.Size; // Image native size.
    }
    else if(objectWrapper is ContentScanner.InlineImageWrapper)
    {
        InlineImage inlineImage = ((ContentScanner.InlineImageWrapper)objectWrapper).InlineImage;
    }
    Image image = new Bitmap(new MemoryStream((xImage as org.pdfclown.documents.contents.xObjects.ImageXObject).BaseDataObject.GetBody(false).ToByteArray()));
    RectangleF box = objectWrapper.Box.Value;
    state.Scanner.RenderContext.ResetTransform();
    state.Scanner.RenderContext.Transform = new System.Drawing.Drawing2D.Matrix(1.0F, 0.0F, 0.0F, -1.0F, 0.0F, 841.92F);
    //state.Scanner.RenderContext.DrawImage(image, new System.Drawing.Point(300, 300));
    state.Scanner.RenderContext.DrawImage(image, box.X, box.Y, box.Width, box.Height);
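A small sketch of what the matrix (1, 0, 0, -1, 0, 841.92) does to a point, which may explain the upside-down result. `YFlip.Apply` is a hypothetical helper for illustration only, and 841.92 is simply the page height hard-coded in the snippet above. Because the transform negates y, everything drawn through it is mirrored vertically, including the rows of the image itself.

```csharp
public static class YFlip
{
    // Applies the matrix [1 0 0 -1 0 pageHeight] to a point:
    // x is unchanged, y is measured from the opposite edge of the page.
    public static (float X, float Y) Apply(float x, float y, float pageHeight) =>
        (x, pageHeight - y);
}
```

For a box drawn at y = 200 with height 80, the row drawn at y = 200 lands at 641.92 and the row at y = 280 lands at 561.92, so the two edges of the image swap places on the output.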
    

    This only seems to work for JPEG images, and of course XObjects, as I haven't put anything in to render inline images.

    Do image formats other than JPEG not get added to the XObject resources when building the content?

     
  • Glen Thomas
    2011-03-10

    Stefano, I do realise that the way I am rendering text and images is poor and falls short of the high standard of the PDF Clown library. I am just looking to make a rough implementation that I can build upon once I understand the correct way of doing things.

    When I have implemented this properly I will submit the code as a contribution, but the state it's in now is not good enough.

     
  • Glen Thomas
    2011-03-10

    This does also work for bitmap images.

    For PNG and GIF, the XObjects do exist in the resources, but the body properties are not right for creating a System.Drawing.Image.

     
  • Glen Thomas
    2011-03-10

    Where would be the best place to implement the Scan method for an InlineImage? Should I create a PaintInlineImage class or something similar?

     
  • Glen Thomas
    2011-03-10

    I have added a Scan method to the InlineImage class.

     
  • Glen Thomas
    2011-03-10

    I'm having trouble working out what is wrong with the Body of the ImageXObject for PNG images.

    In the Header the Filter is null, so I'm expecting that I don't need to decompress anything.

    Maybe I need to remove certain bytes in the array or add some in.

    I see PNG support is on the list of Feature Requests. Have you looked at this at all yet?

     
  • Glen Thomas
    2011-03-10

    I have tidied up the code for rendering JPEG images and they are now the right way round:

    org.pdfclown.documents.contents.xObjects.XObject xImage = GetXObject(state.Scanner.ContentContext);
    ContentScanner.GraphicsObjectWrapper objectWrapper = state.Scanner.ParentLevel.CurrentWrapper;
    ContentScanner.XObjectWrapper xObjectWrapper = (ContentScanner.XObjectWrapper)objectWrapper;
    Image image = new Bitmap(new MemoryStream((xImage as org.pdfclown.documents.contents.xObjects.ImageXObject).BaseDataObject.GetBody(false).ToByteArray()));
    image.RotateFlip(RotateFlipType.Rotate180FlipX);
    RectangleF box = objectWrapper.Box.Value;
    state.Scanner.RenderContext.ResetTransform();
    state.Scanner.RenderContext.Transform = new System.Drawing.Drawing2D.Matrix(1.0F, 0.0F, 0.0F, -1.0F, 0.0F, 841.92F);
    state.Scanner.RenderContext.DrawImage(image, box.X, box.Y, box.Width, box.Height);
    
     
  • Glen Thomas
    2011-03-10

    In the PNG ImageXObject the body is longer than the length given in the header, whereas in the JPEG the body is the same length as in the header. So I'm assuming some processing needs to be done on the PNG body data.

    PNG

    Header
    <<
    Type XObject
    Subtype Image
    Width 827
    Height 1169
    ColorSpace DeviceRGB
    BitsPerComponent 8
    Interpolate False
    Filter null
    Length 59596
    >>

    Body
    byte

    JPEG

    Header
    <<
    Type XObject
    Subtype Image
    Width 586
    Height 1077
    ColorSpace DeviceRGB
    BitsPerComponent 8
    Filter DCTDecode
    Interpolate True
    Length 112813
    >>

    Body
    byte

     
  • AFAIK, PNG images within PDF files need a non-trivial amount of handling for sample decoding and masking.

     
  • Glen Thomas
    2011-03-15

    I've been looking at how to deal with PNG format images but haven't got it solved yet.

    There seems to be something wrong with the body data. The body buffer for the PNG image is about 2.76MB. The original image is only around 25KB and the PDF file is only 137KB.

    I'm not sure why the body is so big. Also, the bytes in the buffer do not appear to be valid data for the image. They mostly (if not all) have values of 255.
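A quick size check, assuming the body being inspected has already been decoded to raw samples rather than being the compressed stream. `ImageSampleMath.DecodedLength` is a hypothetical helper, not a PDF Clown method. For the 827 × 1169, 8-bit DeviceRGB image in the header dump above it gives 2,900,289 bytes (about 2.77 MiB), which is close to the reported buffer size, and a mostly white page would consist largely of 255 values.

```csharp
public static class ImageSampleMath
{
    // Number of bytes occupied by uncompressed image samples.
    // Each row of samples is padded up to a whole number of bytes.
    public static long DecodedLength(int width, int height, int components, int bitsPerComponent)
    {
        long bitsPerRow = (long)width * components * bitsPerComponent;
        long bytesPerRow = (bitsPerRow + 7) / 8;
        return bytesPerRow * height;
    }
}
```

If the body length matches this figure, the data is probably fine and only needs to be interpreted as raw RGB samples instead of as an encoded PNG file.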

     
  • Glen Thomas
    2011-03-16

    I think I'm missing some entries for the dictionary of ImageXObject.

    I think for processing PNG, etc. I will need the 'Decode' dictionary entry, but this isn't currently in the header of the ImageXObject. Do the dictionary entries need to be defined in PDF Clown in order for them to be read from the headers of the PDF file's XObjects, or are all dictionary entries created based on what's available?

    Not a very clear question, but I don't know how better to word it.

     
  • Glen Thomas
    2011-03-16

    I can see there is a Decode property defined in the PDFName class, but it's not in the ImageXObject header.

    From the PDF specification I can see that Decode is optional, so maybe it just doesn't exist for the images I'm trying to extract.

     
  • Glen Thomas
    2011-03-17

    I have now tracked down the source of the corrupt Body data, but I can't work out why it's happening.

    In the PDFIndirectObject class, line 238, the PDFStream is created:

    dataObject = parser.ParsePdfObject(4);

    The PDFStream object created by parser.ParsePdfObject is ok for all types of images up to the point where it is put into the dataObject variable.

    When this is a JPEG or bitmap data object, the result of parser.ParsePdfObject is placed in dataObject and all is OK. But when the PDFStream is from an unsupported image type, the body somehow changes. I can't understand why this is happening. The type of data in the body is unknown to the application at this point. It should just see it as raw data and pass it across like any other.
    The process of creating the PDFStream seems to be the same for all image types, but the body is becoming corrupted somehow for certain streams (its size increases).

    The header also changes, as the Filter value changes from FlateDecode to null.

    Do you have any ideas on this? Help would be appreciated. I think once I have sorted this bit I will be able to deal with the image formats without too much difficulty as I have done quite a bit of research now.

     
  • Glen Thomas
    2011-03-17

    I can see that the GetBody and WriteTo methods of the PdfStream class would cause the body data and the Filter entry in the header to be modified in the way I am experiencing, but I can't see any calls to those methods in the process that I am following…

     
  • Glen Thomas
    2011-03-17

    I have found where GetBody is called from. There is a second Parser class that calls this method during the loading of the Contents.

    It seems that there could be something wrong with the FlateFilter decode.

     
  • Glen Thomas
    2011-03-17

    Ignore all of my ramblings so far. I have been so lost in code I didn't know what was going on. I am now rendering images to the page, including PNG and GIF, but I need to work on the PNG images, as some are skewed and some that have transparency are not working correctly.

    I expect I still have some way to go yet…

     
  • Glen Thomas
    2011-03-18

    I am now a step further with PNG. I think the images that were coming out twisted were caused by missing padding bytes: bitmap rows in memory are padded so that the row length is a multiple of 4 bytes, which makes operations on the image more efficient. System.Drawing.Image was not adding the padding bytes for me, so each row of source data was too short.

    My rough code for this is currently:

    // Requires: using System.Drawing; using System.Drawing.Imaging; using System.Runtime.InteropServices;
    // Get the decoded image samples from the XObject body
    byte[] imageData = (xImage as org.pdfclown.documents.contents.xObjects.ImageXObject).BaseDataObject.GetBody(true).ToByteArray();
    // Swap the byte order from RGB (as stored in the PDF) to BGR (24bpp bitmap layout)
    byte[] bgr = new byte[imageData.Length];
    for (int i = 0; i + 2 < imageData.Length; i += 3)
    {
        bgr[i] = imageData[i + 2];
        bgr[i + 1] = imageData[i + 1];
        bgr[i + 2] = imageData[i];
    }
    // Create a Bitmap object to insert the image data into
    int width = (int)xObjectWrapper.XObject.Size.Width;
    int height = (int)xObjectWrapper.XObject.Size.Height;
    Bitmap bmp = new Bitmap(width, height, PixelFormat.Format24bppRgb);
    BitmapData bmd = bmp.LockBits(new System.Drawing.Rectangle(0, 0, width, height), ImageLockMode.WriteOnly, PixelFormat.Format24bppRgb);
    int srcPos = 0;
    for (int y = 0; y < bmd.Height; y++) // for each row of the image
    {
        // Pointer to the start of the row in memory; Stride includes the padding bytes.
        // IntPtr.Add avoids the overflow that Scan0.ToInt32() can cause in a 64-bit process.
        System.IntPtr row = System.IntPtr.Add(bmd.Scan0, y * bmd.Stride);
        Marshal.Copy(bgr, srcPos, row, bmd.Width * 3); // copy one row of source data
        srcPos += bmd.Width * 3; // advance to the next row of source data
    }
    bmp.UnlockBits(bmd);
    Image image = bmp;
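To make the padding arithmetic concrete, here is a sketch that does the same row conversion on plain byte arrays, without System.Drawing. `RowPadding`, `Stride` and `PackedRgbToPaddedBgr` are hypothetical names for this illustration. It assumes 8-bit RGB samples packed with no gaps between rows, and rows in the target buffer padded to a multiple of 4 bytes.

```csharp
using System;

public static class RowPadding
{
    // Bytes per row when each row is padded up to a multiple of 4 bytes.
    public static int Stride(int width, int bitsPerPixel) =>
        ((width * bitsPerPixel + 31) / 32) * 4;

    // Converts tightly packed RGB rows into 4-byte-aligned BGR rows.
    public static byte[] PackedRgbToPaddedBgr(byte[] rgb, int width, int height)
    {
        int srcRow = width * 3;
        if (rgb.Length < srcRow * height)
            throw new ArgumentException("Sample buffer is smaller than width * height * 3.");

        int stride = Stride(width, 24);
        byte[] dst = new byte[stride * height]; // padding bytes stay zero
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                int s = y * srcRow + x * 3;
                int d = y * stride + x * 3;
                dst[d] = rgb[s + 2];     // blue
                dst[d + 1] = rgb[s + 1]; // green
                dst[d + 2] = rgb[s];     // red
            }
        }
        return dst;
    }
}
```

For a width of 827 pixels the packed row is 2481 bytes while the padded row is 2484, so copying the data as one contiguous block shifts every row by a further 3 bytes, which would produce the kind of skew described above. A width of 340 needs no padding, since 1020 is already a multiple of 4.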
    

    I still haven't sorted the issue with transparency. In the header of the PNG ImageXObject there is an SMask property with a value that seems to be pointing to another XObject. Having read some of the PDF specification it seems there should be another ImageXObject that is used as a mask in order to create the transparency. BUT, the SMask XObject that the SMask property points to is not in the Resources.
    Once I find the SMask image hopefully it won't be too difficult to apply it to the main image and then my PNG trouble should be sorted.

     
  • Glen Thomas
    2011-03-18

    Looking at the PDF contents in Notepad, I can see the SMask XObjects.

    Here is the header of the main PNG image XObject. As you can see, its SMask property points to XObject 8 0:

    7 0 obj
    <</Type/XObject/Subtype/Image/Width 340/Height 332/ColorSpace/DeviceRGB/BitsPerComponent 8/Interpolate false/SMask 8 0 R/Filter/FlateDecode/Length 11464>>

    Here is the header of the corresponding SMask XObject:

    8 0 obj
    <</Type/XObject/Subtype/Image/Width 340/Height 332/ColorSpace/DeviceGray/Matte /BitsPerComponent 8/Interpolate false/Filter/FlateDecode/Length 1780>>

    At the moment I can't see why this SMask XObject isn't created by PDF Clown. Or, if it is, I can't find its instance anywhere.

    I'm digging through the code trying to see why the object isn't generated. Let me know if you know why this would be happening.

     
  • Well, have you taken a quick tour of the "architecture" chapter of the User Guide? It may help you to get the big picture of the way a PDF file is represented within the PDF Clown object model.

    Any PDF file has a collection of indirect objects (IndirectObjects class) which encompasses ALL the top-level objects contained in that file; xobjects (i.e. external objects) are by definition top-level objects, so when you need to retrieve one of them it's just a matter of accessing the collection through its indexer, like this:

    int mySMaskObjectNumber = 8;
    PdfIndirectObject mySMaskIndirectObject = file.IndirectObjects[mySMaskObjectNumber];
    PdfDataObject mySMaskDataObject = mySMaskIndirectObject.DataObject;
    

    If you have a PdfReference which points to the indirect object you're looking for (such as the one exposed by the PNG ImageXObject for SMask), you can also do this:

    PdfDataObject mySMaskDataObject = myImageXObject.BaseDataObject.Header.Resolve(new PdfName("SMask"));
    

    The IndirectObjects collection is global (within each PDF file, obviously): you don't have to wander around the resources or anywhere else!

     
  • Glen Thomas
    2011-03-21

    Thanks Stefano. I have now got the SMask object using
    myImageXObject.BaseDataObject.Header.Resolve(new PdfName("SMask"));

    I hadn't read the user guide at all! That will be very useful.

    Now I can try and work out how to use the SMask to add the transparency to the PNGs.
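One possible way to apply the mask, sketched on plain byte arrays: interleave the decoded RGB samples with the decoded SMask samples to get 32-bit pixels with alpha. `SoftMaskSketch.ToBgra` is a hypothetical helper, not a PDF Clown or System.Drawing call. It assumes both streams are already decoded, have the same width and height, and use 8 bits per component (DeviceRGB for the image, DeviceGray for the mask). It ignores any Matte or Decode entries, which may need extra handling.

```csharp
using System;

public static class SoftMaskSketch
{
    // Combines packed RGB samples with 8-bit gray mask samples into
    // B, G, R, A byte order, the in-memory layout of a 32bpp ARGB bitmap.
    public static byte[] ToBgra(byte[] rgb, byte[] alpha, int width, int height)
    {
        int pixels = width * height;
        if (rgb.Length < pixels * 3 || alpha.Length < pixels)
            throw new ArgumentException("Sample buffers are smaller than width * height.");

        byte[] bgra = new byte[pixels * 4];
        for (int p = 0; p < pixels; p++)
        {
            bgra[p * 4] = rgb[p * 3 + 2];     // blue
            bgra[p * 4 + 1] = rgb[p * 3 + 1]; // green
            bgra[p * 4 + 2] = rgb[p * 3];     // red
            bgra[p * 4 + 3] = alpha[p];       // opacity taken from the mask
        }
        return bgra;
    }
}
```

Rows of 32-bit pixels are always a multiple of 4 bytes long, so the result could be copied into a Format32bppArgb bitmap without the row padding needed for 24-bit data.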

    Once I've done that I'm going to look at TIFF images.

     