Hello everyone,
I have a PDF file with text and images, say product descriptions and photos.
Some photos consist of several adjacent images.
I want to extract the product photos and, for each product, join all the photo parts into one image.
To begin with, I need to know which images are adjacent.
I studied the sample source code and saw that I can read the image size (height, width), but I haven't found how to read an image's position on the page.
Could someone give me a clue on how to get image positions from an existing PDF file?
Thanks,
Michael
It seems I have found a solution for ImageXObject in the ContentScanningSample sample:
if (xObject is xObjects::ImageXObject)
{
  Console.WriteLine(
    "Image '" + xObjectKey + "' (" + xObject.BaseObject + ") " // Image key and indirect reference.
    + "on page " + (page.Index + 1) + " (" + page.BaseObject + ")" // Page index and indirect reference.
    );
  // Get the coordinates of the image!
  double[] ctm = level.State.CTM; // Current transformation matrix [a b c d e f].
  SizeF imageSize = xObject.Size; // Image native size.
  Console.WriteLine(" Coordinates:");
  Console.WriteLine(" x: " + Math.Round(ctm[4])); // Horizontal offset (e).
  Console.WriteLine(" y: " + Math.Round(page.Size.Value.Height - ctm[5])); // Vertical offset (f), measured from the top of the page.
  // Width and height are read from the scale components, which is valid only for images that are not rotated or skewed.
  Console.WriteLine(" width: " + Math.Round(ctm[0]) + " (native: " + Math.Round(imageSize.Width) + ")");
  Console.WriteLine(" height: " + Math.Round(Math.Abs(ctm[3])) + " (native: " + Math.Round(imageSize.Height) + ")");
}
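Once the CTM of every image is known, deciding which images are adjacent is plain rectangle arithmetic. Below is a minimal sketch of how that could look. It assumes only that each CTM arrives as a six-element array [a b c d e f], as in the snippet above. ImageBox, ImageAdjacency and the tolerance parameter are hypothetical names of mine, not part of PDF Clown. An image is painted into the unit square, so transforming its four corners by the CTM gives a bounding box that also holds for flipped images (negative d).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper type: bounding box of an image in PDF user space
// (origin at the bottom-left corner of the page).
public readonly struct ImageBox
{
    public readonly string Key;
    public readonly double Left, Bottom, Right, Top;

    public ImageBox(string key, double left, double bottom, double right, double top)
    {
        Key = key; Left = left; Bottom = bottom; Right = right; Top = top;
    }

    // Maps the corners of the unit square through the CTM [a b c d e f]
    // and takes the extremes, so negative scales are handled too.
    public static ImageBox FromCtm(string key, double[] ctm)
    {
        double a = ctm[0], b = ctm[1], c = ctm[2], d = ctm[3], e = ctm[4], f = ctm[5];
        double[] xs = { e, a + e, c + e, a + c + e };
        double[] ys = { f, b + f, d + f, b + d + f };
        return new ImageBox(key, xs.Min(), ys.Min(), xs.Max(), ys.Max());
    }
}

// Hypothetical helper class: adjacency test and grouping of image parts.
public static class ImageAdjacency
{
    // Two boxes are adjacent when one pair of opposite edges coincides
    // (within the tolerance) and the boxes overlap along the other axis.
    public static bool AreAdjacent(ImageBox p, ImageBox q, double tolerance = 1.0)
    {
        bool overlapX = Math.Min(p.Right, q.Right) - Math.Max(p.Left, q.Left) > tolerance;
        bool overlapY = Math.Min(p.Top, q.Top) - Math.Max(p.Bottom, q.Bottom) > tolerance;
        bool touchX = Math.Abs(p.Right - q.Left) <= tolerance || Math.Abs(q.Right - p.Left) <= tolerance;
        bool touchY = Math.Abs(p.Top - q.Bottom) <= tolerance || Math.Abs(q.Top - p.Bottom) <= tolerance;
        return (touchX && overlapY) || (touchY && overlapX);
    }

    // Collects boxes into groups of mutually connected parts (simple union-find).
    public static List<List<ImageBox>> Group(IReadOnlyList<ImageBox> boxes, double tolerance = 1.0)
    {
        var parent = new int[boxes.Count];
        for (int i = 0; i < parent.Length; i++) parent[i] = i;

        int Find(int i)
        {
            while (parent[i] != i) i = parent[i];
            return i;
        }

        for (int i = 0; i < boxes.Count; i++)
            for (int j = i + 1; j < boxes.Count; j++)
                if (AreAdjacent(boxes[i], boxes[j], tolerance))
                    parent[Find(j)] = Find(i);

        var groups = new Dictionary<int, List<ImageBox>>();
        for (int i = 0; i < boxes.Count; i++)
        {
            int root = Find(i);
            if (!groups.TryGetValue(root, out var list))
                groups[root] = list = new List<ImageBox>();
            list.Add(boxes[i]);
        }
        return groups.Values.ToList();
    }
}
```

To use it, I would build one ImageBox per image with ImageBox.FromCtm(xObjectKey, ctm) while scanning a page, then call ImageAdjacency.Group on the collected list; each resulting group should correspond to one product photo. The one-point tolerance is a guess and probably needs tuning for a particular file.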