
#69 keep image depth when running unpaper

Status: closed
Owner: nobody
Labels: None
Priority: 5
Updated: 2012-04-11
Created: 2008-11-24
Private: No

Hi!

I noticed that a b/w image is turned into a grayscale image when running unpaper on it.

This is what I did:
1. Import a b/w pdf
2. Export it as DjVu: The use of cjb2 indicates that gscan2pdf treats it as a b/w image (I probably could have found this out more easily, but I didn't know how)
3. Run unpaper on it
4. Export it again as DjVu: The use of c44 indicates that gscan2pdf treats it as a grayscale or color image.

This is what I suppose to be the reason for it:
By default, unpaper outputs what it (thinks it) gets: if you pass it a b/w image, it outputs b/w; if you pass it a grayscale image, it outputs grayscale. This seems to be determined by the file extension: pbm is b/w, pgm is grayscale.
gscan2pdf, on the other hand, stores its intermediate files as pnms and keeps the image depth only in its internal metadata (at least, this is what I think I found out). unpaper therefore doesn't know the image depth and treats the image as grayscale.

How this could be solved:
I think the issue could be solved by one or more of the following:
1. use .pbm as temp file extension for b/w images
2. pass unpaper an option "--type pbm" to get out a b/w image in pbm format
3. pass unpaper an option "--depth 1" to get out a b/w image
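
Option 1 above could be sketched roughly as follows. This is only an illustration in Python, not gscan2pdf's actual (Perl) code; the function names `netpbm_suffix` and `temp_name` are hypothetical, and the sketch assumes the application already knows the depth and channel count from its own metadata.

```python
def netpbm_suffix(depth: int, channels: int = 1) -> str:
    """Pick the specific Netpbm extension for an image.

    depth    -- bits per sample (1 for b/w)
    channels -- 1 for b/w or grayscale, 3 for colour
    """
    if channels == 3:
        return ".ppm"   # colour pixmap
    if depth == 1:
        return ".pbm"   # bilevel bitmap
    return ".pgm"       # grayscale graymap


def temp_name(stem: str, depth: int, channels: int = 1) -> str:
    """Build a temp file name whose extension reflects the image type,
    instead of the generic .pnm (hypothetical helper)."""
    return stem + netpbm_suffix(depth, channels)


print(temp_name("page1", depth=1))  # b/w page gets a .pbm name
```

With a specific extension, a tool that decides by file name no longer has to guess what a generic .pnm contains.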

Keep up the good work!

Discussion

1 2 > >> (Page 1 of 2)
  • Frederik Elwert

    Frederik Elwert - 2008-12-01

    A preliminary patch that should fix the issue. These are my first lines in perl at all, so please don't be too harsh if I made a silly mistake. :-)

     
  • Frederik Elwert

    Frederik Elwert - 2008-12-03

    I don't know if notifications were sent when I added the patch. Could anyone have a look at it and improve and/or commit it? I did use gscan2pdf with this patch applied to clean up some of my scanned documents, but I don't know if I got all corner cases right.

     
  • Jeffrey Ratcliffe

    Thanks for the patch.

    This would be a bug in unpaper, but I can't reproduce it. Can you attach a PDF which demonstrates it?

    Which version of unpaper are you using?

     
  • Frederik Elwert

    Frederik Elwert - 2008-12-04

    A DjVu test file

     
  • Frederik Elwert

    Frederik Elwert - 2008-12-04

    I added a .djvu file that shows the problem. You can reproduce it by doing the following:

    1) Import it. Export it.
    2) Unpaper it. Export it.

    The file resulting from the second export is much bigger because it's now grayscale, not b/w.

    I'm using unpaper 0.2.

     
  • Jeffrey Ratcliffe

    The problem was not unpaper but imagemagick not keeping the depth when converting from TIFF to PNM. Your patch, however, was nonetheless the correct approach and I applied it almost unaltered. There are no corner cases, as the resulting PBM is in any case deleted as soon as unpaper has processed it.

    Thanks once again.

     
  • Jeffrey Ratcliffe

    • status: open --> closed
     
  • Frederik Elwert

    Frederik Elwert - 2008-12-10

    Updated patch respecting the depth of pnm files, too

     
  • Frederik Elwert

    Frederik Elwert - 2008-12-10

    Hm, I found a case where the fix fails: When the temporary image gs2p uses is a pnm instead of a tif, the depth is not tested, and unpaper still converts it to grayscale. So it seems that even b/w pnms have to be explicitly converted to pbms for unpaper to output a b/w image.

    I updated the patch and hope it will provide a more robust solution.
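
As an illustration of testing the depth of a generic .pnm file: the Netpbm formats identify themselves by a two-byte magic number (P1/P4 for bitmaps, P2/P5 for graymaps, P3/P6 for pixmaps), so the real type can be read from the file itself rather than from its name. The sketch below is Python and is not the attached patch, which checks the depth with Image::Magick; the names `suffix_for_header` and `sniff_netpbm_suffix` are hypothetical.

```python
# Netpbm magic numbers: plain (ASCII) and raw (binary) variants of each type.
MAGIC_TO_SUFFIX = {
    b"P1": ".pbm", b"P4": ".pbm",   # bilevel bitmap
    b"P2": ".pgm", b"P5": ".pgm",   # grayscale graymap
    b"P3": ".ppm", b"P6": ".ppm",   # colour pixmap
}


def suffix_for_header(header: bytes) -> str:
    """Map the first two bytes of a Netpbm file to its specific extension."""
    magic = header[:2]
    if magic not in MAGIC_TO_SUFFIX:
        raise ValueError(f"not a Netpbm header: {magic!r}")
    return MAGIC_TO_SUFFIX[magic]


def sniff_netpbm_suffix(path: str) -> str:
    """Read the magic number from a file on disk and return its extension."""
    with open(path, "rb") as fh:
        return suffix_for_header(fh.read(2))
```

A b/w temp file stored as something.pnm would then come back as ".pbm" and could be renamed or copied accordingly before it is handed to unpaper.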

     
  • Frederik Elwert

    Frederik Elwert - 2008-12-10
    • status: closed --> open
     
  • Jeffrey Ratcliffe

    Can you please also provide a .pnm that reproduces the problem.

     
  • Frederik Elwert

    Frederik Elwert - 2008-12-11

    Example files: a pnm (before running unpaper) and a log from the gs2p run

     
  • Frederik Elwert

    Frederik Elwert - 2008-12-11

    I added an archive containing a pnm example file and the log from the gs2p (unpatched) run.

    The pnm was generated by importing a pdf file. I noticed one oddity with a couple of pdf files: after importing, they are inverted, so I had to run the invert filter in order to get a usable output. Should I file a new bug for this?

     
  • Jeffrey Ratcliffe

    • status: open --> closed
     
  • Jeffrey Ratcliffe

    Thanks for the info. However -

    $ identify before_unpaper.pnm
    before_unpaper.pnm PNM 3508x2481 3508x2481+0+0 DirectClass 8-bit 24.9005mb 0.640u 0:02

    i.e. the image was 8-bit before unpaper saw it - and was imported as such from the PDF.

    That some PDFs are imported inverted is a bug in poppler and xpdf
    https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/134313

     
  • Frederik Elwert

    Frederik Elwert - 2008-12-11
    • status: closed --> open
     
  • Frederik Elwert

    Frederik Elwert - 2008-12-11

    Ok, I did a bit more research: The image was imported as 1-bit:

    $ identify VNq1NeRMn8.pnm
    VNq1NeRMn8.pnm PNM 3508x2481 3508x2481+0+0 PseudoClass 2c 1-bit 1.03872mb

    Then, inverting turned it into 8-bit:

    $ identify T1kkHNPLo6.pnm
    T1kkHNPLo6.pnm PNM 3508x2481 3508x2481+0+0 DirectClass 8-bit 24.9005mb 0.770u 0:02

    This would explain why unpaper also outputs it as 8-bit. But it does not explain why

    * exporting the inverted image as DjVu uses cjb2, but exporting the unpapered image uses c44
    * my updated patch works: Checking for depth with Image::Magick returns 1 for the inverted image

    So maybe at this point the real problem is that inverting doesn't preserve the image depth? If I understand the negate function correctly, it should ensure that the image depth is preserved, so I have no clue why it doesn't work. I attach my pdf test file, so maybe you can find out something relevant.
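
The file sizes in the two identify outputs above can be checked with a little arithmetic. The Python sketch below assumes raw (binary) Netpbm rasters, rows padded to whole bytes, and that identify's "mb" means 2^20 bytes; the helper name `raw_raster_bytes` is hypothetical. Under those assumptions, the 24.9005mb figure corresponds to three 8-bit samples per pixel rather than one, which suggests the 8-bit file is a colour pixmap and not a single-channel graymap.

```python
WIDTH, HEIGHT = 3508, 2481  # dimensions reported by identify
MIB = 1024 * 1024           # assumed unit behind identify's "mb"


def raw_raster_bytes(width, height, bits_per_sample, channels):
    """Raster bytes of a raw Netpbm image, header excluded.

    Each row is padded up to a whole number of bytes.
    """
    row_bytes = (width * bits_per_sample * channels + 7) // 8
    return row_bytes * height


bilevel = raw_raster_bytes(WIDTH, HEIGHT, 1, 1)  # 1-bit b/w
gray8 = raw_raster_bytes(WIDTH, HEIGHT, 8, 1)    # 8-bit, one channel
rgb8 = raw_raster_bytes(WIDTH, HEIGHT, 8, 3)     # 8-bit, three channels

for label, size in (("1-bit", bilevel), ("8-bit gray", gray8), ("8-bit RGB", rgb8)):
    print(f"{label}: {size} bytes = {size / MIB:.4f} MiB")
```

The small remaining difference from the reported 1.03872mb is in line with a text header of a few bytes in front of the raster.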

     
  • Frederik Elwert

    Frederik Elwert - 2008-12-11

    PDF test file

     
  • Jeffrey Ratcliffe

    • status: open --> closed
     
  • Jeffrey Ratcliffe

    Thanks for the info. However -

    $ identify before_unpaper.pnm
    before_unpaper.pnm PNM 3508x2481 3508x2481+0+0 DirectClass 8-bit 24.9005mb 0.640u 0:02

    i.e. the image was 8-bit before unpaper saw it - and was imported as such from the PDF.

    That some PDFs are imported inverted is a bug in poppler and xpdf
    https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/134313

     
  • Jeffrey Ratcliffe

    Apologies for the previous duplicated comment - injudicious use of the back button.

    I get the same warning from pdfimages (v3.00) that the PDF is corrupt, but here, it is not inverted.

    Inverting it, I still get a 1-bit image.

     
  • Frederik Elwert

    Frederik Elwert - 2008-12-11
    • status: closed --> open
     
  • Frederik Elwert

    Frederik Elwert - 2008-12-11

    Hm, that's strange. For me, negating the image turns it into 8-bit. I used this small test script on the image extracted by pdfimages (3.00):

    use Image::Magick;
    my $image = Image::Magick->new;
    $image->Read("x-000.pbm");
    $image->Negate;
    $image->Write(depth => 1, filename => "test.pnm");

    The images are identified as follows:

    $ identify x-000.pbm
    x-000.pbm PNM 3508x2481 3508x2481+0+0 PseudoClass 2c 1-bit 1.03872mb 0.590u 0:02
    $ identify test.pnm
    test.pnm PNM 3508x2481 3508x2481+0+0 DirectClass 8-bit 24.9005mb 1.360u 0:02

    Maybe one should explicitly set the file name extension to ".pbm" for b/w images? Imagemagick doesn't seem to respect the depth when writing .pnm files.

     
  • Jeffrey Ratcliffe

    I've just tried this on a stock Intrepid machine. The PDF is imported inverted (pdfimages 3.00). After negating (ImageMagick 6.3.7) it is still 1 bit. unpaper, with double format and output-pages=2, gives me 1 page 1-bit, and the other 8-bit.

    I will investigate.

     
  • Jeffrey Ratcliffe

    I can no longer reproduce this. Can you?

     