
#287 "ddjvu -mode=black" loses raster images within a vector doc

Group: djvulibre
Status: closed
Owner: nobody
Labels: None
Priority: 5
Updated: 2018-02-11
Created: 2018-02-10
Creator: free Buju
Private: No

Images are being dropped by ddjvu when the source document is a mix of vector graphics and raster images. All raster images are being dropped when -mode=black is used.

For the reproducer, the ultimate source file is LaTeX. Here's a sample:

\documentclass{minimal}
\usepackage{graphicx}
\begin{document}
Basic math problem:\\

\includegraphics[width=0.8\textwidth]{find_x.jpg}
\end{document}

In this case, there is a "find_x.jpg" file, but this can be any image. The LaTeX compiles into a PDF that renders properly. Then the PDF is converted to a djvu file using this command:

pdf2djvu -o foo.djvu foo.pdf

That djvu file also renders correctly. Then this command is used to rasterize the whole thing:

ddjvu -mode=black -format=tiff -aspect=no -size=1734x2156 foo.djvu foo.tiff

Those CLI options are chosen to produce an image for faxing. The problematic option here is -mode=black. If the mode is not altered, ddjvu does the correct thing. I've attached the djvu file so people need not use LaTeX to reproduce. Note that it also fails if the output format is PDF. (I did not try ppm or others.)

This is the latest Debian Stretch version (DjVuLibre-3.5).
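
The -size=1734x2156 value in the ddjvu command above appears consistent with a US Letter page at the nominal "fine" fax resolution of 204 x 196 dpi. That reading is an assumption, not something stated in the report. Because the horizontal and vertical resolutions differ, the pixels are not square, which would also explain -aspect=no. A minimal sketch of the arithmetic (the function and constant names are hypothetical):

```python
# Assumption: nominal "fine" fax resolution of 204 x 196 dpi
# (horizontal x vertical) and a US Letter page (8.5 x 11 inches).
FAX_FINE_DPI = (204, 196)
LETTER_INCHES = (8.5, 11.0)


def fax_pixel_size(paper_inches=LETTER_INCHES, dpi=FAX_FINE_DPI):
    """Return (width, height) in pixels for a page at the given resolution."""
    width_in, height_in = paper_inches
    x_dpi, y_dpi = dpi
    return round(width_in * x_dpi), round(height_in * y_dpi)


width, height = fax_pixel_size()
print(f"-size={width}x{height}")  # prints -size=1734x2156
```

With square pixels the same page would come out at a different aspect ratio (for example 1700x2200 at 200 dpi), so a renderer asked for exactly 1734x2156 has to stretch the page slightly.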

1 Attachment: findx.djvu (24.5 kB; image/vnd.djvu)

Discussion

  • free Buju

    free Buju - 2018-02-10

    pbm output is also broken in the above scenario

     
    • Leon Bottou

      Leon Bottou - 2018-02-10

      This is the intended operation of --mode=black.

      This mode only prints the foreground stencil and discards all background information.

      The ddjvu doc is not very clear about this. It needs updating.

      If you want to produce a bitonal image for faxing, best is to output to a pbm file.

      Another option is to output in normal mode to a tiff image and use “tiffcp” to recode it with a fax codec.

      L.


  • free Buju

    free Buju - 2018-02-10

    I just wrote a lengthy reply and SourceForge threw it away as a false detection for spam. I may try again later.

     
  • free Buju

    free Buju - 2018-02-10

    will write in small pieces.

     
  • free Buju

    free Buju - 2018-02-10

    > This is the intended operation of --mode=black.
    > This mode only prints the foreground stencil and discards all background information.
    > The ddjvu doc is not very clear about this. It needs updating.

    Modes are:

    • black
    • color
    • mask
    • foreground
    • background

    To make "black" mean "foreground" (and not the opposite of color) is very far from clear. It fails the rule of least astonishment.

     
    • Leon Bottou

      Leon Bottou - 2018-02-10

      Since I wrote all this almost twenty years ago, I checked exactly what these modes do:

      --mode=foreground prints the foreground with its colors.

      --mode=mask prints the foreground stencil (i.e. the pixels that belong to the foreground are shown in black regardless of their color.)

      When the image does not have a foreground (an IW44 image), these two modes return a blank page.

      --mode=black is almost the same as --mode=mask. However, instead of returning a blank page for an IW44 image, it decodes the IW44 image normally (in colors).

      I would not say this is very useful. Somebody must have asked for it and I caved in…

      L.
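
The mode behaviour described above can be summarised as a small lookup. This is only a toy model of the description in this thread, not ddjvu's implementation: the `Page` class and `rendered` function are hypothetical names, and the entries for "color" and "background" are assumptions inferred from the mode names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    # False models a plain IW44 image with no foreground layer.
    has_foreground: bool


def rendered(mode: str, page: Page) -> str:
    """Describe what each mode shows, per the description in this thread."""
    if mode == "color":
        return "all layers in color"  # assumption: default rendering
    if mode == "background":
        return "background layer only"  # assumption from the mode name
    if mode == "foreground":
        return "foreground in its colors" if page.has_foreground else "blank page"
    if mode == "mask":
        return "foreground stencil in black" if page.has_foreground else "blank page"
    if mode == "black":
        # Same as "mask", except that a foreground-less IW44 image is decoded.
        if page.has_foreground:
            return "foreground stencil in black"
        return "IW44 image in color"
    raise ValueError(f"unknown mode: {mode}")


compound = Page(has_foreground=True)    # e.g. text plus an embedded photo
photo_only = Page(has_foreground=False)
for mode in ("color", "black", "mask", "foreground", "background"):
    print(mode, "|", rendered(mode, compound), "|", rendered(mode, photo_only))
```

Under this model, a page that mixes text with a photo has a foreground, so "black" shows only the stencil and the photo, which lives in the background, is not drawn. That matches the behaviour reported in this ticket.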


  • free Buju

    free Buju - 2018-02-10

    The source data, in LaTeX, has no concept of foreground or background AFAIK. I think LaTeX supports infinite layers. Is it pdf2djvu that assigned objects to foreground and background? Can that be influenced so everything is in the foreground?

     
    • Leon Bottou

      Leon Bottou - 2018-02-10

      > The source data, in LaTeX, has no concept of foreground or background

      But djvu does. It classifies all image components as foreground or background in order to use different codecs. These modes are merely ways to inspect and sometimes leverage the internal sausage-making of djvu.

      Did you consider http://netpbm.sourceforge.net/doc//pnmquant.html in Floyd-Steinberg mode to binarize a color image?

      L.
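
For readers unfamiliar with the technique named above, here is a minimal pure-Python sketch of Floyd-Steinberg error diffusion on a grayscale image. It does not reproduce what the netpbm tools do internally; the function name, the list-of-rows input format, and the threshold of 128 are assumptions made for illustration.

```python
def floyd_steinberg(gray, threshold=128):
    """Binarize rows of 0..255 gray values; returns rows of 0 or 255."""
    height = len(gray)
    width = len(gray[0]) if height else 0
    # Work on a float copy so the diffused error never touches the input.
    work = [[float(v) for v in row] for row in gray]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 255 if old >= threshold else 0
            out[y][x] = new
            err = old - new
            # Push the quantization error onto not-yet-visited neighbours
            # using the classic 7/16, 3/16, 5/16, 1/16 weights.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out


mid_gray = [[127] * 8 for _ in range(8)]
for row in floyd_steinberg(mid_gray):
    print("".join("#" if v == 0 else "." for v in row))
```

A flat mid-gray input comes out as a mix of black and white pixels rather than a solid block, which is why dithering tends to preserve photographs better than a plain threshold when the target is a bitonal fax image.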

       
      • Leon Bottou

        Leon Bottou - 2018-02-10

        G3 and G4 are the compression methods used by fax machines.

        TIFF class F is just a specification of the TIFF options one should use to easily ship it via FAX. See http://cool.conservation-us.org/bytopic/imaging/std/tiff-f.html.

        Not only must you use G3 (not G4), but there are also lots of other little constraints…

        L.


      • free Buju

        free Buju - 2018-02-10

        I wasn't familiar with netpbm. But I would think that the bileveling and the rasterization from a vector image would have to be done in the same step, considering color images seem to need fewer pixels than bitonal images for a given quality. I'm not sure though. However, it's the rasterization that drove me to use ddjvu, which it does in good quality. When I use ImageMagick to rasterize a PDF, it's very poor quality.

         
  • free Buju

    free Buju - 2018-02-10

    I suggest renaming "color" to "whole_image" or "all_layers" and "black" to "foreground stencil". It would also be useful to know the difference between the stencil and the "mask".

     
  • free Buju

    free Buju - 2018-02-10

    > If you want to produce a bitonal image for faxing, best is to output to a pbm file.

    Indeed I've switched to pbm, but that doesn't solve the problem.

     
  • free Buju

    free Buju - 2018-02-10

    > Another option is to output in normal mode to a tiff image and use “tiffcp” to recode it with a fax codec.

    What I've been doing is running ImageMagick on the output of ddjvu to convert to a raw "fax" format, then using fax2tiff -f to convert to a Class-F TIFF, which is specifically designed for faxing. When I look at the tiffcp manpage, I see nothing about Class-F. Is fax encoding implied when using group 3/4 compression? Because for other tools, group 3 and 4 compression is a lossless compression like zip that can be used on anything.

     
  • Leon Bottou

    Leon Bottou - 2018-02-11
    • status: open --> closed
     
  • Leon Bottou

    Leon Bottou - 2018-02-11

    > I wasn't familiar with netpbm. But I would think that the bileveling and the rasterization from a vector image would have to be done in the same step, considering color images seem to need fewer pixels than bitonal images for a given quality. I'm not sure though. However, it's the rasterization that drove me to use ddjvu, which it does in good quality. When I use ImageMagick to rasterize a PDF, it's very poor quality.

    Do you mean that you are using gsdjvu/djvudigital to rasterize into a djvu file in the hope of converting it to a bitonal fax? If you use djvudigital, then you can use the --dryrun option to see how it invokes ghostscript. It probably does it much more carefully than imagemagick. Then you can experiment with the ghostscript output devices that deal with fax formats. See https://www.ghostscript.com/doc/9.21/Devices.htm . Djvudigital works by implementing a complicated ghostscript device that separates foreground and background for optimal compression. But you don’t need that.

    You probably want something like:

    gs -q -dNOPAUSE -dBATCH -sDEVICE=tiffg3 -r200 -sPAPERSIZE=letter -dFIXEDMEDIA -sOutputFile=<dest.tiff> <src.ps-src.pdf>

    See https://www.ghostscript.com/doc/9.21/Use.htm . There are lots of options…

    L.
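
If the conversion is scripted, the command above could be assembled along these lines. This is a sketch under assumptions: the flags are copied from the command in this comment, while the function names and the file names foo.pdf and foo.tiff are placeholders, and nothing is run unless gs is found on the PATH.

```python
import shutil
import subprocess


def ghostscript_fax_command(src, dest, resolution=200, paper="letter"):
    """Build the argv for the Ghostscript G3 TIFF command quoted above."""
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=tiffg3",
        f"-r{resolution}",
        f"-sPAPERSIZE={paper}",
        "-dFIXEDMEDIA",
        f"-sOutputFile={dest}",
        src,
    ]


def convert(src, dest):
    """Run the command if Ghostscript is installed; return True on success."""
    cmd = ghostscript_fax_command(src, dest)
    if shutil.which(cmd[0]) is None:
        return False  # gs is not on PATH, so nothing is executed
    return subprocess.run(cmd, check=False).returncode == 0


# Placeholder file names; only the command line is printed here.
print(" ".join(ghostscript_fax_command("foo.pdf", "foo.tiff")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.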

     
