Images are being dropped by ddjvu when the source document is a mix of vector graphics and raster images. All raster images are being dropped when -mode=black is used.
For the reproducer, the ultimate source file is LaTeX. Here's a sample:
\documentclass{minimal}
\usepackage{graphicx}
\begin{document}
Basic math problem:\\
\includegraphics[width=0.8\textwidth]{find_x.jpg}
\end{document}
In this case, there is a "find_x.jpg" file, but this can be any image. The LaTeX compiles into a PDF that renders properly. Then the PDF is converted to a djvu file using this command:
pdf2djvu -o foo.djvu foo.pdf
That djvu file also renders correctly. Then this command is used to rasterize the whole thing:
ddjvu -mode=black -format=tiff -aspect=no -size=1734x2156 foo.djvu foo.tiff
Those CLI options were chosen to produce an image for faxing. The problematic option is -mode=black: if the mode is not altered, ddjvu does the correct thing. I've attached the DjVu file so people need not use LaTeX to reproduce the problem. Note that it also fails when the output format is PDF (I did not try PPM or other formats).
This is the latest Debian Stretch version (DjVuLibre-3.5).
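The -size=1734x2156 value appears to correspond to US Letter paper (8.5 x 11 in) at the common "fine" fax resolution of 204 x 196 dpi. A minimal sketch of that arithmetic, assuming those dpi values (the function name fax_pixel_size is hypothetical):

```python
# Sketch: derive a ddjvu -size=WxH value from a paper size and a fax resolution.
# Assumption: "fine" Group 3 fax resolution is about 204 x 196 dpi.


def fax_pixel_size(width_in: float, height_in: float,
                   xdpi: int = 204, ydpi: int = 196) -> tuple[int, int]:
    """Return (width_px, height_px) for a page whose size is given in inches."""
    return round(width_in * xdpi), round(height_in * ydpi)


if __name__ == "__main__":
    w, h = fax_pixel_size(8.5, 11.0)  # US Letter
    print(f"-size={w}x{h}")           # prints -size=1734x2156
```

Passing ydpi=98 gives the "standard" fax resolution instead. Note that G3 fax scan lines are nominally 1728 pixels wide, so some fax tools may expect that width rather than 1734.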
PBM output is also broken in the above scenario.
This is the intended operation of --mode=black.
This mode only prints the foreground stencil and discards all background information.
The ddjvu documentation is not very clear about this. It needs updating.
If you want to produce a bitonal image for faxing, the best option is to output a PBM file.
Another option is to render in normal mode to a TIFF image and use “tiffcp” to recode it with a fax codec.
L.
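Following the PBM suggestion, here is a sketch that builds a ddjvu command rendering in the default mode straight to PBM, reusing only options that appear in the original report (-format, -aspect, -size). The helper name ddjvu_pbm_command is hypothetical, and whether the PBM rendering keeps the embedded raster images is something to check against the attached findx.djvu.

```python
import shlex


def ddjvu_pbm_command(src: str, dest: str, width: int, height: int) -> list[str]:
    """Build a ddjvu command line that renders a page in the default mode to PBM.

    Only options already used in this thread are included.
    """
    return [
        "ddjvu",
        "-format=pbm",               # bitonal output format
        "-aspect=no",                # do not preserve the aspect ratio when applying -size
        f"-size={width}x{height}",   # target bitmap size in pixels
        src,
        dest,
    ]


if __name__ == "__main__":
    cmd = ddjvu_pbm_command("foo.djvu", "foo.pbm", 1734, 2156)
    # Print a shell-ready version; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```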
From: free Buju sadgadfgafgag@users.sourceforge.net
Reply-To: "[djvu:bugs]" 287@bugs.djvu.p.re.sf.net
Date: Saturday, February 10, 2018 at 3:17 PM
To: "[djvu:bugs]" 287@bugs.djvu.p.re.sf.net
Subject: [djvu:bugs] #287 "ddjvu -mode=black" loses raster images within a vector doc
[bugs:#287] "ddjvu -mode=black" loses raster images within a vector doc
Status: open
Group: djvulibre
Created: Sat Feb 10, 2018 07:57 PM UTC by free Buju
Last Updated: Sat Feb 10, 2018 07:57 PM UTC
Owner: nobody
Attachments:
findx.djvu (24.5 kB; image/vnd.djvu)
Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/djvu/bugs/287/
I just wrote a lengthy reply, and SourceForge threw it away as a false spam detection. I may try again later.
I will write in small pieces.
Modes are:
black
color
mask
foreground
Making "black" mean "foreground" (rather than the opposite of "color") is far from clear. It violates the principle of least astonishment.
Since I wrote all this almost twenty years ago, I checked exactly what these modes do:
--mode=foreground prints the foreground with its colors.
--mode=mask prints the foreground stencil (i.e., the pixels that belong to the foreground are shown in black regardless of their color).
When the image does not have a foreground (an IW44 image), these two modes return a blank page.
--mode=black is almost the same as --mode=mask. However, instead of returning a blank page for an IW44 image, it decodes the IW44 image normally (in color).
I would not say this is very useful. Somebody must have asked for it and I caved in…
L.
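To make the four modes concrete, here is a toy per-pixel model of the behavior described above. It assumes a page is a binary mask, per-pixel foreground colors, and a background image, with mask set to None for a pure IW44 page. The function name render and the RGB tuples are hypothetical; this is not DjVuLibre's rendering code, which also deals with scaling and layers stored at different resolutions.

```python
from typing import Optional, Sequence

RGB = tuple[int, int, int]
WHITE: RGB = (255, 255, 255)
BLACK: RGB = (0, 0, 0)


def render(mode: str,
           mask: Optional[Sequence[bool]],
           fg: Sequence[RGB],
           bg: Sequence[RGB]) -> list[RGB]:
    """Toy model of ddjvu's -mode values on a flattened list of pixels.

    mask is None when the page has no foreground layer (a pure IW44 image).
    """
    n = len(bg)
    if mode == "color":
        # Full composite: mask pixels take the foreground color, others the background.
        if mask is None:
            return list(bg)
        return [fg[i] if mask[i] else bg[i] for i in range(n)]
    if mode == "foreground":
        # Foreground pixels with their colors; no foreground means a blank page.
        if mask is None:
            return [WHITE] * n
        return [fg[i] if mask[i] else WHITE for i in range(n)]
    if mode == "mask":
        # Stencil only: foreground pixels in black regardless of their color.
        if mask is None:
            return [WHITE] * n
        return [BLACK if m else WHITE for m in mask]
    if mode == "black":
        # Same stencil as "mask", but an IW44-only page is decoded normally.
        if mask is None:
            return list(bg)
        return [BLACK if m else WHITE for m in mask]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    mask = [True, False, False]                 # one text pixel in the foreground
    fg = [(200, 0, 0), WHITE, WHITE]
    bg = [WHITE, (120, 120, 120), WHITE]        # one photo pixel in the background
    print(render("color", mask, fg, bg))
    print(render("black", mask, fg, bg))
```

With a red text pixel in the mask and a gray photo pixel in the background, "color" keeps both, while "black" yields only the black text pixel on white. This matches the reported symptom if the embedded JPEG ends up in the background layer.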
The source data, in LaTeX, has no concept of foreground or background AFAIK. I think LaTeX supports infinite layers. Is it pdf2djvu that assigns objects to the foreground and background? Can that be influenced so that everything is in the foreground?
The source data, in LaTeX, has no concept of foreground or background
But djvu does. It classifies all image components as foreground or background in order to use different codecs. These modes are merely ways to inspect and sometimes leverage the internal sausage-making of djvu.
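As a simplified way to picture this layering, a compound page can be written as a binary mask M that selects, at each pixel, between a foreground color F and a background image B. The symbols are illustrative, not DjVuLibre identifiers. A raster image placed in B contributes nothing to any mode that renders only M, which is presumably what happens to the JPEG here.

```latex
I(x,y) = M(x,y)\,F(x,y) + \bigl(1 - M(x,y)\bigr)\,B(x,y), \qquad M(x,y) \in \{0, 1\}
```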
Did you consider pnmquant (http://netpbm.sourceforge.net/doc//pnmquant.html) in Floyd-Steinberg mode to binarize a color image?
L.
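For readers unfamiliar with Floyd-Steinberg, here is a minimal sketch of the textbook error-diffusion algorithm applied to a grayscale image (0 = black, 255 = white) to produce a bitonal bitmap. This is not pnmquant's implementation, and the function name floyd_steinberg is hypothetical; a color image would first need converting to gray.

```python
def floyd_steinberg(gray: list[list[float]], threshold: float = 128.0) -> list[list[int]]:
    """Binarize a grayscale image with Floyd-Steinberg error diffusion.

    Returns a bitmap where 1 means a black pixel, as in PBM files.
    """
    h = len(gray)
    w = len(gray[0]) if h else 0
    img = [[float(v) for v in row] for row in gray]  # working copy that accumulates error
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = img[y][x]
            new = 255.0 if old >= threshold else 0.0
            out[y][x] = 0 if new else 1              # PBM convention: 1 = black
            err = old - new
            # Push the quantization error onto unvisited neighbors (7/16, 3/16, 5/16, 1/16).
            if x + 1 < w:
                img[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1][x - 1] += err * 3 / 16
                img[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1][x + 1] += err * 1 / 16
    return out


if __name__ == "__main__":
    print(floyd_steinberg([[100, 100]]))  # prints [[1, 0]]
```

Spreading the error instead of simply thresholding lets mid-tones in a photo survive as a dot pattern rather than collapsing to solid black or white.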
G3 and G4 are the compression methods used by fax machines.
TIFF Class F is just a specification of the TIFF options one should use so that an image can easily be sent by fax. See http://cool.conservation-us.org/bytopic/imaging/std/tiff-f.html.
Not only must you use G3 (not G4), but there are lots of other small constraints…
L.
I wasn't familiar with netpbm. But I would think that the bilevel conversion and the rasterization of a vector image would have to be done in the same step, considering that color images seem to need fewer pixels than bitonal images for a given quality. I'm not sure, though. In any case, it's the rasterization that drove me to use ddjvu, which does it with good quality. When I use ImageMagick to rasterize a PDF, the quality is very poor.
I suggest renaming "color" to "whole_image" or "all_layers" and "black" to "foreground stencil". It would also be useful to know the difference between the stencil and the "mask".
Indeed I've switched to pbm, but that doesn't solve the problem.
What I've been doing is running ImageMagick on the output of ddjvu to convert it to a raw "fax" format, then using fax2tiff -f to convert that to a Class-F TIFF, which is specifically designed for faxing. The tiffcp manpage says nothing about Class F. Is fax encoding implied when using Group 3/4 compression? For other tools, Group 3 and Group 4 compression is a lossless compression, like ZIP, that can be used on anything.
Do you mean that you are using gsdjvu/djvudigital to rasterize into a DjVu file in the hope of converting it to a bitonal fax? If you use djvudigital, you can use the --dryrun option to see how it invokes Ghostscript. It probably does this much more carefully than ImageMagick. Then you can experiment with the Ghostscript output devices that deal with fax formats. See https://www.ghostscript.com/doc/9.21/Devices.htm . Djvudigital works by implementing a complicated Ghostscript device that separates foreground and background for optimal compression, but you don't need that.
You probably want something like:
gs -q -dNOPAUSE -dBATCH -sDEVICE=tiffg3 -r200 -sPAPERSIZE=letter -dFIXEDMEDIA -sOutputFile=<dest.tiff> <src.ps-src.pdf>
See https://www.ghostscript.com/doc/9.21/Use.htm . There are lots of options…
L.
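For scripting, the suggested Ghostscript invocation can be wrapped in a small helper that builds the same argument list. This is a sketch only: the name gs_fax_command is hypothetical, and it uses exclusively the options from the command above, with the resolution left as a parameter.

```python
import shlex


def gs_fax_command(src: str, dest: str, resolution: str = "200",
                   paper: str = "letter") -> list[str]:
    """Build the Ghostscript command suggested above for a G3 fax TIFF."""
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH",  # quiet, no pause between pages, exit when done
        "-sDEVICE=tiffg3",                   # bilevel TIFF with CCITT Group 3 fax compression
        f"-r{resolution}",                   # rendering resolution in dpi
        f"-sPAPERSIZE={paper}",
        "-dFIXEDMEDIA",                      # keep the chosen paper size fixed
        f"-sOutputFile={dest}",
        src,
    ]


if __name__ == "__main__":
    print(shlex.join(gs_fax_command("foo.pdf", "foo.tiff")))
```

Ghostscript's -r option also accepts separate horizontal and vertical values (e.g. resolution="204x196"), which may be closer to the fine fax resolution discussed earlier in the thread.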