DjVu compression should be better than PDF
Brought to you by:
ra28145
DjVu files saved with gscan2pdf are consistently larger than the equivalent PDFs. My documents contain a mixture of text and images.
It may be that this is not a bug in gscan2pdf, but a consequence of using default settings when calling djvulibre. Please could an option be added to adjust the djvu compression settings?
Thank you for creating this essential everyday software!
Thanks for the report. Normally DjVu compresses better for me.
Please post an image that compresses better as PDF than as DjVu. Then start gscan2pdf from the command line with the --log=log option, import the file, save it as both PDF and DjVu, quit, and post the log file (gscan2pdf should have compressed it with xz).
This post contains a sample image. The log file will come next.
Here is the gscan2pdf log file. The output file sizes were 707.9 KiB for DjVu and 696.0 KiB for PDF.
You are using quality=66 when creating the PDF. I suspect that higher values will produce PDFs that are larger than the DjVu.
The image in the DjVu is compressed using c44. The quality settings in c44 are not as simple as a percentage. I could expose them, but I doubt most users would take the time to try to understand them.
Have you seen the c44 options?
Thanks for this. I use
c44 -slice 12,40,101 $file
to match the file size produced by WebP at quality 55. One question is whether this should become the default in gscan2pdf, which is hard to answer without acceptance testing by most users. Could a modified version be provided for Debian for testing? Meanwhile, I wasn't aware that I had dropped the quality when producing the PDF. Can the value of 66 be changed when Compression is set to Automatic?
Last edit: Arnab K Rana 2024-02-12
If the image is a .jpg, automatic compression will also use JPEG inside the PDF, and will take the quality parameter from the last time a JPEG, or a PDF with JPEG compression, was saved. For example, if you save a test PDF with compression=jpeg and quality=90, that value will be used by subsequent PDFs with automatic compression whose pages are JPEG internally.
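A minimal Python sketch of the remembered-quality behaviour described above, to show why a one-off save at 66 can silently become the value used later. This is not gscan2pdf's actual code: the class and function names, the "auto"/"jpeg" strings and the initial default of 75 are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class QualityMemory:
    """Hypothetical store for the JPEG quality of the last explicit JPEG save."""
    last_jpeg_quality: int = 75  # assumed starting value, not taken from gscan2pdf

    def record_save(self, compression: str, quality: Optional[int]) -> None:
        # Only a save that actually uses JPEG compression updates the stored quality.
        if compression == "jpeg" and quality is not None:
            self.last_jpeg_quality = quality


def resolve_pdf_compression(
    memory: QualityMemory,
    compression: str,
    image_format: str,
    quality: Optional[int] = None,
) -> Tuple[str, Optional[int]]:
    """Return the (compression, quality) pair that would be used for one page."""
    if compression == "auto" and image_format == "jpg":
        # Automatic mode keeps JPEG pages as JPEG and reuses the remembered quality.
        return "jpeg", memory.last_jpeg_quality
    # Explicit settings (and other image formats) are passed through unchanged here;
    # this sketch does not model what automatic mode does for non-JPEG pages.
    return compression, quality
```

Under these assumptions, one manual save with `mem.record_save("jpeg", 66)` makes every later `resolve_pdf_compression(mem, "auto", "jpg")` return `("jpeg", 66)`, until another explicit JPEG save replaces the stored value.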
Note that the DjVu compression method also depends on the image type: jpg/png images use c44 (although I really should get around to also implementing cpaldjvu), while black-and-white (BW) images use cjb2.
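The encoder choice described above amounts to a small lookup. The Python sketch below is illustrative only: the function name and the "jpg"/"png"/"bw" type labels are hypothetical, and cpaldjvu is left out because it is not used yet.

```python
def djvu_encoder(image_type: str) -> str:
    """Pick the DjVuLibre encoder for one page (hypothetical helper)."""
    if image_type in ("jpg", "png"):
        # Continuous-tone pages go through c44, the IW44 wavelet encoder.
        return "c44"
    if image_type == "bw":
        # Bilevel (black-and-white) pages go through cjb2, the JB2 encoder.
        return "cjb2"
    # cpaldjvu (for images with few colours) is not part of this sketch.
    raise ValueError(f"no encoder rule in this sketch for {image_type!r}")
```

This is why the c44 settings only matter for the jpg/png pages of a document: BW pages never pass through c44.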
I could certainly optionally expose the c44 arguments.
Thanks for this. Perhaps at some point in the past I saved a low-importance document with manual JPEG compression at 66%, not realising that this would become the default.
Yes, exposing the c44 arguments seems like the way forward. They could be left blank by people who want the defaults or used by people who want control.
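One way the "blank means defaults" idea could work is sketched below in Python. The function name and the free-text `extra_args` field are hypothetical, not an existing gscan2pdf option. The only c44 syntax assumed is the one used earlier in this thread (`-slice 12,40,101`) plus c44's input-then-output argument order.

```python
import shlex
from typing import List


def build_c44_command(input_file: str, output_file: str, extra_args: str = "") -> List[str]:
    """Build a c44 argument list; a blank extra_args leaves c44 on its own defaults."""
    cmd = ["c44"]
    if extra_args.strip():
        # shlex.split tokenises the user's text the way a POSIX shell would,
        # so quoted values stay together.
        cmd.extend(shlex.split(extra_args))
    # The image to encode comes first, then the DjVu file to write.
    cmd.extend([input_file, output_file])
    return cmd
```

For example, `build_c44_command("page.jpg", "page.djvu", "-slice 12,40,101")` gives `["c44", "-slice", "12,40,101", "page.jpg", "page.djvu"]`, which could be passed to `subprocess.run(cmd, check=True)` on a machine with DjVuLibre installed. Passing a list rather than a shell string avoids quoting problems with unusual file names.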