
#4251 Enhancement: Reduce size of PDF files when included in *TeX documents

Status: Verified
Owner: nobody
Labels: Enhancement
Updated: 2015-03-02
Created: 2015-01-08
Creator: Anonymous
Private: No

Originally created by: *anonymous

Originally created by: pkx1...@gmail.com
Originally owned by: pkx1...@gmail.com

The attached patch (against 2.18.2) changes the way lilypond
uses fonts to draw glyphs.

It avoids using glyphshow for all emmentaler glyphs and
adds encoding vectors to the emmentaler fonts before they
are used. It also changes the ghostscript parameters used
to generate pdfs from postscript code.

These changes help to reduce pdf file sizes if you include
lilypond snippets in *TeX documents. The pdfs generated by
a patched lilypond, and by *TeX from them, are _much_ bigger,
but if you run ghostscript and pdfsizeopt.py on those
files they shrink dramatically.

As this patch changes only very low-level routines it should
be invisible to the lilypond user interface. But links from other
pdfs into the processed files are broken. Changing this would
require a major extension of ghostscript.

I think if the discussion on this list shows that this code is
regarded as useful, a command line parameter should
be added to lilypond to enable these changes only on user
request. If you don't include lilypond pdfs in *TeX documents
you don't need it and you don't want it.

I don't know scheme well, so somebody should have a close
look at the code. Probably it looks ugly and inefficient to an
experienced scheme programmer.

notation.pdf is a good test object:

file size in bytes / comment

 27.171.654  original 2.18.2 notation.pdf (3442 fonts)
 20.074.736  original 2.18.2 notation.pdf + pdfsizeopt.py (completely broken pdf)
 23.644.317  original 2.18.2 notation.pdf + ghostscript (1737 fonts)
 19.979.555  original 2.18.2 notation.pdf + ghostscript + pdfsizeopt (1736 fonts)
127.676.999  patched 2.18.2 notation.pdf (3458 fonts)
---.---.---  patched 2.18.2 notation.pdf + pdfsizeopt.py (pdfsizeopt aborts with error)
  5.953.377  patched 2.18.2 notation.pdf + ghostscript (69 fonts)
  4.307.825  patched 2.18.2 notation.pdf + ghostscript + pdfsizeopt (69 fonts)
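
To put the table in proportion: the fully processed patched file is about 16% of the original notation.pdf, while the unprocessed patched file is roughly 4.7 times larger. The following is only a small sketch for reproducing those ratios; the dictionary keys and the function name are made up for illustration, and the sizes are copied from the table.

```python
# Sizes in bytes, copied from the table (2.18.2 notation.pdf).
ORIGINAL = 27_171_654
SIZES = {
    "original + ghostscript": 23_644_317,
    "original + ghostscript + pdfsizeopt": 19_979_555,
    "patched": 127_676_999,
    "patched + ghostscript": 5_953_377,
    "patched + ghostscript + pdfsizeopt": 4_307_825,
}


def relative_size(size: int, baseline: int = ORIGINAL) -> float:
    """Return size as a fraction of the baseline (1.0 means unchanged)."""
    return size / baseline


if __name__ == "__main__":
    for label, size in SIZES.items():
        print(f"{label:38s} {relative_size(size):7.1%} of original")
```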

+ ghostscript means:

gs -dNOPAUSE -dBATCH -q -r1200 -sDEVICE=pdfwrite -o outfile.pdf infile.pdf

+ pdfsizeopt means:

pdfsizeopt.py --use-multivalent=false infile.pdf
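
For anyone who wants to script the two steps, here is a minimal sketch and not part of the patch. It assumes gs and pdfsizeopt.py are on the PATH, and it passes an explicit output file to pdfsizeopt.py, as done later in this thread. The function names and the "-gs" / "-final" file suffixes are hypothetical.

```python
import subprocess
from pathlib import Path


def gs_command(infile: str, outfile: str) -> list:
    """Build the Ghostscript call: rewrite a PDF through the pdfwrite device."""
    return ["gs", "-dNOPAUSE", "-dBATCH", "-q", "-r1200",
            "-sDEVICE=pdfwrite", "-o", outfile, infile]


def pdfsizeopt_command(infile: str, outfile: str) -> list:
    """Build the pdfsizeopt call with Multivalent disabled."""
    return ["pdfsizeopt.py", "--use-multivalent=false", infile, outfile]


def shrink(infile: str) -> str:
    """Run both steps in order and return the name of the final PDF."""
    stem = Path(infile).stem
    gs_out = f"{stem}-gs.pdf"        # hypothetical intermediate name
    final = f"{stem}-final.pdf"      # hypothetical result name
    subprocess.run(gs_command(infile, gs_out), check=True)
    subprocess.run(pdfsizeopt_command(gs_out, final), check=True)
    return final
```

With check=True a failing step raises an exception, so a pdfsizeopt abort like the one listed in the table is not silently ignored.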

Comments:

- pdfsizeopt.py cannot correctly process either the notation.pdf from 2.18.2 or the
  notation.pdf generated by the patched version of lilypond.
- ghostscript up to version 9.14 breaks internal and external links.
- ghostscript 9.15 produces an invalid but readable pdf whenever it
  processes an external link (GoToR).
- ghostscript git master preserves external and internal links.
- processing of links in ghostscript git master exposes a bug in evince;
  external links are broken in that program as a result.
- link targets in files processed by ghostscript git master are lost.
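
A script that depends on links could check the installed Ghostscript first. The sketch below only encodes the observations listed in these comments; it makes no claim about releases after 9.15, and the function names are hypothetical. The version string is what `gs --version` prints.

```python
import subprocess


def parse_version(text: str) -> tuple:
    """Turn a version string such as '9.15' into (9, 15)."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def link_handling(version: tuple) -> str:
    """Map a Ghostscript version to the link behaviour noted in the comments."""
    if version <= (9, 14):
        return "breaks internal and external links"
    if version == (9, 15):
        return "invalid but readable pdf on external (GoToR) links"
    # Only git master was tested beyond 9.15, so say nothing definite here.
    return "not covered by the notes in this report"


def installed_gs_version() -> tuple:
    """Ask the gs binary on the PATH for its version."""
    out = subprocess.run(["gs", "--version"], check=True,
                         capture_output=True, text=True).stdout
    return parse_version(out)
```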

cu,
Knut

Related

Issues: #5247

Discussion

  • Google Importer

    Google Importer - 2015-01-08

    Originally posted by: pkx1...@gmail.com

    On 08/01/15 12:18, Knut Petersen wrote:
    > On 29.12.2014 09:15, Werner LEMBERG wrote:
    >>
    >>> If you find it straightforward to encapsulate the code, then we can
    >>> probably incorporate it.
    >> Knut, are you willing to work on that, this is, adding a command line
    >> argument and properly documenting it?
    >
    > So here is Version 2
    > ================
    >
    > I fixed a few issues with version 1, added a command line option
    > --bigpdf / -b, and documented that option in the german and english
    > versions of usage.pdf .
    >
    > The patch is based on the current git master of lilypond now.
    >
    > cu,
    >   Knut

     
  • Google Importer

    Google Importer - 2015-01-08

    Originally posted by: pkx1...@gmail.com

    Reduce size of PDF files when inc. in *TeX docs

    Issue 4251

    This changes the way lilypond uses fonts to draw glyphs.

    It avoids using glyphshow for all emmentaler glyphs and
    adds encoding vectors to the emmentaler fonts before they
    are used. It also changes the ghostscript parameters used
    to generate pdfs from postscript code.

    These changes help to reduce pdf file sizes if you include
    lilypond snippets in *TeX documents. The pdfs generated by
    a patched lilypond and *tex themselves are _much_ bigger,
    but if you run ghostscript and pdfsizeopt.py on those
    files they implode.

    Added a command line option
    --bigpdf / -b, and documented that option in the German
    and English versions of usage.pdf.

    http://codereview.appspot.com/194090043

    Labels: Patch-new
    Status: Started

     
  • Google Importer

    Google Importer - 2015-01-08

    Originally posted by: pkx1...@gmail.com

    I think the easiest way to test the code thoroughly is to apply the patch,
    do a full build, and then change

           "bool bigpdfs = false"

    in global-vars.cc to "bool bigpdfs = true" and do a full build again.

    Postprocess the doc pdfs according to

             gs -sDEVICE=pdfwrite -o outgs.pdf  notation.pdf
             pdfsizeopt --use-multivalent=no outgs.pdf outfinal.pdf

    and verify that everything went ok.

    The English notation.pdf processed this way is expected to
    contain 72 fonts (instead of the 3548 of an original notation.pdf).
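
The thread does not say how the font counts were obtained. One possible way, assuming the pdffonts tool from poppler-utils is installed, is to count the rows it prints (two header lines, then one line per font). The function names are hypothetical, and pdffonts may count fonts differently from the method used for the numbers quoted here.

```python
import subprocess


def count_fonts(pdffonts_output: str) -> int:
    """Count font rows in pdffonts output, skipping its two header lines."""
    lines = [ln for ln in pdffonts_output.splitlines() if ln.strip()]
    return max(0, len(lines) - 2)


def fonts_in_pdf(path: str) -> int:
    """Run pdffonts on a PDF and return the number of fonts it lists."""
    result = subprocess.run(["pdffonts", path], check=True,
                            capture_output=True, text=True)
    return count_fonts(result.stdout)
```

Calling fonts_in_pdf("outfinal.pdf") after the two postprocessing steps gives a number to compare against the expected count.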

    @everybody:

    It would be nice if someone could add translations for languages
    other than German.

    cu,
    knut

     
  • Google Importer

    Google Importer - 2015-01-09

    Originally posted by: dak@gnu.org

    Patchy the autobot says: passes tests.

    Labels: -Patch-new Patch-review

     
  • Google Importer

    Google Importer - 2015-01-11

    Originally posted by: pkx1...@gmail.com

    On 09/01/15 11:23, Knut Petersen wrote:
    > Hi James!
    >
    > Attached is a small *.tex document that can be translated by xelatex or
    > lualatex,
    > one page including four small snippets.
    >
    > It demonstrates that also small documents with only a few snippets
    > benefit from
    > the --bigpdf patch. lualatex must be called with the --shell-escape
    > parameter to
    > allow it to execute lilypond and pdfcrop (obviously that needs to be
    > installed).
    >
    > Be aware of the fact that translation will also generate (or overwrite
    > !) some aux
    > files named tmplilytail.ly, tmplilyhead.ly, tmplilypaper.ly, tmplily.ly,
    > tmplilyfrag1.pdf,
    > tmplilyfrag3.pdf, tmplilyfrag2.pdf, tmplilyfrag4.pdf.
    >
    > Edit line 22 to contain the path to your lilypond test executable (don't
    > delete the
    > space after lilypond) and line 23 to either contain nothing or --bigpdf.
    >
    > Then translate test.tex to pdf with
    >
    >    lualatex --shell-escape test
    >    gs -sDEVICE=pdfwrite -o testtmp.pdf test.pdf
    >    pdfsizeopt.py --use-multivalent=no testtmp.pdf testfinal.pdf
    >    dir test*pdf --sort=time | tac
    >
    > With  line 23 "\def\lilyparms{  }" I get
    >
    > -rw-r--r-- 1 knut users   173538  9. Jan 11:53 test.pdf
    > -rw-r--r-- 1 knut users   157303  9. Jan 11:53 testtmp.pdf
    > -rw-r--r-- 1 knut users   149945  9. Jan 11:53 testfinal.pdf
    >
    > With line 23 "\def\lilyparms{ --bigpdf }" I get
    >
    > -rw-r--r-- 1 knut users   879441  9. Jan 11:55 test.pdf
    > -rw-r--r-- 1 knut users    63359  9. Jan 11:55 testtmp.pdf
    > -rw-r--r-- 1 knut users    59437  9. Jan 11:55 testfinal.pdf
    >
    > cu,
    >  Knut

    Labels: -Patch-review Patch-needs_work

     
  • Google Importer

    Google Importer - 2015-01-11

    Originally posted by: pkx1...@gmail.com

    > running gs was trivial, but pdfsizeopt? where do I get that *easily*.
    >
    > Seems that Google's own instructions are out of date or don't work
    > (error 404 links) and there are a ton of other files that I need to
    > download (or check that I have) to make sure all the dependencies are
    > met for pdfsizeopt.
    Yes, their install instructions were not updated to the v2 libexec. Use something like:

    DIR=pdfsizeopt
    mkdir $DIR
    cd $DIR

    wget -O pdfsizeopt.py http://pdfsizeopt.googlecode.com/git/pdfsizeopt.single
    chmod 700 pdfsizeopt.py

    wget https://pdfsizeopt.googlecode.com/files/pdfsizeopt_libexec_linux-v2.tar.gz
    gzip -cd pdfsizeopt_libexec_linux-v2.tar.gz | tar -xvf -

    # We have our own gs, so remove this one
    rm pdfsizeopt_libexec/gs

    cu,
    Knut

     
  • Google Importer

    Google Importer - 2015-01-11

    Originally posted by: pkx1...@gmail.com

    Format corrections from Werner, rewrite of English entry for Usage.

    http://codereview.appspot.com/194090043

    Labels: -Patch-needs_work Patch-new

     
  • Google Importer

    Google Importer - 2015-01-11

    Originally posted by: dak@gnu.org

    I don't like all the rather hefty size tradeoffs and "pdfsizeopt does not work on this, or does not work on that".  The problem appears to be that we include subsetted fonts hundreds of times here, right?

    What happens if we just run the whole original resulting PDF file once again through ps2pdf (in spite of its name, this can also take PDF as its input)?  Does that do anything with regard to the font problem?

    Alternatively, we might not use PDFTeX for creating our docs but rather normal LaTeX, and do a final ps2pdf run.  That should likely make it easier for Ghostscript to unify fonts from the separate images.

     
  • Google Importer

    Google Importer - 2015-01-11

    Originally posted by: lemzw...@googlemail.com

    > Alternatively, we might not use PDFTeX for creating our docs
    > but rather normal LaTeX, and do a final ps2pdf run.  That
    > should likely make it easier for Ghostscript to unify fonts
    > from the separate images.

    What exactly do you have in mind?  AFAIK, dvips can't handle PDF inclusions.  Do you perhaps mean dvipdfmx?

     
  • Google Importer

    Google Importer - 2015-01-11

    Originally posted by: dak@gnu.org

    Why would we need to handle PDF inclusions?  LilyPond so far does not have an actual PDF backend.  Instead it writes out PostScript and calls ps2pdf on the result.  It seems like we could just include the PostScript with dvips and then call ps2pdf once on the resulting document rather than on every LilyPond ps file.
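
A rough sketch of the pipeline suggested here, assuming the standard latex, dvips and ps2pdf command-line tools and a document that includes the LilyPond PostScript output directly. The function name is hypothetical, and the sketch expects to be run in the document's directory, since latex writes the DVI file there.

```python
import subprocess
from pathlib import Path


def dvips_pipeline(tex_file: str) -> list:
    """Return the three commands for a LaTeX -> dvips -> ps2pdf run."""
    stem = Path(tex_file).stem
    dvi, ps, pdf = f"{stem}.dvi", f"{stem}.ps", f"{stem}.pdf"
    return [
        ["latex", tex_file],        # produces the DVI file
        ["dvips", "-o", ps, dvi],   # one PostScript file for the whole document
        ["ps2pdf", ps, pdf],        # single Ghostscript conversion at the end
    ]


def build(tex_file: str) -> None:
    """Run the commands in order, stopping at the first failure."""
    for command in dvips_pipeline(tex_file):
        subprocess.run(command, check=True)
```

The point of the design is that Ghostscript sees all snippets in one run instead of once per LilyPond ps file.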

     
  • Google Importer

    Google Importer - 2015-01-11

    Originally posted by: pkx1...@gmail.com

    This still needs some work, both on the patch in general and in the discussion about the need for it. Setting to Needs_work for now.

    Labels: -Patch-new Patch-needs_work

     
  • Google Importer

    Google Importer - 2015-01-12

    Originally posted by: lemzw...@googlemail.com

    OK, I misunderstood.  Yes, trying to do everything in PS might be interesting.  Unfortunately, none of xetex, luatex, or pdftex produces DVI that can be processed by dvips, IIRC.  In other words, none of the enhancements provided by those programs would be available.  This might be good enough for the lilypond documentation pipeline, but it is not sufficient for the general case.

     
  • Google Importer

    Google Importer - 2015-01-20

    Originally posted by: pkx1...@gmail.com

    Corrections as per Werner, plus new edit to the English doc

    http://codereview.appspot.com/194090043

    Labels: -Patch-needs_work Patch-new

     
  • Google Importer

    Google Importer - 2015-01-21

    Originally posted by: pkx1...@gmail.com

    More corrections needed - this also now fails make (sigh).

    Labels: -Patch-new Patch-needs_work

     
  • Google Importer

    Google Importer - 2015-01-24

    Originally posted by: pkx1...@gmail.com

    Corrections from Werner, some scm formatting and TexInfo syntax fixes

    http://codereview.appspot.com/194090043

    Labels: -Patch-needs_work Patch-new

     
  • Google Importer

    Google Importer - 2015-01-24

    Originally posted by: pkx1...@gmail.com

    Patchy the autobot says: passes tests.  Includes a full make doc.

    Labels: -Patch-new Patch-review

     
  • Google Importer

    Google Importer - 2015-01-26

    Originally posted by: pkx1...@gmail.com

    Patch on countdown for Jan 29th

    Labels: -Patch-review Patch-countdown

     
  • Google Importer

    Google Importer - 2015-01-29

    Originally posted by: pkx1...@gmail.com

    We still need a comparable german translation I think.

    I am not going to push this yet.

    Labels: -Patch-countdown Patch-needs_work

     
  • Google Importer

    Google Importer - 2015-01-29

    Originally posted by: lemzw...@googlemail.com

    Here it is (untested).

    =====

    @item -b, --bigpdfs
    @cindex bigpdfs

    Mit dieser Option generierte PDF-Dateien sind deutlich größer als sonst,
    weil keine oder nur minimale Zeichensatz-Optimierung erfolgt.  Werden
    jedoch zwei oder mehrere solcher PDF-Dateien in @w{@code{pdftex}-},
    @w{@code{xetex}-} oder @w{@code{luatex}}-Dokumente eingebunden und
    anschließend mit ghostscript weiterverarbeitet, entstehen
    @emph{signifikant} kleinere PDF-Dokumente, da die Zeichensatzdaten mit
    dieser Methode viel besser reduziert werden können.

    Nach

    @example
    lilypond -b myfile
    @end example

    @noindent
    sollte @code{ghostscript} wie folgt ausgeführt werden.

    @example
    gs -q -sDEVICE=pdfwrite -o gsout.pdf myfile.pdf
    @end example

    Mit Hilfe von @uref{https://code.google.com/p/pdfsizeopt/,
    @code{pdfsizeopt.py}} kann die Ausgabedatei noch mehr verkleinert
    werden.

    @example
    pdfsizeopt.py --use-multivalent=no gsout.pdf final.pdf
    @end example

     
  • Google Importer

    Google Importer - 2015-01-30

    Originally posted by: lemzw...@googlemail.com

    An improved version, with input from Marc.

    @item -b, --bigpdfs
    @cindex bigpdfs

    Mit dieser Option generierte PDF-Dateien sind viel größer als normal,
    weil keine oder nur minimale Zeichensatz-Optimierung erfolgt.  Werden
    jedoch zwei oder mehr solcher PDF-Dateien in @w{@code{pdftex}-},
    @w{@code{xetex}-} oder @w{@code{luatex}}-Dokumente eingebunden und
    anschließend mit ghostscript nachbearbeitet, entstehen deutlich
    kleinere PDF-Dokumente, da ghostscript die Zeichensatzdaten auf
    diesem Weg viel besser komprimieren kann.

    Nach

    @example
    lilypond -b myfile
    @end example

    @noindent
    sollte @code{ghostscript} wie folgt ausgeführt werden.

    @example
    gs -q -sDEVICE=pdfwrite -o gsout.pdf myfile.pdf
    @end example

    Mit Hilfe von @uref{https://code.google.com/p/pdfsizeopt/,
    @code{pdfsizeopt.py}} kann die Ausgabedatei noch mehr verkleinert
    werden.

    @example
    pdfsizeopt.py --use-multivalent=no gsout.pdf final.pdf
    @end example

     
  • Google Importer

    Google Importer - 2015-01-31

    Originally posted by: pkx1...@gmail.com

    Change of German translation: if and when this compiles, it will get pushed. The rest of the patch has completed the countdown.

    http://codereview.appspot.com/194090043

    Labels: -Patch-needs_work Patch-new

     
  • Google Importer

    Google Importer - 2015-01-31

    Originally posted by: pkx1...@gmail.com

    author    Knut Petersen <knut_petersen@online.de>   
        Thu, 8 Jan 2015 18:00:44 +0000 (18:00 +0000)
    committer    James Lowe <pkx166h@gmail.com>   
        Sat, 31 Jan 2015 12:57:08 +0000 (12:57 +0000)
    commit    cd5b559ab016dad5100eab3105218df94ab9f402

    Labels: -Patch-new Fixed_2_19_16
    Status: Fixed

     
  • Google Importer

    Google Importer - 2015-03-02

    Originally posted by: fedel...@gmail.com

    (No comment was entered for this change.)

    Status: Verified

     