Menu

#5797 Do not run GC after processing every file.

Fixed
2020-03-04
2020-02-26
No

When processing multiple files, we ran GC between the files. The
motivation was that we use a conservative garbage collector, so we get
the most accurate scan when our stack is small. We never measured the
impact of this idea.

On every GC, we pay the cost of tracing through the live set,
ie. fixed data that spans multiple files, and by forcing GC calls, we
trace that data more often than necessary.

We get CPU time savings of ~20 % by not forcing GC.

For GUILE 1.8, the following timings for running "make test-baseline"
on my development branch

force GC on every file

real 3m23.886s
user 8m6.921s

let the GC decide when to collect

real 3m10.523s
user 6m36.925s

For GUILE 2.2, the difference is similar.

https://codereview.appspot.com/579330043

Related

Issues: #5797

Discussion

  • Anonymous

    Anonymous - 2020-02-26
    • Description has changed:

    Diff:

    
    
    • Needs: -->
    • Patch: new --> review
     
  • Anonymous

    Anonymous - 2020-02-26

    Passes make, make check and a full make doc.

     
  • David Kastrup

    David Kastrup - 2020-02-26

    It seems strange to measure the effect of the original patch in terms of saving runtime when its obvious motivation was reducing memory usage by collecting memory when the chance of false positives in the mark phase was lowest. So the interesting metric here would be total memory use rather than CPU time, particularly on computers with large memory reserves.

    Also this strategy will very likely be significantly more effective on 32bit systems where the danger of false positives in the mark phase is quite higher.

     
    • Han-Wen Nienhuys

      On Wed, Feb 26, 2020 at 7:39 PM David Kastrup
      dakas@users.sourceforge.net wrote:

      It seems strange to measure the effect of the original patch in terms of saving runtime when its obvious motivation was reducing memory usage by collecting memory when the chance of false positives in the mark phase was lowest. So the interesting metric here would be total memory use rather than CPU time, particularly on computers with large memory reserves.

      Also this strategy will very likely be significantly more effective on 32bit systems where the danger of false positives in the mark phase is quite higher.

      You say it was obvious what this was for, and that was what I assumed
      too, but git history says that the (gc) calls were introduced in
      46c2a12 which was motivated as

      Oops: always do GC around file. Always do debug-gc-assert-parsed-dead check.

      ie. it had to do with our detection of objects that leak out of the
      per-file context. (Does this mechanism still work with GUILE 2.x?)

      I also can't remember doing any kind of measurement about performance,
      neither for memory nor for walltime.

      It's also only useful in a case where you call lilypond with multiple
      files. If there is a concern about imperfect GC for large scores, you
      could do

      lilypond file1
      lilypond file2

      iso.

      lilypond file1 file2

      This is impractical for lilypond-book documents, but it's exactly
      those that tend to have smaller scores.

      I think it is reasonable to assume that folks who were getting close
      to the limit of a 4G memory would have migrated to 64-bit. Practically
      speaking, I haven't had a 32-bit system for about 8 years; how would
      we go about testing this?

       
      • Werner LEMBERG

        Werner LEMBERG - 2020-02-26

        I haven't had a 32-bit system for about 8 years; how would
        we go about testing this?

        Build a windows binary with current GUB?

         
      • David Kastrup

        David Kastrup - 2020-02-26

        There is no option for migrating to 64-bit if you are using Windows. debug-gc-assert-parsed-dead is, well, dead with Guile-2. It has enabled us to find a few bloopers, so it is a bit unfortunate to see it go. However, the detection of "undead" stuff happened in the mark phase, and with Guilev2 stuff can still get marked after getting collected (no wait: I think that was supposed to have been fixed in upstream Guile by now) and the mark thread is not in a position anyway where it can print out any Guile data structures without Guile exploding.

         

        Last edit: David Kastrup 2020-02-26
        • Han-Wen Nienhuys

          On Wed, Feb 26, 2020 at 10:54 PM David Kastrup
          dakas@users.sourceforge.net wrote:

          There is no option for migrating to 64-bit if you are using Windows.

          Is this because we never bothered to port the mingw build to 64-bit?

          Folks could run 64 bit linux on docker. It's not ideal, but it should work.

          debug-gc-assert-parsed-dead is, well, dead with Guile-2. It has enabled us to find a few bloopers, so it is a bit unfortunate to see it go. However, the detection of "undead" stuff happened in the mark phase, and with Guilev2 stuff can still get marked after getting collected and the mark thread is not in a position anyway where it can print out any Guile data structures without Guile exploding.

          Shall we kill it then? It was a clever hack, but also unhealthily
          deeply integrated with the internals of GUILE.


          [issues:#5797] Do not run GC after processing every file.

          Status: Started
          Created: Wed Feb 26, 2020 09:39 AM UTC by Han-Wen Nienhuys
          Last Updated: Wed Feb 26, 2020 09:53 PM UTC
          Owner: Han-Wen Nienhuys

          When processing multiple files, we ran GC between the files. The
          motivation was that we use a conservative garbage collector, so we get
          the most accurate scan when our stack is small. We never measured the
          impact of this idea.

          On every GC, we pay the cost of tracing through the live set,
          ie. fixed data that spans multiple files, and by forcing GC calls, we
          trace that data more often than necessary.

          We get CPU time savings of ~20 % by not forcing GC.

          For GUILE 1.8, the following timings for running "make test-baseline"
          on my development branch

          force GC on every file

          real 3m23.886s
          user 8m6.921s

          let the GC decide when to collect

          real 3m10.523s
          user 6m36.925s

          For GUILE 2.2, the difference is similar.

          https://codereview.appspot.com/579330043


          Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/testlilyissues/issues/5797/

          To unsubscribe from further messages, please visit https://sourceforge.net/auth/subscriptions/

           

          Related

          Issues: #5797

          • David Kastrup

            David Kastrup - 2020-02-26

            [undead objects debug]

            Shall we kill it then? It was a clever hack, but also unhealthily
            deeply integrated with the internals of GUILE.

            I think the time to kill it is when we kill Guilev1. Until then, it serves a purpose.

             
            • Han-Wen Nienhuys

              On Wed, Feb 26, 2020 at 11:10 PM David Kastrup
              dakas@users.sourceforge.net wrote:

              [undead objects debug]

              Shall we kill it then? It was a clever hack, but also unhealthily
              deeply integrated with the internals of GUILE.

              I think the time to kill it is when we kill Guilev1. Until then, it serves a purpose.

              I disagree with this stance, but we can postpone this discussion

              I've updated the patch so it only disables the extra GC on GUILE2.
              Because we don't do the equivalent on GUILE 1, they're now equally
              fast.

               
  • Anonymous

    Anonymous - 2020-02-28
     
  • Anonymous

    Anonymous - 2020-02-28

    I am not sure if this is OK to set on countdown or not, so I am leaving this on review (but change it to countdown if you think it is OK).

     
  • Han-Wen Nienhuys

     
  • Anonymous

    Anonymous - 2020-02-28

    Passes make, make check and a full make doc.

     
  • Anonymous

    Anonymous - 2020-03-01
    • Patch: review --> countdown
    • Type: --> Enhancement
     
  • Anonymous

    Anonymous - 2020-03-01

    Patch on countdown for March 3rd

     
  • Anonymous

    Anonymous - 2020-03-01
    • Type: Enhancement -->
     
  • Anonymous

    Anonymous - 2020-03-01

    Patch on countdown for March 3rd

     
  • Anonymous

    Anonymous - 2020-03-03
    • Patch: countdown --> push
     
  • Anonymous

    Anonymous - 2020-03-03

    Patch counted down - please push.

     
  • Anonymous

    Anonymous - 2020-03-04
    • labels: --> Fixed_2_21_0
    • status: Started --> Fixed
    • Patch: push -->
     
  • Anonymous

    Anonymous - 2020-03-04
    GUILE2: Do not run GC after processing every file.
    author  Han-Wen Nienhuys <hanwen@lilypond.org>  
        Tue, 25 Feb 2020 20:39:10 +0000 (21:39 +0100)
    committer   Han-Wen Nienhuys <hanwen@lilypond.org>  
        Tue, 3 Mar 2020 19:35:01 +0000 (20:35 +0100)
    commit  e36b7c7ea986418bcfebfba6c6c971db05b77afb