Originally created by: *anonymous
Originally created by: PhilEHol...@googlemail.com
There have been 2 reports of lilypond segfaulting on complex scores. Jay Anderson's report is here: http://lists.gnu.org/archive/html/bug-lilypond/2011-12/msg01167.html and Hu Haipeng's is here: http://old.nabble.com/problem-still-exists-in-the-latest-2.15-version-td33382315.html
Haipeng's complex score crashes LilyPond on my Vista 64-bit machine after 18 seconds of compilation, using about 350 MB of memory, which is nowhere near the machine's limit.
Neil suggested adding
\context {
\GrandStaff
\remove "Span_bar_stub_engraver"
}
to the score's layout block, but that does not fix the crash.
Haipeng says this does not crash on a previous version of LilyPond, but I can't check this because the file provided uses newer syntax.
Originally posted by: n.putt...@gmail.com
Sorry, that was the wrong suggestion. It's actually the Span_bar_stub_engraver in the StaffGroup context which causes the crash.
Adding
\context {
\StaffGroup
\remove "Span_bar_stub_engraver"
}
works for me.
The first bad commit is the one which adds the Span_bar_stub_engraver: [r20670d51f8d97fd390210dd239b3b2427f071e7c]
It produces a different segfault though, since there was another bug shadowing the current one in Grob::get_vertical_axis_group () (which Mike fixed recently with [r70fd22ce9b84f9d3c1d44ffd79baafd370a389fb]). That said, with the previous commit Haipeng's file seems to enter an infinite loop on my system and Jay's has an assertion failure related to TupletNumber offsets.
Originally posted by: dak@gnu.org
Mike, any pointers of where to look here regarding the Span_bar_stub_engraver? The commit is rather humongous.
Cc: mts...@gmail.com
Originally posted by: mts...@gmail.com
valgrinding it gives different results every time - sometimes it segfaults in the interpreting stage, sometimes it makes it through to a later stage.
i know that span-bar-stub-engraver.cc explicitly checks for null pointers, but i have a feeling that this is not necessarily done in the functions it uses. another problem may be that it is working w/ dead grobs.
what would really help is a minimal example, as it is difficult to isolate the problem w/ such a large score.
Originally posted by: mts...@gmail.com
More strangeness - I am getting a consistent segfault in scm-hash.cc.
Scheme_hash_table::try_retrieve (SCM k, SCM *v)
It's on the line:
SCM handle = scm_hashq_get_handle (hash_tab_, k);
This doesn't seem like the type of thing that'd usually crash. I'm guessing that something got garbage collected that shouldn't have. But it seems like the mark_smob method in context protects all of its scheme variables (save daddy_context_, but when I include it in mark_smob that doesn't change anything).
What's difficult, as I said before, is not having a small example. Trying to debug w/ printf's on something like this is near-impossible.
Originally posted by: dak@gnu.org
SCM *v is a recipe for trouble with regard to garbage collection. It is not something that the Scheme garbage collector sees as Scheme. So the variable under it needs to be protected separately. I'll look some more.
Originally posted by: dak@gnu.org
Oh, and Mike? In my last segfault hunt, I remarked:
Anyway, one thing that has been useful is figuring out "target record" in gdb which lets you step backwards from a segfault. Since various other optimizations made the stack backtrace less than useful (since the problem occurs with tail jump optimizations, the bad function is not actually present in the backtrace), this was quite helpful.
Could be useful for stepping backwards from your segfault to the actual cause.
Originally posted by: mts...@gmail.com
I read a bit on the gdb website about this but I'm not quite sure how it works.
If I have the file foo.ly that I want to compile with LilyPond, what would I need to do to target record and then step backwards?
Originally posted by: dak@gnu.org
Well you do something like
gdb out/bin/lilypond
break some_subroutine_likely_called_not_all_too_much_before_the_segfault
run foo.ly
target record
continue
[wait for a long time]
[segfault occurs]
reverse-step
[repeat until you get somewhere where the data and debugging makes
sense again]
Originally posted by: mts...@gmail.com
My computer doesn't like record :(
Process record doesn't support instruction 0xfef at address 0x699778.
Process record: failed to record execution log.
[Thread 0xb7fe76d0 (LWP 1985)] #1 stopped.
__strlen_sse2 () at ../sysdeps/i386/i686/multiarch/strlen.S:75
75 ../sysdeps/i386/i686/multiarch/strlen.S: No such file or directory.
in ../sysdeps/i386/i686/multiarch/strlen.S
Is anyone else able to do this with Haipeng's score?
Originally posted by: n.putt...@gmail.com
I can take a look after dinner.
Jay's score is much easier to work with - segfaults almost immediately here.
Originally posted by: n.putt...@gmail.com
I can't find a useful breakpoint unfortunately; getting too many continues. Jay's score segfaults in the 104th bar.
Originally posted by: dak@gnu.org
Something like
\applyContext #tanh
before the crash might help. Put a breakpoint on tanh, change to the
target record and then just do
return
before tanh discovers it has been taken for a ride. I doubt it gets
used in LilyPond for anything else.
Originally posted by: n.putt...@gmail.com
There's a pair of cresc/decresc hairpins in the horn part in bars 103 - 104. If I remove both dynamics, compilation continues.
Originally posted by: dak@gnu.org
Regarding comment 12: the breakpoint will need to be on scm_tanh with that kind of call.
Originally posted by: n.putt...@gmail.com
I tried
\applyContext #atanh
since Guile uses the standard maths library for tanh, but I can't seem to trigger the breakpoint:
Interpreting music... [8][16][24][32][40][48][56][64][72][80][88][96]<unnamed port>: In procedure + in expression (+ 1 z):
<unnamed port>: Wrong type: #<Context Voice () >
[Inferior 1 (process 10660) exited with code 01]
Originally posted by: dak@gnu.org
Breakpoint on scm_atanh? Obviously, Scheme would not let this through to the real math routine.
Originally posted by: mts...@gmail.com
I know it's kludgy, but you could set a property for the BarLine in that measure to something like:
\override BarLine #'foo = ##t
Then, in span_bar_stub_engraver, have a line that checks for this property and if it is set calls some exotic function (like scm_tanh or whatever).
Originally posted by: mts...@gmail.com
I'm marking this as critical just because it is a regression and I don't think a stable release should go out that causes this sorta problem. Any luck with gdb?
Labels: -Type-Crash Type-Critical
Originally posted by: mts...@gmail.com
Protects contexts in Span_bar_stub_engraver
http://codereview.appspot.com/5727050
Labels: Patch-new
Originally posted by: mts...@gmail.com
Hey all,
I don't have time to run regtests on the proposed patch, so my apologies if it doesn't work. Both of the problematic files make it thru to compilation with this fix, although I have no clue how/why it does what it does and if there is a better/safer/smarter way to do it.
Originally posted by: dak@gnu.org
"protecting" is an operation with permanent performance and memory impact during the life time of protection, so it is a bad idea to do it in situations where the pairing is not guaranteed (like in constructor/destructor pairs). If contexts need to be kept alive for some engraver or other entity, the way to do that is to mark them during the gc mark phase.
So this "fix" definitely looks wrong. If you can trace the problem to premature collection of a context, that is where we need to look.
Originally posted by: dak@gnu.org
Mike, we have the following:
class Span_bar_stub_engraver : public Engraver
{
vector<Grob *> spanbars_;
map<Grob *, Context *> axis_groups_;
_None_ of all that, as far as I can see, is getting marked _anywhere_. This is a garbage collection disaster waiting to happen. Wait, it already happens. Which is what this issue is about.
Now one can mark all this, sure. But walking through a map is effort. Is there a reason you are using a C++ map here instead of a Scheme hashtable? A Scheme hashtable only needs to get marked on its own and will keep its contents alive (or, if it is a weak hashtable, deal with their demise on its own).
If you don't want to rewrite things, just create a derived_mark member function (it is called from translator.cc as a virtual function) for your engraver, and let it call scm_gc_mark on all values in your map.
That's the way to do this sort of protection thing.
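The marking approach described above can be illustrated with a toy mark-and-sweep model. This is a sketch only: Object, Heap, gc_mark, the mark_map_values switch and freed_after_collect are hypothetical stand-ins invented for the example, not LilyPond or Guile code, and gc_mark merely imitates the role that scm_gc_mark plays in the real mark phase. It shows a context that is reachable only through a C++ map surviving a collection when the engraver's derived_mark walks the map values, and being freed when it does not.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a garbage-collected object (hypothetical, not a smob).
struct Object {
  bool marked = false;
  virtual ~Object() = default;
  // Subclasses mark whatever they keep alive.
  virtual void derived_mark() {}
};

// Toy imitation of scm_gc_mark: flag the object once, then recurse into it.
inline void gc_mark(Object *o) {
  if (o && !o->marked) {
    o->marked = true;
    o->derived_mark();
  }
}

struct Context : Object {};
struct Grob : Object {};

struct Engraver : Object {
  std::map<Grob *, Context *> axis_groups_;
  bool mark_map_values = true;  // false models the missing marking

  void derived_mark() override {
    if (!mark_map_values)
      return;
    // Walk the C++ map and mark every value so the collector sees it.
    for (auto &entry : axis_groups_)
      gc_mark(entry.second);
  }
};

// Minimal mark-and-sweep heap that owns every object (assumption of the toy).
struct Heap {
  std::vector<std::unique_ptr<Object>> objects;

  template <class T> T *make() {
    std::unique_ptr<T> p(new T());
    T *raw = p.get();
    objects.push_back(std::move(p));
    return raw;
  }

  // One collection cycle; returns the number of objects freed.
  std::size_t collect(const std::vector<Object *> &roots) {
    for (auto &o : objects)
      o->marked = false;
    for (Object *r : roots)
      gc_mark(r);
    std::size_t before = objects.size();
    objects.erase(std::remove_if(objects.begin(), objects.end(),
                                 [](const std::unique_ptr<Object> &o) {
                                   return !o->marked;
                                 }),
                  objects.end());
    return before - objects.size();
  }
};

// Runs one collection with the engraver and the grob as roots; the context
// is reachable only through the engraver's map.
std::size_t freed_after_collect(bool mark_values) {
  Heap heap;
  Engraver *engraver = heap.make<Engraver>();
  Grob *grob = heap.make<Grob>();
  Context *context = heap.make<Context>();
  engraver->mark_map_values = mark_values;
  engraver->axis_groups_[grob] = context;

  std::vector<Object *> roots;
  roots.push_back(engraver);
  roots.push_back(grob);
  return heap.collect(roots);
}
```

In this model, freed_after_collect (true) frees nothing, while freed_after_collect (false) frees the context and leaves a dangling pointer in axis_groups_, which is the kind of state that could lead to crashes like the ones reported here.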
Labels: -Patch-new Patch-needs_work
Originally posted by: dak@gnu.org
Issue 2356: Segfault in spanbars.
This protects the contexts in the internal data structure
axis_groups_. Incidentally, it looks like this data structure grows
indefinitely and is never cleaned up again. What's up with _that_?
http://codereview.appspot.com/5732054
Labels: Patch-new
Related Issues: #2356
Originally posted by: dak@gnu.org
And the whole process_acknowledged is one steaming heap of undocumented incomprehensible contorted...
And I repeat: where is axis_groups_ ever cleared out again?
Originally posted by: dak@gnu.org
Patchy the autobot says: LGTM. Passes basic tests on a 32-bit system (like before the fix). Feedback from 64-bit testers is required. And due to the total lack of documentation, including what this engraver is actually supposed to do in detail, the original author (namely Mike) should check whether it is intended that axis_groups_ is never cleared out again.
Labels: Patch-review