#2356 Lilypond segfaults

Google Importer - 2012-02-27

Originally posted by: n.putt...@gmail.com

Sorry, that was the wrong suggestion. It's actually the Span_bar_stub_engraver in the StaffGroup context which causes the crash.

Adding

\context {
  \StaffGroup
  \remove "Span_bar_stub_engraver"
}

works for me.

The first bad commit is the one which adds the Span_bar_stub_engraver: [r20670d51f8d97fd390210dd239b3b2427f071e7c]

It produces a different segfault, though, because another bug in Grob::get_vertical_axis_group () was shadowing the current one (Mike fixed that recently with [r70fd22ce9b84f9d3c1d44ffd79baafd370a389fb]). That said, with the previous commit Haipeng's file seems to enter an infinite loop on my system, and Jay's hits an assertion failure related to TupletNumber offsets.

Google Importer - 2012-03-01

Originally posted by: dak@gnu.org

Mike, any pointers on where to look here regarding the Span_bar_stub_engraver? The commit is rather humongous.

Cc: mts...@gmail.com

Google Importer - 2012-03-02

Originally posted by: mts...@gmail.com

valgrinding it gives different results every time - sometimes it segfaults in the interpreting stage, sometimes it makes it through to a later stage.

i know that span-bar-stub-engraver.cc explicitly checks for null pointers, but i have a feeling that this is not necessarily done in the functions it uses. another problem may be that it is working w/ dead grobs.

what would really help is a minimal example, as it is difficult to isolate the problem w/ such a large score.

Google Importer - 2012-03-02

Originally posted by: mts...@gmail.com

More strangeness - I am getting a consistent segfault in scm-hash.cc.

Scheme_hash_table::try_retrieve (SCM k, SCM *v)

It's on the line:

SCM handle = scm_hashq_get_handle (hash_tab_, k);

This doesn't seem like the type of thing that'd usually crash. I'm guessing that something got garbage collected that shouldn't have. But it seems like the mark_smob method in context protects all of its scheme variables (save daddy_context_, but when I include it in mark_smob that doesn't change anything).

What's difficult, as I said before, is not having a small example. Trying to debug w/ printf's on something like this is near-impossible.

Google Importer - 2012-03-02

Originally posted by: dak@gnu.org

SCM *v is a recipe for trouble with regard to garbage collection. It is not something that the Scheme garbage collector sees as Scheme. So the variable under it needs to be protected separately. I'll look some more.

Google Importer - 2012-03-02

Originally posted by: dak@gnu.org

Oh, and Mike? In my last segfault hunt, I remarked:

Anyway, one thing that has been useful is figuring out "target record" in gdb which lets you step backwards from a segfault. Since various other optimizations made the stack backtrace less than useful (since the problem occurs with tail jump optimizations, the bad function is not actually present in the backtrace), this was quite helpful.

Could be useful for stepping backwards from your segfault to the actual cause.

Google Importer - 2012-03-02

Originally posted by: mts...@gmail.com

I read a bit on the gdb website about this but I'm not quite sure how it works.
If I have the file foo.ly that I want to compile with LilyPond, what would I need to do to target record and then step backwards?

Google Importer - 2012-03-02

Originally posted by: dak@gnu.org

Well you do something like
gdb out/bin/lilypond
break some_subroutine_likely_called_not_all_too_much_before_the_segfault
run foo.ly
target record
continue
[wait for a long time]
[segfault occurs]
reverse-step
[repeat until you get somewhere where the data and debugging make sense again]

Google Importer - 2012-03-02

Originally posted by: mts...@gmail.com

My computer doesn't like record :(

Process record doesn't support instruction 0xfef at address 0x699778.
Process record: failed to record execution log.

[Thread 0xb7fe76d0 (LWP 1985)] #1 stopped.
__strlen_sse2 () at ../sysdeps/i386/i686/multiarch/strlen.S:75
75 ../sysdeps/i386/i686/multiarch/strlen.S: No such file or directory.
in ../sysdeps/i386/i686/multiarch/strlen.S

Is anyone else able to do this with Haipeng's score?

Google Importer - 2012-03-02

Originally posted by: n.putt...@gmail.com

I can take a look after dinner.

Jay's score is much easier to work with - segfaults almost immediately here.

Google Importer - 2012-03-02

Originally posted by: n.putt...@gmail.com

I can't find a useful breakpoint unfortunately; getting too many continues. Jay's score segfaults in the 104th bar.

Google Importer - 2012-03-02

Originally posted by: dak@gnu.org

Something like

\applyContext #tanh

before the crash might help. Put a breakpoint on tanh, change to the
target record and then just do

return

before tanh discovers it has been taken for a ride. I doubt it gets
used in LilyPond for anything else.

Google Importer - 2012-03-02

Originally posted by: n.putt...@gmail.com

There's a pair of cresc/decresc hairpins in the horn part in bars 103 - 104. If I remove both dynamics, compilation continues.

Google Importer - 2012-03-02

Originally posted by: dak@gnu.org

Regarding comment 12: the breakpoint will need to be on scm_tanh with that kind of call.

Google Importer - 2012-03-02

Originally posted by: n.putt...@gmail.com

I tried

\applyContext #atanh

since Guile uses the standard maths library for tanh, but I can't seem to trigger the breakpoint:

Interpreting music... [8][16][24][32][40][48][56][64][72][80][88][96]<unnamed port>: In procedure + in expression (+ 1 z):
<unnamed port>: Wrong type: #<Context Voice () >
[Inferior 1 (process 10660) exited with code 01]

Google Importer - 2012-03-02

Originally posted by: dak@gnu.org

Breakpoint on scm_atanh? Obviously, Scheme would not let this through to the real math routine.

Google Importer - 2012-03-02

Originally posted by: mts...@gmail.com

I know it's kludgy, but you could set a property for the BarLine in that measure to something like:

\override BarLine #'foo = ##t

Then, in span_bar_stub_engraver, have a line that checks for this property and if it is set calls some exotic function (like scm_tanh or whatever).
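
A rough sketch of that idea, as standalone C++ rather than real LilyPond code. It swaps the "exotic function" for a dedicated hook, which is an assumption of this sketch and not what the thread settled on. Every name below (debug_hook, Properties, process_bar_line, count_hook_hits) is hypothetical, and the std::map only stands in for a grob's property list.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical hook: it exists only so that a debugger has a symbol to
// break on. Writing to a volatile counter keeps the call from being
// optimized away entirely.
volatile int debug_hook_hits = 0;
void debug_hook() { debug_hook_hits = debug_hook_hits + 1; }

// Toy stand-in for a grob's property list (not LilyPond's real storage).
using Properties = std::map<std::string, bool>;

// Called once per bar line; only the flagged bar line reaches the hook.
void process_bar_line(const Properties &props) {
  auto it = props.find("foo");
  if (it != props.end() && it->second)
    debug_hook();
  // ... the engraver's normal work would follow here ...
}

// Simulates a score with `bars` bar lines where only `flagged_bar`
// carries the marker property; returns how often the hook fired.
int count_hook_hits(int bars, int flagged_bar) {
  assert(bars >= 0);
  const int before = debug_hook_hits;
  for (int bar = 1; bar <= bars; ++bar) {
    Properties props;
    if (bar == flagged_bar)
      props["foo"] = true;
    process_bar_line(props);
  }
  return debug_hook_hits - before;
}
```

With a breakpoint on debug_hook, execution would stop exactly once, at the flagged bar, instead of once per bar line, which is what makes it usable as the place to start recording.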

Google Importer - 2012-03-03

Originally posted by: mts...@gmail.com

I'm marking this as critical just because it is a regression and I don't think a stable release should go out that causes this sorta problem. Any luck with gdb?

Labels: -Type-Crash Type-Critical

Google Importer - 2012-03-03

Originally posted by: mts...@gmail.com

Protects contexts in Span_bar_stub_engraver

http://codereview.appspot.com/5727050

Labels: Patch-new

Google Importer - 2012-03-03

Originally posted by: mts...@gmail.com

Hey all,

I don't have time to run regtests on the proposed patch, so my apologies if it doesn't work. Both of the problematic files make it thru to compilation with this fix, although I have no clue how/why it does what it does and if there is a better/safer/smarter way to do it.

Google Importer - 2012-03-03

Originally posted by: dak@gnu.org

"protecting" is an operation with permanent performance and memory impact for the lifetime of the protection, so it is a bad idea to do it in situations where the pairing of protect and unprotect is not guaranteed (like in constructor/destructor pairs). If contexts need to be kept alive for some engraver or other entity, the way to do that is to mark them during the gc mark phase.

So this "fix" definitely looks wrong. If you can trace the problem to premature collection of a context, that is where we need to look.

Google Importer - 2012-03-03

Originally posted by: dak@gnu.org

Mike, we have the following:
class Span_bar_stub_engraver : public Engraver
{
  vector<Grob *> spanbars_;
  map<Grob *, Context *> axis_groups_;

_None_ of all that, as far as I can see, is getting marked _anywhere_. This is a garbage collection disaster waiting to happen. Wait, it already happens. Which is what this issue is about.

Now one can mark all this, sure. But walking through a map is effort. Is there a reason you are using a C++ map here instead of a Scheme hashtable? A Scheme hashtable only needs to get marked on its own and will keep its contents alive (or, if it is a weak hashtable, deal with their demise on its own).

If you don't want to rewrite things, just create a derived_mark member function (it is called from translator.cc as a virtual function) for your engraver, and let it call scm_gc_mark on all values in your map.

That's the way to do this sort of protection thing.
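
To make the mark-phase idea concrete, here is a minimal sketch using a toy mark-and-sweep collector. It is not LilyPond or Guile code: ToyHeap, ToyEngraver, toy_gc_mark and surviving_contexts are hypothetical stand-ins, with toy_gc_mark playing the role of scm_gc_mark and ToyEngraver::derived_mark playing the role of the engraver's derived_mark hook. Grobs are kept outside the toy collector so that only the contexts in the map are collectable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <vector>

// Toy stand-ins, NOT the real LilyPond classes.
struct Grob {};
struct Context { bool marked = false; };

// Stand-in for scm_gc_mark: flags an object as reachable.
void toy_gc_mark(Context *c) {
  assert(c != nullptr);
  c->marked = true;
}

// Toy collector: owns every Context and frees the ones nobody marked.
class ToyHeap {
public:
  Context *alloc() {
    objects_.push_back(std::make_unique<Context>());
    return objects_.back().get();
  }
  void collect(const std::function<void()> &mark_roots) {
    for (auto &o : objects_) o->marked = false;  // reset old marks
    mark_roots();                                // mark phase
    objects_.erase(                              // sweep phase
        std::remove_if(objects_.begin(), objects_.end(),
                       [](const std::unique_ptr<Context> &o) {
                         return !o->marked;
                       }),
        objects_.end());
  }
  std::size_t live() const { return objects_.size(); }

private:
  std::vector<std::unique_ptr<Context>> objects_;
};

struct ToyEngraver {
  std::map<Grob *, Context *> axis_groups_;

  // Analogue of derived_mark: mark every value stored in the map.
  void derived_mark() const {
    for (const auto &entry : axis_groups_)
      toy_gc_mark(entry.second);
  }
};

// Returns how many contexts survive one collection cycle.
std::size_t surviving_contexts(bool engraver_marks) {
  ToyHeap heap;
  Grob bar1, bar2;
  ToyEngraver eng;
  eng.axis_groups_[&bar1] = heap.alloc();
  eng.axis_groups_[&bar2] = heap.alloc();
  heap.collect([&] {
    if (engraver_marks) eng.derived_mark();
  });
  // Without marking, the map now holds dangling pointers; this function
  // never dereferences them, but a real engraver eventually would.
  return heap.live();
}
```

In the toy, surviving_contexts(true) keeps both contexts alive while surviving_contexts(false) loses them, which mirrors the premature collection described above. The real collector and the real translator hook differ in many details, so treat this as an illustration of the mark-phase contract only.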

Labels: -Patch-new Patch-needs_work

Google Importer - 2012-03-03

Originally posted by: dak@gnu.org

Issue 2356: Segfault in spanbars.

This protects the contexts in the internal data structure axis_groups_. Incidentally, it looks like this data structure grows indefinitely and is never cleaned up again. What's up with _that_?

http://codereview.appspot.com/5732054

Labels: Patch-new

Google Importer - 2012-03-03

Originally posted by: dak@gnu.org

And the whole process_acknowledged is one steaming heap of undocumented incomprehensible contorted...

And I repeat: where is axis_groups_ ever cleared out again?

Google Importer - 2012-03-03

Originally posted by: dak@gnu.org

Patchy the autobot says: LGTM. Passes basic tests on a 32bit system (like before the fix). Feedback from 64bit testers is required. And due to the total lack of documentation including what this engraver is actually supposed to do in detail, the original author (namely Mike) should check whether it is intended that axis_groups_ is never cleared out again.

Labels: Patch-review
