Greetings,
I’m trying to run LilyPond on Fedora 24 (with GCC 6.0); I’m able to compile it (with and without guile2 enabled) but when trying to use it, many LilyPond files trigger a segfault:
Parsing...
Interpreting music...
Preprocessing graphical objects...
Program received signal SIGSEGV, Segmentation fault.
0x0000000000496c2f in Grob::get_offset (this=this@entry=0x0, a=a@entry=X_AXIS) at grob.cc:400
400 if (dim_cache_[a].offset_)
Here are the regtests that reproduce the bug (the others compile just fine):
beam-cross-staff-slope.ly
dynamics-alignment-breaker-linebreak.ly
dynamics-alignment-breaker.ly
dynamics-alignment-breaker-order.ly
dynamics-alignment-breaker-subsequent-spanner.ly
dynamics-alignment-no-line-linebreak.ly
dynamics-alignment-no-line.ly
dynamics-context-textspan.ly
dynamics-unbound-hairpin.ly
event-listener-output.ly
fermata-rest-position.ly
font-name.ly
full-measure-rest-fermata.ly
line-arrows.ly
line-style-zigzag-spacing.ly
make-relative.ly
markup-line-thickness.ly
markup-note-grob-style.ly
metronome-mark-broken-bound.ly
minimum-length-after-break.ly
mm-rests2.ly
morgenlied.ly
mozart-hrn-3.ly
multi-measure-rest-center.ly
multi-measure-rest.ly
multi-measure-rest-spacing.ly
multi-measure-rest-text.ly
music-function-end-spanners.ly
offsets.ly
page-turn-page-breaking-repeats.ly
part-combine-a2.ly
part-combine-mmrest-apart.ly
part-combine-mmrest-shared.ly
part-combine-silence-mixed.ly
property-nested-override.ly
quote-cue-during.ly
quote-cue-event-types.ly
repeat-percent-count.ly
repeat-percent-count-visibility.ly
repeat-percent.ly
rest-positioning.ly
scheme-text-spanner.ly
skiptypesetting-multimeasurerest.ly
slur-broken-trend.ly
slur-scoring.ly
slur-tie-control-points.ly
slur-vertical-skylines.ly
spanner-after-line-breaking.ly
staff-mixed-size.ly
stem-direction.ly
stencil-scale.ly
tablature-full-notation.ly
tablature-harmonic-functions.ly
tablature-tie-spanner.ly
text-spanner-attachment-alignment.ly
text-spanner-full-rest.ly
text-spanner-override-order.ly
tie-direction-manual.ly
tie-pitched-trill.ly
trill-spanner-auto-stop.ly
trill-spanner-broken.ly
trill-spanner-chained.ly
trill-spanner-grace.ly
trill-spanner.ly
trill-spanner-pitched-consecutive.ly
trill-spanner-pitched-forced.ly
trill-spanner-pitched.ly
trill-spanner-scaled.ly
What makes it weird is that the bug happens both with my LilyPond build (latest master branch) and with my distribution’s package (Fedora repos generally have the latest development release: as of now it’s 2.19.38, but this also happened with .37 and .36); however, GUB packages of the exact same development release don’t reproduce the segfault.
Could it be because Fedora 24 is using GCC6? I’ve tried bisecting but I’m unable to compile any version older than a couple of weeks, prior to David’s more rigorous smob types:
http://git.savannah.gnu.org/cgit/lilypond.git/commit/?id=c6758d6d12e33779fc81218693d5650682d8a1ca
Let me know if I can provide any other information (and feel free to close this issue if it turns out to be caused by something in my environment).
The exact same LilyPond snapshot, built with gcc 5.3.1 (and the corresponding libstd version) instead of gcc 6.0.0, doesn’t segfault.
I confirm no segfault on Fedora 23 and gcc at version 5.3.1.
I'm going to upgrade to Fedora 24 around May/June and I hope that this will be fixed.
I've just switched to Fedora 24 (and GCC6) and I get segfault on some files.
Let me know if you need other information
I cannot 'make doc' anymore because of the regtests, as reported by Valentin.
Here's the backtrace:
Ok, bad news. I have managed to find a g++-6 package for Ubuntu and managed to compile 64-bit executables with them.
The version is
g++-6 (Ubuntu 6.1.1-2ubuntu12~16.04) 6.1.1 20160510
Copyright (C) 2016 Free Software Foundation, Inc.
and using either --disable-optimising, --enable-checking, or no option at all I have been able to make check (the stats are nonsensical without one of the options enabling checking but other than that the results are identical).
This can have a number of reasons:
g++ 6.1.1 might generate better code than g++ 6.0 used in Fedora
The Fedora g++ might have additional patches that are a bad idea
The Ubuntu g++ might have additional patches that are a good idea
The problem might occur when compiling one of the libraries (mine are not compiled with g++-6).
The difference might be caused by one of the options used for compiling, likely at the request of some library's xxx-config program (for example, my compilations are made with -fwrapv, probably on demand by python-config: -I/usr/include/python2.7 -I/usr/include/i386-linux-gnu/python2.7 -fno-strict-aliasing -g -fstack-protector-strong -g -fwrapv).
So I think the next sane step would be for Fedora users to see whether they can get a hold of some g++-6.1 version and look whether they fare better with that (of course using libguile 1.8.8 as we don't support anything else).
On Fedora 24 I currently have:
If I run configure without options, 'make check' stops almost immediately.
If I run configure with --disable-optimising, 'make check' stops after a while with the following error:
That's mostly irrelevant. make check indeed requires a successful run of make test-baseline for comparison. It is only make test which does not rely on a baseline, and it will still require a successful run of make before that in order to have fonts and executables to work with.
You are right.
Good news is that I'm able to run 'make check' and 'make doc', thanks to --disable-optimising.
But not otherwise? That would be disconcerting. It would mean that the problem is attributable to a difference in code generation, even though our current GCC6 compiler versions are pretty similar.
That would either point to different options (like the -fwrapv I mentioned) or different compiler patches/defaults.
Correct, it fails unless I use --disable-optimising.
I searched for some information about GCC6 in Fedora:
Last edit: Federico Bruni 2016-07-07
I had the same problem, I think it's due to this change in gcc 6:
Optimizations remove null pointer checks for this
When optimizing, GCC now assumes the this pointer can never be null, which is guaranteed by the language rules. Invalid programs which assume it is OK to invoke a member function through a null pointer (possibly relying on checks like this != NULL) may crash or otherwise fail at run time if null pointer checks are optimized away.
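The pattern the release note warns about, and one way to avoid it, can be sketched as follows. This is only an illustration: Item and offset_or_zero are hypothetical names, not LilyPond's actual Grob code, and the fragile variant is left as a comment because calling it through a null pointer is undefined behaviour.

```cpp
#include <cstddef>

// Hypothetical stand-in for a layout object; this is not LilyPond's Grob.
struct Item
{
  double offset_;

  explicit Item (double offset) : offset_ (offset) {}

  // Fragile pattern (deliberately left as a comment): calling this through
  // a null pointer is undefined behaviour, so an optimizing GCC 6 may
  // delete the test and go straight to the member access.
  //
  //   double get_offset () const { return this ? offset_ : 0.0; }

  double get_offset () const { return offset_; }
};

// The null test runs at the call site, before any member function is
// invoked, so the compiler has no grounds to remove it.
inline double
offset_or_zero (const Item *item)
{
  return item != NULL ? item->get_offset () : 0.0;
}
```

With this shape, offset_or_zero (NULL) yields 0.0 without ever entering a member function, which is the property the in-method check was trying to provide.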
I also faced a FTBFS problem with const initialization:
'constexpr' needed for in-class initialization of static data member
I made two patches that seem to solve the problems and emailed them to lilypond-devel; I'll attach them here, too.
Ugh. Another one of those "undefined behavior means that we can screw the user by throwing out code written intentionally without warning" things. So something like assert(this); will become a nop, causing much harder to debug crashes later on. I thought that one developer particularly known for this hobby had moved to new pastures mostly.
At any rate, this might explain why I still fail to see the crash with GCC6 on Ubuntu; it's an option likely to get overruled with local patches from people trying to keep a distribution running.
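If the debugging value of assert(this) is what one wants to keep, a possible workaround is to assert on the pointer before the member call rather than inside it. A minimal sketch, assuming a hypothetical helper deref_checked that does not exist in LilyPond:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: checks the pointer before it is dereferenced, so the
// assertion tests an ordinary pointer value rather than 'this'.
template <class T>
T &
deref_checked (T *p)
{
  assert (p != NULL); // compiled out when NDEBUG is defined, like any assert
  return *p;
}
```

A call such as deref_checked (grob).get_offset (X_AXIS) would then fail at the assertion in a checking build instead of crashing later inside the member function.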
Hm. Does the unmodified source work after
? The description is not completely suggestive of that optimization:
'-fdelete-null-pointer-checks'
Assume that programs cannot safely dereference null pointers, and
that no code or data element resides at address zero. This option
enables simple constant folding optimizations at all optimization
levels. In addition, other optimization passes in GCC use this
flag to control global dataflow analyses that eliminate useless
checks for null pointers; these assume that a memory access to
address zero always results in a trap, so that if a pointer is
checked after it has already been dereferenced, it cannot be null.
Note however that in some environments this assumption is not true.
Use '-fno-delete-null-pointer-checks' to disable this optimization
for programs that depend on that behavior.
This option is enabled by default on most targets. On Nios II ELF,
it defaults to off. On AVR and CR16, this option is completely
disabled.
Passes that use the dataflow information are enabled independently
at different optimization levels.
I compiled the master branch (commit 445bf3bb2fbd1) adding "-fno-delete-null-pointer-checks -std=gnu++98" to the normal Fedora 24 CXXFLAGS and it works, I no longer get the crash and my score is correctly rendered.
But if I remember correctly, Fedora policy is to compile all C++ programs with at least -std=c++11, and I don't know what the policy is about null pointer check optimization.
g++ manual reports that other optimizations are disabled with this option.
With c++11 this is not permitted:
struct X {const static double i = 10;};
but this is permitted:
struct X {constexpr static double i = 10;};
So if we use c++11, we have problems with:
const Real Audio_span_dynamic::MINIMUM_VOLUME;
const Real Audio_span_dynamic::MAXIMUM_VOLUME;
const Real Audio_span_dynamic::DEFAULT_VOLUME;
and this is the FTBFS I tried to fix with my second patch.
I think assuming "this" is never NULL makes sense, "this" is used inside instances when they exist.
'constexpr' is not available in the older C++ standard which is still the default used in previous versions of GCC such as GCC 4.9 (which is still in use, for example, in GUB, if I'm not mistaken). Here's a sample of the output that I get locally with the second patch (and intentionally using GCC 4.9):
$ g++-4.9 --version
g++-4.9 (Debian 4.9.3-14) 4.9.3
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ CC=gcc-4.9 CXX=g++-4.9 ../configure
...
$ make all
...
[...]/lily/include/audio-item.hh:51:3: warning: identifier 'constexpr' is a keyword in C++11 [-Wc++0x-compat]
static constexpr Real MINIMUM_VOLUME = 0.0;
^
[...]/lily/include/audio-item.hh:51:10: error: 'constexpr' does not name a type
static constexpr Real MINIMUM_VOLUME = 0.0;
^
[...]/lily/include/audio-item.hh:51:10: note: C++11 'constexpr' only available with -std=c++11 or -std=gnu++11
...
$
An alternative fix that would likely be better compatible across C++ standard versions would be to just move the initialization of the static constants away from the Audio_span_dynamic class declaration (to audio-item.cc).
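A rough sketch of that alternative, with the declaration and the definitions shown in one block for brevity. Volume_limits is a hypothetical class name, Real is assumed to be a typedef for double, and the MAXIMUM_VOLUME value is a placeholder; only the 0.0 for MINIMUM_VOLUME comes from the compiler output above.

```cpp
typedef double Real; // assumption: Real is a floating-point typedef

// Declaration, as it would appear in a header. There is no in-class
// initializer, so neither 'constexpr' nor a GNU extension is needed.
struct Volume_limits // hypothetical name, not the actual LilyPond class
{
  static const Real MINIMUM_VOLUME;
  static const Real MAXIMUM_VOLUME;
};

// Definitions with initializers, as they would appear in a single .cc file.
const Real Volume_limits::MINIMUM_VOLUME = 0.0; // value from the output above
const Real Volume_limits::MAXIMUM_VOLUME = 1.0; // placeholder value
```

This form compiles under -std=gnu++98 as well as -std=c++11, at the cost that the values are no longer visible as constants to code that only sees the header.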
Is it OK if I prepare a fix for this as a separate patch issue (since the FTBFS problem doesn't concern the segfault)?
Definitely a separate patch. Should I try rolling the segfault issue into Rietveld then? I've checked for other obvious comparisons of this with NULL without finding any. It's sort-of squishy that the small-smobs.hh classes use the this-pointer as a fake SCM value though. I don't see comparisons with 0 here, but then who knows what other deductions G++ is going to make eventually.
The fix for the FTBFS is now Issue 4944.
Please do, I haven't looked at the crash (I only noticed the report about the build problem, which happened to be caused by code that I reviewed a while ago).
Last edit: H T LilyPond 2016-07-23
I agree, this would be ok with all c++ dialects.
On 22/07/16 14:41, Guido Aulisi wrote:
Issue 4814: grob.cc segfaults with gcc6
From the release notes of GCC 6:
As a consequence, we cannot call a member function on a prospective null
pointer (which actually is a bad idea for a number of other reasons,
like when anything tries accessing the vtable) and then try sorting out
the condition in the routine itself.
This problem was first observed with Fedora 24. The Ubuntu GCC6
prerelease does not show this problem; presumably the respective
optimization has been disabled in the Ubuntu/Debian packaging because of
affecting other programs.
Commit-message-by: David Kastrup dak@gnu.org
Signed-off-by: David Kastrup dak@gnu.org
http://codereview.appspot.com/309750043
With just this patch I cannot do any 'make' operations; it fails:
--snip--
make[1]: [out/dynamic-performer.o] Error 1
make[1]: Waiting for unfinished jobs....
/home/james/lilypond-git/lily/lily-lexer.cc: In member function 'int Lily_lexer::lookup_keyword(const string&)':
/home/james/lilypond-git/lily/lily-lexer.cc:185:28: warning: conversion to 'int' from 'vsize {aka long unsigned int}' may alter its value [-Wconversion]
return keytable_->lookup (s.c_str ());
~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
/home/james/lilypond-git/lily/score.cc: In member function 'scm_unused_struct* Score::book_rendering(Output_def*, Output_def*)':
/home/james/lilypond-git/lily/score.cc:125:33: warning: conversion to 'int' from 'std::vector<Output_def*>::size_type {aka long unsigned int}' may alter its value [-Wconversion]
int outdef_count = defs_.size ();
~~~~~~~~~~~^~
make[1]: Leaving directory '/home/james/lilypond-git/build/lily'
/home/james/lilypond-git/stepmake/stepmake/generic-targets.make:6: recipe for target 'all' failed
--snip--
With the patch above (which I think includes the edits to grob.cc in your Rietveld issue, David), I can do all the make tests.
At the moment (at home anyway) I cannot do any patch testing while I am on Fedora 24 with GCC 6.1.1 without, it seems, these fixes from yourself/Guido.
Patchy staging is run from Work and is using Ubuntu 16.04 which is on an older version of GCC.
I applied both patches provided by Guido in comment https://sourceforge.net/p/testlilyissues/issues/4814/#94d9 (above) and I was able to make, make test-baseline (I cannot do a make check because I cannot compile at all without the patch while using gcc (GCC) 6.1.1 20160621 (Red Hat 6.1.1-3) which is what current Fedora uses).
It seems that the issue above (David K's comment) includes the fixes to 'grob.cc', but not the other parts of the patches provided by Guido.
Oh and a full make doc too (sorry, I forgot to mention that).
Ok, this seems like a (pretty trivial) showstopper so I'll unceremoniously push it after proofreading it once again. I'll leave the decision about the other patch to Heikki.