
#1474 Better resilience to plugin bugs?

None
open
nobody
None
1
2022-03-19
2015-11-24
No

I just ran into this bug in the swh plugins (ladspa-swh package on Fedora).

https://bugzilla.redhat.com/show_bug.cgi?id=1285020

Which as a side effect crashes Rosegarden at startup. I'm wondering if
RG could be made to be more resilient to a single plugin failure. For
example Ardour upon start-up reports:

[ERROR]: LADSPA: cannot load module "/usr/lib64/ladspa/sifter_1210.so"
(/usr/lib64/ladspa/sifter_1210.so: undefined symbol: q_sort)

but continues loading.

Lorenzo.
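
For illustration, the "report the error but continue loading" behaviour described above can be sketched as follows. This is a minimal sketch, not Rosegarden's or Ardour's actual code: `LoadedModule` and `loadModules` are hypothetical names, and the message format merely imitates the Ardour output quoted above.

```cpp
#include <dlfcn.h>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical record of a successfully opened plugin library.
struct LoadedModule {
    std::string path;
    void *handle;
};

// Try to open every candidate library. A failed dlopen() is reported
// and skipped, so one unloadable plugin does not stop discovery.
std::vector<LoadedModule> loadModules(const std::vector<std::string> &paths)
{
    std::vector<LoadedModule> loaded;
    for (const std::string &path : paths) {
        dlerror();  // clear any stale error string
        void *handle = dlopen(path.c_str(), RTLD_NOW);
        if (!handle) {
            const char *why = dlerror();
            std::cerr << "[ERROR]: LADSPA: cannot load module \"" << path
                      << "\" (" << (why ? why : "unknown error") << ")\n";
            continue;  // keep going with the next plugin
        }
        loaded.push_back({path, handle});
    }
    return loaded;
}
```

Note that this only covers the case where dlopen() returns NULL. It offers no protection if the process crashes inside dlopen() itself.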

Discussion

  • Ted Felix

    Ted Felix - 2016-01-29

    The crash happens after LADSPAPluginFactory::discoverPlugins() is complete. This indicates to me that rg plugin discovery is resilient. What we need here is the backtrace for the sigsegv. That should tell us where the real problem lies.

     
  • Orcan Ogetbil

    Orcan Ogetbil - 2016-02-23

    Attached is the gdb session demonstrating the crash. At the crash moment, the rosegarden popup box was saying "Initializing plugin manager..."

    From the backtrace, it looks like the ladspa plugin of guitarix is involved. I don't know how. Memory corruption(?)
    Removing the guitarix ladspa plugin from the system, but keeping the ladspa swh plugin installed does NOT crash rosegarden. I still get the undefined symbol error message but the application starts fine.

    Note: I used to be the rosegarden packager in Fedora a few years back (not anymore). I was passing by, just wanted to help.

     
    • D. Michael McIntyre

      Very interesting, thanks!

      This looks a lot more strange than I expected.

      I wonder if I can reproduce this by intentionally mangling one of my plugins with a hex editor or something.

       
      • D. Michael McIntyre

        That definitely crashes Rosegarden! It causes a relocation error, not a segfault, and I don't think I actually achieved anything with that experiment. That was guitarix I messed with.

        Hacking one of the DSSI _gtk files causes the GUI to fail to load when I hit the Editor button, but otherwise it's fine.

        I hacked a DSSI .so file to mangle one of the symbols. The plugin failed to load, but Rosegarden was fine.

        You guys have been saying your issue was with missing symbols. Qt doesn't have foo, and everything goes boom. Hacking the plugin to use ^^& instead of foo should have the same net effect, shouldn't it? There won't be a ^^& symbol in any library.

        So.....

         
  • D. Michael McIntyre

    So...

    What version of Rosegarden are you Fedora 23 guys running, and how did you build it? The 15.12 release features a major bug where release builds are built with certain objects missing.

    Let me try a release build of that with this broken plugin...

    Well damn. It does the same thing I just saw.

    WARNING: DSSIPluginFactory::getDSSIDescriptor: loadLibrary failed for /usr/lib/dssi/fluidsynth-dssi.so

    Which is nothing, in other words.

    Have I ever mentioned how much I hate "Rosegarden fails horribly on distro x version y" bugs? I'm looking at you, OpenSUSE whatever you were. And now I'm looking at Fedora 23 too, apparently. O_o

    Alrighty then, I tilted at enough windmills for one day. I don't have a fricking clue what's going on or what to do about it. Ted is a lot smarter than I am when it comes to arcane hackerly stuff, so maybe he will have a clue where all I'm good for is noise and hot air.

     
  • D. Michael McIntyre

    My hypothesis about the bug in 15.12 is irrelevant to 15.10.2, so that is a dead end.

    I did not reproduce the crash. I either got a crash due to a catastrophically mangled binary, or the plugin was just skipped in the case of a gently mangled binary with corrupted symbols. The Fedora problem has something to do with Qt library versions, so I forced the case where the binary refers to an incorrect symbol to simulate that. No success.

    A binary from Fedora is unlikely to work on Ubuntu 14.04 LTS 64-bit, but I have tried stranger things. If you dig it up, I'll give it a go.

    I would like to be able to get my hands on this while it's happening. Something very unexpected and weird is going on that we would definitely like to understand and fix.

     
  • Ted Felix

    Ted Felix - 2016-02-24

    Guitarix comes up fine for me in Ubuntu.

    The crash is happening within dlopen(). The call is at src/sound/LADSPAPluginFactory.cpp:501:

    void *libraryHandle = dlopen(bso.data(), RTLD_NOW);
    

    Perhaps if someone who can reproduce this can try changing RTLD_NOW to RTLD_LAZY that might clear it up. The problem appears to be related to a hash table, so maybe lazy binding will avoid the problem at startup. Of course, this just pushes the crash to later, so I wouldn't say this is a good idea. But this might be why other apps don't crash.
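
A minimal sketch of the experiment suggested above, assuming a hypothetical helper `openLadspaLibrary` (it is not a Rosegarden function). With RTLD_NOW every undefined function symbol is resolved before dlopen() returns. With RTLD_LAZY function symbols are resolved only when first called, although references to variables are still bound at load time. `ladspa_descriptor` is the standard LADSPA entry point.

```cpp
#include <dlfcn.h>
#include <iostream>
#include <string>

// Open a LADSPA library with either lazy or immediate binding and
// check that it exports the LADSPA entry point. Returns nullptr on
// any failure instead of letting the caller use a bad handle.
void *openLadspaLibrary(const std::string &path, bool lazy)
{
    const int flags = lazy ? RTLD_LAZY : RTLD_NOW;

    dlerror();  // clear any stale error string
    void *handle = dlopen(path.c_str(), flags);
    if (!handle) {
        const char *why = dlerror();
        std::cerr << "dlopen failed for " << path << ": "
                  << (why ? why : "unknown error") << "\n";
        return nullptr;
    }

    // Every LADSPA plugin library must export this symbol.
    if (!dlsym(handle, "ladspa_descriptor")) {
        std::cerr << path << " has no ladspa_descriptor entry point\n";
        dlclose(handle);
        return nullptr;
    }
    return handle;
}
```

Calling `openLadspaLibrary(path, true)` and `openLadspaLibrary(path, false)` on the same file shows whether the binding mode changes the outcome for that plugin.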

     
  • Orcan Ogetbil

    Orcan Ogetbil - 2016-02-24

    I uploaded the contents of the official 64-bit ladspa-swh-plugins and ladspa-guitarix-plugins packages in Fedora to
    https://oget.fedorapeople.org/rosegarden_debug/
    Not 100% sure if these will work anywhere other than Fedora, but I bet they would (since ladspa binaries are C binaries, which have a somewhat more stable ABI than C++).

    Michael, why do you think that the Qt version is responsible? The q_sort function is defined in swh plugin code, but the way it is exported has changed between gcc4 and 5. That's the reason why there is the undefined symbol. Is there any other clue that I am missing for the Qt involvement in the situation?

    Ted, I will try your suggestion next.

     
    • D. Michael McIntyre

      I read "q_sort" and thought "QSort." I wasn't really thinking about the context.

      I tried your binaries, and they just trigger relocation errors.

      Relocation errors also fall under the "resilience to plugin bugs" umbrella. Ardour3 handles these just fine.

      Fix one, and we probably fix the other.

      I blew three hours digging around trying stuff, and don't feel like it's worth sharing a wall of text to show my work. I believe, perhaps incorrectly, the crash is not happening on the dlopen() call, but some long distance away. Ted, it's probably easiest if you just grab one of the Fedora plugins, swap it in, and cause this crash for yourself. You'll see what I see, but you will have a better chance of grasping what you're looking at than I do.

       
  • Orcan Ogetbil

    Orcan Ogetbil - 2016-02-24

    Nope it didn't help. Switching RTLD_NOW to RTLD_LAZY in line 501 gives me the same crash, same backtrace.

     
  • Ted Felix

    Ted Felix - 2016-02-24

    Well, it doesn't crash for me. Instead I get some interesting information:

    /usr/lib/ladspa/ladspa_guitarix.so: undefined symbol: _ZN4sigc9slot_baseC1EOS0_
    

    Demangled, that works out to:

    sigc::slot_base::slot_base(sigc::slot_base&&)
    

    So, the move ctor for sigc::slot_base is undefined. This sounds like ladspa_guitarix.so depends on libsigc++ and maybe ladspa_guitarix.so was built with c++11 and libsigc++ wasn't?
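
The demangling step above can be reproduced in code with `abi::__cxa_demangle` from `<cxxabi.h>` (available with GCC and Clang). This is a small sketch; the wrapper name `demangle` is hypothetical, and the same result can be obtained on the command line with `c++filt`.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <memory>
#include <string>

// Return the human-readable form of an Itanium-ABI mangled name,
// or the input unchanged if it cannot be demangled.
std::string demangle(const std::string &mangled)
{
    int status = 0;
    // __cxa_demangle returns a malloc()ed buffer, so release it with free().
    std::unique_ptr<char, void (*)(void *)> text(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
        std::free);
    return (status == 0 && text) ? std::string(text.get()) : mangled;
}
```

Passing the symbol from the error message, `_ZN4sigc9slot_baseC1EOS0_`, yields the move constructor signature shown above.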

    Why Fedora crashes on this and Ubuntu doesn't? Beats me.

    Resilience? Well, it would be nice to know how Ardour gets past this. Maybe they don't do the dlopen() up front like rg does. Instead, maybe they do it later. Has anyone tried actually using whatever is in this broken ladspa_guitarix.so in Ardour? I'm betting it will crash then. And in that case, I would say rg's "lack of resilience" is doing you a favor.

     
    • D. Michael McIntyre

      That c++11 hypothesis is a good one.

      I am by no stretch of the imagination good with Ardour, but try as I might, I never could try the offending plugin. I dug up every plugin available in the GUI with guitarix in the name and tried them all, and everything worked.

      It looks to me like it blacklists bad plugins. Too busy to dig deeply, but at a glance, that does appear to be the case.

       
  • Orcan Ogetbil

    Orcan Ogetbil - 2016-02-25

    Just to clarify, neither the guitarix plugin alone nor the swh plugin alone crashes rosegarden in Fedora. The crash happens only when both are present. That is the puzzling part to me.

    From what I know, guitarix is written in C++, but of course the guitarix ladspa plugin exposes a C interface. I didn't look too deeply into the code, but maybe the guitarix plugin uses C++ internally, so it needs to link to the standard C++ library, directly or indirectly, possibly along with some other C++ libraries, which could explain the incompatibility with Ubuntu. Maybe it wasn't the best binary to share between different Linuxes.

    I will be traveling for the next few weeks. I can take a deeper look after and maybe ask you questions about rosegarden internals. This does not look like a trivial problem to solve.

     
  • Chris Cannam

    Chris Cannam - 2016-02-25

    I haven't seen this specific crash, but in Sonic Visualiser I have had crashes during discovery of feature extraction plugins caused by the C++ ABI change in gcc 5.1 (https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html). Plugins that dynamically link against one or more libraries with C++ interfaces may fail to load with undefined symbol errors if they have been rebuilt with the new ABI before their linked libraries have been, and this brings down the host.

    One solution on POSIXy systems with fork() may be to test each plugin's loadability in a forked process (if you crash the forked child, that just provides information to the parent rather than crashing it as well) and then avoid loading the dodgy plugins in the parent. In Sonic Visualiser I have a temporary bit of code that does this, which you can see at https://code.soundsoftware.ac.uk/projects/svcore/repository/revisions/afed8be79032/entry/system/System.cpp#L332

    As it stands, this isn't a satisfactory way to do things, not least because it makes plugin discovery much slower. (In SV at the moment I'm using it only for feature-extraction plugins, which tend to be less numerous than LADSPAs, pending a better approach for all plugins.)
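
A minimal sketch of the fork()-based probe described above. It is not the Sonic Visualiser code linked in this comment: `ProbeResult` and `probePlugin` are hypothetical names, and the sketch assumes a POSIX system. The child attempts the dlopen() and reports the result through its exit status, so a crash in the child is seen by the parent as a signal rather than taking the parent down.

```cpp
#include <dlfcn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <string>

enum class ProbeResult { Loadable, LoadFailed, Crashed };

// Test whether a plugin library can be loaded, in a forked child so
// that a crash during loading cannot bring down the calling process.
ProbeResult probePlugin(const std::string &path)
{
    pid_t pid = fork();
    if (pid < 0) {
        return ProbeResult::LoadFailed;  // could not fork, treat as unusable
    }
    if (pid == 0) {
        // Child: try the load, then exit at once without running
        // atexit handlers or flushing the parent's stdio buffers.
        void *handle = dlopen(path.c_str(), RTLD_NOW);
        _exit(handle ? 0 : 1);
    }

    // Parent: wait for the child, retrying if interrupted by a signal.
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return ProbeResult::LoadFailed;
        }
    }
    if (WIFSIGNALED(status)) {
        return ProbeResult::Crashed;  // e.g. SIGSEGV inside dlopen()
    }
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        return ProbeResult::Loadable;
    }
    return ProbeResult::LoadFailed;
}
```

A host could call `probePlugin()` for each candidate during discovery and only dlopen() the ones that come back as `Loadable`, at the cost of one fork per plugin, which is the slowdown mentioned above.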

    [It's also infeasible on Windows, which lacks fork(), and this is a pity because on Windows it's very desirable to catch plugins with undefined symbols early since they are actually quite common there owing to "DLL hell". They don't crash the host on Windows, but they do make it pop up a perplexing system dialog saying the program could not be started, before the program then goes on and starts anyway.]

    As far as I can see from a glance at https://github.com/Ardour/ardour/blob/master/libs/ardour/plugin_manager.cc, Ardour itself does nothing special to handle plugin files, but it uses the Glib::Module abstraction rather than using dlopen itself directly. I believe on Linux Glib::Module is just a wrapper for https://github.com/GNOME/glib/blob/master/gmodule/gmodule-dl.c which looks like the same old dlopen stuff as Rosegarden and Sonic Visualiser are doing, so it's not clear to me why Ardour should be any more resilient. It would definitely be useful to do a bit more digging here.

     

    Last edit: Chris Cannam 2016-02-25
